The whole point of optimizing compilers is that the user should not need to know the intimate details of the processor, and code should be written for the user's understanding rather than the processor's. Good optimizing compilers look not only for an exact pattern but also for any equivalents. This is why compilers are typically composed of multiple optimization passes, sometimes running the same optimization at different points in the process.
I'm not terribly familiar with the current state of the art in synthesis. I would hope that they do better than just looking for exact matches to the on-chip resources. The output from both Vivado and yosys implies that they do quite a bit more.
They do a lot of optimization, but they cannot make major structural changes to the design, even if the result would be functionally equivalent (driven by the requirement that you can formally verify that you actually get what you asked for). In a poorly constructed coding analogy, you are asking for the equivalent of coding a bubble sort and expecting the compiler to detect this and replace it with a quick sort.
Take a really simple example: a 10001-stage shift register, with a reset that sets all the elements to 0. The reset signal has a single stage of synchronization on it.
Version 1 - not architecture aware:
Artix-7 resources used : 1668 slices of 8150
Maximum clock rate : approx 322 MHz.
Version 2, which is written in an architecture-aware manner, but 100% functionally equivalent:
Resources used : 84 slices of 8150
Performance : approx 354 MHz.
So in this case architecture aware coding saves 95% of the FPGA resources (and power), and runs about 10% faster. Would it also give better results on Altera Cyclone parts? I don't know - I've not read the device's architecture manual.
If nothing else, yosys should provide some advancement in the area of optimization. It provides an open platform for academic researchers to test out new algorithms and see the results on many designs.
There is already that sort of work out there, e.g. the Verilog-to-Routing project (https://vtr.readthedocs.org). In general, academic research tools are well behind the performance of commercial tools. From https://hes.elis.ugent.be/publications/analyzing-divide-between-fpga-academic-and-commercial-results :
In this work we compare the latest Xilinx commercial tools and products with these well-known academic tools to identify the gap in the major figures of merit. Our results show that there is a significant 2.2X gap in speed-performance for similar process technology
In case you are interested, Version 1 of the shift register:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity connect_switch_to_led_a is
    Port ( clk    : in  STD_LOGIC;
           reset  : in  STD_LOGIC;
           switch : in  STD_LOGIC;
           led    : out STD_LOGIC);
end connect_switch_to_led_a;

architecture Behavioral of connect_switch_to_led_a is
    signal shift_reg  : std_logic_vector(10000 downto 0) := (others => '0');
    signal reset_last : std_logic := '0';  -- reset after one stage of synchronization
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if reset_last = '1' then
                -- Synchronous reset: clear the output and every stage at once
                led       <= '0';
                shift_reg <= (others => '0');
            else
                -- Output the oldest bit and shift the new one in
                led       <= shift_reg(shift_reg'high);
                shift_reg <= shift_reg(shift_reg'high-1 downto 0) & switch;
            end if;
            reset_last <= reset;
        end if;
    end process;
end Behavioral;
Version 2 of the shift register, which uses a counter to implement the reset. Because no stage of the shift register is reset directly, it can be packed into the LUTs (the LUT-based shift registers have no reset input):
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity connect_switch_to_led_b is
    Port ( clk    : in  STD_LOGIC;
           reset  : in  STD_LOGIC;
           switch : in  STD_LOGIC;
           led    : out STD_LOGIC);
end connect_switch_to_led_b;

architecture Behavioral of connect_switch_to_led_b is
    signal shift_reg  : std_logic_vector(10000 downto 0) := (others => '0');
    signal count      : unsigned(13 downto 0) := (others => '0');
    signal reset_last : std_logic := '0';  -- reset after one stage of synchronization
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if reset_last = '1' then
                -- Do not touch shift_reg; just arm the counter that masks its stale contents
                led   <= '0';
                count <= to_unsigned(10000+1, 14);
            else
                if count > 0 then
                    -- Stale data is still draining out, so force the output low
                    led   <= '0';
                    count <= count-1;
                else
                    led <= shift_reg(shift_reg'high);
                end if;
                -- The shift itself has no reset
                shift_reg <= shift_reg(shift_reg'high-1 downto 0) & switch;
            end if;
            reset_last <= reset;
        end if;
    end process;
end Behavioral;
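
If you want to check the "functionally equivalent" claim in simulation rather than take it on trust, a testbench along the following lines should do it. This is only a sketch under a few assumptions: both entities above are compiled into the `work` library, and the testbench name `tb_compare_shift_regs`, the clock period, the cycle count, the reset timing and the pseudo-random bit source are all made up for illustration. It drives both versions with the same `clk`, `reset` and `switch`, and compares the two `led` outputs once per clock cycle, including a reset asserted part-way through the counter's countdown.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity tb_compare_shift_regs is
end tb_compare_shift_regs;

architecture sim of tb_compare_shift_regs is
    -- Assumed values, chosen only so that data has time to pass through
    -- all 10001 stages before and after each reset.
    constant CLK_PERIOD   : time    := 10 ns;
    constant TOTAL_CYCLES : natural := 40000;

    signal clk    : std_logic := '0';
    signal reset  : std_logic := '0';
    signal switch : std_logic := '0';
    signal led_a  : std_logic;
    signal led_b  : std_logic;
    signal done   : boolean := false;
begin
    -- Both versions see exactly the same inputs
    dut_a : entity work.connect_switch_to_led_a
        port map (clk => clk, reset => reset, switch => switch, led => led_a);

    dut_b : entity work.connect_switch_to_led_b
        port map (clk => clk, reset => reset, switch => switch, led => led_b);

    -- Free-running clock that stops once the stimulus process is done
    clk_gen : process
    begin
        while not done loop
            clk <= '0';
            wait for CLK_PERIOD / 2;
            clk <= '1';
            wait for CLK_PERIOD / 2;
        end loop;
        wait;
    end process;

    stimulus : process
        -- Small LFSR used purely as a source of varied input bits
        variable lfsr : std_logic_vector(15 downto 0) := x"ACE1";
        variable fb   : std_logic;
    begin
        for cycle in 0 to TOTAL_CYCLES - 1 loop
            -- Inputs change and outputs are checked on the falling edge,
            -- well away from the rising edge the designs are clocked on
            wait until falling_edge(clk);

            assert led_a = led_b
                report "led mismatch at cycle " & integer'image(cycle)
                severity error;

            fb     := lfsr(0) xor lfsr(2) xor lfsr(3) xor lfsr(5);
            lfsr   := fb & lfsr(15 downto 1);
            switch <= lfsr(0);

            -- Two resets: one with the register full of data, and one
            -- while version 2 is still counting down from the first
            if cycle = 15000 or cycle = 20000 then
                reset <= '1';
            elsif cycle = 15005 or cycle = 20003 then
                reset <= '0';
            end if;
        end loop;

        report "comparison finished" severity note;
        done <= true;
        wait;
    end process;
end sim;
```

A simulation like this only covers the stimulus it is given, so it is a sanity check and not a proof of equivalence. If the claim holds, the run should end with the "comparison finished" note and no mismatch reports.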