That is (I'm sorry to say) because they are right. Going at it from an RTL perspective entirely defeats the purpose of using an HDL. If you describe the function instead of the logic, you'll get the job done in much less time and produce code which is much easier to understand, maintain and extend. I often need 3 lines of VHDL where others need 20.
Your method is like using Karnaugh maps to reduce logic equations by hand while the software tool that can do it for you in under a second is sitting right in front of you.
One thing that is starting to make itself obvious to me is that for the highest performance design, the HDL code looks like it has absolutely nothing to do with what is in the functional specification - usually the final form gets abstracted completely away from function. Once you get above about 150MHz or so you have to pipeline the snot out of everything and be very aware of the physical resources on the FPGA....
For example, I've recently written code for HDMI output that generates two 10-bit TMDS codes per cycle, to feed into a 20:1 serializer. The end result is a short pipeline. Stage one looks up the encoded parity values for the two data values in a 256x12-bit table; stage two does the bulk of the work, using the parity information to select which code is being sent, totalling up the running parity, and starting the lookup of the 10-bit codes in a 1024x10-bit table. The final stage sends the two 10-bit codes off to the serializer as a 20-bit word. It looks nothing like the HDMI spec would have you believe it should... all the smarts are actually in a separate bit of C code that generates the contents of the tables. It could just as easily be written in VHDL or Verilog; the actual code that does stuff is about 60 lines, plus 200 lines of lookup tables.
Likewise with generating BCH codes, calculating CRCs at multiple words per cycle, or implementing scramblers/descramblers - and you can't get away from it. With FPGA fabric clocking at a few hundred MHz but data rates getting up to multiple Gb/s for the likes of DisplayPort, HDMI 2.0, 10G Ethernet or UHD video, you have to go very wide in your data processing and keep the per-cycle complexity of the logic as low as possible - and in a lot of cases you can't do that easily with code that is purely functionally descriptive.
When it comes to inferring DSP functions, things are much the same. I have a function that needs to calculate the square of a 35-bit signed number. When inferred with "b <= a*a;" it uses four DSP blocks, but the actual function only needs three. After telling the tools to do exactly what I want, I can do 25% more work on the same FPGA. The design also had quite a few free LUTs, so if I implement one of the three multipliers that make up the square function in LUTs rather than a DSP block, I can get nearly twice the processing out of the same FPGA.
If you aren't working FPGAs hard then functional descriptions work really well and are preferred, but to get the best out of the part you really have to define the structure of what you want, and ensure that structure maps nicely onto the underlying physical resources.