Yes, HDLs are still programming languages, but the big difference is that the end result of the compiled code is a logic circuit, not a sequence of instructions.
The main point of an HDL is to save you from having to do everything with logic gates. You can describe your intent using more readable structured code, then the synthesis tool figures out a way to actually implement it. But in the end it all becomes logic again. Some seemingly simple things in code are actually very tough to implement in logic, so a single line of HDL code could cause a giant pile of logic to be placed into the final result, which takes up a lot of valuable FPGA fabric space and might run really slowly too.
This is especially important because in FPGAs the tradeoff between size and speed can be enormous. In a CPU you have a fixed number of registers available and you want to use most of them, and they get reused for other tasks anyway. But in an FPGA most of the circuitry you put down is dedicated to a single task, and the only limit on how many resources you can dedicate to that task is the size of the chip. So if you need to add up an array of 512 32-bit numbers you could run them one by one through a single 32-bit adder and take 512 clock cycles to do it, or alternatively put down roughly 512 adders and do the calculation in 1 clock cycle. The synthesis tools have to make similar decisions, yet not all of your logic should be optimized purely for speed or purely for space. As a result you have to be a lot more aware of what you want the final output to look like. This is why I often look at the synthesized output of my HDL code to see if I agree with what the compiler decided to do. If it got the wrong idea I might change the code around a bit to be more specific about the way I want it done.
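To put rough numbers on the adder example, here is a toy Python model of the two extremes. It is only a sketch, not HDL and not what any synthesis tool actually produces. The function names are made up for illustration, and it assumes one adder does one 32-bit addition per clock cycle with results wrapping at 32 bits.

```python
MASK32 = 0xFFFFFFFF  # assumption: results wrap like a plain 32-bit adder


def sum_serial(values):
    """One shared adder plus an accumulator: one value is added per clock cycle."""
    acc = 0
    cycles = 0
    for v in values:
        acc = (acc + v) & MASK32
        cycles += 1
    return acc, cycles, 1  # (result, clock cycles, adders used)


def sum_tree(values):
    """Fully parallel adder tree: every pair of operands gets its own adder."""
    if not values:
        return 0, 0, 0
    level = [v & MASK32 for v in values]
    adders = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        # Pair up neighbours; each pair costs one dedicated adder.
        for i in range(0, len(level) - 1, 2):
            nxt.append((level[i] + level[i + 1]) & MASK32)
            adders += 1
        if len(level) % 2:
            nxt.append(level[-1])  # odd one out passes straight through
        level = nxt
        depth += 1
    return level[0], adders, depth  # (result, adders used, adder levels)
```

For 512 inputs the serial version needs 1 adder and 512 cycles, while the tree needs 511 adders stacked 9 levels deep. Whether those 9 levels of carry logic really fit in one clock cycle depends on the clock speed and the part, which is exactly the kind of thing worth checking in the synthesized output.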
When it comes to complex HDL designs like processors you definitely want professional specialized HDL designers. But when it comes to simple things like making some glue logic or a simple video format converter or something, you can in most cases teach a digital engineer to be good enough in HDL in a few days.
But the one thing I have learned is that software developers will always want the most powerful, over-the-top hardware possible. They appear to need 200MHz and 64KB of RAM to do the most basic of things these days.
That is just as BS as saying that HW engineers want 12 layer boards so they don't have to sweat when laying out the traces.
The unfortunate thing separating HW from SW is the requirements and product features, which for hardware are frozen at, say, start time 0 + 1 year.
Then the board is finished, tested and done.
With SW the initial requirements should also be frozen at that time.
Then you can get an optimized design.
Well yes, some modern products are complex to the point where they need some large, complex code to do what they do, or perhaps they are doing a job where there is no way around lots of horsepower, such as driving a high resolution display.
I am talking more about simple applications that don't actually have to do much. Things like reading 4 analog voltages and 2 digital inputs and sending them over CAN. Or reading some MEMS IMU sensor values and sending them over RS485. Or a battery monitor that calculates the SOC from current and voltage and makes that available on CAN, etc. All of these are examples that for some reason required a pricey high-performance ARM MCU with an FPU and memory in the triple-digit KB. When they were told to stay in sleep mode as much as possible to not drain the battery, the result was the device sleeping for 1 second followed by running at max CPU clock for 3 seconds.
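To show why that 1 second asleep / 3 seconds awake pattern hurts, here is a quick back-of-the-envelope calculation in Python. The current figures in the usage lines are made-up placeholders, not measurements from any of the devices above, and the function names are just for illustration.

```python
def average_current_ma(sleep_s, run_s, sleep_ma, run_ma):
    """Time-weighted average current over one sleep/run period."""
    period_s = sleep_s + run_s
    return (sleep_s * sleep_ma + run_s * run_ma) / period_s


def battery_life_hours(capacity_mah, avg_ma):
    """Idealized battery life: ignores self-discharge, derating and regulator losses."""
    return capacity_mah / avg_ma


# Hypothetical numbers: 0.01 mA asleep, 50 mA at full clock.
mostly_awake = average_current_ma(sleep_s=1, run_s=3, sleep_ma=0.01, run_ma=50)
mostly_asleep = average_current_ma(sleep_s=3.9, run_s=0.1, sleep_ma=0.01, run_ma=50)
```

With these assumed figures the mostly-awake pattern averages about 37.5 mA, while waking for only 0.1 second out of every 4 averages a bit over 1 mA. The sleep current hardly matters as long as the part spends three quarters of its time at full clock.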
On the other hand, in one of my projects I ended up using a PIC16F with 256 bytes of RAM and about 1.5KB of ROM to implement a motor controller with PID regulation, UI, battery management etc. The software side of things did need a fair bit of math trickery to run fast (since this MCU doesn't even have an 8-bit fixed point multiply instruction) but it worked, and it was dirt cheap and very low power.
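The post doesn't say which math tricks were used. As one common example of the kind of thing you end up doing on a core with no multiply instruction, here is shift-and-add multiplication, modelled in Python so it is easy to check. The function names and the Q8 gain format are assumptions for illustration, not the original firmware.

```python
def mul8x8_shift_add(a, b):
    """Unsigned 8-bit x 8-bit -> 16-bit product using only shifts and adds."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must fit in 8 bits")
    product = 0
    multiplicand = a
    for _ in range(8):  # one pass per bit of the multiplier
        if b & 1:
            product = (product + multiplicand) & 0xFFFF
        multiplicand <<= 1  # next bit of b is worth twice as much
        b >>= 1
    return product


def scale_q8(x, gain_q8):
    """Scale x by a fractional gain stored as gain_q8 / 256 (Q8 fixed point)."""
    return mul8x8_shift_add(x, gain_q8) >> 8


half = scale_q8(200, 128)  # a gain of 128/256, i.e. one half
```

On real 8-bit hardware the loop body is a handful of rotate and add instructions. A fixed gain such as a PID coefficient can often be reduced further to a few hard-coded shifts and adds, with no loop at all.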
Not that I am the kind of person who ONLY writes in assembly for ancient obsolete chips. I do also make use of high performance ARM MCUs or even Linux SoCs when the application needs more oomph. I just use a crappy little MCU when there is no need to use something bigger.