I see that this metaprogramming argument is somewhat confusing, so let me share my story with Chisel.
I have a long-running personal project in the area of advanced data structures. It's mostly personal research and a source of fun, but I have had some non-trivial production experience with it. The idea is somewhat similar to Chisel/RocketChip, but instead of generating hardware, I'm generating application-specific data storage and processing engines based on high-level requirements. The implementation is very different: Memoria uses C++ TMP, and because of that its metaprogramming core is not nearly as sophisticated as Chisel's.
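To give a flavor of what "generating a storage engine from high-level requirements" can look like with C++ TMP, here is a minimal sketch. It is not Memoria's actual code: `Requirements`, `PlainStore`, `RleStore` and `StoreFor` are hypothetical names made up for illustration, and run-length encoding is just a stand-in for "some compressed format".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical "high-level requirements", expressed as a type.
template <typename Value, bool Compressed>
struct Requirements {
    using value_type = Value;
    static constexpr bool compressed = Compressed;
};

// Uncompressed storage: one slot per element, O(1) access.
template <typename T>
class PlainStore {
public:
    void push(T v) { data_.push_back(v); }
    T get(std::size_t i) const { assert(i < data_.size()); return data_[i]; }
    std::size_t size() const { return data_.size(); }
    std::size_t bytes() const { return data_.size() * sizeof(T); }
private:
    std::vector<T> data_;
};

// Run-length encoded storage: smaller for repetitive data, but a read
// has to walk the runs, so it costs more CPU than a plain array access.
template <typename T>
class RleStore {
    struct Run { T value; std::size_t length; };
public:
    void push(T v) {
        if (!runs_.empty() && runs_.back().value == v) ++runs_.back().length;
        else runs_.push_back(Run{v, 1});
        ++size_;
    }
    T get(std::size_t i) const {
        assert(i < size_);
        for (const Run& r : runs_) {
            if (i < r.length) return r.value;
            i -= r.length;
        }
        return T{};  // unreachable when the precondition holds
    }
    std::size_t size() const { return size_; }
    std::size_t bytes() const { return runs_.size() * sizeof(Run); }
private:
    std::vector<Run> runs_;
    std::size_t size_ = 0;
};

// The "generator": picks an implementation at compile time from the
// requirements type, with no runtime dispatch.
template <typename Req>
using StoreFor = typename std::conditional<
    Req::compressed,
    RleStore<typename Req::value_type>,
    PlainStore<typename Req::value_type>>::type;
```

With this, `StoreFor<Requirements<std::uint32_t, true>>` resolves to `RleStore<std::uint32_t>` and the `false` variant resolves to `PlainStore<std::uint32_t>`. A real generator would combine many more requirement dimensions, but the selection mechanism is the same idea.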
Advanced data structures have been considered impractical because of their high CPU costs. Data is usually stored in memory in some compressed format, so each access operation may hide pretty big constants in the O-estimate of its time complexity. OoO execution helps a lot but is not enough to make a real difference. Nevertheless, the potential is huge and very appealing. For example, we can use compressed multidimensional spatial trees (with a space complexity of 2 bits per node) to encode a function approximation and use it instead of multilayer perceptrons (modulo the curse of dimensionality). Perceptrons have complexity linear in the number of parameters in the hidden layer, while spatial trees have logarithmic complexity. That is a huge difference for power-constrained use cases.
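For readers wondering where "2 bits per node" comes from, here is a minimal sketch of one textbook succinct encoding, balanced parentheses: a depth-first walk emits a 1 when it enters a node and a 0 when it leaves, so n nodes take exactly 2n bits. This is an assumption on my side about the general technique, not a description of the spatial trees above, and the names `Tree`, `encode`, `find_close` and `subtree_size` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Input tree in adjacency form: tree[i] lists the child ids of node i.
using Tree = std::vector<std::vector<std::size_t>>;

// Depth-first walk: emit 1 on entering a node, 0 on leaving it.
void encode_node(const Tree& tree, std::size_t node, std::vector<bool>& bits) {
    bits.push_back(true);
    for (std::size_t child : tree[node]) encode_node(tree, child, bits);
    bits.push_back(false);
}

// Encodes the tree rooted at `root` into 2 bits per node.
std::vector<bool> encode(const Tree& tree, std::size_t root = 0) {
    std::vector<bool> bits;
    if (!tree.empty()) encode_node(tree, root, bits);
    return bits;
}

// Position of the 0 bit that closes the node opened at `open`.
// A linear scan keeps the sketch short; practical succinct trees add
// small auxiliary indexes so that this query does not scan the bits.
std::size_t find_close(const std::vector<bool>& bits, std::size_t open) {
    assert(open < bits.size() && bits[open]);
    long depth = 0;
    for (std::size_t i = open; i < bits.size(); ++i) {
        if (bits[i]) ++depth; else --depth;
        if (depth == 0) return i;
    }
    return bits.size();  // malformed input: no matching close
}

// Number of nodes in the subtree opened at `open` (the node included).
std::size_t subtree_size(const std::vector<bool>& bits, std::size_t open) {
    return (find_close(bits, open) - open + 1) / 2;
}
```

The scan inside `find_close` is exactly the kind of hidden constant mentioned above: the structure is tiny, but every navigation step costs extra CPU work compared to following a pointer.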
Compute-intensive applications require special hardware that, in turn, must have a software-driven design. Or software-hardware co-design, if I'm not misusing the term. I tried to jump into the HDL area to develop application accelerators for Memoria, but HDL is actually another full-time+ job, and I already have two (a regular job and personal projects). Anyway, there were no FPGAs with HBM memory, and everything I'm talking about is memory-bound. But recently I was able to buy an Alveo U50 with 8 GB of HBM, which is a game changer for hobbyists like me. Also an Arty A7-100 as a "starter kit".
Long story short, I was able to get a custom multi-core RV64 setup with DDR3 running on my Arty A7-100 in just a few weekends. Zero to hero. The original Freedom SDK does not provide DDR3 for the E310 soft RISC-V core (because it's an MCU), so I had to solve the problem myself. The important thing is that I have a lot of freedom within the design space. I can implement a single "fat" OoO core, or a multicore SoC, or a manycore (dozens+ on the U50) system of small cores equipped with application-specific accelerators.
The latter is the most important thing. Given a specific class of problems, I can infer the kind of hardware acceleration it requires (matrix multiplication, FP-intensive, integer-intensive, memory-intensive, some combination of these, some specific memory hierarchy, etc.) and generate specific accelerators for RVxx cores, as well as the software part.
This is what software-hardware co-design is (at least that is how I see it). Chisel + RocketChip is responsible for the hardware part. But to understand this technology better, it should be put into the context of a much larger process in which multiple generators are responsible for different aspects of the system.
Chisel, as a standalone technology, is not that impressive even to me. Scala is good for metaprogramming, as I have said above, but unfortunately it's not good enough given modern requirements. Scala's build tools are still based on the good old "make" paradigm from the 80s, just "on steroids". What we need is a fully featured dataflow-based data platform with an elaborate RESTful API for external automation and integration, exposing the entire build process and all its intermediate artifacts to external tools like IDEs. We also need extensible analytics on top of the intermediate data. With today's Scala you will have a hard time trying to get RocketChip working normally in an IDE. The emerging LanguageServer pattern is the right way to go, but it's still in its infancy. C++ is nowhere near as good as Scala in this respect, but I'm working to fix this situation.
The whole point of Chisel is not the HDL itself, and not Scala, of course. It's that advanced software engineering practices, like using intermediate representation preprocessing for various automation tasks, are entering the traditionally conservative field of hardware design and engineering. Chisel is just one of the early birds. Much better things are coming. Thanks to RocketChip, I was able to jump-start a completely new area for my project in just a few weekends. Yes, it's a good time to be alive.