Perhaps I should explain further what I'm trying to do.
I'm (re)writing/refactoring some of my low-level C++ libraries that interface with hardware registers or model abstract concepts (for example a circular buffer). As you can imagine, these libraries make heavy use of templates to keep the overhead low when instantiated, and to avoid the overhead of virtual methods where possible.
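To give an idea of the style, here is a minimal sketch of the kind of thing I mean (names and details are hypothetical, not my actual code): the capacity is a template parameter, so there is no heap allocation, no virtual dispatch, and the compiler can fold the wrap-around arithmetic right into the call site.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Fixed-capacity circular buffer; Capacity is a compile-time constant,
// so storage is a plain member array and all methods can be inlined.
template <typename T, std::size_t Capacity>
class CircularBuffer {
public:
    bool push(const T& value) {
        if (count_ == Capacity) return false;  // buffer full
        buffer_[head_] = value;
        head_ = (head_ + 1) % Capacity;
        ++count_;
        return true;
    }

    std::optional<T> pop() {
        if (count_ == 0) return std::nullopt;  // buffer empty
        T value = buffer_[tail_];
        tail_ = (tail_ + 1) % Capacity;
        --count_;
        return value;
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, Capacity> buffer_{};
    std::size_t head_ = 0;
    std::size_t tail_ = 0;
    std::size_t count_ = 0;
};
```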
I'm using googletest for my unit tests. I wrote a small Python script that extracts the test code from all test cases and copies it into a dedicated cpp file with its own main. Since my C++ library currently only contains templated code, all classes live in header files, which makes compilation trivial.
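For illustration, a generated file might look roughly like this (hypothetical shape, reusing the buffer sketch from above; the real output depends on the script): the body of a test case pasted into a plain main(), so the resulting object contains only the library code under test, without the googletest framework pulling in extra symbols.

```cpp
#include "circular_buffer.hpp"  // header-only library under test

int main() {
    CircularBuffer<int, 16> buf;
    buf.push(42);
    return buf.pop().has_value() ? 0 : 1;
}
```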
I currently use the size and nm utilities to inspect the resulting code size.
Considering the memory available in modern ARM chips, it's nice to also allow some more software luxury. For example, instead of having the user application call I2cStart(), I2cTxByte() and I2cStop() during device communication, it's also possible to assemble a frame of 'tokens' (or bytes) to transmit, which the driver can then iterate over and transmit.
What is nice is that more context knowledge is available in the software. For example, I could log all I2C transactions that fail and immediately know what the complete frame was.
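Concretely, I'm thinking of something along these lines (all names hypothetical, just a sketch, not the actual implementation): the driver is templated on the bus type, iterates the frame token by token, and when a byte fails the caller still holds the complete frame, ready to be logged in one piece.

```cpp
#include <cstddef>
#include <cstdint>

enum class I2cToken : std::uint8_t { Start, TxByte, Stop };

struct I2cFrameEntry {
    I2cToken token;
    std::uint8_t data;  // payload for TxByte, ignored otherwise
};

// Templated on the bus type, so a real and a "virtual" bus can be
// swapped at compile time without any virtual dispatch.
template <typename Bus>
class I2cDriver {
public:
    explicit I2cDriver(Bus& bus) : bus_(bus) {}

    // Iterate the frame; on a NACK the caller still has the complete
    // frame at hand, so the whole transaction can be logged.
    bool transmit(const I2cFrameEntry* frame, std::size_t length) {
        for (std::size_t i = 0; i < length; ++i) {
            switch (frame[i].token) {
                case I2cToken::Start: bus_.start(); break;
                case I2cToken::Stop:  bus_.stop();  break;
                case I2cToken::TxByte:
                    if (!bus_.txByte(frame[i].data)) return false;  // NACK
                    break;
            }
        }
        return true;
    }

private:
    Bus& bus_;
};
```

The real bus implementation would then talk to the peripheral registers through the exact same compile-time interface.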
Obviously all this high-level sauce adds overhead to the system, which may or may not be a problem (for I2C it's likely not, as synchronous implementations often burn cycles in a spinlock anyway). This is what I want to characterize, or at least get some comparable figures for between changes, rather than "guessing" from how nice the assembly looks (it will always look bloated to some degree, depending on the level of abstraction and/or logging enabled during compilation).
To me, simulation sounds like a reasonable choice. Remember that I still don't actually care whether I'm testing against a real hardware peripheral. Ultimately I2C, SPI or any other protocol is just a certain means of putting bytes onto a wire. It's not hard to write a "virtual" driver that merely acts as if it's writing to peripheral space and back.
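A minimal sketch of such a virtual bus, assuming the compile-time interface from the frame example above (again, hypothetical names): it satisfies the same interface as a real bus, but records the traffic in memory instead of touching peripheral registers.

```cpp
#include <cstdint>
#include <vector>

// Records the "wire" traffic; start/stop conditions are stored as
// out-of-band marker values above the byte range.
class VirtualI2cBus {
public:
    void start() { log_.push_back(kStartMarker); }
    void stop()  { log_.push_back(kStopMarker); }
    bool txByte(std::uint8_t b) { log_.push_back(b); return true; }  // always ACK

    const std::vector<std::uint16_t>& log() const { return log_; }

private:
    static constexpr std::uint16_t kStartMarker = 0x100;
    static constexpr std::uint16_t kStopMarker  = 0x101;
    std::vector<std::uint16_t> log_;  // captured traffic
};
```

Plugging this into the templated driver (I2cDriver&lt;VirtualI2cBus&gt;) gives a test double with zero hardware dependencies, while the generated code keeps the same shape as with a real bus, which is what matters for comparing size figures.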
This is why I called it a "figure of merit". I'm aware that I cannot measure exactly how many microseconds or cycles these operations are going to take, due to caches, bus contention, peripheral speed (the peripheral clock domain could be 1/16 of the CPU clock, for example), etc. In fact, if you read up on WCET for real-time systems, you'll find that it's very hard to determine for modern computer architectures, and that measured WCETs are often not the actual "worst case".
You could argue this is not worth it. I know, this is just an experiment :-) This is also the reason why many of my hobby projects never get "finished".
To some degree, I think you could also argue that a CI system is not worth it; at face value it has no "functional value" in a software production environment. There are plenty of companies around that don't use such a system, and they still deliver their software to customers. Yet it's a great tool for making sure quality standards are met, and that discrepancies between builds either don't appear or are dealt with as quickly as possible.
I guess I could also use real hardware to accomplish this goal, but if there is a software solution available I would rather use that. It's easier to deploy on any random server (I run my CI stuff in Docker containers, which in turn run in a VM on a Proxmox host). In my home lab, of course, I could just dig up some ARM devboard with a debugger and run all the tests on it.
I think OVP wrote something about providing CPU models generated from RTL, so that certainly sounds like an interesting tool to try out. But I'm not holding my hopes up for a turnkey solution, considering that even "regular" simulators seem hard to come by.