There are some special-snowflake SPI devices that don't use the CS line for framing.
Indeed - even the very newest incarnation of the STM32 SPI peripheral, which has claimed "hardware nSS" support for over a decade and has gone through numerous iterations (so ST has had plenty of chances to fix the damn thing), still doesn't actually support using the CS line for framing in slave mode. This is ridiculously funny, as it would be trivial for them to implement. What you need to do instead is program a general-purpose interrupt on the CS pin, and in that handler use the power/clock control registers to issue a full SPI peripheral reset, since the peripheral cannot even be reliably controlled through its own registers.
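To make the workaround concrete, here is a minimal sketch of that CS-interrupt-plus-peripheral-reset trick. The register names are mocked stand-ins so the idea is self-contained; a real build would include the CMSIS device header and use the genuine RCC/EXTI definitions, and the assumption that SPI1 sits on APB2 with reset bit 12 (as on the F2/F4 parts) and that CS lands on EXTI line 4 is illustrative, not from the post.

```c
#include <stdint.h>

/* Mocked register stand-ins (real code: #include "stm32f2xx.h").
   Assumption: SPI1 on APB2, reset bit 12, as on STM32F2/F4 parts. */
#define RCC_APB2RSTR_SPI1RST (1u << 12)
static volatile uint32_t RCC_APB2RSTR;
static volatile uint32_t EXTI_PR;      /* EXTI pending register          */
#define CS_EXTI_LINE (1u << 4)         /* hypothetical: CS pin on line 4 */

/* EXTI handler for the deasserting edge of CS: pulse the peripheral's
   reset line through RCC to force the SPI slave back into a known
   framing state before the next frame starts. */
void cs_rising_edge_isr(void)
{
    EXTI_PR = CS_EXTI_LINE;                 /* acknowledge the interrupt */
    RCC_APB2RSTR |=  RCC_APB2RSTR_SPI1RST;  /* assert SPI1 reset         */
    RCC_APB2RSTR &= ~RCC_APB2RSTR_SPI1RST;  /* release it again          */
    /* ...then reconfigure CR1/CR2 and re-arm RX DMA for the next frame. */
}
```

The point of resetting through RCC rather than the SPI's own control bits is exactly the complaint above: the peripheral's own registers don't get it reliably back in sync mid-frame.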
This is especially awkward because the whole grand idea behind using an actual electrical line for framing is the best thing since sliced bread - not only for speed and synchronization, but also for robustness and code simplicity, for all parties involved. It's the main selling point of SPI. The only real downside is having to route an extra signal, but it's definitely worth it. And yet some designers (at ST, for example) somehow fail to see this and completely misuse SPI, throwing away its #1 benefit. Apparently there are people out there who don't know what "framing" means, why it is needed, and how difficult it is to live without - most likely software folks so accustomed to building complex, hard-to-debug state machines that they don't realize there is sometimes an easier way, one that fundamentally avoids most of the protocol complexity they have learned to live with.
OTOH, I2C implementations tend to be equally (or even more) broken, and microcontroller I2C peripherals have a notorious history of being massively bloated, hard to configure, and providing little benefit over bit-banging. This has recently gotten better: for example, the newest I2C peripherals on the newest STM32 devices can do actual DMA transactions in the background (woohoo!), without you babysitting every signal level change. This was impossible just a few years ago; you needed to poll and bit-bang the control registers with such tight timing that you might as well have bit-banged the actual SDA/SCL lines instead. So while higher-level, easy-to-use I2C libraries do exist (and thanks to the standardization of how the protocol is used, they even tend to work!), they also tend to be blocking, slow calls - to the extent that you just cannot use them in an actual product that needs to multitask something other than communicating with I2C devices, which is almost always the case. So you are left trying to utilize the (usually broken or difficult-to-configure) hardware I2C peripheral to the fullest, writing an interrupt-based I2C implementation that works with your specific device and application.
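To give an idea of why that interrupt-based implementation balloons, here is a toy sketch of the state machine an I2C register read cannot avoid. The event names and the step() driver are invented here so it can run on a host; a real driver would be fed from the I2C event/error interrupts, and would add error recovery, arbitration loss, and clock-stretching timeouts on top of this.

```c
#include <stdint.h>

/* The states even a minimal I2C master register-read has to track. */
typedef enum {
    I2C_IDLE,
    I2C_START_SENT,       /* START issued, waiting for address ACK      */
    I2C_REG_ADDR_SENT,    /* register address written to the slave      */
    I2C_RESTART_SENT,     /* repeated START, read address sent          */
    I2C_READING,          /* bytes arriving; NACK before the last one   */
    I2C_DONE
} i2c_state_t;

/* Hypothetical event codes, one per peripheral interrupt cause. */
typedef enum {
    EV_START_OK, EV_ADDR_ACK, EV_TX_DONE, EV_RX_BYTE, EV_STOP_OK
} i2c_event_t;

static i2c_state_t state = I2C_IDLE;

/* One transition per event. Note it already takes five-plus states
   for the happy path alone; a production driver grows far bigger. */
i2c_state_t i2c_step(i2c_event_t ev)
{
    switch (state) {
    case I2C_IDLE:          if (ev == EV_START_OK) state = I2C_START_SENT;    break;
    case I2C_START_SENT:    if (ev == EV_ADDR_ACK) state = I2C_REG_ADDR_SENT; break;
    case I2C_REG_ADDR_SENT: if (ev == EV_TX_DONE)  state = I2C_RESTART_SENT;  break;
    case I2C_RESTART_SENT:  if (ev == EV_ADDR_ACK) state = I2C_READING;       break;
    case I2C_READING:       if (ev == EV_STOP_OK)  state = I2C_DONE;          break;  /* EV_RX_BYTE: stay */
    case I2C_DONE:          break;
    }
    return state;
}
```

Every one of those transitions corresponds to a separate interrupt with its own flag-clearing quirks on real silicon, which is where the line count goes.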
SPI, on the other hand, tends to be much easier to get working interrupt- or DMA-based, with less bloat and fewer states. For example, a specific interrupt-driven I2C implementation which just efficiently reads sensors on an STM32F205 was around 1000 lines and took a full working week to develop; a comparable SPI implementation was below 100 lines and took half a day.
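For contrast, here is how little protocol state a CS-framed SPI slave needs: the CS edge marks the frame boundary, DMA has already filled the buffer, and the "state machine" collapses to a ready flag. The function and buffer names are illustrative, not from the post.

```c
#include <stdint.h>
#include <stddef.h>

#define FRAME_MAX 64
static uint8_t rx_buf[FRAME_MAX];   /* filled by RX DMA on real hardware */
static volatile size_t rx_len;
static volatile int frame_ready;

/* The entire "protocol state machine": CS deasserts -> frame complete. */
void cs_frame_end_isr(size_t dma_bytes_received)
{
    rx_len = dma_bytes_received;
    frame_ready = 1;
}

/* Main-loop consumer: hands out the finished frame, if there is one. */
int poll_frame(const uint8_t **buf, size_t *len)
{
    if (!frame_ready)
        return 0;
    frame_ready = 0;
    *buf = rx_buf;
    *len = rx_len;
    return 1;
}
```

That asymmetry - one flag versus a multi-state interrupt dance - is where the 100-line vs 1000-line difference comes from.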
I do agree that if you are expecting speed, synchronous operation, and timing predictability out of I2C, you are using the wrong interface. Many sensor devices that could be used in more timing-critical products offer a selectable I2C or SPI interface, and you are supposed to use SPI mode if you need those features.
I2C's great when all you need is a bunch of easy-to-glue-on sensors and you can afford to control them with inefficient blocking calls, every now and then, in a non-timing-critical main thread. For everything else, it gets really messy, really quickly. If you feel like you need to tune the I2C bus clock to exactly 400 kHz, you are probably in this "don't use I2C" territory.