As it happens I have the processor manuals for both the R2K and the 88K.
So do I :-) And I correspond frequently with two of the main designers of the 88K, one of whom also did a lot on the original Athlon64; both were involved in designing a new GPU I was working on at Samsung from Sep 2016 to Mar 2018. Both were also, while at Motorola, co-authors of the seminal 1991 paper "Single Instruction Stream Parallelism Is Greater than Two" (in fact they found it's typically over a dozen with perfect branch prediction, but 2 to 6 with reasonably achievable hardware at the time).
There are at least 3 editions of P & H since the R2K was mentioned. Those editions came out not because they wanted to sell books, but because the world had changed so drastically that the previous edition was no longer relevant to current products. The discussion in the 1st ed is dominated by the branch delay slot. Then came speculative execution and branch prediction. I have no idea where we are now.
That's not entirely accurate. Yes, branch delay slots were a mistake, as both H and P now agree (which is why RISC-V doesn't have them). They happened to be convenient at one particular point in processor implementation, but complicate all other implementations. And with good branch prediction as we have now, the rationale is completely gone anyway, even on simple implementations -- we now know how to make fairly small branch predictors with just a couple of percent mispredict rate, while the proportion of delay slots that can't be filled (except by a NOP) is typically in the tens of percent.
The reason to change processors in later editions of H&P is, I think, to use examples with which students are familiar -- or to which they at least have easy access. Hence the last two editions using AArch64 and then RISC-V.
I've got the 4th ed on my shelf, but haven't had an HPC project, so not a lot of point in wading through all that detail. It's probably entirely different now. And it will certainly change drastically after the discovery of the speculative execution vulnerabilities.
These vulnerabilities are pretty easy to avoid if only you realise that you need to avoid them!
The very simple principle to follow is to ensure that work done during speculative execution is completely undone in the event of a misprediction.
Of course this was already followed for such obvious things as stores to memory. There are just a few more things now that people are starting to realise must be buffered and completely rolled back as well.
As an example, if a cache line is fetched as a result of speculative execution, it must be held in a special area until the speculation is over, and only then written to the cache itself (and in particular, not result in the eviction of any other cache line until that point).
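To make this concrete, here is a toy sketch in C of such a speculative fill buffer. It's purely illustrative -- the direct-mapped indexing, the sizes, and all the names are invented, not taken from any real design:

    #include <stdbool.h>
    #include <stddef.h>

    #define CACHE_LINES 64
    #define FILL_SLOTS   8

    typedef struct { unsigned long tag; bool valid; } Line;

    static Line cache[CACHE_LINES];      /* architecturally visible state */
    static Line fill_buffer[FILL_SLOTS]; /* holds speculative fills only  */
    static size_t fill_count = 0;

    /* A speculative load misses: park the line on the side and do NOT
       touch the cache, so no existing line is evicted yet. */
    void speculative_fill(unsigned long tag) {
        if (fill_count < FILL_SLOTS)
            fill_buffer[fill_count++] = (Line){ .tag = tag, .valid = true };
        /* if the buffer is full, stall or drop -- a policy choice */
    }

    /* Speculation resolved as correct: the fills may now evict for real. */
    void commit_fills(void) {
        for (size_t i = 0; i < fill_count; i++)
            cache[fill_buffer[i].tag % CACHE_LINES] = fill_buffer[i];
        fill_count = 0;
    }

    /* Misprediction: drop the buffer. The cache is exactly as it was
       before speculation, so the wrong path leaves nothing to time. */
    void squash_fills(void) {
        fill_count = 0;
    }

Note that squashing is just resetting a counter, which is part of why this costs so little circuitry.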
The same goes for updates to the TLB, the branch history buffer, and the branch target buffer.
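The same commit-at-retire pattern covers those: queue the update alongside the in-flight branch and apply it only at retirement. Again a toy sketch, with invented names and sizes:

    #include <stdbool.h>

    #define ROB_SIZE 32
    #define BTB_SIZE 256

    typedef struct {
        unsigned long pc;      /* branch address     */
        unsigned long target;  /* resolved target    */
        bool taken;            /* resolved direction */
    } PendingUpdate;

    static PendingUpdate pending[ROB_SIZE];
    static unsigned long btb_target[BTB_SIZE]; /* visible predictor state */
    static bool          taken_hint[BTB_SIZE];

    /* At execute time: record the outcome, but don't touch the BTB or
       history -- a squashed branch must leave no trace in them. */
    void record_branch(int rob_slot, unsigned long pc,
                       unsigned long target, bool taken) {
        pending[rob_slot] = (PendingUpdate){ pc, target, taken };
    }

    /* At retire time only: the update is now non-speculative, so it is
       safe to make it visible in the predictor structures. */
    void retire_branch(int rob_slot) {
        PendingUpdate u = pending[rob_slot];
        btb_target[u.pc % BTB_SIZE] = u.target;
        taken_hint[u.pc % BTB_SIZE] = u.taken;
    }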
I think we will also see a move to non-inclusive caches. In recent practice, a cache line brought in from main memory is stored at the same time into all of the L1, L2, and L3 caches. This complicates rollback. In future I think we will see cache lines brought just into L1 at first. Only when they are evicted from L1 by a newer line will they be written to L2.
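Something along these lines, as another invented direct-mapped toy, just to show which level the data moves into and when:

    #include <stdbool.h>

    #define L1_LINES  64
    #define L2_LINES 512

    typedef struct { unsigned long tag; bool valid; } CLine;

    static CLine l1[L1_LINES];
    static CLine l2[L2_LINES];

    /* A fill from memory touches L1 only; L2 is filled exclusively by
       L1 victims, never directly by the fill itself. */
    void fill_from_memory(unsigned long tag) {
        CLine *slot = &l1[tag % L1_LINES];
        if (slot->valid)                        /* evicting from L1 ...  */
            l2[slot->tag % L2_LINES] = *slot;   /* ... is what fills L2  */
        *slot = (CLine){ .tag = tag, .valid = true };
    }

So a rollback only ever has a single level of state to undo.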
None of this is very difficult. Nor is it expensive in circuitry. And it does not cost execution speed.
The difficult part is mostly in realising that you *should* do it.