Well, branches in the processor pipeline are handled with several branch prediction techniques, ranging from static branch prediction (the cheapest) to dynamic branch prediction.
In every conditional branch instruction, the branch is taken only if the condition is satisfied; in that case the branch target address is loaded into the Program Counter (PC) instead of the address of the next instruction in the sequential instruction stream.
The path that is guessed to be the most likely is then fetched and speculatively executed. If the guess later turns out to be wrong, the speculatively executed or partially executed instructions are discarded, the pipeline is flushed, and execution restarts on the correct path, incurring a delay.
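To make the static vs. dynamic difference concrete, here is a minimal sketch that counts wrong guesses on a recorded sequence of branch outcomes. The "always taken" rule and the 2-bit saturating counter are common textbook schemes that I'm assuming for illustration, not what any particular CPU does, and both traces are made up.

```python
def static_always_taken(trace):
    """Static scheme (assumed): guess 'taken' every time; return wrong guesses."""
    return sum(1 for taken in trace if not taken)


def two_bit_counter(trace, state=2):
    """Dynamic scheme (assumed): 2-bit saturating counter with states 0..3.

    States 2 and 3 predict 'taken', states 0 and 1 predict 'not taken'.
    """
    misses = 0
    for taken in trace:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Step toward the real outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# Hypothetical trace: a loop branch taken 9 times, then falling through, run 3 times.
loop_trace = ([True] * 9 + [False]) * 3
# Hypothetical trace: an error-check branch that is almost never taken.
rare_trace = [False] * 20 + [True] + [False] * 9

print("loop:", static_always_taken(loop_trace), two_bit_counter(loop_trace))
print("rare:", static_always_taken(rare_trace), two_bit_counter(rare_trace))
```

On the loop both schemes only miss the loop exits, but on the rarely taken branch the fixed guess is wrong almost every time while the counter adapts after the first miss. Every miss is one pipeline flush.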
In short, the smarter the branch prediction, the fewer penalties the computation pays.
Therefore the point is: improving the prediction is worthwhile. Within the class of dynamic branch prediction we can, in theory, collect a deep history of the instructions and conditions running in the CPU and apply advanced statistics, even AI algorithms, to predict branches with a low failure rate. It's called hyperdynamic branch prediction, which sounds exciting (so they said at Intel; "hyper" comes from the need for a hyperplane, a matrix with 3 dimensions, to keep the information). Except that it introduces more complexity in the design of the chip, potentially slows down the CPU (it doesn't scale well with frequency), and takes more area on the silicon.
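Just to give a feel for what "learning from a deep history" can look like, here is a rough sketch of a perceptron-style predictor, a technique known from the research literature. It is a stand-in I picked for illustration, not the hyperdynamic design mentioned above; the class name, the history length of 8 and the training threshold of 30 are all arbitrary assumptions.

```python
class PerceptronPredictor:
    """Toy predictor: one perceptron fed by a global history of past outcomes."""

    def __init__(self, history_length=8, threshold=30):
        self.weights = [0] * (history_length + 1)  # weights[0] is the bias
        self.history = [-1] * history_length       # +1 = taken, -1 = not taken
        self.threshold = threshold                 # assumed training margin

    def _output(self):
        # Bias plus the weighted sum of the recorded outcomes.
        return self.weights[0] + sum(
            w * h for w, h in zip(self.weights[1:], self.history)
        )

    def predict(self):
        return self._output() >= 0  # True means "predict taken"

    def update(self, taken):
        y = self._output()
        t = 1 if taken else -1
        # Train on a wrong guess, or when the guess was right but not confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            self.weights[0] += t
            for i, h in enumerate(self.history):
                self.weights[i + 1] += t * h
        # Shift the newest outcome into the front of the history.
        self.history = [t] + self.history[:-1]


def count_mispredictions(predictor, trace):
    misses = 0
    for taken in trace:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical trace: a branch that strictly alternates taken / not taken.
alternating = [True, False] * 100
print(count_mispredictions(PerceptronPredictor(), alternating))
```

On a strictly alternating branch, which a simple per-branch counter keeps getting wrong, this sketch settles after a handful of early misses. The price is visible even in the toy version: a multiply-add per history bit on every prediction, which is the kind of complexity, latency and area cost described above.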
So ... we all prefer the KISS (keep it simple) approach, and we have settled the problem with a compromise: accepting a shorter pipeline. The shorter the pipeline (which means fewer stages), the smaller the penalty for a wrong branch prediction.
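A back-of-the-envelope way to see the trade-off: the average cost per instruction grows with how often branches are mispredicted times how many cycles a flush throws away. The sketch below uses that simple textbook-style model; the function name and all the numbers are hypothetical, and the flush penalty is only assumed to track the number of stages ahead of the point where the branch resolves.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once misprediction flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Hypothetical workload: 20% of instructions are branches, 10% of those mispredicted.
short_pipeline = effective_cpi(1.0, 0.20, 0.10, flush_penalty=3)
long_pipeline = effective_cpi(1.0, 0.20, 0.10, flush_penalty=15)
print(f"short: {short_pipeline:.2f}  long: {long_pipeline:.2f}")
```

With the same predictor quality, the short pipeline ends up around 1.06 cycles per instruction and the long one around 1.30, so you can either buy a better predictor or simply lose less on every miss.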
Oh, reintroducing instructions like "decrement this counter and loop until it reaches zero" would be great, but ... again it costs more complexity in a RISC design.
Umm, to be honest, not so much, and "Arm" stands for [A]dvanced [R]ISC [M]achine, so if it wishes it can copy this feature from whatever implementations we have seen in CISC CPUs, or even take on the challenge of implementing the hyperdynamic branch prediction that will turn a CPU into a Skynet AI-driven chip(1), but ... for sure that's not in keeping with the pure and minimalistic approach of MIPS processors.
(1) Kidding. That's the chip mentioned in the Terminator movies.