a platform with a JIT compiler tool chain with billions of dollars of development effort behind it.
That is not an "architecture feature." That's just capitalizing on "prior art"!
I want to know which CPU features make it easy to write better interpreters WITHOUT that much development effort.
I don't think there is a shortcut. Dynamic recompilation with hotspot analysis is the way to get good performance out of an interpreted language. It's not that architecture doesn't matter at all, just that as long as you have something reasonably sane and generally high-performance, you are going to be able to write the code generator for it.
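To make "dynamic recompilation with hotspot analysis" concrete, here is a minimal Python sketch. It assumes a made-up toy stack bytecode (the ops `arg`, `push`, `add`, `mul`), a hypothetical `HOT_THRESHOLD` of 3 calls, and Python lambdas standing in for the machine code a real JIT would emit. Cold functions go through the plain interpreter loop while a counter tracks how often they run. Once a function crosses the threshold it is translated once into host code, and every later call skips per-opcode dispatch entirely.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical tuning knob: how many interpreted calls before we recompile.
HOT_THRESHOLD = 3

Instr = Tuple[str, Optional[int]]  # (opcode, operand) in a made-up toy bytecode


class Function:
    def __init__(self, name: str, code: List[Instr]) -> None:
        self.name = name
        self.code = code
        self.calls = 0  # hotspot counter, bumped on every interpreted call
        self.compiled: Optional[Callable[[int], int]] = None


def interpret(code: List[Instr], x: int) -> int:
    """Slow path: dispatch on every opcode, on every call."""
    stack: List[int] = []
    for op, arg in code:
        if op == "arg":
            stack.append(x)
        elif op == "push":
            stack.append(arg)
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def compile_to_host(code: List[Instr]) -> Callable[[int], int]:
    """'Recompile' once: turn the stack code into a single Python expression.

    A Python lambda stands in for the machine code a real JIT would emit.
    Only ever fed our own trusted bytecode, never external input.
    """
    stack: List[str] = []
    for op, arg in code:
        if op == "arg":
            stack.append("x")
        elif op == "push":
            stack.append(repr(arg))
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            sym = "+" if op == "add" else "*"
            stack.append(f"({a} {sym} {b})")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return eval(f"lambda x: {stack.pop()}")


def call(fn: Function, x: int) -> int:
    """Run fn, switching to the compiled version once it becomes hot."""
    if fn.compiled is not None:
        return fn.compiled(x)  # fast path: no per-opcode dispatch
    fn.calls += 1
    if fn.calls >= HOT_THRESHOLD:
        # Compile now; this call still interprets, later calls take the fast path.
        fn.compiled = compile_to_host(fn.code)
    return interpret(fn.code, x)


if __name__ == "__main__":
    # f(x) = (x + 1) * 2
    f = Function("f", [("arg", None), ("push", 1), ("add", None),
                       ("push", 2), ("mul", None)])
    print([call(f, x) for x in range(5)])  # [2, 4, 6, 8, 10]
    print(f.compiled is not None)          # True
```

In a real system the compile step would emit machine code for the target CPU, and the threshold would be tuned (and often applied per loop rather than per function); both are placeholders here.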
Trying to have the processor directly execute the interpreted code, as in Bruce's comment about Jazelle, is the wrong way to do it. Maybe there are techniques that would really help, but mostly JIT compilers are compilers: most of the things that make compiled programs fast also make JITs fast.
(Sort of like "x86 chips prove that CISC is as good as or better than RISC, if only because AMD and Intel put so much effort into making it so." (This observation dates from about when Apple switched from PPC to x86 and the MIPS desktop vendors went away. It's probably less valid now...))
No, that shows that once you reach a certain scale, instruction decoding becomes a progressively smaller fraction of your processor. There are still costs associated with, e.g., tracking micro-ops through the pipeline so you can handle exceptions properly, but once you have a superscalar processor with deep reorder buffers, register renaming, multiple execution units, speculative execution, branch prediction, other internal caches, and so on, that's where most of the effort goes. RISC is clearly superior, but that doesn't save you all the hard work of making an actually high-performance processor, at least not since the 90s.