I also reviewed both ucPython and eLUA, and they are both pretty slick projects, very nicely done I thought. They do require a LOT of Flash and RAM compared to Forth.
Yup, they are meant for "big MPUs", but that can make sense nowadays. For example, ucPython runs decently on the CASIO above, and it's more than useful on the NumWorks calculator.
CASIO-BASIC is cool (and has improved a lot since the first CASIO FX-7500), but ucPython is better.
There was an interesting project involving Forth: the GameDuino1. It's an Arduino shield (5V, it fits Arduino-2009 boards) carrying a Xilinx FPGA (Spartan3/200-LE-something) that acts as a sort of graphical VDU with a VGA output, and it can accept Forth-like statements over SPI.
The FPGA implements a stack machine (written in Verilog) which is a somewhat modified Forth but still "Forth-compliant".
Potentially it's very powerful, but ... people never really appreciated it; in fact, the second generation, the GameDuino2, is completely different.
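
For anyone who has never played with a Forth-style stack machine, here is a rough idea of the execution model. It's only a toy sketch in Python, not the GameDuino's Verilog and not its real instruction set: the function name `run` and the handful of words it knows are my own assumptions, picked just to show how operands go through a data stack.

```python
# Toy Forth-like stack machine. Illustrative only: the word set below is an
# assumption and does NOT reproduce the GameDuino's actual instruction set.

def run(program, stack=None):
    """Evaluate a whitespace-separated Forth-like program; return the data stack."""
    stack = [] if stack is None else stack

    def binary(op):
        # Pop the two topmost values (b was pushed last) and push op(a, b).
        def word(s):
            b = s.pop()
            a = s.pop()
            s.append(op(a, b))
        return word

    def swap(s):
        # Exchange the two topmost stack entries.
        s[-1], s[-2] = s[-2], s[-1]

    words = {
        "+": binary(lambda a, b: a + b),
        "-": binary(lambda a, b: a - b),
        "*": binary(lambda a, b: a * b),
        "dup": lambda s: s.append(s[-1]),   # duplicate top of stack
        "drop": lambda s: s.pop(),          # discard top of stack
        "swap": swap,
    }

    for token in program.split():
        if token in words:
            words[token](stack)             # known word: execute it
        else:
            stack.append(int(token))        # anything else: integer literal
    return stack


if __name__ == "__main__":
    print(run("2 3 + 4 *"))   # (2 + 3) * 4
    print(run("5 dup *"))     # 5 squared
```

The point is that there are no named registers or variables: every word just consumes and produces stack entries, which is why this kind of machine maps so compactly onto an FPGA.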