Well, maybe I missed it, but a simple memory map would have helped a lot.
Apparently the message queues get allocated at the bottom (not quite, but they seem to start almost there). Then the threads/tasks get allocated in the order you define them, with the proviso that the order gets a bit funny if a thread starts off other threads: as you might expect, those get allocated last when the whole thing starts up. Eventually you can see how much RAM is left at the end by looking at what you originally filled the whole block with (I filled it with 0xAA, while the RTOS fills each thread's "stack space" with 0xA5 once it starts up).
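For what it's worth, here is a minimal sketch in C of the "how much RAM is left at the end" check. It assumes the whole block was pre-filled with 0xAA before the RTOS started, and that allocation proceeds upwards from the bottom. The names `free_tail_bytes` and `BLOCK_FILL_BYTE` are made up for illustration and are not part of any RTOS API. It can over-count slightly if the last allocated bytes happen to contain 0xAA.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_FILL_BYTE 0xAAu  /* value the block was pre-filled with (assumption) */

/* Count the bytes at the top end of the block that still hold the
   original fill value, i.e. memory that was never allocated or written. */
size_t free_tail_bytes(const uint8_t *block, size_t block_size)
{
    size_t n = 0;
    while (n < block_size && block[block_size - 1 - n] == BLOCK_FILL_BYTE) {
        n++;
    }
    return n;
}
```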
As per my other thread on the stack sizes, what is left of each thread/task's stack space after running for a bit can be checked properly only by looking at how much of the 0xA5 fill is remaining. A graphical tool which maps out that whole memory block onto a big LCD, with 0xA5 bytes in a different colour, would be dead handy.
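A rough sketch of the 0xA5 check for a single stack, in C. It assumes the stack grows downwards, so the untouched fill sits at the lowest addresses of the stack region; `stack_unused_bytes` and `STACK_FILL_BYTE` are hypothetical names, not RTOS functions. The result is a best-case figure, since a task could have written 0xA5 itself or skipped over some bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0xA5u  /* fill value the RTOS writes into each stack */

/* Count untouched fill bytes starting from the low end of a stack region.
   Assumes a descending stack, so the never-used part is at the bottom. */
size_t stack_unused_bytes(const uint8_t *stack_low, size_t stack_size)
{
    size_t n = 0;
    while (n < stack_size && stack_low[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```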
Or even just write out a file to the filesystem if you have one (which I do; might do that) and then somebody in Ukraine, on freelancer.com, can write a little win32 utility which plots that file graphically.
EDIT: just done the code to generate that 64k file and put the project on Freelancer.
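The dump itself can be very small. This is only a sketch of the idea, not the code I actually used: it assumes the filesystem is reachable through the standard C `fopen`/`fwrite` calls (many embedded filesystems have their own API instead), and `dump_block` is a made-up name.

```c
#include <stdint.h>
#include <stdio.h>

/* Write len raw bytes starting at base to the file at path.
   Returns 0 on success, -1 if the file could not be opened or fully written. */
int dump_block(const char *path, const uint8_t *base, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL) {
        return -1;
    }
    size_t written = fwrite(base, 1, len, f);
    int close_err = fclose(f);
    return (written == len && close_err == 0) ? 0 : -1;
}
```

The plotting utility then only has to colour each byte of the file by value: 0xAA for never-allocated RAM, 0xA5 for unused stack, anything else for memory in use.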
Some funny SP switching takes place because interrupts still use the general stack, so you don't need to leave space on each thread's stack for ISRs - AIUI. Not sure how they do that; must be some ISR time overhead.