1. Yes, there is a real need in safety-critical applications. Memory failures do happen. A simple test will not catch all of them, but when it does catch one, you will be thankful you implemented it.
2. Do your checks well before the OS starts. They need to happen after clock initialization, but before anything else. You can write the checking routine in assembly so that it does not use RAM at all, but this is not necessary. The checking code will use a small amount of RAM, and that is fine in most cases.
3. None. It is necessary.
EDIT: Sorry, misread. For a non-safety-critical application it is not necessary at all. On the flip side, you can just phone it in and implement something trivial to check the box.
But generally, the more checks you can implement the better. It is a matter of how much money you want to spend upfront on trying to catch hardware issues versus how much it may cost you to recover if one in a million devices fails in the field.
For automotive applications this calculation is easy: it is far cheaper to do the checks, especially given that you only need to write the procedure once and its cost is amortized over many years of use. And a recall, even of a single car, is logistically hard.
If you are making a $20 toaster, who cares?
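To make point 2 concrete, here is a minimal sketch of the kind of simple startup check I mean, written in C rather than assembly. It is destructive (it overwrites the region), and the names `ram_test_region`, `RAM_TEST_OK` and `RAM_TEST_FAIL` are made up for illustration, not taken from any vendor library. It assumes 32-bit word access and that the region under test does not overlap the stack the routine itself is running on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, for illustration only. */
enum { RAM_TEST_OK = 0, RAM_TEST_FAIL = 1 };

/*
 * Destructive test of `words` 32-bit words starting at `start`.
 * The loop counters live in registers or on the stack, which is the
 * "small amount of RAM used by the checking code" mentioned above.
 */
int ram_test_region(volatile uint32_t *start, size_t words)
{
    static const uint32_t patterns[] = { 0xAAAAAAAAu, 0x55555555u };
    size_t i, p;

    /* Pass 1: alternating bit patterns, to catch stuck-at data bits. */
    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        for (i = 0; i < words; i++)
            start[i] = patterns[p];
        for (i = 0; i < words; i++)
            if (start[i] != patterns[p])
                return RAM_TEST_FAIL;
    }

    /* Pass 2: each word holds its own index, to catch address lines
     * that alias two locations onto the same cell. */
    for (i = 0; i < words; i++)
        start[i] = (uint32_t)i;
    for (i = 0; i < words; i++)
        if (start[i] != (uint32_t)i)
            return RAM_TEST_FAIL;

    return RAM_TEST_OK;
}
```

On a real target you would call this from the startup code right after clock init, with the start address and length coming from your linker script, and jump to a safe-state handler on failure. This is roughly the "check the box" level of test. Safety standards typically ask for something stronger, such as a March-type test.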