I can't really say whether you should buy them all. You know your budget, I don't. $60 each is a very good price. They do look a bit rough, but that could just be dust and dirt. I'm guessing that the humidity is high where you are. Is there any sign of rust? One thing to remember is that, given their age, they might report the wrong date because of the 1024-week rollover bug that affects all GPS time/frequency receivers. This only becomes an issue if you want to use one with an NTP server, and even then there are often workarounds.
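For anyone wondering what a rollover workaround looks like: the GPS week number is a 10-bit field, so it wraps every 1024 weeks (7168 days), and an old receiver can end up reporting a date that is some multiple of 1024 weeks in the past. Here is a minimal sketch of the usual fix in Python, assuming you know a date that the true answer cannot be earlier than. The function name and the `not_before` parameter are mine, purely for illustration, and not taken from any particular NTP driver.

```python
from datetime import date, timedelta

# One GPS week-number rollover period: 1024 weeks = 7168 days.
ROLLOVER = timedelta(weeks=1024)


def correct_rollover(reported: date, not_before: date) -> date:
    """Return `reported` shifted forward by whole 1024-week periods.

    `not_before` is an assumed floor date (for example, the day you
    set the system up). The reported date is advanced one rollover
    period at a time until it is no earlier than that floor.
    """
    corrected = reported
    while corrected < not_before:
        corrected += ROLLOVER
    return corrected


if __name__ == "__main__":
    # A receiver that is one rollover behind reports a date in 2000.
    print(correct_rollover(date(2000, 1, 1), date(2019, 1, 1)))
```

This only works if the floor date is within 1024 weeks of the true date, which is why it is a workaround and not a cure.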
Once you decide how many to buy, the fun begins. Of course, you have to test them all. Even if they all work, wiser people than me have stated that 10811 oscillators are not created equal: http://www.leapsecond.com/pages/z3801a-osc/ . Your CNT-90 counter and Wenzel oscillator should be able to determine which is best. Use TimeLab to collect the data.
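TimeLab will compute and plot Allan deviation for you, so you don't need to write anything yourself. If you're curious what the comparison boils down to, though, here is a rough sketch of overlapping Allan deviation computed from phase (time error) samples. It assumes evenly spaced phase readings in seconds. The function name and arguments are my own and have nothing to do with TimeLab's internals.

```python
import math


def overlapping_adev(phase, tau0, m):
    """Overlapping Allan deviation at averaging time tau = m * tau0.

    phase: evenly spaced phase (time error) samples, in seconds
    tau0:  spacing between samples, in seconds
    m:     averaging factor (positive integer)
    """
    n = len(phase)
    if m < 1 or n < 2 * m + 1:
        raise ValueError("need at least 2*m + 1 phase samples")
    tau = m * tau0
    # Sum of squared second differences of phase taken at stride m.
    total = sum(
        (phase[i + 2 * m] - 2 * phase[i + m] + phase[i]) ** 2
        for i in range(n - 2 * m)
    )
    return math.sqrt(total / (2 * (n - 2 * m) * tau ** 2))


if __name__ == "__main__":
    # Synthetic phase record with pure linear frequency drift.
    samples = [float(i * i) for i in range(100)]
    for m in (1, 2, 4, 8):
        print(m, overlapping_adev(samples, 1.0, m))
```

Lower is better at a given tau. Compare the units at the same averaging times and against the same reference.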
I don't see the point in sending them out for calibration. I don't know what that would consist of, but if your tests show that a unit works, then it works. What more could a cal lab do?

I'm not familiar with the advantages of an ensemble, but an ensemble of identical units sounds like a lot of work for limited return. An ensemble of different units would combine different strengths and weaknesses and so would have a bit more value, but I suspect the same objection applies.
I suspect that, if the budget allowed, most people would buy them all, keep the best 2 or 3 and sell the others.
Ed