If the channel is point-to-point (a single receiver), you can get by with source termination.
If the channel is multi-point (multiple receivers), you can use source termination, provided the receivers are connected with sufficiently short stubs, have low capacitance, and have some form of glitch prevention: hysteresis, input delay, synchronous input window, etc.
The reason: with source termination, the driver's output impedance plus series resistor matches the line's characteristic impedance. So when the transmitter steps from 0 to VCC, the line instantaneously sits at VCC/2 (locally, the line looks like a resistor of value Z0, forming a divider with the source impedance). This VCC/2 wavefront propagates down the line until it bounces off the unterminated end. The reflection, still of amplitude VCC/2, propagates back and superimposes on the VCC/2 already present, so the line goes up to VCC; when it arrives back at the driver, it is absorbed by the matched source impedance. After 2*t (t = one-way propagation delay, i.e. the electrical length of the line), the voltage has settled everywhere. So the inputs need to tolerate an indeterminate logic state (around VCC/2) for a short period: zero duration for the last load, where the voltage flips straight from 0 to VCC as the wavefront reflects off that point, up to 2*t for a receiver right at the transmitter.
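The timing described above can be sketched as an idealized bounce diagram. This is a minimal sketch, assuming a lossless line, an ideal step edge, a source impedance exactly equal to Z0, a fully open far end, and receivers that load the line negligibly; the function names and the 5 ns example delay are hypothetical, for illustration only.

```python
"""Idealized bounce diagram for a source-terminated line.

Assumptions (illustrative only): lossless line, ideal 0 -> VCC step at
time 0, source impedance exactly equal to Z0 (so the returning reflection
is absorbed at the driver), open far end, negligible receiver loading.
"""


def voltage_at(x, time, t_line, vcc=3.3):
    """Voltage seen by a receiver at fractional position x along the line.

    x = 0 is at the driver, x = 1 is the far (unterminated) end.
    t_line is the one-way propagation delay of the whole line.
    """
    t_incident = x * t_line            # outgoing VCC/2 wavefront passes here
    t_reflected = (2.0 - x) * t_line   # returning VCC/2 reflection passes here
    if time < t_incident:
        return 0.0
    if time < t_reflected:
        return vcc / 2.0               # ambiguous, mid-rail level
    return vcc                         # settled


def half_level_duration(x, t_line):
    """How long a receiver at position x sits at the ambiguous VCC/2 level."""
    return (2.0 - x) * t_line - x * t_line


if __name__ == "__main__":
    T_LINE_NS = 5.0  # hypothetical one-way delay, in nanoseconds
    for x in (0.0, 0.5, 1.0):
        print(f"x = {x:.1f}: {half_level_duration(x, T_LINE_NS):.1f} ns at VCC/2")
```

Running it prints 10.0, 5.0 and 0.0 ns for receivers at the driver, mid-line and far end respectively, i.e. the "2*t down to zero" range described above.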
A synchronous input window (that is, anything where the signal goes into a clocked flip-flop subject to setup and hold times; typical examples include synchronous serial, SPI and strobed parallel buses) tolerates poor signal quality, so long as the disturbance (of the order of 2*t, but possibly several times longer on a very badly terminated line that exhibits multiple reflections) has died out before it can violate the setup and hold times.
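A rough timing budget for such a synchronous input can be sketched as follows. This assumes SPI-style timing where data is launched on one SCK edge and sampled on the opposite edge half a period later, that the clock line itself is clean and has the same flight time as the data, and that driver and receiver delays are negligible. `max_sck_mhz` and its parameters are hypothetical names, and the `round_trips` multiplier is only a crude stand-in for the "several times longer" case.

```python
def max_sck_mhz(t_line_ns, setup_ns, hold_ns, round_trips=1):
    """Highest SCK frequency (MHz) at which the data disturbance has died out
    before the setup window opens.

    The disturbance is taken as round_trips * 2 * t_line_ns:
    round_trips = 1 for a well source-terminated line, more for a badly
    terminated line that rings through several reflections.
    """
    disturbance_ns = round_trips * 2.0 * t_line_ns
    # Data launched on one edge must be settled setup_ns before the opposite
    # edge, half a period later; it must also hold for hold_ns after that
    # edge, until the next launch another half period later.
    half_period_ns = max(disturbance_ns + setup_ns, hold_ns)
    return 1000.0 / (2.0 * half_period_ns)


if __name__ == "__main__":
    # Hypothetical numbers: 10 ns one-way delay, 5 ns setup, 2 ns hold.
    print(max_sck_mhz(10.0, 5.0, 2.0))                 # well terminated
    print(max_sck_mhz(10.0, 5.0, 2.0, round_trips=3))  # badly ringing line
```

With these made-up numbers the well-terminated case comes out at 20 MHz; the point is simply that the usable clock rate falls as the line gets longer or rings more.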
An historical example: PCI ran at up to 66 MHz, a frequency limited by the maximum length of the motherboard expansion bus with every slot populated, assuming every card uses the maximum allowable stub length. PCI is source terminated, so cards in different physical slots see different signal quality, and all must tolerate this effect. PCI is a strobed parallel bus architecture, so the data and other sampled signal lines were fine, but the control and clock signals (which define the timing) had to be treated differently, using one of the other deglitching methods listed above.
Load or source-load termination is generally used where very high bandwidth is required: on a source-terminated line you cannot get a stable logic signal everywhere before 2*t, and that's that, whereas with a matched load the first incident wavefront already carries the final level to every receiver. Just as source-only termination does not depend upon a load terminator, load-only termination does not depend upon a source terminator. Indeed, for strong logic levels you must drive the line with a source impedance much lower than the line's, because the launched step is divided down to VCC * Z0 / (Rs + Z0). That is what you've been intuitively doing: a beefy driver, a big line, and a big old termination resistor.
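The effect of the source impedance can be checked with the usual voltage-divider and reflection-coefficient formulas. A minimal sketch, assuming a lossless line, purely resistive source and load, and a load terminator returned to ground; the 5 V, 100 ohm and 10 ohm figures are hypothetical examples, not values from any particular standard.

```python
def launched_step(vcc, r_source, z0):
    """Amplitude of the wavefront launched into the line: the driver's
    source impedance and the line's Z0 form a voltage divider."""
    return vcc * z0 / (r_source + z0)


def reflection_coefficient(r_term, z0):
    """Fraction of an incident wave reflected by a resistive termination."""
    return (r_term - z0) / (r_term + z0)


def load_power_w(vcc, r_source, r_load):
    """Steady-state power in a grounded load terminator while the line is held high."""
    v_load = vcc * r_load / (r_source + r_load)
    return v_load ** 2 / r_load


if __name__ == "__main__":
    VCC, Z0 = 5.0, 100.0  # hypothetical supply and line impedance
    for rs in (100.0, 10.0):  # source-load (Rs = Z0) vs. beefy low-impedance driver
        print(f"Rs = {rs:5.1f} ohm: step = {launched_step(VCC, rs, Z0):.2f} V, "
              f"load power = {load_power_w(VCC, rs, Z0) * 1000:.1f} mW")
    # A matched load (r_term == z0) reflects nothing, so the launched step
    # is also the final level at every receiver.
    print(reflection_coefficient(Z0, Z0))
```

With a matched source the line only ever reaches VCC/2 (2.50 V), while the 10 ohm driver gets about 4.55 V onto it, at the cost of roughly 0.2 W in the terminator, which is why a load terminator on a strongly driven line can need a generously rated resistor.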
For sending SPI long distances, I would recommend:
1. Don't. Use a robust long-distance method, like asynchronous serial or one of the fieldbuses (CANbus, Profibus, ...); or you could do worse than whole friggin' Ethernet. These are designed to be (or can be made to be) very robust in the presence of signal degradation and environmental noise. Many will require isolated receivers (although Ethernet has the advantage here, as it's already transformer isolated).
2. If you must, at least use a good signaling standard, like RS-422 (as Mike suggested).
NOTE: you won't get much distance sending this down CAT5, because the propagation delays of the pairs are NOT matched; in fact, they are deliberately made different to reduce crosstalk (namely, the twist rates differ from pair to pair, which changes the effective velocity factor of each pair). You will need cable of matched [electrical] length, which usually means [individually] shielded twisted pair: cable of this construction can have equal delays, and can therefore carry parallel synchronous data a moderate distance (though you're still limited by the delay mismatch tolerance).
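To see how quickly pair-to-pair skew eats the timing budget, the margin can be estimated from the cable's skew figure. A minimal sketch, assuming SPI-style timing (data launched on one SCK edge and sampled on the opposite edge), skew that grows linearly with length, and negligible driver and receiver delays; `skew_margins_ns` is a hypothetical helper, and the 0.45 ns/m figure (45 ns per 100 m) is an assumed example, so take the real value from the cable datasheet.

```python
def skew_margins_ns(length_m, skew_ns_per_m, sck_mhz, setup_ns, hold_ns):
    """Return (setup_margin, hold_margin) in ns for data vs. SCK on different pairs.

    The sign of the skew is not known in advance, so both cases are checked:
    data arriving late eats setup margin, data arriving early eats hold margin.
    A negative margin means the link will not sample reliably.
    """
    skew_ns = length_m * skew_ns_per_m
    half_period_ns = 1000.0 / (2.0 * sck_mhz)
    setup_margin = half_period_ns - skew_ns - setup_ns
    hold_margin = half_period_ns - skew_ns - hold_ns
    return setup_margin, hold_margin


if __name__ == "__main__":
    for length in (10.0, 50.0, 100.0):
        s, h = skew_margins_ns(length, 0.45, sck_mhz=10.0, setup_ns=5.0, hold_ns=2.0)
        print(f"{length:5.0f} m at 10 MHz: setup margin {s:6.1f} ns, hold margin {h:6.1f} ns")
```

With these assumed numbers, 100 m of cable leaves essentially no setup margin at 10 MHz, which illustrates the delay mismatch limit in practice.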
3. To be fair, coax isn't bad, and the fact that it is its own ground and shield has some advantage over the twisted pair plus implicit ground.
Woe be unto those who dare attempt RS-422 without a common ground, and a good one at that: long RS-422 cables require either contiguous shielding or individually isolated receivers!
But coax is expensive, bulky and inconvenient to handle.
There's also the option of ribbon cable, if you're feeling cheap and don't need it to be tough: ground every other wire and it kinda looks like a balanced twisted pair, except untwisted, and more than a pair alone. Delay matching should be good, but don't expect great noise performance.
Tim