There are typically three ranges of output headroom (per rail), corresponding to the standard output stage designs. Darlington followers don't get closer than 1.2V to the rail, and typically run out of headroom in the 1.5 to 2V range. Many audio amplifiers are this way, because they are intended to run from +/-15V supplies, and +/-13Vpk is more than enough for any audio signal purpose. These usually have low distortion and full specifications (like a well controlled current limit).
The next step up is a single follower, which gets within maybe 0.6 to 1V of the rail. Often, only one side acts like this, because a PNP current source might be used to pull up an NPN follower -- the NPN has higher beta, so is more suitable for driving the output. The pull-down might be a common-emitter NPN (as in the LM358, hence it can saturate nearly to V-), or a "boosted" combination (like a Sziklai pair, giving PNP behavior with the performance of the NPN).
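As a rough illustration of what those headroom figures mean for output swing, here is a minimal Python sketch. The follower numbers are the ballpark ranges quoted above; the saturating common-emitter figure is my own assumed placeholder, not a datasheet value, and the names `HEADROOM_V` and `output_swing` are made up for the example.

```python
# Worst-case headroom (volts lost per rail) for each output stage type.
# Follower ranges are the rough figures from the text above; the saturating
# figure is an ASSUMED placeholder, check the actual datasheet.
HEADROOM_V = {
    "darlington_follower": (1.5, 2.0),
    "single_follower": (0.6, 1.0),
    "saturating_common_emitter": (0.05, 0.2),  # assumption
}


def output_swing(v_pos, v_neg, pull_up, pull_down):
    """Return (lowest, highest) output voltage using worst-case headroom."""
    highest = v_pos - HEADROOM_V[pull_up][1]
    lowest = v_neg + HEADROOM_V[pull_down][1]
    return lowest, highest


# Darlington followers on both sides, +/-15V supplies.
print(output_swing(15.0, -15.0, "darlington_follower", "darlington_follower"))

# Single 5V supply: follower pull-up, saturating pull-down (LM358-like).
print(output_swing(5.0, 0.0, "single_follower", "saturating_common_emitter"))
```

The first case works out to +/-13V, which is where the +/-13Vpk figure for +/-15V audio amplifiers comes from. The second shows the lopsided swing you get when only the pull-down can saturate.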
BTW, crappy PNPs were a characteristic of early bipolar circuits. For example, 74xx series TTL was entirely NPN (hence the crappy pull-up strength and logic signals biased towards GND). Fabbing PNP would've required an extra fab step, increasing cost and reducing yield. Early analog chips (most anything in the 300 and 700 series, with LM, uA, etc. prefixes) used "lateral PNP", meaning they hacked together a PNP transistor using a poor, sideways geometry, leading to minuscule hFE (typically 1-5) and symmetrical breakdown voltages (a benefit for some purposes, as this gives the full +/-30V input range of many op-amps and comparators). Modern processes include complementary bipolar and BiCMOS (good bipolar and CMOS together in one chip), so this isn't a problem anymore.
True rail to rail capability is only attained with "open collector" (or drain, for CMOS) design. This is challenging because the output node becomes a gain node, which makes compensation complicated. Current limit is also poorly defined, because it becomes dependent on hFE and bulk resistance -- neither of which is very well controlled in an IC. (The ratios are well controlled, but it's very difficult to arrange a good ratio in this situation.)
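To put a number on how loose an hFE-dependent current limit can be, here is a deliberately oversimplified Python sketch. It assumes the limit is just the maximum available base drive times hFE and ignores bulk resistance entirely. The base drive and the hFE spread are made-up illustrative values, and `current_limit_ma` is a hypothetical helper.

```python
def current_limit_ma(hfe, i_base_max_ua):
    """Crude limit for a common-emitter output: I_C(max) = hFE * I_B(max).

    Returns milliamps. Ignores bulk resistance and saturation effects.
    """
    return hfe * i_base_max_ua / 1000.0


# ASSUMED values for illustration only: 100uA of base drive available,
# and hFE varying from 50 to 300 over process and temperature.
I_BASE_MAX_UA = 100
for hfe in (50, 100, 300):
    print(f"hFE = {hfe:3d} -> limit around {current_limit_ma(hfe, I_BASE_MAX_UA):.0f} mA")
```

With those assumed numbers the "limit" lands anywhere from 5 to 30mA, a 6:1 spread, which is why it's hard to put a tight spec on it.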
The same problem befalls LDO regulators, which have notoriously inaccurate current limits -- often enough only to survive startup and very momentary shorts, not long term stress.
Design of the output stage itself is also complicated. In the complementary emitter follower (and circuits like it), which side is conducting current is easy and well defined -- one turns on and the other turns off. With independent common-emitter outputs, all the current control has to be designed perfectly, before reaching the outputs; and if you get that wrong, you might accidentally conduct the full output stage current (which for an op-amp, might be around the current limit rating of 20mA+) straight from +V to -V, massively increasing power dissipation!
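For a sense of scale on that last point, a quick Python back-of-envelope: the power burned is just the cross-conduction current times the total supply voltage. The 20mA is the figure mentioned above; the +/-15V supplies and the 1mA "normal" quiescent current are assumptions for comparison, and `shoot_through_power` is a hypothetical helper name.

```python
def shoot_through_power(v_pos, v_neg, i_cross):
    """Power (watts) dissipated when i_cross amps flows straight from +V to -V."""
    return (v_pos - v_neg) * i_cross


V_POS, V_NEG = 15.0, -15.0  # assumed supplies

p_fault = shoot_through_power(V_POS, V_NEG, 0.020)   # both sides on at ~20mA
p_normal = shoot_through_power(V_POS, V_NEG, 0.001)  # ASSUMED 1mA quiescent

print(f"normal bias:   {p_fault * 0 + p_normal * 1000:.0f} mW")
print(f"shoot-through: {p_fault * 1000:.0f} mW")
```

That's roughly 600mW versus 30mW under these assumptions, which is a lot of heat for a small op-amp package.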
Tim