One of the primary determinants of hFE is base layer thickness.
When a transistor is diffused, a very small difference in time*temperature results in the dopant going deeper into the material (or not). Mind, the critical distances we're talking about here are microns. For a double diffused BJT, the device starts with the collector, made from moderately doped substrate. The base is diffused in, using opposite polarity dopant (i.e., N substrate, P base). The concentration of P is higher than N, so that P dominates; the N dopant left in the base is then a contaminant, which tends to reduce breakdown voltage and carrier lifetime, and increase leakage current. (Or, wait, is it just leakage? It's been so many years since I calculated a BJT from principles!) If the base dopant goes deeper, it will be more spread out (more lightly graded junction = higher breakdown voltage), and the base's P concentration will be lower. Then the emitter diffusion is made, at even higher concentration N (since it has to dominate over the base's P), and for shorter time (less depth), giving a sharper junction, and lower breakdown voltage.
This is fundamentally why Vebo is low (typically 7V): the emitter concentration is high and the junction is thin, same way you make a low voltage zener diode.
(By the way, when I say "junction", I really mean "depletion region".)
hFE goes roughly as emitter concentration / base concentration (to first order, it's the ratio of total dopant on each side, concentration times thickness; the collector doping mostly sets breakdown voltage and Early effect rather than gain). So you want a very strong emitter, a base doped well below that but still above the collector, and a lightly doped collector (which also gives a usefully high breakdown voltage).
hFE also goes inversely with base thickness. The carrier diffusion distance in silicon is on the order of 10 microns. So a base thicker than this has practically no hFE: the emitter emits charges into the base, but they recombine before being collected. (Good example: the input structures of a typical 74HC gate are separated by distances this large -- the input pins' GND side (substrate) ESD diodes exhibit a small hFE between themselves, while the VCC side diodes do so between inputs of a given gate. The two hFEs combined, of course, make an SCR, which is where CMOS latchup comes in. In 74HC, latchup typically occurs at quite high currents, ~140mA.)
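To put rough numbers on the base thickness point, here is a minimal sketch using the textbook base transport factor, alpha_T = 1/cosh(W/L). That formula is an assumption brought in for illustration, not something derived above: it assumes a uniformly doped base, ignores emitter efficiency entirely, and takes the ~10 micron diffusion length at face value. The function names are made up.

```python
import math

def base_transport_factor(base_width_um, diffusion_length_um=10.0):
    """Fraction of injected carriers that cross the base without recombining.

    Textbook result for a uniformly doped base: alpha_T = 1 / cosh(W / L).
    The 10 um default is the rough diffusion length quoted in the text.
    """
    return 1.0 / math.cosh(base_width_um / diffusion_length_um)

def transport_limited_hfe(base_width_um, diffusion_length_um=10.0):
    """Upper bound on hFE from recombination in the base alone.

    Assumes perfect emitter efficiency, so alpha = alpha_T and
    hFE = alpha / (1 - alpha). Real devices come in lower.
    """
    alpha = base_transport_factor(base_width_um, diffusion_length_um)
    return alpha / (1.0 - alpha)

if __name__ == "__main__":
    for width_um in (0.5, 1.0, 3.0, 10.0, 30.0):
        limit = transport_limited_hfe(width_um)
        print(f"W = {width_um:5.1f} um  ->  hFE limit ~ {limit:8.1f}")
```

Under these assumptions, a base as thick as the diffusion length comes out with a gain limit of around 2, while a base a tenth as thick lands at a couple hundred.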
You can't make the base too thin: for one, it's hard to do (especially in a diffused process), but also the depletion region varies with voltage. If the base is too lightly doped and thin, the collector junction can swallow the entire base, so that it overlaps the emitter junction: this is called punch-through. Short of that, as collector voltage rises, effective base thickness falls, so hFE rises -- Early effect. In the extreme, Early effect takes over completely, which looks like avalanche breakdown.
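For a feel of how far the collector depletion region reaches into the base, here is a rough sketch using the abrupt-junction depletion width formula. This is an assumption for illustration: a real diffused junction is graded, the doping levels and base width in the example are made-up round numbers, and the emitter junction's own depletion region is ignored.

```python
import math

Q = 1.602e-19                # electron charge, C
EPS_SI = 11.7 * 8.854e-14    # permittivity of silicon, F/cm

def depletion_widths_um(n_base_cm3, n_collector_cm3, v_reverse, v_builtin=0.7):
    """Return (total, base_side) depletion width in microns.

    Abrupt-junction approximation; treat the result as an
    order-of-magnitude estimate only.
    """
    doping_term = (n_base_cm3 + n_collector_cm3) / (n_base_cm3 * n_collector_cm3)
    total_cm = math.sqrt(2.0 * EPS_SI * (v_builtin + v_reverse) / Q * doping_term)
    # Charge balance: the region extends further into the lighter-doped side.
    base_side_cm = total_cm * n_collector_cm3 / (n_base_cm3 + n_collector_cm3)
    return total_cm * 1e4, base_side_cm * 1e4

def punches_through(base_width_um, n_base_cm3, n_collector_cm3, v_reverse):
    """True if the collector depletion region spans the whole base."""
    _, base_side_um = depletion_widths_um(n_base_cm3, n_collector_cm3, v_reverse)
    return base_side_um >= base_width_um

if __name__ == "__main__":
    # Illustrative values only: 1e16 base, 1e15 collector, 1 um base width.
    for vcb in (5, 20, 80, 160):
        total, base_side = depletion_widths_um(1e16, 1e15, vcb)
        flag = punches_through(1.0, 1e16, 1e15, vcb)
        print(f"Vcb = {vcb:4d} V: total {total:5.2f} um, "
              f"into base {base_side:5.2f} um, punch-through: {flag}")
```

The point of doping the base heavier than the collector shows up directly: most of the depletion region goes into the collector, so the base only gets eaten away slowly as voltage rises.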
So now we get to the relevant part: a thin base is also slow, because it's resistive. You have a resistive sheet sandwiched by two capacitors. If the sheet spans the entire width of the junction, uninterrupted, it doesn't go very fast: germanium alloy junction power transistors may be a good example of this, yielding fT in the 10kHz range. Most power transistors (since the 60s, say) have an interdigitated structure, with lots of base perimeter making contact with metallization -- this keeps the maximum width of the base sheet modest, giving fT of a few MHz.
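The "resistive sheet sandwiched by two capacitors" picture can be sketched as a distributed RC line. The following is a rough model, not a real fT calculation: it assumes a uniform sheet contacted along one edge, uses the Elmore delay (half of total R times total C), and the sheet resistance and capacitance values are illustrative guesses, not data for any real device.

```python
import math

def base_sheet_time_constant(sheet_res_ohm_sq, cap_f_per_cm2, stripe_width_cm):
    """Elmore delay to the far edge of a resistive sheet contacted on one edge.

    Total R times total C works out to Rs * C' * w**2 (the stripe length
    cancels), and the Elmore delay of a uniform distributed RC line is
    half of that.
    """
    return 0.5 * sheet_res_ohm_sq * cap_f_per_cm2 * stripe_width_cm ** 2

def corner_frequency_hz(tau_s):
    """Single-pole corner frequency for a given time constant."""
    return 1.0 / (2.0 * math.pi * tau_s)

if __name__ == "__main__":
    SHEET_RES = 5e3    # ohm/square, illustrative guess
    CAP_AREA = 1e-7    # F/cm^2, illustrative guess
    for label, width_cm in (("5 mm slab", 0.5),
                            ("100 um finger", 0.01),
                            ("2 um finger", 0.0002)):
        tau = base_sheet_time_constant(SHEET_RES, CAP_AREA, width_cm)
        print(f"{label:14s}: tau = {tau:.2e} s, "
              f"corner ~ {corner_frequency_hz(tau):.2e} Hz")
```

The useful takeaway is the w**2 scaling: halving the widest stretch of base sheet cuts the time constant by four, which is what interdigitation buys you.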
On the other extreme, RF transistors are all perimeter and hardly any sheet width: heavy doping is used to keep carrier lifetime down (giving Vceo typically 10-30V, Vebo 2 or 3V). This also reduces diffusion distance, forcing smaller feature size. The base and emitter are heavily interdigitated, giving low base resistance (Rbb') and a short time constant. They're made of so much perimeter that the vertical (sidewall) and surface parts of the junctions contribute significantly to performance, not just the buried, planar part. (This necessitates different SPICE model designs, offered in the enterprise grade simulators.) DC hFE is typically lower, 20-100, but it extends to far higher frequencies: as far as I know, plain Si BJTs run fT up to ~10GHz (with SiGe HBTs taking over above, as well as PHEMTs and the like).
fT is normally assumed to be a dominant pole effect, so the transistor sees "DC" (full hFE) as anything up to about (fT / hFE), or >100MHz for such an example.
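The fT / hFE arithmetic is quick to check. A minimal sketch, assuming the single-pole (dominant pole) rolloff described above; the function names are made up, and the 10GHz / hFE 20-100 figures are the ones from the RF example:

```python
import math

def beta_corner_hz(ft_hz, hfe_dc):
    """Frequency where hFE starts rolling off, for a single-pole model."""
    return ft_hz / hfe_dc

def hfe_magnitude(freq_hz, ft_hz, hfe_dc):
    """|hFE(f)| for a single-pole rolloff: hfe_dc / sqrt(1 + (f / f_beta)**2)."""
    f_beta = beta_corner_hz(ft_hz, hfe_dc)
    return hfe_dc / math.sqrt(1.0 + (freq_hz / f_beta) ** 2)

if __name__ == "__main__":
    FT = 10e9  # Hz, the ~10GHz figure from the text
    for hfe in (20, 100):
        corner = beta_corner_hz(FT, hfe)
        print(f"hFE = {hfe:3d}: flat up to ~{corner / 1e6:.0f} MHz, "
              f"|hFE| at fT = {hfe_magnitude(FT, FT, hfe):.2f}")
```

So the ">100MHz" figure is the worst case (hFE = 100); at the low end of the hFE range the corner sits at several hundred MHz, and in either case the gain falls to about 1 at fT, as it should.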
Obviously(?!), this applies to other kinds of BJTs as well, not just diffused. Triple diffused types can have lower resistance (the collector isn't a thick slab of lightly-doped resistance). Epitaxial types have total control over layer thickness and doping profile (so, don't need to be subject to compromised mixtures of dopants -- lower leakage, better figure of merit).
Tim