1. Bitrate is, at most, twice the bandwidth (the Nyquist limit, for two-level signaling).
2. Bandwidth is how much EM frequency you can shove down the structure, and receive at the other end.
3. If you have parallel wires, then the bandwidth includes DC, and extends to a modest upper limit, where the waves start to break up because their wavelengths become comparable to the wire-to-wire spacing. (This limits coax and CAT5 to a few GHz.)
4. If you shrink the structure, you can fit smaller waves between them. But losses go up.
5. If you replace the wires with dielectrics, then you can't carry DC anymore (there will be a LF cutoff), but losses go down. (Waveguides are an example of this: consider a coax cable with the center conductor removed.)
6. If you replace everything with dielectrics of varying material (instead of a shield and conductor), you get fiber optic cable.
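Point 1 above is the Nyquist limit for two-level signaling; here's a quick sketch, with the Hartley generalization to M signaling levels thrown in for context:

```python
from math import log2

def max_bitrate(bandwidth_hz: float, levels: int = 2) -> float:
    """Nyquist/Hartley limit: R = 2 * B * log2(M).
    For plain binary signaling (M = 2) this reduces to R = 2 * B,
    i.e. bitrate is twice the bandwidth."""
    return 2.0 * bandwidth_hz * log2(levels)

# A 1 GHz channel with binary signaling: 2e9 bits/s, i.e. 2 Gbps.
print(max_bitrate(1e9))
# The same channel with 4-level (PAM4) signaling doubles that: 4 Gbps.
print(max_bitrate(1e9, levels=4))
```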
Fiber optic cable has a fairly strange frequency response: bands (each some tens of GHz wide) propagate smoothly (meaning, without dispersion: no frequencies within the band arriving sooner or later than others), separated by anti-bands (as it were) where propagation is poor (highly dispersive, the velocity changing with frequency, so pulses of information won't stay together as pulses). The number of bands (modes) is quite reasonable (hundreds, for near IR, I think?), so the total bandwidth can be quite high (Tbps).
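The band/anti-band picture depends on the exact mode structure, but the scale of the dispersion problem can be seen with the textbook worst-case estimate for intermodal spread in step-index multimode fiber (the refractive indices below are assumed, typical-silica values, not taken from anywhere above):

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def modal_spread(length_m: float, n_core: float, n_clad: float) -> float:
    """Worst-case delay spread between the fastest and slowest modes of a
    step-index multimode fiber: dt = (n_core * L / c) * (n_core - n_clad) / n_core."""
    return (n_core * length_m / C) * (n_core - n_clad) / n_core

# 1 km of fiber, core n = 1.48, cladding n = 1.46 (assumed typical values):
dt = modal_spread(1000.0, 1.48, 1.46)
print(dt)            # ~6.7e-8 s: pulses smear by tens of ns per km
# Crude rate ceiling: keep the bit period longer than twice the spread.
print(1.0 / (2 * dt))  # ~7.5e6: only MHz-scale signaling without compensation
```

This is why each smoothly-propagating band gets treated as its own channel rather than signaling across the whole dispersive mess at once.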
Whether this counts as serial anymore is up to you, I guess. Typically, each band is treated as a serial channel, with a clock recovery system, and modest bandwidth (minding that "modest" here is comparable to the CPU-memory buses of last decade's state-of-the-art PCs). Data is incoherent between channels, because of the differing delays (the velocity of each band is different), so a single bitstream would have to be reconstructed with the help of buffers. In a sense, it's neither parallel nor serial: the channels certainly aren't synchronized (as in a parallel bus), and there isn't a single serial stream.
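A minimal sketch of that buffer-based reconstruction, assuming the per-channel skews are known and measured in whole bit periods (the channel data and delays here are invented for illustration):

```python
def reconstruct(channels, delays_bits):
    """Rebuild one bitstream from per-channel bit lists that arrived with
    differing integer delays (in bit periods). Buffering absorbs the skew:
    drop each channel's leading dead time, then interleave round-robin."""
    aligned = [ch[d:] for ch, d in zip(channels, delays_bits)]
    n = min(len(a) for a in aligned)
    return [a[i] for i in range(n) for a in aligned]

# Two demo channels; None marks bit periods before that channel's data arrives:
ch0 = [None, None, 0, 1, 0]  # 2 bit-periods of skew
ch1 = [None, 1, 1, 0]        # 1 bit-period of skew
print(reconstruct([ch0, ch1], [2, 1]))  # [0, 1, 1, 1, 0, 0]
```

Real deskew logic works on symbols flying by in hardware, of course, but the buffering idea is the same.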
I don't think there can be a technology that offers higher bandwidth than fiber optics; not by more than a constant (arithmetic) factor, anyway.
First of all, electrical bandwidth of sufficient magnitude is "light" (yellow is around 500THz). Frequencies that high can't be confined by metallic conductors, so we must use dielectric structures: dielectric waveguides, fiber optics.
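The wavelength-to-frequency conversion behind that ballpark figure is a one-liner (580 nm is taken here as a representative yellow wavelength):

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def freq_thz(wavelength_nm: float) -> float:
    """Optical frequency in THz for a given vacuum wavelength in nm."""
    return C / (wavelength_nm * 1e-9) / 1e12

print(round(freq_thz(580)))   # 517 -> yellow light, roughly 500 THz as above
print(round(freq_thz(1550)))  # 193 -> near-IR, the usual telecom band
```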
I don't know that a waveguide can be designed in such a way as to eliminate modes; perhaps a gradient index and an exponential cross section could confine smaller wavelengths to a smaller section of the waveguide, keeping the effective geometry, and therefore velocity, constant, thus allowing nearly full, continuous bandwidth without breaking it into bands (i.e., a solid, say, 50THz or more, around a 500THz carrier).
Significantly higher carrier frequencies aren't practical, because many materials absorb UV. There are still ways to deal with that, like vacuum and glancing-angle optics (which is how the Chandra X-ray Observatory is constructed), but I would be surprised if the efficiency of such a system could ever be practical.
Other media can transport information: acoustic waves, for example, but those are much too slow (and lossy above mere ~MHz). Gravitational waves propagate much the same as light, but it seems unlikely that there could ever be a means of manipulating them for more than very low bandwidths (perhaps uHz or less, since we're talking about altering the orbits of, say, neutron stars...). The propagation also isn't any faster than light, though it does penetrate through, well, everything in the universe.
So we're a fraction of the way there already: some Tbps in a single fiber. Perhaps 100Tbps will be achievable some day. Beyond that, I think it will come down to printed (or grown) structures: optimizing for lower energy (perhaps middle to near IR, which costs less voltage to emit) and greater numbers of channels (perhaps a multi-core fiber that packs a thousand, or a million, channels into one strand). Any more than that would seem to strain any kind of information-theoretic requirement; it will be much more expensive to distribute such quantities of information over the distances where this would be required, versus piling a bit more computronium* in one place to achieve a similar task. Even if it actually takes quite a lot more, because of network prediction requirements as well as just whatever's being computed along the way.
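The channel-count arithmetic is simple enough to sketch (the counts and per-channel rates below are illustrative assumptions, not data from anywhere):

```python
def aggregate_tbps(channels: int, gbps_per_channel: float) -> float:
    """Total capacity in Tbps for N independent channels at a given rate each."""
    return channels * gbps_per_channel / 1000.0

# ~100 bands at 100 Gbps each in one fiber: 10 Tbps, today-ish territory.
print(aggregate_tbps(100, 100))
# A thousand channels at the same per-channel rate hits the 100 Tbps figure.
print(aggregate_tbps(1000, 100))
```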
*Computronium: a supposed substance, of macroscopically amorphous nature, which (somehow or another) receives a higher grade of power (be it electrical, thermal or what) and information signals (electrical, optical, etc.), and outputs low grade power (usually waste heat) and processed information signals. A block of silicon chips is a step in this direction, but the information-theoretic limit (in terms of computation/watt and computation/cm^3) is something like >6 orders of magnitude more dense than where we're at now. (The far-out fantasies being something like: suppose you have a computronium shell, of a sort which is powered by a thermal gradient. Construct the shell around a star. Now construct a shell around the shell, and so on, each layer operating on a lower average temperature, and whatever thermal gradient it gets. Now you have a Dyson sphere with truly astronomical thinking power!)
Tim