Let's get back to the derivation of the channel capacity theorem. If you pollute individual QAM symbols with AWGN, the symbols where the noise amplitude stays below half the Euclidean separation of the constellation points will demodulate correctly. Those where it rises above half may be pushed across a decision boundary, and will not. If you could average out all the peaks and troughs of the AWGN to a fixed level, a separation of constellation points just over twice this average level would ensure every symbol demodulates correctly. Any type of modulation will give you the same picture; I just used QAM as an example because it's easier to visualise what is going on with a simple constellation than with something more complex like OFDM. It is conceptually possible to flatten the noise, and the capacity (the maximum possible error-free throughput) of the channel is where you are staying just the right side of the noise line. Work the maths of that through and you have Shannon's equation, C = B·log2(1 + S/N).

The big question is: how do you smooth the noise to pretty much a constant level, while keeping the information bits flowing through the system at the required rate? Essentially, you need to spread the information over time, to the point where the average of the noise over the spreading window is pretty much constant. If low latency is important in your application you might need to compromise on the spreading, but conceptually you can spread and spread until the noise is nearly constant, and then you nearly achieve capacity.

Modulation schemes like OFDM spread the information quite a bit through time, but most modulation schemes use short-term symbols. If you try to make really slow QAM symbols you will have to put more bits in each symbol, which is not the effect we are looking for. So, something more than modulation is needed to approach capacity. Channel coding, which smears the information bits through time, is the solution. Exactly how you optimally channel code has been a research topic for decades. Recent developments, like polar codes, have got us pretty near to optimal, and we can get darned close to channel capacity without ludicrous latency.

Let's be clear: simply calling channel coding FEC is misleading. There are many forms of FEC used in comms systems, but a channel coding scheme specifically works because it efficiently performs the required temporal smearing of the information.
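Here is a minimal sketch of both halves of that argument, using QPSK and a crude repeat-and-average spreader. The noise level, the N=8 spreading factor, and the repetition scheme itself are illustrative choices of mine, not a capacity-achieving code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unit-energy QPSK constellation; the simplest QAM to reason about.
const = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
d_min = np.abs(const[0] - const[1])   # minimum Euclidean distance

def demod(rx):
    """Nearest-neighbour demodulation: pick the closest constellation point."""
    return np.argmin(np.abs(rx[:, None] - const[None, :]), axis=1)

n_sym = 100_000
tx_idx = rng.integers(0, 4, n_sym)
tx = const[tx_idx]

sigma = 0.35  # complex noise std dev, picked arbitrarily to give visible errors
def awgn(shape):
    return (rng.normal(0, sigma / np.sqrt(2), shape)
            + 1j * rng.normal(0, sigma / np.sqrt(2), shape))

noise = awgn(n_sym)
errs = demod(tx + noise) != tx_idx

# Symbols whose noise magnitude stays under d_min/2 cannot cross a
# decision boundary; every error comes from an AWGN peak above that line.
peak = np.abs(noise) >= d_min / 2
print("errors with |noise| <  d_min/2:", np.sum(errs & ~peak))   # always 0
print("errors with |noise| >= d_min/2:", np.sum(errs & peak))

# Crude spreading in time: send each symbol N times and average at the
# receiver. Averaging flattens the noise peaks (std falls as 1/sqrt(N)),
# at the cost of N times the latency and 1/N the information rate.
N = 8
errs_spread = demod((tx[:, None] + awgn((n_sym, N))).mean(axis=1)) != tx_idx
print("SER without spreading:", errs.mean())
print("SER with N=8 spreading:", errs_spread.mean())
```

The first two prints show that every symbol error comes from a noise sample that crossed the half-distance line; the last two show the error rate collapsing once the noise is averaged over the spreading window. A repetition code wastes rate, of course, which is exactly why real systems use proper channel codes instead.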
Most designs start with a given channel bandwidth. Then you choose a modulation scheme. The bandwidth and modulation scheme pretty much define your symbol rate. You should know the bit rate you want to achieve, and from that and the symbol rate you get your minimum bits per symbol before coding. The channel capacity equation will tell you the SNR it will take to achieve that. Now you need to devise a combination of modulation and channel coding which will let you get close to capacity, and achieve your desired bit rate with a low BER (the sketch below runs some numbers through this flow).

Let's say you use QAM. Every extra bit you add to the information stream for channel coding doubles the number of constellation points, and loses you about 3 dB, which is a big cost that the coding needs to recover before it offers any benefit. Noise mostly causes you to mistake a constellation point for one that is one or two steps away; it would take a once-in-a-lifetime extreme point on an AWGN waveform to carry you far across the constellation. Most coded QAM systems don't even code the bits which represent big steps across the constellation. They code the bits which represent small steps (this is the set-partitioning idea behind trellis-coded modulation), and they usually end up with just one or two extra bits in the stream to the modulator. The improvement channel coding schemes bring comes more from how effectively they smear things through time than from how many bits they add.
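As a sketch of that design flow, here is a back-of-envelope calculator. The raised-cosine roll-off, the function name, and the example numbers are my own illustrative assumptions:

```python
import math

def design_point(bandwidth_hz, bit_rate, alpha=0.25):
    """Back-of-envelope link design. Assumes ideal Nyquist signalling with
    a raised-cosine roll-off alpha; names and numbers are illustrative."""
    symbol_rate = bandwidth_hz / (1 + alpha)   # symbols/s the channel supports
    bits_per_symbol = bit_rate / symbol_rate   # needed per symbol before coding
    # Shannon: C = B * log2(1 + SNR)  =>  SNR = 2^(C/B) - 1
    snr_db = 10 * math.log10(2 ** (bit_rate / bandwidth_hz) - 1)
    return symbol_rate, bits_per_symbol, snr_db

sym_rate, bps, snr_db = design_point(1e6, 4e6)
print(f"symbol rate: {sym_rate / 1e6:.2f} Msym/s")
print(f"bits per symbol before coding: {bps:.1f}")
print(f"SNR floor from channel capacity: {snr_db:.1f} dB")
```

Running this for a 1 MHz channel carrying 4 Mbit/s gives 0.80 Msym/s, 5 bits per symbol, and a capacity floor of about 11.8 dB SNR. The gap between that floor and what your uncoded modulation actually needs is the budget the channel coding has to work within.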