Speaking of CPLDs, the major difference AFAIK is about resource allocation/organization. Think, multiple PLDs in one chip. Big wide min/max-terms arrays, the usual flip-flops, cascade terms, and one or a couple outputs (complements, cascade, clocked, etc.).
Whereas FPGAs are, more or less the same per cell, except with a tiny LUT instead of an AND/OR array. Often 4x4, simple RAM lookup, but bigger ones go 6x6 or more. But rarely the, like, 12x7 or whatever array CPLDs may have.
PLDs in turn, I guess, arise from the input pins; you want to matrix all possible combinations, since it's a small chip, and you can offer some reasonably powerful functions that way. So, kind of a lot of options, but feasible for just a single chip. Then, suppose you put a few of them together, well, then you get a CPLD, right? etc.
FPGA interconnects I think tend to be more complex, including crossbar matrices and buffering, and just more routes to choose from, cells to place (including potentially routing through cells to reduce bus contention), etc., which is really where the timing comes in. Compared to CPLDs, there's just less to route between dozens or hundreds of CPLD cells, vs. thousands in FPGA, but that really just means you have more responsibility to set timing constraints, and less freedom in them (fewer cells, and permutations thereof, in a critical path), but because FPGA cells are faster than PLDs per cell, it's about even either way.
Tim