The simple version:
Let's say you have a very simple FPGA with each cell containing a single 2-input gate, and an optional latch. The inputs can be driven by the outputs of any of the neighbouring cells, or by lines that stretch across the whole FPGA.
The way you do this is to put a multiplexer on each of the gate's inputs, selecting one of the available sources for that input. The select (address) inputs of these multiplexers are driven by registers that are loaded from the bitstream.
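As a rough sketch of that input selection in Python: the function names, the eight-entry source list, and the 3-bit select width are all made up for illustration, and a real device will have a different number of sources.

```python
def bits_to_int(bits):
    """Interpret a list of bits (most significant first) as an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def input_mux(sources, select_bits):
    """Pick one of the available sources using bits loaded from the bitstream."""
    return sources[bits_to_int(select_bits)]


# Hypothetical layout: four neighbour outputs followed by four long lines.
sources = [0, 1, 1, 0, 1, 0, 0, 1]

a = input_mux(sources, [0, 0, 1])  # configuration selects sources[1]
b = input_mux(sources, [1, 1, 1])  # configuration selects sources[7]
```

Changing the select bits reroutes the input without touching anything else, which is all that "routing" means in this simplified picture.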
Every cell contains a latch. Its clock input is driven by another multiplexer (selecting either the global clock or a constant '0'). The output of the cell comes from yet another multiplexer, which chooses either the output of the gate or the output of the latch. That is what makes the latch optional. All of these multiplexers are controlled by more registers loaded from the bitstream.
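Here is a small Python model of that clock and output selection. It assumes a level-sensitive latch (transparent while its clock is 1), and the class and attribute names are invented for the example.

```python
class CellOutputStage:
    """Latch plus the two configuration-controlled multiplexers around it."""

    def __init__(self, clock_select, output_select):
        self.clock_select = clock_select    # 1 = global clock, 0 = constant 0
        self.output_select = output_select  # 1 = latch output, 0 = gate output
        self.latch = 0                      # value currently held by the latch

    def step(self, gate_out, global_clock):
        # Clock multiplexer: either the global clock or a constant 0.
        clk = global_clock if self.clock_select else 0
        # Assumed latch behaviour: follow the gate while the clock is high.
        if clk:
            self.latch = gate_out
        # Output multiplexer: latch output or raw gate output.
        return self.latch if self.output_select else gate_out


combinational = CellOutputStage(clock_select=0, output_select=0)
registered = CellOutputStage(clock_select=1, output_select=1)
```

With both configuration bits at 0 the latch never updates and is bypassed, so the cell behaves as a plain gate. With both at 1 the cell output holds its last value whenever the clock is low.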
The gate itself is just a small 4x1 memory (a lookup table). Feed the two inputs to the address lines, and load the truth table for whatever function you like into it.
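To make the memory-as-gate idea concrete, here is a minimal Python sketch. The name lut2 and the choice of input a as the high address bit are arbitrary assumptions.

```python
def lut2(truth_table, a, b):
    """A 4x1 memory used as a gate: inputs a and b form the 2-bit address."""
    return truth_table[(a << 1) | b]


# Truth tables listed for (a, b) = (0,0), (0,1), (1,0), (1,1).
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]
NAND = [1, 1, 1, 0]

# Same hardware, different contents: the loaded table decides the function.
xor_outputs = [lut2(XOR, a, b) for a in (0, 1) for b in (0, 1)]
```

Any of the 16 possible 2-input functions is just a different set of four bits in the table.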
Real FPGAs have more inputs per gate (which means a larger truth table) and a lot more optional features in their cells. But that's basically it. It's all multiplexers and small memories.