"100/125mhz clk/pclk" means that there are 2 clock domains in the design, i.e. part of the design is clocked at 100 MHz and part at 125 MHz.
PCLK is the clock generated by the USB3 clock recovery circuitry. Don't worry about that.
Fmax generally refers to the maximum frequency you could clock a certain blob of logic at before it starts failing. I'm not sure why CAST decided to use it for the internal clock. It's usually a number the fitter tool reports after place and route, telling you how much timing margin you have left over the clock you actually asked for.
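To make the margin idea concrete, here is a rough sketch in Python. The 100 and 125 MHz clocks are the ones from the datasheet line above; the reported Fmax values are made up for illustration and don't come from any real fitter run.

```python
def timing_margin_mhz(fmax_mhz, clock_mhz):
    """Margin left between the fitter's reported Fmax and the requested clock."""
    return fmax_mhz - clock_mhz

def meets_timing(fmax_mhz, clock_mhz):
    """A clock domain meets timing if its reported Fmax is at least the requested clock."""
    return fmax_mhz >= clock_mhz

# Requested clocks for the two domains (from the "100/125mhz clk/pclk" figure)
requested = {"clk": 100.0, "pclk": 125.0}

# Hypothetical Fmax numbers, as a fitter report might list them (assumed values)
reported_fmax = {"clk": 118.0, "pclk": 131.5}

for domain, clock in requested.items():
    fmax = reported_fmax[domain]
    status = "OK" if meets_timing(fmax, clock) else "FAILS TIMING"
    print(f"{domain}: wants {clock} MHz, Fmax {fmax} MHz, "
          f"margin {timing_margin_mhz(fmax, clock):.1f} MHz -> {status}")
```

If the reported Fmax for a domain drops below the clock you need, that domain fails timing.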
BRAM/Memory bits are the same thing; vendors just quantify them differently. It's like L1 cache memory in CPUs: very fast, expensive, and small.
DSP/M14k/Multiplier blocks are hardware multipliers, MACs, etc.
Each vendor has their own FPGA architecture that's slightly different, and there's no standardization.
I got the average gate counts by taking the 95000 ASIC gates of the reference implementation and dividing by the number of vendor-specific logic blocks in each type of FPGA.
After a while of seeing these numbers you can just grab numbers out of your posterior and usually be right.
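The division described above looks roughly like this in Python. The 95000 figure is the one quoted for the reference implementation; the logic block counts and family names are placeholders I made up, so plug in the numbers from the vendor's resource table.

```python
ASIC_GATES = 95_000  # ASIC gate count of the reference implementation

def gates_per_block(asic_gates, logic_blocks):
    """Average number of ASIC gates that one vendor-specific logic block stands in for."""
    return asic_gates / logic_blocks

# Hypothetical logic block usage for the same core on two FPGA families
# (assumed values for illustration, not datasheet numbers)
usage = {
    "family_a_logic_elements": 12_000,
    "family_b_slices": 3_800,
}

for family, blocks in usage.items():
    print(f"{family}: ~{gates_per_block(ASIC_GATES, blocks):.1f} gates per block")
```

The ratio differs a lot between families because a "slice" and a "logic element" are not the same size, which is why the gate count only means something per family.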
So the "gate count" number, and with it how big of an FPGA you need, depends mostly on:
1. FPGA Family
Altera: Stratix, Cyclone, Arria
Xilinx: Virtex/Kintex, Spartan/Artix (new 7 series parts are the *tix names)
2. Other IP inside the FPGA
The more stuff you put inside, the worse routing gets. You may have plenty of logic elements or slices left, but often you run out of interconnect first! The result is that you fall off the edge of a knee curve with regard to the Fmax I mentioned earlier. A crowded FPGA can very easily fail to meet timing requirements. The solution is to use a device 15%-20% bigger than you actually need.
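As a quick sketch of that sizing rule in Python: take what the design needs, add 15%-20%, and pick the smallest device that covers it. The device sizes below are hypothetical, not real part numbers, and the function names are mine.

```python
def min_device_size(required_blocks, margin_percent=20):
    """Logic blocks needed once the headroom is added (integer math, rounded up)."""
    return (required_blocks * (100 + margin_percent) + 99) // 100

def pick_device(required_blocks, device_sizes, margin_percent=20):
    """Smallest device that fits the design plus headroom, or None if nothing fits."""
    target = min_device_size(required_blocks, margin_percent)
    for size in sorted(device_sizes):
        if size >= target:
            return size
    return None

# Hypothetical device family, sizes in logic blocks (assumed values)
family = [8_000, 11_000, 12_500, 16_000]

need = 10_000  # logic blocks used by the core plus all the other IP
print("target with 20% headroom:", min_device_size(need, 20))
print("device to buy:", pick_device(need, family, 20))
```

Note that `need` should count everything going into the part, not just the one core, since it's the total crowding that hurts routing.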