So 10-20 MHz polling? That may be doable with a micro, but a lot of their pin toggle/read rates (outside of the dedicated high-speed hardware interfaces) are going to be lower than that. As for the memory, virtually any decent-speed external memory should be fine. You're talking up to 1.28 Gbit per second (64 bits × 20 MHz), or 160 MB/s, which fast memory could handle even on a fairly narrow data bus. You share the RAM on the cycles when you're not writing samples to it, and with any fast processor or fast external memory there will be at least several cycles between samples for other work on a single core.
With external acquisition (like latching buffers triggered by your sample clock), you could set up a small counter to cycle through banks of latching buffers, so that you acquire at full speed and read each bank at a slower rate. Two banks and a flip-flop on the sample clock would halve the rate at which the GPIO has to read, but you'd need double the pins to read them.
That said, I've seen reports of RPi GPIO being read at over 10 MHz (even over 20 MHz), though I don't know whether there would be gaps in acquisition timing while the CPU handles other tasks.
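
To put rough numbers on the bandwidth and bank-splitting points above, here's a minimal Python sketch. The function names (`acquisition_budget`, `latch_round_robin`) are made up for illustration, and it only models the arithmetic and the counter steering samples into banks, not any real hardware timing.

```python
def acquisition_budget(width_bits, sample_hz, banks=1):
    """Back-of-envelope numbers for a parallel capture (hypothetical helper).

    width_bits: bits captured per sample clock
    sample_hz:  sample clock frequency in Hz
    banks:      number of latch banks the counter cycles through
    """
    bits_per_second = width_bits * sample_hz
    return {
        "bits_per_second": bits_per_second,
        "bytes_per_second": bits_per_second // 8,
        # Each bank only latches every `banks`-th sample, so the reader
        # has that much longer to fetch it.
        "read_hz_per_bank": sample_hz / banks,
        # Every bank needs its own set of input pins.
        "gpio_pins": width_bits * banks,
    }


def latch_round_robin(samples, banks=2):
    """Model a small counter steering each sample clock edge to the next bank."""
    latched = [[] for _ in range(banks)]
    for n, sample in enumerate(samples):
        latched[n % banks].append(sample)
    return latched


if __name__ == "__main__":
    budget = acquisition_budget(64, 20_000_000, banks=2)
    print(budget["bits_per_second"])   # 1280000000 (1.28 Gbit/s)
    print(budget["bytes_per_second"])  # 160000000 (160 MB/s)
    print(budget["read_hz_per_bank"])  # 10000000.0
    print(budget["gpio_pins"])         # 128
    print(latch_round_robin([10, 11, 12, 13], banks=2))  # [[10, 12], [11, 13]]
```

With two banks, the 20 MHz case drops to a 10 MHz read per bank at the cost of 128 input pins, which is the trade-off described above.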
Still, the simplest way to be sure is an FPGA. You get tons of GPIO pins, plenty of on-chip RAM for small-scale storage, ready-made interfaces for fast external RAM, good overall speed, and the ability to run entirely separately from your main program loop. With bigger ones, you can even build a soft-core processor in the remaining FPGA fabric, or get one with an integrated hard processor core, so you don't even need a different chip. The strength of FPGAs is parallel data processing in a fully configurable manner, and that really sounds like what you need unless you want to lay out a ton of discrete logic or spin your own ASIC.