I'd throw in this one:
https://github.com/yol/ethernet_mac/
It's VHDL, though, but generic enough to run on Xilinx, Lattice, etc.
There are also some more exotic implementations using Forth-driven stack machines, like this one:
https://excamera.com/files/j1.pdf
You might also get some inspiration from the Lattice EVDK Gbit streaming reference design.
If you're used to standard uC DMA approaches, you can get pretty good performance with the above core plus a DMA auto-buffering packet FIFO, as described here:
https://section5.ch/index.php/2017/05/10/dma-autobuffering-techniques/
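To give a rough idea of what the CPU side of such a packet FIFO can look like, here is a minimal software model in C. It is only a sketch under my own assumptions, not the code from the linked article or from the core above: the names (`pkt_ring`, `ring_push`, `ring_peek`, `ring_release`), the ring depth and the buffer size are all made up for illustration, and `ring_push` merely stands in for what the DMA engine would do in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_BUFS 4u     /* hypothetical ring depth, must be a power of two */
#define BUF_SIZE 1536u  /* room for one standard Ethernet frame */

struct pkt_buf {
    uint16_t len;             /* valid bytes, 0 means the slot is free */
    uint8_t  data[BUF_SIZE];
};

struct pkt_ring {
    struct pkt_buf slot[NUM_BUFS];
    unsigned head;            /* next slot the "DMA" side fills */
    unsigned tail;            /* next slot the CPU drains */
};

static void ring_init(struct pkt_ring *r)
{
    memset(r, 0, sizeof *r);
}

/* Stand-in for the hardware side: store one received frame in the next
 * free slot and advance automatically. Returns 0 on success, -1 if the
 * ring is full (the frame is dropped). */
static int ring_push(struct pkt_ring *r, const uint8_t *frame, uint16_t len)
{
    assert(len > 0 && len <= BUF_SIZE);
    if (r->head - r->tail == NUM_BUFS)
        return -1;
    struct pkt_buf *b = &r->slot[r->head % NUM_BUFS];
    memcpy(b->data, frame, len);
    b->len = len;
    r->head++;
    return 0;
}

/* CPU side: look at the oldest pending frame without copying it.
 * Returns NULL if nothing is pending. */
static const struct pkt_buf *ring_peek(const struct pkt_ring *r)
{
    if (r->head == r->tail)
        return NULL;
    return &r->slot[r->tail % NUM_BUFS];
}

/* CPU side: hand the oldest slot back so it can be refilled. */
static void ring_release(struct pkt_ring *r)
{
    if (r->head != r->tail) {
        r->slot[r->tail % NUM_BUFS].len = 0;
        r->tail++;
    }
}
```

The point of the peek/release split is that the CPU processes a frame in place and only gives the slot back when it's done, so the filling side can keep going on the remaining slots in the meantime.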
With the GMII interface you're on the safe side, with plenty of matching PHYs; only RGMII turned out to be a bit tricky, because it needs DDR-capable I/O. So if you're routing your own board, you might want to cross-check against existing reference designs.