For example, I have worked with:
iCE40LP384: 32-pin QFN, 384 LEs, 1.2V core, 3.3V IO, $1.5 in single quantity.
iCE40UP5k: 48-pin QFN, 1.2V core, 3.3V IO, 5280 LEs, 128kB SRAM, 15kB BRAM, 8x DSP, PLL, 2x I2C and 2x SPI hardened on chip, $6 in single quantity.
Free dev tools: iCEcube2, Radiant, and IceStorm (open source).
If I buy it here it is $2 in single quantity. LCSC doesn't carry it themselves, but I can buy it through them from Mouser or Digi-Key.
As for the dev tools, is there any low-cost or free interface hardware, or can the Lattice tools borrow another vendor's debug pods? (I kind of wish CMSIS-DAP would take off as the universal USB-to-JTAG interface here. While it was designed by ARM, it is fully open source and vendor independent, and can work on almost any USB-capable microcontroller.)
The problem with these parts is that they are SRAM-based, meaning you will need some sort of external storage to boot them, while MAX10 is flash-based, so it's (almost) instant-on and requires no external components. The good news is that iCE parts (like most other SRAM-based FPGAs) can be bootstrapped by the MCU if you happen to have one on board. This lets you consolidate storage (or simply keep the bitstream in the MCU's flash by incorporating it into the MCU's firmware). These parts do have OTP memory, but that is unsuitable for me as I want to be able to reprogram them.
These small MAX10 guys would be the perfect choice for me if Antel would for once use common sense when designing pinout such that VCC and GND balls would be in the middle of the device (like pretty much all other BGAs I've ever worked with, except perhaps DDR3, but that's a different story) allowing using small planelets with a single via in between balls.
I am okay with an SRAM-based FPGA, at least on this board and for that specific project, if the FPGA can be configured through a slave SPI interface and the same pins can be used after the configuration is loaded.
For this board there is that 16MB QSPI chip attached to the micro. I doubt I would ever really have that much code or that many resources on a 32-pin micro, so it does seem natural to put the FPGA bitstream in QSPI. I can just set up QSPI in XIP mode and start a DMA memory-to-peripheral transaction to dump the QSPI contents into the FPGA-facing SPI port, with no CPU involvement even while the CPU is starting up other hardware and waiting for USB enumeration.
For the other board, the GigaDevice chip there has a peculiarity: the latter 64kB half of the Flash on the 128kB device is not SRAM-shadowed (they use QSPI internally), so it is rarely used as code storage. I just need to figure out a way to convince the MCU to shadow the first 64kB fully, then start a similar DMA transaction dumping the latter half of the Flash into the FPGA.
The iCE40LP384 requires a minimum of 8kB of bitstream flash (or it reads the bitstream off the MCU, or from OTP).
The iCE40UP5k requires a minimum of 128kB of bitstream flash (or it reads the bitstream off the MCU, or from OTP).
The nice thing is that the bitstream flash is accessible from the iCE40 (via the boot SPI pins), so the FPGA user code can read and write its own data in the bitstream flash.
I believe none of those can fill up a 16MB QSPI chip here, so set up XIP and start DMA dumping.
However for my other project I do have a 64kB bitstream image size cap. Maybe I should just run libz on it and decompress it into the FPGA when MCU boots?
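Before committing to that, it is cheap to check on the host whether a given image would fit at all. The sketch below is only a rough host-side check: `fits_after_deflate` and `FLASH_BUDGET` are names made up for illustration, the test buffer is synthetic rather than a real bitstream, and the MCU would still need its own inflate routine with enough RAM for it.

```python
import zlib

FLASH_BUDGET = 64 * 1024  # the 64kB image cap mentioned above


def fits_after_deflate(bitstream, budget=FLASH_BUDGET):
    """Compress with zlib at level 9 and report (fits, compressed_size)."""
    packed = zlib.compress(bitstream, 9)
    # Round-trip on the host so a bad image is caught before it reaches the board.
    if zlib.decompress(packed) != bitstream:
        raise ValueError("zlib round-trip mismatch")
    return len(packed) <= budget, len(packed)


if __name__ == "__main__":
    # Synthetic stand-in for a sparse bitstream: mostly zeros, a few set bytes.
    fake = bytearray(100 * 1024)
    for i in range(0, len(fake), 97):
        fake[i] = (i * 31) & 0xFF
    ok, size = fits_after_deflate(bytes(fake))
    print("compressed to", size, "bytes, fits:", ok)
```

Running it on the real `.bin` file from the toolchain instead of the synthetic buffer gives the number that actually matters.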
I second iCE40LP384-SG32. I've used it in various projects, mostly for timing.
It requires 8kB of config data, which easily fits in a modern ARM Cortex-M chip.
Since the config bits are highly sparse in nature, you can implement some sort of simple compression scheme to easily shrink file size by 20%~50%.
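One possible "simple scheme" of that kind is a zero-run encoding, sketched below. This is an illustration, not necessarily the scheme used in those projects: the format (a 0x00 marker followed by a run length of 1 to 255) and the function names are assumptions chosen so that the decoder stays trivial to port to a small MCU.

```python
def rle_zero_encode(data):
    """Replace each run of zero bytes with the pair (0x00, run_length)."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        if data[i] == 0:
            run = 1
            # Runs are capped at 255 so the length fits in one byte.
            while i + run < n and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes((0, run))
            i += run
        else:
            out.append(data[i])  # non-zero bytes are copied as-is
            i += 1
    return bytes(out)


def rle_zero_decode(packed):
    """Inverse of rle_zero_encode."""
    out = bytearray()
    i = 0
    while i < len(packed):
        if packed[i] == 0:
            out += bytes(packed[i + 1])  # bytes(n) yields n zero bytes
            i += 2
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)


if __name__ == "__main__":
    raw = b"\x01\x00\x00\x00\x02"
    packed = rle_zero_encode(raw)
    print(packed.hex(), rle_zero_decode(packed) == raw)
```

Note that an isolated zero byte grows to two bytes in this format, so the saving depends entirely on how long the zero runs in the actual image are.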
I've booted it from an MSP430FR2433.
The bootloading protocol is very simple and is fully open to the public, as described in the Lattice TN1248 document.
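As a rough outline of what the MCU side does in slave SPI mode: hold CRESET_B low with SS low, release reset, wait for the FPGA to clear its configuration SRAM, clock in the whole image, send some extra clocks, then check CDONE. The sketch below models that order in Python against a fake port. `FakePort` and its method names are hypothetical, and the delay and trailing-clock defaults are assumptions to be checked against the timing table in TN1248 for the specific part.

```python
import time


class FakePort:
    """Hypothetical stand-in for the MCU pins and SPI master; records calls."""

    def __init__(self):
        self.log = []
        self.done = False

    def set_creset(self, level):
        self.log.append(("creset", level))

    def set_ss(self, level):
        self.log.append(("ss", level))

    def spi_write(self, data):
        self.log.append(("spi", len(data)))
        self.done = True  # pretend the FPGA accepted the image

    def read_cdone(self):
        return self.done


def configure_ice40(port, image, clear_delay_s=0.0012, trailing_bytes=20):
    """Load `image` over slave SPI; returns True if CDONE reads high."""
    port.set_ss(0)         # SS low at reset release selects slave SPI mode
    port.set_creset(0)     # hold the FPGA in reset
    time.sleep(0.000001)   # short reset pulse (assumed; see TN1248 for the minimum)
    port.set_creset(1)     # release reset; the FPGA clears its config SRAM
    time.sleep(clear_delay_s)  # assumed clear time; part-dependent
    port.spi_write(image)  # whole bitstream with SS still low
    port.spi_write(bytes(trailing_bytes))  # extra clocks so startup can finish
    port.set_ss(1)
    return bool(port.read_cdone())


if __name__ == "__main__":
    port = FakePort()
    print(configure_ice40(port, b"\xff" * 8192), port.log)
```

On real hardware the same sequence maps onto two GPIO writes, a blocking or DMA SPI transfer, and one GPIO read, after which the SPI pins are free for user logic.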
Maybe I should evaluate that chip some time then. How do I find the debug pods though?