The question (the topic of this thread) is: is it worth spending the effort if I cannot realistically reuse old PC ISA cards?
(I guess not)
Interfacing SDRAM to an FPGA is not particularly difficult; there are lots of already-implemented controllers out there that you can use in your own projects.
There's a *LONG* history of graphics coprocessor cards, e.g. on the PC, the TMS340-based TIGA cards, or the IBM 8514/A card and its ISA-bus clones. However, it's a *LOT* of work and a fairly deep rabbit hole to develop one for the Z80 from scratch, especially 'bare metal' on a processor you aren't particularly familiar with. IMHO your efforts would be far better spent developing a hosted Z80 system, similar in concept to the BBC Micro Tube second-processor interface with the Acorn or Torch Z80 units, or the Microsoft Z80 SoftCard for the Apple II. Of course it would be fairly crazy to build one now hosted on a legacy system, but small Linux SBCs with HDMI and USB are cheap and readily available, and using one as the graphics and I/O processor for your Z80 is really not so much of a stretch if you are already considering an STM32-based coprocessor.
Personally I'd probably go for a Raspberry Pi, just because of the popularity, stability and OS support, with an FPGA or a CPLD plus a bunch of tri-stateable latches for the bus interface (or possibly four 74HCT40105 FIFOs to support byte-wide queued inter-processor communication). Do initial development under Linux on a full-featured Pi, then decide if you are a rabid enough purist to write a Pi-side 'bare metal' version to run on a Pi Zero fitted as a daughterboard to your Z80 system.
The fastest possible interface for a CPU is writing to memory locations. This gives the CPU the flexibility to move data in and out of video memory in any way it likes. It doesn't have to ask the GPU to do something for it by setting a handful of registers: rather than writing a bunch of registers to tell the GPU to change a tilemap entry, it can simply write a single byte to the location where that tilemap is stored. It gets the job done with fewer memory operations, and it allows you to DMA data between main and video memory. This means the CPU spends less time interacting with the GPU; it just writes what it wants to show on the screen to RAM and gets on with other tasks.

Keep in mind that the CPU rarely writes raw pixels to the frame buffer; usually it's just updating tilemaps, palettes or strings. And the video memory rarely contains a framebuffer at all; it usually just contains some graphics assets in the form of small icons and sprites, plus the instructions on how to put them together. The icons and sprites are loaded in once, and after that only the instructions on where to put them are updated. The GPU then follows those instructions in an endless loop, generating the video output live at 50 or 60 fps.
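To make that concrete, here's a minimal sketch of the memory-mapped approach; the base address, map size and helper name are all made up for illustration:

```cpp
#include <cstdint>

// Hypothetical memory map, purely to illustrate the idea: the tilemap
// is mapped straight into the CPU's address space.
constexpr std::uintptr_t TILEMAP_BASE = 0xC000; // assumed address
constexpr int MAP_WIDTH = 32;                   // assumed 32 tiles per row

// A single byte write changes the tile shown at (x, y); the GPU picks
// it up on its next scan-out pass. No register handshake needed.
inline void set_tile(int x, int y, std::uint8_t tile_id) {
    volatile std::uint8_t* tilemap =
        reinterpret_cast<volatile std::uint8_t*>(TILEMAP_BASE);
    tilemap[y * MAP_WIDTH + x] = tile_id;
}
```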
As for 5V, there are some older FPGAs/CPLDs that are 5V compatible, but level shifting such a small memory bus is not terribly hard. As for a Raspberry Pi, you most definitely don't want to run bare metal; it's way too complex and undocumented. But you can still just let it boot into a Linux console and autorun your C++ app on boot. If you've ever written software for Windows, then Linux on ARM is not that much different. From there you can use OpenGL to get fully hardware-accelerated graphics with 3D support and programmable shaders. Very powerful, but if you ask me, sticking a modern-day mobile GPU onto a Z80 doesn't make for a very satisfying retro experience, because it's suddenly capable of 3D graphics on par with a typical early Windows XP machine.
Well, the thing is that emulating a parallel memory bus in software is hard work. Even a slow 20 MHz bus is actually pretty fast in terms of latency. Sure, MCUs have interrupts, but a modern ARM takes about 10 to 20 or even more cycles to enter an interrupt routine, because it has to context switch. So even if it takes only 10 cycles and you have a 180 MHz CPU, you can only enter an interrupt at a rate of 18 MHz. And this doesn't include the time taken to also exit it so that the interrupt can fire again, halving that to 9 MHz. All of that is without any code inside the routine; as soon as you add actual code that does something, the interrupt takes even longer to complete. This makes it very difficult for an MCU to pretend to be SRAM on the bus. What is easier is to talk to the MCU over something like a parallel port, since in that case the Z80 can talk to it as slowly as it wants. But you could implement something in between by putting a dual-port FIFO chip on write operations to the MCU. That way the Z80 can barf data at it at full speed and get back to doing other things while the MCU works through it at its own pace. Then only read operations would be slow.
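For a sanity check, here are those numbers spelled out; the cycle counts are assumptions on the optimistic end:

```cpp
// Back-of-envelope check of the figures above (cycle counts assumed).
constexpr long cpu_hz       = 180'000'000; // 180 MHz core clock
constexpr long entry_cycles = 10;          // interrupt entry (context switch)
constexpr long exit_cycles  = 10;          // roughly the same again to exit

// Maximum rate at which even an *empty* ISR can fire:
constexpr long max_isr_hz = cpu_hz / (entry_cycles + exit_cycles);
static_assert(max_isr_hz == 9'000'000, "9 MHz, well below a 20 MHz bus");
```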
All of this means you also can't really use the Raspberry Pi's GPIO to directly connect to a Z80 bus. Sure, the CPU might run at 1.2 GHz, but it's still not responsive enough, because GPIO sits on a slower peripheral bus, interrupt latency is really long, and running Linux means the CPU is constantly interrupted by the OS scheduler, so response times are all over the place. As for getting OpenGL working, there is a nice simple library for it called RayLib (https://www.raylib.com/), and you don't have to use it for any 3D; it works for 2D just as well, but all of that 2D is done in hardware by the GPU, so it's very fast.
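As a rough idea of what the Pi-side display loop could look like with RayLib (the window size, text and FIFO comment are placeholders, not a working protocol):

```cpp
#include "raylib.h"

int main() {
    InitWindow(640, 480, "Z80 display head"); // window/framebuffer setup
    SetTargetFPS(60);                         // match the retro refresh rate

    while (!WindowShouldClose()) {
        // ...here you'd drain the FIFO from the Z80 and update state...
        BeginDrawing();
        ClearBackground(BLACK);
        DrawText("HELLO FROM THE Z80", 160, 220, 20, GREEN);
        EndDrawing();                         // GPU composites and presents
    }
    CloseWindow();
    return 0;
}
```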
On the other hand, an FPGA could handle running a Z80 bus at 200 MHz just fine while simultaneously generating 4K video output and emulating an FM synth with 32 voices. Actually, implementing a simple GPU in an FPGA is not all that hard. Tilemaps, sprites and ROM text are easy, because it's just a matter of looking up the correct pixels in memory (see the sketch below). Things get more complicated if you want fancy features like the SNES Mode 7 effects and similar things from the 16-bit era. These simple GPUs also usually have no features for drawing lines or shapes (they tend to gain some of those abilities once they are designed for 3D).
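Here's a small software model of that per-pixel lookup, with typical but assumed sizes (8x8 tiles, a 32x32 map, 256 patterns); the hardware version just chains the same two memory reads:

```cpp
#include <cstdint>

std::uint8_t tilemap[32 * 32];   // tile IDs, written by the CPU
std::uint8_t tiles[256][8 * 8];  // 8x8 pixel patterns, loaded once

// For every output pixel, two lookups: which tile covers (x, y),
// then which pixel inside that tile.
std::uint8_t pixel_at(int x, int y) {
    std::uint8_t id = tilemap[(y / 8) * 32 + (x / 8)];
    return tiles[id][(y % 8) * 8 + (x % 8)];
}
```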
This may be of some interest:
http://tasvideos.org/5384S.html
Tl;dr: the guy used a menu corruption glitch to get arbitrary code execution (ACE) on the GBC, effectively turning the controller port into a parallel data port for bootloading code. That is not nearly enough bandwidth to stream video controller register data through the CPU (which would be the easiest way to play, e.g., graphics from another game on the same console: simply replay a log of register data obtained from an emulator). Instead, part of the display routines were recreated, and display state changes were streamed. Essentially, this constitutes a very special-purpose, high-level data compression format.
Since you have so much horsepower in the STM, you might consider crafting a display command protocol that offers various powerful, rich and customizable primitives, and transmitting display state change information instead. This would be structured very much like modern GUI programs, where you write the main backend code running in one thread (which here would be the Z80), while the OS handles the graphics objects, rendering and interaction. (You'd still run interaction through the Z80, I suppose.)
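As a sketch of what such a protocol could look like (every opcode and field here is invented purely to illustrate the shape of it):

```cpp
#include <cstdint>

// Hypothetical state-change commands the Z80 would stream to the STM.
enum class Cmd : std::uint8_t {
    SetPalette = 0x01,  // palette index + RGB triple
    MoveSprite = 0x02,  // sprite ID + new x/y
    SetTile    = 0x03,  // map position + tile ID
    DrawString = 0x04,  // cursor position + length-prefixed text
};

// A fixed-size header keeps the Z80 side trivial: write the header,
// then the payload bytes, into the link and move on.
struct CmdHeader {
    Cmd          op;
    std::uint8_t payload_len;  // number of payload bytes that follow
};
```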
The other stuff is a bit more complicated and more specific to LCDs. DBI, I think, stands for Data Bus Interface. This is similar to the 16x2 character LCD displays, where the display has an SRAM-like data bus. These types of displays have a built-in LCD controller and framebuffer, so the display is connected to the main system bus just like SRAM, EPROM or peripherals: the CPU writes into the framebuffer RAM located inside the display and sends it commands by writing into its registers. It's sort of an LCD with a built-in "graphics card". This is mostly designed to be driven by slow CPUs without a built-in LCD controller.
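In code, driving such a display looks something like this; the two addresses are hypothetical and depend entirely on your address decoding:

```cpp
#include <cstdint>

// Assumed mapping: one address for the command register, one for data.
constexpr std::uintptr_t LCD_CMD  = 0x8000; // register-select low
constexpr std::uintptr_t LCD_DATA = 0x8001; // register-select high

inline void lcd_cmd(std::uint8_t c) {
    *reinterpret_cast<volatile std::uint8_t*>(LCD_CMD) = c;
}
inline void lcd_data(std::uint8_t d) {
    *reinterpret_cast<volatile std::uint8_t*>(LCD_DATA) = d;
}

// Typical pattern: select a register or window with a command, then
// stream data straight into the display's own framebuffer.
void lcd_put(std::uint8_t reg, std::uint8_t value) {
    lcd_cmd(reg);
    lcd_data(value);
}
```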
The STM32F4 has a built in LCD controller that is capable of generating DPI RGB video directly out of its pins [..]
So yeah if you plan to make fast paced smooth scrolling games run on it [..]
But if you want the graphical capabilities of a DOS PC, that SSD19xx chip sitting directly on the Z80 bus is all you need.
Ah yes, you are correct: the STM32F446 you suggested does indeed not have an LCD controller built in. So just use a different chip from the STM32F4 family that does.