Back in the '80s Radio Shack/Tandy produced two lines of computers.
The TRS-80 Model I series:
You had the Model I, Model III, Model 4, and Model 4P.
All have dual port video memory.
The Model I & Model III used discrete logic to create the video output.
The Model 4 & Model 4P used a 6845. 80x24 is hard to do in logic; with a 6845 it is a lot easier.
All could have a memory map starting at 0 of 12k ROM, 4k I/O (video & keyboard are in here), and 48k RAM.
The 4 & 4P could change the memory map to run CP/M and had an option for 128k.
The 4P did not have the 12k rom and instead loaded a disk file to act like a rom.
Some of these used 16kx1 DRAM, so 3 banks of 8 chips for 48k. When the cost of 64kx1 chips dropped, it was cheaper to use 64k chips than 16k ones.
These used cheap, low cost logic, with PALs used in later versions.
The Model II series:
Model II, Model 12.
The Model 16b was a model II with added cards containing a 68000.
The Model 6000 was a Model 12 with 68000 cards.
These were built using the Z80 support chips (DMA, SIO, PIO).
They used a card cage to add more boards, and memory boards could be added.
These were not built down to a cost.
http://www.classiccmp.org/cpmarchives/trs80/Library/Manuals/Hardware/
The Service Manual or Technical Reference Manual shows the circuits and describes how the sections work.
The LOBO MAX-80 could run most Model III & Model IV software and CP/M-3.
It had 128k RAM and supported 4 x 5.25" floppies and 4 x 8" floppies (single or double sided, single or double density), plus a SCSI-1 interface. The only ROM in the system was the 512 byte boot ROM.
A RAM chip was used instead of a character ROM, so there was a second block of dual port memory.
The manual for this talks about using the Z80 with a banked memory.
A little reading of the above could save you a lot of time and help you build something better.
The 6845 data sheet is not bad at explaining character based video and even using this chip for graphics displays. Newer designs use a microcontroller or FPGA.
Fixing the timing of signals like /MREQ is not hard. Putting a signal through two inverters in series gives you a time delay. There are also delay line chips with taps at fixed delays. You can also use clocked logic to do this.
In the dynamic RAM examples there is a 2 to 1 bus switch on the address lines, along with logic generating new delayed signals from /MREQ (/RAS, MUX, /CAS). If you look at examples using the 64kx1 DRAM you will see a switch on A7 to extend the Z80's 7-bit refresh address to 8 bits.
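As a rough illustration of what the bus switch and the A7 switch do, here is a small Python model. It is only a sketch under assumptions: which CPU address bits go to the row and which to the column is a design choice (low byte as row is assumed here), and the 8th refresh bit is assumed to come from an external flip-flop that toggles each time the Z80's 7-bit refresh count wraps. The function names are made up for this example.

```python
def dram_row_col(addr):
    """Split a 16-bit address into the two halves the 2 to 1 switch
    presents to a 64kx1 DRAM: row at /RAS, column at /CAS.
    Assumption: low byte is the row, high byte is the column."""
    row = addr & 0xFF
    col = (addr >> 8) & 0xFF
    return row, col


def refresh_rows(cycles, r=0, a7=0):
    """List the row addresses seen during `cycles` refresh cycles.
    The Z80 only increments the low 7 bits of its refresh address.
    Assumption: an external flip-flop (a7) toggles on each wrap and
    is switched onto A7 during refresh."""
    rows = []
    for _ in range(cycles):
        rows.append((a7 << 7) | r)
        r = (r + 1) & 0x7F      # 7-bit counter inside the Z80
        if r == 0:              # counter wrapped: flip the 8th bit
            a7 ^= 1
    return rows


if __name__ == "__main__":
    print(dram_row_col(0x1234))
    print(len(set(refresh_rows(256))))
```

With the extra bit, 256 refresh cycles touch all 256 rows; without it only rows 0 to 127 would ever be refreshed.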
When you add bank switched memory or a memory mapper, both take time to function. The greater the delay, the faster the downstream memory has to be to keep the Z80 running at the same speed.
A 256kx8 chip is cheap & fast. Nothing says you need to use all of the space.
To have 32k blocks of memory you replace the Z80's A15 with a new set of address bits. With one 8-bit output port you could have 4 bits to specify the address of the lower 32k of memory and 4 bits to specify the address of the upper 32k of memory. This would let you pick 32k blocks from 512k of memory. To this you would need to add a way to have some common memory. You could use two 8-bit output ports and gain 4 more bits of addressing per bank, at the cost of needing two I/O writes instead of one to change both banks. The Z80's A15 would control a 2 to 1 switch like the 157. This adds two chips for one 8-bit port and four for the two 8-bit ports. You would be adding the 157's switching delay to the Z80's memory access time.
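The one port version of this can be sketched in a few lines of Python. This is only a model of the address arithmetic, not of any particular circuit; the class name and the choice of which nibble serves the lower and which the upper 32k are assumptions made for the example.

```python
BANK_SIZE = 0x8000  # 32k blocks


class BankPort:
    """One 8-bit output port holding two 4-bit bank numbers.
    Assumption: low nibble = lower 32k, high nibble = upper 32k."""

    def __init__(self):
        self.value = 0

    def out(self, value):
        # A single I/O write changes both banks at once.
        self.value = value & 0xFF

    def translate(self, z80_addr):
        # A15 works the 2 to 1 switch (the 157) that picks a nibble.
        if z80_addr & 0x8000:
            bank = (self.value >> 4) & 0x0F
        else:
            bank = self.value & 0x0F
        # The 4 bank bits replace A15; A0-A14 pass straight through.
        return bank * BANK_SIZE + (z80_addr & 0x7FFF)


if __name__ == "__main__":
    port = BankPort()
    port.out(0x21)  # lower 32k -> bank 1, upper 32k -> bank 2
    print(hex(port.translate(0x0000)), hex(port.translate(0x8000)))
```

Sixteen banks of 32k gives the 512k mentioned above. Note there is no common memory in this model; every address moves when its nibble changes.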
If you change to 16k blocks you need more output ports and a 4 to 1 switch.
With 8k blocks you need an 8 to 1 switch, and with 4k blocks a 16 to 1 switch (74150).
More and more chips are needed.
The RAM chip is the 8-bit output ports and the 16 to 1 switch in one package.
The difference is the time delay of the 16 to 1 switch vs. the RAM's delay, and you save a lot of chips.
Now in place of using a small RAM for this you can use a larger RAM. You could strap the extra address lines, making the large RAM small, or you could connect them to an 8-bit latch. This 8-bit port could switch between 1 of 256 different memory maps with one I/O output. Changing between banks like in the picture is just one I/O output.
On power up you could load this RAM one time. To get a 4k common block at the top of memory you just make the same entry for that block in each map.
You need one very fast RAM, plus one 244 to connect the Z80 data bus to the data inputs of the RAM chip while loading the maps. Using /IORQ to write the map prevents the expanded memory from responding, as expanded memory uses /MREQ.
On power up the contents of this RAM are unknown. If the power-up logic holds the RAM's /OE inactive and the RAM data lines have pull-up resistors, the first 4k of the Z80's memory will be mapped to the highest 4k of expanded memory.
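Here is a small Python model of the map RAM idea, as a sketch only. It assumes 4k blocks (16 entries per map), an 8-bit latch selecting 1 of 256 maps, and 8-bit map entries that become the upper address bits of the expanded memory, so up to 1 MB is addressable. The class and method names are invented for the example, and the way the maps get written is simplified to a plain method call.

```python
BLOCK_BITS = 12          # 4k blocks
BLOCKS_PER_MAP = 16      # Z80 A12-A15 pick the entry
MAP_COUNT = 256          # the 8-bit latch picks the map


class MapRam:
    def __init__(self):
        # Real contents are unknown at power up; zeros are a placeholder.
        self.entries = [0] * (MAP_COUNT * BLOCKS_PER_MAP)
        self.map_select = 0
        self.outputs_enabled = False  # /OE held off until maps are loaded

    def load(self, map_no, block, page):
        # Models an I/O write through the 244 into the map RAM.
        self.entries[map_no * BLOCKS_PER_MAP + block] = page & 0xFF

    def select(self, map_no):
        # One I/O output switches the whole memory map.
        self.map_select = map_no & 0xFF

    def translate(self, z80_addr):
        block = (z80_addr >> BLOCK_BITS) & 0x0F
        if self.outputs_enabled:
            page = self.entries[self.map_select * BLOCKS_PER_MAP + block]
        else:
            page = 0xFF  # pull-ups on the data lines read as all ones
        return (page << BLOCK_BITS) | (z80_addr & 0x0FFF)


def load_common_top(ram, page):
    """Put the same entry in the top 4k block of every map."""
    for map_no in range(MAP_COUNT):
        ram.load(map_no, BLOCKS_PER_MAP - 1, page)


if __name__ == "__main__":
    ram = MapRam()
    print(hex(ram.translate(0x0000)))  # boot fetch lands in the highest 4k
```

In this model a boot ROM or preloaded block would sit in the highest 4k of the expanded space, since that is where the first fetch from address 0 ends up before the maps are loaded.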
To extend the expanded memory address by 8 more bits would require one additional very fast RAM chip and a small change to how the maps are loaded.