The reason I've done it this way and not your suggested way is because I didn't want the MMU's select line sitting behind another 138 (with the associated additional propagation delay that would introduce) and it would still be sitting behind the ~M1 line which would still be attached to U5.
Think for a second about the critical path.
The MMU is used for memory cycles.
The time-critical path is from Z80 A14 & A15 to Q1-Q4.
A0-A13 go direct to memory,
while A14 & A15, in the form of Qn, are delayed by the 74LS670.
Read time is the critical path.
On a Z80 I/O write you have a lot of time.
So the MMU being connected to U13 makes no difference for a memory cycle.
Think of the 74LS670 as a memory chip.
Read mode is used during the memory cycle:
RA & RB are the address pins used,
Qn are the data outputs used,
Gr is the OE.
When updating the map you use write mode:
WA & WB are the address pins used,
Dn are the data inputs,
Gw is the write, chip select and address select input all in one.
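To make the "think of it as a memory chip" idea concrete, here is a rough behavioural sketch in Python. It is only a toy model, not a timing model. The wiring (A15 to RB, A14 to RA, Q1-Q4 used as the address bits above A13) is my assumption for illustration, and the names `RegFile670` and `translate` are made up.

```python
class RegFile670:
    """Toy model of a 4-word x 4-bit register file (74LS670 style)."""

    def __init__(self):
        self.words = [0, 0, 0, 0]

    def write(self, wb, wa, d, gw_n=0):
        # Write mode: WB/WA pick the word, D is the 4-bit data,
        # Gw (active low) strobes it in.
        if gw_n == 0:
            self.words[(wb << 1) | wa] = d & 0xF

    def read(self, rb, ra, gr_n=0):
        # Read mode: RB/RA pick the word, Gr (active low) enables the Q outputs.
        if gr_n != 0:
            return None  # outputs tri-stated
        return self.words[(rb << 1) | ra]


def translate(mmu, z80_addr):
    # Assumed wiring: A15 -> RB, A14 -> RA, Q1-Q4 become address bits above A13.
    bank = mmu.read((z80_addr >> 15) & 1, (z80_addr >> 14) & 1)
    # A0-A13 pass straight through, only the top bits come from the register file.
    return (bank << 14) | (z80_addr & 0x3FFF)


mmu = RegFile670()
mmu.write(1, 0, 0x5)               # I/O write: map window 0x8000-0xBFFF to bank 5
physical = translate(mmu, 0x8123)  # memory cycle: read mode only
print(hex(physical))
```

The write side is only touched during an I/O write, the read side is what sits in the memory path.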
If you were using a cache memory chip instead:
A14 & A15 plus the memory read time to data output is the critical time.
The CS & OE are active all the time except during an I/O write.
The other address inputs of this memory chip can be strapped to a fixed level, and you would have one map, like the 74LS670.
If you connected the other address inputs to the outputs of another chip, these are still static during a memory cycle. But if you change the outputs of that chip with an I/O output, you can switch between many different maps with just ONE I/O out.
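A small Python sketch of the many-maps idea, under my own assumptions: the lookup RAM's two low address pins come from A14/A15 and its upper address pins come from a latch written by one I/O OUT. `MapRam`, `out_map_select` and the bank numbers are hypothetical names and values, not anything from your drawing.

```python
class MapRam:
    """Lookup RAM used as an MMU; upper address pins come from a latch."""

    def __init__(self, maps=4):
        self.maps = maps
        self.cells = [0] * (maps * 4)   # 4 windows (A15, A14) per map
        self.map_select = 0             # contents of the map-select latch

    def load(self, map_no, window, bank):
        # Done ahead of time with I/O writes, one cell at a time.
        self.cells[(map_no << 2) | window] = bank

    def out_map_select(self, value):
        # The single I/O OUT that swaps the whole map at once.
        self.map_select = value % self.maps

    def lookup(self, z80_addr):
        window = (z80_addr >> 14) & 0x3
        return self.cells[(self.map_select << 2) | window]


ram = MapRam()
for window in range(4):
    ram.load(0, window, window)       # map 0: banks 0-3
    ram.load(1, window, 8 + window)   # map 1: banks 8-11

windows = (0x0000, 0x4000, 0x8000, 0xC000)
before = [ram.lookup(a) for a in windows]
ram.out_map_select(1)
after = [ram.lookup(a) for a in windows]
print(before, after)
```

During a memory cycle the latch outputs do not move, so the only delay that matters is still A14/A15 in to data out.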
So, looking at your last drawing:
The Y7 connection works fine with both chips.
When connected to U13 the MMU uses one output address, a saving of 7.
So the drawing looks OK.
I think you know that U5 is working now,
so the only changes are U12 & U13.
What follows is to help with future logic designs, not to change what you have now.
The U12 version with WR could be a 138, like I stated.
You could still have your special U12 that adds M1, connected to a different output.
Now if you were trying for max speed, you would need to look again at the Z80 I/O chips. CS for these chips is address select only.
For the PIO:
"The internal control logic synchronizes the CPU data bus to the peripheral device interfaces."
Note the PIO has no WR pin
The chip gets the Z80 clock and, by using the other inputs from the Z80, can build an internal copy of what the missing Z80 pins would do.
In the PIO, WR is an internal creation.
In the SIO, WR is an internal creation.
In the CTC, WR is an internal creation.
These chips use a directly connected IORQ to qualify the address.
So if you want to plan for the future and a higher Z80 clock speed,
you want an address-only decoder for all the Z80 I/O chips.
Each uses 4 addresses.
A2-A4 connected to a 138 gives this.
A4, when connected to E2, then gives a low-address (A4 low) block of I/O.
A4, when connected to E1, then gives a high-address (A4 high) block of I/O.
Then how you connect A5, A6 & A7 determines whether you have one block or many mirrors of the block in I/O address space.
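Here is a rough Python sketch of an address-only decoder, just to show blocks and mirrors. The wiring is my assumption: A2-A4 on the three select inputs (so each output covers 4 ports), and one more address bit on an enable pin. I picked A5 for the enable in the example so it does not overlap the select inputs. Enable pin names differ between datasheets, so the model only distinguishes active-low from active-high enables; `decode_138` and `io_chip_select` are made-up names.

```python
def decode_138(select, en_low_a, en_low_b, en_high):
    """Return the output (0-7) that is driven low, or None if disabled."""
    if en_low_a == 0 and en_low_b == 0 and en_high == 1:
        return select & 0x7
    return None


def io_chip_select(port, enable_bit=5, enable_active_high=False):
    # Assumed wiring: A2-A4 on the select inputs, A0-A1 go to the I/O chip itself.
    select = (port >> 2) & 0x7
    bit = (port >> enable_bit) & 1
    if enable_active_high:
        return decode_138(select, 0, 0, bit)  # block where the bit is 1
    return decode_138(select, bit, 0, 1)      # block where the bit is 0


low_block = io_chip_select(0x04)                            # A5 = 0, selected
blocked = io_chip_select(0x24)                              # A5 = 1, not selected
high_block = io_chip_select(0x24, enable_active_high=True)  # A5 = 1, selected
mirror = io_chip_select(0x44)                               # A6 unconnected, so a mirror
print(low_block, blocked, high_block, mirror)
```

No IORQ or WR goes through the decoder here, which matches the idea that the Z80 I/O chips qualify the address themselves.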
Back in the old days, you would have many boards connected to a bus. Each board would have its own decoder chips.
Most had jumpers that you strapped to set the board's address.
A 138 could be connected to A5, A6 & A7 and you would connect one output of this decoder to one of the inputs of the four-address decoder. A real pain in the ____.
Better designs used a MAGNITUDE COMPARATOR like the 74xx85.
This change lets the board designer put a DIP switch on the edge of the board to set the board's address range.
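The comparator idea in a few lines of Python. I am assuming the board compares A7-A4 against a 4-bit DIP switch and uses only the equal (A=B) output; which address bits a real board compared is up to the designer. `board_selected` is a made-up name.

```python
def board_selected(port, dip_switch):
    # A=B output of a 4-bit comparator: A7-A4 versus the DIP switch setting.
    return ((port >> 4) & 0xF) == (dip_switch & 0xF)


# With the switch set to 5 the board answers to ports 0x50-0x5F.
ports = [p for p in range(256) if board_selected(p, 0x5)]
print(hex(ports[0]), hex(ports[-1]), len(ports))
```

Moving the board to another range is then a switch change instead of a rewire.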
The next step up from this is plug & play
There were many different versions
Some were based on which slot the board was connected to. You could think of each slot as having one output of a 138. The problem with this is that some boards need a lot of addresses and some only one or two addresses.
Then it would be nice if the CPU could know what was plugged into a slot.
Today a version of SPI is used to find out what is connected and to configure it.
So you need to take baby steps while thinking of the future paths this might take.
The MMU is as simple as it can be for what you have.
In future it can be replaced or added to, to support your needs.
Likewise, your decoder can be used as drawn, or with some changes in future.
The most important thing is TIME
Your RAM or ROM must have the proper data ready when the Z80 reads it, and must properly capture the Z80 data on a write.
To check this you match up the data times, then work back in time to see if some time is too short.
At some Z80 clock speed your circuit and memory will be too slow unless you add wait states.
And you should note that wait states are not a cure for everything.
Wait states, for example, do not change the time between a device's data outputs going tri-state and the Z80 using the data bus for output.
Keep in mind that the Z80 does not have to run at a whole number of MHz. A Z80 can be faster with a slightly slower clock if the memory changes from 1 wait state to 0 wait states.
A wait state is a step in time.
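A quick worked example of the slower-clock-can-be-faster point, in Python. The 8 MHz and 7 MHz figures are only illustrative numbers I picked, not a claim about your memory. The one hard fact used is that a Z80 opcode fetch (M1) takes 4 T states without waits, and each wait state adds one full clock period.

```python
def cycle_time_ns(clock_mhz, t_states, wait_states):
    # One T state, and one wait state, each last one clock period.
    period_ns = 1000.0 / clock_mhz
    return (t_states + wait_states) * period_ns


M1_T_STATES = 4  # Z80 opcode fetch with no wait states

fast_clock = cycle_time_ns(8.0, M1_T_STATES, 1)  # 8 MHz, memory needs 1 wait
slow_clock = cycle_time_ns(7.0, M1_T_STATES, 0)  # 7 MHz, memory keeps up with 0 waits

print(f"8 MHz + 1 wait: {fast_clock:.1f} ns per opcode fetch")
print(f"7 MHz + 0 wait: {slow_clock:.1f} ns per opcode fetch")
```

So with these numbers the 7 MHz system gets through each opcode fetch sooner than the 8 MHz one, because the wait state costs a whole 125 ns step.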