On a side note:
I designed a TTL computer which works fine up to 200kHz.
I had a look. You are exceeding the capabilities of the 555s. Return them to the kindergarten and replace them with a crisp, clean, 50:50 square wave. I think you could more than double the operational clock speed without touching anything else. The Nano's hardware timers are a convenient source of square waves up to 8MHz, if you swap some pins around.
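For the Nano clock source, here is a rough sketch of which frequencies a timer can produce. It assumes a standard 16MHz ATmega328P Nano with Timer1 in CTC mode, toggling its compare output pin on every match, for which the datasheet gives f = f_clk / (2 * N * (1 + OCR1A)), where N is the prescaler. The helper names are made up for illustration and are not part of any Arduino API.

```python
# Rough sketch: square-wave frequencies from an ATmega328P timer in CTC mode
# with "toggle OC1A on compare match". Assumes a 16 MHz Nano and Timer1's
# prescaler set; function names are hypothetical helpers, not an Arduino API.
F_CLK_HZ = 16_000_000               # assumed Nano system clock
PRESCALERS = (1, 8, 64, 256, 1024)  # Timer1 clock-select options
MAX_OCR = 0xFFFF                    # Timer1 compare register is 16 bits


def ctc_toggle_freq(ocr, prescaler=1, f_clk=F_CLK_HZ):
    """Output frequency when the pin toggles on every compare match."""
    if prescaler not in PRESCALERS or not 0 <= ocr <= MAX_OCR:
        raise ValueError("unsupported prescaler or OCR value")
    # One compare match every (1 + OCR) timer ticks; two toggles per period.
    return f_clk / (2 * prescaler * (1 + ocr))


def best_setting(target_hz, f_clk=F_CLK_HZ):
    """Return (prescaler, ocr, actual_hz) giving the frequency closest to target_hz."""
    best = None
    for n in PRESCALERS:
        ideal = f_clk / (2 * n * target_hz) - 1    # solve the formula for OCR
        ocr = min(max(round(ideal), 0), MAX_OCR)   # clamp to the register range
        actual = ctc_toggle_freq(ocr, n, f_clk)
        if best is None or abs(actual - target_hz) < abs(best[2] - target_hz):
            best = (n, ocr, actual)
    return best


if __name__ == "__main__":
    print(ctc_toggle_freq(0))        # fastest possible: 8 MHz
    print(best_setting(1_000_000))   # prescaler 1, OCR1A = 7
```

On a Nano, OC1A comes out on D9, so using Timer1 this way would mean moving the clock line to that pin (an assumption about which timer you would pick).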
To return to the topic: when designing such systems, you should construct a timing budget, similar to the one Dave showed in the video. Instead of focusing on wire delays of picoseconds, you concentrate on gate delays of tens of ns. For example, at Vcc=5V±10%, the HC04 takes as much as 14ns from input stimulus to output response. The HC00 can take up to 27ns from input to output. Counters and ALUs take even longer to propagate, and clocked parts such as counters and registers also have setup and hold times relative to the clock that must be satisfied. Memories can take yet longer. All of these delays add up.
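As a back-of-the-envelope version of such a budget, this sketch adds up one register-to-register path. Only the 14ns HC04 and 27ns HC00 figures come from the text above; the clock-to-output, ALU, and setup numbers are placeholders to be replaced with datasheet worst cases for your parts. The `fraction` parameter covers a path launched on one clock edge and captured on the other, which only gets half the period with a 50:50 clock. Hold checks (the shortest path) are a separate calculation not shown here.

```python
# Back-of-the-envelope timing budget for one register-to-register path.
# Only the HC04 and HC00 figures are from the post; the rest are placeholders.
PATH_NS = [
    ("register clock-to-output (placeholder)", 30),
    ("74HC04 inverter, worst case", 14),
    ("74HC00 NAND, worst case", 27),
    ("ALU propagation (placeholder)", 60),
]
SETUP_NS = 20  # placeholder setup time at the capturing register


def budget(path, setup_ns, fraction=1.0):
    """Return (total_ns, max_clock_hz).

    fraction = 1.0 for a path launched and captured on the same clock edge,
    0.5 for one launched on the rising edge and captured on the falling edge.
    """
    total_ns = sum(delay for _, delay in path) + setup_ns
    # The path must fit in fraction * period, so period >= total / fraction.
    max_hz = fraction * 1e9 / total_ns
    return total_ns, max_hz


if __name__ == "__main__":
    total, f_full = budget(PATH_NS, SETUP_NS)
    _, f_half = budget(PATH_NS, SETUP_NS, fraction=0.5)
    print(f"{total} ns -> {f_full / 1e6:.2f} MHz (full cycle), "
          f"{f_half / 1e6:.2f} MHz (half cycle)")
```

With these placeholder numbers the path totals 151ns, which allows about 6.62MHz if it has a full cycle, or about 3.31MHz if it only gets half.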
You already have a design, so instead you could compile a timing report: trace each path through the design, starting at the rising clock edge, through the outputs triggered by that edge, and onward through each downstream device, adding its delay, until you reach the next clocked input. You can mostly treat a bus as a single signal for this purpose. The path with the longest total delay is called the critical path, and it determines the maximum clock speed of the design. As a reward for your efforts, you may find that some components, and the delays that come with them, are unnecessary for meeting setup and hold requirements and only waste time. The inverter on 8B1B's clock input, for example.
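The tracing described above amounts to a longest-path search over the netlist, and can be sketched in a few lines. The netlist here is entirely hypothetical: stage names, connections, and most delays are placeholders, and only the 14ns inverter and 27ns NAND figures come from the post. Each stage's worst-case arrival time is its own delay plus the latest arrival among the stages that drive it, computed in topological order; the clocked input with the latest arrival plus setup marks the critical path. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical netlist: worst-case delay (ns) of each stage, and who drives it.
# "CLK" is the launching clock edge; only the HC04/HC00 figures are from the post.
DELAY_NS = {
    "CLK": 0,
    "U1_counter_q": 30,  # counter clock-to-output (placeholder)
    "U2_inv": 14,        # HC04
    "U3_nand": 27,       # HC00
    "U4_alu": 60,        # placeholder
}
DRIVERS = {
    "CLK": [],
    "U1_counter_q": ["CLK"],
    "U2_inv": ["U1_counter_q"],
    "U3_nand": ["U1_counter_q", "U2_inv"],
    "U4_alu": ["U1_counter_q", "U3_nand"],
}
# Clocked inputs ending each path: (driving stage, setup time in ns, placeholder).
ENDPOINTS = {"REG_A_d": ("U4_alu", 20), "REG_B_d": ("U2_inv", 20)}


def critical_path(delay, drivers, endpoints):
    """Return (endpoint, total_ns, stages) for the worst setup path.

    Only the longest (setup) path is found; hold checks need the shortest path.
    """
    arrival, via = {}, {}
    for node in TopologicalSorter(drivers).static_order():
        preds = drivers[node]
        if preds:
            worst = max(preds, key=lambda p: arrival[p])  # latest-arriving input
            arrival[node] = arrival[worst] + delay[node]
            via[node] = worst
        else:
            arrival[node] = delay[node]
            via[node] = None
    # Endpoint whose data arrives latest once its setup time is added.
    name, (src, setup) = max(endpoints.items(),
                             key=lambda kv: arrival[kv[1][0]] + kv[1][1])
    stages, node = [], src
    while node is not None:  # walk back along the worst drivers
        stages.append(node)
        node = via[node]
    return name, arrival[src] + setup, stages[::-1]


if __name__ == "__main__":
    name, total, stages = critical_path(DELAY_NS, DRIVERS, ENDPOINTS)
    print(name, total, "ns:", " -> ".join(stages))
```

Deleting a stage from `DRIVERS` and rerunning shows how much that stage contributed to the worst path.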
Assuming your construction is clean and neat, and the bus series resistors (edit: and LEDs) aren't dragging it down, 1MHz or better operation should be easy to achieve with just minor tweaks.
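To put a rough number on how series resistors drag the bus down: a series resistor driving the combined input capacitance of everything on the bus behaves roughly like a first-order RC, so the edge reaches 50% after RC*ln(2) and the 10% to 90% rise takes RC*ln(9), about 2.2RC. The 1kΩ and 50pF values below are placeholders, not measurements of this design, and the model ignores the driver's own output resistance and any DC loading such as LEDs.

```python
import math

# First-order RC edge estimate for a bus line driven through a series resistor.
# Ignores driver output resistance and DC loading (e.g. LEDs); values are placeholders.


def time_to_fraction_ns(r_ohm, c_pf, fraction):
    """Time for a step through R into C to reach `fraction` of its final value."""
    tau_ns = r_ohm * c_pf * 1e-3  # ohms * pF = 1e-12 s = 1e-3 ns
    return -tau_ns * math.log(1.0 - fraction)


def rise_10_90_ns(r_ohm, c_pf):
    """10%-90% rise time: tau * ln(9), roughly 2.2 * tau."""
    return time_to_fraction_ns(r_ohm, c_pf, 0.9) - time_to_fraction_ns(r_ohm, c_pf, 0.1)


if __name__ == "__main__":
    r, c = 1000, 50  # placeholder: 1 kOhm series resistor, 50 pF of bus load
    print(f"50% crossing: {time_to_fraction_ns(r, c, 0.5):.1f} ns")
    print(f"10-90% rise: {rise_10_90_ns(r, c):.1f} ns")
    print(f"half period at 1 MHz: {0.5e3:.0f} ns")
```

With those placeholder values the edge alone eats about 110ns of the 500ns half period available at 1MHz, before any gate delays are counted.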