Welcome to the forum.
Q1: The output (Q) does not reach 5V; it stays around 4V when high. How can I get this to be 5V?
With this output style you can’t. Q4 is a voltage follower (common-collector configuration), and its emitter will not go above the supply voltage minus the B-E junction drop. That is approximately 5 V - 0.6 V = 4.4 V in your case. If any noticeable current is drawn from the output, R4 and D3 will cause an even larger drop.
As mentioned by ledtester, this isn’t a problem. Your circuit actually does well: TI’s datasheet for the SN7400 gives a typical high-level output of 3.4 V at a 4.75 V supply and a 400 µA load.
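To put rough numbers on why roughly 4 V is fine, here is a minimal sketch of the arithmetic. The 0.6 V drop is the same approximation used above; the 2.0 V input-high threshold is my assumption of the usual standard-TTL figure, so check it against whatever your output actually drives.

```python
V_SUPPLY = 5.0   # supply voltage from the question, in volts
V_BE = 0.6       # approximate B-E junction drop, as used in the answer
V_IH_MIN = 2.0   # ASSUMPTION: standard TTL minimum input-high level, in volts


def follower_ceiling(v_supply, v_be=V_BE):
    """Highest no-load voltage an emitter follower can reach."""
    return v_supply - v_be


def high_level_margin(v_out_high, v_ih_min=V_IH_MIN):
    """Headroom between an output-high level and the input-high threshold."""
    return v_out_high - v_ih_min


ceiling = follower_ceiling(V_SUPPLY)
margin_measured = high_level_margin(4.0)    # the ~4 V seen in the question
margin_datasheet = high_level_margin(3.4)   # typical SN7400 output-high level

print(f"no-load ceiling:            {ceiling:.1f} V")
print(f"margin at measured 4.0 V:   {margin_measured:.1f} V")
print(f"margin at datasheet 3.4 V:  {margin_datasheet:.1f} V")
```

Under those assumptions the discrete gate has more headroom than the real chip's typical output, which is why the missing volt does not matter for driving TTL-level inputs.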

Q2: Looking at the attached waveforms, I have one input tied high and the other input fed with a square wave. As seen in the waveform, the output does not keep up with the input at higher frequencies. What could be causing this, and how can I improve the response?
By turning it into an integrated circuit.

Or by using a different thing.
BJTs exhibit some capacitance, and it takes time for that to discharge. In particular, a saturated transistor takes a tremendously long time (1) to turn off. To avoid that problem, TTL used Schottky transistors in the 74S*, 74LS*, 74AS*, and 74ALS* families. Nowadays you mostly see 74HC* and 74HCT*, which should give you a hint: there are limits to the architecture even with the Schottky hacks.
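For a feel of the capacitance part alone, here is a rough sketch that treats one node as a simple first-order RC circuit. The resistor and capacitor values are hypothetical placeholders, not taken from your schematic, and the model ignores the storage time of a saturated transistor, which comes on top of this.

```python
import math


def rise_time_10_90(r_ohms, c_farads):
    """10%-90% rise time of a first-order RC node: R * C * ln(9)."""
    return r_ohms * c_farads * math.log(9)


# HYPOTHETICAL values, for illustration only:
R_PULLUP = 1_000    # ohms, resistance charging the node
C_NODE = 20e-12     # farads, junction plus wiring/breadboard capacitance

t_r = rise_time_10_90(R_PULLUP, C_NODE)
print(f"estimated 10%-90% rise time: {t_r * 1e9:.0f} ns")
```

Plug in your own resistor values and a guess for the stray capacitance; if the result is a noticeable fraction of your square-wave period, the edges will look as sluggish as in your waveforms.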
But what is more important: implementing integrated circuit designs in discrete components may yield poor results. It’s nice for learning basic concepts. It’s great for understanding the inherent limitations of the circuits, and how the actual implementation affects the observable behavior of an IC. But there are major differences, which lead to discrete circuits performing poorly or, particularly with more modern designs, not working at all.
One crucial thing is timing. Signals don’t travel instantly. The ultra-short distances you get on a silicon die are a part of the design, and you can’t reproduce them with wires that are orders of magnitude longer. Parasitic inductances and capacitances are also very different at those scales. Another huge difference is that the diagrams are only meant to give a general idea of how the circuit behaves. To start with: in the actual 7400, on which this diagram was based, Q1 and Q3 are one transistor with two emitters. This alone changes its characteristics. And then we have all the magic and weirdness of things implemented in silicon.
Here’s a beautiful shot of the 1965 SN7400 by “Mister rf” of Wikimedia Commons. Each quadrant is a single NAND gate. In each, the connections on the side are, from the top: output, input, input. At the bottom outside of each gate you can see the inputs go to a single element: this is the BJT with two emitters. Its collector is the lower line going towards the inside and then turning back. What it reaches is two huge Darlington pairs. One would roughly correspond to Q2+Q6 in your diagram, the other would be Q4… except that in the actual die Q4 is an exact copy of Q2+Q6, and both have much higher amplification than the transistors in your diagram. And you can’t get the same performance, at least not in terms of bandwidth, as you’d squeeze out of that microscopic structure.
As a bonus: here is a 74AHC* series version (CMOS) from Zeptobars.
(1) Relative to the time scales we are talking about here.