Or switch to dots mode display, turn off the sin(x)/x interpolation, and observe whether the preceding bump disappears.
I usually prefer "dots mode". See what the scope captures, and I'll interpret it.

Drawing the lines can introduce some subtle errors that will catch out the unwary. Besides, you need dots mode for eye diagrams and any of the "equivalent time" sampling modes.
I removed resistor R2 and added a resistor R3 between the generator output PA6 and the inverter input U1 pin 2. The rise time decreased from 960ps to 820ps/780ps. There are no measurable differences in the rise time on my scope between supply voltages of +3.3V and +5V.
ATtiny_Pulser_3.0

Yellow line +5V and Red line +3.3V (change Scale)

RiseTime +5V. Display point representation and smallest timebase setting
I don't have a teensy, but...
How are you changing the state of the outputs?
If you're calling "digitalWrite()" four times then they won't all switch simultaneously.
You need to write code so that a single instruction switches all of them on in the same clock cycle (would be easy on an Arduino but I don't know much about the chip on that thing)
Code parallel out for Teensy4
// Square signal on Teensy 4 with 4 outputs (intended: 100 kHz)
const int my_pulse_pin23 = 23;
const int my_pulse_pin22 = 22;
const int my_pulse_pin21 = 21;
const int my_pulse_pin20 = 20;

void setup()
{
  int DRIVE_STRENGTH = 7; // a value from 0 to 7 (3 = 53R at 3.3V)
  int SPEED = 3;          // a value from 0 to 3
  pinMode(my_pulse_pin23, OUTPUT);
  pinMode(my_pulse_pin22, OUTPUT);
  pinMode(my_pulse_pin21, OUTPUT);
  pinMode(my_pulse_pin20, OUTPUT);
  CORE_PIN23_PADCONFIG |= 0xF9;
  CORE_PIN22_PADCONFIG |= 0xF9;
  CORE_PIN21_PADCONFIG |= 0xF9;
  CORE_PIN20_PADCONFIG |= 0xF9;
  *(portControlRegister(my_pulse_pin23)) = IOMUXC_PAD_DSE(DRIVE_STRENGTH) | IOMUXC_PAD_SPEED(SPEED);
  *(portControlRegister(my_pulse_pin22)) = IOMUXC_PAD_DSE(DRIVE_STRENGTH) | IOMUXC_PAD_SPEED(SPEED);
  *(portControlRegister(my_pulse_pin21)) = IOMUXC_PAD_DSE(DRIVE_STRENGTH) | IOMUXC_PAD_SPEED(SPEED);
  *(portControlRegister(my_pulse_pin20)) = IOMUXC_PAD_DSE(DRIVE_STRENGTH) | IOMUXC_PAD_SPEED(SPEED);
}

void loop()
{
  // All four pins sit on the same GPIO port, so one register write switches them together
  uint32_t pattern = CORE_PIN23_BITMASK | CORE_PIN22_BITMASK | CORE_PIN21_BITMASK | CORE_PIN20_BITMASK;
  int state = 0;
  while( 1 ) {
    asm volatile( "@ --- writes happen here" ); // helpful for navigating generated asm
    if( state ) {
      GPIO6_DR_SET = pattern; // turn pins on with SET register
    }
    else {
      GPIO6_DR_CLEAR = pattern; // turn pins off with CLEAR register
    }
    asm volatile( "@ --- done with writes" );
    state = !state;
    delayMicroseconds(10); // 10 us per half period: 20 us full period, i.e. about 50 kHz; use 5 for about 100 kHz
  }
}
Looks like you made the edge even faster than in the previous version! 
Thank you for posting the results. I've seen something similar with a 7414 in a video from W2AEW, where he's using all the remaining gates to drive the output. https://youtu.be/9cP6w2odGUc
I saw that video some time ago, and for some reason I was under the impression that he used only one resistor, which is why I was insisting on only one resistor. Now I see he used a resistor for each gate. I'm not sure where I've seen 7414 gates in parallel with a single resistor, or whether that was just a test idea that came to me while watching his video years ago.
Maybe it was something I only assumed would work better, or maybe I've seen it in some other schematic, I can't tell. I'm writing this because now I'm puzzled. Why did he use a resistor for each gate? Maybe there is something I'm missing. Maybe a resistor for each gate makes the edge even faster, or maybe it doesn't matter, I don't know. The thing is, now I'm not so sure about my reasoning with the series parasitic inductance of each R.
The 7414 has 4 gates: 1 is used as a driver for the other 3, and the resistor on each gate output is there to provide a 50 Ohm source.
I'll rebuild it again and solder 150R resistors to each output. I'm curious...
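As a rough check of the 50 Ohm figure: assuming the three gate outputs switch in phase, and that each gate's own output resistance (called R_out here, a value not given in this thread) is small compared with the 150R series resistor, the three branches simply appear in parallel.

```latex
R_{\text{source}} = \frac{R + R_{\text{out}}}{N}
\approx \frac{150\,\Omega}{3} = 50\,\Omega
\qquad (N = 3,\; R_{\text{out}} \ll R)
```

With a non-zero R_out the source impedance ends up somewhat above 50 Ohm. A purely hypothetical R_out of 10 Ohm, for example, would give (150 + 10) / 3, or about 53 Ohm.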
Also try to match the output trace lengths as best you can.
.........this is where things get interesting, the pulser I had drawn up in Altium had a trace length matching option, 3 squiggly lines to the output.
Never built it 'cause, as previously mentioned, I had no gear to prove its RT other than to deduce it from the scope's datasheet RT, which is always gunna be the worst case scenario.
You can spend hours and hours on these so that's why Leo's pulser is actually such a good deal.
Best if the resistors are non-inductive, i.e. they are short and don't have a spiral or serpentine internal construction.
Ditto the decoupling capacitor.
Yes please!

I am very curious about that, too, I just didn't dare to ask you to do it. I would have tried it myself, only my fastest scope is a Rigol DS1054Z (100MHz and 1GSa/s), too slow for such experiments.
I do have an Adalm Pluto SDR that can be used as an improvised spectrum analyzer up to 6GHz, but I guess that is still not high enough. I'm not sure if 6GHz would be enough to spot any difference, particularly because its ADC is only 12 bits, so the amplitude measurements won't be very precise.
The idea was to look at the spectrum of the pulses with the SA, and to write down the amplitude of each harmonic. Then, to compare the spectrum amplitudes for different versions of the schematic, and see which schematic has the strongest harmonics. (Because the faster an edge is, the richer in harmonics its spectrum will be.)
Eventually, starting from the amplitude of each harmonic, it should be possible to apply an inverse FFT in order to reconstruct the time domain pulses, and measure the rise/fall times of the edges on the reconstructed signal in the time domain (TD reconstruction being possible by assuming that the phase, which is not measured by the SA, always corresponds to an input signal with perfect edges).
Forgot to say, regarding the circuit layout for very high speed signals, you may want to watch this talk (a teaser for some paid classes on high speed layout from Altium):
https://www.eevblog.com/forum/chat/fun-for-nerds/msg5583117/#msg5583117
Please ignore the video title and the little "rants" he goes into here and there. The title might be misleading, because the talk is not only about grounding. There are more key aspects and principles disclosed there (regarding high speed layout). The video doesn't dive into details (he always says "we will address this later", meaning in the paid classes). I didn't take the paid classes, but even so, by watching that video and a few more from the same guy, I found a ton of good advice and revealing explanations. See if you find those videos useful. (There is also a book mentioned there, and many guidelines come from that book.)
Eventually, starting from the amplitude of each harmonic, it should be possible to apply a reverse FFT in order to reconstruct the time domain pulses.
You need not only the magnitude, but also the phase of the harmonics (relative to each other) to do that. A swept SA cannot measure phase, and an RTSA usually does not have enough real-time bandwidth to cover all harmonics of interest. So the best estimate you can probably calculate from the magnitude spectrum alone is a minimum phase response. The true shape may still differ.
Yes, the phase is not known. What I wanted to do was to assume the edges are perfect, and to consider the unknown phase as if the phase at each harmonic was the phase corresponding to perfect/ideal edges. Then, starting from the ideal (assumed only) phase, to put ideal phase together with each measured amplitude of the harmonics, and to reconstruct the TD shape of the signal.
It won't be very accurate, but I guess it should be good enough, the goal being to decide which schematic produces the fastest edges.
Is there anything I'm missing that would make the TD reconstruction impossible (or very hard) in practice?
Asking because I didn't try that idea in practice, and my DSP skills are not very good. I only guess that idea should work.
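A minimal sketch of the reconstruction idea described above, written in plain C++ for illustration only. It assumes every harmonic has the phase it would have in an ideal square wave with its rising edge at t = 0 (a pure sine term), which is exactly the unmeasured assumption being discussed. The function names `reconstruct`, `rise_time_10_90` and `ideal_square_mags` are hypothetical, and the harmonic magnitudes are stand-in values for an ideal square wave, not measurements.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One sample of the waveform rebuilt from harmonic magnitudes.
// ASSUMPTION (not measured): each harmonic is a pure sine term, as in an
// ideal square wave with a rising edge at t = 0.
// mags[k - 1] is the magnitude of harmonic k of the fundamental f0.
double reconstruct(const std::vector<double>& mags, double f0, double t) {
    const double two_pi = 2.0 * std::acos(-1.0);
    double x = 0.0;
    for (std::size_t k = 1; k <= mags.size(); ++k) {
        x += mags[k - 1] * std::sin(two_pi * static_cast<double>(k) * f0 * t);
    }
    return x;
}

// 10 % to 90 % rise time of the edge at t = 0, found by scanning the half
// period centred on it. The low and high levels are taken from the values a
// quarter period before and after the edge. Returns -1 if no edge is found.
double rise_time_10_90(const std::vector<double>& mags, double f0,
                       int steps = 20000) {
    assert(!mags.empty() && f0 > 0.0 && steps > 1);
    const double period = 1.0 / f0;
    const double low = reconstruct(mags, f0, -period / 4.0);
    const double high = reconstruct(mags, f0, period / 4.0);
    const double lvl10 = low + 0.1 * (high - low);
    const double lvl90 = low + 0.9 * (high - low);
    double t10 = 0.0;
    bool have10 = false;
    double t_prev = -period / 4.0;
    double x_prev = low;
    for (int i = 1; i <= steps; ++i) {
        const double t = -period / 4.0 + (period / 2.0) * i / steps;
        const double x = reconstruct(mags, f0, t);
        if (!have10 && x_prev < lvl10 && x >= lvl10) {
            // Linear interpolation between the two neighbouring samples
            t10 = t_prev + (t - t_prev) * (lvl10 - x_prev) / (x - x_prev);
            have10 = true;
        }
        if (have10 && x_prev < lvl90 && x >= lvl90) {
            const double t90 =
                t_prev + (t - t_prev) * (lvl90 - x_prev) / (x - x_prev);
            return t90 - t10;
        }
        t_prev = t;
        x_prev = x;
    }
    return -1.0;
}

// Stand-in data only: harmonic magnitudes of an ideal +/-1 square wave.
// Real values would come from the spectrum analyzer readings.
std::vector<double> ideal_square_mags(std::size_t n_harmonics) {
    const double pi = std::acos(-1.0);
    std::vector<double> mags(n_harmonics, 0.0);
    for (std::size_t k = 1; k <= n_harmonics; k += 2) {
        mags[k - 1] = 4.0 / (pi * static_cast<double>(k));
    }
    return mags;
}
```

Calling `rise_time_10_90` on the magnitude list of each schematic version would give one number per version to compare. Because the phase is assumed rather than measured, the result is better read as a ranking aid than as a true rise time.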
Going the other way is easier, and it shows how it can be done.
Differentiate the time domain signal and then FFT it. Now the FFT magnitude and phase information gives the Bode plot of the system response. Phase in the time domain is handled by aligning the trigger point with the midpoint of the edge. This is a way to turn an oscilloscope into a network analyzer, and make the measurement in one shot.
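A small sketch of that procedure in plain C++, under stated assumptions. The captured edge is taken to be a uniformly sampled step response, its first difference is used as the impulse response, and the Fourier sum is evaluated at one chosen frequency instead of running a full FFT. The names `response_from_step` and `rc_step` are hypothetical, and the RC step is synthetic test data, not a scope capture.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Frequency response at frequency f, estimated from a sampled step response.
// The first difference of the step approximates the impulse response, and
// its Fourier sum at f gives magnitude and phase (one point of a Bode plot).
// Time zero is the first sample, so the phase is relative to that point.
std::complex<double> response_from_step(const std::vector<double>& step,
                                        double dt, double f) {
    assert(step.size() >= 2 && dt > 0.0);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::complex<double> h(0.0, 0.0);
    for (std::size_t n = 0; n + 1 < step.size(); ++n) {
        const double d = step[n + 1] - step[n];  // differentiate
        h += d * std::polar(1.0, -two_pi * f * dt * static_cast<double>(n));
    }
    return h;
}

// Synthetic test data: step response of a single-pole RC low-pass with
// time constant tau, sampled every dt seconds.
std::vector<double> rc_step(double tau, double dt, std::size_t n) {
    std::vector<double> s(n);
    for (std::size_t i = 0; i < n; ++i) {
        s[i] = 1.0 - std::exp(-static_cast<double>(i) * dt / tau);
    }
    return s;
}
```

For the synthetic RC step, evaluating the response at f = 1/(2*pi*tau) should land close to the familiar -3 dB point with roughly -45 degrees of phase, which is a quick sanity check before trusting the method on real captures. The sample interval has to be much shorter than the edge for the difference to be a fair stand-in for the derivative.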
Comparison RiseTime measurements with +3.3V and +5.0V
The circuit was connected directly to the oscilloscope input with a 50R termination resistor
Measurements with +3.3V

Measurements with +5.0V

Circuit
I soldered a new version of the ATtiny_Pulser with 3x 74LVC1G14 chips
Step 1

Step 2
You can use the second side of the PCB for decoupling capacitors. On the one hand, the chips will have a shorter path to their energy supply from there. On the other hand, it will create additional capacitance from one side to the other.
I made another small change. The ATtiny412 microcontroller now has a stable 16MHz crystal. The measuring output now delivers a 100kHz square wave signal.
It is supplied with a voltage of +5 volts via the contact strip. There's also a pin for the programming adapter.
I bought an old pulse generator with a fast rise time (40ps BNC version) from Leo Bodnar. I'm curious...
(Attachment Link)
Where do you find the software configuration app for the Leo Bodnar pulse generator with a fast rise time (40ps BNC version)?
Click on the software download on this page.
Thank you. I ordered it on 1 May and I hope it arrives soon.
Btw., what is the output impedance of those G14s?
Hi,
This is the 50 ohm CH1 of my LeCroy LA314.

The signal comes from an Analog Devices HMC363S8 DIVIDE-BY-8 DC to 12 GHz MMIC I built as a probe.
The input to the divider is a 1 GHz sine. The output is 125 MHz with a 100 ps transition time.
The LA314 input is specified at 870 ps, which seems realistic here.
Renaud
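To put the two figures above in relation, here is a small sketch using two common rules of thumb. Both are assumptions that hold only approximately, and only for scopes with a roughly Gaussian response: rise times add as root-sum-square, and bandwidth is about 0.35 divided by the rise time. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Rise time shown on screen when a signal with rise time t_signal is viewed
// on a scope with rise time t_scope (root-sum-square rule of thumb).
double combined_rise_time(double t_signal, double t_scope) {
    assert(t_signal >= 0.0 && t_scope >= 0.0);
    return std::sqrt(t_signal * t_signal + t_scope * t_scope);
}

// Approximate -3 dB bandwidth for a given 10-90 % rise time (0.35 rule).
double bandwidth_from_rise_time(double t_rise) {
    assert(t_rise > 0.0);
    return 0.35 / t_rise;
}
```

With a 100 ps edge and an 870 ps input, `combined_rise_time(100e-12, 870e-12)` comes out at about 876 ps, so under these assumptions the displayed edge is almost entirely the scope's own rise time. `bandwidth_from_rise_time(870e-12)` gives about 402 MHz.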