As best I can understand it, each box containing "tD" represents a t-stage shift register, in which each symbol that enters comes out t cycles later. You do need to know how the registers are initialized by a reset, which should be specified somewhere in the document. I don't know what the small "D-4" under the unequal delay section means: maybe "D" is itself variable, so that the length of the delay changes based on the frame's position in the sector?
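To make the shift-register reading concrete, here is a minimal sketch of such a delay line. The class name `DelayLine`, the `fill` value of 0 after a reset, and the sample inputs are all my own assumptions for illustration; the document should say what the real reset state is.

```python
from collections import deque


class DelayLine:
    """A t-stage shift register: a symbol pushed in emerges t steps later."""

    def __init__(self, stages, fill=0):
        self.stages = stages
        self.fill = fill  # assumed reset value, not taken from the document
        self.reset()

    def reset(self):
        # Reload every stage with the fill value.
        self.cells = deque([self.fill] * self.stages)

    def step(self, symbol):
        # Push the new symbol in at one end, take the oldest out at the other.
        self.cells.append(symbol)
        return self.cells.popleft()


line = DelayLine(stages=3)
outputs = [line.step(s) for s in [10, 11, 12, 13, 14]]
print(outputs)
```

The first three outputs are whatever the reset loaded into the register, which is why the initial state matters for the first frames.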
Do these delay lines mean, that I must take "delayed" bytes from following F1 frames?
Contrariwise, the symbols which pass through the delay line are used by later frames, which means they must come from earlier frames than the one being encoded. This is made clear by the penultimate column of the chart, which shows the word of the audio data from which each byte of the sector derives. For example, byte 18 says "W12n+6-12(18D+1),A", which tells you that it comes from "W12n+6,A", which derives from the input word "L6n+3", after being delayed by "18D+1" frames.
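The index arithmetic in the byte 18 example can be checked with a few lines. This is only a sketch: the function names are hypothetical, and the value `D = 4` is a placeholder I picked for illustration, since I am not sure what D really is in the chart.

```python
def source_frame(n, delay_frames):
    """Frame whose data shows up in frame n after delay_frames of delay."""
    return n - delay_frames


def word_index(n, offset, delay_frames):
    # "W12n+offset-12(delay)" read literally as an index into the word stream.
    return 12 * n + offset - 12 * delay_frames


D = 4                # placeholder value, an assumption
delay = 18 * D + 1   # the "18D+1" from the byte 18 entry
n = 100              # arbitrary frame being encoded

earlier = source_frame(n, delay)
# Subtracting 12 * delay from the word index is the same as stepping back
# `delay` whole frames, because each frame holds 12 words.
assert word_index(n, 6, delay) == 12 * earlier + 6
print(earlier, word_index(n, 6, delay))
```

The subtraction always lands on a frame before n, which is the point: the delayed bytes come from frames already seen, not from following ones.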
Note that there is an obvious typo in the chart: the last four rows should say "/ P12n-12", etc.
And what about last 98th frame, which has no following frames?
This should also be explained in the document. Either the registers are reset, throwing away the symbols in the pipeline, or they continue to flow into the next sector.
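The two possibilities give different output at the start of each sector, which a toy simulation shows. Everything here is assumed for illustration: the helper `run_sectors`, the 2-stage delay, the zero fill, and the tiny 4-symbol "sectors".

```python
from collections import deque


def run_sectors(sectors, stages, reset_between, fill=0):
    """Pass each sector through one delay line, optionally resetting it first."""
    cells = deque([fill] * stages)
    results = []
    for sector in sectors:
        if reset_between:
            # Policy 1: throw away whatever is still in the pipeline.
            cells = deque([fill] * stages)
        sector_out = []
        for symbol in sector:
            cells.append(symbol)
            sector_out.append(cells.popleft())
        results.append(sector_out)
    return results


sectors = [[1, 2, 3, 4], [5, 6, 7, 8]]
with_reset = run_sectors(sectors, stages=2, reset_between=True)
continuous = run_sectors(sectors, stages=2, reset_between=False)
print(with_reset)   # symbols 3 and 4 never come out
print(continuous)   # symbols 3 and 4 appear at the start of the second sector
```

With a reset, the tail of each sector is lost and has to be recoverable some other way; without one, the tail of sector k ends up inside sector k+1.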