Each frame carries an ID in its 10th pixel, which is one of the dead pixels anyway (there's a pattern of dead pixels, by design).
The first frames have the following IDs:
frame 1: 4
frame 2: 9
frame 3: 8
frame 4: 7
frame 5: 10
frame 6: 5
frame 7: 1
frame 8: 3
After that, you'll periodically get an ID 6 frame followed by an ID 1 frame, along with a shutter noise. These two frames are similar, and they serve as offset calibration.
After the offset calibration frames, you'll receive a number of ID 3 frames - actual images.
So, after the initialization sequence, you'll get something like this:
6, 1, 3, 3, 3, 6, 1, 3, 3, 3, 3, 3, 6, 1, 3, 3, 3, 3, 3, 3 etc.
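A rough Python sketch of how the IDs above could be used to sort a stream into calibration and image frames. It assumes a frame is available as a flat sequence of pixel values and that the "10th pixel" is counted from 1 (index 9); the function names and role labels are made up for illustration and don't come from any driver.

```python
# Frame IDs as listed above; the constant names are only illustrative labels.
ID_PRE_CALIBRATION = 6   # first of the two shutter frames
ID_CALIBRATION = 1       # second shutter frame
ID_IMAGE = 3             # actual image frame


def frame_id(pixels):
    """Return the frame ID, assuming the 10th pixel is index 9 (1-based count)."""
    return pixels[9]


def group_stream(ids):
    """Group a post-initialization ID stream into calibration/image cycles.

    Each ID 6 starts a new cycle; the following ID 1 is stored with it,
    and every ID 3 after that is counted as an image of that cycle.
    """
    cycles = []
    for fid in ids:
        if fid == ID_PRE_CALIBRATION:
            cycles.append({"calibration": [fid], "images": 0})
        elif not cycles:
            continue  # skip anything seen before the first ID 6
        elif fid == ID_CALIBRATION:
            cycles[-1]["calibration"].append(fid)
        elif fid == ID_IMAGE:
            cycles[-1]["images"] += 1
    return cycles


stream = [6, 1, 3, 3, 3, 6, 1, 3, 3, 3, 3, 3, 6, 1, 3, 3, 3, 3, 3, 3]
for cycle in group_stream(stream):
    print(cycle["calibration"], cycle["images"])
```

Run on the example sequence, this yields three cycles, each starting with a 6, 1 pair. In a real reader you would keep the pixel data of the calibration frames rather than just their IDs, so they can be applied to the image frames that follow.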
Edit: Typo
Edit 2:
I think you're right. I'm only guessing, but I think those two frames can be used to derive the gain variation. From what I'm reading, microbolometer arrays don't have linear gain, which means that the gain has to be compensated differently depending on the value being read.