Do you have any experience debugging SerDes inputs? Apologies for the lengthy post, this is proving a particularly difficult bug.
This concerns a high-speed parallel DAQ. Right now we are getting erroneous output data when sampling a sine wave (attached image).
We have 14 bits coming in from the ADC, serial, DDR at 580 Mb/s. Xilinx SerDes IP cores implement 1:7 deserialisation, and we then implement 7:14 demultiplexing in fabric. The data from this is often stable, outputting correct portions of the input sinusoid (see attached).
The system 'bitslips' until the framing signal is correctly locked, i.e. '01111111000000' is bitslipped by one to obtain '11111110000000'. This works, and the link remains locked for more than 10 minutes.
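For anyone unfamiliar with the mechanism, here is a toy behavioural model of that bitslip search (Python stand-in for the HDL, not the actual design; the names are mine). Slipping the serial stream by one bit is equivalent to rotating the observed 14-bit frame word, so we rotate until the expected framing pattern appears:

```python
# Toy model of the bitslip-until-locked search (illustration only).
# Assumption: a slip of the serial stream shows up as a rotation of the
# 14-bit frame word presented by the deserialiser.

EXPECTED_FRAME = 0b11111110000000  # locked 14-bit framing pattern

def rotate_left_14(word: int, n: int) -> int:
    """Rotate a 14-bit word left by n positions."""
    n %= 14
    return ((word << n) | (word >> (14 - n))) & 0x3FFF

def bitslips_to_lock(observed_frame: int) -> int:
    """Count how many single-bit slips are needed until the frame locks."""
    for slips in range(14):
        if rotate_left_14(observed_frame, slips) == EXPECTED_FRAME:
            return slips
    raise ValueError("framing pattern never found: link fault")

# The example from the post: '01111111000000' needs exactly one slip.
print(bitslips_to_lock(0b01111111000000))  # -> 1
```

The point of the model is that a framing fault would misalign every bit of the word, not just one, which is consistent with ruling bitslip out.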
The user demultiplexing appears to work fine: we get portions of the sinusoid, and the boundary between the top 'nibble' (7 bits) and the bottom 'nibble' (7 bits) also seems to behave.
In the output sinusoid, we get periodic amplitude jumps of exactly 2^10, i.e. 1024 DN, from an erroneous toggle of bit 10. Rather than being a single-sample glitch, each jump lasts for approx. 50 samples. Meanwhile the other bits switch correctly, so the shape of the sinusoid is preserved within that 50-sample window. The output sinusoid therefore appears to have 2^10 amplitude jumps embedded in it (see attached).
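To make the symptom concrete, this sketch (illustrative numbers only, not our actual data) toggles bit 10 over a 50-sample window of an ideal 14-bit sinusoid; every corrupted sample is offset by exactly ±1024 DN while the waveform shape survives:

```python
import math

# Illustration of the symptom: XOR-ing bit 10 of an otherwise-correct
# 14-bit offset-binary sample adds or removes exactly 2^10 = 1024 DN,
# while the remaining bits still track the sinusoid.
# All constants here are made up for illustration.

MIDSCALE = 1 << 13  # 8192, assumed mid-scale for 14-bit offset binary

def sample_sine(n: int, amp_dn: int = 1000, period: int = 400) -> int:
    """Ideal 14-bit offset-binary sample of a sinusoid."""
    return MIDSCALE + int(amp_dn * math.sin(2 * math.pi * n / period))

good = [sample_sine(n) for n in range(200)]
# Corrupt a 50-sample window by toggling bit 10, as in the capture.
bad = [s ^ (1 << 10) if 75 <= n < 125 else s for n, s in enumerate(good)]

jumps = [b - g for g, b in zip(good, bad) if b != g]
print(sorted(set(jumps)))  # every corrupted sample differs by +/-1024
```

The sign of each jump depends only on whether bit 10 happened to be 0 or 1 in the correct sample, which matches the up/down character of the steps in the plot.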
Now, bit 10 is in the middle of the top 7-bit nibble. As the other bits of that nibble are correct, it is clearly not an issue with the 7-bit to 14-bit time demultiplexing. Likewise, if it were PCB trace routing delays, the errors would be more random than periodic and would probably affect other bits. They would also change with different physical constraints and firmware rebuilds as logic is moved around the FPGA, which they don't. I've checked the bitslip functionality and it appears to be fine. The ADCs are definitely configured, and their clock (160 MHz) is stable.
As many of the bits are correct, and indeed bit 10 is often correct for hundreds of samples, I don't think it is SerDes signalling (timing or voltages). I'm using a test widget in place of the synchronisation module, allowing the raw FIFO to be set up and some data to be grabbed. With test data this is perfect. The test widget starts raw acquisition roughly 800 clock cycles after it is enabled from the embedded software, and crucially grabs data into the FIFO just once, so there is no conflict between writing to the FIFO while we are reading it, etc.
Now, there are other things going on. For example, the input sinusoid has an amplitude of 200 mV AC and a frequency of 200 kHz, with the signal generator's DC bias set to the ADC's natural DC biasing point. However, with no change in hardware, if I increase the amplitude to 400 mV AC, I get almost a flat line with ±200 DN of noise and no sinusoid!
The issue doesn't seem to go away if we reduce the ADC speed; in any case I suspect it is not LVDS SerDes signalling, but something in the 1:7:14 deserialisation process. I've re-coded the 7:14 user demultiplexers for more robust timing, demultiplexing in time rather than using two 7-bit registers with asynchronous multiplexers, i.e. increasing the time available for the demultiplexing action from tclk/2 (asynchronous multiplexer CLBs) to the full tclk (synchronous 7:14 demultiplexing).
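For reference, here is a behavioural sketch of the 7:14 recombination (a Python stand-in for the HDL; the function name and the msb_first flag are my own assumptions, not the actual design). One useful observation falls out of it: bit 10 of the 14-bit word is bit 3 of the top 7-bit frame, so a fault on that single bit points at one specific position in one specific SerDes frame:

```python
# Behavioural model of combining two consecutive 7-bit SerDes frames
# into one 14-bit sample, once per full sample clock.
# Assumption: msb_first says whether the first frame carries the top
# 7 bits; getting this wrong swaps the halves of every sample.

def demux_7_to_14(first_frame: int, second_frame: int,
                  msb_first: bool = True) -> int:
    """Concatenate two 7-bit frames into one 14-bit sample."""
    hi, lo = ((first_frame, second_frame) if msb_first
              else (second_frame, first_frame))
    return ((hi & 0x7F) << 7) | (lo & 0x7F)

# Bit 10 of the 14-bit sample lives at bit 3 of the top frame:
sample = demux_7_to_14(0b0001000, 0b0000000)  # only bit 3 of top frame set
print(sample == 1 << 10)  # -> True
```

If the demultiplexer itself were the culprit, I'd expect whole-half or whole-frame corruption rather than one bit inside an otherwise-correct frame, which is why the single-bit mapping above is worth keeping in mind.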
Now, there are other issues: one being single-sample glitches, the other being another periodic bit fault on the sign bit of the 14-bit word (bit 13, the MSB), hence the offsets that flip from positive to negative, etc. But I expect those are symptoms of the same issue, i.e. fix the issue with bit 10 and bit 13 is also likely to be fixed.
Any pointers would be much appreciated. This is truly doing my head in...