When we say '1b0', what we are saying is that if you look at the serial output on a scope, you will see:
...HxLHxLHxLHxLH.... and so on...
The 'x' will be set high or low according to the data you wish to transmit.
The 'H' will always be high and the 'L' will always be low. Every transition L->H looks like a positive edge clock. If you were to lock your scope on this rising edge, you will see a clean data bit right after 'H'.
So, if we choose a 24bit parallel to serial 1 bit SERDES transmitter, the 24 bit on the parallel input will look like:
"1x01x01x01x01x01x01x01x0", where the eight 'x' bits together carry one 8 bit word...
But, the receiving end needs to know which 'x' is the true first bit to begin.
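To make the pattern concrete, here is a rough Python model of the 'HxL' wrapping and of locking onto the L->H edge. It is only a sketch of the idea, not FPGA code: the function names are made up for illustration, and the line is modelled as a plain string of '0'/'1' characters.

```python
def encode_1x0(data_bits):
    """Wrap each data bit as the 3-bit symbol H, x, L -> '1', bit, '0'."""
    return "".join("1" + str(b) + "0" for b in data_bits)


def decode_1x0(line_bits):
    """Recover data bits by locking onto the first L->H ('01') edge.

    Neither x=0 nor x=1 can create a 0->1 edge anywhere except at the
    L->H boundary, so the first rising edge always marks an 'H'.
    """
    edge = line_bits.find("01")
    if edge < 0:
        return []  # no rising edge seen, nothing to lock onto
    first_h = edge + 1  # index of the 'H' right after the rising edge
    # Symbols repeat every 3 line bits; the data bit sits one after each 'H'.
    # Stop early enough that every symbol read is complete (H, x, L).
    return [int(line_bits[i + 1]) for i in range(first_h, len(line_bits) - 2, 3)]


word = [1, 0, 1, 1, 0, 0, 1, 0]
line = encode_1x0(word)          # 24 line bits for one 8 bit word
print(line)
print(decode_1x0("0" + line))    # a preceding 'L' gives the full word back
print(decode_1x0(line))          # without it, the first 'x' is missed
```

Note the last call: the decoder finds clean data bits, but it has no idea which 'x' is bit 0 of the word. That is exactly why a start bit or some other framing is needed on top.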
For the DLL/PLL clock input, we will use the DLL where at every '01' transition, it will divide that by 2 making a clean 125MHz reference clock, as the 'x' data bit after the 1 will be random noise. A simple PLL (Spartan 6 has one of those) can create a single or multiple clock outputs by multiplying and dividing the source clock by a set of fixed integers, like 1,2,3,4,5,6,7,8,9...1023 and also deliver you optional multiple output offset phases. I'm assuming the Spartan 6 DLL is a bit simpler than its PLL as it can only multiply by integers of 1,2,4,8,16 and offer something like 4 or 8 optional different phases.
Ok, back to your bandwidth requirements:
125 bits x 10 000 x 200 = 250 000 000. No room for anything.
Let's say we go for 130 bits, 1 start bit, 125 data bits and 4 stop bits.
(130*10000*200) = 260 000 000 baud, x4 bits = 1040 Mbaud, which is the speed limit of the OSERDES2 for the Spartan 6. (I'm using x4 instead of x3.) This means H[xx]LH[xx]LH[xx]LH... The two bits in each [xx] need to be the same, i.e. 1 bit value, they are just twice as wide compared to the x3 pattern. This means compatibility with the DLL, saving your PLL for something else if needed.
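Here is the same arithmetic as a small Python sketch, together with the x4 'H[xx]L' wrapping, so you can play with the frame size. The constant and function names are my own, chosen only for this illustration.

```python
START_BITS = 1
DATA_BITS = 125
STOP_BITS = 4
FRAME_BITS = START_BITS + DATA_BITS + STOP_BITS  # 130 bits per board

SAMPLE_RATE_HZ = 10_000
BOARDS = 200
LINE_BITS_PER_DATA_BIT = 4  # H, x, x, L


def payload_rate(frame_bits=FRAME_BITS):
    """Frame bits per second across all boards, before the HxxL wrapping."""
    return frame_bits * SAMPLE_RATE_HZ * BOARDS


def line_rate(frame_bits=FRAME_BITS):
    """Bits per second actually toggling on the wire."""
    return payload_rate(frame_bits) * LINE_BITS_PER_DATA_BIT


def encode_x4(bits):
    """Wrap each bit as H, x, x, L with both x positions carrying the same value."""
    return "".join("1" + 2 * str(b) + "0" for b in bits)


print(payload_rate(125))    # bare 125 data bits, no framing
print(payload_rate())       # 130 bit frames
print(line_rate())          # on the wire with the x4 pattern
print(encode_x4([1, 0]))    # 11101000
```

Changing FRAME_BITS or LINE_BITS_PER_DATA_BIT shows quickly how little headroom is left before the serializer limit.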
Every time you get a start bit, start your next sample while feeding the incoming data straight through, then append your previous sample to the end of the stream being fed from the previous board to the next board.
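A behavioural sketch of that pass-through-and-append scheme, in Python, might look like the following. The Board class and run_period helper are hypothetical, and I tag each frame with a board id only to make the output readable; on the real wire the position in the stream is what identifies the board.

```python
class Board:
    """Behavioural model of one board in the daisy chain."""

    def __init__(self, board_id):
        self.board_id = board_id
        self.previous_sample = None  # nothing captured before the first start bit

    def on_start_bit(self, upstream_frames, new_sample):
        """Forward upstream frames, append last period's sample, latch the new one."""
        out = list(upstream_frames)
        if self.previous_sample is not None:
            out.append((self.board_id, self.previous_sample))
        self.previous_sample = new_sample
        return out


def run_period(boards, samples):
    """Push one start bit down the chain and return what leaves the last board."""
    stream = []
    for board, sample in zip(boards, samples):
        stream = board.on_start_bit(stream, sample)
    return stream


boards = [Board(i) for i in range(3)]
print(run_period(boards, [10, 11, 12]))  # first period: nothing to send yet
print(run_period(boards, [20, 21, 22]))  # second period: last period's samples
```

The thing to notice is the one period of latency: what comes out of the last board is always the set of samples taken on the previous start bit.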
Will this work for you?
Each board will sample in parallel with an approximate 5-10ns delay + cable length from each other since they need to decode the serial stream looking for that first 'start/go' bit. So, board #200's sample will be delayed by ~100-200ns. Though, with additional coding, you can counteract that 10ns by predicting the start bit's arrival because of its perfectly repetitive nature. Basically your internal 10kHz clock will be set to begin sampling early by the 4-8 clocks on the 260MHz side it takes to see the 'start/go' bit. I'm not sure how you will deal with the cable length, but with a 10kHz sample rate, I don't think a global 200ns offset can be interpreted.
If everything is ok, these are my recommended next steps:
Design a SystemVerilog test-bench which will synthesize your master board's serial data chain. Then when you begin coding for your Spartan 6, you will add that Spartan 6 code to your testbench, feeding it your custom 125 bit serial input, and see if your FPGA will lock onto your clocked data and create a new internal clock from it while decoding and passing all the data through.
Then you can append your own Spartan 6 temporary dummy data onto the stream.
Then you can modify your Spartan 6 code to synthesize its own master serial data option to replace your test-bench's beginner stream, basically clocking that Spartan 6 from a regular crystal, with an IO pin set high or low to define whether it will run as a slave from the serial input, or run as the first master board from a crystal oscillator input.
Next, add multiple boards of your Spartan 6 code to your testbench, chained together as if wired in real life, to verify each board adds its own data into the stream without errors or missing bits, and verify the phase of your internally generated 10kHz sampling clock.
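For the 'without errors or missing bits' check, one option is to dump the bit stream leaving the last board from the simulator and compare it against a small reference model. Below is a minimal Python sketch of such a checker. It assumes a start bit of '1' and four stop bits of '0', which are my guesses at the polarities rather than anything fixed above, and the helper names are invented for the example.

```python
FRAME_BITS = 130  # 1 start + 125 data + 4 stop
DATA_BITS = 125


def build_frame(data):
    """Assumed framing: start bit '1', 125 data bits, stop bits '0000'."""
    if len(data) != DATA_BITS:
        raise ValueError("payload must be exactly 125 bits")
    return "1" + data + "0000"


def check_chain_output(stream, expected_payloads):
    """Return a list of error strings; an empty list means the stream is clean."""
    errors = []
    if len(stream) != FRAME_BITS * len(expected_payloads):
        errors.append("stream length mismatch: bits were lost or added")
    for n, expected in enumerate(expected_payloads):
        frame = stream[n * FRAME_BITS:(n + 1) * FRAME_BITS]
        if frame[:1] != "1":
            errors.append(f"board {n}: missing start bit")
        if frame[1:1 + DATA_BITS] != expected:
            errors.append(f"board {n}: payload mismatch")
        if frame[1 + DATA_BITS:] != "0000":
            errors.append(f"board {n}: bad stop bits")
    return errors


# Dummy payloads: each board sends its own index as a 125 bit number.
payloads = [format(i, "0125b") for i in range(200)]
good = "".join(build_frame(p) for p in payloads)
print(check_chain_output(good, payloads))            # clean stream, no errors
damaged = good[:500] + good[501:]                    # drop a single bit
print(len(check_chain_output(damaged, payloads)) > 0)
```

Using each board's index as its dummy payload makes it obvious from the error list which board dropped or shifted a bit.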
Then you may add your data acquisition sampling IOs to feed the true data into the chain. (This will be a separate testbench just to verify your sampler connections, as you already did the comms. Then you may merge the 200 board setup with the sampling IO version if you like.)
The goal is to create your entire 200 board system in something like ModelSim (so long as Xilinx has its DLL and OSERDES models for ModelSim, or whichever simulator Xilinx uses) and see the entire board-to-board system power up and function.
You want to test everything before even creating a schematic so you know what you build will work.