Author Topic: 100MHz Serial Data Injector  (Read 1182 times)

Offline bauto601Topic starter

  • Contributor
  • Posts: 13
  • Country: nl
100MHz Serial Data Injector
« on: May 31, 2023, 05:56:44 pm »
Hello everyone,

For a project I need something that's a bit unusual, I think. I've built an AMD Slot-A to Socket-A adapter, but it seems I have to change the CPU configuration that the Northbridge sends to the CPU on startup. Unfortunately, the configuration is hardcoded into the chipset and can't be changed by the BIOS.

The signal consists of 33 bits that are sent serially to the CPU, synchronized to the PLL clock input of the CPU. This PLL clock runs at 100MHz. Here is a scope shot of the signal:

(this signal decodes to: start bit - 001101100000010010000001001100010)

The transfer begins with a start bit (1), and I'm thinking of injecting my own serial code after this start bit, using the start bit as a trigger to intercept the packet and replace it with an adjusted packet. I was thinking about using a microcontroller for this, but I'm unsure whether there's even one that's fast enough to transmit data at this frequency. Do you guys have any ideas how/if this could be done? :)
 

Offline mon2

  • Frequent Contributor
  • **
  • Posts: 463
  • Country: ca
Re: 100MHz Serial Data Injector
« Reply #1 on: May 31, 2023, 06:14:51 pm »
Use an FPGA (recommended; Gowin or Efinix - low cost kits from Amazon / Aliexpress - Sipeed, etc.) or consider the RP2040's PIO state machines. Otherwise, a microcontroller will not be fast enough.
 
The following users thanked this post: bauto601

Offline bauto601Topic starter

  • Contributor
  • Posts: 13
  • Country: nl
Re: 100MHz Serial Data Injector
« Reply #2 on: May 31, 2023, 09:57:41 pm »
The RP2040 looks like a relatively obtainable option. It's well documented and cheap, so I'm going to give it a go with this one  :-+
 

Offline PCB.Wiz

  • Super Contributor
  • ***
  • Posts: 1535
  • Country: au
Re: 100MHz Serial Data Injector
« Reply #3 on: May 31, 2023, 11:02:09 pm »
Quote from: bauto601
The signal transfer begins with a start bit (1), i'm thinking about to inject my own serial code after this start bit.

How many bits do you need to change, and how close are they to the start bit?
It's hard enough to generate a 100MHz stream, let alone trying to 'switch live'.
You may be better off generating the whole bit stream and injecting all of the non-zero bits, start bit included.
An FPGA/CPLD looks like an easier way to do this than an MCU, as you need to sample on this clock and control the phase of the output relative to that clock.

Is that 1V/div, so 1.5V logic?
 
The following users thanked this post: bauto601

Offline bauto601Topic starter

  • Contributor
  • Posts: 13
  • Country: nl
Re: 100MHz Serial Data Injector
« Reply #4 on: June 01, 2023, 11:42:55 am »
I'll give a bit more context: This signal is used to transfer the CPU configuration data on startup. After that it isn't used for data transfer anymore, but it will be asserted when the PC enters sleep mode and such. The signal is only driven in a single direction, from the northbridge to the CPU. It's part of the power management system, so it must keep functioning even after the data transfer. The signal doesn't seem to be latency sensitive; a delay of a couple of clock cycles won't be a problem. It is skew sensitive though, so it must be closely aligned to the clock signal. The clock signal comes from the motherboard PLL and can fluctuate a bit, between 99.5~100.5MHz. When overclocking, an increase to 110~115MHz can be expected.

The signal voltage depends on the CPU Vcore voltage, which is around 1.65V but can be increased when overclocking. So it would be nice to have a small operating range (something like 1.60~1.80V) to leave room for fine-tuning the CPU performance. Overclocking capability will be a requirement for this project, since the CPU will be fairly limited by the relatively low bus speed. (The CPUs that run in this adapter normally run at a 166~200MHz bus speed and have to use a fairly high multiplier to reach their rated clock speed on this 100MHz bus.)

After the start bit, the 21st bit needs to be changed from 0 to 1, which enables the push-pull databus drivers on the CPU. Without enabling these drivers, the CPU won't communicate. (Ask me how I know...) I guess this is a legacy option from the Slot-A platform, which used open-drain drivers, which is why this bit is set to 0 on these older motherboards.

My current idea is to put an RP2040 into the signal path as follows:
Northbridge -> RP2040 -> CPU

The RP2040 will read the signal on an input pin each clock cycle and forward it to an output pin, functioning as a passthrough. The first time it detects a logic high (1, the start bit) it will keep passing through the received signal, but it will start counting clock cycles. Once it reaches bit number 21, it will output a logic high (1) for that single clock cycle regardless of the input, thus setting the push-pull bit in the serial signal. After that it will function as a simple passthrough again. I don't know whether this is fully possible in the way I described with the PIO functionality of the RP2040. I'm currently trying to understand how the PIO blocks work and how they have to be programmed.
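To make the intended behaviour concrete, here is a rough software model of that passthrough, written in Python purely as a simulation. It is not PIO code and says nothing about whether the RP2040 can meet the timing. The function name `inject` and the idle zeros around the packet are assumptions for illustration; the 33-bit payload is the one decoded from the scope shot.

```python
# 33 payload bits that follow the start bit, as decoded in the first post.
CAPTURED = "001101100000010010000001001100010"

def inject(stream, target_bit=21):
    """Pass bits through unchanged, except that the target_bit-th bit after
    the first 1 (the start bit) is forced to 1. One list entry per clock."""
    out = []
    counting = False   # True once the start bit has been seen
    count = 0          # number of bits seen after the start bit
    for bit in stream:
        if counting and count < target_bit:
            count += 1
            if count == target_bit:
                bit = 1            # override this single clock cycle
        elif not counting and bit == 1:
            counting = True        # start bit itself is passed through as-is
        out.append(bit)
    return out

# Assumed line activity: a few idle zeros, the start bit, the payload, idle zeros.
line = [0, 0, 0, 1] + [int(c) for c in CAPTURED] + [0, 0]
patched = inject(line)

# List the clock cycles where the output differs from the input.
changed = [i for i, (a, b) in enumerate(zip(line, patched)) if a != b]
print("changed positions:", changed)
```

With the start bit at index 3, the only position that should differ is index 24, which is the 21st bit after the start bit. Later 1s in the stream do not restart the count, so the power-management use of the line afterwards would pass through untouched in this model.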

The second challenge will be the variable voltage level of the signal. I saw in the datasheet that the RP2040 has a dedicated supply pin for the digital IO. This pin could be connected to the Vcore of the CPU, so that the output voltage of the RP2040 tracks the Vcore voltage. The RP2040 also has an internal register to adjust the logic IO voltage threshold for 1.8V operation; I'm hoping that this will suffice for 1.65V signals. The rest of the RP2040 can be fed with 3.3V, which fortunately is available on the adapter.

The last challenge would be the timing/skew alignment. I'm hoping that the RP2040 isn't too picky about using an external clock signal. In that case I can use the outgoing clock signal from the AMD CPU (which is also used in the scope shots) for clocking the RP2040. This way it should be nicely skew-aligned and adapt to CPU bus frequency changes.

Does this sound attainable to you guys? Or am I grossly overestimating the capabilities of the RP2040 here? :-//

EDIT:
I see that the external (GPIO) clock input is rated for a maximum frequency of ~50MHz. For some testing work I could probably overclock the RP2040 to ~400MHz, so that the maximum clock skew is only 1/4 of a clock cycle. A nicer solution would probably be to put a divide-by-4 prescaler on the output clock of the AMD CPU and feed the resulting 25MHz clock signal to the clock input of the RP2040...
« Last Edit: June 01, 2023, 03:15:17 pm by bauto601 »
 

Offline mlamoore

  • Newbie
  • Posts: 9
  • Country: us
Re: 100MHz Serial Data Injector
« Reply #5 on: June 01, 2023, 03:48:58 pm »
I'm having trouble finding timing data in the RP2040 datasheet; it isn't nearly as thoroughly documented as microcontrollers from major suppliers.

In RP2040 datasheet section 2.16.1, it says that an external oscillator can drive the XIN pin at up to 50MHz, so you won't be able to clock the RP2040 off of the Northbridge 100MHz clock. Even if it could clock in that way, I'd be concerned about the delay in the PIO state machine; just a single 315MHz discrete flip-flop like SN74AUC74RGYR has propagation delay of 0.7ns typical, 2.5ns max at 1.5V.

If you want programmable logic, there's piles of options available, but I haven't used anything like that since college (although I've perused datasheets when contemplating personal projects).

That said, if you just want to listen to the signal, pass through everything with a 1 cycle delay, and fix the 32nd bit after the start bit, I feel like you could implement that with discrete logic more easily than programmable logic (as someone who is very rusty with programmable logic).

Your trace looks like the data transitions on the falling edge, so have one D-type flip flop read in the Northbridge data on the rising edge and a second D-type flip flop send the (possibly modified) data on the falling edge. The flip flop delay determines your clock skew; you could use something like SN74AUC74RGYR with 0.7ns typical, 2.5ns max propagation delay at 1.5V, which should let you meet your timing requirements as long as that leaves enough setup time for the CPU before it reads in the data on the rising edge.

Then you have a counter, two more state flip flops, and some logic gates to implement your state machine. The first state flip flop starts at 0 and is set to 1 when the start bit is read, which enables the counter. When the counter hits 32 (tested with an inverter and 6-input NOR gate), a 1 is OR'ed in with the signal between the two pass-through flip flops, setting that next bit to 1 instead of what the Northbridge sent. The same event sets the second state flip flop to 1, which disables the counter and makes the pass-through flip flops pass data unmodified until the next reset, which clears both state flip flops.
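As a sanity check of that structure, here is a small clock-by-clock model in Python. It is only a sketch: the names (`discrete_injector`, `in_ff`, `out_ff`, `armed`, `done`) are made up for illustration, the exact cycle on which the counter lines up with the target bit is an assumption, and the falling-edge output stage is simplified to one register update per clock. The target count is left as a parameter; the demo uses 21, the bit number given earlier in the thread, and the payload from the first post.

```python
def discrete_injector(samples, target):
    """Model of: input flip flop -> OR gate -> output flip flop, plus a
    counter and two state flags. Returns out_ff after every clock."""
    in_ff = out_ff = 0
    armed = done = False      # the two state flip flops
    count = 0                 # index of the bit currently held in in_ff
    outputs = []
    for d in samples:
        # Combinational logic, evaluated on the current register values.
        force = armed and not done and count == target
        next_out = in_ff | (1 if force else 0)
        counting = (armed or in_ff == 1) and not done
        next_count = count + 1 if counting else count
        next_armed = armed or in_ff == 1
        next_done = done or force
        # Clock edge: all registers update together.
        in_ff, out_ff = d, next_out
        armed, done, count = next_armed, next_done, next_count
        outputs.append(out_ff)
    return outputs

payload = "001101100000010010000001001100010"   # bits after the start bit
samples = [0, 0, 1] + [int(c) for c in payload] + [0, 0, 0]
delayed = discrete_injector(samples, target=21)

# Compare against a plain one-clock delay of the input.
diffs = [i for i in range(1, len(samples)) if delayed[i] != samples[i - 1]]
print("cycles that differ from a pure 1-clock delay:", diffs)
```

In this model the output is the input delayed by one clock, and the only cycle that differs is the one carrying the target bit. Because the override is an OR, a target bit that is already 1 simply passes through unchanged.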

One nice thing: that flip flop runs on 0.8-2.7V VDD, so you could run it directly on Vcore. That's the most speed-critical part; you could probably find all the other components that could also run on Vcore.

That same basic state machine would be very easy on a FPGA in VHDL or Verilog; you'd just have to add in all the configuration of clocks and pins and delays to get everything wired up properly.

If those discrete components are too big, or the flip-flop's 0.7-2.5ns delay isn't fast enough (because the CPU input needs a setup time longer than what remains of the 5ns half period at 100MHz: 4.3ns typical, 2.5ns worst case), then I would next investigate an FPGA that can take a 100-120MHz external clock and has IO that lets you control the phase between a clock and an output with sub-ns precision. I know they exist, but I'm not sure how low-end you can go while keeping that capability.
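The 4.3ns and 2.5ns figures come from subtracting the flip-flop delay from half a clock period. A quick Python sketch of that budget, assuming a 50% duty cycle and ignoring trace delay (the function name `setup_margin_ns` is just for illustration), also shows how the margin shrinks at the overclocked bus speeds mentioned earlier:

```python
def setup_margin_ns(clock_mhz, prop_delay_ns):
    """Time left before the next rising edge when the data is launched on
    the falling edge. Assumes a 50% duty cycle and no trace delay."""
    half_period_ns = 1000.0 / clock_mhz / 2
    return half_period_ns - prop_delay_ns

# Typical and maximum propagation delay quoted for the flip flop at 1.5V.
for clock_mhz in (100, 115):
    for tpd_ns in (0.7, 2.5):
        margin = setup_margin_ns(clock_mhz, tpd_ns)
        print(f"{clock_mhz} MHz, tpd {tpd_ns} ns: {margin:.2f} ns left for setup")
```

Whether the remaining margin is enough depends on the CPU's setup time for this input, which isn't known here.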

For example, a Xilinx Spartan-7 accepts a 19-800MHz input clock, has multiple orders of magnitude more logic capabilities than you need, has GPIO timings that measure from fractions of a ns up to 2.0ns at 1.5V (better at 1.8V), and has clock management modules that can add sub-ns phase shifts to let you tweak to exactly the right timing. However, it needs 0.95V or 1.0V supply voltage (IO voltage can be Vcore), and the cheapest Spartan-7 in stock at DigiKey is $21.91 in a 225-ball 0.8mm-pitch BGA, and if you want the 196-ball 1.0mm-pitch BGA that's their easiest to solder option, the cheapest is $26.74. I haven't looked into whether you can keep the clock features you need while getting to a cheaper, easier to solder part that might even run directly off of 1.5-1.8V Vcore.

Good luck!
« Last Edit: June 01, 2023, 03:56:56 pm by mlamoore »
 
The following users thanked this post: bauto601

Offline PCB.Wiz

  • Super Contributor
  • ***
  • Posts: 1535
  • Country: au
Re: 100MHz Serial Data Injector
« Reply #6 on: June 01, 2023, 09:16:53 pm »
Quote from: bauto601
Does this sound attainable to you guys? Or am i grossly overestimating the capabilities of the RP2040 here? :-//

I've never tried the RP2040 for this sort of operation, but I'd guess the clock-sync side would be a real challenge.

Quote from: bauto601
.. clock signal.  99.5~100.5MHz, when overclocking an increase to 110~115MHz can be expected.
The signal voltage depends on the CPU Vcore voltage, which is around ~1.65v but can be increased to higher levels when overclocking.
...a small operating range (something like 1.60~1.80v)

After the start-bit, the 21st bit needs to be changed from 0 to 1 which enables the Push-Pull databus drivers on the CPU.
.. Once it reaches bit number 21 it will always output a logic high (1) for that single clock cycle thus enabling the PushPull bit in the serial signal. After that it will function as a simple passthrough again.

The logic needed here is not massive, but the timing is tough.

This may suit a CPLD like a XC2C32 / XC2C64 or LC4032ZE / LC4064ZE.
Those have 1.5~3.3V IO and 1.8V cores, and are rated >> 100MHz internally.

The challenge here will be clock-data skew: if you want to gate the data, that inserts a delay which will break the skew.
If you are ok with breaking both the clock and data lines, you could pass the CLK through the CPLD and pass or swap the data, so the difference between those delays is much smaller.

Or, you could break the data only, and sniff and delay that?
You capture the data on the active clock edge and then output it on the next active clock edge. The CPLD delay makes that appear (ideally) 1.5 clocks later, so Tsu and Th are ok.

Otherwise a less elegant but simpler way to force a 1, with no PCB breaks, would be to simply whack the data line with an aggressive pullup at the precise time, enough to read as logic 1.

These little FPGA boards are cute, but you'd need to be able to isolate and lower VCCIO  :
https://www.aliexpress.com/item/1005004526884573.html
https://www.aliexpress.com/item/4000366030520.html

and this CPLD one does have VCCIO available.
https://www.aliexpress.com/item/1005005474989628.html
and this includes a jtag programmer
https://www.aliexpress.com/item/1005004872591240.html
or newer (and larger)
https://www.tindie.com/products/earth_people_technology/intel-fpga-max-v-cpld-dev-system-unoprologic/

and this looks more suitable, some sort of XBOX mod, but it has a XC2C64A on a small board, with JTAG header holes, and enough IO to be useful.
https://www.aliexpress.com/item/1005004605332339.html
It looks like the larger package option, has more IO's connected to test pads ?
The web images suggest VCCIO1 VCCIO2 are top routed, so easy to change ?
Someone has info here
http://blog.dimitrioskouzisloukas.com/2019/08/use-matrix-glitcher-as-coolrunner-ii.html

and for 'a foot in both camps' :
https://www.tindie.com/products/tinyvision_ai/pico-ice-rp2040-plus-lattice-ice40up5k-fpga/
« Last Edit: June 02, 2023, 04:55:54 am by PCB.Wiz »
 
The following users thanked this post: bauto601

Offline bauto601Topic starter

  • Contributor
  • Posts: 13
  • Country: nl
Re: 100MHz Serial Data Injector
« Reply #7 on: June 04, 2023, 05:16:12 pm »
Hmm, I've been playing around with the RP2040 and it's indeed not quite what I was looking for. Even when overclocked to 400MHz, the maximum actual sampling frequency will be around 133 or 200MHz, depending on the code. That is way too slow for correct timing without clock sync. If the effective sample rate of the code were 400MHz, the maximum clock skew would theoretically be 1/4 of a clock, which isn't too bad.
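For reference, the skew numbers work out as follows. This is just the arithmetic in Python, assuming the input is sampled free-running (not locked to the bus clock), so that the worst case is one full sample period; the function name `worst_case_skew` is made up for illustration.

```python
def worst_case_skew(sample_mhz, bus_mhz=100.0):
    """Worst-case delay between a bus clock edge and the next free-running
    sample, in ns and as a fraction of one bus clock period."""
    skew_ns = 1000.0 / sample_mhz
    bus_period_ns = 1000.0 / bus_mhz
    return skew_ns, skew_ns / bus_period_ns

# Effective sample rates mentioned above, against the 100MHz bus clock.
for sample_mhz in (133.0, 200.0, 400.0):
    skew_ns, fraction = worst_case_skew(sample_mhz)
    print(f"{sample_mhz:.0f} MHz sampling: up to {skew_ns:.2f} ns, "
          f"{fraction:.2f} of a bus clock")
```

At 400MHz this gives the 1/4 clock figure, while 200MHz and 133MHz give half a clock and roughly three quarters of a clock.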

The XC2Cxxx series looks very promising. They aren't too expensive or complex to implement in the current design, and they are very flexible regarding clock speeds and signal voltages. This will require quite some time to get familiar with, though. I'm going to order the XBOX board and try to understand the whole coding model for this kind of device. Thanks for all the help so far; I'll probably pop up here again after some time to ask some questions about the XC2C controller. :-+

EDIT:
Breaking up the clock signal will be tricky: breaking up the clock signal to the CPU will make it run out of sync with the northbridge. The northbridge clock would then also have to be run through the CPLD, which requires motherboard modifications, something I'm trying to avoid. Even if that were possible, the PCI/AGP clock would be out of sync, so breaking up the CPU's clock unfortunately opens up a whole new Pandora's box.
« Last Edit: June 04, 2023, 07:04:33 pm by bauto601 »
 

