Topic: Using ULPI USB PHYs for custom data links

SiliconWizard · « **on:** April 28, 2022, 05:44:34 pm »

I've read about this on a couple of forums, but without much detail.

Have any of you implemented (or at least considered) custom data links using a USB PHY (so, with a custom protocol, and not USB)? That would give you the PHY for pretty cheap (you can find USB 2.0 PHYs, so a 480 Mb/s link, for less than $2 or so). They're good for cables up to 5 m according to the standard, but I'm pretty sure one can go longer than this with off-the-shelf PHYs using custom cables and protocols, and not transporting power.

The reasons for considering USB PHYs are flexibility, data rate and cost. While those are cheap (for market reasons), gigabit Ethernet PHYs tend to be more expensive (but don't hesitate to point me to "cheap" ones), and dedicated "SERDES" ICs with integrated CDR tend to be VERY expensive (easily $20 to $40 each in single quantities).

Is ULPI flexible enough to allow this? (I've just started studying it so I'll still have to figure this out.)

I posted that in the FPGA section, but of course that could also be used with MCUs.

Speaking of FPGAs, the higher-end ones usually embed SERDES blocks, some with clock/data recovery and everything, so implementing custom "high-speed" data links is relatively straightforward. The lower-end ones have no such thing, so you're pretty limited in the max data rate you can reasonably achieve (and have to implement CDR yourself using oversampling and such...).

Any thoughts welcome.

asmi · « **Reply #1 on:** April 29, 2022, 05:04:41 am »

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm

I've read about this on a couple of forums, but without much detail.

Have any of you implemented (or at least considered) custom data links using a USB PHY (so, with a custom protocol, and not USB)? That would give you the PHY for pretty cheap (you can find USB 2.0 PHYs, so a 480 Mb/s link, for less than $2 or so). They're good for cables up to 5 m according to the standard, but I'm pretty sure one can go longer than this with off-the-shelf PHYs using custom cables and protocols, and not transporting power.

The reasons for considering USB PHYs are flexibility, data rate and cost. While those are cheap (for market reasons), gigabit Ethernet PHYs tend to be more expensive (but don't hesitate to point me to "cheap" ones), and dedicated "SERDES" ICs with integrated CDR tend to be VERY expensive (easily $20 to $40 each in single quantities).

You can use RS485/422 transceivers, they are relatively cheap (unless you want galvanic isolation), and some can hit some fairly impressive speeds (up to 50 Mbit/s over a distance of a few meters).

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm

Is ULPI flexible enough to allow this? (I've just started studying it so I'll still have to figure this out.)

Why not? It could be possible, since most PHYs only concern themselves with, well, PHYsical coding and access. You might want to read the ULPI spec and experiment a bit. I suspect that you will still need to follow an initial handshake protocol because it's required for synchronization, but once a "connection" is established, it should work.

I would be wary of using such a hodge-podge solution in any real commercial project, but if it's "for science" - why not give it a try and see what happens?

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm

Speaking of FPGAs, the higher-end ones usually embed SERDES blocks, some with clock/data recovery and everything, so implementing custom "high-speed" data links is relatively straightforward. The lower-end ones have no such thing, so you're pretty limited in the max data rate you can reasonably achieve (and have to implement CDR yourself using oversampling and such...).

Any thoughts welcome.

In the Xilinx world (at least 7 series and above), the minimum officially supported line rate for the MGTs is 500 Mbit/s, but Xilinx also provides ready-to-use HDL for SERDES-based solutions (a SERDES module is embedded into every single IO tile in all Xilinx devices) at speeds up to 1.25 Gbit/s. These SERDES modules have a special mode in which they essentially consist of four strings of flip-flops (3 in each), allowing you to implement 4x oversampling with minimal effort. For even lower speeds, you can basically just run the IO logic at a multiple of the line frequency and implement oversampling manually.
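The oversampling idea can be modelled in a few lines: take N samples per bit (4 here), histogram where the transitions land, and sample half a bit period away from the most common edge position. A minimal Python sketch with hypothetical helper names; it picks one phase for a whole capture, whereas a real receiver has to keep re-evaluating the phase to follow drift and jitter:

```python
def pick_phase(samples: list[int], oversample: int = 4) -> int:
    """Choose the sampling phase farthest from where transitions usually land.

    samples: flat list of 0/1 values taken `oversample` times per bit.
    """
    edge_hist = [0] * oversample
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            # The edge lies between sample i-1 and sample i: bin it by the phase of i.
            edge_hist[i % oversample] += 1
    edge_phase = edge_hist.index(max(edge_hist))
    # Sample roughly half a bit period away from the dominant edge position.
    return (edge_phase + oversample // 2) % oversample


def recover_bits(samples: list[int], oversample: int = 4) -> list[int]:
    """Keep one sample per bit, taken at the phase chosen by pick_phase()."""
    phase = pick_phase(samples, oversample)
    return samples[phase::oversample]


# Usage: 8 bits, 4 samples per bit, with the bit boundaries offset by one sample.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
samples = [bits[0]] + [b for b in bits for _ in range(4)]
print(recover_bits(samples))  # [1, 0, 1, 1, 0, 0, 1, 0]
```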

SiliconWizard · « **Reply #2 on:** April 29, 2022, 08:39:02 pm »

I've used LVDS transceivers for links up to 50 Mbps or so, and this is rather straightforward.
But we're talking about an order of magnitude faster with the idea of using USB PHYs in HS mode.

I know this is doable with SERDES, but as I said, some FPGAs do not even have true SERDES, and even when they do, while the TX part is straightforward, the RX part is not necessarily so, because the clock has to be recovered. Some FPGAs do have proper CDR; with others, you need to hand-implement it. It is not trivial to implement a robust CDR. I'm sure some vendors provide ready-to-use IPs for this. I'll have to check for the Artix-7, but otherwise you often have to go for higher-end devices. I've also read a couple of papers about CDRs, and when implementing this on FPGAs with no CDR, just by oversampling, it seems hard to reliably go beyond something like 200-300 Mbps, and even that is not trivial...

To make things clear, I'm talking about transmission over a single pair, requiring clock recovery on the RX end.

So. After having read the ULPI spec, I actually think using USB 2.0 PHYs should be absolutely doable. The "only" constraint you'd have is that the transmission of data would have to occur in data packets as defined by the USB standard - that's basically a sync preamble, packet ID, data payload, CRC and end of packet. Since there's a sync preamble at the beginning of each packet, there is nothing else to do for clock synchronization actually, if I understood it right. All the rest is for the USB protocol (like speed "negotiation", tokens, etc).
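To make the packet structure concrete: on the USB wire, the PID field is a 4-bit packet type followed by its ones' complement, which gives the receiver a cheap sanity check. A small Python sketch of that encoding (helper names are hypothetical; how much freedom a given ULPI PHY leaves you in choosing the PID is worth checking in its datasheet):

```python
# 4-bit packet types of the USB 2.0 data PIDs.
DATA0, DATA1, DATA2, MDATA = 0b0011, 0b1011, 0b0111, 0b1111


def pid_byte(pid: int) -> int:
    """Build the 8-bit PID field: packet type in bits 3:0, its complement in bits 7:4."""
    if not 0 <= pid <= 0xF:
        raise ValueError("PID must fit in 4 bits")
    return ((~pid & 0xF) << 4) | pid


def pid_is_valid(field: int) -> bool:
    """Receiver-side check: the high nibble must be the ones' complement of the low one."""
    return (field >> 4) == (~field & 0xF)


print(hex(pid_byte(DATA0)), hex(pid_byte(DATA1)))  # 0xc3 0x4b
print(pid_is_valid(0xC3), pid_is_valid(0xC2))      # True False
```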

This is for experimenting at the moment, but why would it not be usable for a commercial solution? As long as one follows the ULPI spec, and the physical part of the USB spec, I can't really see a potential problem. Now as I mentioned, benefits are that we get a relatively robust PHY solution able to drive cables of several meters, for basically the cost of an LVDS or RS485 transceiver. But any reasonable counter-argument is welcome.

asmi · « **Reply #3 on:** April 29, 2022, 09:11:23 pm »

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm

I've used LVDS transceivers for links up to 50 Mbps or so, and this is rather straightforward.
But we're talking about an order of magnitude faster with the idea of using USB PHYs in HS mode.

You never said what kind of speed you're looking for, so I tried to cover the whole spectrum. For lower speeds and long transmission lines, RS485/422 is definitely the best choice, as it's the most reliable channel.

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm

I know this is doable with SERDES, but as I said, some FPGAs do not even have true SERDES, and even when they do, while the TX part is straightforward, the RX part is not necessarily so, because the clock has to be recovered. Some FPGAs do have proper CDR; with others, you need to hand-implement it. It is not trivial to implement a robust CDR. I'm sure some vendors provide ready-to-use IPs for this. I'll have to check for the Artix-7, but otherwise you often have to go for higher-end devices. I've also read a couple of papers about CDRs, and when implementing this on FPGAs with no CDR, just by oversampling, it seems hard to reliably go beyond something like 200-300 Mbps, and even that is not trivial...

Xilinx provides an appnote with ready-to-use HDL for implementing 4x oversampling up to 1.25 Gbps, so it's completely trivial. You just need any 7 series FPGA with a speed grade of 2 or higher (or be willing to overclock an SG1 device).

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm

So. After having read the ULPI spec, I actually think using USB 2.0 PHYs should be absolutely doable. The "only" constraint you'd have is that the transmission of data would have to occur in data packets as defined by the USB standard - that's basically a sync preamble, packet ID, data payload, CRC and end of packet. Since there's a sync preamble at the beginning of each packet, there is nothing else to do for clock synchronization actually, if I understood it right. All the rest is for the USB protocol (like speed "negotiation", tokens, etc).

Well, give it a try and let us know how it went.

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm

This is for experimenting at the moment, but why would it not be usable for a commercial solution? As long as one follows the ULPI spec, and the physical part of the USB spec, I can't really see a potential problem. Now as I mentioned, benefits are that we get a relatively robust PHY solution able to drive cables of several meters, for basically the cost of an LVDS or RS485 transceiver. But any reasonable counter-argument is welcome.

Just a few reasons I can think of off the top of my head:
1. Users are stupid. What will happen to both sides if you connect your device to a PC? What about connecting some random peripheral? What if they accidentally use a power-only USB cable?
2. USB was not designed for high-EMI environments, and most USB 2 cables I've come across are not properly shielded.
3. Since USB is half-duplex, you only get half of the claimed bandwidth if you need to send data in both directions.
4. Using MGTs is much easier, as they require no external components (aside from AC-coupling caps), and there are fairly low-end FPGAs which have them, and pretty fast ones too (up to 6.6 Gbps for Artix in some packages).
5. This one is personal - I live by the KISS principle and hate all of this smart-assery, which tends to bite you in the back when you least expect it, so I do my best to avoid off-label uses. A few bucks saved on parts often end up costing tens of thousands of $$$ more in extra development, testing and validation - I've seen this far too many times to be willing to repeat it.

langwadt · « **Reply #4 on:** April 29, 2022, 09:50:33 pm »

Quote from: asmi on April 29, 2022, 09:11:23 pm

Just a few reasons I can think of off the top of my head:
1. Users are stupid. What will happen to both sides if you connect your device to a PC? What about connecting some random peripheral? What if they accidentally use a power-only USB cable?
2. USB was not designed for high-EMI environments, and most USB 2 cables I've come across are not properly shielded.
3. Since USB is half-duplex, you only get half of the claimed bandwidth if you need to send data in both directions.
4. Using MGTs is much easier, as they require no external components (aside from AC-coupling caps), and there are fairly low-end FPGAs which have them, and pretty fast ones too (up to 6.6 Gbps for Artix in some packages).
5. This one is personal - I live by the KISS principle and hate all of this smart-assery, which tends to bite you in the back when you least expect it, so I do my best to avoid off-label uses. A few bucks saved on parts often end up costing tens of thousands of $$$ more in extra development, testing and validation - I've seen this far too many times to be willing to repeat it.

No one says you need to use USB cables and connectors, and MGTs are going to need cables and connectors too.

SiliconWizard · « **Reply #5 on:** April 30, 2022, 12:04:53 am »

Quote from: asmi on April 29, 2022, 09:11:23 pm

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm
I've used LVDS transceivers for links up to 50 Mbps or so, and this is rather straightforward.
But we're talking about an order of magnitude faster with the idea of using USB PHYs in HS mode.
You never said what kind of speed you're looking for, so I tried to cover the whole spectrum. For lower speeds and long transmission lines, RS485/422 is definitely the best choice, as it's the most reliable channel.

Well, alright, that's why I made it clearer in the second post.

Still, I assumed that talking about 480Mb/s rate and gigabit Ethernet PHYs as an alternative would have at least given an idea of the range.

Quote from: asmi on April 29, 2022, 09:11:23 pm

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm
I know this is doable with SERDES, but as I said, some FPGAs do not even have true SERDES, and even when they do, while the TX part is straightforward, the RX part is not necessarily so, because the clock has to be recovered. Some FPGAs do have proper CDR; with others, you need to hand-implement it. It is not trivial to implement a robust CDR. I'm sure some vendors provide ready-to-use IPs for this. I'll have to check for the Artix-7, but otherwise you often have to go for higher-end devices. I've also read a couple of papers about CDRs, and when implementing this on FPGAs with no CDR, just by oversampling, it seems hard to reliably go beyond something like 200-300 Mbps, and even that is not trivial...
Xilinx provides an appnote with ready-to-use HDL for implementing 4x oversampling up to 1.25 Gbps, so it's completely trivial. You just need any 7 series FPGA with a speed grade of 2 or higher (or be willing to overclock an SG1 device).

So that's one FPGA family for which this would be easy. While I agree the Artix-7 is not particularly high-end, Xilinx is one of the few vendors offering that. Certainly something to keep in mind, but I would like to have more options and be a bit less tied to a particular vendor, especially at the moment.

Quote from: asmi on April 29, 2022, 09:11:23 pm

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm
So. After having read the ULPI spec, I actually think using USB 2.0 PHYs should be absolutely doable. The "only" constraint you'd have is that the transmission of data would have to occur in data packets as defined by the USB standard - that's basically a sync preamble, packet ID, data payload, CRC and end of packet. Since there's a sync preamble at the beginning of each packet, there is nothing else to do for clock synchronization actually, if I understood it right. All the rest is for the USB protocol (like speed "negotiation", tokens, etc).
Well, give it a try and let us know how it went.

I'll try to. I ordered a few.

Quote from: asmi on April 29, 2022, 09:11:23 pm

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm
This is for experimenting at the moment, but why would it not be usable for a commercial solution? As long as one follows the ULPI spec, and the physical part of the USB spec, I can't really see a potential problem. Now as I mentioned, benefits are that we get a relatively robust PHY solution able to drive cables of several meters, for basically the cost of an LVDS or RS485 transceiver. But any reasonable counter-argument is welcome.
Just a few reasons I can think of off the top of my head:
1. Users are stupid. What will happen to both sides if you connect your device to a PC? What about connecting some random peripheral? What if they accidentally use a power-only USB cable?

I agree, but as langwadt said, no need to use regular USB connectors and cables. I absolutely avoid using standard connectors for any other purpose than what's standard, so yep. But if I did here, I would make sure plugging it to a standard USB host or device would have no consequence other than a device not enumerating.

Quote from: asmi on April 29, 2022, 09:11:23 pm

2. USB was not designed for high-EMI environments, and most USB 2 cables I've come across are not properly shielded.

That's a fair one. The cable part is not a real concern, as I said, I don't really intend on using off-the-shelf USB cables anyway. But true that beyond cables, even the low-level protocol is not particularly good for EMI. No spread spectrum, no TMDS, no scrambling... But you can use decent quality shielded cables, and dedicated common-mode filters to make things significantly better. I think USB 3 SS should be better in that regard.
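On the scrambling point: since the payload format is entirely up to you in a custom protocol, the link could scramble the payload before handing it to the PHY, to break up long repetitive data patterns (this does nothing for the SYNC and EOP the PHY adds). A minimal additive-scrambler sketch in Python; the 16-bit LFSR taps are derived from the x^16 + x^5 + x^4 + x^3 + 1 polynomial used by USB 3 and PCIe Gen1, but the sketch is not bit-exact with those scramblers, and the names are hypothetical:

```python
TAPS = 0x801C  # taps for x^16 + x^5 + x^4 + x^3 + 1, right-shifting Galois form


def scramble(data: bytes, seed: int = 0xFFFF) -> bytes:
    """Additive scrambler: XOR every payload bit with the output of a 16-bit LFSR.

    The same call with the same seed descrambles, since XOR is its own inverse.
    Bits are taken LSB first, the order in which USB sends each byte.
    """
    if seed & 0xFFFF == 0:
        raise ValueError("an all-zero LFSR state never changes")
    state = seed & 0xFFFF
    out = bytearray()
    for byte in data:
        scrambled = 0
        for bit in range(8):
            key_bit = state & 1
            scrambled |= (((byte >> bit) & 1) ^ key_bit) << bit
            # Galois step: shift, and apply the tap mask when a 1 was shifted out.
            state >>= 1
            if key_bit:
                state ^= TAPS
        out.append(scrambled)
    return bytes(out)


payload = bytes(16)               # a run of zeros, the kind of pattern scrambling breaks up
line = scramble(payload)
assert line != payload
assert scramble(line) == payload  # descramble with the same seed
```

Both ends have to restart the LFSR from the same seed at the start of every packet; an additive scheme like this has the advantage that a corrupted bit stays a single corrupted bit after descrambling.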

Quote from: asmi on April 29, 2022, 09:11:23 pm

3. Since USB is half-duplex, you only get half of the claimed bandwidth if you need to send data in both directions.

Yup, that's obviously what you get when using a single pair. Not necessarily a problem. But if full duplex is needed, one could always use two PHYs each with a dedicated direction. That would still be reasonably cheap.

Quote from: asmi on April 29, 2022, 09:11:23 pm

4. Using MGTs is much easier, as they require no external components (aside from AC-coupling caps), and there are fairly low-end FPGAs which have them, and pretty fast ones too (up to 6.6 Gbps for Artix in some packages).

It is easier, and more flexible, as long as it's available. As to external components... for board-to-board interconnect, no problem with that, but for connections through cables, I tend to add LVDS transceivers anyway. I don't feel very comfortable directly connecting FPGA IOs to a connector meant for cables several meters long. Maybe it's not justified.

Quote from: asmi on April 29, 2022, 09:11:23 pm

5. This one is personal - I live by the KISS principle and hate all of this smart-assery, which tends to bite you in the back when you least expect it, so I do my best to avoid off-label uses. A few bucks saved on parts often end up costing tens of thousands of $$$ more in extra development, testing and validation - I've seen this far too many times to be willing to repeat it.

I don't blame you. But I don't think this is really the case here. It's not like using some component out of spec. It's just using a subset of its features in a documented way, with the benefit of being dictated by a standard spec, so that any ULPI-compliant PHY should give a pretty similar result.

I agree that using dedicated resources in some FPGA would be more flexible and possibly less "icky", but these days, the idea of using a standard and ubiquitous solution, reusable with any FPGA almost as is, is kind of attractive.

ali_asadzadeh · « **Reply #6 on:** April 30, 2022, 08:05:47 am »

Since you need a controller, maybe a cheap FPGA such as a Gowin would come in handy, as they have a USB 2.0 High-Speed PHY and controller IP, which needs around 2K to 5K LUTs depending on the configuration and can be used in almost all of their parts.

SiliconWizard · « **Reply #7 on:** April 30, 2022, 08:37:14 pm »

Cheap is one factor, but as I said, the main one is availability and not being tied to any particular vendor. So no vendor-specific IP. (Out of curiosity, what's the licensing for the Gowin IP?)

As to FPGA-based solutions, I suppose asmi was referring to the XAPP523 app note from Xilinx. I'm reading it. It looks like pretty textbook oversampling stuff. It's not overly problematic to implement this in a generic way if you don't want Xilinx-specific stuff; the only issue is having true SERDES blocks in order to reach a decent rate. I'm currently using the ECP5, for instance, and the bare version doesn't have true SERDES. The version with SERDES requires a Diamond license. Also, most of their more advanced IPs are paid ones, so you're stuck implementing this by hand. Not that this is overly complicated, but the devil's always in the details. Even the non-SERDES version of the ECP5 might be OK for something around 400-500 Mbps, but that might not be easy. Something to investigate.

Also, as I mentioned, the more high-end FPGAs often embed true CDR blocks, so there's no need to fiddle with oversampling, and the result is more robust and more flexible.

Regarding "no external components", what do you guys think? As I said, I don't much like directly routing FPGA IOs to connectors. Typical USB PHYs embed various means of protection. Sure you can always add discrete protection.

A side note on this idea: the question of implementing some "high-speed" link between an MCU and an FPGA often pops up. Above a few tens of Mb/s, using SPI, you're often stuck. Or you have to resort to some parallel bus / memory interface, requiring many IOs and having all sorts of limitations depending on the MCU itself. Whereas, if using an MCU with embedded USB HS, it's likely possible to implement such a link using the same idea: using the embedded USB core with a custom protocol, which would save having to implement full-blown USB when this would be largely overkill, knowing that USB cores in MCUs are often pretty similar to external USB PHYs in terms of functionality, the bulk of the USB protocol being implemented in software.

asmi · « **Reply #8 on:** April 30, 2022, 10:03:29 pm »

Quote from: SiliconWizard on April 30, 2022, 08:37:14 pm

As to FPGA-based solutions, I suppose asmi was referring to the XAPP523 app note from Xilinx. I'm reading it. It looks like pretty textbook oversampling stuff. It's not overly problematic to implement this in a generic way if you don't want Xilinx-specific stuff; the only issue is having true SERDES blocks in order to reach a decent rate. I'm currently using the ECP5, for instance, and the bare version doesn't have true SERDES. The version with SERDES requires a Diamond license. Also, most of their more advanced IPs are paid ones, so you're stuck implementing this by hand. Not that this is overly complicated, but the devil's always in the details. Even the non-SERDES version of the ECP5 might be OK for something around 400-500 Mbps, but that might not be easy. Something to investigate.

You keep confusing SERDES and Multi-Gigabit Transceivers; these are not the same thing. A SERDES is part of an MGT, but there are also standalone SERDESes embedded into the I/O tiles, and as such available for every single I/O ball of Xilinx devices. The same goes for other vendors: the ECP5 has GDDR blocks which support up to 7:1 (de)serialization and as such are SERDESes in their own right, even though the manual doesn't call them that. Unfortunately, Lattice's datasheet is super-confusing and doesn't call out the max transfer rate for these modules. If it is only 400 MHz (which is what the buffer is rated at), then it's rather pathetic, I must say, because even the cheapest Spartan-7, the xc7s6, can go all the way up to 1.25 Gbps per differential pair.
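Whatever the vendor calls it, a 7:1 deserializer hands the fabric 7-bit words with an arbitrary bit offset, so the receiver has to "bitslip" until the words line up with a known training pattern. A behavioural Python model of that alignment step; the 0b1100011 training word and the helper names are assumptions for illustration:

```python
def deserialize(bits: list[int], offset: int, width: int = 7) -> list[int]:
    """Group a serial bit stream into width-bit words (MSB first), starting at offset."""
    words = []
    for start in range(offset, len(bits) - width + 1, width):
        word = 0
        for b in bits[start:start + width]:
            word = (word << 1) | b
        words.append(word)
    return words


def find_bitslip(bits: list[int], training_word: int, width: int = 7):
    """Slip one bit at a time until every deserialized word equals the training word."""
    for offset in range(width):
        words = deserialize(bits, offset, width)
        if words and all(w == training_word for w in words):
            return offset
    return None  # no alignment found: bad link or wrong pattern


TRAINING = 0b1100011  # hypothetical training pattern
pattern = [(TRAINING >> (6 - i)) & 1 for i in range(7)]
stream = [0, 1, 0] + pattern * 5   # three stray bits before the words start
print(find_bitslip(stream, TRAINING))  # 3
```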

Quote from: SiliconWizard on April 30, 2022, 08:37:14 pm

Also, as I mentioned, the more high-end FPGAs often embed true CDR blocks, so there's no need to fiddle with oversampling, and the result is more robust and more flexible.

All MGTs include a CDR block, as an MGT can't function without one, so it doesn't say anything about a part being high-end or not. The entire 7 series family is low-end nowadays; high-end is Virtex US+.

Quote from: SiliconWizard on April 30, 2022, 08:37:14 pm

Regarding "no external components", what do you guys think? As I said, I don't much like directly routing FPGA IOs to connectors. Typical USB PHYs embed various means of protection. Sure you can always add discrete protection.

Why? MGTs have been specifically designed to drive transmission lines; in fact, they usually include blocks to deal with the problems such transmission lines typically have (like pre- and post-emphasis, line equalization, filters, etc.). Also, most serial buses are AC-coupled, so any DC voltage that somehow makes its way onto these conductors will not damage anything. So it's completely safe to connect them directly to the transmission lines; the only protection you might want to add is ESD protection if the transmission line goes through a connector. You can of course use redrivers if you so desire, but unless the transmission line is extra lossy and long, I think it's a waste of money and precious PCB space. Read, for example, the Xilinx UG476 document to see what's actually inside a typical MGT block, and you will see that it's much more than just a SERDES with CDR: there is a whole bunch of circuitry, both analog and digital, to build a PMA/PCS for pretty much any serial protocol that is electrically compatible with the MGT.

Quote from: SiliconWizard on April 30, 2022, 08:37:14 pm

A side note on this idea: the question of implementing some "high-speed" link between an MCU and an FPGA often pops up. Above a few tens of Mb/s, using SPI, you're often stuck. Or you have to resort to some parallel bus / memory interface, requiring many IOs and having all sorts of limitations depending on the MCU itself. Whereas, if using an MCU with embedded USB HS, it's likely possible to implement such a link using the same idea: using the embedded USB core with a custom protocol, which would save having to implement full-blown USB when this would be largely overkill, knowing that USB cores in MCUs are often pretty similar to external USB PHYs in terms of functionality, the bulk of the USB protocol being implemented in software.

MCUs which can actually utilize a high-speed connection to an FPGA typically have some connection options, like HyperBus for the STM32H7, QSPI for the STM32F4 and F7, or PCI Express for SoCs like the iMX processors (though it's rather pathetic at PCIe 2.0 x1, so only 5 Gbps in each direction). As for USB, most MCUs have some sort of hardware USB MAC (like HW endpoint FIFOs, config registers and the like), so you don't have direct access to the PHY like you get with an FPGA, and you might be forced to use the full USB protocol instead of the hodge-podge that you have in mind. That said, I obviously didn't check all MCUs out there, so there might be some which make it possible.
--------
As for availability, for prototypes and one-offs you can try your luck on Aliexpress - there you can buy pretty much any Artix-7 device for relatively cheap, I've bought a bunch of various 7 series devices over the years and so far all of them worked (though there are some devices which I snagged just because the price was too good to miss - like 5 Zynq-030's with Kintex-based fabric and 10G+ MGTs for like $60 delivered, but I'm yet to come up with a use for them), but of course your mileage may vary. Just a bit of advice - the pricing there is all over the place, you will see some crazy stuff like higher-end parts being cheaper than lower-end ones (K325T is cheaper than pretty much any Artix), so look at all options before buying.

BrianHG · « **Reply #9 on:** April 30, 2022, 11:15:56 pm »

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm

Is ULPI flexible enough to allow this? (I've just started studying it so I'll still have to figure this out.)

ULPI is a standard specification for the connection between the PHY and your FPGA. A ULPI PHY IC is a USB cable driver with a simple serializer/deserializer, clock recovery circuitry and simple USB arbitration, so your FPGA interface only runs at 1/2 or 1/4 of the 480 Mb/s speed (3-pin vs 6-pin mode). This means you do not need a 480 Mb/s SERDES in your FPGA. These ULPI PHY ICs are dumb devices, little more than a SERDES plus USB status. Everything else you need to program in the FPGA, but with the knowledge that you are receiving and transmitting your packet structures and contents as bytes.

You should have access to 100% of the USB2.x capabilities with a ULPI PHY IC.

SiliconWizard · « **Reply #10 on:** May 01, 2022, 02:50:08 am »

Quote from: BrianHG on April 30, 2022, 11:15:56 pm

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm
Is ULPI flexible enough to allow this? (I've just started studying it so I'll still have to figure this out.)
ULPI is a standard specification for the connection between the PHY and your FPGA. A ULPI PHY IC is a USB cable driver with a simple serializer/deserializer, clock recovery circuitry and simple USB arbitration, so your FPGA interface only runs at 1/2 or 1/4 of the 480 Mb/s speed (3-pin vs 6-pin mode). This means you do not need a 480 Mb/s SERDES in your FPGA. These ULPI PHY ICs are dumb devices, little more than a SERDES plus USB status. Everything else you need to program in the FPGA, but with the knowledge that you are receiving and transmitting your packet structures and contents as bytes.

You should have access to 100% of the USB2.x capabilities with a ULPI PHY IC.

I've now read the ULPI spec fully and there should be no problem doing what I have in mind.

ULPI-compliant PHYs actually do a little more than what you wrote above, they also handle NRZI encoding, CRC, and form packets.
They also handle some additional functions that can be handy to have, like cable disconnection, VBUS detection (which could be used for something else), etc.
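For reference, the USB line coding itself is simple: bit stuffing inserts a 0 after six consecutive 1s, and NRZI turns every 0 into a level transition and every 1 into "no change", so the stuffed 0s guarantee regular transitions for the receiver's clock recovery. A bit-level Python model, e.g. for a testbench or for decoding logic-analyzer captures (helper names are hypothetical; SYNC and EOP are not modelled):

```python
def bit_stuff(bits: list[int]) -> list[int]:
    """Insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 6:
            out.append(0)  # becomes a forced transition after NRZI
            ones = 0
    return out


def bit_unstuff(bits: list[int]) -> list[int]:
    """Drop the 0 that follows six consecutive 1s; anything else there is an error."""
    out, ones, drop_next = [], 0, False
    for b in bits:
        if drop_next:
            if b != 0:
                raise ValueError("bit-stuffing violation")
            drop_next, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 6:
            drop_next = True
    return out


def nrzi_encode(bits: list[int], level: int = 1) -> list[int]:
    """USB NRZI: a 0 toggles the line level, a 1 leaves it unchanged."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels: list[int], level: int = 1) -> list[int]:
    """Inverse of nrzi_encode(): no change -> 1, change -> 0."""
    out = []
    for current in levels:
        out.append(1 if current == level else 0)
        level = current
    return out


data = [1] * 13 + [0, 1, 0]
line = nrzi_encode(bit_stuff(data))
assert len(line) == len(data) + 2          # two stuffed bits
assert bit_unstuff(nrzi_decode(line)) == data
```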

As I mentioned above, the only constraint you'll have is that the PHY will generate low-level USB packets. That's SYNC+packet ID+payload+CRC+EOP. You only need to provide the packet ID and payload; the rest is automatic. The SYNC part is 32 bytes long in HS, so to optimize throughput, you'd better use large payloads. The USB standard defines 1024 bytes as the maximum for data packets, but I don't know whether the PHYs actually check the length or not. The packet ID is 1 byte; you can write anything you want if you don't have to comply with USB. The CRC is 16 bits. EOP is a few bits. All in all, the SYNC preamble is what eats the most extra bandwidth, and for 1024-byte data packets, the overhead is about 3%. Since it handles CRC, you have a reasonable check for data integrity for free.
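If you want to check captured packets, or if a particular PHY turns out to leave CRC generation and checking to the link (worth confirming in its datasheet), the USB data-packet CRC16 is easy to compute. A short Python sketch (the function name is hypothetical):

```python
def crc16_usb(data: bytes) -> int:
    """CRC16 of a USB data packet payload.

    Generator x^16 + x^15 + x^2 + 1 (0x8005), register preset to all ones,
    bits processed LSB first (hence the reflected constant 0xA001), and the
    result inverted before transmission.
    """
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF


print(hex(crc16_usb(b"123456789")))  # 0xb4c8, the standard CRC-16/USB check value
print(hex(crc16_usb(b"")))           # 0x0, CRC of a zero-length data packet
```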

Just a note: ULPI supports various bus widths, but the most common is 8 data bits, so it's 8 bits at 60 MHz.
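8 bits at 60 MHz works out to exactly the 480 Mb/s HS line rate. As a rough model of the link side of a transmit, assuming the TX CMD encoding of the ULPI 1.1 spec (transmit command code 0b01 in bits 7:6, 4-bit PID in bits 3:0; worth double-checking against the spec or the PHY datasheet), the link first drives a TX CMD byte and then the payload bytes. CRC handling is not shown, and the names are hypothetical:

```python
ULPI_CLOCK_HZ = 60_000_000   # ULPI interface clock
ULPI_DATA_BITS = 8           # most common bus width

TXCMD_TRANSMIT = 0b01 << 6   # command code "transmit" in bits 7:6


def ulpi_tx_sequence(pid: int, payload: bytes) -> list[int]:
    """Bytes the link drives on DATA[7:0] for one transmit: TX CMD byte, then payload.

    Only the byte sequence is modelled: the DIR/NXT/STP handshaking (the link
    waits for NXT before advancing, and asserts STP to end the packet) is not.
    """
    if not 0 <= pid <= 0xF:
        raise ValueError("PID must fit in 4 bits")
    return [TXCMD_TRANSMIT | pid] + list(payload)


assert ULPI_DATA_BITS * ULPI_CLOCK_HZ == 480_000_000   # matches the HS line rate
print([hex(b) for b in ulpi_tx_sequence(0b0011, b"\x01\x02")])  # ['0x43', '0x1', '0x2']
```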

As to using MCUs with embedded USB 2 PHYs, doing the same will greatly depend on the MCU. Some have a PHY that isn't directly accessible, some do. But that was just another idea to investigate. That can't be as "portable" as doing this with a separate ULPI PHY.

For higher data rates, I guess doing something similar with a USB 3 PHY in SS mode should be doable as well. Now we're talking about 5 Gb/s. The interface requires more (and faster) I/Os though, since it's a PIPE interface, but that should still be doable with modest FPGAs.
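Rough numbers for the PIPE side, as a sketch: USB 3 SuperSpeed signals at 5 GT/s with 8b/10b coding, so 4 Gb/s of actual data. PIPE trades bus width for clock rate. The 8/16/32-bit options below are the ones I recall from the PIPE spec for USB 3, so double-check against the specific PHY's datasheet.

```python
# Back-of-the-envelope PIPE numbers for a USB 3 SuperSpeed PHY (5 GT/s, 8b/10b).
LINE_RATE_GTPS = 5.0        # SuperSpeed signalling rate in GT/s
CODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 data bits per 10 line bits


def payload_gbps(line_rate_gtps: float = LINE_RATE_GTPS) -> float:
    """Data rate left after 8b/10b decoding, in Gb/s."""
    return line_rate_gtps * CODING_EFFICIENCY


def pipe_pclk_mhz(width_bits: int, line_rate_gtps: float = LINE_RATE_GTPS) -> float:
    """Parallel clock (MHz) needed to move the decoded data over a PIPE
    data bus of width_bits (8, 16 or 32 for USB 3 PHYs)."""
    return payload_gbps(line_rate_gtps) * 1000 / width_bits


for width in (8, 16, 32):
    print(f"{width:2d}-bit PIPE bus -> {pipe_pclk_mhz(width):.0f} MHz")
```

That gives 500, 250 and 125 MHz respectively, so a 32-bit PIPE bus at 125 MHz is the friendliest option for a modest FPGA, at the cost of more pins (plus the per-byte K-character and control signals not counted here).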

Fully implementing links in FPGAs that have fast enough transceivers is of course also on the list, but I thought this idea might be worth investigating.

As to terms, yeah. Lattice calls their high-speed transceivers SERDES, and flags their FPGAs without them as not having SERDES. It's a matter of terminology. Xilinx actually has a number of variants of their high-speed transceivers, from GTP to GTX to... And yes, their higher-end FPGAs such as the Virtex series have much faster, but also more sophisticated, transceivers. As for the other term, CDR, let's also be clear about what we are talking about. Even the GTP that the Artix (and I think Spartan 7?) have seem to have some kind of CDR, but as far as I can tell, it's not a full CDR as I think of it: they can't recover clock and data on their own (unless I missed something, I'm not an expert with the Xilinx 7-series), which is why the app note mentioned earlier shows a way to recover data (not the clock) using oversampling. That's not any kind of hardware CDR as I call it. But, as far as I've seen, the Virtex seem to have that. Now whether you'd get better, or at least as good performance (data integrity) relative to jitter and clock drift for a ~500Mbps link than a dedicated USB PHY, I do not know. But having read some papers about clock/data recovery and oversampling, I'm not completely sure that is the case with 4x oversampling and a relatively simple scheme. Maybe it is though. I'd be curious to see eye patterns and compare them - problem being that you can't directly get an eye pattern with the oversampling method, as you don't recover the clock.

Small edit: The SYNC preamble in HS data packets is 32 *bits*, not 32 bytes. So the overhead is much less severe.
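A back-of-the-envelope sketch of the corrected overhead, assuming a 32-bit SYNC, 8-bit PID, 16-bit CRC and an 8-bit HS EOP (that last figure is from memory), and ignoring bit stuffing and inter-packet gaps, which add a bit more in practice:

```python
SYNC_BITS = 32  # HS SYNC as sent by the originating transmitter
PID_BITS = 8
CRC_BITS = 16
EOP_BITS = 8    # HS EOP for non-SOF packets (assumption, check the spec)


def overhead_fraction(payload_bytes: int) -> float:
    """Share of line bits that are framing rather than payload,
    ignoring bit stuffing and inter-packet gaps."""
    framing = SYNC_BITS + PID_BITS + CRC_BITS + EOP_BITS
    total = framing + 8 * payload_bytes
    return framing / total


for size in (64, 512, 1024):
    print(f"{size:5d}-byte payload: {100 * overhead_fraction(size):.2f} % overhead")
```

With these assumptions a 1024-byte payload comes out at about 0.78% overhead, versus roughly 11% for 64-byte packets, so large payloads are still the way to go.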

asmi · « **Reply #11 on:** May 01, 2022, 04:09:15 am »

Quote from: SiliconWizard on May 01, 2022, 02:50:08 am

Even the GTP that the Artix (and I think Spartan 7?) have seem to have some kind of CDR, but as far as I can tell, it's not a full CDR as I think of it: they can't recover clock and data on their own (unless I missed something, I'm not an expert with the Xilinx 7-series), which is why the app note mentioned earlier shows a way to recover data (not the clock) using oversampling. That's not any kind of hardware CDR as I call it. But, as far as I've seen, the Virtex seem to have that. Now whether you'd get better, or at least as good performance (data integrity) relative to jitter and clock drift for a ~500Mbps link than a dedicated USB PHY, I do not know. But having read some papers about clock/data recovery and oversampling, I'm not completely sure that is the case with 4x oversampling and a relatively simple scheme. Maybe it is though. I'd be curious to see eye patterns and compare them - problem being that you can't directly get an eye pattern with the oversampling method, as you don't recover the clock.

You mixed things up again:

1. ALL 7 series devices have SERDESes embedded in their I/O tiles; these are available on all devices across the series, and on all user pins. These SERDES do NOT have CDR, and the 4x oversampling appnote applies to these pins.
2. In addition to (1), some packages of Artix-7, and all Kintex-7 and Virtex-7 devices, have Multi-Gigabit Transceivers (MGTs): GTP (for Artix), GTX (for Kintex) and GTH (for Virtex). These transceivers have dedicated I/O pins (not shared with regular user I/O pins). Each transceiver has a transmitter and a receiver, and they are grouped by 4 (so-called "quads"), with each quad having some shared resources like dedicated clock inputs (2 per quad), a quad PLL and some other circuitry. Each specific device has a certain number of those quads bonded out (with the exception of Artix devices in the CP236/238 package, which only have 2 transceivers bonded out from a quad). The receiver in each of those transceivers contains a full clock recovery module. Because of the presence of this module, MGTs do not need any oversampling and handle clock recovery internally (provided the right reference clock signal, of course).

Once again, 4x oversampling appnote applies to REGULAR user I/O pins (and each of those pins is capable of reaching 1.25 Gbps wire speed), while all MGTs (even GTPs) have dedicated CDR modules and so don't need any oversampling.
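To make the 4x oversampling idea concrete, here's a toy Python model of oversampled data recovery on a regular I/O pin: every bit period yields four samples, transitions are histogrammed by sample phase, and the phase half a bit away from where the edges cluster is used as the data sample. This is my own simplified illustration of the general technique, not the algorithm from the app note. A real implementation tracks the phase continuously and has to handle the phase wrap-around (and the occasional dropped or duplicated bit) caused by drift between the two clocks, plus jitter, none of which is modelled here.

```python
def oversample(bits, phase_offset, n=4):
    """Model a receiver sampling a serial stream at n samples per bit.
    phase_offset (0..n-1) shifts the sample clock relative to the bit edges;
    the first phase_offset samples are simply dropped."""
    samples = [b for b in bits for _ in range(n)]
    return samples[phase_offset:]


def pick_phase(samples, n=4):
    """Histogram the sample phase at which transitions appear, then pick the
    phase half a bit away from the most common edge position (eye centre)."""
    edges = [0] * n
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            edges[i % n] += 1
    edge_phase = max(range(n), key=lambda p: edges[p])
    return (edge_phase + n // 2) % n


def recover(samples, n=4):
    """Keep one sample per bit period, at the chosen phase."""
    phase = pick_phase(samples, n)
    return samples[phase::n], phase


if __name__ == "__main__":
    tx = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0] * 20
    for offset in range(4):
        rx, phase = recover(oversample(tx, offset))
        # Depending on the offset the first bit may be cut off, so compare tails.
        assert rx == tx[len(tx) - len(rx):]
        print(f"offset {offset}: sampling phase {phase}, {len(rx)} bits recovered")
```

Note that nothing here recovers a clock: the receiver stays on its own clock and just chooses which of its samples to trust, which is exactly the difference from the CDR inside an MGT.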

DiTBho · « **Reply #12 on:** May 01, 2022, 03:45:22 pm »

what do you want to achieve exactly?
a fast link for what?

SiliconWizard · « **Reply #13 on:** May 01, 2022, 11:08:00 pm »

I'll have to take a deeper look at what the Xilinx 7 series offers. I guess from the mentioned app note, I assumed oversampling was the only way of implementing this on the Artix 7, and the little I read from the Xilinx fact sheets was not really helpful. That'll be for later. I do have 2 Artix dev boards actually, so that's something I'll probably work on at some point.

The ECP5 without SERDES (again, as *Lattice* calls it, which would be the equivalent of MGT), from what I've read, is limited to 400 MHz on its IOs. So that's a bit rough. But you wouldn't get better with many other FPGAs of this "range", or even lower end. For those, alternatives can be nice to have.

This is not an X/Y problem. I know what the alternative is if you use FPGAs with fast enough IOs or dedicated transceivers. So the idea is for when you can't use that, while using a standard solution (even if it means using a subset of it), allowing for *portability* of the approach. And portability is a pretty nice thing to have these days due to gigantic problems of availability. I was curious about whether others had already thought about it, or done it. So, the thread was mostly about that. And, if that can give ideas to some people, great. If not, that's fine too.

As also suggested initially, using a gigabit Ethernet PHY for rates of the same order (faster, obviously) could also be an option, and if anyone has experience with this, that would also be welcome. Implementing custom protocols over Ethernet seems much more common though, and it looks like it doesn't "tickle" people as much as the thought of doing this with USB, probably because USB is more "monolithic" compared to Ethernet.

BrianHG · « **Reply #14 on:** May 02, 2022, 12:24:54 am »

Quote from: SiliconWizard on May 01, 2022, 11:08:00 pm

The ECP5 without SERDES (again, as *Lattice* calls it, which would be the equivalent of MGT), from what I've read, is limited to 400 MHz on its IOs. So that's a bit rough. But you wouldn't get better with many other FPGAs of this "range", or even lower end. For those, alternatives can be nice to have.

800 Mb/s for -8, 700 Mb/s for -7 and 624 Mb/s for -6 when running its ser-des in DDRX2 mode, and 500 Mb/s in DDRX1 mode for all speed grades.

Remember, a 400 MHz toggle rate means an 800 Mb/s ser-des capability. If not, it would be impossible to use DDR3 RAM with these FPGAs, which requires a minimum of 606 Mb/s ser-des.

Page 66, 67 and 68, Table 3.22. ECP5/ECP5-5G External Switching Characteristics in the ECP5 and ECP5-5G Family data sheet. Read the 'data rates' column.
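Spelled out as arithmetic. The 3.3 ns figure is the DDR3 maximum average clock period with the DLL enabled, quoted from memory, so take it as an assumption; it is what puts the floor at about 606 Mb/s:

$$R = 2 f_{\text{toggle}} = 2 \times 400\ \text{MHz} = 800\ \text{Mb/s}$$

$$f_{\text{DDR3,min}} = \frac{1}{3.3\ \text{ns}} \approx 303\ \text{MHz} \quad\Rightarrow\quad R_{\text{DDR3,min}} \approx 2 \times 303\ \text{MHz} \approx 606\ \text{Mb/s}$$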

asmi · « **Reply #15 on:** May 02, 2022, 12:43:34 am »

Quote from: BrianHG on May 02, 2022, 12:24:54 am

800 Mb/s for -8, 700 Mb/s for -7 and 624 Mb/s for -6 when running its ser-des in DDRX2 mode, and 500 Mb/s in DDRX1 mode for all speed grades.

Remember, a 400 MHz toggle rate means an 800 Mb/s ser-des capability. If not, it would be impossible to use DDR3 RAM with these FPGAs, which requires a minimum of 606 Mb/s ser-des.

Page 66, 67 and 68, Table 3.22. ECP5/ECP5-5G External Switching Characteristics in the ECP5 and ECP5-5G Family data sheet. Read the 'data rates' column.

That's still rather pathetic compared to the 7 series, which can do 950 Mbps in the slowest speed grade (-1) and 1.25 Gbps in all other speed grades. And that's on any differential pair, of which there are 24 per I/O bank, so 48 out of 50 I/Os in the bank are capable of reaching max wire speed.

SiliconWizard · « **Reply #16 on:** May 02, 2022, 12:45:29 am »

Quote from: BrianHG on May 02, 2022, 12:24:54 am

Quote from: SiliconWizard on May 01, 2022, 11:08:00 pm
The ECP5 without SERDES (again, as *Lattice* calls it, which would be the equivalent of MGT), from what I've read, is limited to 400 MHz on its IOs. So that's a bit rough. But you wouldn't get better with many other FPGAs of this "range", or even lower end. For those, alternatives can be nice to have.
800 Mb/s for -8, 700 Mb/s for -7 and 624 Mb/s for -6 when running its ser-des in DDRX2 mode, and 500 Mb/s in DDRX1 mode for all speed grades.

Remember, a 400 MHz toggle rate means an 800 Mb/s ser-des capability. If not, it would be impossible to use DDR3 RAM with these FPGAs, which requires a minimum of 606 Mb/s ser-des.

Page 66, 67 and 68, Table 3.22. ECP5/ECP5-5G External Switching Characteristics in the ECP5 and ECP5-5G Family data sheet. Read the 'data rates' column.

OK thanks. I'll have to see what can be achieved with that using oversampling. Yet another thing to do.
The USB PHY approach would still be interesting anyway for lower-end FPGAs, such as the iCE40 stuff, and I can think of a number of applications for this.

As to availability, well, even the ECP5 that was available a few months back when others weren't, it's now become pretty much unobtainium as well...

BrianHG · « **Reply #17 on:** May 02, 2022, 12:48:27 am »

Quote from: asmi on May 02, 2022, 12:43:34 am

Quote from: BrianHG on May 02, 2022, 12:24:54 am
800 Mb/s for -8, 700 Mb/s for -7 and 624 Mb/s for -6 when running its ser-des in DDRX2 mode, and 500 Mb/s in DDRX1 mode for all speed grades.

Remember, a 400 MHz toggle rate means an 800 Mb/s ser-des capability. If not, it would be impossible to use DDR3 RAM with these FPGAs, which requires a minimum of 606 Mb/s ser-des.

Page 66, 67 and 68, Table 3.22. ECP5/ECP5-5G External Switching Characteristics in the ECP5 and ECP5-5G Family data sheet. Read the 'data rates' column.
That's still rather pathetic compared to the 7 series, which can do 950 Mbps in the slowest speed grade (-1) and 1.25 Gbps in all other speed grades. And that's on any differential pair, of which there are 24 per I/O bank, so 48 out of 50 I/Os in the bank are capable of reaching max wire speed.

It's a bloody $6 FPGA with 25K gates and 1 Mbit of RAM, $12 for 45K gates with 2 Mbit of RAM.
This is not a Vendor A vs. Vendor B discussion. I was pointing out an error in SiliconWizard's reading of the data sheet.
And at $18/$36 respectively, you can get the same FPGA with a number of 5 Gbit SERDES ports.

BrianHG · « **Reply #18 on:** May 02, 2022, 12:57:31 am »

Quote from: SiliconWizard on May 02, 2022, 12:45:29 am

As to availability, well, even the ECP5 that was available a few months back when others weren't, it's now become pretty much unobtainium as well...

https://www.verical.com/pd/lattice-semiconductor-fpga-lfe5u-25f-6bg256c-5869565
25K gates at $8.
17924 in stock, though it ships from Hong Kong.

Digikey has 80 of the top-end ones with the high-speed transceivers:
https://www.digikey.com/en/products/detail/lattice-semiconductor-corporation/LFE5UM5G-85F-8BG381I/6173749?s=N4IgjCBcoLQCxVAYygMwIYBsDOBTANCAPZQDaIcArABxgCcIhlcA7AAxsgC6AvoTACZEIFJAAuAJwCuBYmRCVuPPiCGRymdGLEBLJLgAEqAA4BzdN0IA2YToAmUEDDBsIhY2MeMQYgJ7HcR3RsFGUgA

asmi · « **Reply #19 on:** May 02, 2022, 01:03:33 am »

Quote from: SiliconWizard on May 01, 2022, 11:08:00 pm

This is not an X/Y problem. I know what the alternative is if you use FPGAs with fast enough IOs or dedicated transceivers. So the idea is for when you can't use that, while using a standard solution (even if it means using a subset of it), allowing for *portability* of the approach. And portability is a pretty nice thing to have these days due to gigantic problems of availability. I was curious about whether others had already thought about it, or done it. So, the thread was mostly about that. And, if that can give ideas to some people, great. If not, that's fine too.

This is only possible if you take comparable FPGAs; otherwise you are grossly under-utilizing some of them in order to have them all perform an identical task. Just try an experiment: take some small-ish softcore (say 1-2K LUTs), implement it on ECP5, Cyclone V and Artix-7 at the max possible frequency, and you will see just how large the difference is. In one experiment, I took my WIP RV64 softcore, which runs at 177 MHz on Artix-7 SG2, and just by moving it to a new Artix UltraScale+ I was able to more than double the frequency (I reached 388 MHz, if my memory serves me). And that was with HDL "generic" enough that I could re-target the code to a new family with almost no changes (AFAIR I only changed the component name for a PLL and tweaked some of its settings to set up the new target frequency). If you want to extract the absolute max performance from any FPGA, you will need to use device-specific features, which would make keeping the code portable a nightmare.

So messing about with a USB PHY makes absolutely zero sense when you work with the 7 series, because it can reach higher speeds on its own, and you can have as many lanes as you want (like I said, even the cheapest and slowest S7 or A7 provides at least two full I/O banks, and so 48 differential pairs). In my experience the ability to scale the bandwidth is important, so I tend to use source-synchronous LVDS on all FPGAs, with whatever "native" I/O block circuitry is available on the target device. Scaling USB PHY link bandwidth would be a nightmare, while source-synchronous LVDS easily scales to many lanes (though unless you're willing to do per-lane link training, you will need to length-match all of them, so at some point routing becomes harder).
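For the per-lane link training mentioned above, a common approach is to send a known training word and have each lane's deserializer "bitslip" (Xilinx's ISERDESE2 has a BITSLIP input, for example) until the word shows up aligned. Below is a small Python model of just the alignment search. The training word 0x5C and the function names are arbitrary choices for this sketch, not from any vendor IP; 0x5C is used because all eight of its rotations differ, so exactly one slip value matches. A real design would also tune a per-lane input delay to centre the sampling point, which isn't modelled here.

```python
WORD_BITS = 8
TRAINING_WORD = 0x5C  # hypothetical pattern; all 8 bit rotations are distinct


def serialize(words, width=WORD_BITS):
    """MSB-first serial bit stream for a list of words."""
    return [(w >> (width - 1 - i)) & 1 for w in words for i in range(width)]


def deserialize(bits, slip, width=WORD_BITS):
    """Group the serial stream into words starting `slip` bits late, which is
    effectively what repeated bitslip pulses do on a real deserializer."""
    words = []
    for start in range(slip, len(bits) - width + 1, width):
        w = 0
        for b in bits[start:start + width]:
            w = (w << 1) | b
        words.append(w)
    return words


def train_lane(bits, pattern=TRAINING_WORD, width=WORD_BITS):
    """Try every slip value; return the first one that yields the training
    word, or None if the lane never locks (e.g. broken wiring)."""
    for slip in range(width):
        words = deserialize(bits, slip, width)
        if words and all(w == pattern for w in words[:4]):
            return slip
    return None


if __name__ == "__main__":
    stream = serialize([TRAINING_WORD] * 16)
    for skew in range(WORD_BITS):
        lane = stream[skew:]  # model lane skew as bits the receiver never saw
        slip = train_lane(lane)
        assert slip is not None and deserialize(lane, slip)[0] == TRAINING_WORD
        print(f"skew {skew} bits -> bitslip {slip}")
```

Once every lane has locked, the per-lane slip values also tell you the relative skew between lanes, which is what lets you relax the length-matching requirement.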

SiliconWizard · « **Reply #20 on:** May 02, 2022, 01:04:12 am »

Quote from: BrianHG on May 02, 2022, 12:57:31 am

Quote from: SiliconWizard on May 02, 2022, 12:45:29 am
As to availability, well, even the ECP5 that was available a few months back when others weren't, it's now become pretty much unobtainium as well...

https://www.verical.com/pd/lattice-semiconductor-fpga-lfe5u-25f-6bg256c-5869565
25K gates at $8.
17924 in stock, though it ships from Hong Kong.

Digikey has 80 of the top-end ones with the high-speed transceivers:
https://www.digikey.com/en/products/detail/lattice-semiconductor-corporation/LFE5UM5G-85F-8BG381I/6173749?s=N4IgjCBcoLQCxVAYygMwIYBsDOBTANCAPZQDaIcArABxgCcIhlcA7AAxsgC6AvoTACZEIFJAAuAJwCuBYmRCVuPPiCGRymdGLEBLJLgAEqAA4BzdN0IA2YToAmUEDDBsIhY2MeMQYgJ7HcR3RsFGUgA

Thanks for the first link, I'll have a look. I didn't know about them. Edit: Yes, uh, apparently they only have this one ECP5 part number, and nothing else.
As to Digikey, I looked a couple days ago and could not find any in stock I think. The one you linked to has 74 in stock and probably zero pretty soon...

asmi · « **Reply #21 on:** May 02, 2022, 01:05:17 am »

Quote from: BrianHG on May 02, 2022, 12:48:27 am

And at 18$/36$ respectively, you can get the same FPGA with a number of 3.25gbit serdes ports.

Only if you are willing to pay a $2k/year licence fee.

Oh, and you can get a relatively cheap Artix with 6.6G transceivers too

asmi · « **Reply #22 on:** May 02, 2022, 01:09:05 am »

Quote from: SiliconWizard on May 02, 2022, 01:04:12 am

Thanks for the first link, I'll have a look. I didn't know about them.
As to Digikey, I looked a couple days ago and could not find any in stock I think. The one you linked to has 74 in stock and probably zero pretty soon...

These are LFE5UM, which require a paid license according to this page.

SiliconWizard · « **Reply #23 on:** May 02, 2022, 01:11:06 am »

Yes, the ECP5UM requires a paid subscription. It's not cheap. Not a huge problem if you're selling products with it. More of a problem if not...

BrianHG · « **Reply #24 on:** May 02, 2022, 01:20:59 am »

And it's gonna be at least another 2 years before stock of FPGAs begins to come back to something sorta normal. For now, we got to scrounge up whatever becomes available, or, schedule large orders and hope you get them on time.

