Topic: Using ULPI USB PHYs for custom data links

SiliconWizard · « **on:** April 28, 2022, 05:44:34 pm »

I've read about this on a couple of forums, but without much detail.

Has any of you implemented (or at least considered) custom data links using a USB PHY (so, with a custom protocol, and not USB)? That would give you the PHY for pretty cheap (you can find USB 2.0 PHYs, so a 480 Mb/s link, for less than $2 or so). It's good for cables up to 5 m according to the standard, but I'm pretty sure one can go longer than this with off-the-shelf PHYs using custom cables and protocols, and not transporting power.

The reason for considering USB PHYs is flexibility, data rate and cost. While those are cheap (for market reasons), gigabit Ethernet PHYs tend to be more expensive (but don't hesitate to point me to "cheap" ones), and dedicated "SERDES" ICs with integrated CDR tend to be VERY expensive (easily $20 to $40 each in single quantities).

Is ULPI flexible enough to allow this? (I've just started studying it so I'll still have to figure this out.)

I posted this in the FPGA section, but of course it could also be used with MCUs.

Speaking of FPGAs, the higher-end ones usually embed SERDES blocks, some with clock/data recovery and everything, so implementing custom "high-speed" data links is relatively straightforward. Lower-end ones have no such thing, so you're pretty limited in the max data rate you can reasonably achieve (and have to implement CDR yourself using oversampling and such...).

Any thoughts welcome.

asmi · « **Reply #1 on:** April 29, 2022, 05:04:41 am »

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm

I've read about this on a couple of forums, but without much detail.

Has any of you implemented (or at least considered) custom data links using a USB PHY (so, with a custom protocol, and not USB)? That would give you the PHY for pretty cheap (you can find USB 2.0 PHYs, so a 480 Mb/s link, for less than $2 or so). It's good for cables up to 5 m according to the standard, but I'm pretty sure one can go longer than this with off-the-shelf PHYs using custom cables and protocols, and not transporting power.

The reason for considering USB PHYs is flexibility, data rate and cost. While those are cheap (for market reasons), gigabit Ethernet PHYs tend to be more expensive (but don't hesitate to point me to "cheap" ones), and dedicated "SERDES" ICs with integrated CDR tend to be VERY expensive (easily $20 to $40 each in single quantities).

You can use RS485/422 transceivers. They are relatively cheap (unless you want galvanic isolation), and some can hit fairly impressive speeds (up to 50 Mbit/s over a distance of a few meters).

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm

Is ULPI flexible enough to allow this? (I've just started studying it so I'll still have to figure this out.)

Why not? It should be possible, since most PHYs only concern themselves with, well, PHYsical coding and access. You might want to read the ULPI spec and experiment a bit. I suspect that you will still need to follow an initial handshake protocol because it's required for synchronization, but once a "connection" is established, it should work.

I would be wary of using such a hodge-podge solution in any real commercial project, but if it's "for science" - why not give it a try and see what happens?

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm

Speaking of FPGAs, the higher-end ones usually embed SERDES blocks, some with clock/data recovery and everything, so implementing custom "high-speed" data links is relatively straightforward. Lower-end ones have no such thing, so you're pretty limited in the max data rate you can reasonably achieve (and have to implement CDR yourself using oversampling and such...).

Any thoughts welcome.

In the Xilinx world (at least 7 series and above), the minimum officially supported line rate for the MGTs is 500 Mbit/s. But they also provide ready-to-use HDL for SERDES-based solutions at speeds up to 1.25 Gbit/s (a SERDES module is embedded in every single I/O tile of all Xilinx devices), and these SERDES modules have a special mode in which they essentially consist of four strings of flip-flops (3 in each), allowing you to implement 4x oversampling with minimal effort. For even lower speeds, you can simply run the I/O at a multiple of the line frequency and implement oversampling manually.
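To illustrate the oversampling idea (this is not the Xilinx app note HDL, just a behavioral model), here is a minimal Python sketch, assuming the input is already a list of 0/1 samples taken at 4x the bit rate. It counts at which of the four sampling phases the edges fall, then reads every 4th sample from the phase furthest from the edges, i.e. the middle of the eye. The function name `recover_bits` is hypothetical, and unlike a real implementation it picks one phase for the whole buffer instead of continuously tracking drift.

```python
from collections import Counter


def recover_bits(samples, oversample=4):
    """Recover bits from a 0/1 stream sampled at `oversample` x the bit rate.

    Simplified model: the sampling phase is chosen once for the whole buffer,
    so clock drift between TX and RX is not tracked.
    """
    # Histogram of the phase (index modulo oversample) at which transitions occur.
    edges = Counter(
        i % oversample
        for i in range(1, len(samples))
        if samples[i] != samples[i - 1]
    )
    # Most frequent edge phase; default to phase 0 if the line never toggles.
    edge_phase = edges.most_common(1)[0][0] if edges else 0
    # Sample half a bit period away from the edges, i.e. in the middle of the eye.
    sample_phase = (edge_phase + oversample // 2) % oversample
    return samples[sample_phase::oversample]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
samples = [b for b in bits for _ in range(4)]  # ideal 4x oversampled stream
print(recover_bits([0] + samples))  # same stream, delayed by one sample
```

A hardware version does the same per bit with a small state machine and keeps re-evaluating the phase as edges drift, which is where most of the real difficulty lies.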

SiliconWizard · « **Reply #2 on:** April 29, 2022, 08:39:02 pm »

I've used LVDS transceivers for links up to 50 Mbps or so, and this is rather straightforward.
But we're talking about an order of magnitude faster with the idea of using USB PHYs in HS mode.

I know this is doable with SERDES, but as I said, some FPGAs do not even have true SERDES, and even when they do, while the TX part is straightforward, the RX part is not necessarily so, because the clock has to be recovered. Some FPGAs have proper CDR; with others, you need to implement it by hand, and implementing robust CDR is not trivial. I'm sure some vendors provide ready-to-use IPs for this. I'll have to check for the Artix-7, but otherwise you often have to go for higher-end devices. I've read a couple of papers about CDR too, and on FPGAs with no CDR, using oversampling alone, it seems hard to go beyond something like 200-300 Mbps reliably, and even that is not trivial...

To make things clear, I'm talking about transmission over a single pair, requiring clock recovery on the RX end.

So. After having read the ULPI spec, I actually think using USB 2.0 PHYs should be absolutely doable. The "only" constraint is that data would have to be transmitted in packets as defined by the USB standard: basically a sync preamble, packet ID, data payload, CRC and end of packet. Since there's a sync preamble at the beginning of each packet, there is actually nothing else to do for clock synchronization, if I understood it right. All the rest is for the USB protocol (like speed "negotiation", tokens, etc).
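For reference, the bytes between SYNC and EOP of a USB data packet can be modelled as below. This is a sketch based on the USB 2.0 packet format: the PID byte carries a 4-bit PID in its low nibble and its complement in the high nibble, and data packets end with a CRC16 (polynomial x^16 + x^15 + x^2 + 1, initial value 0xFFFF, processed LSB first, result inverted). As far as I know, in the UTMI/ULPI split the CRC16 is generated by the link rather than the PHY, so whether you need something like `crc16_usb` in your own logic is worth checking against the chosen PHY's datasheet. The function names are hypothetical.

```python
def crc16_usb(data: bytes) -> int:
    """USB data-packet CRC16: poly 0x8005, init 0xFFFF, LSB-first, inverted."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001  # 0xA001 is 0x8005 bit-reversed
            else:
                crc >>= 1
    return crc ^ 0xFFFF


def pid_byte(pid: int) -> int:
    """USB PID byte: 4-bit PID in the low nibble, its complement in the high nibble."""
    if not 0 <= pid <= 0xF:
        raise ValueError("PID is a 4-bit value")
    return ((~pid & 0xF) << 4) | pid


def data_packet(pid: int, payload: bytes) -> bytes:
    """Bytes between SYNC and EOP: PID byte, payload, CRC16 (low byte first)."""
    crc = crc16_usb(payload)
    return bytes([pid_byte(pid)]) + payload + crc.to_bytes(2, "little")
```

For example, `data_packet(0x3, b"")` is an empty DATA0 packet, `C3 00 00`; the CRC goes out low byte first, consistent with USB's LSB-first bit order.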

This is for experimenting at the moment, but why would it not be usable for a commercial solution? As long as one follows the ULPI spec and the physical part of the USB spec, I can't really see a potential problem. As I mentioned, the benefit is that we get a relatively robust PHY solution able to drive cables several meters long, for basically the cost of an LVDS or RS485 transceiver. But any reasonable counter-argument is welcome.

asmi · « **Reply #3 on:** April 29, 2022, 09:11:23 pm »

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm

I've used LVDS transceivers for links up to 50 Mbps or so, and this is rather straightforward.
But we're talking about an order of magnitude faster with the idea of using USB PHYs in HS mode.

You never said what kind of speed you're looking for, so I tried to cover the whole spectrum. For lower speeds and long transmission lines, RS485/422 is definitely the best choice as it's the most reliable channel.

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm

I know this is doable with SERDES, but as I said, some FPGAs do not even have true SERDES, and even when they do, while the TX part is straightforward, the RX part is not necessarily so, because the clock has to be recovered. Some FPGAs have proper CDR; with others, you need to implement it by hand, and implementing robust CDR is not trivial. I'm sure some vendors provide ready-to-use IPs for this. I'll have to check for the Artix-7, but otherwise you often have to go for higher-end devices. I've read a couple of papers about CDR too, and on FPGAs with no CDR, using oversampling alone, it seems hard to go beyond something like 200-300 Mbps reliably, and even that is not trivial...

Xilinx provides an app note with ready-to-use HDL for implementing 4x oversampling up to 1.25 Gbps, so it's completely trivial. You just need any 7 series FPGA with a speed grade of 2 or higher (or be willing to overclock an SG1 device).

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm

So. After having read the ULPI spec, I actually think using USB 2.0 PHYs should be absolutely doable. The "only" constraint is that data would have to be transmitted in packets as defined by the USB standard: basically a sync preamble, packet ID, data payload, CRC and end of packet. Since there's a sync preamble at the beginning of each packet, there is actually nothing else to do for clock synchronization, if I understood it right. All the rest is for the USB protocol (like speed "negotiation", tokens, etc).

Well, give it a try and let us know how it goes.

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm

This is for experimenting at the moment, but why would it not be usable for a commercial solution? As long as one follows the ULPI spec and the physical part of the USB spec, I can't really see a potential problem. As I mentioned, the benefit is that we get a relatively robust PHY solution able to drive cables several meters long, for basically the cost of an LVDS or RS485 transceiver. But any reasonable counter-argument is welcome.

Just a few reasons I can think of off the top of my head:
1. Users are stupid. What will happen to both sides if you connect your device to a PC? What about connecting some random peripheral? What if they accidentally use a power-only USB cable?
2. USB was not designed for high-EMI environments, and most USB 2 cables I've come across are not properly shielded.
3. Since USB is half-duplex, you only get half of the claimed bandwidth if you need to send data in both directions.
4. Using MGTs is much easier as they require no external components (aside from AC-coupling caps), and there are fairly low-end FPGAs which have them, and pretty fast ones too (up to 6.6 Gbps for Artix in some packages).
5. This one is personal: I live by the KISS principle and hate all this smart-assery, which tends to bite you in the back when you least expect it, so I do my best to avoid off-label uses. A few bucks saved on parts often end up costing tens of thousands of dollars more in extra development, testing and validation; I've seen this far too many times to be willing to repeat it.

langwadt · « **Reply #4 on:** April 29, 2022, 09:50:33 pm »

Quote from: asmi on April 29, 2022, 09:11:23 pm

Just a few reasons I can think of off the top of my head:
1. Users are stupid. What will happen to both sides if you connect your device to a PC? What about connecting some random peripheral? What if they accidentally use a power-only USB cable?
2. USB was not designed for high-EMI environments, and most USB 2 cables I've come across are not properly shielded.
3. Since USB is half-duplex, you only get half of the claimed bandwidth if you need to send data in both directions.
4. Using MGTs is much easier as they require no external components (aside from AC-coupling caps), and there are fairly low-end FPGAs which have them, and pretty fast ones too (up to 6.6 Gbps for Artix in some packages).
5. This one is personal: I live by the KISS principle and hate all this smart-assery, which tends to bite you in the back when you least expect it, so I do my best to avoid off-label uses. A few bucks saved on parts often end up costing tens of thousands of dollars more in extra development, testing and validation; I've seen this far too many times to be willing to repeat it.

No one says you need to use USB cables and connectors, and MGTs are going to need cables and connectors too.

SiliconWizard · « **Reply #5 on:** April 30, 2022, 12:04:53 am »

Quote from: asmi on April 29, 2022, 09:11:23 pm

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm
I've used LVDS transceivers for links up to 50 Mbps or so, and this is rather straightforward.
But we're talking about an order of magnitude faster with the idea of using USB PHYs in HS mode.
You never said what kind of speed you're looking for, so I tried to cover the whole spectrum. For lower speeds and long transmission lines, RS485/422 is definitely the best choice as it's the most reliable channel.

Well, alright, that's why I made it clearer in the second post.

Still, I assumed that talking about a 480 Mb/s rate, and about gigabit Ethernet PHYs as an alternative, would have at least given an idea of the range.

Quote from: asmi on April 29, 2022, 09:11:23 pm

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm
I know this is doable with SERDES, but as I said, some FPGAs do not even have true SERDES, and even when they do, while the TX part is straightforward, the RX part is not necessarily so, because the clock has to be recovered. Some FPGAs have proper CDR; with others, you need to implement it by hand, and implementing robust CDR is not trivial. I'm sure some vendors provide ready-to-use IPs for this. I'll have to check for the Artix-7, but otherwise you often have to go for higher-end devices. I've read a couple of papers about CDR too, and on FPGAs with no CDR, using oversampling alone, it seems hard to go beyond something like 200-300 Mbps reliably, and even that is not trivial...
Xilinx provides an app note with ready-to-use HDL for implementing 4x oversampling up to 1.25 Gbps, so it's completely trivial. You just need any 7 series FPGA with a speed grade of 2 or higher (or be willing to overclock an SG1 device).

So that's one of the FPGAs for which it would be easy. While I agree the Artix-7 is not particularly high-end, Xilinx is one of the few vendors offering that. Certainly something to keep in mind, but I would like to have more options and be a bit less tied to a particular vendor, especially at the moment.

Quote from: asmi on April 29, 2022, 09:11:23 pm

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm
So. After having read the ULPI spec, I actually think using USB 2.0 PHYs should be absolutely doable. The "only" constraint is that data would have to be transmitted in packets as defined by the USB standard: basically a sync preamble, packet ID, data payload, CRC and end of packet. Since there's a sync preamble at the beginning of each packet, there is actually nothing else to do for clock synchronization, if I understood it right. All the rest is for the USB protocol (like speed "negotiation", tokens, etc).
Well, give it a try and let us know how it goes.

I'll try to. I ordered a few.

Quote from: asmi on April 29, 2022, 09:11:23 pm

Quote from: SiliconWizard on April 29, 2022, 08:39:02 pm
This is for experimenting at the moment, but why would it not be usable for a commercial solution? As long as one follows the ULPI spec and the physical part of the USB spec, I can't really see a potential problem. As I mentioned, the benefit is that we get a relatively robust PHY solution able to drive cables several meters long, for basically the cost of an LVDS or RS485 transceiver. But any reasonable counter-argument is welcome.
Just a few reasons I can think of off the top of my head:
1. Users are stupid. What will happen to both sides if you connect your device to a PC? What about connecting some random peripheral? What if they accidentally use a power-only USB cable?

I agree, but as langwadt said, there's no need to use regular USB connectors and cables. I absolutely avoid using standard connectors for anything other than their standard purpose, so yep. But if I did here, I would make sure that plugging it into a standard USB host or device would have no consequence other than the device not enumerating.

Quote from: asmi on April 29, 2022, 09:11:23 pm

2. USB was not designed for high-EMI environments, and most USB 2 cables I've come across are not properly shielded.

That's a fair one. The cable part is not a real concern; as I said, I don't really intend to use off-the-shelf USB cables anyway. But it's true that beyond cables, even the low-level protocol is not particularly good for EMI: no spread spectrum, no TMDS, no scrambling... But you can use decent-quality shielded cables and dedicated common-mode filters to make things significantly better. I think USB 3 SS should be better in that regard.

Quote from: asmi on April 29, 2022, 09:11:23 pm

3. Since USB is half-duplex, you only get half of the claimed bandwidth if you need to send data in both directions.

Yup, that's obviously what you get when using a single pair. Not necessarily a problem. But if full duplex is needed, one could always use two PHYs, each dedicated to one direction. That would still be reasonably cheap.

Quote from: asmi on April 29, 2022, 09:11:23 pm

4. Using MGTs is much easier as they require no external components (aside from AC-coupling caps), and there are fairly low-end FPGAs which have them, and pretty fast ones too (up to 6.6 Gbps for Artix in some packages).

It is easier, and more flexible, as long as it's available. As to external components: for board-to-board interconnects, no problem, but for connections through cables, I tend to add LVDS transceivers anyway. I don't feel very comfortable connecting FPGA IOs directly to a connector meant for cables several meters long. Maybe it's not justified.

Quote from: asmi on April 29, 2022, 09:11:23 pm

5. This one is personal: I live by the KISS principle and hate all this smart-assery, which tends to bite you in the back when you least expect it, so I do my best to avoid off-label uses. A few bucks saved on parts often end up costing tens of thousands of dollars more in extra development, testing and validation; I've seen this far too many times to be willing to repeat it.

I don't blame you. But I don't think this is really the case here. It's not like using some component out of spec. It's just using a subset of its features in a documented way, with the benefit of being dictated by a standard spec, so that any ULPI-compliant PHY should give a pretty similar result.

I agree that using dedicated resources in some FPGA would be more flexible and possibly less "icky", but these days, the idea of using a standard and ubiquitous solution, reusable with any FPGA almost as is, is kind of attractive.

ali_asadzadeh · « **Reply #6 on:** April 30, 2022, 08:05:47 am »

Since you need a controller, maybe a cheap FPGA like Gowin would come in handy, as they have a USB 2.0 High-Speed PHY and controller IP, which needs around 2K to 5K LUTs depending on the configuration and can be used in almost all of their parts.

SiliconWizard · « **Reply #7 on:** April 30, 2022, 08:37:14 pm »

Cheap is one factor, but as I said, the main ones are availability and not being tied to any particular vendor. So no vendor-specific IP. (Out of curiosity, what's the licensing for the Gowin IP?)

As to FPGA-based solutions, I suppose asmi was referring to the XAPP523 app note from Xilinx. I'm reading it. Looks like pretty textbook oversampling stuff. It's not overly problematic to implement this in a generic way if you don't want Xilinx-specific stuff; the only issue is having true SERDES blocks in order to reach a decent rate. I'm currently using ECP5, for instance, and the bare version doesn't have true SERDES. The version with SERDES requires a Diamond license. Also, most of their more advanced IPs are paid ones, so you're stuck implementing this by hand. Not that this is overly complicated, but the devil's always in the details. Even the non-SERDES version of ECP5 might be OK for something around 400-500 Mbps, but that might not be easy. Something to investigate.

Also, as I mentioned, the more high-end FPGAs often embed true CDR blocks, so there's no need to fiddle with oversampling, and the result is more robust and more flexible.

Regarding "no external components", what do you guys think? As I said, I don't much like routing FPGA IOs directly to connectors. Typical USB PHYs embed various means of protection. Sure, you can always add discrete protection.

A side note on this idea: the question of implementing some "high-speed" link between an MCU and an FPGA often pops up. Above a few tens of Mb/s, using SPI, you're often stuck. Or you'll have to resort to some parallel bus / memory interface, requiring many IOs and having all sorts of limitations depending on the MCU itself. Whereas with an MCU that has embedded USB HS, it's likely possible to implement such a link using the same idea: using the embedded USB core with a custom protocol, which would save having to implement full-blown USB when this would be largely overkill, knowing that USB cores in MCUs are often pretty similar to external USB PHYs in terms of functionality, the bulk of the USB protocol being implemented in software.

asmi · « **Reply #8 on:** April 30, 2022, 10:03:29 pm »

Quote from: SiliconWizard on April 30, 2022, 08:37:14 pm

As to FPGA-based solutions, I suppose asmi was referring to the XAPP523 app note from Xilinx. I'm reading it. Looks like pretty textbook oversampling stuff. It's not overly problematic to implement this in a generic way if you don't want Xilinx-specific stuff; the only issue is having true SERDES blocks in order to reach a decent rate. I'm currently using ECP5, for instance, and the bare version doesn't have true SERDES. The version with SERDES requires a Diamond license. Also, most of their more advanced IPs are paid ones, so you're stuck implementing this by hand. Not that this is overly complicated, but the devil's always in the details. Even the non-SERDES version of ECP5 might be OK for something around 400-500 Mbps, but that might not be easy. Something to investigate.

You keep confusing SERDES and Multi-Gigabit Transceivers; these are not the same thing. A SERDES is part of an MGT, but there are also standalone SERDESes embedded in the I/O tiles, and as such available for every single I/O ball of Xilinx devices. The same goes for other vendors: ECP5 has GDDR blocks which support up to 7:1 (de)serialization and as such are SERDES in their own right, even though their manual doesn't call them that. Unfortunately, Lattice's datasheet is super confusing and doesn't call out the max transfer rate for these modules. If it is only 400 MHz (which is what the buffer is rated at), then it's rather pathetic, I must say, because even the cheapest Spartan-7, the xc7s6, can go all the way up to 1.25 Gbps per differential pair.

Quote from: SiliconWizard on April 30, 2022, 08:37:14 pm

Also, as I mentioned, the more high-end FPGAs often embed true CDR blocks, so there's no need to fiddle with oversampling, and the result is more robust and more flexible.

All MGTs include a CDR block, as an MGT can't function without it; it says nothing about a part being high-end or not. The whole 7 series family is low-end nowadays; high-end is Virtex US+.

Quote from: SiliconWizard on April 30, 2022, 08:37:14 pm

Regarding "no external components", what do you guys think? As I said, I don't much like routing FPGA IOs directly to connectors. Typical USB PHYs embed various means of protection. Sure, you can always add discrete protection.

Why? MGTs have been specifically designed to drive transmission lines; in fact, they usually include blocks to deal with the problems transmission lines typically have (like pre- and post-emphasis, line equalization, filters, etc.). Also, most serial buses are AC-coupled, so any DC voltage that somehow makes its way onto these conductors will not damage anything. So it's completely safe to connect them directly to the transmission lines; the only protection you might want to add is ESD protection if the transmission line involves a connector. You can of course use redrivers if you so desire, but unless the transmission line is extra lossy and long, I think it's a waste of money and precious PCB space. Read, for example, the Xilinx UG476 document to see what's actually inside a typical MGT block, and you will see that it's much more than just a SERDES with CDR: there is a whole bunch of circuitry, both analog and digital, to build a PMA/PCS for pretty much any serial protocol that is electrically compatible with the MGT.

Quote from: SiliconWizard on April 30, 2022, 08:37:14 pm

A side note on this idea: the question of implementing some "high-speed" link between an MCU and an FPGA often pops up. Above a few tens of Mb/s, using SPI, you're often stuck. Or you'll have to resort to some parallel bus / memory interface, requiring many IOs and having all sorts of limitations depending on the MCU itself. Whereas with an MCU that has embedded USB HS, it's likely possible to implement such a link using the same idea: using the embedded USB core with a custom protocol, which would save having to implement full-blown USB when this would be largely overkill, knowing that USB cores in MCUs are often pretty similar to external USB PHYs in terms of functionality, the bulk of the USB protocol being implemented in software.

MCUs which can actually make use of a high-speed connection to an FPGA typically have some connection options, like HyperBus for STM32H7, QSPI for STM32F4 and F7, or PCI Express for SoCs like i.MX processors (though it's rather pathetic at PCIe 2.0 x1, so only 5 GT/s in each direction). As for USB, most MCUs have some sort of hardware USB MAC (HW endpoint FIFOs, config registers and the like), so you don't have direct access to the PHY like you do with an FPGA, and you might be forced to use the full USB protocol instead of the hodge-podge you have in mind. That said, I obviously didn't check all MCUs out there, so there might be some which make it possible.
--------
As for availability, for prototypes and one-offs you can try your luck on AliExpress: you can buy pretty much any Artix-7 device there for relatively cheap. I've bought a bunch of various 7 series devices over the years and so far all of them have worked (there are some devices I snagged just because the price was too good to miss, like 5 Zynq-030s with Kintex-based fabric and 10G+ MGTs for about $60 delivered, but I've yet to come up with a use for them), but of course your mileage may vary. Just a bit of advice: the pricing there is all over the place, and you will see some crazy stuff like higher-end parts being cheaper than lower-end ones (K325T is cheaper than pretty much any Artix), so look at all options before buying.

BrianHG · « **Reply #9 on:** April 30, 2022, 11:15:56 pm »

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm

Is ULPI flexible enough to allow this? (I've just started studying it so I'll still have to figure this out.)

ULPI is a standard specification for the connection between the PHY and your FPGA. A ULPI PHY IC is a USB cable driver with a simple serializer/deserializer, clock recovery circuitry and simple USB arbitration, so your FPGA interface only runs at 1/2 or 1/4 of the 480 Mb/s speed (3-pin vs 6-pin mode). This means you do not need a 480 Mb/s SERDES in your FPGA. These ULPI PHY ICs are dumb apart from the SERDES plus USB status. Everything else you need to program in the FPGA, but knowing that you are receiving and transmitting your packet structures and contents as bytes.

You should have access to 100% of the USB2.x capabilities with a ULPI PHY IC.

SiliconWizard · « **Reply #10 on:** May 01, 2022, 02:50:08 am »

Quote from: BrianHG on April 30, 2022, 11:15:56 pm

Quote from: SiliconWizard on April 28, 2022, 05:44:34 pm
Is ULPI flexible enough to allow this? (I've just started studying it so I'll still have to figure this out.)
ULPI is a standard specification for the connection between the PHY and your FPGA. A ULPI PHY IC is a USB cable driver with a simple serializer/deserializer, clock recovery circuitry and simple USB arbitration, so your FPGA interface only runs at 1/2 or 1/4 of the 480 Mb/s speed (3-pin vs 6-pin mode). This means you do not need a 480 Mb/s SERDES in your FPGA. These ULPI PHY ICs are dumb apart from the SERDES plus USB status. Everything else you need to program in the FPGA, but knowing that you are receiving and transmitting your packet structures and contents as bytes.

You should have access to 100% of the USB2.x capabilities with a ULPI PHY IC.

I've now read the ULPI spec fully and there should be no problem doing what I have in mind.

ULPI-compliant PHYs actually do a little more than what you wrote above: they also handle NRZI encoding, CRC, and packet framing.
They also handle some additional functions that can be handy, like cable disconnection detection, VBUS detection (which could be used for something else), etc.
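To make the NRZI part concrete (this is what the PHY does on the wire side, so the link never sees it), here is a small Python model. USB NRZI toggles the line level for a 0 and keeps it for a 1, and a 0 is stuffed after six consecutive 1s so that long runs of 1s still produce transitions for the receiver to lock onto. Representing the J state as 1 and the function names are assumptions for illustration; a real PHY also generates SYNC/EOP and removes the stuffed bits on receive.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s (USB bit stuffing)."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            out.append(0)  # stuffed bit: forces a transition after NRZI
            ones = 0
    return out


def nrzi_encode(bits, level=1):
    """USB NRZI: a 0 toggles the line level, a 1 keeps it.

    `level` is the idle line state before the first bit (J, modelled as 1).
    """
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, level=1):
    """Inverse of nrzi_encode: unchanged level -> 1, changed level -> 0."""
    out = []
    for current in levels:
        out.append(1 if current == level else 0)
        level = current
    return out
```

Chaining `nrzi_encode(bit_stuff(bits))` gives the line levels for a bit sequence, with the stuffed 0s guaranteeing a transition at least every seven bit times.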

As I mentioned above, the only constraint you'll have is that the PHY will generate low-level USB packets: SYNC + packet ID + payload + CRC + EOP. You only need to provide the packet ID and payload; the rest is automatic. The SYNC part is 32 bytes long in HS, so to optimize throughput you'd better use large payloads. The USB standard defines 1024 bytes as the maximum for data packets, but I don't know whether the PHYs actually check the length or not. The packet ID is 1 byte, and you can write anything you want if you don't have to comply with USB. The CRC is 16 bits, and the EOP is a few bits. All in all, the SYNC preamble is what eats the most extra bandwidth, and for 1024-byte data packets the overhead is about 3%. Since the PHY handles CRC, you get a reasonable data integrity check for free.
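On the ULPI side, as I read the spec (worth double-checking against the ULPI 1.1 document and the PHY's datasheet), every transfer the link starts on DATA[7:0] begins with a TX CMD byte whose two top bits select the command: 01b for transmit, with the PID in the low four bits (all zeros meaning NOPID), 10b for a register write and 11b for a register read, each with a 6-bit register address. If I read it right, a consequence is that the transmit command only carries 4 PID bits, so a custom protocol may get 16 PID values rather than a free-form byte. The helper names are hypothetical, and the DIR/NXT/STP handshaking is not modelled.

```python
TX_CMD_TRANSMIT = 0b01 << 6   # 0x40; with the PID bits all zero this is NOPID
TX_CMD_REG_WRITE = 0b10 << 6  # 0x80
TX_CMD_REG_READ = 0b11 << 6   # 0xC0


def tx_cmd_transmit(pid: int) -> int:
    """TX CMD byte that starts a packet transmit with a 4-bit PID."""
    if not 0 <= pid <= 0xF:
        raise ValueError("the TX CMD only carries a 4-bit PID")
    return TX_CMD_TRANSMIT | pid


def tx_cmd_register(address: int, write: bool) -> int:
    """TX CMD byte for an immediate register read or write (6-bit address)."""
    if not 0 <= address <= 0x3F:
        raise ValueError("immediate register addresses are 6 bits")
    return (TX_CMD_REG_WRITE if write else TX_CMD_REG_READ) | address
```

So a DATA0-style transmit (PID 0x3) would start with `tx_cmd_transmit(0x3)`, i.e. 0x43, followed by the payload bytes as the PHY asserts NXT.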

Just a note: ULPI supports various bus widths, but the most common is 8 data bits, so it's 8 bits @ 60 MHz.

As to using MCUs with embedded USB 2 PHYs, doing the same will greatly depend on the MCU. Some have a PHY that isn't directly accessible, some have one that is. But that was just another idea to investigate. It can't be as "portable" as doing this with a separate ULPI PHY.

For higher data rates, I guess doing something similar with a USB 3 PHY in SS mode should be doable as well. Now we're talking about 5 Gbit/s. The interface requires more (and faster) IOs though, since it's a PIPE interface, but that should still be doable with modest FPGAs.

Fully implementing links in FPGAs that have fast enough transceivers is of course also on the list, but I thought this idea might be worth investigating.

As to terms, yeah. Lattice calls their high-speed transceivers SERDES, and lists their FPGAs without them as not having SERDES. It's a matter of terms. Xilinx actually has a number of variants for their high-speed transceivers, from GTP to GTX to... And yes, their higher-end FPGAs such as the Virtex series have much faster, but also more sophisticated transceivers. Another term to pin down is CDR, so let's also state what we are talking about. Even the GTP that the Artix (and I think Spartan 7?) have seem to have some kind of CDR, but as far as I can tell, it's not a full CDR as I think of it: they can't recover clock and data on their own (unless I missed something, I'm not an expert with the Xilinx 7-series), which is why the app note mentioned earlier shows a way to recover data (not the clock) using oversampling. That's not any kind of hardware CDR as I call it. But, as far as I've seen, the Virtex seem to have that. Now whether you'd get better, or at least as good performance (data integrity) relative to jitter and clock drift for a ~500Mbps link than a dedicated USB PHY, I do not know. But having read some papers about clock/data recovery and oversampling, I'm not completely sure that is the case with 4x oversampling and a relatively simple scheme. Maybe it is though. I'd be curious to see eye patterns and compare them - problem being that you can't directly get an eye pattern with the oversampling method, as you don't recover the clock.

Small edit: The SYNC preamble in data packets is 32 *bits* for USB HS, not 32 bytes. So the overhead is much less severe.
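With a 32-bit SYNC, the fixed cost per HS data packet is SYNC (32 bits) + PID (8) + CRC16 (16) + EOP (8) = 64 bits. Here is a quick Python sketch of the resulting line efficiency, also showing the worst case for bit stuffing. The inter-packet gap is left as a parameter (default 0) because it depends on the protocol and on the PHY/link turnaround, and the thread does not quantify it.

```python
SYNC_BITS = 32  # HS SYNC field
PID_BITS = 8
CRC_BITS = 16   # CRC16 on data packets
EOP_BITS = 8    # HS EOP for ordinary packets (SOF uses a longer one)


def packet_bits(payload_bytes, worst_case_stuffing=False):
    """Bits on the wire for one HS data packet, excluding inter-packet gaps."""
    stuffable = PID_BITS + 8 * payload_bytes + CRC_BITS
    # Worst case (all ones): roughly one stuffed bit for every six bits
    stuffed = stuffable // 6 if worst_case_stuffing else 0
    return SYNC_BITS + stuffable + stuffed + EOP_BITS


def efficiency(payload_bytes, worst_case_stuffing=False, gap_bits=0):
    """Fraction of the 480 Mbit/s line rate that carries payload.

    gap_bits is an assumed idle time between packets; it depends on your
    protocol and on PHY/link turnaround, so it should come from measurement.
    """
    total = packet_bits(payload_bytes, worst_case_stuffing) + gap_bits
    return 8 * payload_bytes / total


for n in (64, 512, 1024):
    print(f"{n:5d} bytes: {efficiency(n):.2%} without stuffing, "
          f"{efficiency(n, worst_case_stuffing=True):.2%} worst-case stuffing")
```

For 1024-byte payloads this gives about 99.2% without stuffing, dropping to roughly 85% for all-ones data. Multiplying by 480 Mbit/s / 8 = 60 MB/s gives an upper bound on payload throughput before any gaps are accounted for.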

asmi · « **Reply #11 on:** May 01, 2022, 04:09:15 am »

Quote from: SiliconWizard on May 01, 2022, 02:50:08 am

Even the GTP that the Artix (and I think Spartan 7?) have seem to have some kind of CDR, but as far as I can tell, it's not a full CDR as I think of it: they can't recover clock and data on their own (unless I missed something, I'm not an expert with the Xilinx 7-series), which is why the app note mentioned earlier shows a way to recover data (not the clock) using oversampling. That's not any kind of hardware CDR as I call it. But, as far as I've seen, the Virtex seem to have that. Now whether you'd get better, or at least as good performance (data integrity) relative to jitter and clock drift for a ~500Mbps link than a dedicated USB PHY, I do not know. But having read some papers about clock/data recovery and oversampling, I'm not completely sure that is the case with 4x oversampling and a relatively simple scheme. Maybe it is though. I'd be curious to see eye patterns and compare them - problem being that you can't directly get an eye pattern with the oversampling method, as you don't recover the clock.

You mixed things up again

1. ALL 7 series devices have SERDES blocks embedded into the I/O tiles. These are available on all devices across the series, and on all user pins. These SERDES do NOT have CDR, and the 4x oversampling appnote applies to these pins.
2. In addition to (1), some packages of Artix-7, and all Kintex-7 and Virtex-7 devices have Multi Gigabit Transceivers (MGT): GTP (for Artix), GTX (for Kintex) and GTH (for Virtex). These transceivers have dedicated IO pins (not shared with regular user I/O pins). Each transceiver has a transmitter and a receiver, and they are grouped by 4 (so-called "quads"), with each quad having some shared resources like dedicated clock inputs (2 per quad), a quad PLL and some other circuitry. Each specific device has a certain number of those quads bonded out (with the exception of Artix devices in the CP236/238 package, which only have 2 transceivers bonded out from a quad). The receiver in each of those transceivers contains a full clock recovery module. Because of the presence of this module, MGTs do not need any oversampling and handle clock recovery internally (provided the right reference clock signal, of course).

Once again, 4x oversampling appnote applies to REGULAR user I/O pins (and each of those pins is capable of reaching 1.25 Gbps wire speed), while all MGTs (even GTPs) have dedicated CDR modules and so don't need any oversampling.
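Here is the gist of recovering data by oversampling on regular I/O pins, as opposed to an MGT's hardware CDR. With 4 samples per bit, the level changes bunch up at one of the 4 phase positions, so you keep the samples two positions away from it. The Python sketch below shows only that phase-selection idea. It is not a model of the Xilinx app note's implementation, which generally also has to track drift between the two clocks and cope with the occasional extra or missing bit.

```python
def recover_bits(samples, osr=4):
    """Pick one sample per bit from a stream sampled osr times per bit.

    Level changes cluster at the phase where the bit edges fall; the sample
    taken osr // 2 positions later sits near the middle of the eye. Assumes
    the edge phase does not drift within the window (no drift tracking).
    """
    edges = [0] * osr
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            edges[i % osr] += 1
    edge_phase = max(range(osr), key=lambda p: edges[p])
    sample_phase = (edge_phase + osr // 2) % osr
    return samples[sample_phase::osr]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
samples = [1] + [b for b in bits for _ in range(4)]  # 4x oversampled, offset by one sample
assert recover_bits(samples) == bits
```

Depending on how the window lines up with the bit boundaries, the first recovered sample can belong to whatever preceded the window, so a real receiver still needs framing on top of this.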

DiTBho · « **Reply #12 on:** May 01, 2022, 03:45:22 pm »

what do you want to achieve exactly?
a fast link for what?

SiliconWizard · « **Reply #13 on:** May 01, 2022, 11:08:00 pm »

I'll have to take a deeper look at what the Xilinx 7 series offers. I guess from the mentioned app note, I assumed oversampling was the only way of implementing this on the Artix 7, and the little I read from the Xilinx fact sheets was not really helpful. That'll be for later. I do have 2 Artix dev boards actually, so that's something I'll probably work on at some point.

The ECP5 without SERDES (again, as *Lattice* calls it, which would be the equivalent of MGT), from what I've read, is limited to 400 MHz on its IOs. So that's a bit rough. But you wouldn't get better with many other FPGAs of this "range", or even lower end. For those, alternatives can be nice to have.

This is not an X/Y problem. I know what the alternative is if you use FPGAs with fast enough IOs or dedicated transceivers. So the idea is for when you can't use that, while using a standard solution (even if it means using a subset of it), allowing for *portability* of the approach. And portability is a pretty nice thing to have these days due to gigantic problems of availability. I was curious about whether others had already thought about it, or done it. So, the thread was mostly about that. And, if that can give ideas to some people, great. If not, that's fine too.

As also suggested initially, using a gigabit Ethernet PHY for rates in the same order (faster though obviously) could also be an option, and if anyone has experience with this, that can also be welcome. Implementing custom protocols over Ethernet seems much more common though, and it looks like it doesn't "tickle" people as much as the thought of doing this with USB, probably because USB is more "monolithic" compared to Ethernet.

BrianHG · « **Reply #14 on:** May 02, 2022, 12:24:54 am »

Quote from: SiliconWizard on May 01, 2022, 11:08:00 pm

The ECP5 without SERDES (again, as *Lattice* calls it, which would be the equivalent of MGT), from what I've read, is limited to 400 MHz on its IOs. So that's a bit rough. But you wouldn't get better with many other FPGAs of this "range", or even lower end. For those, alternatives can be nice to have.

800 Mbps for -8, 700 Mbps for -7, 624 Mbps for -6 when running its SERDES in DDRX2 mode; 500 Mbps in DDRX1 mode for all speed grades.

Remember, a 400 MHz toggle rate means an 800 Mbps SERDES capability. If not, it would be impossible to use DDR3 RAM with these FPGAs, which requires a minimum of 606 Mbps.

Page 66, 67 and 68, Table 3.22. ECP5/ECP5-5G External Switching Characteristics in the ECP5 and ECP5-5G Family data sheet. Read the 'data rates' column.
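The arithmetic behind those two statements, as a Python sketch. The 606 Mbps figure is treated here as coming from DDR3's longest allowed clock period with the DLL on (tCK(avg) max = 3.3 ns). That is an assumption about where the number comes from; the post itself does not say.

```python
def ddr_bit_rate_mbps(clock_mhz):
    """DDR I/O moves one bit per clock edge, i.e. two bits per clock period."""
    return 2 * clock_mhz


# DDR3 with the DLL enabled: tCK(avg) max = 3.3 ns -> minimum clock ~303 MHz
DDR3_MIN_MBPS = ddr_bit_rate_mbps(1000 / 3.3)

# ECP5 DDRX2 data rates per speed grade, as quoted above (Mbps)
ecp5_ddrx2 = {"-8": 800, "-7": 700, "-6": 624}

print(ddr_bit_rate_mbps(400))  # 800
print(round(DDR3_MIN_MBPS))    # 606
for grade, rate in ecp5_ddrx2.items():
    print(grade, rate, rate >= DDR3_MIN_MBPS)  # every grade clears the DDR3 floor
```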

asmi · « **Reply #15 on:** May 02, 2022, 12:43:34 am »

Quote from: BrianHG on May 02, 2022, 12:24:54 am

800 Mbps for -8, 700 Mbps for -7, 624 Mbps for -6 when running its SERDES in DDRX2 mode; 500 Mbps in DDRX1 mode for all speed grades.

Remember, a 400 MHz toggle rate means an 800 Mbps SERDES capability. If not, it would be impossible to use DDR3 RAM with these FPGAs, which requires a minimum of 606 Mbps.

Page 66, 67 and 68, Table 3.22. ECP5/ECP5-5G External Switching Characteristics in the ECP5 and ECP5-5G Family data sheet. Read the 'data rates' column.

That's still rather pathetic compared to the 7 series, which can do 950 Mbps in the slowest speed grade (-1) and 1.25 Gbps in all other speed grades. And that's on any differential pair pins, of which there are 24 pairs per I/O bank, so 48 out of 50 I/Os in the bank are capable of reaching max wire speed.

SiliconWizard · « **Reply #16 on:** May 02, 2022, 12:45:29 am »

Quote from: BrianHG on May 02, 2022, 12:24:54 am

Quote from: SiliconWizard on May 01, 2022, 11:08:00 pm
The ECP5 without SERDES (again, as *Lattice* calls it, which would be the equivalent of MGT), from what I've read, is limited to 400 MHz on its IOs. So that's a bit rough. But you wouldn't get better with many other FPGAs of this "range", or even lower end. For those, alternatives can be nice to have.
800 Mbps for -8, 700 Mbps for -7, 624 Mbps for -6 when running its SERDES in DDRX2 mode; 500 Mbps in DDRX1 mode for all speed grades.

Remember, a 400 MHz toggle rate means an 800 Mbps SERDES capability. If not, it would be impossible to use DDR3 RAM with these FPGAs, which requires a minimum of 606 Mbps.

Page 66, 67 and 68, Table 3.22. ECP5/ECP5-5G External Switching Characteristics in the ECP5 and ECP5-5G Family data sheet. Read the 'data rates' column.

OK thanks. I'll have to see what can be achieved with that using oversampling. Yet another thing to do.
The USB PHY approach would still be interesting anyway for lower-end FPGAs, such as the iCE40 stuff, and I can think of a number of applications for this.

As to availability, well, even the ECP5 that was available a few months back when others weren't, it's now become pretty much unobtainium as well...

BrianHG · « **Reply #17 on:** May 02, 2022, 12:48:27 am »

Quote from: asmi on May 02, 2022, 12:43:34 am

Quote from: BrianHG on May 02, 2022, 12:24:54 am
800 Mbps for -8, 700 Mbps for -7, 624 Mbps for -6 when running its SERDES in DDRX2 mode; 500 Mbps in DDRX1 mode for all speed grades.

Remember, a 400 MHz toggle rate means an 800 Mbps SERDES capability. If not, it would be impossible to use DDR3 RAM with these FPGAs, which requires a minimum of 606 Mbps.

Page 66, 67 and 68, Table 3.22. ECP5/ECP5-5G External Switching Characteristics in the ECP5 and ECP5-5G Family data sheet. Read the 'data rates' column.
That's still rather pathetic compared to the 7 series, which can do 950 Mbps in the slowest speed grade (-1) and 1.25 Gbps in all other speed grades. And that's on any differential pair pins, of which there are 24 pairs per I/O bank, so 48 out of 50 I/Os in the bank are capable of reaching max wire speed.

It's a bloody $6 FPGA with 25K gates and 1 Mbit of RAM, $12 for 45K gates with 2 Mbit of RAM.
This is not a Vendor A VS. Vendor B discussion. I was pointing out an error in SiliconWizard's reading of the data sheet.
And at $18/$36 respectively, you can get the same FPGA with a number of 5 Gbit SERDES ports.

BrianHG · « **Reply #18 on:** May 02, 2022, 12:57:31 am »

Quote from: SiliconWizard on May 02, 2022, 12:45:29 am

As to availability, well, even the ECP5 that was available a few months back when others weren't, it's now become pretty much unobtainium as well...

https://www.verical.com/pd/lattice-semiconductor-fpga-lfe5u-25f-6bg256c-5869565
25K gates at $8.
Though 17924 in stock, it ships from Hong Kong.

Digikey has 80 of the top-end ones with the high-speed transceivers:
https://www.digikey.com/en/products/detail/lattice-semiconductor-corporation/LFE5UM5G-85F-8BG381I/6173749?s=N4IgjCBcoLQCxVAYygMwIYBsDOBTANCAPZQDaIcArABxgCcIhlcA7AAxsgC6AvoTACZEIFJAAuAJwCuBYmRCVuPPiCGRymdGLEBLJLgAEqAA4BzdN0IA2YToAmUEDDBsIhY2MeMQYgJ7HcR3RsFGUgA

asmi · « **Reply #19 on:** May 02, 2022, 01:03:33 am »

Quote from: SiliconWizard on May 01, 2022, 11:08:00 pm

This is not an X/Y problem. I know what the alternative is if you use FPGAs with fast enough IOs or dedicated transceivers. So the idea is for when you can't use that, while using a standard solution (even if it means using a subset of it), allowing for *portability* of the approach. And portability is a pretty nice thing to have these days due to gigantic problems of availability. I was curious about whether others had already thought about it, or done it. So, the thread was mostly about that. And, if that can give ideas to some people, great. If not, that's fine too.

This is only possible if you take comparable FPGAs; otherwise you are grossly under-utilizing some of them in order to have them all perform an identical task. Just try an experiment: take some small-ish softcore (say 1-2K LUTs), implement it on ECP5, Cyclone 5 and Artix-7 at the maximum possible frequency, and you will see just how large the difference is. In one experiment I did, I took my WIP RV64 softcore, which runs at 177 MHz on an Artix-7 in speed grade -2, and just by moving it to the new Artix UltraScale+ I was able to more than double the frequency (I reached 388 MHz, if memory serves). And that was with HDL generic enough that I could re-target the code to a new family with almost no changes (AFAIR I only changed the component name for a PLL and tweaked some of its settings to set up the new target frequency). If you want to extract the absolute max performance from any FPGA, you will need to use device-specific features, which would make keeping code portable a nightmare.

So messing about with a USB PHY makes absolutely zero sense when you work with the 7 series, because it can reach higher speeds on its own, and you can have as many lanes as you want (like I said, even the cheapest and slowest S7 or A7 provide at least two full IO banks, and so 48 differential pairs). In my experience, the ability to scale the bandwidth is important, so I tend to use source-synchronous LVDS on all FPGAs, with whatever "native" IO block circuitry is available on the target device. Scaling the bandwidth of a USB PHY link would be a nightmare, while source-synchronous LVDS easily scales to many lanes (though unless you're willing to do per-lane link training, you will need to length-match all of them, so at some point routing becomes harder).

SiliconWizard · « **Reply #20 on:** May 02, 2022, 01:04:12 am »

Quote from: BrianHG on May 02, 2022, 12:57:31 am

Quote from: SiliconWizard on May 02, 2022, 12:45:29 am
As to availability, well, even the ECP5 that was available a few months back when others weren't, it's now become pretty much unobtainium as well...

https://www.verical.com/pd/lattice-semiconductor-fpga-lfe5u-25f-6bg256c-5869565
25K gates at $8.
Though 17924 in stock, it ships from Hong Kong.

Digikey has 80 of the top-end ones with the high-speed transceivers:
https://www.digikey.com/en/products/detail/lattice-semiconductor-corporation/LFE5UM5G-85F-8BG381I/6173749?s=N4IgjCBcoLQCxVAYygMwIYBsDOBTANCAPZQDaIcArABxgCcIhlcA7AAxsgC6AvoTACZEIFJAAuAJwCuBYmRCVuPPiCGRymdGLEBLJLgAEqAA4BzdN0IA2YToAmUEDDBsIhY2MeMQYgJ7HcR3RsFGUgA

Thanks for the first link, I'll have a look. I didn't know about them. Edit: Yes, uh, apparently they only have this one reference of ECP5, and nothing else.
As to Digikey, I looked a couple days ago and could not find any in stock I think. The one you linked to has 74 in stock and probably zero pretty soon...

asmi · « **Reply #21 on:** May 02, 2022, 01:05:17 am »

Quote from: BrianHG on May 02, 2022, 12:48:27 am

And at 18$/36$ respectively, you can get the same FPGA with a number of 3.25gbit serdes ports.

Only if you are willing to pay a $2k/year license fee

Oh, and you can get a relatively cheap Artix with 6.6G transceivers too

asmi · « **Reply #22 on:** May 02, 2022, 01:09:05 am »

Quote from: SiliconWizard on May 02, 2022, 01:04:12 am

Thanks for the first link, I'll have a look. I didn't know about them.
As to Digikey, I looked a couple days ago and could find any in stock I think. The one you linked to has 74 in stock and probably zero pretty soon...

These are LFE5UM, which require a paid license according to this page.

SiliconWizard · « **Reply #23 on:** May 02, 2022, 01:11:06 am »

Yes, the ECP5UM requires a paid subscription. It's not cheap. Not a huge problem if you're selling products with it. More of a problem if not...

BrianHG · « **Reply #24 on:** May 02, 2022, 01:20:59 am »

And it's gonna be at least another 2 years before stock of FPGAs begins to come back to something sorta normal. For now, we got to scrounge up whatever becomes available, or, schedule large orders and hope you get them on time.

SpacedCowboy · « **Reply #25 on:** May 02, 2022, 02:49:13 am »

If you're open to repurposing different protocols, Efinix have $10 parts with hardened MIPI CSI-2: 4 data lanes @ 1.5 Gbps each + a clock lane. Each device has 2 RX and 2 TX hard cores on it. You don't get direct access to the transceivers, unfortunately, so you'd have to package your data in a way that makes sense to the MIPI interface, but if you're looking at 480 Mbps as sufficient, then 6 Gbps could probably stand some overhead.

and "video data" can just be "data", after all ...

Downsides:

- MIPI is only available in the 0.65mm packages, not the easier-to-lay-out 0.8mm ones
- You do need to buy a license for the 'Efinity' software, but you do that by buying a $35 'xyloni' board, which gets you a license for everything, and they renew it after a year if you ask
- You'd have to package your data as video data to the hardened IP, but it doesn't look too hard - and you could test out the link by plugging it into a monitor
- Probably yet another weird FPGA to learn - they're not the most common type of FPGA around

I thought about using them myself for my FPGA<->FPGA data-link over a PCIe slot (not via cable) but I prefer 0.8mm BGAs for layout, and there's plenty of LVDS available for my own use-case.

These are cheap low-end 'Trion' FPGAs, but they're not terrible. The RISC-V CPU they offer will get you to ~110 MHz fMax, and if you want more, they've relatively recently introduced the 'Titanium' line, which pushes that up to ~350 MHz. I think the new chips are still unobtainium though, apart from in dev kits, which themselves are in short supply...

asmi · « **Reply #26 on:** May 02, 2022, 03:14:50 am »

The question I've been wondering about since the beginning of this thread: why don't you just use regular old-school source-synchronous LVDS? It's supported by pretty much any FPGA, and it's trivial to implement and scale as needed.

SpacedCowboy · « **Reply #27 on:** May 02, 2022, 04:24:39 am »

*shrug* that's my plan, which I think will be fine because I'm travelling a few inches to a PCIe slot, and maybe a few inches to another FPGA after that, all on connected (if not quite the same) PCB.

I think SiliconWizard wants to push the data out over a few meters of cable, and was concerned about clock/data sync at the end of that.

free_electron · « **Reply #28 on:** May 02, 2022, 05:31:56 am »

SATA or DisplayPort cables. Send the clock over one pair, plus one TX pair and one RX pair.

asmi · « **Reply #29 on:** May 02, 2022, 05:41:32 am »

Quote from: free_electron on May 02, 2022, 05:31:56 am

SATA or DisplayPort cables. Send the clock over one pair, plus one TX pair and one RX pair.

A SATA cable only has 2 differential pairs, and there is no requirement for SATA or DP (or USB-C) cables to have length-matched pairs, because they all use embedded clocks.

asmi · « **Reply #30 on:** May 02, 2022, 05:47:46 am »

Quote from: SpacedCowboy on May 02, 2022, 04:24:39 am

I think SiliconWizard wants to push the data out over a few meters of cable, and was concerned about clock/data sync at the end of that.

It can be done if you're using an LVDS cable with matched pairs and your data rate is not very high; otherwise you will have to use an embedded clock, and so need to do clock recovery or oversampling to extract the data asynchronously on the receiving side. In fact, if you know the setup/hold times of your FPGA on the receiving side, you can calculate what kind of length mismatch you can tolerate and what kind of wire speed you can achieve.
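That calculation, as a Python sketch: take one bit period, subtract the receiver's setup and hold window and whatever jitter you budget for, and convert what is left into a cable length mismatch. Splitting the margin in half assumes the clock edge is nominally centred in the data eye. All the numbers in the example, including the 5 ps/mm propagation delay, are hypothetical placeholders rather than figures for any particular FPGA or cable.

```python
def skew_budget_ps(bit_rate_mbps, setup_ps, hold_ps, jitter_ps=0.0):
    """Clock-to-data skew tolerable in either direction, in picoseconds.

    One unit interval minus the receiver's setup + hold window and a jitter
    allowance, split in half on the assumption that the clock edge is
    nominally centred in the data eye. Negative means no margin at all.
    """
    ui_ps = 1e6 / bit_rate_mbps
    return (ui_ps - setup_ps - hold_ps - jitter_ps) / 2


def max_length_mismatch_mm(skew_ps, delay_ps_per_mm=5.0):
    """Pair-to-pair length mismatch that uses up the skew budget.

    5 ps/mm (velocity factor ~0.66) is an assumed cable figure; use the
    propagation delay from your cable's datasheet instead.
    """
    return max(skew_ps, 0.0) / delay_ps_per_mm


# Hypothetical numbers: 500 Mbps per lane, 300 ps setup, 300 ps hold, 400 ps jitter
budget = skew_budget_ps(500, setup_ps=300, hold_ps=300, jitter_ps=400)
print(budget, max_length_mismatch_mm(budget))  # 500.0 100.0
```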

DiTBho · « **Reply #31 on:** May 02, 2022, 06:00:40 pm »

Quote from: BrianHG on May 02, 2022, 01:20:59 am

And it's gonna be at least another 2 years before stock of FPGAs begins to come back to something sorta normal.

Yup. Today I contacted the same company again about their Xilinx Spartan-7 7S25
(not the chip but their board; I don't want to, and am not able to, design and build it myself)
"Possible to order, delivery time on request" ... same response from 2020

SiliconWizard · « **Reply #32 on:** May 03, 2022, 01:41:56 am »

Quote from: asmi on May 02, 2022, 05:47:46 am

Quote from: SpacedCowboy on May 02, 2022, 04:24:39 am
I think SiliconWizard wants to push the data out over a few meters of cable, and was concerned about clock/data sync at the end of that.
It can be done if you're using an LVDS cable with matched pairs and your data rate is not very high; otherwise you will have to use an embedded clock, and so need to do clock recovery or oversampling to extract the data asynchronously on the receiving side. In fact, if you know the setup/hold times of your FPGA on the receiving side, you can calculate what kind of length mismatch you can tolerate and what kind of wire speed you can achieve.

Yes, and this is running in circles.

If you want a reasonably robust data link @ around 500 Mbps, for which typical uses are OK with the half-duplex nature (a lot more data pumped in one direction than the other), over a single twisted pair, over distances from a few mm to a few meters, using an approach that is easily portable between various models and vendors of FPGAs, then using USB PHYs (that themselves are ubiquitous, cheap and use a standard interface) doesn't look so bad to me. We're talking about $1 to $2 in small quantities, and probably under $1 in high quantities, and the ULPI interface, basically 60 MHz SDR, is easily achievable with even the smallest FPGAs around. (And as I said, there may be reasons for using small ones here, not just cost, but also power consumption, as long as the rest of your design doesn't require anything more powerful. Just compare the typical power consumption of even the smallest Artix-7, for instance, to that of an iCE40UP.)

As I already mentioned twice, using gigabit Ethernet PHYs could also be an option (albeit not quite as simple or as cheap) for more bandwidth and longer cables, but that requires more than one pair (1000BASE-T uses all four).

Of course if you don't need to combine all those requirements, or you simply have different requirements altogether, you have myriads of other options.

SiliconWizard · « **Reply #33 on:** May 03, 2022, 01:50:54 am »

Quote from: SpacedCowboy on May 02, 2022, 02:49:13 am

If you're open to repurposing different protocols, Efinix have $10 parts with hardened MIPI CSI-2: 4 data lanes @ 1.5 Gbps each + a clock lane.

The idea was not to repurpose a different bus/protocol just for the sake of it, but to be cheap and easily available and... portable over a range of parts. Also, not sure what would be the max distance for MIPI? I've never seen it used for anything much longer than flat cables a few cm long or so?

That said, Efinix parts looked interesting but I never got around to evaluating them.

SpacedCowboy · « **Reply #34 on:** May 03, 2022, 02:10:29 am »

apparently 15m - https://www.theimagingsource.com/media/blog/archive/20190603/

asmi · « **Reply #35 on:** May 03, 2022, 04:02:41 am »

Quote from: SiliconWizard on May 03, 2022, 01:41:56 am

If you want a reasonably robust data link @ around 500 Mbps, for which typical uses are OK with the half-duplex nature (a lot more data pumped in one direction than the other), over a single twisted pair, over distances from a few mm to a few meters, using an approach that is easily portable between various models and vendors of FPGAs, then using USB PHYs (that themselves are ubiquitous, cheap and use a standard interface) doesn't look so bad to me. We're talking about $1 to $2 in small quantities, and probably under $1 in high quantities, and the ULPI interface, basically 60 MHz SDR, is easily achievable with even the smallest FPGAs around. (And as I said, there may be reasons for using small ones here, not just cost, but also power consumption, as long as the rest of your design doesn't require anything more powerful. Just compare the typical power consumption of even the smallest Artix-7, for instance, to that of an iCE40UP.)

Since I'm an engineer, let's inject some practicality into the discussion - what are you going to do with a 480 Mbps data stream inside such feeble FPGAs, or how are they going to generate such a data stream (if it's the transmitting side)? Especially since you are going to spend something like a third to a half of the available IO pins of an iCE40 Ultra, depending on the package, further limiting already very limited external connectivity options. Actually, connectivity (or lack thereof) has been my biggest problem with iCE40 Ultra parts. I still have a dozen or so 5K parts in the QFN48 package that I bought years ago in the hope that at some point I would find a use for them, but so far I haven't, and they keep collecting dust. Which is why I didn't even bother buying iCE40UP parts when they showed up - even though with integrated RAM they are theoretically much more useful.
So - have you done any projects with these devices, and if so can you please give some examples for when they can be useful? Maybe I'm just blind and forcing much more expensive parts onto my customers for no good reason?

They don't seem to mind though

asmi · « **Reply #36 on:** May 03, 2022, 04:16:15 am »

Quote from: SpacedCowboy on May 03, 2022, 02:10:29 am

apparently 15m - https://www.theimagingsource.com/media/blog/archive/20190603/

If you actually read the article, it says that they use FPD-Link III for the actual transmission over cable, which is basically LVDS. With the kind of tolerances MIPI has (dV is nominally 200 mV, which is less than DDR3L!), there is no way it would work over anything longer than a fraction of a meter.

Quote

15 m Cable Thanks to FPD-Link III
Especially in the automotive sector, however, one is quickly confronted with the problem that standard ribbon cables, such as those used in smartphones between SoC and camera module, rarely allow cable lengths beyond 30 cm. Camera modules in an automotive surround-view application, for example, require cable lengths of several meters. The same often applies to industrial applications where camera modules are being installed into systems. The Flat Panel Display Link III (FPD-Link III) interface from Texas Instruments provides a solution. Designed for the transmission of high-resolution video data for automotive applications (in addition to pure data transmission), the interface offers bidirectional channels for control commands (e.g. for configuring a camera module via I2C or feedback from a touch display), as well as the option of power supply via a single coaxial cable. Such cables are thin, flexible and inexpensive - features that play a decisive role in price-sensitive market segments like the automotive industry. Two additional components are used to transmit the MIPI/CSI-2 signals via FPD-Link III: a serializer that translates from MIPI/CSI-2 to FPD-Link III and a deserializer that translates from FPD-Link III back to MIPI/CSI-2 (Ser-Des). While the serializer is placed directly on the camera module, the deserializer is located near the MIPI/CSI-2 input of the processing SoC. The FPD-Link III path is completely transparent for the user. The Imaging Source recognizes the need for longer transmission systems and now offers, together with its MIPI/CSI-2 modules, FPD-Link III bridges for common embedded systems such as NVIDIA Jetson.

SpacedCowboy · « **Reply #37 on:** May 03, 2022, 05:44:58 am »

My bad. I thought that was a brand of cable, not a transmission protocol/standard.

SiliconWizard · « **Reply #38 on:** May 03, 2022, 05:04:10 pm »

That didn't sound right indeed.

SiliconWizard · « **Reply #39 on:** May 03, 2022, 05:32:00 pm »

Quote from: asmi on May 03, 2022, 04:02:41 am

Quote from: SiliconWizard on May 03, 2022, 01:41:56 am
If you want a reasonably robust data link @ around 500 Mbps, for which typical uses are OK with the half-duplex nature (a lot more data pumped in one direction than the other), over a single twisted pair, over distances from a few mm to a few meters, using an approach that is easily portable between various models and vendors of FPGAs, then using USB PHYs (that themselves are ubiquitous, cheap and use a standard interface) doesn't look so bad to me. We're talking about $1 to $2 in small quantities, and probably under $1 in high quantities, and the ULPI interface, basically 60 MHz SDR, is easily achievable with even the smallest FPGAs around. (And as I said, there may be reasons for using small ones here, not just cost, but also power consumption, as long as the rest of your design doesn't require anything more powerful. Just compare the typical power consumption of even the smallest Artix-7, for instance, to that of an iCE40UP.)
Since I'm an engineer, let's inject some practicality into the discussion - what are you going to do with a 480 Mbps data stream inside such feeble FPGAs, or how are they going to generate such a data stream (if it's the transmitting side)?

It's just practically about 50 MBytes/s, which is really not that much. No problem whatsoever.

I have a handful of applications for which those FPGAs would be plenty, basically acting as some kind of bridge for data acquisition. One class of such applications right now is transmitting multi-channel digital audio with some custom protocol. An iCE40UP would be more than enough for this, and the power consumption is a great asset. ULPI requires 12 IOs, which leaves about 24 or so for interfacing to DACs/ADCs and some accessory functions. But if I need more IOs, I can always use the MachXO2/3 lines for similar performance and similarly low power consumption. And, as I said, for devices requiring more, I can use more powerful FPGAs. The whole system can consist of devices with varying degrees of complexity.

I'll be glad to explain what I have in mind exactly, without giving too many details though, but that wasn't the point of this thread. If some of you can't see the point, that doesn't necessarily mean there isn't any. Ah, humility again.

asmi · « **Reply #40 on:** May 03, 2022, 06:26:55 pm »

Quote from: SiliconWizard on May 03, 2022, 05:32:00 pm

It's just practically about 50 MBytes/s, which is really not that much. No problem whatsoever.

That's actually quite a bit for these devices. I would like to hear more practical examples of how to generate or handle that kind of bit rate in these devices.

Quote from: SiliconWizard on May 03, 2022, 05:32:00 pm

I have a handful of applications for which those FPGAs would be plenty, basically acting as some kind of bridge for data acquisition. One class of such applications right now is transmitting multi-channel digital audio with some custom protocol. An iCE40UP would be more than enough for this, and the power consumption is a great asset. ULPI requires 12 IOs, which leaves about 24 or so for interfacing to DACs/ADCs and some accessory functions. But if I need more IOs, I can always use the MachXO2/3 lines for similar performance and similarly low power consumption. And, as I said, for devices requiring more, I can use more powerful FPGAs. The whole system can consist of devices with varying degrees of complexity.

<Update: Fixed numbers>

A single stream of raw audio at 192 kHz @ 24 bits is only ~~768 kbit/s~~ 4.6 Mbit/s; for 480 Mbit/s you can have ~~over 600~~ over 100 channels, which is obviously not realistic. So if we take a somewhat more realistic 8 channels, that's only about ~~6~~ 36.8 Mbit/s, which is something that even an average RS485 transceiver can handle without too much fuss.

Quote from: SiliconWizard on May 03, 2022, 05:32:00 pm

I'll be glad to explain what I have in mind exactly, without giving too many details though, but that wasn't the point of this thread. If some of you can't see the point, that doesn't necessarily mean there isn't any. Ah, humility again.

I generally dislike theoretical discussions because they tend to devolve into pointless debates over spherical cows in a vacuum, which is why I always look at the subject from a practical standpoint. Sure, I can set up a high-speed channel between two iCE40 Ultras by using a bunch of their DDR primitives (I seem to recall they can go up to 500 Mbps), but what's the point of it? What kind of data can I send over such a channel, and where would I get it? These answers are the most important ones; once you have them, you will usually have a better idea of how to go about designing a communication channel.

BrianHG · « **Reply #41 on:** May 03, 2022, 08:28:10 pm »

Quote from: asmi on May 03, 2022, 06:26:55 pm

A single stream of raw audio at 192kHz@24bit is only 768 kbit/s, for 480 Mbit/s you can have over 600 channels, which is obviously not realistic. So if we take somewhat more realistic 8 channels, that's only about 6 Mbit/s, which is something that even an average RS485 transceiver can handle without too much fuss.

Please redo your math.

langwadt · « **Reply #42 on:** May 03, 2022, 08:35:28 pm »

Quote from: BrianHG on May 03, 2022, 08:28:10 pm

Quote from: asmi on May 03, 2022, 06:26:55 pm
A single stream of raw audio at 192kHz@24bit is only 768 kbit/s, for 480 Mbit/s you can have over 600 channels, which is obviously not realistic. So if we take somewhat more realistic 8 channels, that's only about 6 Mbit/s, which is something that even an average RS485 transceiver can handle without too much fuss.

Please redo your math.

only off by a factor of 6

BrianHG · « **Reply #43 on:** May 03, 2022, 08:37:26 pm »

Quote from: langwadt on May 03, 2022, 08:35:28 pm

Quote from: BrianHG on May 03, 2022, 08:28:10 pm
Quote from: asmi on May 03, 2022, 06:26:55 pm
A single stream of raw audio at 192kHz@24bit is only 768 kbit/s, for 480 Mbit/s you can have over 600 channels, which is obviously not realistic. So if we take somewhat more realistic 8 channels, that's only about 6 Mbit/s, which is something that even an average RS485 transceiver can handle without too much fuss.

Please redo your math.

only off by a factor of 6

Shhh... Let him work that out.
And with all overhead, it's higher.

asmi · « **Reply #44 on:** May 03, 2022, 08:53:53 pm »

Quote from: BrianHG on May 03, 2022, 08:37:26 pm

Shhh... Let him work that out.

192 kHz x 24 bits/sample ~ 4.6 Mbit/s, 480 / 4.6 = 104 channels - still too much. For 8 channels, 4.6 * 8 = 36.8 Mbit/s is still within reach for RS485.

Quote from: BrianHG on May 03, 2022, 08:37:26 pm

And with all overhead, it's higher.

What overhead would you have in a raw audio stream? It's just an endless stream of audio samples

BrianHG · « **Reply #45 on:** May 03, 2022, 08:59:45 pm »

Quote from: asmi on May 03, 2022, 08:53:53 pm

Quote from: BrianHG on May 03, 2022, 08:37:26 pm
Shhh... Let him work that out.
128 kHz x 24 bits/sample ~ 3 Mbit/s, 480 / 3 = 160 channels - still too much. for 8 channels 3 * 8 = 24 Mbit/s is still within reach for RS485.

Quote from: BrianHG on May 03, 2022, 08:37:26 pm
And with all overhead, it's higher.
What overhead would you have in a raw audio stream? It's just an endless stream of audio samples

Mistakes again...
192 kHz, not 128 kHz...
Also, the overhead is on the USB side. Plus, you may have checksum data with your audio, plus start and stop bits when using a serial audio connection.
There may also be line-in audio in parallel, which might need bus steering and handshaking unless you have a dedicated upstream channel.

asmi · « **Reply #46 on:** May 03, 2022, 09:15:35 pm »

Quote from: BrianHG on May 03, 2022, 08:59:45 pm

Also, the overhead is on the USB side. Plus, you may have checksum data with your audio, plus start and stop bits when using a serial audio connection.

We're looking at a fully custom channel, so no USB overhead. A checksum? Have you seen any real-time stream protocols with checksums? It's pointless. For framing you can use some "comma" sequence which cannot be found in a real audio stream (like FFFFFF/000000/FFFFFF), or add 25% overhead and use classic 8b/10b encoding.
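A minimal sketch of the two framing options mentioned here. The comma pattern FFFFFF/000000/FFFFFF is from the post; the function names and the assumption that samples travel as big-endian 24-bit words are illustrative only. The first function finds where the comma ends in a received byte stream so the receiver can align to sample boundaries; the second only applies the 10/8 line-rate expansion of 8b/10b (the encoding tables themselves are not shown). The detector only works if the pattern truly never appears in the payload, which the post assumes; with two's-complement samples FFFFFF is simply -1 LSB, so that is worth checking for your data.

```python
COMMA = bytes.fromhex("FFFFFF000000FFFFFF")  # framing pattern from the post


def find_comma(stream: bytes) -> int:
    """Return the byte offset just past the first comma, or -1 if none is found.

    Assumes the comma never occurs inside the audio payload itself.
    """
    pos = stream.find(COMMA)
    return -1 if pos < 0 else pos + len(COMMA)


def line_rate_8b10b(payload_bps: float) -> float:
    """8b/10b sends 10 line bits per 8 payload bits, i.e. a 25% rate increase."""
    return payload_bps * 10 / 8


# Two junk bytes, the comma, then the first 24-bit sample (0x123456).
rx = b"\x12\x34" + COMMA + b"\x12\x34\x56"
start = find_comma(rx)  # offset of the first aligned sample
first_sample = int.from_bytes(rx[start:start + 3], "big")

# 8 channels of 192 kHz / 24-bit audio, as discussed in the thread.
print(start, hex(first_sample), line_rate_8b10b(8 * 192_000 * 24) / 1e6)
```

At about 46 Mbit/s, 8b/10b-encoded 8-channel audio would still sit under the ~50 Mbps RS485 figure asmi cites, though with noticeably less margin.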

Quote from: BrianHG on May 03, 2022, 08:59:45 pm

There may also line in audio in parallel which might need bus steering and handshaking unless you have a dedicated up channel.

Again, we're talking about a custom channel. RS485's 50 Mbps is a lot for audio; there is quite a bit of margin even for 8 channels. And RS485 is just an example; there are plenty of mid-bandwidth options (in the 50-100 Mbps range), starting from simple pin wiggling with no external parts at all.
My math inaptitude notwithstanding, my larger question still stands - what exactly can generate or process a 480 Mbps data stream inside a 4K-LUT (actually 3520) FPGA with 4 DSP blocks and 80k of memory?

BrianHG · « **Reply #47 on:** May 03, 2022, 11:47:25 pm »

Quote from: asmi on May 03, 2022, 09:15:35 pm

Quote from: BrianHG on May 03, 2022, 08:59:45 pm
There may also line in audio in parallel which might need bus steering and handshaking unless you have a dedicated up channel.
Again, we're talking about custom channel. RS485's 50 Mbps is a lot for audio, there is quite a bit of margin even for 8 channels. And RS485 is just an example, there are plenty of mid-bandwidth options (in 50-100 Mbps range), which begins from simple pin wiggling with no external parts at all.
My math inaptitude notwithstanding, my larger question still stands - what exactly can generate or process 480 Mbps data stream inside 4K LUT (actually 3520) FPGA with 4 DSP blocks and 80k of memory?

The only FPGA I recommended was the $8 (was $5 before the chip shortage) Lattice part, of which over 17000 were in stock: a 24K LE device with 1 meg of RAM in it. This part with a ULPI USB PHY in 6-wire interface mode is large and fast enough to create a true-HD 8-channel in, 8-channel out audio interface. I would still recommend adding external RAM, as 1 meg for 16 channels of true-HD audio is a tiny 0.01 second buffer.
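The 0.01 s buffer figure can be checked with a quick calculation. This sketch assumes "1 meg" means 1 Mibit (2^20 bits) of on-chip RAM and that 16 channels of 192 kHz / 24-bit audio are stored packed, without 32-bit padding; both are assumptions about what was meant.

```python
def buffer_seconds(ram_bits: int, channels: int, sample_rate_hz: int, bits_per_sample: int) -> float:
    """How long ram_bits of storage can hold a continuous multi-channel PCM stream."""
    stream_bps = channels * sample_rate_hz * bits_per_sample
    return ram_bits / stream_bps


ONE_MBIT = 1 << 20  # assumption: "1 meg" read as 1 Mibit of on-chip RAM

t = buffer_seconds(ONE_MBIT, channels=16, sample_rate_hz=192_000, bits_per_sample=24)
print(f"{t * 1000:.1f} ms")  # roughly 14 ms, i.e. on the order of 0.01 s
```

Padding each 24-bit sample into a 32-bit word would shorten that to roughly 10.7 ms.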

asmi · « **Reply #48 on:** May 04, 2022, 02:31:53 am »

Quote from: BrianHG on May 03, 2022, 11:47:25 pm

The only FPGA I recommended was the 8$ (was 5$ before chip shortage) Lattice part where over 17000 parts of stock existed was a 24K LE with 1meg ram in it. This part with a ULPI usb phy in 6 wire interface mode is large and fast enough to create a true-HD 8 channel in, 8 channel out audio interface. I would still recommend added external ram as 1mb for 16 channel true-HD audio is a tiny 0.01 second buffer.

Well, you've got to follow the discussion. We're talking about iCE40 Ultra and UltraPlus, the largest of the former family having the stats I posted.
Oh btw - what the hell is "true-HD" when it comes to audio?

BrianHG · « **Reply #49 on:** May 04, 2022, 03:09:43 am »

Quote from: asmi on May 04, 2022, 02:31:53 am

Oh btw - what the hell is "true-HD" when it comes to audio?

Sorry, my bad. True-HD audio 'usually' refers to uncompressed 192k 24-bit audio. However, some may say 384k 24-bit or 96k 24-bit, as well as some 32-bit formats. But you are right. It can be a few different bitrates and depths, just not 44.1k 16-bit.

SiliconWizard · « **Reply #50 on:** May 05, 2022, 07:08:44 pm »

(Just a little note about the ICE40UP: the 5K version has 5280 LUTs, 120Kbits of EBR and 1024Kbits of single-port RAM, with IOs working up to 250MHz. No need to get obsessed with this example of small FPGA I gave though. This was just a possible choice, not the only considered choice.)

People interested or curious about multi-channel audio can have a look at current standards, such as AES10. But AES10 could still be extended.
As a quick example, say you have a stream of 32 channels of 32-bit, 192kHz samples: that's about 25 MBytes/s. With 64 channels, you're close to the maximum bandwidth. Add some additional data within the stream, and it goes up. Etc.

Another quick example, since I talked about data acquisition. Say you have 8-bit samples at a 50 MHz sample rate to transmit over a single pair, i.e. 400 Mbit/s of raw data. With some overhead and maybe a few additional packets, you reach the maximum bandwidth. Can you do it with an iCE40UP? Absolutely. It would take something like 10% of the total LUTs at most, actually. So a lot of room to spare for additional features.
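The two examples above can be put side by side against the 480 Mbit/s link rate. This is a rough sketch that ignores any packet or framing overhead on top of the samples. The optional 7/6 factor models the worst case of USB high-speed bit stuffing (one stuffed bit after every six consecutive 1s, done inside the PHY); typical data needs less.

```python
LINK_BPS = 480e6              # USB 2.0 high-speed signalling rate
WORST_CASE_STUFFING = 7 / 6   # all-ones data: 1 stuffed bit per 6 data bits


def link_load(payload_bps: float, stuffing: float = 1.0) -> float:
    """Fraction of the 480 Mbit/s link used by payload_bps (before framing overhead)."""
    return payload_bps * stuffing / LINK_BPS


examples = {
    "32 ch x 32 bit x 192 kHz": 32 * 32 * 192_000,
    "64 ch x 32 bit x 192 kHz": 64 * 32 * 192_000,
    "8 bit x 50 MS/s": 8 * 50_000_000,
}

for name, bps in examples.items():
    print(f"{name}: {bps / 8e6:.1f} MB/s, "
          f"{link_load(bps):.0%} of link, "
          f"{link_load(bps, WORST_CASE_STUFFING):.0%} worst case with stuffing")
```

Both the 64-channel and the 50 MS/s cases use over 80% of the raw link before any extra packets are added, which is the point being made above.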

For audio applications, getting as many samples as in the first example using an iCE40UP directly from ADCs would be challenging or impossible, but it could aggregate lower-bandwidth input streams and transmit them together on a single pair. Is it doable with an iCE40UP? Absolutely.

Lastly, nobody said you had to use the full bandwidth of the link at all times. The whole idea of portability is that you could reuse the same bus for various applications and various "nodes". A given node could be using a small FPGA and transmitting only a few MBytes/s, while other nodes could be aggregating data and transmitting it using more bandwidth, with beefier FPGAs if required, especially if there is additional processing to be done.

Chris Mr · « **Reply #51 on:** May 06, 2022, 08:52:07 am »

I read most of the posts and wondered if you had considered SPE (Single Pair Ethernet). It's full duplex and up to 1Gbs.

Maybe not as cheap as the USB but hey - super simple to get going.

SiliconWizard · « **Reply #52 on:** May 08, 2022, 12:24:33 am »

Quote from: Chris Mr on May 06, 2022, 08:52:07 am

I read most of the posts and wondered if you had considered SPE (Single Pair Ethernet). It's full duplex and up to 1Gbs.

Maybe not as cheap as the USB but hey - super simple to get going.

I did mention gigabit Ethernet as an alternative, but I don't know much about SPE yet. I'll have a look at that.

jeremy · « **Reply #53 on:** May 08, 2022, 10:04:52 am »

You might find this useful: https://github.com/cliffordwolf/PonyLink

SpacedCowboy · « **Reply #54 on:** August 16, 2022, 05:03:29 pm »

Resurrecting an old thread

Did you actually get anywhere with this? I find myself in a similar situation where I want to transmit a fair amount of bandwidth over a 1-2m or so cable. I remembered the thread, came back to it, and started to read the ULPI spec, and it does actually sound eminently doable - sync to clk60, obey the STP/NXT and direction-change requirements, prepend data with a byte to identify what to do with it, and ... that's more or less it for transmit ... Receive is similar and maybe even easier...
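A simplified, cycle-level model of the transmit side described here may help. It is a behavioural sketch, not RTL: it assumes the link already owns the bus (DIR low), ignores bus turnaround and RX CMD interruptions, and uses my reading of the ULPI spec's TX CMD encoding (bits 7:6 = 01 for transmit, bits 3:0 = the PID) and of ending a packet with one STP cycle while DATA = 00h; check both against the spec and your PHY's datasheet. The function name is hypothetical. Each list entry is what the link drives on DATA[7:0] and STP in one 60 MHz cycle, given the NXT values the PHY returns.

```python
from __future__ import annotations

TX_CMD_TRANSMIT = 0b0100_0000  # assumption: ULPI TX CMD "transmit" code in bits 7:6


def ulpi_tx_cycles(pid: int, payload: bytes, nxt: list[int]) -> list[tuple[int, int]]:
    """Return (data, stp) per clk60 cycle for one packet transmit.

    pid: 4-bit PID placed in the low nibble of the TX CMD byte.
    nxt: NXT as sampled by the link in each cycle; the byte on DATA is
         consumed by the PHY in any cycle where NXT is high.
    """
    to_send = [TX_CMD_TRANSMIT | (pid & 0x0F)] + list(payload)
    cycles = []
    idx = 0
    for n in nxt:
        if idx == len(to_send):
            cycles.append((0x00, 1))  # all bytes accepted: STP for one cycle, DATA = 00h
            return cycles
        cycles.append((to_send[idx], 0))  # hold the current byte until NXT accepts it
        if n:
            idx += 1
    raise ValueError("NXT trace too short: PHY did not accept every byte")


# PID 0x3 (DATA0) with a two-byte payload; the PHY stalls one cycle mid-packet.
trace = ulpi_tx_cycles(0x3, b"\xA5\x5A", nxt=[0, 1, 1, 0, 1, 0])
print([(hex(d), s) for d, s in trace])
```

Note that in high-speed mode the PHY itself adds SYNC and EOP and NRZI-encodes and bit-stuffs the bytes, so what goes down the cable is still USB-style signalling even with a custom payload.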

Given that the parts cost of the rest of the components is all under about $15 (and the FPGA being half of that), I'm loath to go and spend a fortune on high-speed line drivers. 480 Mbps for $1.70 or less seems pretty good...

SiliconWizard · « **Reply #55 on:** August 16, 2022, 07:05:43 pm »

I had to put this project on hold, but I'm definitely going to get back to it shortly.
Didn't see any possible showstoppers either. ULPI PHYs are pretty flexible.

Forty-Bot · « **Reply #56 on:** August 22, 2022, 05:17:59 am »

Quote from: Chris Mr on May 06, 2022, 08:52:07 am

I read most of the posts and wondered if you had considered SPE (Single Pair Ethernet). It's full duplex and up to 1Gbs.

Maybe not as cheap as the USB but hey - super simple to get going.

I think any kind of Ethernet PHY would work. OP said earlier that they only need 50 MB/s in either direction, which is well within the capabilities of 100BASE-TX. I did some brief searching, and the cheapest (non-obsolete, in-stock) transceivers right now are

$1.60 USB 2.0 https://octopart.com/usb3317c-cp-tr-microchip-24995458
$1.40 100BASE-TX https://octopart.com/lan8720a-cp-abc-microchip-75448787
$2.80 1000BASE-T https://octopart.com/vsc8531xmw-02-microchip-96774388

So there's not that much difference in price between the sub-1G transceivers. IMO Ethernet is nice because you get full duplex and it's easy to use the existing protocol and retain interoperability with other equipment (useful for development/testing). If you do end up using a custom (non-Ethernet-conforming) protocol with an Ethernet PHY, keep the following in mind:

You must start the frame with 0x55, 0x55, etc. Technically you don't have to, but often the first byte(s) of a transmission won't be encoded, and the receiver will insert 0x55 automatically.
You must insert a 12-byte IPG (that is, deassert TX_EN) every 2000 bytes. This is necessary to maintain synchronization between the phys and to prevent any internal FIFOs from overflowing. Your phy may support jumbo frames, but read the datasheet.

Similar restrictions probably apply to USB as well, but I'm not as familiar with that standard.
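The two Ethernet PHY rules above can be sketched as a simple framer over a byte-wide view of the MII/RMII transmit path (each list entry is one byte time; None means TX_EN is deasserted). The 2000-byte limit and 12-byte gap come from the post; using the standard 7 x 0x55 preamble followed by the 0xD5 start-frame delimiter is my assumption, and `frame_stream` is a hypothetical name.

```python
PREAMBLE = bytes([0x55] * 7 + [0xD5])  # assumption: standard Ethernet preamble + SFD
MAX_BURST = 2000                        # payload bytes per burst, per the post
IPG_BYTES = 12                          # inter-packet gap with TX_EN deasserted


def frame_stream(data: bytes) -> list:
    """Split a continuous byte stream into bursts a PHY will tolerate.

    Returns one entry per byte time: an int (TX_EN high, byte on TXD)
    or None (TX_EN low, part of the inter-packet gap).
    """
    out = []
    for start in range(0, len(data), MAX_BURST):
        out.extend(PREAMBLE)                       # first bytes may be lost/regenerated by the PHYs
        out.extend(data[start:start + MAX_BURST])  # up to 2000 payload bytes
        out.extend([None] * IPG_BYTES)             # idle gap keeps PHYs in sync, drains FIFOs
    return out


tx = frame_stream(bytes(4500))  # 3 bursts: 2000 + 2000 + 500 bytes
print(len(tx), tx.count(None))
```

If your PHY supports jumbo frames, MAX_BURST is the knob to revisit, after reading the datasheet as suggested above.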

SpacedCowboy · « **Reply #57 on:** August 22, 2022, 06:24:55 pm »

Using plain-old LVDS is fine for board->board connections (and I have a project on the go that does that) but I'm kind of leery about pushing raw LVDS over a 2m cable without some sort of PHY there.

I've nothing to back that up, I've never done it before, but my gut tells me that's pushing the envelope a bit. Now I'm not an EE, just an aging physicist - and perhaps I ought to be less reliant on gut feeling, but FPGA pins <--> LVDS over cable <--> FPGA pins seems ... optimistic.

Also worth noting (I don't know about @SiliconWizard) that I'm using very cheap FPGAs, in the $9 ballpark. I get 800Mbps LVDS with some SERDES attached, but I don't get high-speed transceivers, and the max capacitive load on the LVDS pins is specified as 10pF.

As for price, 100BASE-TX is going to be slower (but I take the point about duplex) and 1000BASE-T is roughly twice the cost. Again, just speaking for myself, that's significant when the rest of the components are so cheap.

In fact, my interest in it as a technique is waning, because I switched some stuff around and now I don't need to send the high-volume data over a link at all - the best link is one you don't need at all.

There'll still *be* a link, but it's sufficiently low bandwidth that the built-in USB port on the RP2040 that's booting the FPGA will do just fine.

