Author Topic: Need some advice on transferring data from LVDS to ethernet, around 60 mbit/s  (Read 21388 times)

0 Members and 1 Guest are viewing this topic.

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
As the title says: I need some advice on transferring data from LVDS to Ethernet, at around 60 Mbit/s.

It's almost 100% certain that the data will be acquired by an FPGA. But the FPGA -> Ethernet part...

It's not just a hobbyist project, and I can't give much information about it in its current state. It needs to be cost-efficient without becoming too complex. My job is just to grab data and send it over Ethernet from the FPGA. I need to strike a balance between cost and simplicity. Software licensing is a concern too: NIOS II, an RTOS... these all cost $.

FPGA -> SPI -> Raspberry Pi -> Ethernet?
FPGA + NIOS II + RTOS -> Ethernet?
...

What do you guys think?
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Making a MAC in an FPGA is quite simple so I'd attach an ethernet PHY to the FPGA directly. You can let the FPGA pump out UDP packets to a host. The Lattice Mico 32 (LM32) softcore is free to use on any FPGA and works very well so you can put your embedded software in there if there is an advantage to having a softcore.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
More detail is needed: do you only have to transfer data over point-to-point "Ethernet" wiring, i.e. generic CATx/RJ45 that's otherwise unused, or does it have to coexist with normal network traffic? That's a massive difference in terms of complexity and potential performance.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
Only point to point. As for details: I'm a student doing an internship at a small company, and my task is to transfer the data from LVDS to Ethernet, without any processing of said data.

I don't know how to make a MAC, or how to interface with RGMII. And once I have a MAC, how do I use it directly without a stack? I'm just a student; I don't have much background in these things. I know how to program in C and can do some basic things in VHDL. Creating a raw UDP frame won't be the problem, I can do that. The problem is how to get that raw UDP packet into the PHY...
« Last Edit: April 27, 2016, 05:53:44 pm by asgard20032 »
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
Of course if you want something that takes the FPGA soft-core and MAC gate design effort out of having TCP/IP and ethernet networking you should use a ZYNQ 7010 or Cyclone-V-SOC and run (for instance) LINUX on the ARM cores and use the built in ethernet MACs and the OS's networking stack.  But the PCB layout and cost will not be trivial, and you'd have to know something about embedded LINUX / RTOS on an FPGA platform.

This is currently the solution I have, but I am not sure it's the best way to do it, or whether my employer will agree to it considering the hardware cost. Since our expected production won't exceed 1000 units, software licensing is also an issue (RTOS, soft core, IP cores...).
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
There are many FPGA boards out there with 100Mb/s or 1000Mb/s ethernet PHYs attached to the FPGA directly by RGMII or sometimes also possibly SGMII.
As has been said, implementing an RGMII interface in a reasonably capable FPGA (perhaps not the very smallest and least capable ones) is fairly easy.  Formatting UDP packets in an FPGA is also fairly easy.
Your cost aversion for "not just a hobby project" is a bit concerning, as is your not already knowing the generalities of this answer. If you don't know anything in depth about an Ethernet RMII/RGMII interface, the way PHYs communicate, how to send UDP packets from an FPGA, or how to design and program FPGAs for this kind of application, it could still be a significant amount of work to climb the learning curve, even with essentially ready-to-use (for R&D) FPGA development kits and reference designs.  Even the proper PCB layout for a TQFP-packaged 25k LE FPGA isn't completely trivial, so I wouldn't assume that this is going to be solved for $50 and one day of engineering by one or two people who aren't up to speed on the technologies and tools.  Usually spending $500 for a NIOS license is the least of one's cost and effort worries on a project like this, though in this particular case it probably isn't needed, given the open cores, reference designs, and free mini-microcontroller IP out there.

Get a FPGA reference board with an RGMII interface to an ethernet PHY and use it for a prototype.  Some of the Artix, Spartan 6, and Cyclone V options might be good to look at.  I think "Arty" may have one though it might be an SGMII type (maybe it also does RGMII), I forget.

There's a new ECP5 VERSAKIT out there by Lattice that was on sale a few months ago, I don't know if it still is.

Spartan 6 and Spartan 3 series have some TQFP options since otherwise FG256 BGA packages will certainly mean you're going to spend a good bit time and money on 6L or 8L PCB development and prototyping.

BGA is not a problem. The company I work for does boards with high-pin-count BGAs on a regular basis. They also passed me a development kit with a MAX 10 with dual Ethernet, the MAX 10 FPGA development kit: https://www.altera.com/products/boards_and_kits/dev-kits/altera/max-10-fpga-development-kit.html
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Get a FPGA reference board with an RGMII interface to an ethernet PHY and use it for a prototype.  Some of the Artix, Spartan 6, and Cyclone V options might be good to look at.  I think "Arty" may have one though it might be an SGMII type (maybe it also does RGMII), I forget.

I believe the Arty only uses MII based on the PHY
https://reference.digilentinc.com/arty:refmanual#ethernet_phy

It's not clear to me that 60 Mbps will fit in a 100 Mbps network if there is any kind of contention.  It's one thing to have a short burst at that rate; it's another to sustain it for a long period of time.  And we have never discussed the length of the stream.  Is it small enough to just queue the entire stream and parcel it out at some later time, or does this have to be a continuous process?

I would look long and hard at external combined MAC/PHY chips, just to shove the design off on someone else.  Besides, they are already working... 
https://datasheets.maximintegrated.com/en/ds/78Q8430.pdf

This particular chip has a DMA setup whereby it will grab data from memory controlled by the FPGA.  The user has to implement a DMA controller but with Dual Port BlockRAM, that's kind of easy.  All of BlockRAM becomes a circular queue.
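To make the circular-queue idea concrete, here is a small C model of a BRAM-backed queue: one port pushes incoming words, the other drains them toward the MAC's DMA. This is a sketch, not vendor IP; the names and the power-of-two size (which makes the index wrap a simple mask, as it would be in fabric logic) are illustrative assumptions.

```c
#include <stdint.h>

#define QUEUE_WORDS 1024u            /* must be a power of two */

typedef struct {
    uint32_t mem[QUEUE_WORDS];       /* stands in for dual-port BlockRAM */
    uint32_t wr;                     /* free-running write pointer (producer port) */
    uint32_t rd;                     /* free-running read pointer (consumer port)  */
} bram_queue;

static int queue_push(bram_queue *q, uint32_t word)
{
    if (q->wr - q->rd == QUEUE_WORDS)
        return 0;                    /* full: the DMA side fell behind */
    q->mem[q->wr++ & (QUEUE_WORDS - 1)] = word;
    return 1;
}

static int queue_pop(bram_queue *q, uint32_t *word)
{
    if (q->wr == q->rd)
        return 0;                    /* empty */
    *word = q->mem[q->rd++ & (QUEUE_WORDS - 1)];
    return 1;
}
```

Because the pointers are free-running unsigned counters, full/empty tests stay correct across wraparound without a separate count register.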

I have no idea how much effort it will be to implement the IP stack in VHDL or how much more effort will be required to implement TCP.  Using UDP would be application dependent.  Are missing or out-of-sequence packets allowed?

I guess as a first cut, before trying to design a board, I would look around at available development boards, like the Arty, and see if they could be used for proof of concept.  The Arty, when purchased from Digilent (and probably Xilinx), comes with a node-locked license for the Vivado Design Suite and allows the use of some (but by no means all) Xilinx IP.  There is the MicroBlaze processor if a soft core is desired, and there is certainly the ethernet stuff.  For proof of concept, I would use every bit of IP I could get!  I have no idea what it costs to license the stuff for production.

FPGAs aren't cheap!  Quantity 1, the Artix used in Arty runs about $35.  I can buy the complete Raspberry Pi, ready to rock and roll, for the same price.  If Linux can shovel the data fast enough, and the OP says it is his current solution (but he didn't say it was working), there is no point in mucking around with FPGAs.  However, there is the long boot time of Linux to consider.

The Digilent Zybo board (Zynq based) is a really nice way to do this project.  With the dual ARM cores having the Ethernet PHY connected means that networking will not be done in the FPGA.  Any protocol stack can be used.  lwIP comes to mind...  There is enough FPGA capability to receive the serial stream and bundle it up for the ARM cores to deal with.  At $189 per board, I doubt that this will be a final product.

http://store.digilentinc.com/zybo-zynq-7000-arm-fpga-soc-trainer-board/
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
It will be a sustained, continuous data stream. I don't know about out-of-order delivery, or whether it's possible to keep only the most recent packet. I believe that by using some sort of jumbo frame I should be able to send each data set as a whole packet, with no fragmentation, so maybe out-of-order delivery will be less of an issue. The packet size will be around 5 kilobits. I don't think a Raspberry Pi will be able to grab data fast enough over SPI and send it back out over ethernet. Max Raspberry Pi SPI clock: 125 MHz; max usable SPI clock: 62.5 MHz. Then add driver overhead + processing the data (sending it to ethernet) + USB driver overhead (since its ethernet is on the USB bus) + the fact that it's 10/100 ethernet...
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
I'm working on a similar project where data from an ADC is transmitted over ethernet. I chose not to use jumbo frames because not all switches support them. Each packet has a sequence number so packet loss can be detected.
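The receive side of that sequence-number scheme can be sketched in a few lines of C. The field layout (a 32-bit counter at the start of each payload) is an assumption for illustration, not nctnico's actual protocol; unsigned arithmetic makes the gap count wraparound-safe.

```c
#include <stdint.h>

typedef struct {
    uint32_t expected;   /* next sequence number we hope to see */
    uint64_t lost;       /* running count of dropped packets    */
    int      primed;     /* first packet just sets the baseline */
} seq_tracker;

/* Feed the sequence number from each received packet, in arrival order. */
static void seq_feed(seq_tracker *t, uint32_t seq)
{
    if (!t->primed)
        t->primed = 1;
    else
        t->lost += (uint32_t)(seq - t->expected);  /* 0 when in order */
    t->expected = seq + 1;
}
```

Feeding 0, 1, 2, 5 reports two lost packets (3 and 4); a reordered packet would show up as a huge gap, so a real receiver would also want a plausibility window.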
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline stmdude

  • Frequent Contributor
  • **
  • Posts: 479
  • Country: se
I haven't worked with it myself, but it might be worth looking into whether it's possible to use the PRU on the CPU in the Beaglebone boards (AM3358).

You'll have to convert the LVDS signals to single-ended first, but after that the PRU _should_ be able to sample them at 200 MHz each. The AM3358 also has a gigabit ethernet connection and runs Linux, so the networking should be a breeze once you get the data into RAM.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
One other thing to think about.

Data  => buffer => packet => wrap in ethernet + UDP/IP Headers + Trailers => Add CRC Checksum => MII => Ethernet PHY

As you won't be running a full IP stack, you will need some form of user interface to set the source and target MACs and IPs. Either that, or a common approach is to wait until you receive a control packet (broadcast to the subnet) and take the values from that.

Yep, UDP is not guaranteed delivery, so you might want to think about mitigating that with FEC, or maybe simply overlapping packets and stamping them with a sequence number so you can detect dropped packets.

It could be prototyped (if not implemented) on any board with an Ethernet PHY, where you can find the datasheets for the PHY.
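The "wrap in ethernet + UDP/IP headers" stage of the pipeline above can be modeled in C, much as a VHDL state machine would emit the same bytes. All MACs, IPs, and ports here are invented for illustration; the UDP checksum is left at zero, which IPv4 legitimately permits, and the Ethernet FCS and MII stages are not shown.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* RFC 1071 one's-complement checksum over the IPv4 header. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* fold carries */
    return (uint16_t)~sum;
}

/* Builds Ethernet + IPv4 + UDP + payload into out[]; returns total bytes. */
static size_t build_udp_frame(uint8_t *out,
                              const uint8_t dst_mac[6], const uint8_t src_mac[6],
                              uint32_t src_ip, uint32_t dst_ip,
                              uint16_t src_port, uint16_t dst_port,
                              const uint8_t *payload, uint16_t plen)
{
    uint8_t *p = out;

    /* --- Ethernet header (14 bytes) --- */
    memcpy(p, dst_mac, 6);  p += 6;
    memcpy(p, src_mac, 6);  p += 6;
    *p++ = 0x08; *p++ = 0x00;                 /* EtherType: IPv4 */

    /* --- IPv4 header (20 bytes, no options) --- */
    uint8_t *ip = p;
    uint16_t ip_len = 20 + 8 + plen;
    *p++ = 0x45;                              /* version 4, IHL 5 */
    *p++ = 0x00;                              /* DSCP/ECN         */
    *p++ = ip_len >> 8; *p++ = ip_len & 0xff;
    *p++ = 0; *p++ = 0;                       /* identification   */
    *p++ = 0x40; *p++ = 0;                    /* don't fragment   */
    *p++ = 64;                                /* TTL              */
    *p++ = 17;                                /* protocol: UDP    */
    *p++ = 0; *p++ = 0;                       /* checksum (below) */
    *p++ = src_ip >> 24; *p++ = src_ip >> 16; *p++ = src_ip >> 8; *p++ = src_ip;
    *p++ = dst_ip >> 24; *p++ = dst_ip >> 16; *p++ = dst_ip >> 8; *p++ = dst_ip;
    uint16_t csum = ip_checksum(ip, 20);
    ip[10] = csum >> 8; ip[11] = csum & 0xff;

    /* --- UDP header (8 bytes) --- */
    uint16_t udp_len = 8 + plen;
    *p++ = src_port >> 8; *p++ = src_port & 0xff;
    *p++ = dst_port >> 8; *p++ = dst_port & 0xff;
    *p++ = udp_len >> 8;  *p++ = udp_len & 0xff;
    *p++ = 0; *p++ = 0;                       /* UDP checksum 0 = unused (IPv4) */

    memcpy(p, payload, plen);
    return 14 + 20 + 8 + (size_t)plen;
}
```

Recomputing the checksum over the finished IP header yields 0, which is the usual self-check; in an FPGA the same bytes would stream out of a ROM/BRAM with only the sequence-dependent fields patched per packet.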
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
It will be a sustained, continuous data stream. I don't know about out-of-order delivery, or whether it's possible to keep only the most recent packet. I believe that by using some sort of jumbo frame I should be able to send each data set as a whole packet, with no fragmentation, so maybe out-of-order delivery will be less of an issue. The packet size will be around 5 kilobits. I don't think a Raspberry Pi will be able to grab data fast enough over SPI and send it back out over ethernet. Max Raspberry Pi SPI clock: 125 MHz; max usable SPI clock: 62.5 MHz. Then add driver overhead + processing the data (sending it to ethernet) + USB driver overhead (since its ethernet is on the USB bus) + the fact that it's 10/100 ethernet...

I never really expected that the Raspberry Pi could handle the data rate.  Even if you wanted to use the GPIO pins, it would seem dubious.
If I had to have an all-hardware solution, I would look at the Zybo board I linked above.  If I could cheat and use USB 3.x between the attachment and a PC while allowing the PC to handle the TCP/IP stuff, that's something I would look at.  USB 3.x can easily handle the 60 Mbps if the PC can do the rest.

That Zynq chip is $63 qty 1 and that's just for the FPGA.  All the stuff around the outside will add to the product cost.

I like the SoC approach because it is a lot easier to write a protocol stack in C rather than VHDL.  The FPGA would just grab the stream and stuff it in the DMA channel for the ARM processors to deal with.  I might use one core to assemble the packets and the other core to send them over the wire.  I would expect to have to use TCP and the Zybo board has a 1Gb Ethernet interface.  That's good because I seriously doubt if the stream will work over a 100 Mb network.  Dual 650 MHz ARM cores would be a good thing!

Once I had the setup working, I could look into reducing the parts cost but I don't think it will ever be 'cheap'.

I certainly don't have exposure to all of the development boards available.  In my little corner of the world, that Zynq board will be my next purchase and is about as high-end as I will ever need to go.

I did just download the Zynq book and from what I read, this is an ideal architecture for the project.  You have to provide an email address and the usual employer/job function nonsense but it's otherwise free:
http://store.digilentinc.com/the-zynq-book-tutorials-for-zybo-and-zedboard/

I didn't realize that the ARM processors were hard cores.  They exist, independent of what is programmed into the FPGA fabric.  One example in the book is the idea of embedding multiple MicroBlaze soft cores to deal with burdensome IO processing to peripherals that are built into the FPGA.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
One other thing to think about.

Data  => buffer => packet => wrap in ethernet + UDP/IP Headers + Trailers => Add CRC Checksum => MII => Ethernet PHY

As you won't be running a full IP stack, you will need some form of user interface to set the source and target MACs and IPs. Either that, or a common approach is to wait until you receive a control packet (broadcast to the subnet) and take the values from that.

Yep, UDP is not guaranteed delivery, so you might want to think about mitigating this with FEC, or maybe simply overlapping packets and stamping with a sequence number to allow you to detect dropped packets.

It could be prototyped (if not implemented) on any board with an Ethernet PHY, where you can find the datasheets for the PHY.

Can't I just write the broadcast MAC and broadcast IP? And what is FEC?

My biggest concern is the "Add CRC Checksum => MII => Ethernet PHY" part.
 

Offline stmdude

  • Frequent Contributor
  • **
  • Posts: 479
  • Country: se
Hmm.. there are a few assumptions being made here and there in this thread..

1. Is this LVDS signal carrying something like a picture? As in a display?
 - Why I'm asking: streaming video can tolerate packet drops, and retransmits are pointless, as the data would arrive too late to matter anyway. Other kinds of data might not be as forgiving.
2. You've said "ethernet", and implied "IP" by referring to UDP. However, is IP _actually_ needed?
 - Why I'm asking: if the data doesn't need to traverse (be routed between) two subnets, you can define your own "EtherType" and just cram whatever data you want into the packet after the 14-byte MAC header. That simplifies the network stack immensely and eliminates the need for IP addresses, etc.

Also, keep in mind what nctnico said about jumbo frames. Not all ethernet switches support them, and the ones that do usually need jumbo frames manually enabled on the switch port.
And even with jumbo frames the MTU is "only" 9000 bytes, so depending on your data you might still need to chop it up into MTU-sized packets.
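The raw-EtherType idea is even simpler to sketch than the UDP case: 14 bytes of header, then data. 0x88B5 is one of the IEEE "local experimental" EtherTypes, so it is a reasonable pick for a closed point-to-point link; the MACs and payload here are, as before, illustrative.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define ETHERTYPE_EXPERIMENTAL 0x88B5  /* IEEE Std 802 local experimental */

/* Builds dst MAC + src MAC + EtherType + payload; returns total bytes.
 * The FCS is appended by the MAC hardware and is not included here.   */
static size_t build_raw_frame(uint8_t *out,
                              const uint8_t dst_mac[6], const uint8_t src_mac[6],
                              const uint8_t *payload, size_t plen)
{
    memcpy(out, dst_mac, 6);
    memcpy(out + 6, src_mac, 6);
    out[12] = ETHERTYPE_EXPERIMENTAL >> 8;
    out[13] = ETHERTYPE_EXPERIMENTAL & 0xff;
    memcpy(out + 14, payload, plen);   /* data starts right after the header */
    return 14 + plen;
}
```

The trade-off is exactly as stated above: no routing, no standard sockets on the PC side (the receiver needs a raw socket or pcap), but almost nothing to implement in the FPGA.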
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3785
  • Country: de

I never really expected that the Raspberry Pi could handle the data rate.  Even if you wanted to use the GPIO pins, it would seem dubious.


Also don't forget that the RasPi doesn't have a proper "native" ethernet, but a MAC/PHY hanging off USB, because the Broadcom SoC doesn't have a built-in ethernet MAC. It can nominally run at 100 Mbps, but the throughput is relatively poor as a result.
 

Offline stmdude

  • Frequent Contributor
  • **
  • Posts: 479
  • Country: se
Can't i just write broadcast MAC and broadcast IP? And what is FEC?

FEC = Forward Error Correction

Regarding the broadcast MAC: in theory, you could. However, that would not be routable between subnets, it would generate _enormous_ amounts of traffic, and it would put you on the "kill on sight" list of any network admin.

Imagine a network switch in a company. Typically people are connected to a 48-port switch, and they're fairly full. Let's say 40 ports are in use on this switch, and for simplicity's sake assume this switch isn't connected to any other switches (which would make the problem even bigger).

If you send out 60 Mbps of data to the broadcast address FF:FF:FF:FF:FF:FF, that data will end up on _all ports_ of the switch, i.e. 60 Mbps x 39 ports. That's 2.34 Gbps of traffic for the switch. And all the other 39 devices on the switch (other people's PCs) will each have to look at 60 Mbps of traffic and discard every single packet, but not before actually inspecting it, since the packet is indeed addressed to each and every computer.
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3785
  • Country: de
Can't i just write broadcast MAC and broadcast IP? And what is FEC?

Well, you can. But then run for the hills, fearing the wrath of your local IT guy, if you unleash something like that on the network. It will completely saturate the entire network segment, because the switches will pass the broadcasts to every port/attached device, flooding the network and pretty much drowning out all other traffic. Ethernet doesn't have a concept of "fair access", so if you "hose" every device with a sustained 60 Mbps of traffic regardless of whether it is really for them, it will pretty much slow things to a crawl.

I think you should first do a bit of research on how the full network stack actually works ...

 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
One other thing to think about.

Data  => buffer => packet => wrap in ethernet + UDP/IP Headers + Trailers => Add CRC Checksum => MII => Ethernet PHY

As you won't be running a full IP stack, you will need some form of user interface to set the source and target MACs and IPs. Either that, or a common approach is to wait until you receive a control packet (broadcast to the subnet) and take the values from that.

Yep, UDP is not guaranteed delivery, so you might want to think about mitigating this with FEC, or maybe simply overlapping packets and stamping with a sequence number to allow you to detect dropped packets.

It could be prototyped (if not implemented) on any board with an Ethernet PHY, where you can find the datasheets for the PHY.

Can't I just write the broadcast MAC and broadcast IP? And what is FEC?

My biggest concern is the "Add CRC Checksum => MII => Ethernet PHY" part.

CRC shouldn't be a problem - I've got a sample of it here:

http://hamsterworks.co.nz/mediawiki/index.php/10BaseT-TX
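For readers without a VHDL toolchain handy, the same FCS computation can be modeled bit-serially in C: Ethernet's CRC-32 with the reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF. This is a software sketch of the algorithm, not a copy of the linked VHDL.

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-serial CRC-32 as used for the Ethernet FCS; an FPGA version does
 * the inner loop one bit per clock (or several bits per clock unrolled). */
static uint32_t ether_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                       /* bytes enter LSB-first */
        for (int b = 0; b < 8; b++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;   /* transmit after the frame, least-significant byte first */
}
```

The standard sanity check: the CRC of the ASCII string "123456789" is 0xCBF43926.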

Can't say about MII... but I feel a project coming on!

An easy example of Forward Error Correction (FEC):

Divide data packets (A through H) in half (A0, A1, B0, B1, ...)

Transmit with a sequence number:

0: A0 --
1: A1 A0
2: B0 A1
3: B1 B0
4: C0 B1
......

If a packet is dropped, you have the data in the previous and next packet.

Pros: Simple, can work with 50% packet loss. Loss of two neighboring packets only drops half a block of data.
Cons: 100% overhead.

A more complex scheme

Subdivide your data packets (A through H) into quarters (A0, A1, A2, A3, B0, B1....). Calculate parity blocks as  (AP, BP, CP)... as AP = A0^A1^A2^A3.

Transmit four of these blocks with a sequence number:
 0: A0 -- -- -- --
 1: B0 A1 -- -- --
 2: C0 B1 A2 -- --
 3: D0 C1 B2 A3 --
 4: E0  D1 C2 B3 AP
 5: F0  E1 D2 C3 BP
 6: G0  F1 E2 D3 CP
 7: H0 G1  F2 E3 DP
 8: --  H1 G2  F3 EP
............

On the receiving end, you insert any missing packets and recreate the lost data through a parity calculation from the blocks on either side.

Pros: 25% overhead, easy(ish) to implement and decode
Cons: two dropped packets within a block of four causes a big window of lost data.

Still, it should be a truckload better than TCP/IP, where a dropped packet will stall the stream until it is retransmitted.
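The parity scheme above reduces to byte-wise XOR: AP = A0 ^ A1 ^ A2 ^ A3, and any single missing block of a group is recovered by XORing the survivors with the parity. A minimal C sketch (BLOCK_BYTES is arbitrary; real block sizes would match the packet payload):

```c
#include <stdint.h>

#define BLOCK_BYTES 16

/* AP = A0 ^ A1 ^ A2 ^ A3, byte-wise. */
static void parity_block(const uint8_t d[4][BLOCK_BYTES], uint8_t p[BLOCK_BYTES])
{
    for (int i = 0; i < BLOCK_BYTES; i++)
        p[i] = d[0][i] ^ d[1][i] ^ d[2][i] ^ d[3][i];
}

/* Recover block `lost` (0..3) from the other three plus the parity. */
static void recover_block(const uint8_t d[4][BLOCK_BYTES],
                          const uint8_t p[BLOCK_BYTES],
                          int lost, uint8_t out[BLOCK_BYTES])
{
    for (int i = 0; i < BLOCK_BYTES; i++) {
        uint8_t x = p[i];
        for (int j = 0; j < 4; j++)
            if (j != lost)
                x ^= d[j][i];      /* parity minus survivors = lost block */
        out[i] = x;
    }
}
```

As noted in the cons: two losses within one group of four defeat the single parity block, which is why the transmit schedule staggers the quarters across successive packets.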
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
As I already said earlier, it's a point-to-point network: the device is connected directly to the other computer through ethernet. No switch, no router. Although I will ask my boss about that tomorrow; maybe there could be a switch... but as far as I know, there will be none.
« Last Edit: April 27, 2016, 10:22:36 pm by asgard20032 »
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Point to point saves a bit of effort but since we now know the destination is a PC and the PC will already have a protocol stack, it would be better to make up standard packets.  Whether TCP or UDP is a matter of some concern but either should work.

There is a lot of example code for the MAX 10, but I don't know how much of the IP is free or whether any of it is even used.  I just noticed that there is a webserver project, and that implies a lot about the provided code.


 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Point to point saves a bit of effort but since we now know the destination is a PC and the PC will already have a protocol stack, it would be better to make up standard packets.  Whether TCP or UDP is a matter of some concern but either should work.

There is a lot of example code for the MAX 10 but I don't know how much of the IP is free or if any is even used.  I just noticed that there was a webserver project and that implies a lot of things about the provided code.

I would be slightly concerned that having a soft processor in the path may slow things down... Here is a similar project that hit issues: http://www.alteraforum.com/forum/showthread.php?t=35365

And having a look at https://www.altera.com/support/support-resources/design-examples/intellectual-property/embedded/nios-ii/exm-ethernet-acceleration.html and the README in the project archive.

The throughput was 100 MB in 50 s => approx 16 Mb/s, over a 100 Mb/s full-duplex link - about a quarter of what you want.



Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
I just ran a speed test at http://speedtest.xfinity.com/ and my desktop will transfer right at 300 Mbps over the routed Internet with my Xfinity service.  Clearly, I am not using a 100 Mbps internal network within the house.  So, there is nothing magical about your speed requirement, I can run 5 times that fast with no special hardware other than a dual core 2.8 GHz processor.

You could certainly write an FPGA state machine to wrap and send the packets at a fairly fast clip, but most of the FPGAs I mess around with have a 100 MHz crystal on the board.  Designing logic that runs faster than this adds a lot of complexity: everything has to be broken up into a gigantic pipeline such that each stage does very little and every stage is registered.  You can pull apart the C code for a stack like lwIP (LightWeight IP) and do in hardware what the stack is doing in C, but you have to come up with a way to do it in small pipelined stages.  Furthermore, the wider your word width, the less need you have for speed.  As you shift the serial bits in, consider stuffing them into a 64-bit word.  As you pack words into packets, you then only need to handle about a million words per second: one word per microsecond, or something under 100 clocks at the 100 MHz crystal speed.
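The width-for-speed trade can be modeled in C: shift the serial LVDS bits into a 64-bit word so downstream packet logic handles ~1 Mword/s instead of ~60 Mbit/s. MSB-first ordering here is an assumption; the real serial format dictates it.

```c
#include <stdint.h>

typedef struct {
    uint64_t shreg;     /* shift register, as it would exist in fabric */
    int      nbits;     /* bits accumulated so far                     */
} deserializer;

/* Feed one bit; returns 1 and fills *word when 64 bits are ready. */
static int deser_feed(deserializer *d, int bit, uint64_t *word)
{
    d->shreg = d->shreg << 1 | (uint64_t)(bit & 1);
    if (++d->nbits == 64) {
        *word = d->shreg;
        d->nbits = 0;
        return 1;
    }
    return 0;
}
```

In hardware this is one LUT-cheap shift register plus a 6-bit counter; the `return 1` path corresponds to a single-cycle "word valid" strobe into the BRAM queue.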

My guess is you would use UDP because you don't want to handle ACKs.  Just build up the packet and send it to the MAC.  The board you have has a 1 Gbit ethernet PHY, for which you need to find a compatible MAC - maybe something at OpenCores.  The PHY is the Marvell 88E1111: http://www.marvell.com/transceivers/assets/Marvell-Alaska-Ultra-88E1111-GbE.pdf.  There is bound to be something in the software kit that comes with the board.  OTOH, just because the board has a Gb PHY doesn't mean it actually works at that speed.  The PHY takes data 4 bits wide, so achieving 1 Gbit is a little more complicated; they should have chosen 8 bits (just my wild guess, I haven't studied the issue!).  The OpenCores Triple Mode MAC expects an 8-bit data bus (looking briefly at the documentation), and this narrow bus might complicate things:
http://opencores.org/project,ethernet_tri_mode

This needs to be looked at.  It is clearly solvable by spinning your own PCB but that's a LOT of work and not worth doing until a model is working. But it could also be that the example code for the board has already solved it.

 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
The Digilent Zybo board (Zynq based) is a really nice way to do this project.  With the dual ARM cores having the Ethernet PHY connected means that networking will not be done in the FPGA.  Any protocol stack can be used.  lwIP comes to mind...  There is enough FPGA capability to receive the serial stream and bundle it up for the ARM cores to deal with.  At $189 per board, I doubt that this will be a final product.

http://store.digilentinc.com/zybo-zynq-7000-arm-fpga-soc-trainer-board/

That Zynq chip is $63 qty 1 and that's just for the FPGA.  All the stuff around the outside will add to the product cost.

I find that chip awesome with everything it packs. I've got a Zybo, but unfortunately haven't found the time to really play with it much yet :(

There's another board that's in development right now that could be a good solution for a small series / internal product being about 3x cheaper than a Zybo:
https://www.crowdsupply.com/krtkl/snickerdoodle
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Spurred on by this thread, I've picked up an FPGA board for the first time in ages and started working on some code to talk directly to an MII interface. The board I'm using is a Digilent Arty (Artix-7 with a TI DP83848J 10/100 Ethernet PHY on it). The design itself will be tiny: just a block RAM holding the data packet, the packet CRC calculator, and the logic to send the nibbles to the PHY.

The H/W should easily be able to hit about 90% of the link bandwidth (90 Mb/s) - well over what the OP needs.

After reading datasheets and tinkering last night, I'm just about ready to bring the PHY out of reset and start sending UDP packets. I'll let you know how I get on over the weekend. I don't really have any plans other than to send a packet every 40 ms or so, as I don't want to hose the house's wireless (it will be cabled to my ADSL router).
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
My current solution is to use a Cyclone V SE SoC from Altera, but it's an expensive one. I will sample all the channels, put the result in shared memory, and let Linux do the heavy network lifting. But I'm pretty sure my boss would be happier if I could do it with a small FPGA that costs far less. So once I'm done with that solution, I will try writing directly to the PHY so I can use a cheaper part. Doing it with a SoC should take me less than a week once I understand how it works. So keep reporting how you get on, hamster_nz; it could help me with my own implementation afterward.

Unrelated to my current work project, I intend to buy some FPGA boards soon. I already own a Papilio Pro, Papilio Duo, and miniSpartan6+, although I've never really had the time to touch them. But I want to get good at FPGAs. Those are nice boards to put in a project, but they are not so great for learning. I was thinking about getting 1-3 real development boards, like those from Digilent and Terasic, but I'm not sure yet which ones to pick. I will get a SoC development board for sure: DE1-SoC, DE0-Nano-SoC, or Zybo. Those are nice for learning about the Linux + FPGA combo. But I will also need a non-SoC (so more FPGA) board with lots of peripherals (media codecs, ethernet, whatever else is interesting, so I can experiment), like the DE2-115 (but it costs so much). The DE0-Nano is also interesting, since it's small and not so expensive. Terasic offers a nice student discount, but unfortunately I live in Canada, so shipping + import fees will kill me. Digilent has the same problem.
« Last Edit: April 29, 2016, 11:09:07 am by asgard20032 »
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Spurred on by this thread, I've picked up an FPGA board for the first time in ages and started working on some code to talk directly to an MII interface. The board I'm using is a Digilent Arty (Artix-7 with a TI DP83848J 10/100 Ethernet PHY on it). The design itself will be tiny: just a block RAM holding the data packet, the packet CRC calculator, and the logic to send the nibbles to the PHY.

The H/W should easily be able to hit about 90% of the link bandwidth (90 Mb/s) - well over what the OP needs.

After reading datasheets and tinkering last night, I'm just about ready to bring the PHY out of reset and start sending UDP packets. I'll let you know how I get on over the weekend. I don't really have any plans other than to send a packet every 40 ms or so, as I don't want to hose the house's wireless (it will be cabled to my ADSL router).

I'd be very interested in your results (and your code, if it's available).  I looked at the LVDS input problem and it could be a challenge.  The first issue is whether or not one of the signals is a clock.  Without a clock, the stream is asynchronous, and sampling has to be done with some tricky embedded hardware and a clock generator.  The thing is, both approaches I looked at use IDELAY elements, and the maximum tap is only out around 3 ns.  That approach works for Gb inputs but not low Mb.  There's a lot of information at Xilinx about how to grab these LVDS signals but, again, most of it seems to be for much faster signals.  More study needs to be done.

With a clock in the datastream, things might get a little easier, but we still have the problem of crossing into the FPGA clock domain.  Usually this is just 2 or 3 D-flops in series, so it might not be too bad.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
I'd be very interested in your results (and your code, if it's available).  I looked at the LVDS input problem and it could be a challenge.  The first issue is whether or not one of the signals is a clock.  Without a clock, the stream is asynchronous, and sampling has to be done with some tricky embedded hardware and a clock generator.  The thing is, both approaches I looked at use IDELAY elements, and the maximum tap is only out around 3 ns.  That approach works for Gb inputs but not low Mb.  There's a lot of information at Xilinx about how to grab these LVDS signals but, again, most of it seems to be for much faster signals.  More study needs to be done.

With a clock in the datastream, things might get a little easier, but we still have the problem of crossing into the FPGA clock domain.  Usually this is just 2 or 3 D-flops in series, so it might not be too bad.

Once working I'll slap it on my Wiki.

The input side shouldn't be too hard, even without a clock. If there were a clock to follow I would oversample with a SERDES block at something close to 330 Mb/s, and then subsample every five or six bits. You can then nudge the sample point to keep transitions away from where you are sampling (e.g. ...5,6,5,6,5,6,5,6,5,6... if the source clock perfectly matches the reference, or ...5,6,5,5,6,5,6,5... if the source clock is running a little faster).
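The subsampling idea can be sketched as a toy Python model (purely illustrative; the real logic would live in the SERDES and fabric, and the 5.5x ratio here is an assumption matching a ~330 Mb/s sample rate on a ~60 Mb/s stream):

```python
# Toy model of oversample-and-subsample bit recovery: the serdes captures
# the LVDS line at ~5.5x the bit rate, and we take every 5th or 6th raw
# sample (alternating) to land one sample per transmitted bit, near the
# middle of each bit cell.

def oversample(bits, num=11, den=2):
    """Repeat each bit 5.5x on average (11 samples per 2 bits)."""
    samples = []
    for i, b in enumerate(bits):
        start = (i * num) // den          # bit i occupies [i*5.5, (i+1)*5.5)
        end = ((i + 1) * num) // den
        samples.extend([b] * (end - start))
    return samples

def subsample(samples):
    """Recover bits by stepping 5,6,5,6,... through the raw samples."""
    bits, pos, step = [], 2, 0            # start near the middle of bit 0
    while pos < len(samples):
        bits.append(samples[pos])
        pos += 5 if step % 2 == 0 else 6  # average step = 5.5 samples
        step += 1
    return bits

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
print(subsample(oversample(data)) == data)   # True
```

In real hardware the step sequence would be adjusted on the fly, based on where transitions are observed relative to the sample point.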
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Well that wasn't hard - sending hand crafted packets through the DP83848J PHY at 100Mb/s on the Digilent Arty FPGA board, using the MII.

http://hamsterworks.co.nz/mediawiki/index.php/ArtyEthernet

I'll keep correcting the data packet (mainly adding the checksums) over the next few days.

Resource usage is 22 slices and half a BRAM block
« Last Edit: April 29, 2016, 11:35:44 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline Rasz

  • Super Contributor
  • ***
  • Posts: 2616
  • Country: 00
    • My random blog.
Unrelated to my current work project, I intend to buy some FPGA board soon enough. I already own a Papilio Pro, Papilio Pro Duo and miniSpartan+, although I never really had the time to touch them....

And this is your main problem, not the LVDS, the speed, or implementing the network layers.
Let's face it - you are lazy. You think buying more expensive boards will somehow impart knowledge on you without doing any actual work.
hamster_nz already demonstrated this is a 1-2 evening project, but you'd rather procrastinate while thinking about $500 dev kits ;)
Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
Unrelated to my current work project, I intend to buy some FPGA board soon enough. I already own a Papilio Pro, Papilio Pro Duo and miniSpartan+, although I never really had the time to touch them....

And this is your main problem, not the LVDS, the speed, or implementing the network layers.
Let's face it - you are lazy. You think buying more expensive boards will somehow impart knowledge on you without doing any actual work.
hamster_nz already demonstrated this is a 1-2 evening project, but you'd rather procrastinate while thinking about $500 dev kits ;)

Currently I have no development board at home with which I can try Ethernet development, and an Ethernet PHY is not something I can easily add to a Papilio board: it requires a good PCB with short, length-matched traces. I am currently trying to implement it on the FPGA at work, but unlike hamster_nz's MII interface, mine is RGMII, which sends data on both edges of the clock. That is a bit more complicated. For someone with FPGA experience, like hamster_nz, it would also be easy; it is my first real FPGA project.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
Well that wasn't hard - sending hand crafted packets through the DP83848J PHY at 100Mb/s on the Digilent Arty FPGA board, using the MII.

http://hamsterworks.co.nz/mediawiki/index.php/ArtyEthernet

I'll keep correcting the data packet (mainly add checksum) over the next few days.

Resource usage is 22 slices and half a BRAM block

I am currently in the process of writing my own packet. I am a little bit confused by your example.
Code: [Select]
-- Ethernet source  DE:AD:BE:EF:01:23
              when x"001C" => data <= "1110"; -- E
              when x"001D" => data <= "1101"; -- D
              when x"001E" => data <= "1101"; -- D
              when x"001F" => data <= "1010"; -- A
              when x"0020" => data <= "1110"; -- E
              when x"0021" => data <= "1011"; -- B
              when x"0022" => data <= "1111"; -- F
              when x"0023" => data <= "1110"; -- E
              when x"0024" => data <= "0001"; -- 1
              when x"0025" => data <= "0000"; -- 0
              when x"0026" => data <= "0011"; -- 3
              when x"0027" => data <= "0010"; -- 2

It looks like you are sending the bytes in the correct order, but with the nibbles swapped. Why?
 

Offline Rasz

  • Super Contributor
  • ***
  • Posts: 2616
  • Country: 00
    • My random blog.
Currently I have no development board at home with which I can try Ethernet development.

You already said you have three never-touched ones; all are OK for this.

And an Ethernet PHY is not something I can easily add to a Papilio board. It requires a good PCB with short, length-matched traces.

No, they don't. I did a PHY on a breadboard with 5 cm jumper wires.

I am currently trying to implement it on the FPGA at work. But unlike hamster_nz's MII interface, mine is RGMII, which sends data on both edges of the clock. That is a bit more complicated. For someone with FPGA experience, like hamster_nz, it would also be easy; it is my first real FPGA project.

Is this really your excuse? A $15 piece of hardware that can be FedExed overnight is stopping the whole project?
http://www.waveshare.com/DP83848-Ethernet-Board.htm
http://www.oddwires.com/ethernet-phy-module-dp83848

I know I'm an asshat, but you need someone to slap you across the face with a dead fish. In the words of our great prophet:
Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Well that wasn't hard - sending hand crafted packets through the DP83848J PHY at 100Mb/s on the Digilent Arty FPGA board, using the MII.

http://hamsterworks.co.nz/mediawiki/index.php/ArtyEthernet

I'll keep correcting the data packet (mainly add checksum) over the next few days.

Resource usage is 22 slices and half a BRAM block

I am currently in the process of writing my own packet. I am a little bit confused by your example.
Code: [Select]
-- Ethernet source  DE:AD:BE:EF:01:23
              when x"001C" => data <= "1110"; -- E
              when x"001D" => data <= "1101"; -- D
              when x"001E" => data <= "1101"; -- D
              when x"001F" => data <= "1010"; -- A
              when x"0020" => data <= "1110"; -- E
              when x"0021" => data <= "1011"; -- B
              when x"0022" => data <= "1111"; -- F
              when x"0023" => data <= "1110"; -- E
              when x"0024" => data <= "0001"; -- 1
              when x"0025" => data <= "0000"; -- 0
              when x"0026" => data <= "0011"; -- 3
              when x"0027" => data <= "0010"; -- 2

It looks like you are sending the bytes in the correct order, but with the nibbles swapped. Why?

Ignore Rasz, he got out on the wrong side of bed :)

It is done that way because it works :)

Hardware engineers tend to design as if things are very long numbers, while software engineers think of them as a series of bytes.

Just remember the low nibble has to go onto the wire first, then the high nibble.

I had a lot of stuff backwards when I got my first packet out on the wire!
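The nibble ordering can be sanity-checked with a few lines of Python (illustrative only; `mii_nibbles` is a made-up helper, not from hamster_nz's code):

```python
# The MII takes one nibble per clock, low nibble of each byte first.
# A quick check of why the source MAC DE:AD:BE:EF:01:23 appears in the
# case statement in the order E,D,D,A,E,B,F,E,1,0,3,2.

def mii_nibbles(data: bytes):
    """Yield 4-bit values in the order they go onto the MII."""
    for byte in data:
        yield byte & 0x0F   # low nibble first
        yield byte >> 4     # then high nibble

mac = bytes.fromhex("DEADBEEF0123")
print("".join(f"{n:X}" for n in mii_nibbles(mac)))   # EDDAEBFE1032
```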
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
For the interface I will use a DDIO block, with a double buffer that holds the current lower and upper nibbles. On the rising edge I will update my upper nibble and send my lower nibble, and on the falling edge I will update my lower nibble and send the upper nibble. For the clock, I will use a PLL to shift it a little bit. Two IP cores: PLL and DDIO. I just finished the test packet (although just a simple test like yours, not the full application with LVDS). But now the compiler doesn't like my constant array. It says I have an error at the ' symbol: reg [some:size] someRegisterName [some:size] = '{lots of nibbles}

So now it's more the Verilog syntax. I am using this project to also learn Verilog. The Arty board you are using is pretty nice, by the way. Most modern Terasic boards either pack a lot of features but with everything interesting (SD card, Ethernet...) connected to the HPS (DE1-SoC, DE0-Nano-SoC), or are very minimalist (DE0-Nano), or costly (the DE2-115 has Ethernet and all the other features). Their DE1 was perfect: it had everything connected to the FPGA (although lacking Ethernet, it had everything else), but it's outdated and won't work with the newest Quartus. If only they could make an updated non-SoC DE1, like they did with the DE2 (DE2-115).

I have a nice book about SoC design with FPGAs (not ARM SoCs, but softcore SoCs with the NIOS II). The book was written for the DE1. If I get the DE1-SoC, I could follow the whole book except for the SRAM part, the green LED part, and the SD card part (since those are connected to the ARM). For the SD card and the 8 missing green LEDs I would (if I get the board) make a PCB with a micro SD slot and green LEDs (and use the occasion to learn KiCad), and maybe even add some other nice things (an ESP8266 and Bluetooth, say). But SRAM is not that easy to add on another PCB, at least not through normal headers: too much capacitance, length, noise... At least I could do 90% of my book.

But on the Digilent side, although their boards have less stuff, they have two affordable recent-technology boards with Ethernet NOT connected to a processor (ARM), like the Arty you are using, or the Nexys 4 (with the academic discount). But getting an Altera board with Ethernet would be nicer, since I could do a full NIOS II learning path.

Unfortunately, while I can get an academic discount, shipping it here plus import fees and so on will cost more than what I save with the discount. I wonder if I could get my academic discount through a local distributor like Digi-Key or Mouser.

Any suggestions for nice peripherals I could add to a DE1-SoC to enhance my overall experience and learning?

Now I am pretty confident of the success of my project.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Well that wasn't hard - sending hand crafted packets through the DP83848J PHY at 100Mb/s on the Digilent Arty FPGA board, using the MII.

http://hamsterworks.co.nz/mediawiki/index.php/ArtyEthernet

I'll keep correcting the data packet (mainly add checksum) over the next few days.

Resource usage is 22 slices and half a BRAM block

Outstanding work!  I downloaded your code and I'll get on to it Monday.  I have a lot to learn and your code will help!

Now all I need are two TCP server sockets and one TCP client socket and I'm good to go with my project.  I'll have to think about how to get there.  Maybe a small CPU with a minimal stack like uIP or lwIP.  Maybe the ZPU project core will work; I already have it running on a Spartan 3 and it has the advantage of having ZPU-GCC.
« Last Edit: April 30, 2016, 04:38:42 am by rstofer »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Unrelated to my current work project, I intend to buy soon enough some FPGA board. Currently, I already own a papilio pro, papilio pro duo, minispartan+. Although I never had really the time to touch them....

Oh, don't feel too bad about the idle boards... until you get a reason to pull them out they always sit idle. You are most probably busy playing with other things... I've got a Parallella, a Raspberry Pi Zero, a Raspberry Pi 3 B, an original Raspberry Pi B, and a Pine64, all of which have only been booted a few times, and a BeagleBone Black that has never been out of its ESD bag - so I fully understand :)

It looks like I'm just throwing a few bits at the PHY to make it work, but there is a lot more to it than that: knowing how Ethernet addressing works, Ethernet broadcasts, IP headers, IP broadcasts, UDP ports, header checksums (must add the CRCs in!). Luckily I've done most of it before, so all I really have to worry about is getting the PHY working. If you were coming at it cold it would be a month of spare time researching, reading standards, and playing around with code.

Feel free to ask any questions - I'll point you at Wikipedia or the original documentation where I found things out if I can.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline chris_leyson

  • Super Contributor
  • ***
  • Posts: 1541
  • Country: wales
Quote
It requires a good PCB with short, length-matched traces

I wouldn't worry too much about matching trace lengths. For a start, the twisted pairs in CAT5e cable have slightly different lengths, because the pairs are twisted at different pitches to reduce crosstalk. I might get around to measuring this with a TDR one day.
Also, the symbol rate for 1000BASE-T is 125 megabaud, so a 10 mm difference between the two traces of a differential pair corresponds to a 1.5 degree phase error. That's in free space, so it's a ballpark estimate, but I wouldn't worry too much about trace lengths.
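The figure checks out with a quick back-of-envelope calculation (free-space propagation assumed, as in the post; on FR-4 the signal travels roughly half as fast, so the real skew is a little larger but still tiny):

```python
# Phase error of a 10 mm trace-length mismatch at the 125 MBd symbol
# rate of 1000BASE-T, using free-space propagation speed.

C = 299_792_458          # speed of light, m/s
SYMBOL_RATE = 125e6      # 1000BASE-T symbol rate, baud

skew_s = 0.010 / C                        # time skew for 10 mm of extra length
phase_deg = skew_s * SYMBOL_RATE * 360    # fraction of a symbol, in degrees
print(round(phase_deg, 2))                # ~1.5 degrees
```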
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Please use an array with numbers instead of having a separate line of code for each nibble. Something like this:
output <= array_var(index_counter);

If index_counter is an unsigned number type then you can use it as an index directly.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Please use an array with numbers instead of having a separate line of code for each nibble. Something like this:
output <= array_var(index_counter);

If index_counter is an unsigned number type then you can use it as an index directly.
You are more than welcome to take the code and do whatever you like with it - convert it to an array, read it in from a file, type in binary or tap it out in morse code - whatever. You can even send it back to me and I'll post it up too.

However, for my hacking I wanted to count individual bytes, do math on the offsets and play with it. I also wanted to be able to comment every nibble for those who might decide to use it as a reference for their own implementation. I made a conscious decision to do it that way, as it allows others to simply pass data into the module and merge it into the packet (super inefficient, I know, but it will still be smaller than a MicroBlaze).

In short, I'm not going to rework it just for you, but you can :-)

(Oh, and I am pretty sure you need to use "output <= array_var(to_integer(index_counter));" if using IEEE.NUMERIC_STD.ALL )
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Pretty much finished the design (http://hamsterworks.co.nz/mediawiki/index.php/ArtyEthernet) - added a module to perform the CRC32 calculation for frames and another to add the frame's preamble. There is now a simple way to overwrite the currently empty nibbles with whatever data you want to send, and a signal that can be asserted to trigger sending of a new frame.

Over 100BaseT, with small packets (UDP with 16 bytes of user data) it sends around 130,000 packets per second - one every 7.7 us - for about 2 MB/s of user data. This rate of packets is enough to make Wireshark and my laptop groan, and it seems to drop about 1 in 5,000 packets due to the interface being flooded or other system activity.

If it was changed to send 1,518-byte frames (the biggest allowed with standard Ethernet) the protocol overhead goes down, allowing it to send almost 12 million bytes of user data per second, 50% more than what the original poster wanted.
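The frame-overhead arithmetic behind these figures can be sketched in Python (a back-of-envelope model, not part of the design; the overhead constants are the standard preamble+SFD, Ethernet header, IP header, UDP header, FCS and inter-frame gap sizes):

```python
# Theoretical UDP throughput over 100BASE-TX (wire rate 12,500,000 bytes/s)
# for a given payload size, counting all per-frame overhead.

WIRE_BYTES_PER_S = 12_500_000
OVERHEAD = 8 + 14 + 20 + 8 + 4 + 12   # preamble+eth+IP+UDP+FCS+IFG = 66 bytes

def udp_throughput(payload: int):
    """Return (frames per second, user bytes per second) at line rate."""
    frames = WIRE_BYTES_PER_S / (payload + OVERHEAD)
    return frames, frames * payload

print(udp_throughput(16))     # ~152,439 frames/s, ~2.4 MB/s of user data
print(udp_throughput(1472))   # ~8,127 frames/s, ~11.96 MB/s of user data
```

Note these are line-rate ceilings; the ~130,000 packets/s measured above is a little below the theoretical ~152,439 for 16-byte payloads.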
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Pretty much finished the design (http://hamsterworks.co.nz/mediawiki/index.php/ArtyEthernet) - added a module to perform the CRC32 calculation for frames and another to add the frame's preamble. There is now a simple way to overwrite the currently empty nibbles with whatever data you want to send, and a signal that can be asserted to trigger sending of a new frame.

Over 100BaseT, with small packets (UDP with 16 bytes of user data) it sends around 130,000 packets per second - one every 7.7 us - for about 2 MB/s of user data. This rate of packets is enough to make Wireshark and my laptop groan, and it seems to drop about 1 in 5,000 packets due to the interface being flooded or other system activity.

If it was changed to send 1,518-byte frames (the biggest allowed with standard Ethernet) the protocol overhead goes down, allowing it to send almost 12 million bytes of user data per second, 50% more than what the original poster wanted.

I look forward to trying it later today.  I had a couple of issues with the original, probably related to Vivado 2016.1.  Or, more likely, just me...
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
I've added a sequence number to the packets and have been capturing in groups of 65,537. When sending at full speed there are sometimes blocks of missing packets: around 800 packets go missing at a time, along with a corresponding jump in the timestamps. It is missing about 5 or 6 ms of data in bursts.

So I tried a few different options:

6,250 packets per second - 0 missing, 0 missing, 0 missing, 0 missing, 0 missing
24,700 packets per second - 0 missing, 0 missing, 0 missing, 0 missing, 0 missing
50,000 packets per second - 0 missing, 0 missing, 0 missing, 0 missing, 0 missing
100,000 packets per second - Dropped packets, 0 missing, 0 missing, 0 missing
150,000 packets per second - 11,420 missing, 4,472 missing, 3,306 missing, Dropped packets, 0 missing.

(Dropped packets were when Wireshark reported packets as 'dropped', based on NIC statistics. Missing is when expected packets were not in the capture.)

So it looks like my laptop's onboard NIC is only good for up to 100,000 packets per second or so before it can't keep up.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline Rasz

  • Super Contributor
  • ***
  • Posts: 2616
  • Country: 00
    • My random blog.
(dropped packets were when Wireshark reported packets as 'dropped', based on NIC statistics. Missing is when expected packets were not in the capture).

So it looks like my laptop's onboard NIC is only good for up to 100,000 packets per second or so before it can't keep up.

Don't know about Windows, but Linux struggled for a long time when it comes to packet capture - every single packet generated a new syscall, switched CPU context, and was compared individually against the filter settings = huge overhead. Maybe, just maybe, it's because of this. You can try netcat: dump all traffic to a file and count manually (or write a Python/whatever parser)

nc -u -l 4096 > dump.bin
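Along those lines, a minimal Python receive-side loss counter can avoid the capture-filter overhead entirely. This is a hypothetical sketch: it assumes the sender puts a 4-byte big-endian sequence number at the start of each UDP payload and uses port 4096 to match the nc example; neither is specified by the actual FPGA design.

```python
# Count lost packets by sequence number instead of relying on a capture tool.

import socket
import struct

def missing_count(seqs):
    """Count sequence numbers skipped in an increasing list of seq numbers."""
    missing, last = 0, None
    for seq in seqs:
        if last is not None and seq > last + 1:
            missing += seq - last - 1
        last = seq
    return missing

def capture(port=4096, n=65537):
    """Receive n datagrams and report how many were lost in between."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    seqs = []
    for _ in range(n):
        payload, _addr = sock.recvfrom(2048)
        seqs.append(struct.unpack(">I", payload[:4])[0])  # big-endian seq
    sock.close()
    return missing_count(seqs)

print(missing_count([0, 1, 2, 5, 6, 10]))   # 5 (3,4 and 7,8,9 were skipped)
```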
Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Yeah, I know I'm flogging a dead horse... but...

I've got a Gigabit PHY on a Nexys Video spitting bits - 978,000,000 of them per second. I'll make a new GitHub repo for it tomorrow at https://github.com/hamsternz?tab=repositories and upload the code.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Yeah, I know I'm flogging a dead horse... but...

I've got a Gigabit PHY on a Nexys Video spitting bits - 978,000,000 of them per second. I'll make a new GitHub repo for it tomorrow at https://github.com/hamsternz?tab=repositories and upload the code.

I had been wondering if a 100 Mb network was going to work.  It doesn't for me: I only get about 70,000 small packets per second, or about 33 Mb of throughput.  I didn't chase down the dropped packets.  This was with the older version and 58-byte packets.  It's odd that this happens; the PC is fairly capable and has two Realtek GbE PHYs.

Clearly, expanding the packet helps a lot, but GbE really shines.  You're at about 16 times the requirement, falling to 5 times when something happens.  Pretty much the same problem as with the 100 Mb PHY: somewhere on the receiving end, things slow down for a short while.

It is certainly possible to keep up with 60 Mb on the wire.  I'm not entirely certain that the receiver will be able to process the packets as fast as they arrive, but that's not the problem of the sender.

When I eventually get my Zybo board, I want to revisit this.

 

Offline stmdude

  • Frequent Contributor
  • **
  • Posts: 479
  • Country: se
Guys, I'm out of my league here when it comes to the FPGA stuff, but I'm quite knowledgeable when it comes to networking equipment.

When you're testing this, make sure you only have a cable between your board and the PC. Network switches do weird and wonderful things to your packets, depending on how they're built.

First off, all switches (and L3 switches (aka routers)) have two upper limits.  The first one is the throughput of the backplane. A good enterprise-grade switch has enough backplane capacity to fill all ports with both TX and RX traffic, i.e. a 24-port GigE switch should have at least a 24*2 Gbps backplane.
Most decent switches today have this, unless it's made of plastic and was bought because it was the cheapest.
However, the second upper limit varies _a lot_ between different manufacturers and models, and that's the PPS throughput (packets per second). This limit can be remarkably low. I've seen enterprise-grade GigE switches from very reputable vendors with PPS rates of <3 Mpps.
If you exceed the PPS limit, the switch will drop your packets.
Essentially, the PPS limit is set by how quickly the switch can look up an entry in its table of which MAC address is on which port.

Also, Realtek NICs aren't exactly the bee's knees. I'm surprised you get as few dropped (as in, dropped by the NIC) packets as you do, when sending that many packets (not bits, packets).
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
There is something odd going on and, at the moment, I don't have the tools/skills to figure it out.  Why should my PC only grab 33 Mbps of UDP packets, which are summarily dropped in the stack for lack of a client to eat them, when I can handle 300 Mbps of routed TCP packets from Xfinity's speed test site 100 miles away?  FWIW, I just watched the speed test with Wireshark and I see a little over 300 Mbps sustained.  They are, of course, maximum-length packets.

I did run the cable directly between the FPGA board and the PC.  I would rather have used a hub and had Wireshark running on a third device, but I wasn't even sure they make 100 Mbps hubs.  I just searched: they do make them, but they are pricey.

The second incantation did result in packets whereas the first did not - for me...  But I still can't get the packets into Wireshark on Debian.  I wanted to see if there was a difference in performance.  The latest version, with large packets, will be more interesting but I don't have an FPGA board with a GBe PHY - yet.

I'm reasonably convinced that a sustained 60 Mbps cannot be achieved on a 100 Mbps network.  It's pretty clear that it can be done on a 1000 Mbps network, but we still haven't seen the application that will have to process that stream.  I don't know enough about the application to determine whether UDP (and dropped packets) will be acceptable or whether TCP will be required.  I would expect TCP to add additional overhead with ACK packets and other protocol requirements.  TCP also requires that the FPGA honor all the protocol requirements, which implies that the PHY/MAC will have to receive data as well.  Blasting bits out of the FPGA is interesting (I didn't know how to deal with the PHY), but receiving bits will be a lot more challenging.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
For the RGMII interface, I need to send a logical derivative of tx_en and tx_err on the tx_ctl line on the tx_clk falling edge.

What is a logical derivative?
 

Offline stmdude

  • Frequent Contributor
  • **
  • Posts: 479
  • Country: se
@rstofer:  If you want to play around with protocols and packet sizes, and have access to two PCs, you should have a look at iperf ( https://iperf.fr/ ). It'll allow you to generate as much traffic as you'd like, and receive it as well. It'll show you where the bottlenecks of your particular system are, whether it's Mbps or PPS.

60 Mbps on 100Base is definitely possible. It's not even in the realm of difficult if you're using the maximum MTU.

If you want to have a nice protocol for high-bandwidth streaming to look at, check out UDT ( http://udt.sourceforge.net/ ). People are doing crazy things with it, like streaming infiniband over Ethernet. It has no issues with saturating 10GigE links. People have done FPGA implementations of it, so it seems applicable to your project.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
@rstofer:  If you want to play around with protocols and packet-sizes, and have access to two PCs, you should have a look at iperf ( https://iperf.fr/ ). It'll allow you to generate as much traffic as you'd like, and receive it as well if you'd like. It'll show you where the bottlenecks of your particular system is, if it's Mbps or PPS.

I'll take a look.  I really only have a modest interest in this 60 Mbps problem.  I need a full TCP/IP stack to support a couple of server sockets and a client socket, and I would very much like to have DHCP.  I think the Zybo board with the dual-core ARM will be the way to handle the stack, and the FPGA fabric can implement my CPU project, which interacts with the sockets.  I have been waiting for an interesting FPGA board to come along that could handle the soft CPU and the required networking.  I think the Zybo may be that board.

Quote
60Mbps on 100Base is definitely possible. It's not even in the realm of difficult if you're using the maximum MTU.

Yes, you would have to use maximum, or nearly maximum, packets, but I was thinking end-to-end, up through the application layer.  It would be even more difficult if TCP is required.  Maybe it's time for another test.  There's some reason I can't get my UDP rate over 32 Mbps.  At 70,000 pps with 58-byte packets, it would seem that all I need to do is increase the packet size.  Yes, the pps will drop, but probably not as fast as the bps increases.  More testing to do...
Quote

If you want to have a nice protocol for high-bandwidth streaming to look at, check out UDT ( http://udt.sourceforge.net/ ). People are doing crazy things with it, like streaming infiniband over Ethernet. It has no issues with saturating 10GigE links. People have done FPGA implementations of it, so it seems applicable to your project.

FPGAs are amazing devices.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Yes, you would have to use maximum, or nearly maximum, packets, but I was thinking end-to-end, up through the application layer.  It would be even more difficult if TCP is required.  Maybe it's time for another test.  There's some reason I can't get my UDP rate over 32 Mbps.  At 70,000 pps with 58-byte packets, it would seem that all I need to do is increase the packet size.  Yes, the pps will drop, but probably not as fast as the bps increases.  More testing to do...

For 100BaseT, the maximum 'non-jumbo' UDP payload is 1472 bytes (giving a 1518-byte frame), so with large packets the peak throughput is 12,500,000 / (8 + 14 + 20 + 8 + 1472 + 4 + 12) = 12,500,000 / 1538 = 8,127 frames per second, or about 11,963,000 bytes/sec.

However, in my design I'm sending frames with only 16 bytes of user data: 12,500,000 / (8 + 14 + 20 + 8 + 16 + 4 + 12) = 152,439 frames per second.

Because of this the OS is seeing almost 20x the number of interrupts, context switches and so on than would be required to support a link fully saturated with larger packets.

Or to look at it another way, where the PC/laptop would have over 123 us to process each large packet, it now has only 6.56 us (only a few thousand CPU cycles), and it is choking.

However, that is pretty much what I would expect from 'cheap and cheerful' Gigabit NICs. It is most likely engineered to receive around 100k packets per second, which is more than enough to process a fully saturated gigabit link... ...but only if the average frame size is around 1250 bytes or more.
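The per-packet time budget quoted above can be re-derived in a couple of lines (same overhead model as earlier in the thread; a quick arithmetic check, not design code):

```python
# Time between frames at 100BASE-TX line rate, for large vs small payloads.

WIRE_BYTES_PER_S = 12_500_000               # 100BASE-TX wire rate
FRAME_OVERHEAD = 8 + 14 + 20 + 8 + 4 + 12   # non-payload bytes per frame

def us_per_frame(payload: int) -> float:
    """Microseconds between frames at line rate for a given UDP payload."""
    return (payload + FRAME_OVERHEAD) / WIRE_BYTES_PER_S * 1e6

print(round(us_per_frame(1472), 1))   # 123.0 us per full-size frame
print(round(us_per_frame(16), 2))     # 6.56 us per 16-byte-payload frame
```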

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
For the RGMII interface, I need to send a logical derivative of tx_en and tx_err on the tx_ctl line on the tx_clk falling edge.

What is a logical derivative?

tx_err is XORed with tx_en to give 'the logical derivative'.

tx_ctl runs at DDR, and sends tx_en for the first half of a cycle, and tx_err XOR tx_en for the second half.

So in normal use, when starting to send a packet, it sends 00,00,00,00,00,11,11,11,11,11,11,11,11...
When there are errors, it might send 00,00,00,00,00,11,10,11,11,10,11,11,11...

This reduces the number of transitions during normal operation, and therefore reduces power usage and (most likely more importantly) EMI.
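The TX_CTL encoding can be modelled in a few lines of Python (a behavioural sketch only; `tx_ctl_pairs` is a made-up name, not part of any RGMII IP):

```python
# RGMII TX_CTL: TX_EN goes out on the rising edge of TX_CLK, and
# TX_EN XOR TX_ERR (the 'logical derivative') on the falling edge.

def tx_ctl_pairs(tx_en, tx_err):
    """Return the (rising-edge, falling-edge) TX_CTL values per clock cycle."""
    return [(en, en ^ err) for en, err in zip(tx_en, tx_err)]

# Start of a frame with one errored cycle: both halves are equal except
# in the cycle where tx_err is asserted, so the line barely toggles.
en  = [0, 0, 1, 1, 1, 1]
err = [0, 0, 0, 0, 1, 0]
print(tx_ctl_pairs(en, err))   # [(0,0), (0,0), (1,1), (1,1), (1,0), (1,1)]
```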
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
hamster_nz, how did you debug your implementation and make it work? I am having some trouble with mine. When I look in ModelSim everything seems fine: I get the data in the right order. But when testing it in real life, nothing shows up in Wireshark. I'm not sure if it's a clock problem, a CRC problem, or a data-to-clock alignment problem.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
hamster_nz, how did you debug your implementation and make it work? I am having some trouble with mine. When I look in ModelSim everything seems fine: I get the data in the right order. But when testing it in real life, nothing shows up in Wireshark. I'm not sure if it's a clock problem, a CRC problem, or a data-to-clock alignment problem.

Hi,

My test setup is
- Nexys Video Board
- UTP cable
- Gigabit NIC on my laptop.
- Wireshark running under Windows 8 to capture the packets

I have to set the NIC on the laptop to 1Gb/Full Duplex, and then download the design.

Because the packets are not addressed directly to my laptop, I don't see any network bandwidth until I start capturing them.

Under Windows you can use "netstat -e" to verify packet counts and look for FCS/checksum issues (bad CRCs increment the 'error' counter).

Problems I could expect:
- My board's PHY is strapped to be 10/100/1000 after reset. Yours may not be, and you may need to configure it through the serial management interface.
- In my HDL design the TX clock is at 90 degrees to the main logic clock used for the data. Some PHYs have the option to insert delays on the clocking, making 0 degrees the best.
- If it is your own code, the nibble ordering is painful and seems wrong. For example, with a MAC of 01:23:45:67:89:AB it should hit the wire in this order: 1,0,3,2,5,4,7,6,9,8,B,A.

If you need me to send any simulation traces or CRCs for test data, pop me a PM with your email and what would help, or send me some of your simulation traces and I'll take a look.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
In my implementation, on the falling edge of my clock I put the data and TX enable into a DDIO interface. The same clock drives the DDIO, so on the next rising edge it has the first nibble (nibbles are also sent in reversed order). My transmit clock is shifted by 90 degrees (but I'm not sure if it should be +90 or -90). Can you confirm my clocking scheme is correct?

Didn't know I had to configure the PHY to 10/100/1000. I have the Marvell 88E1111, and the information about the serial management interface is very unclear.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Here's what it looks like in the simulator for me... broadcasting from DE:AD:BE:EF:01:12. Should answer a lot of questions.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
It's alive!!!

« Last Edit: May 24, 2016, 03:45:31 pm by asgard20032 »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
I can see the packets in Wireshark with no errors, but strangely, if I try to receive them from Python, it just hangs there doing nothing. I have to kill Python.

Code: [Select]
import socketserver

class MyUDPHandler(socketserver.BaseRequestHandler):
    """
    This class works similarly to the TCP handler class, except that
    self.request consists of a pair of (data, client socket), and since
    there is no connection the client address must be given explicitly
    when sending data back via sendto().
    """

    def handle(self):
        print("Test")
        data = self.request[0].strip()
        socket = self.request[1]
        print("{} wrote:".format(self.client_address[0]))
        print(data)

if __name__ == "__main__":
    HOST, PORT = "192.168.25.10", 4096
    server = socketserver.UDPServer((HOST, PORT), MyUDPHandler)
    print("Starting listening")
    server.serve_forever()

192.168.25.10 is my computer NIC
192.168.25.25 is FPGA
4096 is the port. Each packet carries 65 bytes of data.

Windows 10 64 bit, anaconda python installation.
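For what it's worth, a plain-socket receiver (bypassing socketserver entirely) can help rule out framework problems. This is only a sketch, with the address taken from the post and a timeout added so it cannot hang forever:

```python
import socket

def recv_one(host, port, timeout=5.0):
    """Receive a single UDP datagram on (host, port).
    Returns (sender_ip, payload), or None on timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
        sock.settimeout(timeout)  # fail loudly instead of hanging
        data, addr = sock.recvfrom(2048)
        return addr[0], data
    except socket.timeout:
        return None
    finally:
        sock.close()
```

Calling `recv_one("192.168.25.10", 4096)` should either return the FPGA's 65-byte payload or time out; if it times out while Wireshark still sees the packets, the frames are likely being dropped by the IP stack (bad checksums) or by a firewall before they ever reach the socket.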
« Last Edit: May 24, 2016, 08:25:16 pm by asgard20032 »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Does 'netstat -e' show errors? If so, that indicates CRC issues.

Code: [Select]
H:\>netstat -e
Interface Statistics

                           Received            Sent

Bytes                     107942320        14045394
Unicast packets              106224           65209
Non-unicast packets          130447            5870
Discards                          0               0
Errors                            0               0
Unknown protocols                 0

H:\>

With smart NICs, bad packets sometimes look OK because the CRCs and checksums are verified by the NIC rather than the OS... can you post a Wireshark capture of your packet too?
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
No errors in netstat.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Python is probably choking on the vast number of packets. I'd try slowing the packet rate down first.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline asgard20032Topic starter

  • Regular Contributor
  • *
  • Posts: 184
It's not related to Python, since I tried in C# and C++ as well. I finally found the problem. Although Wireshark could see the packets and said everything was OK, it was not: the IPv4 checksum was incorrect. I had to go into the Wireshark options and enable checksum validation; then it flagged all my packets as incorrect. Wireshark says "checksum incorrect, it should be ****". If I copy and paste this **** from Wireshark into my VHDL, everything works fine.

Two options now: always use Wireshark to precalculate the checksum and copy-paste it into my code, or fix my checksum algorithm (I used the same technique as hamster_nz for the checksum).
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Wireshark says "checksum incorrect, it should be ****". If I copy and paste this **** from Wireshark into my VHDL, everything works fine.
Grr... checksums.

So, playing around with my new-found learning on Ethernet on FPGAs, I'm getting so frustrated with the location of the checksums. Why did they put the checksums at the head of the packet, so you have to delay the data by 1500 cycles or so to allow the checksum in the header to be updated? Grrr

However, I've got a working framework built (it has a few glaring omissions that will need to be added later) and am now slowly adding protocol support, hopefully up to a full HTTP web server serving static content at full wire speed. ARP is completed & tested, ICMP is next... on GitHub at https://github.com/hamsternz/FPGA_Webserver

Why? Because I'm stupid and like the challenge. However, it will be one of the few web servers with no software or firmware involved anywhere, making it unhackable (except for DoS attacks), and it will be nearly fully deterministic (throw the same data at it and you get nearly the same result).

It could be made fully deterministic if the TX clock were linked to the RX clock, but that would fix the link speed at 1Gb/s...


Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
For UDP you don't have to calculate a payload checksum (leave it 0), and the IPv4 header checksum (which is calculated over the header only) can be pre-calculated and re-used for all packets.
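As a cross-check against Wireshark's "should be" value, the IPv4 header checksum is just the one's-complement sum of the header's 16-bit words, complemented (RFC 791). A minimal Python sketch, verified against a well-known 20-byte example header:

```python
import struct

def ipv4_header_checksum(header: bytes) -> int:
    """RFC 791 header checksum: one's-complement sum of the 16-bit
    words with the checksum field (bytes 10-11) zeroed, complemented."""
    header = header[:10] + b"\x00\x00" + header[12:]
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total > 0xFFFF:                  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Example IPv4 header whose correct checksum is 0xB861
hdr = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(hex(ipv4_header_checksum(hdr)))  # → 0xb861
```

Since the header bytes only change when the addresses, length, or TTL change, a fixed-length UDP stream like this one really can hard-code the result, as suggested above.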
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Scrts

  • Frequent Contributor
  • **
  • Posts: 797
  • Country: lt
I've always used Wireshark for network packet analysis, and I can say it is the most amazing open-source tool I've ever seen. It will also show you CRC problems or packet counter problems. E.g. I had a glitch in my FPGA logic when rolling over the 16-bit packet counter. It took some time to run 65k packets to hit it, but Wireshark highlighted it quickly.
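A rollover bug like that is also easy to check for on the receive side. A sketch of a gap counter that treats sequence numbers modulo 2^16, so a clean 65535 → 0 wraparound is not flagged as a loss:

```python
def count_missing(seqs, bits=16):
    """Count packets missing from a stream of modulo-2**bits sequence
    numbers, treating a clean wraparound (65535 -> 0) as no gap."""
    mod = 1 << bits
    missing = 0
    for prev, cur in zip(seqs, seqs[1:]):
        missing += (cur - prev - 1) % mod
    return missing

print(count_missing([65534, 65535, 0, 1]))  # → 0, clean rollover
print(count_missing([65534, 0]))            # → 1, packet 65535 lost
```

The sequence numbers here are hypothetical; the idea is just that modular arithmetic handles the wraparound that a naive `cur - prev` comparison gets wrong.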
 

