Well, I guess almost no response is better than a negative one, moving on!
This post will be about the concepts of how this will work going forward, and the next will be about my progress so far. Feel free to skip this post if you are inclined to do so.
So, how is it possible to compress the video stream in such a way that each pixel takes less than a single bit? The answer is found everywhere on the internet, in the form of JPEG compression! I will explain the process as best I can, but if you are still confused, check out
https://en.wikipedia.org/wiki/JPEG , it's a great article.
For now, let's assume we just have a single image we need to compress. The first step is to convert the color space from RGB to YCbCr. This separates the brightness (Y) from the color (Cb and Cr) of the image. Why do this? Our eyes are not as good at sensing changes in color as they are changes in brightness. This allows us to reduce the resolution of the 2 color channels with no perceptible difference. In our example, we will reduce their resolution by a factor of 2 in both the X and Y dimensions. Next, we will split each channel into 8*8 pixel blocks. Then we will apply the
Discrete Cosine Transform (DCT) algorithm to each 8*8 block. Explaining what the DCT does is very difficult, so I would recommend checking out
https://en.wikipedia.org/wiki/Discrete_cosine_transform . In essence, the DCT turns each 8*8 block of pixels into another 8*8 block of coefficients, where each coefficient represents a different spatial frequency: the top left is the lowest frequency, and the bottom right is the highest. Why do this step? Well, our eyes are not as good at perceiving high frequency patterns in images as low frequency ones. This means we can throw away much of the high frequency data without there being a perceivable difference, to an extent. That step is called "quantization": we divide each (now DCTified) 8*8 block element-wise by an 8*8 quantization table. This table determines the level of compression in the image, and is generally a gradient of numbers whose values increase toward the lower right corner. This leaves us with blocks that have largish numbers in the upper left corner, and mostly numbers smaller than 1 everywhere else. We then round each value toward zero, removing the fractional part, resulting in an 8*8 block that mostly consists of zeros. At this point, depending on the quantization table, each block consists mostly of zeros, plus numbers in the range of -127 to 127 (a single byte). The next step is called zig-zag encoding, where we simply read the 8*8 block out as a string of 64 bytes in a zig-zag pattern. See image.
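To make the steps above concrete, here is a rough Python/NumPy sketch of the pipeline up to this point. The quantization table here is a made-up gradient purely for illustration (real tables come from the JPEG standard scaled by a quality setting), and I truncate toward zero as described above, whereas a standard encoder rounds to nearest:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """HxWx3 RGB (0-255) -> YCbCr, using the full-range BT.601
    coefficients that JPEG uses."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return np.stack([y, cb, cr], axis=-1)

def subsample(channel):
    """Halve a color channel in X and Y by averaging 2x2 blocks."""
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, so coeffs = D @ block @ D.T."""
    d = np.array([[np.cos(np.pi * k * (2 * i + 1) / (2 * n))
                   for i in range(n)] for k in range(n)])
    d[0] *= np.sqrt(1.0 / n)
    d[1:] *= np.sqrt(2.0 / n)
    return d

# Made-up gradient table for illustration only: small divisors in the
# top left (low frequency), large ones toward the bottom right.
QTABLE = np.add.outer(np.arange(8), np.arange(8)) * 4 + 8

def zigzag_order(n=8):
    """(row, col) pairs in the JPEG zig-zag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def compress_block(block):
    """DCT, quantize, truncate toward zero, and zig-zag one 8*8 block."""
    d = dct_matrix()
    coeffs = d @ (block - 128.0) @ d.T   # level-shift, then 2-D DCT
    quant = np.trunc(coeffs / QTABLE)    # most entries end up as 0
    return [int(quant[r, c]) for r, c in zigzag_order()]
```

Feeding a flat gray block through `compress_block` gives 64 zeros, which is exactly what makes the next step so effective.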
We can now apply a compression algorithm called "run-length encoding".
https://en.wikipedia.org/wiki/Run-length_encoding This Wikipedia article explains it much better than I can, so go read that. And we're done! Now, how effective was this? Well, let's do some math.
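As a toy illustration (a real JPEG encoder combines run lengths of zeros with Huffman coding, which is more involved), a plain run-length encoder can be just a few lines:

```python
def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return [tuple(r) for r in runs]

# A zig-zagged block that is mostly zeros shrinks dramatically:
# [3, -2, 0, 0, ..., 0] (64 values) -> [(3, 1), (-2, 1), (0, 62)]
```

Because quantization leaves long tails of zeros at the end of each zig-zagged block, this is where most of the actual size reduction happens.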
The OV7725 camera module (the one I will be using) uses 16 bits to represent each pixel.
hres   vres   fps   bits
640  x 480  x 60  x 16 = ~295 Mbits/sec
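A quick sanity check of that arithmetic, including the ratio needed to squeeze it into the 18 Mbits/sec datastream:

```python
hres, vres, fps, bits_per_pixel = 640, 480, 60, 16

raw = hres * vres * fps * bits_per_pixel   # raw bits per second
print(raw)                  # 294912000, i.e. ~295 Mbits/sec
print(raw / 18_000_000)     # ~16.4, the compression ratio we need
```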
In order to fit 295 Mbits/sec into an 18 Mbits/sec datastream, we need to reduce the data by a factor of about 16:1. If we look at the JPEG Wikipedia article, we can see that its "high quality" example has a compression ratio of 15:1. I have written some programs that implement this in Python, and I can confirm that ratios of 16:1 are indeed possible, without being able to tell the difference unless you have the reference image side by side! The next post will be about what I have done to implement this algorithm on an FPGA.