Well, I guess almost no response is better than a negative one, moving on!
This post will be about the concepts of how this will work going forward, and the next will be about my progress so far. Feel free to skip this post if you are inclined to do so.
So, how is it possible to compress the video stream in such a way that each pixel takes less than a single bit? The answer is found everywhere on the internet, in the form of JPEG compression! I will explain the process as best I can, but if you are still confused, check out
https://en.wikipedia.org/wiki/JPEG , it's a great article.
For now, let's assume we just have a single image we need to compress. The first step is to convert the color space from RGB to YCbCr. This separates the brightness (Y) from the color (Cb and Cr) of the image. Why do this? Our eyes are not as good at sensing changes in color as they are changes in brightness. This allows us to reduce the resolution of the 2 color channels with no perceptible difference. In our example, we will reduce their resolution by a factor of 2 in both the X and Y dimensions. Next, we will split each channel into 8*8 pixel blocks. Then we will apply the
Discrete Cosine Transform (DCT) algorithm to each 8*8 block. Explaining what the DCT does is very difficult, so I would recommend checking out
https://en.wikipedia.org/wiki/Discrete_cosine_transform . In essence, the DCT turns each 8*8 block of pixels into another 8*8 block of coefficients, where each coefficient represents a different spatial frequency: the top left is the lowest frequency, and the bottom right is the highest. Why do this step? Well, our eyes are not as good at perceiving high frequency patterns in images as low frequency ones. This means we can throw away much of the high frequency data without there being a perceivable difference, to an extent. That step is called "quantization": we divide each (now DCTified) 8*8 block element-wise by an 8*8 quantization table. This table determines the level of compression in the image, and is generally a gradient of numbers whose values increase toward the lower right corner. This leaves us with blocks that have largish numbers in the upper left corner, and mostly numbers smaller than 1 everywhere else. We then round each value toward zero, removing the fractional part, resulting in an 8*8 block that mostly consists of zeros. At this point, depending on the quantization table, each block consists mostly of zeros, plus numbers in the range of -127 to 127 (a single byte). The next step is called zig-zag encoding, where we simply read the 8*8 block out as a string of 64 bytes in a zig-zag pattern. See image.
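To make the steps above concrete, here is a rough Python/NumPy sketch of the pipeline up to this point. The quantization table here is a made-up gradient purely for illustration (real tables come from the JPEG standard scaled by a quality setting), and I truncate toward zero as described above, whereas a standard encoder rounds to nearest:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """HxWx3 RGB (0-255) -> YCbCr, using the full-range BT.601
    coefficients that JPEG uses."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return np.stack([y, cb, cr], axis=-1)

def subsample(channel):
    """Halve a color channel in X and Y by averaging 2x2 blocks."""
    h, w = channel.shape
    return channel.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, so coeffs = D @ block @ D.T."""
    d = np.array([[np.cos(np.pi * k * (2 * i + 1) / (2 * n))
                   for i in range(n)] for k in range(n)])
    d[0] *= np.sqrt(1.0 / n)
    d[1:] *= np.sqrt(2.0 / n)
    return d

# Made-up gradient table for illustration only: small divisors in the
# top left (low frequency), large ones toward the bottom right.
QTABLE = np.add.outer(np.arange(8), np.arange(8)) * 4 + 8

def zigzag_order(n=8):
    """(row, col) pairs in the JPEG zig-zag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def compress_block(block):
    """DCT, quantize, truncate toward zero, and zig-zag one 8*8 block."""
    d = dct_matrix()
    coeffs = d @ (block - 128.0) @ d.T   # level-shift, then 2-D DCT
    quant = np.trunc(coeffs / QTABLE)    # most entries end up as 0
    return [int(quant[r, c]) for r, c in zigzag_order()]
```

Feeding a flat gray block through `compress_block` gives 64 zeros, which is exactly what makes the next step so effective.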
We can now apply a compression algorithm called "run-length encoding".
https://en.wikipedia.org/wiki/Run-length_encoding This Wikipedia article explains it much better than I can, so go read that. And we're done! Now, how effective was this? Well, let's do some math.
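As a toy illustration (a real JPEG encoder combines run lengths of zeros with Huffman coding, which is more involved), a plain run-length encoder can be just a few lines:

```python
def run_length_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return [tuple(r) for r in runs]

# A zig-zagged block that is mostly zeros shrinks dramatically:
# [3, -2, 0, 0, ..., 0] (64 values) -> [(3, 1), (-2, 1), (0, 62)]
```

Because quantization leaves long tails of zeros at the end of each zig-zagged block, this is where most of the actual size reduction happens.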
The OV7725 camera module (the one I will be using) uses 16 bits to represent each pixel.
hres   vres   fps   bits
640  x 480  x 60  x 16 = ~295 Mbits/sec
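A quick sanity check of that arithmetic, including the ratio needed to squeeze it into the 18 Mbits/sec datastream:

```python
hres, vres, fps, bits_per_pixel = 640, 480, 60, 16

raw = hres * vres * fps * bits_per_pixel   # raw bits per second
print(raw)                  # 294912000, i.e. ~295 Mbits/sec
print(raw / 18_000_000)     # ~16.4, the compression ratio we need
```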
In order to fit 295 Mbits/sec into an 18 Mbits/sec datastream, we need to reduce the data by a factor of about 16:1. If we look at the JPEG Wikipedia article, we can see that its "high quality" example has a compression ratio of 15:1. I have written some programs that implement this in Python, and I can confirm that ratios of 16:1 are indeed possible, without being able to tell the difference unless you have the reference image side by side! The next post will be about what I have done to implement this algorithm on an FPGA.