Author Topic: Dataset/Run Length Encoding  (Read 467 times)

Offline moontide (Topic starter)

  • Contributor
  • Posts: 29
Dataset/Run Length Encoding
« on: April 20, 2020, 12:22:33 am »
I have a problem I'm working on. It should eventually end up as a hardware solution.

10,000 options. 1 of those will be a 1 value. All the others will be 0.

I'm trying to find a simple way to define where that 1 is. Or what the first set of 0 size is. Or what the ratio and scope of the 0s are.

It's basically a compression system.


I have been looking at run-length encoding.

To restate: 10,000 locations. Only 1 location is a 1. All the others are 0.

I need a simple input to determine the state.
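Roughly what I mean, in code (just a rough sketch, assuming the data shows up as one byte per location):

#include <stdio.h>

#define N 10000

/* Scan for the single 1; equivalently, the length of the leading
   run of 0s is the index of that 1.  Returns -1 if no 1 is found. */
static int find_the_one(const unsigned char data[N])
{
    for (int i = 0; i < N; i++)
        if (data[i] != 0)
            return i;
    return -1;
}

int main(void)
{
    static unsigned char data[N];   /* static, so all 0s by default */
    data[4217] = 1;                 /* the lone 1, position picked arbitrarily */
    printf("the 1 is at location %d\n", find_the_one(data));
    return 0;
}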
This problem makes my blood boil! The cost of hardware is of no concern, but time to compute is important.

My current thought:
Get the run length (my "lambda") of the first set of 0s. Check it against the total length to make sure it's complete; then the location of the 1 is obvious. Then a quick reference of the data back onto the PCB data lines. Time is best kept small, like a Thai manufacturer turning around the first layout, lol.


I'm a bit drunk, but this is my challenge. I know the data and the base info, but I think there will be options I don't realise, which is why I'm asking this community. Software is rather pure, but circuit bending can wreck me if I demand pure data.

Questions?
 

Offline moontide (Topic starter)

  • Contributor
  • Posts: 29
Re: Dataset/Run Length Encoding
« Reply #1 on: April 20, 2020, 12:31:41 am »
Here's where I'm interested in this interaction.

You have 168 or around 244 wires to RAM. There are so many possible options for bit locations. I have a simpler system, as only 1 location needs to be determined. If I had under 999 options it would work.

I cannot understand how to translate it.


From what I remember of RAM, there is a change line and the full 186/244 data lines. Idk, I'm lost, meep.
 

Offline moontide (Topic starter)

  • Contributor
  • Posts: 29
Re: Dataset/Run Length Encoding
« Reply #2 on: April 21, 2020, 05:52:48 am »
NVM. Got it. WinZip eat my shorts.
 

Online T3sl4co1l

  • Super Contributor
  • Posts: 21695
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Dataset/Run Length Encoding
« Reply #3 on: April 21, 2020, 07:08:52 am »
If there is 1 active position out of 10,000 possible positions, you have 10,000 possible states in total, and need ceil(Lg(10,000)) = 14 bits to store it.

The most typical encoding would be binary positional, i.e., taking the value as an address into an array of 10k bits.  The 74HC138 is a simple hardware example (a 1-of-8 decoder).
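In software, both directions are one-liners.  A minimal sketch (function names are mine, purely for illustration):

#define N 10000u

/* Encode: the position of the single 1 *is* the code word.
   ceil(Lg(10,000)) = 14, so any position 0..9999 fits in 14 bits. */
unsigned encode(unsigned position)
{
    return position & 0x3FFFu;            /* mask to 14 bits */
}

/* Decode: set the addressed bit in a 10k-bit array --
   the software analogue of a (very wide) 74HC138. */
void decode(unsigned code, unsigned char bits[(N + 7) / 8])
{
    bits[code >> 3] |= (unsigned char)(1u << (code & 7u));
}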

I can't tell what else you're talking about; the only part that seems consistent is "1 of those will be a 1 value".  I don't know why you'd want a "set of 0 size", or what good it could be (you can declare any number of sets of size 0, anywhere in the array; it's a null operation?).  I don't know how "ratio" or "scope" apply to bit arrays.

If you ever have multiple 1s, you will need some way to encode how many, and where.  In principle, you need ceil(Lg(N-n)) bits for each subsequent bit, because presumably bits shouldn't be placed on top of each other; that would just be one (redundant) bit, right?  That is, the first bit has 10,000 locations to choose from, the next has 9,999, and so on.  This is an insignificant difference at first (you don't even get to save one bit from the address variable until 8,192 locations remain), but becomes important at high densities (which is why it seems plausible that RLE of a low-frequency pattern of 1s and 0s -- say, run lengths averaging 6 or more -- could be cheaply encoded by delta).
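A quick sketch of the delta idea (my own illustration, not any fixed format; it assumes the 1-positions come in already sorted):

/* Store the first 1-position absolutely, then only the gap to each
   subsequent 1.  Clustered bits give small gaps, which take fewer
   bits to encode than full 14-bit addresses.  Gaps are always >= 1,
   since two 1s can't share a location. */
int delta_encode(const int pos[], int count, int out[])
{
    if (count == 0)
        return 0;
    out[0] = pos[0];                      /* absolute: needs 14 bits */
    for (int i = 1; i < count; i++)
        out[i] = pos[i] - pos[i - 1];     /* gap: often much smaller */
    return count;
}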

If there are few bits, a sparse model will compress better than an array or bitstream (at least for more naive methods, without getting into LZ compression for example -- which is very good all-around, worth considering if your problem isn't so trivial that one of these methods is easier, and assuming your platform can implement it).  If they tend to be grouped together, taking deltas will be cheap.  If they tend to be dense (short runs of 1s and 0s), RLE might still be used but with relatively short length, further compression being afforded with Huffman perhaps (which I've written before, for an embedded bitmap font).
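For the dense case, bare-bones RLE is only a few lines.  A sketch (capping runs at 255 so each length fits in a byte is one arbitrary choice among many):

#include <stddef.h>

/* Run-length encode a 0/1 stream: emit (value, length) pairs, with
   runs longer than 255 split so each length fits in one byte.
   Returns the number of pairs written. */
size_t rle_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t pairs = 0;
    size_t i = 0;
    while (i < n) {
        unsigned char v = in[i];
        size_t len = 1;
        while (i + len < n && in[i + len] == v && len < 255)
            len++;
        out[2 * pairs]     = v;
        out[2 * pairs + 1] = (unsigned char)len;
        pairs++;
        i += len;
    }
    return pairs;
}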

Or even, if the exact number or position of bits doesn't matter so much, a lossy method might be used, for example taking any of the above cases but quantizing the arguments to shave off a few bits here and there.
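For example (purely to illustrate the quantization idea): if the position only needs to be known to within 16 locations, dropping the low 4 bits turns the 14-bit address into a 10-bit one:

/* Lossy: keep only position/16.  10 bits instead of 14, at the cost
   of recovering the position only to within +/-8 locations. */
unsigned quantize(unsigned position) { return position >> 4; }
unsigned dequantize(unsigned q)      { return (q << 4) + 8; }  /* bucket midpoint */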

Not to mention if there are higher order patterns in the data, for example if it can be wrapped into a 2D array, and columns, diagonals, blocks, etc. become useful.  Mind that, the more shapes you put in, the more bits you need to select between them, so, at some point, so many shapes will just be one-offs that the compression ratio goes past a peak.  (This would be equivalent, in a certain way, to LZ encoding, which uses pattern matching as well to build an ad-hoc "dictionary".)

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

