Electronics > Projects, Designs, and Technical Stuff
Looking for ideas: data logger algorithms and data structures.
(1/3) > >>
ataradov:
I've been trying to wrap my head around this task that came up multiple times and I always just hacked my way around it. But now I want to solve it well.

Let's say we have a data logging application. The software receives log entries and has a typical window with a scroll bar interface. There can be a lot of entries. Let's say 250 million for the sake of this discussion. Let's also assume that all entries are the same size (they may have pointers to other data, it is not important) and originally located in a plain sequential array. Let's say each entry is 16 bytes, if it helps the thinking process. So we have ~4 GBytes of data.

The goal is to let users quickly scroll thorough all this. With linear array, the task is trivial, the starting line for display is just a scroll bar position and lines are sequential after that.

Here is the problem. Some of the entries are related to each other and can be hidden under a single collapsible entry. On average 3-10 sequential lines can be combined into logical blocks.

Now the display of this data becomes non-trivial. We can still easily track the number of visible lines by subtracting some when the entry is collapsed and adding back when the entry is open. This lets us easily set the limits for the scroll bar. But since any of the groups may be open or closed (expanded or collapsed) at the same time, mapping from a scroll bar position to the actual data index is hard now.

The goal is to come up with some data structure that will let the application quickly scroll over this huge array. Obviously once the first line is found, it is trivial to find the rest, so all we need to implement is a lookup function DisplayLineToArrayIndex().

The goal is lookup speed. But the open/collapse speed is also important for responsive UI, so it can't involve massive array updates. Additional memory is secondary, but obviously it should not be too big, given how much the main array already takes up.

The information about the groupings may be stored with the data itself, or as a separate entity. The groupings will be performed by the separate algorithm as each entry is received.

Any ideas?
Kalvin:
My 2c: Each entry should contain a flag which tells that whether the entry is a child or parent. If the flag is set, the entry is a child and the entry has a parent which can be found by finding the previous entry with the child flag unset. Now, you can jump around very quickly and you can always find the parent nodes by checking the flag. Allocate more bits if you need deeper nesting or want more fancier data structure. Add a More bit which will tell whether the entry will be contain data in the next entry as well.
ataradov:
But that will not help me with the most complicated part of this - if some parents are in a collapsed state, how do I find a data line that corresponds to the visual line N?

The obvious answer is to iterate over the whole array counting lines, but it will take a long time for large data sets. There may be some compromise with having a smaller array of pre-computed partial line counts for chunks of lines. In this case finding the index will be a two stage process - first finding the right chunk, and then finding the exact location inside the chunk.

I was hoping there is a better solution out there.
Kalvin:
If you keep a track of total number of parents, you could binary search for the parent to be found by "looking around at the neighborhood of the estimated entry point until you find a parent"? If the parent entries contain running entry number, finding the specific entry would become a bit more efficient at the expense of bytes required for this information. Alternatively, you could add a special entry at every 1024 entries (or so) which will contain the running number of the current parent. In that way you would not consume too much of memory for book keeping, and you can find at least the parent entries quite efficiently with binary search.
ataradov:
I don't understand your idea. Just keeping track of parents will not do much. They can open/closed in arbitrary way.
Navigation
Message Index
Next page
There was an error while thanking
Thanking...

Go to full version
Powered by SMFPacks Advanced Attachments Uploader Mod