Looking for ideas: data logger algorithms and data structures.
splin:

--- Quote from: ataradov on December 04, 2018, 03:48:37 am ---
--- Quote from: splin on December 04, 2018, 03:36:44 am ---True, but I tried to quantify it. Since you've done this before, how many 'operations' (eg. search iterations) are acceptable for smooth scrolling?

--- End quote ---
It is not super critical. I have a mental number of a loop with 65K iterations being ok.  It is only needed for the first line, consecutive lines are easy and fast.
--- End quote ---

I'm not a PC programmer, but that sounds reasonable. I'm not sure what you mean by the 'consecutive lines are easy' bit - surely you have to do the complete calculation every time the O/S tells you that the scrollbar has been moved? It could have moved a small amount, but equally it could have moved end to end before the O/S gives your program control again, if a higher-priority process happened to be running for a few tens of milliseconds.


--- Quote ---
--- Quote from: splin on December 04, 2018, 03:36:44 am ---If by that you mean using the entire BSA to cover only the currently in-use part of the main array rather than the entire address space, that will improve typical search performance but not the worst case - at the expense of complexity and a performance hit recalculating the whole BSA as data is added to the log
--- End quote ---
That is what I meant. I'm thinking the typical data will be much less than 250 M entries, I just don't want things to slow down when the full capacity is reached. I will obviously see how expensive that recalculation will be.
--- End quote ---

Optimizing the typical case will generally make the worst case worse - though perhaps not by much. In the case of a GUI you probably have little to gain by improving the typical case since your application probably isn't doing much else, if anything, at this point. If you make it fast enough for the worst case then it should be fast enough for all other cases, so there is no point making it more complex to improve the typical case. It would be different if this were a server application where total CPU load should be minimised. 


--- Quote ---
--- Quote from: splin on December 04, 2018, 03:36:44 am ---Are you referring to the extra memory that might be required to keep track of the dynamic sizing v static? It would be pretty negligible, wouldn't it?
--- End quote ---
Yes, I would not consider consuming up to 250 MBytes to be a problem at all.
--- End quote ---

I assume you mean 250 MBytes in total, in addition to the 4GB data log, for this application. I thought you were referring to the extra memory for dynamic sizing v static which should not be significant.


--- Quote ---
--- Quote from: splin on December 04, 2018, 03:36:44 am ---But in my proposal, the BSA contains the numbers of visible entries in each block so a change can affect 2 entries at most.

--- End quote ---
I had the same thing in mind, but I'm not 100% convinced that it is the case.

--- End quote ---

Say you have 250M entries in the log and 250 entries in the BSA, each of which holds the number of visible log entries in the corresponding 1-million-entry block of the log. If every entry is visible then the maximum value of each BSA entry is 1 million. Fold one or more entries within a block and the corresponding BSA entry is reduced accordingly. If the folded entries straddle two blocks then both of the corresponding BSA entries are affected. Equally when an entry is expanded - it can't change the BSA entries of any blocks that don't include those expanded log entries, and none can be increased beyond 1 million.
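To make it concrete, here is a rough C sketch of how I picture the BSA being maintained - the block size, array name and fold/unfold routines are purely illustrative, not taken from your code:

--- Code: ---
#include <stdint.h>

#define BLOCK_SIZE   1000000u   /* log entries per block                 */
#define NUM_BLOCKS   250u       /* 250 blocks x 1M = 250M entries        */

/* bsa[i] = number of currently visible entries in block i.
   Starts at BLOCK_SIZE for every block when nothing is folded.          */
static uint32_t bsa[NUM_BLOCKS];

/* Hide (fold) the log entries in [first, last]; assumes every entry in
   the range is currently visible.  Only the blocks containing the folded
   entries are touched, so a fold that fits in one block changes one BSA
   entry and one that straddles a block boundary changes two.            */
static void fold_entries(uint64_t first, uint64_t last)
{
    for (uint64_t i = first; i <= last; i++)
        bsa[i / BLOCK_SIZE]--;
}

/* Unfolding is the mirror image; no BSA entry can ever exceed BLOCK_SIZE. */
static void unfold_entries(uint64_t first, uint64_t last)
{
    for (uint64_t i = first; i <= last; i++)
        bsa[i / BLOCK_SIZE]++;
}
--- End code ---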

In general, use the simplest possible solution until you have some evidence that it isn't, or probably won't be, good enough - or until you have sufficient free time/resources to 'improve' it, if you believe that will be worthwhile.
ataradov:

--- Quote from: splin on December 04, 2018, 05:39:13 am ---I'm not sure what you mean by the 'consecutive lines are easy' bit - surely you have to do the complete calculation every time the O/S tells you that the scrollbar has been moved?

--- End quote ---
The window displays more than one line. But once you have found the index of the first one, finding the following ones is straightforward.
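Roughly like this - just a sketch, with made-up names; visible() stands in for whatever per-entry markup test I end up with, and it assumes the requested line is within the total visible count:

--- Code: ---
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE   1000000u
#define NUM_BLOCKS   250u

extern uint32_t bsa[NUM_BLOCKS];        /* visible entries per block     */
extern bool     visible(uint64_t idx);  /* per-entry fold/filter check   */

/* Map a scrollbar position (Nth visible line, 0-based) to a log entry
   index.  Walking the BSA is at most 250 iterations; the scan inside
   the chosen block is the expensive part, bounded by BLOCK_SIZE.        */
static uint64_t first_line_index(uint64_t nth_visible)
{
    uint32_t block = 0;
    while (block < NUM_BLOCKS && nth_visible >= bsa[block])
        nth_visible -= bsa[block++];

    uint64_t idx = (uint64_t)block * BLOCK_SIZE;
    while (nth_visible > 0 || !visible(idx)) {
        if (visible(idx))
            nth_visible--;
        idx++;
    }
    return idx;
}

/* The remaining lines of the window are then trivial: just keep
   scanning forward for the next visible entry.                          */
static uint64_t next_line_index(uint64_t idx)
{
    do { idx++; } while (!visible(idx));
    return idx;
}
--- End code ---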


--- Quote from: splin on December 04, 2018, 05:39:13 am ---In the case of a GUI you probably have little to gain by improving the typical case since your application probably isn't doing much else, if anything, at this point. If you make it fast enough for the worst case then it should be fast enough for all other cases, so there is no point making it more complex to improve the typical case.

--- End quote ---
That is probably correct. I may be overthinking this.


--- Quote from: splin on December 04, 2018, 05:39:13 am ---I assume you mean 250 MBytes in total, in addition to the 4GB data log, for this application.

--- End quote ---
Yes, I meant the direct overhead for the additional markup information.


--- Quote from: splin on December 04, 2018, 05:39:13 am ---Equally when an entry is expanded - it can't change the BSA entries of any blocks that don't include those expanded log entries, and none can be increased beyond 1-million.

--- End quote ---
Yes, I now see that it will always be the case.

I guess I've got a plan to test.
rhb:
I hate to say this, but the answer is an M204-style database, aka a flat-file array.  I processed many hundreds of GBs of oil well data in ASCII this way.  I used tens of thousands of files in a 3-level hierarchy of subdirectories to keep the directory access times reasonable.  Almost everything was Bourne shell and awk. It's dead simple to code and runs like a bat out of hell.

In my case I could not keep everything in memory, so I had to do grep(1)-style processing using intermediate files, as this was over 10 years ago.  It took a lot of stroke to get 2 TB of disk all to myself.

If you need to do searches on multiple fields, a relational DB will eat you alive thrashing the cache doing pointer dereferencing.

If you absolutely, positively must go as fast as possible, track user query fields and generate sorted arrays for each heavily used field that you can query using bisection.  You can guarantee worst case performance for bisection of a sorted list.  Everything else will eat you alive on the worst case.
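Something along these lines - purely illustrative, the index layout and names are invented:

--- Code: ---
#include <stddef.h>
#include <stdint.h>

/* One sorted index per heavily queried field: key[i] is the field value,
   rec[i] is the record number it came from, sorted ascending by key.    */
struct field_index {
    const uint64_t *key;
    const uint64_t *rec;
    size_t          count;
};

/* Classic bisection: find the first position whose key is >= target.
   Worst case is ceil(log2(count)) iterations, guaranteed.               */
static size_t lower_bound(const struct field_index *ix, uint64_t target)
{
    size_t lo = 0, hi = ix->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ix->key[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;   /* == count if every key is smaller than target         */
}
--- End code ---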

"When in doubt, use brute force."  Ken Thompson (according to wikipedia) .  My recollection is "Never underestimate the value of brute force."  Both may be accurate considering the origin.
Nominal Animal:
I'd chunk the data.  Each chunk (except the last) contains a reference to the array data (a span of a hundred entries or so) and its rendered height.  At any point in time, at most two chunks are displayed.  Whenever an entry is modified/opened/closed, the chunk size is recomputed; note that only a visible chunk can be modified.

The widget displaying the data is essentially a virtual canvas with a scrollbar, where you render the current chunks.
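As a bare-bones structure (names and field choices illustrative only), each chunk could be little more than:

--- Code: ---
#include <stddef.h>

/* One chunk: a contiguous span of the entry array plus its rendered size.
   Only the heights of the (at most two) visible chunks matter, so a
   chunk's height is recomputed whenever one of its entries is opened
   or closed.                                                             */
struct chunk {
    size_t first_entry;   /* index of the first entry in this chunk      */
    size_t entry_count;   /* about a hundred entries per chunk           */
    int    height_px;     /* total rendered height of the chunk's lines  */
};
--- End code ---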