Software to store, aggregate, analyse and display ~1G points of streamed data
| voltsandjolts:
--- Quote from: djacobow on June 21, 2019, 02:52:10 am ---What time-series databases have over flat binary files or whatever is the special indexing on the time stamps that makes them VERY fast to query time intervals
--- End quote ---

Fair enough, but the OP is recording 12-bit data at 1 kHz; there is nothing simpler or faster than just appending those samples to a file. With a constant sample rate you don't even need a timestamp, which makes it VERY VERY fast to query time intervals. The OP also 'didn't want to spend weeks' developing software, so just write to a file, and to plot, pick every n'th point in the time range you want and plot those in Octave or whatever (see the sketch below). This is maybe 30 minutes of coding effort, though as a learning exercise it could also be an interesting database project.
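A minimal sketch of that flat-file approach in Python, assuming samples arrive as integers and are stored as little-endian 16-bit values (the file name, sample rate constant and decimation factor are illustrative, not from the thread):

--- Code: ---
import numpy as np

SAMPLE_RATE_HZ = 1000          # assumed constant sample rate
LOG_FILE = "samples.bin"       # hypothetical file name

def append_samples(samples):
    """Append 12-bit samples (padded to 16 bits) to the flat binary log."""
    with open(LOG_FILE, "ab") as f:
        np.asarray(samples, dtype="<i2").tofile(f)

def load_decimated(t_start_s, t_end_s, every_nth=100):
    """Read one time range and keep every n'th point for plotting."""
    data = np.memmap(LOG_FILE, dtype="<i2", mode="r")
    i0 = int(t_start_s * SAMPLE_RATE_HZ)
    i1 = int(t_end_s * SAMPLE_RATE_HZ)
    return data[i0:i1:every_nth]
--- End code ---

Feeding load_decimated(0, 3600) straight into matplotlib, or dumping it to a text file for Octave, is about the extent of the "30 minutes of coding" mentioned above.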
| rs20:
What I didn't mention, or didn't mention clearly, is that I want to mess around and experiment with different functions, have those functions nicely precomputed for me, and expose the processed data as interactive graphs in the cloud. With those goals in mind, using InfluxDB looks like vastly less effort than a hacky agglomeration of C/awk/Octave scripts. Thanks for the advice, I'll report back on how it goes!
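For the "nicely precomputed" part, InfluxDB 1.x has continuous queries that downsample raw points into a second measurement on a schedule. A rough sketch using the influxdb Python client; the database, measurement and field names here are invented for illustration, not taken from the thread:

--- Code: ---
from influxdb import InfluxDBClient  # InfluxDB 1.x Python client

client = InfluxDBClient(host="localhost", port=8086, database="sensors")
client.create_database("sensors")

# Write one raw sample (server assigns the timestamp if none is given)
client.write_points([{
    "measurement": "voltage",
    "fields": {"value": 1.234},
}])

# Ask the server to keep a 1-minute mean precomputed for interactive plots
client.query(
    'CREATE CONTINUOUS QUERY "cq_voltage_1m" ON "sensors" BEGIN '
    'SELECT mean("value") AS "mean_value" INTO "voltage_1m" '
    'FROM "voltage" GROUP BY time(1m) END'
)
--- End code ---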
| Siwastaja:
Remember that IO and storage are often the bottleneck, both performance-wise and in programming complexity. So precomputing and storing things often isn't a good idea, unless it reduces the amount of data (so that you mostly read the precomputed data and only occasionally need to go back to the raw data). Whenever your processing functions are computationally small (and operate over a small number of samples), and the amount of data out >= amount of data in, it is likely faster to recalculate each time, even if you do it thousands of times, since a data fetch from disk (or even from RAM!) into L1 cache or CPU registers is like 99.9% of the work.

All in all, the problem doesn't generalize very well. In some cases, a 100-line custom job does wonders. In other cases, it can get really difficult. So while it's good to collect generic wisdom like you get in this thread, each reply is only a grain of salt, a possible tool in a massive toolbox, when we don't know the exact details.

But based on what we know now, you have a fixed-frequency timebase and, assuming padding 12->16 bits, you generate only 2*1000*60*60*24*14/(1024^3) = 2.25 gigabytes of data during the two weeks. This is peanuts; a computer from the late 1990s could process it with no issues whatsoever, even in one flat file on the ancient FAT32 filesystem, and seek any value, with performance limited only by the actual disk seek time. Today this data fits in RAM at once, and reading it from an SSD won't take many seconds, so you don't even need to think about disk seeking to read it partially. Just trivially index the table (see the sketch below). So from a performance viewpoint there's no need to complicate things. The challenges are all in achieving a graphical visualization that pleases your eye (and lets you see what you want to see), either by finding a suitable graphical tool or by programming the graphics part of your own processing.

OTOH, the amount of data is probably low enough that you can successfully (ab)use many database systems if they give you the style of output you like, even if they are not algorithmically the "right tool for the job". For creating new software for mass users, I loathe bloat and misuse of computing resources and energy, but for actual one-offs and analysis jobs it doesn't matter. In cases like this you optimize for development resources instead of computing efficiency, and often the best choice is whatever matches your personal experience and way of doing things, no matter how "inefficient" others think it is.
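A small sketch of the size arithmetic and the "trivial index": with a fixed sample rate, a wall-clock time maps directly to a byte offset, so a plain seek replaces any timestamp index. File name and dates are placeholders; the constants are the ones from the post above.

--- Code: ---
import datetime as dt

SAMPLE_RATE_HZ = 1000
BYTES_PER_SAMPLE = 2           # 12-bit samples padded to 16 bits

# Size estimate for two weeks of continuous logging
total_bytes = BYTES_PER_SAMPLE * SAMPLE_RATE_HZ * 60 * 60 * 24 * 14
print(total_bytes / 1024**3)   # ~2.25 GiB

def byte_offset(t0, t):
    """Offset of the sample taken at wall-clock time t, given recording start t0."""
    n = int((t - t0).total_seconds() * SAMPLE_RATE_HZ)
    return n * BYTES_PER_SAMPLE

def read_slice(path, t0, t_from, t_to):
    """Seek straight to a time window -- no stored timestamps or index needed."""
    with open(path, "rb") as f:
        f.seek(byte_offset(t0, t_from))
        return f.read(byte_offset(t0, t_to) - byte_offset(t0, t_from))

# e.g. five minutes from day 3 of a recording started on 21 June:
# chunk = read_slice("samples.bin",
#                    t0=dt.datetime(2019, 6, 21, 0, 0),
#                    t_from=dt.datetime(2019, 6, 23, 10, 0),
#                    t_to=dt.datetime(2019, 6, 23, 10, 5))
--- End code ---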
| 3roomlab:
--- Quote from: Karel on June 20, 2019, 08:04:30 am ---Have a look at EDFbrowser: https://www.teuniz.net/edfbrowser/
--- End quote ---

I gave it a whirrrr. I used my old log data: 20k points, 2 columns, 424 kB in CSV; it churned out a 78.8 kB EDF. I've never used it before and I'm not familiar with the interface, but it looks like an audio editor trying to play data points :P At first glance the compression looks nice: 424 kB down to 79 kB (2-column data).

My data was generated from a Python logger that auto-saves to xls in 20k-point batches; I could make it do CSV. Assuming all things stay the same, if I were to do the data collection this way, 1M points in EDF = 3.95 MB in 50 batches. I also tried the timescale reduction function; a reduction by 10 produces a 39 kB file from the 79 kB file. What is interesting is that it has a screen calibration function to fit a 1:1 mm scale. Comparing load times, reloading the chart in OpenCalc vs opening the EDF, I think EDF is faster.

Now here comes something interesting. I re-exported the EDF to CSV, and we can now see that the compression adds artifacts to the data. Example: in column 2 the data is -223 nV, and the compressed data is now -222.89. Not sure how you calculate the additional bits of error, but every value is changed, as you can see in the pic; larger values suffer less change. So if I were to reformat my Python logging to EDF, the data may need to be multiplied up by x10? x100? to mitigate the error. Maybe the original data needs to be -223000 pV, so the compressed data is -222999.9 pV. I tried it: -223000 pV in the original becomes -222892.11 in the EDF, so I guess the compression data loss is a fixed thing. Who wants to try and calculate the bit loss from -223 to -222.89? (Looks like 0.05% error?) See the sketch below.

If I had collected data for, say, measuring a 10 kohm precision resistor in milliohms, say 10,000,000.67 milliohm, I think the saved data in EDF would no longer be the same and usable. I also tried randomised 99999.xxxxx data; the EDF render is unusable as it converts everything to 9999.69482, and all 20k data points are this value after compression. So my conclusion/guess is that EDF data can only be used for approx 3 to 3.5 digits of precision recording, which is what? 10 bit? 11 bit?
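A quick way to put a number on that bit loss, assuming the damage is dominated by a fixed relative quantisation step (the -223 -> -222.89 figures are the ones quoted above; the log2 conversion is only a rough equivalence, not anything EDF-specific):

--- Code: ---
import math

original = -223.0
stored = -222.89

rel_error = abs(stored - original) / abs(original)
print(f"relative error: {rel_error:.4%}")        # ~0.049%

# A step of rel_error relative to the value corresponds to roughly
# log2(1/rel_error) bits of usable resolution
print(f"~{math.log2(1 / rel_error):.1f} bits")   # ~11 bits
--- End code ---

That lands right on the 3 to 3.5 digit / 10-11 bit guess at the end of the post.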
| Siwastaja:
--- Quote from: 3roomlab on June 23, 2019, 10:40:59 am ---I used my old log data 20k points 2 columns 424kb in csv it churned out edf 78.8kb ... and we can now see, the compression adds artifacts to the data example col 2 the data is -223nV the compressed data is now -222.89
--- End quote ---

Just for lulz, try zipping the csv file and see how small it gets.
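If you want to run that comparison without leaving Python, something like this compresses the CSV losslessly and prints the before/after sizes (the file name is a placeholder):

--- Code: ---
import gzip, os, shutil

src = "log.csv"                      # hypothetical file name
with open(src, "rb") as fin, gzip.open(src + ".gz", "wb") as fout:
    shutil.copyfileobj(fin, fout)

print(os.path.getsize(src), "->", os.path.getsize(src + ".gz"), "bytes")
--- End code ---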