Software to store, aggregate, analyse and display ~1G points of streamed data
rs20:
I have a sensor that produces 12-bit samples at 1 kHz, which will be left running for (let's say) a couple of weeks. This means that over the 2 weeks, the sensor will produce about a billion samples. I'm looking for a software solution that can store this data as it's streamed in, and perform analysis and visualization at any time during or after the 2 weeks. One example of the sort of computation I might want to do is to take the deltas between successive samples ("differentiate"), clamp those deltas within a given range, and then do a cumulative sum ("integrate"). I'd then want a graph of those results (obviously decimated; I don't want to be drawing 1G points of data).

It might be tempting to suggest some sort of SQL database or similar because it can store stuff and accept queries in the interim, but SQL databases are much more geared toward creating indexes and joining across tables; I suspect they're very far from optimal for simple sequential data and convolution-style operations. I'd shudder to think how long the example calculation above would take on a billion rows in a SQL database, quite aside from the inelegance of expressing it in SQL. In particular, the ability to perform streaming computation and store the results seems like an essential feature of any reasonable solution here, so that computed results can simply be retrieved rather than recomputed on the fly.

Bonus points if the solution is free (w.r.t. money), and slight bonus points if the visualization is accessible over the internet (either the tool has an HTTP interface, or the whole thing is in the cloud). I could spend weeks crafting a custom bit of software for this, but I'd really prefer to use something more generic in the hope that I'll get some features for free (w.r.t. effort).
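To make that concrete, here's the example computation in NumPy terms (purely illustrative: the synthetic data and the +/-100 clamp limits are placeholders):

Code:

import numpy as np

# Synthetic stand-in for the real 12-bit stream (values 0..4095 at 1 kHz).
rng = np.random.default_rng(0)
samples = rng.integers(0, 4096, size=1_000_000, dtype=np.int64)

# "Differentiate": deltas between successive samples.
deltas = np.diff(samples)

# Clamp the deltas to a chosen range (the +/-100 limits are arbitrary).
clamped = np.clip(deltas, -100, 100)

# "Integrate": cumulative sum of the clamped deltas.
result = np.cumsum(clamped)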
djacobow:
Take a look at InfluxDB and TimescaleDB; those are two solid time-series databases. Amazon also has a new offering; I forget its name. For visualization, the typical companion is Grafana; people also use Kibana and Chronograf. I like TimescaleDB because it's actually just a plug-in for Postgres. Basically, you can still do SQL-y stuff, but you can mark certain tables as time series. Performance may be tricky at 1 kHz, though; you might try writing in chunks covering several seconds at once.
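A rough sketch of the chunked-write idea against TimescaleDB, assuming psycopg2; the connection string, table, and column names are made up for the example:

Code:

import psycopg2
from psycopg2.extras import execute_values

# Placeholder connection string and schema.
conn = psycopg2.connect("dbname=sensor user=postgres")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS readings (ts TIMESTAMPTZ NOT NULL, val SMALLINT NOT NULL)")
cur.execute("SELECT create_hypertable('readings', 'ts', if_not_exists => TRUE)")
conn.commit()

def write_chunk(rows):
    # rows: a few seconds' worth of (timestamp, value) tuples, inserted
    # in one round trip instead of one INSERT per sample.
    execute_values(cur, "INSERT INTO readings (ts, val) VALUES %s", rows)
    conn.commit()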
OwO:
I would probably just store it as raw data in some files; I don't think any database is designed for this. Maybe keep track of the files or metadata in a database, but the actual data points should be stored as 16-bit integers in a binary file (not as strings).
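Something like this, assuming NumPy (the filename is just an example). At 2 bytes per sample, two weeks at 1 kHz is only about 2.4 GB:

Code:

import numpy as np

# A hypothetical chunk of incoming samples (12-bit values fit in int16).
chunk = np.array([0, 1023, 2047, 4095], dtype=np.int16)

# Append raw little-endian 16-bit integers to a flat binary file.
with open("sensor.bin", "ab") as f:
    chunk.astype("<i2").tofile(f)

# Reading it back later is a cheap memory map, not a parse.
data = np.memmap("sensor.bin", dtype="<i2", mode="r")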
OwO:
So am I understanding this right, that you want something that can store the data and display it as a graph on a web UI, with zooming and all that? If that's the case, there is no existing solution that can smoothly handle even 1M points. I looked into exactly this problem a while ago (in my case it was about displaying waveforms from an SDR receiver over a web UI). The conclusion was that I would have to write everything myself, including the graph UI, if performance mattered at all. The server side has to precompute min/max downsampled "mipmaps" of the data and serve pieces of them on request, while the client side has to switch transparently between mipmap levels (or portions of the raw data when fully zoomed in). There is currently no graphing library that will do this.
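The server-side precompute is the easy part; a minimal sketch in NumPy (the reduction factor of 16 and the 4096-point floor are arbitrary choices):

Code:

import numpy as np

def minmax_level(data, factor):
    # Shrink by 'factor', keeping each bucket's min and max so that
    # narrow spikes survive the decimation.
    n = (len(data) // factor) * factor
    buckets = data[:n].reshape(-1, factor)
    return buckets.min(axis=1), buckets.max(axis=1)

def build_mipmaps(data, factor=16, min_len=4096):
    # Precompute successively coarser min/max levels; the client picks
    # the level whose point count roughly matches its pixel width.
    levels = []
    lo = hi = np.asarray(data)
    while len(lo) > min_len:
        lo = minmax_level(lo, factor)[0]
        hi = minmax_level(hi, factor)[1]
        levels.append((lo, hi))
    return levels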
voltsandjolts:
Of course you can't actually plot more points than your display has pixels, so really it's an aggregation problem. I wouldn't bother with a SQL DB for this simple situation; just write a binary file, starting a new one every hour.
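For example (Python/NumPy assumed; the naming scheme is just one possibility):

Code:

import time
import numpy as np

def hourly_path(now=None):
    # One flat binary file per hour, named by UTC date and hour.
    return time.strftime("%Y%m%d_%H.bin", time.gmtime(now))

def append_chunk(chunk):
    # chunk: int16 samples spanning much less than an hour; the file
    # name rolls over by itself at each hour boundary.
    with open(hourly_path(), "ab") as f:
        np.asarray(chunk, dtype="<i2").tofile(f)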