Hi, I'm Paul and I have a data habit.
paulca:
I started with a single temperature sensor on a Raspberry Pi.
Now my data logger is processing around 2400 metrics per minute. Somewhere along the way I had to introduce retention policies because the database began to consume most of the server.
The most recent additions can be seen (in draft form, stuffed onto the "at a glance" realtime dash) in the attached image.
[attached image: "at a glance" realtime dashboard]
Most of the top section is the new weather station, and the slightly less new background ionisation detector.
Below those is the original "at a glance" dashboard, showing the indoor temps, heating control states/demands/targets and the automated light states.
A quick glance at the "Office off grid" battery. Yes, it's partly running on battery this morning while I make this post.
Quick glances at power generation/consumption over 24h and over 1h.
Down the right side is the "at a glance" power zone breakdown, the heating pipe monitors, the lux sensors and the battery levels of all the sensors in one place.
There are three more entire dashboards: "Solar power system details", a longer dash of just straight time-aligned graphs for everything, and a dashboard specific to monitoring the servers/PCs.
I'll see if the uploader will let me add an image of the others below.
The architecture is rather simple. The majority of devices already support MQTT messaging, so they all publish onto an "External" message bus. A set of "proxies" subscribe to this bus and republish the metrics with slight enrichment and "normalization", allowing all "services" to make assumptions and self-configure / self-onboard devices based on the message contents, prefixes, suffixes and even just "Units". Importantly, this layer timestamps everything... you would be surprised at how many sensors send data without a timestamp, and you would not be surprised how much of a headache that can cause with repeated stale data.
Those devices which don't do MQTT will either use a hardware bridge (ESP32 BLE->MQTT) or HTTP... but they all end up in the same place on the internal MQTT bus.
The main internal, normalized bus is then lifted and published to InfluxDB for persistence. Grafana graphs make up the UI.
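To give a flavour of the proxy layer, something like the sketch below (a rough illustration only; the broker names, topics and payload fields are made up, not the actual setup):
--- Code: ---# Sketch of one "proxy": subscribe to the external bus, stamp and normalize
# each metric, then republish on the internal bus. Broker names, topics and
# payload fields here are made up for illustration (paho-mqtt 1.x style API).
import json
import time

import paho.mqtt.client as mqtt

external = mqtt.Client()
internal = mqtt.Client()

def on_message(client, userdata, msg):
    try:
        payload = json.loads(msg.payload)
    except (ValueError, UnicodeDecodeError):
        payload = msg.payload.decode(errors="replace")
    if not isinstance(payload, dict):
        payload = {"value": payload}

    # The important bit: timestamp everything ourselves, since many
    # sensors don't, and repeated stale data is a headache downstream.
    payload.setdefault("timestamp", time.time())
    payload["source_topic"] = msg.topic

    # "Normalization": map vendor topics onto one internal scheme.
    internal.publish("home/normalized/" + msg.topic.replace("/", "_"),
                     json.dumps(payload))

external.on_message = on_message
external.connect("external-broker.local", 1883)
internal.connect("internal-broker.local", 1883)
internal.loop_start()       # background network loop for the publisher
external.subscribe("#")
external.loop_forever()
--- End code ---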
Some of the heating control states and lighting control states touch on the fact that there is a service layer behind this as well: services which watch data, combine different elements of it, make determinations and publish other messages to control things.
The lights, for example, have their choice of data when it comes to determining whether lights are required or not. Presently it uses the main MPPT power output; it was found that the solar panel voltage was triggering way too early in the morning and way too late in the evening. Keying off the point where the panel basically stops producing power works far better, and will even bring the lights on correctly mid-afternoon on dark winter days.
The next feature is to fix the motion-controlled night light in the kitchen so it doesn't come on if the overhead light is already on... by checking the lux reading of that motion sensor. I mention this because a kingpin feature of this setup is that the data is all normalized onto one big fat data bus, so all services have access to everything regardless of vendor or protocol. A sketch of such a service is below.
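Roughly like this (again a sketch; the thresholds, topics and field names are illustrative, not my real ones):
--- Code: ---# Sketch of a light-control service on the normalized bus: lights follow the
# MPPT output rather than panel voltage, and the kitchen night light defers
# to the overhead light via the motion sensor's lux reading. Topics, field
# names and thresholds are illustrative assumptions.
import json

import paho.mqtt.client as mqtt

MPPT_OFF_WATTS = 5.0     # panel has "basically stopped producing"
OVERHEAD_ON_LUX = 50.0   # above this, the overhead light is already on

state = {"kitchen_lux": None}

def on_message(client, userdata, msg):
    data = json.loads(msg.payload)
    if msg.topic == "home/normalized/solar_mppt_output":
        wanted = "ON" if data["value"] < MPPT_OFF_WATTS else "OFF"
        client.publish("home/control/lights/auto", wanted)
    elif msg.topic == "home/normalized/kitchen_motion_lux":
        state["kitchen_lux"] = data["value"]
    elif msg.topic == "home/normalized/kitchen_motion":
        # Night light only on motion AND if the room is actually dark.
        if data["value"] and (state["kitchen_lux"] or 0) < OVERHEAD_ON_LUX:
            client.publish("home/control/lights/kitchen_night", "ON")

client = mqtt.Client()
client.on_message = on_message
client.connect("internal-broker.local", 1883)
client.subscribe("home/normalized/#")
client.loop_forever()
--- End code ---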
paulca:
Image...
paulca:
Solar power details.
tggzzz:
--- Quote from: paulca on February 19, 2024, 11:45:00 am ---Now my data logger is processing around 2400 metrics per minute. Somewhere along the way I had to introduce retention policies because the database began to consume most of the server.
--- End quote ---
Better a data habit than a hand-waving habit :)
Why don't you only record sufficient samples to meet the Nyquist limit of whatever it is you are measuring? Room temperature doesn't change very fast!
Consult any DSP textbook for how to "coalesce" many samples taken at an unnecessarily high frequency into a single sample. Keywords include "decimation" and "subsampling" and "downsampling".
paulca:
--- Quote from: tggzzz on February 19, 2024, 12:44:11 pm ---
--- Quote from: paulca on February 19, 2024, 11:45:00 am ---Now my data logger is processing around 2400 metrics per minute. Somewhere along the way I had to introduce retention policies because the database began to consume most of the server.
--- End quote ---
Better a data habit than a hand-waving habit :)
Why don't you only record sufficient samples to meet the Nyquist limit of whatever it is you are measuring? Room temperature doesn't change very fast!
Consult any DSP textbook for how to "coalesce" many samples taken at an unnecessarily high frequency into a single sample. Keywords include "decimation" and "subsampling" and "downsampling".
--- End quote ---
The dashboard is "real-time". Rather, it has a 5 second refresh while the data has nanosecond precision; but the LCD display downstairs is literally real-time, and the service layer is real-time. When the temperature changes I might want the heating to respond to that change "now", not in one waveform's time.
Anyway, straw men aside, yes, I could "hold and modify" the data on ingestion into longer intervals, or I could force an immediate lower resolution on append, but... I choose to just run queries like the following.
--- Code: ---CREATE CONTINUOUS QUERY day2month_temps ON home_auto
BEGIN
  SELECT mean(value) AS value, max(value) AS maxVal, min(value) AS minVal
  INTO home_auto.one_month.temperature
  FROM home_auto.one_day.temperature
  GROUP BY time(1m), *
END
--- End code ---
The DB server runs this every 30 seconds. The retention policy can then be selected in queries. In practice the 1 month -> 1 year -> 10 years policies aren't really working out either, so it will change at some point. My main issues with this approach (reduction queries) are:
* Management. They need to be set up per time-series.
* Variation of data. They are not all the same query/value/cadence.
* State transition time-series are a handful to process and reduce, and not directly supported easily in the DB server (first/last are supported, total transition count isn't, nor is total runtime per interval; see the sketch just below this list). If I reduce the heating boolean states to aggregate data it is no longer a discrete state time-series, so it needs different graphs, showing a visualisation of what happened within the interval.
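For the state series, the per-interval aggregates I actually want look like this (a plain standalone sketch, nothing DB-specific):
--- Code: ---# Sketch: reduce a boolean state series to the per-interval aggregates the
# DB server won't compute for me: transition count and total ON runtime.
# Input is (unix_seconds, state) samples, assumed sorted by time; state
# before the first sample is treated as unknown.

def reduce_states(samples, interval_start, interval_end):
    transitions = 0
    on_runtime = 0.0
    prev_t, prev_state = interval_start, None
    for t, state in samples:
        if prev_state is not None:
            if state != prev_state:
                transitions += 1
            if prev_state:                  # accumulate time spent ON
                on_runtime += t - prev_t
        prev_t, prev_state = t, state
    if prev_state:                          # tail up to the interval end
        on_runtime += interval_end - prev_t
    return {"transitions": transitions, "on_runtime_s": on_runtime}

# Example: heating on for 0-1200s and 2400-3000s within a one-hour window.
samples = [(0, True), (1200, False), (2400, True), (3000, False)]
print(reduce_states(samples, 0, 3600))
# -> {'transitions': 3, 'on_runtime_s': 1800.0}
--- End code ---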
So what I am likely to do is nominate data, like state transitions, and give them a fixed 1 year retention and no downsampling. I then need to automate the addition of the other retention policies to aid "onboarding" of devices; something like the sketch below. I note there are several series missing from the monthly policies, which is annoying and the result of "copy pasta" management.
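The automation could be as simple as walking the measurements and creating whatever is missing (a sketch against the influxdb-python client; home_auto/one_day/one_month match the query above, everything else is assumption):
--- Code: ---# Sketch: auto-create the downsampling continuous query for any measurement
# that doesn't have one yet, to kill the per-series copy-pasta. Assumes the
# influxdb-python (InfluxQL) client and the naming used above.
from influxdb import InfluxDBClient
from influxdb.exceptions import InfluxDBClientError

client = InfluxDBClient("influx.local", 8086, database="home_auto")

# Make sure the target retention policy exists (duration is an assumption).
rps = {rp["name"] for rp in client.get_list_retention_policies("home_auto")}
if "one_month" not in rps:
    client.create_retention_policy("one_month", "30d", "1",
                                   database="home_auto")

# One downsampling CQ per measurement, mirroring the temperature query above.
CQ = ('CREATE CONTINUOUS QUERY day2month_{m} ON home_auto BEGIN '
      'SELECT mean(value) AS value, max(value) AS maxVal, min(value) AS minVal '
      'INTO home_auto.one_month.{m} FROM home_auto.one_day.{m} '
      'GROUP BY time(1m), * END')

for measurement in client.get_list_measurements():
    try:
        client.query(CQ.format(m=measurement["name"]))
    except InfluxDBClientError as err:
        if "already exists" not in str(err):   # already onboarded: skip
            raise
--- End code ---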
On how often the sensors report: at the moment that is more critical. My "manually timered" sensors are at 5 seconds, 10 seconds, 1 minute or, in a lot of cases, "on change". I have experienced "event starvation" issues with temp sensors defaulting back to a max reporting period of 1 hour. My logic always favours "OFF": if nothing speaks up to say it (heating/lights) should still be on, it will be turned off. So the heating oscillated all night while the temp fell really, really slowly. This is probably a burn from the decision to use "cadence" for "hysteresis". The practical solution is an artificial event on a timer to force re-evaluation of the "last" reported data... that in itself has issues when sensors stop reporting entirely! Generally, stale temperature (or out of spec) values are dropped and no action is taken.
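That artificial re-evaluation event might look roughly like this (a sketch; the topics, staleness cutoff and re-emit period are assumptions):
--- Code: ---# Sketch of the "artificial event" timer: periodically re-emit the last
# known value per topic so downstream services re-evaluate, but drop values
# that have gone stale so a dead sensor can't hold the heating on. Topics,
# periods and the staleness cutoff are illustrative assumptions.
import threading
import time

import paho.mqtt.client as mqtt

REEVAL_PERIOD_S = 60          # force re-evaluation every minute
STALE_AFTER_S = 15 * 60       # anything older than 15 min is stale

last = {}                     # topic -> (monotonic_time, payload)

def on_message(client, userdata, msg):
    last[msg.topic] = (time.monotonic(), msg.payload)

def reevaluate(client):
    while True:
        time.sleep(REEVAL_PERIOD_S)
        now = time.monotonic()
        for topic, (seen, payload) in list(last.items()):
            if now - seen > STALE_AFTER_S:
                del last[topic]          # stale: drop, take no action
            else:
                client.publish(topic + "/reeval", payload)

client = mqtt.Client()
client.on_message = on_message
client.connect("internal-broker.local", 1883)
client.subscribe("home/normalized/#")
threading.Thread(target=reevaluate, args=(client,), daemon=True).start()
client.loop_forever()
--- End code ---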