You're right. I suppose with any large dataset it's best to work out how you will use it first and structure it accordingly.
Yes, zoomed out over months/years it will be difficult to see individual transitions, just sort of "dithering". But... there are aggregate calculations done on the timeseries, such as "total transitions" and "total on time". Those are calculated dynamically for whatever time period is selected.
However, by storing the "percentage of on time" and the "total transitions" per period, that data can be reconstructed later.
The gas bill is fine, but it's out of phase and messed up by estimated readings, price changes and all manner of things. It would be nice to see total runtime per period and average transitions per day over a period... so I can deduce trends rather than absolute values. "Did the insulation I installed make this winter better?", "Did the sympathetic heating responses lower the amount of cycling?"
It's looking like % + count is the way to go. I can still keep the absolute state transition history for 1 week to diagnose and monitor day-to-day behaviour.
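Something along these lines is roughly what I mean - a sketch only; `summarise_period` and the (timestamp, bool) event shape are illustrative, not my actual schema:

```python
from datetime import datetime, timedelta

def summarise_period(transitions, state_before, start, end):
    """Boil raw on/off transitions down to (% on time, transition count)."""
    on_seconds = 0.0
    count = 0
    cursor, state = start, state_before   # state carried in from before the period
    for t, s in transitions:
        if t < start or t >= end:
            continue                       # ignore events outside the period
        if state:
            on_seconds += (t - cursor).total_seconds()
        cursor, state = t, s
        count += 1
    if state:                              # period ends with the boiler still on
        on_seconds += (end - cursor).total_seconds()
    pct_on = 100.0 * on_seconds / (end - start).total_seconds()
    return pct_on, count

# run nightly for the previous day, store the two numbers,
# then prune raw transitions older than a week
day = datetime(2024, 1, 15)
events = [(day + timedelta(hours=6), True),
          (day + timedelta(hours=8, minutes=30), False)]
print(summarise_period(events, False, day, day + timedelta(days=1)))
# -> (10.41..., 2)
```

Total runtime for any stored period is then just pct_on/100 × period length, so the trend questions above come straight out of the stored aggregates.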
As an aside, the separate mind bender I'm working with is current target temperatures. I have many things - an indeterminate number of things - that can set a "target" to aim for. The trouble is, it's perfectly valid for multiple targets to exist for each zone. The sources of such targets ("schedules") are not aware of each other, so they publish competing data for the same value. I would like to record/display the boiled-down reality, i.e. the highest active target for each zone. Easy. Except it's a distributed, concurrent, async, event-driven architecture. My solution is to store the timestamp of every target temperature for a zone against its value, overriding any existing entry on update. When I want to publish the target temp for a given period, I just find the highest key in the data which isn't expired/stale. Not finished it, but it looks good so far.
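Roughly like this - a sketch, not the finished thing; the 15 minute TTL and the names are placeholders, and the real version has to cope with concurrent async updates per zone:

```python
from datetime import datetime, timedelta

TTL = timedelta(minutes=15)          # assumed staleness window for a target

class ZoneTargets:
    """Per-zone map of target temperature -> timestamp it was last asserted."""

    def __init__(self):
        self._targets = {}

    def update(self, target_temp, when=None):
        # later publications of the same value simply override the timestamp
        self._targets[target_temp] = when or datetime.utcnow()

    def current(self, now=None):
        """Highest target whose timestamp hasn't gone stale, or None."""
        now = now or datetime.utcnow()
        live = [t for t, ts in self._targets.items() if now - ts <= TTL]
        return max(live, default=None)

# two schedules publish competing targets for the same zone;
# the boiled-down reality is whichever is highest and still fresh
zone = ZoneTargets()
zone.update(18.0)     # background schedule
zone.update(21.5)     # boost schedule
print(zone.current()) # -> 21.5 (highest non-stale target)
```

Because each update only overwrites a timestamp keyed by value, the schedules never need to know about each other; a target simply drops out of contention once its publisher stops re-asserting it.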