Bit of a mouthful.
I have a detailed time series of instantaneous Watts being consumed. I have a few data analytics tools. How hard can it be, right?
Skipping a lot of steps, the source data looks like this:
+-------------------+----------+------+
| time| key| value|
+-------------------+----------+------+
|1678892412041096000|mainsPower|382.96|
|1678892442053409000|mainsPower|393.56|
|1678892448316177000|mainsPower|418.38|
|1678892458305550000|mainsPower|398.81|
|1678892472059483000|mainsPower|391.73|
|1678892502071421000|mainsPower|390.59|
|1678892532074947000|mainsPower|389.02|
|1678892562082924000|mainsPower|390.51|
|1678892592156734000|mainsPower|403.92|
|1678892622162091000|mainsPower| 392|
|1678892638103360000|mainsPower|379.91|
|1678892651098937000|mainsPower|399.94|
|1678892652170208000|mainsPower|393.23|
|1678892682185678000|mainsPower|387.42|
|1678892712182986000|mainsPower|387.84|
|1678892729019943000|mainsPower|466.35|
|1678892733018366000|mainsPower| 398.8|
|1678892742194999000|mainsPower|396.54|
|1678892753001237000|mainsPower|460.11|
|1678892756999136000|mainsPower|408.65|
+-------------------+----------+------+
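The `time` column is epoch nanoseconds. As a sanity check in plain Python (the tables here look like Spark `show()` output, but nothing Spark-specific is needed for this step), converting one reading looks like:

```python
from datetime import datetime, timezone

def ns_epoch_to_datetime(ns: int) -> datetime:
    # the time column is epoch nanoseconds; fromtimestamp() wants seconds
    return datetime.fromtimestamp(ns / 1e9, tz=timezone.utc)

print(ns_epoch_to_datetime(1678892412041096000))  # first row of the table above
```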
So far I have converted the epoch time to a proper timestamp and computed the chronological deltas in the wattage, a rounded version of each delta, and its abs() value.
Ordering by the abs value does start to show promise:
+-------------------+----------+-------+-----------------------+-------------------+-------+---------+
|time |key |value |timestamp |delta |delta_2|abs_delta|
+-------------------+----------+-------+-----------------------+-------------------+-------+---------+
|1678973146796417000|mainsPower|492.91 |2023-03-16 13:25:46:000|-100.07999999999998|-100.0 |100.0 |
|1678910098765749000|mainsPower|434.95 |2023-03-15 19:54:58:000|-100.00000000000006|-100.0 |100.0 |
|1678955387327490000|mainsPower|643.03 |2023-03-16 08:29:47:000|100.17999999999995 |100.0 |100.0 |
|1678971862984229000|mainsPower|459.95 |2023-03-16 13:04:22:000|-100.02000000000004|-100.0 |100.0 |
|1678972967735389000|mainsPower|599.19 |2023-03-16 13:22:47:000|100.70000000000005 |101.0 |101.0 |
|1678973512903132000|mainsPower|498.69 |2023-03-16 13:31:52:000|-100.84999999999997|-101.0 |101.0 |
|1678972043090063000|mainsPower|561.52 |2023-03-16 13:07:23:000|-100.78999999999996|-101.0 |101.0 |
|1678975416882210000|mainsPower|523.49 |2023-03-16 14:03:36:000|-101.36000000000001|-101.0 |101.0 |
|1678972293340457000|mainsPower|456.83 |2023-03-16 13:11:33:000|-101.12000000000006|-101.0 |101.0 |
|1678972548598568000|mainsPower|442.36 |2023-03-16 13:15:48:000|-100.73000000000002|-101.0 |101.0 |
|1678971751988803000|mainsPower|466.29 |2023-03-16 13:02:31:000|-101.19999999999999|-101.0 |101.0 |
|1678972552598249000|mainsPower|525.47 |2023-03-16 13:15:52:000|-100.74000000000001|-101.0 |101.0 |
|1678972664635737000|mainsPower|518.88 |2023-03-16 13:17:44:000|-100.70000000000005|-101.0 |101.0 |
|1678970718564290000|mainsPower|533.16 |2023-03-16 12:45:18:000|-100.64999999999998|-101.0 |101.0 |
|1678972724653133000|mainsPower|438.7 |2023-03-16 13:18:44:000|-101.30000000000001|-101.0 |101.0 |
|1678970564699165000|mainsPower|2362.87|2023-03-16 12:42:44:000|101.58999999999969 |102.0 |102.0 |
|1678975831698850000|mainsPower|494.92 |2023-03-16 14:10:31:000|-101.78000000000003|-102.0 |102.0 |
|1678972391481375000|mainsPower|585.24 |2023-03-16 13:13:11:000|102.29000000000002 |102.0 |102.0 |
... snip...
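For reference, a minimal plain-Python sketch of how the delta columns are derived (using the first few readings from the first table; `delta_2` is the rounded delta and `abs_delta` its absolute value):

```python
# readings: (epoch_ns, watts) pairs in time order, taken from the table above
readings = [
    (1678892412041096000, 382.96),
    (1678892442053409000, 393.56),
    (1678892448316177000, 418.38),
    (1678892458305550000, 398.81),
]

rows = []
for (t_prev, w_prev), (t, w) in zip(readings, readings[1:]):
    delta = w - w_prev                 # chronological change in watts
    delta_2 = round(delta, 0)          # rounded version
    rows.append((t, w, delta, delta_2, abs(delta_2)))

for r in rows:
    print(r)
```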
Without getting into anything exotic, and without even aiming at the end goal, I tried to pull some histograms and statistics together.
De-noise: truncate the abs deltas to 2 significant figures.
Group by the resulting buckets and sum their non-absed signed values. Order ascending by magnitude.
This gives me a much more encouraging view. I call this statistic the "coherence" across that group. Lower is better: a bucket whose signed deltas nearly cancel suggests matched on/off events of the same device.
+-----------------+-------------------+
|sig_fig_abs_delta| sum(delta)|
+-----------------+-------------------+
| 380.0|-0.8699999999999477|
| 910.0| 1.7599999999998772|
| 870.0| 1.849999999999909|
| 780.0| 3.2800000000002|
| 1000.0|-3.5200000000000955|
| 1800.0|-3.7000000000000455|
| 200.0| 7.199999999999875|
| 230.0| 8.620000000000118|
| 1300.0|-22.550000000000637|
| 150.0| -118.9099999999998|
| 180.0| 176.1400000000001|
| 340.0| -344.62|
| 350.0| 348.02000000000004|
| 360.0| 357.81000000000006|
| 370.0| -368.5899999999999|
| 390.0| 393.13000000000005|
| 400.0| 396.7499999999998|
| 210.0| 407.14000000000044|
| 410.0| -410.78|
| 440.0| -442.4100000000001|
+-----------------+-------------------+
only showing top 20 rows
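A sketch of the 2-significant-figure bucketing and the "coherence" sum, in plain Python with made-up deltas (the `two_sig_figs` helper is my own naming, not anything from the pipeline above):

```python
import math
from collections import defaultdict

def two_sig_figs(x: float) -> float:
    # round x to 2 significant figures, e.g. 101.36 -> 100.0, 2362.87 -> 2400.0
    if x == 0:
        return 0.0
    exp = math.floor(math.log10(abs(x)))
    return round(x, -(exp - 1))

# invented signed deltas; real ones come from the pipeline above
deltas = [101.2, -100.9, 103.4, -102.8, 1412.0, -1390.0]

# coherence: per 2-sig-fig |delta| bucket, sum the signed deltas;
# near zero means rises and falls of that size cancel out over the day
coherence = defaultdict(float)
for d in deltas:
    coherence[two_sig_figs(abs(d))] += d

for bucket in sorted(coherence, key=lambda b: abs(coherence[b])):
    print(bucket, coherence[bucket])
```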
I also counted the various instances of the 2-sig-fig-abs-deltas.
+-----------------+-----+
|sig_fig_abs_delta|count|
+-----------------+-----+
| 110.0| 46|
| 150.0| 41|
| 100.0| 39|
| 120.0| 36|
| 1400.0| 33|
| 140.0| 27|
| 170.0| 27|
| 1500.0| 26|
| 160.0| 24|
| 130.0| 24|
| 180.0| 23|
| 210.0| 22|
| 2200.0| 20|
| 190.0| 19|
| 200.0| 16|
| 1700.0| 16|
| 220.0| 14|
| 230.0| 10|
| 1800.0| 10|
| 250.0| 9|
+-----------------+-----+
only showing top 20 rows
The catch is that neither says much individually. I need to combine them... one moment please...
+-----------------+------------------+-----+--------------------+
|sig_fig_abs_delta| sum(delta)|count| count_over_sum|
+-----------------+------------------+-----+--------------------+
| 200.0| 7.199999999999875| 16| 2.222222222222261|
| 230.0| 8.620000000000118| 10| 1.160092807424578|
| 910.0|1.7599999999998772| 2| 1.1363636363637157|
| 870.0| 1.849999999999909| 2| 1.0810810810811342|
| 780.0| 3.2800000000002| 2| 0.6097560975609384|
| 180.0| 176.1400000000001| 23| 0.13057794935846478|
| 210.0|407.14000000000044| 22|0.054035466915557245|
| 160.0|1578.8199999999997| 24|0.015201226232249404|
| 170.0| 1867.13| 27|0.014460696362867074|
| 220.0|1330.0100000000004| 14|0.010526236644837253|
| 360.0|357.81000000000006| 3|0.008384338056510437|
| 260.0| 777.47| 5| 0.00643111631316964|
| 240.0|1193.9099999999999| 7|0.005863088507508942|
| 270.0| 1077.95| 6|0.005566120877591725|
| 330.0| 981.5700000000002| 5|0.005093880212312926|
| 1200.0| 1140.96| 5|0.004382274575795821|
| 250.0| 2247.42| 9|0.004004591932082121|
| 280.0| 1113.02| 4|0.003593825807263122|
| 310.0| 625.8| 2|0.003195909236177...|
| 320.0| 644.7900000000001| 2|0.003101785077311993|
+-----------------+------------------+-----+--------------------+
only showing top 20 rows
Interesting. Some candidates.
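The combined statistic can be sketched the same way. The bucket contents below are invented, but the `count_over_sum` ratio matches the column above: many events with a near-zero net sum score high (note a bucket whose sum happens to be exactly zero would need a guard against division by zero):

```python
# bucketed signed deltas (bucket -> list of deltas), as if produced by the
# 2-sig-fig grouping; values here are made up for illustration
buckets = {
    200.0: [201.3, -199.8, 203.1, -197.4],
    360.0: [357.8, 362.1, 355.6],
}

stats = []
for bucket, ds in buckets.items():
    s, n = sum(ds), len(ds)
    stats.append((bucket, s, n, n / s))  # count_over_sum

# highest ratio first: many events, near-zero net drift
stats.sort(key=lambda row: row[3], reverse=True)
for row in stats:
    print(row)
```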
The next approach has to be more exotic: actually matching up pairs of + and - deltas in time and seeing how that plays out with some "sanity test" stats like the above.
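A naive version of that pairing, as a sketch: walk the deltas in time order and greedily match each +delta ("device on") against a later -delta of similar magnitude ("device off"). The 5% tolerance and the toy timestamps are arbitrary assumptions.

```python
def match_on_off(events, tolerance=0.05):
    """events: list of (timestamp, delta) in time order.
    Returns a list of ((t_on, +delta), (t_off, -delta)) pairs."""
    open_events, pairs = [], []
    for t, d in events:
        if d > 0:
            open_events.append((t, d))     # candidate "device on"
        else:
            # find an unmatched +delta within tolerance of this -delta
            for i, (t_on, d_on) in enumerate(open_events):
                if abs(d_on + d) <= tolerance * d_on:
                    pairs.append(((t_on, d_on), (t, d)))
                    open_events.pop(i)
                    break
    return pairs

# toy data: a big device and a small device switching on, then off
events = [(1, 1450.0), (2, 120.0), (3, -118.5), (4, -1447.0)]
pairs = match_on_off(events)
print(pairs)
```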
Anyone worked on stuff like this? Any "information theorists" who can help with the more exotic pivots etc.?
I believe the companies who normally offer this do it by running machine learning on datasets from all their customers and devices, publishing likely candidates to end devices while allowing users to override them.
Surely it should be achievable somehow with a single user's data set. I currently have only one day at max resolution before it gets down-sampled. I might expand that to a month to give me more distribution for analysis like this.