Author Topic: [C++] Identifying abnormal values in a stream of data  (Read 1304 times)


Offline pman92 (Topic starter)

  • Contributor
  • Posts: 29
  • Country: au
[C++] Identifying abnormal values in a stream of data
« on: November 04, 2021, 08:34:46 am »
Hi guys,
I'm working on a small project that uses a microcontroller to measure the length (i.e. the period, and hence the frequency) of an input square wave pulse using pin change interrupts and a hardware timer. The measured pulse length (in timer clocks/ticks) is then modified. It might be doubled, or halved, or multiplied by 2/3, or any number of other possibilities depending on what the user has selected. The modified pulse is then output on another pin, using another hardware timer to toggle the pin as required.
Importantly, the input pulse length is always changing (it's not a fixed frequency; it might be increasing, decreasing or staying roughly the same), and it has a large range of valid "pulse lengths" (measured in timer ticks), from roughly 150 up to about 100,000.
It's intended to be used in an automotive application, and as a result the input square wave is sometimes noisy. The high frequency noise is easy to deal with: if the measured pulse length falls outside the valid range (150 to 100,000) it's ignored. However, I'm having trouble with intermittent noise pulses that get measured and still fall within the valid range.

For example, it might measure a pulse of 10,000, then 10,500, then 11,000, then 2,000, then 9,500, then 12,000, then 12,500. Clearly the pulse of 2,000 was an error caused by noise. However, because it is within the valid range, it has a brief effect on the output (until more valid pulses are received to return it to where it should be). The effect is that I get random jumps or dips in the frequency of the output pulse when it should change smoothly, ramping up or down with the input.

I have started saving and averaging the last several pulse lengths, before using the averaged value to calculate the output, which has improved the problem. However I want to keep the output responsive to the input and don't want to save too many values as it will make it "laggy". And when you only have a handful of values to average, and the invalid one is an order of magnitude smaller or bigger than the others, it still has a significant impact on the result.

I'm thinking of something that checks each pulse against the previous ones and decides whether it's a valid or invalid pulse length, depending on what the recent few valid pulses were. For example, if it has recently received valid pulses in the 30,000 to 40,000 range and it suddenly gets one of 600, that pulse will be discarded and ignored. Discarding pulses is not a problem, as it continues calculating the output frequency from the last valid ones.

However, I'm struggling to work out how to approach this in C++. I already have a short array of the last few pulses (used for the averaging), but I'm not sure of the best way to compare a new pulse against it and decide whether it's valid or invalid. I also need to make sure it's not going to get "stuck" expecting a pulse length in a range that the input has since moved out of (possibly some sort of time-out?).
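
One possible shape for the check described above, as a rough sketch rather than a tested solution: compare each new pulse with the last accepted one, ignore it if it differs by more than some ratio, and re-sync after a few consecutive rejections so the gate cannot stay stuck. The class name PulseGate, the 150 % ratio and the reject limit are all made-up placeholders that would need tuning on real data; only the 150 to 100,000 tick range comes from the post.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only: PulseGate and its parameters are hypothetical names.
class PulseGate {
public:
    // maxRatioPercent: 150 means a new pulse must lie within
    //   [ref / 1.5, ref * 1.5] of the last accepted pulse.
    // maxRejects: consecutive rejections tolerated before re-syncing.
    PulseGate(std::uint32_t maxRatioPercent, std::uint32_t maxRejects)
        : ratio_(maxRatioPercent), maxRejects_(maxRejects) {
        assert(maxRatioPercent >= 100);
    }

    // Returns true if the pulse should be used, false if it should be ignored.
    bool accept(std::uint32_t ticks) {
        // Absolute range check (limits taken from the original post).
        if (ticks < kMinTicks || ticks > kMaxTicks) return false;

        if (haveRef_) {
            // 64-bit products so the comparisons cannot overflow.
            const std::uint64_t t = ticks, r = ref_;
            const bool inBand =
                (t * ratio_ >= r * 100) && (t * 100 <= r * ratio_);
            // Ignore an out-of-band pulse unless too many have arrived in a
            // row, in which case fall through and treat it as the new
            // reference (this is the time-out).
            if (!inBand && ++rejects_ <= maxRejects_) return false;
        }
        ref_ = ticks;
        haveRef_ = true;
        rejects_ = 0;
        return true;
    }

private:
    static constexpr std::uint32_t kMinTicks = 150;
    static constexpr std::uint32_t kMaxTicks = 100000;
    std::uint32_t ratio_;
    std::uint32_t maxRejects_;
    std::uint32_t ref_ = 0;
    std::uint32_t rejects_ = 0;
    bool haveRef_ = false;
};
```

With PulseGate gate(150, 3), the example sequence 10,000, 10,500, 11,000, 2,000, 9,500, 12,000, 12,500 has only the 2,000 rejected. A genuine jump from 35,000 to 600 is ignored three times and then accepted on the fourth pulse.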

I figured this is probably a somewhat common problem and no doubt someone has done something similar before. Hopefully someone can point me in the right direction.

Thanks in advance, Daniel
 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21686
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: [C++] Identifying abnormal values in a stream of data
« Reply #1 on: November 04, 2021, 09:10:16 am »
So, RPM, or crank position sensor or something related to that?  You don't need to be so abstract, and there may be opportunities there.  For example, what about the signal itself?  Is it analog in nature?  Can it be cleaned up better before converting to digital?  ("Noisy" and "pin change interrupt" seen together, raises all sorts of red flags.)

So, you want a median filter?
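
For reference, a minimal sketch of a sliding median filter over the last N pulse lengths. The class name MedianFilter and the window size are assumptions for illustration, and it assumes the toolchain provides the standard headers used here. An odd N means the result is always one of the real samples once the window is full.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Sketch only: MedianFilter is a hypothetical name.
template <std::size_t N>
class MedianFilter {
    static_assert(N % 2 == 1, "use an odd window so the median is a real sample");

public:
    // Store a new pulse length and return the median of the samples held so
    // far (at most N). While the window is still filling with an even count,
    // the upper of the two middle values is returned.
    std::uint32_t push(std::uint32_t ticks) {
        buf_[head_] = ticks;
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;

        // Work on a copy so the ring buffer keeps its arrival order.
        std::array<std::uint32_t, N> tmp = buf_;
        std::nth_element(tmp.begin(), tmp.begin() + count_ / 2,
                         tmp.begin() + count_);
        return tmp[count_ / 2];
    }

private:
    std::array<std::uint32_t, N> buf_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

Feeding a MedianFilter<5> the sequence 10,000, 10,500, 11,000, 2,000, 9,500, 12,000, 12,500 never lets the 2,000 reach the output. The cost is a delay of roughly N/2 samples.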

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline pman92 (Topic starter)

  • Contributor
  • Posts: 29
  • Country: au
Re: [C++] Identifying abnormal values in a stream of data
« Reply #2 on: November 04, 2021, 09:54:37 am »
The signal is RPM. It is designed to accept an RPM signal and convert it to a different frequency to drive a tachometer designed for a different number of cylinders.

The hardware provides a couple of input options.
For square wave stuff (such as connecting to the ignition coil negative or using an existing square wave signal from an ECU) there's an optoisolator which generates a 5 V square wave from the incoming square wave. The 5 V square wave is then fed through a Schmitt trigger before going to the microprocessor.
For AC/inductive input stuff (inductive type sensors used on a lot of diesel engines, which generate an AC voltage), there's a dual comparator setup with positive feedback/hysteresis to detect the zero crossing point. That is then fed through the same Schmitt trigger before it reaches the micro.

A median filter may be just what I'm after
 

Offline pman92 (Topic starter)

  • Contributor
  • Posts: 29
  • Country: au
Re: [C++] Identifying abnormal values in a stream of data
« Reply #3 on: November 04, 2021, 10:21:03 am »
For example, what about the signal itself?  Is it analog in nature?  Can it be cleaned up better before converting to digital?

Actually after thinking properly about that I believe it probably could be in most cases.

It's measuring the pulse only on each rising edge. So if there is a bit of noise that causes a misreading, it must be picking up 2 pulse lengths that are shorter than they should be (see attached image)
[Attached image: 1315112-0]

I bet that bit of noise is fairly high frequency (at least compared to the signal I'm looking for) and could be eliminated by a low pass filter. As long as the filter allows valid signals through, it would probably get rid of most of the problem.

A median filter in software could then catch anything that manages to get through.

Thanks Tim. I was so deep down the software solution path, assuming that's what would be required, that I never even considered a hardware solution
 

Online Berni

  • Super Contributor
  • ***
  • Posts: 4955
  • Country: si
Re: [C++] Identifying abnormal values in a stream of data
« Reply #4 on: November 04, 2021, 10:32:20 am »
Yep you do always want some analog signal conditioning on your input.

Just low pass filtering will remove a lot of high frequency noise like short sharp pulses (that typically get in from parasitic coupling between wires). High pass filtering away the DC offset can also be a good idea to make sure any weird floaty voltages don't affect your signal. You also typically want to send your signal through a Schmitt trigger circuit so that floating near the high/low threshold doesn't cause noise to be picked up as extra pulses.

But sometimes the signal is just too garbage to even analog filter. In that case there is a filter type that works well for spurious pulses like that. Take 16 samples, calculate the average, then pick the sample that is closest to the average. When you get the next sample, add it as the 17th, throw away the 1st and calculate again. That way you get no reduction in sample rate from the filtering, and it should not slow down your signal's rate of change, just delay it by about 8 samples.
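
A rough sketch of that "closest to the average" filter, assuming a ring buffer of the last N pulse lengths (N = 16 in the description above). The class name ClosestToMeanFilter is made up for illustration. While the buffer is still filling, it works on however many samples it has.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Sketch only: ClosestToMeanFilter is a hypothetical name.
template <std::size_t N>
class ClosestToMeanFilter {
    static_assert(N > 0, "window must hold at least one sample");

public:
    // Store a new pulse length, then return the stored sample that lies
    // closest to the mean of the window.
    std::uint32_t push(std::uint32_t ticks) {
        buf_[head_] = ticks;             // overwrite the oldest sample
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;

        std::uint64_t sum = 0;           // 64-bit so the sum cannot overflow
        for (std::size_t i = 0; i < count_; ++i) sum += buf_[i];
        const std::uint32_t mean = static_cast<std::uint32_t>(sum / count_);

        std::uint32_t best = buf_[0];
        for (std::size_t i = 1; i < count_; ++i) {
            if (absDiff(buf_[i], mean) < absDiff(best, mean)) best = buf_[i];
        }
        return best;
    }

private:
    static std::uint32_t absDiff(std::uint32_t a, std::uint32_t b) {
        return a > b ? a - b : b - a;
    }

    std::array<std::uint32_t, N> buf_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

Note that a large outlier still drags the mean towards itself, but the output is always one of the real samples, so a single spike does not appear at the output unless it happens to be the sample nearest the mean.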
 

Offline Psi

  • Super Contributor
  • ***
  • Posts: 9951
  • Country: nz
Re: [C++] Identifying abnormal values in a stream of data
« Reply #5 on: November 04, 2021, 11:01:16 am »
This is what I would try..

Maintain the last n samples (let's say 10) as well as the calculated lower and upper quartile of those n samples.
Whenever you get a new sample you check if it is between the lower and upper quartile range. If it is you use the value and it will get added to the last 10 samples and the oldest sample dropped.
 
If it is not within the lower and upper quartile you discard that sample and instead use the last 'good' sample.
But, you also keep a count of how many times this has happened in a row (sample outside of range).
The first time it happens you use the last good sample, as stated above.  The 2nd time you average the last good sample and the new potentially bad sample. The 3rd time you average the same last good sample with the two latest potentially bad samples. etc.

So the system will remove anything abnormal but you are still using actual samples, not anything filtered, so the response is good. Except in the case where the abnormal samples continue and turn out to be correct data. In which case there is a little bit of averaging until it gets back into range.

If it is really bad quality data you may also want to do a small running average on the data first, before the system above.
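
A loose sketch of the scheme above, with several assumptions that are not in the description. The quartiles are taken as the sorted samples at indices N/4 and 3N/4. The accepted band is widened by a percentage margin, since a strict quartile band would reject every sample of a steadily ramping input. After maxRejects consecutive out-of-range samples, the window is reseeded with the new value so it cannot stay stuck. QuartileGate, marginPercent and maxRejects are made-up names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch only: QuartileGate and its parameters are hypothetical.
template <std::size_t N>
class QuartileGate {
    static_assert(N >= 4, "need at least four samples for quartiles");

public:
    QuartileGate(std::uint32_t marginPercent, std::uint32_t maxRejects)
        : margin_(marginPercent), maxRejects_(maxRejects) {
        assert(marginPercent < 100);
    }

    // Returns the value to use for this pulse.
    std::uint32_t push(std::uint32_t ticks) {
        if (count_ < N) {                 // warm-up: accept until window is full
            ++count_;
            return acceptSample(ticks);
        }
        std::array<std::uint32_t, N> s = win_;
        std::sort(s.begin(), s.end());
        const std::uint64_t q1 = s[N / 4], q3 = s[(3 * N) / 4];
        const std::uint64_t lo = q1 - q1 * margin_ / 100;
        const std::uint64_t hi = q3 + q3 * margin_ / 100;
        if (ticks >= lo && ticks <= hi) return acceptSample(ticks);

        ++rejects_;
        if (rejects_ > maxRejects_) {     // persistent change: treat it as real
            win_.fill(ticks);
            return acceptSample(ticks);
        }
        if (rejects_ == 1) return lastGood_;   // 1st time: reuse last good
        suspectSum_ += ticks;             // 2nd time onwards: average last good
        return static_cast<std::uint32_t>(     // with the later suspect samples
            (lastGood_ + suspectSum_) / rejects_);
    }

private:
    std::uint32_t acceptSample(std::uint32_t ticks) {
        win_[head_] = ticks;              // drop the oldest, keep the newest
        head_ = (head_ + 1) % N;
        lastGood_ = ticks;
        rejects_ = 0;
        suspectSum_ = 0;
        return ticks;
    }

    std::array<std::uint32_t, N> win_{};
    std::size_t head_ = 0, count_ = 0;
    std::uint32_t margin_, maxRejects_;
    std::uint32_t lastGood_ = 0, rejects_ = 0;
    std::uint64_t suspectSum_ = 0;
};
```

With QuartileGate<4> gate(25, 3) primed with 10,000, 10,500, 11,000, 11,500, a stray 2,000 returns the last good value (11,500). A sustained step to 30,000 returns 12,000, then 21,000, then 24,000, and finally 30,000 once the window is reseeded.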
« Last Edit: November 04, 2021, 11:06:57 am by Psi »
Greek letter 'Psi' (not Pounds per Square Inch)
 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21686
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: [C++] Identifying abnormal values in a stream of data
« Reply #6 on: November 05, 2021, 04:20:16 am »
Ah, RPM changer, sounded something like that!

So, this isn't anything new, it's been done many times -- have you checked what solutions are already available?  I'm pretty sure I've heard of this application before.  Maybe there's an open source solution?

Or maybe it's not a super common application, and is hard/impossible to search for, that always sucks...


To be clear, there are lots of ways to filter digital signals -- it works just the same as analog, everything's just smooshed down to one-bit quantization.  Same theory applies: digital debounce is just a one-bit version of a linear lowpass filter; well, if you aren't using hysteresis, which is nonlinear of course -- but that still corresponds to a related analog circuit.

Yeah, electrical and opto signals should be pretty clean, save for EMI, which can be easily filtered (should be well above the signal's frequencies).  Inductive sensors have weak waveforms at low RPM, so do be careful how you deal with that, including checking for pulse width and balance -- the duty cycle might be wonky for example.  (Actually, if it's coming from a crank position sensor, the duty will be very low indeed by default, right?  Well, I don't know what all sensors are usually available, maybe there's a cleaner one that's more readily available.  CPS is ECU stuff after all.)

So, as is always the case -- if you can gather data from various environments, especially noisy ones, that gives you something to work with.


As for methods, there are roughly two ways to go about it: by event, or by periodic sampling.  Perhaps there's a hybrid method in between, as well.


There's some logic to a "what's this pulse width compared to previous?" method; kind of a zero-dimensional cellular automaton.  It's a simple starting point.  But it has the downsides that, if some pulses come too quickly, what do you do?  A sudden burst of rapid pulses completely flushes the memory; so, it can respond quickly, sure, but... should it?  Or if you decide to ignore a pulse (or edge, even), how do you go about that?  It's event-driven, you're obligated to do something every time.  And how much memory should you log?  Probably not too many pulses, because of corresponding delay at minimum speed.  But that limits how much analysis you can do.

A more academic approach is to treat it like any other signal problem: you've got some input bitstream at the given sample rate, and you can do filtering and statistics and whatever on it, and update the output periodically as needed.  (Output updates don't even have to be exactly periodic; you might hold off on updating the output until a new value is known, after one or a few apparent input cycles have passed.)  Analyses can be as fancy as the Fourier transform (resulting in a measure of frequency) -- though since we're working with bits, the Hadamard transform may be more appropriate (resulting in a measure of "sequency"!).  Note that an FT (or HT) can only be done on a whole buffer, so this would have the downside of a total propagation delay equal to buffer length.

The upshot to methods like frequency transforms is, you can simply rescale the spectrum, inverse transform, and output the bitstream.  It's hard getting there, but it's a fairly trivial task once it's done.  And you can suppress noise/harmonics, peak detect, whatever to get a clean output.

An in-between sort of analysis would be something like a digital PLL.  Run a phase or frequency difference detector between input and reference, and adjust accordingly.  Filter the detector output to remove spurious detections, which simultaneously limits the rate at which the output can respond -- as we should expect.

I'm not sure offhand how annoying a PLL is in software, but it should be alright with an MCU, timer (with input capture and other fairly standard things?) and PID loop*.

*I use "PID" generally here; you might not want a PID exactly, for numerical or dynamical purposes.  More just, any control loop and filter response that's suitable, while having overall integral characteristic (so that error tends towards zero over time).
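
To give a feel for the control-loop idea, here is a sketch of a period-tracking loop with a proportional-plus-integral style correction. Structurally it is an alpha-beta tracking filter rather than a true phase-locked loop, and it is swapped in here only as an illustration. The error between each measured period and the prediction is clamped before it is applied, which limits how far one spurious pulse can move the estimate. PeriodTracker, the gains and the clamp fraction are made-up placeholders, and float is used for readability where fixed-point may suit a small MCU better.

```cpp
#include <cassert>
#include <cmath>

// Sketch only: PeriodTracker and its gains are hypothetical.
class PeriodTracker {
public:
    // alpha: proportional gain on the period estimate (0..1).
    // beta:  gain on the rate-of-change estimate (the integral-like term).
    // maxStepFraction: error clamp as a fraction of the predicted period.
    PeriodTracker(float alpha, float beta, float maxStepFraction)
        : alpha_(alpha), beta_(beta), clamp_(maxStepFraction) {
        assert(alpha > 0.0f && alpha < 1.0f && beta > 0.0f);
    }

    // Feed one measured period (in timer ticks); returns the tracked estimate.
    float update(float measuredTicks) {
        if (!locked_) {                   // first sample seeds the loop
            est_ = measuredTicks;
            rate_ = 0.0f;
            locked_ = true;
            return est_;
        }
        const float predicted = est_ + rate_;
        float err = measuredTicks - predicted;      // "detector" output
        const float limit = clamp_ * std::fabs(predicted);
        if (err > limit) err = limit;     // bound the effect of one bad pulse
        if (err < -limit) err = -limit;
        est_ = predicted + alpha_ * err;  // proportional correction
        rate_ += beta_ * err;             // accumulated correction, so the
        return est_;                      // error tends to zero on a ramp
    }

private:
    float alpha_, beta_, clamp_;
    float est_ = 0.0f, rate_ = 0.0f;
    bool locked_ = false;
};
```

With gains of 0.25 and 0.03 and a 25 % clamp, a lone 2,000 tick pulse in a steady 10,000 tick stream moves the estimate only to 9,375, while a steady ramp is followed with the error decaying towards zero.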


And we can also make some assumptions about the underlying physical system.  The engine can only accelerate at whatever rate it can.  If it's a manual transmission, changes could be a bit more abrupt due to external forces (both up and down shifting).  Automatic should be somewhere in between, depending on type (with non-locking torque converter being the softest, I would think).  This can inform decisions about what statistics to use on accepting/rejecting pulse widths, or, say we do a transform and instead of a nice sharp peak it's got a huge smear of frequencies (chirp), so maybe we shouldn't do peak detection on that; etc.

Tim
 

