Author Topic: Data Smoothing- Excel  (Read 16917 times)

0 Members and 1 Guest are viewing this topic.

Offline Conrad HoffmanTopic starter

  • Super Contributor
  • ***
  • Posts: 1930
  • Country: us
    • The Messy Basement
Data Smoothing- Excel
« on: August 16, 2017, 05:27:09 pm »
Here are a few Savitzky-Golay filters you can use to smooth things like voltage history. Savitzky-Golay filters are quite effective and are commonly used in spectroscopy work where you don't want to shift the positions of resonant peaks. They're also good for audio filtering. Lot's of info about 'em on the 'net, plus tables of coefficients if you want to have all the intermediate ones. You can also calculate longer series, but the typing gets a bit annoying and the chance of error goes up. Hopefully I got the numbers right here.

Use your vba editor in Excel and cut and paste these into a module. They should then show up in your macro list to run. I wanted them to be entirely self contained, so a somewhat inelegant three input boxes are presented. The assumption is that your input data will be in the first column and you get to choose the start and stop rows. Then you get to choose which column the output will go in. That's by number, not letter; A=1, B=2, C=3 etc.

Better vba programmers than me can probably improve on these greatly. It's a brute force method and might be faster if For Each were used. These filters are often done using convolution, which is much faster, but way above my pay grade. I didn't use a User Form for input because I wanted them self-contained- I keep a lot of macros in my template that I use for everything. I also don't know what Excel version they'll be compatible with, but probably all.

Enjoy & give some feedback on whether they work OK and if you have improvements.

Code: [Select]
Sub SG_five()
    '5 Point Savitzky-Golay Smoothing Filter
    'Multiple InputBoxes are used so the macro is self-contained.
    'Note that this is a fixed output calculation and does not update if input data is changed.
    On Error GoTo NotValidInput
    Dim i As Integer       'counter
    Dim Lrow As Integer    'low cell of data column
    Dim Urow As Integer    'high cell of data column
    Dim Ic As Integer      'column NUMBER for input data
    Dim Cnum As Integer    'column NUMBER for output
    Ic = InputBox("Enter column NUMBER of input data (A=1, B=2 etc.)")
    Lrow = InputBox("Enter row number of first data cell.")
    Urow = InputBox("Enter row number of last data cell.")
    Cnum = InputBox("Enter column NUMBER for output (A=1, B=2 etc.).")
    '5 point S-G coefficients are -3, 12, 17, 12, -3 and the divisor is 35
    For i = (Lrow + 2) To (Urow - 2)
         Cells(i, Cnum).Value = (-3 * Cells(i - 2, Ic).Value + 12 * Cells(i - 1, Ic).Value _
                               + 17 * Cells(i, Ic).Value + 12 * Cells(i + 1, Ic).Value - 3 * Cells(i + 2, Ic).Value) / 35
    Next i
    Exit Sub
NotValidInput:
    MsgBox ("Non valid entry- terminating.")
   
End Sub

Sub SG_eleven()
    '11 Point Savitzky-Golay Smoothing Filter
    'Multiple InputBoxes are used so the macro is self-contained.
    'Note that this is a fixed output calculation and does not update if input data is changed.
    On Error GoTo NotValidInput
    Dim i As Integer       'counter
    Dim Lrow As Integer    'lower cell of data column
    Dim Urow As Integer    'uppper cell of data column
    Dim Ic As Integer      'column NUMBER for input data
    Dim Cnum As Integer    'column number for output
    Ic = InputBox("Enter column NUMBER of input data (A=1, B=2 etc.)")
    Lrow = InputBox("Enter row number of first data cell.")
    Urow = InputBox("Enter row number of last data cell.")
    Cnum = InputBox("Enter column NUMBER for output (A=1, B=2 etc.).")
    For i = (Lrow + 5) To (Urow - 5)
         Cells(i, Cnum).Value = (-36 * Cells(i - 5, Ic).Value + 9 * Cells(i - 4, Ic).Value + 44 * Cells(i - 3, Ic).Value + 69 * Cells(i - 2, Ic).Value _
                                + 84 * Cells(i - 1, Ic).Value + 89 * Cells(i, Ic).Value + 84 * Cells(i + 1, Ic).Value + 69 * Cells(i + 2, Ic).Value _
                                 + 44 * Cells(i + 3, Ic).Value + 9 * Cells(i + 4, Ic).Value - 36 * Cells(i + 5, Ic).Value) / 429
    Next i
    Exit Sub
NotValidInput:
    MsgBox ("Non valid entry- terminating.")
   
End Sub

Sub SG_twentynine()
    '29 Point Savitzky-Golay Smoothing Filter
    'Multiple InputBoxes are used so the macro is self-contained.
    'Note that this is a fixed output calculation and does not update if input data is changed.
    On Error GoTo NotValidInput
    Dim i As Integer       'counter
    Dim Lrow As Integer    'lower cell of data column
    Dim Urow As Integer    'uppper cell of data column
    Dim Ic As Integer      'column NUMBER for input data
    Dim Cnum As Integer    'column NUMBER for output data
    Ic = InputBox("Enter column NUMBER of input data (A=1, B=2 etc.)")
    Lrow = InputBox("Enter row number of first data cell.")
    Urow = InputBox("Enter row number of last data cell.")
    Cnum = InputBox("Enter column NUMBER for output data (A=1, B=2 etc.)")
    For i = (Lrow + 14) To (Urow - 14)
         Cells(i, Cnum).Value = (-351 * Cells(i - 14, Ic).Value - 216 * Cells(i - 13, Ic).Value + -91 * Cells(i - 12, Ic).Value _
                                 + 24 * Cells(i - 11, Ic).Value + 129 * Cells(i - 10, Ic).Value + 224 * Cells(i - 9, Ic).Value _
                                + 309 * Cells(i - 8, Ic).Value + 384 * Cells(i - 7, Ic).Value + 449 * Cells(i - 6, Ic).Value _
                                + 504 * Cells(i - 5, Ic).Value + 549 * Cells(i - 4, Ic).Value + 584 * Cells(i - 3, Ic).Value _
                                + 609 * Cells(i - 2, Ic).Value + 624 * Cells(i - 1, Ic).Value + 629 * Cells(i, Ic).Value _
                                + 624 * Cells(i + 1, Ic).Value + 609 * Cells(i + 2, Ic).Value + 584 * Cells(i + 3, Ic).Value _
                                + 549 * Cells(i + 4, Ic).Value + 504 * Cells(i + 5, Ic).Value + 449 * Cells(i + 6, Ic).Value _
                                + 384 * Cells(i + 7, Ic).Value + 309 * Cells(i + 8, Ic).Value + 224 * Cells(i + 9, Ic).Value _
                                + 129 * Cells(i + 10, Ic).Value + 24 * Cells(i + 11, Ic).Value - 91 * Cells(i + 12, Ic).Value _
                                - 216 * Cells(i + 13, Ic).Value - 351 * Cells(i + 14, Ic).Value) / 8091
    Next i
    Exit Sub
NotValidInput:
    MsgBox ("Non valid entry- terminating.")
   
End Sub
« Last Edit: August 19, 2017, 01:40:10 pm by Conrad Hoffman »
 
The following users thanked this post: quarks, BravoV, cellularmitosis, bitseeker, enut11, 2N3055, Ash, Mr. Scram, serg-el

Offline metrologist

  • Super Contributor
  • ***
  • Posts: 2199
  • Country: 00
Re: Data Smoothing- Excel
« Reply #1 on: August 16, 2017, 06:20:30 pm »
They work beautiful. Thanks for shoveling  :D
 

Offline Conrad HoffmanTopic starter

  • Super Contributor
  • ***
  • Posts: 1930
  • Country: us
    • The Messy Basement
Re: Data Smoothing- Excel
« Reply #2 on: August 17, 2017, 01:08:05 pm »
Forgot to mention, it's common to make several passes with these filters. Sometimes multiple passes of a lower order filter will remove more high frequency and "spiky" stuff, with less change in the shape. With these you'd have to change the input column in the code, or add a variable and another InputBox, then have multiple input/output columns (you can't change the input data without saving the section being worked on). These have to be subs because a function can't change any value other than the one you put it in. These could be done as functions, if you copied the function to each of the output cells. Lots of ways to get the job done. Glad it worked for you. :D
 

Offline Conrad HoffmanTopic starter

  • Super Contributor
  • ***
  • Posts: 1930
  • Country: us
    • The Messy Basement
Re: Data Smoothing- Excel
« Reply #3 on: August 19, 2017, 01:42:13 pm »
Updated the code so they all have 4 InBoxes so you can enter both input and output columns. Previously, I said they could be done using convolution. Not really what I intended- they ARE convolution. There's a fast convolution method based on FFT that's much faster, which is what I was referring to. Attached graph shows a bit of data from my DMV. Black dotted is original noisy data. Blue is 5 point filter. Yellow is 11 point filter. Brown is 29 point filter. Red is a second pass of the 29 point filter.

Hundreds of views and only one comment? Anybody else try 'em?
« Last Edit: August 19, 2017, 01:47:52 pm by Conrad Hoffman »
 

Offline bitseeker

  • Super Contributor
  • ***
  • Posts: 9057
  • Country: us
  • Lots of engineer-tweakable parts inside!
Re: Data Smoothing- Excel
« Reply #4 on: August 19, 2017, 09:40:39 pm »
Thanks, Conrad. I'll try this out on my next logging project.
TEA is the way. | TEA Time channel
 

Online CatalinaWOW

  • Super Contributor
  • ***
  • Posts: 5226
  • Country: us
Re: Data Smoothing- Excel
« Reply #5 on: August 19, 2017, 10:44:23 pm »
I had never heard of this filter class before and did some reading on what they are and why they might be used.  My conclusion is that they are a least squares curve fit to an appropriate interpolating function.  The question becomes - what is appropriate.  In one mentioned application, smoothing resonant peaks in spectroscopy, there is an a priori known shape for these peaks.

In most of our voltage reference work the a priori known shape is a flat line, and the least squares curve fit to that line is the simply calculated running average (taking care to average a set of points centered on the point your are estimating).  This is a special case of a Savitsky-Galoy filter.

There might be some meat to be had given an analytical form of the temperature response and a relatively noise free measurement of temperature.  Then this filter becomes a least squares fit to the analytical response.   

I am sure there are other specific cases, including the obvious source and load sensitivity.

My go to reference for this type of estimation is "Discrete Techniques of Parameter Estimation" by Jerry Mendel.
 

Offline Conrad HoffmanTopic starter

  • Super Contributor
  • ***
  • Posts: 1930
  • Country: us
    • The Messy Basement
Re: Data Smoothing- Excel
« Reply #6 on: August 19, 2017, 11:52:11 pm »
From Numerical Recipes, "The concept of "smoothing" data lies in a murky area, ..." and I can't disagree. Still, trying to see lower frequencies in a forest of noise is far easier with some kind of averaging or smoothing. At least with S-G you know the underlying signal won't be displaced in time. I can also make the case that whatever is achieved in software is very similar to what might be achieved with more low pass filtration in the hardware- unfortunately neither one can solve the problem we'd all like to solve, that of unpredictable low frequency noise in references and such. Consider S-G another tool in the box, a hammer in case you happen across a nail.  :-DMM
 

Offline alm

  • Super Contributor
  • ***
  • Posts: 2862
  • Country: 00
Re: Data Smoothing- Excel
« Reply #7 on: August 20, 2017, 08:25:38 am »
For people who want to experiment with this filter in other programs, it is available as sgolayfilt in MATLAB and Octave and as scipy.signal.savgol_filter in SciPy.

The original paper is probably paywalled. This paper is freely available and has some more details about frequency domain properties. From what I read, it appears that this has less high-frequency attenuation (c.f. peak shape/height) than a simpler averaging filter, but less noise reduction. I am not convinced that this is a good trade-off for most DC metrology applications. It is also important to know that points near the ends of the curve receive a different treatment (also common with standard moving average filters). It strikes me that anytime you are doing curve fitting, you are making assumptions about properties of the signal. I did not find these assumptions clearly spelled out, and I wonder how applicable they are to our signals.

I find the interactive behavior not very elegant, I would want it to behave more like a worksheet function so it will respond instantly to changed data. But I imagine it is not possible for a worksheet function to produce multiple rows of output (e.g. B1:B100 = foo(A1:A100)). It is always good to have another tool in your toolbox. Thanks for posting the code, and thanks for bringing up this filter.
 
The following users thanked this post: babysitter, maat

Offline Conrad HoffmanTopic starter

  • Super Contributor
  • ***
  • Posts: 1930
  • Country: us
    • The Messy Basement
Re: Data Smoothing- Excel
« Reply #8 on: August 20, 2017, 11:13:13 am »
Excel vba has what I consider the strange limitation that functions can only change their own cell. Even if that function calls another subroutine, the limitation still applies. That leaves one with the choice of copying the function multiple times, or using subs with clumsy input dialogs. Anyway, if you want "live update", just put the filter routines in a function and copy it to each output cell. You'd have to try this with large data sets to see if there's a speed penalty, or maybe it's faster. My assumption was our input data is almost always fixed and running the filter manually was OK, but the function method makes sense if you want to have a spreadsheet template where the data is repeatedly copied in and you want the output to happen automagically.

It's also easy enough to change the coefficients for a simple moving average, linear weighted moving average or various other filter types. In all honesty I'm not sure it makes a huge amount of difference. Our data is usually simple, hopefully the desired flat line, with some noise. It's just hard to mess that up, regardless of filter type. FWIW, in most applications I've run across S-G filters seem to be a safe choice, if not optimal. The more mathematically astute can do better, but it's possible to spend an inordinate amount of time looking for what might be a tiny improvement.

Finally, with large data sets I don't worry about losing some points at the end. There are a couple strategies for filling that area in, but I don't like the idea of making up data, so I prefer to just truncate the results. As always, horses for courses- if anybody has different filter strategies to suggest, post 'em to try!
« Last Edit: August 20, 2017, 11:16:07 am by Conrad Hoffman »
 

Offline eugene

  • Frequent Contributor
  • **
  • Posts: 493
  • Country: us
Re: Data Smoothing- Excel
« Reply #9 on: September 05, 2017, 01:30:09 pm »
I find the interactive behavior not very elegant, I would want it to behave more like a worksheet function so it will respond instantly to changed data. But I imagine it is not possible for a worksheet function to produce multiple rows of output (e.g. B1:B100 = foo(A1:A100)). It is always good to have another tool in your toolbox. Thanks for posting the code, and thanks for bringing up this filter.

Though I like Conrad's code, and thank him for introducing VBA to those that might not have experience with it, I agree that doing things in place seems more natural to me when using Excel. For this, the function SUMPRODUCT(RANGE1, RANGE2) comes to the rescue. Enter your filter coefficients as a column somewhere in your workbook (could be on a completely separate tab.) Select them as one of the ranges, and an equal number of data points as the other. It doesn't matter which comes first. Then copy the function into the column of cells that will contain the smoothed data.

For example, if your filter coefficients are in G1:G5 and your raw data is in A1:A100, then enter the following into B3

=SUMPRODUCT(A1:A5, $G$1:$G$5)

Copy that formula into all the cells in B4:B98.

If you want to get fancier, name the cells containing the filter coefficients something like "SG2COEFS" and then use the formula

=SUMPRODUCT(A1:A5, SG2COEFS)

To name the cells, select them and then enter "SG2COEFS" into the small input box just above column A. You can have multiple sets of coefficients with different names.
« Last Edit: September 05, 2017, 02:15:00 pm by eugene »
90% of quoted statistics are fictional
 
The following users thanked this post: Conrad Hoffman, alm, Chris_Walch

Offline Dougaj4

  • Newbie
  • Posts: 1
  • Country: au
Re: Data Smoothing- Excel
« Reply #10 on: November 20, 2019, 12:34:15 am »
I have just seen this code posted on the Eng-Tips forum https://www.eng-tips.com/viewthread.cfm?qid=460490

I modified the code to run as a function (see note below to return all the results with a single function)

Code: [Select]
Function Savitzky(Drange As Variant, Optional Points As Long = 5)
'5, 11 or 29 Point Savitzky-Golay Smoothing Filter

On Error GoTo NotValidInput
Dim i As Long, j As Long  'counters
Dim irows As Long 'Number of rows of data
Dim Res As Double, ResA() As Variant, FactA As Variant, div As Long

Drange = Drange.Value2

irows = UBound(Drange)
ReDim ResA(1 To irows, 1 To 1)
Select Case Points
    Case 5
        For i = 1 To 2
            ResA(i, 1) = CVErr(xlErrNA)
        Next i
        FactA = Array(-3, 12, 17, 12, -3)
        div = 35
        For i = 3 To irows - 2
            Res = 0
            For j = 0 To 4
                Res = Res + (FactA(j) * Drange(i + j - 2, 1))
            Next j
            ResA(i, 1) = Res / div
        Next i
        For i = irows - 1 To irows
            ResA(i, 1) = CVErr(xlErrNA)
        Next i
    Case 11
        For i = 1 To 5
            ResA(i, 1) = CVErr(xlErrNA)
        Next i
        FactA = Array(-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36)
        div = 429
        For i = 6 To irows - 5
            Res = 0
            For j = 0 To 10
                Res = Res + (FactA(j) * Drange(i + j - 5, 1))
            Next j
            ResA(i, 1) = Res / div
        Next i
        For i = irows - 4 To irows
            ResA(i, 1) = CVErr(xlErrNA)
        Next i
    Case 29
        For i = 1 To 14
            ResA(i, 1) = CVErr(xlErrNA)
        Next i
       
        FactA = Array(-351, -216, -91, 24, 129, 224, 309, 384, 449, 504, 549, 584, 609, 624, 629, 624, 609, 584, 549, 504, 449, _
        384, 309, 224, 129, 24, -91, -216, -351)
        div = 8091
        For i = 15 To irows - 14
            Res = 0
            For j = 0 To 28
                Res = Res + (FactA(j) * Drange(i + j - 14, 1))
            Next j
            ResA(i, 1) = Res / div
        Next i
        For i = irows - 14 To irows
            ResA(i, 1) = CVErr(xlErrNA)
        Next i

   
    Case Else
        Savitzky = "Points must be 5, 11, or 29"
        Exit Function
End Select

Savitzky = ResA
Exit Function
NotValidInput:
Savitzky = ("Non valid entry- terminating.")

 End Function

To return all of the results enter as an array function:
 Enter the formula in the top cell of the output range.
 Select the entire output range
 Press F2
 Press Ctrl-Shift-Enter

 
The following users thanked this post: Conrad Hoffman

Offline guenthert

  • Frequent Contributor
  • **
  • Posts: 712
  • Country: de
Re: Data Smoothing- Excel
« Reply #11 on: November 27, 2019, 04:44:03 pm »
Is that BASIC?  Oh, wow, what a blast from the past.  I miss line numbers though, well at least there are plenty of magic numbers.   :-DD

Seriously, what's the objective here?  Drawing a nice flat line?  That surely can be arranged much easier.  I don't quite see the point of massaging the data until it 'looks' good.  Why do we put so much effort into measuring it when we afterwards throw the numbers in a magic box?  I'd rather see some good reasoning behind any decision to throw data away.
 

Offline Kleinstein

  • Super Contributor
  • ***
  • Posts: 14181
  • Country: de
Re: Data Smoothing- Excel
« Reply #12 on: November 27, 2019, 05:36:41 pm »
Smoothing the data is no more than a digital filtering to suppress higher frequency fluctuations / noise. For looking at slower processes this is perfectly fine, especially if used with decimation.  The program looks like a normal FIR low pass filter, nothing robust. Throwing away some data comes in with robust filters, that try to identify bad points and than through them out or give less weight to them - this is where things get debatable. With a good reason and known issues this may also be very appropriate.

For smaller data sets the spread-sheet programms have suite-table average functions. It depends if really a more or less brick-wall filter with ringing is really wanted.
 

Offline dietert1

  • Super Contributor
  • ***
  • Posts: 2063
  • Country: br
    • CADT Homepage
Re: Data Smoothing- Excel
« Reply #13 on: November 27, 2019, 07:04:42 pm »
FIR filters for smoothing are low pass filters. Excel does something similar when you select running average for a trend curve.
For denoising the usual approach includes a median filter, that eliminate outliers from averaging. Those digital filter methods are well understood and there are loads of examples in the web.

Regards, Dieter
 

Offline RandallMcRee

  • Frequent Contributor
  • **
  • Posts: 541
  • Country: us
Re: Data Smoothing- Excel
« Reply #14 on: November 27, 2019, 07:18:43 pm »
FIR filters for smoothing are low pass filters. Excel does something similar when you select running average for a trend curve.
For denoising the usual approach includes a median filter, that eliminate outliers from averaging. Those digital filter methods are well understood and there are loads of examples in the web.

Regards, Dieter
Well, its not that Savitzky-Golay are *not* low pass filters--they are. But they have nice properties, to wit:

Savitzky and Golay were
interested in smoothing noisy data
obtained from chemical spectrum analyzers, and they demonstrated that least squares smoothing reduces noise while
maintaining the shape and height of
waveform peaks
(in their case, Gaussian shaped spectral peaks). In researching
this topic, I did find some awareness of
S-G filters in the signal processing community. Hamming’s book [7] has a discussion of the use of least-squares in
data smoothing, and Orfanidis has a
detailed discussion of S-G filters in his
book, which is now out of print but
available for free download [8].
(from the source quoted in alm on August 20, 2017)

Seems like a nice property for metrology, as well.
 
The following users thanked this post: bitseeker

Offline zbigniew169

  • Newbie
  • Posts: 2
Re: Data Smoothing- Excel
« Reply #15 on: January 17, 2020, 03:57:14 pm »
Your version does not compile, and freebasic complains there are a lot of errors in the code:
============================
C:\temp==>>fbc savitzky_func.bas
savitzky_func.bas(1) error 59: Illegal specification, at parameter 1 (Drange) of Savitzky() in
 'Function Savitzky(Drange As Variant, Optional Points As Long = 5)'
savitzky_func.bas(7) error 14: Expected identifier, found 'Variant' in 'Dim Res As Double, Res
A() As Variant, FactA As Variant, div As Long'
savitzky_func.bas(9) error 265: Symbol not a CLASS, ENUM, TYPE or UNION type, before '.' in 'D
range = Drange.Value2'
savitzky_func.bas(11) error 63: Expected array, before ')' in 'irows = UBound(Drange)'
savitzky_func.bas(13) error 42: Variable not declared, Points in 'Select Case Points'
savitzky_func.bas(16) error 42: Variable not declared, CVErr in 'ResA(i, 1) = CVErr(xlErrNA)'
savitzky_func.bas(18) error 42: Variable not declared, FactA in 'FactA = Array(-3, 12, 17, 12,
 -3)'
savitzky_func.bas(19) error 42: Variable not declared, div in 'div = 35'
savitzky_func.bas(23) error 9: Expected expression, found 'FactA' in 'Res = Res + (FactA(j) *
Drange(i + j - 2, 1))'
savitzky_func.bas(25) error 9: Expected expression, found 'div' in 'ResA(i, 1) = Res / div'
savitzky_func.bas(25) error 133: Too many errors, exiting

Fri  17 Jan 2020 * [Tcmd_x64 25.0.27] * 16:53:25
C:\temp==>>
============================
Do You currenty have another version ?
 

Offline SilverSolder

  • Super Contributor
  • ***
  • Posts: 6126
  • Country: 00
Re: Data Smoothing- Excel
« Reply #16 on: January 18, 2020, 04:01:53 pm »
^Is Freebasic compatible with Microsoft VBA?
 

Offline babysitter

  • Frequent Contributor
  • **
  • Posts: 893
  • Country: de
  • pushing silicon at work
Re: Data Smoothing- Excel
« Reply #17 on: January 20, 2020, 07:49:25 pm »
Balancing on the zone right between on-topic and off-topic...

Thank you for this! I started experimenting with two geophones hooked up to HX711, but I see a lot of traffic/culture noise with my 10 Hz units. Jzst recently realized I need to filter data - at one years worth of data, there was just one tiny quale (magnitude 2.2 about 30 km away) clearly visible. I consider this a good start - but not today.

(I appreciate hints.)



I'm not a feature, I'm a bug! ARC DG3HDA
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf