I use
wxMaxima or Maxima and
gnuplot for most of my curve fitting needs. They are free and open source, and available for all operating systems. wxMaxima is the graphical user interface for Maxima, a computer algebra system. Gnuplot is a tool for generating various kinds of plots, but it can also do one-dimensional least-squares curve fitting (using the nonlinear least-squares Marquardt-Levenberg algorithm) quite well.
If you have N values, that is, N pairs of x and f(x), then a polynomial of degree N-1 gives an exact fit.
For example, let's say you have a table of four points:

  x   f(x)
 25   100
 50   120
 75   145
100   185

This tells us a third degree polynomial can describe this exactly. In Maxima, we can find it out using
f(x) := C0 + C1*x + C2*x^2 + C3*x^3;
solve([ f(25) = 100, f(50) = 120, f(75) = 145, f(100) = 185 ],
      [ C0, C1, C2, C3 ]);

Note that there are four coefficients, C0 through C3. This is plain linear algebra, and by hand the solution is obtained by solving the system of equations, often in matrix form. However, Maxima will tell us the solution coefficients:

[[ C0 = 75, C1 = 37/30, C2 = -3/250, C3 = 1/9375 ]]

If you want them in decimal form, use for example float(%); which reprints the results as

[[ C0 = 75.0, C1 = 1.23333333333, C2 = -0.012, C3 = 1.066666667e-4 ]]

Thus, the function we are looking for is exactly
f(x) := 75 + 37*x/30 - 3*x*x/250 + x*x*x/9375;

To test, if you try f(25); f(50); f(75); f(100); you will get the expected results: 100, 120, 145, and 185, respectively. f(20) will yield 2393/25, or (after float(%);) 95.72.
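If you prefer to double-check this numerically, the same exact interpolation can be reproduced in Python with numpy (my own sketch, not part of the Maxima session above):

```python
import numpy as np

# The four data points from the table above.
x = np.array([25.0, 50.0, 75.0, 100.0])
y = np.array([100.0, 120.0, 145.0, 185.0])

# Solve the 4x4 system C0 + C1*x + C2*x^2 + C3*x^3 = y exactly.
V = np.vander(x, 4, increasing=True)   # columns: 1, x, x^2, x^3
C = np.linalg.solve(V, y)

print(C)  # approximately [75, 37/30, -3/250, 1/9375]
```

The coefficients come out the same as Maxima's, only in floating point rather than as exact rationals.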
You can use standard functions in the function definition, in which case it becomes a system of
N equations with
N unknowns, which may or may not have a solution, or may have more than one. If Maxima responds with just
[], it means it couldn't find an exact solution. It may also return several solutions, with
%1,
%2 as free variables (i.e., any value for them yields a valid solution). For example, if you ask
solve(x^2 - 4 = 0, x);

it will tell you the two solutions,

[x = -2, x = 2]

If I had the dosh, I might buy a license for
Maple, because its optimizer and solver are slightly better, and I often play with annoyingly complex functions where I need to do transforms myself to get Maxima to find a solution. Back in my Uni days, I loved to use Maple for that stuff (molecular dynamics potential models, forcefields, et cetera).
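For purely numeric root finding, numpy can do the same from a polynomial's coefficient list (a sketch of mine, separate from the Maxima/Maple workflow above, and it only gives floating-point approximations rather than exact symbolic solutions):

```python
import numpy as np

# Roots of x^2 - 4 = 0, from the coefficient list [1, 0, -4] (highest power first).
roots = np.roots([1.0, 0.0, -4.0])
print(sorted(roots.real))  # approximately [-2.0, 2.0]
```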
When you have many more data points to fit, an exact function is often not realistic. Instead, you want an
approximate function that reproduces the data points as closely as possible. The trick is how to define that "close", since you have a set of distances and different methods for measuring them. A typical measure of the distance is the
residual, which is just the difference between the fitted function value and the data set value at each data set point
x. (That means the distance is only measured along the
f(x) axis, or vertically in typical plots.) A typical definition for "as close as possible" is least squares, which finds the fit that yields the smallest sum of squared residuals over all of the given data points. This is what is called "least squares fitting".
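In code, the least-squares objective is nothing more than a sum of squared residuals; here is a minimal sketch (function and variable names are my own choosing):

```python
import numpy as np

def sum_of_squared_residuals(f, x, y):
    """Least-squares objective: sum of (f(x_i) - y_i)^2 over all data points."""
    residuals = f(x) - y
    return np.sum(residuals**2)

x = np.array([25.0, 50.0, 75.0, 100.0])
y = np.array([100.0, 120.0, 145.0, 185.0])

# Evaluate the objective for a candidate quadratic.
ssr = sum_of_squared_residuals(lambda x: 92.5 + 0.12*x + 0.008*x*x, x, y)
print(ssr)  # residuals are 0.5, -1.5, 1.5, -0.5, so 0.25 + 2.25 + 2.25 + 0.25 = 5.0
```

A fitting tool simply searches the coefficient space for the combination minimizing this number.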
Let's say we wanted to fit
f(x) = C0 + C1*x + C2*x*x to the above dataset using least squares fitting. It cannot
exactly reproduce the data points, but it should get close. With gnuplot, we first prepare the data set by creating a plain text file with one data point per line. (Lines that begin with a # are comment lines, and gnuplot will ignore those.) For example, let's say we create
data.txt with
# x  f(x)
25   100
50   120
75   145
100  185

The first line is just a comment line for us humans. We then start gnuplot, and tell it:

f(x) = C0 + C1*x + C2*x*x
fit f(x) "data.txt" using 1:2 via C0, C1, C2

The 1:2 there means first column for values of x, and second column for values of f(x). It iterates for a while, then describes the fit it generated. The fit parameters are not on the last lines of the output, but a bit earlier, under the "Final set of parameters" heading; the fit command also sets the fit constants in gnuplot. So, just run

print C0, C1, C2

and it will tell you

92.4999999999093 0.120000000003356 0.00799999999997438

i.e. the fit function is f(x) = 92.5 + 0.12*x + 0.008*x*x.
With gnuplot, we can now compare our fit to the points by plotting it as a graph:
set xrange [ 0 : 125 ]
plot f(x) notitle with lines lc "red", "data.txt" using 1:2 notitle with points lc "black"

Similarly, we can print the fit function values at the four interesting points via

print f(25), f(50), f(75), f(100)

which tells us the corresponding values (100.5, 118.5, 146.5, and 184.5; so not exactly right, but not that much off either).
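The same least-squares quadratic fit can be reproduced with numpy, if you want to sanity-check gnuplot's result (my own sketch; numpy comes up again further below):

```python
import numpy as np

x = np.array([25.0, 50.0, 75.0, 100.0])
y = np.array([100.0, 120.0, 145.0, 185.0])

# Least-squares quadratic fit; polyfit returns the highest-order coefficient first.
C2, C1, C0 = np.polyfit(x, y, 2)
print(C0, C1, C2)  # approximately 92.5, 0.12, 0.008

# The fitted values at the data points:
print(np.polyval([C2, C1, C0], x))  # approximately [100.5, 118.5, 146.5, 184.5]
```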
We could also just try a linear fit, via
f(x) = C0 + C1*x
fit f(x) "data.txt" u 1:2 via C0, C1

which is not too bad either.
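For comparison, the linear least-squares fit for this data works out to f(x) = 67.5 + 1.12*x (computed here with numpy, not quoted from a gnuplot run):

```python
import numpy as np

x = np.array([25.0, 50.0, 75.0, 100.0])
y = np.array([100.0, 120.0, 145.0, 185.0])

# Least-squares line through the four points.
C1, C0 = np.polyfit(x, y, 1)
print(C0, C1)  # approximately 67.5, 1.12
```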
If you want more control over fitting, say adding importance/weighing to the data points you have (and thus exceed basic "least squares" fitting with your own "distance" and "close enough" definitions), you can use
Octave/Matlab (Octave being pretty compatible with Matlab but free and open source), or if you are familiar with Python, use
numpy/scipy (https://scipy.org/) for numerical fitting (and other stuff, including Fourier transforms and such). Statistics people tend to use
the R programming language, which is also free and open source, where curve fitting slots under the heading of
regression analysis, often linear regression (typical example is seeing a point cloud, and the linear regression line through it, showing the dependence between the point x and y coordinates).
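As a sketch of what per-point weighting looks like in practice (my own example, using numpy's polyfit weights; scipy's curve_fit offers a similar sigma parameter):

```python
import numpy as np

x = np.array([25.0, 50.0, 75.0, 100.0])
y = np.array([100.0, 120.0, 145.0, 185.0])

# Unweighted quadratic fit, as before.
plain = np.polyfit(x, y, 2)

# Weight the last point heavily: polyfit minimizes the sum of (w_i * residual_i)^2.
w = np.array([1.0, 1.0, 1.0, 10.0])
weighted = np.polyfit(x, y, 2, w=w)

# The weighted fit lands much closer to the heavily weighted point.
print(abs(np.polyval(plain, 100.0) - 185.0))     # 0.5 for the unweighted fit
print(abs(np.polyval(weighted, 100.0) - 185.0))  # clearly smaller
```

Going beyond weighting, with your own "distance" definition, means writing the objective function yourself and handing it to a general minimizer.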
For microcontroller stuff, the above is typically the starting point and not the final word, because they (especially Maxima) assume infinite-precision variables. So, I personally often end up just finding the approximate values for the coefficients
C0,
C1, and so on, and write a brute-force (possibly aided by gradient descent at highest supported precision and range, typically
long double on my machine) C program to find the best values in the actual numeric format I will use, be that fixed point, float/IEEE 754 binary32, or double/IEEE 754 binary64; that is, a program that optimizes the approximate coefficients for the numeric format actually used. This is often seen as silly or going too far, but writing the program is quick, and I can usually leave the computer to chug through the solution space overnight, so I don't see the silliness. You might wish to check out the
4th order polynomial coefficients for pressure at temperature thread from a few years back for what that kind of optimization can be useful for, compared to just curve fitting.
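As a rough illustration of that idea (my own sketch in Python rather than C, with float32 standing in for the target format): round the coefficients to the target precision, then brute-force a small neighbourhood of representable values for the combination with the smallest worst-case error over the data points.

```python
import itertools
import numpy as np

x = np.array([25.0, 50.0, 75.0, 100.0])
y = np.array([100.0, 120.0, 145.0, 185.0])

def max_error(coefs):
    # Worst-case absolute error over the data points, evaluated in float32.
    c0, c1, c2 = (np.float32(c) for c in coefs)
    xf = x.astype(np.float32)
    return float(np.max(np.abs(c0 + c1*xf + c2*xf*xf - y)))

# Start from the double-precision least-squares fit, rounded to float32.
start = [np.float32(92.5), np.float32(0.12), np.float32(0.008)]

def neighbors(v, steps=3):
    # The value itself, plus a few adjacent representable float32 values
    # in each direction.
    out = {np.float32(v)}
    lo = hi = np.float32(v)
    for _ in range(steps):
        lo = np.nextafter(lo, np.float32(-np.inf))
        hi = np.nextafter(hi, np.float32(np.inf))
        out.update((lo, hi))
    return sorted(out)

# Brute-force search over nearby representable coefficient combinations.
best = min(itertools.product(*(neighbors(c) for c in start)), key=max_error)
print(best, max_error(best))
```

A real version would search a much wider neighbourhood (or use gradient descent to steer it) and evaluate error over the whole input range, not just the fit points; that is what makes it worth leaving running overnight.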