For what it is worth, in my "work" I often use empirically derived potential models, which are defined as a set of samples.
To use them in a simulation, a piecewise polynomial is fitted to the samples. Because interatomic forces are derived from the gradient of the total potential, the resulting curve must also have a continuous derivative. This tends to cause oscillation at the tail end, which is problematic, because any kind of "valley" in the potential basically represents a bound state. That is, those oscillations generate spurious weak bound states: as if near-stationary atoms preferred to hang out at specific (longer) distances. Quite unphysical.
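To make the tail problem concrete, here is a minimal natural cubic spline sketch in numpy, with made-up sample data (a steeply decaying, strictly non-negative "potential"); the function names and data are my own illustration, not anyone's production code:

```python
import numpy as np

def natural_spline_second_derivs(x, y):
    """Solve for the knot second derivatives M of a natural cubic spline."""
    n = len(x)
    h = np.diff(x)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0   # natural boundary condition: M = 0 at both ends
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i]     = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    return np.linalg.solve(A, rhs)

def spline_eval(x, y, M, t):
    """Evaluate the natural cubic spline at points t within [x[0], x[-1]]."""
    i = np.clip(np.searchsorted(x, t) - 1, 0, len(x) - 2)
    h = x[i + 1] - x[i]
    a = (x[i + 1] - t) / h
    b = (t - x[i]) / h
    return (a * y[i] + b * y[i + 1]
            + ((a**3 - a) * M[i] + (b**3 - b) * M[i + 1]) * h**2 / 6.0)

# Hypothetical samples: steeply decaying, strictly non-negative values.
x = np.array([1., 2., 3., 4., 5., 6., 7.])
y = np.array([16., 4., 1., 0.25, 0.0625, 0., 0.])
M = natural_spline_second_derivs(x, y)
t = np.linspace(5.0, 7.0, 401)
print(spline_eval(x, y, M, t).min())   # dips below zero: a spurious "valley"
```

Even though every sample is at or above zero, the interpolant undershoots in the flat tail, i.e. exactly the kind of unphysical minimum described above.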
A much better approach is to also have a qualitative model. This corresponds to some form of a function where some or all of the constants are unknown. Instead of deriving a piecewise curve from the samples, you fit the function to the samples to obtain the constants. (Sometimes physicists do this bass-ackwards: they simulate some atoms ab initio (using quantum mechanics only) to get such samples, then look at the resulting curve to find a suitable complicated-ish function, and finally find a way to explain why that function works. Then they publish it as if they had done it in the reverse order.)
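As a sketch of this kind of fit: if the qualitative model happens to be linear in its unknown constants, plain least squares recovers them. The Lennard-Jones form V(r) = A/r^12 - B/r^6 is one such case. The data below is synthetic (generated from the model plus noise), purely for illustration:

```python
import numpy as np

# Hypothetical samples of a pair potential (r, V(r)); in practice these
# would come from experiment or ab-initio calculations.
r = np.array([0.95, 1.0, 1.1, 1.2, 1.4, 1.7, 2.0, 2.5])
V = 4.0 * ((1.0 / r)**12 - (1.0 / r)**6)                  # synthetic "measurements"
V = V + np.random.default_rng(0).normal(0.0, 0.005, r.size)  # measurement noise

# Qualitative model: Lennard-Jones form V(r) = A/r^12 - B/r^6.
# It is linear in A and B, so ordinary least squares yields the constants.
X = np.column_stack([r**-12, -r**-6])
(A, B), *_ = np.linalg.lstsq(X, V, rcond=None)
print(A, B)   # both should come out near the generating value 4
```

For genuinely nonlinear constants (e.g. the exponent in a Morse potential) you would need an iterative nonlinear fitter instead, but the principle of "fit the function, keep the constants" stays the same.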
A more practical fit is to use piecewise polynomials, but dampen them.
Cubic polynomials are extremely common and have nice features (they are also used in e.g. Bézier curves), so I recommend them. However, near peaks or troughs the interpolated value can reach unrealistic values (even below zero for strictly positive samples).
Damping means you modify either the interpolating polynomial (if it has extra degrees of freedom), or the samples, until the over/undershoot is within limits. Note that this is typically only done at local minima and maxima, not all through the curve.
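One simple way to dampen at extrema, sketched below with made-up data: use a cubic Hermite interpolant (which has the knot slopes as extra degrees of freedom) and zero the slope at every interior local extremum, so the cubic cannot shoot past the sample value there. This is my own minimal illustration, in the spirit of monotone (PCHIP-style) interpolation:

```python
import numpy as np

def hermite_eval(x, y, d, t):
    """Piecewise cubic Hermite interpolation with prescribed knot slopes d."""
    i = np.clip(np.searchsorted(x, t) - 1, 0, len(x) - 2)
    h = x[i + 1] - x[i]
    s = (t - x[i]) / h
    h00 = 2*s**3 - 3*s**2 + 1     # standard Hermite basis functions
    h10 = s**3 - 2*s**2 + s
    h01 = -2*s**3 + 3*s**2
    h11 = s**3 - s**2
    return h00*y[i] + h10*h*d[i] + h01*y[i+1] + h11*h*d[i+1]

def damped_slopes(x, y):
    """Finite-difference slopes, zeroed at interior local extrema so the
    cubic cannot overshoot past the sample value there."""
    d = np.gradient(y, x)
    interior = np.arange(1, len(y) - 1)
    extrema = interior[np.sign(y[interior] - y[interior - 1])
                       * np.sign(y[interior + 1] - y[interior]) <= 0]
    d[extrema] = 0.0
    return d

# Hypothetical data with a sharp peak at x = 2:
x = np.array([0., 1., 2., 3., 4.])
y = np.array([0., 0., 1., 0., 0.])
d = damped_slopes(x, y)
t = np.linspace(0.0, 4.0, 401)
vals = hermite_eval(x, y, d, t)
print(vals.max())   # stays at 1.0: no overshoot past the peak sample
```

Note this only touches the slopes at detected extrema, matching the point above: the damping is local, not applied all through the curve.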
An interesting approach to piecewise fitting is to use estimated or real error bars for each sample, introduce random gaussian error (controlling only the variance/deviation) to each sample, and generate a huge number of full fit curves. From that full set, you pick either the most likely one (among the curves, not among the generating samples!), or use some formula to score the curves and pick among the best-scoring ones.
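A toy sketch of the score-and-pick variant, with hypothetical data and error bars, using linear interpolation as the fitter and a home-made roughness score (sum of squared second differences) as the formula; every name and number here is my own illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical samples with per-sample error bars:
x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([5.0, 2.0, 0.9, 0.4, 0.15, 0.05])
sigma = np.full_like(y, 0.05)

grid = np.linspace(x[0], x[-1], 201)
n_curves = 2000

best_score = np.inf
best_curve = None
for _ in range(n_curves):
    y_jit = y + rng.normal(0.0, sigma)    # perturb within the error bars
    curve = np.interp(grid, x, y_jit)     # any fitter could go here
    score = np.sum(np.diff(curve, 2)**2)  # penalize roughness (kinks)
    if score < best_score:
        best_score, best_curve = score, curve

print(best_score)
```

Swapping in a spline fitter and a physically motivated score is straightforward; the structure (perturb, fit, score, keep the best) is the whole idea.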
However, it is not particularly easy to make a physical argument for that particular curve, unless you have a physical basis for the scoring formula. So you rarely (read: never) see this used in scientific papers. Hand-tuning the samples to reduce any problematic features in the curve is much more common, and can be left unremarked, because small changes in the samples near a problematic feature produce large changes in the feature itself.
In practice, this means that I personally have learned to look at not just the curve, but also its derivatives, when qualifying fitted curves. Tools like bash, awk, and gnuplot suffice, although for heavier work numpy works too.
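For the numpy case, the derivative check can be as small as this (the wiggly "fit" is a synthetic stand-in for a problematic fitted curve):

```python
import numpy as np

# Given a densely evaluated fitted curve (grid, values), count interior
# extrema by looking for sign changes in the numerical derivative.
grid = np.linspace(0.0, 5.0, 501)
values = np.exp(-grid) + 0.005 * np.sin(8 * grid)   # a "fit" with a wiggly tail

deriv = np.gradient(values, grid)
sign_changes = np.count_nonzero(np.diff(np.sign(deriv)) != 0)
print(sign_changes)   # > 0 means the curve has internal extrema
```

For a monotone potential tail, any sign change in the derivative flags exactly the spurious "valleys" discussed earlier, without needing to eyeball the plot.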
(While I prefer C, the speed of implementing such rarely-used tools beats any small gains in execution speed; that, and the desire to keep the code as easily understood and maintainable as possible, is why I use script languages for such tools.)
Linear interpolation works well for things like probability distributions, intensities, and such, where simple continuity of the curve is needed, and there is no need to consider continuity of any derivatives.
(For generating pseudorandom numbers according to a numerical distribution, it is easier to use no interpolation at all, just rectangular bins. Such generators produce one of the samples, but distributed according to the area covered by each sample.)
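Such a bin-based generator is a few lines in numpy; here with a made-up tabulated distribution (inverse-CDF lookup on the cumulative weights, so each draw returns one of the sample positions with probability proportional to its bin area):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical tabulated distribution: positions and (unnormalized) weights,
# each weight being the rectangular bin's area.
x = np.array([0.0, 1.0, 2.0, 3.0])
w = np.array([1.0, 4.0, 2.0, 1.0])

# Inverse CDF on the cumulative weights: a uniform draw in [0, total) lands
# in exactly one bin, selecting that bin's sample position.
cdf = np.cumsum(w)
draws = x[np.searchsorted(cdf, rng.uniform(0.0, cdf[-1], size=100000))]

# Empirical frequencies approach w / w.sum():
freq = np.array([(draws == xi).mean() for xi in x])
print(freq)
```

No interpolation anywhere: the output values are exactly the tabulated samples, only their frequencies follow the distribution.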
The reason I posted my script above is that it should do exactly what the OP asked for in their initial post.
The script reads in samples; optionally sets boundary limits (to ensure y=0 at x=0, and y=5 at e.g. x=255, via -min 0 0 -max 255 5) and the range of samples to output (say -out 0 255); averages duplicate samples and interpolates missing samples linearly; and outputs the result in the same format as the input.
That said, CatalinaWOW is perfectly right: you should probably reconsider first whether linear interpolation is the best option, or whether you picked it because it felt simplest to implement. The others aren't really that much harder to implement, and having such tools in your toolbox is useful. Has been for me, anyway. (I guess I am a tool hoarder of just another type, eh?)
Finally, the more data you have, the better; especially if it is noisy. Filtering is easy when you have lots of data, but hard when data is sparse.