For what it is worth, in my "work" I often use empirically derived potential models, which are defined as a set of samples.
To use them in a simulation, a piecewise polynomial is fitted to the samples. Because interatomic forces are derived from the gradient of the total potential, the resulting curve must also have a continuous derivative. This tends to cause oscillation at the tail end, which is problematic, because any kind of "valley" in the potential basically represents a bound state. That is, those oscillations generate spurious weak bound states: as if near-stationary atoms preferred to hang out at specific (longer) distances. Quite unphysical.
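To make the tail problem concrete, here is a minimal natural cubic spline sketch in numpy, with made-up sample data (a steeply decaying, strictly non-negative "potential"); the function names and data are my own illustration, not anyone's production code:

```python
import numpy as np

def natural_spline_second_derivs(x, y):
    """Solve for the knot second derivatives M of a natural cubic spline."""
    n = len(x)
    h = np.diff(x)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0   # natural boundary condition: M = 0 at both ends
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i]     = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    return np.linalg.solve(A, rhs)

def spline_eval(x, y, M, t):
    """Evaluate the natural cubic spline at points t within [x[0], x[-1]]."""
    i = np.clip(np.searchsorted(x, t) - 1, 0, len(x) - 2)
    h = x[i + 1] - x[i]
    a = (x[i + 1] - t) / h
    b = (t - x[i]) / h
    return (a * y[i] + b * y[i + 1]
            + ((a**3 - a) * M[i] + (b**3 - b) * M[i + 1]) * h**2 / 6.0)

# Hypothetical samples: steeply decaying, strictly non-negative values.
x = np.array([1., 2., 3., 4., 5., 6., 7.])
y = np.array([16., 4., 1., 0.25, 0.0625, 0., 0.])
M = natural_spline_second_derivs(x, y)
t = np.linspace(5.0, 7.0, 401)
print(spline_eval(x, y, M, t).min())   # dips below zero: a spurious "valley"
```

Even though every sample is at or above zero, the interpolant undershoots in the flat tail, i.e. exactly the kind of unphysical minimum described above.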
A much better approach is to also have a qualitative model. This corresponds to some form of a function where some or all of the constants are unknown. Instead of deriving a piecewise curve from the samples, you fit the function to the samples to obtain the constants. (Sometimes physicists do this bass-ackwards: they simulate some atoms ab initio (using quantum mechanics only) to get such samples, then look at the resulting curve to find a suitable complicated-ish function, and finally find a way to explain why that function works. Then they publish it as if they had done it in the reverse order.)
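As a sketch of this kind of fit: if the qualitative model happens to be linear in its unknown constants, plain least squares recovers them. The Lennard-Jones form V(r) = A/r^12 - B/r^6 is one such case. The data below is synthetic (generated from the model plus noise), purely for illustration:

```python
import numpy as np

# Hypothetical samples of a pair potential (r, V(r)); in practice these
# would come from experiment or ab-initio calculations.
r = np.array([0.95, 1.0, 1.1, 1.2, 1.4, 1.7, 2.0, 2.5])
V = 4.0 * ((1.0 / r)**12 - (1.0 / r)**6)                  # synthetic "measurements"
V = V + np.random.default_rng(0).normal(0.0, 0.005, r.size)  # measurement noise

# Qualitative model: Lennard-Jones form V(r) = A/r^12 - B/r^6.
# It is linear in A and B, so ordinary least squares yields the constants.
X = np.column_stack([r**-12, -r**-6])
(A, B), *_ = np.linalg.lstsq(X, V, rcond=None)
print(A, B)   # both should come out near the generating value 4
```

For genuinely nonlinear constants (e.g. the exponent in a Morse potential) you would need an iterative nonlinear fitter instead, but the principle of "fit the function, keep the constants" stays the same.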
A more practical fit is to use piecewise polynomials, but dampen them.
Cubic polynomials are extremely common and have nice features (they are also used in e.g. Bézier curves), so I recommend them. However, near peaks or troughs the interpolated value can reach unrealistic values (even below zero for strictly positive samples).
Damping means you modify either the interpolating polynomial (if it has extra degrees of freedom), or the samples, until the over/undershoot is within limits. Note that this is typically only done at local minima and maxima, not all through the curve.
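One simple way to dampen at extrema, sketched below with made-up data: use a cubic Hermite interpolant (which has the knot slopes as extra degrees of freedom) and zero the slope at every interior local extremum, so the cubic cannot shoot past the sample value there. This is my own minimal illustration, in the spirit of monotone (PCHIP-style) interpolation:

```python
import numpy as np

def hermite_eval(x, y, d, t):
    """Piecewise cubic Hermite interpolation with prescribed knot slopes d."""
    i = np.clip(np.searchsorted(x, t) - 1, 0, len(x) - 2)
    h = x[i + 1] - x[i]
    s = (t - x[i]) / h
    h00 = 2*s**3 - 3*s**2 + 1     # standard Hermite basis functions
    h10 = s**3 - 2*s**2 + s
    h01 = -2*s**3 + 3*s**2
    h11 = s**3 - s**2
    return h00*y[i] + h10*h*d[i] + h01*y[i+1] + h11*h*d[i+1]

def damped_slopes(x, y):
    """Finite-difference slopes, zeroed at interior local extrema so the
    cubic cannot overshoot past the sample value there."""
    d = np.gradient(y, x)
    interior = np.arange(1, len(y) - 1)
    extrema = interior[np.sign(y[interior] - y[interior - 1])
                       * np.sign(y[interior + 1] - y[interior]) <= 0]
    d[extrema] = 0.0
    return d

# Hypothetical data with a sharp peak at x = 2:
x = np.array([0., 1., 2., 3., 4.])
y = np.array([0., 0., 1., 0., 0.])
d = damped_slopes(x, y)
t = np.linspace(0.0, 4.0, 401)
vals = hermite_eval(x, y, d, t)
print(vals.max())   # stays at 1.0: no overshoot past the peak sample
```

Note this only touches the slopes at detected extrema, matching the point above: the damping is local, not applied all through the curve.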
An interesting approach to piecewise fitting is to use estimated or real error bars for each sample, introduce random gaussian error (controlling only the variance/deviation) to each sample, and generate a huge number of full fit curves. From that full set, you pick either the most likely one (among the curves, not among the generating samples!), or use some formula to score the curves and pick among the best-scoring ones.
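A toy sketch of the score-and-pick variant, with hypothetical data and error bars, using linear interpolation as the fitter and a home-made roughness score (sum of squared second differences) as the formula; every name and number here is my own illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical samples with per-sample error bars:
x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([5.0, 2.0, 0.9, 0.4, 0.15, 0.05])
sigma = np.full_like(y, 0.05)

grid = np.linspace(x[0], x[-1], 201)
n_curves = 2000

best_score = np.inf
best_curve = None
for _ in range(n_curves):
    y_jit = y + rng.normal(0.0, sigma)    # perturb within the error bars
    curve = np.interp(grid, x, y_jit)     # any fitter could go here
    score = np.sum(np.diff(curve, 2)**2)  # penalize roughness (kinks)
    if score < best_score:
        best_score, best_curve = score, curve

print(best_score)
```

Swapping in a spline fitter and a physically motivated score is straightforward; the structure (perturb, fit, score, keep the best) is the whole idea.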
However, it is not particularly easy to make a physical argument for that particular curve, unless you have a physical basis for the scoring formula. So you rarely (read: never) see this used in scientific papers. Hand-tuning the samples to reduce any problematic features in the curve is much more common, and can be left unremarked, because small changes in the samples near a problematic feature produce large changes in the feature itself.
In practice, this means that I personally have learned to look at not just the curve, but also its derivatives, when qualifying fitted curves. Tools like bash, awk, and gnuplot suffice, although for heavier work numpy works too.
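For the numpy case, the derivative check can be as small as this (the wiggly "fit" is a synthetic stand-in for a problematic fitted curve):

```python
import numpy as np

# Given a densely evaluated fitted curve (grid, values), count interior
# extrema by looking for sign changes in the numerical derivative.
grid = np.linspace(0.0, 5.0, 501)
values = np.exp(-grid) + 0.005 * np.sin(8 * grid)   # a "fit" with a wiggly tail

deriv = np.gradient(values, grid)
sign_changes = np.count_nonzero(np.diff(np.sign(deriv)) != 0)
print(sign_changes)   # > 0 means the curve has internal extrema
```

For a monotone potential tail, any sign change in the derivative flags exactly the spurious "valleys" discussed earlier, without needing to eyeball the plot.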
(While I prefer C, the speed of implementing such rarely-used tools beats any small gains in execution speed; that, and the desire to keep the code as easily understood and maintainable as possible, is why I use script languages for such tools.)
Linear interpolation works well for things like probability distributions, intensities, and such, where simple continuity of the curve is needed, and there is no need to consider continuity of any derivatives.
(For generating pseudorandom numbers according to a numerical distribution, it is easier to use no interpolation at all, just rectangular bins. Such generators produce one of the samples, but distributed according to the area covered by each sample.)
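Such a bin-based generator is a few lines in numpy; here with a made-up tabulated distribution (inverse-CDF lookup on the cumulative weights, so each draw returns one of the sample positions with probability proportional to its bin area):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical tabulated distribution: positions and (unnormalized) weights,
# each weight being the rectangular bin's area.
x = np.array([0.0, 1.0, 2.0, 3.0])
w = np.array([1.0, 4.0, 2.0, 1.0])

# Inverse CDF on the cumulative weights: a uniform draw in [0, total) lands
# in exactly one bin, selecting that bin's sample position.
cdf = np.cumsum(w)
draws = x[np.searchsorted(cdf, rng.uniform(0.0, cdf[-1], size=100000))]

# Empirical frequencies approach w / w.sum():
freq = np.array([(draws == xi).mean() for xi in x])
print(freq)
```

No interpolation anywhere: the output values are exactly the tabulated samples, only their frequencies follow the distribution.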
The reason I posted my script above is that it should do exactly what the OP asked for in their initial post.
The script reads in samples; optionally sets boundary limits (to ensure y=0 at x=0, and y=5 at e.g. x=255, via -min 0 0 -max 255 5) and the range of samples to output (say -out 0 255); averages duplicate samples and interpolates missing samples linearly; and outputs the result in the same format as the input.
That said, CatalinaWOW is perfectly right: you should probably reconsider first whether linear interpolation is the best option, or whether you picked it because it felt simplest to implement. The others aren't really that much harder to implement, and having such tools in your toolbox is useful. Has been for me, anyway. (I guess I am a tool hoarder of just another type, eh?)
Finally, the more data you have, the better; especially if it is noisy. Filtering is easy when you have lots of data, but hard when data is sparse.