Author Topic: algorithm to find best matches of values from bags of parts? (Read 3848 times)

linux-works · « **on:** June 24, 2018, 04:39:43 pm »

I bet there are already-solved ways to do this.

suppose you have a bag of 1k resistors and you want to find clusters of them that are closely matched. its not as important that all in the bag are exactly 1k, but I'd like to find 5 or 10 that are very close to each other, and I can use them as a matched set.

assuming I can read the value to enough digits to be able to sort them, what's a good way to do this?

I was first thinking that I would have a DMM with scpi read values as I put each resistor (or whatever the thing being matched) into the test jig, press the test button and get a value. maybe I have a sequence number for each part and the computer stores the index and value, for each of the parts I read in. I have 50 or 100 cups and after each part is tested, I put it in its numbered cup and proceed until I run thru the whole bag (or enough to be a 'batch').

then I run some sort alg and find clusters of things close together. that's where I'm asking for help. is there a standard/good way to do that? something that compute efficient.

for a 1k target, I may tell the alg that its ok to extend out plus/minus 0.1k, and then have it find clusters of values that are close together.

I'm not a math person, nor a physics person, and I suspect there are some clever ways to find clusters in a space of values?

Kleinstein · « **Reply #1 on:** June 24, 2018, 05:14:12 pm »

The obvious way is to first sort the table by values (no need to use a very efficient sorting algorithm, for 100 numbers the simple bubble-sort is good enough).

The second step is than to look for the smallest difference over the number of devices needed. So if you need 3 matched resistors check the difference of R(N) and R(N+2) for all N.

As the computation is fast compared to the measurement and changing parts. So one could do the calculation right after each sample measured. So one could stop measuring more parts if a sufficient good match is found.
If computation is critical (slow computer like an antique Commodore PET - at least has GPIB build in) one can also add the new data to a sorted list.

apblog · « **Reply #2 on:** June 24, 2018, 07:58:00 pm »

why not just sort them into physical bins as you test them? Then it becomes visually obvious where the clusters are.

Conrad Hoffman · « **Reply #3 on:** June 24, 2018, 09:19:35 pm »

The crude way I do this is to get a piece of corrugated cardboard and set it edge up. I write some values down the face at some reasonable spacing. Then I measure the parts and stick them in the holes on the edge of the cardboard, matching the markings. The groups of close values stand out easily for further investigation.

linux-works · « **Reply #4 on:** June 24, 2018, 09:35:26 pm »

thanks for the ideas, guys.

the eye can spot clusters; how can I get the computer to 'see' them? I want to automate this as much as possible.

I'm now thinking it could be something dynamic, sort of a rebalancing approach. start out with everyone (sample of 1) in the same bucket. add next item, those 2 are 'close' since they are just 2 points. as you add more ports, at some point that single bucket is worth dividing; split it, somehow (from 1 into 2 parts, or more than 2 parts?) and then keep running that same kind of alg.

I think the buckets have to be large, then somehow start to self-shrink based on population size. if we have a bucket that has value 10 and 11 (say 1010 and 1011, for the 1k initial target) and we lump the 10's and 11's into the same bucket, due to how the algorithm ran. after some time, we might find only a few 11's and mostly 10's in that bucket. perhaps its worth having a bucket of 'pure tens' (that are an exact match) than polluting that bucket with some mixed values. (continue to apply this beyond integer-thinking, of course).

it feels like there is some self-sizing approach, somehow. I don't know if I'm right or even close (?).

thing is, I (the user) may not know, apriori, what the bucket radius should be, for grouping values. you may choose to pick the 1010 resistors since you have 12 of them all of the same value, vs the 1004 resistors that have 2 at that value, 1 at 1002, 1 at 1005 and so on.

its like I want to run the algo, have the buckets grow in some smart way, then sort the buckets by population (mode).

??

rhb · « **Reply #5 on:** June 24, 2018, 11:11:17 pm »

Sort the list of values and then plot it with gnuplot by line. The flattest parts of the plot are the closest values. Another option would be to sort the list and then use awk, python, perl etc to calculate the ratios of adjacent values in the list. Then sort that list.

Conrad Hoffman · « **Reply #6 on:** June 24, 2018, 11:47:57 pm »

Histogram?

rhb · « **Reply #7 on:** June 25, 2018, 01:10:26 am »

Quote from: Conrad Hoffman on June 24, 2018, 11:47:57 pm

Histogram?

The problem with histograms is you need to decide on the range and binning. That's why I sort and plot the values before computing histograms for serious work. I'm a big fan of histograms and wrote a code to do 2D histograms that has been extremely useful. It gets really cool when you slice the data on a 3rd variable and show it as a movie.

T3sl4co1l · « **Reply #8 on:** June 25, 2018, 01:57:09 am »

A lot of work for a non-solution.

That is, if you're sorting 5% resistors, say, they're not going to be better than 200 ppm/C, even 1000. And not the typical tempco, but the tempco range.

So are you going to sort on tempco range as well as value at 25C?

1% resistors are so cheap, there's no possible way this makes sense. Or 0.2% resistors even. 0.1% and below the price finally starts going back up... but the tempcos get very good indeed. You get what you pay for, which isn't what a sorting method will give you.

Tim

Conrad Hoffman · « **Reply #9 on:** June 25, 2018, 02:43:01 am »

Not sure I agree. I've built highly stable and functional KVDs just by sorting inexpensive bags of 200 1% metal films. Theory says you can't, but theory usually considers the full range of part specifications, when any given batch manufactured over a short time period will be within a narrower range.

linux-works · « **Reply #10 on:** June 25, 2018, 03:16:19 am »

one thing I'm trying to think about - but no one is really touching on - is having variable bucket sizes, or changing bucket sizes.

its not a hard problem if you KNOW what bucket size you want. here, for me, I don't. part of the solution is for it to 'tell' me what kind of spreads, are in this data set.

I think I could iterate on bucket size and do the parts binning on size 's', sort the histogram to find the most populous clusters, then that's the result set for size 's'. try it for 's-deltaS' and s+deltaS'. move 's' around and see how the spreads change.

there may be several 's' values where you get nice groupings of parts values. you may even decide to loosen your criteria for resistor (etc) matching - if you find that by using a slightly wider radius, you capture a lot more values. as you squeeze the allowable tolerance down lower and lower, some values 'fall out' of the set of clusters. it feels like there's a way to iterate thru this and have a determinable stop point.

all without humans having to look at things. I want a computer driven sorter, or at least highly computer assisted.

sleemanj · « **Reply #11 on:** June 25, 2018, 04:37:36 am »

It seems a problem not far removed from...

cellularmitosis · « **Reply #12 on:** June 25, 2018, 05:12:31 am »

Hi linux-works,

(assuming you'll be using these at room temperature where tempco isn't an issue...)

The last time I had a need to do this, I had put in too much thought and mental exercise before I actually sat down and started measuring resistors.

I think you'll be surprised at how few resistors you need to measure before you find a few which are very closely matched. Measure a group of 20 and you'll be shocked at how close the best pair are matched.

You can also use a google spreadsheet to make it easier to find the most closely-matched set. Here's a screenshot where I measured a few LM399's, then sorted by value, then added column which showed the "span of three" value for each unit + neighbors. Here, you can see that #22, #3 and #24 are the closest group of three (they match within a span of 11mV).

cellularmitosis · « **Reply #13 on:** June 25, 2018, 05:28:26 am »

Quote from: Conrad Hoffman on June 24, 2018, 09:19:35 pm

The crude way I do this is to get a piece of corrugated cardboard and set it edge up.

Quote from: rhb on June 24, 2018, 11:11:17 pm

Sort the list of values and then plot it with gnuplot by line. The flattest parts of the plot are the closest values.

++ I recently used these approaches when I found someone selling a whole bunch of NOS LM329's.

I have a crazy idea to bin these and sell a few poor-man's 7-volt Guildline 4400 clone kits.

ivaylo · « **Reply #14 on:** June 25, 2018, 06:40:24 am »

I guess the standard deviation SD is known so plot the normal distribution curve, determine the size/count of the bins you need and measure away placing in said bins. If SD is not known, take a random sample of say 50, calculate standard error, then there is a formula to get to SD and do the above. Or something like that...

chickenHeadKnob · « **Reply #15 on:** June 25, 2018, 06:54:57 am »

Quote from: linux-works on June 25, 2018, 03:16:19 am

one thing I'm trying to think about - but no one is really touching on - is having variable bucket sizes, or changing bucket sizes.

its not a hard problem if you KNOW what bucket size you want. here, for me, I don't. part of the solution is for it to 'tell' me what kind of spreads, are in this data set.

I think I could iterate on bucket size and do the parts binning on size 's', sort the histogram to find the most populous clusters, then that's the result set for size 's'. try it for 's-deltaS' and s+deltaS'. move 's' around and see how the spreads change.

there may be several 's' values where you get nice groupings of parts values. you may even decide to loosen your criteria for resistor (etc) matching - if you find that by using a slightly wider radius, you capture a lot more values. as you squeeze the allowable tolerance down lower and lower, some values 'fall out' of the set of clusters. it feels like there's a way to iterate thru this and have a determinable stop point.

all without humans having to look at things. I want a computer driven sorter, or at least highly computer assisted.

I suspect you are just over-thinking this one, you already are very close to a workable solution that I would propose: Namely a multi pass radix sort with 10 bins per sort phase. This way your binning becomes automatically adaptive at the cost of multi-passes (multiple measurement of each resistor).

larsdenmark · « **Reply #16 on:** June 25, 2018, 07:47:24 am »

Figure out what tolerance you need the resistors to have in order for you to consider them close enough.

There is no need to have 100 cups. 10 should do it. Label the cups with suitable intervals near the nominal value of the resistors. Since the resistance most likely is normally distributed the majority of the resistors will be very close to the nominal value. If a resistor falls outside the interval covered by your 10 cups them place it is a discard pile (most likely something happened to the resistor during manufacturing which could have other implications such as a changed temp. coeff.).

You may wish to use a Wheatstone bridge for the measurement for improved accuracy.

Zero999 · « **Reply #17 on:** June 25, 2018, 08:25:37 am »

Quote from: T3sl4co1l on June 25, 2018, 01:57:09 am

A lot of work for a non-solution.

That is, if you're sorting 5% resistors, say, they're not going to be better than 200 ppm/C, even 1000. And not the typical tempco, but the tempco range.

So are you going to sort on tempco range as well as value at 25C?

1% resistors are so cheap, there's no possible way this makes sense. Or 0.2% resistors even. 0.1% and below the price finally starts going back up... but the tempcos get very good indeed. You get what you pay for, which isn't what a sorting method will give you.

Tim

I agree.

Quote from: Conrad Hoffman on June 25, 2018, 02:43:01 am

Not sure I agree. I've built highly stable and functional KVDs just by sorting inexpensive bags of 200 1% metal films. Theory says you can't, but theory usually considers the full range of part specifications, when any given batch manufactured over a short time period will be within a narrower range.

I don't think anyone has said that it's not possible.

In theory, yes it's possible to match 1% resistors which have a fairly good temperature co-efficient anyway. For a one off, prototype or hobby project this makes sense, since it's worth spending a couple of minutes finding four or so resistors which match very closely.

However, for larger volumes, it's no longer economical. It's far cheaper to just buy close tolerance resistors in the fist place.

There are also matched resistor networks available which come in various ratios too.
http://www.vishaypg.com/docs/63081/smn.pdf
https://datasheets.maximintegrated.com/en/ds/MAX5491.pdf
http://www.analog.com/media/en/technical-documentation/data-sheets/5400fc.pdf

Conrad Hoffman · « **Reply #18 on:** June 25, 2018, 01:10:23 pm »

Production- no way! Sorting anything in production is a huge PITA and usually leads to low yields. Thank goodness I'm a hobbyist.

No math wiz here, but once you have a set of measurements it would be really fast to just start with the first one and compare it to all the rest. Display the quantity that fall within some tolerances of your choice. Sort the list by that. Unless you measured hundreds of thousands, any modern computer using any of the compiled languages could go through the whole list in a second or three.

I do something similar with a gear ratio program that calcs and sorts all possible combinations of two compound sets over a fairly wide range. It uses a crazy amount of memory, about 2.7G, but nothing the average user doesn't have. Load time is 3-10 seconds. Your problem is much smaller.

larsdenmark · « **Reply #19 on:** June 25, 2018, 02:23:06 pm »

There is no problems for a computer to sort 100,000 measured values. The problem is how to identify and sort the physical resistors afterwards.
No matter what scheme you come up with you only want to touch each resistor once. Hence the need for binning the resistors as early as possible.

Brutte · « **Reply #20 on:** June 25, 2018, 02:48:30 pm »

I have started very similar topic on avrfreaks some time ago.
With P&P machine in mind.

cellularmitosis · « **Reply #21 on:** June 25, 2018, 03:23:13 pm »

Quote from: T3sl4co1l on June 25, 2018, 01:57:09 am

So are you going to sort on tempco range as well as value at 25C?

One of the pipe dreams I occasionally play with (mentally) is how I'd automate testing the tempco of a bandolier of through-hole resistors. Building a mechanism to feed, cut, and grab the resistors seems pretty approachable, but I've been hung up on how I'd actually heat them in a repeatable way (without dipping them in oil).

While we are in this thread, does anyone have any ideas here?

Something which just occurred to me is using current to induce self-heating. On an assembly line, sandwich the resistor between two small chunks of styrofoam, run X current for Y time, then immediately measure the change in resistance. Measuring at multiple points would be easy too (or just measure the entire curve, with something like an SMU?).

T3sl4co1l · « **Reply #22 on:** June 25, 2018, 03:43:56 pm »

I suppose you could zap it with a known energy, on the assumption that the heat capacity is constant. A 10% error in temperature only gives a 10% error in tempco -- not a big deal against the spread you're likely to get.

Alternately, use clamps to grab the leads near the body. Heat/cool that way. Use a generous clamping force, enough to just squash the leads maybe. Required dwell time would be apparent: you get an exponential tail (more or less; actually, there'd be a big diffusion component mixed in, because heat), probably on the order of 20 seconds per part. You might want to build this as a carousel with many tests pipelined at once.

Tim

linux-works · « **Reply #23 on:** July 04, 2018, 02:35:51 pm »

I found out the name for this type of algorithm or subject matter. here's a link that was informative (to me, I am new to this):

http://www.mvstat.net/tduong/research/seminars/seminar-2001-05/

the concept is called "kernel density estimation"

knowing the name is half the battle; now I can see if this will lead to a useful solution.

bitseeker · « **Reply #24 on:** July 05, 2018, 07:25:32 am »

Very interesting. Thanks for the link.


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

Author Topic: algorithm to find best matches of values from bags of parts? (Read 3848 times)

Share me