EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: HwAoRrDk on April 03, 2020, 02:31:04 pm

Title: Suggestions wanted for hashing function
Post by: HwAoRrDk on April 03, 2020, 02:31:04 pm
My scenario is as follows:

I have an arbitrary length array of bytes. There is a limited set of certain byte values within this array I'm interested in. For each instance of a value in that set, I know it's offset in the array. So, therefore, each value of interest can be uniquely identified by the pair of <offset, value>.

I want to hash these multi-byte 'IDs' down to a single-byte index value in a limited range - for the sake of example, say, 0-32. Is there some not-too-complicated hashing algorithm that can be applied here? I want something fast and suitable for running on an 8-bit microcontroller.

I've looked at Pearson hashing (https://en.wikipedia.org/wiki/Pearson_hashing), as that gives a single-byte result (i.e. 0-255), but I'm not sure it can be modified to produce values in an even smaller range. I've read the original paper (https://dl.acm.org/doi/10.1145/78973.78978), and it does mention smaller output ranges, but that's only in the context of string inputs that have a limited character set (e.g. alphanumeric characters only). I did think perhaps I could just use the standard algorithm and then divide the result by 8 (or >> 3), but I don't know whether that will just render it pointless by introducing collisions.

Any suggestions?
Title: Re: Suggestions wanted for hashing function
Post by: Mechatrommer on April 03, 2020, 02:52:24 pm
each byte 0-32? there are 2 bytes of that type <offset, value>? thats 1024 combination. hashing it to 1 byte value is collision guaranteed.
Title: Re: Suggestions wanted for hashing function
Post by: HwAoRrDk on April 03, 2020, 03:39:45 pm
Sorry, I don't think I explained very well. Let me give an example.

Given a set of IDs (each comprising pair of 8 bit value and 16 bit offset, concatenated):
0x840007
0x9b0068
0xa10074
0x9b0135
...etc.

I want to hash each into a value between 0 and 31. E.g. the first might be equal to 12, the second 29, third 3, etc.

I want to do this because the hashed value will form an index into an array of memory that cannot be made big enough to encompass using the raw ID as an index (which would need 16 million entries!).
Title: Re: Suggestions wanted for hashing function
Post by: bd139 on April 03, 2020, 03:55:30 pm
Just use modulo division by the number of slots if it requires no cryptographic strength and colissions are not a problem. Don't make it more complicated than you need it to be until you need it to be.
Title: Re: Suggestions wanted for hashing function
Post by: Mechatrommer on April 03, 2020, 08:18:59 pm
Sorry, I don't think I explained very well. Let me give an example.

Given a set of IDs (each comprising pair of 8 bit value and 16 bit offset, concatenated):
0x840007
0x9b0068
0xa10074
0x9b0135
...etc.

I want to hash each into a value between 0 and 31. E.g. the first might be equal to 12, the second 29, third 3, etc.

I want to do this because the hashed value will form an index into an array of memory that cannot be made big enough to encompass using the raw ID as an index (which would need 16 million entries!).
that is even worse than i thought. 16 millions entries into 32 entries. you dont mind collision do you? if you dont as bd139 said, modulo is one way, another way is get X bits value from ID 8 bits value that you think unique and Y bits from 16 bits offset. combine them X + Y = 5 bits (32 entries). or XOR them all the 3 bytes and discard either 3 bits of LSB or MSB, or the mainstream basic (as in literature similar to checksum)... shift bits and XOR 3 times and take 5 bits. ymmv.
Title: Re: Suggestions wanted for hashing function
Post by: bd139 on April 03, 2020, 08:35:07 pm
Re-read this. Smells very much like you need a binary tree rather than a hash table for this. That handles sparse key space better with almost constant lookup time. Really though even if it’s a thousand entries it might just be as fast to use a linked list if the MCU isn’t shit.

Start with something lazy and make it fast if you need to.
Title: Re: Suggestions wanted for hashing function
Post by: HwAoRrDk on April 04, 2020, 01:24:38 pm
Hmm, isn't it funny how sometimes when you sleep on a problem, you realise you're being a colossal spoon, and taking the wrong approach. Turns out I don't need any of this. |O :-DD

If I re-factor my code quite a bit, I can, at the point where the byte array gets scanned for the set of values of interest, simply have a counter that increments every time one is found, and then use the counter value as an index for each value, and pass that through to the code that actually does the business. It was the disconnect between the two parts of code that fooled me in to thinking I needed to deal with just the <value,offset> pairs coming through.

Thanks anyway.
Title: Re: Suggestions wanted for hashing function
Post by: Nominal Animal on April 04, 2020, 10:13:00 pm
Hmm, isn't it funny how sometimes when you sleep on a problem, you realise you're being a colossal spoon, and taking the wrong approach. Turns out I don't need any of this. |O :-DD
You mean, isn't it funny that if you describe the problem to someone else, and then sleep on it, you often wake up with a better solution to the problem?

That happens to me with most complex problems I try to solve.  I've learned to use it to my advantage; to let my subconscious (and especially whatever you call the housekeeping/cleanup process that happens in ones brain during sleep) work on it while I do something else (less taxing on the mind, especially physical exercise).  I can nowadays even feel an "itch" of sorts when it is close to a solution, and know to tickle it a bit and then let mull in peace.

I thought it is a rather clever way to avoid doing unnecessary work...