EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: hamster_nz on July 14, 2019, 08:27:49 am

Title: Neat algorithms
Post by: hamster_nz on July 14, 2019, 08:27:49 am
I want to learn some new binary or computational algorithms. The hard thing is that unless you know the names you don't know what to look for.

Things like 8b/10b coding, CORDIC, LFSRs, FFT, forward error correction codes, that sort of thing.

So far I think 8b/10b is my favorite for 'wow' factor, and CORDIC sine/cosine for usefulness, and CORDIC rectangular to polar conversion for how it seems to turn the magical into mundane.

Am thinking of looking at turbo codes - are they nice to study?



Title: Re: Neat algorithms
Post by: westfw on July 15, 2019, 02:29:41 am
There are a lot of collections of this sort of thing.
For example, there is "HAKMEM": ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-239.pdf
It's from 1972, and most of the actual code is PDP-10 assembly language (in which you might consider a 36x36-bit multiply as faster than the obvious loop to reverse the bits in a 7- or 8-bit byte), but it has some stuff nevertheless.
Title: Re: Neat algorithms
Post by: westfw on July 15, 2019, 04:03:17 am
Bit twiddling hacks: https://graphics.stanford.edu/~seander/bithacks.html
Title: Re: Neat algorithms
Post by: westfw on July 15, 2019, 04:07:24 am
Mark Owen fit an entire floating point library into 1k bytes of ARM CM0 code:  https://www.quinapalus.com/qfplib.html
I don't know if there are any "interesting" algorithms in there, or whether it's just a really impressive implementation...
Title: Re: Neat algorithms
Post by: hamster_nz on July 15, 2019, 05:29:45 am
So I've started to look at BKM - https://en.wikipedia.org/wiki/BKM_algorithm - mainly because the Wikipedia page is so small that it must be interesting  :D.

It is a "shift and add" algorithm that allows you to calculate things like log(x) and log2(x), such that it could be implemented in digital logic or microcontrollers without floating point. (but it only converges over a limited (but still enough to be useful) range of inputs.

I've converted it to fixed point, and it works like magic:

Code: [Select]
#include <assert.h>
#include <stdint.h>
/* N_CONSTANTS and the A_e_i[] table are defined below */

static double log_base_e ( double Argument) /* 1.0 <= Argument <= 4.0 */
 {
    assert(Argument >= 1.0 && Argument <= 4.01);
    uint32_t a = Argument * 512*1024*1024;   /* argument scaled by 2^29 */
    uint32_t x =            512*1024*1024;   /* running product, starts at 1.0 * 2^29 */
    uint32_t y = 0;                          /* accumulated log, scaled by 2^31 */

    for ( int k = 0; k < N_CONSTANTS; k++ ) {
      uint32_t z = x + (x>>k);               /* tentative x * (1 + 2^-k) */
      if ( z <= a) {
          x  = z;                            /* keep the factor...           */
          y += A_e_i[k];                     /* ...and add ln(1 + 2^-k) to y */
      }
    }
    return y/(1024.0*1024.0*1024.0*2.0);     /* undo the 2^31 scaling */
 }

If anybody else is interested in experimenting, the constants are appropriately scaled versions of A_e[k] = ln(1 + 0.5^k), scaled by 2^31 here

Code: [Select]
#define N_CONSTANTS (31)

unsigned int A_e_i[N_CONSTANTS] =
{
        1488522235, 870729689, 479197127,
        252937143,  130190384, 66081633,
        33294987,   16712019,  8372266,
        4190213,    2096128,   1048320,
        524224,     262128,    131068,
        65535,      32767,     16383,
        8191,       4095,      2047,
        1023,       511,       255,
        127,        63,        31,
        15,         7,         3,
        1
};

I've also got a log2() version that works - same algorithm, just different constants:

Code: [Select]
static const unsigned int A_2_i [N_CONSTANTS] =
 {
         2147483648, 1256197404, 691335319,
         364911161,  187825021,  95335645,
         48034512 ,  24110347,   12078627,
         6045199,    3024074,    1512406,
         756295,     378170,     189091,
         94547,      47273,      23637,
         11818,      5909,       2954,
         1477,       738,        369,
         184,        92,         46,
         23,         11,         5,
         2
 };
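
For reference, a rough sketch of how such tables can be regenerated (this is just a helper, assuming the same 2^31 scaling as the code above; the last few digits may differ slightly from the tables here depending on rounding). Compile with -lm:

Code: [Select]
#include <math.h>
#include <stdio.h>

/* Prints candidate BKM tables: ln(1 + 2^-k) and log2(1 + 2^-k), scaled by 2^31. */
int main(void)
{
    const double scale = 2147483648.0;            /* 2^31 */

    printf("/* A_e_i[k] ~ ln(1 + 2^-k) * 2^31 */\n");
    for (int k = 0; k < 31; k++)
        printf("%u,\n", (unsigned)(log1p(ldexp(1.0, -k)) * scale));

    printf("/* A_2_i[k] ~ log2(1 + 2^-k) * 2^31 */\n");
    for (int k = 0; k < 31; k++)
        printf("%u,\n", (unsigned)(log2(1.0 + ldexp(1.0, -k)) * scale));

    return 0;
}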

I'm instantly confused as to how it works, so I think there is plenty of pondering for me to do!
Title: Re: Neat algorithms
Post by: BravoV on July 15, 2019, 05:33:33 am
Subbed ...  :-+
Title: Re: Neat algorithms
Post by: Berni on July 15, 2019, 05:33:59 am
There is also quite a bit to be seen in some of the more advanced fast sorting algorithms.

Another field filled with tricks is computer graphics. The amount of data that has to be moved is very large, so things tend to be optimized to be as efficient as possible. This covers how to quickly draw smooth lines, fill polygon areas, and do image manipulation such as resampling, stretching, rotation etc. Then it goes into how 3D rendering is done quickly, from oldschool and now obsolete ray casting, to more current polygon-based 3D, to ray tracing and now real-time ray tracing. For example, a whole bunch of fancy algorithms are involved in a game like Doom to make it fast enough to run smoothly on the slow computers of the time.
Title: Re: Neat algorithms
Post by: Nominal Animal on July 15, 2019, 08:50:58 am
This might come in handy someday:
(https://i.stack.imgur.com/LXLmK.png)
I found these when I needed a mapping between in-order and heap/storage-order perfect binary search trees (but I wasn't the first; they're OEIS A131987 (https://oeis.org/A131987) and A101279 (https://oeis.org/A101279)).

    k(i,N) = (b(i + 2^N) - 1) / 2

    i(k,N) = (k - 2^⌊log2 k⌋) · 2^(N - ⌊log2 k⌋) + 2^(N - ⌊log2 k⌋ - 1)

where i and k = 1 .. 2^N - 1, b(y) is a function that divides y by its largest power of two factor (i.e. removes low zero bits), and ⌊ ⌋ denotes the floor operation.  Note that ⌊log2 k⌋ evaluates to the position of the highest bit set in k (with 0 used for the least significant bit), and 2^⌊log2 k⌋ evaluates to the highest bit set in k.

The two expressions might look odd, but they're surprisingly simple and fast when written in C (using e.g. GCC builtins (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) like __builtin_clz() and __builtin_ctz() and bit shifts); just remember to account for zero-based indexing in C.
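
For the curious, here is roughly what that looks like in C -- a minimal sketch of the two formulas above (keeping i and k 1-based, tree of 2^N - 1 nodes), rather than anyone's production code:

Code: [Select]
#include <stdio.h>

/* b(y): divide y by its largest power-of-two factor, i.e. strip trailing zero bits. */
static unsigned b(unsigned y)
{
    return y >> __builtin_ctz(y);
}

/* In-order index i (1-based) -> storage/heap index k (1-based). */
static unsigned k_of_i(unsigned i, unsigned N)
{
    return (b(i + (1u << N)) - 1) >> 1;
}

/* Storage/heap index k (1-based) -> in-order index i (1-based). */
static unsigned i_of_k(unsigned k, unsigned N)
{
    unsigned level = 31u - (unsigned)__builtin_clz(k);        /* floor(log2 k) */
    return ((k - (1u << level)) << (N - level)) + (1u << (N - level - 1));
}

int main(void)
{
    const unsigned N = 3;                 /* 2^N - 1 = 7 nodes, as in the picture above */
    for (unsigned i = 1; i < (1u << N); i++)
        printf("i=%u -> k=%u -> i=%u\n", i, k_of_i(i, N), i_of_k(k_of_i(i, N), N));
    return 0;
}

The round trip should print each i back unchanged; for array storage in C you would subtract 1 from both indices.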
Title: Re: Neat algorithms
Post by: Dubbie on July 15, 2019, 08:55:51 am
<snip>coming from oldschool and now obsolete ray casting</snip>

Oi whats this about raycasting being obsolete?

Thats all I do all day every day :D
Title: Re: Neat algorithms
Post by: BillyD on July 15, 2019, 10:42:29 am
Even though I've never had to use it in my career, I always thought that the Bresenham circle algorithm was very clever.
Up to then I would have used floating point trigonometric functions to achieve the same result, at probably thousands of times the processing overhead.
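
For anyone who hasn't seen it, the whole thing is only integer adds, shifts and compares -- a minimal sketch of the midpoint/Bresenham circle (plot() here is just a placeholder for whatever your display uses):

Code: [Select]
#include <stdio.h>

/* Placeholder pixel routine -- swap in your own display code. */
static void plot(int x, int y) { printf("(%d,%d)\n", x, y); }

/* Midpoint ("Bresenham") circle: walks one octant and mirrors it eight ways. */
static void draw_circle(int cx, int cy, int r)
{
    int x = r, y = 0;
    int err = 1 - r;                   /* decision variable */

    while (x >= y) {
        plot(cx + x, cy + y);  plot(cx + y, cy + x);
        plot(cx - y, cy + x);  plot(cx - x, cy + y);
        plot(cx - x, cy - y);  plot(cx - y, cy - x);
        plot(cx + y, cy - x);  plot(cx + x, cy - y);
        y++;
        if (err <= 0)
            err += 2*y + 1;            /* midpoint still inside: keep the same x */
        else {
            x--;
            err += 2*(y - x) + 1;      /* midpoint outside: step inward */
        }
    }
}

int main(void)
{
    draw_circle(0, 0, 5);
    return 0;
}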


Title: Re: Neat algorithms
Post by: Nominal Animal on July 15, 2019, 12:22:52 pm
That reminds me: basic vector algebra. Dot product, cross product, projection, transformation matrices (and their properties), and rotations via unit quaternions aka versors.

If you ever wondered how hard or easy it is to raycast :P stick-and-balls atomic models, or capped cylinders or capsules, lookee here (https://en.wikipedia.org/wiki/User:Nominal_animal).  These assume the eye (ray source) is at origin.  (The answer is, very easy with vector algebra, and if you know vector algebra, turning them into raycasting code is easy as well.)
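
As a flavour of how little work is involved, here is a sketch of a single ray-versus-ball test with the eye at the origin -- just a dot product and one square root (the names and numbers are made up for illustration; this is not the code from that page):

Code: [Select]
#include <math.h>
#include <stdio.h>

/* Ray from the origin along unit direction (dx,dy,dz); ball at centre (cx,cy,cz), radius r.
   Returns the distance t >= 0 to the nearest hit, or -1.0 if the ray misses
   (or the ball is behind the eye / the eye is inside it). */
static double ray_ball(double dx, double dy, double dz,
                       double cx, double cy, double cz, double r)
{
    double b    = dx*cx + dy*cy + dz*cz;                 /* d . c */
    double disc = b*b - (cx*cx + cy*cy + cz*cz) + r*r;   /* (d.c)^2 - |c|^2 + r^2 */
    if (disc < 0.0)
        return -1.0;
    double t = b - sqrt(disc);                           /* nearer of the two roots */
    return (t >= 0.0) ? t : -1.0;
}

int main(void)
{
    /* unit ball 5 units straight ahead on the x axis: expect t = 4 */
    printf("t = %g\n", ray_ball(1,0,0, 5,0,0, 1.0));
    return 0;
}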

If you've ever done trilateration or multilateration, and you used any trigonometric functions, you did it wrong.  (The formulae are quite easy if you use vector algebra to get to a simple coordinate system where you only need addition, subtraction, multiplication and division, and a single square root to get the result; and vector algebra again, to transform that result to your original coordinate system.)
Title: Re: Neat algorithms
Post by: T3sl4co1l on July 15, 2019, 12:58:02 pm
Linear algebra, and complex numbers (arguably a special case of R^2 or R^2x2, depending on which format you opt for, and how you define operations), are a bit of work to learn, but intuitively powerful.

A simple case I've played with: the intersections of lines to give a pseudo-3D perspective view.  The operations are very familiar from complex arithmetic: dot and cross products are complex multiplication and division.
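
A tiny illustration of that correspondence (toy numbers only, not the applet's code): pack 2D vectors into complex numbers, and one multiply by the conjugate gives the dot product in the real part and the cross product in the imaginary part.

Code: [Select]
#include <complex.h>
#include <stdio.h>

int main(void)
{
    double complex a = 3.0 + 4.0*I;        /* vector (3, 4) */
    double complex b = 2.0 - 1.0*I;        /* vector (2,-1) */

    double complex p = conj(a) * b;        /* = (a . b) + i*(a x b) */
    printf("dot   = %g\n", creal(p));      /* 3*2 + 4*(-1)  =   2 */
    printf("cross = %g\n", cimag(p));      /* 3*(-1) - 4*2  = -11 */
    return 0;
}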

(https://www.seventransistorlabs.com/Images/LinesApplet2.png)

Repeat for all lines in the current sector, and recurse over connected sectors, and you have a rudimentary graphics engine.  This is roughly how Ken Silverman's Build engine is written.  Similar arithmetic applies to Wolfenstein 3D, with simplifications for the grid-based geometry.  (This is not, in fact, how Doom is written -- the geometry is of course just as indispensable, but sorting is done by BSP tree.)

Speaking of raycasters (Wolf3D and such), I implemented one of those on an AVR.  Tons of CPU power for this -- the actual rendering is done in a few milliseconds, the rest is spent drawing pixels to the stupid serial LCD. :P
https://imgur.com/tdApTG1 (https://imgur.com/tdApTG1)
It could be textured, if there were enough free RAM/ROM to hold textures in; drawing would be a little slower, really just because more pixels would have to be written (the flat shading allows an optimal display routine, writing exactly the pixels that change from frame to frame; this is why the fill rate varies between scenes).

Also for that display, I wrote this compressor, which was a nice exercise: https://www.seventransistorlabs.com/compr.html (https://www.seventransistorlabs.com/compr.html)

Tim
Title: Re: Neat algorithms
Post by: scatha on July 15, 2019, 10:52:12 pm
If you haven't already, before starting on turbo codes have a look at block and convolutional codes more generally. Block codes are a really cool example of applied linear algebra and some of the decoding techniques used in convolutional codes (particularly the Viterbi algorithm) are neat. The Viterbi algorithm is pretty simple, easy to visualize, and pops up in other places where a maximum likelihood path through a hidden Markov model is needed.
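
To make the Viterbi idea concrete, here's a toy decoder over a two-state hidden Markov model -- completely made-up probabilities, purely to show the dynamic-programming structure, not anything from a real convolutional decoder:

Code: [Select]
#include <stdio.h>

#define N_STATES 2
#define N_OBS    3                    /* length of the observation sequence */

/* Toy HMM with invented numbers (real decoders usually work in log space). */
static const double start[N_STATES]           = { 0.6, 0.4 };
static const double trans[N_STATES][N_STATES] = { { 0.7, 0.3 }, { 0.4, 0.6 } };
static const double emit[N_STATES][2]         = { { 0.9, 0.1 }, { 0.2, 0.8 } };

int main(void)
{
    const int obs[N_OBS] = { 0, 1, 1 };   /* observed symbols */
    double v[N_OBS][N_STATES];            /* best probability of any path ending in state s at time t */
    int back[N_OBS][N_STATES];            /* which previous state that best path came from */

    for (int s = 0; s < N_STATES; s++)
        v[0][s] = start[s] * emit[s][obs[0]];

    for (int t = 1; t < N_OBS; t++)
        for (int s = 0; s < N_STATES; s++) {
            double best = -1.0;
            int arg = 0;
            for (int p = 0; p < N_STATES; p++) {
                double cand = v[t-1][p] * trans[p][s];
                if (cand > best) { best = cand; arg = p; }
            }
            v[t][s] = best * emit[s][obs[t]];
            back[t][s] = arg;
        }

    /* Trace back the maximum-likelihood state sequence. */
    int path[N_OBS], last = 0;
    for (int s = 1; s < N_STATES; s++)
        if (v[N_OBS-1][s] > v[N_OBS-1][last])
            last = s;
    path[N_OBS-1] = last;
    for (int t = N_OBS-1; t > 0; t--)
        path[t-1] = back[t][path[t]];

    for (int t = 0; t < N_OBS; t++)
        printf("t=%d  obs=%d  state=%d\n", t, obs[t], path[t]);
    return 0;
}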
Title: Re: Neat algorithms
Post by: coppice on July 15, 2019, 10:57:31 pm
If you haven't already, before starting on turbo codes have a look at block and convolutional codes more generally. Block codes are a really cool example of applied linear algebra and some of the decoding techniques used in convolutional codes (particularly the Viterbi algorithm) are neat. The Viterbi algorithm is pretty simple, easy to visualize, and pops up in other places where a maximum likelihood path through a hidden Markov model is needed.
I'd add that if you are interested in turbo codes, also look at LDPC and polar codes. I find LDPC particularly interesting because it was published in the 1960s but sunk without trace, because the industry wasn't ready for it. Then it was rediscovered decades later, and is now widely used.
Title: Re: Neat algorithms
Post by: hamster_nz on July 15, 2019, 11:48:41 pm
I'd add that if you are interested in turbo codes, also look at LDPC and polar codes. I find LDPC particularly interesting because it was published in the 1960s but sunk without trace, because the industry wasn't ready for it. Then it was rediscovered decades later, and is now widely used.

I've never heard of Polar Codes before. Thanks!

(starts looking at http://pfister.ee.duke.edu/courses/ecen655/polar.pdf (http://pfister.ee.duke.edu/courses/ecen655/polar.pdf) - A Brief Introduction to Polar Codes)
Title: Re: Neat algorithms
Post by: Berni on July 16, 2019, 06:23:41 am
Oh and since nobody trolled with this one, here is the algorithm that's actually called NEAT:
https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies
Title: Re: Neat algorithms
Post by: brucehoult on July 16, 2019, 10:26:04 pm
8b/10b is nice. It's pretty old, developed by IBM in 1983.

One wonders if they were influenced by Woz's "5 plus 3" and slightly later "6 plus 2" encodings, which served the same purpose on Apple ][ floppy disks. 8b/10b of course "wastes" only 20% of the bits vs 25% for "6 plus 2". I also don't think Woz paid a lot of attention to DC balance, but only to the maximum number of bit times without a transition.
Title: Re: Neat algorithms
Post by: mrflibble on July 18, 2019, 01:25:26 am
I've never heard of Polar Codes before. Thanks!
Same here!

Quote
(starts looking at http://pfister.ee.duke.edu/courses/ecen655/polar.pdf (http://pfister.ee.duke.edu/courses/ecen655/polar.pdf) - A Brief Introduction to Polar Codes)
Just in case you didn't already notice, there's more interesting stuff to be found on the ECEN 655 course page (http://pfister.ee.duke.edu/courses/ecen655/). For example the one on partition functions of non-fucked-up factor graphs. Or whatever's your mathematical poison really.  ;D
Title: Re: Neat algorithms
Post by: NorthGuy on July 18, 2019, 03:27:10 pm
I have a book "Numerical recipes in c", which is a collection of various algorithms.
Title: Re: Neat algorithms
Post by: RoGeorge on July 18, 2019, 03:42:12 pm
The hard thing is that unless you know the names you don't know what to look for.

Did you parse the classics?
If not, try the programming bible, Knuth's books:  https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming
Title: Re: Neat algorithms
Post by: mrflibble on July 19, 2019, 11:36:32 am
This might come in handy someday:
(https://i.stack.imgur.com/LXLmK.png)
I found these when I needed a mapping between in-order and heap/storage-order perfect binary search trees (but I wasn't the first; they're OEIS A131987 (https://oeis.org/A131987) and A101279 (https://oeis.org/A101279)).

    k(i,N) = (b(i + 2^N) - 1) / 2

    i(k,N) = (k - 2^⌊log2 k⌋) · 2^(N - ⌊log2 k⌋) + 2^(N - ⌊log2 k⌋ - 1)
Neat! While scribbling those down and looking at the resulting graphs & indices I noticed that they contain cyclic subgroups. Take the above graphs as an example and start at k=3, i.e. an entry in the "Storage" graph. That maps to i=6 in the "In-order" graph. Then jump to k=6 in the storage graph and repeat the mapping.

k=3
i(3,N) = 6
i(6,N) = 5
i(5,N) = 3

Now we're back at 3 again, so we have a cycle of 3 elements {3,6,5}.
Similarly we have a cycle of 2 elements {1,4}. Which leaves {2} and {7}.

So I wondered how the number and size of these cyclic subgroups would progress for larger N. In binary notation those cycles also exhibit some interesting patterns. Here are just the first few entries, N=1..9. Note how mirror and rotation symmetries affect the number of entries in a subgroup. As an extreme example, for N=9 there is the single-element group {170} (010101010 binary).
Code: [Select]
  1 : 1

--------------------------------------------------------------------------------

  1 : 01, 10
  3 : 11

--------------------------------------------------------------------------------

  1 : 001, 100
  2 : 010
  3 : 011, 110, 101
  7 : 111

--------------------------------------------------------------------------------

  1 : 0001, 1000
  2 : 0010, 0100
  3 : 0011, 1100, 1001
  5 : 0101, 0110, 1010
  7 : 0111, 1110, 1101, 1011
 15 : 1111

--------------------------------------------------------------------------------

  1 : 00001, 10000
  2 : 00010, 01000
  3 : 00011, 11000, 10001
  4 : 00100
  5 : 00101, 01100, 10010
  6 : 00110, 10100, 01001
  7 : 00111, 11100, 11001, 10011
 10 : 01010
 11 : 01011, 01110, 11010, 10101
 13 : 01101, 10110
 15 : 01111, 11110, 11101, 11011, 10111
 31 : 11111

--------------------------------------------------------------------------------

  1 : 000001, 100000
  2 : 000010, 010000
  3 : 000011, 110000, 100001
  4 : 000100, 001000
  5 : 000101, 011000, 100010
  6 : 000110, 101000, 010001
  7 : 000111, 111000, 110001, 100011
  9 : 001001, 001100, 100100
 10 : 001010, 010100, 010010
 11 : 001011, 011100, 110010, 100101
 13 : 001101, 101100, 011001, 100110
 14 : 001110, 110100, 101001, 010011
 15 : 001111, 111100, 111001, 110011, 100111
 21 : 010101, 010110, 011010, 101010
 23 : 010111, 011110, 111010, 110101, 101011
 27 : 011011, 101110, 011101, 110110, 101101
 31 : 011111, 111110, 111101, 111011, 110111, 101111
 63 : 111111

--------------------------------------------------------------------------------

  1 : 0000001, 1000000
  2 : 0000010, 0100000
  3 : 0000011, 1100000, 1000001
  4 : 0000100, 0010000
  5 : 0000101, 0110000, 1000010
  6 : 0000110, 1010000, 0100001
  7 : 0000111, 1110000, 1100001, 1000011
  8 : 0001000
  9 : 0001001, 0011000, 1000100
 10 : 0001010, 0101000, 0100010
 11 : 0001011, 0111000, 1100010, 1000101
 12 : 0001100, 1001000, 0010001
 13 : 0001101, 1011000, 0110001, 1000110
 14 : 0001110, 1101000, 1010001, 0100011
 15 : 0001111, 1111000, 1110001, 1100011, 1000111
 18 : 0010010, 0010100, 0100100
 19 : 0010011, 0011100, 1100100, 1001001
 21 : 0010101, 0101100, 0110010, 1001010
 22 : 0010110, 0110100, 1010010, 0100101
 23 : 0010111, 0111100, 1110010, 1100101, 1001011
 25 : 0011001, 1001100
 26 : 0011010, 1010100, 0101001, 0100110
 27 : 0011011, 1011100, 0111001, 1100110, 1001101
 29 : 0011101, 1101100, 1011001, 0110011, 1001110
 30 : 0011110, 1110100, 1101001, 1010011, 0100111
 31 : 0011111, 1111100, 1111001, 1110011, 1100111, 1001111
 42 : 0101010
 43 : 0101011, 0101110, 0111010, 1101010, 1010101
 45 : 0101101, 0110110, 1011010, 0110101, 1010110
 47 : 0101111, 0111110, 1111010, 1110101, 1101011, 1010111
 55 : 0110111, 1011110, 0111101, 1110110, 1101101, 1011011
 59 : 0111011, 1101110, 1011101
 63 : 0111111, 1111110, 1111101, 1111011, 1110111, 1101111, 1011111
127 : 1111111

--------------------------------------------------------------------------------

  1 : 00000001, 10000000
  2 : 00000010, 01000000
  3 : 00000011, 11000000, 10000001
  4 : 00000100, 00100000
  5 : 00000101, 01100000, 10000010
  6 : 00000110, 10100000, 01000001
  7 : 00000111, 11100000, 11000001, 10000011
  8 : 00001000, 00010000
  9 : 00001001, 00110000, 10000100
 10 : 00001010, 01010000, 01000010
 11 : 00001011, 01110000, 11000010, 10000101
 12 : 00001100, 10010000, 00100001
 13 : 00001101, 10110000, 01100001, 10000110
 14 : 00001110, 11010000, 10100001, 01000011
 15 : 00001111, 11110000, 11100001, 11000011, 10000111
 17 : 00010001, 00011000, 10001000
 18 : 00010010, 00101000, 01000100
 19 : 00010011, 00111000, 11000100, 10001001
 20 : 00010100, 01001000, 00100010
 21 : 00010101, 01011000, 01100010, 10001010
 22 : 00010110, 01101000, 10100010, 01000101
 23 : 00010111, 01111000, 11100010, 11000101, 10001011
 25 : 00011001, 10011000, 00110001, 10001100
 26 : 00011010, 10101000, 01010001, 01000110
 27 : 00011011, 10111000, 01110001, 11000110, 10001101
 28 : 00011100, 11001000, 10010001, 00100011
 29 : 00011101, 11011000, 10110001, 01100011, 10001110
 30 : 00011110, 11101000, 11010001, 10100011, 01000111
 31 : 00011111, 11111000, 11110001, 11100011, 11000111, 10001111
 36 : 00100100
 37 : 00100101, 00101100, 01100100, 10010010
 38 : 00100110, 00110100, 10100100, 01001001
 39 : 00100111, 00111100, 11100100, 11001001, 10010011
 41 : 00101001, 01001100, 00110010, 10010100
 42 : 00101010, 01010100, 01010010, 01001010
 43 : 00101011, 01011100, 01110010, 11001010, 10010101
 45 : 00101101, 01101100, 10110010, 01100101, 10010110
 46 : 00101110, 01110100, 11010010, 10100101, 01001011
 47 : 00101111, 01111100, 11110010, 11100101, 11001011, 10010111
 51 : 00110011, 10011100, 00111001, 11001100, 10011001
 53 : 00110101, 10101100, 01011001, 01100110, 10011010
 54 : 00110110, 10110100, 01101001, 10100110, 01001101
 55 : 00110111, 10111100, 01111001, 11100110, 11001101, 10011011
 58 : 00111010, 11010100, 10101001, 01010011, 01001110
 59 : 00111011, 11011100, 10111001, 01110011, 11001110, 10011101
 61 : 00111101, 11101100, 11011001, 10110011, 01100111, 10011110
 62 : 00111110, 11110100, 11101001, 11010011, 10100111, 01001111
 63 : 00111111, 11111100, 11111001, 11110011, 11100111, 11001111, 10011111
 85 : 01010101, 01010110, 01011010, 01101010, 10101010
 87 : 01010111, 01011110, 01111010, 11101010, 11010101, 10101011
 91 : 01011011, 01101110, 10111010, 01110101, 11010110, 10101101
 93 : 01011101, 01110110, 11011010, 10110101, 01101011, 10101110
 95 : 01011111, 01111110, 11111010, 11110101, 11101011, 11010111, 10101111
109 : 01101101, 10110110
111 : 01101111, 10111110, 01111101, 11110110, 11101101, 11011011, 10110111
119 : 01110111, 11011110, 10111101, 01111011, 11101110, 11011101, 10111011
127 : 01111111, 11111110, 11111101, 11111011, 11110111, 11101111, 11011111, 10111111
255 : 11111111

--------------------------------------------------------------------------------

  1 : 000000001, 100000000
  2 : 000000010, 010000000
  3 : 000000011, 110000000, 100000001
  4 : 000000100, 001000000
  5 : 000000101, 011000000, 100000010
  6 : 000000110, 101000000, 010000001
  7 : 000000111, 111000000, 110000001, 100000011
  8 : 000001000, 000100000
  9 : 000001001, 001100000, 100000100
 10 : 000001010, 010100000, 010000010
 11 : 000001011, 011100000, 110000010, 100000101
 12 : 000001100, 100100000, 001000001
 13 : 000001101, 101100000, 011000001, 100000110
 14 : 000001110, 110100000, 101000001, 010000011
 15 : 000001111, 111100000, 111000001, 110000011, 100000111
 16 : 000010000
 17 : 000010001, 000110000, 100001000
 18 : 000010010, 001010000, 010000100
 19 : 000010011, 001110000, 110000100, 100001001
 20 : 000010100, 010010000, 001000010
 21 : 000010101, 010110000, 011000010, 100001010
 22 : 000010110, 011010000, 101000010, 010000101
 23 : 000010111, 011110000, 111000010, 110000101, 100001011
 24 : 000011000, 100010000, 000100001
 25 : 000011001, 100110000, 001100001, 100001100
 26 : 000011010, 101010000, 010100001, 010000110
 27 : 000011011, 101110000, 011100001, 110000110, 100001101
 28 : 000011100, 110010000, 100100001, 001000011
 29 : 000011101, 110110000, 101100001, 011000011, 100001110
 30 : 000011110, 111010000, 110100001, 101000011, 010000111
 31 : 000011111, 111110000, 111100001, 111000011, 110000111, 100001111
 34 : 000100010, 000101000, 010001000
 35 : 000100011, 000111000, 110001000, 100010001
 36 : 000100100, 001001000, 001000100
 37 : 000100101, 001011000, 011000100, 100010010
 38 : 000100110, 001101000, 101000100, 010001001
 39 : 000100111, 001111000, 111000100, 110001001, 100010011
 41 : 000101001, 010011000, 001100010, 100010100
 42 : 000101010, 010101000, 010100010, 010001010
 43 : 000101011, 010111000, 011100010, 110001010, 100010101
 44 : 000101100, 011001000, 100100010, 001000101
 45 : 000101101, 011011000, 101100010, 011000101, 100010110
 46 : 000101110, 011101000, 110100010, 101000101, 010001011
 47 : 000101111, 011111000, 111100010, 111000101, 110001011, 100010111
 49 : 000110001, 100011000
 50 : 000110010, 100101000, 001010001, 010001100
 51 : 000110011, 100111000, 001110001, 110001100, 100011001
 52 : 000110100, 101001000, 010010001, 001000110
 53 : 000110101, 101011000, 010110001, 011000110, 100011010
 54 : 000110110, 101101000, 011010001, 101000110, 010001101
 55 : 000110111, 101111000, 011110001, 111000110, 110001101, 100011011
 57 : 000111001, 110011000, 100110001, 001100011, 100011100
 58 : 000111010, 110101000, 101010001, 010100011, 010001110
 59 : 000111011, 110111000, 101110001, 011100011, 110001110, 100011101
 60 : 000111100, 111001000, 110010001, 100100011, 001000111
 61 : 000111101, 111011000, 110110001, 101100011, 011000111, 100011110
 62 : 000111110, 111101000, 111010001, 110100011, 101000111, 010001111
 63 : 000111111, 111111000, 111110001, 111100011, 111000111, 110001111, 100011111
 73 : 001001001, 001001100, 001100100, 100100100
 74 : 001001010, 001010100, 010100100, 010010010
 75 : 001001011, 001011100, 011100100, 110010010, 100100101
 77 : 001001101, 001101100, 101100100, 011001001, 100100110
 78 : 001001110, 001110100, 110100100, 101001001, 010010011
 79 : 001001111, 001111100, 111100100, 111001001, 110010011, 100100111
 82 : 001010010, 010010100
 83 : 001010011, 010011100, 001110010, 110010100, 100101001
 85 : 001010101, 010101100, 010110010, 011001010, 100101010
 86 : 001010110, 010110100, 011010010, 101001010, 010010101
 87 : 001010111, 010111100, 011110010, 111001010, 110010101, 100101011
 89 : 001011001, 011001100, 100110010, 001100101, 100101100
 90 : 001011010, 011010100, 101010010, 010100101, 010010110
 91 : 001011011, 011011100, 101110010, 011100101, 110010110, 100101101
 93 : 001011101, 011101100, 110110010, 101100101, 011001011, 100101110
 94 : 001011110, 011110100, 111010010, 110100101, 101001011, 010010111
 95 : 001011111, 011111100, 111110010, 111100101, 111001011, 110010111, 100101111
102 : 001100110, 100110100, 001101001, 101001100, 010011001
103 : 001100111, 100111100, 001111001, 111001100, 110011001, 100110011
106 : 001101010, 101010100, 010101001, 010100110, 010011010
107 : 001101011, 101011100, 010111001, 011100110, 110011010, 100110101
109 : 001101101, 101101100, 011011001, 101100110, 011001101, 100110110
110 : 001101110, 101110100, 011101001, 110100110, 101001101, 010011011
111 : 001101111, 101111100, 011111001, 111100110, 111001101, 110011011, 100110111
115 : 001110011, 110011100, 100111001
117 : 001110101, 110101100, 101011001, 010110011, 011001110, 100111010
118 : 001110110, 110110100, 101101001, 011010011, 101001110, 010011101
119 : 001110111, 110111100, 101111001, 011110011, 111001110, 110011101, 100111011
122 : 001111010, 111010100, 110101001, 101010011, 010100111, 010011110
123 : 001111011, 111011100, 110111001, 101110011, 011100111, 110011110, 100111101
125 : 001111101, 111101100, 111011001, 110110011, 101100111, 011001111, 100111110
126 : 001111110, 111110100, 111101001, 111010011, 110100111, 101001111, 010011111
127 : 001111111, 111111100, 111111001, 111110011, 111100111, 111001111, 110011111, 100111111
170 : 010101010
171 : 010101011, 010101110, 010111010, 011101010, 110101010, 101010101
173 : 010101101, 010110110, 011011010, 101101010, 011010101, 101010110
175 : 010101111, 010111110, 011111010, 111101010, 111010101, 110101011, 101010111
181 : 010110101, 011010110, 101011010
183 : 010110111, 011011110, 101111010, 011110101, 111010110, 110101101, 101011011
187 : 010111011, 011101110, 110111010, 101110101, 011101011, 110101110, 101011101
189 : 010111101, 011110110, 111011010, 110110101, 101101011, 011010111, 101011110
191 : 010111111, 011111110, 111111010, 111110101, 111101011, 111010111, 110101111, 101011111
219 : 011011011, 101101110, 011011101, 101110110, 011101101, 110110110, 101101101
223 : 011011111, 101111110, 011111101, 111110110, 111101101, 111011011, 110110111, 101101111
239 : 011101111, 110111110, 101111101, 011111011, 111101110, 111011101, 110111011, 101110111
247 : 011110111, 111011110, 110111101, 101111011
255 : 011111111, 111111110, 111111101, 111111011, 111110111, 111101111, 111011111, 110111111, 101111111
511 : 111111111


The number of subgroups for N=1..20:
1,2,4,6,12,18,34,58,106,186,350,630,1180,2190,4114,7710,14600,27594,52486,99878

That looked so deliciously progressive that I would have been a little sad if there wasn't an OEIS entry. The main surprise was going to be, what kind of entry is it going to be. ;D

So ... plug it into OEIS (https://oeis.org/search?q=1%2C2%2C4%2C6%2C12%2C18%2C34%2C58%2C106%2C186%2C350%2C630%2C1180%2C2190%2C4114%2C7710%2C14600%2C27594%2C52486%2C99878&language=english&go=Search) and hope to find something interesting. Turns out it is A052823 (https://oeis.org/A052823), "A simple grammar: cycles of pairs of sequences". Sounds about right. :)
And it has all the right features that can result in a rainy day down the rabbit hole. Or maybe two rainy days, because both the number of n-bead necklaces and the Zeta function reference look vewy vewy rabbit holey. For a previous project, while mucking about with partitioning of bit sequences I kept tripping over beads and necklaces, so I wonder what that reference is going to turn up. I am so hoping that it has partitions all over the place, but that surprise will just have to wait for some rain...
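
In case anyone wants to reproduce the listings and counts, here is a throwaway sketch of the enumeration (not the exact code behind the tables above, just the same idea: it prints decimal rather than binary and uses the i(k,N) mapping from earlier in the thread; bump the seen[] array for larger N):

Code: [Select]
#include <stdio.h>

/* storage-order -> in-order mapping (1-based), as posted earlier in the thread */
static unsigned i_of_k(unsigned k, unsigned N)
{
    unsigned level = 31u - (unsigned)__builtin_clz(k);   /* floor(log2 k) */
    return ((k - (1u << level)) << (N - level)) + (1u << (N - level - 1));
}

int main(void)
{
    const unsigned N = 4;                    /* tree depth: indices 1 .. 2^N - 1 */
    const unsigned count = (1u << N) - 1;
    static unsigned char seen[1u << 16];     /* good for N up to 16 */
    unsigned cycles = 0;

    for (unsigned k = 1; k <= count; k++) {
        if (seen[k])
            continue;
        cycles++;
        printf("%3u :", k);
        unsigned j = k;
        do {                                 /* walk the cycle k -> i(k,N) -> ... */
            seen[j] = 1;
            printf(" %u", j);
            j = i_of_k(j, N);
        } while (j != k);
        printf("\n");
    }
    printf("N=%u: %u cycles\n", N, cycles);
    return 0;
}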
Title: Re: Neat algorithms
Post by: Nominal Animal on July 19, 2019, 01:36:00 pm
Absolutely; digit reversal is one of those things that surprised the heck out of me when I first encountered it.  I thought it only appeared in games and tricks, not in real-world tasks, and not in stuff related to graphs!
My intuition says the cyclic subgraphs are related to digit reversal, but my math-fu isn't good enough to be sure.
Title: Re: Neat algorithms
Post by: mrflibble on July 24, 2019, 09:10:19 pm
Those cyclic subgroups are indeed caused in part by the reversal of part of the bit sequence. But you don't really need any math-fu to see how it works. Just take a good look at those k(i,N) & i(k,N) functions. Pick the one you find easiest to read, and check what happens when this function is applied. Each step splits the binary representation in 2 chunks, does something reversey to 1 chunk while keeping the other as is, and then swaps the 2 resulting chunks. And you can even see what kind of operation that is when you keep track of what those operations "mean" in the graph.

The way those cycles looked I thought I could probably generate them with a bunch of polynomials over a Galois field + some modular arithmetic (similar to how an LFSR works, since I used GF(2)). Well, aaaaalmost, but not quite.  :( Darnit. I was hoping to generate those cycles using polynomials with a degree comparable to the tree depth, but that was a no-go. It works for the first few orders but after that it's a no-go. Of course you could make it work by using polynomials of higher degree, but that would be cheating. :P

As for rabbit holes, yup, several freaking big ones. Complete with prime number distributions, because Zeta. Haven't even scratched the surface there, because those are not just rabbit holes, but bloody recursive rabbit holes. :scared:
Title: Re: Neat algorithms
Post by: Nominal Animal on July 24, 2019, 10:22:17 pm
I know what you mean by rabbit holes (planetary warrens, more like).  I did start looking at whether there are efficient swapping sequences to convert one into another (that's closely related to loops in the graphs), but managed to catch myself in time.

I once explored the Fibonacci sequence, up to examining the ways of calculating a specific Fn efficiently.  Well, that ended up with me reading some papers on how to do multiplication of very large numbers via summing the Fourier-transforms of their digits.  Yeah.  It Gets Deep.
Title: balanced binary counter
Post by: notadave on July 25, 2019, 08:07:30 pm
If you like 8b10b you might like my version of 4b6b; I am sure it is a brain twister.
Of these two functions, one counts up and the other counts down.
Encode.c takes a normal binary number and returns a balanced number; Decode.c converts the other way.
Due to the implementation with zeros in the unused places, it is really a fixed-number-of-ones-and-zeros counter, but I meant it to have the same number of ones and zeros. That is why I call it balanced.
What uses you may have for it I cannot know, but I use it for channel coding (mostly 4b6b but also 6b8b), without the use of tables or other encoded knowledge. Maybe this type of counting could one day be used by computers that save power by maintaining whatever state property they use to represent bits. The code words are well suited to OOK modulation.
These functions are small as machine code and sufficiently fast for encoding 4b6b and 6b8b at 10 kBaud with 8 MIPS. They only require simple 8-bit instructions that are always available (PIC, ATtiny, Z80); the encoder does not even add. Their run time goes up exponentially with the number of bits, so even though the algorithm scales to any width, it becomes impractical.
This implementation is done in 'C' and does not need much memory, but done in ASM or HDL the implementation would be different, operating on the bits themselves.
These functions do not provide any error handling; they are part of a chain/stack that ensures data flow, framing, error handling, synchronization, ... none of which is included here. Right in front of the encoding counter you will need a function that skips unwanted combinations. For me these are: 000111 = 0d and its inverse = 19d. Other words might be used as special side-channel info or for synchronization. That is how I find/mark the start of a frame.
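
Encode.c and Decode.c themselves are not reproduced in this post. Purely as a related illustration -- and not necessarily the same counting scheme as above -- the twenty balanced 6-bit code words can also be stepped through without tables using the "next integer with the same number of set bits" trick from HAKMEM (often called Gosper's hack), starting at 000111 and ending at 111000:

Code: [Select]
#include <stdio.h>

/* Gosper's hack (HAKMEM): next integer with the same number of set bits. */
static unsigned next_same_weight(unsigned v)
{
    unsigned c = v & -v;                 /* lowest set bit */
    unsigned r = v + c;                  /* ripple carry into the next group */
    return (((v ^ r) >> 2) / c) | r;     /* refill the low end */
}

int main(void)
{
    unsigned w = 0x07;                   /* 000111, the first balanced 6-bit word */
    for (int n = 0; n < 20; n++) {       /* C(6,3) = 20 balanced words in total */
        printf("%2d : ", n);
        for (int bit = 5; bit >= 0; bit--)
            putchar('0' + ((w >> bit) & 1));
        putchar('\n');
        w = next_same_weight(w);
    }
    return 0;
}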
Title: Re: Neat algorithms
Post by: mrflibble on July 26, 2019, 04:47:02 am
Gave it a go as well for the swapping sequences, since that was a relatively small code change. Decided to go a little higher with the polynomial degrees. So for tree depth of 3 (the example graphs in this thread) I went up to degree 9 polynomials. But still no dice.

I wonder if it would be viable to "find" the polynomials by construction, and then try to simplify where possible. I'd expect there to be 21st-degree polynomials that would do the trick for a mere tree depth of 3. That is way beyond what I would call an elegant solution, but it might help in gaining some insights. At least at this point I don't see a nice solution. Not that that means anything really, because I am perfectly capable of being mathematically blind as a bat. ;D

As for summing the Fourier transform of digits, that does look suspiciously like finding prime (or co-prime) factors for both multiplicands. In that case multiplication of the original numbers boils down to adding the multiplicities of the (co-)prime factors. Similar to how you can do fast counters or multipliers in fpga by having a bunch of counters each with a prime modulo. Or coprime modulos, since for example 3*5=15 and 2^4=16 are rather convenient to map to actual hardware.
Title: Re: Neat algorithms
Post by: legacy on July 26, 2019, 11:04:05 am
Absolutely; digit reversal is one of those things that surprised the heck out of me when I first encountered it.  I thought it only appeared in games and tricks, not in real-world tasks, and not in stuff related to graphs!
My intuition says the cyclic subgraphs are related to digit reversal, but my math-fu isn't good enough to be sure.

I haven't understood where the "digit reversal" is useful. Can you guys give an example  :-//
Title: Re: Neat algorithms
Post by: mrflibble on July 26, 2019, 11:51:08 am
I haven't understood where the "digit reversal" is useful. Can you guys give an example  :-//
Maybe you managed to miss this post (https://www.eevblog.com/forum/programming/neat-algorithms/msg2548206/#msg2548206)?  :-//

(https://i.stack.imgur.com/LXLmK.png)

Check out the binary representations of those nodes...
Title: Re: Neat algorithms
Post by: Ice-Tea on July 26, 2019, 11:53:05 am
Am thinking of looking at turbo codes - are they nice to study?

I did my thesis on them ;)
Title: Re: Neat algorithms
Post by: mrflibble on July 26, 2019, 12:01:55 pm
Am thinking of looking at turbo codes - are they nice to study?

I did my thesis on them ;)
So is that a yes, or a no? ;D
Title: Re: Neat algorithms
Post by: Ice-Tea on July 26, 2019, 12:38:05 pm
Well, yes. It was quite rewarding to get the "point". But as far as I remember, it's not exactly something to tack on in a hurry. Takes a fair amount of thinking and resources.

All that was 20 years ago with very green Ice-Tea, so my memories may be a bit colored ;)
Title: Re: Neat algorithms
Post by: coppice on July 26, 2019, 01:16:42 pm
Well, yes. It was quite rewarding to get the "point". But as far as I remember, it's not exactly something to tack on in a hurry. Takes a fair amount of thinking and resources.

All that was 20 years ago with very green Ice-Tea, so my memories may be a bit colored ;)
Pretty much any advanced coding scheme has to be planned into a comms system from the start, from simple 2D convolutional codes up to the latest stuff like polar codes.
Title: Re: Neat algorithms
Post by: chris_leyson on July 26, 2019, 01:34:04 pm
Talking of cyclic things and Galois fields, the fast-M transform is a quick way of cross-correlating a binary M-sequence or maximal length sequence using the fast Walsh–Hadamard transform. https://epubs.siam.org/doi/10.1137/0220043 (https://epubs.siam.org/doi/10.1137/0220043)
Title: Re: Neat algorithms
Post by: Nominal Animal on July 26, 2019, 02:32:51 pm
I haven't understood where it's useful.

Let's say you have xi in ascending order, i = 1 .. 2^N - 1 as above, in a list or a tree or a heap, and you want to construct a perfect binary search tree out of them.  (One example is when you fully balance a binary search tree.)  Then, k(i,N) gives you the index in an array-based binary tree where element xi is located.  It turns out that even though the formula for k(i,N) looks ugly, current processors can evaluate it faster than the amortized average latency of random memory accesses.  So, it is basically as cheap as a scatter-gather copy of the tree; you get the binary search tree construction for free.

Analogously, if you do have a perfect binary tree stored in an array, and you want to access the ith element, it is at index k(i,N).  Or, if you have an element at index k, it is the i(k,N)'th in the tree; no need to backtrack to the root or anything like that.

In my case, I wanted to use something I call chunked binary search trees (although I think mathematicians use partitioned instead).  The idea is that because our computers (both caches and storage media) store data in chunks (pages or segments), to efficiently locate sorted data, we store the most often used elements in the very first chunk.  Consider a binary search among N elements: every search checks the element at N/2. On average, almost half the searches check the element at 3N/4, and almost half the element at N/4.  We store these in the very first tree, so that all searches use the very first tree; almost all searches use either the second or third tree; almost all searches use the fourth, fifth, sixth, or seventh tree, and so on.  See how the probability of access is 2^(1-L), where L is the level of the subtree?  We end up with a binary search tree of binary search trees, and the level of the tree directly reflects the probability of that particular tree being needed for a uniformly random search.  Each tree is up to one chunk in size, and they are stored in the "storage order". (The contents of each tree can be in linear order, "in-order".)

With this scheme, the number of chunks needed to find the element with the desired value is minimized.  It is perfect for read-only memory-mapping large data sets, as you can let the kernel manage which pages of the mapping are in memory, and which will be read from storage when next needed; it will respond dynamically to memory pressure, so you'll get the maximum benefit of available memory with automatic resident set shrinking when other tasks need the RAM.

While the formulae for i(k,N) and k(i,N) look ugly, they are trivial for current (desktop and server) processors; they contain hardware instructions for the operations, and can evaluate the expressions in just a few cycles.  The main problem with this scheme is that I have not yet found an efficient enough way to describe it to other programmers; and the C code that implements it, tends to utterly confound and exasperate anyone who does not understand the underlying idea first.
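
The plain, un-chunked version of the layout is easier to show than the chunked one. Here is a small sketch (an illustration only, not the confounding C code mentioned above) of searching a storage-order array directly -- 1-based, so the children of index k are 2k and 2k+1:

Code: [Select]
#include <stdio.h>

/* a[1..n] holds sorted values laid out in storage order (children of k are 2k and 2k+1).
   Returns the storage index of the first element >= target, or 0 if there is none. */
static unsigned storage_order_lower_bound(const int *a, unsigned n, int target)
{
    unsigned k = 1;
    while (k <= n)
        k = 2*k + (a[k] < target);      /* go right whenever a[k] is too small */
    k >>= __builtin_ctz(~k) + 1;        /* cancel the trailing right-turns */
    return k;
}

int main(void)
{
    /* sorted values 10..70 stored in storage order for a 7-node tree; a[0] unused */
    const int a[8] = { 0, 40, 20, 60, 10, 30, 50, 70 };
    unsigned k = storage_order_lower_bound(a, 7, 35);
    printf("first value >= 35 is a[%u] = %d\n", k, a[k]);   /* a[1] = 40 */
    return 0;
}

Feed the resulting k through i(k,N) if you also want the element's rank.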
Title: Re: Neat algorithms
Post by: legacy on July 26, 2019, 05:07:29 pm
The main problem with this scheme is that I have not yet found an efficient enough way to describe it to other programmers; and the C code that implements it, tends to utterly confound and exasperate anyone who does not understand the underlying idea first.

Yep. I have recently implemented a B-tree toy-filesystem where everything that the filesystem stores is organized in B-trees.
- free inodes? handled by a B-tree
- entry-to-inode? handled by a B-tree
- etc

But I know how to do it, how to write it, even if I used them simply because I read a book about filesystems where XFS was described as the "original B-tree filesystem".

Now, for the arise-v2 softcore project, I have the need to "map" a ROM address into the corresponding line in assembly code that the "code flow" shows on the screen. There is a hardware debugger (project Otaku) inside the softcore; sort of telemetric thing that at the end of each "code-fetch" sends the CPU's context to the host (so I can verify that the HDL code is working as expected). Lines in the assembly source file are not synchronized to the lines of machine code, due to comments in the source file, due to included files, due to macros, and other stuff. Oh, and I also need to do vice versa: I need to map a machine address exported by the debugger into a line of one of the n files that compose a project, and for both these two purposes, I am thinking about recycling the same b-tree algorithm used for the toy filesystem.

That's why I was surprised to see another binary-tree scheme.
Title: Re: Neat algorithms
Post by: brucehoult on July 26, 2019, 06:47:24 pm
But I know how to do it, how to write it, even if I used them simply because I read a book about filesystems where XFS was described as the "original B-tree filesystem".

1993?

Apple's HFS, for example, was just a little earlier in 1985.
Title: Re: Neat algorithms
Post by: Nominal Animal on July 26, 2019, 07:16:06 pm
I also need to do vice versa: I need to map a machine address exported by the debugger into a line of one of the n files that compose a project, and for both these two purposes, I am thinking about recycling the same b-tree algorithm used for the toy filesystem.
Yup, you essentially have an ordered 1:1 map between machine addresses and source lines, and given one, need to match the other, with the twist that if the key is not listed, then use the nearest previous key instead.  There are many ways to implement it, but algorithmic reuse makes sense, as it makes the project easier to maintain, and a few percent speed difference between implementations is utterly insignificant in the grand scheme of things.
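
A minimal sketch of that nearest-previous-key lookup over a plain sorted array (illustration only; the names here are made up, and a B-tree-backed version would of course look different):

Code: [Select]
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct addr_line { uint32_t addr; uint32_t line; };

/* Return the entry with the largest addr <= key, or NULL if key precedes all entries.
   'map' must be sorted by addr in ascending order. */
static const struct addr_line *lookup(const struct addr_line *map, size_t n, uint32_t key)
{
    size_t lo = 0, hi = n;               /* binary search over the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].addr <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo ? &map[lo - 1] : NULL;     /* lo = number of entries with addr <= key */
}

int main(void)
{
    static const struct addr_line map[] = { { 0x000, 1 }, { 0x010, 5 }, { 0x024, 9 } };
    const struct addr_line *e = lookup(map, 3, 0x020);
    if (e)
        printf("0x020 -> line %u\n", (unsigned)e->line);   /* nearest previous key 0x010 -> line 5 */
    return 0;
}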

That's why I was surprised to see another binary-tree scheme.
Oh, that binary search tree stuff is for LARGE datasets.  As in won't fit in RAM.  With sparse keys, too, say a 2^128 or larger keyspace.  The reason to not use b-trees is when the dataset has a simple binary partitioning function, but no d-way partitioning function, nor any single-dimensional order.  For example, if your data are k-dimensional points, you can use a hyperplane to split the leftover set in two in any direction.  If the dataset is one-dimensional, then a b-tree is more efficient (and you typically only need a small number of levels, maybe five to eight at most, even for huuuuge datasets, considering the currently-used page sizes, given typical key sizes).

(Rudolf Bayer (https://en.wikipedia.org/wiki/Rudolf_Bayer) and Edward M. McCreight (https://en.wikipedia.org/wiki/Edward_M._McCreight) invented B-trees (https://en.wikipedia.org/wiki/B-trees) back in 1971.)
Title: Re: Neat algorithms
Post by: chickenHeadKnob on July 26, 2019, 07:21:43 pm
But I know how to do it, how to write it, even if I used them simply because I read a book about filesystems where XFS was described as the "original B-tree filesystem".

1993?

Apple's HFS, for example, was just a little earlier in 1985.

The B-tree concept dates back to the late 60's. IBM mainframes had VSAM in the 70's, with VSAM being sort of a b-tree with complications. Nothing new under the sun, you young whipper-snapper-shens
Title: Re: Neat algorithms
Post by: brucehoult on July 26, 2019, 07:28:03 pm
B-trees have been used for a long time within files, yes. However on most systems the disk blocks within a file were essentially a linked list. One of the big differences with HFS was that the list of disk blocks for every file on the disk was kept in one single B-tree for the disk, with {inode number, block number} as the key.

*not actually inode on a Mac, but similar
Title: Re: Neat algorithms
Post by: legacy on July 26, 2019, 09:11:39 pm
say 2^128 or larger keyspace

yup, the toy-filesystem has been implemented in the ROM of a Coldfire-v1 board that handles a 4Gbyte parallel/ATA hard drive. It was too small for today's Linux needs, so I was really tempted to throw it into the trash can, but then I decided to reuse it for a hobby project. The hard drive is old but modern enough to let you set a data block size of 1Kbyte, so 4Gbyte/1Kbyte is approximately the order of magnitude of the maximum number of physical entries handled by the b-tree.

With modern hard drives (>= 1 Terabyte) it's a very large keyspace  :D
Title: Re: Neat algorithms
Post by: Nominal Animal on July 27, 2019, 12:19:10 am
Filesystems use B-trees (or variants) as opposed to binary search trees, because the data is essentially one-dimensional; the entries are naturally ordered, so there is no need or benefit in binary splitting.  B-trees just fit the use case much better.

HFS did have data and resource forks, but they were in different extents, referred to from the same catalog file b-tree entry.

Technically, we could have multidimensional files, or files composed of allocation block sized nodes in a directed graph or a tree... but since we can emulate those trivially using one-dimensional data (especially nicely on filesystems that support sparse files), it looks like one-dimensional data storage models suffice.  For us humans, at least.
Title: Re: Neat algorithms
Post by: legacy on July 27, 2019, 11:11:43 am
What I found difficult was the implementation. B-trees are usually implemented in a way that allocates leaves and nodes in RAM, while in my case the structures are entirely allocated on disk, so it's a b-tree that reads a 1Kbyte disk block which contains both leaves and nodes.

The original reason for this is that my Coldfire board had only 2Mbyte of ram, soldered on the motherboard and not expandable.
Title: Re: Neat algorithms
Post by: hamster_nz on July 28, 2019, 10:55:51 am
Been reading & playing with the Hilbert Transform. It takes real-only samples and 'magically' makes them into complex samples. For example, if you feed in a cosine wave, the transform should (and does) give you back the matching sine wave for the quadrature part.

Lots of interesting thoughts, like how the FIR filter has the same coefficients as the spectrum of a square wave, because the frequency response needs to look like a square wave: negative frequencies get moved -90 degrees, positive frequencies get moved +90 degrees.

Code: [Select]
#include <math.h>
#include <stdio.h>

#define SAMPLES 1000
#define HALF_WIDTH 11  /* e.g. 11 filters from -11 to 11 */

float x[SAMPLES];

int main(int argc, char *argv[]) {
  int i;
  /* Build some test data */
  for(i = 0; i < SAMPLES; i++) {
     x[i] = cos(2*M_PI*i/10.3);
  }

  /* Now apply the Hilbert Transform and see what you get */
  /* It should be close to sin(2*M_PI*i/10.3) */
  for(i = HALF_WIDTH; i < SAMPLES-HALF_WIDTH-1; i++) {
    double h = 0;
    int j;

    /* Apply the kernel */
    for(j = 1; j <= HALF_WIDTH; j+=2)
      h += (x[i-j]-x[i+j]) * 2.0/(j*M_PI);

    /* Print result */
    printf("%8.5f, %8.5f\n", x[i], h);
  }
}
Title: Re: Neat algorithms
Post by: Sam Hobbs on August 01, 2019, 05:56:31 pm
I am not sure what the question is. Perhaps the Dictionary of Algorithms and Data Structures (https://xlinux.nist.gov/dads)  is relevant.
Title: Re: Neat algorithms
Post by: nigelwright7557 on August 04, 2019, 05:33:45 am
One of my jobs is designing USB scopes.
I wrote most of the code myself.
However I wasn't an expert on FFT.
So I searched the WWW for an algorithm.
I downloaded about 3 that I plugged into my own C# code but none worked very well.
They seemed to be very noisy even with a nice sine wave from my sig gen.
It was a struggle but I eventually found one that worked well.
Code: [Select]
        /*
           This computes an in-place complex-to-complex FFT
           x and y are the real and imaginary arrays of 2^m points.
           dir =  1 gives forward transform
           dir = -1 gives reverse transform
        */
        int FFT(int dir, long m)
        {
            long n, i, i1, j, k, i2, l, l1, l2;
            double c1, c2, tx, ty, t1, t2, u1, u2, z;

            /* Calculate the number of points */
            n = 1;
            for (i = 0; i < m; i++)
                n *= 2;

            /* Do the bit reversal */
            i2 = n >> 1;
            j = 0;
            for (i = 0; i < n - 1; i++)
            {
                if (i < j)
                {
                    tx = x[i];
                    ty = y[i];
                    x[i] = x[j];
                    y[i] = y[j];
                    x[j] = tx;
                    y[j] = ty;
                }
                k = i2;
                while (k <= j)
                {
                    j -= k;
                    k >>= 1;
                }
                j += k;
            }

            /* Compute the FFT */
            c1 = -1.0;
            c2 = 0.0;
            l2 = 1;
            for (l = 0; l < m; l++)
            {
                l1 = l2;
                l2 <<= 1;
                u1 = 1.0;
                u2 = 0.0;
                for (j = 0; j < l1; j++)
                {
                    for (i = j; i < n; i += l2)
                    {
                        i1 = i + l1;
                        t1 = u1 * x[i1] - u2 * y[i1];
                        t2 = u1 * y[i1] + u2 * x[i1];
                        x[i1] = x[i] - t1;
                        y[i1] = y[i] - t2;
                        x[i] += t1;
                        y[i] += t2;
                    }
                    z = u1 * c1 - u2 * c2;
                    u2 = u1 * c2 + u2 * c1;
                    u1 = z;
                }
                c2 = Math.Sqrt((1.0 - c1) / 2.0);
                if (dir == 1)
                    c2 = -c2;
                c1 = Math.Sqrt((1.0 + c1) / 2.0);
            }

            /* Scaling for forward transform */
            if (dir == 1)
            {
                for (i = 0; i < n; i++)
                {
                    x[i] /= n;
                    y[i] /= n;
                }
            }

            return (1);
        }

Title: Re: Neat algorithms
Post by: westfw on August 04, 2019, 07:13:32 am
Quote
I eventually found one
Where?  Written by whom?  What's the license?
Title: Re: Neat algorithms
Post by: nigelwright7557 on August 04, 2019, 07:10:18 pm
Quote
I eventually found one
Where?  Written by whom?  What's the license?
Can't remember which site I got it off or who wrote it.

Title: Re: Neat algorithms
Post by: brucehoult on August 05, 2019, 12:23:54 am
Quote
I eventually found one
Where?  Written by whom?  What's the license?

30 seconds with google finds:

http://paulbourke.net/miscellaneous/dft/ (http://paulbourke.net/miscellaneous/dft/)

No license information on that page, but one level up http://paulbourke.net/miscellaneous (http://paulbourke.net/miscellaneous) says right at the bottom:

Quote
The contents of this web site are © Copyright Paul Bourke or a third party contributor where indicated. You may print or save an electronic copy of parts of this web site for your own personal use. Permission must be sought for any other use. Any source code found here may be freely used provided credits are given to the author. Purchase of credit free licenses of material found on this site can be negotiated with the author. The author can also quote for unique variations and/or higher resolution versions of the images found on this site.
Title: Re: Neat algorithms
Post by: Nominal Animal on August 05, 2019, 06:41:42 am
Shows you the "care" typical proprietary developers have with copyrights...  :palm:
Title: Re: Neat algorithms
Post by: nigelwright7557 on August 06, 2019, 09:41:35 am
Shows you the "care" typical proprietary developers have with copyrights...  :palm:
Open source mate !
https://csharp.hotexamples.com/examples/-/FFT/-/php-fft-class-examples.html

Title: Re: Neat algorithms
Post by: Nominal Animal on August 06, 2019, 10:47:45 am
Shows you the "care" typical proprietary developers have with copyrights...  :palm:
Open source mate !
https://csharp.hotexamples.com/examples/-/FFT/-/php-fft-class-examples.html
So... because somebody put the code on a web page, you have the right to use it in your projects?  No, that's not how copyright works, mate.

Let me clarify my position.  Many proprietary software developers believe that open source is anticapitalistic, communistic, or otherwise against competition and horrible for everyone involved, and that this gives them the right to ignore copyright licenses.  I disagree.  I do both open source and proprietary development, and actually had to hire a lawyer in 2000 or so to clarify various licenses to me -- I have this irrational internal drive to do things right, even if nobody else cares -- and I now claim to have a pretty good picture of both the legal implications in various jurisdictions, as well as their suitability for various developer intentions.  Thing is, open source is based on competition; you can describe how it works in terms of game theory.  It is just that if you are fixated on a single business model, you refuse to consider any other models, and cannot see how one can benefit (as a developer) from open source licensing.   However, laws apply whether one understands them or not, and that includes copyright laws.

Even more than unscrupulous businesses (like Qualcomm and Cisco, who seem to rely on their size of their legal department to not to have to follow laws, especially copyright laws wrt. open source projects, especially Linux), I hate incompetent copy-paste programmers, who not only get paid to create shit that will stain others, but happily believe they are doing something positive in the world.  They should be summarily shot.

(Not with a firearm, though.  With a cluebat.  Everyone can learn, except the rare ones who refuse.  Now those, those can be shot with a firearm for all I care.)
Title: Re: Neat algorithms
Post by: westfw on August 07, 2019, 01:27:05 am
Quote
Any source code found here may be freely used provided credits are given to the author.
I had suspected that that part had not been noticed, or followed...  A *LOT* of web-published SW contains such provisions.  As a hobbyist, giving credit where it is due is the least you can do.  As a profession, it's really important, legally, to keep track of the provenance of any found software that you use.  You probably need a lawyer or legal team to say "for sure" whether your use of OSSW will change the terms under which you can make money with your final product, but you should AT LEAST be able to say "I found this code at xxx site, read the license, and I think it's ok for me to use it in my product..."
This gets a bit fuzzier if we're talking about those 4-line "neat algorithms" with an obvious implementation.  I wouldn't worry about that "bit counting using shifts, ands, and adds", for example.  But if you have a piece of code as large as an FFT subroutine...   Pay more attention!
Title: Re: Neat algorithms
Post by: brucehoult on August 07, 2019, 01:40:22 am
Quote
Any source code found here may be freely used provided credits are given to the author.
I had suspected that that part had not been noticed, or followed...  A *LOT* of web-published SW contains such provisions.  As a hobbyist, giving credit where it is due is the least you can do.  As a profession, it's really important, legally, to keep track of the provenance of any found software that you use.  You probably need a lawyer or legal team to say "for sure" whether your use of OSSW will change the terms under which you can make money with your final product, but you should AT LEAST be able to say "I found this code at xxx site, read the license, and I think it's ok for me to use it in my product..."
This gets a bit fuzzier if we're talking about those 4-line "neat algorithms" with an obvious implementation.  I wouldn't worry about that "bit counting using shifts, ands, and adds", for example.  But if you have a piece of code as large as an FFT subroutine...   Pay more attention!

I'm feeling like many people didn't notice that even when I quoted the paragraph containing it here.

It's not exactly onerous terms to adhere to.
Title: Re: Neat algorithms
Post by: legacy on August 11, 2019, 10:06:31 am
Quote
Any source code found here may be freely used provided credits are given to the author.
I had suspected that that part had not been noticed, or followed...  A *LOT* of web-published SW contains such provisions.  As a hobbyist, giving credit where it is due is the least you can do.  As a profession, it's really important, legally, to keep track of the provenance of any found software that you use.  You probably need a lawyer or legal team to say "for sure" whether your use of OSSW will change the terms under which you can make money with your final product, but you should AT LEAST be able to say "I found this code at xxx site, read the license, and I think it's ok for me to use it in my product..."

you definitely need a lawyer or legal team.
Title: Re: Neat algorithms
Post by: NorthGuy on August 11, 2019, 01:03:32 pm
Thing is, open source is based on competition

Open source certainly participates in competition, and often wins because of its low price, even though better products may be available. Then, once it gets widespread, it is glorified by the mob, so that people believe it is actually the best.

A good example is FreeRTOS, which is fairly mediocre compared to other RTOSes which were present on the market several years ago. By now, people believe FreeRTOS is the best, and the market for RTOSes has been destroyed entirely.

Meanwhile, FreeRTOS itself has been bought and re-purposed by Amazon. I'm sure Amazon could have bought one of the better RTOSes, as the companies making RTOSes are certainly not in good shape and would definitely sell. But no. Amazon sees more value in popularity than in quality.
Title: Re: Neat algorithms
Post by: Nominal Animal on August 11, 2019, 01:50:15 pm
Then, once getting widespread, it is glorified by the mob, so that people believe it is actually the best.
Human mobs are irrational, absolutely true.

Open source has survived, and would survive, fine without mobs, but take out the competitive aspect and it dies from bitrot.  Competition keeps FOSS alive.

A good example is FreeRTOS, which is fairly mediocre compared to other RTOSes which were present on the market several years ago. By now, people believe FreeRTOS is the best, and market for RTOSes is destroyed entirely.
Don't blame FreeRTOS, blame humans who are satisfied with mediocre features and performance.

I have exactly the same gripe about the fundamental structure in molecular dynamics simulators (LAMMPS, Gromacs, etc.).  Their core design reflects computer architectures of the 1980s.  Everything else, from MP (multiprocessing) to MPI (distributed computing) to GPGPU (GPU computation) is just bolted-on.  Aggregation, not design.  But since it works, after a fashion, everyone is satisfied. (Except myself, it seems.)

I just believe in freedom of choice, too; that I do not have the right to force others to sanity.



The reason I do not blame open source, or even think of ways to "fix" open source to counteract the mediocrity, is that this has happened and keeps happening all over human behaviour.  For example, a hundred and twenty years ago, electric cars were ubiquitous in New York.  Compared to now, we have better materials (lighter structures, better electric motor performance, and better battery capacity), but it is all incremental development.  What happened?

Roman legions had battlefield medics with capabilities (including amputation) and survival rates that were only re-reached in the first world war.  What happened in the nearly two millennia in between?

We still marvel at megalithic ruins, especially the ones with multi-tonne boulders and basically seamless joins.  We say we could do that now, if we wanted, but we don't.  We don't even know why they were built, and therefore claim they were built for religious reasons.  How does that actually make sense?

It boils down to human mobs being really shitty; the majority of humanity only looking for relative comfort and ease.  Bread and circuses.  It is a small minority that actually pushes technologies and advancements forward.  Among that minority, open source works just fine.  Making it work with the majority of humans is much more difficult; in my case, I've found that mixing proprietary bits to essentially stop the stupid/annoying exploiter ones, but not hobble the true developer minority, works best.  It does not reflect negatively on open source per se, it's just that we have to work in the imperfect real world.
Title: Re: Neat algorithms
Post by: NorthGuy on August 11, 2019, 03:37:04 pm
Don't blame FreeRTOS, blame humans who are satisfied with mediocre features and performance.

You cannot really blame people for preferring something free to something expensive.

Look at the car market. It's now doing relatively well - there are a number of manufacturers that try hard to make their cars better, so they're getting better and less expensive.

Now imagine a new car is introduced, which is free - you don't buy it, you just download it and drive. Even if it wasn't as good as paid cars, many people would prefer it. Soon, you'd have lots of people driving free cars. How wonderful.

But this would destroy the mechanism which forces the car manufacturers to produce better and cheaper cars. So, they wouldn't. Many would go out of business. Ferrari would survive, but Ford wouldn't. Soon after, you would only have mediocre free cars and nothing else.

Capitalism is a mechanism which forces people to work. People work, produce things. Things get consumed. People are happy. If the mechanism is destroyed (as in Soviet Union, or in software market) people stop producing and everything goes down the drain.
Title: Re: Neat algorithms
Post by: coppice on August 11, 2019, 05:36:37 pm
Thing is, open source is based on competition

Open source certainly participate in competition, and often wins because of low price, despite better products may be available. Then, once getting widespread, it is glorified by the mob, so that people believe it is actually the best.

A good example is FreeRTOS, which is fairly mediocre compared to other RTOSes which were present on the market several years ago. By now, people believe FreeRTOS is the best, and market for RTOSes is destroyed entirely.

Meanwhile, the FreeRTOS itself is bought and re-purposed by Amazon. I'm sure Amazon could buy one of better RTOSes as the companies making RTOSes are certainly not in good shape and would definitely sell. But no. Amazon sees more value in popularity than in quality.
Many otherwise good tools in the embedded space - compilers, RTOSes, GUI libraries, etc. - have become a huge liability for people by suddenly disappearing from the general market when one player (e.g. a silicon vendor) bought them. I think the RTOS area had reached the tipping point where people have lost trust in things that someone else controls their use of. Of course, the price of FreeRTOS doesn't hurt. :)
Title: Re: Neat algorithms
Post by: Nominal Animal on August 11, 2019, 05:41:27 pm
You cannot really blame people for preferring something free to something expensive.
I can, if there is a significant difference in quality.  Here, we are talking about tools, whose quality will immediately affect the quality of the result.

I obviously understand the reason why hobbyists prefer free over nonfree; I'm a hobbyist myself.  But, when doing a paid work task, I do pay for proper tools.

But this would destroy the mechanism which forces the car manufacturers to produce better and cheaper cars.
The current mechanism which forces the car manufacturers to produce better and cheaper cars.

Fact is, all industries have gone through a fundamental revolution changing business models every couple of generations for a couple of centuries now.  The answer is not to freeze the business models, but to develop them so that they match the market, and the overall market drives what the cultures involved want and need.

Let's say that what you outlined actually happened, and suddenly cars were a commodity instead of a luxury item.  How does that differ from how internet came to be?
Would you have stopped the development of the internet just to protect the business model of telecom and cable TV companies?

The fast and hard fact is that companies are not human, and are not worth protecting.  It is the market, the competition, that keeps businesses healthy, and that is what we (whoever has the executive power) must enforce and direct so that overall, the market does what we want.

A major problem in current software choices is that ever since Microsoft gained dominance, humans have accepted that software is unreliable; that that is the nature of software.  There are a few niches (like medical devices) that are supposed to have different values, but experience (their failure rates and typical bugs) shows that is simply not true; they're just as crap as every other piece of software.  That attitude, that software is fundamentally unreliable, is pervasive throughout human computer use: we're basically programmed for that now.  Yet, Microsoft's products are nonfree, and cost money, even though better free alternatives exist.

More importantly, the shift even in free software is from reliability and robustness to ... I dunno, surface bling? marketing? catering to the lowest common denominator? Whatever the heck systemd is; all I know is that it is neither KISS nor UNIXy.  Fewer and fewer people are interested in OpenBSD or FreeBSD.  Crappy and buggy Linux distributions are most popular; they don't need to be robust or reliable anymore.  Hell, they've brought back the "you need to reboot your computer now" idiom!

If the mechanism is destroyed (as in Soviet Union, or in software market) people stop producing and everything goes down the drain.
No, that happened in Soviet Union because competition was eradicated.

In the software market, competition is weak because people accept crap from both commercial and free software.  There is no demand for anything better.  You claim it is because no-cost software saturated the market; I claim it is because people are easily satisfied with bread and circuses, even when something better is available.

Here in Finland, it is completely accepted that multi-million IT projects often simply "fail", producing no results.  They are blamed on some of the lower-level workers, the bosses get their bonuses and switch to new projects without any harm to their reputation.  This is accepted practice, and is also the reason why IT product quality in Finland has gone to crap.



Let me go back a couple of decades, to when I first started running Linux-based mail and file servers for a small department in a Finnish university.

This was the era of Sendmail, a buggy insecure piece of shit that everyone was using, because it was what everyone was using.  Me, I've always had a paranoid streak, so after testing a few different options (with Exim, procmail, and qmail being the top three I chose from), I settled on qmail, because of its security approach.

At every single turn, I had to explain my choice to people who could not themselves explain why they used Sendmail and not something else.

The biggest issue I ever had with it, was that it was too fast in processing internal mailing lists, confusing the main University Sendmail installation; limiting the rate at the Sendmail end fixed that.  After a couple of years, even the main Uni server admins came to accept it as a reliable, although "quirky" choice.

The exact same situation was, and remains, with Bind.  Fed up with Bind security issues, I switched to djbdns (for a local NAT).

Fast forward to today.

Exim has the same architecture as Sendmail, and although it is much better security-wise, it is Swiss cheese compared to qmail.  Most people still use Bind, and simply live with the frequent security issues.  qmail is a tiny niche.  Reliability- and efficiency-wise we have gone backwards, not forwards.  "Patch Tuesday" is a fact of life, and not a joke anymore.

What happened, was that combined with the increasing memory, bandwidth, and processing power, we reached the satisfy-the-humans level with crappy software.  It does not matter whether it is paid or no-cost, free or proprietary.

So yeah, I definitely blame us humans for having such low expectations and being so shortsighted.  Stupid, really.



Many otherwise good tools in the embedded space - compilers, RTOSes, GUI libraries, etc. - have become a huge liability for people by suddenly disappearing from the general market when one player (e.g. a silicon vendor) bought them.

I know nothing of the RTOS market, but single-vendor products freak the shit out of me.  I've been burned too many times.  I don't trust them at all.
Title: Re: Neat algorithms
Post by: T3sl4co1l on August 11, 2019, 07:46:20 pm
A major problem in current software choices is that ever since Microsoft gained dominance, humans have accepted that software is unreliable; that that is the nature of software.  There are a few niches (like medical devices) that are supposed to have different values, but experience (their failure rates and typical bugs) shows that is simply not true; they're just as crap as every other piece of software.  That attitude, that software is fundamentally unreliable, is pervasive throughout human computer use: we're basically programmed for that now.  Yet, Microsoft's products are nonfree, and cost money, even though better free alternatives exist.

I like your systematic perspective; which leads me to question why Microsoft -- an agent, not a market -- deserves specific blame?

I would argue that:
1. Obviously, smaller software products have fewer bugs to begin with.
2. Nostalgia bias masks how buggy early software was.  (OS bugs have been around since day one; examples come to mind of the first viruses, and other exploits on mainframe and time share systems.  Many of which arguably were a conceptual failure: trusting that the users weren't trying to game the system.  It was a tenable assumption for the smaller labs or companies of the time that enforced user accountability.)
3. There were fewer users to find bugs.  A "big hit" might've had 10k's of sales in the 70s or early 80s.
4. Microsoft rode the wave of technology through the 80s and 90s; they are certainly exemplar of large-scale corporate software production, but they aren't to blame for the wave itself.
5. And with that scale, the first three points get reversed, plus a necessary compromise between quality, development time and development cost.  On the upside, you get a considerable reduction in unit cost, keeping products accessible to a wide user base, despite the rising development costs.  (And, as it turns out, this can go all the way to zero, making FOSS a thing.)

Cheers!
Tim
Title: Re: Neat algorithms
Post by: SiliconWizard on August 11, 2019, 08:12:01 pm
Thing is, open source is based on competition

Open source certainly participate in competition, and often wins because of low price, despite better products may be available.

Competition normally leads to better products, or cheaper, or both.
You can't beat free though. Once a given product is, or has become free, competition is dead. The game is over. Sure you can always sell BETTER products, but you'll never beat the infinite performance/cost ratio.

Of course and fortunately, that's not entirely true. Direct costs are only part of the story. The total cost is what really matters. For a given commercial product, it may cost a lot less overall to select a good commercial RTOS rather than FreeRTOS. Problem is, too many managers still don't realize this.
Title: Re: Neat algorithms
Post by: Nominal Animal on August 11, 2019, 08:54:38 pm
why Microsoft -- an agent, not a market -- deserves specific blame?
My opinion is biased by my own experiences, obviously.

My first foray into computing was on a Commodore C64 in the late 1980s, then a C128, then a Hyundai 80286 PC clone.  All of these had rather stable software, with the exception of tape issues on the C64 and C128 (I never had the disk drive for those), and that I never got SEUCK (Shoot-Em Up Construction Kit) to work on my C128, even in C64 mode.  My first real Unix use was on DEC Alphas in 1994.  In 1998, I maintained a couple of classrooms full of Macs (most running Mac OS 7.5, with a full suite of Adobe Photoshop, Pagemaker, and Macromedia Freehand, Director, and so on), a dozen or so Windows PCs, and a couple of Linux servers.  The Macs were the easiest to manage, as I cloned them off master templates, even though they were in use by students (without personal logins, so having access to internals; which necessitated reinstalls rather often).

Before 1998, my experience was that aside from Windows stuff, computers and applications tended to work quite reliably, unless you ran out of memory.  (Mac OS was especially nice, as it allowed one to easily control the amount of memory an application was allowed.)

From 1992 onwards, I had been using Windows (3.1, then 95, then 98, and a bit of NT "server" stuff), and it never felt stable to me.  Comparing to anything else I used, it was the rickety unstable one.

In 1996-1998, when I started using Linux, it surprised me how some software was so much more reliable and stable than others; and that it was not a question of complexity, but of the people maintaining them.  I recall the times when Red Hat pushed a different compiler for the kernel than the userspace.. and not fondly.
It became obvious to me that there were two camps of people: one pushed product out, and let users deal with the bugs that may or may not be fixed later.  The other, like Adobe and Macromedia, provided software that actually worked.  Interoperability was always a problem (up to crashing programs when trying to open files created with a competing program), but this was the era of the Wordperfect - Word wars, and Microsoft was mired up to its CEO in that as well.  Before that, one could expect interoperability to become better, not worse.

The final straw that broke my back wrt. Microsoft was how they "fixed" a bug in Mac version of Microsoft Office.  The bug was that after installing Office, the computer would refuse to shut down.  (Obviously, some MS component was stalling the shutdown process.)  The "fix" was an additional system module, that constantly consumed a megabyte of memory in an era where 32 megabytes was lots.  The fuckers couldn't even fix their own code, and instead added a constantly running system module to coddle the piece of shit to allow shutting down the computer!

I myself never switched to Windows 2000 or ME or later versions, as I switched to Linux instead.  Early Mac OS X versions were interesting, and I did use some Macs sporadically since then, but I felt Linux fit my uses much better.  By 2005, I was using Linux exclusively.

So, to summarize, in my experience shitty code comes from Microsoft and people who believe Microsoft's tactics and business methods are acceptable;
and from people who simply don't/didn't see the need for software to be reliable and robust, like Eric Allman or Paul Vixie.

In the last twenty years or so, I've seen shitty code become more and more ubiquitous, and robust and reliable code rarer and rarer.
Is it just me?  Could be, but I don't think so.

The total cost is what really matters. For a given commercial product, it may cost a lot less overall to select a good commercial RTOS rather than FreeRTOS. Problem is, too many managers still don't realize this.
So true.  With FOSS, the immediate price may be zero, but there are lots of other costs; the most important thing being that being a "FOSS customer" means absolutely nothing, unlike when you have a commercial relationship.  As a FOSS user/"customer", you have zero leverage on the developers, unless you spend money, time, or other resources.  It definitely is not zero-cost.
Title: Re: Neat algorithms
Post by: NorthGuy on August 11, 2019, 09:17:14 pm
Fact is, all industries have gone through a fundamental revolution changing business models every couple of generations for a couple of centuries now.

Capitalism is not a market model. It is a fundamental mechanism which guides the economy. Businesses which produce something useful receive money (which is a measure of their contribution to society, if you will), then they can use the money to produce more. Such businesses get progressively bigger and bigger. Businesses which produce something nobody wants lose money and die. The death is key, as it frees resources for successful businesses. This mechanism forces everyone to do something useful.

When the mechanism is impeded it works worse. It is even possible to destroy it completely (as in Soviet Union).

With software, capitalism worked great at the beginning - there were lots of businesses of various sizes which produced commercial software.  But then free software came to fruition, the mechanism meticulously eliminated all the non-free businesses, and there's nothing left (setting aside games, accounting and a few other niches which haven't been penetrated by free software yet). Since capitalism cannot work with things lacking monetary value, there's no software market at all. In such conditions, people simply have to accept whatever they're given, because there's no choice. I don't think this has any relation to Microsoft.

As to the mail servers. I used Sendmail for a while. Then I wrote my own mail server for my own use. Long ago. It works great. But only for me. If I decided to sell it, nobody would buy it because there's plenty of free mail servers everywhere - no market for commercial mail servers. So, you're stuck with Exim, whether you accept it or not ;)
Title: Re: Neat algorithms
Post by: NorthGuy on August 11, 2019, 09:24:54 pm
In the last twenty years or so, I've seen shitty code become more and more ubiquitous, and robust and reliable code rarer and rarer.
Is it just me?  Could be, but I don't think so.

That has been my experience too. However, I attribute this to the way people write software now. Programmers of older days are all but extinct.
Title: Re: Neat algorithms
Post by: hamster_nz on August 11, 2019, 11:48:08 pm
In the last twenty years or so, I've seen shitty code become more and more ubiquitous, and robust and reliable code rarer and rarer.
Is it just me?  Could be, but I don't think so.

That has been my experience too. However, I attribute this to the way people write software now. Programmers of older days are all but extinct.

Where software needs to be reliable, and people pay for it to be reliable, then it is reliable. I've been running complex equipment in production for 5+ years, 24x7, without any downtime.

I've also had healthcare customers say "availability of our application is critical to us, but I won't pay the $50k needed for licensing a shared-nothing clustered SQL database". I suspect they apply the same process to paying for software QA engineers.
Title: Re: Neat algorithms
Post by: Nominal Animal on August 12, 2019, 12:09:55 am
Fact is, all industries have gone through a fundamental revolution changing business models every couple of generations for a couple of centuries now.
Capitalism is not a market model. It is a fundamental mechanism which guides the economy.
No argument there.  However, you seem to assert that there is only one software business model in capitalism, and I'm saying that due to changes in technology (various costs), business models must change and adapt in capitalistic systems to survive.  I base that on history, and on the fact that long-lived tech companies have changed their business tactics every couple of generations.

There are many more ways to do business in software than package them for sale on store shelves.

Where software needs to be reliable, and people pay for it to be reliable, then it is reliable. I've been running complex equipment in production for 5+ years, 24x7, without any downtime.
Are you sure you haven't simply paid for enough hardware to cover the cases where the software shits on you, without it being visible to your end users?
Because that is the only way I know how to get arbitrary stuff done right now without writing, or configuring and fixing, the software myself.
Title: Re: Neat algorithms
Post by: hamster_nz on August 12, 2019, 12:55:43 am
Quote from: Nominal Animal
Are you sure you haven't simply paid for enough hardware to cover the cases where the software shits on you, without it being visible to your end users?
Because that is the only way I know how to get arbitrary stuff done right now without writing, or configuring and fixing, the software myself.

As the person who fixes the broken stuff (or escorts vendor engineers in to fix broken stuff) I am sure that given enough time all hardware shits on you... but no amount of perfect hardware can make up for unreliable software.

But software & systems can make up for the failings of hardware - and yes, that usually involves redundancy. But you need the redundancy to do maintenance anyway....
Title: Re: Neat algorithms
Post by: NorthGuy on August 12, 2019, 02:58:04 am
There are many more ways to do business in software than package them for sale on store shelves.

Sure. Such as "software as a service". Microsoft doesn't really sell software. They simply collect money. When I buy laptop and install Linux on it, Microsoft get paid. You wouldn't say that they sold software to me, would you?
Title: Re: Neat algorithms
Post by: BravoV on August 12, 2019, 03:27:23 am
Can we get back on track to the algorithms please ...
Title: Re: Neat algorithms
Post by: legacy on August 16, 2019, 03:47:42 pm
Code: [Select]
opening btree file [mydata.bin] .. done
> h
integer followed by "I" to Insert
integer followed by "D" to Delete
integer followed by "S" to Search { found, not found }
integer followed by "A" to Search* { found, closest left-value }
"." to show the tree
"Q" to quit
> 98 99 10 d
item 98 99 100 have been deleted
> 100 a
Search* (100):
{{ 253 752 1003 1254 1505 1756 2007 2258 2509 2760 3011 3262 3513 }
{ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 101
102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152
153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169
170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203
204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220
221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237
238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 } }

Not found, the closest left-value is 97
>

This algorithm is a b-tree (not b+tree). After building the B-tree by entering integers on the keyboard or by supplying these as a script (this is what I have done, creating items from 0 to 3999), the user can update it interactively by entering or deleting individual items. The user can also search the B-tree for a given item.

The App writes, reads, and updates nodes on disk using a binary file.

But, the real challenge was ... how to find the "closest" item when it doesn't exist.

Code: [Select]
Search* (100): 

... 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 .. .. .. 101 102 103 ...

Not found, the closest left-value is 97
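
For anyone wanting to experiment, here is a minimal sketch of the "closest left value" (floor) idea for a single node's sorted key array. This is not legacy's actual code, just the per-node step under an assumed int-key layout; the full B-tree search also has to remember the best candidate seen so far while descending into children.

Code: [Select]
/* Hypothetical sketch: binary search for the largest key <= target
 * within one node's sorted key array.
 * Returns the index of that key, or -1 if every key is larger. */
static int floor_index(const int *keys, int nkeys, int target)
{
    int lo = 0, hi = nkeys - 1, best = -1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] == target)
            return mid;          /* exact match */
        if (keys[mid] < target) {
            best = mid;          /* best "closest left" so far */
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return best;                 /* -1 means: descend into the leftmost child */
}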
Title: Re: Neat algorithms
Post by: Nominal Animal on August 16, 2019, 04:11:42 pm
But, the real challenge was ... how to find the "closest" item when it doesn't exist.
Situations like that are surprisingly common.  The idea/algorithm itself may be easy to implement, but getting the engineering right so it works for practical, imperfect data, takes a surprising amount of effort.

(Also, one may optimize the "work case", only to find out that half or more than half of the actual cases encountered are those that "miss", and the chosen algorithm works terribly badly for those.  It is too common to just write a comment saying "fix this later, I'm too busy right now", because the proper answer is to just rewrite the entire section.  Error checking is just a small part of that, too; you might wish to verify your implementation works if you search for a value smaller or greater than any already in the tree.)

I personally would also consider adding a command that saves the current tree in Graphviz Dot format.  If the values are unique, you can use them as the node keys; otherwise you need to synthesize an identifier for each value.  I've found that exploring (even pseudorandom) data and looking at what kind of trees they generate, really helps in getting a good intuitive picture of the algorithm itself.  (Especially when you can compare different tree structures or algorithms, for the same data.)
Title: Re: Neat algorithms
Post by: legacy on August 19, 2019, 07:35:36 am
yup, Graphviz rules: good idea  :D

Now I'd like to implement a B*tree.
Title: Re: Neat algorithms
Post by: mrflibble on August 19, 2019, 10:25:19 am
On the subject of graph visualization: you may also want to check Cytoscape (https://cytoscape.org/). Learned about it from the bioinformatics angle, but you can use it for anything to do with graphs, especially large graphs. And you can interface it (https://apps.cytoscape.org/apps/cyrest) with your favorite python or R scripts. Or anything else really, as long as it can talk to the REST interface.

Regarding binary search trees, I'm surprised no-one has mentioned Tango trees (https://en.wikipedia.org/wiki/Tango_tree) yet. Paper here: http://erikdemaine.org/papers/Tango_SICOMP/paper.pdf (http://erikdemaine.org/papers/Tango_SICOMP/paper.pdf)

There's also a bunch of youtube vids on the subject, as part of an MIT course. Don't know which one, but I'll update with a link if someone didn't find it in the meantime. Nevermind found it: https://www.youtube.com/watch?v=NoOYvZvH_FU (https://www.youtube.com/watch?v=NoOYvZvH_FU)

It's part of the MIT-6.851 Advanced Data Structures (https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-851-advanced-data-structures-spring-2012/calendar-and-notes/) course, which I can recommend in its entirety, not just the bit about tango trees.

Title: Re: Neat algorithms
Post by: legacy on August 19, 2019, 12:29:28 pm
Tango trees? Is there any good C implementation? Where is it used?
Title: Re: Neat algorithms
Post by: legacy on August 19, 2019, 12:41:15 pm
B*trees is a special case of B+trees which guarantee that nodes are at least 2/3 full. This is good for my purpose because it improves my filesystem. But I have a serious problem: I cannot avoid recursion! B*tree is more prone to have recursion, and I'd like to avoid it as far as possible.

p.s.
I have spent the last two years (2 hours per weekend) on my B+tree. From the paper to a working piece of C code, developed firstly on a PC and then "adapted" to an embedded board. A new algorithm is good but ... these things consume a lot of time when you use them for something practical.
Title: Re: Neat algorithms
Post by: Nominal Animal on August 19, 2019, 10:36:05 pm
Graphviz has the benefit that it does all the heavy lifting for you.  In Dot language, a simple three-node tree as a directed graph:
Code: [Select]
digraph {
    "root" -> "left";
    "root" -> "right";
}
You can add node modifiers, like "root" [ shape=hexagon ]; as well as edge attributes like labels and such.  It is just text, which makes it especially useful for debug visualization from any program or script.  Graphviz Dot (dot/neato/twopi/circo) will handle the layout for you, using a number of algorithms described in peer-reviewed articles (https://www.graphviz.org/theory/).  It is a surprisingly powerful tool -- or a set of tools, really.  I like to generate the graphs as SVG, then hand-tweak them in Inkscape, for maximum quality and usefulness.  See the example gallery (https://www.graphviz.org/gallery/) for stuff it can do.

A new algorithm is good but ... these things consume a lot of time when you use them for something practical.
Yet, that is the only way to truly learn/grok a new algorithm, and integrate it to ones "tool kit".

Knowing about an algorithm means you can choose between it and some others, but to implement it, you need engineering: good old elbow grease.  (Well, brain grease, but that sounds icky.)
Title: Re: Neat algorithms
Post by: brucehoult on August 19, 2019, 10:39:43 pm
B*trees is a special case of B+trees which guarantee that nodes are at least 2/3 full. This is good for my purpose because it improves my filesystem. But I have a serious problem: I cannot avoid recursion! B*tree is more prone to have recursion, and I'd like to avoid it as far as possible.

Why do you want to avoid recursion?
Title: Re: Neat algorithms
Post by: hamster_nz on August 20, 2019, 03:01:30 am
B*trees is a special case of B+trees which guarantee that nodes are at least 2/3 full. This is good for my purpose because it improves my filesystem. But I have a serious problem: I cannot avoid recursion! B*tree is more prone to have recursion, and I'd like to avoid it as far as possible.

Why do you want to avoid recursion?

It is seen as an honorable goal if you have limited stack space, or want a bounded run time.

Recursion seems to have a magical appeal to people that never clicked with me:

Code: [Select]
int factorial(int x) {
  if(x <= 1)
    return 1;
  else
     return x * factorial(x-1);
}
 
Yeah... whatever - give me a 'for' loop any day.
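
For comparison, the loop form being alluded to; a trivial sketch (name and signature are mine), with the same overflow caveats as the recursive version:

Code: [Select]
int factorial_loop(int x) {
  int result = 1;
  for (int i = 2; i <= x; i++)   /* x <= 1 falls straight through and returns 1 */
    result *= i;
  return result;
}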
Title: Re: Neat algorithms
Post by: T3sl4co1l on August 20, 2019, 03:03:43 am
Write it for tail recursion; compilers handle that optimization automatically.

Can always use an array as stack and put it in a for loop.  It's just syntactically uglier.  You're not actually avoiding recursion, it's still recursive (otherwise it wouldn't be the same algorithm), you're just moving where (and how much) the recursion is stored.
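
For illustration, a minimal sketch of that idea (node type and names are hypothetical): in-order traversal of a binary tree with a fixed-size explicit stack instead of call-stack recursion, bailing out deterministically if the bound is exceeded.

Code: [Select]
#define MAX_DEPTH 64                     /* plenty for any balanced tree */

struct node { int value; struct node *left, *right; };

int inorder_iterative(struct node *root, void (*visit)(struct node *))
{
    struct node *stack[MAX_DEPTH];       /* the recursion now lives here */
    int top = 0;
    struct node *cur = root;

    while (cur || top > 0) {
        while (cur) {                    /* push the left spine */
            if (top == MAX_DEPTH)
                return -1;               /* would overflow: refuse instead of crashing */
            stack[top++] = cur;
            cur = cur->left;
        }
        cur = stack[--top];
        visit(cur);
        cur = cur->right;
    }
    return 0;
}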

Tim
Title: Re: Neat algorithms
Post by: legacy on August 20, 2019, 09:07:33 am
Why do you want to avoid recursion?

Recursion can make the algorithm non-deterministic in its use of RAM, and it might consume too much RAM; I am limited to 4 Mbytes on that board, with other stuff that needs RAM. I am not even using malloc, because I cannot have virtual memory; the CPU does not even have an MMU. Everything must be static and deterministic, with the worst case known and calculated in advance.

On a PC, it's a completely different story, they have many more resources, and I can even reimplement the algorithm in Erlang, which has no problem in eating up to 1Gbyte of stack.


So the question is: can a tree be traversed without recursion or a stack, using just a handful of pointers?

With my current disk-oriented B+tree implementation, I cannot use "RAM pointers"; I need to use pointers to disk blocks, so the algorithm loads the most-used nodes in RAM, and leaves the leaves on disk. A leaf has the iblock pointing to other disk blocks. It can avoid recursion, but the tree needs to be traversed via queues.

It works, it's decently fast and elegant, but it only guarantees that nodes are at most 50% empty, and this is not good when storage space is at a premium.
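
For illustration, a bare-bones sketch of the queue-based, non-recursive traversal idea (hypothetical node layout with RAM pointers, not the disk-block version): a small ring-buffer queue replaces both recursion and an explicit stack, and visits nodes level by level.

Code: [Select]
#define MAX_CHILDREN 8
#define QUEUE_SIZE   256

struct node { int nchildren; struct node *child[MAX_CHILDREN]; };

/* Level-order (breadth-first) traversal: no recursion, no call-stack growth.
 * Returns -1 if the fixed queue is too small for this tree. */
int traverse_by_queue(struct node *root, void (*visit)(struct node *))
{
    struct node *queue[QUEUE_SIZE];
    unsigned head = 0, tail = 0;            /* free-running counters */

    if (root)
        queue[tail++ % QUEUE_SIZE] = root;

    while (head != tail) {
        struct node *n = queue[head++ % QUEUE_SIZE];
        visit(n);
        for (int i = 0; i < n->nchildren; i++) {
            if (tail - head >= QUEUE_SIZE)
                return -1;                  /* queue full: refuse, deterministically */
            queue[tail++ % QUEUE_SIZE] = n->child[i];
        }
    }
    return 0;
}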
Title: Re: Neat algorithms
Post by: legacy on August 20, 2019, 09:44:40 am
It is seen as an honorable goal if you have limited stack space, or want a bounded run time.

There is also the good old charm of the challenge launched back in 1968, when Knuth posed the problem of designing a stack-free, tag-free, non-recursive algorithm that traverses the tree in-order while leaving it unaltered at the end.

Since then, numerous solutions have appeared in the literature, but many professors still say that none of them is even close to a solution, or that it's simply impossible.

In 1979 Morris suggested an algorithm that is clearly non-recursive, and which at first glance appears to use neither a stack nor tag fields ... but the stack is hidden inside the tree, so once again it was only close to a solution.

There have been several variations on Morris's solution, but the main problem is that it seems you have to mutate the concept of a tree in order to solve the original problem posed by Knuth, and this is considered a cheat.
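
For reference, a sketch of the Morris idea described above (node type is hypothetical): the in-order predecessor's unused right pointer is temporarily "threaded" back to the current node, so the stack is hidden inside the tree and removed again as the traversal proceeds.

Code: [Select]
struct node { int value; struct node *left, *right; };

void morris_inorder(struct node *root, void (*visit)(struct node *))
{
    struct node *cur = root;

    while (cur) {
        if (!cur->left) {
            visit(cur);                      /* no left subtree: visit and go right */
            cur = cur->right;
        } else {
            struct node *pre = cur->left;    /* find the in-order predecessor */
            while (pre->right && pre->right != cur)
                pre = pre->right;
            if (!pre->right) {
                pre->right = cur;            /* create the temporary thread */
                cur = cur->left;
            } else {
                pre->right = NULL;           /* second visit: remove the thread */
                visit(cur);
                cur = cur->right;
            }
        }
    }
}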

-

Can we have 1M USD/Euro for the first one that claims to have found a true valid solution?  :D
Title: Re: Neat algorithms
Post by: westfw on August 20, 2019, 10:18:12 am
Meh.  Trees (and graphs) are so inherently, obviously, and elegantly recursive that avoiding recursion in code that implements them is ... cutting off your nose to spite your face.
The algorithms involved are usually well defined and understood wrt their resource usage, and the insistence in some “safety standards” that programmers come up with some less-well-debugged, harder-to-understand loop-based equivalent (or use a poorer data structure) is one of the main objections that I have toward such standards.  :-(

Title: Re: Neat algorithms
Post by: RoGeorge on August 20, 2019, 10:43:41 am
Isn't the main objection against recursive algorithms, in general, about memory?

The fact that they are using the computer stack (to store intermediate results obtained during each recursion step) makes them slower and memory intensive.
Title: Re: Neat algorithms
Post by: coppice on August 20, 2019, 11:13:10 am
Isn't the main objection against recursive algorithms, in general, about memory?

The fact that they are using the computer stack (to store intermediate results obtained during each recursion step) makes them slower and memory intensive.
The problem is not that they are memory intensive. It's that unless you take special action their memory use can be unconstrained. A routine that always calls itself a well known range of times has a known maximum memory requirement, and is OK. Many recursive routines call themselves an arbitrary number of times, and will eventually blow the stack up unless special action is taken to constrain the call depth and perhaps declare a failure to compute the true result.
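
For illustration, a minimal sketch (hypothetical node type and names) of that "special action": pass an explicit depth budget down the recursion and declare failure instead of blowing the stack.

Code: [Select]
struct node { int value; struct node *left, *right; };

/* Returns 0 on success, -1 if the depth budget was exhausted. */
int count_nodes(const struct node *n, int depth_left, long *count)
{
    if (!n)
        return 0;
    if (depth_left <= 0)
        return -1;                  /* refuse to recurse any deeper */
    *count += 1;
    if (count_nodes(n->left,  depth_left - 1, count) < 0)
        return -1;
    return count_nodes(n->right, depth_left - 1, count);
}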
Title: Re: Neat algorithms
Post by: legacy on August 20, 2019, 11:46:40 am
A routine that always calls itself a well known range of times has a known maximum memory requirement

This requires you to have an additional field in the structure for counting the number of calls, which corresponds to used resources, so it *somehow* works around the problem and makes things deterministic again, but it sucks a lot when you have to instrument the code for the test report because it generates too many triplets, and a function that calls itself can consume more RAM, and usually does.
Title: Re: Neat algorithms
Post by: coppice on August 20, 2019, 12:01:17 pm
A routine that always calls itself a well known range of times has a known maximum memory requirement

This requires you to have an additional field in the structure for counting the number of calls, which correspond to used resources, therefore it *somehow* workarounds the problem making things back to deterministic, but it sucks a lot when you have to instrument the code for the test-report because it generates too many triplets, and a function that calls itself can consume more ram, and usually does consume more ram.
There are many algorithms which are recursive, but have an easy to identify maximum for the depth they will ever call to. Many divide and conquer type algorithms are like this. You usually know the maximum resolution you will ever be able to divide to, so you know the maximum depth of recursion you will ever get.
Title: Re: Neat algorithms
Post by: SiliconWizard on August 20, 2019, 01:49:26 pm
Head and tail recursion are often relatively easy to transform into pure iteration.
Some compilers can do this by themselves. Very simple example with GCC (9.2.0 here):

Code: [Select]
unsigned int Factorial(unsigned int n)
{
    if (n <= 1)
        return 1;
    else
        return n * Factorial(n - 1);
}

compiled with:
Code: [Select]
gcc -O3 -S TailRecursion.c

Assembly:
Code: [Select]
Factorial:
        .seh_endprologue
        movl    $1, %eax
        cmpl    $1, %ecx
        jbe     .L1
        .p2align 4,,10
        .p2align 3
.L2:
        movl    %ecx, %edx
        subl    $1, %ecx
        imull   %edx, %eax
        cmpl    $1, %ecx
        jne     .L2
.L1:
        ret

It can be very hard, or sometimes even impossible for other, less trivial forms of recursion.
Title: Re: Neat algorithms
Post by: RoGeorge on August 20, 2019, 02:12:06 pm
It can be very hard, or sometimes even impossible for other, less trivial forms of recursion.

Are there any examples, or a demonstration about algorithms that are impossible to convert?

Asking just out of curiosity, because I always thought *any* recursive algorithm can be translated into a non recursive one.
Title: Re: Neat algorithms
Post by: coppice on August 20, 2019, 02:15:58 pm
It can be very hard, or sometimes even impossible for other, less trivial forms of recursion.

Are there any examples, or a demonstration about algorithms that are impossible to convert?

Asking just out of curiosity, because I always thought *any* recursive algorithm can be translated into a non recursive one.
If you create your own stack, and save state in that, you can turn anything recursive into something you can claim to be non-recursive :)
Title: Re: Neat algorithms
Post by: SiliconWizard on August 20, 2019, 03:14:39 pm
It can be very hard, or sometimes even impossible for other, less trivial forms of recursion.

Are there any examples, or a demonstration about algorithms that are impossible to convert?

Asking just out of curiosity, because I always thought *any* recursive algorithm can be translated into a non recursive one.
If you create your own stack, and save state in that, you can turn anything recursive into something you can claim to be non-recursive :)

Yep. :D
Just a random example: https://www.geeksforgeeks.org/iterative-quick-sort/ (https://www.geeksforgeeks.org/iterative-quick-sort/)

Whether this is worth it is completely debatable though. The recursive form is much simpler and more readable.
Then it will depend on many factors, such as the cost of a function call on a given language and platform, the cost of using the stack instead of the heap, and many other things, versus the cost of allocating and handling your own stack, and the hindered readability...

Recursion is a powerful tool and most of all, when it applies, it makes code easier to write, more readable, look more like maths, and also it's usually easier to verify pre and post conditions.
In many simple cases, it can be optimized away by a decent compiler when possible anyway. So there's no hard and stubborn reason to avoid recursion. Just like any other tool, use it wisely and use it for a reason.
Title: Re: Neat algorithms
Post by: mfro on August 20, 2019, 03:30:43 pm
although it's from the coding stoneage, I still think this one qualifies:

Duff's device

Code: [Select]
send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
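    /* NB: 'to' is deliberately not incremented; Duff's original copied
       into a single memory-mapped output register. */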
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
Title: Re: Neat algorithms
Post by: legacy on August 20, 2019, 04:45:14 pm
Recursion is a powerful tool and most of all, when it applies, it makes code easier to write, more readable, look more like maths, and also it's usually easier to verify pre and post conditions.

So tell me why recursion isn't allowed for safety critical code in automotive and avionics applications, and I have never been authorized to use it.
Title: Re: Neat algorithms
Post by: Yansi on August 20, 2019, 05:13:02 pm
New subscriber to the topic.  :popcorn: :-+
Title: Re: Neat algorithms
Post by: RoGeorge on August 20, 2019, 06:34:29 pm
To me, the most impressive algorithms are the simplest ones, and the absolute winner so far is Von Neuman with his unbiasing a biased coin algorithm.

The problem is this:
- you have a biased coin.  Somebody messed up with the coin, and now the head/tail chances are not 50/50 any more. You don't know how big is the bias, and don't have time to measure it.  All you know is that it's not 50/50 any more.

How do you generate fair 50/50 head/tails with a biased coin?

"Von Neumann gave a simple solution: flip the coin twice. If it comes up heads followed by tails, then call the outcome HEAD. If it comes up tails followed by heads, then call the outcome TAIL. Otherwise (i.e., two heads or two tails occured) repeat the process." https://web.eecs.umich.edu/~qstout/abs/AnnProb84.html
Title: Re: Neat algorithms
Post by: Nominal Animal on August 20, 2019, 07:19:03 pm
To me, the most impressive algorithms are the simplest ones, and the absolute winner so far is Von Neuman with his unbiasing a biased coin algorithm.
In that case, I recommend you look up Metropolis-Hastings algorithm (https://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm) (for the inverse; to generate pseudorandom numbers according to a specific probability distribution P(x), when you only know a function f(x) that is proportional to P).  From there, go on to e.g. simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing), which can be used to find global minima or maxima of a function; it is very similar to Metropolis-Hastings.

(Or maybe in the reverse order, depending on what your own optimal approach paths are.)

Describing how these algorithms work and why takes quite a lot of text and math, but when implemented, they're nearly trivial: just a few lines of code.  If you do not know the pattern that implements the method, and how they work, the code looks silly: naaah, that can't be right; it looks like it just finds the answer by accident, using random numbers like that.  Yet, it works just fine.
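
As a taste of how little code it takes, here is a bare-bones sketch of one Metropolis step with a symmetric proposal (the simplest case of Metropolis-Hastings). f() and rand01() are stand-in names, and f only needs to be proportional to the target density.

Code: [Select]
#include <math.h>
#include <stdlib.h>

static double f(double x) { return exp(-0.5 * x * x); }   /* example: unnormalized Gaussian */

static double rand01(void) { return rand() / ((double)RAND_MAX + 1.0); }

/* One Metropolis step: propose a symmetric move, accept with prob min(1, f(x')/f(x)). */
static double metropolis_step(double x, double step)
{
    double candidate = x + step * (2.0 * rand01() - 1.0);
    double ratio = f(candidate) / f(x);     /* normalization constant cancels */
    if (ratio >= 1.0 || rand01() < ratio)
        return candidate;                   /* accept the move */
    return x;                               /* reject: the chain stays put */
}

Iterating metropolis_step() and recording x (after a burn-in period) yields samples whose distribution is proportional to f.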
Title: Re: Neat algorithms
Post by: SiliconWizard on August 20, 2019, 08:31:05 pm
Recursion is a powerful tool and most of all, when it applies, it makes code easier to write, more readable, look more like maths, and also it's usually easier to verify pre and post conditions.

So tell me why recursion isn't allowed for safety critical code in automotive and avionics applications, and I have never been authorized to use it.

Well, because it isn't allowed. Discussing rules that you have to follow and can't change may be intellectually stimulating, but it's pointless. Such is life when you're not one of those who set the rules. ;D

As to the rationale... :P
I would say it's a cousin of the rule not allowing the use of dynamic allocation.
One specific risk of using recursion is that unless it's perfectly controlled and bug-free, you can run out of stack, which is arguably a bad thing, because it can be impossible to recover from.

Of course this is like any rule set to limit some risk. Like if you follow a rule of never drinking, you'll never get drunk. Thus the probability of you having a car accident due to driving while drunk is zero. That doesn't mean you'll never have a car accident. I like this illustration of risk mitigation.

Just a few random additional notes about recursion:

- There's a difference between recursion as a programming paradigm (or even notation) and effective recursion at run-time. As was pointed out above, you can write recursive code that won't actually lead to recursive execution. In C, this is a matter of optimization, but in languages such as functional languages, I think this is an important concept. (And that said, of course if you have a rule of forbidding recursion at run-time, relying on compiler optimization to get around it would probably be a bad idea!)

- There are means of making recursion a bit safer, such as controlling the depth at run-time (not hard to do). Of course you now have to decide what to do if the max depth has been reached and the function has not completed, but it gives you an opportunity to handle it gracefully instead of just letting the system crash.

- Recursion may happen without you realizing it. Beyond the simple case of a function directly calling itself, it may do so indirectly through a chain of calls of other functions. In moderately complex software, this is not necessarily easy to spot. Using code analysis tools may help.

- Implementing a typically recursive algorithm (not trivial to convert to iterative) without direct recursion, in the way coppice suggested, is often more of a trick than something really useful. It can lead to actually more inefficient code, without necessarily being much "safer".

- Using some functions of the C std lib, such as qsort, may introduce recursive code in your software. Most qsort implementations are recursive. In certified std libraries for safety-critical applications, qsort is probably not implemented as a recursive function. Something to check though.

- Some things, like parsers, are much easier and more elegant to write using recursion.

- I've never used recursion in any critical piece of "embedded" code even in settings where this was not a rule. A mix of common sense, risk mitigation and often no real need for/benefit of recursion for rather low-level stuff. I've never set a "no recursion" rule though when it was not strictly required from a regulatory POV.

Title: Re: Neat algorithms
Post by: hamster_nz on August 20, 2019, 09:54:55 pm
- Recursion may happen without you realizing it. Beyond the simple case of a function directly calling itself, it may do so indirectly through a chain of calls of other functions. In moderately complex software, this is not necessarily easy to spot. Using code analysis tools may help.

Ah, the joy of avoiding fun things like trying to log an "Out Of Memory" exception to a file, when the file handling functions need to allocate memory....
Title: Re: Neat algorithms
Post by: T3sl4co1l on August 20, 2019, 11:28:20 pm
There are many algorithms which are recursive, but have an easy to identify maximum for the depth they will ever call to. Many divide and conquer type algorithms are like this. You usually know the maximum resolution you will ever be able to divide to, so you know the maximum depth of recursion you will ever get.

Curiously the example that was given in protest was 1. a trivial loop, and 2. O(N) complexity.  I suppose the teachers are to blame, because factorial(n) and especially egregiously fibonacci(n) are such common but such terrible examples.

Suffice it to say, you'd never use a recursive function when a linear loop works; trees are the perfect example because the depth is O(lg N), very modest even on fuckoff-massive structures.

Tim
Title: Re: Neat algorithms
Post by: Nominal Animal on August 21, 2019, 01:21:31 am
For a 512-order B+ tree, you need just 31 levels to record each atom in the observable universe, assuming the characteristics of B+ trees (https://en.wikipedia.org/wiki/B%2B_tree#Characteristics) listed at Wikipedia are correct; I'm too lazy to check elsewhere.

(The observable universe contains something like 2^265 atoms, and the number of records in a 512-order B+ tree with 31 levels (b=512, h=30) is roughly 2^233 to 2^270.)

If you implement a B+ tree, you choose the order (branching factor b).  Then, the maximum number of records you can store directly defines the maximum depth h+1 of your tree.  Robust code will keep track of its recursion depth, and if it exceeds h or h+1 depending on how you count the calls, the tree must be corrupted.

So, if you really worry about the maximum recursion depth for B+ trees, you must be using too small a branching factor b for the number of records you use.
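
As a quick sanity check of those numbers, the required height is essentially a base-b logarithm. A rough sketch, ignoring fill-factor details (function name is mine):

Code: [Select]
#include <math.h>

/* Minimum height h so that b^h >= n_records (rough bound, ignoring
 * fill factors); the tree then has h+1 levels. */
static int min_height(double n_records, double b)
{
    return (int)ceil(log(n_records) / log(b));
}

/* min_height(pow(2.0, 265), 512.0) == 30, i.e. h = 30 and 31 levels,
 * matching the figure above. */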
Title: Re: Neat algorithms
Post by: legacy on August 21, 2019, 04:12:54 am
So, if you really worry about the maximum recursion depth for B+ trees, you must be using too small a branching factor b for the number of records you use.

yup, this approach is interesting. Thanks.

Do you know *how* we quickly and cheaply check the maximum stack consumption at runtime for a given test case under recursion? We fill the stack area with a known pattern 0xdeadbeaf, and we check how many times the pattern appears after the algorithm has completed its execution  ;D

This works for a given environment and it cannot be written in a formal test-report without a manual justification, but it gives us useful information during the engineering steps.

It's a practical "artisanal" approach, which doesn't take too much time to give some useful information back to devs.
Title: Re: Neat algorithms
Post by: gnif on August 21, 2019, 04:50:39 am
Just a little useful function I wrote that I call upon often...

Code: [Select]
#include <stdlib.h>
#include <stdio.h>

char * bytesToHR(unsigned long bytes)
{
  static const char * suffix[] = {"B", "KiB", "MiB", "GiB", "TiB", "PiB", NULL};
  const char ** s = suffix;
  float b = (float)bytes;

  while(b > 1023.0f && *(s+1))
  {
    b /= 1024.0f;
    ++s;
  }

  int len = snprintf(NULL, 0, "%.2f %s", b, *s);
  char * out = malloc(len + 1);
  sprintf(out, "%.2f %s", b, *s);
  return out;
}

int main(int argc, char * argv[])
{
  char * s = bytesToHR(12345678);
  printf("%s\n", s);
  free(s);

  return 0;
}

Output:

Code: [Select]
11.77 MiB

License: Free for general use anywhere, no attribution required.
Title: Re: Neat algorithms
Post by: westfw on August 21, 2019, 05:49:53 am
Quote
tell me why recursion isn't allowed for safety critical code in automotive and avionics applications, and I have never been authorized to use it.
Well, you take fears that were probably relevant 30+ years ago, put them together into a "standard" that dates back 20+ years, and then hand the "approval for exceptions" process off to "management" that hasn't been following the science, or the art, for about the same length of time.  "No one ever gets fired for failing to approve "risky" code"...  (not that they ever really realize that it's probably far less risky to take a modern and proven recursive, dynamic-allocation library and use it, than to ask their mid-level engineers to achieve the same thing from scratch without "modern techniques.")   Sigh.
And those Arianes and 787s and F35s and Zumwalts and SLS all come in under budget and work perfectly as a result.  Advanced medical equipment has gotten better, cheaper, and more available each year as Moore's Law advances the underlying technology!  Hurah!   Grr.


Quote
We fill the stack area with a known pattern 0xdeadbeaf, and we check how many times the pattern appears after the algorithm has completed its execution  ;D
This works for a given environment and it cannot be written in a formal test-report without a manual justification, but it gives us useful information during the engineering steps.
What?  Only during test?  We had that check in production code, for every process (recursive or not), at least "periodically" (I forget whether this was a "background" check, or something that happened in the scheduler.)  Of course, that WAS in an environment where probably the worst that would happen is irate customers would call up with "it rebooted and the last thing it said was something about "process has low stack."  Please fix it!"...)
Title: Re: Neat algorithms
Post by: legacy on August 21, 2019, 10:07:30 am
Well, you take fears that were probably relevant 30+ years ago, put them together into a "standard" that dates back 20+ years, and then hand the "approval for exceptions" process off to "management" that hasn't been following the science, or the art, for about the same length of time. 

So 200 guys employed on daily (testing, development, documentation, QA compliance, etc) activities in avionics are all wrong, as well as 20 senior-engineers with 30 years of experience  :-//

I am usually the rebel of the group, but we have a military approach to seniors, so I trust them, even if it's not clear why they give directives



What?  Only during test?

It's clearly written that we take this approach during the early steps of development; I can add, "when" we cannot access ICEs or other equipment, or when we quickly need to have a confirmation. We have to run the full testing session only later, after the first iteration of the engineering phase has completed.

..

When the step(n) is PASSED

When the step(m) is PASSED the project gets committed to Doors with a progressive revision {A, B, C, ...}, otherwise it's still DRAFT-{r0, r1,r2,...} and it goes back to engineers and testing-squad.

-

The "artisanal" approach I was talking about is during draft-r0, but it's also what I usually use at home, where I do not yet have a BDI2000/3000 able to monitor the SP register, or any special hardware able to monitor watchpoints (physical addresses to ram), or something able to track triplets (this is usually done for dynamic coverage and costs 20k euro in software and hardware, but it can upload triplets at run-time up to 60Mbyte/sec on a firewire link) so I sometimes use the same approach via a super-cheap gdb-stub on a super-cheap serial line @ 115200bps.
Title: Re: Neat algorithms
Post by: NorthGuy on August 21, 2019, 12:35:25 pm
So 200 guys employed on daily (testing, development, documentation, QA compliance, etc) activities in avionics are all wrong, as well as 20 senior-engineers with 30 years of experience  :-//

You wouldn't argue that nosediving two Boeings into the ground was the right thing to do, would you?

And they cannot even find the bug ...

Perhaps having too many bureaucratic rules, and thereby sifting out all the employees who wanted to use their own minds, has something to do with that.
Title: Re: Neat algorithms
Post by: RoGeorge on August 21, 2019, 02:10:52 pm
Or, perhaps NOT obeying all those rules will nosedive into the ground 10 airplanes daily.
Anecdotal references are not a proof.

Unless somebody comes up with something better, let's not forget Aeronautics is the safest transportation industry.
Title: Re: Neat algorithms
Post by: NorthGuy on August 21, 2019, 02:45:40 pm
Or, perhaps NOT obeying all those rules will nosedive into the ground 10 airplanes daily.
Anecdotal references are not a proof.

Exactly. All we know is that with these rules in place, the software can crash a plane without giving its human pilot any chance.
Title: Re: Neat algorithms
Post by: SiliconWizard on August 21, 2019, 03:00:15 pm
Just consider the funny example I gave about strict rules for risk mitigation. All we can say they do at best is avoid some specific cases of fuck-up. Not that the design will ever be problem-free. Obviously doesn't mean we shouldn't follow rules. Basic logic at play here.

Admittedly, they can tend to give a false sense of safety though, and that's when they could be considered counter-productive. It's very hard to determine whether the statistical net outcome for any given rule (or a set of them) is positive or negative. Good luck with that. So the usual approach in safety-critical settings is to enforce as many as possible, close to, but just below the threshold at which they would completely halt development, and then hope for the best.

@RoGeorge said:
Quote
Anecdotal references are not a proof.
which may be either completely right, or false. Depends on what you consider here. Even just one occurrence of a problem is complete proof that the problem CAN happen, and in safety-critical settings, this should be enough to take action to improve things based on that single occurrence only. Constant improvement.

Could mean that the processes at play here have holes. Whether this is due to too many rules, or not enough, or whatever else is extremely difficult to determine. Needs very thorough analysis. But it definitely NEEDS to be analyzed IMO, and actions taken if at all possible.


Title: Re: Neat algorithms
Post by: legacy on August 21, 2019, 03:50:55 pm
Admittedly, they can tend to give a false sense of safety though

It's not correct and it sounds rather misleading. Concerning software (the hardware follows different directives, although similar), the whole DO178B/level{A..D} describes how the engineering and testing activities have to be done, while the MISRA-C and SPARK-ADA tell us some rules to manage the job.

This doesn't magically make the "source" absolutely safe and "bug-free", but it helps people in a large team cooperate in a productive and deterministic way, with all the tools and resources known and described on paper, so we can learn something new each time and improve the product during its life-cycle.

Our directives do not only cover the source, but also how it's tested in terms of equipment, procedures and human resources, so if a senior says that recursion shall be avoided, he/she knows that it might work, but that it will be more difficult to manage within the software life cycle.
Title: Re: Neat algorithms
Post by: legacy on August 21, 2019, 04:25:13 pm
I've never used recursion in any critical piece of "embedded" code even in settings where this was not a rule. A mix of common sense, risk mitigation and often no real need for/benefit of recursion for rather low-level stuff. I've never set a "no recursion" rule though when it was not strictly required from a regulatory POV.

Rules say no recursion. This is a guide, but it's not written in stone. If you want to use recursion you have to

Something similar has even been approved for the AFDX switch (a kind of Ethernet switch, made deterministic), but it was hard to obtain the OK.
Title: Re: Neat algorithms
Post by: SiliconWizard on August 21, 2019, 05:33:19 pm
Admittedly, they can tend to give a false sense of safety though

It's not correct and it sounds rather misleading. Concerning software (the hardware follows different directives, although similar), the whole DO178B/level{A..D} describes how the engineering and testing activities have to be done, while the MISRA-C and SPARK-ADA tell us some rules to manage the job.

What exactly is not correct and sounds misleading? I think you're consistently missing the point of the discussion.
I was considering the fact that blindly following some hard rules CAN tend to give a false sense of safety. CAN. I didn't say it always did.
Then a "false sense of safety" means that engineers would think those rules would protect them from ever fucking up. WHEN that's the case (not saying it is ALWAYS the case, but it does happen), it CAN actually be counter-productive (not saying it is ALWAYS the case).

Read everything again. That didn't mean no rules should be followed either. Argh, logics...

Our directives do not only cover the source, but also how it's tested in terms of equipment, procedures and human resources, so if a senior says that recursion shall be avoided, he/she knows that it might work, but that it will be more difficult to manage within the software life cycle.

Again, this point is very real, but moot in the context of this discussion. As I said as a preliminary, if you don't have a choice, discussing something is interesting, but completely pointless.
Title: Re: Neat algorithms
Post by: T3sl4co1l on August 21, 2019, 06:34:16 pm
The false sense comes when, because you're surrounded by rules for every step of development, you may slip into assuming that they are actually comprehensive and that you therefore have no responsibility yourself.

Well, as a coder you may not necessarily have any responsibility yourself, but it applies recursively (hah) to your immediate manager, or QA, and so on up the chain of responsibility.

The better case seems to be, having very experienced coders -- who understand the edge cases of the algorithm in question, and the pitfalls of the language -- and also subjecting the resulting product (at a source code level, as well as at a physical pen-test level, say) to a battery of tests by researchers even more experienced in picking apart edge cases.

But it's also not clear to me if such an approach is likely to have any better economy (that is, a higher 1/(bugs * dollars) figure-of-merit) versus the traditional aerospace approach.  Both are onerously expensive (and for good reason!).  I'm no manager, I'm just speculating.

Heh, or even necessarily that that's an adequate metric, since it may well be that mainstream, cheap and buggy development is so much cheaper that it actually offers a better score.  In which case we might choose a different exponent for the parameters, or we would want to stipulate that the bug rate has to be below some maximum.  (Incidentally, the bug rate obviously can't be measured at approval or release time, but it can be measured over the lifetime of the product, say in the number of bugs noted by operators and maintenance, and the number of fixes pushed to the field.)

(Heh, well, maybe that's not even a good idea at all since it attaches a negative monetary value to bug disclosure and patching.)


Well, you take fears that were probably relevant 30+ years ago, put them together into a "standard" that dates back 20+ years, and then hand the "approval for exceptions" process off to "management" that hasn't been following the science, or the art, for about the same length of time. 

So 200 guys employed on daily (testing, development, documentation, QA compliance, etc) activities in avionics are all wrong, as well as 20 senior-engineers with 30 years of experience  :-//

Just to tickle the dragon here -- yes, that may be a reasonable argument.  It would be foolish after all to expect that computer science is over, done and done, fully researched.  CS is rather high level but it often trickles down into engineering.

Not to say that the existing process is bad in-and-of-itself, or compared to a particular hypothetical.  Just that, compared to some unknown hypothetical best, it's probably not there, and there is always room for improvement.

Which is the more general argument, from safety culture: there's always room for improvement. :)

It's not clear how much is actually in flux (and whether it's changing truly for the better, or because of office politics), or if it's literally been unchanged for 30 years because of bureaucracy, so, this covers the bases. :-//

Tim
Title: Re: Neat algorithms
Post by: NorthGuy on August 21, 2019, 06:56:18 pm
... so it's clearly and formally marked as under your responsibility ...

Everything you do or accept is your responsibility. Bureaucratic rules often pretend to absolve people of responsibility. This makes people feel safe, irresponsible perhaps. But this doesn't really work. For example, that guy who made a mistake which caused two Boeing crashes. He is not going to be hanged in front of the crowd. But the rest of his life will be a nightmare where every minute he tries hard to convince himself that he didn't kill 500 people, that he is a good man who just followed rules.

Wouldn't it be better if he recognized from the start that the responsibility is his?
Title: Re: Neat algorithms
Post by: RoGeorge on August 21, 2019, 07:06:04 pm
Can we please return to the "Neat algorithms", topic?
Title: Re: Neat algorithms
Post by: legacy on August 21, 2019, 09:13:20 pm
Everything you do or accept is your responsibility.

It's a different kind of responsibility because in that case you temporarily become a sort of team leader regarding the project, and this means that you have to somehow coordinate people around you, as well as other people by emails and video conference. In short, since they see your name on Doors, you are the person who will be called for everything concerning the project.
Title: Re: Neat algorithms
Post by: legacy on August 21, 2019, 10:21:47 pm
What exactly is not correct and sounds misleading? [...] Then a "false sense of safety" means that engineers would think those rules would protect them from ever fucking up.

Precisely this point is incorrect, and it implies that we are a bunch of monkeys who blindly follow hard rules.

You miss that things are really so complex that no-one can judge anyone else's competence, and there isn't any "false sense of safety" either: here we need rules to handle a large group composed of people with different skills and capabilities (developers, testers, QA guys, guys who only study error propagation under abnormal conditions, etc.), and each of them is no less important than the developer-squad. So all feedback is collected and things are done with care by the planners, and their rules come from experience about how to reduce friction and facilitate collaboration between the squads.

That's their competence, and it's also the reason why, if you want to use recursion, well ... it's not denied, but you have to open a branch on Doors, and if your approach is successful, it will be applied again and the rule revised/removed.

Note, there have been attempts to use recursion, and ... they were not so successful over the last 10 years; in fact, SPARK2014 confirmed the rule, and recursion shall still be avoided. Which is why you still need to open a branch on Doors.
Title: Re: Neat algorithms
Post by: NorthGuy on August 21, 2019, 10:31:31 pm
Everything you do or accept is your responsibility.

It's a different kind of responsibility because in that case you temporarily become a sort of team leader regarding the project, and this means that you have to somehow coordinate people around you, as well as other people by emails and video conference. In short, since they see your name on Doors, you are the person who will be called for everything concerning the project.

I wouldn't call this a different kind.

If you organize and coordinate people, you're responsible for the results. If the effort fails, you'll be the one to blame, sack or whatever. You cannot have freedom without responsibility.
Title: Re: Neat algorithms
Post by: westfw on August 21, 2019, 11:11:53 pm
Quote
Can we please return to the "Neat algorithms", topic?

The IP checksum has some interesting properties.
In theory, it's a one's complement checksum of 16bit words.  On a two's-complement ALU (as found in almost every CPU, ever), you can implement a 1's complement add by using "end-around carry": after you do a normal add, you add in any resulting carry bit:
Code: [Select]
  carry, Rd = Rd + Rs             ;; all Rx and math operations are 16bits, 2's comp.
  if (carry) Rd = Rd + 1;
But this would be relatively slow. Addition is nicely commutative and associative, and the one's complement sum is independent of byte order (byte-swapping the inputs just byte-swaps the result), so IP checksum functions usually get highly optimized.
On a generic 32bit CPU, the usual optimization is to accumulate the carries in a larger word and then possibly "fold" and add them at the end:
Code: [Select]
    for (word16 in words) {
      ck32 = ck32 + word16
    }
    while (ck32 >> 16) {
      ck32 = (ck32 & 0xFFFF) + (ck32 >> 16)
    }
But that's usually pretty silly on an 8bit CPU, and if you have both a CPU and language that allow access to the carry bit (for multi-precision math), you can do a lot better doing the end-around carry together with adding the NEXT byte:
Code: [Select]
    prevcarry = 0
    for () {
      prevcarry, ck8low = ck8low+nextbyte+prevcarry   // (and increment to following byte)
      prevcarry, ck8hi = ck8hi+nextbyte+prevcarry
    }
    prevcarry, ck8low = ck8low + prevcarry          // final end-around carry
    ck8hi = ck8hi + prevcarry
Note that you'll need some sort of looping code that doesn't destroy the prevcarry bit...

So for example on AVR, while the usual lwip code will use the generic 32bit algorithm, you can write code that's less than half the size and more than twice the speed: https://github.com/WestfW/Duino-hacks/blob/master/ipchecksum_test/ipchecksum_test.ino
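
For reference, the generic accumulate-and-fold version in plain C looks something like this (a sketch: it assumes an even byte count and 16-bit alignment, and glosses over byte-order details):
Code: [Select]
#include <stdint.h>
#include <stddef.h>

/* Internet checksum: one's-complement sum of 16-bit words, accumulated
 * in a 32-bit register, with the carries folded back in at the end. */
static uint16_t ip_checksum(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < count; i++)
        sum += words[i];

    /* Fold until nothing is left above bit 15. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;   /* one's complement of the one's-complement sum */
}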

Title: Re: Neat algorithms
Post by: westfw on August 21, 2019, 11:21:53 pm
Quote
If you want to use recursion you have to [six steps, including "approval by 3 seniors" and "the QA team"]
Exactly.  It's so much easier to just say "no."
I'll bet that the numerous Computer Science texts and papers that have touted and analyzed the crap out of the recursive algorithms don't count much toward the justification, either...
Title: Re: Neat algorithms
Post by: Nominal Animal on August 22, 2019, 03:33:13 am
Anyone interested in efficient RGB blending?

While this is not an algorithm per se, more like bit ops tips, the underlying idea — SIMD-like operations on unsigned integers with field width sufficient for intermediate results, and unpacking packed numeric fields for this by shifting every second field up by a full record width — is an interesting method; close enough to a neat algorithm to be noted here, IMHO.



If you have two ARGB colors as 32-bit unsigned integers, you only need five bit shifts, six binary and operations (that use one of two 32-bit masks), one binary or operation, and two additions, to calculate their 50% blend:
Code: [Select]
static inline uint32_t argb8888_blend_half(const uint32_t argb1, const uint32_t argb2)
{
    const uint32_t  rb1 = (argb1 << 7) & 0x7F807F80u,
                    ag1 = (argb1 >> 1) & 0x7F807F80u,
                    rb2 = (argb2 << 7) & 0x7F807F80u,
                    ag2 = (argb2 >> 1) & 0x7F807F80u;
    return (((rb1 + rb2) & 0xFF00FF00u) >> 8)
         |  ((ag1 + ag2) & 0xFF00FF00u);
}
The exact same approach (just different numeric constants) can be used to calculate 25%, 12.5%, 6.25%, 3.125%, 1.5625%, 0.78125%, and 0.390625% blends as well.

On 64-bit architectures, expanding the above constants to 64-bit, you can do those blends for two different pairs of color values at roughly the same cost (depending on whether the color values are already packed, or if you need to repack them; so the difference is just some ANDs, ORs, and bit shifts at most).

On 64-bit architectures, an arbitrary ARGB8888 blend only needs two (56×9=64 bit) multiplications, four shifts, six binary ands, three binary ors, one subtraction, and one addition:
Code: [Select]
static inline uint32_t argb8888_blend(const uint32_t      argb0,
                                      const uint32_t      argb1,
                                      const unsigned int  p)
{
    const uint64_t  c0 = (argb0 & 0x00FF00FF) | ((uint64_t)(argb0 & 0xFF00FF00) << 24),
                    c1 = (argb1 & 0x00FF00FF) | ((uint64_t)(argb1 & 0xFF00FF00) << 24),
                    p0 = 256 - p,
                    p1 = p;
    const uint64_t   c = p0*c0 + p1*c1;
    return ((c >> 8) & 0x00FF00FF) | ((c >> 32) & 0xFF00FF00);
}
and you can even apply that to R10G10B10 (30-bit color) by changing the numerical constants.  The "trick" is trivial: the second (and fourth) components are shifted up, with enough zeroes in between to hold the products.

On 32-bit architectures, you can do it with four (24×9=32 bit) multiplications:
Code: [Select]
static inline uint32_t rgb8888_blend(const uint32_t argb0, const uint32_t argb1, const uint32_t p)
{
    const uint32_t  rb0 =  argb0       & 0x00FF00FF,
                    ag0 = (argb0 >> 8) & 0x00FF00FF,
                    rb1 =  argb1       & 0x00FF00FF,
                    ag1 = (argb1 >> 8) & 0x00FF00FF,
                     p0 = 256 - p,
                     p1 = p;
    return (((p0*rb0 + p1*rb1) & 0xFF00FF00u) >> 8)
         |  ((p0*ag0 + p1*ag1) & 0xFF00FF00u);
}
(Total cost being four 24×9=32bit multiplications, three eight-bit right shifts, six binary ANDs, one binary OR, two additions, and one subtraction.)

Obviously, on 8- and 16-bit architectures, you're better off handling each color component separately: the internal register width is too small for more than one color component, since the intermediate results are 16-bit.

On 32-bit (or 64-bit) architectures, there are a few binary tricks with RGB565 also.  For example, an arbitrary blend (32 steps) between two color values requires just two multiplications (27×6=32 bit), six binary ands (each using one of two 16-bit masks), four shifts, one subtraction, and one addition:
Code: [Select]
static inline uint16_t rgb565_blend(const uint16_t rgb0, const uint16_t rgb1, const uint16_t p)
{
    const uint32_t  c0 = (rgb0 & 0xF81F) | ((uint32_t)(rgb0 & 0x07E0) << 16);
    const uint32_t  c1 = (rgb1 & 0xF81F) | ((uint32_t)(rgb1 & 0x07E0) << 16);
    const uint32_t  p0 = 32 - p,  p1 = p;
    const uint32_t   c = p0*c0 + p1*c1;
    return ((c >> 5) & 0xF81F) | ((c >> 21) & 0x07E0);
}
On 64-bit architectures, you can trivially expand that to blending pairs of color values, for about the same cost.
Title: Re: Neat algorithms
Post by: brucehoult on August 25, 2019, 05:53:05 am
Why do you want to avoid recursion?

The recursion can make the algorithm not deterministic about the use of ram, it might consume too much ram, and I am limited to 4Mbyte on that board, with other stuff that needs ram. I am not even using malloc because I cannot have virtual memory, the CPU does not even have an MMU. Everything must be static and deterministic, with the worst-case known and calculated in advance.

We're talking about B-trees here. With disk-based blocks. What's your block size? How many nodes are in each block? You've probably got a fanout factor of around 100 or so, which means if you have a total data size of 100,000,000 disk blocks (400 GB, say, with 4 KB blocks), then your recursion depth is going to be *four*.  That's not going to run even an Arduino out of stack space.
Title: Re: Neat algorithms
Post by: brucehoult on August 25, 2019, 05:53:43 am
Meh.  Trees (and graphs) are so inherently, obviously, and elegantly recursive that avoiding recursion in   code that implements them is ... cutting off your nose to spite your face.
The algorithms involved are usually well defined and understood wrt their resource usage, and the insistence in some “safety standards” that programmers come up with some less-well-debugged, harder to-understand loop-based equivalent (or use a poorer data structure) is one of the main objections that I have toward such standards.  :-(

Exactly.
Title: Re: Neat algorithms
Post by: brucehoult on August 25, 2019, 05:55:10 am
Isn't the main objection against recursive algorithms, in general, about memory?

The fact that are using the computer stack (to store intermediate results obtained during each recursion step) thus making them slower and memory intensive.
The problem is not that they are memory intensive. It's that unless you take special action their memory use can be unconstrained. A routine that always calls itself a well-known range of times has a known maximum memory requirement, and is OK. Many recursive routines call themselves an arbitrary number of times, and will eventually blow the stack unless special action is taken to constrain the call depth and perhaps declare a failure to compute the true result.

With a B-tree you're going to run out of disk space long before you run out of stack space. They are extremely shallow trees, by design.
Title: Re: Neat algorithms
Post by: westfw on August 25, 2019, 06:13:55 am
OTOH, if the trees are THAT shallow, how much code complexity does the recursion save you?

Title: Re: Neat algorithms
Post by: mfro on August 25, 2019, 06:57:39 am
If you have a nice recursive algorithm but fear overrunning the stack, you can easily keep track of your recursion depth with an additional parameter that you decrement on each call and compare to a (hopefully reasonably chosen) maximum recursion depth (exiting with an error when the threshold has been reached).

This obviously uses extra stack space (and a little performance), so it's up to you to decide if you want to keep it in production code, or remove it and call your testing (plus the additional safety margin due to its removal) good enough to be safe.
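
Something like this (a made-up sketch; the limit would come from your worst-case analysis):
Code: [Select]
#include <stddef.h>

#define MAX_DEPTH 32                 /* made-up limit from worst-case analysis */

struct node { int value; struct node *left, *right; };

/* Returns 1 if found, 0 if not found, -1 if the depth budget ran out. */
static int tree_contains(const struct node *n, int key, int depth_left)
{
    if (depth_left <= 0)
        return -1;                   /* give up instead of blowing the stack */
    if (n == NULL)
        return 0;
    if (n->value == key)
        return 1;
    return tree_contains(key < n->value ? n->left : n->right, key, depth_left - 1);
}

/* call as: tree_contains(root, 42, MAX_DEPTH); */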

 
Title: Re: Neat algorithms
Post by: brucehoult on August 25, 2019, 06:59:01 am
OTOH, if the trees are THAT shallow, how much code complexity does the recursion save you?

Exactly the same amount as any other depth except 0 or 1 :-)
Title: Re: Neat algorithms
Post by: BravoV on August 25, 2019, 07:04:21 am
If you have a nice recursive algorithm but fear overrunning the stack, you can easily keep track of your recursion depth with an additional parameter that you decrement on each call and compare to a (hopefully reasonably chosen) maximum recursion depth (exiting with an error when the threshold has been reached).

This obviously uses extra stack space (and a little performance), so it's up to you to decide if you want to keep it in production code, or remove it and call your testing (plus the additional safety margin due to its removal) good enough to be safe.

Not an expert in programming, nor experienced.

I've been wondering about the "control" on stack usage in recursion programming, seeing people raised lots of concerns.

My simple question is: if you run a recursive program without any control over, or awareness of, the stack limitation, isn't that like using a CPU that by design can't handle, say, divide by zero (e.g. via an NMI or similar) in the first place?

Edit: Simpler analogy: driving a tank with your eyes closed, so you can't really be sure whether you're crushing your enemy (expected & wanted) or your friends.  :-DD

Cmiiw.
Title: Re: Neat algorithms
Post by: RoGeorge on August 25, 2019, 07:13:12 am
There is another topic about following or not the programming guidelines.

Quoting from the attached PDF:  https://www.eevblog.com/forum/chat/dogmatic-coding-standards/msg2636658/#msg2636658 (https://www.eevblog.com/forum/chat/dogmatic-coding-standards/msg2636658/#msg2636658)
Quote
Avoiding recursion results in having an acyclic function call graph, which code analyzers can exploit to prove limits on stack use and boundedness of executions.
Title: Re: Neat algorithms
Post by: brucehoult on August 25, 2019, 07:24:17 am
There is another topic about following or not the programming guidelines.

Quoting from the attached PDF:  https://www.eevblog.com/forum/chat/dogmatic-coding-standards/msg2636658/#msg2636658 (https://www.eevblog.com/forum/chat/dogmatic-coding-standards/msg2636658/#msg2636658)
Quote
Avoiding recursion results in having an acyclic function call graph, which code analyzers can exploit to prove limits on stack use and boundedness of executions.

All that does is move the problem from proving that you won't recurse too deep to proving that you won't run out of heap space, or that the array you statically allocated for your hand-managed stack is large enough (and that you didn't write buggy code doing it).
Title: Re: Neat algorithms
Post by: mfro on August 25, 2019, 07:32:26 am
... I've been wondering about the "control" on stack usage in recursion programming, seeing people raised lots of concerns.

My simple question is, if running a recursive program, without any control or aware of the stack limitation, isn't that like using a cpu by designed, that can't handle say like div by zero (eg. NMI or etc) from the 1st place ?

I think that is more a question of how to avoid/detect stack overflow in any program, not just those that use recursive algorithms.

Any program where the call stack depth depends on run-time values is potentially prone to stack overflow, i.e. refraining from recursion is no guarantee of avoiding it (it's just that recursion makes it more likely, as it increases stack load).

There are methods to detect a stack overflow, ranging from the poor man's stack guard, where you seed the bottom of the stack with a 'magic value' that you regularly check for overwrites, to unmapped pages below the stack on architectures that have an MMU.

Small µCs usually do not have the luxury of hardware-supported faults on stack overflow, leading to the usual 'common sense' approach of banning everything that might cause one from µC code (MISRA-C, for example, explicitly prohibits recursion).
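
A made-up sketch of the magic-value variant (the guard symbol and size here are hypothetical; in a real system they come from the linker script or the RTOS task setup):
Code: [Select]
#include <stdint.h>

#define GUARD_WORDS 64
extern uint32_t __stack_limit[];      /* hypothetical symbol: lowest stack address */

/* Paint the guard area at startup, before the stack can reach it. */
void stack_guard_init(void)
{
    for (int i = 0; i < GUARD_WORDS; i++)
        __stack_limit[i] = 0xDEADBEEFu;
}

/* Called periodically: returns how many guard words are still intact,
 * counting up from the stack limit; 0 means the stack has reached the
 * bottom of the guard area and all bets are off. */
int stack_guard_headroom(void)
{
    int intact = 0;
    for (int i = 0; i < GUARD_WORDS; i++) {
        if (__stack_limit[i] != 0xDEADBEEFu)
            break;
        intact++;
    }
    return intact;
}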
Title: Re: Neat algorithms
Post by: RoGeorge on August 25, 2019, 07:56:07 am
My intention was to move the talk about guidelines to its own topic, but I failed, sorry.  ;D
Title: Re: Neat algorithms
Post by: T3sl4co1l on August 25, 2019, 09:46:39 am
If you have a nice recursive algorithm but fear overrunning the stack, you can easily keep track of your recursion depth with an additional parameter that you decrement on each call and compare to a (hopefully reasonably chosen) maximum recursion depth (exiting with an error when the threshold has been reached).

This obviously uses extra stack space (and a little performance), so it's up to you to decide if you want to keep it in production code, or remove it and call your testing (plus the additional safety margin due to its removal) good enough to be safe.

Or use a static variable and increment it on every call and decrement it on every return. :)

(Disclaimer: may break tail recursion optimization?)
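
E.g. something like this (a made-up sketch; not reentrant, and the cleanup on the way back out is exactly the kind of thing that can stop the compiler from tail-calling):
Code: [Select]
#define MAX_DEPTH 32                 /* made-up limit */

struct node { struct node *left, *right; };

static int depth;                    /* current recursion depth */

/* Returns 0 on success, -1 if the depth limit was hit somewhere below. */
static int walk(const struct node *n)
{
    if (n == 0)
        return 0;
    if (++depth > MAX_DEPTH) {
        --depth;
        return -1;                   /* give up instead of blowing the stack */
    }
    int r = walk(n->left);
    if (r == 0)
        r = walk(n->right);
    --depth;
    return r;
}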

Tim
Title: Re: Neat algorithms
Post by: coppice on August 25, 2019, 11:48:35 am
If you have a nice recursive algorithm but fear overrunning the stack, you can easily keep track of your recursion depth with an additional parameter that you decrement on each call and compare to a (hopefully reasonably chosen) maximum recursion depth (exiting with an error when the threshold has been reached).

This obviously uses extra stack space (and a little performance), so it's up to you to decide if you want to keep it in production code, or remove it and call your testing (plus the additional safety margin due to its removal) good enough to be safe.

Not an expert in programming, nor experienced.

I've been wondering about the "control" on stack usage in recursion programming, seeing people raised lots of concerns.

My simple question is, if running a recursive program, without any control or aware of the stack limitation, isn't that like using a cpu by designed, that can't handle say like div by zero (eg. NMI or etc) from the 1st place ?

Edit : Simpler analogy, driving a tank with eyes closed, that you can't really be sure that you're crushing your enemy (expected & wanted) or your friends.  :-DD

Cmiiw.
The whole issue of blowing the stack is a big deal for any program. Recursive routines are just a notoriously troublesome example.

In most cases it is very difficult to work out the worst case biggest possible stack requirement, and make sure you have allowed enough stack space for that. On bigger machines most people do little more than allow lots of stack space and hope for the best. On small embedded machines you have to get your stack allocation right, as there is little RAM to waste. You might get some help from static analysis tools, but those tools are not generally foolproof. For example, they don't usually have the ability to work out the worst case program call depth + the worst case mix of nested interrupts.
Title: Re: Neat algorithms
Post by: NorthGuy on August 25, 2019, 02:50:00 pm
In most cases it is very difficult to work out the worst case biggest possible stack requirement, and make sure you have allowed enough stack space for that.

I don't think so. At any rate, it's much harder to predict heap behaviour, which, unlike stack, can be fragmented, possibly after days or months of running.
Title: Re: Neat algorithms
Post by: SiliconWizard on August 25, 2019, 02:57:45 pm
In most cases it is very difficult to work out the worst case biggest possible stack requirement, and make sure you have allowed enough stack space for that.

I don't think so. At any rate, it's much harder to predict heap behaviour, which, unlike stack, can be fragmented, possibly after days or months of running.

Agree with that.
Title: Re: Neat algorithms
Post by: coppice on August 25, 2019, 04:08:10 pm
In most cases it is very difficult to work out the worst case biggest possible stack requirement, and make sure you have allowed enough stack space for that.

I don't think so. At any rate, it's much harder to predict heap behaviour, which, unlike stack, can be fragmented, possibly after days or months of running.
Whilst it's true that heap analysis is harder, most small embedded designs completely avoid using a heap. They can't help using the stack, and they can't afford to waste more than a few bytes. When your worst-case stack use is an unusually deep point in some call chain, with all the interrupts nested, hitting the worst case can be a very rare event. It has to be analysed, but most development tool chains don't really take that need seriously or try to help.
Title: Re: Neat algorithms
Post by: RoGeorge on August 26, 2019, 09:57:13 am
Feynman's algorithm for logarithms

Quote from: doi:10.1063/1.881196
Feynman worked out in some detail the program for computing Hopfield's network on the Connection Machine.  The part that he was proudest of was the subroutine for computing a logarithm.  I mention it here not only because it is a clever algorithm, but also because it is a specific contribution Richard made to the mainstream of computer science.  He had invented it at Los Alamos.

Consider the problem of finding the logarithm of a fractional number between 1 and 2.  (The algorithm can be generalized without too much difficulty.)  Feynman observed that any such number can be uniquely represented as a product of numbers of the form 1 + 2^-k, where k is an integer.  Testing for the presence of each of these factors in a binary representation is simply a matter of a shift and a subtraction.  Once the factors are determined, the logarithm can be computed by adding together the precomputed logarithms of the factors.  The algorithm fit the Connection Machine especially well because the small table of the logarithms of 1 + 2^-k could be shared by all the processors.  The entire computation took less time than doing a division.
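
A quick floating-point sketch of the idea (made up here just to watch it converge; a real implementation would use fixed point, where multiplying by 1 + 2^-k is a shift and an add, plus a small precomputed table of ln(1 + 2^-k)):
Code: [Select]
#include <math.h>
#include <stdio.h>

/* Factor method for ln(x), 1 <= x < 2: greedily accept factors (1 + 2^-k)
 * while the running product stays <= x, summing their precomputed logs. */
static double factor_log(double x)
{
    double y = 1.0;                      /* product of accepted factors */
    double result = 0.0;

    for (int k = 1; k <= 40; k++) {
        double factor = 1.0 + ldexp(1.0, -k);        /* 1 + 2^-k */
        if (y * factor <= x) {
            y *= factor;
            result += log1p(ldexp(1.0, -k));         /* table entry ln(1 + 2^-k) */
        }
    }
    return result;
}

int main(void)
{
    printf("%.12f vs %.12f\n", factor_log(1.7), log(1.7));
    return 0;
}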
Title: Re: Neat algorithms
Post by: Nominal Animal on August 26, 2019, 12:08:01 pm
Quarter square multiplication (https://en.wikipedia.org/wiki/Shift-and-add_algorithm#Quarter_square_multiplication), one of the shift-and-add algorithms:  Let's say you know S_n = floor(n²/4), i.e. S_n is a quarter of n squared, rounded towards zero.  If x ≥ y ≥ 0, then x·y = S_(x+y) - S_(x-y).

I wonder if this could be used to implement a fast multiplication on an FPGA?  I dunno.
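
In software it's trivial to try; a minimal sketch for 8×8 → 16-bit products with a precomputed table, so each multiply is two table look-ups, one addition and two subtractions:
Code: [Select]
#include <stdint.h>
#include <stdio.h>

/* S[n] = floor(n*n/4) for n = 0 .. 510 (510 = 255 + 255). */
static uint16_t S[511];

static void init_quarter_squares(void)
{
    for (uint32_t n = 0; n < 511; n++)
        S[n] = (uint16_t)(n * n / 4);
}

/* x*y = S[x+y] - S[x-y], with x >= y >= 0. */
static uint16_t qsq_mul(uint8_t a, uint8_t b)
{
    uint8_t x = (a >= b) ? a : b;
    uint8_t y = (a >= b) ? b : a;
    return (uint16_t)(S[x + y] - S[x - y]);
}

int main(void)
{
    init_quarter_squares();
    printf("%u\n", qsq_mul(123, 231));   /* prints 28413 */
    return 0;
}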
Title: Re: Neat algorithms
Post by: SiliconWizard on August 26, 2019, 01:54:03 pm
Quarter square multiplication (https://en.wikipedia.org/wiki/Shift-and-add_algorithm#Quarter_square_multiplication), one of the shift-and-add algorithms:  Let's say you know S_n = floor(n²/4), i.e. S_n is a quarter of n squared, rounded towards zero.  If x ≥ y ≥ 0, then x·y = S_(x+y) - S_(x-y).

I wonder if this could be used to implement a fast multiplication on an FPGA?  I dunno.

For small integers, probably. Now for larger ones, I think the size of the required look-up table would quickly get impractical. Ideas?
Title: Re: Neat algorithms
Post by: Nominal Animal on August 26, 2019, 04:09:21 pm
for larger ones, I think the size of the required look-up table would quickly get impractical. Ideas?
I experimented with some approaches, but for larger ones, a binary shift-and-add ("barrel multiplier"?) wins. (It is difficult to keep an N×N-bit product with a 2N-bit result under 2N ops.)  So, this only makes sense if you have S_i precalculated -- actually, S_i = floor(abs(i)²/4) for i = 1-2^N .. 2^(N+1)-1, so that for 0 ≤ a,b ≤ 2^N-1, a×b becomes S_(a+b) - S_(a-b).  That makes the lookup table size just under 6·N·2^N bits, turning the multiplication into two lookups, two subtractions, and one addition.  I do not know how expensive such ROM tables are in general.

For example, if one has only 8x8 to 16-bit multiplication, the naïve/direct method is to compute the 32×32 to 64-bit product using (a0 + 2^8 a1 + 2^16 a2 + 2^24 a3)(b0 + 2^8 b1 + 2^16 b2 + 2^24 b3) = 2^0 a0 b0 + 2^8 a0 b1 + 2^16 a0 b2 + 2^24 a0 b3 + 2^8 a1 b0 + 2^16 a1 b1 + 2^24 a1 b2 + 2^32 a1 b3 + 2^16 a2 b0 + 2^24 a2 b1 + 2^32 a2 b2 + 2^40 a2 b3 + 2^24 a3 b0 + 2^32 a3 b1 + 2^40 a3 b2 + 2^48 a3 b3.
That is, with K words to construct each N-bit unsigned integer value, you have K² terms.  However, a barrel multiplier can do it in O(KN), using just one-bit shifts (through carry), and additions; in hardware, several of them can be done in parallel.  (For K=4, a 32×32=64-bit multiplication can be done in seven 64-bit additions, where each summand is constructed of at most three multiplications affecting non-overlapping bits.)
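
As a concrete (if naive) illustration of those K² terms, here is a 32×32 → 64-bit multiply built only from 8×8 → 16-bit partial products (K = 4, so 16 terms), just as a C sketch:
Code: [Select]
#include <stdint.h>

/* 32x32 -> 64-bit product from byte-sized limbs: the sum of K*K = 16
 * partial products, each shifted into place by 8*(i+j) bits. */
static uint64_t mul32x32_bytes(uint32_t a, uint32_t b)
{
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            uint32_t partial = ((a >> (8 * i)) & 0xFFu) * ((b >> (8 * j)) & 0xFFu);
            result += (uint64_t)partial << (8 * (i + j));
        }
    }
    return result;
}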

Anyway, knowing such mathematical tricks exist, may be useful for some odd purposes.  Neat, in my opinion.
Title: Re: Neat algorithms
Post by: coppice on August 26, 2019, 07:24:51 pm
Quarter square multiplication (https://en.wikipedia.org/wiki/Shift-and-add_algorithm#Quarter_square_multiplication), one of the shift-and-add algorithms:  Let's say you know S_n = floor(n²/4), i.e. S_n is a quarter of n squared, rounded towards zero.  If x ≥ y ≥ 0, then x·y = S_(x+y) - S_(x-y).

I wonder if this could be used to implement a fast multiplication on an FPGA?  I dunno.
The critical importance of fast, silicon efficient, multiplication for DSP applications means a huge number of avenues to faster multiplication have been tried. You need to try something a bit more novel than a well established mathematical identity to have a good hope of breaking new ground. There are quite a few tricks for finding approximate answers, that people generally only know about if they work in an area where the approximation is effective. There aren't many obscure tricks for generating exact results, because everyone wants those.
Title: Re: Neat algorithms
Post by: legacy on August 26, 2019, 09:40:34 pm
how much code complexity does the recursion save you?

it saves a lot of debugging and team-working time, and it saves a lot of complexity in the test-report activity.

Title: Re: Neat algorithms
Post by: legacy on August 26, 2019, 09:51:57 pm
In most cases it is very difficult to work out the worst case biggest possible stack requirement, and make sure you have allowed enough stack space for that.

That's what I showed in my first post, talking about *how* we usually measure how much stack the algorithm is going to consume in both normal and abnormal operation. But it's just to get an idea, useful when the code is a draft, and it cannot be used for any official test-report.

On bigger machines most people do little more than allow lots of stack space and hope for the best. On small embedded machines you have to get your stack allocation right, as there is little RAM to waste. You might get some help from static analysis tools, but those tools are not generally foolproof. For example, they don't usually have the ability to work out the worst case program call depth + the worst case mix of nested interrupts.

We are on big embedded machines! PPC405, PPC440, PPC460, e500, etc., running RTOSes, but what non-avionics people seem to find hard is understanding *how* you test things.

Worst cases happen in abnormal conditions, where the "funniest" surprises greet you with their hands :D
Title: Re: Neat algorithms
Post by: legacy on August 26, 2019, 10:18:13 pm
We're talking about B-trees here. With disk-based blocks. What's your block size?

I am using a B+tree; the block size is 4Kbyte, and block-pointers (iblock) are only stored in the leaves.

I have rewritten 90% of the library recursion-free, and I can easily show hot-points to the guys in the testing-squad, e.g. btree_key_inserts and node_insert_after_splitting are two examples of "write-one-read-many" critical functions (they can be used with SMP) that insert a new { key, iblock } into a leaf in a way that can exceed the tree's order, causing the leaf to be split in half.

This code is critical, but I have a way to express exactly how much memory it consumes:

Code: [Select]
    btree_key_t      key[DEFAULT_ORDER];
    btree_iblock_t   iblock[DEFAULT_ORDER];

Those two arrays are private "temporary" sets of keys and block-pointers that hold everything in order, including the new key and iblock. With the recursive approach this stuff would be on the stack, writing a test case would be more complex by several orders of magnitude, and you could not test the quality on any running hardware :D
Title: Re: Neat algorithms
Post by: Howardlong on August 26, 2019, 10:35:11 pm
Regarding recursion, my anecdote is about a cheque printing algorithm (to print the words for the numbers) that one of my ex-colleagues presented as his own work. He was always quite full of himself, and to be honest we didn’t always see eye to eye on things: he was the young blue-sky thinker, whereas I’m a cynical old fart pragmatist with decades of battle wounds. His code was a heady mix of quite complex recursive self-referring SQL and XML. I even congratulated him on his ingenuity.

So part of me was really quite impressed with what he’d produced, but part of me was dreading supporting it as it was very write only code. But it did seem to work. One day we had a much larger than normal cheque batch, but not huge, about 30,000 cheques or so, which took down the database server, and with it all the call centre users.

The way the implementation worked under the hood is that the recursion objects were repeatedly allocated in RAM, so when the batch size hit a certain size, the machine was starved of memory, and ISTR it had 768GB RAM.

While I was figuring out how to resolve it, I took his code apart, and I suddenly realised while Googling part of it that my ex-colleague had pinched the whole code verbatim off the internet.

There are two morals. The first is that one should be very careful when re-using someone else’s code for your use case. The second is that some people can come over as smart, but aren’t.
Title: Re: Neat algorithms
Post by: westfw on August 27, 2019, 04:58:23 am
Quote
In most cases it is very difficult to work out the worst case biggest possible stack requirement
Yeah, that's the claim.  But I don't think it's true.  In many cases, you're dealing with standard algorithms whose stack depth requirements have been analyzed to death in many published papers, textbooks and theses.  That leaves only the stack requirement of each level as "unknown", and that's easy to measure for a particular compiler/CPU.  Sure, "not all recursive algorithms" are that well analyzed.  But you're throwing away the ones that are for no good reason.
Title: Re: Neat algorithms
Post by: Nominal Animal on August 27, 2019, 05:38:51 am
Quarter square multiplication (https://en.wikipedia.org/wiki/Shift-and-add_algorithm#Quarter_square_multiplication), one of the shift-and-add algorithms:  Let's say you know S_n = floor(n²/4), i.e. S_n is a quarter of n squared, rounded towards zero.  If x ≥ y ≥ 0, then x·y = S_(x+y) - S_(x-y).

I wonder if this could be used to implement a fast multiplication on an FPGA?  I dunno.
The critical importance of fast, silicon efficient, multiplication for DSP applications means a huge number of avenues to faster multiplication have been tried.
Sure, but the solution space is even more huge.  This is just one of those utterly simple mathematical identities that people overlook, exactly because it is too simple.

I wondered whether it was a viable approach, and whether people were aware of it, considering that 1) FPGAs have LUTs (look-up tables) anyway, 2) here the look-up table contains at most 3×2^N entries of 2N bits for an N×N=2N-bit multiplication, unlike N² for direct multiplication look-up, and 3) I haven't seen this mentioned in places that talk about multiplications on FPGAs (like here (http://www.andraka.com/multipli.php)).

Again, I have zero experience with FPGAs, so I do not know.  I do know that if one assumes that everything interesting has already been found and examined, one will be sorely disappointed in the reality of things.
Title: Re: Neat algorithms
Post by: mfro on August 27, 2019, 07:18:20 am
Again, I have zero experience with FPGAs, so I do not know.

FPGAs usually have dedicated DSP units on silicon and vendors usually do not tell you about implementation details.

Anyway, I would not expect anybody to be able to compete with 'direct silicon' no matter how smart and simple the implementation might be.
Title: Re: Neat algorithms
Post by: RoGeorge on August 27, 2019, 08:34:11 am
Any dedicated hardware is in fact the "direct silicon" implementation of an algorithm, so a hardware MAC could still be improved by a more performant algorithm. (MAC == multiplier followed by an adder/accumulator, the basic block for hardware DSP in FPGAs)

An example from the former ALTERA FPGAs (now Intel):

Implementing Multipliers in FPGA Devices
https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an306.pdf (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an306.pdf)

A Survey and Comparative Analysis of Multiply-Accumulate (MAC) Block for Digital Signal Processing Application on ASIC and FPGA
https://scialert.net/fulltext/?doi=jas.2015.934.946 (https://scialert.net/fulltext/?doi=jas.2015.934.946)
Title: Re: Neat algorithms
Post by: coppice on August 27, 2019, 12:58:20 pm
Again, I have zero experience with FPGAs, so I do not know.
FPGAs usually have dedicated DSP units on silicon and vendors usually do not tell you about implementation details.
The big FPGAs are mostly designed as DSP engines, so they have lots of dedicated hardware multipliers. In smaller FPGAs it's still common to use macros to create multipliers out of logic blocks.

There is little point in the FPGA makers telling you the implementation details. You can easily tell from the specs. It's going to be based around a Booth or a Wallace tree, and if it has a latency greater than 1 clock cycle there will be some pipelining to help with the propagation delays in the tree.
Title: Re: Neat algorithms
Post by: legacy on August 27, 2019, 08:39:07 pm
about multiplications

My Arise-v2 core implements MAC with a Booth module. I am not considering any specific hardware, but rather a generic piece of VHDL code to be tested and simulated before it gets synthesized, so in these working conditions "Booth" is the best solution for me, and I would recommend it to everyone who wants to study how a fast MUL works.

Anyway, the division module is still problematic for me because it still takes 32 clocks to compute 32 bits. The reason is that I haven't yet found an algorithm with acceptable complexity.
Title: Re: Neat algorithms
Post by: Nominal Animal on August 28, 2019, 04:52:50 am
Anyway, the division module is still problematic for me because it still takes 32 clocks to compute 32 bits. The reason is that I haven't yet found an algorithm with acceptable complexity.
What are you using now? Binary long division (https://en.wikipedia.org/wiki/Long_division#Non-decimal_radix) (i.e., shift and subtract)?
Title: Re: Neat algorithms
Post by: Berni on August 28, 2019, 05:24:48 am
Division is going to be slow no matter how you do it. It's normal for even modern CPUs to take many times more cycles to do a divide compared to a multiply. So whenever possible you usually masquerade the divide as a multiply operation with a bit shift. That's not always possible, but it is in most signal processing use cases.

Yeah, these "DSP" blocks in FPGAs are mostly just glorified multiply-accumulate units. The only reason they are called DSP blocks is that multiply-accumulate performance is the primary performance metric for DSPs, as that's what they are good at (and most DSP algorithms need it). The point is that these DSP blocks in an FPGA are typically significantly faster than a multiplier implemented in the logic fabric, and they use less silicon area, so more of them can be packed in. Having more is very useful because in an FPGA you typically dedicate multiply units to one task in a large algorithm, so you usually need many of them to implement it, and you typically run many copies of the algorithm in parallel to take advantage of the FPGA; otherwise you might as well just be using a CPU or a normal DSP in a lot of cases.
Title: Re: Neat algorithms
Post by: legacy on August 28, 2019, 01:00:44 pm
What are you using now? Binary long division (i.e., shift and subtract)?

Old-school is slow, but it's also simple  :D

I have also tried to implement a Newton-Raphson module, which is fast but way too complex

(https://i.ibb.co/W2Cb4vq/arise-v2-division.png)

So, this is what I have currently implemented (in the simulation, the division is 8bit)
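
For reference, the same one-bit-per-clock restoring loop as a C model (a sketch: unsigned only, divisor assumed non-zero; in the hardware the remainder and quotient live in a shifting register pair):
Code: [Select]
#include <stdint.h>

/* Restoring division: one quotient bit per iteration, shift and subtract. */
static void div32(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r)
{
    uint32_t quot = 0, rem = 0;

    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((n >> i) & 1u);   /* shift in the next dividend bit */
        if (rem >= d) {                       /* does the divisor fit? */
            rem -= d;                         /* yes: subtract ... */
            quot |= 1u << i;                  /* ... and set this quotient bit */
        }
    }
    *q = quot;
    *r = rem;
}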
Title: Re: Neat algorithms
Post by: SiliconWizard on August 28, 2019, 03:40:15 pm
But do you have a working ADA compiler for your Arise-v2 core yet? :P
Title: Re: Neat algorithms
Post by: legacy on August 28, 2019, 05:04:29 pm
But do you have a working ADA compiler for your Arise-v2 core yet? :P

LOL. Unfortunately, only an asm-compiler  :D
Title: Re: Neat algorithms
Post by: coppice on August 28, 2019, 08:10:03 pm
Division is going to be slow no matter how you do it.
Yeah. Come up with a way to divide that's as fast as multiplying and generations of algorithms working around the slowness of divide go out the window.  :)
Title: Re: Neat algorithms
Post by: jpanhalt on August 28, 2019, 10:49:20 pm
Division is going to be slow no matter how you do it.
Yeah. Come up with a way to divide that's as fast as multiplying and generations of algorithms working around the slowness of divide go out the window.  :)

That's like saying you need a shorter piece of string.

What size data are you dealing with?  128/64 bit?  8/8 bit?  Or something else?  How many Tcy does your current algorithm take?  In other words, there are some fairly fast algorithms for division.  The so-called Kenyan method (aka Russian, and a lot of other names) is one, and PicList has some others.
Title: Re: Neat algorithms
Post by: coppice on August 29, 2019, 12:48:38 am
Division is going to be slow no matter how you do it.
Yeah. Come up with a way to divide that's as fast as multiplying and generations of algorithms working around the slowness of divide go out the window.  :)

That's like saying you need a shorter piece of string.

What size data are you dealing with?  128/64 bit?  8/8 bit?    Or something else?  How many Tcy does your current algorithm take?   In other words, there are some fairly fast algorithms for division.  The so-called Kenyan (aka Russian and a lot of other names) and PicList has  some others.
Obviously for really small numbers of bits you can often use a lookup approach. As the bit count grows you are quickly into successive approximation approaches, typically with a skip-ahead for small values, to avoid the logic exploding to unrealistic numbers of gates. The Russian division technique is basically successive approximation, although it's often expressed in a way that doesn't immediately look like it. After so many years of people failing to find a division equivalent of Wallace or Booth that can be implemented in a realistic number of gates, with all the carry look-ahead needed to avoid severe ripple delays, it seems unlikely we are going to do much better than we do today.
Title: Re: Neat algorithms
Post by: SiliconWizard on August 29, 2019, 12:52:05 am
The fastest way of dividing, of course, being not to divide at all.
 >:D
Title: Re: Neat algorithms
Post by: coppice on August 29, 2019, 12:54:48 am
The fastest way of dividing, of course, being not to divide at all.
 >:D
The fastest way to divide is typically to reformulate the problems until it turns into a multiply.
Title: Re: Neat algorithms
Post by: legacy on August 29, 2019, 05:50:16 am
What size data are you dealing with?

The current VHDL code is parametric, and the data_size for all the ALU modules and all the Cop1 modules can be { 8, 16, 32, 64 } bit. This is not a problem for the simulator, nor for the synthesizers (Xilinx ISE v10.1 for Spartan2, Xilinx ISE v14 for Spartan3-6), but concerning the ISA ... well, the Arise-v2-r2 core is ... 32bit; there is also a 64bit branch, but the people in the group (four friends including me) have 4*3*2 (n!) different opinions regarding the details (d'oh :palm: )

So, let's assume: 32bit

How many Tcy does your current algorithm take?

Currently it takes 1 clock to compute 1 bit, so 32 clock edges to compute the quotient and remainder, both 32 bits in size, plus 2 clocks for handling the sign and the FSM.

Kenyan (aka Russian and a lot of other names) and PicList has  some others.

can you tell me more?
Title: Re: Neat algorithms
Post by: jpanhalt on August 29, 2019, 10:53:02 am
@legacy

Re: "Kenyan" Here you go: http://www.piclist.com/techref/method/math/muldiv.htm (http://www.piclist.com/techref/method/math/muldiv.htm)

Title: Re: Neat algorithms
Post by: jpanhalt on August 29, 2019, 12:35:21 pm
Quote from: legacy


So, let's assume: 32bit

How many Tcy does your current algorithm take?

Currently it takes 1 clock to compute 1 bit, so 32 clock edges to compute the quotient and remainder, both 32 bits in size, plus 2 clocks for handling the sign and the FSM.

I was thinking of embedded microcontrollers, for which (assuming 8-bit MCUs) the fastest routine takes considerably more cycles than the product of the two argument widths, e.g., 24-bit divided by 16-bit (24×16 = 384). For example:

http://www.piclist.com/techref/microchip/math/div/24by16.htm (http://www.piclist.com/techref/microchip/math/div/24by16.htm)
Quote
;Inputs:
;   Dividend - AARGB0:AARGB1:AARGB2 (0 - most significant!)
;   Divisor  - BARGB0:BARGB1
;Temporary:
;   Counter  - LOOPCOUNT
;   Remainder- REMB0:REMB1
;Output:
;   Quotient - AARGB0:AARGB1:AARGB2
;
;       Size: 28
; Max timing: 4+24*(6+6+4+3+6)-1+3+2=608 cycles (with return)
; Min timing: 4+24*(6+6+5+6)-1+3+2=560 cycles (with return)
         
Title: Re: Neat algorithms
Post by: NorthGuy on August 29, 2019, 01:05:51 pm
Currently it takes 1 clock to compute 1 bit, so 32 clock edges to compute the quotient and remainder, both 32 bits in size, plus 2 clocks for handling the sign and the FSM.

16-bit PICs do 16-bit division in 18 cycles (16 + 2). What a coincidence ;)
Title: Re: Neat algorithms
Post by: mrflibble on August 29, 2019, 02:13:48 pm
On the subject of Kenyan/Karatsuba/etc multiplication & division, this book might be of interest:
* https://cosec.bit.uni-bonn.de/science/mca/
* https://cosec.bit.uni-bonn.de/fileadmin/user_upload/science/mca/mcaintro.pdf
* https://assets.cambridge.org/97811070/39032/excerpt/9781107039032_excerpt.pdf
* https://assets.cambridge.org/97811070/39032/index/9781107039032_index.pdf

Specifically chapters 5,6,8,9,14,15. Or probably for the topics at hand in this order: chapters 8, 9, 5, 6, 14, 15.
Title: Re: Neat algorithms
Post by: mrflibble on August 29, 2019, 05:20:30 pm
My intention was to move the talk about guidelines to its own topic, but I failed, sorry.  ;D

Feynman's algorithm for logarithms

Quote from: doi:10.1063/1.881196
Feynman worked out in some detail the program for
computing Hopfield's network on the Connection Ma-
chine.
...

Well, you at least tried to keep it somewhat on topic, which counts for something. Oh, and bonus points for providing the doi:10.1063/1.881196 (https://doi.org/10.1063/1.881196).  :-+ I wish people would use that more often on here. And especially certain people who bang on about providing references ... and then proceed to not provide references themselves. Or they do provide references, but forget to actually check them. Thus ending up with a reference that resolves to, yes the correct author & subject, but alas the wrong publication. Which of course happens to be a condensed version of the original paper, and equally of course, does not contain the particular snippet pertaining to the subject in the forum post. Takes serious practice, that!  ;D

Also, thanks for the link you posted in some other thread: https://www.av8n.com/ (https://www.av8n.com/). There's lots of interesting tidbits there. Especially the bits about ill posed problems and metacognition.

And before I forget, another (IMO) awesome resource is the Euler Archive. In particular the How Euler Did It (http://eulerarchive.maa.org/hedi/) section. From the main page:
Quote from: http://eulerarchive.maa.org/hedi/
How Euler Did It is an online MAA column, written by Ed Sandifer of Western Connecticut State University from 2003 to 2010. Each article examines a specific work or concept developed by Leonhard Euler, with the topics ranging from number theory to geography to fluid mechanics.

The Euler Archive, in collaboration with the MAA, hosts the article collection for the How Euler Did It series. The series was published in two volumes, both volumes are available from the MAA: How Euler Did It (2007), How Euler Did Even More (2014).

And while we're at it Project Euler (https://projecteuler.net/) provides a nice collection of Euler related problems. There are plenty to choose from to provide a bit of a challenge. :)
Quote from: https://projecteuler.net/
What is Project Euler?

Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.

The motivation for starting Project Euler, and its continuation, is to provide a platform for the inquiring mind to delve into unfamiliar areas and learn new concepts in a fun and recreational context.
Title: Re: Neat algorithms
Post by: jpanhalt on August 29, 2019, 05:44:29 pm
Currently it takes 1 clock to compute 1 bit, so 32 clock edges to compute the quotient and remainder, both 32 bits in size, plus 2 clocks for handling the sign and the FSM.

16-bit PICs do 16-bit division in 18 cycles (16 + 2). What a coincidence ;)

That's a good point, and every time I struggle with multiply or divide on my little PIC16F chips, I wonder about making the jump to a PIC24F.  However, I would not consider such slick hardware solutions "neat algorithms."
Title: Re: Neat algorithms
Post by: coppice on August 29, 2019, 05:47:11 pm
Currently it takes 1 clock to compute 1 bit, so 32 clock edges to compute the quotient and remainder, both 32 bits in size, plus 2 clocks for handling the sign and the FSM.

16-bit PICs do 16-bit division in 18 cycles (16 + 2). What a coincidence ;)

That's a good point, and every time I struggle with multiply or divide with my my little PIC16F chips , I wonder about making the jump to a PIC24F.  However, I would not consider such slick hardware solutions, "neat algorithms."
You would if you were a hardware engineer, looking for the fastest or smallest way to implement an instruction set.
Title: Re: Neat algorithms
Post by: mrflibble on September 02, 2019, 06:16:10 pm
Tango trees? Is there any good C implementation? Where is it used?
"Is there any good C implementation?" ... Only several ways to find out!

"Where is it used?"   ... Anywhere you'd use a balanced binary search tree, but need better performance than said balanced binary search tree. For a quick overview, see for example this here link (https://www.geeksforgeeks.org/tango-tree-data-structure/). For more in depth info, just google "Erik Demaine tango trees" (https://duckduckgo.com/?q=erik+demaine+tango+trees). That'll give you plenty of info of various levels of detail to choose from.
Title: Re: Neat algorithms
Post by: westfw on September 05, 2019, 07:20:40 am
Quote
Tango trees?
"Preferred paths ... most recently accessed..."
Sort-of sounds like a formalized and well-analyzed caching scheme, which is sort-of neat...

Title: Re: Neat algorithms
Post by: legacy on September 06, 2019, 11:56:50 am
Anywhere you'd use a balanced binary search tree

yeah, it's log(log(n)). But it looks like it adds more complexity, so it seems to make sense only for large data sets, when "n" is a very big number, so big that log(n) >> log(log(n)).
Title: Re: Neat algorithms
Post by: brucehoult on September 06, 2019, 10:31:46 pm
Anywhere you'd use a balanced binary search tree

yeah, it's log(log(n)). But it looks like it adds more complexity, so it seems to make sense only for large data sets, when "n" is a very big number, so big that log(n) >> log(log(n)).

It looks horribly complex. The code will not be small. For useful data sizes (between 64k nodes and 4G nodes) you need 5 flag bits in every node. That's going to turn into 8 bits if not 16 or 32 in reality.

If you've got that much data then you're probably dealing with storage with either cache lines or TLB or actual mass storage "disk blocks". Something with the property that you have a relatively long latency but then get a large chunk of data very quickly. When you have that kind of characteristic a binary tree is a very poor fit. Once you get that data block locally you should make as much use of it as possible. That means you should be using a B{*,+}-tree not a binary tree for vastly reduced tree depth and hence number of access time delays.

For things small enough to be in local memory with truly random access in constant time I use Scapegoat trees. That's a balanced binary tree with *zero* extra flag data in each node and extremely small and simple code that I can write in about ten minutes from memory in an exam/contest setting. They outperform AVL or red-black trees on an amortised (i.e. throughput) basis. However they are not suitable for real-time uses as on the few occasions when a rebalance is needed it is O(N) on some subtree. Splay trees also have similar performance characteristics, but much more complex code.
Title: Re: Neat algorithms
Post by: brucehoult on September 06, 2019, 11:21:04 pm
Here's a complete program that uses a scapegoat tree to sort lines of text. The whole program is 85 lines. The basic scapegoat tree code including find(), insert() and rebalancing is 43 lines of code, 31 of which are not blank or just "}".

Fifty entries in the global minNodesForDepth table is good for over half a billion nodes in the tree. You might want to replace the init_table() function with a static array initializer.

Code: [Select]
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct node {char *key; node *left; node *right;};
int node_cnt;

#define BAL_OK 0
#define ALPHA 0.666667
#define TABSZ 50
int minNodesForDepth[TABSZ];

void init_table(){
  double A = 1.0/ALPHA;
  for (int n=0; n<TABSZ; ++n, A*=1.0/ALPHA){
    minNodesForDepth[n] = A;
  }
}

void printtree(node *root, int depth=0){
  if (!root) return;
  printtree(root->left, depth+1);
  for (int i=0; i<depth; ++i) printf("  ");
  printf("%s", root->key);
  printtree(root->right, depth+1);
}

node *find(node *root, char *key){
  int cmp = strcmp(key, root->key);
  if (cmp == 0) return root;
  return find(cmp<0 ? root->left : root->right, key);
}

node *makelinear(node *root, node *lst){
  if (!root) return lst;
  root->right = makelinear(root->right, lst);
  node *left = root->left;
  root->left = 0;
  return makelinear(left, root);
}

node *maketree(node *&lst, int n){
  if (n == 0) return 0;
  node *left = maketree(lst, (n-1)/2);
  node *root = lst; lst = lst->right;
  root->left = left;
  root->right = maketree(lst, n/2);
  return root;
}
 
int cntSz(node *root){
  return !root ? 0 : 1 + cntSz(root->left) + cntSz(root->right);
}

int insert(node *&root, node *n, int depth=0){
  if (root == 0){
    root = n;
    return ++node_cnt < minNodesForDepth[depth] ? 1 : BAL_OK;
  }

  int cmp = strcmp(n->key, root->key);
  int sz = insert(cmp<0 ? root->left : root->right, n, depth+1);
  if (sz == BAL_OK) return sz;

  int tot = sz + 1 + cntSz(cmp<0 ? root->right : root->left);
  if (sz <= tot*ALPHA) return tot;
  node *lst = makelinear(root, 0);
  root = maketree(lst, tot);
  return BAL_OK;
}

int main(){
  char buf[1000];
  init_table();
  node *tree = 0;
  while (fgets(buf, 1000, stdin)){
    node *n = (node*)malloc(sizeof(node));
    n->key = strdup(buf);
    n->left = n->right = 0;
    insert(tree, n);
  }

  printtree(tree);
  return 0;
}
Title: Re: Neat algorithms
Post by: legacy on September 07, 2019, 09:08:09 am
Once you get that data block locally you should make as much use of it as possible.

Yup, it's what layer0 and layer1 do.

Layer0: DMA and low-level primitives to read/write a disk-block
Layer1: caching and LRU

Layer1 uses RAM as a disk cache, while Layer0 tries to reduce the number of disk I/O operations by aggregating them into a large chain. This also impacts the "i-block" allocator (find_empty_block), which tries to place i-blocks as close together as possible (not always possible) to facilitate Layer1's job.

That means you should be using a B{*,+}-tree not a binary tree for vastly reduced tree depth and hence number of access time delays.

Yup, in Layer2 I use a "b+tree" because in a "binary tree" a node stores only a key and two pointers to the child nodes, whereas I need each leaf to store as many i-block entries (entries, the glue between folders and files), metadata (folders) or chunks (files' content) as possible, and seeks are always too expensive!

Each trunk and leaf is 4 Kbytes in size in my current implementation, and Layer0 considers that the atomic size of each disk I/O operation (while the PHY layer might only handle 512 or 1024 bytes per command).
Title: Re: Neat algorithms
Post by: SiliconWizard on September 07, 2019, 03:45:21 pm
Note that many software implementations of trees in general, when manipulated in RAM, are a lot less efficient than expected from the algorithms due to accesses all over the place that lead to many cache misses. So in that respect, the choice of algorithm and data structures is key. It's not just a problem with RAM either: any medium that can be accessed more efficiently as consecutive blocks (hard drives also being that way, for instance) is concerned.

Title: Re: Neat algorithms
Post by: brucehoult on September 07, 2019, 05:14:24 pm
Note that many software implementations of trees in general, when manipulated in RAM, are a lot less efficient than expected from the algorithms due to accesses all over the place that lead to many cache misses. So in that respect, the choice of algorithm and data structures is key. It's not just a problem with RAM either: any medium that can be accessed more efficiently as consecutive blocks (hard drives also being that way, for instance) is concerned.

If you're doing really random pointer chasing, such that each access is just as likely to hit a new memory page as a new cache line, then you're likely to lose more performance to TLB refills than to cache misses.

Look at the very popular ARM A53, as used as the LITTLE processor in most current phones and tablets, and the main processor in the Raspberry Pi or lower-end phones.  It typically has 16 KB, 32 KB, or 64 KB of L1 cache (256, 512, or 1024 cache lines), but it has only *10* L1 TLB entries, and a secondary L2 TLB with 512 entries.

I don't know how much time an L1 TLB miss that hits in L2 takes, but as soon as you have more than 40 KB of data that you are jumping around in randomly, you're going to start getting frequent TLB misses as well as cache misses.
Title: Re: Neat algorithms
Post by: Nominal Animal on September 07, 2019, 08:26:01 pm
For in-memory data structures, I like hash tables with chained entries, the hash key stored in the entry, and the table itself an array of pointers:
Code: [Select]
struct hash_entry {
    struct hash_entry *next;
    size_t  hash;
    /* Data */
};

struct hash_table {
    size_t  size;  /* Number of entries in entry pointer array */
    size_t  used;  /* Number of entries in the table */
    struct hash_entry **entry;
    /* pthread_rwlock_t  rwlock; for thread-safe operation */
};
This is definitely not the most efficient way to go about it.  Pointer chaining trades some efficiency for robustness: if you have an entry, insert never fails.  Including the hash value itself in each node trades memory for ease of use, and reduces the number of full comparisons needed when a pointer chain grows long.

Essentially, it trades some efficiency, to keep itself simple and robust, and to avoid any pathological cases (like "you must resize the hash table now or you cannot insert that entry").  Even the thread-safe locking scheme is obvious: a single rw-lock protects the hash table structure.
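A minimal sketch of insert and lookup against those two structures, just to make the shape concrete (hash function, payload fields, locking and entry allocation all left to the caller):

Code: [Select]
#include <stddef.h>   /* size_t, NULL; struct hash_entry / hash_table as above */

/* Insert never fails: the caller supplies an already-allocated entry. */
static void hash_insert(struct hash_table *t, struct hash_entry *e, size_t hash)
{
    size_t i = hash % t->size;
    e->hash = hash;
    e->next = t->entry[i];      /* push onto the head of the chain */
    t->entry[i] = e;
    t->used++;
}

/* Lookup: the stored hash filters the chain; the full comparison
   (supplied by the caller) runs only on hash matches. */
static struct hash_entry *hash_find(struct hash_table *t, size_t hash,
                                    int (*matches)(const struct hash_entry *, const void *),
                                    const void *key)
{
    struct hash_entry *e = t->entry[hash % t->size];
    while (e) {
        if (e->hash == hash && matches(e, key))
            return e;
        e = e->next;
    }
    return NULL;
}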

I would not call this neat, though; it's just simple and robust.
Title: Re: Neat algorithms
Post by: westfw on September 07, 2019, 11:02:01 pm
Quote
many software implementations of trees in general, when manipulated in RAM, are a lot less efficient than expected from the algorithms due to accesses all over the place that lead to many cache misses.
In theory, that doesn't matter, because you're only affecting K in the overall t = K * f(N) performance equation.  If your N really justifies going from a log(N) algorithm to log(log(N)) (Tango Trees, say. No, wait - that's the "competitive ratio", a term I don't understand), then cache misses are going to need to be REALLY REALLY EXPENSIVE before the improvement goes away. I mean, for 2^64 items, log2(n) is 64 and log2(log2(n)) is ... 6.  But going from the O(N) to O(log(N)) is going to pretty clearly be a huge win, even with ~64 "cache misses", compared to 2^64 "perfect" memory accesses.
In reality, of course:
Title: Re: Neat algorithms
Post by: Nominal Animal on September 08, 2019, 12:40:17 am
If comparing the real-world difference between, say, O(N log N) and O(N) algorithms is interesting, implement a radix sort (which is O(N) but with a rather high constant factor, and requires a rather large amount of extra memory), and compare it to other sort algorithms.  To make it a bit more realistic, use key-value pairs as elements to be sorted, say a double and a pointer on 64-bit architectures.

For very small N, insertion sort (https://en.wikipedia.org/wiki/Insertion_sort) is the best choice in practice, even though it has O(N^2) time complexity.  The most common sorting algorithm is probably quicksort (https://en.wikipedia.org/wiki/Quicksort), even though it has a rare O(N^2) worst case and O(N log N) expected time complexity.  In practice, radix sort (which has O(N) time complexity), temporarily converting the IEEE-754 double keys so that all finite values sort according to their unsigned integer representation (flip all bits of negative values, and just the sign bit of non-negative values), will be fastest for large enough N.  However, the exact changeover point depends heavily on the radix size used (which determines the size of the temporary arrays needed), and how well that matches the hardware architecture; and even then, because it is cache-intensive, it can negatively impact surrounding code as it evicts a lot of data from the caches.  The last time I checked this thoroughly, the changeover was at N of roughly a couple of million, but that was a decade ago on x86-64.
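The key transform I mean is roughly this (a sketch; it assumes finite doubles and that double and uint64_t share the same byte order, which holds on the usual platforms):

Code: [Select]
#include <stdint.h>
#include <string.h>

/* Map a finite double to a uint64_t that sorts in the same order:
   negative values get all bits flipped, non-negative values get only
   the sign bit flipped.  Apply the inverse transform after sorting. */
static uint64_t double_to_sortable_u64(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);                    /* raw bit pattern of d */
    return (u & 0x8000000000000000ull) ? ~u
                                       : u ^ 0x8000000000000000ull;
}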



Brucehoult's post and example code (https://www.eevblog.com/forum/programming/neat-algorithms/msg2671086/#msg2671086), which sorts input lines by reading the data into a (scapegoat) tree, is also an excellent example of how the chosen algorithmic approach affects real-world performance.

In the mid-nineties, a C course I took required implementing a simplified 'sort' command in a Unix environment (Solaris, if I recall correctly). I used a very similar approach, reading lines into a binary search tree, implementing the entire program in about 300 lines, a third of which were comments.

This was the era of spinning disk hard drives, and I/O speeds were much slower than they are today with SSD drives and multi-gigabyte RAM sizes (making many workloads completely cacheable in RAM).  Reading the input was the clear bottleneck.  If you first read the input to memory, then sort them, then you wait for all I/O to complete before you start computation (sorting).  If you read the lines into a sorting data structure like a tree, you essentially do the computation while I/O is still underway; and your sort is basically complete, when I/O completes.  This means that the read-then-sort takes much longer, using real-world wall-clock measurement, than reading the lines into a tree; even if the read-to-tree uses somewhat more CPU time.

(Today, if the input is in a file, it is likely cached, and the I/O is essentially free.  You need to use a slow pipe (a network connection or similar), or a slow(ish) generator, to see the difference.)

Just like comparing algorithms based on their big-O notation (https://en.wikipedia.org/wiki/Big_O_notation), even terms like fast must be carefully qualified, to convey real-world applicable information.  When we say "fast", we can refer to CPU time, or wall clock time (or, if we are stupid, our gut feeling about the algorithm).
For us humans, interactive tasks' speed should be measured in real-world wall clock time, but batch and background jobs in CPU time.  Similarly, asymptotic time behaviour is only really indicative of the big-N behaviour, and does not tell us anything about the constant factors.

Even comparing algorithm behaviour via real-world testing must be categorized into at least two sets: microbenchmarks that ignore everything except the task at hand, in an effort to compare apples to apples; and true benchmarks, where tasks similar to or simulating real-world computing tasks are compared to each other.

This means that to be able to sort in a neat way, one should have several different sorting algorithms and approaches in their toolbox, ready to be used.  The neat part, then, is picking the correct ones to match the use cases and user preferences for the tasks at hand.
Title: Re: Neat algorithms
Post by: T3sl4co1l on September 08, 2019, 01:02:48 am
A valuable lesson about modern CPUs is, computation is almost for free.

And by "modern", I mean anything with generous caches, say Pentium 1 or 2 level and up.  Since after all, that's why caches exist!

Don't be afraid to waste computation, as long as it contributes to a faster overall result; given other constraints of course (like total CPU usage or battery consumption).  Concentrate more on IO latency and cache hits than on inner loops.  (If in doubt, profile!)  Now that SMP is pervasive, don't be afraid to parallelize operations.  An example from simulation: you might test a bunch of timesteps at once (dispatching a different matrix initialization and inversion operation to each core); you only need one to work, but you can test a bunch in parallel rather than having to wait for all the failures first.

Tim
Title: Re: Neat algorithms
Post by: coppice on September 08, 2019, 10:31:47 am
A valuable lesson about modern CPUs is, computation is almost for free.
Unless the computation is mobile, where every unnecessary instruction drains the battery a little more.
Title: Re: Neat algorithms
Post by: legacy on September 08, 2019, 01:22:17 pm
This was the era of spinning disk hard drives, and I/O speeds were much slower than they are today with SSD drives and multi-gigabyte RAM sizes (making many workloads completely cacheable in RAM).  Reading the input was the clear bottleneck.  If you first read the input to memory, then sort them, then you wait for all I/O to complete before you start computation (sorting).  If you read the lines into a sorting data structure like a tree, you essentially do the computation while I/O is still underway; and your sort is basically complete, when I/O completes.  This means that the read-then-sort takes much longer, using real-world wall-clock measurement, than reading the lines into a tree; even if the read-to-tree uses somewhat more CPU time.

This is what layer0 and layer1 do on my filesystem, but they are under the constraint that ... there is still no journaling, and you have to guarantee somehow that the consistency of the filesystem is not entirely compromised when the CPU crashes or when the board loses power.

This introduces a trade-off in how large you can make your DMA window, when raw I/O performance tells you to make it (and the time window along with it) as large as possible.

In short, this is telling you to treat a hard drive as if it were a large block of shared non-volatile RAM, with a built-in coherent cache handler.

We will have this in the future (I believe first in PDAs and tablets) as soon as flash storage is replaced by FeRAM at the same production cost (I have made a prototype: a 64-megabyte FeRAM SSD cost 60 euro, and that's MEGAbytes, not GIGAbytes).

In the meanwhile, we still have to deal with RAM that is volatile, so it loses data on crashes, so ... you have to keep the window for moving data from RAM to disk small, even if this means that the block you are writing down to the disk is going to be modified again in a couple of cycles.

The trade-off forces a somewhat myopic view:

You need a reduced time window to guarantee the coherency of the b+tree when its structure is modified, and since you cannot have a large time window, you end up with a lot of cases where you write blocks that you would have been better off keeping in RAM because they represent "temporary files": information that in the long run will never even get copied to the hard drive, and so simply wastes I/O cycles.
Title: Re: Neat algorithms
Post by: legacy on September 08, 2019, 01:30:42 pm
I really like the RISC OS/Classic (<=4.39) filesystem: you are afraid of no crash, because its I/O time window is so narrow that it writes back on almost every data change.

Do you modify a byte? The block is immediately written back to the hard drive.

It's solid as a stone, but it's damn slow  :D
Title: Re: Neat algorithms
Post by: SiliconWizard on September 08, 2019, 03:53:11 pm
Quote
many software implementations of trees in general, when manipulated in RAM, are a lot less efficient than expected from the algorithms due to accesses all over the place that lead to many cache misses.
In theory, that doesn't matter, because you're only affecting K in the overall t = K * f(N) performance equation.  If your N really justifies going from a log(N) algorithm to log(log(N)) (Tango Trees, say. No, wait - that's the "competitive ratio", a term I don't understand), then cache misses are going to need to be REALLY REALLY EXPENSIVE before the improvement goes away.

Well not really. It's all in the break-even point for N for different algorithms on a given platform... (Also, as I hinted above, I'm not talking just about cache misses, but anything that could make accesses more expensive if they are further apart.) This point is not trivial to find theoretically, and the corresponding N could be larger than you may initially think.

And with that said, I'm not just talking about algorithms that have a different asymptotic complexity. Just picking the most efficient algorithm for a given context amongst algorithms of similar complexity is no trivial task. For instance, some sorting algorithms in O(N log N) have better "locality" than others, which makes them more efficient any time locality matters in terms of access time.

With that in mind, many implementations of tree-based algorithms are definitely not very efficient due to locality issues. Good implementations of trees in that respect definitely take some serious thought.
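One simple illustration of a locality-friendly tree (my own example, not a universal recommendation): store a complete binary search tree implicitly in an array, heap-style, so there are no child pointers and the first few levels of every search share the same few cache lines:

Code: [Select]
#include <stddef.h>

/* Search a BST stored implicitly in an array (Eytzinger layout): the root
   is a[0] and the children of a[i] are a[2*i+1] and a[2*i+2].  Assumes the
   values were arranged in that layout beforehand. */
static ptrdiff_t implicit_bst_find(const int *a, size_t n, int key)
{
    size_t i = 0;
    while (i < n) {
        if (a[i] == key) return (ptrdiff_t)i;
        i = (key < a[i]) ? 2*i + 1 : 2*i + 2;
    }
    return -1;   /* not present */
}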
Title: Re: Neat algorithms
Post by: brucehoult on September 09, 2019, 04:37:31 am
A valuable lesson about modern CPUs is, computation is almost for free.
Unless the computation is mobile, where every unnecessary instruction drains the battery a little more.

But orders of magnitude less than an unnecessary memory load, especially from actual RAM.
Title: Re: Neat algorithms
Post by: westfw on September 09, 2019, 06:55:54 am
Quote
It's all in the break-even point for N for different algorithms on a given platform.
Well, sure.  But if you're contemplating moving from an N^2 algorithm to an N*log(N) algorithm because N is getting "big", you probably don't need to immediately worry that you'll end up with an N*log(N) algorithm that is sub-optimal due to cache locality issues...  (What's that saying?: "best is the enemy of better")
(By all means: MEASURE!)
Title: Re: Neat algorithms
Post by: mrflibble on September 16, 2019, 02:26:52 pm
A valuable lesson about modern CPUs is, computation is almost for free.
Unless the computation is mobile, where every unnecessary instruction drains the battery a little more.

But orders of magnitude less than an unnecessary memory load, especially from actual RAM.
Mobile, shmobile. Who the fuck cares about mobile, except for people wanting to make a phone call or people looking to cultivate an attention deficit disorder? Yeah yeah, phone faster than some old chippie in 1960's rocket, got it. Still too slow!

Where can I get one of those modern CPUs where computation is almost free? Or alternatively, where can I adjust this reality's dictionary lookup for "almost"? Sure computation is almost for free, but not nearly almost enough! Not only is there an endless supply of better fools to mess with any attempt at making stuff foolproof, there is also an endless supply of "better" problems to counteract the low low price of computation. Sure, my computer is significantly more capable than whatever I had 10 years ago. Yet, somehow I keep running into problems that aaaaaalmost fit ... but not quite.

Computation may be cheap and memory may be not super expensive, but you never have enough of either. For example, for some number theoretical mucking about I use a list of precomputed primes. Sure, you could run a sieve from scratch every time you need some primes, but that does take longer, uses more energy, and did I mention it takes longer? Besides, at some point the sieve no longer fits in main memory, and you will have to start using external storage. And then you find that the files are so big that you had better compress them. And then you find out that a lookup of "is this number in the list" requires a seek in the list on external storage, which results in a seek in a compressed file, which is generally not all that well supported by your chosen compressor. Oh sure, you can chop it up into several files, but that is just so arbitrary.

A large list of primes on which you want to do fast search operations is also a nice example of where you would like log(log(n)) instead of log(n). For ~ 2^32 entries a regular binary tree takes ~ 32 operations to do a search. If you can do that in log(log(n)) operations you go from 32 to 5. A factor of 6 in runtime can be sufficient motivation to use something a little more complex. Besides, if we all valued simplicity so much, why don't I see an abacus on every desk? :rant:

If a hideous compressed tree of trees of trees is significantly faster than the happy rainbow abacus, I will happily pay the brainfuck once.

And speaking of brainfuck ... https://esolangs.org/wiki/Language_list#B

I've been meaning to play around with BrainHack (https://github.com/Flakheads/BrainHack), but I am too chicken. I think I spot a "Danger! Massive time sink!" sticker on it, but I don't dare take a closer look. :scared:

Oh yeah, almost forgot ... While constructing some random prime trees I noticed that to build a balanced tree from a monotonic sequence for lowest cost (no rearranging of unbalanced trees while inserting) it was easiest to do that using the in-order vs in-memory transformation that Nominal Animal mentioned waaay back. :)
Title: Re: Neat algorithms
Post by: NorthGuy on September 16, 2019, 03:48:34 pm
A large list of primes on which you want to do fast search operations is also a nice example of where you would like log(log(n)) instead of log(n). For ~ 2^32 entries a regular binary tree takes ~ 32 operations to do a search. If you can do that in log(log(n)) operations you go from 32 to 5. A factor of 6 in runtime can be sufficient motivation to use something a little more complex.

If you search primes within 32-bit numbers, you have only 2G numbers to test (after eliminating even numbers which are not primes). So, you only need 2G-bit lookup table - 256MBytes of memory. Today, this is not much by any means. 4GByte lookup table will give you all primes in 40-bit numbers. And this is still quite modest amount of RAM by today's standards. One lookup and you're done. That is how you get things done when you have lots of resources, not by applying older algorithms which were designed to circumvent resource scarcity.
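For illustration, a minimal sketch of that kind of odd-only bitmap (my own throwaway code, not tuned; for limit = 2^32 the table is the 256 MBytes mentioned above):

Code: [Select]
#include <stdint.h>
#include <stdlib.h>

/* Odd-only bitmap covering [0, limit): bit i represents the odd number
   2*i + 1, and a set bit means "composite".  Built with a plain sieve. */
static uint8_t *build_prime_bitmap(uint64_t limit)
{
    uint64_t bits = limit / 2;
    uint8_t *map = calloc((bits + 7) / 8, 1);
    if (!map) return NULL;
    for (uint64_t n = 3; n * n < limit; n += 2)
        if (!(map[(n / 2) >> 3] & (1u << ((n / 2) & 7))))
            for (uint64_t m = n * n; m < limit; m += 2 * n)
                map[(m / 2) >> 3] |= 1u << ((m / 2) & 7);
    return map;
}

/* One lookup and you're done. */
static int is_prime(const uint8_t *map, uint64_t n)
{
    if (n == 2) return 1;
    if (n < 2 || (n & 1) == 0) return 0;
    return !(map[(n / 2) >> 3] & (1u << ((n / 2) & 7)));
}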
Title: Re: Neat algorithms
Post by: SiliconWizard on September 16, 2019, 04:04:38 pm
A large list of primes on which you want to do fast search operations is also a nice example of where you would like log(log(n)) instead of log(n). For ~ 2^32 entries a regular binary tree takes ~ 32 operations to do a search. If you can do that in log(log(n)) operations you go from 32 to 5. A factor of 6 in runtime can be sufficient motivation to use something a little more complex.

If you search primes within 32-bit numbers, you have only 2G numbers to test (after eliminating even numbers which are not primes). So, you only need 2G-bit lookup table - 256MBytes of memory. Today, this is not much by any means. 4GByte lookup table will give you all primes in 40-bit numbers. And this is still quite modest amount of RAM by today's standards. One lookup and you're done. That is how you get things done when you have lots of resources, not by applying older algorithms which were designed to circumvent resource scarcity.

True. Of course it all depends on the application and the machine it will run on. Wasting that much memory in some cases is largely justified, in others not so much. But your point in general is worth considering.

In some cases, there are also ways to circumvent the need and translate it to something else. For instance, here, you may actually not NEED to test any integer number being prime or not. It's sometimes possible to transform your approach by generating prime numbers instead of having to test them, something that's much faster than testing for primality. You sometimes need to think outside the box.

The same idea can be applied in many cases. A trivial one, for instance, would be generating sorted lists. If your application actually gets items one by one, it's much faster to insert each at the right place than to just insert each new one at the end of the list and re-sort the entire list after each, if you need the list to be sorted at all times. It may sound obvious, but in practice you often encounter naive approaches like the latter!
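For the trivial array case, "insert at the right place" is just a binary search for the slot plus a memmove; a sketch (assumes the array has spare capacity):

Code: [Select]
#include <stddef.h>
#include <string.h>

/* Keep an int array sorted as items arrive: O(log n) search + O(n) move,
   instead of appending and re-sorting the whole array every time. */
static void insert_sorted(int *a, size_t *n, int v)
{
    size_t lo = 0, hi = *n;
    while (lo < hi) {                      /* find the first element > v */
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= v) lo = mid + 1; else hi = mid;
    }
    memmove(&a[lo + 1], &a[lo], (*n - lo) * sizeof a[0]);
    a[lo] = v;
    (*n)++;
}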
Title: Re: Neat algorithms
Post by: mrflibble on September 25, 2019, 01:49:17 pm
A large list of primes on which you want to do fast search operations is also a nice example of where you would like log(log(n)) instead of log(n). For ~ 2^32 entries a regular binary tree takes ~ 32 operations to do a search. If you can do that in log(log(n)) operations you go from 32 to 5. A factor of 6 in runtime can be sufficient motivation to use something a little more complex.

If you search primes within 32-bit numbers, you have only 2G numbers to test (after eliminating even numbers which are not primes). So, you only need 2G-bit lookup table - 256MBytes of memory. Today, this is not much by any means. 4GByte lookup table will give you all primes in 40-bit numbers. And this is still quite modest amount of RAM by today's standards. One lookup and you're done. That is how you get things done when you have lots of resources, not by applying older algorithms which were designed to circumvent resource scarcity.
Yeah, my bad. I could have typed a bit more to be more clear. With "a tree with ~ 2^32 entries" I meant a tree with exactly 2^32 - 1 unique entries that are prime numbers. So a binary tree with N = 4294967295 nodes, each node being a prime number, and each number being unique. When viewed as a sorted list, the numbers are not required to be consecutive primes, nor is the smallest number required to be 2. Whatever set is convenient for the problem at hand.

Anyways, the main point being: "4294967295 unique prime numbers", as opposed to "all prime numbers below 4294967296".

So for example a binary tree with 4294967295 nodes, the smallest entry being 2, and the largest entry being 104484802043. With 104484802043 being the 4294967295th prime. Also, log2(104484802043) is about 36.6, so a 37-bit number.

And while we're at it ... a 4 GByte LUT as described (eliminate even numbers, keep track of the rest) unfortunately only gets you up to 36-bit numbers, not 40-bit numbers.

That said, I agree that you should use the resources available, and not use old shit that is no longer relevant. Unfortunately infinity is fucking big, even if it is countable. So even if I limit myself to boring 48-bit primes, that still takes more than a few seconds to sieve. All 32-bit numbers? Sure, about half a second or so. But half a second or so multiplied by 2^16 does take longer than one cup of coffee. And 48-bit is the totally arbitrary constraint that more or less translates to "big enough that measured properties are statistically relevant". And it is also big enough that I need to be more careful in my C++ coding, because while writing to the histogram bins I manage to fuck up some memory access. Segfault, weeeey! Probably in the OpenMP section where I recombine the results of all threads.

And also, the 2^32 was an arbitrary example because of nice round numbers. As in log2(N) = 32, and log2(log2(N)) = 5. I could just as easily have picked 2^64 as an example, the nice round numbers then being 64 and 6.

In some cases, there are also ways to circumvent the need and translate it to something else. For instance, here, you may actually not NEED to test any integer number being prime or not. It's sometimes possible to transform your approach by generating prime numbers instead of having to test them, something that's much faster than testing for primality. You sometimes need to think outside the box.
Absolutely. Transforming a problem into another form is my favorite way of solving problems. It's also the main reason (or so I tell myself ;D) why I try to solve toy problems and little puzzles. Not so much for the actual solution, but more as a good way to learn different ways of solving various problems. For example Project Euler (https://projecteuler.net/) has a nice collection. And after solving a particular problem you can take a look on the forum to see how other people did it. Some of it is "Yeah yeah, I did that too", or "Hah, my way is better!", but there is also plenty of "Doh! Why didn't I think of that?".
Title: Re: Neat algorithms
Post by: SiliconWizard on September 25, 2019, 03:13:03 pm
For example Project Euler (https://projecteuler.net/) has a nice collection. And after solving a particular problem you can take a look on the forum to see how other people did it. Some of it is "Yeah yeah, I did that too", or "Hah, my way is better!", but there is also plenty of "Doh! Why didn't I think of that?".

I just registered for some fun. Ran into some issue with the first problem though: https://projecteuler.net/problem=1
I double and triple-checked my answer, and the site still thinks it's erroneous. Could you check? Maybe I'm just tired. ::)
Title: Re: Neat algorithms
Post by: NorthGuy on September 25, 2019, 04:47:18 pm
Anyways, the main point being: "4294967295 unique prime numbers", as opposed to "all prime numbers below 4294967296".

It is generally irrelevant whether the numbers are consecutive primes or any other numbers selected by any criteria. What matters is the density. If the density is high enough, you can build a lookup table (which has '1' if a number is in the set or '0' otherwise), which may take more space than the tree, but will eliminate the search completely - you just look it up. But since we now have more memory than ever before, we can allow ourselves the luxury of consuming it and make the search really fast.

32-bit primes (or 37-bit, or even 40-bit for that matter) are a bad example because the density is high enough that the lookup table takes less space than the tree. So, you should prefer the lookup table anyway. However, if the density is low, you may use the lookup table anyway, even if it takes more memory than the tree - just because you can summon enough resources. You couldn't do this before, but you can now - hence the benefit of more resources.

Unfortunately, in real world, most of computation resources are simply wasted providing nearly no benefits.

And while we're at it ... a 4 GByte LUT as described (eliminate even numbers, keep track of the rest) unfortunately only gets you up to 36-bit numbers, not 40-bit numbers.

My mistake. 40-bit primes would need 64 GByte table. Would be crazy 20 years ago. Very feasible now.

That said, I agree that you should use the resources available, and not use old shit that is no longer relevant. Unfortunately infinity is fucking big, even if it is countable.

Very true. If you try to count it, even your grandchildren won't be able to finish. But most tasks are finite. I haven't felt constrained by resources on a PC for a long time. If you never hit the limit, it's as good as infinite.
Title: Re: Neat algorithms
Post by: mrflibble on September 25, 2019, 04:55:39 pm
I just registered for some fun. Ran into some issue with the first problem though: https://projecteuler.net/problem=1
I double and triple-checked my answer, and the site still thinks it's erroneous. Could you check? Maybe I'm just tired. ::)
Just solved it, so seems to check out fine for me. Mainly two sanity checks:
- if you plug 10 into your code, does it give the known correct answer of 23?
- if you plug in 20, does it then return 78, or something else?
Those two are simple enough that you can check 'em by hand, and they will catch the typical off-by-one bug, the other candidate in this case being double counting.

Anyways, seems to work fine. Maybe just add some coffee. ;D
Title: Re: Neat algorithms
Post by: SiliconWizard on September 25, 2019, 05:57:27 pm
Anyways, seems to work fine. Maybe just add some coffee. ;D

Damn. Turns out I needed some coffee. Nevermind. ;D
Title: Re: Neat algorithms
Post by: T3sl4co1l on September 25, 2019, 07:57:47 pm
Hmm, this should be the solution, no?  In handy JS format, so you can pop open F12 and copy-paste it.

Code: [Select]

[SPOILERS]

.
.
.


.
.
.


.
.
.


function summultiples(below) {

function triangular(n) { return (n*n + n) / 2; };

below--;
var threes = triangular(Math.floor(below / 3)) * 3;
var fives = triangular(Math.floor(below / 5)) * 5;
var fifteens = triangular(Math.floor(below / 15)) * 15;
return threes + fives - fifteens;
}

Tim
Title: Re: Neat algorithms
Post by: SiliconWizard on September 25, 2019, 08:58:36 pm
Yes, that's basically how I handled it. Just use the well known sum of the series 1+...+n and factor things. I was using Calc (the "C-style arbitrary precision calculator") and introduced a stupid typo I couldn't immediately spot. That happens... ;D
Title: Re: Neat algorithms
Post by: westfw on September 26, 2019, 01:08:30 am
For an "embedded systems" interview , consider the "obvious" C implementation:
Code: [Select]
long summultiples(long below)
{
    long i, sum=0;
    for (i=3; i < below; i+=3)
        sum += i;
    for (i=5; i < below; i+=5)
        sum += i;
    for (i=15; i < below; i+=15)
        sum -= i;
    return sum;
}
Title: Re: Neat algorithms
Post by: T3sl4co1l on September 26, 2019, 02:50:06 am
Well the divisions don't matter, they are by constants so can be transformed to clever multiplications by a sufficiently clever compiler. :)
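For instance, an unsigned divide by 5 usually comes out as a multiply by a "magic" reciprocal plus a shift, something like this (constants for 32-bit operands; each divisor gets its own magic pair):

Code: [Select]
#include <stdint.h>

/* n / 5 for any 32-bit unsigned n, with no divide instruction:
   0xCCCCCCCD is ceil(2^34 / 5), so the 64-bit product shifted right
   by 34 floors to n / 5 for every n below 2^32. */
static uint32_t div5(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 34);
}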

Offhand, for the AVR instruction set I'm familiar with (8 bit, 8x8 mul), you'd expect... well heck, with longs all over, that's a 4-byte access for basically every statement, that's a big stinker right there.  Not a great example.  Well, er, if we stick with the minimum required for the problem as given (1000 requires just a little over 8 bits, so we should use uint16_t's), we need about 9 cycles per loop, plus some overhead per loop, and times three loops.  For argument's sake, let's say cycles = 3 * (9n + 4) + 17.  In comparison, the given method would incur about 40 cycles for the multiply (maybe a bit pessimistic, including call overhead?), so it doesn't take much N to be worth being clever (~9?).

And then for ARM, M0, it'll probably be about a cycle per instruction, and just a handful of cycles (3-5?) per loop, versus a 32 or 64 bit long-hand multiply -- which if it's one bit per cycle algorithm, will take quite a few cycles to complete, but I think a binary sequence is possible thanks to the one-cycle barrel shifter? are we assuming that's always implemented as well? -- so we'd be looking at >200 cycles there; but a lot fewer if we limit it to the 16-bit requirement (or for that matter, we can stop the long-multiply algorithm at the known required 10 bit level; or 9 bits even?).

If we inspect the overall expression (I didn't before, out of laziness), we can collect the N^2's together with a coefficient of (1/a + 1/b - 1/ab)/2, and the N's sum to just N/2.  a and b are known at compile time so the coefficient is as well, and the N*N*coefficient can be expressed as a triple product using at most... six? multiplies.  So the AVR (16 bit precision) is probably closer to 120 cycles (flat), in this case.  Which isn't that far off from the no-multiply ARM actually, interestingly enough (granted, the ARM likely can run faster; well, huh, if it's so basic it doesn't have mul, it might not even capable of 60MHz?).


I was tempted to write a variadic function, or, well, I wouldn't do that honestly, but just supply an array of factors to test - same idea, since arrays in JS are variable length anyway.  Anyway, that would require resolving the correct combination of factors (easy for two prime factors, as seen above; but what about others?), and also accounting for common factors (coprimality).  So there may end up being some factorization required, which would be dumb.  Maybe that's not necessary, I'd have to un-lazily think about it a little bit.  If coprimality doesn't matter, then it should suffice to subtract all composite terms, and there you have it; easily looped over.  The result would be O(number of factors) I think?

Tim
Title: Re: Neat algorithms
Post by: westfw on September 26, 2019, 07:50:39 am
Quote
a 4-byte access for basically every statement, that's a big stinker right there.
Well, yeah.  But summultiples(1000) doesn't fit in 16bits...

Quote
divisions don't matter, they are by constants so can be transformed to clever multiplications by a sufficiently clever compiler.
That'd be nice.  It doesn't seem to happen with either the ARM or AVR gcc compilers, though.  At least, not for 3, 5, and 15.

Here are the size results for AVR, CM4, CM0.   They're "somewhat interesting."

WWMac<6115> # AVR without Multiply instruction
WWMac<6116> avr-gcc -mmcu=attiny84 -Os -nostartfiles prob1.c -DFASTALG=0
WWMac<6117> size a.out
   text    data     bss     dec     hex filename
    242       0       4     246      f6 a.out
WWMac<6118> avr-gcc -mmcu=attiny84 -Os -nostartfiles prob1.c -DFASTALG=1
WWMac<6119> size a.out
   text    data     bss     dec     hex filename
    618       0       4     622     26e a.out

(OK, that's pretty dramatic.)


WWMac<6120> # AVR with multiply instructon
WWMac<6121> avr-gcc -mmcu=atmega328 -Os -nostartfiles prob1.c -DFASTALG=0
WWMac<6122> size a.out
   text    data     bss     dec     hex filename
    244       0       4     248      f8 a.out
WWMac<6123> avr-gcc -mmcu=atmega328 -Os -nostartfiles prob1.c -DFASTALG=1
WWMac<6124> size a.out
   text    data     bss     dec     hex filename
    650       0       4     654     28e a.out

(Interesting - I wasn't expecting it to be bigger.  Much faster, probably, using 8bit multiplies to do 32bit multiplication, but not as tight a loop.)

WWMac<6125> # ARM CM4
WWMac<6126> /usr/local/gcc-arm7-2018-q2u2/bin/arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -g -Os -nostartfiles prob1.c -DFASTALG=0
WWMac<6127> size a.out
   text    data     bss     dec     hex filename
     76       0       4      80      50 a.out
WWMac<6128> /usr/local/gcc-arm7-2018-q2u2/bin/arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -g -Os -nostartfiles prob1.c -DFASTALG=1
WWMac<6129> size a.out
   text    data     bss     dec     hex filename
    108       0       4     112      70 a.out

(This is more like it!)


WWMac<6130> # ARM CM0plus
WWMac<6131> /usr/local/gcc-arm7-2018-q2u2/bin/arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -g -Os -nostartfiles prob1.c -DFASTALG=0
WWMac<6132> size a.out
   text    data     bss     dec     hex filename
     76       0       4      80      50 a.out
WWMac<6133> /usr/local/gcc-arm7-2018-q2u2/bin/arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -g -Os -nostartfiles prob1.c -DFASTALG=1
WWMac<6134> size a.out
   text    data     bss     dec     hex filename
    600       0       4     604     25c a.out

(holy crap.  I knew there was some "not optimized" code in the CM0, but I thought it was mostly off in the floating point code.  But here, the 32bit divide function is really 460 bytes long!)
Title: Re: Neat algorithms
Post by: westfw on September 26, 2019, 09:16:52 am
Quote
divisions  can be transformed to clever multiplications
That'd be nice.  It doesn't seem to happen with either the ARM or AVR gcc compilers, though.
Ah.  The ARM gcc DOES do this on the M4 (where it barely matters.)  But not on the M0plus.  Presumably it's upset that there is no 32*32->64bit multiply.  Hmmph.


The division routine is well unrolled, presumably for performance (but I wouldn't want to predict how it interacts with typical M0/M0+ flash memory systems.)  I'm sure it's all swell if you're running on one of the chips with 128k+ of flash, but perhaps not so much on those 4k chips "they" are trying to push on the 8bit users.  :-(  gcclib does have separate code for __OPTIMIZE_SIZE__, but I don't know if anyone is distributing binaries that use that.  (Huh.  Arduino does.  Using the Arduino ARM compiler distribution, size for the M0 shrinks to 324 bytes total, with a 172-byte division function.)
Title: Re: Neat algorithms
Post by: legacy on September 26, 2019, 10:02:18 am
(http://www.downthebunker.com/chunk_of/stuff/public/projects/akita/pic/akita_2_mini.jpg)

Interview in Denmark


Moral of the story: a piece of Erlang code can save your day better than 200 or 2K lines of C can.
And even give you a free ticket to Greenland :D
Title: Re: Neat algorithms
Post by: SiliconWizard on September 26, 2019, 02:10:08 pm
Just a note: this problem, https://projecteuler.net/problem=70, is typical of what I said earlier about primes and thinking outside the box. A naive approach will take forever to complete. I managed to implement it so that it takes about 30 ms to complete (and give a correct answer ;D ) on my Core i7/5930K.

Not a very hard one, but if you haven't done it yet, you'll probably spend a little while...
Title: Re: Neat algorithms
Post by: SiliconWizard on September 26, 2019, 02:17:26 pm
(...)
  • me - something like (... I pulled out my PDA from my pocket, and wrote the code to show him the result ... ) qsort([]) -> []; qsort([Pivot|Rest]) -> qsort([ X || X <- Rest, X < Pivot]) ++ [Pivot] ++ qsort([ Y || Y <- Rest, Y >= Pivot])
(...)

Ahah, nice. Too bad Erlang sometimes almost looks like the Brainfuck language: https://en.wikipedia.org/wiki/Brainfuck
 :-DD (just slightly exaggerating)

Oh, and while we're having a laugh, let's laugh (now some people may frown): here is the Whitespace language: https://en.wikipedia.org/wiki/Whitespace_%28programming_language%29
which can be loosely described as some variant of Brainfuck for Python users. :-DD
Title: Re: Neat algorithms
Post by: taktoa on December 01, 2019, 09:27:39 pm
One bleeding-edge algorithm quite relevant to electronics engineering that I find is not very well known is "compressive sensing". Every ADC depends on the Shannon-Nyquist theorem, which shows that if you sample at double the max frequency of a bandlimited signal, you can perfectly reconstruct it. What is not well known is that that is a sufficient condition for perfect reconstruction; if you make some assumptions about the signals you are likely to receive, you can do much better (i.e.: reconstruct a signal with fewer samples). In compressive sensing, the assumption is that there is some linear transformation you can apply to the signal that makes it sparse (have a lot of zeros); in other words, the signal must be compressible in some sense (noise is typically not compressible, but you also typically don't care if the noise in the signal you are trying to measure is replaced with different noise that has a similar distribution).

The typical setup for compressive sensing is that you have some matrix/linear transform F that, when applied to a signal, gives you a signal that has small values. For example, the discrete cosine transform has this property when applied to photographs (this is how JPEG works). Then you cook up a matrix M that, when multiplied by F on the right, satisfies this technical thing called the restricted isometry property (RIP; this technicality doesn't actually matter much, for reasons I'll get into later).  Then you take each row of the matrix and you find some physical way to compute the dot product of this row with the vector representing your signal (for example, if you want to compute the dot product of an optical image with an arbitrary vector, you could bounce it off of a digital micromirror device and then focus it into a lens that is shining onto a single pixel). This gives you a new vector y, which has one element for each row of your matrix. Then you just have to solve the underconstrained (because the number of rows of MF / number of measurements is intentionally small) linear system MFx = y. Although it is underconstrained, we know something special about x: it has a lot of zeros! So we try to find the solution that maximizes the number of zeros, i.e. minimizes the so-called L0 "norm" (I use quotation marks because it is not actually a norm). It turns out that this is really difficult -- NP-complete, but the RIP we posited earlier saves us, since there is a theorem that if MF has the RIP and you minimize the L1 norm instead of L0, you get a solution to the L0 minimization problem. There is also a theorem that almost all matrices satisfy the RIP (in the sense that a randomly chosen matrix satisfies it with overwhelming probability), so it suffices to simply choose M at random.

Some cool applications of this are single-pixel cameras (using the DMD approach I mentioned earlier), better MRI reconstruction, etc. I tried implementing one of the core algorithms (the "in-crowd" algorithm for basis pursuit denoising) required to do this, but haven't finished.
Title: Re: Neat algorithms
Post by: coppice on December 01, 2019, 09:36:25 pm
One bleeding-edge algorithm quite relevant to electronics engineering that I find is not very well known is "compressive sensing".
Compressive sensing is an interesting concept. It has been discussed on this forum a few times.
Title: Re: Neat algorithms
Post by: Yansi on December 01, 2019, 09:41:36 pm
Does anyone have any recommendation for a rather fast  single precision float logarithm for ARM? (M4, M7 cores with FPU)?

I don't care whether it's natural log or log2; I'll take the faster one!  (Needed for some real-time audio dynamics processing.)  I'm guessing log2 will be the easiest of them, due to the float format storing the mantissa and exponent separately: take the log2 of just the mantissa and add the exponent.
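Something along these lines is what I have in mind - a rough, untested sketch, where the mantissa correction is just a crude quadratic fit (exact at the ends of the octave, off by up to roughly 0.01 bit, i.e. a few hundredths of a dB):

Code: [Select]
#include <math.h>

/* Fast-ish log2: exponent from frexpf(), mantissa approximated with a
   quadratic that is exact at m = 1 and m = 2.  Positive, finite input only. */
static float fast_log2f(float x)
{
    int e;
    float m = 2.0f * frexpf(x, &e);          /* x = m * 2^(e-1), 1 <= m < 2 */
    return (float)(e - 1) + (2.0f - m / 3.0f) * m - 5.0f / 3.0f;
}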
Title: Re: Neat algorithms
Post by: taktoa on December 01, 2019, 09:58:13 pm
Another neat algorithm is the one for regular expression matching implemented by a tool called icgrep. Essentially, if you have a regular expression R, you look at all the primitive matching expressions (e.g.: character classes and characters), and you create a bitstream with 1 bit for every character in the text you are trying to search through, which is 1 if the character class matches at that position and 0 if it doesn't. Then you build up more complicated regexes by taking bitwise OR (for union) or bitwise AND (for intersection) or bitwise NOT (for complement) to produce new bitstreams. The question then remains: how do we handle the * operator? It turns out that we can express it in terms of + in a clever way -- it exploits the carrying nature of +, though I don't remember the exact details on how it works.

EDIT: here's the paper: https://dl.acm.org/citation.cfm?id=2628079
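If I recall the paper correctly, the * step is the carry trick below; a toy version on a single 64-bit word (bit 0 = first character of the text), with M the marker bitstream and C the character-class bitstream - treat the exact formula as from memory rather than gospel:

Code: [Select]
#include <stdint.h>

/* Advance markers M through zero or more characters of class C.
   Adding C to the markers sitting on class positions lets the carry
   ripple through each run of 1s in C; XOR with C then recovers every
   position reached inside the run plus the one just past it. */
static uint64_t match_star(uint64_t M, uint64_t C)
{
    return (((M & C) + C) ^ C) | M;
}

/* Example: C = 0b0111 (first three characters are in the class) and
   M = 0b0001 (a match attempt starts at character 0) gives
   match_star(M, C) == 0b1111: matching may resume at characters 0..3. */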
Title: Re: Neat algorithms
Post by: T3sl4co1l on December 02, 2019, 05:49:34 am
Does anyone have any recommendation for a rather fast  single precision float logarithm for ARM? (M4, M7 cores with FPU)?

I don't care whether it's natural log or log2; I'll take the faster one!  (Needed for some real-time audio dynamics processing.)  I'm guessing log2 will be the easiest of them, due to the float format storing the mantissa and exponent separately: take the log2 of just the mantissa and add the exponent.

With FPU?  Just use the compiler as is? ???  I don't know that you'll do much faster than that...

For integer machines with multiply, repeated squaring and shifting gets subsequent bits of the result (the MSB shifted out is the next bit of the result); similarly, for machines without multiply, there's the Feynman algorithm (which I think was listed earlier in this thread?).  Expect to take ballpark a hundred instruction cycles to compute results of typical width on typical machines.

Tim
Title: Re: Neat algorithms
Post by: Berni on December 02, 2019, 07:48:19 am
Does anyone have any recommendation for a rather fast  single precision float logarithm for ARM? (M4, M7 cores with FPU)?

I don't care whether it's natural log or log2; I'll take the faster one!  (Needed for some real-time audio dynamics processing.)  I'm guessing log2 will be the easiest of them, due to the float format storing the mantissa and exponent separately: take the log2 of just the mantissa and add the exponent.

Yep just use logf(x) or log2f(x) from math.h

The C compiler's built-in libraries tend to have reasonable performance. But if you want things to be really fast and don't mind some precision loss, you can use a lookup table with interpolation.

For fast, optimized ways of doing more complex math there is the official ARM CMSIS-DSP library, which contains hand-optimized inline assembler code. It supports all sorts of math with vectors, complex numbers, matrices, FFT, etc. in both float and fixed point.
Title: Re: Neat algorithms
Post by: Yansi on December 02, 2019, 11:44:06 am
You guys are wrong.

The C math library is not optimized for speed, but for best accuracy. I do not need accuracy to the last bit or digit, hence I can make some approximations to make things significantly faster.

For example, by using fastlog2 from this library, I can get pretty good accuracy (from a brief test, better than 0.1%) for less than half the clock cycles. The task for me is just to figure out how much accuracy I actually need.  As it is for an audio application, you are probably not going to hear a difference of 0.1 dB, let alone 0.01 dB.

CMSIS-DSP is not always the best-suited library for everything, nor does it provide any logarithm or exp (power) functions - at least I cannot see one.

The library log2f runs in about 140 cycles on a Cortex M4 (ARMCC, -O0).  I have a lot of processing power left, so currently it can run even with these monstrosities, but I don't like it much. :)

EDIT: Forgot the link to the library! https://github.com/romeric/fastapprox
Title: Re: Neat algorithms
Post by: nigelwright7557 on December 05, 2019, 03:03:19 pm
If you are creating new algorithms, and especially complicated ones, always flowchart them first and get that right before coding.
One of the most complicated bits of code I wrote was a circuit board auto-router.
Even after careful flow charting it still wasn't quite right.
You can get caught out by things like getting stuck in infinite loops going around and around.
Testing is absolutely vital and you need to throw lots of different data at an algorithm to try and catch it out.


Title: Re: Neat algorithms
Post by: SiliconWizard on December 05, 2019, 04:43:42 pm
As we discussed earlier, sometimes the best approach to implement a costly operation is to do without it altogether. You can often rewrite your computations so that they'll use simpler/faster operations.

For instance, it's very common to avoid using the square root function. Likewise, you can often transform your formulas so you can avoid using log.
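The classic square root example being distance comparisons: comparing squared values gives the same answer with no sqrtf() at all:

Code: [Select]
/* Is the point (dx, dy) within 'limit' of the origin?  Same result as
   sqrtf(dx*dx + dy*dy) < limit for non-negative limit, without the sqrt. */
static int within_distance(float dx, float dy, float limit)
{
    return dx*dx + dy*dy < limit*limit;
}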

If this approach fails for any reason, and you absolutely need a log function, there are ways to implement it efficiently (at the expense of accuracy). There are polynomial approximations, for instance, which I think is what the above posted library does (along with a trick using IEEE FP format). It should be pretty fast already. If it's still not fast enough, and you can deal with even less accuracy, there's a very simple way of getting a log2(x). Scale x so you make it an integer with the max resolution you can, and then you can compute the integer log2(n) ultra fast using the ARM clz instruction for instance. If you're using GCC and don't want to mess with assembly directly, you can use one of the __builtin_clz() built-in functions.

Example of fast integer log2 in C (GCC only though!):

Code: [Select]
static inline uint8_t Log2Int(uint32_t n)
{
    /* Assumes n != 0 (clz of 0 is undefined).  Note: __builtin_clz (unsigned int)
       matches the 32-bit argument; __builtin_clzl would count 64-bit leading
       zeros on LP64 targets and give the wrong result. */
    return (uint8_t)(8*sizeof(n) - 1 - ((uint8_t) __builtin_clz(n)));
}

Then you can scale it back: scale x with a multiplication such that n = [a*x], which gives log2(x) ≈ Log2Int(n) - log2(a).
Of course the resolution is limited, but it could be enough.
If you need more accuracy, you can further compute the fractional part of the log2 up to a certain number of bits. I wrote the following function for instance:

Code: [Select]
// Binary log2(n) with nFrac fractional bits.
// Assumes: n >= 1
//
float BinaryLog2b(uint32_t n, uint8_t nFrac)
{
    const uint32_t n1 = 1;
    uint8_t nLog2, i;
    float x, y, b;

    nLog2 = Log2Int(n);
    x = ((float) n) / ((float)(n1 << nLog2));
    y = (float) nLog2;

    for (i = 0, b = 0.5f; i < nFrac; i++, b *= 0.5f)
    {
        x = x * x;
        if (x >= 2.0f)
        {
            y += b;
            x *= 0.5f;
        }
    }

    return y;
}

Tested on x86-64: with nFrac = 8, for instance, it's still 3 times faster than the std math lib log2() function, and with a decent accuracy. But it takes an integer input. Could typically be used with fixed-point, or after scaling a FP number as I said.


Title: Re: Neat algorithms
Post by: SiliconWizard on December 05, 2019, 04:57:50 pm
Testing is absolutely vital and you need to throw lots of different data at an algorithm to try and catch it out.

I completely second this. Too many people IMO, particularly those that are using unit testing (which is already a good thing), are overlooking this because of the "safety net" feeling.

Feeding your algorithms with A LOT of data, especially data from sources that are not YOU, makes all the difference. IME, the amount of bugs (and their "trickiness") you can catch this way is an order of magnitude higher than with unit testing alone (which doesn't mean you shouldn't do unit testing...)

Title: Re: Neat algorithms
Post by: NorthGuy on December 05, 2019, 07:28:09 pm
Testing is absolutely vital and you need to throw lots of different data at an algorithm to try and catch it out.

One of the best testing method is trying to deliberately create a situation which would break your algorithm.
Title: Re: Neat algorithms
Post by: mrflibble on February 27, 2020, 12:18:31 am
Recently came across a couple of linear programming formulations for graph theoretical problems. Which was pretty handy, because it was exactly what I was googling for at that time. Imagine that. ;D

The problem in particular was how to constrain a graph such that it will not contain any loops. Turns out you can do that by making an LP for the maximum average degree of the graph in question. See chapter 11 in this here pdf:
https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/graph-theory-algorithms-book/latest-r1908.pdf

Related links:

For those interested, be sure to check the links for both the old and the new google code archive. The old & new both contains unique and IMO useful stuff that is not included in the other. Case in point, the new one does not contain the Linear Programming chapter. And best grab the bitbucket repo while you can. No way of knowing if (read: when) bitbucket is going further down the drain.