Author Topic: Neat algorithms  (Read 28565 times)


Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7547
  • Country: 00
  • +++ ATH1
Re: Neat algorithms
« Reply #125 on: August 25, 2019, 07:04:21 am »
If you have a nice recursive algorithm but fear overrunning the stack, you can easily keep track of your recursion depth with an additional parameter that you decrement on each call and compare to a (hopefully reasonably chosen) maximum recursion depth, exiting with an error when the threshold has been reached.

This obviously uses extra stack space (and a little performance), so it's up to you to decide whether you want to keep it in production code, or remove it and call your testing (and the additional safety margin due to its removal) good enough to be safe.

I'm not an expert in programming, nor very experienced.

I've been wondering about the "control" of stack usage in recursive programming, seeing people raise lots of concerns.

My simple question is: if you run a recursive program without any control over, or awareness of, the stack limit, isn't that like using a CPU that by design can't handle, say, a divide by zero (e.g. via an NMI or the like) in the first place?

Edit : Simpler analogy: driving a tank with your eyes closed, so you can't really be sure whether you're crushing your enemy (expected & wanted) or your friends.  :-DD

Cmiiw.
« Last Edit: August 25, 2019, 07:19:10 am by BravoV »
 

Online RoGeorge

  • Super Contributor
  • ***
  • Posts: 6202
  • Country: ro
Re: Neat algorithms
« Reply #126 on: August 25, 2019, 07:13:12 am »
There is another topic about following or not the programming guidelines.

Quoting from the attached PDF:  https://www.eevblog.com/forum/chat/dogmatic-coding-standards/msg2636658/#msg2636658
Quote
Avoiding recursion results in having an acyclic function call graph, which code analyzers can exploit to prove limits on stack use and boundedness of executions.

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Neat algorithms
« Reply #127 on: August 25, 2019, 07:24:17 am »
There is another topic about following or not the programming guidelines.

Quoting from the attached PDF:  https://www.eevblog.com/forum/chat/dogmatic-coding-standards/msg2636658/#msg2636658
Quote
Avoiding recursion results in having an acyclic function call graph, which code analyzers can exploit to prove limits on stack use and boundedness of executions.

All that does is move the problem from proving that you won't recurse too deeply to proving that you won't run out of heap space, or that the array you statically allocated for your hand-managed stack is large enough (and that you didn't write buggy code managing it).
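
To make the trade concrete, here is a hedged C sketch (the struct, the bound, and the error convention are all illustrative): a recursive tree walk rewritten with a statically allocated, hand-managed stack. The overflow check doesn't disappear; it just becomes an array-bound check you now have to get right yourself.

Code: [Select]
#include <stddef.h>

struct node { int value; struct node *left, *right; };

#define MAX_DEPTH 64   /* still has to be proven large enough! */

/* Iterative pre-order walk; returns -1 if the hand-managed stack
   would overflow -- the same failure mode, relocated. */
int walk_iterative(struct node *root, void (*visit)(struct node *))
{
    struct node *stack[MAX_DEPTH];
    size_t top = 0;

    if (root)
        stack[top++] = root;

    while (top > 0) {
        struct node *n = stack[--top];
        visit(n);
        /* push right first so the left subtree is visited first */
        if (n->right) {
            if (top == MAX_DEPTH) return -1;
            stack[top++] = n->right;
        }
        if (n->left) {
            if (top == MAX_DEPTH) return -1;
            stack[top++] = n->left;
        }
    }
    return 0;
}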
 

Offline mfro

  • Regular Contributor
  • *
  • Posts: 210
  • Country: de
Re: Neat algorithms
« Reply #128 on: August 25, 2019, 07:32:26 am »
... I've been wondering about the "control" of stack usage in recursive programming, seeing people raise lots of concerns.

My simple question is: if you run a recursive program without any control over, or awareness of, the stack limit, isn't that like using a CPU that by design can't handle, say, a divide by zero (e.g. via an NMI or the like) in the first place?

I think that is more a question of how to avoid/detect stack overflow in any program, not just those that use recursive algorithms.

Any program where the call stack depth depends on run-time values is potentially prone to stack overflow, i.e. refraining from recursion is no guarantee of avoiding it (it's just that recursion makes it more likely, as it increases stack load).

There are methods to detect a stack overflow, ranging from the poor man's stack guard, where you vaccinate the bottom of the stack with a 'magic value' that you regularly check for overwrites, to unmapped pages below the stack on architectures that have an MMU.
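
A minimal sketch of that poor man's stack guard in C; the guard size and magic value are illustrative, and how you name the bottom of the stack region depends on your linker script:

Code: [Select]
#include <stdint.h>

#define GUARD_WORDS 16
#define GUARD_MAGIC 0xDEADBEEFu

/* Illustrative: lowest words of the stack region, e.g. a linker symbol. */
extern uint32_t __stack_bottom[];

/* Paint the guard once at startup. */
void stack_guard_init(void)
{
    for (unsigned i = 0; i < GUARD_WORDS; i++)
        __stack_bottom[i] = GUARD_MAGIC;
}

/* Check regularly (e.g. from a timer tick); 0 means the stack has
   grown into the guard area and all bets are off. */
int stack_guard_ok(void)
{
    for (unsigned i = 0; i < GUARD_WORDS; i++)
        if (__stack_bottom[i] != GUARD_MAGIC)
            return 0;
    return 1;
}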

Small µCs usually do not have the luxury of hardware-supported faults on stack overflow, leading to the usual 'common sense' approach of banning everything that might cause one from µC code (MISRA-C, for example, explicitly prohibits recursion).
Beethoven wrote his first symphony in C.
 

Online RoGeorge

  • Super Contributor
  • ***
  • Posts: 6202
  • Country: ro
Re: Neat algorithms
« Reply #129 on: August 25, 2019, 07:56:07 am »
My intention was to move the talk about guidelines to its own topic, but I failed, sorry.  ;D

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21686
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Neat algorithms
« Reply #130 on: August 25, 2019, 09:46:39 am »
If you have a nice recursive algorithm but fear overrunning the stack, you can easily keep track of your recursion depth with an additional parameter that you decrement on each call and compare to a (hopefully reasonably chosen) maximum recursion depth, exiting with an error when the threshold has been reached.

This obviously uses extra stack space (and a little performance), so it's up to you to decide whether you want to keep it in production code, or remove it and call your testing (and the additional safety margin due to its removal) good enough to be safe.

Or use a static variable and increment it on every call and decrement it on every return. :)

(Disclaimer: may break tail recursion optimization?)
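
A minimal sketch of that in C (the types and limit are illustrative). Note that the decrement after the recursive calls is exactly what can defeat tail-call optimization, since the last operation is no longer the call:

Code: [Select]
#include <stddef.h>

struct node { struct node *left, *right; };

#define MAX_DEPTH 32

static unsigned depth;   /* current recursion depth */

int walk(const struct node *n)
{
    if (n == NULL)
        return 0;
    if (++depth > MAX_DEPTH) {   /* too deep: unwind with an error */
        --depth;
        return -1;
    }
    int rc = walk(n->left);
    if (rc == 0)
        rc = walk(n->right);
    --depth;                     /* must run after the calls return */
    return rc;
}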

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Neat algorithms
« Reply #131 on: August 25, 2019, 11:48:35 am »
If you have a nice recursive algorithm but fear overrunning the stack, you can easily keep track of your recursion depth with an additional parameter that you decrement on each call and compare to a (hopefully reasonably chosen) maximum recursion depth, exiting with an error when the threshold has been reached.

This obviously uses extra stack space (and a little performance), so it's up to you to decide whether you want to keep it in production code, or remove it and call your testing (and the additional safety margin due to its removal) good enough to be safe.

I'm not an expert in programming, nor very experienced.

I've been wondering about the "control" of stack usage in recursive programming, seeing people raise lots of concerns.

My simple question is: if you run a recursive program without any control over, or awareness of, the stack limit, isn't that like using a CPU that by design can't handle, say, a divide by zero (e.g. via an NMI or the like) in the first place?

Edit : Simpler analogy: driving a tank with your eyes closed, so you can't really be sure whether you're crushing your enemy (expected & wanted) or your friends.  :-DD

Cmiiw.
The whole issue of blowing the stack is a big deal for any program. Recursive routines are just a notoriously troublesome example.

In most cases it is very difficult to work out the worst-case stack requirement, and make sure you have allowed enough stack space for that. On bigger machines most people do little more than allow lots of stack space and hope for the best. On small embedded machines you have to get your stack allocation right, as there is little RAM to waste. You might get some help from static analysis tools, but those tools are not generally foolproof. For example, they usually cannot work out the worst-case program call depth combined with the worst-case mix of nested interrupts.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Neat algorithms
« Reply #132 on: August 25, 2019, 02:50:00 pm »
In most cases it is very difficult to work out the worst-case stack requirement, and make sure you have allowed enough stack space for that.

I don't think so. At any rate, it's much harder to predict heap behaviour, which, unlike the stack, can become fragmented, possibly after days or months of running.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: Neat algorithms
« Reply #133 on: August 25, 2019, 02:57:45 pm »
In most cases it is very difficult to work out the worst-case stack requirement, and make sure you have allowed enough stack space for that.

I don't think so. At any rate, it's much harder to predict heap behaviour, which, unlike the stack, can become fragmented, possibly after days or months of running.

Agree with that.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Neat algorithms
« Reply #134 on: August 25, 2019, 04:08:10 pm »
In most cases it is very difficult to work out the worst-case stack requirement, and make sure you have allowed enough stack space for that.

I don't think so. At any rate, it's much harder to predict heap behaviour, which, unlike the stack, can become fragmented, possibly after days or months of running.
Whilst it's true that heap analysis is harder, most small embedded designs avoid using a heap entirely. They can't help using the stack, and they can't afford to waste more than a few bytes of it. When your worst-case stack use occurs at an unusually deep point in some call chain, with all the interrupts nested, hitting the worst case can be a very rare event. It has to be analysed, but most development tool chains don't really take that need seriously or try to help.
 

Online RoGeorge

  • Super Contributor
  • ***
  • Posts: 6202
  • Country: ro
Re: Neat algorithms
« Reply #135 on: August 26, 2019, 09:57:13 am »
Feynman's algorithm for logarithms

Quote from: doi:10.1063/1.881196
Feynman worked out in some detail the program for computing Hopfield's network on the Connection Machine. The part that he was proudest of was the subroutine for computing a logarithm. I mention it here not only because it is a clever algorithm, but also because it is a specific contribution Richard made to the mainstream of computer science. He had invented it at Los Alamos.

Consider the problem of finding the logarithm of a fractional number between 1 and 2. (The algorithm can be generalized without too much difficulty.) Feynman observed that any such number can be uniquely represented as a product of numbers of the form 1 + 2^-k, where k is an integer. Testing for the presence of each of these factors in a binary representation is simply a matter of a shift and a subtraction. Once the factors are determined, the logarithm can be computed by adding together the precomputed logarithms of the factors. The algorithm fit the Connection Machine especially well because the small table of the logarithms of 1 + 2^-k could be shared by all the processors. The entire computation took less time than doing a division.
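
A hedged sketch of the idea in C: floating point is used here for clarity instead of the fixed-point shift-and-subtract of the original, and the factor logarithms are computed on the fly rather than shared from a precomputed table; the function name is mine.

Code: [Select]
#include <math.h>
#include <stdio.h>

/* For x in [1, 2): greedily factor x into terms (1 + 2^-k), then sum
   the (normally precomputed) logarithms of the factors taken. */
static double feynman_log(double x)
{
    double p = 1.0, result = 0.0;
    for (int k = 1; k <= 53; k++) {            /* 53 = double's mantissa bits */
        double factor = 1.0 + ldexp(1.0, -k);  /* 1 + 2^-k */
        if (p * factor <= x) {
            p *= factor;            /* in fixed point: p += p >> k */
            result += log(factor);  /* a table lookup in the real thing */
        }
    }
    return result;
}

int main(void)
{
    printf("%.15f\n%.15f\n", feynman_log(1.5), log(1.5));
    return 0;
}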

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: Neat algorithms
« Reply #136 on: August 26, 2019, 12:08:01 pm »
Quarter square multiplication, one of the shift-and-add algorithms:  Let's say you know S(n) = floor(n²/4), i.e. S(n) is a quarter of n squared, rounded towards zero.  If x ≥ y ≥ 0, then x·y = S(x+y) − S(x−y).

I wonder if this could be used to implement a fast multiplication on an FPGA?  I dunno.
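
For what it's worth, a small C sketch for 8-bit operands (the table size and names are my own choices). The identity is exact for integers: (x+y)² − (x−y)² = 4xy, and since x+y and x−y always have the same parity, the two floors cancel.

Code: [Select]
#include <stdint.h>
#include <stdio.h>

static uint16_t S[511];   /* S[n] = floor(n*n/4); x+y <= 510 for 8-bit x, y */

static void init_quarter_squares(void)
{
    for (uint32_t n = 0; n < 511; n++)
        S[n] = (uint16_t)(n * n / 4);
}

static uint16_t qs_mul(uint8_t x, uint8_t y)
{
    if (x < y) { uint8_t t = x; x = y; y = t; }   /* ensure x >= y >= 0 */
    return (uint16_t)(S[(unsigned)x + y] - S[(unsigned)x - y]);
}

int main(void)
{
    init_quarter_squares();
    printf("%u\n", qs_mul(200, 13));   /* prints 2600 */
    return 0;
}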
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: Neat algorithms
« Reply #137 on: August 26, 2019, 01:54:03 pm »
Quarter square multiplication, one of the shift-and-add algorithms:  Let's say you know S(n) = floor(n²/4), i.e. S(n) is a quarter of n squared, rounded towards zero.  If x ≥ y ≥ 0, then x·y = S(x+y) − S(x−y).

I wonder if this could be used to implement a fast multiplication on an FPGA?  I dunno.

For small integers, probably. Now for larger ones, I think the size of the required look-up table would quickly get impractical. Ideas?
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: Neat algorithms
« Reply #138 on: August 26, 2019, 04:09:21 pm »
for larger ones, I think the size of the required look-up table would quickly get impractical. Ideas?
I experimented with some approaches, but for larger ones, a binary shift-and-add ("barrel multiplier"?) wins. (It is difficult to keep an N×N-bit multiplication with a 2N-bit result under 2N ops.)  So, this only makes sense if you have S(i) precalculated -- actually, S(i) = floor(abs(i)²/4) for i = 1−2^N .. 2^(N+1)−1, so that for 0 ≤ a,b ≤ 2^N−1, a×b becomes S(a+b) − S(a−b).  That makes the lookup table size just under 6·N·2^N bits, turning the multiplication into two lookups, two subtractions, and one addition.  I do not know how expensive such ROM tables are in general.

For example, if one has only 8×8 = 16-bit multiplication, the naïve/direct method is to compute a 32×32 = 64-bit product using (a0 + 2^8 a1 + 2^16 a2 + 2^24 a3)(b0 + 2^8 b1 + 2^16 b2 + 2^24 b3) = 2^0 a0 b0 + 2^8 a0 b1 + 2^16 a0 b2 + 2^24 a0 b3 + 2^8 a1 b0 + 2^16 a1 b1 + 2^24 a1 b2 + 2^32 a1 b3 + 2^16 a2 b0 + 2^24 a2 b1 + 2^32 a2 b2 + 2^40 a2 b3 + 2^24 a3 b0 + 2^32 a3 b1 + 2^40 a3 b2 + 2^48 a3 b3.
That is, with K words to construct each N-bit unsigned integer value, you have K² terms.  However, a barrel multiplier can do it in O(KN), using just one-bit shifts (through carry) and additions; in hardware, several of them can be done in parallel.  (For K=4, a 32×32 = 64-bit multiplication can be done in seven 64-bit additions, where each summand is constructed of at most three multiplications affecting non-overlapping bits.)
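
For comparison, a C model of the plain shift-and-add multiplier described above (software, not HDL; a hardware version would do several such steps in parallel):

Code: [Select]
#include <stdint.h>

/* 32x32 = 64-bit product: one conditional add and one shift per
   multiplier bit, i.e. at most 32 add/shift steps. */
uint64_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint64_t acc = 0, addend = a;
    while (b != 0) {
        if (b & 1u)
            acc += addend;   /* add the multiplicand shifted to this bit */
        addend <<= 1;
        b >>= 1;
    }
    return acc;
}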

Anyway, knowing that such mathematical tricks exist may be useful for some odd purposes.  Neat, in my opinion.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Neat algorithms
« Reply #139 on: August 26, 2019, 07:24:51 pm »
Quarter square multiplication, one of the shift-and-add algorithms:  Let's say you know S(n) = floor(n²/4), i.e. S(n) is a quarter of n squared, rounded towards zero.  If x ≥ y ≥ 0, then x·y = S(x+y) − S(x−y).

I wonder if this could be used to implement a fast multiplication on an FPGA?  I dunno.
The critical importance of fast, silicon-efficient multiplication for DSP applications means a huge number of avenues to faster multiplication have been tried. You need to try something a bit more novel than a well-established mathematical identity to have a good hope of breaking new ground. There are quite a few tricks for finding approximate answers that people generally only know about if they work in an area where the approximation is effective. There aren't many obscure tricks for generating exact results, because everyone wants those.
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Neat algorithms
« Reply #140 on: August 26, 2019, 09:40:34 pm »
how much code complexity does the recursion save you?

It saves a lot of debugging and team-working time, and it saves a lot of complexity in the test-report activity.

 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Neat algorithms
« Reply #141 on: August 26, 2019, 09:51:57 pm »
In most cases it is very difficult to work out the worst-case stack requirement, and make sure you have allowed enough stack space for that.

That's what I showed in my first post, talking about *how* we usually measure how much stack the algorithm is going to consume in both normal and abnormal operation. But that is just to get a rough idea, useful while the code is a draft; it cannot be used for any official test report.

On bigger machines most people do little more than allow lots of stack space and hope for the best. On small embedded machines you have to get your stack allocation right, as there is little RAM to waste. You might get some help from static analysis tools, but those tools are not generally foolproof. For example, they usually cannot work out the worst-case program call depth combined with the worst-case mix of nested interrupts.

We are on big embedded machines! PPC405, PPC440, PPC460, e500, etc., running RTOSes; what non-avionics people seem to find hard is understanding *how* you test things.

Worst cases happen in abnormal conditions, where the "funniest" surprises are waiting to greet you :D
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Neat algorithms
« Reply #142 on: August 26, 2019, 10:18:13 pm »
We're talking about B-trees here. With disk-based blocks. What's your block size?

I am using a B+tree; the block size is 4 Kbyte, and block pointers (iblock) are only stored in leaves.

I have rewritten 90% of the library recursion-free, and I can easily show the hot points to the guys in the testing squad, e.g. btree_key_inserts and node_insert_after_splitting are two examples of "write-one-read-many" critical functions (they can be used with SMP) that insert a new { key, iblock } into a leaf; when that would exceed the tree's order, the leaf is split in half.

This code is critical, but I can express the exact amount of memory it consumes:

Code: [Select]
    btree_key_t      key[DEFAULT_ORDER];     /* temporary, ordered copy of the keys, including the new one */
    btree_iblock_t   iblock[DEFAULT_ORDER];  /* the matching block pointers */

Those two arrays are a private "temporary" set of keys and block pointers, used to hold everything in order, including the new key and iblock. With the recursive approach this stuff would be on the stack, and writing a test case would be more complex by several orders of magnitude, not to mention that you could not test its quality on any running hardware :D
 

Offline Howardlong

  • Super Contributor
  • ***
  • Posts: 5319
  • Country: gb
Re: Neat algorithms
« Reply #143 on: August 26, 2019, 10:35:11 pm »
Regarding recursion, my anecdote is about a cheque printing algorithm (to print the words for the numbers) that one of my ex-colleagues presented as his own work. He was always quite full of himself, and to be honest we didn't always see eye to eye on things: he was the young blue-sky thinker, whereas I'm a cynical old fart pragmatist with decades of battle wounds. His code was a heady mix of quite complex recursive self-referring SQL and XML. I even congratulated him on his ingenuity.

So part of me was really quite impressed with what he'd produced, but part of me was dreading supporting it, as it was very much write-only code. But it did seem to work. One day we had a much larger than normal cheque batch, but not huge, about 30,000 cheques or so, which took down the database server, and with it all the call centre users.

The way the implementation worked under the hood is that the recursion objects were repeatedly allocated in RAM, so when the batch hit a certain size, the machine was starved of memory, and ISTR it had 768GB of RAM.

While I was figuring out how to resolve it, I took his code apart, and I suddenly realised while Googling part of it that my ex-colleague had pinched the whole code verbatim off the internet.

There are two morals. The first is that one should be very careful when re-using someone else’s code for your use case. The second is that some people can come over as smart, but aren’t.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Neat algorithms
« Reply #144 on: August 27, 2019, 04:58:23 am »
Quote
In most cases it is very difficult to work out the worst-case stack requirement
Yeah, that's the claim. But I don't think it's true. In many cases, you're dealing with standard algorithms whose stack depth requirements have been analyzed to death, in many published papers, textbooks, and theses. That leaves only the stack requirement of each level as "unknown", and that's easy to measure for a particular compiler/CPU. Sure, "not all recursive algorithms" are that well analyzed. But you're throwing away the ones that are for no good reason.
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: Neat algorithms
« Reply #145 on: August 27, 2019, 05:38:51 am »
Quarter square multiplication, one of the shift-and-add algorithms:  Let's say you know S(n) = floor(n²/4), i.e. S(n) is a quarter of n squared, rounded towards zero.  If x ≥ y ≥ 0, then x·y = S(x+y) − S(x−y).

I wonder if this could be used to implement a fast multiplication on an FPGA?  I dunno.
The critical importance of fast, silicon efficient, multiplication for DSP applications means a huge number of avenues to faster multiplication have been tried.
Sure, but the solution space is larger still.  This is just one of those utterly simple mathematical identities that people overlook, exactly because it is too simple.

I wondered whether it was a viable approach, and whether people were aware of it, considering that 1) FPGAs have LUTs (look-up tables) anyway, 2) here the look-up table contains at most 3×2^N entries of 2N bits for an N×N = 2N-bit multiplication, unlike the 2^N × 2^N entries of a direct multiplication look-up, and 3) I haven't seen it mentioned in places that talk about multiplications on FPGAs (like here).

Again, I have zero experience with FPGAs, so I do not know.  I do know that if one assumes everything interesting has already been found and examined, one will be sorely disappointed by the reality of things.
 

Offline mfro

  • Regular Contributor
  • *
  • Posts: 210
  • Country: de
Re: Neat algorithms
« Reply #146 on: August 27, 2019, 07:18:20 am »
Again, I have zero experience with FPGAs, so I do not know.

FPGAs usually have dedicated DSP units on silicon and vendors usually do not tell you about implementation details.

Anyway, I would not expect anybody to be able to compete with 'direct silicon' no matter how smart and simple the implementation might be.
Beethoven wrote his first symphony in C.
 

Online RoGeorge

  • Super Contributor
  • ***
  • Posts: 6202
  • Country: ro
Re: Neat algorithms
« Reply #147 on: August 27, 2019, 08:34:11 am »
Any dedicated hardware is in fact a "direct silicon" implementation of an algorithm, so a hardware MAC could still be improved by a more performant algorithm. (MAC == multiplier followed by an adder/accumulator, the basic building block for hardware DSP in FPGAs.)
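
To make the term concrete, a MAC computes the inner loop of a dot product / FIR filter; a C sketch (the names are mine):

Code: [Select]
#include <stdint.h>
#include <stddef.h>

/* Each iteration is one multiply-accumulate; in an FPGA DSP block this
   is a single pipelined hardware operation per sample. */
int64_t dot(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];
    return acc;
}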

An example from the former ALTERA FPGAs (now Intel):

Implementing Multipliers in FPGA Devices
https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an306.pdf

A Survey and Comparative Analysis of Multiply-Accumulate (MAC) Block for Digital Signal Processing Application on ASIC and FPGA
https://scialert.net/fulltext/?doi=jas.2015.934.946
« Last Edit: August 27, 2019, 08:39:05 am by RoGeorge »
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Neat algorithms
« Reply #148 on: August 27, 2019, 12:58:20 pm »
Again, I have zero experience with FPGAs, so I do not know.
FPGAs usually have dedicated DSP units on silicon and vendors usually do not tell you about implementation details.
The big FPGAs are mostly designed as DSP engines, so they have lots of dedicated hardware multipliers. In smaller FPGAs it's still common to use macros to create multipliers out of logic blocks.

There is little point in the FPGA makers telling you the implementation details. You can easily tell from the specs. It's going to be based around a Booth or a Wallace tree, and if it has a latency greater than one clock cycle there will be some pipelining to help with the propagation delays in the tree.
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Neat algorithms
« Reply #149 on: August 27, 2019, 08:39:07 pm »
about multiplications

My Arise-v2 core implements MAC with a Booth module. I am not targeting any specific hardware, but rather a generic piece of VHDL code to be tested and simulated before it gets synthesized; under these working conditions, Booth is the best solution for me, and I would recommend it to everyone who wants to study how a fast MUL works.

Anyway, the division module is still problematic for me because it still takes 32 clocks to compute 32 bits. The reason is that I haven't yet found an algorithm with acceptable complexity.
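
For reference, a C model of the classic restoring divider that such a module typically implements; producing one quotient bit per iteration is exactly where the 32 clocks per 32-bit result come from (a sketch, the names are mine; d must be non-zero):

Code: [Select]
#include <stdint.h>

uint32_t div_restoring(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;   /* the remainder register needs one extra bit */
    for (int i = 31; i >= 0; i--) {       /* one iteration ~ one clock */
        r = (r << 1) | ((n >> i) & 1u);   /* shift in the next dividend bit */
        if (r >= d) {                     /* restoring step */
            r -= d;
            q |= 1u << i;
        }
    }
    *rem = (uint32_t)r;
    return q;
}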
« Last Edit: August 27, 2019, 08:53:59 pm by legacy »
 

