Author Topic: Neat algorithms  (Read 28571 times)


Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: Neat algorithms
« Reply #75 on: August 19, 2019, 10:36:05 pm »
Graphviz has the benefit that it does all the heavy lifting for you.  In Dot language, a simple three-node tree as a directed graph:
Code: [Select]
digraph {
    "root" -> "left";
    "root" -> "right";
}
You can add node modifiers, like "root" [ shape=hexagon ]; as well as edge attributes like labels and such.  It is just text, which makes it especially useful for debug visualization from any program or script.  Graphviz Dot (dot/neato/twopi/circo) will handle the layout for you, using a number of algorithms described in peer-reviewed articles.  It is a surprisingly powerful tool -- or a set of tools, really.  I like to generate the graphs as SVG, then hand-tweak them in Inkscape, for maximum quality and usefulness.  See the example gallery for stuff it can do.
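Since it is just text, a debug dump is trivial to generate from any C program too; a minimal sketch (the node struct and field names are just illustrative assumptions), whose output you can pipe through dot -Tsvg to get a picture of your tree:
Code: [Select]
#include <stdio.h>

struct node {
    int          key;
    struct node *left;
    struct node *right;
};

/* Emit one "parent" -> "child" edge per link, recursively. */
static void dump_edges(FILE *out, const struct node *n)
{
    if (!n)
        return;
    if (n->left)
        fprintf(out, "    \"%d\" -> \"%d\";\n", n->key, n->left->key);
    if (n->right)
        fprintf(out, "    \"%d\" -> \"%d\";\n", n->key, n->right->key);
    dump_edges(out, n->left);
    dump_edges(out, n->right);
}

/* Wrap the edges in a digraph, ready for dot/neato/twopi/circo. */
void dump_tree_as_dot(FILE *out, const struct node *root)
{
    fprintf(out, "digraph {\n");
    dump_edges(out, root);
    fprintf(out, "}\n");
}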

A new algorithm is good but ... these things consume a lot of time when you use them for something practical.
Yet, that is the only way to truly learn/grok a new algorithm, and integrate it into one's "tool kit".

Knowing about an algorithm means you can choose between it and some others, but to implement it, you need engineering: good old elbow grease.  (Well, brain grease, but that sounds icky.)
« Last Edit: August 19, 2019, 10:38:22 pm by Nominal Animal »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4037
  • Country: nz
Re: Neat algorithms
« Reply #76 on: August 19, 2019, 10:39:43 pm »
B*-trees are a special case of B+-trees that guarantees nodes are at least 2/3 full. This is good for my purpose because it improves my filesystem. But I have a serious problem: I cannot avoid recursion! A B*-tree is more prone to recursion, and I'd like to avoid it as far as possible.

Why do you want to avoid recursion?
 

Offline hamster_nzTopic starter

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Neat algorithms
« Reply #77 on: August 20, 2019, 03:01:30 am »
B*-trees are a special case of B+-trees that guarantees nodes are at least 2/3 full. This is good for my purpose because it improves my filesystem. But I have a serious problem: I cannot avoid recursion! A B*-tree is more prone to recursion, and I'd like to avoid it as far as possible.

Why do you want to avoid recursion?

It is seen as an honorable goal if you have limited stack space, or want a bounded run time.

Recursion seems to have a magical appeal for some people that never clicked with me:

Code: [Select]
int factorial(int x) {
  if (x <= 1)
    return 1;
  else
    return x * factorial(x - 1);
}
 
Yeah... whatever - give me a 'for' loop any day.
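The same thing as a plain loop, for comparison (a trivial sketch):
Code: [Select]
int factorial(int x) {
  int result = 1;
  for (int i = 2; i <= x; i++)
    result *= i;
  return result;
}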
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21688
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Neat algorithms
« Reply #78 on: August 20, 2019, 03:03:43 am »
Write it for tail recursion; compilers handle that optimization automatically.

You can always use an array as a stack and put it in a for loop.  It's just syntactically uglier.  You're not actually avoiding recursion; it's still recursive (otherwise it wouldn't be the same algorithm), you're just moving where (and how much) the recursion state is stored.
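For example, an in-order tree walk with the recursion moved into an explicit array looks roughly like this (a minimal sketch; the node struct and the fixed MAX_DEPTH are assumptions, and the tree is assumed to be no deeper than MAX_DEPTH):
Code: [Select]
#include <stdio.h>

#define MAX_DEPTH 64   /* assumed upper bound on tree depth */

struct node {
    int          key;
    struct node *left;
    struct node *right;
};

/* In-order traversal, with the call stack replaced by an explicit array. */
void inorder_iterative(const struct node *root)
{
    const struct node *stack[MAX_DEPTH];
    int top = 0;
    const struct node *cur = root;

    while (cur || top > 0) {
        while (cur) {               /* dive left, remembering the path */
            stack[top++] = cur;
            cur = cur->left;
        }
        cur = stack[--top];         /* visit the deepest pending node */
        printf("%d\n", cur->key);
        cur = cur->right;           /* then walk its right subtree */
    }
}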

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Neat algorithms
« Reply #79 on: August 20, 2019, 09:07:33 am »
Why do you want to avoid recursion?

Recursion can make the algorithm's RAM usage non-deterministic, and it might consume too much RAM; I am limited to 4 Mbyte on that board, with other stuff that also needs RAM. I am not even using malloc, because I cannot have virtual memory: the CPU does not even have an MMU. Everything must be static and deterministic, with the worst case known and calculated in advance.

On a PC it's a completely different story: there are many more resources, and I can even reimplement the algorithm in Erlang, which has no problem eating up to 1 Gbyte of stack.


So the question is: can a tree be traversed without
  • recursion
  • stack
  • queue
and just a handful of pointers?

With my current disk-oriented B+-tree implementation I cannot use RAM pointers; I have to use pointers to disk blocks, so the algorithm keeps the most-used nodes in RAM and leaves the leaves on disk. A leaf holds the iblock references to the other disk blocks. It can avoid recursion, but the tree needs to be traversed via queues.

It works, and it's decently fast and elegant, but it only guarantees that nodes are at least 50% full, and that is not good when storage space is at a premium.
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Neat algorithms
« Reply #80 on: August 20, 2019, 09:44:40 am »
It is seen as an honorable goal if you have limited stack space, or want a bounded run time.

There is also the good old charm of the challenge launched back in 1968, when Knuth posed the problem of designing a stack-free, tag-free, non-recursive algorithm that traverses a tree in order while leaving it unaltered at the end.

Since then, numerous solutions have appeared in the literature, but many professors still say that none of them really solves it, or that it is simply impossible.

In 1979 Morris suggested an algorithm that is clearly non-recursive and at first glance appears to use neither a stack nor tag fields ... but the stack is hidden inside the tree itself, so once again it only comes close.

There have been several variations on Morris's solution, but the main problem is that you seemingly have to mutate the tree (or the very concept of one) to solve the original problem Knuth posed, and that is considered cheating.
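For reference, this is roughly what Morris's traversal looks like in C (a minimal sketch, assuming a plain binary node struct): the "stack" is threaded through otherwise-unused right pointers, and every thread is removed again, so the tree is unaltered once the walk finishes -- but it is temporarily modified while it runs, which is exactly the objection.
Code: [Select]
#include <stdio.h>

struct node {
    int          key;
    struct node *left;
    struct node *right;
};

/* Morris in-order traversal: no recursion, no stack, no queue, just a
   couple of pointers.  Threads are created and later removed, so the
   tree ends up unmodified. */
void morris_inorder(struct node *root)
{
    struct node *cur = root;

    while (cur) {
        if (!cur->left) {
            printf("%d\n", cur->key);        /* visit */
            cur = cur->right;
        } else {
            struct node *pre = cur->left;    /* find in-order predecessor */
            while (pre->right && pre->right != cur)
                pre = pre->right;

            if (!pre->right) {
                pre->right = cur;            /* create thread, descend left */
                cur = cur->left;
            } else {
                pre->right = NULL;           /* remove thread */
                printf("%d\n", cur->key);    /* visit */
                cur = cur->right;
            }
        }
    }
}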

-

Can we have 1M USD/EUR for the first person who finds a truly valid solution?  :D
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Neat algorithms
« Reply #81 on: August 20, 2019, 10:18:12 am »
Meh.  Trees (and graphs) are so inherently, obviously, and elegantly recursive that avoiding recursion in code that implements them is ... cutting off your nose to spite your face.
The algorithms involved are usually well defined and understood wrt their resource usage, and the insistence in some “safety standards” that programmers come up with some less-well-debugged, harder-to-understand loop-based equivalent (or use a poorer data structure) is one of the main objections that I have toward such standards.  :-(

 

Offline RoGeorge

  • Super Contributor
  • ***
  • Posts: 6203
  • Country: ro
Re: Neat algorithms
« Reply #82 on: August 20, 2019, 10:43:41 am »
Isn't the main objection against recursive algorithms, in general, about memory?

The fact that they use the computer's stack (to store intermediate results at each recursion step), which makes them slower and memory-intensive.
« Last Edit: August 20, 2019, 10:45:32 am by RoGeorge »
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8651
  • Country: gb
Re: Neat algorithms
« Reply #83 on: August 20, 2019, 11:13:10 am »
Isn't the main objection against recursive algorithms, in general, about memory?

The fact that they use the computer's stack (to store intermediate results at each recursion step), which makes them slower and memory-intensive.
The problem is not that they are memory-intensive. It's that unless you take special action, their memory use can be unconstrained. A routine that always calls itself a well known range of times has a known maximum memory requirement, and is OK. Many recursive routines call themselves an arbitrary number of times, and will eventually blow the stack up unless special action is taken to constrain the call depth and perhaps declare a failure to compute the true result.
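One way to take that special action is to pass an explicit depth budget down and declare failure when it runs out; a minimal sketch (the node struct and the budget are only illustrative):
Code: [Select]
#include <stdbool.h>

struct node {
    long         value;
    struct node *left;
    struct node *right;
};

/* Sum a binary tree, but refuse to recurse deeper than max_depth:
   returns false (and an incomplete sum) instead of blowing the stack. */
bool sum_tree(const struct node *n, int max_depth, long *sum)
{
    if (!n)
        return true;
    if (max_depth <= 0)
        return false;                 /* give up gracefully */
    *sum += n->value;
    return sum_tree(n->left,  max_depth - 1, sum)
        && sum_tree(n->right, max_depth - 1, sum);
}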
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Neat algorithms
« Reply #84 on: August 20, 2019, 11:46:40 am »
A routine that always calls itself a well known range of times has a known maximum memory requirement

This requires an additional field in the structure for counting the number of calls, which corresponds to the resources used. It *somehow* works around the problem and makes things deterministic again, but it is painful when you have to instrument the code for the test report, because it generates too many triplets, and a function that calls itself can consume more RAM, and usually does.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8651
  • Country: gb
Re: Neat algorithms
« Reply #85 on: August 20, 2019, 12:01:17 pm »
A routine that always calls itself a well known range of times has a known maximum memory requirement

This requires an additional field in the structure for counting the number of calls, which corresponds to the resources used. It *somehow* works around the problem and makes things deterministic again, but it is painful when you have to instrument the code for the test report, because it generates too many triplets, and a function that calls itself can consume more RAM, and usually does.
There are many algorithms which are recursive, but have an easy to identify maximum for the depth they will ever call to. Many divide and conquer type algorithms are like this. You usually know the maximum resolution you will ever be able to divide to, so you know the maximum depth of recursion you will ever get.
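Binary search is perhaps the clearest example: the range is halved on every call, so the depth can never exceed the number of bits in the index type. A minimal sketch:
Code: [Select]
#include <stddef.h>

/* Recursive binary search over a sorted array.  Because the range is
   halved on each call, the recursion depth is bounded by the bit width
   of size_t (e.g. 64), no matter how large the array is. */
const int *bsearch_rec(const int *a, size_t lo, size_t hi, int key)
{
    if (lo >= hi)
        return NULL;
    size_t mid = lo + (hi - lo) / 2;
    if (a[mid] == key)
        return &a[mid];
    if (a[mid] < key)
        return bsearch_rec(a, mid + 1, hi, key);
    return bsearch_rec(a, lo, mid, key);
}
(This one happens to be tail-recursive, so a decent compiler flattens it anyway, but the bounded-depth argument holds even when it doesn't.)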
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14478
  • Country: fr
Re: Neat algorithms
« Reply #86 on: August 20, 2019, 01:49:26 pm »
Head and tail recursion are often relatively easy to transform into pure iteration.
Some compilers can do this by themselves. Very simple example with GCC (9.2.0 here):

Code: [Select]
unsigned int Factorial(unsigned int n)
{
    if (n <= 1)
        return 1;
    else
        return n * Factorial(n - 1);
}

compiled with:
Code: [Select]
gcc -O3 -S TailRecursion.c
Assembly:
Code: [Select]
Factorial:
        .seh_endprologue
        movl    $1, %eax
        cmpl    $1, %ecx
        jbe     .L1
        .p2align 4,,10
        .p2align 3
.L2:
        movl    %ecx, %edx
        subl    $1, %ecx
        imull   %edx, %eax
        cmpl    $1, %ecx
        jne     .L2
.L1:
        ret

It can be very hard, or sometimes even impossible for other, less trivial forms of recursion.
 

Offline RoGeorge

  • Super Contributor
  • ***
  • Posts: 6203
  • Country: ro
Re: Neat algorithms
« Reply #87 on: August 20, 2019, 02:12:06 pm »
It can be very hard, or sometimes even impossible for other, less trivial forms of recursion.

Are there any examples, or a demonstration, of algorithms that are impossible to convert?

Asking just out of curiosity, because I always thought *any* recursive algorithm can be translated into a non-recursive one.

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8651
  • Country: gb
Re: Neat algorithms
« Reply #88 on: August 20, 2019, 02:15:58 pm »
It can be very hard, or sometimes even impossible for other, less trivial forms of recursion.

Are there any examples, or a demonstration, of algorithms that are impossible to convert?

Asking just out of curiosity, because I always thought *any* recursive algorithm can be translated into a non-recursive one.
If you create your own stack, and save state in that, you can turn anything recursive into something you can claim to be non-recursive :)
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14478
  • Country: fr
Re: Neat algorithms
« Reply #89 on: August 20, 2019, 03:14:39 pm »
It can be very hard, or sometimes even impossible for other, less trivial forms of recursion.

Are there any examples, or a demonstration, of algorithms that are impossible to convert?

Asking just out of curiosity, because I always thought *any* recursive algorithm can be translated into a non-recursive one.
If you create your own stack, and save state in that, you can turn anything recursive into something you can claim to be non-recursive :)

Yep. :D
Just a random example: https://www.geeksforgeeks.org/iterative-quick-sort/

Whether this is worth it is completely debatable though. The recursive form is much simpler and more readable.
Then it will depend on many factors, such as the cost of a function call in a given language and on a given platform, the cost of using the stack instead of the heap, and many other things, versus the cost of allocating and handling your own stack, and the hindered readability...
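The core of that kind of conversion looks roughly like this (a sketch of the idea only, not the code from the linked page):
Code: [Select]
/* Lomuto partition: returns the final position of pivot a[hi]. */
static int partition(int a[], int lo, int hi)
{
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;
    return i;
}

/* Quicksort with the recursion replaced by an explicit stack of
   (lo, hi) ranges.  Pushing the larger side and handling the smaller
   side first keeps the pending stack at roughly log2(n) ranges. */
void quicksort_iterative(int a[], int n)
{
    int stack[2 * 64];          /* room for 64 pending ranges */
    int top = 0;

    stack[top++] = 0;
    stack[top++] = n - 1;

    while (top > 0) {
        int hi = stack[--top];
        int lo = stack[--top];
        if (lo >= hi)
            continue;
        int p = partition(a, lo, hi);
        if (p - lo > hi - p) {  /* larger side goes in first, smaller is popped next */
            stack[top++] = lo;      stack[top++] = p - 1;
            stack[top++] = p + 1;   stack[top++] = hi;
        } else {
            stack[top++] = p + 1;   stack[top++] = hi;
            stack[top++] = lo;      stack[top++] = p - 1;
        }
    }
}
Whether that reads better than the few-line recursive version is, as said, debatable.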

Recursion is a powerful tool and, most of all, when it applies, it makes code easier to write, more readable, and more like the maths; it's also usually easier to verify pre- and post-conditions.
In many simple cases it can be optimized away by a decent compiler anyway. So there's no hard and fast reason to avoid recursion. Just like any other tool, use it wisely and use it for a reason.
 

Online mfro

  • Regular Contributor
  • *
  • Posts: 210
  • Country: de
Re: Neat algorithms
« Reply #90 on: August 20, 2019, 03:30:43 pm »
Although it's from the coding stone age, I still think this one qualifies:

Duff's device

Code: [Select]
send(to, from, count)
register short *to, *from;
register count;
{
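    /* Note: in Duff's original, "to" is a memory-mapped output register,
       which is why it is written eight times but never incremented. */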
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
Beethoven wrote his first symphony in C.
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Neat algorithms
« Reply #91 on: August 20, 2019, 04:45:14 pm »
Recursion is a powerful tool and, most of all, when it applies, it makes code easier to write, more readable, and more like the maths; it's also usually easier to verify pre- and post-conditions.

So tell me why recursion isn't allowed in safety-critical code for automotive and avionics applications; I have never been authorized to use it.
 

Offline Yansi

  • Super Contributor
  • ***
  • Posts: 3893
  • Country: 00
  • STM32, STM8, AVR, 8051
Re: Neat algorithms
« Reply #92 on: August 20, 2019, 05:13:02 pm »
New subscriber to the topic.  :popcorn: :-+
 

Offline RoGeorge

  • Super Contributor
  • ***
  • Posts: 6203
  • Country: ro
Re: Neat algorithms
« Reply #93 on: August 20, 2019, 06:34:29 pm »
To me, the most impressive algorithms are the simplest ones, and the absolute winner so far is Von Neumann with his algorithm for unbiasing a biased coin.

The problem is this:
- you have a biased coin.  Somebody messed with the coin, and now the heads/tails chances are not 50/50 any more.  You don't know how big the bias is, and don't have time to measure it.  All you know is that it's not 50/50 any more.

How do you generate fair 50/50 heads/tails with a biased coin?

"Von Neumann gave a simple solution: flip the coin twice. If it comes up heads followed by tails, then call the outcome HEAD. If it comes up tails followed by heads, then call the outcome TAIL. Otherwise (i.e., two heads or two tails occured) repeat the process." https://web.eecs.umich.edu/~qstout/abs/AnnProb84.html

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: Neat algorithms
« Reply #94 on: August 20, 2019, 07:19:03 pm »
To me, the most impressive algorithms are the simplest ones, and the absolute winner so far is Von Neumann with his algorithm for unbiasing a biased coin.
In that case, I recommend you look up the Metropolis-Hastings algorithm (for the inverse; to generate pseudorandom numbers according to a specific probability distribution P(x), when you only know a function f(x) that is proportional to P).  From there, go on to e.g. simulated annealing, which can be used to find global minima or maxima of a function; it is very similar to Metropolis-Hastings.

(Or maybe in the reverse order, depending on what your own optimal approach paths are.)

Describing how these algorithms work and why takes quite a lot of text and math, but when implemented, they're nearly trivial: just a few lines of code.  If you do not know the pattern that implements the method, and how they work, the code looks silly: naaah, that can't be right; it looks like it just finds the answer by accident, using random numbers like that.  Yet, it works just fine.
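To illustrate how short it ends up being, here is a minimal sketch of a Metropolis step with a symmetric proposal (the target f(), the step size and the uniform01() helper are just placeholders):
Code: [Select]
#include <stdlib.h>
#include <math.h>

/* Unnormalized target density: only needs to be proportional to P(x).
   Standard normal here, purely as an example. */
static double f(double x)
{
    return exp(-0.5 * x * x);
}

static double uniform01(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* One Metropolis step: propose a symmetric random move, always accept
   uphill, accept downhill with probability f(candidate)/f(x).  Iterating
   this yields samples distributed according to P(x). */
static double metropolis_step(double x, double step)
{
    double candidate = x + step * (2.0 * uniform01() - 1.0);
    double ratio = f(candidate) / f(x);
    if (ratio >= 1.0 || uniform01() < ratio)
        return candidate;
    return x;               /* rejected: keep (and count) the old x again */
}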
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14478
  • Country: fr
Re: Neat algorithms
« Reply #95 on: August 20, 2019, 08:31:05 pm »
Recursion is a powerful tool and, most of all, when it applies, it makes code easier to write, more readable, and more like the maths; it's also usually easier to verify pre- and post-conditions.

So tell me why recursion isn't allowed in safety-critical code for automotive and avionics applications; I have never been authorized to use it.

Well, because it isn't allowed. Discussing rules that you have to follow and can't change may be intellectually stimulating, but it's pointless. Such is life when you're not one of those who set the rules. ;D

As to the rationale... :P
I would say it's a cousin of the rule not allowing the use of dynamic allocation.
One specific risk of using recursion is that unless it's perfectly controlled and bug-free, you can run out of stack, which is arguably a bad thing, because it can be impossible to recover from.

Of course this is like any rule set to limit some risk. Like if you follow a rule of never drinking, you'll never get drunk. Thus the probability of you having a car accident due to driving while drunk is zero. That doesn't mean you'll never have a car accident. I like this illustration of risk mitigation.

Just a few random additional notes about recursion:

- There's a difference between recursion as a programming paradigm (or even notation) and effective recursion at run-time. As was pointed out above, you can write recursive code that won't actually lead to recursive execution. In C, this is a matter of optimization, but in functional languages, I think this is an important concept. (And that said, of course, if you have a rule forbidding recursion at run-time, relying on compiler optimization to get around it would probably be a bad idea!)

- There are means of making recursion a bit safer, such as controlling the depth at run-time (not hard to do). Of course you now have to decide what to do if the max depth has been reached and the function has not completed, but it gives you an opportunity to handle it gracefully instead of just letting the system crash.

- Recursion may happen without you realizing it. Beyond the simple case of a function directly calling itself, it may do so indirectly through a chain of calls of other functions. In moderately complex software, this is not necessarily easy to spot. Using code analysis tools may help.

- Implementing a typically recursive algorithm (not trivial to convert to iterative) without direct recursion, in the way coppice suggested, is often more of a trick than something really useful. It can lead to actually more inefficient code, without necessarily being much "safer".

- Using some functions of the C std lib, such as qsort, may introduce recursive code in your software. Most qsort implementations are recursive. In certified std libraries for safety-critical applications, qsort is probably not implemented as a recursive function. Something to check though.

- Some things, like parsers, are much easier and more elegant to write using recursion.

- I've never used recursion in any critical piece of "embedded" code even in settings where this was not a rule. A mix of common sense, risk mitigation and often no real need for/benefit of recursion for rather low-level stuff. I've never set a "no recursion" rule though when it was not strictly required from a regulatory POV.

 

Offline hamster_nzTopic starter

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Neat algorithms
« Reply #96 on: August 20, 2019, 09:54:55 pm »
- Recursion may happen without you realizing it. Beyond the simple case of a function directly calling itself, it may do so indirectly through a chain of calls of other functions. In moderately complex software, this is not necessarily easy to spot. Using code analysis tools may help.

Ah, the joy of avoiding fun things like trying to log an "Out Of Memory" exception to a file, when the file handling functions need to allocate memory....
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21688
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Neat algorithms
« Reply #97 on: August 20, 2019, 11:28:20 pm »
There are many algorithms which are recursive, but have an easy to identify maximum for the depth they will ever call to. Many divide and conquer type algorithms are like this. You usually know the maximum resolution you will ever be able to divide to, so you know the maximum depth of recursion you will ever get.

Curiously the example that was given in protest was 1. a trivial loop, and 2. O(N) complexity.  I suppose the teachers are to blame, because factorial(n) and especially egregiously fibonacci(n) are such common but such terrible examples.

Suffice it to say, you'd never use a recursive function when a linear loop works; trees are the perfect example because the depth is O(lg N), very modest even on fuckoff-massive structures.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: Neat algorithms
« Reply #98 on: August 21, 2019, 01:21:31 am »
For a 512-order B+ tree, you need just 31 levels to record each atom in the observable universe, assuming the characteristics of B+ trees listed at Wikipedia are correct; I'm too lazy to check elsewhere.

(The observable universe contains something like 2^265 atoms, and the number of records in a 512-order B+ tree with 31 levels (b = 512, h = 30) is roughly 2^233 to 2^270.)

If you implement a B+ tree, you choose the order (branching factor b).  Then, the maximum number of records you can store directly defines the maximum depth h+1 of your tree.  Robust code will keep track of its recursion depth, and if it exceeds h or h+1 depending on how you count the calls, the tree must be corrupted.

So, if you really worry about the maximum recursion depth for B+ trees, you must be using too small a branching factor b for the number of records you use.
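If you want to sanity-check your own choice of branching factor, a back-of-the-envelope helper (a sketch only; it uses the crude b^h capacity bound, so add a level or so depending on how you count the root and leaf levels):
Code: [Select]
#include <math.h>

/* Smallest h with b^h >= n_records, i.e. h = ceil(log(n) / log(b)).
   A rough upper-bound check, not the exact B+ tree min/max formulas. */
unsigned int bptree_levels_needed(double branching_factor, double n_records)
{
    if (n_records <= 1.0)
        return 1;
    return (unsigned int)ceil(log(n_records) / log(branching_factor));
}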
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Neat algorithms
« Reply #99 on: August 21, 2019, 04:12:54 am »
So, if you really worry about the maximum recursion depth for B+ trees, you must be using too small a branching factor b for the number of records you use.

yup, this approach is interesting. Thanks.

Do you know *how* we quickly and cheaply check the maximum stack consumption at runtime for a given test case under recursion? We fill the stack area with a known pattern, 0xdeadbeaf, and check how much of the pattern is still intact after the algorithm has completed its execution  ;D

This only holds for a given environment, and it cannot go into a formal test report without a manual justification, but it gives us useful information during the engineering steps.

It's a practical, "artisanal" approach that quickly gives some useful information back to the devs.
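For anyone who wants to try the same trick, a minimal sketch (the region size, the pattern and the word size are assumptions; on real hardware you would paint the actual stack area from the startup code, and the direction of growth depends on the CPU):
Code: [Select]
#include <stdint.h>
#include <stddef.h>

#define STACK_WORDS   1024
#define PAINT_PATTERN 0xdeadbeafu

static uint32_t stack_area[STACK_WORDS];   /* stand-in for the real stack */

/* Fill the whole area with the known pattern before running the test case. */
void stack_paint(void)
{
    for (size_t i = 0; i < STACK_WORDS; i++)
        stack_area[i] = PAINT_PATTERN;
}

/* After the test case: count the words never overwritten, assuming the
   stack grows downwards from the top of the area towards index 0.  The
   rest is the high-water mark of stack usage for that run. */
size_t stack_words_untouched(void)
{
    size_t count = 0;
    for (size_t i = 0; i < STACK_WORDS && stack_area[i] == PAINT_PATTERN; i++)
        count++;
    return count;
}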
 

