Recursion - do you use it in the real world?

Quote from: NorthGuy on 06 Sep, 2017 17:42

I don't see how recursion is "natural" here.

Because you took a trivial example of an iteration over an array/list. Try something a bit more involved, such as searching for a given file in a directory hierarchy.

You can either futz with the stack by hand (in each directory push all subdirectories onto stack, compare files, if not found start popping subdirectories from the stack and repeating the procedure for each of them until the stack is empty) or you write a simple recursive function where you compare the files (the "first elements") and if the file isn't found you call the same procedure on each of the subdirectories (the "rest") ...

I dare to say that the recursive solution will be much shorter and more readable, plus you don't need to implement a stack data structure.

Same if you want to build an expression evaluator (e.g. for a calculator). You can do it with stack, but unless you are building an RPN calculator, it is not a particularly natural fit - the infix notation is more naturally represented as a tree and tree traversal is easier done recursively again.

Quote from: NorthGuy on 06 Sep, 2017 17:42

There's no reason to use recursion where it is not needed (e.g. where you necessarily want to use tail recursion because you feel it's cool). Similarly, there's no reason not to use recursion where the recursion helps (e.g. no reason to avoid recursion because you're scared of how dangerous it is). This is just as with everything else - forget the rules and use your brain cells instead.

Sure, no dispute with that. I am merely pointing out that recursion is far from an useless concept that has no use in embedded programming.

Quote from: bd139 on 06 Sep, 2017 18:21

Hell if you have a test case you can even make sure it doesn't shit the stack. LLVM at least has a half decent optimiser that can actually handle this well (unlike GCC which is a hairy testicle).

GCC had tail call optimization from before LLVM even existed. I am not going to claim to know which one performs better, but the feature is there since a long time.

Quote from: bd139 on 06 Sep, 2017 18:21

You C guys really need to go on a mental excursion and write some Haskell, Scheme or Erlang or something and come back having licked the toad of glory even if you never touch it again.

This. I have started with imperative programming (Basic, Z80 assembly, Pascal, C/C++ ...) as most people but getting exposed to the ideas behind languages like Lisp or (more recently) Haskell certainly did make me a much better programmer. Not because I am trying to write Lisp/Haskell in every project but simply because it teaches techniques that are super useful even when writing assembly code or C. For example pure functions, writing code that doesn't do side effects -> super helpful when dealing with multithreaded code (if a code doesn't have internal state/side effects it is implicitly thread safe). Such code also tends to be easier to maintain, because its effects are easier to test and verify.

Or that tail recursion - it is usually taught as a concept with Scheme or Lisp but it is very useful even in C. Or declarative programming in the style of Prolog - again a concept that is very useful even if you never touch Prolog in your life.

Quote from: bd139 on 06 Sep, 2017 18:21

Also to note, the stack and stack frames are a compromise in the machine model which happens to leak into the language abstraction, not a required part of the machine or the language. There doesn't have to be turtles all the way down until they topple with a stack overflow; there can be one well defined turtle. Some embedded systems guys may recognise this if they have dirty assembly fingers. There was another thread on that where all the C programmers whinged about goto (or JMP as it used to be called)

Yep. On the other hand, stack is pretty essential for implementing many things, such as function calls or interrupts. You can do without, but it is usually not fun. So it helps having support for it.

Quote from: janoc on 06 Sep, 2017 20:53

Quote from: NorthGuy on 06 Sep, 2017 17:42
I don't see how recursion is "natural" here.

Because you took a trivial example of an iteration over an array/list. Try something a bit more involved, such as searching for a given file in a directory hierarchy.

But searching the directory is not the case where the recursion cannot be removed through the tail call optimization. Here recursion greatly simplifies things.

At the same time, nearly any case where the recursion can be removed through the end call optimization will be more or less trivial. So, not much need for recursion in such cases.

This is exactly the distinction I wanted to make here:

Quote from: NorthGuy on 06 Sep, 2017 14:56

Back to the recursion.

Quote from: janoc on 06 Sep, 2017 21:11

Quote from: bd139 on 06 Sep, 2017 18:21
Also to note, the stack and stack frames are a compromise in the machine model which happens to leak into the language abstraction, not a required part of the machine or the language. There doesn't have to be turtles all the way down until they topple with a stack overflow; there can be one well defined turtle. Some embedded systems guys may recognise this if they have dirty assembly fingers. There was another thread on that where all the C programmers whinged about goto (or JMP as it used to be called)

Yep. On the other hand, stack is pretty essential for implementing many things, such as function calls or interrupts. You can do without, but it is usually not fun. So it helps having support for it.

As an experiment, I wrote a code generator (compiler is a bit too advanced a word!) for the pic16 architecture around 2000 that used coroutines for multi tasking and multi threading. It had a bastardisation of basic and pascal as a language. I got bored of it and never finished it but the approach is definitely viable and not terribly inflexible. If you have a complex system, you will have hundreds of coroutines though which makes a debugging nightmare. Synchronisation was a bitch as well.

Quote from: brucehoult on Yesterday at 11:11:51 PM>Quote from: westfw on Yesterday at 09:17:57 PM

I used to like a recusive decimal output function, even on moderately small systems.

Between the array and the code for the extra loop the recursive version is simpler and cleaner and the memory use is little different.

Actually, the memory use is a little depressing on most modern processors. The "temporary array" would require N bytes for an N digit number. A typical 32bit CPU will required at least 8*N bytes of stack space: word-wide pushes of each digit plus word-wide pushes of the PC for each recursive call. And it's harder to add formatting to the recursive version.
But it is cute and simple for basic conversion. (OTTH a typical 32bit CPU will usually have several KBytes (or much more) of stack, so...)

Quote from: westfw on 07 Sep, 2017 02:24

Actually, the memory use is a little depressing on most modern processors. The "temporary array" would require N bytes for an N digit number. A typical 32bit CPU will required at least 8*N bytes of stack space: word-wide pushes of each digit plus word-wide pushes of the PC for each recursive call. And it's harder to add formatting to the recursive version.
But it is cute and simple for basic conversion. (OTTH a typical 32bit CPU will usually have several KBytes (or much more) of stack, so...)

You can usually add saving the base pointer to that stack too, so make that 12*N - so that is more memory for one level pf recursion than it takes to buffer the the entire output.

More instructions to make the new stack frame, more time....

Suddenly

Code: [Select]

void print_dec(int n) {
  int i;
  unsigned int u;
  char digits[10];
  if(n < 0) {
    u = -n;
    putchar('-');
  } else {
    u = n;
  }

  for(i = 0; i < 10; i++) {
    digits[9-i] = u%10+'0';
    u/=10;
  }

  for(i = 0; i < 9; i++) {
    if(digits[i] != '0')
      break;
  }

  for(; i < 10; i++) {
    putchar(digits[i]);
  }
}

isn't looking so terrible.

Quote from: hamster_nz on 07 Sep, 2017 02:53

Quote from: westfw on 07 Sep, 2017 02:24
Actually, the memory use is a little depressing on most modern processors. The "temporary array" would require N bytes for an N digit number. A typical 32bit CPU will required at least 8*N bytes of stack space: word-wide pushes of each digit plus word-wide pushes of the PC for each recursive call. And it's harder to add formatting to the recursive version.
But it is cute and simple for basic conversion. (OTTH a typical 32bit CPU will usually have several KBytes (or much more) of stack, so...)
You can usually add saving the base pointer to that stack too, so make that 12*N - so that is more memory for one level pf recursion than it takes to buffer the the entire output.

More instructions to make the new stack frame, more time....

Suddenly
Code: [Select]
void print_dec(int n) { int i; unsigned int u; char digits[10]; if(n < 0) { u = -n; putchar('-'); } else { u = n; } for(i = 0; i < 10; i++) { digits[9-i] = u%10+'0'; u/=10; } for(i = 0; i < 9; i++) { if(digits[i] != '0') break; } for(; i < 10; i++) { putchar(digits[i]); } }isn't looking so terrible.

Let's try, shall we?

Using arm-linux-gnueabihf-gcc 5.4.0 20160609 compiling for ARMv7 Thumb2 with -Os (what I almost always use for everything) your code compiles to 140 bytes of code and uses 20 bytes of stack, for 160 total.

My recursive code (below) compiles to 52 bytes of code and uses 8 bytes of stack per digit printed, for a maximum of 80 bytes of stack. That's 132 bytes total.

My code+stack is less than your code! I need 60 bytes more of stack than you do.

Incidentally, both versions happen to use one 2-byte NOP for alignment...

I've tested both versions against printf on x86 for all 2^32 input values (takes about 5 min). All ok.

Code: [Select]

void print_dec(int n) {
    unsigned u = n;
    if (n < 0) {
        putchar('-');
        u = -n;
    }
    unsigned d = u/10, ch = u - d*10 + '0';
    if (d != 0) print_dec(d);
    putchar(ch);
}

Code: [Select]

00000000 <print_dec>:
   0:	b510      	push	{r4, lr}
   2:	1e04      	subs	r4, r0, #0
   4:	da03      	bge.n	e <print_dec+0xe>
   6:	202d      	movs	r0, #45	; 0x2d
   8:	4264      	negs	r4, r4
   a:	f7ff fffe 	bl	0 <putchar>
   e:	4a08      	ldr	r2, [pc, #32]	; (30 <print_dec+0x30>)
  10:	fba4 2302 	umull	r2, r3, r4, r2
  14:	3430      	adds	r4, #48	; 0x30
  16:	08d8      	lsrs	r0, r3, #3
  18:	230a      	movs	r3, #10
  1a:	fb03 4410 	mls	r4, r3, r0, r4
  1e:	b108      	cbz	r0, 6 <print_dec+0x6>
  20:	f7ff fffe 	bl	0 <print_dec>
  24:	4620      	mov	r0, r4
  26:	e8bd 4010 	ldmia.w	sp!, {r4, lr}
  2a:	f7ff bffe 	b.w	0 <putchar>
  2e:	bf00      	nop
  30:	cccccccd 	.word	0xcccccccd

Just for yuks I tried the same code on 32 bit RISCV. It compiles to 66 bytes of code. Unfortunately, the ABI says the stack pointer must always be 16 byte aligned, so it uses 16 bytes per level or 160 maximum even though it's storing less data than that. So my worst case total is 226 bytes of code+stack.

Using riscv32-unknown-elf-gcc 6.1.0 with -Os

Your function compiles to 146 bytes of code, and uses 32 bytes of stack, total 178 bytes.

Code: [Select]

00010222 <print_dec>:
   10222:       1141                    addi    sp,sp,-16
   10224:       c422                    sw      s0,8(sp)
   10226:       c226                    sw      s1,4(sp)
   10228:       c606                    sw      ra,12(sp)
   1022a:       842a                    mv      s0,a0
   1022c:       00055a63                bgez    a0,10240 <print_dec+0x1e>
   10230:       8581a503                lw      a0,-1960(gp) # 1b910 <_impure_ptr>
   10234:       02d00593                li      a1,45
   10238:       40800433                neg     s0,s0
   1023c:       4510                    lw      a2,8(a0)
   1023e:       3f55                    jal     101f2 <__sputc_r>
   10240:       45a9                    li      a1,10
   10242:       02b45533                divu    a0,s0,a1
   10246:       02b47433                remu    s0,s0,a1
   1024a:       03040413                addi    s0,s0,48
   1024e:       c111                    beqz    a0,10252 <print_dec+0x30>
   10250:       3fc9                    jal     10222 <print_dec>
   10252:       8581a503                lw      a0,-1960(gp) # 1b910 <_impure_ptr>
   10256:       85a2                    mv      a1,s0
   10258:       40b2                    lw      ra,12(sp)
   1025a:       4422                    lw      s0,8(sp)
   1025c:       4492                    lw      s1,4(sp)
   1025e:       4510                    lw      a2,8(a0)
   10260:       0141                    addi    sp,sp,16
   10262:       bf41                    j       101f2 <__sputc_r>

One reason for the 14 bytes more of code is this toolchain inlined putchar to putc, which needs to load stdout from a global and then pass it. I didn't look closely enough to figure it out, but I think that might also be the reason for saving three registers not just two (not that it makes any difference with mandated 16 byte alignment)

Quote from: brucehoult on 07 Sep, 2017 10:49

Let's try, shall we?

If you are not adverse to 'compact & unreadable' code this is most likely the same size as the recursive code, and most likely quicker too - but fudging ugly:

Code: [Select]

void print_dec_2(int n) {
  unsigned int u = n, i = 0;
  char digits[10];
  if(n < 0) {
    u = -u;
    putchar('-');
  }

  do {
    unsigned int t = u/10;
    digits[i++] = u-t*10;
    u            = t;
  } while(i<10 && u != 0);

  while(i) {
    putchar(digits[--i]+'0');
  }
}

It could be made smaller by removing the "i<10 &&" test, which traps overflows if ints need greater than 10 characters - e.g. a 64 bit int - but better safe than sorry.

Quote from: hamster_nz on 07 Sep, 2017 12:18

Quote from: brucehoult on 07 Sep, 2017 10:49
Let's try, shall we?

If you are not adverse to 'compact & unreadable' code this is most likely the same size as the recursive code, and most likely quicker too - but fudging ugly:

Code: [Select]
void print_dec_2(int n) { unsigned int u = n, i = 0; char digits[10]; if(n < 0) { u = -u; putchar('-'); } do { unsigned int t = u/10; digits[i++] = u-t*10; u = t; } while(i<10 && u != 0); while(i) { putchar(digits[--i]+'0'); } }
It could be made smaller by removing the "i<10 &&" test, which traps overflows if ints need greater than 10 characters - e.g. a 64 bit int - but better safe than sorry.

Congratulations, that's better at 96 bytes of code only 44 more than the recursive version. I took out the i<10 test as that's really better done at compile time with a test on sizeof(int)*CHAR_BIT or UINT_MAX or some such.

I do agree that's less easily understandable than the recursive version, but for me at least it's much more quickly understandable than your first version!

Code: [Select]

00000000 <print_dec_2>:
   0:	b57f      	push	{r0, r1, r2, r3, r4, r5, r6, lr}
   2:	1e04      	subs	r4, r0, #0
   4:	4b14      	ldr	r3, [pc, #80]	; (58 <print_dec_2+0x58>)
   6:	681a      	ldr	r2, [r3, #0]
   8:	461e      	mov	r6, r3
   a:	9203      	str	r2, [sp, #12]
   c:	da03      	bge.n	16 <print_dec_2+0x16>
   e:	202d      	movs	r0, #45	; 0x2d
  10:	4264      	negs	r4, r4
  12:	f7ff fffe 	bl	0 <putchar>
  16:	4811      	ldr	r0, [pc, #68]	; (5c <print_dec_2+0x5c>)
  18:	2500      	movs	r5, #0
  1a:	fba4 2300 	umull	r2, r3, r4, r0
  1e:	3501      	adds	r5, #1
  20:	eb0d 0205 	add.w	r2, sp, r5
  24:	08db      	lsrs	r3, r3, #3
  26:	eb03 0183 	add.w	r1, r3, r3, lsl #2
  2a:	eba4 0441 	sub.w	r4, r4, r1, lsl #1
  2e:	f802 4c01 	strb.w	r4, [r2, #-1]
  32:	461c      	mov	r4, r3
  34:	2b00      	cmp	r3, #0
  36:	d1f0      	bne.n	1a <print_dec_2+0x1a>
  38:	3d01      	subs	r5, #1
  3a:	f81d 0005 	ldrb.w	r0, [sp, r5]
  3e:	3030      	adds	r0, #48	; 0x30
  40:	f7ff fffe 	bl	0 <putchar>
  44:	2d00      	cmp	r5, #0
  46:	d1f7      	bne.n	38 <print_dec_2+0x38>
  48:	9a03      	ldr	r2, [sp, #12]
  4a:	6833      	ldr	r3, [r6, #0]
  4c:	429a      	cmp	r2, r3
  4e:	d001      	beq.n	54 <print_dec_2+0x54>
  50:	f7ff fffe 	bl	0 <__stack_chk_fail>
  54:	b004      	add	sp, #16
  56:	bd70      	pop	{r4, r5, r6, pc}
  58:	00000000 	.word	0x00000000
  5c:	cccccccd 	.word	0xcccccccd

MY EYES!!!

Quote from: brucehoult on 07 Sep, 2017 10:49

My recursive code (below) compiles to 52 bytes of code and uses 8 bytes of stack per digit printed, for a maximum of 80 bytes of stack. That's 132 bytes total.

Code: [Select]
void print_dec(int n) { unsigned u = n; if (n < 0) { putchar('-'); u = -n; } unsigned d = u/10, ch = u - d*10 + '0'; if (d != 0) print_dec(d); putchar(ch); }...

This is as well proper example why should not force recursion even that is "naturally" obvious.

1. You always send unsigned value in every call but first time. Internal conversion from signed to unsigned all the time.
2. You always check value is negative. As known from point 1, that is possible only once. Lost time for useless check.
3. Every recursion put on stack all variables. You spend more of the memory

Not only it waste resources, but it is as well extremely slow. Printing on screen or file is anyway slow, but try to convert millions of numbers to memory stream and you will notice difference.

Connected example: "Print reversed entered text and check it is a palindrome. Naturally, everybody will put a finger on last letter and start to copy each letter from right to the left. No recursion! It is the same principle with resulted string from converting number to string, thus recursion is only more complicated thing.

Challenge 2 (MCU related): "Assume there is no memory for additional temporary string nor stack for recursion. Convert integer and send result through serial port."

I will immediately reject every youngster application for the job if fail to see solution in 1 second. As well, the best for him and any company is that burn his diploma out...

Quote from: sasa on 07 Sep, 2017 14:28

Quote from: brucehoult on 07 Sep, 2017 10:49

My recursive code (below) compiles to 52 bytes of code and uses 8 bytes of stack per digit printed, for a maximum of 80 bytes of stack. That's 132 bytes total.

Code: [Select]
void print_dec(int n) { unsigned u = n; if (n < 0) { putchar('-'); u = -n; } unsigned d = u/10, ch = u - d*10 + '0'; if (d != 0) print_dec(d); putchar(ch); }...

This is as well proper example why should not force recursion even that is "naturally" obvious.

1. You always send unsigned value in every call but first time. Internal conversion from signed to unsigned all the time.
2. You always check value is negative. As known from point 1, that is possible only once. Lost time for useless check.
3. Every recursion put on stack all variables. You spend more of the memory

Not only it waste resources, but it is as well extremely slow. Printing on screen or file is anyway slow, but try to convert millions of numbers to memory stream and you will notice difference.

You're looking at the trees and missing the forest!

I'm fully aware of your three points. They make no difference. Try it!

Code: [Select]

#define N 10000000

char buf[N*24]; // more than enough
char *p = buf;

__attribute__ ((noinline))
void myputchar(char ch) {
  *p++ = ch;
}

void print_dec(int n) {
  unsigned u = n;
  if (n < 0) {
    myputchar('-');
    u = -n;
  }
  unsigned d = u/10, ch = u - d*10 + '0';
  if (d != 0) print_dec(d);
  myputchar(ch);
}

int main() {
  for (int i=-N; i<=N; ++i) print_dec(i);
  return 0;
}

On a Raspberry Pi 3, compiled gcc -std=c99 -Os -mthumb -march=armv7

real   0m5.996s
user   0m5.680s
sys   0m0.310s

With hamster_nz's most recent code substituted:

real   0m6.312s
user   0m6.020s
sys   0m0.290s

Whose code is faster?

There are a lot of people here with assumptions and prejudices about what is fast, what is slow, what is efficient, what is wasteful.

It would be good to check them sometimes :-)

Btw, on an Odroid XU4 (2.0 GHz ARM A15):

hamster_nz's iterative code:

real   0m2.197s
user   0m1.860s
sys   0m0.325s

My recursive code:

real   0m2.128s
user   0m1.770s
sys   0m0.355s

Closer, but the recursive is faster.

On an i7 3770:

hamster_nz's iterative code:

real   0m0.898s
user   0m0.876s
sys   0m0.020s

My recursive code:

real   0m0.856s
user   0m0.848s
sys   0m0.008s

Close, but again recursive has the edge.

hmmm 4 pages about recursive function... here's my take... recursion is a tool. if you think you need it, then you use it. if you think you dont, then dont forbid others from using it... if you are taboo about recursion, you can replace recursive function with few (2, 3 or 4) functions that can emulate recursion... but by using the recursive tool you need to be aware of the pros and cons...

recursive function pros:
1) simpler code, hence managing and understanding it will be alot easier, hence less prone to logic/semantics error. hence in accordance to kiss principle.

cons:
1) slower execution due to multiple self function calls (can be hundreds of recursive function calls) which in effect will execute housekeeping job a normal program did when jumping (calling) another (modular) function, such as preparing stack and pointers to arguments etc.
2) require deep stack memory space to store return points and local variables. a big no no to limited resource mcu..

with non-recursive functions, the pro and cons is the other way around except con #1 because they can still call hundreds of non-recursive function calls. more difficult to maintain, less readable etc, but requiring close to nothing stack space.

anyway. i cared to reply just because i see missing option in the polls. i want to vote... use recursion depending on the need or when appropriate, there should be no definable time limit where you should use or not use recursion. but its not in the poll list... recursion is a tool, understand it and learn its drawback is the way to go. speculating without really understanding it is a complete waste of time, much like what usually happened in religious discussions, you need to understand your enemy to love or hate them in the right way. imho...

Quote from: brucehoult on 07 Sep, 2017 15:36

Whose code is faster?

There are a lot of people here with assumptions and prejudices about what is fast, what is slow, what is efficient, what is wasteful.

It would be good to check them sometimes :-)

To be fair, your test does not really test what you want to test. The hamster_nz's code uses "%" operation. The compiler (it's the same GCC everywhere, right?) implements it as a division. In your code, the compiler replaces division with multiplication, which makes hamster_nz's code way slower compared to yours.

If you want a fair comparison, take your code, convert it into non-recursive version without touching anything else, then do the tests.

Quote from: NorthGuy on 07 Sep, 2017 16:09

The hamster_nz's code uses "%" operation. The compiler (it's the same GCC everywhere, right?) implements it as a division. In your code, the compiler replaces division with multiplication, which makes hamster_nz's code way slower compared to yours.

I just thought it is worth noting that, at some levels of optimization, GCC will do strength reduction on expensive operations with constants (e.g. '% 8' ==> ' & 0x7'), but it clearly hasn't here (I'm assuming, I haven't bothered to read the assembler).

Quote from: Mechatrommer on 07 Sep, 2017 15:58

2) require deep stack memory space to store return points and local variables. a big no no to limited resource mcu..

This is not necessarily true.

Take for example, PIC16 - 16-level deep hardware call stack, very small memory. Seems like it would be stupid to use recursion. XC8 doesn't even support recursion with default settings.

But, on the other hand, you have very small memory. How much data can you really fit into 1024 bytes? When your data is so small, its processing is unlikely to require huge recursion depth. May be you'll need 5-7 levels. Compared to this, the 16-level deep stack looks quite adequate.

Quote from: brucehoult on 07 Sep, 2017 10:49

your code compiles to 140 bytes of code and uses 20 bytes of stack, for 160 total.

My recursive code (below) compiles to 52 bytes of code and uses 8 bytes of stack per digit printed, for a maximum of 80 bytes of stack...

I need 60 bytes more of stack than you do.

Or to put it another way, you need 4 times more stack than he does. But this is a fairly trivial example. How does it scale up? Who's stack will blow out first?

Most MCUs have a lot more ROM than RAM, and many have even more limited stack space. But hey, just because this is the Microcontroller & FPGA forum doesn't mean...

Quote

On an i7 3770:

hamster_nz's iterative code:

real 0m0.898s
user 0m0.876s
sys 0m0.020s

My recursive code:

real 0m0.856s
user 0m0.848s
sys 0m0.008s

Close, but again recursive has the edge.

... that you can't try to win the argument by throwing an i7 at it!

Quote from: brucehoult on 07 Sep, 2017 15:36

You're looking at the trees and missing the forest!
...

There are a lot of people here with assumptions and prejudices about what is fast, what is slow, what is efficient, what is wasteful.

It would be good to check them sometimes :-)

Not sure what you are talking about. My attitude about recursion you can read at end of page 1 of this thread.

With 30+ years of professional experience, everybody have his own attitude about everything, especially in his own profession. But I'm the liberal one in using different methods and different solutions in different situations, as long it is the most suitable way - I do not force either as the best a priori motivated by some subjective reasons.

If this is only an exercise to prove that everything can be transformed to recursion, yes, it can. But the question is (looking the whole picture) is it the most suitable solution?

Quote from: NorthGuy on 07 Sep, 2017 16:31

Quote from: Mechatrommer on 07 Sep, 2017 15:58
2) require deep stack memory space to store return points and local variables. a big no no to limited resource mcu..
This is not necessarily true.

Take for example, PIC16 - 16-level deep hardware call stack, very small memory. Seems like it would be stupid to use recursion. XC8 doesn't even support recursion with default settings.

But, on the other hand, you have very small memory. How much data can you really fit into 1024 bytes? When your data is so small, its processing is unlikely to require huge recursion depth. May be you'll need 5-7 levels. Compared to this, the 16-level deep stack looks quite adequate.

if your recursive function doesnt store any local variables (i cant think of any recursive function that doesnt need local/temporary variable(s)), then 16 deep calls (give or take) is feasible, but if the function store say 2 variables, then (16/3) deep calls is feasible, ymmv. unless of course if a compiler can do a nice trick of saving local variables in heap memory. yes my statement is not entirely true but i'm talking what is generally recursion is used for. i dont think i will have a need for recursion in 16 deep stack processor, even if i do, recursive function that i usually used will not be runable in such tiny system. more like only for few led blink and some simple arithmetics. recursion is usually used in more complex data structures / algorithm afaik, other than simplicity, will be also because of performance reason. if you only have 1024 items in a list (which as you said pic16 can maximally hold on to), i dont think linear search or O(n^2) bubblesort on 1-16MHz processor will give much impact on performance, so recursive binary search/ O(N.log(N)) sort will not be necessary imho, but... ymmv...

Quote from: NorthGuy on 07 Sep, 2017 16:09

Quote from: brucehoult on 07 Sep, 2017 15:36
Whose code is faster?

There are a lot of people here with assumptions and prejudices about what is fast, what is slow, what is efficient, what is wasteful.

It would be good to check them sometimes :-)

To be fair, your test does not really test what you want to test. The hamster_nz's code uses "%" operation. The compiler (it's the same GCC everywhere, right?) implements it as a division. In your code, the compiler replaces division with multiplication, which makes hamster_nz's code way slower compared to yours.

If you want a fair comparison, take your code, convert it into non-recursive version without touching anything else, then do the tests.

Here we have yet another assumption confidently stated, without data to back it up. And, again, incorrectly.

Here is hamster_nz's code, compiled (you really could do this yourself in about 30 seconds, you know...):

Code: [Select]

00010418 <print_dec>:
   10418:       b57f            push    {r0, r1, r2, r3, r4, r5, r6, lr}
   1041a:       1e05            subs    r5, r0, #0
   1041c:       da03            bge.n   10426 <print_dec+0xe>
   1041e:       202d            movs    r0, #45 ; 0x2d
   10420:       426d            negs    r5, r5
   10422:       f7ff fff1       bl      10408 <myputchar>
   10426:       4a0f            ldr     r2, [pc, #60]   ; (10464 <print_dec+0x4c>)
   10428:       ae01            add     r6, sp, #4
   1042a:       2400            movs    r4, #0
   1042c:       fba5 0102       umull   r0, r1, r5, r2
   10430:       3401            adds    r4, #1
   10432:       2c0a            cmp     r4, #10
   10434:       ea4f 03d1       mov.w   r3, r1, lsr #3
   10438:       eb06 0104       add.w   r1, r6, r4
   1043c:       eb03 0083       add.w   r0, r3, r3, lsl #2
   10440:       eba5 0540       sub.w   r5, r5, r0, lsl #1
   10444:       f801 5c01       strb.w  r5, [r1, #-1]
   10448:       d002            beq.n   10450 <print_dec+0x38>
   1044a:       461d            mov     r5, r3
   1044c:       2b00            cmp     r3, #0
   1044e:       d1ed            bne.n   1042c <print_dec+0x14>
   10450:       3c01            subs    r4, #1
   10452:       5d30            ldrb    r0, [r6, r4]
   10454:       3030            adds    r0, #48 ; 0x30
   10456:       b2c0            uxtb    r0, r0
   10458:       f7ff ffd6       bl      10408 <myputchar>
   1045c:       2c00            cmp     r4, #0
   1045e:       d1f7            bne.n   10450 <print_dec+0x38>
   10460:       b004            add     sp, #16
   10462:       bd70            pop     {r4, r5, r6, pc}
   10464:       cccccccd        .word   0xcccccccd

Where is the division? There is none. The compiler implements divide/mod by 10 by multiplying by 0xcccccccd, exactly the same as in my code.

Quote from: brucehoult on 07 Sep, 2017 17:23

Where is the division?

here:

Quote from: brucehoult on 07 Sep, 2017 11:15

Your function compiles to 146 bytes of code, and uses 32 bytes of stack, total 178 bytes.

Code: [Select]
00010222 <print_dec>: 10222: 1141 addi sp,sp,-16 10224: c422 sw s0,8(sp) 10226: c226 sw s1,4(sp) 10228: c606 sw ra,12(sp) 1022a: 842a mv s0,a0 1022c: 00055a63 bgez a0,10240 <print_dec+0x1e> 10230: 8581a503 lw a0,-1960(gp) # 1b910 <_impure_ptr> 10234: 02d00593 li a1,45 10238: 40800433 neg s0,s0 1023c: 4510 lw a2,8(a0) 1023e: 3f55 jal 101f2 <__sputc_r> 10240: 45a9 li a1,10 10242: 02b45533 divu a0,s0,a1 10246: 02b47433 remu s0,s0,a1 1024a: 03040413 addi s0,s0,48 1024e: c111 beqz a0,10252 <print_dec+0x30> 10250: 3fc9 jal 10222 <print_dec> 10252: 8581a503 lw a0,-1960(gp) # 1b910 <_impure_ptr> 10256: 85a2 mv a1,s0 10258: 40b2 lw ra,12(sp) 1025a: 4422 lw s0,8(sp) 1025c: 4492 lw s1,4(sp) 1025e: 4510 lw a2,8(a0) 10260: 0141 addi sp,sp,16 10262: bf41 j 101f2 <__sputc_r>

Quote from: sasa on 07 Sep, 2017 16:55

Quote from: brucehoult on 07 Sep, 2017 15:36

You're looking at the trees and missing the forest!
...

There are a lot of people here with assumptions and prejudices about what is fast, what is slow, what is efficient, what is wasteful.

It would be good to check them sometimes :-)

Not sure what you are talking about. My attitude about recursion you can read at end of page 1 of this thread.

With 30+ years of professional experience, everybody have his own attitude about everything, especially in his own profession.

Everyone is entitled to their own opinions, but not their own facts. I'm seeing a lot of opinions that recursive code is bigger and or slower than iterative code. I'm the *only* one here who is bothering to establish facts to back up my opinions, by actually compiling and running code -- the results of which have been (at least for this example) that the recursive code is both smaller and faster than the iterative code.

Twice smaller code for a little more stack usage is a good trade-off in most circumstances. Bigger code is sitting there 100% of the time, whether in RAM or flash/ROM. A few bytes of stack usage is temporary, and as long as you have it in the first place is shared with other code running before and afterwards.

Quote from: NorthGuy on 07 Sep, 2017 17:30

Quote from: brucehoult on 07 Sep, 2017 17:23
Where is the division?

here:

Quote from: brucehoult on 07 Sep, 2017 11:15
Your function compiles to 146 bytes of code, and uses 32 bytes of stack, total 178 bytes.

Code: [Select]
00010222 <print_dec>: 10222: 1141 addi sp,sp,-16 10224: c422 sw s0,8(sp) 10226: c226 sw s1,4(sp) 10228: c606 sw ra,12(sp) 1022a: 842a mv s0,a0 1022c: 00055a63 bgez a0,10240 <print_dec+0x1e> 10230: 8581a503 lw a0,-1960(gp) # 1b910 <_impure_ptr> 10234: 02d00593 li a1,45 10238: 40800433 neg s0,s0 1023c: 4510 lw a2,8(a0) 1023e: 3f55 jal 101f2 <__sputc_r> 10240: 45a9 li a1,10 10242: 02b45533 divu a0,s0,a1 10246: 02b47433 remu s0,s0,a1 1024a: 03040413 addi s0,s0,48 1024e: c111 beqz a0,10252 <print_dec+0x30> 10250: 3fc9 jal 10222 <print_dec> 10252: 8581a503 lw a0,-1960(gp) # 1b910 <_impure_ptr> 10256: 85a2 mv a1,s0 10258: 40b2 lw ra,12(sp) 1025a: 4422 lw s0,8(sp) 1025c: 4492 lw s1,4(sp) 1025e: 4510 lw a2,8(a0) 10260: 0141 addi sp,sp,16 10262: bf41 j 101f2 <__sputc_r>

That's my code (compiled for RISC-V), not hamster_nz's code. You said " The hamster_nz's code uses "%" operation. The compiler (it's the same GCC everywhere, right?) implements it as a division. In your code, the compiler replaces division with multiplication, which makes hamster_nz's code way slower compared to yours."

The ARM compiler uses multiplication for both of our functions. The RISC-V compiler uses division for both of our functions.

Recursion - do you use it in the real world?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Navigation

Common actions