Author Topic: Recursion - do you use it in the real world? (Read 42209 times)

janoc · « **Reply #150 on:** September 09, 2017, 05:27:15 pm »

Sasa, I took your optimized code you have posted as a challenge and added my recursive variant to it.

int print_dec1(unsigned n, char*s, int l)
{
    unsigned d = n/10;
    char ch = n - d*10 + '0';
    *s = ch;
    
    if (d == 0)
        return l;
    else
        return print_dec1(d, s + 1, l + 1);
}

void reverse(char *a, char *b)
{
    if(a < b)
    {
        char t = *b;
        *b = *a;
        *a = t;
        reverse(a+1, b-1);
    }
}

char* print_dec(int n, char *s) {
    int       len = 0;
    unsigned  u   = n;
    char     *r   = s;
    
    if (n < 0) {
        *(r++) = '-';
        u = -n;
        len = 1;
    }    
    
    len += print_dec1(u, r, 0);
    *(s+len+1) = '\0';
    
    reverse(r, s+len);
    return s;
}

Gasp, 3 functions!

Compiled with optimization for size (which is what you had there, I believe: gcc -Os printdec.c )

Code: [Select]

itoc_sasa_1     : 772 ms
itoc_sasa_1     : 746 ms
itoc_sasa_1     : 740 ms
print_dec     : 769 ms
print_dec     : 775 ms
print_dec     : 773 ms

My code compiles to 142 bytes (all three functions together), yours to 96 bytes.

Hmm, not bad but let's see what happens if we actually turn on -O2 which does the tail call optimization (gcc -O2 printdec.c ):

Code: [Select]

itoc_sasa_1     : 244 ms
itoc_sasa_1     : 216 ms
itoc_sasa_1     : 216 ms
print_dec     : 217 ms
print_dec     : 217 ms
print_dec     : 216 ms

Guess what, the running time is identical, despite me having 3 functions there vs one you have. However, my code is now 3x longer for some reason (313 bytes vs 107 bytes). Didn't check why, maybe something got inlined.

It could be probably improved even more if I have used an "accumulator" argument in the recursive function which would eliminate the need for the reversing at the end at the expense of needing 2x the memory, but I really couldn't be bothered. I think this is more than enough to prove the point already.

For the record, I have used this gcc:

Code: [Select]

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-mageia-linux-gnu/5.4.0/lto-wrapper
Target: x86_64-mageia-linux-gnu
Configured with: ../configure --prefix=/usr --libexecdir=/usr/lib --with-slibdir=/lib64 --with-pkgversion='Mageia 5.4.0-5.mga6' --with-bugurl=https://bugs.mageia.org/ --mandir=/usr/share/man --infodir=/usr/share/info --enable-checking=release --enable-languages=c,c++,ada,fortran,objc,obj-c++,java --enable-linker-build-id --build=x86_64-mageia-linux-gnu --host=x86_64-mageia-linux-gnu --with-cpu=generic --with-system-zlib --enable-threads=posix --enable-shared --enable-objc-gc --enable-long-long --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --enable-java-awt=gtk --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-gtk-cairo --disable-libjava-multilib --enable-ssp --disable-libssp --disable-werror --with-isl --with-python-dir=/lib/python2.7/site-packages --enable-lto
Thread model: posix
gcc version 5.4.0 (Mageia 5.4.0-5.mga6)

I have tried it with clang 3.9.1 too and the results are similar.

Now, to make it clear once more - I am not expecting people to write recursion instead of loops in situations like this. It is not very intuitive for most (especially my messy implementation) and would make the code harder to maintain. However, I hope we have proved once for all that recursive code does not have to be slower or use more resources than iterative code. If in doubt, measure!

This is the code I have used:
https://pastebin.com/Y3pJZuP1

sasa · « **Reply #151 on:** September 09, 2017, 08:27:12 pm »

Quote from: janoc on September 09, 2017, 05:27:15 pm

Now, to make it clear once more - I am not expecting people to write recursion instead of loops in situations like this. It is not very intuitive for most (especially my messy implementation) and would make the code harder to maintain. However, I hope we have proved once for all that recursive code does not have to be slower or use more resources than iterative code. If in doubt, measure!

We proved nothing yet, I'm afraid.

Results I have, only proves my point - compiler optimization parameters only matters, not a code quality. In default compiler optimization, your function is far more the worst of all - 2x slower! See sorted tables and test yourself if not believe.

My function is clearly the fastest, except with -Ofast. I could optimize it a bit more, but currently I have no time for playing around.

Hint: When use tail recursion on Intel/AMD 64-bit, always use -Ofast. Or not? Perhaps other code speed may suffer?

Few notes:

1. If have fast computer, elapsed time of 200ms and 210ms means nothing - resolution is not good enough. Enlarge data by factor 5 or 10.
2. Leave dummy test to execute, as suggested.

-O0 (if not specified, same as default, by GCC help page):

Code: [Select]

itoc_sasa_1     : 3730 ms
itoc_ref_2      : 4040 ms
itoc_NorthGuy_1 : 4245 ms
itoc_ref_1      : 4253 ms
itoc_janoc_1    : 6553 ms

-Os:

Code: [Select]

itoc_sasa_1     : 5890 ms
itoc_janoc_1    : 6017 ms
itoc_ref_1      : 6155 ms
itoc_ref_2      : 6264 ms
itoc_NorthGuy_1 : 6382 ms

-O (size and time, without any optimizations that take a great deal of compilation time)

Code: [Select]

itoc_sasa_1     : 2239 ms
itoc_ref_2      : 2785 ms
itoc_NorthGuy_1 : 2789 ms
itoc_ref_1      : 3254 ms
itoc_janoc_1    : 3539 ms

-Ofast

Code: [Select]

itoc_ref_2      : 1998 ms
itoc_janoc_1    : 2166 ms
itoc_sasa_1     : 2357 ms
itoc_NorthGuy_1 : 2371 ms
itoc_ref_1      : 2808 ms

NorthGuy · « **Reply #152 on:** September 09, 2017, 08:45:35 pm »

Speaking of compilers. I wonder how all these fare against itoa()?

janoc · « **Reply #153 on:** September 09, 2017, 09:52:53 pm »

Quote from: NorthGuy on September 09, 2017, 08:45:35 pm

Speaking of compilers. I wonder how all these fare against itoa()?

They don't - itoa() is not a standard function and many C libraries don't implement it. That's why this silly example - it is a commonly (re)-implemented function.

Quote from: sasa on September 09, 2017, 08:27:12 pm

We proved nothing yet, I'm afraid.

Results I have, only proves my point - compiler optimization parameters only matters, not a code quality. In default compiler optimization, your function is far more the worst of all - 2x slower! See sorted tables and test yourself if not believe.

So, you have disabled an optimization that the recursive code explicitly depends on in order to claim a "win".

That's a pretty desperate move.

GCC needs to have at least optimization level -O2 (no need for -Ofast which is -O3 + some extras) on in order to do tail recursion optimization. So if you turn that off, of course the code is going to be slower! What a surprise!

And with -Ofast my code was actually faster than yours!

The default optimization level with GCC is, surprise, none!

From the manpage:

Quote

Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you expect from the source code.

So if you are compiling with the default options in production, you should better re-read the manual of your compiler. Pretty much nobody uses GCC with no optimization on, except for debug builds. I don't see what is the point in such comparison unless you are grasping at straws to prove your point.

sasa · « **Reply #154 on:** September 09, 2017, 11:05:08 pm »

Quote from: janoc on September 09, 2017, 09:52:53 pm

...
I don't see what is the point in such comparison unless you are grasping at straws to prove your point.

Please avoid that kind of terminology. If that is true, I would certainly hide result with -Ofast. I have missed to copy results with -O2.

-O2:

Code: [Select]

itoc_sasa_1     : 2082 ms
itoc_janoc_1    : 2209 ms
itoc_NorthGuy_1 : 2574 ms
itoc_ref_2      : 2580 ms
itoc_ref_1      : 3124 ms

Upper table certainly show different.

I do not know about what desperate move you are talking about. These are actual results on "clean environment" on mentioned Intel Atom and Linux. Everybody may try on any platform.

Did you provide corrections I have suggested you in order to get comparable times?

westfw · « **Reply #155 on:** September 10, 2017, 04:50:11 am »

Quote

GCC needs to have at least optimization level -O2 (no need for -Ofast which is -O3 + some extras) on in order to do tail recursion optimization.

Is this code even subject to tail optimization? No one has been posting any of the x86 assembly code :-(

Umm...

Code: [Select]

_itoc_ref_2_uint:
00000000000000d0    pushq    %rbp
00000000000000d1    movq    %rsp, %rbp
00000000000000d4    pushq    %rbx
00000000000000d5    pushq    %rax
00000000000000d6    movl    %edi, %eax
00000000000000d8    movl    %eax, %ecx
00000000000000da    movl    $0xcccccccd, %edi       ## imm = 0xCCCCCCCD
00000000000000df    imulq    %rcx, %rdi
00000000000000e3    shrq    $0x23, %rdi
00000000000000e7    imull    $-0xa, %edi, %ecx
00000000000000ea    leal    0x30(%rax,%rcx), %ebx
00000000000000ee    cmpl    $0xa, %eax
00000000000000f1    jb    0xf8
00000000000000f3    callq    _itoc_ref_2_uint
00000000000000f8    movq    _pp(%rip), %rax
00000000000000ff    leaq    0x1(%rax), %rcx
0000000000000103    movq    %rcx, _pp(%rip)
000000000000010a    movb    %bl, _myputchar(%rax)
000000000000010c    addq    $0x8, %rsp
0000000000000110    popq    %rbx
0000000000000111    popq    %rbp
0000000000000112    retq
0000000000000113    nopw    %cs:_myputchar(%rax,%rax)

That seems a bit weird in a couple of ways...
1) It sets of a stack frame that it doesn't use (even with -O3)
2) Doesn't x86 have a nice divide instruction that gives you the quotient and remainder in one instruction? The compiler doesn't seem to figure out that it could do that (or perhaps Multiply is that much faster than divide?) It does even worse if I use "%"...

hamster_nz · « **Reply #156 on:** September 10, 2017, 11:28:17 am »

Spurred on by decimal digit extraction debate, I was wondering if there is a faster way to divide a 32-bit integer by 10 on modern 64-bit CPU.....

It turns out there is:

Code: [Select]

   d = (i * 0xCCCCCCCDLLU)>>35;

is faster than GCC's default:

Code: [Select]

   d = i/10;

Test program (compile with/without REF defined):

Code: [Select]

#include <stdio.h>

unsigned div10(unsigned x) {
#ifdef REF
      /* Implement 32-bit division using divide */
      return x/10;
#else
      return (x * 0xCCCCCCCDLLU)>>35;
#endif
}

int main(int argc, char *argv[]) {
   unsigned i;
   for(i = 0; i < 0xFFFFFFFF; i++) {
      unsigned d,r;
      d = div10(i);
      /* Implement 32-bit division using 32x32 multiply */
      r   = i-10*d;
      if(r > 9) {
        printf("Invalid i =  %u, d = %u, r= %u\n",i,d,r);
      }
   }
   return 0;
}

Here is the assembler, if anybody is interested:

Code: [Select]

div10:
.LFB23:
        .cfi_startproc
        movl    %edi, %eax
        movl    $-858993459, %edx
        mull    %edx
        movl    %edx, %eax
        shrl    $3, %eax
        ret

vs

Code: [Select]

div10:
.LFB23:
        .cfi_startproc
        movl    %edi, %eax
        movl    $3435973837, %edx
        imulq   %rdx, %rax
        shrq    $35, %rax
        ret
        .cfi_endproc
.LFE23:

On my AMD A10-9600P laptop, using the multiply takes only 70% of the time (7.688 sec vs 10.764). Even when inlined I get much the same result. Anybody want to try on an Intel?

I've never seen this mentioned anywhere before - has anybody else seen this mentoned?

John Coloccia · « **Reply #157 on:** September 10, 2017, 11:38:17 am »

Quote from: sasa on September 09, 2017, 04:28:23 am

Main principle in programming is to solve problem on the most efficient and stable way.

You've said this a couple of times now, I couldn't disagree more. The main goal is to write bug-free, maintainable code that meets the requirement. "Most efficient" is a terrible goal and leads to the kind of unmaintainable garbage that I have to fix day in and day out.

This whole thread is kind of strange, actually. I use recursion if it's the right solution, and I don't use it if it's the wrong solution. It's like asking, "do you use function pointers in the real world?"

brucehoult · « **Reply #158 on:** September 10, 2017, 11:50:31 am »

Quote from: hamster_nz on September 10, 2017, 11:28:17 am

Spurred on by decimal digit extraction debate, I was wondering if there is a faster way to divide a 32-bit integer by 10 on modern 64-bit CPU.....

It turns out there is:
Code: [Select]
d = (i * 0xCCCCCCCDLLU)>>35;

Err .. this is *exactly* what gcc does anyway. I'd be disapointed by any compiler than didn't. Look at, for example, the code in my days ago post: https://www.eevblog.com/forum/microcontrollers/recursion-do-you-use-it-in-the-real-world/msg1296764/#msg1296764

hamster_nz · « **Reply #159 on:** September 10, 2017, 11:59:14 am »

Quote from: brucehoult on September 10, 2017, 11:50:31 am

Quote from: hamster_nz on September 10, 2017, 11:28:17 am
Spurred on by decimal digit extraction debate, I was wondering if there is a faster way to divide a 32-bit integer by 10 on modern 64-bit CPU.....

It turns out there is:
Code: [Select]
d = (i * 0xCCCCCCCDLLU)>>35;

Err .. this is *exactly* what gcc does anyway. I'd be disapointed by any compiler than didn't. Look at, for example, the code in my days ago post: https://www.eevblog.com/forum/microcontrollers/recursion-do-you-use-it-in-the-real-world/msg1296764/#msg1296764

You're missing that on x64 for the the '/10 ' version GCC only uses 32-bit registers, and the high word of the result ends up in the wrong register and then has to shuffle the results around.

When using the LLU constant, the result ends up in 64-bit register, so no register-to-register moves are required.

sasa · « **Reply #160 on:** September 10, 2017, 12:58:12 pm »

Quote from: hamster_nz on September 10, 2017, 11:28:17 am

Spurred on by decimal digit extraction debate, I was wondering if there is a faster way to divide a 32-bit integer by 10 on modern 64-bit CPU.....

I've never seen this mentioned anywhere before - has anybody else seen this mentoned?

This, as well as multiplying by shifting is used on platforms without hardware implementation. And that should be primarily the job for compiler optimizer for specific platform.

Quote from: John Coloccia on September 10, 2017, 11:38:17 am

You've said this a couple of times now, I couldn't disagree more. The main goal is to write bug-free, maintainable code that meets the requirement. "Most efficient" is a terrible goal and leads to the kind of unmaintainable garbage that I have to fix day in and day out.

As well, I have mentioned couple of times: "suitable way", "proper solution for the task", etc. But everybody here seems to prefer to pull up words from context and interpret as they like...

If function / algorithm is essential/low-level for the whole application, of course you should code it to be the fastest possible platform independent solution regardless what compiler optimization will produce. Otherwise should be used the most appropriate way... All that is also mentioned several times. Please read the whole thread.

The (useless) debate here is about appropriate clean and fast iterative solution for specific task and recursive "simple and clean code" compiler is capable to transfer info the fastest and shortest code, ignoring or minimizing effect every recursion theoretically have.

So far, results show that on Intel/AMD Linux 64-bit platform, the fastest recursive solution is indeed really fastest only with one parameter (-Ofast), while in all others confirms wide instability and can be second fastest or even the slowest.

On other platforms, with other machine code (RISC mostly), iterative solution with its stability will probably be again one of the top fastest (if not top), regardless what optimization is used. The reason is simple, but tail recursion fan ignore that fact and will try to find most suitable optimization parameter which will make it faster.

Then we come to MCU platform, where recursion is obviously hardly desirable. Then, there is no other way than use the most suitable iterative solution. Again, if avoid hand written ASM - not a single compiler will made better optimized code.

Etc, etc. As someone noted here that this is a "dic* contest", and basically is to disprove the theory that recursion should be forced because nature of the problem is recursive or it is the clearest way of coding producing smallest and fastest machine code.

T3sl4co1l · « **Reply #161 on:** September 10, 2017, 01:10:16 pm »

For example, I've got this for AVR:

Code: [Select]

;
;     uDivTen
;
; Unsigned division by ten.  45 words, takes 60 cycles (plus rcall).
;
; Input:
;   r17:r16 = operand
; Output:
;   r1:r0 = quotient
;   r3:r2 = remainder
;   (r16 = 10, r17 = 0)
;
uDivTen:
		push r7				; register usage:
		push r6				; r7:r6 = operand (in[0..1])
		push r5				; r5:r4:r3:r2 = 40 bit accumulator, D[1..3] (byte 0 discarded)
		push r4				; r1:r0 = partial products
		movw r6, r16			; r16 = temp, r17 = zero
		clr r17				; (but first, save operand)

		; multiply by K = 2^20 * 16 / 10 to make D[0..3] = in[0..1] * K[0..2]
		; (implicitly discarding D[0])

		ldi r16, (((exp2(24) / 10) + 1) & (0x0000ff << 16)) >> 16
		mul r7, r16			; in[1] * K[2]
		movw r4, r0			; save high word

		ldi r16, (((exp2(24) / 10) + 1) & (0x0000ff << 0)) >> 0
		mul r7, r16			; in[1] * K[0]
		movw r2, r0			; save low word

		mul r6, r16			; in[0] * K[0]
		add r2, r1			; accumulate to D[1] (discard lowest byte)
		adc r3, r17
		adc r4, r17
		adc r5, r17

		ldi r16, (((exp2(24) / 10) + 1) & (0x0000ff << 8)) >> 8
		mul r6, r16			; in[0] * K[1]
		add r2, r0			; accumulate to D[1..2]
		adc r3, r1
		adc r4, r17
		adc r5, r17

		mul r7, r16			; in[1] * K[1]
		add r3, r0			; accumulate to D[2..3]
		adc r4, r1
		adc r5, r17


		ldi r16, (((exp2(24) / 10) + 1) & (0x0000ff << 16)) >> 16
		mul r6, r16			; in[0] * K[2]
		add r3, r0			; accumulate to D[2..3]
		adc r4, r1
		adc r5, r17

; dig remainder out of the fractional part

		ldi r16, 0x10			; rounding bit
		add r3, r16
		ldi r16, 10
		mul r3, r16			; frac * 10
		mov r2, r1
		clr r3
		movw r0, r4			; quotient out

; r3 = 0, r2 = [0...9], r1:r0 = [0...6553]
		pop r4
		pop r5
		pop r6
		pop r7
		ret
; END PROC uDivTen

Downside is it does a 24 bit intermediate operation.

Tim

NorthGuy · « **Reply #162 on:** September 10, 2017, 01:44:04 pm »

Quote from: hamster_nz on September 10, 2017, 11:28:17 am

On my AMD A10-9600P laptop, using the multiply takes only 70% of the time (7.688 sec vs 10.764). Even when inlined I get much the same result. Anybody want to try on an Intel?

GCC uses multiplication instead of division. But ... Intel's division has early termination. If it is only 50% slower than multiplication, it might be more beneficial to use division. The division automatically provides the remainder which would eliminate the need for back multiplication and subtraction. There's a possibility that it actually could be faster. It's hard to figure out without trying.

Another question is how to tell C to use division. Perhaps, using /10 and %10 close to each other can trigger it. Otherwise, assembler is the way.

Anyway, this all is platform related.

brucehoult · « **Reply #163 on:** September 10, 2017, 03:10:55 pm »

Quote from: hamster_nz on September 10, 2017, 11:59:14 am

Quote from: brucehoult on September 10, 2017, 11:50:31 am
Quote from: hamster_nz on September 10, 2017, 11:28:17 am
Spurred on by decimal digit extraction debate, I was wondering if there is a faster way to divide a 32-bit integer by 10 on modern 64-bit CPU.....

It turns out there is:
Code: [Select]
d = (i * 0xCCCCCCCDLLU)>>35;

Err .. this is *exactly* what gcc does anyway. I'd be disapointed by any compiler than didn't. Look at, for example, the code in my days ago post: https://www.eevblog.com/forum/microcontrollers/recursion-do-you-use-it-in-the-real-world/msg1296764/#msg1296764

You're missing that on x64 for the the '/10 ' version GCC only uses 32-bit registers, and the high word of the result ends up in the wrong register and then has to shuffle the results around.

When using the LLU constant, the result ends up in 64-bit register, so no register-to-register moves are required.

I'm not missing that. I looked at it, and understood that it makes no difference at all on a modern x86 because the register moves end up as simple renames in the pipeline.

I think you're missing that div10() has been inlined into main() and in the non-REF case the division has been entirely optimized away, because gcc was able to turn your code into this:

Code: [Select]

int main(int argc, char *argv[]) {
   unsigned i;
   unsigned long long inc = 0xCCCCCCCD, tot = 0;
   for(i = 0; i < 0xFFFFFFFF; i++) {
      unsigned d,r;
      d = tot>>35;
      r = i - ((d + d<<2) << 1);
      if(r > 9) {
        printf("Invalid i =  %u, d = %u, r= %u\n",i,d,r);
      }
      tot += inc;
   }
   return 0;
}

Try adding __attribute__ ((noinline)) to div10() and you'll find that the execution times are identical, at least on my linux desktop (Core i7 6700K Skylake) and on my 2011 17" MacBook Pro (Core i7 2720QM Sandy Bridge).

I'd be very surprised if that didn't apply to your AMD as well.

hamster_nz · « **Reply #164 on:** September 10, 2017, 07:25:18 pm »

Quote from: brucehoult on September 10, 2017, 03:10:55 pm

Try adding __attribute__ ((noinline)) to div10() and you'll find that the execution times are identical, at least on my linux desktop (Core i7 6700K Skylake) and on my 2011 17" MacBook Pro (Core i7 2720QM Sandy Bridge).

I'd be very surprised if that didn't apply to your AMD as well.

Just checked on a core I3, and got 4.96 vs 7.68 seconds.

You are right, it is because "d = (i * 0xCCCCCCCDLLU)>>35;" can get inlined better, enabling the MUL ti be replaced by repeated addition.

So I can at least say it sometimes gives better results, but is no worse then the default.

EDIT: If you change the loop to "for(i = 0; i != 0xFFFFFFFF; i+=19)" you still get full coverage, and it prevents the MUL from being optimized away. It is still quicker.

Bruce Abbott · « **Reply #165 on:** September 10, 2017, 08:25:20 pm »

Quote from: John Coloccia on September 10, 2017, 11:38:17 am

You've said this a couple of times now, I couldn't disagree more. The main goal is to write bug-free, maintainable code that meets the requirement.

I agree. The real reason to avoid recursion is not that it might be less efficient, but because it can be non-obvious and easily introduce latent bugs.

Quote

"Most efficient" is a terrible goal and leads to the kind of unmaintainable garbage that I have to fix day in and day out.

People have different ideas on what constitutes 'efficient' code. Some (wrongly) think that 'cleaner' or 'more elegant' source code will always produce more efficient machine code. Others go to extreme lengths to avoid constructs that the compiler will probably optimize away anyway. The worst are people who condense it into terse one-liners, then don't add any comments because their code is 'self-documenting'.

There are cases where efficiency is more important than readability, but that still doesn't make 'unmaintainable garbage' acceptable. Tricky code still needs to be well written and documented, perhaps even more so than standard stuff that everyone recognizes. I write mainly in assembler (which most people probably think is 'unreadable') but I structure it well, avoid tricky stuff wherever possible, and comment everything. Usually I will start by writing the most 'obvious' code, then once it is all working I go back and optimize any bits that are worth it. Code running fast enough? Haven't run out of memory? No need to optimize.

brucehoult · « **Reply #166 on:** September 10, 2017, 09:20:57 pm »

Quote from: Bruce Abbott on September 10, 2017, 08:25:20 pm

Quote from: John Coloccia on September 10, 2017, 11:38:17 am
You've said this a couple of times now, I couldn't disagree more. The main goal is to write bug-free, maintainable code that meets the requirement.
I agree. The real reason to avoid recursion is not that it might be less efficient, but because it can be non-obvious and easily introduce latent bugs.

I'm sorry but that's just absolutely not true. In any complex situation, a recursive solution is far easier to reason about and prove correct (to whatever standard you want/need).

The things that are non-obvious and easily introduce latent bugs are:

1) spaghetti gotos
2) assignment of a new value to an already-initialized variable
3) dynamic memory allocation using malloc&free

This is simple mathematical fact.

Recursive algorithms lend themselves to reasoning about static questions, independent of the concepts of time and flow of program execution. "I know how to solve the simplest version of this problem. I know how to partially solve a complex problem, and reduce it (along some dimension) towards the simplest version of the problem. Therefore I've proved that the solution to the complex problem is correct".

This is called proof by induction, and was taught in High School when I was there. I don't know if it still is.

Once you've proven the algorithm is correct THEN you can think about what resources it will need. If the work you do to reduce the problem to a simpler one is always *before* the recursive call then the extra resources are zero ... you (or the compiler) can turn it into a loop. Doing it yourself is pretty error prone though -- especially if you're programming in assembly language.

hamster_nz · « **Reply #167 on:** September 10, 2017, 10:45:44 pm »

Quote from: brucehoult on September 10, 2017, 09:20:57 pm

Recursive algorithms lend themselves to reasoning about static questions, independent of the concepts of time and flow of program execution. "I know how to solve the simplest version of this problem. I know how to partially solve a complex problem, and reduce it (along some dimension) towards the simplest version of the problem. Therefore I've proved that the solution to the complex problem is correct".

This is called proof by induction, and was taught in High School when I was there. I don't know if it still is.

Once you've proven the algorithm is correct THEN you can think about what resources it will need. If the work you do to reduce the problem to a simpler one is always *before* the recursive call then the extra resources are zero ... you (or the compiler) can turn it into a loop. Doing it yourself is pretty error prone though -- especially if you're programming in assembly language.

I think that this might be a little bit generous with the truth, and self-selection of the problem being solved is at work.

I think ("it is my unresearched opinion") that problems that tend to naturally fit recursive solutions in general tend to be simpler to prove correct, as they only have a limited amount of active state at any one time and all decisions are made locally.

For example, is a recursive bubble sort any easier to prove correct than the normal iterative version?

Code: [Select]

void bubble(struct *data, int n) {
   /* Sort the biggest value to the end of the list */
   int i;
   for(i = 1; i < n; i++)
     compare_and_swap(data+i-1, data+i);
   if(n > 2)
      bubble(data,n-1);
}

I am sure (if I coded this correctly) that it works just as well as a iterative solution - apart from blowing up the stack at scale, but only if you don't have the correct compiler optimizations switched on.

John Coloccia · « **Reply #168 on:** September 11, 2017, 01:29:21 am »

Quote from: Bruce Abbott on September 10, 2017, 08:25:20 pm

Quote from: John Coloccia on September 10, 2017, 11:38:17 am
You've said this a couple of times now, I couldn't disagree more. The main goal is to write bug-free, maintainable code that meets the requirement.
I agree. The real reason to avoid recursion is not that it might be less efficient, but because it can be non-obvious and easily introduce latent bugs.

Quote
"Most efficient" is a terrible goal and leads to the kind of unmaintainable garbage that I have to fix day in and day out.
People have different ideas on what constitutes 'efficient' code. Some (wrongly) think that 'cleaner' or 'more elegant' source code will always produce more efficient machine code. Others go to extreme lengths to avoid constructs that the compiler will probably optimize away anyway. The worst are people who condense it into terse one-liners, then don't add any comments because their code is 'self-documenting'.

There are cases where efficiency is more important than readability, but that still doesn't make 'unmaintainable garbage' acceptable. Tricky code still needs to be well written and documented, perhaps even more so than standard stuff that everyone recognizes. I write mainly in assembler (which most people probably think is 'unreadable') but I structure it well, avoid tricky stuff wherever possible, and comment everything. Usually I will start by writing the most 'obvious' code, then once it is all working I go back and optimize any bits that are worth it. Code running fast enough? Haven't run out of memory? No need to optimize.

You must really hate Python. LOL.

My experience has generally been that if you write simple, straightforward code, you're giving the compiler writer the absolute best chance of figuring out just what the heck you had in mind, and the best chance at optimizing.

And anyhow, a great deal of code is I/O bound anyway. People spend too much time worrying about an instruction here and there, when what they should be worrying about is sensible data structures and organizing the algorithm so that pipelines and caching can work efficiently.

I don't mind assembly just so long as the obvious bits are written obviously, and the tricky bits are explained. For comments, I like to write psudeo-C code where I can. I definitely helps when you can see what you're trying to do right next to the assembly.

Bruce Abbott · « **Reply #169 on:** September 11, 2017, 06:14:13 am »

The Power of 10: Rules for Developing Safety-Critical Code

1. Avoid complex flow constructs, such as goto and recursion.

The NASA study of the Toyota Electronic throttle control firmware found at least 243 violations of these rules.

Quote

Toyota used a tool called gstack from Green Hills Software to determine maximum stack size requirements. The tool gstack is a static analyzer that computes an upper bound on the stack size required by a given executable. For the ETCS-i, it reported the maximum possible stack size as 1688 bytes. This result comes with a caveat, however: gstack cannot account for recursion, as stated in its user manual:

gstack cannot work if there are potential direct or indirect recursive calls in the program because it cannot predict how many times the recursion can occur. gstack prints a warning message if it detects a possible recursion in the call graph.

Faced with this limitation, Toyota added an extra margin of safety to the predicted bound by allocating 4096 bytes for the ETCS-i stack—more than double the original prediction. As to why recursion was present in the ETCS-i software, Toyota reported that it was a deliberate design choice in order to simplify the quality assurance process and reduce the total size of the executable...

The flow reveals that the function is actually doubly recursive: It flows from gehdlp_req_1ms to function2, recurses back to function2, and then recurses again back to gehdlp_req_1ms. Toyota engineers confirmed this software recursion. It is not clear what impact recursion has with respect to the larger UA problem. Whether one recursive loop or two, there are other sites of recursion in the ETCS-i that remain unanalyzed.

westfw · « **Reply #170 on:** September 11, 2017, 06:52:52 am »

Quote

The point of being good programmer is to write good optimized code which every compiler should transfer into efficient machine code

I can agree with this (at least, as much as I've quoted of it), and I'm especially amused that we now have people proposing the use of 64bit integers because it leads to faster code using a particular compiler on x86... :-)

sasa · « **Reply #171 on:** September 11, 2017, 07:35:25 am »

Quote from: Bruce Abbott on September 11, 2017, 06:14:13 am

The Power of 10: Rules for Developing Safety-Critical Code

1. Avoid complex flow constructs, such as goto and recursion.

The NASA study of the Toyota Electronic throttle control firmware found at least 243 violations of these rules.

Good point Bruce. But the fact is, that here seems no body works for NASA or at least on sensitive and safety-critical software, I have pointed already in this thread.

sasa · « **Reply #172 on:** September 11, 2017, 10:15:18 am »

Quote from: westfw on September 11, 2017, 06:52:52 am

Quote
The point of being good programmer is to write good optimized code which every compiler should transfer into efficient machine code
I can agree with this (at least, as much as I've quoted of it),

You have previously said that you disagree almost 100%

I find not very useful to pull up words and sentences out of context and then comment with opposite opinion. Your, nor anybody reactions "I agree/disagree" will not change a bit in my professional and principle attitudes.

Quote

and I'm especially amused that we now have people proposing the use of 64bit integers because it leads to faster code using a particular compiler on x86... :-)

You imply that I have proposed exactly that? Not likely...
Seems that you have meant on somebody else, but why you comment in the same sentence and my quote?

hamster_nz · « **Reply #173 on:** September 11, 2017, 10:46:02 am »

Quote from: sasa on September 11, 2017, 10:15:18 am

Quote from: westfw on September 11, 2017, 06:52:52 am
and I'm especially amused that we now have people proposing the use of 64bit integers because it leads to faster code using a particular compiler on x86... :-)

You imply that I have proposed exactly that? Not likely...
Seems that you have meant on somebody else, but why you comment in the same sentence and my quote?

Don't worry Sasa, that jibe was aimed at me, for daring to play around with functionally equivalent code, just out of idle curiosity.

John Coloccia · « **Reply #174 on:** September 11, 2017, 11:03:18 am »

Quote from: sasa on September 11, 2017, 07:35:25 am

Quote from: Bruce Abbott on September 11, 2017, 06:14:13 am
The Power of 10: Rules for Developing Safety-Critical Code

1. Avoid complex flow constructs, such as goto and recursion.

The NASA study of the Toyota Electronic throttle control firmware found at least 243 violations of these rules.

Good point Bruce. But the fact is, that here seems no body works for NASA or at least on sensitive and safety-critical software, I have pointed already in this thread.

Actually, I work on safety critical software. You seem to make a lot of assumptions about people, or maybe you're intentionally trying to be condescending. Either way, even with safety critical software, you have to write utilities and things that have nothing to do with the software and have nothing to do with safety critical code. For example, I just wrote a utility that grabs the change log for a release out of Mercurial. It's recursive and that's the correct solution.


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

Poll

How often have you come across recursion?

Author Topic: Recursion - do you use it in the real world? (Read 42209 times)

Share me