Author Topic: dsPic erasing array faster  (Read 8308 times)

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3137
  • Country: ca
Re: dsPic erasing array faster
« Reply #25 on: April 01, 2018, 05:31:35 pm »
Quote
why isn’t it destroyed by interrupts?

All the registers are supposed to be preserved by the ISR.

There is an RCOUNT register which holds the counter for "repeat". If you're in the middle of a "repeat" and an interrupt happens, and your ISR also uses "repeat", it will destroy your RCOUNT. In that case, RCOUNT must be preserved by the ISR too. Whether the C-generated ISR can correctly figure out if it needs to save RCOUNT or not, I don't know. So it's a good idea to check, and possibly add preservation of RCOUNT to the ISR code.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14297
  • Country: fr
Re: dsPic erasing array faster
« Reply #26 on: April 01, 2018, 05:44:21 pm »
Quoting the manual:
Quote
The compiler arranges for registers W8-W15 to be preserved across ordinary function
calls. Registers W0-W7 are available as scratch registers. For interrupt functions, the
compiler arranges for all necessary registers to be preserved, namely W0-W15 and
RCOUNT.
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #27 on: April 01, 2018, 05:52:10 pm »
Sounds good then thanks :)

I vaguely remember having to do it myself.
It must have been w, and I guess wherever the program counter was at, and maybe a bank or page bit. That’s about how clear it is to me now.
I’m really hoping not to have to get into assembler again. Maybe unless there’s another big simple freebie like this,
but it really took me some 18 months to think of this (the question I asked in the OP in the first place).


 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 9886
  • Country: us
Re: dsPic erasing array faster
« Reply #28 on: April 01, 2018, 10:45:20 pm »
The fact that dsPic includes dedicated hardware registers for looping makes it a very powerful chip.  I suppose you need to see if RCOUNT is available (or save the contents) but you simply won't find a faster way to clear memory in code.

Rather than shying away from assembler, I think I would try to maximize my use and knowledge of the features.  It's a very powerful chip.
 

Offline Fredderic

  • Regular Contributor
  • *
  • Posts: 67
  • Country: au
Re: dsPic erasing array faster
« Reply #29 on: April 02, 2018, 02:20:30 am »
Did anyone check what assembly memset() is producing?  If it's not in the standard library, could it be being special-cased by the compiler to produce optimal code?  (I've seen many C/C++ compilers doing things like that.)

Rolling your own memset() and memcpy() functions was all the rage back in the DOS days when computers were slower than a modern 8-bitter.  Had a really nice one memorised (because I'd used it so often) that rather elegantly handled unaligned byte buffers (and would then sometimes roll out aligned variants as needed by clipping off the entry and exit alignment ramps when it was guaranteed safe)…  a good fast clear or value set really can make a big difference, especially where graphics is concerned.

… but that was about two decades ago now.  These days, compilers really should be MUCH better, and there should at least be a memset() on offer that'll do it pretty much as fast as it can get (though once or twice it's not actually been in the stdlib, but was off in a compiler-specific library for some stupid reason).
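A rough sketch of the "entry and exit alignment ramps" idea described above, in portable C. The function name `fill_bytes` is made up for illustration and this is not the XC16 library routine; it only shows the shape of the technique (odd leading byte, 16-bit body, odd trailing byte).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at dst with value, using
   16-bit stores for the word-aligned middle section. */
void fill_bytes(void *dst, uint8_t value, size_t n)
{
    assert(dst != NULL);
    uint8_t *p = (uint8_t *)dst;

    /* Entry ramp: one byte store if the start address is odd. */
    if (n > 0 && ((uintptr_t)p & 1u) != 0) {
        *p++ = value;
        n--;
    }

    /* Aligned body: the byte replicated into both halves of a word.
       memcpy of two bytes keeps the store well-defined in standard C;
       an optimising compiler normally turns it into one word store. */
    uint16_t pattern = (uint16_t)(value * 0x0101u);
    while (n >= 2) {
        memcpy(p, &pattern, sizeof pattern);
        p += 2;
        n -= 2;
    }

    /* Exit ramp: one trailing byte if the remaining length was odd. */
    if (n != 0) {
        *p = value;
    }
}
```

When the buffer is known to be word-aligned and of even length, both ramps drop out and only the body loop remains, which is the case the compiler-generated REPEAT code later in the thread covers.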
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #30 on: April 02, 2018, 04:25:53 am »
I did unzip the library mentioned, and also only found memcpy().

This is probably misplaced effort.
Using the original code in the OP dealing with 1024 bytes (it was changed to 512 for the forum for some reason) to clear the frame buffer,
overall, the program draws 353 frames every 10 seconds, and I get one more frame every 10 seconds with the assembler loop.
Thinking about it, drawing text and such things is much more time consuming, but I’ll keep it.
It also happens to be a program memory optimisation.
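To put the "one more frame every 10 seconds" figure in perspective, here is a small worked calculation in C. The helper names are invented for illustration, and the result is only as good as the measurement, which resolves whole frames.

```c
#include <assert.h>

/* Average time per frame in microseconds for a frame count
   measured over a window of window_seconds. */
double frame_time_us(double frames, double window_seconds)
{
    assert(frames > 0.0);
    return window_seconds * 1e6 / frames;
}

/* Time saved per frame when the same window fits more frames. */
double saved_us_per_frame(double frames_before, double frames_after,
                          double window_seconds)
{
    return frame_time_us(frames_before, window_seconds)
         - frame_time_us(frames_after, window_seconds);
}
```

With 353 and 354 frames per 10 s this works out to about 28.33 ms per frame before the change and a saving of roughly 80 µs per frame, or about 0.3%.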


 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 3452
  • Country: it
Re: dsPic erasing array faster
« Reply #31 on: April 02, 2018, 06:03:58 am »
memset is declared in <string.h> and not in <stdlib.h> but i can't seem to find the source. i'll write a small program and check the disassembly listing file..

it wasn't in the disassembly file  ::)
so i looked at the generated assembly

Code: [Select]
#include <string.h>
#include <xc.h>

unsigned char framebuffer[1024]__attribute__((aligned (2)));

int main() {
  memset(framebuffer,0,sizeof(framebuffer));
 
  while(1) {
    Nop();
  }
 
  return 0;
}

before calling the function the working registers are loaded with the call arguments
W0 with the address of framebuffer (0x1000)
W1 with the value, in this case 0
W2 with the size of framebuffer, 0x400

the generated assembly for memset, using -O0. On the left the address, on the right the assembly
Code: [Select]
02E4 MOV W0,W3
02E6 BRA 0x2EC ;This points to three lines lower, 0x2EC is the absolute address
02E8 MOV.B W1, [W3++]
02EA DEC W2,W2
02EC CP0 W2
02EE BRA NZ, 0x2E8 ;This points to three lines UP
02F0 RETURN

with -O1 (@rt, it's included in free mode, and it already makes TONS of useful optimizations. in this case the code generated with -O0 is especially slow because of the two branches. you have a penalty of several cycles on branches on dspic33e because the pipeline has to be refilled, which dspic33F and dspic30F don't have, as they are slower cores)
Code: [Select]
02E4  MOV #0x1000, W0
02E6  REPEAT #0x1FF
02E8  CLR [W0++]

:)

this was also visible in the disassembly listing file

BONUS: memset to another value
Code: [Select]
;memset(framebuffer,1,sizeof(framebuffer));
02E4  MOV #0x0101, W2
02E6  MOV #0x1000, W0
02E8  REPEAT #0x1FF
02EA  MOV W2, [W0++]
instead of clearing, the repeated instruction is a copy from one register to the memory location pointed to by W0. W2 is loaded with 0x0101 because (of course) the data memory in the dsPIC is 16 bits wide, so one 16 bit move takes 1 cycle and one 8 bit move also takes 1 cycle (so better combine them). a 32 bit move however takes 2 cycles.

BONUS2: and what happens if the memset target has an odd number of bytes?
yep, what you expect
Code: [Select]
;char framebuffer[1025];
;memset(framebuffer,2,sizeof(framebuffer));
02E4  MOV #0x0202, W2
02E6  MOV #0x1000, W0
02E8  REPEAT #0x1FF
02EA  MOV W2, [W0++]
02EC  MOV.B W2, [W0++]
same code, with a single MOV.B added. don't have to modify W2 because with byte instructions (.B) only the lower byte of the working register is used
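The constants in the listings above follow a simple pattern, sketched here in portable C. The struct and function names are hypothetical; only the arithmetic (REPEAT #n runs the next instruction n + 1 times, the fill byte is duplicated into both halves of a word, an odd length leaves one trailing byte) comes from the listings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the inlined memset shown above. */
typedef struct {
    uint16_t pattern;         /* fill byte in both halves, e.g. 0x0101 */
    uint16_t repeat_operand;  /* literal for REPEAT: word count minus 1 */
    int      tail_byte;       /* 1 if a final MOV.B is needed */
} fill_plan;

fill_plan plan_fill(uint8_t value, size_t bytes)
{
    assert(bytes >= 2);       /* at least one whole word to store */
    fill_plan plan;
    plan.pattern = (uint16_t)(value * 0x0101u);
    plan.repeat_operand = (uint16_t)(bytes / 2 - 1);
    plan.tail_byte = (int)(bytes & 1u);
    return plan;
}
```

For 1024 bytes this gives a REPEAT operand of 0x1FF and no tail byte; for 1025 bytes the operand is still 0x1FF and one trailing byte store is needed, matching the two listings.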
« Last Edit: April 02, 2018, 06:32:36 am by JPortici »
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #32 on: April 02, 2018, 06:33:46 am »
Nice :)
My main.c file is size optimised with pro version.

It might be worth pulling apart sprintf since I use it a lot. Especially where format specifiers are used, but I guess it’s going to be efficient.
It’s usually program memory space that I’m more concerned about.
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 3452
  • Country: it
Re: dsPic erasing array faster
« Reply #33 on: April 02, 2018, 06:40:04 am »
i don't use s/printf so i can't comment on that, but if it's blocking code, no matter how optimized it is, it will stall the program until the hardware has finished the transfer.
 

Offline Twoflower

  • Frequent Contributor
  • **
  • Posts: 735
  • Country: de
Re: dsPic erasing array faster
« Reply #34 on: April 02, 2018, 08:02:35 am »
You mentioned that your compiler is set to size optimization. Give the best speed optimization a try. Depending on the code and compiler the size difference isn't very large, but the code is much faster.
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #35 on: April 02, 2018, 08:13:07 am »
I started out that way, but the chip is full, and it will no longer compile for speed.
There’s no pin for pin replacement. It would be nice if there were a dsPic33FJ256GP802.

JPortici, Where did you get the generated assembler? I assume you put NOPs in there to help find it?

Some of this is becoming worthwhile for the program memory if nothing else.
Code: [Select]
asm volatile ("MOV #_framebuffer,W0"); // clear framebuffer
asm volatile ("Repeat #511");
asm volatile ("CLR [W0++]");
asm volatile ("MOV #_imagebuffer,W0"); // clear image buffers
asm volatile ("Repeat #1023");
asm volatile ("CLR [W0++]");
 
/*
clrfb = 0;
while (clrfb < 512) { // clear entire frame buffer
framebuffer[clrfb] = 0; // clear framebuffer
imagebufferwide[clrfb] = 0; // clear first image
imagebufferwide[clrfb+512] = 0; // clear second image
clrfb++;
} // clrfb
*/
« Last Edit: April 02, 2018, 08:31:55 am by @rt »
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1626
  • Country: nl
Re: dsPic erasing array faster
« Reply #36 on: April 02, 2018, 09:02:39 am »
You can try to pick O3 by file in MPLAB, or in GCC (which XC16 is) even by function:

Code: [Select]
int foo(void) __attribute__((optimize("-O3")));

int foo(void) {
   // do stuff
}

However, this level of handpicked optimization is not all-reaching. Sometimes the compiler refuses to go all out on the optimization.
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #37 on: April 02, 2018, 09:11:24 am »
I have already used different optimisation levels by file. I have left Microchip’s SD Card (MDDFS) alone,
but it’s news to me that you can use it for a given function. Thanks :)

I suppose I can tell by the change in the memory gauge after compiling whether something like that worked.
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 3452
  • Country: it
Re: dsPic erasing array faster
« Reply #38 on: April 02, 2018, 08:10:52 pm »
Quote
I started out that way, but the chip is full, and it will no longer compile for speed.
There’s no pin for pin replacement. It would be nice if there were a dsPic33FJ256GP802.

JPortici, Where did you get the generated assembler? I assume you put NOPs in there to help find it


nope, i compiled the project then looked at the program memory view.
with -O0 there was a call to a memset label.. which wasn't present anywhere in the code  ::)
so i decoded the call address and looked from there. crazy i know but i'm familiar with dsPIC assembly, and it was a very simple program

i want to remind you that the best code came from using the C libraries.. memset was compiled to the exact same assembly i suggested, which (amazing!!!) used the dsPIC instruction set to its advantage.. so i suggest you use memset to avoid any possible issue. it was even inlined, no calls to external functions!

anyway,
window -> debug -> output -> disassembly listing file for the listing file.
window -> pic memory views -> program memory, if the compiler doesn't show everything in the listing.
at optimizations higher than -O1 the listing will be harder to follow... you have to keep track of the addresses..
« Last Edit: April 02, 2018, 08:16:53 pm by JPortici »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14297
  • Country: fr
Re: dsPic erasing array faster
« Reply #39 on: April 02, 2018, 08:33:32 pm »
@JPortici: good catch!

Actually on a lot of platforms, a lot of the C standard library functions (such as memcpy, memset, strcpy, strcmp, etc) get compiled as efficient inline code by GCC once you activate optimizations. Seen that on x86 platforms. But I wasn't necessarily expecting this on a dsPIC  :-+
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 3452
  • Country: it
Re: dsPic erasing array faster
« Reply #40 on: April 02, 2018, 08:34:34 pm »
yes, that was a very pleasant surprise :)
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #41 on: April 09, 2018, 10:52:04 pm »
Hi again :)

Can I avoid ANDing a larger variable type in C to access a single bit flag, by doing it in assembler on the dsPic as easily as it would be done on an 8 bit RISC?
In the assembler I’m used to, I could test any bit in “configbits” in a single instruction, and conditionally execute the next instruction.

Code: [Select]
if ((configbits & 0b00010000) == 0b00000000) { /* bit flag was clear */ } else { /* bit flag was set */ }

I imagine this might compile to some ideal assembler, but if not, it might save program memory.
Cheers, Brek.
« Last Edit: April 09, 2018, 10:53:56 pm by @rt »
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 3452
  • Country: it
Re: dsPic erasing array faster
« Reply #42 on: April 19, 2018, 06:08:02 am »
Hi, don't know how i missed the reply.

did you ever get to check the instruction set?
the ideal assembly outcome would be a BTSS/BTSC instruction (Bit Test, Skip if Set/Clear) but i don't know if or how the compiler detects that you are testing a single bit.

when using bitfields (check xc16 manual on how to write them) and writing on single bits, such as for example LATAbits.LATA0 it tends to use bit instructions so BSET,BCLR,BTG
so i *think* that using bitfields will tell the compiler to use bit instructions
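For comparison, a minimal sketch of the two ways of testing the flag in C: the mask form from the question and the bit-field form suggested above. The field and function names are made up, and the thread does not confirm that XC16 emits BTSS/BTSC for either form, so the disassembly listing is the place to check.

```c
#include <assert.h>
#include <stdint.h>

/* Mask form: bit 4 is the same bit as 0b00010000 in the question. */
#define CONFIG_FLAG4_MASK 0x10u

int flag4_is_set(uint16_t configbits)
{
    return (configbits & CONFIG_FLAG4_MASK) != 0;
}

/* Bit-field form: the field names are hypothetical. */
struct config_flags {
    unsigned low_bits : 4;   /* bits 0..3, not used in this sketch */
    unsigned flag4    : 1;   /* the single-bit flag being tested */
    unsigned reserved : 11;
};

int flag4_field_is_set(const struct config_flags *cfg)
{
    assert(cfg != 0);
    return cfg->flag4;
}
```

Bit-field layout is implementation-defined in C, so the bit-field form suits flags that only the program itself reads and writes; for a fixed bit position the mask form is the unambiguous one.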
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #43 on: April 21, 2018, 06:27:25 pm »
Hi :)
I’m familiar with anything from 8 bit, and btfss/sc work until I try using it with a bit of any C defined unsigned char.
It could just be that this has to be a 16 bit variable.

The problems for me are probably more to do with using it in a C project than just using asm.
I’ve been looking at examples and Microchip docs for specific things.
I’m sure a jump table would work, but I haven’t found anywhere practical for it in the particular project.

A delay loop can’t be more speed efficient, but based on your example, it can take little memory for a delay:
Code: [Select]
asm volatile ("Repeat #4999"); // number of repeats - 1
asm volatile ("nop"); // dead delay
It costs the same as the C call to a C delay function, to just write small delays.
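A quick way to size such a delay, as a sketch: REPEAT #n executes the following instruction n + 1 times, so REPEAT #4999 runs 5000 NOPs. The function names and the 40 MHz instruction clock in the note below are assumptions for illustration, and the estimate ignores the cycle taken by REPEAT itself and any interrupts.

```c
#include <assert.h>
#include <stdint.h>

/* Number of times the instruction after REPEAT #operand executes. */
uint32_t repeat_iterations(uint16_t operand)
{
    return (uint32_t)operand + 1u;
}

/* Approximate delay in nanoseconds of REPEAT #operand followed by NOP,
   assuming one instruction cycle per NOP at fcy_hz cycles per second. */
uint32_t repeat_nop_delay_ns(uint16_t operand, uint32_t fcy_hz)
{
    assert(fcy_hz > 0);
    uint64_t cycles = repeat_iterations(operand);
    return (uint32_t)(cycles * 1000000000ull / fcy_hz);
}
```

At an assumed 40 MHz instruction clock, the 5000 NOPs above come to about 125 µs.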

I have tried a zero overhead DOSTART loop. I think this probably works, but doesn’t like C code inside it.
It would have been to replace something like this:

Code: [Select]
int i = 0;

while (i < 1024) {
imagebuffer[i] = framebuffer[i];
i++; // increment count
}

so after the DOSTART, I would still increment i, and address the arrays with i.
It’s only the actual while loop and braces that would be replaced.
I’m not sure what happens there. It freezes, but doesn’t reset.




 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3137
  • Country: ca
Re: dsPic erasing array faster
« Reply #44 on: April 21, 2018, 07:46:17 pm »
Quote
I have tried a zero overhead DOSTART loop. I think this probably works, but doesn’t like C code inside it.
It would have been to replace something like this:

Code: [Select]
int i = 0;

while (i < 1024) {
imagebuffer[i] = framebuffer[i];
i++; // increment count
}

You still can do it with the repeat:

Code: [Select]
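asm volatile ("MOV #_framebuffer,W0"); // W0 = source address
asm volatile ("MOV #_imagebuffer,W1"); // W1 = destination address
asm volatile ("REPEAT #1023");         // next instruction runs 1024 times
asm volatile ("MOV [W0++],[W1++]");    // copy one 16-bit word, post-increment both pointers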
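A small host-side model of the size arithmetic, as a sketch with invented function names: a word-sized MOV under REPEAT #n moves (n + 1) × 2 bytes, so the operand has to be chosen from the buffer size in bytes, not from the element count of a byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes moved by REPEAT #operand followed by a word-sized
   MOV [W0++],[W1++]. */
size_t repeat_word_mov_bytes(unsigned operand)
{
    return ((size_t)operand + 1u) * 2u;
}

/* Portable model of the same copy, useful for checking on a host
   that a chosen operand stays inside the destination buffer. */
void model_repeat_word_copy(void *dst, const void *src, unsigned operand)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, repeat_word_mov_bytes(operand));
}
```

REPEAT #1023 therefore moves 2048 bytes. If the two buffers are 1024-byte arrays, as in the memset example earlier in the thread, the matching operand would be 511.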
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #45 on: April 22, 2018, 06:57:26 am »
Hi :) I wouldn’t have known to type that, but think I see how it works.
As I read it, the first two instructions load the two addresses, and the last line copies the contents, then increments both addresses.

It compiles, and saves a couple of words, but doesn’t work however.
The dsPic doesn’t reset, but since this is all copying image data, I just see nothing on the display.
This is a simpler example that should work, but doesn’t work to replace the C code above it.

I’d like to do more, such as xor a value from one buffer with a literal, and copy the result to the other,
but it’s hard to get adventurous when the simple stuff can break it.

Code: [Select]
int clibuf = 0; // clear the first image buffer
while (clibuf < 1024) {imagebuffer[clibuf] = 0; clibuf++;}
/*
asm volatile ("MOV #_imagebuffer,W0");
asm volatile ("Repeat #511");
asm volatile ("CLR [W0++]");
*/
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3137
  • Country: ca
Re: dsPic erasing array faster
« Reply #46 on: April 22, 2018, 02:14:51 pm »
Quote
It compiles, and saves a couple of words, but doesn’t work however.
The dsPic doesn’t reset, but since this is all copying image data, I just see nothing on the display.
This is a simpler example that should work, but doesn’t work to replace the C code above it.

It's hard to tell why. If I want assembler, I prefer to write it directly. When you write from C, it's always a danger that you may destroy something. Try saving/restoring W1:

Code: [Select]
push w1

... your code here

pop w1

Also you can inspect the disassembly and then you will see what the compiler does around your code.

Quote
I’d like to do more, such as xor a value from one buffer with a literal, and copy the result to the other,
but it’s hard to get adventurous when the simple stuff can break it.

Repeat can do this too.

You can do it on the byte level, for example for 2048 bytes:

Code: [Select]
mov.b #literal,w2
repeat #2047
xor.b w2,[w0++],[w1++]

Or, you can do it on the word level, for example for 1024 words:

Code: [Select]
mov #literal,w2
repeat #1023
xor w2,[w0++],[w1++]
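For reference, plain C equivalents of the two loops above, as a sketch with made-up function names. They can serve as a known-good version to compare against when testing the assembler on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise form, mirroring xor.b w2,[w0++],[w1++]. */
void xor_copy_bytes(uint8_t *dst, const uint8_t *src,
                    uint8_t literal, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i] = (uint8_t)(src[i] ^ literal);
    }
}

/* Word-wise form, mirroring xor w2,[w0++],[w1++]. */
void xor_copy_words(uint16_t *dst, const uint16_t *src,
                    uint16_t literal, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i] = (uint16_t)(src[i] ^ literal);
    }
}
```

The word form does the same work in half the iterations, provided the byte literal is repeated in both halves of the 16-bit literal (0xFF becomes 0xFFFF).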
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #47 on: April 22, 2018, 05:04:18 pm »
Hello again :)
Everything I tried that didn’t work this morning, is now working, in fact everything I’ve tried is now successful.
If any of those small routines did not work in a given location, they do if I replace W0,W1 with W1,W2. So I guess C was still doing something with W0.

It seems to be likely if the assembler is going inside a C loop that is incrementing a private variable.

A not yet properly tested hypothesis, but true so far:
I can always use W0,W1 if the assembler is the first thing in a void C function.

So I guess I can try the XOR now, thanks :)

A silly thing I did on my own:
Code: [Select]
// copy the selected image buffer to the frame buffer
// imageselect can be zero for image a, or 1024 (the offset) for image b.
void copyimgtoframebuf(int imageselect) {
int i = 0;
while (i < 1024) {
framebuffer[i] = imagebuffer[i+imageselect];
i++;
} // i
}



// copy the selected image buffer to the frame buffer
void copyimgtoframebuf() {
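asm volatile ("MOV #_framebuffer,W1");      // W1 = destination (frame buffer)
asm volatile ("MOV #_imagebuffer,W0");      // W0 = source (image a)
asm volatile ("BTSC _imageselect,#10");     // skip next instruction if bit 10 (value 1024) is clear
asm volatile ("MOV #_imagebuffer+1024,W0"); // imageselect is 1024: W0 = source (image b)
asm volatile ("REPEAT #511");               // next instruction runs 512 times
asm volatile ("MOV [W0++],[W1++]");         // copy one word, post-increment both pointers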
}

It might be better to copy the 1024 value out of imageselect, and add it to the image buffer location, instead of the BTSC here.






« Last Edit: April 22, 2018, 05:05:54 pm by @rt »
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3137
  • Country: ca
Re: dsPic erasing array faster
« Reply #48 on: April 22, 2018, 05:21:09 pm »
Quote
Everything I tried that didn’t work this morning, is now working, in fact everything I’ve tried is now successful.
If any of those small routines did not work in a given location, they do if I replace W0,W1 with W1,W2. So I guess C was still doing something with W0.

Saving all used registers (with push) and then restoring them (with pop) should solve all such problems.

If you want to add imageselect to w0:

Code: [Select]
mov _imageselect,w2
add w0,w2,w0

However, if imageselect is passed as a parameter to the function, it may be passed in w0. If you bust w0 then it is no longer valid.
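An alternative to pushing and popping by hand, offered as a sketch only: XC16 is GCC-based, so it should accept a single extended asm statement with a clobber list that tells the compiler which registers and memory the block changes. The clobber names "w0" and "w1" and the buffer declarations are assumptions here and should be checked against the XC16 manual. The non-XC16 branch exists only so the sketch can be compiled and checked on a host.

```c
#include <assert.h>
#include <string.h>

/* Assumed declarations; the real project defines its own buffers. */
unsigned char framebuffer[1024] __attribute__((aligned(2)));
unsigned char imagebuffer[1024] __attribute__((aligned(2)));

void copy_image_to_frame(void)
{
    assert(sizeof framebuffer == sizeof imagebuffer);
#ifdef __XC16__
    /* One asm statement, so the compiler cannot place its own code
       between the instructions; the clobber list declares that W0,
       W1 and memory are modified. */
    __asm__ volatile (
        "mov #_imagebuffer,w0\n\t"   /* source address */
        "mov #_framebuffer,w1\n\t"   /* destination address */
        "repeat #511\n\t"            /* 512 word moves = 1024 bytes */
        "mov [w0++],[w1++]"
        : /* no outputs */
        : /* no inputs */
        : "w0", "w1", "memory");
#else
    /* Host fallback: same effect, for off-target checking. */
    memcpy(framebuffer, imagebuffer, sizeof framebuffer);
#endif
}
```

With the clobbers declared, the compiler is expected to save or avoid W0 and W1 around the block itself, which would also cover the case where a function argument arrives in W0.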

 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1051
Re: dsPic erasing array faster
« Reply #49 on: April 22, 2018, 05:29:02 pm »
It does seem reliable if I use a function with no arguments or return value
Code: [Select]
void invertframe() {
asm volatile ("MOV #_framebuffer,W1");
asm volatile ("MOV #_imagebuffer,W0");
asm volatile ("MOV.B #255,W2");
asm volatile ("REPEAT #1023");
asm volatile ("XOR.B W2,[w0++],[w1++]");
}
That sort of thing has never broken yet. This function can be called from inside a C loop as well.

EDIT...
let’s not do that...

Code: [Select]
void invertframe() {
asm volatile ("MOV #_framebuffer,W1");
asm volatile ("MOV #_imagebuffer,W0");
asm volatile ("MOV #0xFFFF,W2");
asm volatile ("REPEAT #511");
asm volatile ("XOR W2,[W0++],[W1++]");
}



« Last Edit: April 22, 2018, 05:43:47 pm by @rt »
 

