Author Topic: dsPic erasing array faster  (Read 1883 times)


Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
dsPic erasing array faster
« on: April 01, 2018, 06:07:17 pm »
Hi Guys :)

I have been clearing a large array of bytes, byte at a time, in 512 separate writes to RAM, like so:

Code: [Select]
unsigned char bigarray[512];
int counter = 0;

while (counter < 512) {
bigarray[counter] = 0;
counter++;
}

It has occurred to me that since the dsPic is a 16 bit processor, had I declared the same RAM array as unsigned integer types,
a similar loop could have deleted 16 bits at a time, resulting in a quicker pass, only needing 256 writes.

Code: [Select]
unsigned int bigarray[256];
int counter = 0;

while (counter < 256) {
bigarray[counter] = 0;
counter++;
}

Is there a way to alias this array so that such an operation can be performed, while the array is still defined as unsigned char?
Cheers, Brek.

« Last Edit: April 01, 2018, 06:08:57 pm by @rt »
 

Offline Geoff_S

  • Regular Contributor
  • *
  • Posts: 54
  • Country: au
Re: dsPic erasing array faster
« Reply #1 on: April 01, 2018, 06:09:39 pm »
You could consider a C union.

But do you really need to ?  Is this erase loop critical in your overall application ?
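For what it's worth, a minimal sketch of the union idea, assuming the buffer can be redeclared. The names `framebuf_t`, `fb` and `fb_clear` are made up for illustration and are not from the original code. The `uint16_t` member forces word alignment, so the 16-bit stores are legal on the dsPIC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. Both members share the same 512 bytes of RAM. */
typedef union {
    uint8_t  bytes[512];   /* byte view used by the drawing code */
    uint16_t words[256];   /* word view used for fast clearing   */
} framebuf_t;

framebuf_t fb;

void fb_clear(void)
{
    unsigned int i;

    assert(sizeof fb.bytes == sizeof fb.words);
    for (i = 0; i < sizeof fb.words / sizeof fb.words[0]; i++) {
        fb.words[i] = 0;   /* one 16-bit store clears two bytes */
    }
}
```

Existing code would then use `fb.bytes[n]` wherever it used `bigarray[n]`.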
 

Offline blueskull

  • Supporter
  • ****
  • Posts: 9785
  • Country: cn
  • Power Electronics Guy
Re: dsPic erasing array faster
« Reply #2 on: April 01, 2018, 06:11:18 pm »
uint8_t bigarray[512];
uint16_t *bigarray_alias=(uint16_t*)bigarray;
int i;
for(i=0;i<256;i++)
    bigarray_alias[i]=0;
SIGSEGV is inevitable if you try to talk more than you know. If I say gibberish, keep in mind that my license plate is SIGSEGV.
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #3 on: April 01, 2018, 06:12:27 pm »
absolutely!
It’s a bitwise monochrome frame buffer for graphics, and is erased every frame.
If the time for erasing was halved, the frame rate increase might be perceivable.

Similarly, there are times the frame buffer is copied back & forth from an image buffer,
where a new way to handle clearing the array would probably also translate to an improvement there as well.
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #4 on: April 01, 2018, 06:14:19 pm »
Wow thanks :)
I think I’ll see a result if that works the way I think it does.

 

Offline hans

  • Frequent Contributor
  • **
  • Posts: 954
  • Country: nl
Re: dsPic erasing array faster
« Reply #5 on: April 01, 2018, 06:27:48 pm »
Have you tried?
Code: [Select]
memset(bigarray, 0, sizeof(bigarray));
I think the stdlibs should be pretty optimized (in speed) to handle these operations, as they are used literally everywhere.

Another way, if you want to keep doing it manually, is unroll your loop e.g. 8 times manually:
Code: [Select]
#include <stdint.h> // for uint16_t, otherwise adjust type..

unsigned char bigarray[512];
int counter = 0;
uint16_t* bigarray16 = (uint16_t*) bigarray;
while (counter < 32) {
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   counter++;
}
This way we only got the overhead of increasing counter, comparison and branching 1 out of 8 times.
You could also modify the counter to be down counting, as comparison with 0 is faster than comparison with a constant, but after unrolling that is just nitpicking.

If you're also dealing with slow copying times, I would definitely recommend memcpy, as some implementations use a peculiar C-style switch-case fallthrough: https://en.wikipedia.org/wiki/Duff%27s_device
Not sure if that is implemented in XC16 stdlib, though. It is also only effective if you're dealing with odd-sized number of bytes, e.g. if you want to unroll the loop 8 times, but then need 4.25 loop iterations.
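For reference, a rough sketch of Duff's device applied to a word clear. The function name `clear_words_duff` is hypothetical, and this is not claimed to be what any particular memcpy/memset does. The switch jumps into the middle of the unrolled loop to handle the remainder, then the loop runs full groups of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clears 'count' 16-bit words, unrolled 8 times with a Duff's device entry. */
void clear_words_duff(uint16_t *dst, size_t count)
{
    size_t passes;

    assert(dst != NULL);
    if (count == 0) {
        return;
    }
    passes = (count + 7) / 8;      /* trips through the do-while */
    switch (count % 8) {           /* enter mid-loop for the remainder */
    case 0: do { *dst++ = 0;
    case 7:      *dst++ = 0;
    case 6:      *dst++ = 0;
    case 5:      *dst++ = 0;
    case 4:      *dst++ = 0;
    case 3:      *dst++ = 0;
    case 2:      *dst++ = 0;
    case 1:      *dst++ = 0;
            } while (--passes > 0);
    }
}
```

With a count that is already a multiple of 8 (256 words here) it behaves the same as the plain unrolled loop.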
« Last Edit: April 01, 2018, 06:47:01 pm by hans »
 

Online andersm

  • Frequent Contributor
  • **
  • Posts: 917
  • Country: fi
Re: dsPic erasing array faster
« Reply #6 on: April 01, 2018, 06:44:38 pm »
uint8_t bigarray[512];
uint16_t *bigarray_alias=(uint16_t*)bigarray;
Do it the other way around. Otherwise there is no guarantee that bigarray is correctly aligned.

EDIT: If the chip has a suitable peripheral, you can also try using DMA. IIRC the 16-bit PICs also have some zero-overhead loop instructions that the C compiler probably won't generate on its own (but might be used in memset).
« Last Edit: April 01, 2018, 06:54:11 pm by andersm »
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #7 on: April 01, 2018, 06:51:03 pm »
Hard to tell, but will be easier to measure. It is working correctly though,
and I see no reason to do something slower.

The real delay is in the function that sends to the LCD hardware itself.
The only function with dead delays between toggling pins.
Maybe the dead delays could be replaced with code to erase the part of the frame that was just written.

@andersm,
Do you just mean swap the two lines will prevent the actual array being offset by a byte?
« Last Edit: April 01, 2018, 06:52:53 pm by @rt »
 

Online andersm

  • Frequent Contributor
  • **
  • Posts: 917
  • Country: fi
Re: dsPic erasing array faster
« Reply #8 on: April 01, 2018, 06:56:39 pm »
Do you just mean swap the two lines will prevent the actual array being offset by a byte?
No, I mean make the array uint16_t and the alias a uint8_t*. An uint8_t array may be allocated on an odd address, which is invalid for word operations on PIC24/dsPIC.
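A small sketch of that arrangement, with `bigarray_words` and `bigarray_clear` as made-up names. The storage is declared as words, so it is word aligned, and the existing byte code goes through the `uint8_t` pointer.

```c
#include <assert.h>
#include <stdint.h>

uint16_t bigarray_words[256];                          /* real storage, word aligned */
uint8_t *const bigarray = (uint8_t *)bigarray_words;   /* byte view, same name as before */

void bigarray_clear(void)
{
    unsigned int i;

    assert(((uintptr_t)bigarray_words & 1u) == 0);     /* even address expected */
    for (i = 0; i < 256; i++) {
        bigarray_words[i] = 0;
    }
}
```

Byte accesses such as `bigarray[10] = 0xFF;` still compile unchanged, but they now go through a pointer, which may cost some code size.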

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #9 on: April 01, 2018, 07:03:52 pm »
Ok, I understand. I should make the alias the real name of the array then, which everything is already using.

This works also. I haven’t measured anything yet speed-wise.
Code: [Select]
memset(framebuffer, 0, sizeof(framebuffer));

Doesn’t memset copy to the array byte at a time though?
« Last Edit: April 01, 2018, 07:08:58 pm by @rt »
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 2144
  • Country: it
Re: dsPic erasing array faster
« Reply #10 on: April 01, 2018, 07:14:11 pm »
uint8_t bigarray[512];
uint16_t *bigarray_alias=(uint16_t*)bigarray;
Do it the other way around. Otherwise there is no guarantee that bigarray is correctly aligned.

EDIT: If the chip has a suitable peripheral, you can also try using DMA. IIRC the 16-bit PICs also have some zero-overhead loop instructions that the C compiler probably won't generate on its own (but might be used in memset).

precisely.
-using the __aligned attribute one can make the buffer aligned to whatever number of bytes (2,4,8,16,32,..)
-example of asm code for zero-overhead loop
Code: [Select]
asm("MOV #_bigarray,W0");    //Put the base address of your variable in W0
asm("Repeat #255");
asm("CLR [W0++]");    //Clear the content of the 16 bit ram word at address W0. Post-Increment W0 by 2
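A note on the number after Repeat: REPEAT #n executes the next instruction n + 1 times, and each CLR here clears one 16-bit word. A tiny helper (hypothetical name `repeat_count_for_bytes`) shows the arithmetic for other buffer sizes:

```c
#include <assert.h>
#include <stddef.h>

/* Literal for REPEAT when clearing 'bytes' bytes with one 16-bit CLR per pass. */
unsigned int repeat_count_for_bytes(size_t bytes)
{
    assert(bytes >= 2 && (bytes % 2) == 0);   /* whole words only */
    return (unsigned int)(bytes / 2 - 1);
}
```

So 512 bytes gives 255, 1024 bytes gives 511, and 2048 bytes gives 1023.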
« Last Edit: April 01, 2018, 07:17:59 pm by JPortici »
 

Offline hans

  • Frequent Contributor
  • **
  • Posts: 954
  • Country: nl
Re: dsPic erasing array faster
« Reply #11 on: April 01, 2018, 07:17:41 pm »

Doesn’t memset copy to the array byte at a time though?

I've taken a look at the stdlib implementation of Microchip. If you go to the XC16 folder, it's in src/libpic30.zip

I couldn't find memset, but memcpy is doing 1 byte or word copy at a time, depending on pointer alignment. It's written in assembly, so it's not going to be touched by the compiler for unrolling.
That is quite a disappointing implementation, but I guess if it works that's worth something.
 
In that case, you could enjoy a speedup by doing a manual implementation, like the many alternatives that have presented themselves here :)
 

Offline Twoflower

  • Frequent Contributor
  • **
  • Posts: 371
  • Country: de
Re: dsPic erasing array faster
« Reply #12 on: April 01, 2018, 07:18:08 pm »
Have you actually checked whether you really see 512 RAM accesses? Your loop is very simple, so there's a chance the compiler changed it to 16 bit accesses anyway if it is set up to optimize for speed. You should check the disassembled result (or just run speed tests).

Before you try to optimize your code and make it harder to understand for the compiler and yourself, just check what the compiler actually did. And you should run some performance tests to see whether manual fiddling improves the situation or worsens it. The latter might happen if you try to optimize the code and the compiler fails to 'understand' what it does. Also, loop unrolling is probably done if you optimize for speed but not for size (for example gcc -O3 vs. gcc -Os).

Some compilers can create a log file that tells you what they did and where some issues are.

And in general: If the function is run fast enough, don't waste too much time. If the function is called very seldom, don't waste too much time. The exception might be battery powered devices as unnecessary compute/memory accesses waste precious energy.
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 2144
  • Country: it
Re: dsPic erasing array faster
« Reply #13 on: April 01, 2018, 07:22:59 pm »
aah if only the DMA engine could do memory-memory transfers... :D
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #14 on: April 01, 2018, 07:25:23 pm »
I’m up for trying as many approaches as possible.
Right now this:
Code: [Select]
unsigned char framebuffer[1024]__attribute__((aligned (2)));
unsigned char imagebuffer[2048]__attribute__((aligned (2)));
unsigned int *framebufferwide=(unsigned int*)framebuffer;
unsigned int *imagebufferwide=(unsigned int*)imagebuffer;
If I do it back to front, a fair bit more program memory is consumed.
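A quick way to confirm that the aligned attribute took effect is to test the low address bit. This is only a sketch; `is_word_aligned` is a made-up helper, and the result could be sent out over the serial debug port.

```c
#include <stdint.h>

unsigned char framebuffer[1024] __attribute__((aligned(2)));

/* Returns nonzero when p sits on an even (16-bit word) address. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 1u) == 0;
}
```

`is_word_aligned(framebuffer)` should be nonzero and `is_word_aligned(framebuffer + 1)` zero.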

 

Online C

  • Super Contributor
  • ***
  • Posts: 1345
  • Country: us
Re: dsPic erasing array faster
« Reply #15 on: April 01, 2018, 11:28:39 pm »

Change the direction of your loop.

Start at Max size and go to 0 on index.
A count-down loop is often much faster.

First think in binary rather than like a human, then write it at as high a level as you can and see what you get from the compiler.
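A minimal sketch of a count-down clear, with `clear_down` as a made-up name. The loop test is a compare against zero, which is often cheaper than a compare against a constant. Whether it actually helps here is an assumption to check in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clears 'words' 16-bit words, walking from the top index down to 0. */
void clear_down(uint16_t *buf, unsigned int words)
{
    assert(buf != NULL);
    while (words != 0) {
        buf[--words] = 0;
    }
}
```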

 

Online NorthGuy

  • Frequent Contributor
  • **
  • Posts: 911
  • Country: ca
Re: dsPic erasing array faster
« Reply #16 on: April 02, 2018, 12:25:24 am »
If you want to force the C compiler to produce some particular assembler code, it's easier to do it directly. The commands that JPortici has suggested will do the cleaning in 258 instruction cycles. Any code without "repeat" will be at least 5-7 times slower (depending on which dsPIC you use). If memset() is not using "repeat" (either as a direct compiler-emitted code or through a function where "repeat" takes a register as an argument), then no amount of C "wizardry" will help - XC16 is not good at this. If you really need it to be fast, no reason to waste your time with C, just use assembler - it's only 3 lines - much more clear and much less work than anything you can do in C.

Then again, why do you need it to be fast? Maybe what C does by default is fast enough for you. Then you don't need to worry at all.

Or perhaps, you can alter your algorithm to avoid cleaning at all - this is often possible.

 

Online JanJansen

  • Frequent Contributor
  • **
  • Posts: 373
  • Country: nl
Re: dsPic erasing array faster
« Reply #17 on: April 02, 2018, 12:31:34 am »
Hi, take a look at my 12864B display code i posted at Microchip : http://www.microchip.com/forums/m966941.aspx
I copy 32 bit zeros in the array RAM, works fastest i found.

extern unsigned char screen[ 64 ][ 16 ];
extern unsigned char lcdtemp;
extern unsigned long*lcdptr;
//------------------------------------------------------------------------------
// clear graphical display backbuffer
//------------------------------------------------------------------------------
#define ClearScreen lcdtemp = 0; \
lcdptr = ( unsigned long * )screen; \
do{ *(lcdptr++) = 0; }while( lcdtemp++ != 255 );
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #18 on: April 02, 2018, 01:44:25 am »
I did try the asm, and it compiled, but crashed at runtime.
Since the array is really called frame buffer, and is really 1024 bytes, it looked like this:

Code: [Select]
asm("MOV #_framebuffer,W0");
asm("Repeat #511");
asm("CLR [W0++]");

Twoflower, I can measure the speed in one way or another. I have a frame counter for example,
but it’s easier for me to have as many working examples first.
I’m not really complaining about speed, but the method I originally asked about, if it were possible,
seemed as though it would be faster for free, so I thought why not ask.

 

Offline SiliconWizard

  • Frequent Contributor
  • **
  • Posts: 560
  • Country: fr
Re: dsPic erasing array faster
« Reply #19 on: April 02, 2018, 01:54:04 am »
Did it crash with a properly aligned array?
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #20 on: April 02, 2018, 02:02:57 am »
If the example in post 14 constitutes a properly aligned array, yes.
There’s nothing unusual about the RCON register, so not really a crash,
but a lot goes wrong. The screen is blank, an LED output that is supposed
to be on, turns off, which is the LCD backlight, so I don’t see anything else.

EDIT,,
Maybe I jumped the gun there, and tried it before adding the suffix to those declarations to align them.
It appears to be working now.
I don’t know anything about this particular assembler, but assuming W0 is half of an accumulator,
why isn’t it destroyed by interrupts?
« Last Edit: April 02, 2018, 02:18:07 am by @rt »
 

Offline SiliconWizard

  • Frequent Contributor
  • **
  • Posts: 560
  • Country: fr
Re: dsPic erasing array faster
« Reply #21 on: April 02, 2018, 02:32:03 am »
You could take a look at the generated assembly after the compilation stage (-S option of GCC) to make sure it's exactly what you wrote.

One thing you could try is add the keyword 'volatile' after the asm keyword. You can see this "trick" at several places in some Microchip headers and explained in the compiler's manual.

Code: [Select]
asm volatile ("MOV #_framebuffer,W0");
asm volatile ("Repeat #511");
asm volatile ("CLR [W0++]");

The 'volatile' keyword guarantees that the C compiler will insert the assembly instruction as is. Otherwise it may rearrange or even discard some inlined asm code if it thinks it can, usually as part of the optimization stage.
 

Offline SiliconWizard

  • Frequent Contributor
  • **
  • Posts: 560
  • Country: fr
Re: dsPic erasing array faster
« Reply #22 on: April 02, 2018, 03:09:36 am »
I don’t know anything about this particular assembler, but assuming W0 is half of an accumulator,
why isn’t it destroyed by interrupts?

The dsPIC line have 16 so-called working registers, W0-W15. They can be used for a lot of purposes, including indirect addressing, such as in this example.
You can take a look at this if interested: ww1.microchip.com/downloads/en/DeviceDoc/70157C.pdf

It's the responsibility of ISRs to preserve context, including registers. C compiled code will use working registers as well.
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #23 on: April 02, 2018, 03:20:06 am »
Ok, I added the volatile words, and it doesn’t compile with any more or less words, and still works the same,
so I’ll leave it that way.. Thanks :)
I’d know if it was a problem because if the frame buffer wasn’t cleared, the screen would simply gradually turn black,
never clearing anything drawn to it, just as occurs if I comment out the function.

I was proficient with asm for 8 bit pics, such that I can still read and modify reasonable code, but do I really want to go there again?

My debugging hasn’t evolved far either. I still serial debug this with a serial routine in code, a Commodore Amiga, and terminal program, and CRT monitor!
To measure this loop I’d output a pulse once per call to the function that updates the display, and use an actual frequency counter to count framerate.
I am aware this is funny for these times, but given my understanding of DSPs and the repeat, I trust this is going to be faster than anything else.
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 2144
  • Country: it
Re: dsPic erasing array faster
« Reply #24 on: April 02, 2018, 03:20:11 am »
did you also check INTCON1?
if the buffer is located at the end of ram it's possible that an address trap was generated after the last CLR instruction..
After the CLR W0 would point to a location outside of the data memory.. maybe you could replace the sequence as
Code: [Select]
asm volatile ("MOV #_framebuffer,W0");
asm volatile ("REPEAT #510");
asm volatile ("CLR [W0++]");
asm volatile ("CLR [W0]");

or maybe what happened is that you altered the value of W0 without saving it before (push W0 -> sequence -> pop W0) or without telling the compiler to reload its value after the operation (Register clobber)
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #25 on: April 02, 2018, 03:26:23 am »
JPortici,
I think now, it only played up before I aligned the arrays when defining them.
As I said I don’t know much about asm for this, but there’s no other assembler in the program, so it’s not me ever writing to W0 myself.
The program does have two interrupt routines that appear to be working ok with the code now. One for hardware serial interrupt,
and the other using timer 1 to keep real time in the absence of GPS.

 

Online NorthGuy

  • Frequent Contributor
  • **
  • Posts: 911
  • Country: ca
Re: dsPic erasing array faster
« Reply #26 on: April 02, 2018, 03:31:35 am »
why isn’t it destroyed by interrupts?

All the registers are supposed to be preserved by the ISR.

There is a RCOUNT register which holds the counter for "repeat". If you're in the middle of "repeat" and an interrupt happens, then, if your ISR also uses "repeat", it'll destroy your RCOUNT. In such case, RCOUNT must be preserved by the ISR too. Whether the C-handled ISR can correctly figure out if it needs to save RCOUNT or not, I don't know. Thus it's a good idea to check, and possibly add preservation of RCOUNT to the ISR code.
 

Offline SiliconWizard

  • Frequent Contributor
  • **
  • Posts: 560
  • Country: fr
Re: dsPic erasing array faster
« Reply #27 on: April 02, 2018, 03:44:21 am »
Quoting the manual:
Quote
The compiler arranges for registers W8-W15 to be preserved across ordinary function
calls. Registers W0-W7 are available as scratch registers. For interrupt functions, the
compiler arranges for all necessary registers to be preserved, namely W0-W15 and
RCOUNT.
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #28 on: April 02, 2018, 03:52:10 am »
Sounds good then thanks :)

I vaguely remember having to do it myself.
It must have been w, and I guess wherever the program counter was at, and maybe a bank or page bit. That’s about how clear it is to me now.
I’m really hoping not to have to get into assembler again. Maybe unless there’s another big simple freebie like this,
but it really took me some 18 months to think of this (the question I asked in the OP in the first place).


 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 4451
  • Country: us
Re: dsPic erasing array faster
« Reply #29 on: April 02, 2018, 08:45:20 am »
The fact that dsPic includes dedicated hardware registers for looping makes it a very powerful chip.  I suppose you need to see if RCOUNT is available (or save the contents) but you simply won't find a faster way to clear memory in code.

Rather than shying away from assembler, I think I would try to maximize my use and knowledge of the features.  It's a very powerful chip
 

Offline Fredderic

  • Contributor
  • Posts: 12
  • Country: au
Re: dsPic erasing array faster
« Reply #30 on: April 02, 2018, 12:20:30 pm »
Did anyone check what assembly memset() is producing?  If it's not in the standard library, could it be being special-cased by the compiler to produce optimal code?  (I've seen many C/C++ compilers doing things like that.)

Rolling your own memset() and memcpy() functions were all the rage back in the DOS days when computers were slower than a modern 8-bitter.  Had a really nice one memorised — because I'd used it so often — that rather elegantly handled unaligned byte buffers (and would then sometimes roll out aligned variants as needed by clipping off the entry and exit alignment ramps when it was guaranteed safe)…  a good fast clear or value set, really can make a big difference, especially where graphics is concerned.

… but that was about two decades ago now.  These days, compilers really should be MUCH better, and there should at least be a memset() on offer that'll do it pretty much as fast as it can get (though once or twice it's not actually been in the stdlib, but was off in a compiler-specific library for some stupid reason).
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #31 on: April 02, 2018, 02:25:53 pm »
I did unzip the library mentioned, and also only found memcpy().

This is probably misplaced effort.
Using the original code in the OP dealing with 1024 bytes (it was changed to 512 for the forum for some reason) to clear the frame buffer,
overall, the program draws 353 frames every 10 seconds, and I get one more frame every 10 seconds with the assembler loop.
Thinking about it, drawing text and such things is much more time consuming, but I’ll keep it.
It does also happen to be a program memory optimisation too.


 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 2144
  • Country: it
Re: dsPic erasing array faster
« Reply #32 on: April 02, 2018, 04:03:58 pm »
memset is referenced in <string.h> and not in <stdlib.h> but i can't seem to find the source. i'll write a small program and check the disassembly listing file..

it wasn't in the disassembly file  ::)
so i looked at the generated assembly

Code: [Select]
#include <string.h>
#include <xc.h>

unsigned char framebuffer[1024]__attribute__((aligned (2)));

int main() {
  memset(framebuffer,0,sizeof(framebuffer));
 
  while(1) {
    Nop();
  }
 
  return 0;
}

before calling the functions the working registers are loaded with the call arguments
W0 with the address of framebuffer (0x1000)
W1 with the value, in this case 0
W2 with the size of framebuffer, 0x400

the generated assembly for memset, using -O0. On the left the address, on the right the assembly
Code: [Select]
02E4 MOV W0,W3
02E6 BRA 0x2EC ;This points to three lines lower, 0x2EC is the absolute address
02E8 MOV.B W1, [W3++]
02EA DEC W2,W2
02EC CP0 W2
02EE BRA NZ, 0x2E8 ;This points to three lines UP
02F0 RETURN

with -O1 (@rt, it's included in free mode however, and it already makes TONS of useful optimizations. in this case the code generated with -O0 is especially slow because of the two branches. you have many cycles of penalty on branches on dspic33e because you have to refill the pipeline, which dspic33F and dspic30F don't have, as they are slower cores)
Code: [Select]
02E4  MOV #0x1000, W0
02E6  REPEAT #0x1FF
02E8  CLR [W0++]

:)

this was also visible in the disassembly listing file

BONUS: memset to another value
Code: [Select]
;memset(framebuffer,1,sizeof(framebuffer));
02E4  MOV #0x0101, W2
02E6  MOV #0x1000, W0
02E8  REPEAT #0x1FF
02EA  MOV W2, [W0++]
instead of clearing, the repeated instruction is a copy from one register to the memory location pointed to by W0. W2 is loaded to 0x0101 because (of course) the memory in the dsPIC is word addressable so one 16bit move takes 1 cycle, 8 bit move takes 1 cycle (so better combine them) 32 bit move however takes 2 cycles.

BONUS2: and what happens if the memset target has an odd number of bytes?
yep, what you expect
Code: [Select]
;char framebuffer[1025];
;memset(framebuffer,2,sizeof(framebuffer));
02E4  MOV #0x0202, W2
02E6  MOV #0x1000, W0
02E8  REPEAT #0x1FF
02EA  MOV W2, [W0++]
02EC  MOV.B W2, [W0++]
same code, with a single MOV.B added. don't have to modify W2 because with byte instructions (.B) only the lower byte of the working register is used
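In C terms, the pattern in those listings is roughly the following: replicate the byte into both halves of a word, store whole words, then finish an odd length with one byte store. This is only a sketch of the idea with a made-up name (`fill_wordwise`), not the compiler's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills 'bytes' bytes of word-aligned storage with 'value'. */
void fill_wordwise(uint16_t *storage, uint8_t value, size_t bytes)
{
    /* value 0x01 becomes 0x0101, so one word store sets two bytes */
    uint16_t pattern = (uint16_t)(((uint16_t)value << 8) | value);
    size_t words = bytes / 2;
    size_t i;

    assert(storage != NULL);
    for (i = 0; i < words; i++) {
        storage[i] = pattern;
    }
    if (bytes & 1u) {
        ((uint8_t *)storage)[bytes - 1] = value;   /* odd trailing byte */
    }
}
```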
« Last Edit: April 02, 2018, 04:32:36 pm by JPortici »
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #33 on: April 02, 2018, 04:33:46 pm »
Nice :)
My main.c file is size optimised with pro version.

It might be worth pulling apart sprintf since I use it a lot. Especially where format specifiers are used, but I guess it’s going to be efficient.
It’s usually program memory space that I’m more concerned about.
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 2144
  • Country: it
Re: dsPic erasing array faster
« Reply #34 on: April 02, 2018, 04:40:04 pm »
i don't use s/printf so i can't comment on that, but if it's blocking code, no matter how optimized it is, it will stall the program until the hardware has finished the transfer.
 

Offline Twoflower

  • Frequent Contributor
  • **
  • Posts: 371
  • Country: de
Re: dsPic erasing array faster
« Reply #35 on: April 02, 2018, 06:02:35 pm »
You mentioned that your compiler is set to size optimization. Give the best speed optimization a try. Depending on the code and compiler the size difference isn't very large, but the code is much faster.
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #36 on: April 02, 2018, 06:13:07 pm »
I started out that way, but the chip is full, and it will no longer compile for speed.
There’s no pin for pin replacement. It would be nice if there were a dsPic33FJ256GP802.

JPortici, Where did you get the generated assembler? I assume you put NOPs in there to help find it?

Some of this is becoming worthwhile for the program memory if nothing else.
Code: [Select]
asm volatile ("MOV #_framebuffer,W0"); // clear framebuffer
asm volatile ("Repeat #511");
asm volatile ("CLR [W0++]");
asm volatile ("MOV #_imagebuffer,W0"); // clear image buffers
asm volatile ("Repeat #1023");
asm volatile ("CLR [W0++]");
 
/*
clrfb = 0;
while (clrfb < 512) { // clear entire frame buffer
framebuffer[clrfb] = 0; // clear framebuffer
imagebufferwide[clrfb] = 0; // clear first image
imagebufferwide[clrfb+512] = 0; // clear second image
clrfb++;
} // clrfb
*/
« Last Edit: April 02, 2018, 06:31:55 pm by @rt »
 

Offline hans

  • Frequent Contributor
  • **
  • Posts: 954
  • Country: nl
Re: dsPic erasing array faster
« Reply #37 on: April 02, 2018, 07:02:39 pm »
You can try to pick O3 by file in MPLAB, or in GCC (which XC16 is) even by function:

Code: [Select]
int foo(void) __attribute__((optimize("-O3")));

int foo(void) {
   // do stuff
}

However, this level of handpicked optimization is not all-reaching. Sometimes the compiler refuses to go all out on the optimization.
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #38 on: April 02, 2018, 07:11:24 pm »
I have already used different optimisation levels by file. I have left Microchip’s SD Card (MDDFS) alone,
but it’s news to me that you can use it for a given function. Thanks :)

I suppose I can tell by the change in the memory gauge after compiling whether something like that worked.
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 2144
  • Country: it
Re: dsPic erasing array faster
« Reply #39 on: April 03, 2018, 06:10:52 am »
I started out that way, but the chip is full, and it will no longer compile for speed.
There’s no pin for pin replacement. It would be nice if there were a dsPic33FJ256GP802.

JPortici, Where did you get the generated assembler? I assume you put NOPs in there to help find it


nope, i compiled the project then looked at the program memory view.
with -O0 there was a call to a memset label.. which wasn't present anywhere in the code  ::)
so i decoded the call address and looked from there. crazy i know but i'm familiar with dsPIC assembly, and it was a very simple program

i want to remind you that the best code came from using the C libraries.. memset was compiled to the exact same assembly i suggested, which (amazing!!!) used the dsPIC instruction set to its advantage.. so i suggest you use memset to avoid any possible issue. it was even inlined, no calls to external functions!

anyway,
window -> debug -> output -> disassembly listing file for the listing file.
window -> pic memory views -> program memory, if the compiler doesn't show everything in the listing.
at optimizations higher than -O1 the listing will be harder to follow... you have to keep track of the addresses..
« Last Edit: April 03, 2018, 06:16:53 am by JPortici »
 

Offline SiliconWizard

  • Frequent Contributor
  • **
  • Posts: 560
  • Country: fr
Re: dsPic erasing array faster
« Reply #40 on: April 03, 2018, 06:33:32 am »
@JPortici: good catch!

Actually on a lot of platforms, a lot of the C standard library functions (such as memcpy, memset, strcpy, strcmp, etc) get compiled as efficient inline code by GCC once you activate optimizations. Seen that on x86 platforms. But I wasn't necessarily expecting this on a dsPIC  :-+
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 2144
  • Country: it
Re: dsPic erasing array faster
« Reply #41 on: April 03, 2018, 06:34:34 am »
yes, that was a very pleasant surprise :)
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #42 on: April 10, 2018, 08:52:04 am »
Hi again :)

Can I dodge anding a larger variable type in C to access a single bit flag in assembler for dsPic as easy as it would be done for 8 bit RISC?
In the assembler I’m used to, I could test any bit in “configbits” in a single instruction, and conditionally execute the next instruction.

Code: [Select]
if ((configbits & 0b00010000) == 0b00000000) {bit flag was clear} else {bit flag was set}

I imagine this might compile to some ideal assembler, but if not, it might save program memory.
Cheers, Brek.
« Last Edit: April 10, 2018, 08:53:56 am by @rt »
 

Online JPortici

  • Super Contributor
  • ***
  • Posts: 2144
  • Country: it
Re: dsPic erasing array faster
« Reply #43 on: April 19, 2018, 04:08:02 pm »
Hi, don't know how i missed the reply.

did you ever get to check the instruction set?
the ideal assembly outcome would be a BTSS/BTSC instruction (Bit Test, Skip if Set/Clear) but i don't know if or how the compiler detects that you are testing a single bit.

when using bitfields (check xc16 manual on how to write them) and writing on single bits, such as for example LATAbits.LATA0 it tends to use bit instructions so BSET,BCLR,BTG
so i *think* that using bitfields will tell the compiler to use bit instructions
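A hedged sketch of the two ways to write the test in C. The struct and the names (`config_flags_t`, `flag4`) are made up, and bit-field layout is compiler specific, so whether either form ends up as BTSS/BTSC has to be confirmed in the disassembly listing.

```c
/* Hypothetical flag layout: flag4 corresponds to mask 0x10 (bit 4). */
typedef struct {
    unsigned int reserved_low  : 4;
    unsigned int flag4         : 1;
    unsigned int reserved_high : 3;
} config_flags_t;

config_flags_t configflags;

/* Bit-field version: the compiler knows a single bit is being tested. */
int flag4_is_set(void)
{
    return configflags.flag4 != 0;
}

/* Mask version, equivalent to the (configbits & 0b00010000) test. */
int flag4_is_set_mask(unsigned char configbits)
{
    return (configbits & 0x10u) != 0;
}
```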
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #44 on: April 22, 2018, 04:27:25 am »
Hi :)
I’m familiar with anything from 8 bit, and btfss/sc work until I try using it with a bit of any C defined unsigned char.
It could just be that this has to be a 16 bit variable.

The problems for me are probably more to do with using it in a C project than just using asm.
I’ve been looking at examples and Microchip docs for specific things.
I’m sure a jump table would work, but I haven’t found anywhere practical for it in the particular project.

A delay loop can’t be more speed efficient, but based on your example, it can take little memory for a delay:
Code: [Select]
asm volatile ("Repeat #4999"); // number of repeats - 1
asm volatile ("nop"); // dead delay
It costs the same as the C call to a C delay function, to just write small delays.

I have tried a zero overhead DOSTART loop. I think this probably works, but doesn’t like C code inside it.
It would have been to replace something like this:

Code: [Select]
int i = 0;

while (i < 1024) {
imagebuffer[i] = framebuffer[i];
i++; // increment count
}

so after the DOSTART, I would still increment i, and address the arrays with i.
It’s only the actual while loop and braces that would be replaced.
I’m not sure what happens there. It freezes, but doesn’t reset.




 

Online NorthGuy

  • Frequent Contributor
  • **
  • Posts: 911
  • Country: ca
Re: dsPic erasing array faster
« Reply #45 on: April 22, 2018, 05:46:17 am »
I have tried a zero overhead DOSTART loop. I think this probably works, but doesn’t like C code inside it.
It would have been to replace something like this:

Code: [Select]
int i = 0;

while (i < 1024) {
imagebuffer[i] = framebuffer[i];
i++; // increment count
}

You still can do it with the repeat:

Code: [Select]
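asm volatile ("MOV #_framebuffer,W0");  // W0 = source address
asm volatile ("MOV #_imagebuffer,W1");  // W1 = destination address
asm volatile ("REPEAT #1023");          // run the next instruction 1024 times
asm volatile ("MOV [W0++],[W1++]");     // copy one 16-bit word, post-increment both pointers
// Note: this is 1024 word moves (2048 bytes). For 1024-byte buffers use REPEAT #511, or MOV.B.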
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #46 on: April 22, 2018, 04:57:26 pm »
Hi :) I wouldn’t have known to type that, but I think I see how it works.
The first two instructions load the buffer addresses, and the last line copies the contents, then increments both addresses.

It compiles, and saves a couple of words, but doesn’t work however.
The dsPic doesn’t reset, but since this is all copying image data, I just see nothing on the display.
This is a simpler example that should work, but doesn’t work to replace the C code above it.

I’d like to do more, such as xor a value from one buffer with a literal, and copy the result to the other,
but it’s hard to get adventurous when the simple stuff can break it.

Code: [Select]
int clibuf = 0; // clear the first image buffer
while (clibuf < 1024) {imagebuffer[clibuf] = 0; clibuf++;}
/*
asm volatile ("MOV #_imagebuffer,W0");
asm volatile ("Repeat #511");
asm volatile ("CLR [W0++]");
*/
 

Online NorthGuy

  • Frequent Contributor
  • **
  • Posts: 911
  • Country: ca
Re: dsPic erasing array faster
« Reply #47 on: April 23, 2018, 12:14:51 am »
It compiles, and saves a couple of words, but doesn’t work however.
The dsPic doesn’t reset, but since this is all copying image data, I just see nothing on the display.
This is a simpler example that should work, but doesn’t work to replace the C code above it.

It's hard to tell why. If I want assembler, I prefer to write it directly. When you write it from C, there is always a danger that you destroy something the compiler is using. Try saving/restoring W1:

Code: [Select]
push w1

... your code here

pop w1

You can also inspect the disassembly, and then you will see what the compiler does around your code.

I’d like to do more, such as xor a value from one buffer with a literal, and copy the result to the other,
but it’s hard to get adventurous when the simple stuff can break it.

Repeat can do this too.

You can do it on the byte level, for example for 2048 bytes:

Code: [Select]
mov.b #literal,w2
repeat #2047
xor.b w2,[w0++],[w1++]

Or, you can do it on the word level, for example for 1024 words:

Code: [Select]
mov #literal,w2
repeat #1023
xor w2,[w0++],[w1++]
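
For reference, a plain C model of what the two REPEAT/XOR sequences compute, useful for checking results on a PC. The function names are hypothetical, and this is only a model of the data movement, not of the timing.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-level model of: mov.b #literal,w2 / repeat #(n-1) / xor.b w2,[w0++],[w1++] */
static void xor_bytes(uint8_t *dst, const uint8_t *src, size_t n, uint8_t literal)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(src[i] ^ literal);
    }
}

/* Word-level model of: mov #literal,w2 / repeat #(n_words-1) / xor w2,[w0++],[w1++] */
static void xor_words(uint16_t *dst, const uint16_t *src, size_t n_words, uint16_t literal)
{
    for (size_t i = 0; i < n_words; i++) {
        dst[i] = (uint16_t)(src[i] ^ literal);
    }
}
```

To get the same per-byte effect from the word version, the byte literal has to appear in both halves of the word literal, for example 0x5A becomes 0x5A5A.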
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #48 on: April 23, 2018, 03:04:18 am »
Hello again :)
Everything I tried that didn’t work this morning, is now working, in fact everything I’ve tried is now successful.
If any of those small routines did not work in a given location, they do if I replace W0,W1 with W1,W2. So I guess C was still doing something with W0.

The failure seems more likely when the assembler sits inside a C loop that is incrementing a local variable.

A not yet properly tested hypothesis, but true so far:
I can always use W0,W1 if the assembler is the first thing in a void C function.

So I guess I can try the XOR now, thanks :)

A silly thing I did on my own:
Code: [Select]
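// C version: copy the selected image buffer to the frame buffer
// imageselect is a byte offset: 0 for image a, or 1024 for image b.
void copyimgtoframebuf(int imageselect) {
int i = 0;
while (i < 1024) {
framebuffer[i] = imagebuffer[i+imageselect];
i++;
} // i
}



// asm version: copy the selected image buffer to the frame buffer
// imageselect must now be a global, since the asm refers to _imageselect
void copyimgtoframebuf() {
asm volatile ("MOV #_framebuffer,W1");      // W1 = destination
asm volatile ("MOV #_imagebuffer,W0");      // W0 = source, image a
asm volatile ("BTSC _imageselect,#10");     // skip the next line if bit 10 (value 1024) is clear
asm volatile ("MOV #_imagebuffer+1024,W0"); // W0 = source, image b
asm volatile ("REPEAT #511");               // 512 word moves = 1024 bytes
asm volatile ("MOV [W0++],[W1++]");         // copy a word, post-increment both pointers
}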

It might be better to copy the 1024 value out of imageselect, and add it to the image buffer location, instead of the BTSC here.
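
The "add the offset" variant is easy to state in plain C first. This is a sketch under assumptions: the buffer sizes are inferred from the posts (two 1024-byte images back to back, one 1024-byte frame buffer), and `FRAME_BYTES` is a made-up name. Whether the library `memcpy` is as fast as the hand-written REPEAT is something to measure rather than assume.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 1024u

/* Assumed layout, inferred from the posts: two 1024-byte images
   back to back, and one 1024-byte frame buffer. */
static uint8_t imagebuffer[2 * FRAME_BYTES];
static uint8_t framebuffer[FRAME_BYTES];

/* imageselect is a byte offset: 0 for image a, 1024 for image b.
   The offset is simply added to the source address, so no bit test
   is needed and other offsets would work too. */
static void copyimgtoframebuf(unsigned imageselect)
{
    memcpy(framebuffer, imagebuffer + imageselect, FRAME_BYTES);
}
```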






« Last Edit: April 23, 2018, 03:05:54 am by @rt »
 

Online NorthGuy

  • Frequent Contributor
  • **
  • Posts: 911
  • Country: ca
Re: dsPic erasing array faster
« Reply #49 on: April 23, 2018, 03:21:09 am »
Everything I tried that didn’t work this morning, is now working, in fact everything I’ve tried is now successful.
If any of those small routines did not work in a given location, they do if I replace W0,W1 with W1,W2. So I guess C was still doing something with W0.

Saving all used registers (with push) and then restoring them (with pop) should solve all such problems.

If you want to add imageselect to w0:

Code: [Select]
mov _imageselect,w2
add w0,w2,w0

However, if imageselect is passed as a parameter to the function, it may be passed in w0. If you bust w0 then it is no longer valid.
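
An alternative to push/pop is to let the compiler choose the registers. The sketch below uses GCC-style extended asm, which XC16 inherits from GCC. The asm branch is untested and should be treated as an assumption to verify against the XC16 user's guide and the disassembly. `copy_frame` and `FRAME_WORDS` are hypothetical names, and a portable loop is included so the same file builds on a host compiler.

```c
#include <stdint.h>

#define FRAME_WORDS 512u   /* 1024 bytes, assumed from the thread */

/* Copies FRAME_WORDS 16-bit words from src to dst.
   Both pointers are assumed to be word aligned. */
static void copy_frame(uint16_t *dst, const uint16_t *src)
{
#if defined(__XC16__)
    /* Untested sketch: "+r" lets the compiler pick the W registers and
       tells it they are modified; "memory" tells it RAM has changed.
       The literal 511 is FRAME_WORDS - 1 and must be kept in step. */
    __asm__ volatile ("repeat #511\n\t"
                      "mov [%1++],[%0++]"
                      : "+r"(dst), "+r"(src)
                      :
                      : "memory");
#else
    /* Portable fallback so the sketch also builds on a host compiler. */
    for (unsigned i = 0; i < FRAME_WORDS; i++) {
        dst[i] = src[i];
    }
#endif
}
```

Because the compiler knows which registers the asm touches, it no longer matters whether a function parameter happens to live in W0 at that point.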

 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #50 on: April 23, 2018, 03:29:02 am »
It does seem reliable if I use a function with no arguments or return value
Code: [Select]
void invertframe() {
asm volatile ("MOV #_framebuffer,W1");
asm volatile ("MOV #_imagebuffer,W0");
asm volatile ("MOV.B #255,W2");
asm volatile ("REPEAT #1023");
asm volatile ("XOR.B W2,[w0++],[w1++]");
}
That sort of thing has never broken yet. This function can be called from inside a C loop as well.

EDIT,,,
let’s not do that...

Code: [Select]
void invertframe() {
asm volatile ("MOV #_framebuffer,W1");
asm volatile ("MOV #_imagebuffer,W0");
asm volatile ("MOV #0xFFFF,W2");
asm volatile ("REPEAT #511");
asm volatile ("XOR W2,[W0++],[W1++]");
}



« Last Edit: April 23, 2018, 03:43:47 am by @rt »
 

Online NorthGuy

  • Frequent Contributor
  • **
  • Posts: 911
  • Country: ca
Re: dsPic erasing array faster
« Reply #51 on: April 23, 2018, 03:47:52 am »
It does seem reliable if I use a function with no arguments or return value
Code: [Select]
void invertframe() {
asm volatile ("MOV #_framebuffer,W1");
asm volatile ("MOV #_imagebuffer,W0");
asm volatile ("MOV.B #255,W2");
asm volatile ("REPEAT #1023");
asm volatile ("XOR.B W2,[w0++],[w1++]");
}

If you align your buffers on a word boundary, then using words will be almost twice as fast:

Code: [Select]
asm volatile ("MOV #_framebuffer,W1");
asm volatile ("MOV #_imagebuffer,W0");
asm volatile ("MOV #0xffff,W2");
asm volatile ("REPEAT #511");
asm volatile ("XOR W2,[w0++],[w1++]");

There's a special command for inverting, although it doesn't make much difference:

Code: [Select]
asm volatile ("MOV #_framebuffer,W1");
asm volatile ("MOV #_imagebuffer,W0");
asm volatile ("REPEAT #511");
asm volatile ("COM [w0++],[w1++]");
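
One way to get the word alignment without compiler attributes is the union suggested early in the thread. A sketch, with assumed sizes and a hypothetical `frame_t` type: the `words` member forces 16-bit alignment, while the `bytes` member keeps the byte-wise view the drawing code already uses.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_BYTES 1024u

/* Hypothetical buffer type: same memory seen as bytes or as words.
   The uint16_t member guarantees word alignment. */
typedef union {
    uint8_t  bytes[FRAME_BYTES];
    uint16_t words[FRAME_BYTES / 2u];
} frame_t;

static frame_t imagebuffer;
static frame_t framebuffer;

/* C model of REPEAT #511 / COM [w0++],[w1++]: invert word by word. */
static void invertframe(void)
{
    for (size_t i = 0; i < FRAME_BYTES / 2u; i++) {
        framebuffer.words[i] = (uint16_t)~imagebuffer.words[i];
    }
}
```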

 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #52 on: April 23, 2018, 04:00:14 am »
I should have known that, it was comf.
Probably complement file register.
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #53 on: April 24, 2018, 02:34:47 pm »
My first DO loop worked, which is a software bit bang for SPI write. MSB goes out first.
The device select pin has already been pulled low before calling this to send a byte.

I couldn’t remember a way to copy a bit directly, and I don’t like that with BTSS/BTSC
the data pin would be set sooner than it is cleared, so the serial waveform would look odd despite working.
That is why the input byte is rotated where it is (for an extra instruction delay after the data pin is possibly sent low).

I’m wondering if I should AND the portb latch pin with the data bit for even timing?
So it would take the same time to set or clear the data pin before the latch pin is set?

Code: [Select]
void WriteSPIFast(unsigned char data_out) {
SPICLOCKLAT = 0; SPIOUTLAT = data_out >> 7; SPICLOCKLAT = 1;
SPICLOCKLAT = 0; SPIOUTLAT = data_out >> 6; SPICLOCKLAT = 1;
SPICLOCKLAT = 0; SPIOUTLAT = data_out >> 5; SPICLOCKLAT = 1;
SPICLOCKLAT = 0; SPIOUTLAT = data_out >> 4; SPICLOCKLAT = 1;
SPICLOCKLAT = 0; SPIOUTLAT = data_out >> 3; SPICLOCKLAT = 1;
SPICLOCKLAT = 0; SPIOUTLAT = data_out >> 2; SPICLOCKLAT = 1;
SPICLOCKLAT = 0; SPIOUTLAT = data_out >> 1; SPICLOCKLAT = 1;
SPICLOCKLAT = 0; SPIOUTLAT = data_out; SPICLOCKLAT = 1;
SPICLOCKLAT = 0; SPIOUTLAT = 0;
}

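void WriteSPIFaster(unsigned char data_out) {
srbyte = data_out;                         // global shift register byte
asm volatile ("BITBANG: DO #7, ENDBANG");  // hardware loop, 8 passes
asm volatile ("BCLR LATB,#8");             // clock low
asm volatile ("BTSC.b _srbyte,#7");        // if the MSB is set...
asm volatile ("BSET LATB,#7");             // ...data high
asm volatile ("BTSS.b _srbyte,#7");        // if the MSB is clear...
asm volatile ("BCLR LATB,#7");             // ...data low
asm volatile ("RLNC.b _srbyte");           // rotate the next bit into the MSB
asm volatile ("ENDBANG: BSET LATB,#8");    // clock high, last instruction of the loop
asm volatile ("BCLR LATB,#8");             // leave the clock low
asm volatile ("BCLR LATB,#7");             // leave the data pin low
}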
 

Online NorthGuy

  • Frequent Contributor
  • **
  • Posts: 911
  • Country: ca
Re: dsPic erasing array faster
« Reply #54 on: April 25, 2018, 01:22:43 am »
My first DO loop worked, which is a software bit bang for SPI write. MSB goes out first.
The device select pin has already been pulled low before calling this to send a byte.

I couldn’t remember a way to copy a bit directly, and I don’t like that with BTSS/BTSC
the data pin would be set sooner than it is cleared, so the serial waveform would look odd despite working.
That is why the input byte is rotated where it is (for an extra instruction delay after the data pin is possibly sent low).

I’m wondering if I should AND the portb latch pin with the data bit for even timing?
So it would take the same time to set or clear the data pin before the latch pin is set?

You can do even timing and a little bit faster:

Code: [Select]
// This assumes that LATB.7/8 are cleared at the entry
asm volatile ("LSR.B  _srbyte,WREG");
asm volatile ("XOR.B _srbyte,WREG"); // W0 holds differences between consecutive bits
asm volatile ("BITBANG: DO #7, ENDBANG");
asm volatile ("BTSC w0,#7");
asm volatile ("BTG LATB,#7");
asm volatile ("SL w0,w0");
asm volatile ("BSET LATB,#8");
asm volatile ("ENDBANG: BCLR LATB,#8");
asm volatile ("BCLR LATB,#7");

However, the best way is to use an SPI module.
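
The toggle trick above can be checked on a PC before trusting it on hardware. The sketch below is a hypothetical host-side simulation (the name `simulate_toggle_spi` is made up): it models the data pin as a variable, assumes the pin is low on entry as the asm comment requires, and samples it where the clock would rise.

```c
#include <stdint.h>

/* Simulates the toggle-based loop on a host: returns the byte a
   receiver would see if it sampled the data pin on each rising clock
   edge, MSB first. The data pin is assumed low on entry. */
static uint8_t simulate_toggle_spi(uint8_t srbyte)
{
    uint8_t diff = (uint8_t)(srbyte ^ (srbyte >> 1)); /* LSR.B then XOR.B */
    unsigned pin = 0;                                 /* models LATB bit 7 */
    uint8_t received = 0;

    for (int i = 0; i < 8; i++) {           /* DO #7 runs the body 8 times */
        if (diff & 0x80u) {                 /* BTSC w0,#7 */
            pin ^= 1u;                      /* BTG LATB,#7 */
        }
        diff = (uint8_t)(diff << 1);        /* SL w0,w0 */
        received = (uint8_t)((received << 1) | pin); /* sample on clock rise */
    }
    return received;
}
```

Each bit of `diff` says whether the next output bit differs from the previous one, so the pin only needs a conditional toggle, and the path through the loop takes the same number of instructions whichever way the pin moves.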
 

Online @rt

  • Frequent Contributor
  • **
  • Posts: 778
Re: dsPic erasing array faster
« Reply #55 on: April 25, 2018, 03:13:53 pm »
Thanks :)
Some instructions there I’m unfamiliar with. It will give me something to chew on.
I am using the hardware SPI for an SD card, and a few different software routines to match variable clock speed of another device.

I didn’t think I’d be as interested, but every time something comes back to me, significant program memory is saved.



 

