Author Topic: dsPic erasing array faster  (Read 8392 times)


Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
dsPic erasing array faster
« on: April 01, 2018, 08:07:17 am »
Hi Guys :)

I have been clearing a large array of bytes, byte at a time, in 512 separate writes to RAM, like so:

Code: [Select]
unsigned char bigarray[512];
int counter = 0;

while (counter < 512) {
    bigarray[counter] = 0;
    counter++;
}

It has occurred to me that since the dsPic is a 16 bit processor, had I declared the same RAM array as unsigned integer types,
a similar loop could have cleared 16 bits at a time, resulting in a quicker pass needing only 256 writes.

Code: [Select]
unsigned int bigarray[256];
int counter = 0;

while (counter < 256) {
    bigarray[counter] = 0;
    counter++;
}

Is there a way to alias this array so that such an operation can be performed, while the array is still defined as unsigned char?
Cheers, Brek.

« Last Edit: April 01, 2018, 08:08:57 am by @rt »
 

Offline Geoff_S

  • Regular Contributor
  • *
  • Posts: 88
  • Country: au
Re: dsPic erasing array faster
« Reply #1 on: April 01, 2018, 08:09:39 am »
You could consider a C union.

But do you really need to ?  Is this erase loop critical in your overall application ?
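
A minimal sketch of the union idea, assuming the buffer can be declared as a union in the first place. The names frame_t, frame and clear_frame are hypothetical and only serve the illustration. Because the union contains a uint16_t member, the compiler has to place it on a word boundary, so the word view is safe to write through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: frame_t, frame and clear_frame are illustrative only. */
typedef union {
    uint8_t  bytes[512];  /* byte view used by the drawing code */
    uint16_t words[256];  /* word view used for clearing        */
} frame_t;

frame_t frame;

/* Clear the buffer with 256 word writes instead of 512 byte writes. */
void clear_frame(void)
{
    size_t i;

    /* Sanity check: both views must cover the same storage. */
    assert(sizeof frame.bytes == sizeof frame.words);

    for (i = 0; i < sizeof frame.words / sizeof frame.words[0]; i++) {
        frame.words[i] = 0;
    }
}
```

Existing code would then index frame.bytes[] where it used bigarray[] before.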
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
Re: dsPic erasing array faster
« Reply #2 on: April 01, 2018, 08:12:27 am »
absolutely!
It’s a bitwise monochrome frame buffer for graphics, and is erased every frame.
If the time for erasing was halved, the frame rate increase might be perceivable.

Similarly, there are times the frame buffer is copied back & forth from an image buffer,
where a new way to handle clearing the array would probably also translate to an improvement there as well.
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
Re: dsPic erasing array faster
« Reply #3 on: April 01, 2018, 08:14:19 am »
Wow thanks :)
I think I’ll see a result if that works the way I think it does.

 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1637
  • Country: nl
Re: dsPic erasing array faster
« Reply #4 on: April 01, 2018, 08:27:48 am »
Have you tried?
Code: [Select]
memset(bigarray, 0, sizeof(bigarray));
I think the stdlibs should be pretty optimized (in speed) to handle these operations, as they are used literally everywhere.
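
As a rough sketch of the library route for both clearing and copying: the array called image below is a hypothetical second buffer, not something from the original code. One thing to watch is that sizeof only gives the array size when applied to the array itself. Inside a function that receives a pointer it gives the pointer size, so the length has to be passed explicitly.

```c
#include <assert.h>
#include <string.h>

unsigned char bigarray[512];
unsigned char image[512];   /* hypothetical source buffer for the copy */

/* sizeof(bigarray) is 512 here because bigarray is a real array. */
void clear_bigarray(void)
{
    memset(bigarray, 0, sizeof(bigarray));
}

/* Copy the whole image into the array in one library call. */
void copy_image_to_bigarray(void)
{
    memcpy(bigarray, image, sizeof(bigarray));
}

/* With only a pointer, sizeof(buf) would be the pointer size,
   so the caller has to supply the length. */
void clear_bytes(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, 0, len);
}
```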

Another way, if you want to keep doing it manually, is unroll your loop e.g. 8 times manually:
Code: [Select]
#include <stdint.h> // for uint16_t, otherwise adjust type..

unsigned char bigarray[512];
int counter = 0;
uint16_t* bigarray16 = (uint16_t*) bigarray;
while (counter < 32) {
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   *bigarray16++ = 0;
   counter++;
}
This way we only pay the overhead of incrementing the counter, comparing and branching once per 8 writes.
You could also make the counter count down, as comparison with 0 is faster than comparison with a constant, but after unrolling that is just nitpicking.

If you're also dealing with slow copying times, I would definitely recommend memcpy, as some implementations use a peculiar C-style switch-case fallthrough: https://en.wikipedia.org/wiki/Duff%27s_device
Not sure if that is implemented in XC16 stdlib, though. It is also only effective if you're dealing with odd-sized number of bytes, e.g. if you want to unroll the loop 8 times, but then need 4.25 loop iterations.
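
For reference, a sketch of what such a Duff's-device style copy can look like for 16-bit words. This is only an illustration of the technique, not a claim about how any particular library implements memcpy, and the function name copy_words is made up. The switch jumps into the middle of the unrolled loop to handle the leftover words first, then the remaining passes copy 8 words each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy 'count' 16-bit words, unrolled 8 times (Duff's device). */
void copy_words(uint16_t *dst, const uint16_t *src, size_t count)
{
    size_t passes;

    assert(dst != NULL && src != NULL);
    if (count == 0) {
        return;
    }

    passes = (count + 7) / 8;   /* number of trips through the do-while */
    switch (count % 8) {        /* first trip handles the remainder     */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

With count = 34 (that is 4.25 passes of 8), the first trip copies 2 words and the following four trips copy 8 words each.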
« Last Edit: April 01, 2018, 08:47:01 am by hans »
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: dsPic erasing array faster
« Reply #5 on: April 01, 2018, 08:44:38 am »
uint8_t bigarray[512];
uint16_t *bigarray_alias=(uint16_t*)bigarray;
Do it the other way around. Otherwise there is no guarantee that bigarray is correctly aligned.

EDIT: If the chip has a suitable peripheral, you can also try using DMA. IIRC the 16-bit PICs also have some zero-overhead loop instructions that the C compiler probably won't generate on its own (but might be used in memset).
« Last Edit: April 01, 2018, 08:54:11 am by andersm »
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
Re: dsPic erasing array faster
« Reply #6 on: April 01, 2018, 08:51:03 am »
Hard to tell, but will be easier to measure. It is working correctly though,
and I see no reason to do something slower.

The real delay is in the function that sends to the LCD hardware itself.
The only function with dead delays between toggling pins.
Maybe the dead delays could be replaced with code to erase the part of the frame that was just written.

@andersm,
Do you just mean swap the two lines will prevent the actual array being offset by a byte?
« Last Edit: April 01, 2018, 08:52:53 am by @rt »
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: dsPic erasing array faster
« Reply #7 on: April 01, 2018, 08:56:39 am »
Do you just mean swap the two lines will prevent the actual array being offset by a byte?
No, I mean make the array uint16_t and the alias a uint8_t*. An uint8_t array may be allocated on an odd address, which is invalid for word operations on PIC24/dsPIC.
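
A small sketch of that arrangement, assuming the rest of the program only needs a byte pointer called bigarray. The name bigarray_words is hypothetical. The storage is declared as words, so it is word aligned by construction, and reading or writing it through a uint8_t pointer is always allowed in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIGARRAY_BYTES 512u

/* The real storage is 16-bit words, so it is always word aligned. */
uint16_t bigarray_words[BIGARRAY_BYTES / 2];

/* Byte alias with the name the rest of the program already uses. */
uint8_t *const bigarray = (uint8_t *)bigarray_words;

/* Clear with one word write per two bytes. */
void clear_bigarray(void)
{
    size_t i;

    /* Documents the assumption; cannot fail for word storage. */
    assert(((uintptr_t)bigarray_words & 1u) == 0);

    for (i = 0; i < BIGARRAY_BYTES / 2; i++) {
        bigarray_words[i] = 0;
    }
}
```

One difference to keep in mind: bigarray is now a pointer, so sizeof(bigarray) no longer gives 512 and BIGARRAY_BYTES should be used instead.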

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
Re: dsPic erasing array faster
« Reply #8 on: April 01, 2018, 09:03:52 am »
Ok, I understand. I should make the alias the real name of the array then, which everything is already using.

This works also. I haven’t measured anything yet speed-wise.
Code: [Select]
memset(framebuffer, 0, sizeof(framebuffer));

Doesn’t memset copy to the array byte at a time though?
« Last Edit: April 01, 2018, 09:08:58 am by @rt »
 

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
Re: dsPic erasing array faster
« Reply #9 on: April 01, 2018, 09:14:11 am »
uint8_t bigarray[512];
uint16_t *bigarray_alias=(uint16_t*)bigarray;
Do it the other way around. Otherwise there is no guarantee that bigarray is correctly aligned.

EDIT: If the chip has a suitable peripheral, you can also try using DMA. IIRC the 16-bit PICs also have some zero-overhead loop instructions that the C compiler probably won't generate on its own (but might be used in memset).

precisely.
-using the aligned attribute, __attribute__((aligned(n))), one can align the buffer to whatever number of bytes (2,4,8,16,32,..)
-example of asm code for zero-overhead loop
Code: [Select]
asm("MOV #_bigarray,W0");    //Put the base address of your variable in W0
asm("Repeat #255");
asm("CLR [W0++]");    //Clear the content of the 16 bit ram word at address W0. Post-Increment W0 by 2
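
To make the #255 less magic: REPEAT #n executes the following instruction n + 1 times, and each CLR [W0++] clears one 16-bit word, so a 512-byte buffer needs 256 repetitions, i.e. an operand of 255. A tiny helper (hypothetical name, plain C, only for working out the number on paper or on a PC) could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* REPEAT #n runs the next instruction n + 1 times. Clearing one
   16-bit word per repetition therefore needs n = bytes / 2 - 1. */
unsigned repeat_operand_for_bytes(size_t bytes)
{
    assert(bytes >= 2 && bytes % 2 == 0);  /* whole words only */
    return (unsigned)(bytes / 2 - 1);
}
```

For a buffer of a different size the operand changes accordingly, for example 1024 bytes would give 511.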
« Last Edit: April 01, 2018, 09:17:59 am by JPortici »
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1637
  • Country: nl
Re: dsPic erasing array faster
« Reply #10 on: April 01, 2018, 09:17:41 am »

Doesn’t memset copy to the array byte at a time though?

I've taken a look at the stdlib implementation of Microchip. If you go to the XC16 folder, it's in src/libpic30.zip

I couldn't find memset, but memcpy is doing 1 byte or word copy at a time, depending on pointer alignment. It's written in assembly, so it's not going to be touched by the compiler for unrolling.
That is quite a disappointing implementation, but I guess if it works that's worth something.
 
In that case, you could enjoy a speedup by doing a manual implementation, like the many alternatives that have been presented here :)
 

Offline Twoflower

  • Frequent Contributor
  • **
  • Posts: 737
  • Country: de
Re: dsPic erasing array faster
« Reply #11 on: April 01, 2018, 09:18:08 am »
Have you actually checked whether you really get 512 RAM accesses? Your loop is very simple, so there's a chance the compiler has changed it to 16-bit accesses anyway if it is set up to optimize for speed. You should check the disassembled result (or just run speed tests).

Before you try to optimize your code and make it harder to understand for the compiler and yourself, check what the compiler actually did. You should also run some performance tests to see whether manual fiddling improves the situation or worsens it. The latter can happen if you try to optimize the code and the compiler fails to 'understand' what it does. Also, loop unrolling is probably done if you optimize for speed but not for size (for example gcc -O3 vs. gcc -Os).

Some compilers can create a log file that tells you what they did and where the issues are.

And in general: if the function runs fast enough, don't waste too much time. If the function is called very seldom, don't waste too much time. The exception might be battery-powered devices, as unnecessary compute/memory accesses waste precious energy.
 

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
Re: dsPic erasing array faster
« Reply #12 on: April 01, 2018, 09:22:59 am »
aah if only the DMA engine could do memory-memory transfers... :D
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
Re: dsPic erasing array faster
« Reply #13 on: April 01, 2018, 09:25:23 am »
I’m up for trying as many approaches as possible.
Right now this:
Code: [Select]
unsigned char framebuffer[1024]__attribute__((aligned (2)));
unsigned char imagebuffer[2048]__attribute__((aligned (2)));
unsigned int *framebufferwide=(unsigned int*)framebuffer;
unsigned int *imagebufferwide=(unsigned int*)imagebuffer;
If I do it back to front, a fair bit more program memory is consumed.
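
A quick way to confirm that the attribute did what was intended is to test the low bit of the buffer address at startup. This is only a sketch: is_word_aligned and check_buffers are made-up names, and it assumes the toolchain provides uintptr_t in stdint.h.

```c
#include <assert.h>
#include <stdint.h>

unsigned char framebuffer[1024] __attribute__((aligned(2)));

/* A word-aligned address has its least significant bit clear. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 1u) == 0;
}

/* Call once at startup, before any word-wide access to the buffer. */
void check_buffers(void)
{
    assert(is_word_aligned(framebuffer));
}
```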

 

Offline C

  • Super Contributor
  • ***
  • Posts: 1346
  • Country: us
Re: dsPic erasing array faster
« Reply #14 on: April 01, 2018, 01:28:39 pm »

Change the direction of your loop.

Start at Max size and go to 0 on index.
A down-counting loop is often much faster.

First think Binary not human and then as high a level as you can get and see what you get from compiler.
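
A sketch of the down-counting version, with a made-up function name. Whether it actually comes out faster depends on the compiler and the optimization settings, so it is worth comparing the generated assembly of both directions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clear 'count' words, walking from the last index down to 0.
   The loop test is a comparison against zero. */
void clear_words_down(uint16_t *words, size_t count)
{
    assert(words != NULL || count == 0);

    while (count != 0) {
        count--;
        words[count] = 0;
    }
}
```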

 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3140
  • Country: ca
Re: dsPic erasing array faster
« Reply #15 on: April 01, 2018, 02:25:24 pm »
If you want to force the C compiler to produce some particular assembler code, it's easier to do it directly. The commands that JPortici has suggested will do the clearing in 258 instruction cycles. Any code without "repeat" will be at least 5-7 times slower (depending on which dsPIC you use). If memset() is not using "repeat" (either as direct compiler-emitted code or through a function where "repeat" takes a register as an argument), then no amount of C "wizardry" will help - XC16 is not good at this. If you really need it to be fast, there's no reason to waste your time with C, just use assembler - it's only 3 lines - much clearer and much less work than anything you can do in C.

Then again, why do you need it to be fast? Maybe what C does by default is fast enough for you. Then you don't need to worry at all.

Or perhaps you can alter your algorithm to avoid clearing at all - this is often possible.

 

Offline JanJansen

  • Frequent Contributor
  • **
  • Posts: 380
  • Country: nl
Re: dsPic erasing array faster
« Reply #16 on: April 01, 2018, 02:31:34 pm »
Hi, take a look at my 12864B display code I posted at Microchip: http://www.microchip.com/forums/m966941.aspx
I write 32-bit zeros into the array RAM; it's the fastest way I found.

extern unsigned char screen[ 64 ][ 16 ];
extern unsigned char lcdtemp;
extern unsigned long*lcdptr;
//------------------------------------------------------------------------------
// clear graphical display backbuffer
//------------------------------------------------------------------------------
#define ClearScreen lcdtemp = 0; \
lcdptr = ( unsigned long * )screen; \
do{ *(lcdptr++) = 0; }while( lcdtemp++ != 255 );
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
Re: dsPic erasing array faster
« Reply #17 on: April 01, 2018, 03:44:25 pm »
I did try the asm, and it compiled, but crashed at runtime.
Since the array is really called framebuffer, and is really 1024 bytes, it looked like this:

Code: [Select]
asm("MOV #_framebuffer,W0");
asm("Repeat #511");
asm("CLR [W0++]");

Twoflower, I can measure the speed in one way or another. I have a frame counter for example,
but it’s easier for me to have as many working examples first.
I’m not really complaining about speed, but the method I originally asked about, if it were possible,
seemed as though it would be faster for free, so I thought why not ask.

 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14445
  • Country: fr
Re: dsPic erasing array faster
« Reply #18 on: April 01, 2018, 03:54:04 pm »
Did it crash with a properly aligned array?
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
Re: dsPic erasing array faster
« Reply #19 on: April 01, 2018, 04:02:57 pm »
If the example in post 14 constitutes a properly aligned array, yes.
There’s nothing unusual about the RCON register, so not really a crash,
but a lot goes wrong. The screen is blank, an LED output that is supposed
to be on, turns off, which is the LCD backlight, so I don’t see anything else.

EDIT,,
Maybe I jumped the gun there, and tried it before adding the suffix to those declarations to align them.
It appears to be working now.
I don’t know anything about this particular assembler, but assuming W0 is half of an accumulator,
why isn’t it destroyed by interrupts?
« Last Edit: April 01, 2018, 04:18:07 pm by @rt »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14445
  • Country: fr
Re: dsPic erasing array faster
« Reply #20 on: April 01, 2018, 04:32:03 pm »
You could take a look at the generated assembly after the compilation stage (-S option of GCC) to make sure it's exactly what you wrote.

One thing you could try is add the keyword 'volatile' after the asm keyword. You can see this "trick" at several places in some Microchip headers and explained in the compiler's manual.

Code: [Select]
asm volatile ("MOV #_framebuffer,W0");
asm volatile ("Repeat #511");
asm volatile ("CLR [W0++]");

The 'volatile' keyword guarantees that the C compiler will insert the assembly instruction as is. Otherwise it may rearrange or even discard some inlined asm code if it thinks it can, usually as part of the optimization stage.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14445
  • Country: fr
Re: dsPic erasing array faster
« Reply #21 on: April 01, 2018, 05:09:36 pm »
I don’t know anything about this particular assembler, but assuming W0 is half of an accumulator,
why isn’t it destroyed by interrupts?

The dsPIC line has 16 so-called working registers, W0-W15. They can be used for a lot of purposes, including indirect addressing, such as in this example.
You can take a look at this if interested: ww1.microchip.com/downloads/en/DeviceDoc/70157C.pdf

It's the responsibility of ISRs to preserve context, including registers. C compiled code will use working registers as well.
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
Re: dsPic erasing array faster
« Reply #22 on: April 01, 2018, 05:20:06 pm »
Ok, I added the volatile keywords, and it doesn’t compile to any more or fewer program words, and still works the same,
so I’ll leave it that way.. Thanks :)
I’d know if it was a problem because if the frame buffer wasn’t cleared, the screen would simply gradually turn black,
never clearing anything drawn to it, just as occurs if I comment out the function.

I was proficient with asm for 8 bit pics, such that I can still read and modify reasonable code, but do I really want to go there again?

My debugging hasn’t evolved far either. I still serial debug this with a serial routine in code, a Commodore Amiga, and terminal program, and CRT monitor!
To measure this loop I’d output a pulse once per call to the function that updates the display, and use an actual frequency counter to count framerate.
I am aware this is funny for these times, but given my understanding of DSPs and the repeat, I trust this is going to be faster than anything else.
 

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
Re: dsPic erasing array faster
« Reply #23 on: April 01, 2018, 05:20:11 pm »
did you also check INTCON1?
if the buffer is located at the end of ram it's possible that an address trap was generated after the last CLR instruction..
After the CLR W0 would point to a location outside of the data memory.. maybe you could replace the sequence as
Code: [Select]
asm volatile ("MOV #_framebuffer,W0");
asm volatile ("REPEAT #510");
asm volatile ("CLR [W0++]");
asm volatile ("CLR [W0]");

or maybe what happened is that you altered the value of W0 without saving it before (push W0 -> sequence -> pop W0) or without telling the compiler to reload its value after the operation (Register clobber)
 

Offline @rtTopic starter

  • Super Contributor
  • ***
  • Posts: 1058
Re: dsPic erasing array faster
« Reply #24 on: April 01, 2018, 05:26:23 pm »
JPortici,
I think now, it only played up before I aligned the arrays when defining them.
As I said I don’t know much about asm for this, but there’s no other assembler in the program, so it’s not me ever writing to W0 myself.
The program does have two interrupt routines that appear to be working ok with the code now. One for hardware serial interrupt,
and the other using timer 1 to keep real time in the absence of GPS.

 

