What about this:
if ((lcdptr++) == lcdptrPlus127)
Why would the parentheses make a difference?
Probably increment before comparing. However, I'd strongly advise splitting the operations onto separate lines, because code like this is heavily obfuscated and hard to maintain long term. Screens are big nowadays, so there is no need to cram as much C code as possible into an 80x40 terminal screen.
It's still a post-increment operation. The value of (lcdptr++) should be evaluated before the comparison. The value of (lcdptr++) would be the value of lcdptr prior to the increment operation. If instead we wrote (++lcdptr) we would get the value of lcdptr after the increment.
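A minimal sketch of the point above, in plain C. The buffer `buf` and the helper function names are made up for illustration; only the `lcdptr++` versus `++lcdptr` distinction comes from the thread. It shows that the parenthesized and unparenthesized post-increment compare the old pointer value, while the pre-increment compares the new one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the screen buffer discussed in the thread. */
static uint64_t buf[4];
static uint64_t *lcdptr = buf;

/* Rewind the pointer so each helper can be tried from the same start. */
void reset_ptr(void) { lcdptr = buf; }

/* Post-increment with parentheses: compares the OLD value, then advances. */
bool post_inc_parens(const uint64_t *end) { return (lcdptr++) == end; }

/* Post-increment without parentheses: exactly the same behaviour. */
bool post_inc_plain(const uint64_t *end) { return lcdptr++ == end; }

/* Pre-increment: advances first, then compares the NEW value. */
bool pre_inc(const uint64_t *end) { return ++lcdptr == end; }
```

Starting from `buf`, both post-increment helpers match `&buf[0]` and leave the pointer at `&buf[1]`; the pre-increment helper matches `&buf[1]` instead.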
It's not how I would do it, but the Clear() function in post 10 works fine.
This simply prints some info about the pointers, then runs Clear() 1024/32 times. The last run shows that the pointers match (before the increment, as the 'end' pointer is one 'position' short of the end, which is a little different from what most would do).
pScreen: 0x80000058
lcdptr: 0x80000058
lcdptrPlus127: 0x80000450
[Clear start][lcdptr: 0x80000058]
lcdptr: 0x80000060
lcdptr: 0x80000068
lcdptr: 0x80000070
lcdptr: 0x80000070
[Clear end][lcdptr: 0x80000078]
[Clear start][lcdptr: 0x80000078]
lcdptr: 0x80000080
lcdptr: 0x80000088
lcdptr: 0x80000090
lcdptr: 0x80000090
[Clear end][lcdptr: 0x80000098]
...
...
[Clear start][lcdptr: 0x80000438]
lcdptr: 0x80000440
lcdptr: 0x80000448
lcdptr: 0x80000450
lcdptr: 0x80000450
lcdptr==lcdptrPlus127
[Clear end][lcdptr: 0x80000458]
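The Clear() from post 10 is not shown here, so the following is only a guess reconstructed from the log above: three 8-byte stores with post-increment, a fourth store, then the post-increment comparison against an end pointer that sits at element 127. The array layout, the return value and the `clear_all()` driver are assumptions added for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* 128 x 8 bytes = 1024 bytes, the same size as the 64x16 byte screen. */
static uint64_t screen[128];
static uint64_t *lcdptr = screen;
static uint64_t *const lcdptrPlus127 = &screen[127];

/* Clears 32 bytes per call. Returns true when this call wrote the last
   8-byte word, i.e. when the pre-increment pointer equals lcdptrPlus127. */
bool Clear(void)
{
    *lcdptr++ = 0;
    *lcdptr++ = 0;
    *lcdptr++ = 0;
    *lcdptr = 0;
    /* The comparison sees the value before the increment takes effect. */
    return lcdptr++ == lcdptrPlus127;
}

/* Assumed driver: runs Clear() until the end is hit, rewinds the pointer,
   and reports how many calls were needed. */
unsigned clear_all(void)
{
    unsigned calls = 1;
    lcdptr = screen;
    while (!Clear())
        calls++;
    lcdptr = screen;
    return calls;
}
```

With a 1024-byte buffer this takes 32 calls, which lines up with the 1024/32 runs in the log.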
FYI, this Clear() function run 32 times takes 340 cp0 cycles (680 cpu cycles). The cls function I showed earlier takes 308 cpu cycles and memset takes 424 cpu cycles. So calling the Clear() function multiple times costs an extra 372 cycles compared with cls.
I don't know all the details of the C rules and which compilers follow them or not, but these two produce the same result in xc32:
if ((lcdptr++) == lcdptrPlus127)
if (lcdptr++ == lcdptrPlus127)
To make it really nasty (not that I would advocate using code like this in any serious fashion) how about:
/* using simple variable names for clarity */
if ((z == (y = x++)) && (x == 5))
{
printf ("%d\n", x);
}
Assuming that execution reaches the printf, what does it print?
Exactly. Maybe I'm not remembering the correct terms any more, (I sure thought the concept was called an "execution unit") but that is the concept. Ponder my other example.
Don't we need to know the initial value of x before entering the if statement?
Just think about what it takes to make it to the printf. x must be 5, right? But when is the increment?
Anyway, I can tell you that back in the late 1980s, this was fuzzy. I had a coworker code what was essentially this example "by accident". Depending on whether you used Lattice or Borland, you got 5 or 6 as the output.
OK, I see.
It seems that in modern C++ the logical && operator introduces a sequence point, so the results of x++ should be complete before doing the (x == 5) comparison. Therefore modern C++ should print 5 and not 6.
(This is logical because && short circuits. It is required that everything before the && is evaluated before looking at anything after the &&.)
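To make that concrete, here is a small sketch of the puzzle wrapped in a function. The function name `puzzle` and the -1 return value are made up for illustration. Because there is a sequence point after the left operand of &&, the side effect of x++ is complete before (x == 5) is evaluated, so the printf can only be reached when x starts at 4 and z is 4.

```c
#include <stdio.h>

/* Returns the value the printf in the puzzle prints, or -1 if the
   condition is false and the printf is never reached. */
int puzzle(int x, int z)
{
    int y;
    int result = -1;

    if ((z == (y = x++)) && (x == 5))
    {
        printf("%d\n", x);
        result = x;
    }
    (void)y; /* y is only there to mirror the original expression */
    return result;
}
```

Calling puzzle(4, 4) reaches the printf; puzzle(5, 5) does not, because x is already 6 when the right-hand side is evaluated.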
Hmmm... I'm not sure I follow the logic here. Shouldn't the various sub-clauses, connected with &&, be evaluable in any order without changing the outcome? That would argue for carrying out the increment after the entire expression was evaluated. Of course, those kinds of side effects are exactly why all of this should be avoided like the plague.
They are logically, but the language specifies that && short-circuits. As soon as the first operand evaluates to false, the language spec guarantees that the second operand is not evaluated.
This lets you write things like the following without a segfault when the pointer is 0:
if (pointer && pointer->value > 0) ...
If pointer is 0, pointer is never dereferenced.
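A self-contained version of that guard, as a sketch. The `struct node` type and the function name are hypothetical; only the `pointer && pointer->value > 0` pattern is from the post.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type, just enough to give the pointer something to point at. */
struct node { int value; };

/* Safe for NULL: the right-hand side is never evaluated when pointer is 0. */
bool has_positive_value(const struct node *pointer)
{
    return pointer && pointer->value > 0;
}
```

Passing NULL simply returns false; the dereference on the right of && never happens.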
People need to read a C book (or two) - yes, I wrote one.
Anyway, || and && have ALWAYS been sequence points and shortcircuited, since K&R. Heck, since Ritchie's original compiler.
Are you sure that your pointer is aligned as you expect at the start?
It was my own stupid fault, you are right: the pointer was not initialized at start, only on redraw. How stupid.
Thanks for your mental support all.
It'd be really difficult to outperform memset,
I like my code to be as fast as I can get it, and I am willing to test the timing of various methods.
Only I still have this problem: I cannot set my PIC32MX to fast performance mode yet. Take a look at this topic:
https://www.microchip.com/forums/m404723.aspx
The code they suggest is not working; it seems to be outdated info.
So i read :
NO Caching turned on
NO Prefetch buffer enabled
7 FLASH wait states
1 SRAM wait states.
I can only find one thing in the datasheet: BMXWSDRM: CPU Instruction or Data Access from Data RAM Wait State bit,
and nothing about caching or the other wait states.
I will look into it a bit more and maybe start another topic about this.
Then I can prove whether my code is faster than memset.
I see I am making more problems than I asked for.
Don't worry, it's all good, thanks.
But yes, memcpy/memset lead to very efficient *inline* code (no call to a library's function) when using optimizations, at least with GCC, so doing the same "by hand" is rarely a good idea. GCC knows how to schedule the instructions in the most efficient way and will do so much more easily if you use memset() than if you devise your own assignments which may be harder for GCC to optimize. Hardly worth it.
NB: you can still memset a portion (4, 8, 16, 32 bytes) at a time if you don't want to pause the program for the whole time.
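A rough sketch of that "portion at a time" idea, assuming the 64x16 byte screen used elsewhere in the thread. The function name, the `clear_offset` state variable and the 32-byte chunk size are choices made for this example, not anything from an existing library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES 32u /* assumed chunk size; must divide sizeof screen */

static uint8_t screen[64][16];
static size_t clear_offset; /* bytes already cleared in the current pass */

/* Clears the next CHUNK_BYTES of the screen and returns true once the
   whole buffer has been covered, so the caller can do other work between
   calls instead of stalling for the full clear. */
bool clear_next_chunk(void)
{
    memset((uint8_t *)screen + clear_offset, 0, CHUNK_BYTES);
    clear_offset += CHUNK_BYTES;
    if (clear_offset >= sizeof screen)
    {
        clear_offset = 0; /* ready for the next pass */
        return true;
    }
    return false;
}
```

Calling it from the main loop until it returns true clears the 1024 bytes in 32 steps. Whether this beats a hand-written loop is something to measure rather than assume.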
Indeed, like Anders said,
only without the chip running optimally there is no point in testing.
You don't have much to do on that chip except set the freq you want.
You can simulate the various options, and just use cp0 count to time them. It really doesn't matter what the cpu speed is when simulating this, as you are counting cpu cycles not absolute time.
#include <xc.h>
#include <stdint.h>
#include <string.h>

uint8_t screen [64][16];

//clear 32 bytes at addr, return the address of the next 32 byte block
uint32_t cls32(uint32_t addr){
    uint64_t* p = (uint64_t*) addr;
    *p++ = 0; *p++ = 0; *p++ = 0; *p++ = 0; //4*8bytes = 32bytes
    return (uint32_t)p;
}

//clear the whole screen, 128 bytes per loop iteration
void cls(){
    uint64_t* p = (uint64_t*) screen;
    uint32_t i = sizeof(screen)/128;
    for(; i; i--){
        *p++ = 0; *p++ = 0; *p++ = 0; *p++ = 0; //4*8bytes = 32bytes
        *p++ = 0; *p++ = 0; *p++ = 0; *p++ = 0; //4*8bytes = 32bytes
        *p++ = 0; *p++ = 0; *p++ = 0; *p++ = 0; //4*8bytes = 32bytes
        *p++ = 0; *p++ = 0; *p++ = 0; *p++ = 0; //4*8bytes = 32bytes
    }
}

//volatile so not optimized away and can see in simulator
volatile uint32_t t_mem, t_mem32, t_cls32, t_cls, t_dma;
uint32_t zero = 0;

void main(void) {
    uint32_t i = (uint32_t)screen;
    uint32_t j = i + sizeof(screen);

    //memset
    uint32_t t = __builtin_mfc0(9, 0);
    memset(screen, 0, 64*16);
    t_mem = __builtin_mfc0(9, 0) - t;

    //memset - 32bytes at a time
    t = __builtin_mfc0(9, 0);
    for( ; i<j; memset((void*)i, 0, 32), i+=32);
    t_mem32 = __builtin_mfc0(9, 0) - t;

    //cls32 - 32bytes at a time
    t = __builtin_mfc0(9, 0);
    i = (uint32_t)screen;
    j = i + sizeof(screen);
    for(; i<j; i = cls32(i));
    t_cls32 = __builtin_mfc0(9, 0) - t;

    //cls
    t = __builtin_mfc0(9, 0);
    cls();
    t_cls = __builtin_mfc0(9, 0) - t;

    //dma
    DCH0SSA = 0x1FFFFFFF & (uint32_t)&zero;
    DCH0DSA = 0x1FFFFFFF & (uint32_t)screen;
    DCH0SSIZ = 4;
    DCH0DSIZ = sizeof(screen);
    DCH0CSIZ = sizeof(screen);
    DCH0CONbits.CHPRI = 3;
    DCH0CONbits.CHEN = 1;
    DMACONbits.ON = 1;
    DCH0ECONbits.CFORCE = 1;
    t = __builtin_mfc0(9, 0);
    while(DCH0CONbits.CHBUSY);
    t_dma = __builtin_mfc0(9, 0) - t;

    for(;;);
}
//-Os
// CP0 count
//t_mem = 0xCF (207) = 414 cpu cycles
//t_mem32 = 0x2D2 (722) = 1444 cpu cycles
//t_cls32 = 0x122 (290) = 580 cpu cycles
//t_cls = 0x91 (145) = 290 cpu cycles
//t_dma = 0x403 (1027)
The dma code may not be correct, but it simulates ok. The memset is probably a bad thing to use in pieces as it takes a bit of code to get going (check alignment, etc). The all-in-one cls is fastest as you can eliminate much looping where cpu cycles are wasted in branching.
You don't say what the lcd is driven by, so it's not clear whether dma could also be used to send data to the lcd.
Thank you CV.
The display is driven in 4-bit mode.
I have a problem: sometimes I see Chinese characters.
It works all day, then all of a sudden it shows garbage on the screen.
I thought it was a timing issue and made the timing very slow, but that didn't help.
Then I doubled up my breadboard cables and added extra ones to be sure.
It is so weird that something runs all day without problems, and then the next day it's no good.
I have wasted a few days on this 12864B display.
So my question now is: does anyone use this 12864B display in 4-bit parallel mode successfully?
I also read that the u8g display library doesn't support 4-bit mode for this display either, which got me thinking: maybe it won't work at all?
I hope someone can confirm.
does anyone use this 12864B display
You will have to be more specific, 12864b probably describes every other lcd panel with those pixel dimensions.
It's a display with the ST7920 chipset from eBay China; I can take a picture tomorrow.
Only I don't know if a real ST7920 is in there, since the timings can be set much faster than in this manual:
http://www.hpinfotech.ro/ST7920.pdf
Some of those LCDs are badly implemented. You sneeze a little and they don't work well. Sometimes you have to add a timing delay between commands...
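As a rough sketch of what "a delay between commands" can look like in 4-bit mode: each byte goes out as two nibbles, high nibble first, followed by a fixed wait. The hooks `lcd_write_nibble` and `delay_us` are hypothetical; on real hardware they would drive DB4..DB7, pulse E and busy-wait, while here they only record what was requested so the sketch runs anywhere. The 100 us figure is an assumption to be checked against the controller datasheet.

```c
#include <stdint.h>

/* Hypothetical hardware hooks, replaced by simple recorders here. */
static uint8_t nibble_log[8];
static unsigned nibble_count;
static uint32_t total_delay_us;

static void lcd_write_nibble(uint8_t nibble)
{
    if (nibble_count < sizeof nibble_log)
        nibble_log[nibble_count++] = nibble & 0x0Fu;
}

static void delay_us(uint32_t us)
{
    total_delay_us += us;
}

/* Assumed per-command wait; tune it for the actual module. */
#define LCD_CMD_DELAY_US 100u

/* Sends one byte in 4-bit mode: high nibble, low nibble, then the wait. */
void lcd_send_byte(uint8_t byte)
{
    lcd_write_nibble(byte >> 4); /* high nibble first */
    lcd_write_nibble(byte);      /* low nibble, masked inside the hook */
    delay_us(LCD_CMD_DELAY_US);
}
```

Keeping the wait in one place makes it easy to stretch the delay while chasing intermittent garbage on the screen.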
Did I wreck it? I did connect it wrong one time.
The thing is, it works for a day, then problems arise, then the next day all is good, and repeat.
I give up on this display; I'm going for a full color TFT next time.