Put static in front of it, and remember to predeclare it at the top of the file.
And maybe even tell avr-gcc to inline the function.
static inline uint8_t myfunc(void); //predeclare
It's generally better to leave the decision on whether or not to inline the function up to the compiler, unless perhaps the function is used in a really tight loop or an ISR.
Bingo. Compilers generally do a much better job of optimizing code than programmers do. The exceptions are things like DSPs with strange instruction sets, cache management, and memory alignment requirements; there you really have to know exactly what you're doing or you will botch things up royally. It also pays to know how your specific processor's memory management and cache management work if you want to squeeze every drop of performance out of a piece of code. I've done a lot of real-time coding, and processing is RARELY the problem. Nearly every problem ends up being I/O bound, and you keep running into issues such as flushing the pipeline, caching the wrong data, and so on. That's where the money is. The compiler does a fine job on its own of handling mundane tasks such as function calls, loop unrolling, inlining and the like.
Let me give you an example. Which of these will run faster, assuming no additional optimization?
for (i = 0; i < SOME_NUMBER; i++)
{
    DoSomething(i);
}
or
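{
    i = 0;
    DoSomething(i); i++;   /* unrolled copy 1 */
    DoSomething(i); i++;   /* unrolled copy 2 */
    DoSomething(i); i++;   /* unrolled copy 3 */
    DoSomething(i); i++;   /* unrolled copy 4 */
    DoSomething(i); i++;   /* unrolled copy 5 */
    DoSomething(i); i++;   /* unrolled copy 6 */
    DoSomething(i); i++;   /* unrolled copy 7 */
    DoSomething(i); i++;   /* unrolled copy 8 */
}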
Most people would say the second way is obviously faster, but most of the time the first way actually has the advantage. On a processor with an instruction cache, DoSomething() is quite likely to be sitting in the cache after the first call, so the remaining iterations execute blazingly fast, whereas the second way forces a lot more code to be read in from memory...which is SLOOOOOOWWWWWWW. The same goes for data: churning on localized memory and writing the results out at the end beats working on scattered memory. Intelligent memory management is where you get truly huge gains, and compilers are poor at it because, while they optimize instructions very well, they can't figure out your semantics. How would the compiler know what's OK and what's not? Generally it doesn't, and it can't take full advantage of the architecture unless the designer shows the way with an intelligent memory layout.