The fewer things that you handle with interrupts, the more responsive each interrupt can be. You generally learn polling before interrupts so that you get a good handle on the basics of program flow in the first place, and it's often the best way to do things even when you do understand everything there is to know about interrupts.
Not everything needs microsecond-immediate attention. As long as you never busy-wait for anything (see the next section), the main-loop polling rate is usually more than sufficient.
Likewise, don't do anything more than what is absolutely necessary in an interrupt handler. Any time that you spend there is time that you can't spend doing something else, including servicing lower-priority interrupts. (That's priority as you've told the chip, which is not necessarily what you intended.) So do the absolute bare minimum that really must be done RIGHT NOW and get out. If you can pare it all the way down to just setting a flag and leaving, then you can eliminate that interrupt altogether and just poll for the condition itself instead of the flag that you would have set. That's a big plus!
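As a rough sketch of that last point: the names `UART_STATUS`, `UART_DATA`, `RX_READY`, `uart_rx_isr`, and `uart_poll` below are made up for illustration, and the "registers" are ordinary variables so that the sketch compiles anywhere. On a real chip they would come from the vendor's header and the hardware would clear the status bit itself.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a memory-mapped peripheral. */
#define RX_READY 0x01u
volatile uint8_t UART_STATUS;   /* simulated status register */
volatile uint8_t UART_DATA;     /* simulated data register   */

/* Interrupt version: the handler does nothing but set a flag,
   which the main loop then has to poll anyway. */
volatile uint8_t rx_flag;
void uart_rx_isr(void)
{
    rx_flag = 1;
}

/* Polled version: no handler at all. The main loop checks the same
   hardware condition that would have raised the interrupt. */
uint8_t uart_poll(uint8_t *out)
{
    if (UART_STATUS & RX_READY)
    {
        *out = UART_DATA;                    /* take the byte */
        UART_STATUS &= (uint8_t)~RX_READY;   /* simulate the hardware clearing the bit */
        return 1;
    }
    return 0;                                /* nothing yet; return immediately */
}
```

The main loop would call `uart_poll()` once per pass, exactly where it would otherwise have checked `rx_flag`.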
The entire point of this section is to keep the interrupts reserved for when you really do need a drop-everything-immediate response. You don't want to have to wait for printf to finish because you standardized on interrupts for everything, including your debug spew.
(Why are you using printf anyway? It's HUGE! In fact, most of the desktop-standard functions don't appear very often here. The environment is just too small to make them practical.)
---
Multitasking is easy once you understand state machines. Instead of busy-waiting for one thing at a time, you can poll-wait for lots of things at the same time. Your main function might then look something like this:
    void main()
    {
        //clock and other chip-wide set up
        init_module_A();
        init_module_B();
        init_module_C();
        init_module_D();
        while(1)
        {
            run_module_A();
            run_module_B();
            //returns non-zero if some C-related event happened; this allows other modules to synchronize to it
            if (run_module_C())
            {
                trigger_module_D();
                //module A might have a simple static behavior, whereas module B has a sequence that must be kept
                module_A_input.foo = module_B_get_next_output();
            }
            run_module_D();
        }
    }
and a module file might include:
    #include <stdint.h>

    //static keeps the state private to this module's source file
    static enum
    {
        STARTING,
        BYTE_ODD,
        BYTE_EVEN,
        DONE
    } state_machine;

    void init_module_C(void)
    {
        //set up ONLY what is needed to run this module
        state_machine = STARTING;
    }

    //ready_to_start and more_bytes_to_come are placeholders for whatever conditions this module checks
    uint8_t run_module_C(void)
    {
        switch(state_machine)
        {
        case STARTING:
            if (ready_to_start)
            {
                //start code here
                state_machine = BYTE_ODD;
            }
            break;
        case BYTE_ODD:
            //do something with the odd-numbered bytes
            state_machine = BYTE_EVEN;
            break;
        case BYTE_EVEN:
            //do something with the even-numbered bytes
            if (more_bytes_to_come)
            {
                state_machine = BYTE_ODD;
            }
            else
            {
                state_machine = DONE;
            }
            break;
        default:
            //error-correction and normal reset (DONE also lands here on the next call)
            state_machine = STARTING;
            break;
        }
        return (state_machine == DONE);
    }
Now you can copy a module file to a different project that has a different use for that same concept, or even make a central library out of them, without rewriting anything.
Also notice the poll-wait for case STARTING. This is how you wait for things without blocking everything else.
And it's usually faster to check for zero than for any other value. The chip can just load the value and check the Z flag, instead of loading, subtracting, and then checking the Z flag. It's not much, but if you're short on code space or processing time, it might help to arrange things like that.
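Both ideas fit in a small non-blocking delay. This is only a sketch: `delay_start`, `delay_run`, and the choice to count in main-loop passes rather than real time are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical non-blocking delay, counted in main-loop passes. */
static uint16_t delay_remaining;

void delay_start(uint16_t passes)
{
    delay_remaining = passes;
}

/* Call once per main-loop pass. Returns non-zero on the pass where the
   delay expires and zero otherwise. It never blocks. */
uint8_t delay_run(void)
{
    if (delay_remaining)                /* test against zero, not against a limit */
    {
        delay_remaining--;
        return (delay_remaining == 0);
    }
    return 0;                           /* idle: nothing pending */
}
```

Counting down to zero instead of up to a limit is what lets both tests be plain zero checks.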
---
Global variables are okay, if used SPARINGLY! Intentional inputs and outputs of a module, for example, but nothing more. Declare those in the module's header file; everything else goes in the source file. Variables declared at file scope in the source file are visible throughout that module (below the declaration) but are not offered to the rest of the program; mark them static to make that guarantee explicit. That can be useful too, but again, only when needed.
As in desktop programming, try to keep everything as local as you can get away with, but stop short of writing trivial access functions. If that's all a function would do, just make the variable itself accessible; you often don't have much of a stack to work with. (Recursion is right out!)
Likewise for function prototypes, custom datatypes, etc. Only what the rest of the project needs to see goes in the header; everything else goes in the source.
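Here is one way that split might look, with the header and source shown back to back in a single listing. The module and every name in it (`example_output`, `init_example`, `run_example`, `call_count`, `next_value`) are invented for illustration.

```c
/* ---- example.h : only what the rest of the project needs to see ---- */
#include <stdint.h>

extern uint8_t example_output;       /* intentional output of the module */
void    init_example(void);
uint8_t run_example(void);

/* ---- example.c : everything else stays private to the module ---- */
static uint8_t call_count;           /* file scope, invisible to other files */

static uint8_t next_value(void)      /* private helper, no prototype in the header */
{
    return (uint8_t)(call_count * 2u);
}

uint8_t example_output;              /* the one definition behind the extern above */

void init_example(void)
{
    call_count = 0;
    example_output = 0;
}

uint8_t run_example(void)
{
    call_count++;
    example_output = next_value();
    return example_output;
}
```

Other files can read `example_output` directly, with no access function and no extra stack use, while `call_count` and `next_value` cannot be touched from outside.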
---
There are some good embedded C++ compilers, but most of the time you don't need C++. When you do, it's nice to have, and it's not really that bloated when you do it right, but most of the time you just don't need those tools at all. C is perfectly fine.
---
Math is interesting in a lot of cases. If you're on an 8-bit architecture (0 to 255, or -128 to 127), then you have a time penalty for using anything bigger. Sometimes that's okay, sometimes not. Likewise for using 32-bit numbers on a 16-bit architecture, etc. The (u)intN_t datatypes tell you exactly how big it is: uint16_t is 16 bits unsigned (0 to 65535).
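A short illustration using the standard `<stdint.h>` types and limit macros. The variable names and the `wrap_add` helper are just for this sketch.

```c
#include <stdint.h>

/* Each type has exactly the stated width, whatever the target's native int is. */
uint8_t  small  = UINT8_MAX;     /* 255   */
uint16_t medium = UINT16_MAX;    /* 65535 */
int8_t   lowest = INT8_MIN;      /* -128  */

/* Unsigned arithmetic wraps around at the type's width. */
uint8_t wrap_add(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

For example, `wrap_add(250, 10)` gives 4, because 260 does not fit in 8 bits.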
You might also be restricted to adding, subtracting, and shifting as the only native operations. (Shifting is essentially multiplying or dividing by powers of 2.) Multiplication by a non-power of 2 can give you a significant time penalty if you don't have the on-chip hardware for it. Some compilers are smart enough to convert a constant multiplier into a combination of shifts and adds; others just pull in their standard block of longhand library code. Division by a variable is a nightmare by comparison! It's literally doing explicit long division in that block of library code. So try to avoid it if at all possible.
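If your compiler doesn't do the conversion for you, it can be written out by hand. A small sketch, with `times10` and `div8` as made-up helper names:

```c
#include <stdint.h>

/* x * 10 == x * 8 + x * 2, using only shifts and an add.
   The result is truncated to 16 bits, like any other uint16_t math. */
uint16_t times10(uint16_t x)
{
    return (uint16_t)((x << 3) + (x << 1));
}

/* Division by a power of 2 is a right shift (unsigned values only here).
   The remainder is simply dropped. */
uint16_t div8(uint16_t x)
{
    return (uint16_t)(x >> 3);
}
```

So `times10(123)` is 1230 and `div8(100)` is 12.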
Floating-point (floats and doubles) is even worse than that, unless you have a floating-point accelerator, and then it only helps you for the size that it's designed for. (A 32-bit FP accelerator works for floats, but not doubles.) Fixed-point is guaranteed to work anywhere. It's simply you, the programmer, keeping track of what fraction you're counting by, and fixing it up (shifting) as needed to keep the answer straight without overflowing or underflowing in the middle somewhere. Instead of wishing you had 8 times the resolution in your 0-to-15 counter, just count by 1/8ths! That's fixed-point. The compiler and the hardware still think you're working with integers, so you need to keep track of the fractional point yourself, but that's how you get fractions on an integer-only machine.
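Counting by 1/8ths might look like the sketch below. The type name `fix8_t`, the `FRAC_BITS` constant, and the helper functions are all invented for illustration; the stored integer is always the real value times 8.

```c
#include <stdint.h>

#define FRAC_BITS 3u      /* 3 fractional bits: steps of 1/8 */

/* Stored value = real value * 8, so a 0-to-15 range uses 0 to 127. */
typedef uint8_t fix8_t;

fix8_t  fix_from_int(uint8_t whole) { return (fix8_t)(whole << FRAC_BITS); }
uint8_t fix_to_int(fix8_t f)        { return (uint8_t)(f >> FRAC_BITS); }  /* truncates */

/* Adding needs no fix-up: both operands are already in eighths. */
fix8_t fix_add(fix8_t a, fix8_t b)  { return (fix8_t)(a + b); }

/* Multiplying eighths by eighths gives sixty-fourths, so shift back down.
   The intermediate is widened so that it cannot overflow. */
fix8_t fix_mul(fix8_t a, fix8_t b)
{
    uint16_t wide = (uint16_t)((uint16_t)a * b);
    return (fix8_t)(wide >> FRAC_BITS);
}
```

With these, 2.5 is stored as 20 and 1.5 as 12; `fix_mul(20, 12)` returns 30, which is 3.75 in eighths.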
(For a real-world example, "integer" audio is actually fixed-point that is entirely fractional. Instead of bit weights of -32768, 16384, 8192, 4096, etc. for a signed 16-bit number, the bit weights are -1, 1/2, 1/4, 1/8, etc. They're handled by the exact same circuitry that handles "true integers", so the hardware can't tell the difference, but that's how the audio industry and the software that it uses actually interpret it. A different size, like 8-bit or 24-bit, either truncates the fractional bits or adds more of them so that the peak value is always +/-1. Small DSPs often have a few bits above the fractional point to make them slightly more forgiving, and larger DSPs almost always use floating-point with FP 1.0 = "integer" 011111111111... at the points of conversion.)
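That all-fractional 16-bit layout is commonly called Q15. A minimal sketch of working with it follows; `q15_mul` and `q15_to_double` are illustrative names, and the multiply assumes the compiler's right shift of a negative number is arithmetic, which is typical but implementation-defined in C.

```c
#include <stdint.h>

/* Q15: a 16-bit sample read as sample / 32768,
   covering -1.0 up to just under +1.0. */

/* Multiply two Q15 values. The 32-bit product has 30 fractional bits,
   so shift right by 15 to get back to Q15. The one case that does not
   fit, -1.0 * -1.0, would need saturation; this sketch leaves it out. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 15);
}

/* For display or testing on a machine that does have floating point. */
double q15_to_double(int16_t s)
{
    return (double)s / 32768.0;
}
```

Here 16384 stands for 0.5, so `q15_mul(16384, 16384)` returns 8192, which is 0.25.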