When I faced a similar problem (though not exactly the same as yours), here is what I did.
Is it the fastest code? No.
Was it fast enough? Yes.
void LEDColumnUpdate(uint8_t led1, uint8_t led2)
{
    // led1 and led2 are each a 5-bit value (0-31); each bit switches one of the 5 LEDs in that column on or off.
    PORTB = ((led1 & 0b00000001) << 4) | ((led1 & 0b00000100) << 3);
    PORTE = ((led2 & 0b00000010) >> 1) | ((led2 & 0b00000001) << 1)
          | ((led2 & 0b00001000) >> 1) | ((led2 & 0b00000100) << 1)
          |  (led2 & 0b00010000)       | ((led1 & 0b00001000) << 2)
          | ((led1 & 0b00010000) << 2) | ((led1 & 0b00000010) << 6);
}
Basically, it masks out a specific bit in the variable and shifts it right or left to line up with the bit in the PORT register where that LED is physically connected. Then all of these are OR'ed together.
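To make the mask-and-shift idea easier to check, here is a sketch of the same mapping written as two pure functions that return the register values instead of writing them. The function names `portb_value` and `porte_value` are made up for this example, and the pin names in the comments are simply read off the shifts in the code above; they are not confirmed wiring.

```c
#include <stdint.h>

/* Value for PORTB: led1 bit 0 -> bit 4, led1 bit 2 -> bit 5. */
uint8_t portb_value(uint8_t led1)
{
    return (uint8_t)(((led1 & 0x01u) << 4) | ((led1 & 0x04u) << 3));
}

/* Value for PORTE: five bits come from led2, the other three from led1. */
uint8_t porte_value(uint8_t led1, uint8_t led2)
{
    return (uint8_t)(
          ((led2 & 0x02u) >> 1)   /* led2 bit 1 -> PE0 */
        | ((led2 & 0x01u) << 1)   /* led2 bit 0 -> PE1 */
        | ((led2 & 0x08u) >> 1)   /* led2 bit 3 -> PE2 */
        | ((led2 & 0x04u) << 1)   /* led2 bit 2 -> PE3 */
        |  (led2 & 0x10u)         /* led2 bit 4 -> PE4 */
        | ((led1 & 0x08u) << 2)   /* led1 bit 3 -> PE5 */
        | ((led1 & 0x10u) << 2)   /* led1 bit 4 -> PE6 */
        | ((led1 & 0x02u) << 6)); /* led1 bit 1 -> PE7 */
}
```

Because these functions do not touch any hardware register, they can be compiled and tested on a PC before the result is assigned to PORTB and PORTE on the micro.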
I actually had 5 LEDs, not 2 as shown here; I removed the others from the function before posting to make it easier to read. Some of the other LEDs were on ports used for other things and needed special handling to ensure no interrupt happened while they were being updated.
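I did not post that special handling, but one common way to do it (a sketch only, not my original code) is a read-modify-write of just the LED bits inside an atomic block. On AVR this uses `ATOMIC_BLOCK(ATOMIC_RESTORESTATE)` from avr-libc's `<util/atomic.h>`. The choice of `PORTD`, the mask `LED_MASK`, and the function name are assumptions for illustration. The `#else` branch provides stand-ins so the logic also compiles on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <util/atomic.h>
#define SHARED_PORT PORTD          /* assumption: the port shared with other code */
#else
/* Host stand-ins so the masking logic can be tested off-target. */
static volatile uint8_t SHARED_PORT;
#define ATOMIC_RESTORESTATE
#define ATOMIC_BLOCK(mode)
#endif

#define LED_MASK 0x0Cu             /* hypothetical: LED bits on pins 2 and 3 */

/* Update only the LED bits; all other pins on the port keep their state. */
void shared_port_write_leds(uint8_t led_bits)
{
    /* Interrupts are disabled for the read-modify-write, then restored. */
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        SHARED_PORT = (uint8_t)((SHARED_PORT & ~LED_MASK) | (led_bits & LED_MASK));
    }
}
```

Without the atomic block, an interrupt that changes another pin on the same port between the read and the write would have its change overwritten.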
When the project was done, the 8 MHz ATmega micro was:
- Multiplexing five 5x7 LED blocks with 5-level PWM brightness control (fast enough that it didn't flicker)
- Doing the math for led animations and scrolling messages
- Performing some software signal filtering and syncing to 50Hz light pulses from a phototransistor in order to capture data in the pulses.
- Scanning 3 pushbuttons with software debounce
And it wasn't showing any signs of speed issues.
The reason you can only see 4 LED blocks is that the bigger top 5x7 one is bicolor red/green; from the code's point of view it is addressed as 2 separate 5x7 blocks.
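For the brightness control in the list above, here is a rough sketch of how 5-level software PWM can feed a function like LEDColumnUpdate. This is an assumed scheme, not the code from the project: each LED has a brightness of 0 to 4, the column is refreshed in 4 phases, and an LED is lit in a phase only if its brightness is greater than the phase number. The names `column_bits_for_phase`, `next_phase`, and `PWM_PHASES` are hypothetical.

```c
#include <stdint.h>

#define LEDS_PER_COLUMN 5u   /* matches the 0-31 range of led1/led2 */
#define PWM_PHASES      4u   /* assumption: brightness 0..4 means 0/4 .. 4/4 duty */

/* Build the 5-bit on/off pattern for one column in the given PWM phase. */
uint8_t column_bits_for_phase(const uint8_t brightness[LEDS_PER_COLUMN], uint8_t phase)
{
    uint8_t bits = 0;
    for (uint8_t i = 0; i < LEDS_PER_COLUMN; i++) {
        if (brightness[i] > phase) {
            bits |= (uint8_t)(1u << i);   /* LED i is on during this phase */
        }
    }
    return bits;
}

/* Advance to the next phase, wrapping back to 0 after the last one. */
uint8_t next_phase(uint8_t phase)
{
    return (uint8_t)((phase + 1u) % PWM_PHASES);
}
```

The returned pattern would be passed as led1 or led2 on each refresh, so a brightness of 2 lights the LED in 2 of every 4 passes.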