Coincidentally, well sort of, I've spent a large proportion of the last week rewriting the millis(), micros(), delay() and delayMicroseconds() code for my fork of ATTinyCore. Anyway, while my work is not directly related (or quite finished), I can offer two pieces of advice.
Instead of digitalWrite() (which, as others have pointed out, is slow, and if you are trying to get µs accuracy... good luck with that), toggle the pin directly....
if (micros() - time >= 1000) {
    time = micros();
    // Writing a 1 to a PINx bit toggles that pin. Use = rather than |= here:
    // |= does a read-modify-write, so it would also toggle any other pin
    // that happened to read high at that moment.
    PIND = _BV(2) | _BV(3) | _BV(4) | _BV(5); // 2,3,4,5 are on PORTD
    PINB = _BV(5);                            // 13 is PORTB.5
}
The other is that the millis timer interrupt handler (which is also used for timing micros(), but not for delayMicroseconds()) takes a not-insignificant amount of time. In ATTinyCore the interrupt handler is literally "ovrf++;", and even that, with its interrupt prologue/epilogue overhead, comes in at something like 55-75 clocks; at 16 MHz that can be almost 5 µs every time the interrupt fires. Now look at the millis interrupt handler in the mainline Arduino core....
ISR(TIMER0_OVF_vect) // the timer 0 overflow handler in the core's wiring.c
{
    // copy the volatile state to locals so it can be kept in registers
    unsigned long m = timer0_millis;
    unsigned char f = timer0_fract;

    m += MILLIS_INC;
    f += FRACT_INC;
    if (f >= FRACT_MAX) {
        f -= FRACT_MAX;
        m += 1;
    }

    timer0_fract = f;
    timer0_millis = m;
    timer0_overflow_count++;
}
holy mother of god, I haven't disassembled it, but that looks freaking massive to me; I hate to think what it clocks in at.
If you do not need millis() or micros(), you can freely disable the timer 0 overflow interrupt, and, in the case of the standard Arduino core, instead of using delay() (which requires millis()) you could craft a function that uses delayMicroseconds().
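A minimal sketch of what such a function might look like (myDelay is a made-up name, and the stub delayMicroseconds() here just counts the microseconds requested so the logic can be checked off-target; on a real board you'd use the one from the Arduino core instead):

```c
#include <stdint.h>
#include <assert.h>

// Host-side stand-in for the Arduino core's delayMicroseconds();
// it just records how long we asked to wait.
static unsigned long us_spent = 0;
static void delayMicroseconds(unsigned int us) { us_spent += us; }

// delay() replacement that does not depend on millis() or timer 0.
// delayMicroseconds() is a calibrated busy-wait, but it is only accurate
// for smallish arguments (~16383 us max on 16 MHz AVR), so burn the
// requested time in 1 ms chunks.
void myDelay(unsigned long ms)
{
    while (ms--)
        delayMicroseconds(1000); // 1 ms per iteration, no ISR needed
}
```

Since it's a pure busy-wait, this drifts by whatever time any remaining interrupt handlers steal, but with timer 0 disabled that's usually nothing.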
Incidentally, it's often overlooked that delayMicroseconds() does not disable interrupts (it would be a bad idea to bake that in, even though it once did), and as a result it's horrifically inaccurate if you don't disable interrupts around it yourself, especially with that millis() (timer 0 overflow) interrupt running.
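The usual AVR pattern for that is to save SREG, cli(), delay, then restore SREG. A sketch of the idea (accurateDelayMicroseconds is a made-up name; SREG, cli() and delayMicroseconds() are stubbed out here so the pattern can be checked off-target, whereas on a real board they come from the AVR/Arduino headers):

```c
#include <stdint.h>
#include <assert.h>

// Host-side stand-ins for the AVR status register and friends.
static uint8_t SREG = 0x80;                       // pretend the I-bit is set
static void cli(void) { SREG &= (uint8_t)~0x80; } // clear global interrupt flag
static unsigned long us_spent = 0;
static void delayMicroseconds(unsigned int us) { us_spent += us; }

// Timing-critical wait: no ISR (least of all the timer 0 overflow one)
// can stretch the busy-wait while interrupts are off.
void accurateDelayMicroseconds(unsigned int us)
{
    uint8_t oldSREG = SREG; // remember the current interrupt state
    cli();                  // interrupts off for the duration of the wait
    delayMicroseconds(us);
    SREG = oldSREG;         // restore; re-enables only if they were on before
}
```

Restoring SREG rather than calling sei() unconditionally means this is safe to call from inside code that already has interrupts disabled. Keep the argument small, though: while interrupts are off, millis() loses time.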