One thing that you do not take into account is the relative frequency of the various operations. In your example (a clock?), you display the number far more often (many times per second, to multiplex the display) than you do calculations on it (one increment per second), so the performance comparison is biased toward the digit-per-byte approach. If you were doing a lot of calculations before displaying, the four-byte approach would cost more in computation, even though its display code is simpler.
You can also consider:
1) Binary Coded Decimal (two 4-bit digits in each byte). As free-electron says, direct support for BCD used to be pretty common, but it isn't any more. (There's a small packed-BCD sketch after this list.)
2) Calculate display patterns only when the value changes. This requires a byte per displayed digit, plus at least one bit somewhere to indicate a value change, so it is more RAM-intensive than either version you have suggested, but the display code becomes trivial. (The second sketch below shows the idea.)
3) Some compilers optimize "divide by constant" operations, typically replacing them with multiply-and-shift sequences, so the division may cost less than you expect.
4) Eliminating the divide routine from your binary only helps if you avoid division ANYWHERE else in your code; use it once and the library routine gets linked in anyway.
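
Here is a minimal sketch of option 1, not code from your question: a 4-digit counter kept as packed BCD, two digits per byte, so individual digits come out with masks and shifts instead of divide/modulo. The names `bcd_increment` and `bcd_digit` are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* value[0] holds the two low digits, value[1] the two high digits */
static uint8_t value[2] = {0x00, 0x00};

static void bcd_increment(uint8_t *v)
{
    for (int i = 0; i < 2; i++) {
        uint8_t lo = v[i] & 0x0F;
        if (lo < 9) {                 /* low nibble has room: just bump it */
            v[i]++;
            return;
        }
        v[i] &= 0xF0;                 /* low nibble rolls over to 0 */
        if ((v[i] >> 4) < 9) {        /* carry into the high nibble */
            v[i] += 0x10;
            return;
        }
        v[i] = 0x00;                  /* whole byte rolls over; carry to next byte */
    }
}

/* digit 0 is the least significant; pure masking/shifting, no division */
static uint8_t bcd_digit(const uint8_t *v, uint8_t n)
{
    uint8_t b = v[n >> 1];
    return (n & 1) ? (b >> 4) : (b & 0x0F);
}

int main(void)
{
    for (int i = 0; i < 1234; i++)
        bcd_increment(value);

    printf("%u%u%u%u\n",              /* prints 1234 */
           bcd_digit(value, 3), bcd_digit(value, 2),
           bcd_digit(value, 1), bcd_digit(value, 0));
    return 0;
}
```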
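
And a second sketch for option 2: keep ready-to-send segment patterns in a small cache and rebuild them only when the value changes, so the frequently-run multiplex code is just a table copy. The segment table, the 4-digit layout, and the pin functions are assumptions, not from your question.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_DIGITS 4

/* common-cathode 7-segment patterns for 0-9, gfedcba bit order (assumed) */
static const uint8_t SEG_TABLE[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

static uint16_t counter = 0;             /* the value being displayed */
static uint8_t  seg_cache[NUM_DIGITS];   /* one prebuilt pattern per digit */
static volatile bool value_changed = true;

/* stand-ins for real port writes -- hypothetical hardware hooks */
static void select_digit(uint8_t index)     { (void)index; }
static void write_segments(uint8_t pattern) { (void)pattern; }

/* called whenever the value is updated, e.g. once per second */
static void set_value(uint16_t v)
{
    counter = v;
    value_changed = true;                /* tell the display code to rebuild */
}

/* runs in the main loop: pays the divide cost only when something changed */
static void refresh_cache(void)
{
    if (!value_changed)
        return;
    uint16_t v = counter;
    for (uint8_t i = 0; i < NUM_DIGITS; i++) {
        seg_cache[i] = SEG_TABLE[v % 10];
        v /= 10;
    }
    value_changed = false;
}

/* called from the multiplex timer interrupt, many times per second:
 * nothing here but a table lookup, so it stays short and deterministic */
static void multiplex_tick(void)
{
    static uint8_t current = 0;
    select_digit(current);
    write_segments(seg_cache[current]);
    current = (uint8_t)((current + 1) % NUM_DIGITS);
}

int main(void)
{
    set_value(1234);
    for (int tick = 0; tick < 8; tick++) {   /* simulate a few refresh cycles */
        refresh_cache();
        multiplex_tick();
    }
    printf("patterns: %02X %02X %02X %02X\n",
           seg_cache[3], seg_cache[2], seg_cache[1], seg_cache[0]);
    return 0;
}
```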
There's a maxim: "premature optimization is the root of all evil" (or words to that effect).
It's pretty much true. There is little point in making some code take 4 bytes of RAM, 128 bytes of code, and run in 342 cycles if the chip you're using has 1k of RAM, 8k of code, and is fast enough that your code only "must run" within 8000 cycles. The same frequently applies if you can buy a slightly different chip for an extra $0.25 that doubles the code and RAM space. The best bet is to write your code so that it is easily understandable and maintainable, and then, if (and only if) it turns out to be too slow or too big (or you can save $x by going to a smaller/cheaper chip), think about optimizing it.
(and at that point, you analyze the code and optimize the parts where you get the biggest return for your efforts.)