I will be happy if someone answers my question.
Which question? The original one? Many people have answered it: the answer is "It depends", unless you can say exactly *which* microcontroller you are talking about.
No, the other one, here:
Sorry, but I don't understand: what does "ALU width" mean?
The ALU operates on operands that are stored in registers (or, on older architectures, in an accumulator). So wouldn't the register width be the width of the ALU?
As pointed out above, the Z80 uses a 4 bit ALU with 8 bit registers, so even ALU width is meaningless as a descriptor. I never realized the ALU was only 4 bits but, sure enough, it is.
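For anyone surprised by that: a narrow ALU simply works through a wide operand in pieces, carrying between them. Here is a rough Python sketch (illustrative only, not a model of the actual Z80 silicon) of an 8 bit add performed as two 4 bit passes:

```python
def add8_via_4bit_alu(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two 8-bit values using only 4-bit-wide additions,
    the way a narrow ALU does it internally in two passes."""
    # Pass 1: add the low nibbles.
    low = (a & 0x0F) + (b & 0x0F) + carry_in
    carry = low >> 4            # carry out of bit 3 feeds the next pass
    # Pass 2: add the high nibbles plus the carry from pass 1.
    high = ((a >> 4) & 0x0F) + ((b >> 4) & 0x0F) + carry
    carry_out = high >> 4       # final carry flag
    result = ((high & 0x0F) << 4) | (low & 0x0F)
    return result, carry_out

# 0x3A + 0xC8 = 0x102, so the 8 bit result is 0x02 with the carry set
print(add8_via_4bit_alu(0x3A, 0xC8))  # (2, 1)
```

Two extra clock phases per byte, but completely invisible to the programmer -- which is exactly why ALU width makes a poor definition of bit-ness.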
I don't think any of those hardware-based statistics is the proper measure of 8-bit, 16-bit, 32-bit, or 64-bit. They are all just implementation details that could change in a newer compatible processor with a completely different internal design that still runs all the same programs.
OK, but that basically never happened with microcomputers in the 1970s. Every new processor ran a completely different instruction set. At most -- and even in the 1980s -- a new microprocessor could run (most) programs from an older one, but not vice versa.
That wasn't the case with minicomputers and mainframes, even then.
In the 1960s IBM made dozens of different machines that all ran every program for the 32 bit IBM 360 instruction set. Some of them had 8 bit memory busses and some had 64 bit. Some executed most instructions in one clock cycle, and some used a little interpreter program (microcode) that took five or maybe even ten clock cycles to execute even simple instructions.
DEC made a wide range of CPUs that all could run the same PDP-11 programs -- and then a wide range of CPUs that all could run the same VAX programs. Data General and PR1ME did the same. Newer CPUs weren't necessarily faster than older ones -- they also pushed the compatible range down to smaller and cheaper (but slow) machines as time went on.
I think the only meaningful measure is something like "What is the largest data size that a program can use in a single instruction?" (you obviously need to not count things like the x86 "rep" prefix) The bit-ness is a property of the instruction set (and the programs), not of a particular micro-architecture implementing that instruction set.
That measure can be hard to apply to old instruction sets that only ever had a single implementation and were then abandoned.
And, yes, I think that means some old CPUs should be reclassified now, and in particular many of the "8 bit" CPUs should be reclassified as 16 bit -- pretty *poor* 16 bit CPUs, but 16 bit nonetheless.
For example, the Z80 had single instructions to:

- take the 16 bit contents of HL, IX or IY, add the contents of BC, DE, SP or the register itself, and put the result back in the original register;
- add or subtract with carry any of BC, DE, HL or SP to HL -- and as these set the flags according to the 16 bit result, you could then branch directly on it;
- load a 16 bit constant directly into any of BC, DE, HL, IX, IY or SP;
- load or store a 16 bit value directly between any of those registers and an absolute memory location;
- add an 8 bit constant to the contents of IX or IY and use the 16 bit result as a memory address to load or store any 8 bit register (A, B, C, D, E, H, L);
- push BC, DE, HL, IX or IY onto the stack, or pop them from the stack.
The only things you couldn't do directly with 16 bit values were to copy from one 16 bit register to another (but something like PUSH DE; POP BC is just two bytes of program code) or to load or store them via a pointer (which also needs two instructions).
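To make the 16 bit flavour of those instructions concrete, here is a toy Python model (the register-pair names are real Z80 registers; the code itself is just an illustration, not an emulator) of ADD HL,BC and of the PUSH/POP trick for copying one pair to another:

```python
# Toy model: register pairs like HL are physically two 8-bit registers,
# but a single ADD HL,rr instruction operates on all 16 bits at once,
# carrying from L into H automatically.
regs = {"B": 0, "C": 0, "D": 0, "E": 0, "H": 0, "L": 0}
stack = []

def get_pair(hi: str, lo: str) -> int:
    return (regs[hi] << 8) | regs[lo]

def set_pair(hi: str, lo: str, value: int) -> None:
    regs[hi] = (value >> 8) & 0xFF
    regs[lo] = value & 0xFF

def add_hl(hi: str, lo: str) -> int:
    """ADD HL,rr: one instruction, a full 16-bit add; returns the carry flag."""
    total = get_pair("H", "L") + get_pair(hi, lo)
    set_pair("H", "L", total & 0xFFFF)
    return 1 if total > 0xFFFF else 0

# LD HL,0x8000 ; LD BC,0x9000 ; ADD HL,BC  -> HL = 0x1000 with carry set
set_pair("H", "L", 0x8000)
set_pair("B", "C", 0x9000)
carry = add_hl("B", "C")

# The pair-to-pair copy trick: PUSH DE / POP BC copies DE into BC
set_pair("D", "E", 0x1234)
stack.append(get_pair("D", "E"))   # PUSH DE
set_pair("B", "C", stack.pop())    # POP BC
```

The point of the sketch is that the programmer's view is 16 bit end to end: one instruction, a 16 bit result, and a carry flag to branch on.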
The Z80 has a 16 bit instruction set. A pretty inconvenient one, in many ways, but the 8 bit instructions aren't exactly convenient to use either!