Are we talking about brand popularity (Microchip or Atmel), or about the capabilities of MCU families (that is, architectures: PIC/AVR) for the OP to choose from?
Wasn't I the one who suggested the Cortex-M architecture (not Microchip/Atmel, but perhaps STM) as the next step up in performance?
Engineering is not a popularity contest. Certainly, ARMs are popular, but does this make them more powerful?
As for the PIC16 architecture: it is certainly somewhat clumsy, but it has what you need in an MCU. For example, you can set a bit anywhere in data memory (within the currently selected bank) with a single instruction such as
bsf variable,0
Similarly, it can do increments, decrements, logical operations, etc. How does the all-powerful ARM fare against that? Here it is:
ldr r1,[r0]
orr r1,r1,#1
str r1,[r0]
This is three instructions instead of one - three times slower (given all cache hits) and three times more code space. Doesn't sound very powerful, does it? And I haven't even begun to discuss how the address of the variable got into r0.
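For readers who think in C rather than assembly, here is a minimal sketch of what a statement like "variable |= 1" amounts to on a load/store machine. The function name set_bit0 and the parameter name are made up for illustration; a real compiler may schedule or combine these steps differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sets bit 0 of a memory location the way a
 * load/store architecture has to do it, one step per instruction. */
static void set_bit0(volatile uint32_t *variable)
{
    assert(variable != NULL);
    uint32_t tmp = *variable;  /* ldr r1,[r0]   : copy memory into a register */
    tmp |= 1u;                 /* orr r1,r1,#1  : modify the register copy    */
    *variable = tmp;           /* str r1,[r0]   : write the copy back         */
}
```

The memory location is only updated by the last step; until then the new value exists solely in the temporary.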
But wait a minute, the PIC's operation was atomic, but with ARM we now have 3 instructions. If an interrupt happens between the load and the store and tries to modify other bits of the same variable ... Boom! Hopefully, your ARM has LDREX/STREX instructions. Then you can do:
again:
ldrex r1,[r0]
orr r1,r1,#1
strex r2,r1,[r0]
cmp r2,#0
bne again
It is now 5 instructions, and it takes much longer to execute, and longer still if an interrupt happens in the middle. It looks like a 48 MHz STM32 with an ARM core can hardly keep up with a PIC16 running at an 8 MHz instruction clock (32 MHz FOSC) when doing simple bit-setting. Does that sound extremely powerful to you?
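The lost update and its fix can be sketched in C as well. This is only an illustration under stated assumptions: fake_isr is a hypothetical stand-in for an interrupt handler, called by hand at the worst possible moment, and the function names are invented. The atomic version uses C11 atomic_fetch_or from <stdatomic.h>; on cores that have exclusive-access instructions a compiler will typically turn it into an LDREX/STREX retry loop like the one above, while on cores without them (ARMv6-M parts, as far as I know) it may fall back to a library call or you would mask interrupts yourself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an interrupt handler that sets bit 1. */
static void fake_isr(volatile uint32_t *flags)
{
    *flags |= 2u;
}

/* Non-atomic set of bit 0, with the "interrupt" landing between the
 * load and the store. Returns the final value of the variable. */
static uint32_t set_bit0_racy(volatile uint32_t *flags)
{
    assert(flags != NULL);
    uint32_t tmp = *flags;   /* ldr: take a private copy            */
    fake_isr(flags);         /* simulated interrupt sets bit 1 here */
    tmp |= 1u;               /* orr: modify the stale copy          */
    *flags = tmp;            /* str: write it back, bit 1 is lost   */
    return *flags;
}

/* Atomic set of bit 0 using C11 atomics. There is no window in which
 * an update made by someone else can be overwritten. */
static uint32_t set_bit0_atomic(atomic_uint *flags)
{
    assert(flags != NULL);
    atomic_fetch_or(flags, 1u);
    return atomic_load(flags);
}
```

Starting from zero, set_bit0_racy leaves the variable at 1 even though the simulated interrupt set bit 1 in the middle. With the atomic version, a bit set by anyone else before or after the call survives.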
Of course, this is only one particular operation, but it is not the only situation where PICs fare well. However, if you want to multiply a bunch of 32-bit numbers, as they do in benchmark tests, the STM32 will go way ahead. If you really want to do lots of 32-bit arithmetic or floating point, it would be silly to take a PIC16. This is something that the PIC16 cannot do. But for simple embedded tasks, the PIC16 is actually quite efficient.