There is no way on earth I would recommend learning assembly language for an ARM device as a first go at programming.
The basics of the ARM assembly languages are among the better ones. If you can find something with an ARM7TDMI then that's fairly simple to learn on. Hard to know whether A32 mode or T16 mode is better to start with, as they both have strong and weak points. But either is massively better than the complexity of ARMv7 / T32.
It's usually the hardware setup stuff that kills you on modern ARM SoCs.
Cortex M0, and the Pi Pico, are probably a good modern, easily obtained, place to start.
I would also caution against assembly language for mid-range PICs. They have very few instructions and extremely limited addressing formats, along with an extremely limited stack and no PUSH/POP opcodes. Bottom line: they are UGLY to program. Keeping track of paging and banking is a PITA!
A restricted ISA isn't a bad thing in itself. A beginner should mostly spend their time learning how to use the available tools, not learning what the tools are. They shouldn't be wondering if they've missed something, or if there is some cool construct/instruction that does exactly what they want but they just haven't found it yet. They shouldn't be wondering what is legal and what is illegal.
But PIC goes too far, and in a very awkward direction that doesn't fit modern programming practices.
A week or two ago I did an exercise where I reduced the already pretty slim (37 instructions) RV32I instruction set down to just 10 (ten) instructions that are still sufficient to write absolutely any program, or to compile C to. Except for SB/SH (which need more) each missing instruction can be replaced by 2-4 of the remaining instructions.
I hand-altered some compiled C code (e.g. my count primes benchmark) to use only the 10 instructions and found the code size increase was less than 30% and the execution time increase much less than that (as most of the extra code was preloading constants outside a loop, for use inside the loop).
The instructions:
LW, SW Rd, imm(Rb): load/store with register base and 12 bit signed offset
ADDI Rd, Rs, imm: add 12 bit signed immediate constant
ADD, NAND, SLL, SRA: Rd = Rs1 op Rs2 register to register ALU operations
JAL Rd,.+imm, JALR Rd,imm(Rb): unconditional branches
BLT Rs1,Rs2,.+imm: conditional branch
That's a really small instruction set to be both universal and to offer code size and speed close to real world instruction sets.
(NAND of course isn't a standard RV32I instruction, but it replaces AND, OR, and XOR. I originally included BLTU until I realised you can emulate it by simply adding 0x80000000 to both values and then using BLT.)
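As a rough illustration of the substitutions described above, here is a small C model. It is only a sketch: the helper names (nand32, sub32, bltu_emulated and so on) are made up for this example, and the sequences are the textbook ones, not necessarily those used in the linked primes.S. Each helper uses only operations that map onto NAND, ADD, ADDI and BLT.

```c
#include <stdint.h>

/* NAND: the one non-standard instruction in the subset. */
static uint32_t nand32(uint32_t a, uint32_t b) { return ~(a & b); }

/* NOT: 1 NAND. */
static uint32_t not32(uint32_t a) { return nand32(a, a); }

/* AND: 2 NANDs. */
static uint32_t and32(uint32_t a, uint32_t b) {
    uint32_t t = nand32(a, b);
    return nand32(t, t);
}

/* OR: 3 NANDs (De Morgan). */
static uint32_t or32(uint32_t a, uint32_t b) {
    return nand32(nand32(a, a), nand32(b, b));
}

/* XOR: 4 NANDs. */
static uint32_t xor32(uint32_t a, uint32_t b) {
    uint32_t t = nand32(a, b);
    return nand32(nand32(a, t), nand32(b, t));
}

/* SUB: a - b == a + (~b + 1), i.e. NAND, ADDI, ADD. */
static uint32_t sub32(uint32_t a, uint32_t b) {
    return a + (nand32(b, b) + 1u);
}

/* BLT models the signed compare of the one conditional branch.
 * The uint32_t -> int32_t conversion is implementation-defined in
 * older C standards; this assumes the usual two's complement result. */
static int blt(uint32_t a, uint32_t b) { return (int32_t)a < (int32_t)b; }

/* BLTU emulated by adding 0x80000000 to both operands, then BLT. */
static int bltu_emulated(uint32_t a, uint32_t b) {
    return blt(a + 0x80000000u, b + 0x80000000u);
}
```

Adding 0x80000000 flips the top bit, which maps the unsigned range 0..0xFFFFFFFF onto the signed range in the same order, so the signed compare gives the unsigned answer.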
Working 10-instruction subset assembly language source for my Primes benchmark is here:
https://hoult.org/primes.S (I didn't convert main(), just countPrimes())
The original C source is here:
https://hoult.org/primes.txt
The AVR ATmega128 is a reasonable architecture and assembly programming isn't overly difficult but why bother? C works very well with these ATmegas.
Yes, a good choice for first adventures in assembly language programming, up there with RISC-V and ARM and MSP430.
I think C working well with an ISA is a basic prerequisite for bothering with the ISA at all, these days. If a compiler can use it easily then probably so can a human.
The one thing about learning programming, whether C or asm, on a microcontroller is the difficulty of observing execution. If you can attach a debugger and single-step then that's good, otherwise it can be best to learn using an emulator with instruction-by-instruction trace output.
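To give an idea of what "an emulator with instruction-by-instruction trace output" can look like, here is a minimal sketch in C for the 10-instruction subset listed earlier. Everything about it is an assumption made for illustration: instructions are held in a pre-decoded struct (no real RISC-V binary encoding is modelled), data memory is a tiny word array, and the names Insn, Cpu, step, run and demo are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

typedef enum { LW, SW, ADDI, ADD, NAND, SLL, SRA, JAL, JALR, BLT } Op;

typedef struct {
    Op op;
    int rd, rs1, rs2;  /* register numbers 0..31; x0 stays zero */
    int32_t imm;       /* immediate or byte offset */
} Insn;

typedef struct {
    uint32_t pc;       /* byte address; instruction i is at 4*i */
    uint32_t x[32];
    uint32_t mem[64];  /* assumed tiny data memory, word addressed */
} Cpu;

static const char *const op_names[] = {
    "lw", "sw", "addi", "add", "nand", "sll", "sra", "jal", "jalr", "blt"
};

/* Execute one instruction and print one trace line.
 * Returns 0 once pc points outside the program. */
static int step(Cpu *c, const Insn *prog, uint32_t n) {
    uint32_t idx = c->pc / 4;
    if (c->pc % 4 != 0 || idx >= n) return 0;
    const Insn *i = &prog[idx];
    uint32_t a = c->x[i->rs1], b = c->x[i->rs2];
    uint32_t imm = (uint32_t)i->imm, s = b & 31;
    uint32_t next = c->pc + 4, result = 0;
    int writes = 1;
    switch (i->op) {
    /* Simplification: addresses wrap into the 64-word memory. */
    case LW:   result = c->mem[((a + imm) / 4) % 64]; break;
    case SW:   c->mem[((a + imm) / 4) % 64] = b; writes = 0; break;
    case ADDI: result = a + imm; break;
    case ADD:  result = a + b; break;
    case NAND: result = ~(a & b); break;
    case SLL:  result = a << s; break;
    case SRA:  /* arithmetic shift done portably on unsigned values */
        result = a >> s;
        if ((a >> 31) && s) result |= ~(0xFFFFFFFFu >> s);
        break;
    case JAL:  result = next; next = c->pc + imm; break;
    case JALR: result = next; next = (a + imm) & ~1u; break;
    case BLT:  /* signed compare via flipping the sign bits */
        writes = 0;
        if ((a ^ 0x80000000u) < (b ^ 0x80000000u)) next = c->pc + imm;
        break;
    }
    if (writes && i->rd != 0) c->x[i->rd] = result;
    printf("%04lx: %-4s rd=x%d rs1=x%d rs2=x%d imm=%ld -> x%d=%08lx next=%04lx\n",
           (unsigned long)c->pc, op_names[i->op], i->rd, i->rs1, i->rs2,
           (long)i->imm, i->rd, (unsigned long)c->x[i->rd], (unsigned long)next);
    c->pc = next;
    return 1;
}

/* Run until the program falls off the end or max_steps is reached. */
static unsigned run(Cpu *c, const Insn *prog, uint32_t n, unsigned max_steps) {
    unsigned steps = 0;
    while (steps < max_steps && step(c, prog, n)) steps++;
    return steps;
}

/* Demo: sum 1..5 into x3 (x1 = counter, x2 = limit). */
static const Insn demo[] = {
    { ADDI, 1, 0, 0, 0 },   /* 0x00: x1 = 0 */
    { ADDI, 2, 0, 0, 5 },   /* 0x04: x2 = 5 */
    { ADDI, 1, 1, 0, 1 },   /* 0x08: x1 = x1 + 1 */
    { ADD,  3, 3, 1, 0 },   /* 0x0c: x3 = x3 + x1 */
    { BLT,  0, 1, 2, -8 },  /* 0x10: if x1 < x2 goto 0x08 */
};
```

Calling run() on a zero-initialised Cpu with demo and its five instructions prints one line per executed instruction and leaves the sum in x3. Reading such a trace line by line is a workable substitute for single-stepping when no hardware debugger is attached.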
IBM used to guesstimate 100 lines of debugged code per day. Those 100 lines can do a little (assembly) or a lot (FORTRAN), take your pick.
The correct figure is ten. See "The Mythical Man-Month" from 1975, by the manager of the software when the IBM 360 was developed.