Why would anyone want to change that bootloader? What could it do differently, for better or worse, than the one it comes with?
Meh. If you're going to teach assembly language, you ought to at least teach (a subset of) a REAL assembly language.
I agree entirely.
I'm also thinking about a different binary coding of a real assembly language such as RISC-V, to simplify working in raw machine code while keeping the assembly language (and therefore compilers too) compatible.
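For anyone who hasn't looked at raw RISC-V machine code, here is a minimal Python sketch of how the standard RV32I R-type format packs its fields into a 32-bit word. The helper names (`encode_r_type`, `add`, `sub`) are my own hypothetical ones. A re-coded binary format like the one suggested would change this packing while leaving the assembly syntax alone.

```python
# Field layout of a standard RV32I R-type instruction, most significant bits first:
# funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]

OP = 0b0110011  # major opcode for register-register ALU operations


def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack the six R-type fields into one 32-bit instruction word."""
    fields = (("funct7", funct7, 7), ("rs2", rs2, 5), ("rs1", rs1, 5),
              ("funct3", funct3, 3), ("rd", rd, 5), ("opcode", opcode, 7))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def add(rd: int, rs1: int, rs2: int) -> int:
    """add rd, rs1, rs2 (funct7 = 0, funct3 = 0)."""
    return encode_r_type(0b0000000, rs2, rs1, 0b000, rd, OP)


def sub(rd: int, rs1: int, rs2: int) -> int:
    """sub rd, rs1, rs2 (same as add except funct7 = 0b0100000)."""
    return encode_r_type(0b0100000, rs2, rs1, 0b000, rd, OP)


print(f"add x1, x2, x3 -> 0x{add(1, 2, 3):08X}")
print(f"sub x1, x2, x3 -> 0x{sub(1, 2, 3):08X}")
```

Note how the register fields sit in the same bit positions for every instruction using them; that regularity is what makes decoding cheap in hardware, even if it's not the friendliest layout for hand-assembling hex.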
One of the tenets of the RISC movement might be stated as "it doesn't make sense to design a processor with an 'elegant' assembler/machine language; you should design the processor to execute code that compilers can emit easily." So modern CPUs tend to be a bit ugly.
I disagree. It's true the original designers of RISC said something along those lines, but it's not like they made things deliberately ugly or something.
Some early RISCs didn't have pipeline interlocks, which meant the programmer needed to think about how long each instruction would take and make sure anything using the result came enough instructions later that the result would be available in time. That's a bit of a chore for the programmer to keep track of, especially if it applies not only to multiply/divide and memory loads but also to add/subtract and and/or/xor. Two things quickly killed this idea: 1) enough transistors became available that pipeline interlocks could be added easily, and 2) CPUs got faster a lot more quickly than memory did, meaning load delays got a LOT longer, and also highly variable depending on whether the load hit in cache or not.
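To make the hazard concrete, here's a toy Python model (hypothetical, not any real CPU) of a one-instruction load delay, roughly in the spirit of the early MIPS load delay slot. Without interlocks, the instruction right after a load still sees the old register value unless a nop is placed in between; with interlocks, the hardware stalls and the program just works.

```python
def run(program, memory, interlocks):
    """Execute a tiny program and return the final register file.

    program: list of tuples, e.g. ("lw", rd, addr), ("add", rd, rs1, rs2), ("nop",)
    interlocks: if False, a loaded value only becomes visible after ONE more
    instruction has executed (a one-instruction load delay).
    """
    regs = [0] * 8
    pending = None  # (register, value) from the previous load, not yet visible
    for op, *args in program:
        loaded = None
        if op == "lw":
            rd, addr = args
            if interlocks:
                # Any later reader would stall until this is ready, so the
                # net effect is that the value is visible immediately.
                regs[rd] = memory[addr]
            else:
                loaded = (rd, memory[addr])
        elif op == "add":
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "nop":
            pass
        else:
            raise ValueError(f"unknown op {op!r}")
        # The load issued one instruction ago lands only now.
        if pending is not None:
            regs[pending[0]] = pending[1]
        pending = loaded
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs


memory = [21]
naive = [("lw", 1, 0), ("add", 2, 1, 1)]
padded = [("lw", 1, 0), ("nop",), ("add", 2, 1, 1)]

print(run(naive, memory, interlocks=False)[2])   # stale r1 is used
print(run(padded, memory, interlocks=False)[2])  # nop fills the delay slot
print(run(naive, memory, interlocks=True)[2])    # hardware stall hides the delay
```

Once load latency depends on whether you hit in cache, no fixed number of nops is right, which is the second reason given above.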
The other thing that happened was that some studies indicated that programmers could write a certain number of lines of code in a day, and that number didn't depend on whether they were writing in a high level language or in assembly language. So some designers (most famously DEC, with the VAX) decided to bring assembly language and machine language closer to a high level language. They made it possible to do things such as
a[x] = b[y] + c[z]
in a single instruction, and also added single instructions to do all the work of a function call or return: saving registers, adjusting stack frames, etc. x86 picked up some of the same philosophy too.
The RISC people said "That's nuts! We can compile
a[x] = b[y] + c[z]
into four simple instructions (or even seven or ten) and it runs just as fast or faster -- EVEN ON YOUR VAX".
The biggest problem with the RISC way is that the poor old programmer has to find registers to put all the intermediate results in. The VAX has to do that too, but it does it for you, using hidden registers the programmer doesn't know about and can't access.
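Here's a Python sketch (a simulation, not real assembler output) of a[x] = b[y] + c[z] broken into simple load/store-style steps. It assumes word-addressed memory and that a, b, c (base addresses) and x, y, z are already in registers; the mnemonics in the trace are illustrative and RISC-V-flavoured. Note the two extra temporaries, t0 and t1, that the programmer has to find room for.

```python
def compile_and_run(mem, a, b, c, x, y, z):
    """Execute a[x] = b[y] + c[z] as a sequence of simple load/store-style steps.

    mem is a flat, word-addressed memory (a list); a, b, c are base addresses
    and x, y, z are indices, all assumed to already sit in registers.
    Returns the list of steps performed.
    """
    regs = {"a": a, "b": b, "c": c, "x": x, "y": y, "z": z}
    trace = []

    def step(text, dest, value):
        # value is computed by the caller before this runs, like an ALU result
        regs[dest] = value
        trace.append(text)

    step("add t0, b, y", "t0", regs["b"] + regs["y"])      # address of b[y]
    step("lw  t0, 0(t0)", "t0", mem[regs["t0"]])           # load b[y]
    step("add t1, c, z", "t1", regs["c"] + regs["z"])      # address of c[z]
    step("lw  t1, 0(t1)", "t1", mem[regs["t1"]])           # load c[z]
    step("add t0, t0, t1", "t0", regs["t0"] + regs["t1"])  # the actual addition
    step("add t1, a, x", "t1", regs["a"] + regs["x"])      # reuse t1 for address of a[x]
    mem[regs["t1"]] = regs["t0"]                           # store the result
    trace.append("sw  t0, 0(t1)")
    return trace


memory = [0] * 16
memory[4 + 1] = 10  # b[1], with b based at address 4
memory[8 + 2] = 32  # c[2], with c based at address 8
for line in compile_and_run(memory, a=0, b=4, c=8, x=3, y=1, z=2):
    print(line)
print(memory[3])  # a[3], with a based at address 0
```

That's seven steps with word addressing. With byte addressing and 4-byte elements you'd add a shift for each index (ten), while a register-plus-shifted-register addressing mode like ARM's folds the address arithmetic into the two loads and the store, leaving four. Either way it needs eight live registers at the peak, which is exactly the problem with the PDP-11.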
If you want elegance, in the absence of the lovely PDP-11 (sigh), I think I'd recommend the MSP430. ARM or AVR wouldn't be awful either.
Agreed on all of those, with caveats, and indeed I suggested basically the same set of machines.
The PDP-11 doesn't have enough registers: effectively only six (or five if you don't want to save and restore the link register). In
a[x] = b[y] + c[z]
that's only enough to hold a, b, c, x, y, z but not the actual data! You need to keep at least some of them in the stack frame. Ugh. (32-bit x86 has the same problem, of course.)
The MSP430 is basically a PDP-11 with twice the registers and half the addressing modes. Usually a good trade-off.
I already discussed ARM and Thumb. Teaching a subset of Thumb-1 is, I think, a good choice, especially as it runs on a huge range of real hardware that people already have or can buy cheaply, from the Raspberry Pi to Android phones to every iPhone before the new models just announced (which I assume are, like iOS 11 everywhere, AArch64-only). For example, start with just instruction formats 2, 3, 9, 11, 13, 14, 16, 18 and 19 (i.e. slightly less than half of them) from page 2 here:
https://ece.uwaterloo.ca/~ece222/ARM/ARM7-TDMI-manual-pt3.pdf
That's still a lot more complex than RISC-V's effectively four instruction formats.
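The RISC-V spec itself describes four core formats (R, I, S, U) plus two variants (B and J) that keep the register fields of S and U and differ only in how the immediate bits are arranged. Here's a minimal Python sketch classifying base RV32I instructions by their major opcode (bits 6:0). The table and function name are mine, and it ignores the compressed and other extensions.

```python
# Base RV32I major opcodes (bits 6:0) and the format each one uses.
OPCODE_FORMAT = {
    0b0110011: "R",  # OP: add, sub, and, or, xor, shifts, slt...
    0b0010011: "I",  # OP-IMM: addi, andi, slli...
    0b0000011: "I",  # LOAD: lb, lh, lw, lbu, lhu
    0b1100111: "I",  # JALR
    0b0001111: "I",  # MISC-MEM: fence
    0b1110011: "I",  # SYSTEM: ecall, ebreak
    0b0100011: "S",  # STORE: sb, sh, sw
    0b1100011: "B",  # BRANCH: beq, bne, blt...
    0b0110111: "U",  # LUI
    0b0010111: "U",  # AUIPC
    0b1101111: "J",  # JAL
}

# B is S with the immediate bits shuffled; J is U with the immediate bits shuffled.
CORE_FORMAT = {"R": "R", "I": "I", "S": "S", "B": "S", "U": "U", "J": "U"}


def instruction_format(word: int, merge_variants: bool = True) -> str:
    """Return the format of a 32-bit RV32I instruction word.

    With merge_variants=True, B and J are reported as their core formats S and U.
    """
    opcode = word & 0x7F
    fmt = OPCODE_FORMAT.get(opcode)
    if fmt is None:
        raise ValueError(f"opcode 0x{opcode:02X} is not in base RV32I")
    return CORE_FORMAT[fmt] if merge_variants else fmt


for word in (0x003100B3, 0x00500093, 0x00000063, 0x0000006F):
    print(f"0x{word:08X}: {instruction_format(word, merge_variants=False)}"
          f" -> core {instruction_format(word)}")
```

Eleven opcodes and a handful of field layouts covers the whole base integer ISA, which is a big part of why it's easier to teach than even a subset of Thumb.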
AVR is good. Lots of registers. Simple instructions. The main problems are that it's only 8 bits (which makes dealing with useful-sized numbers, or even pointers, a pain) and the fiddly support for using multiple pointers/arrays at the same time. But the hardware is very cheap and easily available, with excellent tool support.
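As an illustration of the 8-bit pain: adding two 16-bit numbers on AVR takes an ADD on the low bytes followed by an ADC (add with carry) on the high bytes. Here's a small Python model of that pair; the register pairs named in the comments (r25:r24, r23:r22) are just an illustrative choice, not the only way to allocate them.

```python
def add8(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Model an 8-bit ALU add: return (8-bit result, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16(x: int, y: int) -> int:
    """Add two 16-bit values one byte at a time, as an 8-bit CPU must.

    Mirrors the AVR pattern ADD (low bytes) then ADC (high bytes plus carry).
    The result wraps modulo 2**16, just as it would in a register pair.
    """
    lo, carry = add8(x & 0xFF, y & 0xFF)     # ADD r24, r22
    hi, _ = add8(x >> 8, y >> 8, carry)      # ADC r25, r23
    return (hi << 8) | lo


print(hex(add16(0x00FF, 0x0001)))  # carry ripples from low byte into high byte
print(add16(1234, 4321))
```

Every 16-bit pointer increment or comparison needs the same two-step treatment, which is where the extra instructions (and bugs) pile up when writing AVR code by hand.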