These days, processors are usually so fast that a 10x (or even 100x) slowdown is not necessarily a show-stopper. Hence the popularity of relatively inefficient scripting/interpreted languages such as Python.
But the 6502 was relatively slow and weak, so a 10x speed loss would be quite devastating, especially on top of a BASIC interpreter, which already slows things down by a factor of 100 or more compared to hand-crafted, well-optimised machine code (assembly language).
What "sped up" the 6502, in a sense, was the lack of hardware floating point on home computers of that era in general: floating point took so long in software (on a typical 1 MHz 6502, or an equivalent CPU such as the Z80) that the relative slowness of a BASIC interpreter didn't really matter that much.
Which is partly why SWEET16, a bytecode-interpreted virtual 16-bit processor, was created for the Apple II by Steve Wozniak. It somewhat overcomes the 6502's limitations.
https://en.wikipedia.org/wiki/SWEET16
"runs at about one-tenth the speed of the equivalent native 6502 code", see?
It was removed quite soon, with the AUTOSTART ROM, in 1978 (79?) IIRC. Not that anybody was using it anyway. The big loss, if you ask me, was the mini-assembler (F666G), which also disappeared with that ROM "upgrade".
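To make the "one-tenth the speed" figure concrete: an interpreter like SWEET16 has to fetch, decode and dispatch every virtual instruction in 6502 code before it does the actual 16-bit work. Below is a toy interpreter in that spirit, written in Python for readability, with 16 virtual 16-bit registers and R0 as the accumulator. The opcode layout (high nibble = operation, low nibble = register) is a simplified subset loosely modelled on SWEET16's register instructions; treat the encoding as illustrative, not as a faithful copy of Wozniak's code.

```python
# Toy register-machine bytecode interpreter in the spirit of SWEET16.
# ASSUMPTION: the opcode values below are a simplified, illustrative subset;
# this is not a reimplementation of the real SWEET16.

MASK16 = 0xFFFF

# High nibble = operation, low nibble = register number n
SET, LD, ST, ADD, SUB, INR, DCR = 0x1, 0x2, 0x3, 0xA, 0xB, 0xE, 0xF
RTN = 0x00  # leave the interpreter and return to native code


def run(code: bytes) -> list[int]:
    """Interpret `code` and return the 16 virtual registers (R0 = accumulator)."""
    regs = [0] * 16
    pc = 0
    while True:
        opcode = code[pc]          # fetch
        pc += 1
        if opcode == RTN:
            return regs
        op, n = opcode >> 4, opcode & 0x0F   # decode
        if op == SET:              # SET Rn, 16-bit little-endian constant
            regs[n] = code[pc] | (code[pc + 1] << 8)
            pc += 2
        elif op == LD:             # R0 <- Rn
            regs[0] = regs[n]
        elif op == ST:             # Rn <- R0
            regs[n] = regs[0]
        elif op == ADD:            # R0 <- R0 + Rn (mod 2^16)
            regs[0] = (regs[0] + regs[n]) & MASK16
        elif op == SUB:            # R0 <- R0 - Rn (mod 2^16)
            regs[0] = (regs[0] - regs[n]) & MASK16
        elif op == INR:            # Rn <- Rn + 1
            regs[n] = (regs[n] + 1) & MASK16
        elif op == DCR:            # Rn <- Rn - 1
            regs[n] = (regs[n] - 1) & MASK16
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at {pc - 1}")


# R1 = 1000; R2 = 2345; R0 = R1 + R2; R3 = R0
program = bytes([
    0x11, 0xE8, 0x03,   # SET R1, $03E8 (1000)
    0x12, 0x29, 0x09,   # SET R2, $0929 (2345)
    0x21,               # LD  R1
    0xA2,               # ADD R2
    0x33,               # ST  R3
    0x00,               # RTN
])
print(run(program)[3])  # 3345
```

Every virtual instruction pays for the fetch and the dispatch before any useful work happens; on a real 6502 that overhead is paid in native instructions, which is roughly where the large slowdown relative to hand-written code comes from.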
I liked and used SWEET16 and included a copy in my own programs. It was just over 300 bytes of code.
You're getting very far from "the 6502 is faster here". You're at "a particular well-equipped 6502 computer is faster than a particular poorly-equipped AVR computer".
I'm pretty sure I can write a 6502 emulator for the AVR that will run faster than a real 6502, using that external SRAM interface.
Writing emulators, JITs and compilers is my job and specialty.
The 6502 is indeed not suited to any language that requires 16- or 32-bit variables, or functions that must work when called recursively.
My code generation scheme is 7 bytes and runs in 44 clock cycles. (ldx #REGA; ldy #REGB; jsr ADD16). X and Y are not modified by ADD16 so if the previous or next operations use A or B then those registers don't need to be reloaded.
I liked and used SWEET16 and included a copy in my own programs. It was just over 300 bytes of code.
More or less, what year was that?
At the time I didn't even know what it was for :-) (*). Shortly after the mini-assembler was gone, I had to buy another assembler; I got Mike Westerfield's ORCA/M and learned to do lots of things with macros.
(*) The Red Book had a listing, but in 1978 it was still too early for me to understand what the point was.
My code generation scheme is 7 bytes and runs in 44 clock cycles. (ldx #REGA; ldy #REGB; jsr ADD16). X and Y are not modified by ADD16 so if the previous or next operations use A or B then those registers don't need to be reloaded.
That sounds very impressive and clever.
Using the X/Y register pair as 'pretend' accumulators A and B (the 6502's single accumulator limits/hinders its assembly language) is a neat trick.
But the 6502 was relatively slow and weak, so a 10x speed loss would be quite devastating, especially on top of a BASIC interpreter, which already slows things down by a factor of 100 or more compared to hand-crafted, well-optimised machine code (assembly language).
I don't know how you'd get a speed loss of 10x ON TOP OF the speed loss of BASIC. I mean... OK... you could write a bytecode interpreter in BASIC.
I also wrote a (very partial) VAX emulator for the Apple ][ (in assembly language) during the summer holidays. Don't even ask about the execution speed :-) :-)
That sounds very impressive and clever.
Using the X/Y register pair as 'pretend' accumulators A and B (the 6502's single accumulator limits/hinders its assembly language) is a neat trick.
In this scheme, X and Y are not accumulators, but pointers to the accumulators, which are groups of 2 or 4 bytes located anywhere in Zero Page.
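ADD16:          ; (X) += (Y): 16-bit little-endian, both operands in zero page
clc             ; no carry into the low byte
lda $0000,y     ; low byte of B (absolute,Y: LDA has no zero page,Y mode)
adc $00,x       ; + low byte of A
sta $00,x       ; store low byte of A
lda $0001,y     ; high byte of B
adc $01,x       ; + high byte of A + carry from the low byte
sta $01,x       ; store high byte of A
rts             ; X and Y are left unchanged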
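As a cross-check on the 7-byte / 44-cycle figure, here is a tally of the call sequence and of the ADD16 listing above, using the commonly published NMOS 6502 byte and cycle counts. It assumes no page-crossing penalty on LDA abs,Y, which holds as long as Y <= $FE so that $0001,Y stays in zero page. By this count the listing as shown comes to 42 cycles (10 for the call, 32 inside ADD16), so the quoted 44 may refer to a slightly different variant of the routine. The inline zero-page version is my own comparison, not something from the thread.

```python
# Byte/cycle tally using the commonly published NMOS 6502 timings.
# Assumption: no page-crossing penalty on LDA abs,Y (holds for Y <= $FE).

CALL_SEQUENCE = [          # per call site: ldx #REGA ; ldy #REGB ; jsr ADD16
    ("LDX #imm", 2, 2),
    ("LDY #imm", 2, 2),
    ("JSR abs", 3, 6),
]

ADD16_BODY = [             # the shared subroutine, stored once
    ("CLC", 1, 2),
    ("LDA abs,Y", 3, 4),
    ("ADC zp,X", 2, 4),
    ("STA zp,X", 2, 4),
    ("LDA abs,Y", 3, 4),
    ("ADC zp,X", 2, 4),
    ("STA zp,X", 2, 4),
    ("RTS", 1, 6),
]

# Comparison only: the same add written inline with fixed zero-page
# addresses, e.g. clc ; lda $87 ; adc $05 ; sta $05 ; lda $88 ; adc $06 ; sta $06
INLINE_ADD16 = [
    ("CLC", 1, 2),
    ("LDA zp", 2, 3),
    ("ADC zp", 2, 3),
    ("STA zp", 2, 3),
    ("LDA zp", 2, 3),
    ("ADC zp", 2, 3),
    ("STA zp", 2, 3),
]


def tally(instructions):
    """Return (total bytes, total cycles) for a list of (name, bytes, cycles)."""
    return (sum(size for _, size, _ in instructions),
            sum(cycles for _, _, cycles in instructions))


call_bytes, call_cycles = tally(CALL_SEQUENCE)
body_bytes, body_cycles = tally(ADD16_BODY)
inline_bytes, inline_cycles = tally(INLINE_ADD16)
print(f"call site: {call_bytes} bytes, {call_cycles + body_cycles} cycles per add")
print(f"ADD16 itself: {body_bytes} bytes, shared by all call sites")
print(f"inline: {inline_bytes} bytes, {inline_cycles} cycles per add")
```

So each call site costs 7 bytes instead of 13, at roughly twice the cycles; with the 16-byte routine counted once, the subroutine scheme is smaller from the third call site on (16 + 7n < 13n for n >= 3), and the saving grows further when consecutive operations can skip reloading X or Y.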
If variable A is in locations $05 and $06 and B is in locations $87 and $88 then you do A += B with:
ldx #$05
ldy #$87
jsr ADD16
You can have up to 128 such 16-bit variables, or 64 32-bit variables, since zero page is 256 bytes.
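ADD32:          ; (X) += (Y): 32-bit version, same pattern over 4 bytes
clc             ; no carry into byte 0
lda $0000,y     ; byte 0 (least significant)
adc $00,x
sta $00,x
lda $0001,y     ; byte 1, carry in from byte 0
adc $01,x
sta $01,x
lda $0002,y     ; byte 2
adc $02,x
sta $02,x
lda $0003,y     ; byte 3 (most significant)
adc $03,x
sta $03,x
rts             ; X and Y are left unchanged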
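What ADD16 and ADD32 compute can be stated compactly: (X) += (Y) over 2 or 4 little-endian bytes in zero page, with the carry flag rippling from one byte to the next. Below is a Python model of that, with zero page as a 256-byte array. The names add_n, read_le and write_le are made up for this sketch, and it assumes neither variable straddles $FF (real zp,X addressing would wrap around there).

```python
def add_n(zp: bytearray, x: int, y: int, nbytes: int) -> int:
    """zp[x:x+n] += zp[y:y+n] (little-endian); return the final carry flag.

    Mirrors clc / (lda src,y ; adc dst,x ; sta dst,x) repeated nbytes times.
    Assumes neither operand straddles the end of zero page.
    """
    carry = 0                                    # clc
    for i in range(nbytes):
        total = zp[y + i] + zp[x + i] + carry    # lda $000i,y ; adc $0i,x
        zp[x + i] = total & 0xFF                 # sta $0i,x
        carry = total >> 8                       # carry into the next byte
    return carry


def read_le(zp: bytearray, addr: int, nbytes: int) -> int:
    return int.from_bytes(zp[addr:addr + nbytes], "little")


def write_le(zp: bytearray, addr: int, nbytes: int, value: int) -> None:
    zp[addr:addr + nbytes] = value.to_bytes(nbytes, "little")


zp = bytearray(256)
write_le(zp, 0x05, 2, 1234)      # variable A at $05/$06
write_le(zp, 0x87, 2, 4321)      # variable B at $87/$88
add_n(zp, 0x05, 0x87, 2)         # ldx #$05 ; ldy #$87 ; jsr ADD16
print(read_le(zp, 0x05, 2))      # 5555
```

The same function with nbytes=4 models ADD32; a 32-bit add of $FFFFFFFF + 1 leaves 0 in the destination and the carry set, just as the final ADC would.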
I also wrote a (very partial) VAX emulator for the Apple ][ (in assembly language) during the summer holidays. Don't even ask about the execution speed :-) :-)
The question almost asks itself:
Did you try your 6502 emulator on the VAX emulator?
Many years later, I learned from a co-worker, who had matched my name with his high-school memories of CS classes, that the emulator had been used in many schools for a long time.
Then even later (I'm not sure how many iterations this has gone on for in practice?), an even newer release of the calculator comes out with a somewhat fast, modern Arm core on it, which emulates the previous CPU, which itself emulates an even older (original) CPU, in order to be the calculator.
Well, yes, sure, you can nest emulators.
It's quite ok to do this because the original machine was so much slower than the new ones, but the program was carefully written to run satisfactorily on it.
I may or may not have run "][ in a Mac" on "Basilisk II" (compiled for PowerPC) on the built-in "Rosetta" PowerPC emulator on an Intel Mac, because I could.
And that's fine if you have all those accumulated over the years.
At some point it becomes easier just to write a 6502 emulator for that Intel machine and skip the intermediate 68000 and PowerPC emulations.
Apple's new ARM Macs will have an Intel emulator, but that emulator won't run Rosetta inside it -- Rosetta hasn't been supported for many years already. So I'd need to find or write my own PowerPC emulator. It's *definitely* much easier to write a 6502 emulator than a PowerPC emulator.
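To put a rough size on "much easier": a fetch/decode/execute loop for a handful of 6502 opcodes fits in a few dozen lines. This sketch implements only what the ldx/ldy/jsr ADD16 example from earlier in the thread needs (CLC, LDA abs,Y, ADC zp,X, STA zp,X, LDX/LDY immediate, JSR, RTS). It treats BRK as "halt" (a convention of this sketch, not real 6502 behaviour), ignores decimal mode and cycle timing, and loads the code at $0200 and $0300, addresses picked arbitrarily for illustration.

```python
# Tiny 6502 core: just enough opcodes to run ADD16. BRK = halt (sketch only).
# Binary-mode ADC only (decimal flag not modelled). Not cycle-accurate.


class CPU:
    def __init__(self, image: dict[int, bytes]) -> None:
        self.mem = bytearray(0x10000)
        for addr, data in image.items():          # load code and data
            self.mem[addr:addr + len(data)] = data
        self.a = self.x = self.y = 0
        self.pc = 0
        self.sp = 0xFF                            # stack lives at $0100-$01FF
        self.c = self.z = self.n = self.v = 0

    def fetch(self) -> int:
        value = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def fetch_word(self) -> int:
        low = self.fetch()
        return low | (self.fetch() << 8)          # little-endian operand

    def push(self, value: int) -> None:
        self.mem[0x100 + self.sp] = value
        self.sp = (self.sp - 1) & 0xFF

    def pull(self) -> int:
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x100 + self.sp]

    def set_nz(self, value: int) -> None:
        self.z = int(value == 0)
        self.n = value >> 7

    def run(self, start: int) -> None:
        self.pc = start
        while True:
            op = self.fetch()
            if op == 0x00:                        # BRK: halt (sketch convention)
                return
            elif op == 0x18:                      # CLC
                self.c = 0
            elif op == 0xA2:                      # LDX #imm
                self.x = self.fetch()
                self.set_nz(self.x)
            elif op == 0xA0:                      # LDY #imm
                self.y = self.fetch()
                self.set_nz(self.y)
            elif op == 0xB9:                      # LDA abs,Y
                self.a = self.mem[(self.fetch_word() + self.y) & 0xFFFF]
                self.set_nz(self.a)
            elif op == 0x75:                      # ADC zp,X (wraps inside zero page)
                m = self.mem[(self.fetch() + self.x) & 0xFF]
                total = self.a + m + self.c
                result = total & 0xFF
                self.v = int(((self.a ^ result) & (m ^ result) & 0x80) != 0)
                self.c = total >> 8
                self.a = result
                self.set_nz(result)
            elif op == 0x95:                      # STA zp,X
                self.mem[(self.fetch() + self.x) & 0xFF] = self.a
            elif op == 0x20:                      # JSR abs: push return address - 1
                target = self.fetch_word()
                ret = (self.pc - 1) & 0xFFFF
                self.push(ret >> 8)
                self.push(ret & 0xFF)
                self.pc = target
            elif op == 0x60:                      # RTS: pull address, add 1
                low = self.pull()
                high = self.pull()
                self.pc = ((high << 8 | low) + 1) & 0xFFFF
            else:
                raise NotImplementedError(f"opcode ${op:02X} at ${(self.pc - 1) & 0xFFFF:04X}")


ADD16 = bytes([0x18,                 # clc
               0xB9, 0x00, 0x00,     # lda $0000,y
               0x75, 0x00,           # adc $00,x
               0x95, 0x00,           # sta $00,x
               0xB9, 0x01, 0x00,     # lda $0001,y
               0x75, 0x01,           # adc $01,x
               0x95, 0x01,           # sta $01,x
               0x60])                # rts
MAIN = bytes([0xA2, 0x05,            # ldx #$05
              0xA0, 0x87,            # ldy #$87
              0x20, 0x00, 0x03,      # jsr $0300 (ADD16)
              0x00])                 # brk (halt)
IMAGE = {0x0200: MAIN, 0x0300: ADD16,
         0x0005: (1234).to_bytes(2, "little"),   # A at $05/$06
         0x0087: (4321).to_bytes(2, "little")}   # B at $87/$88

cpu = CPU(IMAGE)
cpu.run(0x0200)
print(cpu.mem[0x05] | cpu.mem[0x06] << 8, hex(cpu.x), hex(cpu.y))  # 5555 0x5 0x87
```

The final print also confirms the point made about the scheme: X and Y still hold $05 and $87 after the call, so a following operation on the same variables would not need to reload them.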
In some respects the 6502 was an early RISC processor, even though technically speaking it was a CISC one.
It's neither. It's more of a minimal instruction set computer (I can find a video of Sophie Wilson saying so), much like the PIC, 8051, DEC PDP-8, Data General Nova and others among the earliest microcomputers and minicomputers.
[That didn't really happen the same way with mainframes (aka "computers" from 1940 to 1965), because the early ones were built by or for "money is no object!" government organizations modelling atomic explosions or whatever. It was only in the early minicomputer and microcomputer eras that "any computer is better than no computer" was the rule. And for some IBM machines such as the 1130 or the bottom-end System/360 models too, I guess.]
There were some similar microprocessors of a similar era which might have beaten it there, at least in some cost-optimisation senses. E.g. the Z80 had a half-sized 4-bit ALU, which it could fit in time-wise because of its relatively large number of clock cycles per instruction.
Also, Motorola did the extremely cost-reduced MC6804P2, 8-bit to the outside world yet 1-bit (i.e. bit-serial) internally, which Hitachi and maybe others second-sourced.
I knew about the Z80 4 bit ALU. I didn't know about the MC6804P2.
The smallest FPGA implementation of RISC-V, Olof Kindgren's "SERV", is bit-serial. Of course this makes it very slow, with most instructions taking 32 clock cycles (except jumps, load/store, SLT and shifts), but it seems to be popular. It runs at 50 MHz in an iCE40 and 220 MHz on an Artix-7, so that's still around 1 to 7 32-bit MIPS: https://github.com/olofk/serv
(I enjoyed the video about it by the designer.)
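The bit-serial parts (MC6804P2, SERV) and the Z80's 4-bit ALU make the same trade at different widths: a narrow adder is reused over several steps, with the carry held between steps. A minimal Python sketch of that digit-serial addition follows; serial_add is a name made up here, and it counts one step per digit while ignoring any fixed per-instruction overhead real hardware has. With digit_bits=1 a 32-bit add takes 32 steps, consistent with SERV's 32 cycles for most instructions, and 50 MHz / 32 ≈ 1.6 MIPS and 220 MHz / 32 ≈ 6.9 MIPS, in line with the "around 1 to 7" estimate.

```python
def serial_add(a: int, b: int, word_bits: int, digit_bits: int) -> tuple[int, int, int]:
    """Add two word_bits-wide values digit_bits at a time.

    Returns (sum mod 2**word_bits, carry_out, steps). digit_bits=1 models a
    bit-serial ALU; digit_bits=4 models a nibble-wide ALU doing an 8-bit add
    in two passes. One loop iteration = one step through the narrow adder.
    """
    if word_bits % digit_bits:
        raise ValueError("word_bits must be a multiple of digit_bits")
    digit_mask = (1 << digit_bits) - 1
    carry = 0                                # the one-bit carry "flip-flop"
    result = 0
    steps = 0
    for shift in range(0, word_bits, digit_bits):
        da = (a >> shift) & digit_mask       # next digit of each operand,
        db = (b >> shift) & digit_mask       # least significant first
        total = da + db + carry              # a digit_bits-wide adder
        result |= (total & digit_mask) << shift
        carry = total >> digit_bits          # saved for the next step
        steps += 1
    return result, carry, steps


print(serial_add(0x12345678, 0x0FEDCBA8, 32, 1))  # bit-serial: 32 steps
print(serial_add(0xF7, 0x19, 8, 4))               # nibble-wide: 2 steps
```

The narrower the adder, the less logic it needs and the more steps each add takes; that is the cost-versus-speed knob both the MC6804P2 and SERV turn all the way down.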
You use a modern, powerful Arm processor, with its potentially highly complicated peripheral set. Amazingly powerful peripherals, yes. But they can have 2,000-page manuals, which make for very heavy reading.
If what you're after is the equivalent of an '80s 8-bit micro, you can ignore most of those 2,000 pages and still have the benefit of more memory and cheaper boards... (plus, you know, several "beginner environments" to fall back on, should you want to do a printf() without having to read the section on the UART, which requires understanding the clock system and the power managers and ...)
Being able to do full-system, full-speed emulation of old machines in freaking JAVASCRIPT is, I think, one of the most ridiculous aspects of current PCs (and phones).