Just thought my opinion was interesting. Wasn't it? Ok. I'm out too.
If not already mentioned, consider the AVR Mega2560.
You can configure it to have 32K of external address/data bus space, because it has so many pins available.
It is 8 bit, and has an instruction set somewhat similar to early 8 bit processors.
It can go fairly fast if you want; I'd have to look it up, but something like 16 MHz (compared to early 8 bit CPUs such as the 6502, which originally ran at only 1 MHz).
It is a modern and very available chip.
It has lots of built in peripheral I/O devices, a lot more than the usual arduino offerings.
It is a fully fledged Arduino member, with the information sources and ready-built boards/hardware availability.
But you can buy just the chip.
Quite a bit of onboard flash.
Ohhh .. I hadn't realized that! And I own a few of them. Really should read the datasheet :-)
You can configure [atmega2560] to have, 32K of external address/data bus space
So, there used to be a 32K RAM expansion board that you could buy, but I heard they stopped selling it.
Not sure if the following link is the same or a different version:
https://hackaday.io/project/21561-arduino-mega-2560-32kb-ram-shield
But anyway, you can still buy relatively blank I/O shields for the Arduino Mega2560, for not much money (a few dollars, if I remember correctly), from China. These should allow you to build up your own 32K (or partial 64K; see the nice post about it above this one) board.
The cool feature of the 8 bit AVR at 16 MHz is that it is genuinely slow enough to just about count as a vintage-like, old-school CPU, combined with an 8 bit instruction set that wouldn't look too out of place if it came from the 1970s or 1980s.
There are some harsh facts, though:
- These chips, by themselves, are more expensive than newer, faster ARM or PIC32 chips with much more internal RAM (and the ability to execute from RAM).
- Using them with external RAM is outside of "normal". You will need to reconfigure the linker scripts and possibly the compiler, and the amount of help and example code you can expect to find "on the web" is small.
You can configure [atmega2560] to have, 32K of external address/data bus space
Not for storing instructions, though.
https://www.rugged-circuits.com/new-products/quadram
XMEGA chips support up to 16MB of external RAM (as well as it can be supported by an 8bit architecture with "natively 16bit" addresses. Bank registers. Eww!) The ATXMEGAA1U Xplained Pro Eval board has an Xmega with 512k bytes of external RAM...
I always figured that when I need to start messing with banking registers, it's time for a different (probably 32 bit) architecture.
Really? In what respect?
Since quantum mechanics, mathematics has taken more interest in differential algebra, tensor algebra, and so on.
These are the tools modern physics uses to describe new ideas, and computer science reflects part of this "new hype" with deep-learning AI.
For example, the Google Tensor processor uses rather modern math, both for training operators and for so-called "autonomous" machines; universities have new courses, and laboratories propose new challenges.
Just 20 years ago, if you had written a thesis on AI, you would have discussed a "Lisp-based machine".
Ok, how does this preclude the use of C on the majority of applications that don't require such heavy maths?
Not trying to offend you, but the speed thing is not especially relevant (if aiming for an old retro, vintage, slow type of computer). The idea is to make something broadly similar to a 1 MHz 6502 or 4 MHz Z80 system, so the (mostly) single-cycle instruction execution time at 16 MHz is actually something like 30x faster than those old/original processors, and there are many more registers.
Isn't CUDA a sort of dialect of C?
32x faster running NOP. Or INX, LDA # etc. But the average 6502 instruction execution time is probably at least 3 cycles -- the time to load / store / add / compare etc something from Zero Page. With roughly equal numbers of 4+ cycle instructions and 2 cycle instructions mixed in. So that makes 50x a better estimate. But also you need more instructions on 6502, even when dealing with 8 bit variables. If you've got more than 3 variables in a function and do A = B + C then you're looking at 2 instructions and 2 clock cycles on AVR (whether 8 bit or 16 bit) but 4 instructions and 11 clock cycles on 6502 for 8 bit, and 7 instructions and 20 clock cycles for 16 bit. So you can easily start to see 5 to 10 times more clock cycles on 6502 than AVR before even taking the 16 MHz vs 1 MHz into account.
I reckon 100x would be a good estimate of the speed ratio for skillful hand-written assembly language, 200x for compiled C.
Oh, sorry, I meant OpenCL not CUDA:
https://developer.apple.com/library/archive/documentation/Performance/Conceptual/OpenCL_MacProgGuide/Introduction/Introduction.html
Attempting to use 'standard' units, and measurements independent of us: that would be the 'MIPS' then. Because they are both 8 bit, the data size ambiguity doesn't matter so much.
6502 = 0.430 MIPS at 1.000 MHz, source: https://en.wikipedia.org/wiki/Instructions_per_second
Mega2560 = up to 1 MIPS per MHz, i.e. an 8 MHz processor can achieve up to 8 MIPS, hence 16 MIPS @ 16 MHz.
Source: https://en.wikipedia.org/wiki/AVR_microcontrollers
So that gives 16 / 0.430 = x37.21 times faster.
There isn't really an answer as such, because it varies so much with what exactly you are trying to do (the program), how well (efficiently) it is implemented, and how good the compilers are (if not assembler). There could be other factors as well, such as exactly how you measure it (e.g. what the test data is; some data patterns may favour one CPU over the other, etc).
It isn't only the MIPS: you've got only three (true) registers in the 6502 (ignoring zero page), and they're only 8 bits each. That's quite a handicap that makes simple things unnecessarily complicated, and too often turns one-liners into a multi-line register load-store-swap ugly mess.
As I said before, and I stand by it, 100x for carefully hand-written code, and 200x for compiled C code. (Dhrystone shows 250x)
Which is partly why SWEET16, a semi-bytecode-like 16 bit virtual machine, was created in/for the Apple by Steve Wozniak. It somewhat overcomes the 6502's limitations.
https://en.wikipedia.org/wiki/SWEET16
"runs at about one-tenth the speed of the equivalent native 6502 code", see?
It was removed very soon, with the AUTOSTART ROM, in 1978 (79?) IIRC. Not that anybody was using it anyways. The big loss if you ask me was the mini-assembler (F666G), also gone with that ROM "upgrade".
There was no floating point at all in the Apple ][, it came with the monitor, a 6502 mini-assembler, and Woz's Integer Basic. FP came years later ("Applesoft") with the II Plus.
The Apple II became one of several recognizable and successful computers during the 1980s and early 1990s, although this was mainly limited to the USA.
In real life though, the 6502 can be, let's say, x25 faster than the Mega2560, as regards hobby projects.
Assumptions:
The 6502 system has hardware acceleration (video/sound), and its code is hand-crafted machine code.
The Mega2560 has no hardware enhancements, and all its code runs via a poorly written BASIC interpreter someone found on the internet, which is especially slow. (Not to be confused with cheating to make a POINT on a forum.)
Here are a few examples:
You use an old-era 6502-based home computer, such as a Commodore 64 or Atari 800, or possibly another similar computer available at the same time.
The hobby project uses a simple, self-designed, memory-mapped video card interfaced to the Mega2560.
But the old-era home computer has hardware sprite chips and sound chips, potentially greatly speeding up games from that era.
But the Mega2560 doesn't.
Also, you compare an era-correct 6502 chess program, written by expert(s) in hand-crafted machine code, with the hobbyist's BASIC-interpreter version of a chess program on the Mega2560.
Again, the 6502 may have a x25 speed advantage.
Arguably, the 6502 is not really suited to C compilers, so your x200 is really just a way of saying its architecture is not well suited to C compilers. Whereas the Mega2560's rich set of registers and at least somewhat orthogonal instruction set make it well suited to C compilers.