Author Topic: Superscalar 68000, have you seen the Apollo core ? What do you think about ? (Read 43194 times)

legacy · « **on:** December 24, 2014, 01:51:08 pm »

Quote

APOLLO CPU

The Apollo CPU is a modern CISC CPU design. Apollo is code compatible with the Motorola M68K and ColdFire families. The CPU is fully pipelined and superscalar supporting execution of up to 2 integer instructions per cycle. The CPU features two address calculation engines and two integer execution engines.

The size efficient variable length instruction encoding provides market leading code density and optimal cache utilization.

The CPU features a full internal Harvard architecture with separate multiway data and instruction caches. The instruction and data caches are designed to support concurrent instruction fetch, operand read, and operand write references on every clock. The operand data cache permits simultaneous read and write access each clock. The caches come with write combining, as well as memory stream detection and automatic memory prefetching. The combination of these features enables the core to be very efficient in memory and data manipulation tasks.

The branch prediction and branch folding makes the core ideally suited for execution of control flow code.

Optionally, a fully pipelined, double precision FPU is available to be included in the Core.

The Core is fully written in VHDL and can also be synthesized to be used in an FPGA device. When synthesized in an FPGA, the core offers a good combination of moderate FPGA space consumption and excellent performance. The core can reach up to 200 MHz / 400 Mips in consumer type Cyclone FPGA, and up to 400 MHz / 800 Mips in enterprise type FPGA. Clock by clock the core performs very good and scores in many benchmarks better than several ColdFire, ARM and PowerPC cores on a clock by clock comparison.

Features

Fully User-Code Compatible with MC68000
Superscalar Implementation of M68000 Architecture
Dual Integer Instruction Execution Improves Performance
Branch Cache Reduces Branches to Zero Cycles
Separate Data and Instruction Caches
Full Harvard Architecture allows Simultaneous Access to both caches
Data Cache allows Read and Write Access on Each Clock
Bus Snooping
32bit Address bus
Optimized to Achieve Very High Performance Using DDR DRAM Memory
128 bit Deep Store Buffer and One Deep Push Buffer to Maximize Write Bandwidth
Automatic memory stream detection and prefetching
Several memory loads can be held in flight in parallel to maximize bandwidth

for more info, see here the apollo-core project

legacy · « **Reply #1 on:** December 24, 2014, 01:51:58 pm »

personally i do not trust it is really possible, perhaps not for hobby purposes

guys, what do you think about ?

Zad · « **Reply #2 on:** December 24, 2014, 02:18:15 pm »

Or use a $10 ARM cored chip which has a huge support ecosystem? People do seem to get carried away with the "because we can" side of things.

paulie · « **Reply #3 on:** December 24, 2014, 03:39:40 pm »

Quote from: legacy on December 24, 2014, 01:51:08 pm

Superscalar Implementation of M68000 Architecture
Full Harvard Architecture allows Simultaneous Access to both caches

68k was Von Neumann, definitely not Harvard. How does that work?

I will say that those who do actual programming (aka assembly, aka Real Men) would find 68k a breath of fresh air compared to ARM.

amyk · « **Reply #4 on:** December 24, 2014, 03:54:46 pm »

Quote from: paulie on December 24, 2014, 03:39:40 pm

Quote from: legacy on December 24, 2014, 01:51:08 pm
Superscalar Implementation of M68000 Architecture
Full Harvard Architecture allows Simultaneous Access to both caches

68k was Von Neumann, definitely not Harvard. How does that work?

They're talking about the cache: http://en.wikipedia.org/wiki/Modified_Harvard_architecture#Split_cache_architecture

legacy · « **Reply #5 on:** December 24, 2014, 05:27:44 pm »

Quote from: paulie on December 24, 2014, 03:39:40 pm

68k was Von Neumann, definitely not Harvard. How does that work?

it's a common approach in fpga, in order not to stall the fetch and the Load/Store stage
don't worry about that, it's just an implementation detail, it doesn't matter with the ISA, which is 68000

legacy · « **Reply #6 on:** December 24, 2014, 05:28:26 pm »

Quote from: Zad on December 24, 2014, 02:18:15 pm

Or use a $10 ARM cored chip which has a huge support ecosystem? People do seem to get carried away with the "because we can" side of things.

this soft core could be funny for things like "Amiga/Classic"

paulie · « **Reply #7 on:** December 24, 2014, 07:57:53 pm »

Hmmmm... Not really being an fpga freak I didn't realize MCU can "go both ways". Nice thing about this site... you learn something just about every day.

Rasz · « **Reply #8 on:** December 24, 2014, 10:12:18 pm »

400MHz is inside $1K virtex
100-200MHz is inside $100 cyclone

it was made with amiga accelerator "market" in mind (that is ~100 people running sysinfo) so you can forget about it, it will never see the light of day. Authors probably hope to get a job offer at IBM (HAHA good luck, IBM is getting rid of its chip business at the moment).

Reminds me of http://www.majsta.com , that dude supposedly joined 'apollo team' whatever that means.

legacy · « **Reply #9 on:** December 24, 2014, 11:53:32 pm »

Quote from: Rasz on December 24, 2014, 10:12:18 pm

400MHz is inside $1K virtex
100-200MHz is inside $100 cyclone

yes, it's unbelievable story, unbelievable project: see TINA

personally i do not trust it will never exist !

legacy · « **Reply #10 on:** December 24, 2014, 11:55:29 pm »

that diagram was discussed 1 year ago, and it is still under discussion

the same happens to the superscalar 68000 core

chickenHeadKnob · « **Reply #11 on:** December 25, 2014, 12:14:51 am »

The choice to call this new soft-core "Apollo" or "68020" is very confusing, if that are the names they choose. Back in the day, when the 68020 was a real part number from Motorola, Mentor Graphics (the chip CAD/EDA company), was selling a line of work stations called Apollo. That line used 68020 processors. Worse yet Mentor Graphics implemented a CPU board with a 68020 simulated with PAL's and other small scale integrated parts because they couldn't get real 68020 in time, I think Motorola was having trouble producing or delayed at the time. That emulated 68020 CPU board was a mess of bodge wires and revisions, a real banjo board. Needed constant repair

theoldwizard1 · « **Reply #12 on:** December 25, 2014, 12:19:37 am »

Quote from: legacy on December 24, 2014, 05:27:44 pm

Quote from: paulie on December 24, 2014, 03:39:40 pm
68k was Von Neumann, definitely not Harvard. How does that work?

it's a common approach in fpga, in order not to stall the fetch and the Load/Store stage
don't worry about that, it's just an implementation detail, it doesn't matter with the ISA, which is 68000

First, anyone who is "in to" specific implementation of a given Instruction Set Architecture should have already read Computer Architecture, Fifth Edition: A Quantitative Approach. I think mine is 1st edition, but it is all still relative.

Harrvard versus Von Neumann IS an implementation detail, but it is an extremely important detail !

The biggest performance "bottleneck" is accessing memory for instructions and data. For years, bigger caches, and multiple layer caches have been the solution. Pretty much all Harvard architecture machine are "folded" back into a single address space.

IMHO, the best way to improve performance is to improve the CPU to memory (cache or main memory) access and the fastest way to do this is wider buses. IIRC, the Digital Equipment Corporation Alpha architecture chips used a 256 bit wide bus. Of course this causes all sorts of other issues.

I am a bit surprised that this is an implementation of the full M68K instead of the Coldfire. The Coldfire was designed for small size and performance.

The last M68k chip, the 68060, had many interesting features. Fast instruction decoding and parallel Effective Address calculation.

nctnico · « **Reply #13 on:** December 25, 2014, 01:39:29 am »

I very much doubt they can get a core to run at >400MHz inside an FPGA. Sure FPGAs can run at these speeds but you can only have a tiny bit of simple logic. As soon as logic gets more complex routing and logic delays severely hamper the maximum clock frequency.

legacy · « **Reply #14 on:** December 25, 2014, 01:44:17 am »

Quote from: theoldwizard1 on December 25, 2014, 12:19:37 am

Surprised that this is an implementation of the full M68K instead of the Coldfire. The Coldfire was designed for small size and performance.

Coldfire is not 100% compatible with 68k, so it may be a problem for a 68k system like Amiga.

Quote from: theoldwizard1 on December 25, 2014, 12:19:37 am

the 68060 had many interesting features. Fast instruction decoding and parallel Effective Address calculation.

Inside there is a RISC super scalar design, which makes things much more complex to be implemented, especially for the pipeline around the EA calculation for complex addressing modes (which go much more simpler in the 68000)

legacy · « **Reply #15 on:** December 25, 2014, 01:58:02 am »

Quote from: nctnico on December 25, 2014, 01:39:29 am

I very much doubt they can get a core to run at >400MHz inside an FPGA

I have a lot of dubs, like yours, never used something so fast, never seen >400Mhz fpga.

Also, all my two toy-softcores (1) are using half the physically frequency provided to the fpga_clock, i mean in my case i usually use soft core_clock = fpga_clock / 2. I am using a pair of Xilinx fpga, a Spartan3 @ 50Mhz and Spartan6 @ 100Mhz, so in my case the max soft core clock is 25Mhz and 50 Mhz.

The "Apollo 68k-softcore" (2) seems to have fpga_clock equal to the softcore_clock, so 400 Mhz clock, it's unbelievable for me, unfortunately there are no sources, not released yet, so

(1) the first one is MIPS3K compatible, the second is called "ponoku" and is a tiny-RISC ISA i have been developing, similar to MIPS2K but not compatible, they are both multi cycle, not pipelined, Harrvard approached without any cache because i want them designed the simpler they can go.

(2) i have discovered that it was previously called "Natami 68070", i think it is an other big source of confusion

legacy · « **Reply #16 on:** December 25, 2014, 02:09:34 am »

Quote from: edavid on December 25, 2014, 01:48:03 am

Is this CPU core a hobby project, or something they are trying to sell?

good question, i can't understand

What i can say:

the TINA project is related to the mobo
this mobo uses fpgas in order to implement all of the features of an old Amiga 1200 (video, sound, process, and so on)
one of these fpga may use

the OpenCores TG68K soft core, which is not superstar, and which is simple a 68000 compatible soft core
the Apollo Softcore, which is superscalar, super cpu, and you can't reach mars with it

About Apollo i have NO information, i only know this project was called "Natami"
About TINA it seems there is a small-company off the stage, but they have chosen to hide the company name.

legacy · « **Reply #17 on:** December 25, 2014, 02:15:21 am »

here it is an other project which uses the TG68K, it is called "Vampire", it is an accelerator project for Amiga, and it seems someone has improved the old soft core trying to provide more features which makes it partially compatible with 68020

just an other ball of confusion from the Amiga Community

it seems sources are downloadable from here, you can give an eye, if you want

BloodyCactus · « **Reply #18 on:** December 25, 2014, 03:52:08 pm »

I believe this came about because people wanted something besides the TG68k softcore for other Amiga + AtariST projects. Right now the TG68k core has some issues, so of course people spin up other projects for 68k in fpga.

legacy · « **Reply #19 on:** December 25, 2014, 04:05:41 pm »

yeah, it may be, what i can't believe is the superscalar version of the 68020, the Apollo-core (it was Natami core), or whatever they want to call it: it is too complex to be made for hobby purposes, very very complex to be validated, it consumes a lot of human resources.

Look at the date of the first Natami news: it was claimed in the far 2007, we are in 2014 (2015 in a few days) and … no Natami-core, dead project, dead code, dead hardware, everything is dead and nothing has been done/released etc, so they have changed the project name, and reloaded the game, again

The TG68K is a 68000 core, with a pipeline, not superscalar, and it has costed a lot of resources, it was claimed in 2007, and it was ready a few years ago, actually they want to improve it as 68020-compliant core, but it's thousand of million of light year away from a super scalar approach, so … i can't believe the Apollo/Natami/whatever core!

BloodyCactus · « **Reply #20 on:** December 25, 2014, 10:36:07 pm »

natami. lol. lets take everyones design ideas, * 10000, include kitchen sink! at least its 'officially' dead now.

theoldwizard1 · « **Reply #21 on:** December 26, 2014, 04:21:07 pm »

Quote from: legacy on December 25, 2014, 01:44:17 am

Quote from: theoldwizard1 on December 25, 2014, 12:19:37 am
Surprised that this is an implementation of the full M68K instead of the Coldfire. The Coldfire was designed for small size and performance.

Coldfire is not 100% compatible with 68k, so it may be a problem for a 68k system like Amiga.

True, but when Coldfire was introduced, Motorola had a set of macros that were added to the assembler and it became assembly language compatible !

Much of what was left out were instruction and/or addressing modes that were very seldom used.

theoldwizard1 · « **Reply #22 on:** December 26, 2014, 04:32:25 pm »

Quote from: nctnico on December 25, 2014, 01:39:29 am

I very much doubt they can get a core to run at >400MHz inside an FPGA. Sure FPGAs can run at these speeds but you can only have a tiny bit of simple logic. As soon as logic gets more complex routing and logic delays severely hamper the maximum clock frequency.

Very TRUE ! Unless multiple parts of the design are ASYNCHRONOUS timing distribution at higher speeds is CRITICAL !

If you look at the die photos of the DEC Alpha chip you can clearly see the clock line. Why should something as insignificant as the clock show up on a die photo ? Because the had to make it HUGE and drive it hard so that no part of the chip would have re-drive the signal an introduce timing variability !

If arts of the design are physically going to be on differnt FPGAs, they had better be asynchronous !

legacy · « **Reply #23 on:** December 26, 2014, 05:26:41 pm »

Quote from: theoldwizard1 on December 26, 2014, 04:21:07 pm

True, but when Coldfire was introduced, Motorola had a set of macros that were added to the assembler and it became assembly language compatible !

Much of what was left out were instruction and/or addressing modes that were very seldom used.

understood, but how tu run a legacy binary software, e.g. AmigaOS kernel and amiga Applications ?
i think such a macros and fixes are usable if you could rebuild from sources. Am i wrong ?

theoldwizard1 · « **Reply #24 on:** December 26, 2014, 06:43:43 pm »

Quote from: legacy on December 26, 2014, 05:26:41 pm

Quote from: theoldwizard1 on December 26, 2014, 04:21:07 pm
True, but when Coldfire was introduced, Motorola had a set of macros that were added to the assembler and it became assembly language compatible !

Much of what was left out were instruction and/or addressing modes that were very seldom used.

understood, but how tu run a legacy binary software, e.g. AmigaOS kernel and amiga Applications ?
i think such a macros and fixes are usable if you could rebuild from sources. Am i wrong ?

You are 100% correct ! I just ASSUMED someone had the sources !

Designing a general purpose CPU for multitasking because there are no "simple" benchmarks that are really relative to the application that is going to be run. As I stated before, wider data paths, especially off chip data paths, are the simplest solution but have many technically issue for implementation.

My memory of the M68K instruction set is sketchy, but I seem to recall that there are instruction that modify memory contents directly. These instruction are going to cause inherent delays (remember, memory access are always the slowest thing a CPU can do) as well as causing the write-back cache to be being dumped to memory.


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

Author Topic: Superscalar 68000, have you seen the Apollo core ? What do you think about ? (Read 43194 times)

Share me