Author Topic: Superscalar 68000, have you seen the Apollo core ? What do you think about ?  (Read 31669 times)


Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
Quote
APOLLO CPU

The Apollo CPU is a modern CISC CPU design. Apollo is code compatible with the Motorola M68K and ColdFire families. The CPU is fully pipelined and superscalar supporting execution of up to 2 integer instructions per cycle. The CPU features two address calculation engines and two integer execution engines.

The size-efficient, variable-length instruction encoding provides market-leading code density and optimal cache utilization.

The CPU features a full internal Harvard architecture with separate multiway data and instruction caches. The instruction and data caches are designed to support concurrent instruction fetch, operand read, and operand write references on every clock. The operand data cache permits simultaneous read and write access each clock. The caches come with write combining, as well as memory stream detection and automatic memory prefetching. The combination of these features enables the core to be very efficient in memory and data manipulation tasks.

Branch prediction and branch folding make the core ideally suited for executing control-flow code.

Optionally, a fully pipelined, double precision FPU is available to be included in the Core.

The Core is fully written in VHDL and can also be synthesized for an FPGA device. When synthesized in an FPGA, the core offers a good combination of moderate FPGA space consumption and excellent performance. The core can reach up to 200 MHz / 400 MIPS in a consumer-type Cyclone FPGA, and up to 400 MHz / 800 MIPS in an enterprise-type FPGA. Clock for clock the core performs very well and in many benchmarks scores better than several ColdFire, ARM and PowerPC cores.

Features
  • Fully User-Code Compatible with MC68000
  • Superscalar Implementation of M68000 Architecture
  • Dual Integer Instruction Execution Improves Performance
  • Branch Cache Reduces Branches to Zero Cycles
  • Separate Data and Instruction Caches
  • Full Harvard Architecture allows Simultaneous Access to both caches
  • Data Cache allows Read and Write Access on Each Clock
  • Bus Snooping
  • 32-bit Address Bus
  • Optimized to Achieve Very High Performance Using DDR DRAM Memory
  • 128-bit Deep Store Buffer and One-Deep Push Buffer to Maximize Write Bandwidth
  • Automatic memory stream detection and prefetching
  • Several memory loads can be held in flight in parallel to maximize bandwidth


for more info, see the apollo-core project here
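The "up to 2 integer instructions per cycle" claim above is classic in-order dual issue: two adjacent instructions share a cycle when they have no register dependency. A toy Python sketch of the idea (the pairing rules here are hypothetical and simplified; the real Apollo rules are not public, and real hardware also checks resource conflicts):

```python
# Toy in-order dual-issue model: pair two adjacent instructions per cycle
# unless the second depends on the first's result (RAW hazard) or both
# write the same register (WAW hazard). Purely illustrative.

def issue_cycles(program):
    """program: list of (dest_regs, src_regs) set tuples, in order."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program):
            d1, _ = program[i]
            d2, s2 = program[i + 1]
            raw = d1 & s2          # second reads what first writes
            waw = d1 & d2          # both write the same register
            if not raw and not waw:
                i += 2             # dual issue: two instructions this cycle
                cycles += 1
                continue
        i += 1                     # single issue
        cycles += 1
    return cycles

# add d0,d1 ; add d2,d3  -> independent, pair in one cycle
print(issue_cycles([({"d1"}, {"d0", "d1"}), ({"d3"}, {"d2", "d3"})]))  # 1

# add d0,d1 ; add d1,d2  -> RAW hazard on d1, must serialize
print(issue_cycles([({"d1"}, {"d0", "d1"}), ({"d2"}, {"d1", "d2"})]))  # 2
```

With independent instructions the toy model retires two per cycle, which is where the "2 integer instructions per cycle" peak figure comes from.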
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
personally, I do not believe it is really possible, at least not for hobby purposes  :-//

guys, what do you think about it ?
 

Offline Zad

  • Super Contributor
  • ***
  • Posts: 1013
  • Country: gb
    • Digital Wizardry, Analogue Alchemy, Software Sorcery
Or use a $10 ARM cored chip which has a huge support ecosystem? People do seem to get carried away with the "because we can" side of things.

Offline paulie

  • Frequent Contributor
  • **
  • Banned!
  • Posts: 849
  • Country: us
  • Superscalar Implementation of M68000 Architecture
  • Full Harvard Architecture allows Simultaneous Access to both caches

68k was Von Neumann, definitely not Harvard. How does that work?

I will say that those who do actual programming (aka assembly, aka Real Men) would find 68k a breath of fresh air compared to ARM.
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 6541
  • Superscalar Implementation of M68000 Architecture
  • Full Harvard Architecture allows Simultaneous Access to both caches

68k was Von Neumann, definitely not Harvard. How does that work?
They're talking about the cache: http://en.wikipedia.org/wiki/Modified_Harvard_architecture#Split_cache_architecture
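The split-cache point can be seen in a toy cycle count: with one single-ported unified cache, an instruction fetch and a data access in the same cycle must serialize, while split I/D caches let them proceed in parallel. A rough sketch, with made-up one-cycle hit latencies:

```python
# Toy comparison: unified single-ported cache vs. split (modified
# Harvard) I/D caches. Each "step" is one instruction fetch plus,
# optionally, one data access. Illustrative only.

def cycles(steps, split):
    total = 0
    for needs_data in steps:
        if split:
            total += 1                        # I-fetch and D-access in parallel
        else:
            total += 2 if needs_data else 1   # one port: accesses serialize
    return total

# Four instructions, two of them doing a load or store:
steps = [False, True, False, True]
print(cycles(steps, split=False))  # 6 cycles on a unified port
print(cycles(steps, split=True))   # 4 cycles with split caches
```

The ISA stays von Neumann (one address space); only the cache level is Harvard, which is why legacy 68k binaries still run.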
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
68k was Von Neumann, definitely not Harvard. How does that work?

it's a common approach in FPGAs, in order not to stall the fetch and load/store stages
don't worry about it; it's just an implementation detail and doesn't affect the ISA, which is still the 68000
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
Or use a $10 ARM cored chip which has a huge support ecosystem? People do seem to get carried away with the "because we can" side of things.

this soft core could be fun for things like the "Amiga/Classic"
 

Offline paulie

  • Frequent Contributor
  • **
  • Banned!
  • Posts: 849
  • Country: us
Hmmmm... Not really being an FPGA freak, I didn't realize an MCU can "go both ways". Nice thing about this site... you learn something just about every day.
 

Online Rasz

  • Super Contributor
  • ***
  • Posts: 2291
  • Country: 00
    • My random blog.
400 MHz means a $1k Virtex
100-200 MHz fits in a $100 Cyclone

it was made with the Amiga accelerator "market" in mind (that is, ~100 people running SysInfo), so you can forget about it; it will never see the light of day. The authors probably hope to get a job offer at IBM (haha, good luck; IBM is getting rid of its chip business at the moment).

Reminds me of http://www.majsta.com ; that dude supposedly joined the 'apollo team', whatever that means.
Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
400 MHz means a $1k Virtex
100-200 MHz fits in a $100 Cyclone

yes, it's an unbelievable story, an unbelievable project: see TINA



personally, I do not believe it will ever exist !
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch


that diagram was discussed a year ago, and it is still under discussion  :palm:
the same goes for the superscalar 68000 core :palm: :palm:
 

Offline chickenHeadKnob

  • Frequent Contributor
  • **
  • Posts: 829
  • Country: ca
  • doofus programus semi-retiredae
The choice to call this new soft core "Apollo" or "68020" is very confusing, if those are the names they choose. Back in the day, when the 68020 was a real part number from Motorola, Mentor Graphics (the chip CAD/EDA company) sold its tools on a line of workstations called Apollo, from Apollo Computer. That line used 68020 processors. Worse yet, they shipped a CPU board with the 68020 emulated using PALs and other small-scale integrated parts, because they couldn't get real 68020s in time; I think Motorola was having production trouble or delays at the time. That emulated 68020 CPU board was a mess of bodge wires and revisions, a real banjo board. Needed constant repair :palm:
 

Offline theoldwizard1

  • Regular Contributor
  • *
  • Posts: 153
68k was Von Neumann, definitely not Harvard. How does that work?

it's a common approach in FPGAs, in order not to stall the fetch and load/store stages
don't worry about it; it's just an implementation detail and doesn't affect the ISA, which is still the 68000

First, anyone who is "into" a specific implementation of a given instruction set architecture should have already read Computer Architecture: A Quantitative Approach (now in its fifth edition).  I think mine is the 1st edition, but it is all still relevant.

Harvard versus von Neumann IS an implementation detail, but it is an extremely important detail !

The biggest performance "bottleneck" is accessing memory for instructions and data.  For years, bigger caches and multiple cache levels have been the solution.  Pretty much all Harvard architecture machines are "folded" back into a single address space.

IMHO, the best way to improve performance is to improve the CPU-to-memory (cache or main memory) path, and the fastest way to do this is wider buses.  IIRC, the Digital Equipment Corporation Alpha chips used a 256-bit wide bus.  Of course this causes all sorts of other issues.
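The "wider buses" argument is just arithmetic: peak bandwidth is bus width times transfer rate. A quick sketch with made-up clock numbers (actual Alpha bus clocks varied by system):

```python
# Peak bandwidth = bus width (bytes per transfer) * transfer rate.
# Numbers are illustrative only.

def peak_bandwidth_mb_s(bus_bits, mega_transfers_s):
    # bytes/transfer * Mtransfers/s = MB/s
    return bus_bits // 8 * mega_transfers_s

print(peak_bandwidth_mb_s(32, 100))   # 400 MB/s: 32-bit bus at 100 MT/s
print(peak_bandwidth_mb_s(256, 100))  # 3200 MB/s: same rate, 256-bit bus
```

Same clock, eight times the width, eight times the peak bandwidth; the cost shows up as pins, routing, and board complexity, the "other issues" mentioned above.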

I am a bit surprised that this is an implementation of the full M68K instead of the ColdFire.  The ColdFire was designed for small size and performance.

The last M68K chip, the 68060, had many interesting features: fast instruction decoding and parallel effective-address calculation.
« Last Edit: December 25, 2014, 12:22:05 am by theoldwizard1 »
 

Online nctnico

  • Super Contributor
  • ***
  • Posts: 18031
  • Country: nl
    • NCT Developments
I very much doubt they can get a core to run at >400MHz inside an FPGA. Sure FPGAs can run at these speeds but you can only have a tiny bit of simple logic. As soon as logic gets more complex routing and logic delays severely hamper the maximum clock frequency.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
Surprised that this is an implementation of the full M68K instead of the Coldfire.  The Coldfire was designed for small size and performance.

Coldfire is not 100% compatible with 68k, so it may be a problem for a 68k system like Amiga.


the 68060 had many interesting features.  Fast instruction decoding and parallel Effective Address calculation.

Inside it is a superscalar RISC-like design, which makes things much more complex to implement, especially the pipeline around EA calculation for the complex addressing modes (which are much simpler on the 68000)
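To get a feel for why EA calculation dominates, here are the dependent steps of one 68020 memory-indirect pre-indexed mode, ([bd,An,Xn*scale],od), sketched in Python (the register and memory values are made up):

```python
# Effective-address calculation for a 68020-style memory-indirect
# pre-indexed mode: ([bd,An,Xn*scale],od). Three dependent steps, one
# of which is itself a memory access. Values are illustrative.

def ea_mem_indirect_preindexed(memory, an, xn, scale, bd, od):
    intermediate = bd + an + xn * scale   # step 1: inner addition chain
    pointer = memory[intermediate]        # step 2: memory indirection
    return pointer + od                   # step 3: outer displacement

memory = {0x2010: 0x8000}                 # a pointer stored at 0x2010
ea = ea_mem_indirect_preindexed(memory, an=0x2000, xn=2,
                                scale=4, bd=8, od=0x10)
print(hex(ea))  # 0x8010
```

Step 2 depends on step 1 and can itself miss the cache, which is why pipelining these modes is so much harder than the 68000's simple register-plus-displacement forms.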
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
I very much doubt they can get a core to run at >400MHz inside an FPGA

I have a lot of doubts, like yours; I have never used anything so fast, and never seen a >400 MHz FPGA design.

Also, both of my toy soft cores (1) use half the physical frequency provided as the FPGA clock; I mean, in my case I usually use soft_core_clock = fpga_clock / 2. I am using a pair of Xilinx FPGAs, a Spartan-3 @ 50 MHz and a Spartan-6 @ 100 MHz, so in my case the max soft-core clock is 25 MHz and 50 MHz.

The "Apollo 68k soft core" (2) seems to have fpga_clock equal to softcore_clock, so a 400 MHz clock; that is unbelievable to me. Unfortunately the sources have not been released yet, so :-//


(1) the first one is MIPS3K-compatible; the second is called "ponoku" and is a tiny RISC ISA I have been developing, similar to MIPS2K but not compatible. They are both multi-cycle, not pipelined, Harvard-style without any cache, because I want them to be as simple as possible.

(2) I have discovered that it was previously called "Natami 68070"; I think that is another big source of confusion :palm:
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
Is this CPU core a hobby project, or something they are trying to sell?

good question, I can't tell  :-//

What I can say:
  • the TINA project is the motherboard
  • this mobo uses FPGAs in order to implement all of the features of an old Amiga 1200 (video, sound, CPU, and so on)
  • one of these FPGAs may use
    • the OpenCores TG68K soft core, which is not superscalar, and which is simply a 68000-compatible soft core
    • the Apollo soft core, which is superscalar, a super CPU, everything short of taking you to Mars

About Apollo I have NO information; I only know this project was previously called "Natami".
About TINA, it seems there is a small company behind the scenes, but they have chosen to hide the company name.
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
here is another project which uses the TG68K: it is called "Vampire", an accelerator project for the Amiga, and it seems someone has improved the old soft core, trying to provide more features which make it partially compatible with the 68020

just another ball of confusion from the Amiga community :palm:

it seems the sources are downloadable from here; you can take a look if you want

 

Offline BloodyCactus

  • Frequent Contributor
  • **
  • Posts: 477
  • Country: us
    • Kråketær
I believe this came about because people wanted something besides the TG68K soft core for other Amiga and Atari ST projects. Right now the TG68K core has some issues, so of course people spin up other 68k-in-FPGA projects.

-- Aussie living in the USA --
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
yeah, maybe; what I can't believe is the superscalar version of the 68020, the Apollo core (formerly the Natami core), or whatever they want to call it: it is too complex to be built for hobby purposes, very very hard to validate, and it consumes a lot of human resources.

Look at the date of the first Natami news: it was announced back in 2007; we are now in 2014 (2015 in a few days) and … no Natami core. Dead project, dead code, dead hardware; nothing has been done/released, so they changed the project name and restarted the game, again :-DD

The TG68K is a 68000 core with a pipeline, not superscalar, and it has cost a lot of resources: it was announced in 2007 and was only ready a few years ago. Now they want to improve it into a 68020-compliant core, but that is still light years away from a superscalar design, so … I can't believe in the Apollo/Natami/whatever core!
 

Offline BloodyCactus

  • Frequent Contributor
  • **
  • Posts: 477
  • Country: us
    • Kråketær
Natami, lol. Let's take everyone's design ideas, multiply by 10000, include the kitchen sink! At least it's "officially" dead now.
-- Aussie living in the USA --
 

Offline theoldwizard1

  • Regular Contributor
  • *
  • Posts: 153
Surprised that this is an implementation of the full M68K instead of the Coldfire.  The Coldfire was designed for small size and performance.

Coldfire is not 100% compatible with 68k, so it may be a problem for a 68k system like Amiga.
True, but when ColdFire was introduced, Motorola had a set of macros that were added to the assembler, and it became assembly-language compatible !

Much of what was left out were instructions and/or addressing modes that were very seldom used.
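The macro idea can be illustrated with a hypothetical case: if the newer core lacks some instruction, an assembler macro expands it into instructions the core does have. For example, a rotate can be synthesized from two shifts plus an OR; in Python, the equivalence such a macro would rely on looks like this (which instructions ColdFire actually dropped depends on the ISA revision, so treat the rotate as illustrative):

```python
# Illustration of the macro principle: a 32-bit rotate-left, expressed
# with only shifts and an OR -- exactly the kind of multi-instruction
# sequence an assembler macro would emit on a core lacking a native
# rotate. Purely illustrative of the technique.

MASK32 = 0xFFFFFFFF

def rol32(x, n):
    """Rotate left: the 'missing' instruction, built from shift + OR."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

print(hex(rol32(0x80000001, 1)))  # 0x3: the top bit wraps to bit 0
```

The expansion costs a few extra instructions and clobbered flags, which is fine when you can reassemble the source, and useless for the legacy-binary case raised below.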
 

Offline theoldwizard1

  • Regular Contributor
  • *
  • Posts: 153
I very much doubt they can get a core to run at >400MHz inside an FPGA. Sure FPGAs can run at these speeds but you can only have a tiny bit of simple logic. As soon as logic gets more complex routing and logic delays severely hamper the maximum clock frequency.
Very TRUE !  Unless multiple parts of the design are asynchronous, clock distribution at higher speeds is CRITICAL !

If you look at the die photos of the DEC Alpha chips you can clearly see the clock line.  Why should something as insignificant as the clock show up on a die photo ?  Because they had to make it HUGE and drive it hard, so that no part of the chip would have to re-drive the signal and introduce timing variability !

If parts of the design are physically going to be on different FPGAs, they had better be asynchronous !
 

Offline legacy

  • Super Contributor
  • ***
  • Posts: 4349
  • Country: ch
True, but when ColdFire was introduced, Motorola had a set of macros that were added to the assembler, and it became assembly-language compatible !

Much of what was left out were instructions and/or addressing modes that were very seldom used.

understood, but how do you run legacy binary software, e.g. the AmigaOS kernel and Amiga applications ?
I think such macros and fixes are only usable if you can rebuild from source. Am I wrong ?
 

Offline theoldwizard1

  • Regular Contributor
  • *
  • Posts: 153
True, but when ColdFire was introduced, Motorola had a set of macros that were added to the assembler, and it became assembly-language compatible !

Much of what was left out were instructions and/or addressing modes that were very seldom used.

understood, but how do you run legacy binary software, e.g. the AmigaOS kernel and Amiga applications ?
I think such macros and fixes are only usable if you can rebuild from source. Am I wrong ?
You are 100% correct !  I just ASSUMED someone had the sources !


Designing a general-purpose CPU for multitasking is hard, because there are no "simple" benchmarks that are really relevant to the application that is going to be run.  As I stated before, wider data paths, especially off-chip data paths, are the simplest solution but have many technical issues in implementation.

My memory of the M68K instruction set is sketchy, but I seem to recall that there are instructions that modify memory contents directly.  These instructions are going to cause inherent delays (remember, memory accesses are always the slowest thing a CPU can do), as well as causing write-back cache lines to be dumped to memory.
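That read-modify-write behaviour can be sketched with a toy write-back cache line: a 68k-style memory-destination instruction (e.g. add.l d0,(a0)) updates the cached copy and marks the line dirty, and main memory only catches up at eviction. A hypothetical sketch:

```python
# Toy write-back cache line: a memory-destination instruction such as
# 68k "add.l d0,(a0)" is a read-modify-write -- it pulls the line in,
# updates it, and marks it dirty; main memory is only updated when the
# line is evicted. Illustrative only.

class Line:
    def __init__(self, addr, memory):
        self.addr, self.data, self.dirty = addr, memory[addr], False

    def add_to_memory_operand(self, value):
        self.data += value        # read-modify-write hits the cache...
        self.dirty = True         # ...so main memory is now stale

    def evict(self, memory):
        if self.dirty:            # the write-back happens only here
            memory[self.addr] = self.data
        self.dirty = False

memory = {0x1000: 5}
line = Line(0x1000, memory)
line.add_to_memory_operand(7)
print(memory[0x1000])  # 5: memory still holds the old value (dirty line)
line.evict(memory)
print(memory[0x1000])  # 12: memory updated at eviction
```

The delay the post describes is the combination of the initial line fill (a memory read) plus the eventual dirty write-back, both of which a register-to-register RISC sequence can schedule around more freely.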
 

