Author Topic: Superscalar 68000, have you seen the Apollo core ? What do you think about ? (Read 43220 times)

richardman · « **Reply #75 on:** October 06, 2015, 10:08:53 pm »

Quote from: edavid on October 06, 2015, 09:04:25 pm

I don't think the "numerous" part is right - I don't think it was common at all. For one thing, 68000s were very expensive, so no one wanted to pay for 2 of them, when Motorola was promising the 68010 very soon. It made more sense to put the money into RAM. Also, most of the early 68K systems shipped with a (crappy) Unisoft Unix port, which didn't even support demand paging.

As for the way it worked, what I remember (which may be wrong) was that the second CPU was run far enough behind that it could be interrupted and its pre-fault stack puke saved. Then, when the main CPU was ready to resume the faulted process, it would reload that good stack puke and resume at the instruction that had faulted.

The story was certainly passed around A LOT at that time (1984/85). The web forgets and now we only have truncated memory :-/ I suppose it has to be more than one, but who knows how many.

Unisoft was the company I was thinking about. They just handed out "build-of-the-day" to workstation vendors as official releases.

There was also a non-AT&T Unix vendor that claimed its Unix was independently developed, until a couple of Bell Labs guys did a dump of the kernel files at a trade show and found the Bell Labs copyright messages. Afterward, the vendor quietly dropped some claims from its advertising.

ale500 · « **Reply #76 on:** October 07, 2015, 04:20:54 am »

All I could dig up is that Stratus had a possibly similar 2-CPU design; whether that was only for fault tolerance or also for handling page faults is not clear to me. No documents so far, though.

richardman · « **Reply #77 on:** October 07, 2015, 05:19:24 am »

Apollo Domain definitely had a 2-CPU design, and I am almost certain Pixel Workstation did also.

grumpydoc · « **Reply #78 on:** October 07, 2015, 07:31:05 pm »

Quote from: edavid on October 06, 2015, 09:04:25 pm

As for the way it worked, what I remember (which may be wrong) was that the second CPU was run far enough behind that it could be interrupted and its pre-fault stack puke saved. Then, when the main CPU was ready to resume the faulted process, it would reload that good stack puke and resume at the instruction that had faulted.

The more I think about this the more I think this one instruction behind business can't be correct.

I don't think the 2nd CPU could be more than 1 instruction behind otherwise you would have all sorts of problems keeping the CPUs executing the same code due to the effect of, or lack of writes to memory. Even then it might be hard, the lead CPU would probably have to have writes ignored - eg what happens if the lead CPU executes a test & set (an atomic R-M-W operation on the 68000). If the write is ignored, fine the 2nd CPU will see the unchanged value and take the same path but if not it will possibly take a different branch (since test-and-set is usually followed by a branch).

However if writes from the lead CPU are ignored think about what happens if code reads a value, modifies it, writes it back and then immediately reads it again. You probably wouldn't write code like that by hand but it could be generated by a compiler - especially one without much optimisation. The lead CPU write goes ignored so the read will get the old value.
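The read-back hazard described above can be shown with a toy model. This is only a sketch: the dictionary standing in for memory and the `drop_lead_writes` flag are illustrative assumptions, not a description of any real 68000 board.

```python
def read_modify_write_then_read(memory, addr, drop_lead_writes):
    """Read a word, increment it, write it back, then read it again."""
    value = memory[addr]          # first read
    value += 1                    # modify
    if not drop_lead_writes:
        memory[addr] = value      # the write only lands if it is not ignored
    return memory[addr]           # immediate re-read of the same address


# With the lead CPU's write ignored, the re-read still sees the old value.
stale = read_modify_write_then_read({0: 41}, 0, drop_lead_writes=True)
fresh = read_modify_write_then_read({0: 41}, 0, drop_lead_writes=False)
```

Here `stale` ends up as the old value and `fresh` as the incremented one, which is exactly the divergence that would break a lead CPU whose writes are discarded.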

You might fix the above by prioritising lag CPU writes to main memory before lead CPU reads within the same cycle, but what about instructions with wildly differing timings, e.g. a register op followed by a DIV? The lead CPU will do the register op then start the DIV; at the same time the lag CPU will be on the register op, but it will move on to the DIV while the lead CPU is still executing it. In that case the lead CPU would no longer be one instruction ahead, it would just be a few clock cycles ahead.

We haven't even thought about arbitrating the bus between the two CPUs.

In short I can't see that the idea of executing the same code works.

What would work, however is that one CPU executes the code and the other just handles page faults. The CPU which is executing code is just delayed - perhaps by simply not asserting DTACK until the memory operation can complete.

Looking at a few of the online notes that is how the scheme is described.

edavid · « **Reply #79 on:** October 07, 2015, 08:05:19 pm »

Quote from: grumpydoc on October 07, 2015, 07:31:05 pm

I don't think the 2nd CPU could be more than 1 instruction behind otherwise you would have all sorts of problems keeping the CPUs executing the same code due to the effect of, or lack of writes to memory. Even then it might be hard, the lead CPU would probably have to have writes ignored - eg what happens if the lead CPU executes a test & set (an atomic R-M-W operation on the 68000). If the write is ignored, fine the 2nd CPU will see the unchanged value and take the same path but if not it will possibly take a different branch (since test-and-set is usually followed by a branch).

The lead CPU does all the memory accesses. The 2nd CPU doesn't read live memory data, it gets the same data that the lead CPU saw (delayed by a FIFO).
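A toy model of the FIFO idea, purely as a sketch. Everything here is an assumption made for illustration: the one-instruction accumulator machine, the names `ToyCPU` and `run_with_shadow`, the `lag` parameter, and the recovery policy (drain the FIFO so the shadow stops just before the faulting instruction, then copy its state to the lead). The toy has no memory writes, so it sidesteps the write-ordering problems raised earlier in the thread.

```python
from collections import deque


class PageFault(Exception):
    pass


class ToyCPU:
    """Accumulator machine: every instruction adds the word at an address to acc."""

    def __init__(self):
        self.pc = 0
        self.acc = 0

    def step(self, program, read):
        addr = program[self.pc]
        self.pc += 1              # side effect committed before the read completes
        self.acc += read(addr)    # a fault here leaves pc advanced but acc stale


def run_with_shadow(program, memory, backing_store, lag=2):
    lead, shadow = ToyCPU(), ToyCPU()
    fifo = deque()                # read data seen by the lead, replayed to the shadow
    faults = []

    def lead_read(addr):
        if addr not in memory:
            raise PageFault(addr)
        fifo.append(memory[addr])
        return memory[addr]

    def shadow_read(_addr):
        return fifo.popleft()     # never touches live memory

    while lead.pc < len(program):
        try:
            lead.step(program, lead_read)
        except PageFault as fault:
            while fifo:           # let the shadow finish the instructions that completed
                shadow.step(program, shadow_read)
            addr = fault.args[0]
            faults.append((addr, shadow.pc))
            memory[addr] = backing_store[addr]           # "page in"
            lead.pc, lead.acc = shadow.pc, shadow.acc    # restart from the clean state
            continue
        if lead.pc - shadow.pc > lag:
            shadow.step(program, shadow_read)
    return lead.acc, faults


result = run_with_shadow([0, 1, 2, 99, 3], {0: 10, 1: 20, 2: 30, 3: 40}, {99: 5})
```

In this run the fourth instruction faults on address 99, the shadow is drained up to instruction index 3, and the lead restarts there once the page is present. Note that the FIFO depth is not fixed by the model; it only has to hold the reads of instructions the shadow has not replayed yet.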

I think that in some cases you have to decode what instruction caused the fault, and back out any pre-fault side effects.

Quote

What would work, however is that one CPU executes the code and the other just handles page faults. The CPU which is executing code is just delayed - perhaps by simply not asserting DTACK until the memory operation can complete.

If that was how they did it, they wouldn't have needed an expensive 68000 CPU for the service processor, they could have used a Z80 or whatever.

However, synchronous page faults are so slow, that it's not really worth building a system that way.

grumpydoc · « **Reply #80 on:** October 07, 2015, 10:21:23 pm »

Quote from: edavid on October 07, 2015, 08:05:19 pm

The lead CPU does all the memory accesses. The 2nd CPU doesn't read live memory data, it gets the same data that the lead CPU saw (delayed by a FIFO).

OK, but the lead CPU, by definition, can't complete the faulting instruction so you have to be able to have either CPU in the lead CPU role.

And putting a buffer/FIFO in there does not sound simple

Quote

I think that in some cases you have to decode what instruction caused the fault, and back out any pre-fault side effects.

Which doesn't sound do-able in the general case.

Quote

Quote
What would work, however is that one CPU executes the code and the other just handles page faults. The CPU which is executing code is just delayed - perhaps by simply not asserting DTACK until the memory operation can complete.
If that was how they did it, they wouldn't have needed an expensive 68000 CPU for the service processor, they could have used a Z80 or whatever.

True but having two identical processors simplifies system design - both talk to memory the same way, you only need to write for one ISA etc.

I honestly remembered the story as two CPUs in lock-step but the more I think about it the more I think it is so much harder to do that than just having a system where one CPU does the paging while holding the other mid instruction. Occam's razor and all that.

Bassman59 · « **Reply #81 on:** October 07, 2015, 11:01:36 pm »

Quote from: richardman on October 05, 2015, 08:31:10 am

Anyway, Apollo machines were made by Apollo Inc., not Mentor Graphics as a poster mentioned. They were 68K based, so they used 2 68K core executing the same code, and when one fell over due to memory paging, the other would check the content of the stack so the whole machine can recover. The 68020 solved that finally by putting more machine context onto the fault stack so all instructions can be restarted.

At the bomb factory I worked for after college, we had a bunch of Apollo machines in an R&D lab. (Networked with token ring!) They ran Mentor Graphics Boardstation (I think!), and were used for analog board design and simulation. They were frightfully expensive (but, hey, bomb factory $$$). As I remember, yes, the hardware was made by Apollo, but the system, both software and hardware, was sold by Mentor.

It was a big deal when the machines were upgraded to 68040 motherboards. (Or maybe they were 030s? This was a long time ago.)

Quote

Apollo was bought by HP back in the days, around 1986 I believe.

It was later than 1986, because I started at the bomb factory in 1988, and the machines were Apollo, not HP-Apollo.

edavid · « **Reply #82 on:** October 08, 2015, 12:01:20 am »

Quote from: grumpydoc on October 07, 2015, 10:21:23 pm

Quote from: edavid on October 07, 2015, 08:05:19 pm
The lead CPU does all the memory accesses. The 2nd CPU doesn't read live memory data, it gets the same data that the lead CPU saw (delayed by a FIFO).
OK, but the lead CPU, by definition, can't complete the faulting instruction so you have to be able to have either CPU in the lead CPU role.

Sure it can... after the fault is resolved, the lead CPU is restarted at the faulting instruction.

Rasz · « **Reply #83 on:** October 08, 2015, 02:26:32 am »

You can restart as long as you are able to fire an interrupt on the following CPU before it reaches the 'bad' instruction.

richardman · « **Reply #84 on:** October 08, 2015, 03:08:15 am »

Quote from: Bassman59 on October 07, 2015, 11:01:36 pm

At the bomb factory I worked for after college, we had a bunch of Apollo machines in an R&D lab. (Networked with token ring!) They ran Mentor Graphics Boardstation (I think!), and were used for analog board design and simulation. They were frightfully expensive (but, hey, bomb factory $$$). As I remember, yes, the hardware was made by Apollo, but the system, both software and hardware, was sold by Mentor.

It was a big deal when the machines were upgraded to 68040 motherboards. (Or maybe they were 030s? This was a long time ago.)

Quote
Apollo was bought by HP back in the days, around 1986 I believe.

It was later than 1986, because I started at the bomb factory in 1988, and the machines were Apollo, not HP-Apollo.

Cool. I worked for LTX, the test machine company (Left Teradyne at Xmas :-) ), for only 9 months. A spinout on Rt. 128, doing a 360 on a snow day, was enough for me to say "THANK GOD I AM ALIVE. I AM QUITTING ASAP!" :-) and I went to work for Whitesmiths, the company that produced the first commercial C compiler outside of AT&T.

So... back to the OP, this superscalar 68K core is... dead? Too bad. It has a lot going for it. I mean the 68K architecture.

OTOH, even the mighty Motorola, back in the day, could not bring out a 16/32-bit processor, in the form of mCore. It looked and sounded great, but everyone was/is scrambling to move to ARM...

DJohn · « **Reply #85 on:** October 08, 2015, 11:20:18 am »

Quote from: grumpydoc on October 07, 2015, 10:21:23 pm

I honestly remembered the story as two CPUs in lock-step but the more I think about it the more I think it is so much harder to do that than just having a system where one CPU does the paging while holding the other mid instruction. Occam's razor and all that.

Lock-step is the story I remember hearing, but it has to be urban legend. It's too hard to make it work (if it's possible at all - what happens if there are two or more faults in the same instruction?), and the alternative is both simple and obvious.

The 68K will sit waiting for ever if you don't assert /DTACK. When the MMU sees a request for a page that isn't in physical memory, all it has to do is interrupt the other processor, wait for the signal that the data is loaded (and the page tables updated), then tell the first processor to continue. There's no need to restart any instructions. The rest of the time, your second processor can be running useful code (anything that can't fault).
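The stall-on-fault scheme can be sketched as a simple cycle count. This is a toy under stated assumptions: one bus cycle per access, a fixed `page_in_cycles` cost, and a set of resident page numbers standing in for the MMU's page tables. None of these figures come from a real machine.

```python
def run_stalled(accesses, resident, page_in_cycles=5):
    """Return (total_cycles, wait_states) for page numbers accessed in order."""
    resident = set(resident)
    cycles = 0
    wait_states = 0
    for page in accesses:
        if page not in resident:
            # MMU withholds /DTACK while the service CPU loads the page;
            # the main CPU just inserts wait states in the meantime.
            wait_states += page_in_cycles
            cycles += page_in_cycles
            resident.add(page)    # page tables updated by the service CPU
        cycles += 1               # /DTACK asserted, the bus cycle completes
    return cycles, wait_states


total, waited = run_stalled([0, 1, 0, 2], resident=[0], page_in_cycles=5)
```

With these inputs two of the four accesses miss, so the main CPU spends 10 of its 14 cycles in wait states. No instruction is ever restarted, but the main CPU does nothing useful while it waits.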

grumpydoc · « **Reply #86 on:** October 08, 2015, 12:54:52 pm »

Quote from: edavid on October 07, 2015, 08:05:19 pm

The 2nd CPU doesn't read live memory data, it gets the same data that the lead CPU saw (delayed by a FIFO).

Not being funny but how does the FIFO know how deep to be?

I'm not overly familiar with 68k assembler (having done all my coding on such systems in C) but AFAICS a 68K instruction can do multiple reads - certainly up to three if both the source and destination operands live in memory. So if we could somehow lock-step instructions we would need to buffer a variable number of reads.

edavid · « **Reply #87 on:** October 08, 2015, 03:04:03 pm »

Quote from: grumpydoc on October 08, 2015, 12:54:52 pm

Quote from: edavid on October 07, 2015, 08:05:19 pm
The 2nd CPU doesn't read live memory data, it gets the same data that the lead CPU saw (delayed by a FIFO).
Not being funny but how does the FIFO know how deep to be?

I don't think it matters as long as it's deep enough.

I should probably stop trying to remember this stuff though, it's just been too long and it makes my head hurt

edavid · « **Reply #88 on:** October 08, 2015, 03:10:23 pm »

Quote from: DJohn on October 08, 2015, 11:20:18 am

The 68K will sit waiting for ever if you don't assert /DTACK. When the MMU sees a request for a page that isn't in physical memory, all it has to do is interrupt the other processor, wait for the signal that the data is loaded (and the page tables updated), then tell the first processor to continue. There's no need to restart any instructions. The rest of the time, your second processor can be running useful code (anything that can't fault).

This is a common misconception about demand paging systems... in practice, they are never built this way. Demand paging only gives a benefit if you can overlap other processing with paging.
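To put rough numbers on the overlap argument, here is a back-of-the-envelope sketch. The cycle counts are made-up assumptions chosen only to show the shape of the calculation, and `busy_fraction` is a hypothetical helper, not a measurement of any real system.

```python
def busy_fraction(compute, faults, service, other_runnable_work):
    """Fraction of wall-clock time the CPU does useful work, stalled vs overlapped."""
    stall_time = faults * service
    wall_clock = compute + stall_time
    # If the CPU is held during paging, only the faulting job's own work counts.
    stalled = compute / wall_clock
    # If another process can run during paging, that time is partly recovered.
    recovered = min(stall_time, other_runnable_work)
    overlapped = (compute + recovered) / wall_clock
    return stalled, overlapped


stalled, overlapped = busy_fraction(
    compute=1000, faults=10, service=100, other_runnable_work=800
)
```

With these assumed figures the CPU is busy half the time when it is simply held during each fault, and 90% of the time when other work fills the paging delay.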

Rasz · « **Reply #89 on:** October 08, 2015, 03:59:29 pm »

Quote from: DJohn on October 08, 2015, 11:20:18 am

The 68K will sit waiting for ever if you don't assert /DTACK. When the MMU sees a request for a page that isn't in physical memory, all it has to do is interrupt the other processor, wait for the signal that the data is loaded (and the page tables updated), then tell the first processor to continue. There's no need to restart any instructions. The rest of the time, your second processor can be running useful code (anything that can't fault).

You are in the middle of an instruction; AFAIK you can't hold the CPU there (maybe you can with static variants?), so the load will fail and corrupt.

grumpydoc · « **Reply #90 on:** October 08, 2015, 05:22:35 pm »

Quote

You are in the middle of an instruction; AFAIK you can't hold the CPU there (maybe you can with static variants?), so the load will fail and corrupt.

It should be fine - by not asserting DTACK the CPU will insert wait states, so it isn't halted or being held without a running clock. It's just waiting for the memory operation to complete.

ale500 · « **Reply #91 on:** October 12, 2015, 08:44:10 am »

Going back to the superscalar 68000 topic... if someone (else) wants to (also) undertake this, one possibility would be to make a code-morphing hybrid core/software package. That would also be quite an undertaking... I'm tempted to do exactly that, but for the simpler (though not by much) 6809, just as an exercise, because I think there is not really much point wasting time on something this old and not that widely used. Or is there?

legacy · « **Reply #92 on:** October 12, 2015, 08:40:18 pm »

Quote from: ale500 on October 12, 2015, 08:44:10 am

code-morphing hybrid core/software package

can you explain it ?

ale500 · « **Reply #93 on:** October 13, 2015, 05:32:28 am »

You have a piece of software that converts a stream of 68k instructions into a stream of your super-fast-but-really-simple-processor instructions. Your simple super-fast core can natively do in a few instructions what the 68k does in one, but faster.

There are a couple of possibilities regarding for instance flags calculation, they don't need to be calculated every time if they are not used. Such a system has a bit of latency but once going the throughput should be good...
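The flags idea can be sketched as a backward pass over one straight-line block. This is a toy: the instruction names, the micro-op tuples and the `translate` function are illustrative assumptions, not the format of any real translator, and the toy treats the flags as a single unit rather than tracking each condition code separately.

```python
FLAG_WRITERS = {"add", "sub", "cmp"}   # toy ops that set the condition codes
FLAG_READERS = {"beq", "bne"}          # toy ops that consume them


def translate(block):
    """Translate a block, emitting flag micro-ops only where the flags are live."""
    out = []
    flags_live = True   # conservative: code after the block may read the flags
    for op in reversed(block):
        name = op[0]
        if name in FLAG_READERS:
            out.append(("branch_on_flags",) + op[1:])
            flags_live = True
        elif name in FLAG_WRITERS:
            if flags_live:
                out.append(("set_flags", name) + op[1:])
            if name != "cmp":                 # cmp produces flags only
                out.append(("alu", name) + op[1:])
            flags_live = False                # earlier flag values are overwritten
        else:
            out.append(("micro",) + op)
    out.reverse()                             # restore program order
    return out


block = [("add", "d0", "d1"), ("sub", "d2", "d1"), ("cmp", "d1", "d3"), ("beq", "L1")]
micro_ops = translate(block)
```

For this block only the `cmp` feeding the branch keeps its flag computation; the `add` and `sub` become plain ALU micro-ops because their flags are overwritten before anything reads them.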

richardman · « **Reply #94 on:** October 13, 2015, 09:40:55 am »

Quote from: ale500 on October 13, 2015, 05:32:28 am

You have a piece of software that converts a stream of 68k instructions into a stream of your super-fast-but-really-simple-processor instructions. Your simple super-fast core can natively do in a few instructions what the 68k does in one, but faster.

There are a couple of possibilities regarding for instance flags calculation, they don't need to be calculated every time if they are not used. Such a system has a bit of latency but once going the throughput should be good...

x86 has done this since the late 90s. IMHO, it's not worth it for any new design, as there isn't a whole lot of existing 68K code anymore (except for old Amiga, Mac, ST etc.). So if you are going to do a fast 68K, then do a fast 68K. If you want to do a fast RISC, do a fast RISC.

Transmeta tried that too, RIP

IanP · « **Reply #95 on:** February 19, 2016, 04:51:51 am »

Just a little update on the progress of the Apollo core and Vampire 2. The Vampire 2 production board for the Amiga 600 began shipping 1 month ago (18th January 2016) to those who pre-ordered it. The boards are shipping with the Silver 1 version of the core, described as "stable and fast" but not guaranteed 100% bug free. The Silver 2 version is expected to be released in the next few days. Vampire 2 users will be able to update to the Silver 2 core using an easy software procedure on the Amiga to field program the new core (no need to disassemble the computer). Silver 2 will fix some issues and likely boost performance a little. Next month is the target for production of the Vampire 2 for the Amiga 500 to begin, pending successful tests of the prototype boards. The specifications for the Amiga 500 version of the Vampire 2 are the same as the Amiga 600 version apart from the addition of a 44-pin IDE connector (like on the Amiga 600 and A1200 motherboards). Although targeted at the popular A500/A500+, this version of the Vampire 2 should also work in the original A1000, the A2000 and its variants, and possibly the CDTV. It is hoped that the Gold version of the Apollo core (including new features) will be ready in time to be installed prior to the first shipments of the Vampire 500 V2.

Modern Vintage Gamer reviewed and tested the Vampire 600 V2 with the Silver 1 Apollo core.

Lots more information available on the Apollo forums http://www.apollo-core.com/knowledge.php?b=0 where you'll find links to more videos and discussions about the Apollo core and the Vampire boards.

legacy · « **Reply #96 on:** February 19, 2016, 12:16:37 pm »

interesting

richardman · « **Reply #97 on:** February 19, 2016, 08:52:03 pm »

IMPRESSIVE!!!

sleary78 · « **Reply #98 on:** September 02, 2016, 01:13:43 pm »

So is the Apollo core available as verilog/VHDL or is it just an unlicensed rip off of the 68K series?

edavid · « **Reply #99 on:** September 02, 2016, 02:56:17 pm »

Quote from: sleary78 on September 02, 2016, 01:13:43 pm

So is the Apollo core available as verilog/VHDL or is it just an unlicensed rip off of the 68K series?

What a weird question.

It's a softcore, but it doesn't seem to be available unless you buy the "Vampire" board.

Why would you call it a "rip off"? Since when do you need a license to build a CPU emulator? What kind of license are you even talking about?

