EEVblog Electronics Community Forum

Electronics => Microcontrollers => Topic started by: obiwanjacobi on February 19, 2016, 06:38:12 am

Title: Mill CPU Architecture
Post by: obiwanjacobi on February 19, 2016, 06:38:12 am
Ran into this a couple days ago - but it is a couple of years old.

A new CPU architecture that, by the looks of it, is trying to remedy some of the shortcomings in modern CPUs.

ootbcomp.com (http://millcomputing.com/docs/)

YouTube (https://www.youtube.com/channel/UCKdGg6hZoUYnjyRUb08Kjbg/videos)

Having watched 3 videos I find this stuff fascinating.

 :popcorn:
Title: Re: Mill CPU Architecture
Post by: HAL-42b on February 19, 2016, 06:46:03 am
I remember it was a kind of stream processor. Interesting indeed.
Title: Re: Mill CPU Architecture
Post by: ataradov on February 19, 2016, 06:48:19 am
Yeah, I've been tracking it for some time. The Gandalf presentations are fun to watch, but the architecture itself is very questionable.

There have been a number of similar architectures designed over the years, mostly by researchers, with no implementation in hardware. The Counterflow Pipeline is a good example of one, but there are many others.

The Mill will have a ton of problems scaling up to the level of modern processors. It will have huge problems with things like caches, virtual memory, memory protection, and virtualization.
Title: Re: Mill CPU Architecture
Post by: andersm on February 19, 2016, 07:21:32 am
I'll wait until there's actual silicon that can be tested against real-world problems. So far the most impressive thing about the Mill is their PR machine.
Title: Re: Mill CPU Architecture
Post by: obiwanjacobi on February 22, 2016, 07:08:04 am
I don't care if they ever build it. I like thinking about this stuff and I love hearing about other people's ideas.  :blah:

(just had a nice idea myself but not sure if it can be done   8) )
Title: Re: Mill CPU Architecture
Post by: HAL-42b on February 22, 2016, 08:19:10 am
We have been stuck with the von Neumann architecture for way too long, and this has been very detrimental in some aspects.

Most importantly, security is impossible on a von Neumann machine because everything is in the same memory space; therefore the machine can't keep itself secure from itself.

Turing and von Neumann knew the security implications of this; as a matter of fact, they knew everything there was to know about information theory, but none of that was important in their time.

My theory is that at a later date somebody decided that this situation should be perpetuated for consumer equipment.

I don't think a non-von Neumann architecture would have been allowed to proliferate in the consumer market. That decision was probably made at somebody's desk at the Pentagon sometime in the '60s or '70s, probably after analyzing von Neumann's report on the architecture.

He did not invent that architecture, btw; he was merely reporting to the big brass about the work of Eckert and Mauchly, who were building ENIAC. Von Neumann was the highest-vetted computer and code expert at the time, having been on the American side of the Enigma work. He was probably submitting daily reports about the work, including security and vulnerability analysis of the machine and the people working on it.

Until recently it was kept secret, but most of the Bombe code-crunching work was actually done in America under von Neumann's supervision. The data was relayed by undersea cables each day, and there were far more Bombe machines in America than there were at Bletchley. Nobody was allowed to know.

Turing, in comparison, on the British side, was not well indoctrinated and not at all trusted. As soon as the war ended they left him to gather dust at a desk as far away from the military as they could. Later, when his homosexuality became public knowledge, he became too big of a security liability even in that position. What if the Russians learned about it and blackmailed him into subversion?

Title: Re: Mill CPU Architecture
Post by: ataradov on February 22, 2016, 08:25:16 am
Most importantly, security is impossible on a von Neumann machine because everything is in the same memory space; therefore the machine can't keep itself secure from itself.
This is absolutely not true. Today it is hard to find even a simple [ARM] micro without a memory protection unit. It is typically not very capable, but good enough for most applications.

Higher-end MPUs and CPUs have very good memory protection mechanisms. Go ahead and try to crash something other than your own application under any modern OS.
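For anyone unfamiliar with what an MPU actually buys you, here is a toy sketch of the idea: a default-deny access check against a handful of regions with per-region permissions. This is purely illustrative; real MPUs (e.g. the ARMv7-M one) work through region registers and fault on violation, not a lookup loop.

```python
# Toy model of an MPU-style access check (illustrative only).

class ToyMPU:
    def __init__(self):
        self.regions = []  # (start, end, allowed_ops) tuples

    def add_region(self, start, end, ops):
        """ops is a string drawn from 'r' (read), 'w' (write), 'x' (execute)."""
        self.regions.append((start, end, set(ops)))

    def check(self, addr, op):
        """Return True if `op` is allowed at `addr`; default-deny otherwise."""
        for start, end, ops in self.regions:
            if start <= addr < end and op in ops:
                return True
        return False  # no matching region grants it: access fault

mpu = ToyMPU()
mpu.add_region(0x0800_0000, 0x0810_0000, 'rx')  # flash: read/execute only
mpu.add_region(0x2000_0000, 0x2002_0000, 'rw')  # RAM: read/write, no execute

assert mpu.check(0x0800_1000, 'x')      # executing from flash: allowed
assert not mpu.check(0x2000_1000, 'x')  # executing from RAM: blocked
```

Even this crude model is enough to stop the classic "execute shellcode from a data buffer" class of attack, which is the point: the protection is a property of the memory map, not of the (von Neumann) instruction/data unification.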

It is like saying C is not secure, so let's all use LISP. There is natural selection, and nobody wants LISP, however good it is at calculating a factorial.

The same goes for MCUs. What we have is probably not the best, but it is most definitely good enough. And any proposal that claims to solve all the problems at once is bound to fail.
Title: Re: Mill CPU Architecture
Post by: obiwanjacobi on February 22, 2016, 08:58:48 am
There is a video on Mill Security which I found very interesting - not that I am an expert in any of this...
Title: Re: Mill CPU Architecture
Post by: HAL-42b on February 22, 2016, 09:14:36 am
This is absolutely not true. Today it is hard to find even a simple [ARM] micro without a memory protection unit. It is typically not very capable, but good enough for most applications.

Higher-end MPUs and CPUs have very good memory protection mechanisms. Go ahead and try to crash something other than your own application under any modern OS.

It is like saying C is not secure, so let's all use LISP. There is natural selection, and nobody wants LISP, however good it is at calculating a factorial.

The same goes for MCUs. What we have is probably not the best, but it is most definitely good enough. And any proposal that claims to solve all the problems at once is bound to fail.

This has been proven wrong many times and it is also not very relevant in an age where hardware comes backdoored from the factory. Memory protection is at best a hack.

Turing and von Neumann could have told you, even back then, the properties of a secure cryptographic or computational system. These properties are fundamental, and if they are not observed, no subsequent hack can remedy them.

1 - The ciphertext, the key, and the cleartext should never, ever be stored in the same place.
2 - The only place these three come together must be the cryptographic engine,
3 - which should have no persistent memory and should not be Turing-complete. It should be a finite state machine with only enough states to let it do its job.

From this you can infer that no computer is fit for cryptography or security work because it is Turing-complete and also has memory.

You can extend this to imply that storing code and data on a Turing-complete machine is also fundamentally insecure. A Turing-complete system is incapable of keeping secrets from itself. You can't change this fact by merely writing more code for the Turing machine to compute.

The Harvard architecture mitigates that to an extent, but the mere fact that it is not the dominant architecture on the market today indicates that there are other, non-scientific factors at work.

This also means that even with the Harvard architecture you still cannot store the above three elements on the same machine.

Btw, has anyone counted how many separate Turing-complete systems there are in a computer?
Title: Re: Mill CPU Architecture
Post by: tggzzz on February 22, 2016, 09:16:43 am
Have a look at the comp.arch archives. Ivan Godard has been very active discussing the reasons and consequences with a group of very experienced and knowledgeable people. His answers are good and solid, but that won't be the determining factor in the Mill's success or failure.

Information has only been disclosed in dribs and drabs, after the relevant patents have been filed.

My opinion: very interesting, but we will see if it is ever built and used in anger. I would expect some of the patents to be licensed to manufacturers.
Title: Re: Mill CPU Architecture
Post by: coppice on February 22, 2016, 09:33:39 am
Search on YouTube for Ivan Godard and you will find hours of material from him on the Mill architecture, including a series of talks going through things in fine detail.
Title: Re: Mill CPU Architecture
Post by: Bruce Abbott on February 22, 2016, 04:01:40 pm
This has been proven wrong many times and it is also not very relevant in an age where hardware comes backdoored from the factory.
Actually it is relevant because it shows that CPU architecture is not the problem.

Quote
Memory protection is at best a hack.
How so?

Quote
From this you can infer that no computer is fit for cryptography or security work because it is Turing-complete and also has memory.
Now you are talking about a computer system, not the CPU.

Title: Re: Mill CPU Architecture
Post by: ataradov on February 22, 2016, 05:08:04 pm
Btw, has anyone counted how many separate Turing-complete systems there are in a computer?
I'm not arguing against security, I'm arguing for sanity.

There is no way I'm going back to programming anything with a Harvard architecture, and neither are 99.9% of programmers. That's just too painful and impractical.

It is not an all-or-nothing situation; there must be a combined approach. Which is what we have right now, and I really see no downsides.

One more thing on the Mill - it can't handle interrupts efficiently without extra hardware. The Mill heavily relies on the compiler keeping track of how many things are stored on the belt at any moment. Interrupts introduce uncertainty into that process.
Title: Re: Mill CPU Architecture
Post by: HAL-42b on February 22, 2016, 05:12:13 pm
Actually it is relevant because it shows that CPU architecture is not the problem.


Quote
Now you are talking about a computer system, not the CPU.

If you insist on making that distinction... On my part I don't know how to take an existing CPU and design a different architecture around it, so the point of the exercise  is lost on me.

You must have noticed that I don't even distinguish between computers and cryptographic systems. From the information theory perspective they are governed by the same laws.


Quote
Quote
Memory protection is at best a hack.
How so?

The Blue Pill attack, first demonstrated in 2006 by Joanna Rutkowska, is an attack where a complete system running on the bare hardware is transferred to a virtual environment without the system realizing it. It renders memory protection completely irrelevant.

This is only one in a long list of attacks that defeat memory protection. All in all, leaving such a fundamental security task to the operating system is foolish, a hack at best, and completely inexplicable from a security standpoint.

This is like leaving prison security to the filing clerk. It would work 100% if all prisoners were honest...

Title: Re: Mill CPU Architecture
Post by: ataradov on February 22, 2016, 05:16:02 pm
The Blue Pill attack, first demonstrated in 2006 by Joanna Rutkowska, is an attack where a complete system running on the bare hardware is transferred to a virtual environment without the system realizing it. It renders memory protection completely irrelevant.
We are talking about different things. If x86 were Harvard, it would still be possible to get an image of the entire system and put it into an emulator. How does that solve anything?

You need a real hardware authentication method. That's what UEFI and Secure Boot are for.
Title: Re: Mill CPU Architecture
Post by: c4757p on February 22, 2016, 05:25:07 pm
I think it is just very clear that HAL-42b does not work in, around, or anywhere near computer security in any sense of the phrase. |O
Title: Re: Mill CPU Architecture
Post by: HAL-42b on February 22, 2016, 05:27:41 pm
There is no way I'm going back to programming anything with a Harvard architecture, and neither are 99.9% of programmers. That's just too painful and impractical.

It is not an all-or-nothing situation; there must be a combined approach. Which is what we have right now, and I really see no downsides.

One more thing on the Mill - it can't handle interrupts efficiently without extra hardware. The Mill heavily relies on the compiler keeping track of how many things are stored on the belt at any moment. Interrupts introduce uncertainty into that process.

Programming implies a Turing-complete machine and language. What I'm saying is that if you want guaranteed security you should stay away from Turing-complete machines. So no programming necessary  ;D

You can let your computer handle the ciphertext, but the cleartext, the key, and the crypto algorithm should reside outside of it, on a non-Turing-complete device with minimal capabilities. Something like the KYK-13 key storage device.
Title: Re: Mill CPU Architecture
Post by: c4757p on February 22, 2016, 05:31:55 pm
Non-programmable, simplistic machines have never been compromised. Nope! Never. Not once in the entire history of security. In fact, I wouldn't have any relevant, specific experience with that! Nope. Not one bit. ^-^
Title: Re: Mill CPU Architecture
Post by: HAL-42b on February 22, 2016, 05:37:56 pm
That's what UEFI and Secure Boot are for.

Those are not end-user verifiable. They are good for Microsoft but don't do much for you as an end user. Coreboot, maybe.
Title: Re: Mill CPU Architecture
Post by: ataradov on February 22, 2016, 05:51:52 pm
You can let your computer handle the ciphertext, but the cleartext, the key, and the crypto algorithm should reside outside of it, on a non-Turing-complete device with minimal capabilities. Something like the KYK-13 key storage device.
Ok, but how does bashing the von Neumann architecture come into play here? There are plenty of other security devices like that; just use them with whatever architecture you like. The Mill does not solve any of this on its own and will still need external security devices, which is fine.

Title: Re: Mill CPU Architecture
Post by: tggzzz on February 22, 2016, 05:55:04 pm
One more thing on the Mill - it can't handle interrupts efficiently without extra hardware.

Do you have a reference for that?

Many things about the Mill are NYF (not yet filed), and I haven't been keeping up recently. As we all know, absence of evidence is not evidence of absence.
Title: Re: Mill CPU Architecture
Post by: ataradov on February 22, 2016, 05:59:27 pm
Do you have a reference for that?
I watched all of his lectures, and that statement is based purely on my understanding of the architecture. It is possible, of course, that they have thought this through, but I feel like they really did not, and interrupts will be implemented as a hack with a separate belt, which will mean very inefficient communication between the main belt and the interrupt belt.

They are not really sure how virtualization and context switching will work; that is clear from the Q&A. He just did not have a good answer for any of this.

And you can stretch this "we have cool stuff, but patents are not filed yet" story only so far before you either need to start giving answers or stop promoting your stuff until you have filed for patents.

Right now it looks like "we don't know yet, but we'll think about it, file a patent, and tell you when it's done".
Title: Re: Mill CPU Architecture
Post by: legacy on February 22, 2016, 06:19:48 pm
If I should ever design my own CPU, the most useful instruction would be Halt and Catch Fire (HCF, 0x6666), just to have funnier talks when people ask WTF is the reason for that  ;D
Title: Re: Mill CPU Architecture
Post by: legacy on February 22, 2016, 06:25:16 pm
(three characters (https://en.wikipedia.org/wiki/Halt_and_Catch_Fire) typed behind the humor;
in one video, Ivan Godard explained the euphemism :D)
Title: Re: Mill CPU Architecture
Post by: HAL-42b on February 22, 2016, 06:47:47 pm
Ok, but how does bashing the von Neumann architecture come into play here? There are plenty of other security devices like that; just use them with whatever architecture you like. The Mill does not solve any of this on its own and will still need external security devices, which is fine.

I'm not defending the Mill architecture. To me it is just a curiosity. The Harvard architecture, on the other hand, is a marked improvement over the von Neumann in terms of security.

The von Neumann architecture has been tested in the real world, and every aspect of it has been defeated comprehensively. It was built without any consideration for security, and I assert this was known to its creators right from the start. At the time this was not important. A computer only did computations, and as long as the computations were correct there was no reason for concern. This has been perpetuated by the industry ever since, through the advent of the personal computer, into the Internet age, and to this day. Security was left to the operating system and then to third parties. You couldn't make it any worse if you tried.
Title: Re: Mill CPU Architecture
Post by: hamster_nz on February 22, 2016, 10:52:33 pm
The Harvard architecture, on the other hand, is a marked improvement over the von Neumann in terms of security.

A bit of an overgeneralization? Most von Neumann CPUs that are able to run an OS with virtual memory implement a lot of the "obvious" security enhancements using MMU features (read-only pages, no-execute flags...).
Title: Re: Mill CPU Architecture
Post by: Bruce Abbott on February 22, 2016, 11:22:37 pm
Quote
Now you are talking about a computer system, not the CPU.
If you insist on making that distinction... On my part I don't know how to take an existing CPU and design a different architecture around it, so the point of the exercise  is lost on me.
Because the design of the system has a huge impact on security - far more than what memory map the CPU uses.

Quote
The Blue Pill attack, first demonstrated in 2006 by Joanna Rutkowska, is an attack where a complete system running on the bare hardware is transferred to a virtual environment without the system realizing it. It renders memory protection completely irrelevant.
Any system can be compromised if you are able to emulate it exactly - but that doesn't make memory protection a hack.

You know what are hacks? Cache RAM, pipelining, out-of-order instruction execution, branch prediction, superscalar execution. All hacks that were added to make up for memory not being as fast as we want.

Quote
This is only one in a long list of attacks that defeat memory protection. All in all leaving such a fundamental security task to the operating system is foolish, a hack at best, and completely inexplicable from security standpoint.
The primary purpose of memory protection is to prevent an errant program from trashing memory that doesn't belong to it. How easily it can be defeated depends on how it is implemented. If it is controlled by an insecure operating system then that is the fault of the operating system, not memory protection.

Quote
This is like leaving prison security to the filing clerk. It would work 100% if all prisoners were honest...
Most prisons have high walls topped with barbed wire.  But prisoners will still find a way over them if not stopped - proving that the walls are a foolish hack...
Title: Re: Mill CPU Architecture
Post by: obiwanjacobi on February 26, 2016, 06:34:53 am
Here is a video on Mill security. Again, I am no expert, but it did sound clever to me  ::)

https://www.youtube.com/watch?v=5osiYZV8n3U (https://www.youtube.com/watch?v=5osiYZV8n3U)
Title: Re: Mill CPU Architecture
Post by: Mechanical Menace on February 26, 2016, 10:57:26 am
The Harvard architecture, on the other hand, is a marked improvement over the von Neumann in terms of security.

Implementing security is always a balancing act with keeping something usable. How do you implement self-modifying code on a pure Harvard architecture machine? Or load a program when you can't treat code as data? The answer? Modified Harvard*, which comes with all the same security problems as von Neumann. Pure Harvard is just terrible for a general-purpose computer.

*Most of the time. Some MCUs and DSPs just have special load/store program memory instructions, but they aren't general purpose.
Title: Re: Mill CPU Architecture
Post by: Schol-R-LEA on May 09, 2017, 05:10:52 am
Do you have a reference for that?
I watched all of his lectures, and that statement is based purely on my understanding of the architecture. It is possible, of course, that they have thought this through, but I feel like they really did not, and interrupts will be implemented as a hack with a separate belt, which will mean very inefficient communication between the main belt and the interrupt belt.

I apologize for the very, very late response, but I thought that a bit of thread necromancy was called for, as I don't know if you ever found the answer to this matter.

Note that I am myself still unsure if the Mill is ever going to be a working system, or if it will even come close to the promoted performance if it does (probably not), but frankly (as Godard says himself many times) these things are only the bare minimum of a successful product in any case. Circumstance, happenstance, and marketing are far more important no matter what, for any product.

That having been said, you are correct in that it would use a second belt for interrupts, as is explained in the second of Godard's lectures. Indeed, this is the case for any procedure call - and  in the Mill design, an interrupt is simply a procedure call initiated externally.

However, this is a misleading answer, and the way it is misleading is directly related to the question you raise. The question presupposes that the belt is a single fixed sequence of registers treated as a FIFO queue - which is the way it seems to the programming model, but is not, in fact, the way it is implemented in any of the planned designs.

Note that I said 'designs'. Godard makes this point repeatedly in the videos, that planned Mill CPUs would be a family, not in the sense that the x86 is, with a single binary execution model and a single basic hardware implementation (which might change over time, but would be the same for a given generation), but a family in the sense that the System/360 was - a single programming model, and a (mostly) common assembly language for compilers to target, but with different concrete instruction sets (they intend to use the same kind of 'assembly specializer' IBM used, in fact) and hardware implementations that could be radically different.

He gives one example approach to implementing the belt, one which he says is one they do mean to use in some models but which would not be universal. The layout he described is a large anonymous register file and a pointer to the current head of the belt; the belt would be operated on as a ring buffer. As elements are added to the belt, the belt head (which is entirely inaccessible to the program, being part of the CPU's internal state) would advance, and the results would be added to the top of the belt while the bottom of the belt is cleared (to prevent insecure peeking, though that really shouldn't be possible anyway unless there is a flaw in the hardware - there is no way to access the parts of the belt register file outside of the section currently in use by the belt).
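To make that layout concrete, here is a toy software model of a belt implemented as a ring buffer over an anonymous register file with a hidden head pointer. This is my own sketch from the description above; the sizes and the clearing policy are assumptions, and real hardware would of course do this with a register file, not a Python list.

```python
# Toy model of the belt-as-ring-buffer implementation described above.
# BELT_LEN and FILE_LEN are made-up numbers, not any actual Mill config.

BELT_LEN = 8    # architecturally visible belt length (assumed)
FILE_LEN = 32   # size of the anonymous register file (assumed)

class Belt:
    def __init__(self):
        self.regs = [None] * FILE_LEN
        self.head = 0  # internal CPU state, invisible to the program

    def drop(self, value):
        """A new result 'drops' onto the front of the belt."""
        self.head = (self.head + 1) % FILE_LEN
        self.regs[self.head] = value
        # clear the slot falling off the back, so stale data can't be peeked
        self.regs[(self.head - BELT_LEN) % FILE_LEN] = None

    def read(self, pos):
        """Belt operand b0 is the newest drop, b1 the one before, etc."""
        assert 0 <= pos < BELT_LEN
        return self.regs[(self.head - pos) % FILE_LEN]

b = Belt()
b.drop(3)
b.drop(4)
b.drop(b.read(0) + b.read(1))  # an 'add' of the two newest values
assert b.read(0) == 7          # the sum is now the newest belt entry
```

Note how nothing ever moves: dropping a value only advances the hidden head, which is why the belt can behave like a shift queue to the programmer without any actual shifting in hardware.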

There is a separate, sequestered stack for procedure return addresses. Procedure arguments are to be placed onto the belt; any values that still need to be kept by the function but would go off of the belt would be explicitly spilled to a scratchpad area (which is the case for data that would fall off the belt before it is 'dead' in general, not something specific to procedures). If a nested procedure (assuming that you are using them, which in turn means it depends on your using a language which supports nested procedures with lexical scope) needs to use a value in the caller's scope, it is up to the compiler to keep track of where it is on the belt (or in the scratchpad).

When a procedure call occurs, the belt head is advanced one 'belt length' past the current procedure's belt. When the procedure exits - well, I am not entirely sure if I've got this right, but my understanding is that the compiler is meant to arrange the return values so that they are at the end of the belt, and the previous procedure's values are then copied to the remainder of the belt in hardware (and the residual values in the previous belt cleared) as part of the return op. The important thing here is that the return values are spliced to the values at the top of the caller's belt.
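Here is a toy sketch of that call/return behaviour as I understand it: the callee starts on a fresh belt seeded with its arguments, and the return op splices the results onto the front of the caller's belt. The splice mechanics and the fixed belt length are my reading of the talks, not a confirmed design; belts are modelled as plain lists (index 0 = newest) for clarity.

```python
# Toy model of Mill-style call/return. All sizing is assumed.

BELT_LEN = 8  # assumed visible belt length

def new_belt(values=()):
    """Make a belt with `values` at the front (index 0 = newest)."""
    b = list(values) + [None] * (BELT_LEN - len(values))
    return b[:BELT_LEN]

def call(caller_belt, args, callee):
    """Run `callee` on a fresh belt holding only its arguments."""
    callee_belt = new_belt(args)
    results = callee(callee_belt)
    # return op: splice the results onto the front of the caller's belt,
    # pushing the caller's older values toward the back
    return new_belt(list(results) + caller_belt)

def add_two(belt):
    return [belt[0] + belt[1]]  # result of adding the two arguments

caller = new_belt([10, 20])
caller = call(caller, args=[3, 4], callee=add_two)
assert caller[0] == 7           # callee's result is now the newest entry
assert caller[1:3] == [10, 20]  # caller's own values survive behind it
```

The point of the model: the callee can never see the caller's belt (its fresh belt holds only the arguments), and the caller gets its own values back with the results spliced in front, which is what makes an externally initiated interrupt behave like just another call.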

A later video talks about how the sequencing of execution is used in this to chain successive calls, and thus provide hardware tail call optimization, but that's another issue entirely. Presumably, in this design, if the call depth exceeds the belt register file and wraps, the overwritten part  would be invisibly spilled to the scratchpad or something similar.

Interrupts are essentially just procedure calls in this model. When the interrupt occurs, the interrupt routine is looked up in a vector table as usual; however, I get the impression from the talks and the Mill Computing wiki that the vector table is yet another sequestered register file like the return address stack and the parts of the belt register file not currently in use, and that setting a vector requires a specific instruction. Since a hard interrupt wouldn't have a return value (I get the impression that there are no soft interrupts, and even if there are, they would again behave like a regular procedure), the merge step can be skipped, but this would be true of a void procedure anyway.

Now, I don't know enough about how this would actually perform, so I can't speak to its plausibility as a working model, but that should at least answer the question according to the available information.
Title: Re: Mill CPU Architecture
Post by: ataradov on May 09, 2017, 05:51:38 am
The primary problem with all this inaccessible state is that virtualization and threading are pretty much not going to work. Even if you make all those things accessible through some special registers, it would take a lot of time to save the context. This is exactly the same problem as with processors with a large explicit register file (like SPARC).
Title: Re: Mill CPU Architecture
Post by: tggzzz on May 09, 2017, 07:47:30 am
The primary problem with all this inaccessible state is that virtualization and threading are pretty much not going to work. Even if you make all those things accessible through some special registers, it would take a lot of time to save the context. This is exactly the same problem as with processors with a large explicit register file (like SPARC).

I've asked Godard that, and he has privately responded to me indicating why there isn't a vast amount of state to be saved during context switches.

One thing to consider with conventional out-of-order processors is that, while they don't need to save some in-flight computational state, they do need to re-compute it when they restart.

Overall my suspicion is that the patents will be more influential and financially important than the implementations, but I hope I am wrong.
Title: Re: Mill CPU Architecture
Post by: ataradov on May 09, 2017, 07:54:55 am
there isn't a vast amount of state to be saved during context switches.
Don't you actually need to save the whole belt contents?
No matter how ingenious your system is, you need to save the whole state; there is no working around that. I find that hand-waving hard to believe without some evidence.

One thing to consider with conventional out-of-order processors is that, while they don't need to save some in-flight computational state, they do need to re-compute it when they restart.
It does not matter; the same exact thing will happen here. Things on the belt are state that has been committed. There is no recomputing that.
Title: Re: Mill CPU Architecture
Post by: obiwanjacobi on May 09, 2017, 11:27:09 am
You can only throw away state that is still in the pipeline - just like with a branch. (at least that is how I understand it for classic pipelined CPUs)
Title: Re: Mill CPU Architecture
Post by: tggzzz on May 09, 2017, 11:47:37 am
there isn't a vast amount of state to be saved during context switches.
Don't you actually need to save the whole belt contents?
No matter how ingenious your system is, you need to save the whole state; there is no working around that. I find that hand-waving hard to believe without some evidence.

Clearly not; for example you don't have to save the state in any of the many caches in a modern high-speed processor!

Quote
One thing to consider with conventional out-of-order processors is that, while they don't need to save some in-flight computational state, they do need to re-compute it when they restart.
It does not matter; the same exact thing will happen here. Things on the belt are state that has been committed. There is no recomputing that.

In this context the belt is roughly equivalent to the architected registers in a conventional processor.  They are only a small part of the state in a processor; much of the state is "hidden" from the ABI, changes between processor implementations, and is usually undocumented. The claim is that that state is much smaller, so context switches can be faster.

N.B. I have only been watching the Mill from a distance. All I know is that it is radical, interesting, and has apparently avoided the blind alleys found in conventional architectures.

If you want more information, I suggest you look at comp.arch, where Ivan Godard frequently discusses and explains the Mill. He also frequently says "that patent is not yet filed"!
Title: Re: Mill CPU Architecture
Post by: Schol-R-LEA on May 09, 2017, 01:29:32 pm
This is both a strength and weakness of the whole 'family' narrative, both regarding the design itself and regarding the promotion of the project.

Keeping the ABI and API relatively abstract eases a lot of the pressures both for development and for future iteration, because it means that they aren't tied to a specific implementation and can redesign without adding as many more translation layers and other workarounds (something that has some bearing on, say, the x86). It is the hardware design equivalent of loose coupling. It also means that a dead end can be backed out of quickly and relatively painlessly. However, it also means that the designers won't ever have an endpoint for iterating the design - it makes it so that they can always say, "well, this didn't work out, but the next implementation will work, for sure!" This isn't a setup that instills confidence in the investors or outside observers.

It makes it hard to pin down the answers, especially when they can hide behind the argument that the patents aren't all processed (meaning they can avoid revealing hard answers by claiming IP rights concerns). This means that no one can gainsay their claims yet, but it also weakens their claims because it makes it look like handwaving. Whether this apparent evasiveness is actually justified hardly matters when they can always say, "yeah, but that's just one possible implementation".

While I have high hopes for the architecture, they are more of the "wouldn't it be great if they were right?" sort of thing than a "I just know this is going to be great!" variety. New development is inherently risky, both in terms of technical viability and market success, and even a workable and effective product that is measurably superior still has a long road to go before it can be described as a successful product. The one thing I will give Godard is that he does seem to be aware of this, and speaks of it many times in interviews and lectures, which indicates that he at least knows how to sound realistic about the prospects which the design (or any other new one) has in the market. Whether that is itself just a tactical move is yet to be seen.

And yes, I agree with tggzzz that, if the individual technologies going into the Mill prove viable (even if the whole does not), then there is a good chance that the patents will have a greater impact than the Mill itself.

Note that I said 'patents', not technologies; a concern with anything like this is that, if the owners turn out to be less ethical than they portray themselves, or the company fails and ends up selling its IP, the patents could land in the hands of large corporations (or worse, a patent-troll law firm) and be used as a bludgeon against anything remotely similar. This is nothing new, though, and is just a part of innovation. IIUC, Watt is seen as the father of the steam engine less because of his improvements over the Newcomen design than because Newcomen refused to license his design - leading Watt (or rather, his engineers) to develop a newer model not covered by the patents - and because Watt was better at finding applications, markets, and customers for it (Newcomen apparently also refused to sell engines for anything he wasn't directly involved in, meaning his engines were only used for pumping in mines).
Title: Re: Mill CPU Architecture
Post by: ataradov on May 09, 2017, 04:11:09 pm
In this context the belt is roughly equivalent to the architected registers in a conventional processor.
Presumably the belt needs to be large to be useful. If it is roughly the same size as in modern processors (~16 registers), then I really don't see how this whole thing is different. You could write a weird register allocator for a normal compiler that would basically do what the belt does.

"that patent is not yet filed"!
He's been saying this for years about fairly basic stuff. I gave them the benefit of the doubt, but at this point I call BS. They may be milking investors, or something, but the useful output from them is nil. There is nothing to look at here until they are actually ready to release something.
Title: Re: Mill CPU Architecture
Post by: tggzzz on May 09, 2017, 04:22:16 pm
In this context the belt is roughly equivalent to the architected registers in a conventional processor.
Presumably the belt needs to be large to be useful. If it is roughly the same size as in modern processors (~16 registers), then I really don't see how this whole thing is different. You could write a weird register allocator for a normal compiler that would basically do what the belt does.

IIRC the number is implementation dependent, and is simply a parameter input into the tools that create the implementation and the toolchain.

If you understood the Mill, you would realise that the belt in combination with many other facets of the architecture enables the "DSP-like" issue rate. If you have registers, you lose important properties that enable the speedup; hence you can't "make a weird register allocator" to the same effect.


Quote
"that patent is not yet filed"!
He's been saying this for years about fairly basic stuff. I gave them the benefit of the doubt, but at this point I call BS. They may be milking investors, or something, but the useful output from them is nil. There is nothing to look at here until they are actually ready to release something.

I appreciate that you don't understand the Mill; neither you nor I have sufficient information either to call BS or to say it will work. However, for every topic they have chosen to disclose publicly, they have a very good story.
Title: Re: Mill CPU Architecture
Post by: ataradov on May 09, 2017, 04:28:59 pm
IIRC the number is implementation dependent, and is simply a parameter input into the tools that create the implementation and the toolchain.
Yes, but it is important for the discussion to have an approximate number, or at least the number they expect to be barely useful versus the number providing substantial benefit. But, "family", I know, I know...

If you understood the Mill, you would realise that the belt in combination with many other facets of the architecture enables the "DSP-like" issue rate.
That sounds like a religious argument. Issue rate is largely defined by the memory architecture (if we are talking about desktop-class processors) and by your ability to deliver instructions in time. How your core is organized matters too, but not as much. Even the fastest code will do nothing if memory is slow.

And with modern register-based CPUs there is a reasonable balance between the two. A significant improvement in one or the other will not do much to improve overall performance.

If you have registers, you lose important properties that enable the speedup; hence you can't "make a weird register allocator" to the same effect.
What properties?
Title: Re: Mill CPU Architecture
Post by: tggzzz on May 09, 2017, 04:55:17 pm
IIRC the number is implementation dependent, and is simply a parameter input into the tools that create the implementation and the toolchain.
Yes, but it is important for the discussion to have an approximate number, or at least the number they expect to be barely useful versus the number providing substantial benefit. But, "family", I know, I know...

If you understood the Mill, you would realise that the belt in combination with many other facets of the architecture enables the "DSP-like" issue rate.
That sounds like a religious argument. Issue rate is largely defined by the memory architecture (if we are talking about desktop-class processors) and by your ability to deliver instructions in time. How your core is organized matters too, but not as much. Even the fastest code will do nothing if memory is slow.

And with modern register-based CPUs there is a reasonable balance between the two. A significant improvement in one or the other will not do much to improve overall performance.

If you have registers, you lose important properties that enable the speedup; hence you can't "make a weird register allocator" to the same effect.
What properties?

All the answers to those questions are provided in the comp.arch archives, youtube videos, and the millcomputing website.

I have no intention of wasting my life to produce a poor and probably inaccurate summary of that information.
Title: Re: Mill CPU Architecture
Post by: free_electron on May 09, 2017, 05:29:17 pm
The Red Pill attack, first demonstrated in 2006 by Joanna Rutkowska, moves a complete system running on bare hardware into a virtual environment without the system realizing it. It renders memory protection completely irrelevant.
We are talking about different things. If x86 were Harvard, it would still be possible to take an image of the entire system and put it into an emulator. How does that solve anything?

You need a real hardware authentication method. That's what UEFI and Secure Boot are.
FUNNY. x86 is a modified Harvard machine with a uniform memory space. Intel started out making pure Harvard machines - the 8048, 8051 et al. were pure Harvard. ARM Cortex is also Harvard!
Title: Re: Mill CPU Architecture
Post by: tggzzz on May 10, 2017, 07:29:07 am
All the answers to those questions are provided in the comp.arch archives, youtube videos, and the millcomputing website.

FYI, for Godard's informative postings in comp.arch, see
https://groups.google.com/forum/#!searchin/comp.arch/Ivan$20Godard|sort:date
Title: Re: Mill CPU Architecture
Post by: hamster_nz on May 10, 2017, 09:19:54 am
I remember Itanium - another promising CPU 'family' that offered lots of promise, from some of the world's greatest minds. It was deeply pipelined, supported by big names (Intel, HP...), it took years to get to market, and when it did it had a few rough corners when it came to interrupts, traps, and dealing with memory latency.

If I remember correctly, the vendors also claimed that it was superior - the smarts were in the software toolchain, and it just needed better compilers to unleash the beast within.

No matter how much money was thrown at the project (and a lot of money was thrown at it) it just couldn't outperform a CPU that can schedule instructions dynamically at runtime as data becomes available.

Given the slow time to a viable product, the Mill must have some big technical issues slowing it down....
Title: Re: Mill CPU Architecture
Post by: tggzzz on May 10, 2017, 10:00:38 am
I remember Itanium - another promising CPU 'family' that offered lots of promise, from some of the world's greatest minds. It was deeply pipelined, supported by big names (Intel, HP...), it took years to get to market, and when it did it had a few rough corners when it came to interrupts, traps, and dealing with memory latency.

If I remember correctly, the vendors also claimed that it was superior - the smarts were in the software toolchain, and it just needed better compilers to unleash the beast within.

No matter how much money was thrown at the project (and a lot of money was thrown at it) it just couldn't outperform a CPU that can schedule instructions dynamically at runtime as data becomes available.

Given the slow time to a viable product, the Mill must have some big technical issues slowing it down....

Not quite. It was that there was insufficient parallelism that could be determined at compile time - a problem that had been investigated since the 1960s with little success. The Itanic programme was started, IIRC, in ~1990, and the first product was, again IIRC, in about 2000.

The Mill architects are well aware of why Itanic failed, and appear to have avoided those traps.

The time-to-market is not a reliable indicator of future success; there are too many other factors involved.  In particular, the engineering resources HP applied to the Itanic were vast; I suspect much larger than the Mill's resources.
Title: Re: Mill CPU Architecture
Post by: richardman on May 10, 2017, 09:14:35 pm
I was in the teams that worked on the original Itanium, and I was/am in the Mills development team, although I have never been super-active.

Arguably, Itanium lost because it went to Intel. It was a project too costly even for the then-behemoth HP to carry, especially because of the fab issues. Unfortunately, the bet was wrong, and Intel made mincemeat out of it. Note that the first Itanium's performance pretty much sucked, and only recovered somewhat when the HP-designed follow-ons came out. Unfortunately, HP's contributions stopped, and so it goes...

Whether the original HP-PA WW (Wide Word) would have the performance they promised, we will never know now.

As for the Mill, I am sorry to say that few of the people who have contributed to this thread understand what it is about. The information is out there, so to speak. Will it eventually be successful as a product? We shall see.
Title: Re: Mill CPU Architecture
Post by: ataradov on May 10, 2017, 09:22:06 pm
As for the Mill, I am sorry to say that few of the people who have contributed to this thread understand what it is about.
That comes from the lack of actual concise information. I watched all of his lectures; they are entertaining, but they are so spread out in time that it is hard to keep track of everything. The "family" thing does not help either. And no, I'm not going to read through years' worth of newsgroup archives to get bits and pieces of information.

If they really want to clarify things, they should write a short but complete reference manual of sorts. Then all the relevant information would be in one place. And once more than two people actually understand the benefits, the spreading of misinformation will stop.

Right now it looks like another Itanic.
Title: Re: Mill CPU Architecture
Post by: hamster_nz on May 11, 2017, 12:01:31 am
As for the Mill, I am sorry to say that few of the people who have contributed to this thread understand what it is about. The information is out there, so to speak.
Even obvious questions seem unanswered...

"So if the CPU is a family, each member with different capabilities, how can a compiler statically schedule instructions/operations when the exposed microarchitecture changes underneath it between family members?"

I'm guessing the answer would be a magic-wand wave like "Oh, we plan to add another layer of abstraction, which dynamically recompiles the executable to take advantage of the CPU it is running on - it's an easy-to-solve software problem" - pleasing to the software people in the audience (as hardware is a software problem from their perspective), but it doesn't answer the question.

Instruction parallelism also seems to be a magic wand. If playing around with FPGAs has taught me one thing, it is that some computations parallelize easily (either at a macro or micro level), and others are impossible. Take, for example, the (old, obsolete, broken) RC4 algorithm - https://en.wikipedia.org/wiki/RC4:

Code: [Select]
  i := 0
  j := 0
  while GeneratingOutput:
      i := (i + 1) mod 256
      j := (j + S[i]) mod 256
      swap values of S[i] and S[j]
      K := S[(S[i] + S[j]) mod 256]
      output K
  endwhile

Even in dedicated hardware, with multi-port memory, you can't generate more than one K value per cycle, every cycle. I gave it a try and quickly failed, and then found people have written papers on it - e.g. https://link.springer.com/chapter/10.1007/978-3-642-17401-8_24.

No matter how great your CPU or your software tools are, you can't extract non-existent instruction-level parallelism. Traditional CPUs address this by going multi-core, with hyperthreading and shared execution units. But I can't see how you can do that on the Mill, which seems to demand that you statically schedule everything at compile time.
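To make the serial dependence concrete, here is the PRGA pseudocode above as runnable Python, with the standard key-scheduling step (KSA) added so it is self-contained. Each output byte reads the S, i, and j left behind by the previous iteration, so iterations cannot be overlapped no matter how wide the machine is:

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    # KSA: initialize and scramble the permutation S with the key
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    # PRGA: the loop from the pseudocode above. Note the loop-carried
    # chain: each iteration's j and S depend on the previous
    # iteration's swap, so bytes come out strictly one at a time.
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]      # swap S[i] and S[j]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

# Known test vector from the Wikipedia article linked above:
# "Plaintext" under key "Key" encrypts to BBF316E8D940AF0AD3.
ks = rc4_keystream(b"Key", 9)
ct = bytes(k ^ p for k, p in zip(ks, b"Plaintext"))
```

Nothing Mill-specific here; it just shows that the critical path is one full loop iteration per output byte.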

It might be really good at some things, giving it a niche use-case like high-end DSPs, but it won't be great at most things.

And I have yet to see it demonstrated, not even running slowly in a high end FPGA.

Very much unlike RISC-V, where many cores can fit on a low-end FPGA, and I even have one in hard silicon on my bench.
Title: Re: Mill CPU Architecture
Post by: ataradov on May 11, 2017, 12:07:44 am
"So if the CPU is a family, each member with different capabilities, how can a compiler statically schedule instructions/operations when the exposed microarchitecture changes underneath it between family members?"
I read "family" as in the ARM Cortex family of architectures. So the compiler can expect specific hardware at compile time. Actual capabilities will change between family members, of course, but nobody promised direct code transfer between them.
Title: Re: Mill CPU Architecture
Post by: tggzzz on May 11, 2017, 06:30:26 am
As for the Mill, I am sorry to say that few of the people who have contributed to this thread understand what it is about.
That comes from the lack of actual concise information. I watched all of his lectures; they are entertaining, but they are so spread out in time that it is hard to keep track of everything. The "family" thing does not help either. And no, I'm not going to read through years' worth of newsgroup archives to get bits and pieces of information.

In which case it is also unrealistic to expect us to spend our time doing your research.

Quote
If they really want to clarify things, they should write a short but complete reference manual of sorts. Then all the relevant information would be in one place. And once more than two people actually understand the benefits, the spreading of misinformation will stop.

What makes you think it is in their interest to give you all such information? The IPR considerations alone mean they would be foolish to do that!

Anybody using that to infer that there is no such information, would be a fool.

Quote
Right now it looks like another Itanic.

And yet up above you say you don't have sufficient information to make such a judgement. That makes any such opinion not worth the paper it isn't written on.
Title: Re: Mill CPU Architecture
Post by: ataradov on May 11, 2017, 06:34:27 am
That makes any such opinion not worth the paper it isn't written on.
I personally don't care if Mill succeeds or fails, I have zero vested interest in it.

But public perception of a product is very important. If a significant number of people get something into their heads, it is very hard to convince them otherwise.
Title: Re: Mill CPU Architecture
Post by: richardman on May 11, 2017, 08:33:23 am
If you have concrete technical questions about the Mill, and are interested enough to write about them in a forum, I'd encourage you to contact the people involved, as it's unlikely anyone else would be able to answer them.

You may want to talk about the "right way" to do marketing, or how to introduce a new CPU architecture properly, etc. At least the answers you get here would be interesting. Technical questions, though, would be unlikely to get interesting, or at least technically correct, answers here.
Title: Re: Mill CPU Architecture
Post by: tggzzz on May 11, 2017, 08:52:20 am
That makes any such opinion not worth the paper it isn't written on.
I personally don't care if Mill succeeds or fails, I have zero vested interest in it.

That is clear, since apparently you have zero interest in bothering to find out about it!

Quote
But public perception of a product is very important. If a significant number of people get something into their heads, it is very hard to convince them otherwise.

I doubt you could be convinced, since there is little evidence to believe you are prepared to find out and understand what's already been publicised.

Richardman's post #50 is sane, and the public perception of your stance would be improved if you followed his suggestions.
Title: Re: Mill CPU Architecture
Post by: hamster_nz on May 11, 2017, 09:25:21 am
Wow... A nice weekend wolf pack being stirred up by somebody sharing their (valid) opinions in an open public space.

Calls of "the information is out there", with no links, count for nothing. The only recent information I can find is things like "raising funding to make a proof-of-concept FPGA prototype" (just like in 2013). Tumbleweeds are blowing through the forum....

Apparently the engineers are working for 'sweat equity', and 13 years has not been long enough to write enough HDL to prove the concept.

Money can't be the issue - the engineers are working for promises, and $1300 will get you 200,000 LUT6s and 400,000 flip-flops, plus plenty of RAM blocks - more than enough for a demo CPU - plus a toolset license.

From afar it looks pretty much like a cult! :D

Title: Re: Mill CPU Architecture
Post by: tggzzz on May 11, 2017, 09:49:52 am
Calls of "the information is out there", with no links, count for nothing.

Likewise, "I don't understand it and I can't be bothered to do basic research, but I'll make a statement anyway" counts for nothing.

I previously provided one link to informed statements, in reply #1. Here's another, which isn't exactly difficult to uncover: http://millcomputing.com/ (http://millcomputing.com/)

Quote
Money can't be the issue - the engineers are working for promises, and $1300 will get you 200,000 LUT6s and 400,000 flip-flops, plus plenty of RAM blocks - more than enough for a demo CPU - plus a toolset license.

A hardware implementation isn't the major problem, so buying a few FPGAs is a distraction from the difficult and novel issues: the toolchain and architecture.

Some people are happy to work for free; I would be, on a sufficiently interesting problem where I could make a decent contribution. In the past I've been known to turn down higher-paying jobs in favour of noticeably lower-paying ones because I preferred the work at the latter.

Quote
From afar it looks pretty much like a cult! :D

So does most technology in its infancy.

Title: Re: Mill CPU Architecture
Post by: Sal Ammoniac on May 11, 2017, 11:31:31 pm
Here is a video on Mill security. Again, I am no expert, but it did sound clever to me  ::)

https://www.youtube.com/watch?v=5osiYZV8n3U (https://www.youtube.com/watch?v=5osiYZV8n3U)

Is that Gandalf the White narrating it?
Title: Re: Mill CPU Architecture
Post by: TheOtherDave on July 10, 2017, 02:50:18 am
Even obvious questions seem unanswered...

"So if the CPU is a family, each member with different capabilities, how can a compiler statically schedule instructions/operations when the exposed microarchitecture changes underneath it between family members?"

I'm guessing the answer would be a magic-wand wave like "Oh, we plan to add another layer of abstraction, which dynamically recompiles the executable to take advantage of the CPU it is running on - it's an easy-to-solve software problem" - pleasing to the software people in the audience (as hardware is a software problem from their perspective), but it doesn't answer the question.

https://youtu.be/D7GDTZ45TRw

TL;DR: Software gets distributed in something like an IR or bytecode, which is then compiled for the specific CPU at install time.
Title: Re: Mill CPU Architecture
Post by: brucehoult on July 10, 2017, 12:37:57 pm
I remember Itanium - another promising CPU 'family' that offered lots of promise, from some of the world's greatest minds. It was deeply pipelined, supported by big names (Intel, HP...), it took years to get to market, and when it did it had a few rough corners when it came to interrupts, traps, and dealing with memory latency.

If I remember correctly, the vendors also claimed that it was superior - the smarts were in the software toolchain, and it just needed better compilers to unleash the beast within.

No matter how much money was thrown at the project (and a lot of money was thrown at it) it just couldn't outperform a CPU that can schedule instructions dynamically at runtime as data becomes available.

Given the slow time to a viable product, the Mill must have some big technical issues slowing it down....

Mostly I believe it's slow because it's a small team, largely working for sweat equity. Ivan invited me to join about three years ago, but I definitely needed something with a salary at the time. They have got some funding now, and can pay salaries, so that's progress.

They have quite a lot of software tools: generators for models of different sizes, simulators, a compiler. I'd have hoped that by now it would at least be possible to load small or medium Mill models into an FPGA. I don't *think* they've got to that stage yet.

On the other hand, that is something that takes 3 or 4 years for a new design even at huge companies like Intel or Samsung, with huge teams working on it.

Title: Re: Mill CPU Architecture
Post by: legacy on July 10, 2017, 01:22:06 pm
so, it's an open-source project which will take 4-5 years from now
can I get a full-time job there? at MIT? good salary?  :D
Title: Re: Mill CPU Architecture
Post by: tggzzz on July 10, 2017, 01:24:40 pm
so, it's an open-source project which will take 4-5 years from now
can I get a full-time job there? at MIT? good salary?  :D

What makes you think that?

For what it actually is, see http://millcomputing.com/ (http://millcomputing.com/)
Title: Re: Mill CPU Architecture
Post by: legacy on July 10, 2017, 01:25:38 pm
wow, the ISA (http://millcomputing.com/wiki/Instruction_Set) is very long  :o
Title: Re: Mill CPU Architecture
Post by: legacy on July 10, 2017, 01:26:39 pm
What makes you think that?

https://www.youtube.com/watch?v=Bxga49vukQ8 (https://www.youtube.com/watch?v=Bxga49vukQ8)

This interview :D
Title: Re: Mill CPU Architecture
Post by: legacy on July 10, 2017, 01:34:10 pm
Oh, listen to what he says in the interview: the assembler comes automagically from a specification file, as well as the C compiler (back end) and the first part of the simulator, top to bottom :o :o :o

Interesting to extend the design.
Title: Re: Mill CPU Architecture
Post by: brucehoult on July 10, 2017, 02:41:38 pm
wow, the ISA (http://millcomputing.com/wiki/Instruction_Set) is very long  :o

Not really.

When you look at something like ...

Code: [Select]
f2uefse - Exactly convert a binary floating-point value to an unsigned integer, rounding toward even and producing saturating result values.

.. and another 100 instructions like it, it's really just one instruction, "convert", with half a dozen subfields that can be mixed arbitrarily. IBM has always done it that way, with e.g. System/360 and Power, while various other companies have a single "cvt" instruction mnemonic with a bunch of optional attributes, perhaps after the operands. It makes no difference to the actual complexity of the machine, just to the documentation.

ARM has gone that way too with AArch64, with zillions of mnemonics such as CSEL, CSET, CSETM, CSINC, CSINV, CSNEG that are actually all the same conditional-select opcode with options to invert or increment the 2nd argument. And mnemonics as far removed as SXTB/SXTH/SXTW (sign extend) and ASR/LSR/LSL immediate are actually just special cases of a single BitFieldMove opcode - in fact it turns out you can sign extend from ANY bit position with a single instruction, not just 8/16/32.

I think the idea is probably that it simplifies the assembler/disassembler by making the parsing (text or binary) simpler, at the expense of having many more patterns to match. Well, and it makes life a bit simpler for the programmer too, not having to remember that a BitFieldMove with imms != '111111' && imms + 1 == immr is the way to get a left shift.
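That left-shift rule is easy to check mechanically. A small sketch (field formulas per the published UBFM aliases for the 64-bit forms; the function names are mine):

```python
WIDTH = 64  # 64-bit (X-register) forms

def ubfm_fields_for_lsl(shift: int) -> tuple:
    # AArch64: "LSL Xd, Xn, #shift" is the preferred alias of
    # "UBFM Xd, Xn, #immr, #imms" with these field values.
    immr = (WIDTH - shift) % WIDTH
    imms = WIDTH - 1 - shift
    return immr, imms

def ubfm_fields_for_lsr(shift: int) -> tuple:
    # "LSR Xd, Xn, #shift" is "UBFM Xd, Xn, #shift, #63".
    return shift, WIDTH - 1

# The disassembler rule quoted above: a UBFM prints as LSL exactly
# when imms != 0b111111 and imms + 1 == immr.
for shift in range(1, WIDTH):
    immr, imms = ubfm_fields_for_lsl(shift)
    assert imms != WIDTH - 1 and imms + 1 == immr
```

So the "many mnemonics, one opcode" scheme really does reduce to pattern matching on field values, as the post says.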
Title: Re: Mill CPU Architecture
Post by: brucehoult on July 10, 2017, 03:11:31 pm
And I have yet to see it demonstrated, not even running slowly in a high end FPGA.

Very much unlike RISC-V where many cores fit can fit on a low-end FPGA, and I even have one in hard silicon on my bench.

A very interesting comparison, and really at the heart of any concerns I have about The Mill.

RISC-V has been developed by a similar-sized team, starting later. They iterated the ISA over the first four years, but did it while at the same time creating not only a compiler, binutils, and multiple simulators, but also FPGA implementations *and* fabbing multiple generations of working chips. They also have a high-level processor-family generator tool with completely plug-and-play selection of options such as caches, MMUs, several FPU options, several multiplier, divider, and shifter options (including "none" for all of those), and 32-bit or 64-bit width. You can also choose between a 3-stage or 5-stage in-order pipeline, or (not 100% integrated yet) multiple-dispatch out-of-order. Today there is one RISC-V chip anyone can buy (I got my HiFive1 board in January), plus several other companies have their own chips for internal use. There must be close to a dozen implementations you can put into an FPGA.

I guess there are two major obvious differences:

1) The RISC-V people did the work while on cosy academic salaries, with financial/practical support from big companies such as Intel, Samsung, and TSMC. The Mill has been largely self-funded, without any tasty grants.

2) RISC-V is being done in an open-source manner, with the intention of changing the world for the better and with increasing amounts of work being done by the community, rather than getting rich (though there are likely to be pretty nice consulting and customisation opportunities around it). The Mill people want to change the world too, but they want to get filthy rich doing it.

I'd like to see both succeed.
Title: Re: Mill CPU Architecture
Post by: legacy on July 10, 2017, 06:53:01 pm
the intention of changing the world for the better and with increasing amounts of work being done by the community, rather than getting rich

oh, well ... so opensource now actually sounds like the political theory derived from Karl Marx, advocating class war and leading to a society in which all property is publicly owned and each person works and is paid according to their abilities and needs.

Title: Re: Mill CPU Architecture
Post by: legacy on July 10, 2017, 06:54:27 pm
(but I don't trust it)
Title: Re: Mill CPU Architecture
Post by: brucehoult on July 10, 2017, 08:48:58 pm
the intention of changing the world for the better and with increasing amounts of work being done by the community, rather than getting rich

oh, well ... so opensource now actually sounds like the political theory derived from Karl Marx, advocating class war and leading to a society in which all property is publicly owned and each person works and is paid according to their abilities and needs.

It's difficult to see how that could be the case.

Open source is based on purely voluntary interactions, done because each party calculates that they gain more benefit from doing so than the costs. You may participate or not, as you wish, or participate in a rival open source or for-profit project.

Marx's theories, on the other hand, can only be put into practice by means of compulsion. Even if the compulsion starts with the best of intentions, its absolute power inevitably leads, after a period, to the most ruthless and barbaric rising to the top of the heap. Dissent swiftly leads to a bullet in the back of the head, or a one-way trip to a forced labour camp.

I'm writing these words about 2 km from Butyrka prison, and five km from Lubyanka. Look them up. Solzhenitsyn has a lot of information about both.
Title: Re: Mill CPU Architecture
Post by: legacy on July 11, 2017, 08:11:23 am
well, I have had a look at the architecture, and listened to a few presentations

What I think: I don't like a lot of things in this architecture

First of all: the belt! It's a register queue which shifts to the right (consequently it loses a slot) on every clock edge. The belt was designed to solve the register-renaming problem which usually afflicts VLIW-like architectures. So, you have a lot of units processing in parallel, no general-purpose registers, and the belt is the only way to pass information to and from them.

So, when you have to issue an operation, say C = A OP B, you have to rename A, B, and C in terms of belt positions.

These positions vary at every clock edge, and you also need an optimal data-flow schedule to work out the correct order.

Therefore it is definitely something you can't assemble by hand like we have always done with the 6800 or 68000; you need a compiler - it's a must!
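A toy model of that renaming (my own illustrative sketch, not real Mill semantics - the actual belt also has frames, multi-result drops, and spill/fill):

```python
from collections import deque

class Belt:
    """Toy fixed-length belt: results drop at b0, the oldest falls off."""
    def __init__(self, length=8):
        self.slots = deque([0] * length, maxlen=length)  # slots[0] is b0

    def get(self, pos):
        # Operands are named by belt position, not by register number.
        return self.slots[pos]

    def drop(self, value):
        # Every result drop renumbers every live value on the belt.
        self.slots.appendleft(value)

belt = Belt()
belt.drop(3)                           # A lands at b0
belt.drop(4)                           # B lands at b0; A drifts to b1
# "C = A + B" must be issued as add(b1, b0): the compiler has to
# track where each value has drifted to at this point in the schedule.
belt.drop(belt.get(1) + belt.get(0))   # C lands at b0, A and B drift again
```

Even in this tiny example, writing the `add` by hand means knowing exactly how many drops have happened since A and B were produced, which is the renaming burden described above.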

It's even worse than the pipelined version of MIPS without the automatic stall/hazard unit - I mean the one where you have to manually consider every dependency between stages and registers and manually reschedule into the correct order, or insert a NOP between operations, just as you have to do to fill the "delay slot" that always follows a branch.

Well, there are RISC architectures (like the 88K) where the hardware can automatically fill in a NOP after the branch; you just need to set the proper bit and forget about it.

RISCs like MIPS are hard to program in assembly. PowerPC is more friendly but even more complex; ARM is the friendliest of the family.

Btw, all of those are still programmable in assembly, whereas the Mill is definitely a no-go, since it's virtually impossible for a human being.

The belt solution looks horrible to me  :palm: :palm: :palm:
Title: Re: Mill CPU Architecture
Post by: legacy on July 11, 2017, 12:55:59 pm
It was already said (http://hackaday.com/2013/08/02/the-mill-cpu-architecture/) in the talk published on Hackaday (2013). D'oh  :palm:
Title: Re: Mill CPU Architecture
Post by: obiwanjacobi on February 07, 2018, 09:30:50 am
They (Mill Computing) have added a few new videos recently, for those who are interested:

The Mill CPU Architecture – Inter-Process Communication (12 of 13)
https://www.youtube.com/watch?v=XJasE5aOHSw&t=1s (https://www.youtube.com/watch?v=XJasE5aOHSw&t=1s)

The Mill CPU Architecture – Threading (13 of 13 & more to come)
https://www.youtube.com/watch?v=7KQnrOEoWEY (https://www.youtube.com/watch?v=7KQnrOEoWEY)

It seems they're moving towards an FPGA demo version too.
Title: Re: Mill CPU Architecture
Post by: rstofer on February 07, 2018, 09:29:13 pm
This project has been going on for a long time.  I would have expected hardware, even an FPGA, a long time back.
It will be interesting to see the concept played out with real code on real hardware.
Title: Re: Mill CPU Architecture
Post by: helius on February 07, 2018, 10:31:12 pm
Performance on an FPGA is going to fall far short of what they are promising, and there isn't much benefit to making that public. They're planning on doing all the compiler and OS development themselves, so there is also relatively little reason for early-access tools to be released externally.
Title: Re: Mill CPU Architecture
Post by: hamster_nz on February 07, 2018, 11:01:56 pm
Performance on FPGA is going to be far short of what they are promising and there isn't much benefit to making that public. They're planning on doing all the compiler and OS development themselves so there is also relatively little reason for early access tools to be released externally.

Rule of thumb: CPUs on FPGAs run at 10% of the clock speed of high-end CPUs. A "low end" 150 MHz FPGA implementation would be plenty to show that the design has potential. It was supposed to be a fight against the complexity of current CPUs, so what makes it so complex that it can't be demonstrated?

It was/is also claimed that the Mill avoids the hard stuff like register renaming and out-of-order execution, so it should be just a few thousand lines of RTL; after all, it is claimed all the hard work is in the software layer.

They are touting it as covering the full range of computing needs, from embedded to HPC.  An RTL/FPGA implementation of the smallest design would at least build some faith that it isn't vaporware after nearly 15 years of development. Yes, FIFTEEN YEARS. Wikipedia: "It has been under development since about 2003 by Ivan Godard and his startup Mill Computing, Inc."

If Mill Computing can't show a booting CPU after 15 years because it is worried that the "simple" design will be too slow, it is worrying about the wrong thing.

The RISC-V project, where FPGA implementations were available from early on, found them very useful for popularizing the architecture and for making it easy for tools to be developed and tested, and they made the transition to custom silicon much easier.

To me, it seems to be a vacuum for VC money, and vacuuming up the spare time for "keen but green" engineers trying to make a break for themselves.
Title: Re: Mill CPU Architecture
Post by: ataradov on February 07, 2018, 11:05:25 pm
10% is generous, IMO. Their whole architecture is based around having a lot of multiplexing logic. This bothered me the whole time he was explaining stuff. I doubt it will have good performance in FPGAs, and possibly even when hardened.

The whole belt management is a nightmare from a hardware point of view, unless I'm missing something from his explanations.

On the other hand, even if it runs at 10 MHz, it still would be nice to see something working.

But for now they are filing patents and collecting checks from VCs.
Title: Re: Mill CPU Architecture
Post by: hamster_nz on February 07, 2018, 11:17:58 pm
10% is generous, IMO. Their whole architecture is based around having a lot of multiplexing logic. This bothered me the whole time he was explaining stuff. I doubt it will have good performance in FPGAs, and possibly even when hardened.

How strange... I thought the opposite. Wasn't it all about having "the belt", where data flowed through a long circular pipeline, where only a few functional units were connected to the belt at any given step, and each functional unit consumed or produced new values to go onto "the belt"? This limited fan-in/fan-out and multiplexing, and allowed the design to be efficiently spread out over the silicon (avoiding that "Rent's rule" stuff).

Maybe that was a few 'design pivots' ago. Maybe that is now a layer of abstraction or a software model, on the way down to the real hardware, which is a bunch of trained turtles with small glass beads.

Title: Re: Mill CPU Architecture
Post by: ataradov on February 07, 2018, 11:22:50 pm
I don't remember what exactly made me think this. I think there are operations that would require massive simultaneous (from a programmer's point of view) changes to the belt. IIRC, it was something related to function calls. But I really don't remember, since it is hard to keep vaporware in your head for years.

At least a software simulator would be nice. I bet they have one by now.
Title: Re: Mill CPU Architecture
Post by: tggzzz on February 08, 2018, 12:04:47 am
They are touting as covering the full range of computing needs from embedded to HPC. 

I'm sure Godard has ruled out embedded as a target market.

Quote
An RTL/FPGA implementation of the smallest design would at least build some faith that it isn't vaporware after nearly 15 years of development. Yes, FIFTEEN YEARS. Wikipedia: It has been under development since about 2003 by Ivan Godard and his startup Mill Computing, Inc.

Given an architecture, it is known how to turn that into hardware - providing there are no idiocies such as the iAPX432. Getting the right architecture where it can be seen that all features are present and they all play nicely together is much harder. The last time I saw that was in Gosling's Java whitepaper, back in '96.

Providing an RTL implementation without a fully thought through architecture would be a classic case of rushing to a premature implementation.
Title: Re: Mill CPU Architecture
Post by: donotdespisethesnake on February 08, 2018, 09:09:59 am
At 15 years, Mill development has been going on longer than the Propeller 2, a mere 10 years.

Designs on paper can look neat and beautiful, getting them to work in practice sometimes takes a bit of hacking and ugliness. But "winning ugly" is usually better than nothing.
Title: Re: Mill CPU Architecture
Post by: jbb on February 10, 2018, 07:45:39 pm
In several of the videos they said that the ‘shift register’-ness of the belt was provided by auxiliary logic. The actual large registers (which could get very big with vector ops!) stay put.
I guess there’s a (small!) abstraction unit to turn a belt address into a real source address.
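To make that concrete, here is a toy Python model (my own guess at the idea, not Mill's actual hardware): belt position 0 is always the newest value, and a small logical-to-physical map means a "drop" only updates the map; the values themselves never move.

```python
class Belt:
    """Toy model: values sit still in physical registers; belt position
    N is just an index into an ordering list (0 = newest drop)."""
    def __init__(self, length=8, phys_regs=32):
        self.length = length
        self.phys = [None] * phys_regs
        self.order = []                  # physical reg numbers, newest first
        self.free = list(range(phys_regs))

    def drop(self, value):
        reg = self.free.pop()            # "rename": pick a free register
        self.phys[reg] = value
        self.order.insert(0, reg)        # only this tiny list shifts
        if len(self.order) > self.length:
            self.free.append(self.order.pop())  # oldest falls off the belt

    def read(self, belt_pos):
        return self.phys[self.order[belt_pos]]  # translate, then fetch

b = Belt(length=4)
for v in [10, 20, 30]:
    b.drop(v)
print(b.read(0), b.read(2))  # 30 10
```

The "auxiliary logic" would be the small `order` map; the big registers (the `phys` array) stay put, which matters once belt entries are full vector operands.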
Title: Re: Mill CPU Architecture
Post by: ataradov on February 10, 2018, 07:50:54 pm
In several of the videos they said that the ‘shift register’-ness of the belt was provided by auxiliary logic. The actual large registers (which could get very big with vector ops!) stay put.
I guess there’s a (small!) abstraction unit to turn a belt address into a real source address.
The problem is that you can't implement belt as a standard RAM. There are a ton of operations that need to read/write multiple values on the belt at the same time. With RAM it will either have to be N-port RAM, which is expensive and slow for large N, or each such operation will require multiple clock cycles, which defeats the whole purpose of the belt.
Title: Re: Mill CPU Architecture
Post by: obiwanjacobi on February 11, 2018, 08:02:20 am
Can't remember in which video, but I think he said it was similar to renaming...

I haven't read thru this yet:
https://millcomputing.com/topic/the-belt/

Title: Re: Mill CPU Architecture
Post by: tggzzz on February 11, 2018, 08:15:23 am
In several of the videos they said that the ‘shift register’-ness of the belt was provided by auxiliary logic. The actual large registers (which could get very big with vector ops!) stay put.
I guess there’s a (small!) abstraction unit to turn a belt address into a real source address.
The problem is that you can't implement belt as a standard RAM. There are a ton of operations that need to read/write multiple values on the belt at the same time. With RAM it will either have to be N-port RAM, which is expensive and slow for large N, or each such operation will require multiple clock cycles, which defeats the whole purpose of the belt.

By that argument, you can't implement TLB caches since they can't be implemented as a standard RAM!

IIRC there is little point in making the belt more than 32 steps long, which isn't very much hardware.

Perhaps you could indicate the key belt (not RAM) operations that can't be sensibly implemented in logic gates.
Title: Re: Mill CPU Architecture
Post by: ataradov on February 11, 2018, 08:22:57 am
By that argument, you can't implement TLB caches since they can't be implemented as a standard RAM!
IIRC there is little point in making the belt more than 32 steps long, which isn't very much hardware.
Well, I don't know how long the belt is going to be in high end models. I believe the lowest length I've seen was 8.

I did not say it is impossible; it obviously is not. It is just a matter of the belt length at which it becomes impractical or slower than a conventional design.
Title: Re: Mill CPU Architecture
Post by: brucehoult on February 28, 2018, 11:59:09 pm
They are touting as covering the full range of computing needs from embedded to HPC.  An RTL/FPGA implementation of the smallest design would at least build some faith that it isn't vaporware after nearly 15 years of development. Yes, FIFTEEN YEARS. Wikipedia: It has been under development since about 2003 by Ivan Godard and his startup Mill Computing, Inc.

If Mill Computing can't show a booting CPU after 15 years because it is worried that the "simple" design will be too slow is worrying about the wrong thing.

Yes, it's not encouraging.

Quote
The RISC-V project, where FPGA implementations were available from early on, were very useful for popularizing the architecture, and making it easy for tools to be developed and tested, and enabled the transition to custom silicon much easier.

Yes, it's been handled far better .. it's only 3 1/2 years since they decided to get serious about freezing the spec, making everything open source, and getting a community going. Less than eight years total since the first vague idea of making an internal (university) project.

Quote
To me, it seems to be a vacuum for VC money, and vacuuming up the spare time for "keen but green" engineers trying to make a break for themselves.

I'm not sure about the VCs. As far as I know, everyone working on it is doing so for "sweat equity". Still.

Ivan has invited me to come onboard a couple of times, but sadly I need a salary in order to pay rent and eat.

Still, adventure is a good thing. I've handed in my notice at Samsung R&D and I'm starting at a RISC-V company very soon.
Title: Re: Mill CPU Architecture
Post by: brucehoult on March 01, 2018, 12:04:59 am
In several of the videos they said that the ‘shift register’-ness of the belt was provided by auxiliary logic. The actual large registers (which could get very big with vector ops!) stay put.
I guess there’s a (small!) abstraction unit to turn a belt address into a real source address.
The problem is that you can't implement belt as a standard RAM. There are a ton of operations that need to read/write multiple values on the belt at the same time. With RAM it will either have to be N-port RAM, which is expensive and slow for large N, or each such operation will require multiple clock cycles, which defeats the whole purpose of the belt.

The belt isn't RAM, it's registers. There are only 8 or 16 or so belt positions, depending on the model of Mill.

Things that are about to fall off the end of the belt (not really -- it's actually just that the register is going to be reused for a new result) and be lost, but that are actually still needed, are copied to "Scratchpad" by the compiler. That *is* RAM. SRAM I guess.
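A toy model of that spill-before-it-falls-off behavior (my own sketch; on the real Mill the compiler schedules explicit spill operations statically, this just mimics the effect):

```python
from collections import deque

BELT_LEN = 4
belt = deque(maxlen=BELT_LEN)    # index 0 = newest; the tail falls off
scratchpad = {}                  # the (S)RAM backing store

def drop(value, name, still_live_after_fall=False):
    """Drop a new value; copy the tail to scratchpad first if it is
    still needed (the compiler knows liveness in advance)."""
    if len(belt) == BELT_LEN:
        old_name, old_val, live = belt[-1]
        if live:
            scratchpad[old_name] = old_val   # saved before it's lost
    belt.appendleft((name, value, still_live_after_fall))

drop(1, "a", still_live_after_fall=True)     # 'a' is needed later
for i, n in enumerate("bcde"):
    drop(i, n)                               # four more drops push 'a' off
print(scratchpad)  # {'a': 1}
```

The point of the design is that only values the compiler proves are still live pay the spill cost; everything else just expires for free.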
Title: Re: Mill CPU Architecture
Post by: ataradov on March 01, 2018, 12:10:40 am
The belt isn't RAM, it's registers. There are only 8 or 16 or so belt positions, depending on the model of Mill.
I get the feeling that 8 or 16 positions applies to low-end models, and that the models that actually target performance will have a much longer belt. In that respect it is not that different from a typical stack-based machine, and you need quite a bit of stack to do any real calculations. Constantly saving things that are about to fall off is a hassle and a time sink.

I'd like to see this simulated on real applications, of course. But I feel like we are not getting that any time soon.
Title: Re: Mill CPU Architecture
Post by: andersm on March 01, 2018, 12:33:23 am
Quote
The RISC-V project, where FPGA implementations were available from early on, were very useful for popularizing the architecture, and making it easy for tools to be developed and tested, and enabled the transition to custom silicon much easier.
Yes, it's been handled far better .. it's only 3 1/2 years since they decided to get serious about freezing the spec, making everything open source, and getting a community going. Less than eight years total since the first vague idea of making an internal (university) project.
To be fair, the RISC-V project is also not trying to do anything new, so it's only to be expected they'll get done quicker.
Title: Re: Mill CPU Architecture
Post by: obiwanjacobi on March 01, 2018, 07:26:46 am
It's very hard to have a truly original thought.

Since I posted this, I have been studying the topic a bit here and there, and you see a LOT of resources regurgitating the same architecture over and over again. As soon as you get off the beaten path, information dries up pretty quickly. I have dismissed the RISC-V architecture as more of the same but open-source; that's great, but hardly an inspiration for new ideas. With such a single-sided (and single-minded) stream of information, you can be forgiven for thinking there is no other way of doing this.

I find the Mill architecture very innovative (probably due to my lack of experience and insight). Too bad about all those patents, though  ;D
Anyway, what Ivan presents in his videos DOES make me think about the problem in a different way. I am not claiming to understand ALL of it, but I think I got most of it. I know I do not know enough to claim that the proposed Mill architecture will have such-and-such a problem with the belt (or any other part). I do not have any clue how one could make such a statement based on the high-level (simplified) info that is presented; he starts off every presentation with that disclaimer. Many detailed questions come to mind while watching these videos that just aren't talked about; many of those questions are probably due to my ignorance, and their answers are probably well-known mechanisms or principles in the CPU architecture domain.

The hardest thing, I find, is trying to guesstimate whether any idea I have is an improvement, or even what the pros and cons of such a deviation from the norm would be (again, probably due to my lack of understanding). I liked how Ivan explained how the compiler does more of the work of scheduling instructions, and I am developing a TTA idea based on that premise. It's a wonderful journey.
Title: Re: Mill CPU Architecture
Post by: ataradov on March 01, 2018, 07:36:36 am
The reason all modern CPUs look the same is decades of evolution. We have settled on a way of doing things that works, and is guaranteed to work on the first try. That's how engineering works.

People have tried many other architectures (VLIW, OISC, pure stack-based machines, register-less machines), and there is plenty of information on all sorts of strange stuff. But none of it gets wide adoption, since none of them actually bring a performance improvement significant enough to justify redoing all the software work that has gone on over those same decades.

Those strange architectures also often face problems interfacing with existing IP. How hard would it be to attach an existing GPU to that CPU? Will it just work, or will something about new bus structures make it impossible/impractical?

There is a lot of stuff to think about, and the actual architecture matters very little on a full system scale.
Title: Re: Mill CPU Architecture
Post by: brucehoult on March 01, 2018, 07:59:16 am
Quote
The RISC-V project, where FPGA implementations were available from early on, were very useful for popularizing the architecture, and making it easy for tools to be developed and tested, and enabled the transition to custom silicon much easier.
Yes, it's been handled far better .. it's only 3 1/2 years since they decided to get serious about freezing the spec, making everything open source, and getting a community going. Less than eight years total since the first vague idea of making an internal (university) project.
To be fair, the RISC-V project is also not trying to do anything new, so it's only to be expected they'll get done quicker.

Certainly RISC-V is a far less ambitious thing, technically. I wouldn't say *nothing* new though. Each individual thing in it has probably been done before ... and generally long enough ago that patent protection has expired. But the combination of things and attention to detail is really quite nice.

When you compare it to the other clean-sheet design that happened at about the same time -- Aarch64 (which also didn't do anything new) -- the RISC-V guys did I think a much nicer job.
Title: Re: Mill CPU Architecture
Post by: tggzzz on March 01, 2018, 08:51:42 am
The reason all modern CPUs looks the same is decades of evolution. We have settled on a way to do things that work, and guaranteed to work from a first try. That's how engineering works.

Not quite.

Evolution, both biological and engineering, makes local optimisations based on the history of environmental/technical conditions. It is effectively a "hill climbing algorithm". That leads to messes like the human eye, which a 10-year-old can easily see is ridiculous (especially compared with some of the other eyes that have evolved).

It is frequently the case that adjacent "hills" have higher peaks and, if the chasm can be leaped, there are better rewards (a.k.a. a more optimal solution) to be found. That's well known in problems attacked by "simulated annealing" algorithms, e.g. much CAD software.
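The difference is easy to demonstrate: plain hill climbing gets stuck on the nearer hill, while annealing's occasional downhill moves let it cross the valley. A toy 1-D example (my own illustration; the constants are arbitrary):

```python
# Toy landscape: a local peak near x=1 (height ~1) and the global
# peak near x=4 (height ~2).
import math
import random

def f(x):
    return math.exp(-(x - 1) ** 2) + 2 * math.exp(-(x - 4) ** 2)

def anneal(seed, steps=5000):
    """Simulated annealing: accept downhill moves with prob exp(delta/T)."""
    rng = random.Random(seed)
    x, temp = 1.0, 2.0                  # start on the *lower* hill
    for _ in range(steps):
        cand = min(5.0, max(0.0, x + rng.uniform(-0.5, 0.5)))
        delta = f(cand) - f(x)
        if delta > 0 or rng.random() < math.exp(delta / temp):
            x = cand                    # sometimes accept a worse point
        temp *= 0.999                   # cool down: late moves are greedy
    return x

# Pure hill climbing (temp -> 0 from the start) would stay on the x=1
# hill; annealing usually crosses the valley and settles near x=4.
best = max((anneal(s) for s in range(5)), key=f)
print(round(best, 1))  # usually close to 4
```

The restarts (`range(5)`) hedge against an unlucky seed, which is also standard practice in CAD placement tools.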

That is particularly valid when conditions change. In the engineering sense, Moore's law is running headlong into thermodynamics problems. Yes, transistors can continue to shrink for a few years, but they are very leaky (= hot) and are becoming small enough that there are fewer electrons in them than is desirable.

If we want progress to continue, we have to try radically different solutions. The Mill is leaping a chasm in a particular direction.

Quote
People have tried many other architectures (VLIW, OISC, pure stack-based machines, register-less machines), and there is plenty of information on all sorts of strange stuff. But it does not get wide adoption as none of them actually bring performance improvement significant enough to justify re-doing all the software work that has been going on for the same decades.

Those strange architectures also often face problems interfacing with existing IP. How hard would it be to attach existing GPU to that CPU? Will it just work, or something about new bus structures will make it impossible/impractical?

There is a lot of stuff to think about, and the actual architecture matters very little on a full system scale.

True, with reservations.

The architectures you mention were typically pretty small deltas on previous architectures, attacked a small proportion of the current challenges, and punted difficult problems into the future (assuming Moore's law would come to the rescue). That has advantages and disadvantages.

The Mill is a more radical departure tackling many of the existing challenges. It is reasonable that it should take longer to investigate and document. They are tackling many of the software issues head on, rather than using more transistors to conceal the problems.

So far I haven't seen a flaw in the Mill's architecture. I believe that it can be made to work, and producing initial hardware will not be a major engineering challenge (unlike making a faster x86!). Time will tell whether it succeeds in crossing the chasm.
Title: Re: Mill CPU Architecture
Post by: ataradov on March 01, 2018, 05:28:30 pm
Evolution, both biological/engineering, makes local optimizations
I did not say that what we have is absolutely the best. I don't think so. But what we have works fine, and evolution did its thing, just like in most other cases.

Overcoming this in favor of something better is going to be hard or impossible.

If we want progress to continue, we have to try radically different solutions. The Mill is leaping a chasm in a particular direction.
Here is a proposal: stop worrying about the speed of hardware, and start investing in making software faster. Win10 on a Core i5 runs slower than Win95 on the first Pentium.


The Mill is a more radical departure tackling many of the existing challenges.
I don't see it as that much of a revolution. The majority of the things they have are not new ideas, just ideas adapted to work with their core. And I'm not convinced that there is going to be a significant performance improvement.
Title: Re: Mill CPU Architecture
Post by: tggzzz on March 01, 2018, 06:05:27 pm
Evolution, both biological/engineering, makes local optimizations
I did not say that what we have is absolutely the best. I don't think so. But what we have works fine, and evolution did its thing, just like in most other cases.

Overcoming this in favor of something better is going to be hard or impossible.

The current path has reached its limit, especially w.r.t. thermodynamics (=> don't bother shrinking transistors) and aggregate i/o bandwidth (=> don't bother increasing core count). Radical changes to both hardware and software are going to be needed to make radical progress.

The Mill is a radical change w.r.t. hardware, and is a neat hardware+software system that notably also works with current programs. Many radical changes don't play well with existing software; often complete rewrites are necessary.

The Mill might achieve an order of magnitude improvement for current codes; that's significant, and it is known to be problematic using other "traditional" technology.

Quote
If we want progress to continue, we have to try radically different solutions. The Mill is leaping a chasm in a particular direction.
Here is a proposal: stop worrying about the speed of hardware, and start investing in making software faster. Win10 on a Core i5 runs slower than Win95 on the first Pentium.

What makes you think it is possible to make serial codes faster? People have been trying that since the 60s, with limited success. All such attempts have ignored existing codebases and presumed significant paradigm changes. Amdahl's law is still valid.

There are, of course, "embarrassingly parallel" problems; while important, they are a subset of what we need to compute.
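Amdahl's law makes the point concrete: if a fraction p of a program can be parallelized across N processors, the overall speedup is 1 / ((1 - p) + p/N), so the serial remainder caps everything.

```python
def amdahl_speedup(p, n):
    """Speedup of a program whose fraction p runs in parallel on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# A 90%-parallel program caps at 10x even with unlimited cores:
print(round(amdahl_speedup(0.90, 10**9), 2))  # 10.0
# ...and gets well under 8x from 8 cores:
print(round(amdahl_speedup(0.90, 8), 2))      # 4.71
```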

Quote
The Mill is a more radical departure tackling many of the existing challenges.
I don't see it as that much of a revolution. The majority of the things they have are not new ideas, just ideas adapted to work with their core. And I'm not convinced that there is going to be a significant performance improvement.

Overall the Mill may or may not succeed, but the reasons you have given aren't sufficient to believe it will fail.

My personal suspicion is that the patent portfolio will be licenced and that it will be incorporated into future systems.
Title: Re: Mill CPU Architecture
Post by: ataradov on March 01, 2018, 06:20:56 pm
The current path has reached its limit, especially w.r.t. thermodynamics (=> don't bother shrinking transistors) and aggregate i/o bandwidth (=> don't bother increasing core count). Radical changes to both hardware and software are going to be needed to make radical progress.
I don't see this as a huge problem. Even if current hardware is fixed, there is a huge margin for extracting actual performance.

I see it as the same thing that happened to the Commodore 64. The hardware was fixed a long time ago, but there are still games coming out, and they look gorgeous. Since the hardware was fixed, people just chose to study it in great detail, and learned how to take advantage of all the features.

The same thing happens with modern gaming consoles. When a new generation of console is introduced, games for the previous one look much better in many cases. That lasts until people figure out how to get the maximum performance from the new hardware.

What makes you think it is possible to make serial codes faster?
The fact that we write in JavaScript and F#.

Overall the Mill may or may not succeed, but the reasons you have given aren't sufficient to believe it will fail.
I don't think it will fail, but rather fizzle out as they fail to secure more money.
Title: Re: Mill CPU Architecture
Post by: tggzzz on March 01, 2018, 06:41:24 pm
The current path has reached its limit, especially w.r.t. thermodynamics (=> don't bother shrinking transistors) and aggregate i/o bandwidth (=> don't bother increasing core count). Radical changes to both hardware and software are going to be needed to make radical progress.
I don't see this as a huge problem. Even if current hardware is fixed, there is a huge margin for extracting actual performance.

And there are even bigger problems doing that in practice.

Frequently there's so much crap that nobody understands it and nobody dares change it, and that's sometimes true even for embedded single-manufacturer products, e.g. HP printers.

So, that's a nice concept, but unrealistic.

Quote
What makes you think it is possible to make serial codes faster?
The fact that we write in JavaScript and F#.

I completely fail to understand the point you are trying to make.

Quote
Overall the Mill may or may not succeed, but the reasons you have given aren't sufficient to believe it will fail.
I don't think it will fail, but rather fizzle out as they fail to secure more money.

I've no idea what the burn rate and funding is. Avoiding making hardware is a smart decision at this stage, provided you ensure there's nothing preventing it being implemented later.
Title: Re: Mill CPU Architecture
Post by: ataradov on March 01, 2018, 06:43:48 pm
I completely fail to understand the point you are trying to make.
My point is that there is a lot more actual performance in the hardware we have today, but we fail to take advantage of it. Even if current technology just stops developing today, we've got 15-20 years of overall system performance improvement through software optimizations alone.
Title: Re: Mill CPU Architecture
Post by: tggzzz on March 01, 2018, 06:50:35 pm
I completely fail to understand the point you are trying to make.
My point is that there is a lot more actual performance in the hardware we have today, but we fail to take advantage of that performance. Even if current technology just stops developing today, we've got 15-20 years of overall system performance improvement just through software optimizations alone.

Please don't snip important context, in this case your reference to JavaScript and F#, viz:
Quote
What makes you think it is possible to make serial codes faster?
The fact that we write in JavaScript and F#.

Would you like to explain how those languages are relevant, and other languages aren't?
Title: Re: Mill CPU Architecture
Post by: ataradov on March 01, 2018, 06:55:18 pm
Would you like to explain how those languages are relevant, and other languages aren't?
Those were just two examples of bloated languages highly abstracted from the hardware. There are plenty of others.

Abstraction is good when you have a lot of different hardware, and you need to support all of that in short time. You sacrifice performance big time doing so.

Once hardware is fixed and no longer a moving target, you have incentive to rewrite some of the code in languages closer to the hardware.

Title: Re: Mill CPU Architecture
Post by: tggzzz on March 01, 2018, 07:00:12 pm
Would you like to explain how those languages are relevant, and other languages aren't?
Those were just two examples of bloated languages highly abstracted from the hardware. There are plenty of others.

Abstraction is good when you have a lot of different hardware, and you need to support all of that in short time. You sacrifice performance big time doing so.

Once hardware is fixed and no longer a moving target, you have incentive to rewrite some of the code in languages closer to the hardware.

You have a limited view of the software industry, application domains, and commercial incentives.

Other than that, you do have a point for a small part of the industry.
Title: Re: Mill CPU Architecture
Post by: ataradov on March 01, 2018, 07:15:00 pm
You have a limited view of the software industry, application domains, and commercial incentives.
I don't think so. I'm not saying that using Java today is somehow unjustified or bad, especially given that it is realistically cheaper to buy a better CPU and an additional 8 GB of memory than to redesign the software to consume less.

All I'm saying is that changing conditions will push businesses to make different decisions. When it is no longer cheaper to add more memory, they will pay for developing better software.

But that's also a problem for the Mill. Right now, it is more expensive to adopt the Mill, even if it comes out. For now, whatever benefits it brings can be overcome by throwing more money at existing hardware.
Title: Re: Mill CPU Architecture
Post by: rstofer on March 01, 2018, 07:54:34 pm
I don't think so. I'm not saying that using Java today is somehow unjustified or bad. Especially given that it is realistically cheaper to buy a better CPU and additional 8 GB of memory than redesign software to consume less.

All I'm saying is that changing conditions will push businesses to make different decisions. When it is no longer cheaper to add more memory, they will pay for developing better software.

But that's also a problem for Mill. Right now, it is more expensive to adopt Mill, even if it comes out. For now, whatever benefits it will bring, can be overcome by throwing more money at existing hardware.

I agree!  Nobody suggests that the x86-64, based on the x86, based on the 8080, is a nice architecture.  But it IS a given, and worth the effort of writing more or less optimal code for, at least in the OS.

In that regard ARM is getting very popular.  It's a known architecture (at the core) to which vendors can add peripherals, most of which are also licensed from ARM.  It is worth the effort to write decent code for the platform.

If the Mill were released today, who would care?  We have processors, we have a stable platform (or two) to aim code at, so why change?  Now, if the chip just smokes everything on the planet, NASA and the NSA will probably care, as will Lawrence Livermore and Los Alamos.  The rest of the world will keep chugging away on x86-64.  There is simply limited utility in a more advanced architecture and, today, PCs are like toasters: they are just an appliance.  Nobody cares what's inside as long as the software from decades back still runs.

Look at all the superior architectures that are dying: SPARC, MIPS, <whatever>.  There are still machines around, but x86-64 is king.

A lack of software alone will keep the Mill from going very far.  For better or worse, the x86-64 has achieved critical mass for computers and the ARM has done the same for appliances like cell phones and so on.

The .gov agencies above have always been on the bleeding edge.  They have routinely written their own OSes and compilers.  For the rest of us, it would be like starting over in '75: they hand you an 8080 and there isn't a shred of software around (at least at the hobby level).  Then Bill Gates releases BASIC...
Title: Re: Mill CPU Architecture
Post by: hans on March 01, 2018, 09:29:51 pm
x86 thrives entirely on legacy applications. The software industry has many programs written for x86, many of them closed-source, with vendors having no interest in supporting alternative platforms and instruction sets. Twelve years ago we still had Apple making PowerPC desktops; that has stopped, and pretty much everything is x86 now. Any performance increase has to come from speeding up x86.

Architecturally speaking, x86 is a PITA: variable-length instructions (memory was expensive back in the day; now it isn't), a CISC instruction set, many data hazards in a typical program limiting instruction-level parallelism, and many layers of legacy and extensions kept for compatibility with current systems. (That is also the reason why an x86-powered PS4 game console is not a PC.)

These issues have been "solved" to the point where we haven't seen much single-thread performance benefit in the last few years, but basically just by throwing more silicon at the problem: deep pipelines and instruction decoders, speculative execution, and register renaming. All this overhead costs a lot of power to make the CPU fast.

The claimed "single-thread 10x power/performance" is perhaps obtainable (at least a good portion of it); but many RISC (ARM) chips are also far more competitive in this ratio than x86, because there is just so much less overhead in decoding the instruction set. Hence why you see no x86 mobile phones, and probably never will, as there will likely always be a more competitive technical solution available.

As for building a CPU with competitive desktop performance: good luck. Intel and AMD (and ARM in their respective domain) have invested billions to get the products they make today, and as said, it's pretty well accepted that we've more or less reached a maximum in single-thread performance. Most benefits come from more cores and specific instruction sets (e.g. AVX2); but even then, throwing a thousand slow, power-efficient cores at the problem won't compete; any PC still needs the flexibility and high peak performance of a single fast core.

In terms of performance/power: yeah, sure, that can be improved over x86. But I don't think this is the metric a lot of architectures are benchmarked against (unless you're Google and can significantly save on your power bill).

VLIW was attempted with the Intel Itanium series, which had explicit parallelism in the programs, but that's now EOL as well. (https://www.pcworld.idg.com.au/article/619139/intel-itanium-once-destined-replace-x86-pcs-hits-end-line/)

If you really need high performance/power in a particular application, then going FPGA/DSP is for now the best bet. But that is, relatively speaking, a niche industry. The majority of the industry looks at productivity [to solve a customer's IT problem] as a more direct measure, and to them programmers' time is more expensive than buying more computational power.
Title: Re: Mill CPU Architecture
Post by: tggzzz on March 01, 2018, 10:00:24 pm
I broadly agree with most of Hans' points, but I'll add a few riders...

Those with long experience were pointing out in 1996 that the Itanic's performance required fundamental advances that very smart people had been searching for since the 60s, largely without success. To presume that HP had smarter people was hubris.

Performance per watt has been a significant metric since ~2000 for server farms and high-performance computing application domains. It is worth noting that HPC stresses any existing technology to its limits - and beyond. Hence problems found by the HPC community will often hit other people later.

Don't neglect the Sun Niagara processors. Their philosophy was to aim at the "embarrassingly parallel" server applications, so they used many parallel simple SPARC cores to get good aggregate performance at the expense of single-thread performance. Basically, if a SPARC core had performance P, then a 16-core Niagara chip had performance ~16P. They replaced the obscenely expensive OoO control logic with more cores.

The Mill is inspired by DSP processors, and attempts to bring their philosophy to general-purpose computing coded in conventional languages.
Title: Re: Mill CPU Architecture
Post by: legacy on March 01, 2018, 10:30:31 pm
GNAT/Ada is *ONLY* for x86, with a small (not so stable) port for ARM.

IBM is going to sell POWER9 as an "x86 alternative!!!". But a POWER9 workstation will cost at least 6K USD.
(And Ada won't run there. You'd need to dedicate some of the eight-to-sixteen POWER9 cores to QEMU/x86.)

edit:
oh, and PowerPC and POWER4/5/6/7/8 are big-endian,
while POWER9 is little-endian ... just to make things incompatible
Title: Re: Mill CPU Architecture
Post by: rstofer on March 01, 2018, 11:22:54 pm
GNAT/Ada is *ONLY* for x86, with a small (not so stable) port for ARM.

And here I thought it was supposed to be easy to port the runtime to any processor.  It must be possible, because the military is doing something with the language, and I suspect it's for a processor other than x86.
Title: Re: Mill CPU Architecture
Post by: hamster_nz on March 02, 2018, 04:15:56 am
I'm not at all convinced that the inherent fine-grained parallelism exists in most computing tasks to make a "super-super-scalar" CPU architecture worthwhile for general-purpose computing. A different architecture isn't going to change that.

After all, isn't that why Hyperthreading came about? The dependencies between generating and consuming data made it impossible to keep all of the CPU core's functional units busy all of the time. So HT gives us two threads running on the same CPU core at the same time, competing for things like integer ALUs, cache, and memory bandwidth. The per-thread performance can drop somewhat (due to contention for shared resources), but the system overall might get 30% more work done.

And for the cases where there is a lot of parallelism, we have GPUs and SIMD CPU instructions, or specialist DSP processors which expose more of the pipeline...


Title: Re: Mill CPU Architecture
Post by: tggzzz on March 02, 2018, 09:59:01 am
I'm not at all convinced that the inherent fine-grained parallelism exists in most computing tasks to make a "super-super-scalar" CPU architecture worthwhile for general-purpose computing. A different architecture isn't going to change that.

It does remain to be proven, but AIUI the Mill team has investigated it and found some useful parallelism.

Quote
After all, isn't that why Hyperthreading came about? The dependencies between generating and consuming data made it impossible to keep all of the CPU core's functional units busy all of the time. So HT gives us two threads running on the same CPU core at the same time, competing for things like integer ALUs, cache, and memory bandwidth. The per-thread performance can drop somewhat (due to contention for shared resources), but the system overall might get 30% more work done.

Yes and no, but there are more differences than similarities.

The HPC mob, who traditionally stress whatever technology is available, found that HT slowed down their computations, so they disabled it.

HT used conventional OoO superscalar cores. If you want HT "done properly", look at Sun's Niagara processors. They really did speed up the aggregate throughput of server-class workloads. The single-thread performance was lower, but that was irrelevant in those workloads.

In the embedded arena, the XMOS xCORES are similar (and give cycle-accurate hard realtime guarantees).

Quote
And for the cases where there is a lot of parallelism we have GPUs and SIMD CPU instructions, or specialist DSP processors which expose more of the pipeline...

SIMD is entirely different and in no way comparable, but you know that. DSP is the inspiration for the Mill, and they are trying to bring that "mentality" to general-purpose computing.

One point to beware is that the concepts of "instruction" and "operation" are very fluid. AIUI the Mill's definition is closer to that of x86 internal microoperations and to DSP instructions.
Title: Re: Mill CPU Architecture
Post by: legacy on March 02, 2018, 11:17:39 am
And here I thought it was supposed to be easy to port the runtime to any processor.  It must be possible because the military is doing something with the language and I suspect it for a processor other than an x86.

Frankly, we have Ada for SPARC and POWER machines running SunOS and AIX, but ... they cost 20K euro per license.

There is absolutely nothing in the open-source world of gcc && AdaCore, and LLVM doesn't have support either =(

So, yes, in theory .....
Title: Re: Mill CPU Architecture
Post by: WorBlux on February 01, 2020, 05:09:39 am
Ya, I know, bit of a zombie thread, but the Mill has been of particular interest to me for some time. I'm not certain it will succeed, but it certainly seems like it has a shot. Just a few thoughts about some of the details being missed in this thread.

The problem is that you can't implement the belt as standard RAM.
The belt isn't RAM, it's registers.

It's neither RAM nor registers*. The equivalent in a modern CPU is the bypass network: the data is held on the output latches of the functional units, and there's a one- or two-stage NxM mux network moving data around. The belt labels just provide the semantics needed to route the right data. Larger members are slated to include a rename/tag network, while smaller members may track positions with some associative memory closer to the decode unit and a rotating index. Gate-level details can and will vary by member, as the trade-offs among the options change with scale.

*The 32-position belt version does require a fairly small register file to provide enough physical operand locations to let the spiller work correctly on function calls.
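For what it's worth, the belt's programmer-visible semantics (as described in the Mill talks) can be modelled as a short fixed-length queue. This toy Python sketch is mine and deliberately ignores the hardware realities discussed above; it only shows the addressing model:

```python
from collections import deque

class Belt:
    """Toy model of the belt's *semantics*: every result drops in at
    position 0, older values shift down, and the oldest value falls
    off the end. (The hardware doesn't move data; it re-routes.)"""

    def __init__(self, length=8):
        self.slots = deque(maxlen=length)  # slots[0] = newest

    def drop(self, value):
        self.slots.appendleft(value)       # oldest silently falls off when full

    def read(self, pos):
        return self.slots[pos]             # operands are addressed by age

b = Belt(length=8)
b.drop(3)                         # belt: [3]
b.drop(4)                         # belt: [4, 3]
b.drop(b.read(0) + b.read(1))     # an "add b0, b1" drops 7 at position 0
assert b.read(0) == 7
```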

I'm not at all convinced that the inherent fine-grained parallelism exists in most computing tasks to make a "super-super-scalar" CPU architecture worthwhile for general-purpose computing. A different architecture isn't going to change that.
...
And for the cases where there is a lot of parallelism we have GPUs and SIMD CPU instructions, or specialist DSP processors which expose more of the pipeline...

Most code is in loops, and loops have unbounded ILP. If you are wide enough to pipeline and clever enough to vectorise most loops, there's a lot of ILP there.
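To illustrate the distinction (my own example, nothing Mill-specific): an element-wise loop has independent iterations and pipelines/vectorises freely, while a loop-carried dependence serialises execution no matter how wide the machine is:

```python
def elementwise(a, b):
    # Independent iterations: every a[i] + b[i] could issue in
    # parallel, so a wide machine can pipeline/vectorise this freely.
    return [x + y for x, y in zip(a, b)]

def prefix_sum(a):
    # Loop-carried dependence: out[i] needs out[i-1], so naive
    # execution is a serial chain regardless of machine width.
    out, acc = [], 0
    for x in a:
        acc += x
        out.append(acc)
    return out

assert elementwise([1, 2, 3], [10, 20, 30]) == [11, 22, 33]
assert prefix_sum([1, 2, 3, 4]) == [1, 3, 6, 10]
```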

It can also capture common data flows in a single instruction to approximate the advantage OoO actually provides (which they claim is mostly from better static schedules). And though it won't beat an OoO machine of equal width in this regard, it can go much wider without melting the die.

And of course it's not going to beat a DSP or GPU at what they already do well. Many of the core developers were on the Philips TriMedia team. They are trying to adapt DSP internals to general-purpose code, which is where they get their 10x claim.

Those with long experience were pointing out in 1996 that the Itanic's performance required fundamental advances that very smart people had been searching for since the 60s, largely without success. To presume that HP had smarter people was hubris.

Itanium was also released before it was fully ready. It wasn't all that wide, and even then most code didn't use the width, putting a lot of no-ops in the instruction stream. Cache misses were extremely painful, and dealing with control flow was awkward. The rotating registers were neat, but the compiler didn't really know how to leverage them, wasn't great at pipelining real code, and vectorization was also limited. And even when you could pipeline non-trivial loops, the prelude and postlude were so big you'd choke the cache loading the instruction stream. I also believe clean function calls/context switches with that number of architectural registers were fairly painful as well.

I think the Mill has done pretty well with control flow and being easy to pipeline/vectorise. I also think they've tackled code length fairly well with elided no-ops and very compact encodings.

I think it'll live or die on how well the split load actually works, how good the prefetch is, and what exactly their still-undisclosed stream mechanism is. Also up in the air is whether the multi-core coherence protocol is any good. They've hinted at the ability for cycle-accurate determinism between cores if you want it (like a DSP).

I've got a feeling that 8 or 16 positions applies to low-end models, and actual models that target performance will have a much longer belt. In that respect it is not that different from a typical stack-based machine, and you need quite a bit of stack to do any real calculations. Constantly saving things that are about to fall off is a hassle and a time sink.

Don't you actually need to save the whole belt contents?
No matter how ingenious your system is, you need to save the whole state; there is no working around that. I find that hand-waving hard to believe without some evidence.

I think there are operations that would require massive simultaneous (from a programmer's point of view) changes to the belt. IIRC, it was something related to function calls.

Generally you aren't using registers as long-term variable storage*. They're short-term intermediate storage, usually read fewer than two times. The compiler will try to schedule producers as close to consumers as possible, and the specializer will automatically insert spill/fill when this isn't possible. The scratchpad is on-die SRAM, but can be backed to RAM on interrupt/function call without OS intervention. And for very long-term intermediate results you can spill/fill to system RAM.

*Though a few mathematical libraries will do this, it's not the common case.
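Roughly what that spill/fill looks like in belt terms; this is an illustrative toy model of mine (names and sizes made up), not the actual ISA:

```python
def run(belt_len=4):
    # Toy illustration: a short belt plus a scratchpad. A value whose
    # lifetime exceeds the belt is spilled, then filled back later.
    belt = []                      # index 0 = newest
    scratch = {}                   # models the on-die SRAM scratchpad

    def drop(v):
        belt.insert(0, v)
        if len(belt) > belt_len:
            belt.pop()             # oldest value falls off and is lost

    def spill(pos, slot):
        scratch[slot] = belt[pos]  # compiler/specializer-inserted save

    def fill(slot):
        drop(scratch[slot])        # restored value drops in as a new result

    drop(42)                       # long-lived intermediate result
    spill(0, "s0")                 # it would fall off before its last use
    for v in (1, 2, 3, 4, 5):
        drop(v)                    # 42 has long since fallen off the belt...
    fill("s0")                     # ...but comes back from the scratchpad
    return belt[0]

assert run() == 42
```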

On a function call the state is saved, but it's done asynchronously. To do this there are actually twice as many places to hold operands as there are belt positions, and, like the names, a frame is associated with each operand. Scratch is allocated for the frame stack, and operands can trickle out over the next few cycles. Likewise, in-flight operations retire normally but are moved into the frame stack with a note about expected timing.

Function calls can be done by explicitly naming the belt positions to pass and the order to pass them in. Functions can also be called by passing the whole belt. You can run into issues with this on split-join program flow, e.g. if x: a->b->c, else: a->c. The compiler will try to if-convert and use speculable ops to eliminate the branch. If a branch is unavoidable, then c can only expect one sort of belt configuration, and you need to issue a conform op on one of the transitions into c.  Best case, you have profile info or the compiler can guess which path is more common and only conform on the less common one. Even then it's a fairly cheap operation, as it just reconfigures the mux network's control logic; it's not actually moving data around all over the place.
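One way to picture the conform op as described: a permutation of belt position labels so both paths present the same layout to the join point. A toy sketch (my own reading of the talks; position numbers and names are purely illustrative):

```python
def conform(belt, order):
    # 'order' lists, for each target position, which current belt
    # position should appear there -- a relabelling of the routing
    # network, not an actual data move.
    return [belt[i] for i in order]

# Path 1 leaves the values the continuation c wants at positions 0 and 1.
path1_belt = ["c_arg0", "c_arg1", "junk"]

# Path 2 computed the same values in a different layout, so the
# compiler issues a conform on this (less common) edge before the join.
path2_belt = conform(["junk", "c_arg1", "c_arg0"], [2, 1, 0])

# After the join, block c can expect a single belt configuration.
assert path1_belt[:2] == path2_belt[:2] == ["c_arg0", "c_arg1"]
```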