Author Topic: Why not use interleaving technique on multi-core processor ?  (Read 6123 times)


Offline tonygetTopic starter

  • Contributor
  • Posts: 13
Why not use interleaving technique on multi-core processor ?
« on: January 05, 2015, 03:09:28 am »
From my understanding, current multi-core processors rely on software parallelisation: individual cores may execute separate tasks/threads concurrently. The disadvantage is that software which is not optimized for multi-threading can't fully utilize this feature; nor, for example, can a numerical integration algorithm.

Inspired by the ADC interleaved sampling technique employed by oscilloscopes, I'm wondering why not use the same technique on a CPU? For instance, the clock period of a 1GHz processor is 1ns. If you combined two cores with their individual clocks set 0.5ns apart, so that they take turns executing instructions, you would effectively get a 2GHz processor, wouldn't you?
« Last Edit: January 05, 2015, 03:13:49 am by tonyget »
 

Offline Marco

  • Super Contributor
  • ***
  • Posts: 6716
  • Country: nl
Re: Why not use interleaving technique on multi-core processor ?
« Reply #1 on: January 05, 2015, 04:04:43 am »
Instructions are not generally independent, AND their "execution" takes more than 1 cycle (it is spread through a pipeline). Superscalar processors already try to execute multiple instructions at the same time, and they fail more often than not ... throwing another core into the mix, with its large communication latencies, won't help.
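To make Marco's dependency point concrete, here is a toy model (not a simulation of any real CPU) of a fully dependent instruction chain. Because each "instruction" needs the previous one's result, a second core with an offset clock gains nothing:

```python
# Toy model: a chain of dependent "instructions", each taking 1 cycle.
# With a true data dependency, instruction i cannot start until i-1 has
# finished, so a second core offset by half a cycle just waits.

def serial_cycles(n_instructions, latency=1):
    """Cycles to run a fully dependent chain on one core."""
    return n_instructions * latency

def interleaved_cycles(n_instructions, latency=1, cores=2):
    """Cycles for the same dependent chain 'interleaved' across cores.
    `cores` is intentionally unused: the length of the dependency
    chain, not the number of cores, sets the total time."""
    finish = 0.0
    for i in range(n_instructions):
        start = finish            # must wait for the previous result
        finish = start + latency  # regardless of which core runs it
    return finish

# The offset-clock scheme buys exactly nothing on a dependent chain.
assert serial_cycles(1000) == interleaved_cycles(1000)
```

Only when instructions are independent can extra hardware issue them in parallel, which is exactly what superscalar designs already attempt.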

In fact the reverse of what you suggest is more frequently done: use the resources of a single core to execute two threads (Hyperthreading, for instance).
« Last Edit: January 05, 2015, 04:06:19 am by Marco »
 

Offline Psi

  • Super Contributor
  • ***
  • Posts: 9930
  • Country: nz
Re: Why not use interleaving technique on multi-core processor ?
« Reply #2 on: January 05, 2015, 05:27:31 am »
It's not uncommon for a CPU to "take a few guesses" at what the next microinstruction could be.
Or what operation might be done with the result of the current operation.

It then executes the most likely guesses and uses the correct one once that's known.

I forget what it's called and which CPUs do it.
"Branch prediction" comes to mind, but I think that's more to do with if/else jump prediction.
« Last Edit: January 05, 2015, 05:30:49 am by Psi »
Greek letter 'Psi' (not Pounds per Square Inch)
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3639
  • Country: us
Re: Why not use interleaving technique on multi-core processor ?
« Reply #3 on: January 05, 2015, 06:17:35 am »
The technique called Thread-Level Speculation can be used to apply multiple processors (or cores) to a program written as a single thread. The idea is that at (some subset of) conditional branches, the program forks a thread for each path. The threads are later joined and the result from the branch that was actually taken is used. There were many papers written on this subject in the '00s, and several research CPU designs to support it were tested.
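A minimal sketch of the fork-at-a-branch idea helius describes, using ordinary Python threads as stand-ins for speculative hardware threads (the function names are made up for illustration):

```python
# Thread-level speculation sketch: fork a worker for each branch path,
# join both, and commit only the result from the path actually taken.
import threading

def tls_branch(condition, then_fn, else_fn):
    results = {}
    t1 = threading.Thread(target=lambda: results.__setitem__("then", then_fn()))
    t2 = threading.Thread(target=lambda: results.__setitem__("else", else_fn()))
    t1.start(); t2.start()        # both paths run speculatively
    t1.join(); t2.join()
    # Commit the correct path; the other thread's work is discarded.
    return results["then"] if condition else results["else"]

assert tls_branch(True,  lambda: 2 * 21, lambda: -1) == 42
assert tls_branch(False, lambda: -1, lambda: sum(range(4))) == 6
```

The hard part the research papers wrestle with, which this sketch ignores, is speculative threads that write memory: those side effects must be buffered and rolled back if the path is squashed.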
 

Offline tszaboo

  • Super Contributor
  • ***
  • Posts: 7369
  • Country: nl
  • Current job: ATEX product design
Re: Why not use interleaving technique on multi-core processor ?
« Reply #4 on: January 05, 2015, 07:38:29 am »
Communication between cores is rather slow. They share only the L3 cache, which is slower than L2, so it can take quite a few clock cycles to send data between them.
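A back-of-envelope model of that cost. The latency numbers below are illustrative placeholders, not measured values for any real CPU:

```python
# Why bouncing a dependency chain between cores hurts: each cross-core
# handoff pays a shared-cache round trip instead of a private-L1 hit.
L1_HIT = 4    # cycles, private cache (illustrative)
L3_HIT = 40   # cycles, shared cache, where cross-core transfers land

def chain_cost(n_ops, handoffs, op_cost=1):
    """Cost of n dependent ops with `handoffs` cross-core transfers."""
    local  = (n_ops - handoffs) * (op_cost + L1_HIT)
    remote = handoffs * (op_cost + L3_HIT)
    return local + remote

one_core  = chain_cost(100, handoffs=0)    # all data stays in L1
ping_pong = chain_cost(100, handoffs=100)  # every result changes cores
assert ping_pong > one_core                # interleaving makes it worse
```

With these numbers the ping-ponged chain is roughly eight times slower than keeping it on one core, which is the opposite of the hoped-for 2x speedup.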
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19468
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Why not use interleaving technique on multi-core processor ?
« Reply #5 on: January 05, 2015, 12:37:08 pm »
In desktop CPUs nowadays, access to cache and main memory is the bottleneck. Doubly so when synchronisation is required.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11622
  • Country: my
  • reassessing directives...
Re: Why not use interleaving technique on multi-core processor ?
« Reply #6 on: January 05, 2015, 01:33:59 pm »
From my understanding, the current multi-core processor relies on software parallelisation
unless you can design cores as intelligent as a human, able to decide which operation goes first and which goes later, that will always be the case. you can make presumptions, do pre-emption, or whatever you want to call it in your thesis; it will not come close to "human made software parallelisation"

The disadvantage is that softwares which are not optimized for multi-threading can't fully utilize this feature, so does numerical integration algorithm.
whose fault?

Inspired from ADC interleaved sampling technique employed by oscilloscopes, I'm wondering why not use the same technique on CPU ? For instance, the time interval of 1GHz processor is 1ns, if you combine two cores with individual clock set 0.5ns apart, they take turns to execute lines of code, you would effectively get a 2GHz processor, isn't it ?
dont equate this with interleaved "sampling", which can do the job in half the clock because each converter has its own RAM and entirely separate chip/hardware. and still, what you see on the screen is heavily undersampled from the collected (or realtime) data before it gets painted onto a 2D plane.

instruction execution is a causal process; data collection is not. you can still collect today's data even if you missed yesterday's. but in computing/data processing you process yesterday's data to get today's: if yesterday is garbage, you get garbage today. what you are asking for is a processor that can execute an instruction, and process the data to feed the next instruction, in half the clock period. in the end what would you get? a 2GHz single processor, in each core. if it were feasible, some phd at intel would have come up with it long ago. people have struggled for things like speculative/pre-emptive computing for ages, and it is still not "good enough" compared to "human made software parallelisation". and it is only applicable in a multitasking environment: two or several tasks/programs doing entirely different things on separate data. dont be too excited by that mumbo jumbo; even that is not close to the real deal of parallelisation across hard cores.

but well, we all know it's easier said than done, and there's nothing wrong with just saying (dreaming) it. when we get the grant we'll be there ;)
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): Its now indisputable that... organisms “expertise” contextualizes its genome, and its nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline Howardlong

  • Super Contributor
  • ***
  • Posts: 5317
  • Country: gb
Re: Why not use interleaving technique on multi-core processor ?
« Reply #7 on: January 05, 2015, 01:36:06 pm »
The DSP world, for example TI's TMS320C6xxx series, uses VLIW (very long instruction word) to allow concurrent execution.

It's been nearly ten years since I worked on them, but my recollection is that they have 256-bit instruction words and run from on-chip RAM, so as native 32-bit processors they can run up to 8 concurrent instructions. It's up to the compiler (or a super nerd) to present mutually independent concurrent instructions. DSP lends itself quite well to closely coupled parallelism like this, due to its mutually independent array computations.
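The packing job Howardlong describes can be sketched roughly like this. It is a simplified model of what a VLIW compiler does, ignoring functional-unit constraints and write-after-write hazards:

```python
# Sketch of VLIW instruction packing: independent ops share one
# (up to 8-slot) packet; an op that reads a register written earlier
# in the same packet must start a new packet.

def pack_vliw(ops, width=8):
    """ops: list of (dest_reg, src_regs) tuples, in program order."""
    packets, current, written = [], [], set()
    for dest, srcs in ops:
        # Packet full, or a source was produced inside this packet:
        if len(current) == width or any(s in written for s in srcs):
            packets.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        packets.append(current)
    return packets

# a and b are independent and share a packet; c reads both, so it
# must go in the next packet.
ops = [("a", ()), ("b", ()), ("c", ("a", "b"))]
assert len(pack_vliw(ops)) == 2
```

Array code packs well precisely because the per-element operations rarely read each other's results inside a packet.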
 

Offline rob77

  • Super Contributor
  • ***
  • Posts: 2085
  • Country: sk
Re: Why not use interleaving technique on multi-core processor ?
« Reply #8 on: January 05, 2015, 02:18:42 pm »
The DSP world, for example TI's TMS320C6xxx series, uses VLIW (very long instruction word) to allow concurrent execution.

It's been nearly ten years since I worked on them, but my recollection is that they have 256-bit instruction words and run from on-chip RAM, so as native 32-bit processors they can run up to 8 concurrent instructions. It's up to the compiler (or a super nerd) to present mutually independent concurrent instructions. DSP lends itself quite well to closely coupled parallelism like this, due to its mutually independent array computations.

The same is done on Itanium processors, but it's up to the compiler to make use of this advantage. If the compiler fails to optimize for VLIW, the extra horsepower is left unused.
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3639
  • Country: us
Re: Why not use interleaving technique on multi-core processor ?
« Reply #9 on: January 05, 2015, 02:37:10 pm »
Some notes:
The Itanium is not VLIW; its instruction words are wide, but they are not issued in lock-step in VLIW fashion. Instead, there are bits in each word that specify the data dependency of the instructions in the word, and these instructions are issued dynamically by the processor based on the available IUs according to those bits. So the instructions packed into each word can be issued either sequentially or in parallel. This approach was called "Explicit parallelism" by the HP team that designed the architecture.
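The group-delimiting idea helius describes can be sketched as follows. The encoding here is invented for clarity (real IA-64 bundles hold three 41-bit instruction slots plus a template field); the point is just that a stop bit, not the bundle boundary, marks where an independent group ends:

```python
# EPIC-style sketch: each instruction carries a "stop" bit marking the
# end of an independent group; everything inside one group may be
# issued in parallel, and groups may span bundle boundaries.

def groups_from_stream(instrs):
    """instrs: list of (name, stop_bit). Returns parallel-issue groups."""
    groups, current = [], []
    for name, stop in instrs:
        current.append(name)
        if stop:                  # stop bit = explicit dependency barrier
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

# Three independent ops, then a store that depends on them:
stream = [("add", 0), ("ld", 0), ("mul", 1), ("st", 1)]
assert groups_from_stream(stream) == [["add", "ld", "mul"], ["st"]]
```

So the compiler still finds the parallelism, but the hardware retains freedom in how many instructions of a group it issues per cycle.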

Marco gave a good explanation of why the TS's idea cannot work.
 

Offline rob77

  • Super Contributor
  • ***
  • Posts: 2085
  • Country: sk
Re: Why not use interleaving technique on multi-core processor ?
« Reply #10 on: January 05, 2015, 02:49:58 pm »
Some notes:
The Itanium is not VLIW; its instruction words are wide, but they are not issued in lock-step in VLIW fashion. Instead, there are bits in each word that specify the data dependency of the instructions in the word, and these instructions are issued dynamically by the processor based on the available IUs according to those bits. So the instructions packed into each word can be issued either sequentially or in parallel. This approach was called "Explicit parallelism" by the HP team that designed the architecture.

Marco gave a good explanation of why the TS's idea cannot work.

from wikipedia: http://en.wikipedia.org/wiki/Very_long_instruction_word
Quote
Outside embedded processing markets, Intel's Itanium IA-64 EPIC and Elbrus 2000 appear as the only examples of a widely used VLIW CPU architectures. However, EPIC architecture is sometimes distinguished from a pure VLIW architecture, since EPIC advocates full instruction predication, rotating register files, and a very long instruction word that can encode non-parallel instruction groups.

And many other sources mark Itanium as VLIW, despite the fact that it is not pure VLIW... because they had to add many more features (predication, multi-core, etc.) to compensate for the software's inability to fully utilize the architecture's advantages... (what is the extra horsepower good for on your high-end server if the DB software is not able to use it?)
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19468
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Why not use interleaving technique on multi-core processor ?
« Reply #11 on: January 05, 2015, 07:15:51 pm »
The DSP world, for example TI's TMS320C6xxx series, uses VLIW (very long instruction word) to allow concurrent execution.

It's been nearly ten years since I worked on them, but my recollection is that they have 256-bit instruction words and run from on-chip RAM, so as native 32-bit processors they can run up to 8 concurrent instructions. It's up to the compiler (or a super nerd) to present mutually independent concurrent instructions. DSP lends itself quite well to closely coupled parallelism like this, due to its mutually independent array computations.

The same is done on Itanium processors, but it's up to the compiler to make use of this advantage. If the compiler fails to optimize for VLIW, the extra horsepower is left unused.

When, not if, all but one of the predicated results are thrown away, that horsepower and the associated joules have been wasted. Note that large datacentres tend to be power limited nowadays, and are often sited next to large bodies of water.
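The wasted-work point can be shown in a few lines. This is a toy sketch of predicated execution (compute both arms, select one by predicate), with an invented counter to make the discarded work visible:

```python
# Predication sketch: both arms of the branch are executed
# unconditionally and a predicate selects one result; the other arm's
# work (and its joules) is simply thrown away.

executed_ops = 0

def op(x):
    """Stand-in for one executed machine operation."""
    global executed_ops
    executed_ops += 1
    return x

def predicated_select(p, a, b):
    then_result = op(a * 2)   # executed whether or not p is true
    else_result = op(b + 1)   # executed whether or not p is true
    return then_result if p else else_result

result = predicated_select(True, 10, 99)
assert result == 20
assert executed_ops == 2      # two ops ran; one result was discarded
```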

People have been trying to automatically extract such parallelism as the Itanium requires since the 70s, without success.

Itanic is dead.
 

Offline rob77

  • Super Contributor
  • ***
  • Posts: 2085
  • Country: sk
Re: Why not use interleaving technique on multi-core processor ?
« Reply #12 on: January 06, 2015, 05:30:55 am »
Itanic is dead.

Yes, it is dead ;) but not completely yet; for really huge servers it's still one of the best available solutions (e.g. HP Superdome). Furthermore, smaller Itanium hardware is still used for running IO-intensive SAP installations and databases, but it's being replaced by x86 servers in this area (since the x86 architecture made a giant leap forward and overcame the north-bridge IO and memory bottleneck).
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19468
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Why not use interleaving technique on multi-core processor ?
« Reply #13 on: January 06, 2015, 08:38:52 am »
Itanic is dead.

Yes, it is dead ;) but not completely yet; for really huge servers it's still one of the best available solutions (e.g. HP Superdome). Furthermore, smaller Itanium hardware is still used for running IO-intensive SAP installations and databases, but it's being replaced by x86 servers in this area (since the x86 architecture made a giant leap forward and overcame the north-bridge IO and memory bottleneck).

Back in 2001-3 I had difficulty persuading a PA-RISC house that they shouldn't believe the hype about Itanic, and should consider x86 machines for future products. Persuading them became much less difficult when AMD's x86-64 plus HyperTransport arrived in their Sledgehammer processors. Sun's T1 processor was also impressive on the right workloads.

The era of presuming that coherent shared memory can be scaled has fortunately passed.

I expect the long-term future to be a processor based around 1-8 cores closely coupled to shared memory. Scalability will be based on explicit application-level message passing between such processors.
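The message-passing model tggzzz describes can be sketched with thread-safe queues standing in for the interconnect between such processors (threads here are only a stand-in for separate machines):

```python
# Message-passing sketch: each "processor" owns its data and cooperates
# only by exchanging messages; no coherent shared memory is assumed.
import threading, queue

def worker(inbox, outbox):
    while True:
        msg = inbox.get()
        if msg is None:           # shutdown sentinel
            break
        outbox.put(msg * msg)     # do local work, send the result back

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in (2, 3, 4):               # send work as messages
    inbox.put(n)
results = [outbox.get() for _ in range(3)]

inbox.put(None)                   # tell the worker to stop
t.join()
assert results == [4, 9, 16]
```

Because the only coupling is the two queues, the same structure scales across sockets or machines by swapping the queues for a network transport, which is exactly what shared-memory coherence cannot do.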

It will be interesting to see which languages/libraries are best able to aid/hinder programmers.
 

Offline Howardlong

  • Super Contributor
  • ***
  • Posts: 5317
  • Country: gb
Re: Why not use interleaving technique on multi-core processor ?
« Reply #14 on: January 06, 2015, 09:52:06 pm »
What I would say is that trying to figure out what is going on at the instruction level when an optimising compiler has got its hands on a VLIW like the TI C6000 series is like nothing else I've ever seen before or since.

While supposedly we should just trust the compiler, occasionally, just occasionally, there will be a functional compiler bug. Not often, but it does happen. The last functional compiler bug I found was about five years ago, but that was on a PIC, so hardly difficult to find and characterise.
 

