Author Topic: EEVblog #726 - Dual Xeon Video Editing Machine Build  (Read 72442 times)


Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
« Reply #150 on: March 25, 2015, 12:08:51 pm »
Almost all applications will go slower with too many cores.

But not all... some of them are "embarrassingly parallel".

Also, it is often preferable not to use all available cores with your application. Leave one or two free for the operating system.

Encoding threads can be set to low priority without much loss in overall encoding speed. You could keep on reading EEVBLOG forums with no problem even if all CPUs were in use by the codec.
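
A rough sketch of both ideas, in Python. The helper names `worker_count` and `lower_priority` are made up for illustration and are not part of any encoder; `os.nice` only exists on POSIX systems, so the sketch simply does nothing elsewhere.

```python
import os

def worker_count(total_cores, reserved=1):
    """Worker threads/processes to start, leaving `reserved` cores for the OS."""
    return max(1, total_cores - reserved)

def lower_priority(increment=10):
    """Lower this process's scheduling priority where the OS supports it.

    Returns the new niceness on POSIX systems, or None where os.nice
    is not available (for example on Windows).
    """
    if hasattr(os, "nice"):
        return os.nice(increment)
    return None

# Example: on a 24-thread machine with two cores held back,
# worker_count(24, reserved=2) gives 22 workers.
```

Calling `lower_priority()` once at the start of an encoding job would let interactive programs win whenever they want the CPU, while the job still soaks up whatever is left over.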


 

Offline sync

  • Frequent Contributor
  • **
  • Posts: 799
  • Country: de
« Reply #151 on: March 25, 2015, 12:26:50 pm »
Almost all applications will go slower with too many cores.

But not all... some of them are "embarrassingly parallel".
Yeah, the TOP500 Linpack benchmark and other non-real-world things...
Only very few applications scale well with 100s or 1000s of cores.
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
« Reply #152 on: March 25, 2015, 09:19:43 pm »
Yeah, the TOP500 Linpack benchmark and other non-real-world things...
Only very few applications scale well with 100s or 1000s of cores.

3D Movie rendering is quite "real world". So is weather forecasting and any number of engineering calculations that use finite element analysis (which is done for an awful lot of things these days).

« Last Edit: March 25, 2015, 09:22:18 pm by Fungus »
 

Offline sync

  • Frequent Contributor
  • **
  • Posts: 799
  • Country: de
« Reply #153 on: March 25, 2015, 10:27:31 pm »
3D Movie rendering is quite "real world". So is weather forecasting and any number of engineering calculations that use finite element analysis (which is done for an awful lot of things these days).
These are real-world applications, but the Linpack benchmark is not. It runs an independent Linpack process on each core without communication between them. This is completely different from FEM or weather forecasts, which have dependencies and communication between the processes/threads. This is a big problem for scaling with more cores. I guess that 3D movie rendering doesn't have this problem and scales easily and well.
 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
« Reply #154 on: March 26, 2015, 12:29:09 am »
This is completely different from FEM or weather forecasts, which have dependencies and communication between the processes/threads.
The main goal of any high-performance computing application is reducing dependencies to the absolute minimum. For things like atomic-scale FEM, you can divide your problem space into twice as many isolated cubes as you have CPU cores available, with some overlap with neighboring cubes. Process one simulation step, resolve border interactions between completed adjacent cubes and, once the step-1 cubes are done, start sending out step-2 cubes. Rinse and repeat. Depending on how massive the simulation is, processes may not need to talk with each other more than once every several minutes, and at this point Linpack actually becomes a reasonable best-case approximation.
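
As a toy illustration of that scheme, here is a 1D stand-in for the cubes, written in Python. Everything in it is an assumption made for the example: the three-point averaging `step`, the one-cell overlap, and the function names are hypothetical and not taken from any real FEM package. The point is only that each block, once it carries a copy of its neighbors' edge cells, can be advanced one step without talking to anyone.

```python
def step(cells):
    """One smoothing step on a 1D grid; the two end cells are held fixed."""
    out = list(cells)
    for i in range(1, len(cells) - 1):
        out[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
    return out

def split_with_overlap(cells, parts):
    """Split the grid into `parts` blocks, each with one overlap cell per inner side.

    Each entry is (lo, hi, h_lo, data): the block owns cells[lo:hi] and
    `data` is cells[h_lo:...] including the overlap cells.
    """
    n = len(cells)
    bounds = [n * k // parts for k in range(parts + 1)]
    blocks = []
    for k in range(parts):
        lo, hi = bounds[k], bounds[k + 1]
        h_lo, h_hi = max(lo - 1, 0), min(hi + 1, n)
        blocks.append((lo, hi, h_lo, cells[h_lo:h_hi]))
    return blocks

def step_decomposed(cells, parts):
    """Advance every block independently, then stitch the owned cells together."""
    out = list(cells)
    for lo, hi, h_lo, data in split_with_overlap(cells, parts):
        updated = step(data)  # could run on its own core, needs no other block
        out[lo:hi] = updated[lo - h_lo:hi - h_lo]
    return out
```

For any grid, `step_decomposed(grid, parts)` gives the same result as `step(grid)`. The only communication needed is refreshing the overlap cells before the next step, which is the "resolve border interactions" part.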

When you want your problem to scale to 100 000 cores, you cannot afford to waste much more than 1/100 000th of your time on inter-process communications system-wide. If such machines exist, whoever commissions them must have a few programmers who know how to write useful software that scales to that extent. In my scenario above, inter-process communications would be nearly nonexistent beyond the initial setup, periodic progress tracking and background transfers to keep data sets and results moving without letting CPUs stall.
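
One way to put numbers on this is Amdahl's law, treating communication time as the part of the job that cannot be spread across cores. That modelling choice is an assumption added here for illustration; the function names below are hypothetical.

```python
def amdahl_speedup(serial_fraction, cores):
    """Amdahl's law: speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

def efficiency(serial_fraction, cores):
    """Fraction of the ideal `cores`-fold speedup that is actually achieved."""
    return amdahl_speedup(serial_fraction, cores) / cores
```

With a serial fraction of 1/100 000 on 100 000 cores, `efficiency(1e-5, 100_000)` comes out at roughly 0.5, so even that tiny overhead already costs half the machine. At 1% overhead the efficiency on the same machine drops to about 0.1%.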
 

Offline mux

  • Regular Contributor
  • *
  • Posts: 119
« Reply #155 on: March 26, 2015, 11:51:59 am »
Sorry if this has been asked before but I can't find it: has Dave uploaded source files somewhere so we can try out settings for ourselves? I use Vegas as well and I've used CUDA offloading a couple times with some success. Maybe I can find out better settings to speed things along.
 

Offline sync

  • Frequent Contributor
  • **
  • Posts: 799
  • Country: de
« Reply #156 on: March 26, 2015, 01:11:45 pm »
The main goal of any high-performance computing application is reducing dependencies to the absolute minimum.
I agree. Compare this to Dave's Sony Vegas problem. I don't think it's well optimized in this regard. He didn't get full CPU utilization with all the cores. I think it's because of dependencies.

Quote
For things like atomic-scale FEM, you can divide your problem space into twice as many isolated cubes as you have CPU cores available, with some overlap with neighboring cubes. Process one simulation step, resolve border interactions between completed adjacent cubes and, once the step-1 cubes are done, start sending out step-2 cubes. Rinse and repeat. Depending on how massive the simulation is, processes may not need to talk with each other more than once every several minutes, and at this point Linpack actually becomes a reasonable best-case approximation.
Interesting. My experience with commercial mechanical and CFD FEM simulation software is that there is a lot of communication between the nodes. I never saw the communication stop for minutes, only for a few seconds at most.
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
« Reply #157 on: March 26, 2015, 02:08:43 pm »
Just had a thought.

For comparison's sake, move all the memory to the primary processor's NUMA node and remove or disable the second processor completely and run the tests again.  That might show inherent issues with parallelism or NUMA cache misses / transfers between nodes.
 

Offline necessaryevil

  • Regular Contributor
  • *
  • Posts: 133
  • Country: nl
« Reply #158 on: March 26, 2015, 07:01:47 pm »
Is there any chance of a followup, Dave? You predicted a shitstorm of comments, but on the contrary: I liked it and now I want to know!
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
« Reply #159 on: March 26, 2015, 07:42:28 pm »
Is there any chance of a followup, Dave? You predicted a shitstorm of comments, but on the contrary: I liked it and now I want to know!

Big mystery:  What did Dave decide to do....? 

 

Offline Rasz

  • Super Contributor
  • ***
  • Posts: 2616
  • Country: 00
    • My random blog.
« Reply #160 on: March 26, 2015, 09:01:25 pm »
Big mystery:  What did Dave decide to do....?

That's simple: ignore feedback, conclude he was right all along. That's what I would do :)
Many people already said the problem lies in the insane double encoding workflow, but Dave won't change it, end of story.
 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
« Reply #161 on: March 27, 2015, 01:12:09 am »
Interesting. My experience with commercial mechanical and CFD FEM simulation software is that there is a lot of communication between the nodes. I never saw the communication stop for minutes, only for a few seconds at most.
Harmless, non-performance-critical background chatter, sure. Even when simulation domains are mostly independent, partial results still need to be forwarded to whatever node is scheduled to resolve border effects with the neighbors once the related blocks complete. The scheduler also needs to keep tabs on what the nodes are doing so it can decide what to schedule where and make sure everything has the data it needs before it needs it. An actual remote data dependency, on the other hand, would quickly ruin performance.
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
« Reply #162 on: March 27, 2015, 10:47:31 am »
Big mystery:  What did Dave decide to do....?

That's simple: ignore feedback, conclude he was right all along. That's what I would do :)

Nah, the engineer in him won't let him sleep at night if he does that. Look at the machine he just built to try and solve things.

Many people already said the problem lies in the insane double encoding workflow, but Dave won't change it, end of story.

It wouldn't be a problem if the encoding was faster.

I don't get what's so compelling about Sony Movie that he can't switch. The "workflow" for what he's doing is pretty much the same in all video editing packages. He's not making Hollywood movies here with green-screen actors mixed with CGI and 20 layers of post-production lighting effects; he's joining short video clips together, trimming the start/end of each clip, adding overlay text, then encoding the result. If another package encodes three or four times faster than Sony then I say use that other package.
 

Online EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37740
  • Country: au
    • EEVblog
« Reply #163 on: March 27, 2015, 11:38:03 am »
That's simple: ignore feedback, conclude he was right all along. That's what I would do :)
Many people already said the problem lies in the insane double encoding workflow, but Dave won't change it, end of story.

I've already said I'm moving to a faster uncompressed output from Sony and have posted times from that.
 

Online EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37740
  • Country: au
    • EEVblog
« Reply #164 on: March 27, 2015, 11:40:41 am »
Nah, the engineer in him won't let him sleep at night if he does that. Look at the machine he just built to try and solve things.

I built this machine because:
a) People kindly sent me some stuff for free to use
b) It would at least offer some improvement over what I have now, and it was interesting to find out how much.

I would not have chosen the hardware I did if it was all from scratch.
 

Online EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37740
  • Country: au
    • EEVblog
« Reply #165 on: March 27, 2015, 11:42:41 am »
Is there any chance of a followup, Dave? You predicted a shitstorm of comments, but on the contrary: I liked it and now I want to know!

Probably not on the main channel, it's just not that interesting.
I have already posted what I plan to do on the blog page.
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
« Reply #166 on: March 27, 2015, 12:08:54 pm »
That's simple: ignore feedback, conclude he was right all along. That's what I would do :)
Many people already said the problem lies in the insane double encoding workflow, but Dave won't change it, end of story.

I've already said I'm moving to a faster uncompressed output from Sony and have posted times from that.

It still takes 1:45 for a 1 minute video though. Not exactly screaming speed (five times slower than PowerDirector would do it).

 

Online EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37740
  • Country: au
    • EEVblog
« Reply #167 on: March 27, 2015, 12:17:13 pm »
It still takes 1:45 for a 1 minute video though. Not exactly screaming speed (five times slower than PowerDirector would do it).

That remains to be seen. I have installed it but have not yet tried it. IIRC last time I tried it it was, meh.
 

Offline necessaryevil

  • Regular Contributor
  • *
  • Posts: 133
  • Country: nl
« Reply #168 on: March 27, 2015, 12:24:56 pm »
The Xeon Phi is out now! I hope someone will donate one to Dave!
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
« Reply #169 on: March 27, 2015, 12:25:40 pm »
It still takes 1:45 for a 1 minute video though. Not exactly screaming speed (five times slower than PowerDirector would do it).

That remains to be seen. I have installed it but have not yet tried it. IIRC last time I tried it it was, meh.

Actually, no ... I can encode 1080p50 video at 3.5x speed on my i5.

You have an i7 with twice as many cores so it should be way faster than that - you might be able to encode an hour of video in ten minutes!

 

Offline vlad777

  • Frequent Contributor
  • **
  • Posts: 350
  • Country: 00
« Reply #170 on: March 27, 2015, 04:57:40 pm »
Another thing to keep in mind: LGA2011 CPUs have a quad-channel memory controller so unless you put four similar DIMMs on each CPU, you are only enabling half of the RAM bandwidth each CPU is capable of. This could be a massive bottleneck when all 24 threads are enabled.

I completely agree. Dave doesn't seem to appreciate memory channels, but they are the duck's guts of speed.
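
For a feel of the numbers, here is a small Python sketch of theoretical peak bandwidth. It assumes DDR3-1600 DIMMs purely as an example (the thread does not say which memory speed is fitted) and a 64-bit (8-byte) data bus per channel; `peak_bandwidth_gb_s` is a hypothetical helper name.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1000 MB).

    channels             -- number of populated memory channels
    mega_transfers_per_s -- e.g. 1600 for DDR3-1600 (assumed here, not from the thread)
    bus_bytes            -- data width of one channel in bytes (64 bits = 8)
    """
    return channels * mega_transfers_per_s * bus_bytes / 1000.0
```

Under those assumptions, `peak_bandwidth_gb_s(4, 1600)` gives 51.2 GB/s per CPU, against 25.6 GB/s with only two channels populated. These are ceilings, not what an encoder would actually see.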
 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
« Reply #171 on: March 27, 2015, 07:54:11 pm »
You have an i7 with twice as many cores so it should be way faster than that - you might be able to encode an hour of video in ten minutes!
The i7 has exactly the same number of cores as the i5, except the i7 has 2MB extra L3 cache and has HyperThreading enabled, which lets each core run two threads for better thread-level parallelism and more efficient use of execution units within each core.
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
« Reply #172 on: March 27, 2015, 08:00:55 pm »
You have an i7 with twice as many cores so it should be way faster than that - you might be able to encode an hour of video in ten minutes!
The i7 has exactly the same number of cores as the i5, except the i7 has 2MB extra L3 cache and has HyperThreading enabled, which lets each core run two threads for better thread-level parallelism and more efficient use of execution units within each core.

That's what I meant to say - it can do nearly twice as much work as an i5.   :-[

I think each pair of cores has two execution units but a single floating point unit. Heavy floating point math is a teeny bit less efficient with hyperthreading than with separate cores.

The FPU works like a FIFO: You put numbers in the front and 'X' clock cycles later the result comes out the other end, with 'X' depending on the operation.

If two threads are putting numbers into the FIFO then single cycle instructions like addition can take a hit because they have to alternate. OTOH instructions which take many clock cycles (multiply, divide, sqrt, sin/cos, etc.) will make much better use of the FPU because with two threads you can have twice as many of them passing through the FIFO at the same time.

In reality the effect on single-cycle instructions isn't too bad because it's quite difficult to put a new number into the FIFO on every single clock cycle. You need to store results, fetch new operands from RAM, etc., and this frees up the FPU for the other thread. With the right code they can interleave perfectly with no clashes.

(Or at least, that's how "hyperthreading" worked when I was younger...)
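
The FIFO picture can be played out in a toy Python simulation. It is only a sketch under loud assumptions: one shared unit that accepts at most one operation per clock, every operation on a thread depending on that thread's previous result, and round-robin issue between threads. `cycles_to_finish` is a made-up name and this is not a model of any particular CPU.

```python
def cycles_to_finish(ops_per_thread, latency, threads):
    """Clock cycles until a shared pipelined unit has finished all operations.

    Each thread runs a chain of `ops_per_thread` dependent operations, each
    taking `latency` cycles. The unit accepts at most one new operation per
    cycle and picks between ready threads in round-robin order.
    """
    remaining = [ops_per_thread] * threads
    ready_at = [0] * threads  # earliest cycle each thread may issue its next op
    cycle = 0
    finish = 0
    last = -1                 # thread that issued most recently
    while any(remaining):
        for k in range(threads):
            t = (last + 1 + k) % threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + latency  # next op must wait for this result
                finish = max(finish, cycle + latency)
                last = t
                break                          # only one issue slot per cycle
        cycle += 1
    return finish
```

With 4-cycle operations, one thread needs 400 cycles for 100 operations, while two threads finish 200 operations in 401 cycles because they fill each other's gaps. With single-cycle operations the unit is already full: one thread takes 100 cycles for 100 operations and two threads take 200 cycles for 200, so each thread runs at half speed.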
« Last Edit: March 27, 2015, 08:17:22 pm by Fungus »
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
« Reply #173 on: March 27, 2015, 08:01:09 pm »
Another thing to keep in mind: LGA2011 CPUs have a quad-channel memory controller so unless you put four similar DIMMs on each CPU, you are only enabling half of the RAM bandwidth each CPU is capable of. This could be a massive bottleneck when all 24 threads are enabled.

I completely agree. Dave doesn't seem to appreciate memory channels, but they are the duck's guts of speed.

I think Dave appreciates it just fine (he actually mentioned it in the video!), he just doesn't believe it's the bottleneck here.

FWIW I agree with him.

Think: The CPU usage meter works by looking at OS time slices. If the CPU usage meter is showing "50% idle" then RAM configuration isn't the problem. The problem is that 50% of the CPUs aren't being handed any work to do by the compression codec.

The time to talk about RAM bank configurations is when the meter shows 100% usage, not before.

 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
« Reply #174 on: March 27, 2015, 09:23:02 pm »
In either case, however, if the CPUs are at 100% then increasing memory bandwidth is not going to get you more CPU cycles.

True, but that's not what the CPU meter shows.

The CPU meter doesn't know how many CPU cycles did useful work and how many were wasted due to slow RAM (it will show "100%" in both those cases so long as the thread is doing *something*, no matter how slowly).
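
A tiny Python sketch of that distinction, using made-up numbers. `stall_fraction` is a hypothetical input: the usage meter has no access to it, and on real hardware you would need performance counters to estimate it.

```python
def meter_percent(scheduled_slices, total_slices):
    """What a time-slice based usage meter reports: scheduled time over total time."""
    return 100.0 * scheduled_slices / total_slices

def useful_percent(scheduled_slices, total_slices, stall_fraction):
    """Share of the interval spent doing work rather than waiting on RAM.

    stall_fraction is the (assumed) part of scheduled time lost to memory stalls.
    """
    return meter_percent(scheduled_slices, total_slices) * (1.0 - stall_fraction)
```

A thread scheduled for 100 of 100 slices reads 100% on the meter whether `stall_fraction` is 0.0 or 0.6, even though in the second case only about 40% of the time went into useful work. A reading of 50%, on the other hand, means half the slices were never handed out at all.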
 

