Author Topic: EEVblog #726 - Dual Xeon Video Editing Machine Build  (Read 72440 times)


Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #175 on: March 27, 2015, 11:34:48 pm »
Think: The CPU usage meter works by looking at OS time slices. If the CPU usage meter is showing "50% idle" then RAM configuration isn't the problem.
Another thing the CPU usage is showing is that there is a roughly 2:1 bias toward scheduling things on a single socket. Since the software likely does not have explicit NUMA support, most of its code and data will land on a single CPU and the scheduler will try to keep most of its threads on it to minimize socket-to-socket overhead.

Dual-channel 1600MT/s RAM is just about ideal for a quad-core i5 under most circumstances and is mostly still adequate for a 4C8T i7. For a dual-socket 6C12T Xeon, dual-channel 1333MT/s is grossly inadequate when CPU0 ends up hosting most application code and data for both sockets.
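To put rough numbers on that, here is a minimal sketch of the theoretical peak figures. It assumes a standard 64-bit (8-byte) DDR3 channel and takes the core counts from the configurations named above; the "all cores pull from CPU0" case is a deliberate worst-case assumption, not a measurement of Dave's machine.

```python
def peak_bandwidth_gbs(transfers_mt_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: MT/s x bytes per transfer x channels."""
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9


# Quad-core i5 on dual-channel 1600MT/s
i5_total = peak_bandwidth_gbs(1600, channels=2)
i5_per_core = i5_total / 4

# Dual-socket 6C12T Xeon where CPU0's dual-channel 1333MT/s RAM
# ends up serving all 12 physical cores (assumed worst case)
xeon_total = peak_bandwidth_gbs(1333, channels=2)
xeon_per_core = xeon_total / 12

print(f"i5:   {i5_total:.1f} GB/s total, {i5_per_core:.2f} GB/s per core")
print(f"Xeon: {xeon_total:.1f} GB/s total, {xeon_per_core:.2f} GB/s per core")
```

Real sustained bandwidth is lower than these peaks, but the per-core ratio is the point: in this worst case each Xeon core gets well under a third of what an i5 core gets.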

BTW, Intel's LGA2011 CPUs have memory controller instrumentation which allows the scheduler to make decisions based on memory controller load. If CPU0's memory controller is under heavy load, it makes no sense to schedule threads that rely on code and data hosted on CPU0 on other CPUs since there is no spare bandwidth to serve them.
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #176 on: March 28, 2015, 10:37:25 am »
In either case, however, if the CPUs are at 100% then increasing memory bandwidth is not going to get you more CPU cycles.

True, but that's not what the CPU meter shows.

The CPU meter doesn't know how many CPU cycles did useful work and how many were wasted due to slow RAM (it will show "100%" in both those cases so long as the thread is doing *something*, no matter how slowly).

You've very subtly changed from "any work" to "useful work". Is that significant?

No, just trying to explain what the CPU meter shows. To me it shows very clearly that RAM isn't the bottleneck (yet).
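A toy model of the distinction being argued here, as a sketch only: the meter counts time slices in which a thread was scheduled, while useful throughput would need something like instructions per cycle. The `ThreadSample` class and its counter values are made up for illustration and are not how any real task manager is implemented.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    busy_slices: int     # time slices in which the thread was scheduled
    total_slices: int    # time slices in the sampling window
    instructions: float  # instructions retired in the window (hypothetical value)
    cycles: float        # core clock cycles in the window (hypothetical value)

    def meter_percent(self) -> float:
        """What a usage meter reports: scheduled time, nothing else."""
        return 100.0 * self.busy_slices / self.total_slices

    def ipc(self) -> float:
        """Instructions per cycle: a rough proxy for useful work done."""
        return self.instructions / self.cycles


# Same scheduling, very different progress: the second thread mostly waits on RAM.
fast = ThreadSample(100, 100, instructions=2.0e9, cycles=1.0e9)
stalled = ThreadSample(100, 100, instructions=0.3e9, cycles=1.0e9)

for name, t in (("fast", fast), ("stalled", stalled)):
    print(f"{name}: meter {t.meter_percent():.0f}%, IPC {t.ipc():.1f}")
```

Both threads read 100% on the meter. The flip side is the argument above: a meter showing idle time means the threads were not even scheduled, which slow RAM alone would not cause.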
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #177 on: March 28, 2015, 10:39:37 am »
BTW, Intel's LGA2011 CPUs have memory controller instrumentation which allows the scheduler to make decisions based on memory controller load. If CPU0's memory controller is under heavy load, it makes no sense to schedule threads that rely on code and data hosted on CPU0 on other CPUs since there is no spare bandwidth to serve them.

Sure, but do you think Windows is that smart?

 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #178 on: March 28, 2015, 06:00:20 pm »
Sure, but do you think Windows is that smart?
There would not be much of a point in supporting multi-socket configurations if the scheduler is going to do a piss-poor job at trying to make the whole thing work reasonably efficiently. Though Microsoft might reserve the finer scheduling tricks for Windows HPC Server.

Even without the memory performance instrumentation, the CPU's performance counters have numbers on clock ticks and instructions executed that the scheduler can use to determine how efficient the CPUs are at running processes and how much of an impact using remote CPUs has on threads hosted on each CPU. If CPU0's throughput drops more when scheduling CPU0-hosted processes on remote CPUs than what other CPUs contribute to CPU0's workload, you reduce the amount of CPU0 processes scheduled elsewhere to keep throughput close to optimum.
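The rule in that last sentence can be sketched as a simple feedback step. This is a toy illustration of the idea only: the function name, the one-thread-at-a-time step and the throughput figures are all hypothetical, and nothing here claims to be what the Windows scheduler actually does.

```python
def remote_threads_to_keep(local_loss: float, remote_gain: float, remote_threads: int) -> int:
    """Toy feedback step for CPU0-hosted threads running on other sockets.

    local_loss:     throughput CPU0's local threads lose to remote traffic
                    (e.g. instructions/s, derived from performance counters)
    remote_gain:    throughput the remote threads contribute, same units
    remote_threads: CPU0-hosted threads currently scheduled elsewhere
    """
    if remote_threads > 0 and local_loss > remote_gain:
        # Remote threads cost more than they add: pull one back.
        return remote_threads - 1
    return remote_threads


print(remote_threads_to_keep(local_loss=5.0e9, remote_gain=3.0e9, remote_threads=6))
print(remote_threads_to_keep(local_loss=2.0e9, remote_gain=3.0e9, remote_threads=6))
```

Run repeatedly, a rule like this settles near the point where adding another remote thread stops paying for itself.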
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #179 on: March 29, 2015, 10:09:34 am »
Sure, but do you think Windows is that smart?
There would not be much of a point in supporting multi-socket configurations if the scheduler is going to do a piss-poor job at trying to make the whole thing work reasonably efficiently. Though Microsoft might reserve the finer scheduling tricks for Windows HPC Server.

Either that, or .... you put in an API so that users can do it themselves:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa363804%28v=vs.85%29.aspx
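For a feel of what "do it themselves" looks like, here is a minimal sketch that asks Windows which logical processors belong to which NUMA node. `GetNumaHighestNodeNumber` and `GetNumaNodeProcessorMask` are real kernel32 functions; the wrapper name `numa_node_masks` is made up here, and the function simply returns None on anything that isn't Windows.

```python
import ctypes
import sys
from typing import Dict, Optional


def numa_node_masks() -> Optional[Dict[int, int]]:
    """Return {NUMA node: processor affinity mask}, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

    highest = ctypes.c_ulong(0)
    if not kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest)):
        raise ctypes.WinError(ctypes.get_last_error())

    masks: Dict[int, int] = {}
    for node in range(highest.value + 1):
        mask = ctypes.c_ulonglong(0)
        # Bit N set in the mask means logical processor N belongs to this node.
        if kernel32.GetNumaNodeProcessorMask(ctypes.c_ubyte(node), ctypes.byref(mask)):
            masks[node] = mask.value
    return masks


result = numa_node_masks()
if result is None:
    print("Not running on Windows")
else:
    for node, mask in result.items():
        print(f"node {node}: processor mask {mask:#x}")
```

A NUMA-aware application could then pin worker threads with `SetThreadAffinityMask` and allocate their buffers on the matching node with `VirtualAllocExNuma`. Machines with more than 64 logical processors need the processor-group variants of these calls, which this sketch ignores.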

Even without the memory performance instrumentation, the CPU's performance counters have numbers on clock ticks and instructions executed that the scheduler can use to determine how efficient the CPUs are at running processes and how much of an impact using remote CPUs has on threads hosted on each CPU. If CPU0's throughput drops more when scheduling CPU0-hosted processes on remote CPUs than what other CPUs contribute to CPU0's workload, you reduce the amount of CPU0 processes scheduled elsewhere to keep throughput close to optimum.

It's a pretty theory but do you have any evidence to show that's what's happening here?
 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #180 on: March 30, 2015, 03:30:24 pm »
It's a pretty theory but do you have any evidence to show that's what's happening here?
While it does not go into details, this MSDN article definitely states that Windows' scheduler does favor scheduling threads on the CPU closest to the process' memory:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684251%28v=vs.85%29.aspx

That alone is enough to explain why only one CPU gets consistently loaded by non-NUMA-aware software.
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #181 on: March 30, 2015, 05:19:19 pm »
It's a pretty theory but do you have any evidence to show that's what's happening here?
While it does not go into details, this MSDN article definitely states that Windows' scheduler does favor scheduling threads on the CPU closest to the process' memory:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684251%28v=vs.85%29.aspx

That alone is enough to explain why only one CPU gets consistently loaded by non-NUMA-aware software.

Yes, but it doesn't say it will ONLY schedule them on those CPUs (which makes sense - a thread running with slow(er) memory access can still do useful work).
 

Online IanJ

  • Supporter
  • ****
  • Posts: 1608
  • Country: scotland
  • Full time EE & Youtuber
    • IanJohnston.com
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #182 on: March 30, 2015, 06:09:53 pm »
Here you go Dave, check this video out. I even did that while capturing my screen in HD res, lol.
I didn't show the entire rendering process, just the estimated time left reported by Sony Movie Studio; I did complete the rendering and the actual time was almost the same.

Without the screen cap running the time is:
8:13 CPU
1:20 GPU

Don't tell me your videos are different and so on; it is clear, man, that my CPU had a rough time dealing with this 2 minute clip, while with the GPU it was a breeze, although it is an old beast. So OpenCL is the way to go; I'm still wondering why it didn't work for you.
http://youtu.be/1zbfnwaV9ds

edit: the whole clip is 2 minutes long realtime

Been doing some testing in order to try and replicate this... to no avail so far (I get the same/similar speeds as Dave), but what I did see is that in your YT vid you picked a profile called "BO", a custom one you had saved. The issue I found, though, is that the profile you started from before saving it as "BO" can influence the encoding.

Can you tell me what profile you used to generate "BO"?

PS. The fastest I can achieve (50fps source rendering to a 50fps .m2ts) is 17 mins for a 9 min video... pro-rata, that works out at about 3.78 mins for your 2 min video.
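The pro-rata sum, spelled out as a quick sketch. It assumes render time scales linearly with clip length and reads the quoted 8:13 and 1:20 figures as minutes:seconds; both are assumptions, since different source material and profiles will not scale that neatly.

```python
def prorated_minutes(render_min: float, source_min: float, clip_min: float) -> float:
    """Scale a measured render time linearly to a clip of a different length."""
    return render_min / source_min * clip_min


mine = prorated_minutes(render_min=17, source_min=9, clip_min=2)
quoted_cpu = 8 + 13 / 60  # "8:13 CPU", assumed min:sec
quoted_gpu = 1 + 20 / 60  # "1:20 GPU", assumed min:sec

print(f"mine, pro-rata: {mine:.2f} min")
print(f"quoted CPU:     {quoted_cpu:.2f} min")
print(f"quoted GPU:     {quoted_gpu:.2f} min")
```

So my figure lands between the two quoted times: less than half the quoted CPU time, but still close to three times the quoted GPU time.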

PPS. However, can't help but feel that this quote I found says it all:-
"as someone who uses vegas pro and media studio a decent amount I can say that GPU acceleration is hit or miss depending on the codec/container your source files are in as well as the render file type".

Thanks,

Ian
(i7 4770 CPU)
« Last Edit: March 30, 2015, 07:18:20 pm by IanJ »
Ian Johnston - Original designer of the PDVS2mini || Author of the free WinGPIB app.
Website - www.ianjohnston.com
YT Channel (electronics repairs & projects): www.youtube.com/user/IanScottJohnston, Twitter (X): https://twitter.com/IanSJohnston
 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #183 on: March 30, 2015, 08:48:00 pm »
a thread running with slow(er) memory access can still do useful work).
If the main CPU is running into memory bandwidth/IO contention, having extra processes from CPU1 accessing CPU0's memory may cause performance of processes running on CPU0 using local memory to degrade worse than whatever work CPU1 can provide and you end up with worse overall performance. With 12-24 threads competing against each other for open memory rows and bandwidth, things can get ugly fast much like how the performance of network links tends to go down the drain once they start congesting.

The "useful work" might not be so useful. I don't remember if Dave tried disabling the second CPU - I spot-checked the video but did not find that part if he did. On paper, a single Xeon with quad-channel memory should easily match his i7.
 

Offline eneuro

  • Super Contributor
  • ***
  • Posts: 1528
  • Country: 00
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #184 on: April 04, 2015, 03:02:50 pm »
http://docs.nvidia.com/cuda/cuda-c-programming-guide/
Quote
The challenge is to develop application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.

Anyway, if we divide the 24 logical cores (threads) in the new, upgraded dual-CPU system by the 8 in the reference system we get 24/8 = 3, but if we multiply 24 cores by 2.6 GHz ~ 62 [core GHz] and the original 8 cores * 3.7 GHz ~ 30 [core GHz], then it looks like 62/30 ~ 2x more [core GHz] in the new setup  ;)
However, it is only an estimate of [core GHz], without investigating any architecture differences or memory setups.
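The same back-of-envelope estimate as a few lines of Python, for anyone who wants to plug in other numbers. The counts are logical cores (hardware threads), and "core GHz" is just this post's rough metric, not a real benchmark unit.

```python
def core_ghz(logical_cores: int, clock_ghz: float) -> float:
    """Crude aggregate: number of logical cores times clock speed."""
    return logical_cores * clock_ghz


new_setup = core_ghz(24, 2.6)  # dual Xeon
reference = core_ghz(8, 3.7)   # original machine

print(f"new setup: {new_setup:.1f} core GHz")
print(f"reference: {reference:.1f} core GHz")
print(f"ratio:     {new_setup / reference:.2f}x")
```

The ratio comes out at roughly 2.1x, against 3x from counting cores alone.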

Probably Nvidia Teslas with software optimized for those monsters could make @Dave happy, especially with no need to give any solar energy back to the grid with a setup like that  >:D
http://www.nvidia.com/object/tesla-supercomputing-solutions.html
ASUS GeForce GTX 780 Ti for less than $1000, so it is worth testing its GPU processing power  8)

3GB Memory ~1GHz clock
http://www.nvidia.com/gtx-700-graphics-cards/gtx-780ti/
CUDA Cores    2880    :o
This thing should be really powerful: 2880 cores x 0.9 GHz ~ 2600 [core GHz], which is about 40x more than @Dave's latest setup :-+
« Last Edit: April 05, 2015, 07:48:42 am by eneuro »
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #185 on: April 06, 2015, 08:00:15 pm »
a thread running with slow(er) memory access can still do useful work).
If the main CPU is running into memory bandwidth/IO contention, having extra processes from CPU1 accessing CPU0's memory may cause performance of processes running on CPU0 using local memory to degrade worse than whatever work CPU1 can provide and you end up with worse overall performance. With 12-24 threads competing against each other for open memory rows and bandwidth, things can get ugly fast much like how the performance of network links tends to go down the drain once they start congesting.

The "useful work" might not be so useful.

You can always come up with a theoretical pathological case, yes.

But Xeon CPUs have big caches and a few pages ago we were discussing how that's unlikely to happen with a video codec. Video encoding is embarrassingly parallel.

« Last Edit: April 06, 2015, 08:02:32 pm by Fungus »
 

Offline DanielS

  • Frequent Contributor
  • **
  • Posts: 798
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #186 on: April 07, 2015, 12:47:32 am »
But Xeon CPUs have big caches and a few pages ago we were discussing how that's unlikely to happen with a video codec. Video encoding is embarrassingly parallel.
Video encoding might be embarrassingly parallel but it is not the only variable in the equation: you have the file reading and parsing, the audio and video decoders, then you have whatever pre-processing the editing software applies on top, the audio and frame rate conversions when necessary, whatever other post-processing the video editing software might do before sending video to the encoding CODEC, packing the output file, etc. Each and every intermediate processing step may add a number of potential bottlenecks on top of the resource sharing issues for the second socket if they are not perfectly tuned to work with each other, which they rarely are when combining software from multiple independent sources - this can already be hard enough to achieve even within one's own code.
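One standard way to put a number on this is Amdahl's law, sketched below. To be clear, this is a textbook model brought in for illustration, not something measured on Dave's machine, and the 50% parallel fraction is a made-up example value.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: overall speedup when only part of the job scales with workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical pipeline where only the encoder (half the total work) scales
for workers in (8, 24):
    print(f"{workers:2d} threads: {amdahl_speedup(0.5, workers):.2f}x")
```

With half the pipeline stuck in poorly scaling stages, going from 8 to 24 threads moves the speedup from about 1.78x to about 1.92x, which is the kind of disappointing result being described here.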
 

Offline vlad777

  • Frequent Contributor
  • **
  • Posts: 350
  • Country: 00
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #187 on: April 27, 2015, 04:39:38 pm »
AMD landed a deal with Avid.
FirePro cards will support Avid Media Composer 8.4

1:40
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16664
  • Country: 00
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #188 on: April 27, 2015, 07:11:06 pm »
But Xeon CPUs have big caches and a few pages ago we were discussing how that's unlikely to happen with a video codec. Video encoding is embarrassingly parallel.
Video encoding might be embarrassingly parallel but it is not the only variable in the equation: you have the file reading and parsing, the audio and video decoders, then you have whatever pre-processing the editing software applies on top, the audio and frame rate conversions when necessary, whatever other post-processing the video editing software might do before sending video to the encoding CODEC, packing the output file, etc. Each and every intermediate processing step may add a number of potential bottlenecks on top of the resource sharing issues for the second socket if they are not perfectly tuned to work with each other, which they rarely are when combining software from multiple independent sources - this can already be hard enough to achieve even within one's own code.

So basically we agree. The problem ISN'T the RAM, it's the codec. Putting that RAM in there will make bugger-all difference, a few percent at most.
 

Offline vlad777

  • Frequent Contributor
  • **
  • Posts: 350
  • Country: 00
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #189 on: May 11, 2015, 07:19:06 pm »
Check out this 18 core Xeon!

« Last Edit: May 11, 2015, 07:25:17 pm by vlad777 »
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #190 on: May 11, 2015, 10:02:27 pm »
That's it!

Time to ask my work to get me a better machine than this crappy dual Xeon E5640 at 2.67 GHz, 8 logical cores each.

I can't work with this sub-par system anymore after watching that.



Then again maybe it won't allow me to take coffee breaks if it compiles my stuff in less than 5 minutes.

Edit, and yeah I can make those processors sweat a bit:

« Last Edit: May 11, 2015, 10:06:20 pm by miguelvp »
 

Offline necessaryevil

  • Regular Contributor
  • *
  • Posts: 133
  • Country: nl
Re: EEVblog #726 - Dual Xeon Video Editing Machine Build
« Reply #191 on: May 18, 2015, 05:57:55 pm »
Tried out some other memory yet, Dave?
 

