PS: while Intel uses hyperthreading, AMD's new CPUs also use something similar; consider it a better version of hyperthreading.
The AMD 8-core CPUs are not true 8-core chips, since many of the components are shared between cores, whereas a traditional multi-core CPU does not share those parts.
That's why AMD labels the 8-core CPU as having 4 core modules.
AMD doesn't use hyperthreading (or a version of it).
Each AMD "module" looks like this:
Basically, a module is made of two separate cores, each with its own integer section, that share a floating point section. That's why people say the eight cores on AMD processors are not 8 real cores (Intel's cores each have one integer and one floating point section), and they're right: they're 8 integer cores and 4 to 8 floating point cores, depending on what type of floating point arithmetic the threads want to use.
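To make the counting concrete, here is a minimal sketch in Python. It is only a toy model, not anything published by AMD: the names `Module`, `fp_units` and `count_units` are made up for illustration, and the `fp_shared` flag simply stands in for "the kind of floating point work that needs the whole shared section".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Module:
    """Toy model of one AMD module (hypothetical names, illustration only)."""
    integer_cores: int = 2  # each core has its own integer section

    def fp_units(self, fp_shared: bool) -> int:
        # Assumption: if the floating point work needs the whole shared
        # section, the module behaves like 1 FP core; otherwise like 2.
        return 1 if fp_shared else self.integer_cores


def count_units(n_modules: int, fp_shared: bool) -> Tuple[int, int]:
    """Return (integer cores, effective floating point cores) for a chip."""
    module = Module()
    return (n_modules * module.integer_cores,
            n_modules * module.fp_units(fp_shared))


print(count_units(4, fp_shared=True))   # (8, 4)
print(count_units(4, fp_shared=False))  # (8, 8)
```

So a 4-module chip always counts as 8 integer cores in this model, while the floating point count moves between 4 and 8.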
x264, and video encoding in general, is 99% integer based; floating point only matters in games, 3D rendering and similar workloads.
So, in the video encoding area, an AMD processor will act as eight individual cores, with each pair of cores having maybe a 1-2% penalty due to the scheduler having to split and assign integer instructions to each core. During video encoding, the shared floating point section is NOT used, so it doesn't matter.
AMD processors simply have lower throughput and use more cycles for some instructions, which is why a processor like the 3770K, which has only 4 cores (but more CPU extensions that are very useful for video encoding), can keep up with the AMD processors.
If 2 threads land on a single core module (with its supposedly 2 cores), you will see significantly lower speeds than if you take the same 2 threads and place them on 2 different core modules.
http://www.extremetech.com/computing/138394-amds-fx-8350-analyzed-does-piledriver-deliver-where-bulldozer-fell-short/2
All those tests in the link you posted are about software using the floating point section of modules.
When Piledriver/Bulldozer launched, the operating system did not recognize this particularity of AMD processors, namely that the floating point unit is shared between two cores, so four threads would be put on 2 modules, with each pair of threads sharing one floating point unit.
One Windows patch later, Windows these days will by default spread threads across separate modules in case those threads do floating point work.
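The difference between the two placements can be sketched roughly like this. It is a simplified illustration in Python, not how the Windows scheduler is actually implemented, and it assumes logical CPUs 2k and 2k+1 belong to module k (the numbering may differ on a real system). All function names are made up for the example.

```python
from typing import List


def pack_in_order(n_threads: int) -> List[int]:
    """Naive placement: fill logical CPUs 0, 1, 2, ... in order."""
    return list(range(n_threads))


def spread_across_modules(n_threads: int, n_modules: int = 4) -> List[int]:
    """Module-aware placement: use one core per module before doubling up.

    Assumption (hypothetical numbering): CPUs 2*k and 2*k+1 share module k.
    """
    cores_per_module = 2
    if n_threads > n_modules * cores_per_module:
        raise ValueError("more threads than cores")
    order = [
        module * cores_per_module + slot
        for slot in range(cores_per_module)  # first core of every module, then the second
        for module in range(n_modules)
    ]
    return order[:n_threads]


def modules_used(cpus: List[int]) -> List[int]:
    """Which modules a list of logical CPUs touches, under the same assumption."""
    return sorted({cpu // 2 for cpu in cpus})


print(modules_used(pack_in_order(4)))          # [0, 1]
print(modules_used(spread_across_modules(4)))  # [0, 1, 2, 3]
```

With four threads, the naive order lands on 2 modules (each pair of threads sharing a floating point unit), while the module-aware order gives every thread a module to itself.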
Vishera just tweaked the architecture.
If you want to be honest about it, post the same tests done with x264 while playing with modules/cores, and you'll see there's no difference. Or better yet, tell me why that website didn't test with x264 and only tested with software that uses floating point.
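For anyone who wants to run that comparison on Linux, here is a rough sketch of how the two placements could be set up from Python. The CPU numbers are an assumption (CPUs 0 and 1 in the same module, CPU 2 in a different one), so check the real topology of your machine first; the helper names are made up for the example. `os.sched_setaffinity` only exists on some platforms, hence the guard.

```python
import os
from typing import Dict, Set


def placements_for_two_threads() -> Dict[str, Set[int]]:
    """Two CPU sets to compare for a 2-thread encode.

    Assumption: CPUs 0 and 1 share one module and CPU 2 sits in another.
    Verify this against the actual topology before trusting any numbers.
    """
    return {
        "same_module": {0, 1},
        "separate_modules": {0, 2},
    }


def pin_current_process(cpus: Set[int]) -> bool:
    """Restrict this process to the given CPUs; return False if unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not available on this platform (e.g. Windows)
    os.sched_setaffinity(0, cpus)  # 0 means "the calling process"
    return True
```

The idea would be to call `pin_current_process` with one of the two sets, then launch the encoder limited to 2 threads from the same script (child processes inherit the affinity on Linux), time it, and repeat with the other set.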