Author Topic: how to test a 3T harddrive?  (Read 6289 times)


Offline bd139

  • Super Contributor
  • ***
  • Posts: 23102
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #25 on: August 24, 2021, 01:59:55 pm »
LOL enjoy kubernetes on horrible disks like that.

We have about 900TiB of enterprise SSD and EBS online and that’s still giving us IO issues. There ain’t enough time in the world for the disk to go round to the head these days.
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4452
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #26 on: August 24, 2021, 03:30:40 pm »
horrible disks like that

Why horrible? They are WDC WD30EFRX-68EUZN0, 3TB red-line sATA hard-drives.

The only problem was the slow controllers I had on hand before finding the MEGAraid card, and that the cluster nodes will not be delivered until next week; otherwise I would have installed the disks directly into the nodes and used those to test them all.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23102
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #27 on: August 24, 2021, 04:02:16 pm »
Kubernetes clusters tend to have massively fragmented IO. 5400rpm + rotating disk = IO waits = cluster of pain. Obviously depends on workload profile but that’s going to suck. It sucks even on 15K DAS SAS on HP DL580 G10s.
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 17489
  • Country: us
  • DavidH
Re: how to test a 3T harddrive?
« Reply #28 on: August 24, 2021, 08:29:49 pm »
PCI-machines cannot give you more than 20Mbyte/sec with SATA HBA on a PCI32bit@33Mhz, it's easy to say "shitty-ass", but it's hard to debug the kernel 5 to squeeze out more performance.

I have regularly gotten 100+ MByte/sec on my ancient (Pentium III and Pentium4) PCI machines and SATA 1.  Often the limitation is how the hard drive is being accessed.
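For reference, a rough sketch of the arithmetic (not a measurement): classic 32-bit PCI at a nominal 33.33 MHz moves at most one 4-byte word per clock, which puts the theoretical ceiling at about 133 MByte/s, shared by every device on the bus. A SATA 1 link tops out at 150 MByte/s, so the bus is the narrower pipe. The function name and its defaults are illustrative assumptions.

```python
# Theoretical peak of a classic PCI bus: one bus-width transfer per clock.
# Real throughput is lower (arbitration, burst setup, other devices on the
# shared bus), so this is an upper bound, not a prediction.

def pci_peak_mbytes_per_s(bus_width_bits: int = 32, clock_hz: float = 33.33e6) -> float:
    """Return the theoretical peak in MByte/s (1 MByte = 10**6 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_hz / 1e6


peak = pci_peak_mbytes_per_s()
print(f"PCI 32bit@33MHz theoretical peak: {peak:.0f} MByte/s")
print(f"20 MByte/s is {20 / peak:.0%} of that peak")
```

Real transfers land below that ceiling, so 100+ MByte/s is already close to what the bus can deliver, while 20 MByte/s points at something other than the bus width and clock.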

Quote
For weird reasons here I only have available only a SBC and an old RISC workstation. Both are slow with sATA, both are based on PCI32bit@33Mhz. All the *big* and *modern* PCIe machines here are busy and I cannot use them, neither I can use PCs in our office, today I bring my personal PC from home, it's a modern PCIe 1x minicomputer, but I can somehow attach an adapter and use a MEGAraid card.

The MEGAraid is PCIe16, it comes with a smart controller, and two SAS lanes, which means 8 sATA links.

Maybe I can test 8 harddrives in parallel  :D

My reason for suggesting a better HBA is that some cards, like the old Areca ones I mentioned, have the read and write diagnostic functions built in, so all of the read and write traffic goes over the SATA buses between the hard drives and the card and not over the PCI bus.
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4452
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #29 on: August 24, 2021, 09:38:00 pm »
I have regularly gotten 100+ MByte/sec on my ancient (Pentium III and Pentium4) PCI machines and SATA 1.  Often the limitation is how the hard drive is being accessed.

I tried with different cards (sATA, SCSI, optical Ethernet) and modern hard-drives (sATA, UW160 SCSI 15Krpm), but not on x86 machines.

Linux on machines like the Apple PowerMac G4 is slow with PCI. Never tried with MacOSX, but it's pointless: that PowerMac is there for an industrial scanner, that's its purpose, and I won't touch it anymore, not least because it's slow.
 

Offline David Hess

  • Super Contributor
  • ***
  • Posts: 17489
  • Country: us
  • DavidH
Re: how to test a 3T harddrive?
« Reply #30 on: August 24, 2021, 09:50:19 pm »
I have regularly gotten 100+ MByte/sec on my ancient (Pentium III and Pentium4) PCI machines and SATA 1.  Often the limitation is how the hard drive is being accessed.

I tried with different cards (sATA, SCSI, optical Ethernet) and modern hard-drives (sATA, UW160 SCSI 15Krpm), but not on x86 machines.

Linux on machines like the Apple PowerMac G4 is slow with PCI. Never tried with MacOSX, but it's pointless: that PowerMac is there for an industrial scanner, that's its purpose, and I won't touch it anymore, not least because it's slow.

During the Pentium4 era, I had some nVidia nForce2 boards for less critical applications and they could only pump about 90 MBytes/sec over their PCI bus.  If there was a configuration setting to fix this, then I never found it.  Comparable Intel boards managed at least 120 MBytes/sec.

As you have observed, PCI bus performance can be limited to a considerably lower level.

 

Offline BradC

  • Super Contributor
  • ***
  • Posts: 2156
  • Country: au
Re: how to test a 3T harddrive?
« Reply #31 on: August 25, 2021, 12:44:58 am »
As previously mentioned, a full disk read can be done with a SMART long test. This will abort on the first read error.

A full disk write can be done with a secure erase : https://grok.lsu.edu/article.aspx?articleid=16716

With WD drives in an array, we disable the spin-down and head-park. When we "burn-in" new drives we do it by building them into the array they are destined for (in a staging machine) and beating the snot out of them for a week with random IO workloads. This is time consuming and resource intensive and has *never* weeded out an early life failure. We've still had the odd disk die anywhere from a couple of months in to the end of the warranty period.

Next time I build an array I'm going to do a SMART long on the drives, build the array, let them idle for a week and put them into service.
 
The following users thanked this post: DiTBho

Online Jeroen3

  • Super Contributor
  • ***
  • Posts: 4210
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: how to test a 3T harddrive?
« Reply #32 on: August 25, 2021, 07:25:34 am »
(I have a basket with 24 of those bloody huge harddrives, they are for the new rack of storage, but since machines here are all busy I can only use two embedded SBCs for testing harddrives  :palm:

I think my boss asked me to do this job as kind of penalty retribution for having offended her preferred elixir functional programming language, which honestly - I do think - is pure garbage.
Just toss them in the rack and build up the array. If one fails during the build, just replace it and rebuild. It's why you have more than one.
Testing each drive before putting them in an array does not add any value.
 
The following users thanked this post: DiTBho

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4452
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #33 on: August 25, 2021, 09:23:23 am »
Just toss them in the rack

I don't have a rack

and build up the array

I don't have to build an array

Testing each drive before putting them in an array does not add any value.

My colleagues have to build a cluster, two disks per node in software RAID1.
But the hardware to build the cluster hasn't yet been delivered.
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4452
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #34 on: August 25, 2021, 09:24:45 am »
A full disk write can be done with a secure erase : https://grok.lsu.edu/article.aspx?articleid=16716

That's very interesting! Never heard about this. Thank you!
Does it also work for SCSI? I have a dozen disks at home, but those are for hobby purposes.
 

Online magic

  • Super Contributor
  • ***
  • Posts: 7526
  • Country: pl
Re: how to test a 3T harddrive?
« Reply #35 on: August 25, 2021, 09:28:23 am »
It doesn't work for almost anything.

Well, SSDs often support it because they use AES as a scrambling algorithm, so for them SE is as easy as throwing away the key and generating a new one (and that's how they do it).
 
The following users thanked this post: DiTBho

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4452
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #36 on: August 25, 2021, 11:07:57 am »
This morning I quickly found two defective harddrives  :-BROKE

The controller said *there is no physical device connected to the sATA channel xx* ... but obviously there were two pieces of physical metal with a connector on each, but it was just ... dead rusty stuff.

I double-checked the MOLEX connectors and tried again, same result: the motor made a weird sound, like a wounded animal, and then it fell silent, turning off all the LEDs and signals.

They are dead, I am sure they are dead disks, the MEGAraid controller never goes wrong. I am going to open a return ticket for a full replacement, I hope Amazon will be fast with that.

Let's go testing the others, with the current speed I can test up to 8-9 disks per day (24h) ;D
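A back-of-the-envelope sketch of how long one sequential pass over a 3 TB drive takes at a given sustained rate. The rates below are assumptions picked for illustration (20 and 100 MByte/s are the figures argued about earlier in the thread, 150 MByte/s is a guess for a modern controller), not measurements from this setup, and `full_pass_hours` is a made-up helper name.

```python
# Time for one full sequential pass (read OR write) over a drive.
# Capacity uses the marketing definition: 3 TB = 3 * 10**12 bytes.

def full_pass_hours(capacity_bytes: float, mbytes_per_s: float) -> float:
    """Hours needed to stream capacity_bytes at a sustained rate in MByte/s."""
    seconds = capacity_bytes / (mbytes_per_s * 1e6)
    return seconds / 3600


CAPACITY_BYTES = 3e12

for rate in (20, 100, 150):  # assumed sustained rates in MByte/s
    hours = full_pass_hours(CAPACITY_BYTES, rate)
    print(f"{rate:>4} MByte/s -> {hours:5.1f} h per pass")
```

How many drives that makes per day depends on how many passes the test does and how many channels run in parallel.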
 
The following users thanked this post: Nominal Animal

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7302
  • Country: fi
    • My home page and email address
Re: how to test a 3T harddrive?
« Reply #37 on: August 25, 2021, 10:41:12 pm »
Yup, that's what the tests are good for: detecting if the drive is dead on arrival, not much else.  Even the early deaths tend to be just as unpredictable as late deaths, so having a drive pass any kind of test is not proof of long life expectancy, only that right now, the drive is okay.

But, when you're building machines, especially clusters, weeding those out makes the process work better.  I mean, whenever a node does not come up normally, the reason could be just about anything, from badly seated memory, to a faulty motherboard, CPU, drive, or whatever.  Nasty to debug.  If you know the drives work, then at least you can exclude one possible cause.  Me, I like to take the node apart, clean everything (and if I have the time and the chassis is empty, deburr the damn edges of the stamped metal parts, because my fingertips are worth more to me than the two minutes it takes to do; plus if you do vacuum the chassis, you'll surprisingly often suck up small detached metal burrs and forgotten screws that can cause an occasional short), and put it back together again.  I didn't gather any statistics on how often that made a difference, but when you set up a few dozen compute nodes, you can expect to have bad parts in the batch for sure, no matter the vendor.
It's just what it is.
 
The following users thanked this post: DiTBho

Offline BradC

  • Super Contributor
  • ***
  • Posts: 2156
  • Country: au
Re: how to test a 3T harddrive?
« Reply #38 on: August 25, 2021, 11:55:37 pm »
Yup, that's what the tests are good for: detecting if the drive is dead on arrival, not much else.

Most drives have a SMART conveyance test which is pretty much just designed to check the mechanics weren't damaged in transit. A couple of minutes vs many hours for a long test.
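For anyone scripting this, a minimal sketch that builds the smartmontools command lines for the self-tests mentioned here. `smartctl -t` with `short`, `long` or `conveyance` starts a test and `smartctl -l selftest` shows the self-test log; the helper names and the `/dev/sdX` placeholder are made up for illustration, and not every drive supports the conveyance test.

```python
import subprocess

# Self-test types accepted here; all three are valid arguments to `smartctl -t`.
VALID_TESTS = ("short", "long", "conveyance")


def smart_test_command(device: str, test: str = "long") -> list[str]:
    """Build the smartctl command that starts a self-test on `device`."""
    if test not in VALID_TESTS:
        raise ValueError(f"unsupported self-test type: {test!r}")
    return ["smartctl", "-t", test, device]


def smart_log_command(device: str) -> list[str]:
    """Build the smartctl command that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def start_test(device: str, test: str = "long") -> int:
    """Start the test (needs root and smartmontools); returns the exit status."""
    return subprocess.run(smart_test_command(device, test)).returncode


# Only print the commands; /dev/sdX is a placeholder, not a real device.
print(" ".join(smart_test_command("/dev/sdX", "conveyance")))
print(" ".join(smart_log_command("/dev/sdX")))
```

The test runs inside the drive, so the command returns right away; poll the self-test log later to see the result.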
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 7302
  • Country: fi
    • My home page and email address
Re: how to test a 3T harddrive?
« Reply #39 on: August 26, 2021, 12:42:08 am »
Sure, but when a drive comes out of its packaging, and its SMART log is completely empty, to me it means that the drive hasn't ever been completely self-tested, aside from whatever production tests the manufacturer does while putting the drive together.

Still, I do admit that doing a full (long) SMART self-test for a drive can be considered overkill.  In most cases, it will go just fine, report no problems, and nothing suspicious happens with the SMART attributes.  To me, on my personal workstation I use all the time, that is worth it for the peace of mind.  Like I said earlier, when I build my own machines, I even deburr the stamped metal edges.  (It's not being anal or pedantic or perfectionist; it's the small cuts in my fingerpads that I hate, and consider the time and effort worth the results.)  I also add sound-absorbing foam and baffles and such, but I'm not sure if that is because I need them, or because I like building that sort of machine.  (No, I don't add LEDs; I like my machines silent and unobtrusive.)

For cluster nodes, I usually do SMART tests in parallel on the nodes, as one step in the initial testing process; memtest is another.  Those that do not come up immediately, or show issues during initial testing, I do the second-degree teardown-rebuild way, using a known-good machine to test the drives.  It takes time in the latency sense, but I believe it saves me time, since I can isolate the problem source faster.  On a new drive, I'm not sure if badblocks is worth the time.  On a used one, I'd say it is; the same goes for drives from manufacturers like Seagate, where some series have had lots of DOA units.

A small fraction of hardware is always b0rked on arrival, that's a given.  I personally am willing to spend the time to weed the immediately observed ones out as early as possible, even if it takes an extra 24 hours or so (assuming I can do other stuff for that duration, obviously; I'm not going to babysit such tests).  Then again, I definitely weigh time spent differently from, say, a commercial entity which uses the drives for bulk data storage.  Your mileage and weights/emphasis may well differ, no argument there on my part!

That is, I do not know what level of testing is appropriate for anyone else, because it really depends.  I'm only hoping my wide description of what I do and why gives a sample point others can use when deciding what they think is appropriate for their use cases.
« Last Edit: August 26, 2021, 12:43:46 am by Nominal Animal »
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4452
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #40 on: August 26, 2021, 09:17:53 am »
Well, for another project, I also suspect a serious bug with kernel 5.12. I cannot say anything at the moment: it can be a software problem, or a hardware problem, including defective harddrives.
« Last Edit: August 27, 2021, 08:58:38 am by DiTBho »
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4452
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #41 on: August 26, 2021, 09:46:45 am »
Two years ago I ordered a set of qty=40 ram-sticks and was shocked as more than 50% of them were defective.

Code:
_________________
|
| G B G B G G B G |
|  B G B G G B B  |
| B G B G B G B G |
|  B G G B B G B  |
|  G G B G B G B  |
| B B G G B G B G |
|  G G G B B G B  |
|_________________|
G=good unit, fully working
B=bad unit, damaged
p(B) >> p(G) ---> QA=WTF?!?

How can that be? If you get them in a black box shipped by Amazon from a respectable seller (5 stars, with good feedback and reviews) and you don't know the distribution of good and bad units, what are you supposed to think when you pull out a random RAM stick, try it on your experimental SBC, and things start going haywire with all sorts of crazy malfunctions?

Is it the RAM? Or is it a bug in your firmware? Or is it a bug in your kernel setup?
Is it an external fault or an internal one? Both are possible, but which one is it?

Both can even happen  :o :o :o

So, you have to be sure that the RAM is not faulty, but when you see that more than 50% of the units are "measured" as faulty, it's so weird that you start questioning everything: Amazon's QA, as well as your own testing methods.

Code:
_________________
|
| G G G G G G G G |
|  G G G G G G G  |
| G G G G G G G G |
|  G G G G G G G  |
|  G G G G G G G  |
| G G G G G G G G |
|  G G G G G G G  |
|_________________|

I remember I sent the ram-box back, got a refund and bought it again from another seller, with a completely different result: zero defective parts found :o :o :o
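To put a number on how weird that first batch was, here is a quick binomial sketch. It assumes each stick fails independently with some small baseline defect rate; the 1% used below is a purely hypothetical figure, not data from any seller, and `prob_at_least` is an illustrative helper name.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k defective units out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_sticks = 40          # size of the order
k_bad = 21             # "more than 50%" of 40
baseline_rate = 0.01   # hypothetical per-unit defect rate (assumption)

p = prob_at_least(k_bad, n_sticks, baseline_rate)
print(f"P(at least {k_bad} of {n_sticks} bad at {baseline_rate:.0%}) = {p:.3e}")
```

Under those assumptions the probability is vanishingly small, so random bad luck does not explain a batch like that; a systematic cause (the seller's stock, shipping damage, or the test rig) is far more plausible.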
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23102
  • Country: gb
Re: how to test a 3T harddrive?
« Reply #42 on: August 26, 2021, 10:10:23 am »
I think you may have just found a defective seller there.

Best to buy from a respectable retailer, i.e. Crucial.
 
The following users thanked this post: DiTBho

Offline cdev

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Re: how to test a 3T harddrive?
« Reply #43 on: August 26, 2021, 01:47:34 pm »
Do you have a URL where I can read about these racks?

Are they aimed at cryptocurrency "miners" ?
"What the large print giveth, the small print taketh away."
 

