I sure hope not ..
Maybe I've just had a bad experience with them. They were identical drives bought at the same time, so that likely explains it, and I wasn't sold them as Toshibas either. They were Toshiba drives in boxes from some German company I've never heard of, sold to me as Hitachis, and I couldn't get my money back.
this is Toshiba DT01ACA300 aka rebadged Hitachi
Explains the circumstances of me receiving them. Mine were the DT01ACA100, if that makes much of a difference.
EDIT: Nope, it was the 300; I was reading the wrong part of the label lol. Says it was from April '13.
There is lots of misinformation on the net regarding reallocated sector counts. It really is normal for large TB disks to have some reallocated sectors, in my opinion it's inevitable. I think I might have dropped the ball when it started growing over the years. There was a huge growth at the beginning when it was first installed, and then it was stable for a few years, but recently would add one or two more sectors every month or two.
From what I have read, growing reallocated sectors from the very beginning is quite specific to Seagate, which is one more reason not to buy them.
bought 2 new Toshiba 2TB drives to replace
A bad move, IMO. When I set up a mirror I always use different manufacturers. Doesn't matter if one of them has a bad rep - you're planning for a fail and if it fails way before the other drive that's good! Just so long as the fail cycles don't get in sync...
Please use server/RAID disks for servers and not desktop disks! Desktop disks are not designed to run 24x7 with a typical server usage profile.
... Just so long as the fail cycles don't get in sync...
... as two of my 3 Seagates did ! (all bought at the same time)
I have already bought 2 Hitachi drives (DT01ACA200). I think I will add 2 more WD drives and split them across a RAID10, with each Hitachi drive mirroring a WD drive, then stripe across the 2 mirror sets. If both Hitachis fail at the same time, I still have the 2 WDs in the stripe.
          POOL (stripe)
=============================
   mirror 1        mirror 2
[ WD ][ Hit ]   [ WD ][ Hit ]
Please use server/RAID disks for servers and not desktop disks! Desktop disks are not designed to run 24x7 with a typical server usage profile.
It's my home server. It's lightly used, and runs 24x7 just like my desktop. By no means is it seeing a typical corporate server usage profile, and desktop drives are fine. I just need to properly plan for failure.
That's clear now
There is lots of misinformation on the net regarding reallocated sector counts. It really is normal for large TB disks to have some reallocated sectors, in my opinion it's inevitable.
A new disk reports 0 reallocated sectors via SMART; the factory defect sectors are not shown. A few grown reallocated sectors are not a problem, it can happen, but they should not be constantly growing. I would not trust a disk which has more than 10-20 reallocated sectors (after a few years).
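As a sketch of watching this over time (a naive parser, and the attribute layout is assumed from typical smartmontools output), one can pull attribute 5's raw value out of `smartctl -A` text and compare it against a stored baseline:

```python
import re

def reallocated_count(smart_output):
    """Extract the raw value of SMART attribute 5 (Reallocated_Sector_Ct)
    from the text printed by `smartctl -A`; returns None if not reported."""
    pattern = re.compile(
        r"^\s*5\s+Reallocated_Sector_Ct\s+\S+\s+\d+\s+\d+\s+\d+"
        r"\s+\S+\s+\S+\s+\S+\s+(\d+)")
    for line in smart_output.splitlines():
        m = pattern.match(line)
        if m:
            return int(m.group(1))
    return None
```

In practice you would feed it the output of `smartctl -A /dev/sdX` (e.g. via `subprocess.run`) on a schedule and raise an alert whenever the count exceeds the previous run's value.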
This is a 4 years old 1TB hard drive (WD10EARS) from my XBMC PC:
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 189 183 021 Pre-fail Always - 6533
4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 2582
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 1
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 4433
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 098 098 000 Old_age Always - 2581
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 61
193 Load_Cycle_Count 0x0032 075 075 000 Old_age Always - 375033
194 Temperature_Celsius 0x0022 114 098 000 Old_age Always - 33
196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 14
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 7
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 21
Only one reallocated sector but 7 Offline Uncorrectable ones :-(
btw: I try to keep the disks in servers/RAID systems under 30°C, which is usually not a problem with well-designed rack-mount servers and cooled server rooms. Not easy at home, of course.
And as madires said, use server disks for anything which runs 24/7 or where the data is important. Enterprise 7200rpm SATA disks are affordable.
That's clear now
There is lots of misinformation on the net regarding reallocated sector counts. It really is normal for large TB disks to have some reallocated sectors, in my opinion it's inevitable.
A new disk reports 0 reallocated sectors via SMART; the factory defect sectors are not shown. A few grown reallocated sectors are not a problem, it can happen, but they should not be constantly growing. I would not trust a disk which has more than 10-20 reallocated sectors (after a few years).
Really, so low? If that's the case, then I'll need to change my thinking. I'm a techie, but I've never been an enterprise server admin, so I don't have any first-hand experience with disk drive statistics or knowing when a drive is going bad. You say 20 sectors, but a 2TB disk reports about 3,907,029,168 sectors (512 B/sector), so that's only about 5 ppb. That's a really, really small fraction to be throwing the disk out over, isn't it?
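For what it's worth, 20 sectors out of a 2 TB drive's 3,907,029,168 sectors works out to roughly 5 parts per billion; a throwaway check:

```python
SECTOR_BYTES = 512

def reallocated_ppb(bad_sectors, total_sectors):
    """Reallocated sectors as parts per billion of the drive's capacity."""
    return bad_sectors / total_sectors * 1e9

# 2 TB drive: 2,000,398,934,016 bytes / 512 = 3,907,029,168 sectors
ppb = reallocated_ppb(20, 3_907_029_168)   # ~5.1 ppb
```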
And as madires said, use server disks for anything which runs 24/7 or where the data is important. Enterprise 7200rpm SATA disks are affordable.
I'm going to rebuild this server with a mix of the 2 x 2TB Hitachis that I already bought and 2 more 2TB WD Reds; these are $119 each, and I agree they are affordable.
For future backup, I've decided to get a used LTO3 drive from eBay. LTO3 tapes are cheaper and more space efficient than buying more disk, and if I grow the server I just need to add more tape. Plus tape allows me to get a real monthly/weekly/daily strategy going without a lot of cost.
I'll try to rebuild the lost RAID offline after I get this server back up with all new drives.
Really, so low? If that's the case, then I'll need to change my thinking. I'm a techie, but I've never been an enterprise server admin, so I don't have any first-hand experience with disk drive statistics or knowing when a drive is going bad. You say 20 sectors, but a 2TB disk reports about 3,907,029,168 sectors (512 B/sector), so that's only about 5 ppb. That's a really, really small fraction to be throwing the disk out over, isn't it?
How much is an HDD worth if it corrupts data, may I ask? Reallocated sector = a sector was remapped because data couldn't be read from it = your data was corrupted. Now let's imagine your HDD is full of HD video, each file 40-50 GB. A 50 GB file is roughly a hundred million sectors, so even at a few parts per billion of bad sectors you expect on the order of one bad sector per file, and suddenly few of your files are guaranteed intact.
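Putting numbers on this with a crude model (my assumption: bad sectors land independently and uniformly over the drive, at ~5 ppb, i.e. 20 bad sectors on a 2 TB drive):

```python
SECTOR_BYTES = 512

def expected_bad_sectors(file_bytes, bad_fraction):
    """Expected number of unreadable sectors in a file of the given size,
    assuming bad sectors are spread uniformly over the drive."""
    return (file_bytes // SECTOR_BYTES) * bad_fraction

def p_file_intact(file_bytes, bad_fraction):
    """Probability that such a file contains no bad sector at all."""
    n = file_bytes // SECTOR_BYTES
    return (1 - bad_fraction) ** n

bad = expected_bad_sectors(50 * 10**9, 5e-9)   # ~0.49 expected bad sectors
p_ok = p_file_intact(50 * 10**9, 5e-9)         # ~0.61, so ~39% of files hit
```

So even a handful of reallocations on a big drive full of huge files means a sizeable share of those files has likely been through an unreadable sector.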
3TB toshiba 4800 hours.
That Toshiba is halfway through its life span, if my experiences are at all common.
this is Toshiba DT01ACA300, aka rebadged Hitachi HDS723030BLE640, which proved to be quite reliable.
I did some googling, and it would seem that it's also the Hitachi Deskstar 7K3000. Reviewers say it's a 5-platter drive, which would account for its reliability. Most 3TB drives are 2 or 3 platters these days, which means the bits are denser and harder to read, so they can fail more easily. That's going to be my opinion going forward from now on.
I was originally going to do a RAID10 with a mix of 2TB Hitachi and 2TB WD Reds (4TB total), but I think I'll change my mind now and simply do a 2-disk mirror of one 2TB Toshiba/Hitachi DT01ACA200 and one 2TB WD Red (for 2 TB total). It's slightly cheaper per TB ($86/TB vs $100/TB), uses less power than 4 drives, and is good enough for me right now, as I no longer have enough data to need more than 2 TB of storage.
And if I'm thinking of LTO3 tape backup, do I even need to mirror it anymore? Why not just stripe it for 4 TB total and rely on (tested!) backups then. It's my home server, I really don't NEED the reliability of RAID, I realize that now. What I need, going forward, is good backups.
How much is an HDD worth if it corrupts data, may I ask? Reallocated sector = a sector was remapped because data couldn't be read from it = your data was corrupted. Now let's imagine your HDD is full of HD video, each file 40-50 GB. A 50 GB file is roughly a hundred million sectors, so even at a few parts per billion of bad sectors you expect on the order of one bad sector per file, and suddenly few of your files are guaranteed intact.
It's just my home data. No business data, and nothing that I can't do without right now, which is why I'm not rushing the disk to an $800 data recovery service.
What I will do is rebuild the server, get a good backup system in place, and go forward. As for the lost data on my existing RAID, I'll keep those disks separate and offline, and try to recover the data just as a learning exercise.
Home or business data, what is the value of the medium if it does not serve its main purpose: storing the data intact? Imagine some firmware with 0.5 ppm of its data corrupted, like a single byte corrupted in 2 MB of flash ROM. Your device becomes useless junk.
http://www.trademe.co.nz/Browse/Listing.aspx?id=925187196
Good for a paperweight.
I've completely given up on RAID systems.
* They add complexity in setup and management,
* Add an extra critical failure point - the RAID system hardware itself. If that fails the drives are typically unreadable via other means, and trying to replace the failed board or whatever can be hard &/or expensive.
* You typically can't just pull a drive out and read or copy it using standard hardware.
The simple rule I follow now is, can I read the drive in a cheap external USB HDD dock? Can I do complete file system duplications using treecopy utilities that copy all files without corrupting attributes like file dates, and will run to completion without throwing millions of errors and stopping without being restartable?
I won't use any system that doesn't allow that.
I keep all 'work' drives in removable trays, and never ever mix system/utility installs with work spaces.
With that, I can manage work backups as appropriate, using any old excess/found/salvaged drives or USB sticks (free), and all of the backup devices are readable as standard file systems on any PC. Such backups can take a long time, but I can run them on any spare PC without tying up my main work system.
Also, when possible I format backup drives as FAT32, since as a last resort I can manually examine and patch FAT32, but NTFS is completely beyond such measures (for me, anyway). It sucks that filesets are now almost always too big for FAT32 drives.
Incidentally, many years ago, maybe around 2004, I tried a HD PCB & platter swap. The drive contents were fairly important. I'd decided a backup was way overdue, and was spending a few hours tidying the folder tree structure before backing up. Ha ha, big mistake inviting Mr Murphy to lunch. During that process the drive suddenly became a non-drive, ie not recognized by the BIOS at all, suggesting PCB failure. I had an identical working blank drive. Not sure if removing the PCB might break the hermetic seal of the interior, I improvised a 'clean area' using a large clear plastic bag, new from a stack and never 'fluffed up', so hopefully the interior was dust free. I cleaned the two drive exteriors with compressed air, slipped them and a screwdriver into the bag, taped surgical gloves onto the bag opening, and swapped the good PCB onto the bad drive. Result: now it was recognized, but couldn't read any data. I then tried a platter swap, so the single precious platter was in the body with the known good PCB, heads and head-amp. That wasn't readable either.
On inquiring, I was told that (even then) the boards had head tracking calibration data in flash, and board or platter swaps never worked anymore. Never did recover the lost files.
I've never seen anyone successfully recover a HDD by swapping either the PCB or the platters with a good drive.
We do just this at work when other methods fail. We use a "donor drive" to swap the PCB out of. But it doesn't always work (even minor differences in drive firmware are sometimes enough for it to fail). If we get really desperate, the drive gets sent off to another Government lab with lots of cool and expensive toys. :-)
I've lost data myself without a proper backup routine in place (not due to drive failure: I locked myself out of an encrypted drive because the password was stored in a KeePass file inside the encrypted volume, d'oh! Catch-22). Ever since, I back everything up to disk and LTO tape.
Really, so low? If that's the case, then I'll need to change my thinking. I'm a techie, but I've never been an enterprise server admin, so I don't have any first-hand experience with disk drive statistics or knowing when a drive is going bad. You say 20 sectors, but a 2TB disk reports about 3,907,029,168 sectors (512 B/sector), so that's only about 5 ppb. That's a really, really small fraction to be throwing the disk out over, isn't it?
Yes, really that low. Also, this is for a desktop-PC-like case; on an enterprise server I wouldn't care, that's the RAID controller's job. Don't think in relative terms like ppm here. Every reallocated sector is a little fault. Would you trust a device which is constantly developing new faults?
For future backup, I've decided to get a used LTO3 drive from eBay. LTO3 tapes are cheaper and more space efficient than buying more disk, and if I grow the server I just need to add more tape. Plus tape allows me to get a real monthly/weekly/daily strategy going without a lot of cost.
LTO is a nice technology, but tapes also have a few drawbacks. First, I'm skeptical about used tape drives from eBay. Tape drives wear out; they are delicate precision mechanical devices and should be used in a dust-free environment. They want a minimum sustained data rate, or they stop, reverse a bit and start again repeatedly (shoe-shining). This wears out drives and tapes fast, and it also reduces the usable capacity. Depending on the LTO-3 drive, at least 60-70 MB/s of sustained data rate is needed; better to double that for compression.
The capacity of an LTO-3 tape is 400GB, or up to 800GB compressed. You are talking about a few TB of storage; if you want to back that up you need a lot of tapes, which means swapping tapes for hours during a full backup.
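Roughing out those numbers for a hypothetical 4 TB full backup at LTO-3 native capacity and a 60 MB/s streaming rate:

```python
def tapes_needed(data_bytes, tape_bytes=400 * 10**9):
    """LTO-3 tapes needed for a full backup at native (uncompressed) capacity."""
    return -(-data_bytes // tape_bytes)   # ceiling division

def full_backup_hours(data_bytes, rate_mb_s=60):
    """Duration of a streaming full backup at a sustained rate in MB/s."""
    return data_bytes / (rate_mb_s * 10**6) / 3600

tapes = tapes_needed(4 * 10**12)        # 10 tapes for 4 TB
hours = full_backup_hours(4 * 10**12)   # ~18.5 hours at 60 MB/s
```

So a full backup of a few TB really is an overnight-plus job with a stack of tapes, before counting tape-swap time.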
I'll try to rebuild the lost RAID offline after I get this server back up with all new drives.
Good luck with that.
I've never seen anyone successfully recover a HDD by swapping either the PCB or the platters with a good drive.
It probably depends on the brand; some might work, but I think most drives have their low-level configuration split between flash memory on the PCB and data on the platters.
So swapping things around just corrupts stuff. I can see it working if the motor is dead, though (swapping both the PCB and the platters).
Done that, successfully. Just remember to also swap the SOIC-8 EEPROM and it's all good (when it's PCB-related damage; in this case, it was a grilled motor driver).
Here's my oldest drive that's still in use; it happens to be a Seagate (from before they got shit).
It's in my server and has been running 24/7 for 87998 hours... 10 years!!!
It's seen entire generations of other seagate models (1.5 and 3TB) get designed, manufactured, and then die.
And it still doesn't have any reallocations or even any pending reallocations.
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.8
Device Model: ST3200826A
Serial Number: 3ND0HW6Y
Firmware Version: 3.02
User Capacity: 200,049,647,616 bytes [200 GB]
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 052 048 006 Pre-fail Always - 78530478
3 Spin_Up_Time 0x0003 098 098 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 421
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 064 060 030 Pre-fail Always - 2810645
9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 87998
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 511
194 Temperature_Celsius 0x0022 029 045 000 Old_age Always - 29 (0 14 0 0 0)
195 Hardware_ECC_Recovered 0x001a 052 047 000 Old_age Always - 78530478
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 3
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 Data_Address_Mark_Errs 0x0032 100 253 000 Old_age Always - 0
I did it with a 20G Seagate; luckily I had bought a few of the same model drive at the time (same size Seagate, and the donor was a very close firmware revision). The temporary donor machine was taken off a very unhappy data capture clerk for a 2-day period, as her machine was the closest match. Got the 10G of data I needed and the OS cloned to another (slightly newer but still used) drive before I swapped controllers again to return the machine. Cloned the drive again so I would have a spare, as the software is no longer obtainable but is happy to run with a dongle only. Finally retired the Win95 machine last year, as the PBX it ran was replaced with a newer system. Still have it, and it still works, though I would not want to use IE4 on the internet again, and it is only a 90MHz Pentium processor. I kept it running using the pile of old scrapped slowpoke machines over the years, upgrading the RAM to 512M from the 64M it originally had as I got the modules.
Still have a pile of 32M, 64M, 128M SIMMs around in a baggie. I used a few in printer upgrades, where they fitted older HP printers; some let an old printer render a full-page graphic where the original RAM could only print a quarter page at a time.
Here's my oldest drive that's still in use, happens to be a seagate (before they got shit).
It's in my server and has been running 24/7 for 87998 hours... 10 years!!!
It's seen entire generations of other seagate models (1.5 and 3TB) get designed, manufactured, and then die.
And it still doesn't have any reallocations or even any pending reallocations.
Same with the 40G drive in an old laptop I use as a desktop spare machine at work. I run it as a media player, and use it to check internal network only so I can simply use the web interfaces of printers to save walking there to check the "It has a red light and won't print my job WAAAH and I can't be bothered to read the FSCKING display on the front to see what the error is", so I can take a cartridge where needed, or put paper in the machine. Yes there are people who think that they work by making paper and toner out of thin air, and complain they can't print when the power is out.
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 421
It's been running on average for over 200h between each spinup/spindown. That contributes to longevity, as start/stop cycles cause the most wear.
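Using the raw values from the SMART dump above (attribute 9, Power_On_Hours, and attribute 4, Start_Stop_Count):

```python
power_on_hours = 87_998     # SMART attribute 9 raw value
start_stop_count = 421      # SMART attribute 4 raw value

# Average continuous running time between spin-ups
hours_per_cycle = power_on_hours / start_stop_count   # ~209 h per cycle
```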
I also follow a "zero tolerance" policy on reallocated sectors - if any new ones develop after the drive has been in use, I replace the drive since more are coming.
Spinrite does a thing called DynaStat recovery
Don't get me started on Spinrite; I'm convinced it doesn't work. Steve Gibson is a really smart guy, and I bet back when he came up with the software (1980s) it worked 50% of the time, when drives had the areal density of the Hindenburg. But nope: I've tried it on about 25 drives I'd say, at least 20, and I've never recovered a drive with it. Ever. Not even a single file.
Finally, a topic I can contribute to. Back in about 2003, as Mr Gibson was finalising SR6, I spent a long time running it in an instrumented VM to actually watch its interaction with the ATA controller.
Let's get a couple of things straight. Spinrite has not performed magic for about 20 years. It does not and can't "disable SMART", and it does not interact or interface with the drive's firmware (or any other bullshit people say it does). What it does do (and I'm mystified about why nobody else is doing this) is invoke "DynaStat" recovery.
This does work for one specific failure: you have a PC that dies (GPF, bluescreen, hang, whatever) when it hits a bad sector on the disk. Spinrite reads the disk end to end, and when it hits the dud it invokes DynaStat. This reads the dud sector repetitively using the ATA READ LONG command (and sometimes other low-level commands to the same effect), which returns the raw sector contents in whatever form they come off the disk after failing ECC recovery. If in the process the disk gets a good read, the disk itself will rewrite that sector, problem solved. In most cases it won't, so Spinrite builds a simple statistical model of each bit in the sector. It does this X times (where X can be defined on the command line), and at the end of X times it writes the statistical model back to the sector on the disk. In general this writes a "best guess" of the sector contents back, and in most cases it solves the problem.
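The per-bit statistical model described above can be sketched in a few lines. This is a toy illustration only (a plain majority vote; the real thing works on raw off-platter data and its exact weighting is not public):

```python
def best_guess_sector(reads):
    """Per-bit majority vote over several raw reads of one 512-byte
    sector: a toy version of the statistical reconstruction idea."""
    n = len(reads)
    out = bytearray(512)
    for i in range(512):
        byte = 0
        for bit in range(8):
            ones = sum((r[i] >> bit) & 1 for r in reads)
            if ones * 2 > n:        # this bit was 1 in a majority of reads
                byte |= 1 << bit
        out[i] = byte
    return bytes(out)
```

With enough reads, bits that flicker randomly get outvoted by the reads that came off the platter correctly, which is why writing the "best guess" back often yields a usable sector.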
On a failing disk, Spinrite will just run it into the ground and trash it completely, but for mum's PC which is just failing to boot it's a quick fix that makes the problem go away.
So it does work, for certain use cases based on old technology and smaller-capacity drives. It is useless for "preventative maintenance" or the other mystic magic Steve's followers promote it for, but hey, people buy snake oil and put it in their cars, or Monster Cable. At least this *does* do something for specific cases.
Please use server/RAID disks for servers and not desktop disks! Desktop disks are not designed to run 24x7 with a typical server usage profile.
Not only that, but drives in a server (or any multi-bay enclosure) also have to endure higher vibration from all the closely mounted neighbouring drives, which are happily humming and clicking away.
However, I don't follow that advice myself, because I'm too cheap. But I've also had surprisingly few issues with my countless desktop drives running in RAID5 24x7.
Found on a Russian forum that in HDDs of this series the seal under the connector becomes leaky.
How to bring a second life to it
I was part of the team running a small/medium sized company's network from '98 to '09. The network and server farm expanded considerably during this period. The one thing we tried to do was mix batches of disks for the raids, mirrors and SAN racks. I don't recall the source of the information that caused me to do this, however we never lost information due to the relatively few failed server drives we suffered.
If you think about it, it makes 'gut' sense to mix drive sources provided they have identical properties, and whilst I can't work out the probability statistics, I'm sure there are those who can.
There is a reason for not being on the 'bleeding edge' just as there is for not using a 'dot zero' product when one's business depends on very high reliability.
Dave
Please use server/RAID disks for servers and not desktop disks! Desktop disks are not designed to run 24x7 with a typical server usage profile.
Is there any evidence that server/RAID drives have lower failure rates?
Please use server/RAID disks for servers and not desktop disks! Desktop disks are not designed to run 24x7 with a typical server usage profile.
Is there any evidence that server/RAID drives have lower failure rates?
Does over 15 years experience count?