Author Topic: HDDs and badblocks corrupt downloaded files?  (Read 18062 times)


Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7549
  • Country: 00
  • +++ ATH1
Re: HDDs and badblocks corrupt downloaded files?
« Reply #75 on: October 07, 2020, 01:52:19 pm »
You can also read a disk backwards which yields some pretty reasonable results. It essentially disables any ECC and literally reads the data stream in reverse order.

?! Again with this rubbish..

You can also read a disk backwards which yields some pretty reasonable results. It essentially disables any ECC and literally reads the data stream in reverse order. I found it handy with some damaged and problematic drives. It can be slow, but when the data is valuable, sometimes it's worth the days or weeks to read the data out reliably.
LOL, to read backwards, you need to spin it backwards :-DD, though the heads will stick to the platters if you try to do so and the drive will be destroyed. As for the order in which you read sectors, reading in a different order can help to retrieve data when the HDD craps out at bad sectors, so you can retrieve the data from both sides of a bad sector before the HDD craps out. Also FYI, an HDD is a random access device; it does not care about the order in which data is read.

Guys, just cut Halcyon some slack, will you? At least let him share the details, as I find his claim extraordinary in this age.

If we were living in the ST506 era, or the PC floppy era, this kind of hack was the norm, as the interfaces were well documented and publicly accessible, so a user could drill down to the tiny details, even down to a single bit read or written at the floppy disk controller. Oldies like SpinRite were well known for exploiting these kinds of low-level features on ST506 hard disks, such as altering the sector interleave of the tracks to boost performance.

Damn, I'm exposing my age by typing that, aren't I?   :-DD
« Last Edit: October 07, 2020, 03:11:13 pm by BravoV »
 

Offline classicsamus87Topic starter

  • Regular Contributor
  • *
  • !
  • Posts: 97
  • Country: br
Re: HDDs and badblocks corrupt downloaded files?
« Reply #76 on: October 07, 2020, 04:01:55 pm »
The HDD has SMART technology; is that an indication that it automatically remaps and isolates bad sectors (bad blocks)?
 

Online wraper

  • Supporter
  • ****
  • Posts: 17545
  • Country: lv
Re: HDDs and badblocks corrupt downloaded files?
« Reply #77 on: October 07, 2020, 04:31:54 pm »
The HDD has SMART technology; is that an indication that it automatically remaps and isolates bad sectors (bad blocks)?
SMART is simply an instrument for external monitoring of HDD statistics/health. All modern HDDs remap sectors by themselves.
 

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8545
  • Country: us
    • SiliconValleyGarage
Re: HDDs and badblocks corrupt downloaded files?
« Reply #78 on: October 07, 2020, 04:53:36 pm »

I don't see how that contradicts what I wrote. Some of your information is also a bit outdated (for example, modern hard drives do lots of things without being instructed).

It doesn't contradict. There are two different mechanisms.

ECC is the encoding of the user data into a pattern that ends up as a magnetic stream. If the magnetic stream gets damaged (surface impurity, physical damage), the data can be reconstructed using software.
During a write, the only thing that can be detected is a TA (thermal asperity) event, i.e. a head strike. Most drives these days do a cleanup and attempt the write again. If it fails again, the data is moved to a different block and the area where the strike happened is marked bad. I do not know if they attempt a read to verify. For throughput purposes they may just immediately retry the same block and see if another TA happens. If the write completes without a TA, the drive moves on to write the next block. Writing data is fire and forget. They don't even bother reading it back; reliability is that high.
If a TA event happens during a read, the ECC engine will attempt a recovery. If the data is recovered, the drive moves on. If recovery fails, the drive automatically retries the read. In case of another TA the cycle repeats (with timeouts, of course).
TA events are typically non-repetitive. It's not like there is something stuck to one particular spot. A TA is caused by a loose particle passing through. It is very hard for anything to stick to the media due to the lubrication process (platters are coated).

So TA events are transient and typically do not lead to data loss. The drive auto-recovers, during write and read, without the OS even being aware.
But a TA event can disturb the magnetic stream enough that ECC cannot do its work. There are simply too many bits lost in the stream (the stream on the platter is safe, but the stream as captured by the head has a trail of 'garbage' in it so long that ECC cannot recover).

So the mechanisms serve different purposes:
ECC corrects magnetic errors on the platters: tiny disturbances affecting a few bits.
TA forces the drive to do a retry, as the disturbance is so large there is no way ECC can recover from it. On a write, the stream is almost certainly damaged beyond recovery, so we rewrite immediately. On a read, we can try ECC, since we have the time of one revolution to decide whether we move on or re-attempt the read based on the outcome of the ECC decode.
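
To illustrate the split described above (ECC handles a few flipped bits, a long burst defeats it), here is a toy sketch using a Hamming(7,4) code. The choice of code is an assumption made purely for brevity: real drives use far stronger codes over whole sectors, and the function names are made up for this example.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword positions 1..7 are: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # parity over positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 means "looks clean"
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]

# One flipped bit: the decoder silently repairs it.
one_error = hamming74_encode(data)
one_error[4] ^= 1
small_ok = hamming74_decode(one_error) == data

# A burst of three flipped bits: the decoder returns wrong data.
burst = hamming74_encode(data)
for i in (0, 1, 2):
    burst[i] ^= 1
burst_ok = hamming74_decode(burst) == data
```

With this toy code the burst case decodes to different data without any indication of trouble. A real drive layers error detection on top of correction, which is why it retries the read or reports an error instead of returning garbage.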

Some of my stuff may be outdated now. I've been out of the hard disk business for 6 years, so things have moved on. When I left we had just finished shingled recording and dual actuator technology. What they do now in terms of dark voodoo I don't know.
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline Halcyon

  • Global Moderator
  • *****
  • Posts: 5870
  • Country: au
Re: HDDs and badblocks corrupt downloaded files?
« Reply #79 on: October 07, 2020, 07:08:30 pm »
You can also read a disk backwards which yields some pretty reasonable results. It essentially disables any ECC and literally reads the data stream in reverse order. I found it handy with some damaged and problematic drives. It can be slow, but when the data is valuable, sometimes it's worth the days or weeks to read the data out reliably.
LOL, to read backwards, you need to spin it backwards :-DD, though the heads will stick to the platters if you try to do so and the drive will be destroyed. As for the order in which you read sectors, reading in a different order can help to retrieve data when the HDD craps out at bad sectors, so you can retrieve the data from both sides of a bad sector before the HDD craps out. Also FYI, an HDD is a random access device; it does not care about the order in which data is read.

It's clear from your comments that you have never done data recovery on a professional level. There are a handful of hardware tools which do exactly this in order to read data off a damaged disk. You're basically reading data from the last LBA back to the first. They also allow you to (for example) completely disable one or more of the read heads and only read out certain platters using the good ones.

I'll point you to the Atola manual (one such device) if you want to see how it works. Scott Moulton is one of the leading experts in the field, so if you have a spare $3500 to spend, you can partake in one of his 5-day training courses (well worth the money).

You can also read a disk backwards which yields some pretty reasonable results. It essentially disables any ECC and literally reads the data stream in reverse order.

?! Again with this rubbish..

You can also read a disk backwards which yields some pretty reasonable results. It essentially disables any ECC and literally reads the data stream in reverse order. I found it handy with some damaged and problematic drives. It can be slow, but when the data is valuable, sometimes it's worth the days or weeks to read the data out reliably.
LOL, to read backwards, you need to spin it backwards :-DD, though the heads will stick to the platters if you try to do so and the drive will be destroyed. As for the order in which you read sectors, reading in a different order can help to retrieve data when the HDD craps out at bad sectors, so you can retrieve the data from both sides of a bad sector before the HDD craps out. Also FYI, an HDD is a random access device; it does not care about the order in which data is read.

Guys, just cut Halcyon some slack, will you? At least let him share the details, as I find his claim extraordinary in this age.

If we were living in the ST506 era, or the PC floppy era, this kind of hack was the norm, as the interfaces were well documented and publicly accessible, so a user could drill down to the tiny details, even down to a single bit read or written at the floppy disk controller. Oldies like SpinRite were well known for exploiting these kinds of low-level features on ST506 hard disks, such as altering the sector interleave of the tracks to boost performance.

Thank you. You just can't educate some people. Whether people choose to accept what I'm saying is a matter for them. It's all easily researched and available out there in the public domain.

I do this kind of stuff for a living (among other things). We are seeing less and less of this kind of work as many people switch to SSDs. It's a bit of a dying craft. I've even managed to recover CCTV from fire damaged hard disks before. It's a slow process, but it can be done as long as the disk isn't completely incinerated.
« Last Edit: October 07, 2020, 07:11:54 pm by Halcyon »
 

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 8042
  • Country: gb
Re: HDDs and badblocks corrupt downloaded files?
« Reply #80 on: October 07, 2020, 07:14:59 pm »
You can also read a disk backwards which yields some pretty reasonable results. It essentially disables any ECC and literally reads the data stream in reverse order. I found it handy with some damaged and problematic drives. It can be slow, but when the data is valuable, sometimes it's worth the days or weeks to read the data out reliably.
LOL, to read backwards, you need to spin it backwards :-DD, though the heads will stick to the platters if you try to do so and the drive will be destroyed. As for the order in which you read sectors, reading in a different order can help to retrieve data when the HDD craps out at bad sectors, so you can retrieve the data from both sides of a bad sector before the HDD craps out. Also FYI, an HDD is a random access device; it does not care about the order in which data is read.

It's clear from your comments that you have never done data recovery on a professional level. There are a handful of hardware tools which do exactly this in order to read data off a damaged disk. You're basically reading data from the last LBA back to the first. They also allow you to (for example) completely disable one or more of the read heads and only read out certain platters using the good ones.

I'll point you to the Atola manual (one such device) if you want to see how it works. Scott Moulton is one of the leading experts in the field, so if you have a spare $3500 to spend, you can partake in one of his 5-day training courses (well worth the money).

Anyone can change the order they ask for sectors in. Doesn't disable ECC.

I do this kind of stuff for a living

The guy who plastered the wall under my new lounge window does it for a living. Doesn't change the fact he did a shit job.
« Last Edit: October 07, 2020, 07:16:31 pm by Monkeh »
 

Offline Halcyon

  • Global Moderator
  • *****
  • Posts: 5870
  • Country: au
Re: HDDs and badblocks corrupt downloaded files?
« Reply #81 on: October 07, 2020, 08:45:00 pm »
I do this kind of stuff for a living

The guy who plastered the wall under my new lounge window does it for a living. Doesn't change the fact he did a shit job.

I'm beginning to see a bit of a trend here.

Do you judge other experts on their abilities and skills as well? Perhaps you know better than your mechanic, or you can make Duck à l'Orange better than a trained chef. Maybe you can EE better than Dave as well?

Next time I have a difficult job, I'll come and seek advice from you first.
 

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 8042
  • Country: gb
Re: HDDs and badblocks corrupt downloaded files?
« Reply #82 on: October 07, 2020, 09:02:02 pm »
I do this kind of stuff for a living

The guy who plastered the wall under my new lounge window does it for a living. Doesn't change the fact he did a shit job.

I'm beginning to see a bit of a trend here.

Do you judge other experts on their abilities and skills as well? Perhaps you know better than your mechanic, or you can make Duck à l'Orange better than a trained chef. Maybe you can EE better than Dave as well?

Next time I have a difficult job, I'll come and seek advice from you first.

What, exactly, should I judge people on? How much they talk themselves up? Pieces of paper? Bold claims?

Stop deflecting for a change. How, exactly, does changing the order of a series of read operations disable ECC? Can you evidence it? Can you point me to any remotely modern HDD which has no ECC functionality whatsoever and writes totally raw data to the platters in the hope it can get it back?
« Last Edit: October 07, 2020, 09:06:47 pm by Monkeh »
 

Online wraper

  • Supporter
  • ****
  • Posts: 17545
  • Country: lv
Re: HDDs and badblocks corrupt downloaded files?
« Reply #83 on: October 07, 2020, 09:52:25 pm »
You can also read a disk backwards which yields some pretty reasonable results. It essentially disables any ECC and literally reads the data stream in reverse order. I found it handy with some damaged and problematic drives. It can be slow, but when the data is valuable, sometimes it's worth the days or weeks to read the data out reliably.
LOL, to read backwards, you need to spin it backwards :-DD, though the heads will stick to the platters if you try to do so and the drive will be destroyed. As for the order in which you read sectors, reading in a different order can help to retrieve data when the HDD craps out at bad sectors, so you can retrieve the data from both sides of a bad sector before the HDD craps out. Also FYI, an HDD is a random access device; it does not care about the order in which data is read.

It's clear from your comments that you have never done data recovery on a professional level. There are a handful of hardware tools which do exactly this in order to read data off a damaged disk. You're basically reading data from the last LBA back to the first. They also allow you to (for example) completely disable one or more of the read heads and only read out certain platters using the good ones.

I'll point you to the Atola manual (one such device) if you want to see how it works. Scott Moulton is one of the leading experts in the field, so if you have a spare $3500 to spend, you can partake in one of his 5-day training courses (well worth the money).
I don't know whether to cry or laugh. Also, it seems you prefer to ignore or straw-man whatever is inconvenient to you in my posts. Reading starting from the last LBA is not reading the HDD in reverse. It's simply reading in order from higher LBA to lower LBA. There is no preferred reading order to begin with. It's just normal HDD operation, not some special mode, FFS. As I already said, an HDD is a random access device. And I already said in the quoted post what the reason for doing so is. If you really did data recovery professionally, I'm amazed by the level of ignorance and misunderstanding of what you were actually doing, and by your being proud of it rather than even considering the possibility of being wrong and actually checking whether what you are saying is true. Not to mention never quoting any information that supports your claims.
I guess you mean this by mentioning the Atola manual:
Quote
There are three methods of scanning:
Linear — from start LBA to end LBA
Backward — from end LBA to start LBA (in reverse)
And you proudly claim it magically ignores ECC, while completely misunderstanding what it actually does.
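
To make the "backward scan is just a different request order" point concrete, here is a minimal sketch. The fake drive and every name in it are hypothetical stand-ins, not any imaging tool's API; the only thing that differs between the two runs is the sequence of LBAs requested.

```python
def make_fake_drive(total_sectors, bad_lbas):
    """Hypothetical stand-in for a drive: reads of bad LBAs raise OSError."""
    def read_sector(lba):
        if not 0 <= lba < total_sectors:
            raise ValueError(f"LBA {lba} out of range")
        if lba in bad_lbas:
            raise OSError(f"unrecoverable read error at LBA {lba}")
        return f"data-{lba}".encode()
    return read_sector


def image_in_order(read_sector, lbas):
    """Request the LBAs in the order given; note failures and carry on."""
    recovered, failed = {}, []
    for lba in lbas:
        try:
            recovered[lba] = read_sector(lba)
        except OSError:
            failed.append(lba)
    return recovered, failed


read_sector = make_fake_drive(total_sectors=10, bad_lbas={4})

# "Linear": start LBA to end LBA.
forward_data, forward_failed = image_in_order(read_sector, range(0, 10))
# "Backward": end LBA to start LBA. Same read call, different order.
backward_data, backward_failed = image_in_order(read_sector, range(9, -1, -1))
```

Both runs recover the same sectors and hit the same bad LBA. In this model nothing about error correction changes; what a real drive does around a bad area (read-ahead, retries) is outside what the sketch covers.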
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 17063
  • Country: us
  • DavidH
Re: HDDs and badblocks corrupt downloaded files?
« Reply #84 on: October 08, 2020, 02:17:03 am »
If you are running RAID, then when a bad sector is detected, the RAID should regenerate the data and write it back to the drive to force reallocation.  RAID controllers usually support periodic scrubbing where all data is read to look for bad sectors so they can be scrubbed.

That's wrong. RAID controllers don't see disk defects. Sector re-allocation is completely transparent on any modern (i.e. since the days of ATA) drive.

For anything somewhat modern (i.e. SAS/SATA) the RAID controller only sees a bunch of blocks.

The patrol read only serves to trigger the drive's internal defect management (data is read so it will trigger an ECC check, and if there's an error the data will be rewritten or, if that fails, the block moved). The controller itself plays no part in handling disk errors, and if the controller starts seeing bad blocks then the drive is flagged as defective.

You are going to have a hard time convincing me of that since I have watched it in real time.  Maybe you have only used crap RAID controllers, of which there are many.

Hard drives do *not* perform scrub-on-read operation for uncorrectable errors; instead they attempt to read the data until a timeout is reached, and then return the bad data and indicate an error.  "RAID" drives allow a shorter timeout to be set so they do not waste time trying to recover data which can be regenerated anyway.

When the RAID controller receives the bad data and an error, good ones regenerate the data and write it back to the drive, which then performs a scrub-on-write.  If this did not occur, then there would be no reason for the RAID controller to perform idle time scrubbing.

It would be dumb to discard an entire drive because of a single bad read when the data can be recovered and written back to force the drive to reallocate that sector; of course firmware-based RAID controllers often are this dumb.
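
The "regenerate and write back" step can be sketched as follows, assuming a single-parity (RAID 5 style) stripe. That layout is an assumption for illustration, since no RAID level is specified above, and the helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def regenerate(stripe, parity, bad_index):
    """Rebuild the block at bad_index from the surviving blocks plus parity."""
    survivors = [blk for i, blk in enumerate(stripe) if i != bad_index]
    return xor_blocks(survivors + [parity])


# Three data blocks on three member drives, parity on a fourth.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Drive 1 returns an error for its block; rebuild it from the others.
rebuilt = regenerate(stripe, parity, bad_index=1)
```

The controller would then write `rebuilt` back to the same LBA on the affected drive, and that write is what gives the drive the chance to reallocate the sector.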

Quote
Quote
Actually, every hard drive made since we moved from plain old IDE to ATA employs ECC (and it was common even before then). No exceptions. All data on a hard drive is ECC protected, and the drive will know if there's an error (the idea behind "bit rot"). And unless the bit error happens to be in a defective sector *and* the drive has exhausted its spares then the drive will correct the error the next time the sector is read.

That would be scrub-on-read versus scrub-on-write which we have been discussing.  I personally have never seen a scrub-on-read on a hard drive, but good SSDs are supposed to do this.  (2) On a hard drive scrub-on-read presents practical difficulties because typically no verification pass is done.

All good modern RAID controllers perform Patrol *Reads*. Because this is enough to trigger the drive's internal defect management.

The internal defect management could operate on a correctable error, but I have never seen it happen.  I would have noticed if the defect list grew without errors being reported.  Many times I have done extended SMART surface scans and watched for this very thing and it never happened.  Most recently I have done it multiple times in the past couple of weeks on a pair of 1TB WD Greens, whereas an external surface scan which includes writing did produce some results.

Quote
SSDs are a different matter as they should not have any patrol reads done on them, still less any writes. SSDs also have internal defect management (which is part of its garbage collection) and unlike with hard drives this normally works without having to be triggered (which for SSDs is done with the TRIM command).

The difference is that SSDs must perform verification after every write as part of the write process, and they also scrub-on-read sectors which are still correctable if they have too many soft errors.  Many perform idle time scrubbing because retention of modern Flash memory is abysmal, or at least their specifications say they do.  I sure know that many do not and will happily suffer bit rot while powered.

Quote
Quote
(2) SSDs which perform idle time scrubbing *must* also support power loss protection because program and erase cycles can occur at any time, which implies that SSDs which do not support power loss protection, do not perform idle time scrubbing.

That's wrong. PLP on SSDs is required to make sure data that is in the drive's cache is written to the flash memory after a power loss occurs so that it doesn't get lost. If it does then this can affect the integrity of the OS level data layer.

No, but there are two levels of power loss protection.

What you are describing is protection of data in transit, which includes buffered data and data in the write cache.  Not all drives have this, nor is it required if the filesystem and drive handle synchronous writes properly.  In the worst case, file contents are not updated but no metadata or other contents should be lost.

However power loss protection is also required for *any* write operation, and possibly any erase operation.  The reason for this is that interrupted writes can not only corrupt the data which is being written, but also data in other sectors, including the flash translation table, which can result in a non-recoverable situation.

There are two reasons data can be so easily corrupted by an incomplete write.  With multi-level flash, the existing data when a sector is updated is at risk during the update and it is easy to see why.  It seems to me like this is avoidable by only storing the data from one sector with multiple levels but apparently Flash chips are not organized that way.

The other reason is more insidious; the state machine which controls the write operation can glitch during power loss, which is how drives which lack power loss protection got bricked so easily in those published tests several years ago.  Some drives lost no data, not even data in transit, some drives lost only data in transit, and most suffered either unrelated data loss or complete failure, but that was before the problem was understood well.
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 17063
  • Country: us
  • DavidH
Re: HDDs and badblocks corrupt downloaded files?
« Reply #85 on: October 08, 2020, 02:28:44 am »
Also, if a hard drive which was made in 2000 or later exhibits a bad block to the OS then that means it has run out of spare sectors and should have been replaced long ago.

I recovered a RAID set not long ago which had exactly that problem.  The drive reported an error during a RAID rebuild, because I stupidly did not run a scrub of the array before swapping a drive, which stopped the RAID rebuild.  The drive actually had accumulated several bad sectors but it only takes one to stop the rebuild.  The solution was to scan the RAID at the level of the file system to determine which files were damaged, and then deliberately run a read-write of the entire drive outside of the array to force reallocation of the bad sectors, albeit with corrupted data.  Once that was done, the drive reported no bad sectors during the RAID rebuild and the damaged files could be restored.

So a drive can report a bad sector to the OS (or to the RAID controller in this case; I have had both recently) while not being out of spare sectors, because scrub-on-read deliberately never happened and the drive was waiting for a write to the bad sector to perform scrub-on-write.  It would be much worse for the drive to return bad data and no error while performing a scrub-on-read operation.
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3664
  • Country: us
Re: HDDs and badblocks corrupt downloaded files?
« Reply #86 on: October 08, 2020, 03:27:58 am »
What is special about "AV" drives is their firmware handling of thermal recalibration. As the temperature inside the drive changes, parts like the head armature and the platters change dimension, and since they have different thermal coefficients of expansion, this throws off the head tracking. The drive firmware needs to search for the embedded servo signals to form a model of how far apart the tracks are for generating the seek impulses, and periodically re-calibrates in case the temperature has changed. This re-calibration interrupts any data transfer and makes it wait, which would be a problem for real-time media capture or playback. AV firmware just uses a different re-calibration strategy that postpones the re-cal until the disk is idle.

This also means that there is no benefit for AV drives for surveillance recording, since by definition it is never idle. They were made for video and film post-production and multi-track recording studios.
 

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 8042
  • Country: gb
Re: HDDs and badblocks corrupt downloaded files?
« Reply #87 on: October 08, 2020, 03:30:46 am »
This also means that there is no benefit for AV drives for surveillance recording, since by definition it is never idle. They were made for video and film post-production and multi-track recording studios.

You should tell WD that, they're marketing their drives to the wrong people, apparently.
 

Offline Halcyon

  • Global Moderator
  • *****
  • Posts: 5870
  • Country: au
Re: HDDs and badblocks corrupt downloaded files?
« Reply #88 on: October 08, 2020, 03:57:19 am »
This also means that there is no benefit for AV drives for surveillance recording, since by definition it is never idle. They were made for video and film post-production and multi-track recording studios.

A lot of those "AV" drives are primarily for CCTV recording, where you have multiple, relatively low bitrate streams. Any half-decent video or film recording studio working with large amounts of high-bitrate recordings will be using proper NASs or SANs with enterprise drives or SSDs. When I worked in video production in a few Australian TV studios, read speed was just as important as write speed. Those consumer AV drives are designed primarily for writes, whereas read speed can vary significantly.
 

Offline MK14

  • Super Contributor
  • ***
  • Posts: 4853
  • Country: gb
Re: HDDs and badblocks corrupt downloaded files?
« Reply #89 on: October 08, 2020, 05:07:25 am »
Anyone can change the order they ask for sectors in. Doesn't disable ECC.

And you proudly claim it magically ignores ECC, while completely misunderstanding what it actually does.

As far as I can tell, I completely (>99% as regards reading it backwards, and at least some agreement on ECC) agree with Halcyon here.

I'm beginning to see a bit of a trend here.

Me too.

Simple googling of the concept, brings complete (>99%, as regards reading it backwards, and at least some agreement as regards the ECC situation), agreement, with what you have been saying.

In summary, it seems that if you try to read a set of files on a hard disk with one or more faulty sectors using the 'normal' forward reading method, the drive does more than read the requested sector (e.g. sector 100,000).
A modern HDD also tries to partly fill its (often massive) cache buffer, e.g. 128 MB worth, by reading ahead.
Using the many heads (on the often multiple platters), that read-ahead will cover a huge number of sectors, given modern very high density HDDs.
If any of them have read errors (even if those bad sectors have nothing whatsoever to do with the sector or file you were originally trying to read), then the drive will tend to keep rereading the 'faulty' sectors, rather than at least supplying the working sectors to the OS.

But if you read the HDD 'backwards', i.e. request the sectors of the files of interest in reverse order, it will only read the sectors that you request (although potentially extremely slowly), ignoring the bad sectors either fully or until a bad sector is part of the file you are trying to read.

Hence the ECC is only being calculated for the sectors you are actually trying to read, rather than for a huge number of other ones which could be bad and hence cause the read to be aborted or badly delayed.

Many links found, example(s):
https://www.forensicfocus.com/forums/general/reading-disk-areas-backwards-does-it-help/

https://www.linkedin.com/pulse/imaging-reverse-reduce-data-loss-risk-hdd-failure-serge-shirobokov
« Last Edit: October 08, 2020, 05:09:32 am by MK14 »
 

Offline Halcyon

  • Global Moderator
  • *****
  • Posts: 5870
  • Country: au
Re: HDDs and badblocks corrupt downloaded files?
« Reply #90 on: October 08, 2020, 05:48:39 am »
I will clarify, you don't have to disable ECC. You can also manually set the number of times the drive attempts to re-read a sector. It can be zero or it can be more. It will depend on the drive firmware, but tools like Atola and DeepSpar allow you to really get into the low level of the firmware to change how the drive would normally perform when connected to a host. As I said, an example is that individual drive heads can be completely turned off if you want to (I've used this method a few times when one head is completely stuffed, rather than doing a head swap).

But yeh, when you're talking about damaged drives, read speed can get real slow with each subsequent pass as it tries to read out more data than the pass before. The last drive I imaged got to about Pass 7 and was reading at between 1 and 7 KB/sec. Needless to say, I canned the process at that point, but most of the data was recovered and readable. It just depends on the job and cost vs. benefit.
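
The multi-pass idea (grab the easy sectors first, then spend more retries only on what is still missing) can be sketched roughly like this. The flaky drive model and all names are hypothetical and do not reflect how Atola or DeepSpar implement it.

```python
def make_flaky_drive(attempts_needed):
    """Hypothetical drive: maps an LBA to how many attempts a read takes."""
    seen = {}

    def read_sector(lba):
        seen[lba] = seen.get(lba, 0) + 1
        if seen[lba] < attempts_needed.get(lba, 1):
            raise OSError(f"read error at LBA {lba}")
        return lba.to_bytes(4, "big")

    return read_sector


def multipass_image(read_sector, total_sectors, retries_per_pass):
    """Image in several passes, revisiting only sectors earlier passes missed."""
    image = {}
    pending = list(range(total_sectors))
    for retries in retries_per_pass:
        still_bad = []
        for lba in pending:
            for _ in range(retries + 1):
                try:
                    image[lba] = read_sector(lba)
                    break
                except OSError:
                    continue
            else:
                # Every attempt in this pass failed; leave it for the next pass.
                still_bad.append(lba)
        pending = still_bad
        if not pending:
            break
    return image, pending


# LBA 2 needs three attempts; LBA 5 is effectively dead.
drive = make_flaky_drive({2: 3, 5: 1000})
image, unreadable = multipass_image(drive, 8, retries_per_pass=[0, 2, 4])
```

Pass one uses zero retries so a bad area costs almost nothing; later passes get slower because they only touch the stubborn sectors, which matches the falling read speed described above.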
 

Offline Wuerstchenhund

  • Super Contributor
  • ***
  • Posts: 3088
  • Country: gb
  • Able to drop by occasionally only
Re: HDDs and badblocks corrupt downloaded files?
« Reply #91 on: October 08, 2020, 08:01:24 am »
That's wrong. RAID controllers don't see disk defects. Sector re-allocation is completely transparent on any modern (i.e. since the days of ATA) drive.

For anything somewhat modern (i.e. SAS/SATA) the RAID controller only sees a bunch of blocks.

The patrol read only serves to trigger the drive's internal defect management (data is read so it will trigger an ECC check, and if there's an error the data will be rewritten or, if that fails, the block moved). The controller itself plays no part in handling disk errors, and if the controller starts seeing bad blocks then the drive is flagged as defective.

You are going to have a hard time convincing me of that since I have watched it in real time.

You have seen what, exactly?

Quote
Maybe you have only used crap RAID controllers, of which there are many.

I'm not sure I'd call HPE's Smart Array range (or Dell's PERC, or Microsemi's Adaptec) hardware RAID controllers "crap", as they represent the upper end of the market.

But I'm certainly not talking about fake RAID stuff if that's what you have in mind.

Quote
Hard drives do *not* perform scrub-on-read operation for uncorrectable errors; instead they attempt to read the data until a timeout is reached, and then return the bad data and indicate an error.  "RAID" drives allow a shorter timeout to be set so they do not waste time trying to recover data which can be regenerated anyway.

That is correct when it comes to *uncorrectable* errors.

But as far as *correctable* errors are concerned, they (as the name implies) are corrected inside the drive when found during read. The typical "bit rot" scenario is exactly that, a "flipped bit" on an otherwise healthy sector. Which, when detected, results in a re-write of that sector.

Now, as to *uncorrectable* errors, it is correct that the drive eventually returns corrupted data. However, when a hard drive reports an uncorrectable error then that's because the defect extends into the ECC area, so there is not enough information to restore the original data. Which commonly means the drive is defective and should be replaced. It's as simple as that.

Quote
When the RAID controller receives the bad data and an error, good ones regenerate the data and write it back to the drive, which then performs a scrub-on-write.  If this did not occur, then there would be no reason for the RAID controller to perform idle time scrubbing.

Nope. What you wrote would be (mostly) correct if we were talking about MFM/RLL/ESDI or early IDE (or SCSI-1 drives) from some 30 years ago. But we're not.

Modern (i.e. made in 2000 or later) drives don't represent their physical layout to the host. They report some artificial CHS layout which has nothing to do with the actual physical layout to the host for backward compatibility reasons (so that antique systems still using CHS for addressing can boot from these drives). Even the sector size is often fake, as most modern hard drives use 4k sectors internally while reporting 512-byte sectors on the interface.
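
To make the "fake sector size" point concrete, here is a small sketch of how 512-byte logical sectors map onto 4k physical sectors. It assumes logical sector 0 starts exactly at the beginning of a physical sector (no alignment offset), and the names are made up for the example.

```python
LOGICAL_SECTOR = 512     # size reported on the interface
PHYSICAL_SECTOR = 4096   # size used internally
PER_PHYSICAL = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8 logical per physical


def locate(lba):
    """Return (physical sector number, byte offset inside it) for a logical LBA."""
    physical, index = divmod(lba, PER_PHYSICAL)
    return physical, index * LOGICAL_SECTOR


def is_aligned(start_lba, count):
    """True if a transfer covers whole physical sectors only."""
    return start_lba % PER_PHYSICAL == 0 and count % PER_PHYSICAL == 0
```

A write that is not aligned forces the drive to read the whole 4k sector, modify part of it and write it back, which is why old partition layouts starting at logical sector 63 perform poorly on such drives.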

But for the most part the CHS layout isn't even used. LBA (Logical Block Addressing) has been a thing even before the year 2000 (it was first used with SCSI drives long before then), and it's been the standard way of addressing disks for many many years. With LBA, the host only sees a device which has a certain number of blocks. There's no CHS involved. LBA has been supported at least since Windows 98 and NT 4.0, and became the standard format with Windows 2000. And while LBA was an extension for IDE and ATA, LBA is the defined addressing standard for SATA, SAS and NVMe storage.
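
For reference, the conventional translation between a CHS triple and a flat LBA number looks like the sketch below. The geometry values (16 heads, 63 sectors per track) are only an example of the kind of artificial layout a drive might report; the function names are made up here.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Translate a CHS address to a block number. Sectors are 1-based."""
    if cylinder < 0 or not 0 <= head < heads_per_cylinder:
        raise ValueError("cylinder/head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse translation for the same reported geometry."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


first_block = chs_to_lba(0, 0, 1, 16, 63)
second_cylinder = chs_to_lba(1, 0, 1, 16, 63)
```

Either way the host only ever deals with a block number; where that block physically lives on the platters is decided inside the drive.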

Your RAID controller, whatever type/model that is, would have to be really old to not use LBA (and if that controller is so old then I guess the disks are, too). And even then it would only see the fake geometry reported by the drive, not the real one.

If a modern RAID controller encounters a bad block (unrecoverable error), it will try to reconstruct the data from the redundancy disks and then may attempt to re-write the block to the affected disk. If this disk has sufficient spare sectors, it may revert the write to a spare block, after which the block will be fine and the integrity of the data is restored. If that was a one-off, the drive may well be fine for years. However, if that happens more often (as it's the case on a dying drive), the disk will eventually run out of spare sectors after which the attempt by the RAID controller to re-write the block will fail, and then the disk is failed and the array goes into contingency mode.
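
The "reconstruct the data from the redundancy disks" step can be sketched for the single-parity (RAID 5 style) case as below. This is a simplified model that assumes one parity block per stripe and exactly one unreadable block; real controllers also handle striping layout, rotation of the parity disk and so on, and the function names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe, bad_index):
    """Rebuild the block at bad_index from all other blocks of the stripe.

    'stripe' holds the data blocks plus the parity block; XOR of all
    surviving members yields the missing one.
    """
    survivors = [block for i, block in enumerate(stripe) if i != bad_index]
    return xor_blocks(survivors)


data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
parity = xor_blocks(data)
stripe = data + [parity]

# Pretend disk 1 returned an unrecoverable read error for this stripe.
rebuilt = reconstruct(stripe, 1)
```

The rebuilt block is what the controller would then write back to the affected disk, which gives the drive the chance to remap the sector.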

Quote
It would be dumb to discard an entire drive because of a single bad read when the data can be recovered and written back to force the drive to reallocate that sector; of course firmware based RAID controllers often are this dumb.

In a professional environment, if a drive shows bad sectors at the interface it's scrapped, period. Any decent RAID controller will immediately flag a drive as soon as unrecoverable errors start to appear. Because for a modern drive unrecoverable errors are usually a clear sign that the drive is defective, and the only thing that would be stupid is to not scrap the drive and risk the integrity of the host data. It's as simple as that. Because at the end of the day the host data (and the hourly rate for the admin who has to deal with it and the potential fall-out should the drive remain in service) is worth a lot more than that stupid hard drive.

Now I accept that for hobbyist use that may well be different, and if you can't afford to replace a drive it's certainly tempting to work around the problem of a defect sector. But that is only a viable option if your data (and your time) isn't worth much (because if it was you'd not try to cheap-skate backup). And it doesn't change the fact that the drive is telling you that you can no longer rely on it.

Quote
The internal defect management could operate on a correctable error, but I have never seen it happen.

Well, yes, that's because it's supposed to be *transparent* to the host. Which is the whole point of a "defect-free interface".

Quote
I would have noticed if the defect list grew without errors being reported.  Many times I have done extended SMART surface scans and watched for this very thing and it never happened.  Most recently I have done it multiple times in the past couple of weeks on a pair of 1TB WD Greens, but doing an external surface scan which includes writing had some results.

Because you don't seem to fully understand what you are seeing. Any high level surface scan tool only scans the area that the drive is reporting as good (it has no access to the whole drive area - "defect-free Interface", remember?). Defects are hidden because defect management is completely *transparent*. The only way you can check what's actually going on in the drive is through SMART data.

When you're at the point where your tool can "see" defects then that means the drive has developed unrecoverable errors and is defective and should be discarded, but at the very least should not be used to store anything of importance.

Quote
Quote
SSDs are a different matter as they should not have any patrol reads done on them, and less so any writes. SSDs also have internal defect management (which is part of its Garbage Collection) and unlike with hard drives this normally works without having to be triggered (which for SSDs is done with the TRIM command).

The difference is that SSD must perform verification after every write as part of the write process, and they also scrub-on-read of sectors which are still correctable if they have too many soft errors.  Many perform idle time scrubbing because retention of modern Flash memory is abysmal, or at least their specifications say they do.  I sure know that many do not and will happily suffer bit rot while powered.

There's nothing in SSDs that matches "scrubbing". Which, on flash, would be completely counterproductive because every read of a flash cell reduces the amount of charge that is stored in that cell, so if a SSD did regular "read-scrubs" then the charge level would quickly become so low that the sector would have to be re-written to maintain its information, a process which causes wear on the flash cell.

In reality, SSDs employ ECC (although slightly more sophisticated than on spinning rust) for error correction, Garbage Collection (which erases blocks that are marked for deletion) and Wear Leveling (where the SSD keeps track of flash use and tries to balance the load evenly across all cells).

Quote
Quote
Quote
(2) SSDs which perform idle time scrubbing *must* also support power loss protection because program and erase cycles can occur at any time, which implies that SSDs which do not support power loss protection, do not perform idle time scrubbing.

That's wrong. PLP on SSDs is required to make sure data that is in the drive's cache is written to the flash memory after a power loss occurs so that it doesn't get lost. If it does then this can affect the integrity of the OS level data layer.

No, but there are two levels of power loss protection.

What you are describing is protection of data in transit which includes buffered and data in write cache.  Not all drives have this nor is it required if the filesystem and drive handle synchronous writes properly.  In the worst case, file contents are not updated but no metadata or other contents should be lost.

That is correct. PLP exists to protect host data.

Quote
However power loss protection is also required for *any* write operation, and possibly any erase operation.  The reason for this is that interrupted writes can not only corrupt the data which is being written, but also data in other sectors, including the flash translation table, which can result in a non-recoverable situation.

There are two reasons data can be so easily corrupted with an incomplete write.  With multi-level flash, the existing data when a sector is updated is at risk during the update and it is easy to see why.  It seems to me like this is avoidable by only storing the data from one sector with multiple levels but apparently Flash chips are not organized that way.

First of all, SSDs internally are physically organized in blocks, not sectors. Remember the LBA I mentioned above? LBA is *exactly* how SSDs are structured internally. On SSDs, Sectors are an artificial construct which has zero relation to which flash cell the information is physically located.

Garbage Collection and Wear Levelling also don't need PLP. If power is interrupted during the deletion of a block then the block will remain marked as for deletion and deletion will be repeated after power comes back up. For Wear Levelling, when data is moved from one block to another then the data from the old block is copied to a new block and then the old block is marked for deletion. If that process is interrupted then the old data is still there and the block shift is simply repeated.
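
The ordering described above (copy first, switch over, then mark the old block for deletion) can be modelled with a toy sketch like the one below. It only models the logical sequence of steps as described in this post; it says nothing about what an interrupted program operation does electrically to neighbouring cells. All class and method names are made up for the example.

```python
class PowerLoss(Exception):
    """Raised to simulate power being cut in the middle of an operation."""


class ToyFlash:
    def __init__(self, blocks, mapping):
        self.blocks = dict(blocks)    # physical block -> data
        self.mapping = dict(mapping)  # logical block -> physical block
        self.stale = set()            # physical blocks marked for deletion

    def read(self, logical):
        return self.blocks[self.mapping[logical]]

    def relocate(self, logical, new_physical, power_fails_after_copy=False):
        old_physical = self.mapping[logical]
        # Step 1: copy the data to the new block; the old block is untouched.
        self.blocks[new_physical] = self.blocks[old_physical]
        if power_fails_after_copy:
            raise PowerLoss()
        # Step 2: point the logical block at the new location.
        self.mapping[logical] = new_physical
        # Step 3: only now mark the old block for deletion.
        self.stale.add(old_physical)


flash = ToyFlash(blocks={0: b"host data"}, mapping={7: 0})
try:
    flash.relocate(7, new_physical=1, power_fails_after_copy=True)
except PowerLoss:
    pass
after_interruption = flash.read(7)  # still served from the old block

flash.relocate(7, new_physical=1)   # the move is simply repeated
after_retry = flash.read(7)
```

In this model the old copy stays valid until the very last step, so an interruption at any earlier point leaves the mapping pointing at intact data.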

Quote
The other reason is more insidious; the state machine which controls the write operation can glitch during power loss,

The part in a SSD which controls write operations (and everything else) is the SSD controller and that is not a State Machine, it's a micro-controller running specific software (the drive's firmware) to perform its duties.

Quote
which is how drives which lack power loss protection got bricked so easily in those published tests several years ago.  Some drives lost no data, not even data in transit, some drives lost only data in transit, and most suffered either unrelated data loss or complete failure, but that was before the problem was understood well.

I know that early consumer SSDs killed themselves (particularly those made by OCZ) for a number of reasons, most of which were related to the lack of automatic GC (the drive needed GC triggered by the OS via TRIM, which the back then still common Windows XP didn't support) and a wide range of firmware errors.

But the point remains that the only "data in transit" in a SSD is host data, and that SSDs do not work the way you think they do.
« Last Edit: October 08, 2020, 08:48:49 am by Wuerstchenhund »
 
The following users thanked this post: MK14

Offline Wuerstchenhund

  • Super Contributor
  • ***
  • Posts: 3088
  • Country: gb
  • Able to drop by occasionally only
Re: HDDs and badblocks corrupt downloaded files?
« Reply #92 on: October 08, 2020, 09:12:22 am »
Also, if a hard drive which was made in 2000 or later exhibits a bad block to the OS then that means it has run out of spare sectors and should have been replaced long ago.

I recovered a RAID set not long ago which had exactly that problem.  The drive reported an error during a RAID rebuild, because I stupidly did not run a scrub of the array before swapping a drive, which stopped the RAID rebuild.  The drive actually had accumulated several bad sectors but it only takes one to stop the rebuild.  The solution was to scan the RAID at the level of the file system to determine which files were damaged, and then deliberately run a read-write of the entire drive outside of the array to force reallocation of the bad sectors, albeit with corrupted data.  Once that was done, the drive reported no bad sectors during the RAID rebuild and the damaged files could be restored.

So in short, you put a knowingly defective drive in a RAID array, and when the rebuild process falls over the drive's dead sectors you fix it by manually relocating corrupt host data to another sector so the rebuild process can be fooled into completing? Seriously???  |O

At least where I am working doing such stunts would very likely get you fired for negligence and incompetence.

As I said in my last post I understand that for hobbyist use the criteria may well be different and re-using a drive that shows defects may or may not make sense, but the fact remains that this drive is defective and really should be replaced, not forced into use.

Quote
So a drive can report a bad sector to the OS, or RAID controller in this case, but I have had both recently, while not being out of spare sectors because scrub-on-read deliberately never happened, and the drive was waiting for a write to the bad sector to perform scrub-on-write.  It would be much worse for the drive to return bad data and no error while performing a scrub-on-read operation.

You're not understanding the issues here. The drive shows bad sectors at the interface because it can't recover the data in these sectors because of the large defect area on the platter. And when the drive can't recover the data (because the ECC information is affected as well) then it has to report an unrecoverable error as without correct data it makes no sense to relocate the sector.

The fact that the rebuild failed on the drive should have already been a warning that the best place for this drive would be the electronics recycling dumpster (or the shredder if the data was sensitive).

This really is RAID taken ad absurdum.
 

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 8042
  • Country: gb
Re: HDDs and badblocks corrupt downloaded files?
« Reply #93 on: October 08, 2020, 09:35:27 am »
As far as I can tell, I completely (>99% as regards reading it backwards, and at least some agreement on ECC) agree with Halcyon here. But if you read the HDD 'backwards', i.e. request the sectors of the files of interest in reverse order, it will only read the sectors that you request (although potentially extremely slowly), ignoring the bad sectors, either fully, or until a bad sector is part of the file you are trying to read.

Hence the ECC is only being calculated for the sectors you are trying to actually read, rather than a huge number of other ones, which could be bad, and hence causing the read to be aborted or badly delayed.

Yes, changing the way you request sectors from the drive may change its read-ahead strategy. That doesn't mean ECC is disabled.

Changing the order of reads is absolutely a valid tactic, one I've used, and I've never argued it isn't. But the claim is that it disables ECC, the claim is ECC doesn't even exist on some drives, and the only proof so far offered is "I'm a professional". ECC and error recovery are not the same thing.
 
The following users thanked this post: MK14

Offline Halcyon

  • Global Moderator
  • *****
  • Posts: 5870
  • Country: au
Re: HDDs and badblocks corrupt downloaded files?
« Reply #94 on: October 08, 2020, 10:08:33 am »
 :horse:
 

Offline MK14

  • Super Contributor
  • ***
  • Posts: 4853
  • Country: gb
Re: HDDs and badblocks corrupt downloaded files?
« Reply #95 on: October 08, 2020, 10:21:48 am »
the only proof so far offered is "I'm a professional". ECC and error recovery are not the same thing.

https://atola.com/products/insight/bad-sector-recovery.html

What the above link seems to be saying (quick summary), is that if there are unrecoverable ECC errors, then the HDD will simply return an error message/number, NOT any actual data (which might be garbled/corrupt).

So, as an absolute last resort, if you really MUST try and recover the data, or at least some of it. E.g. A text file.
You can read the disk using another read method, which WILL return data, even if the ECC tests/corrections fail.

That at least means you can copy over the required data, onto a good HDD drive, even if there are bad ECC error(s) present. Then later, process what you have been able to extract, and possibly recover some or all of the data.

E.g. A fire/water/etc damaged HDD, where the data is extremely valuable, possibly found at a big business, or the scene of a crime, or crashed airplane, or other high value data situation.
Or just simply someone's entire digital photograph collection, over the last 12 years, and they want to recover as many photographs as possible. Even though Windows refuses to read the faulty HDD.
 
The following users thanked this post: Monkeh

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 8042
  • Country: gb
Re: HDDs and badblocks corrupt downloaded files?
« Reply #96 on: October 08, 2020, 10:50:55 am »
the only proof so far offered is "I'm a professional". ECC and error recovery are not the same thing.

https://atola.com/products/insight/bad-sector-recovery.html

What the above link seems to be saying (quick summary), is that if there are unrecoverable ECC errors, then the HDD will simply return an error message/number, NOT any actual data (which might be garbled/corrupt).

So, as an absolute last resort, if you really MUST try and recover the data, or at least some of it. E.g. A text file.
You can read the disk using another read method, which WILL return data, even if the ECC tests/corrections fail.

That at least means you can copy over the required data, onto a good HDD drive, even if there are bad ECC error(s) present. Then later, process what you have been able to extract, and possibly recover some or all of the data.

E.g. A fire/water/etc damaged HDD, where the data is extremely valuable, possibly found at a big business, or the scene of a crime, or crashed airplane, or other high value data situation.
Or just simply someone's entire digital photograph collection, over the last 12 years, and they want to recover as many photographs as possible. Even though Windows refuses to read the faulty HDD.

Hey, some actual discourse. Refreshing.

READ LONG isn't quite as clear cut as suggested, but it does indeed exist and may be useful (.. of course, it also may be as good as random noise). It can still fail (you can ask for no retries).

Quote
The READ LONG command performs similarly to the READ SECTOR(S) command except
that it returns the data and a number of vendor specific bytes appended to the data field of the desired
sector. During a READ LONG command, the device does not check to determine if there has been a data
error. Only single sector READ LONG operations are supported.

The transfer of the vendor specific bytes shall be 16 bit transfers with the vendor specific byte in bits 7
through 0. Bits 15 through 8 shall be ignored by the host. The host shall use PIO mode 0 when using this
command.


Error recovery performed by the device either with or without retries is vendor specific.

Of course, this is from a standard superseded 22 years ago, so it's anyone's guess what this really does on a modern drive if it responds to it. What's actually contained in the (variable length) vendor bytes is, of course, vendor specific.

Still, data was written with ECC and, obviously, this command is entirely useless in normal operation.
« Last Edit: October 08, 2020, 10:58:01 am by Monkeh »
 
The following users thanked this post: MK14

Offline MK14

  • Super Contributor
  • ***
  • Posts: 4853
  • Country: gb
Re: HDDs and badblocks corrupt downloaded files?
« Reply #97 on: October 08, 2020, 11:18:35 am »
Of course, this is from a standard superseded 22 years ago, so it's anyone's guess what this really does on a modern drive if it responds to it. What's actually contained in the (variable length) vendor bytes is, of course, vendor specific.

Still, data was written with ECC and, obviously, this command is entirely useless in normal operation.

I remember in the old days using Spinrite (decades ago). Apparently it is still for sale, and there is an interesting (at least for me) video, approximately 12 minutes long, about how it works in 2012. The video explains 'bit rot', and how modern drives are susceptible to errors because their massive capacities mean each data bit occupies a tiny physical area on the disk.
It also explains how Spinrite uses different strategies to attempt to read data accurately, even when a sector has gone too bad for the ECC to recover the data via normal reads.

https://www.grc.com/sr/whatitdoes.htm   (Part of Gibson Research Corporation, which also does the ShieldsUP! open port tests, which some people might also have heard of or used).
« Last Edit: October 08, 2020, 11:22:01 am by MK14 »
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 17063
  • Country: us
  • DavidH
Re: HDDs and badblocks corrupt downloaded files?
« Reply #98 on: October 08, 2020, 11:45:43 am »
Quote
When the RAID controller receives the bad data and an error, good ones regenerate the data and write it back to the drive, which then performs a scrub-on-write.  If this did not occur, then there would be no reason for the RAID controller to perform idle time scrubbing.

Nope. What you wrote would be (mostly) correct if we were talking about MFM/RLL/ESDI or early IDE (or SCSI-1 drives) from some 30 years ago. But we're not.

Modern (i.e. made in 2000 or later) drives don't represent their physical layout to the host. They report some artificial CHS layout which has nothing to do with the actual physical layout to the host for backward compatibility reasons (so that antique systems still using CHS for addressing can boot from these drives). Even the sector size is often fake, as most modern hard drives use 4k sectors internally while reporting 512-byte sectors on the interface.

But for the most part the CHS layout isn't even used. LBA (Logical Block Addressing) has been a thing even before the year 2000 (it was first used with SCSI drives long before then), and it's been the standard way of addressing disks for many many years. With LBA, the host only sees a device which has a certain number of blocks. There's no CHS involved. LBA has been supported at least since Windows 98 and NT 4.0, and became the standard format with Windows 2000. And while LBA was an extension for IDE and ATA, LBA is the defined addressing standard for SATA, SAS and NVMe storage.

Your RAID controller, whatever type/model that is, would have to be really old to not use LBA (and if that controller is so old then I guess the disks are, too). And even then it would only see the fake geometry reported by the drive, not the real one.

The only person here bringing up pre-IDE drives is you.  What part of "scrub-on-write" do you not understand?

Quote
If a modern RAID controller encounters a bad block (unrecoverable error), it will try to reconstruct the data from the redundancy disks and then may attempt to re-write the block to the affected disk. If this disk has sufficient spare sectors, it may revert the write to a spare block, after which the block will be fine and the integrity of the data is restored. If that was a one-off, the drive may well be fine for years. However, if that happens more often (as it's the case on a dying drive), the disk will eventually run out of spare sectors after which the attempt by the RAID controller to re-write the block will fail, and then the disk is failed and the array goes into contingency mode.

So I am wrong, but you agree while contradicting what you said earlier?

Quote
Quote
It would be dumb to discard an entire drive because of a single bad read when the data can be recovered and written back to force the drive to reallocate that sector; of course firmware based RAID controllers often are this dumb.

In a professional environment, if a drive shows bad sectors at the interface it's scrapped, period. Any decent RAID controller will immediately flag a drive as soon as unrecoverable errors start to appear. Because for a modern drive unrecoverable errors are usually a clear sign that the drive is defective, and the only thing that would be stupid is to not scrap the drive and risk the integrity of the host data. It's as simple as that. Because at the end of the day the host data (and the hourly rate for the admin who has to deal with it and the potential fall-out should the drive remain in service) is worth a lot more than that stupid hard drive.

Now I accept that for hobbyist use that may well be different, and if you can't afford to replace a drive it's certainly tempting to work around the problem of a defect sector. But that is only a viable option if your data (and your time) isn't worth much (because if it was you'd not try to cheap-skate backup). And it doesn't change the fact that the drive is telling you that you can no longer rely on it.

That is a policy decision and not inherent to RAID or sector reallocation.  Why even bring it up?

Quote
Quote
The internal defect management could operate on a correctable error, but I have never seen it happen.

Well, yes, that's because it's supposed to be *transparent* to the host. Which is the whole point of a "defect-free interface".

It cannot be transparent because the visible reallocated sector count would increase.  Did you even read what I wrote?

Quote
Quote
I would have noticed if the defect list grew without errors being reported.  Many times I have done extended SMART surface scans and watched for this very thing and it never happened.  Most recently I have done it multiple times in the past couple of weeks on a pair of 1TB WD Greens, but doing an external surface scan which includes writing had some results.

Because you don't seem to fully understand what you are seeing. Any high level surface scan tool only scans the area that the drive is reporting as good (it has no access to the whole drive area - "defect-free Interface", remember?). Defects are hidden because defect management is completely *transparent*. The only way you can check what's actually going on in the drive is through SMART data.

When you're at the point where your tool can "see" defects then that means the drive has developed unrecoverable errors and is defective and should be discarded, but at the very least should not be used to store anything of importance.

Did I say "high level surface scan" or "SMART surface scan"?

And only the visible sectors need to be scanned anyway for the reason you identify.  The question was whether hard drives do scrub-on-read of bad sectors.  They do not.  And that has nothing to do with physical sectors which have been mapped out.

Quote
Quote
Quote
SSDs are a different matter as they should not have any patrol reads done on them, and less so any writes. SSDs also have internal defect management (which is part of its Garbage Collection) and unlike with hard drives this normally works without having to be triggered (which for SSDs is done with the TRIM command).

The difference is that SSD must perform verification after every write as part of the write process, and they also scrub-on-read of sectors which are still correctable if they have too many soft errors.  Many perform idle time scrubbing because retention of modern Flash memory is abysmal, or at least their specifications say they do.  I sure know that many do not and will happily suffer bit rot while powered.

There's nothing in SSDs that matches "scrubbing". Which, on flash, would be completely counterproductive because every read of a flash cell reduces the amount of charge that is stored in that cell, so if a SSD did regular "read-scrubs" then the charge level would quickly become so low that the sector would have to be re-written to maintain its information, a process which causes wear on the flash cell.

Read disturbance is observable and needs to be taken into account, but is minor compared to other factors.

Read disturbance and write disturbance also affect cells which are not being accessed, making idle time scrubbing in some form more important.

Quote
Quote
However power loss protection is also required for *any* write operation, and possibly any erase operation.  The reason for this is that interrupted writes can not only corrupt the data which is being written, but also data in other sectors, including the flash translation table, which can result in a non-recoverable situation.

There are two reasons data can be so easily corrupted with an incomplete write.  With multi-level flash, the existing data when a sector is updated is at risk during the update and it is easy to see why.  It seems to me like this is avoidable by only storing the data from one sector with multiple levels but apparently Flash chips are not organized that way.

First of all, SSDs internally are physically organized in blocks, not sectors. Remember the LBA I mentioned above? LBA is *exactly* how SSDs are structured internally. On SSDs, Sectors are an artificial construct which has zero relation to which flash cell the information is physically located.

That distinction is irrelevant for this discussion, which is not about write amplification.

Quote
Garbage Collection and Wear Levelling also don't need PLP. If power is interrupted during the deletion of a block then the block will remain marked as for deletion and deletion will be repeated after power comes back up. For Wear Levelling, when data is moved from one block to another then the data from the old block is copied to a new block and then the old block is marked for deletion. If that process is interrupted then the old data is still there and the block shift is simply repeated.

*Any* interrupted write can destroy data which is not in the current block.


Quote
Quote
The other reason is more insidious; the state machine which controls the write operation can glitch during power loss,

The part in a SSD which controls write operations (and everything else) is the SSD controller and that is not a State Machine, it's a micro-controller running specific software (the drive's firmware) to perform its duties.

Oh good, they implemented the state machine in a micro-controller so all is solved!

Quote
Quote
which is how drives which lack power loss protection got bricked so easily in those published tests several years ago.  Some drives lost no data, not even data in transit, some drives lost only data in transit, and most suffered either unrelated data loss or complete failure, but that was before the problem was understood well.

I know that early consumer SSDs killed themselves (particularly those made by OCZ) for a number of reasons, most of which were related to the lack of automatic GC (the drive needed GC triggered by the OS via TRIM, which the back then still common Windows XP didn't support) and a wide range of firmware errors.

But the point remains that the only "data in transit" in a SSD is host data, and that SSDs do not work the way you think they do.

It was not only OCZ drives which failed, but it was *every* drive which did not have power loss protection.  Of particular note is that all of the drives with SandForce controllers, which were advertised as not requiring power loss protection, failed.
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 17063
  • Country: us
  • DavidH
Re: HDDs and badblocks corrupt downloaded files?
« Reply #99 on: October 08, 2020, 11:52:28 am »
Also, if a hard drive which was made in 2000 or later exhibits a bad block to the OS then that means it has run out of spare sectors and should have been replaced long ago.

I recovered a RAID set not long ago which had exactly that problem.  The drive reported an error during a RAID rebuild, because I stupidly did not run a scrub of the array before swapping a drive, which stopped the RAID rebuild.  The drive actually had accumulated several bad sectors but it only takes one to stop the rebuild.  The solution was to scan the RAID at the level of the file system to determine which files were damaged, and then deliberately run a read-write of the entire drive outside of the array to force reallocation of the bad sectors, albeit with corrupted data.  Once that was done, the drive reported no bad sectors during the RAID rebuild and the damaged files could be restored.

So in short, you put a knowingly defective drive in a RAID array, and when the rebuild process falls over the drive's dead sectors you fix it by manually relocating corrupt host data to another sector so the rebuild process can be fooled into completing? Seriously???  |O

No, I replaced a good drive in the RAID array with a different good drive and bad sectors on one of the remaining drives stopped the rebuild.  Had I scrubbed the RAID array before doing the drive swap, then there would have been no problem.

Quote
Quote
So a drive can report a bad sector to the OS, or RAID controller in this case, but I have had both recently, while not being out of spare sectors because scrub-on-read deliberately never happened, and the drive was waiting for a write to the bad sector to perform scrub-on-write.  It would be much worse for the drive to return bad data and no error while performing a scrub-on-read operation.

You're not understanding the issues here. The drive shows bad sectors at the interface because it can't recover the data in these sectors because of the large defect area on the platter. And when the drive can't recover the data (because the ECC information is affected as well) then it has to report an unrecoverable error as without correct data it makes no sense to relocate the sector.

I agree; hard drives do not scrub-on-read.  I said that.

Quote
The fact that the rebuild failed on the drive should have already been a warning that the best place for this drive would be the electronics recycling dumpster (or the shredder if the data was sensitive).

The drive was good; see above.  The only failure was your reading comprehension.
 

