Nope. What you wrote would be (mostly) correct if we were talking about MFM/RLL/ESDI or early IDE (or SCSI-1 drives) from some 30 years ago. But we're not.
Modern (i.e. made in 2000 or later) drives don't expose their physical layout to the host. They report an artificial CHS layout which has nothing to do with the actual physical layout, purely for backward compatibility (so that antique systems that still use CHS addressing can boot from these drives). Even the sector size is often fake, as most modern hard drives use 4K sectors internally while reporting 512-byte sectors on the interface.
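A quick way to see this "512e" behaviour on a Linux box is to compare the logical and physical block sizes the kernel exposes in sysfs. A minimal sketch (the device name "sda" is just an example):

```python
# Minimal sketch (Linux only): compare the logical sector size the drive
# reports on the interface with the physical sector size it uses internally.
from pathlib import Path

def sector_sizes(dev: str = "sda") -> tuple[int, int]:
    base = Path("/sys/block") / dev / "queue"
    logical = int((base / "logical_block_size").read_text())
    physical = int((base / "physical_block_size").read_text())
    return logical, physical

if __name__ == "__main__":
    logical, physical = sector_sizes("sda")
    # A typical "512e" drive reports logical=512 while physical=4096.
    print(f"logical={logical} physical={physical}")
```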
But for the most part the CHS layout isn't even used. LBA (Logical Block Addressing) has been a thing since before the year 2000 (it was first used with SCSI drives long before then), and it's been the standard way of addressing disks for many, many years. With LBA, the host only sees a device with a certain number of blocks; there's no CHS involved. LBA has been supported at least since Windows 98 and NT 4.0, and became the default with Windows 2000. And while LBA was an extension for IDE/ATA, it is the defined addressing standard for SATA, SAS and NVMe storage.
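To make the "just a flat block number" point concrete, here is a minimal sketch of the classic CHS-to-LBA translation. The 16-head/63-sector geometry used here is the legacy fake geometry, not anything physical:

```python
# Classic CHS -> LBA translation: LBA is nothing more than a flat block number.
HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63

def chs_to_lba(c: int, h: int, s: int) -> int:
    # Sectors are 1-based in CHS, hence the (s - 1).
    return (c * HEADS_PER_CYLINDER + h) * SECTORS_PER_TRACK + (s - 1)

# CHS (0, 0, 1) is LBA 0, the very first block on the device.
assert chs_to_lba(0, 0, 1) == 0
print(chs_to_lba(1023, 15, 63))  # last block addressable with this 1024/16/63 geometry
```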
Your RAID controller, whatever type/model that is, would have to be really old to not use LBA (and if that controller is so old, then I guess the disks are, too). And even then it would only see the fake geometry reported by the drive, not the real one.
The only person here bringing up pre-IDE drives is you.
Yes, because that's the *only* generation for which what you state could even be relevant.
Your general understanding of this topic also appears to be rooted in that timeframe.
What part of "scrub-on-write" do you not understand?
What part of "it doesn't work the way you think you do" do you not understand?
If a modern RAID controller encounters a bad block (unrecoverable error), it will try to reconstruct the data from the redundancy disks and then attempt to re-write the block to the affected disk. If that disk has sufficient spare sectors, it will redirect the write to a spare sector, after which the block reads fine and the integrity of the data is restored. If that was a one-off, the drive may well be fine for years. However, if that happens more often (as is the case on a dying drive), the disk will eventually run out of spare sectors, after which the RAID controller's attempt to re-write the block will fail; the disk is then failed and the array goes into degraded mode.
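In pseudocode terms, the controller-side logic described above looks roughly like the sketch below. All of the names (read_block, rebuild_from_redundancy, mark_failed) are hypothetical placeholders, not a real controller API:

```python
# Rough sketch of scrub-on-write handling from the RAID controller's side.
class UnrecoverableError(Exception):
    """The drive could not read or write the requested block."""

def scrub_block(array, disk, lba):
    try:
        return disk.read_block(lba)                      # normal case: drive returns data
    except UnrecoverableError:
        data = array.rebuild_from_redundancy(disk, lba)  # parity/mirror rebuild
        try:
            disk.write_block(lba, data)   # drive's defect management remaps to a spare sector
        except UnrecoverableError:
            array.mark_failed(disk)       # no spares left: fail the drive, run degraded
        return data
```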
So I am wrong, but you agree while contradicting what you said earlier?
Nope. You just don't seem to understand that the checks and activities a RAID controller performs are at a different level than the internal defect management of a hard drive.
Handling of media defects is up to the drive's internal defect management, which (again) is *transparent* to the host. When it's not (i.e. the drive encounters an unrecoverable error), that means the drive is telling you it is on its way out. When that happens on a disk that is part of a redundant RAID configuration (i.e. anything except RAID0 or JBOD), the controller, in an attempt to maintain the integrity of the array, will reconstruct the data from its redundancy elements and send the block to the drive for writing (where the drive's defect management will put it in a spare physical sector, if there are any left). If that fails (because there are no spare sectors left), the drive is flagged for replacement.
Now what you seem to believe is that the RAID controller's handling of defective sectors is synonymous with the hard drive's internal defect management. But that is not the case. The RAID controller has zero control over how the drive handles defects; all it can do is ask the drive to write a certain block, and whatever the RAID controller does, it does only to maintain the integrity of the RAID array (it is not there to handle drive defects, which, again, is what the drive itself is responsible for).
It would be dumb to discard an entire drive because of a single bad read when the data can be recovered and written back to force the drive to reallocate that sector; of course firmware-based RAID controllers often are this dumb.
In a professional environment, if a drive shows bad sectors at the interface it's scrapped, period. Any decent RAID controller will immediately flag a drive as soon as unrecoverable errors start to appear, because on a modern drive unrecoverable errors are usually a clear sign that the drive is defective, and the only thing that would be stupid is not scrapping the drive and risking the integrity of the host data. It's as simple as that. At the end of the day the host data (plus the hourly rate for the admin who has to deal with it, and the potential fallout should the drive remain in service) is worth a lot more than that stupid hard drive.
Now I accept that for hobbyist use that may well be different, and if you can't afford to replace a drive it's certainly tempting to work around the problem of a defective sector. But that is only a viable option if your data (and your time) isn't worth much (because if it were, you wouldn't try to cheap out on backups). And it doesn't change the fact that the drive is telling you that you can no longer rely on it.
That is a policy decision and not inherent to RAID or sector reallocation. Why even bring it up?
Because it's *not* a "policy decision"; these are established procedures based on how storage systems actually work.
A drive that shows defects on the interface is no longer reliable and should be discarded. It's not rocket science to understand why this is established practice.
The internal defect management could operate on a correctable error, but I have never seen it happen.
Well, yes, that's because it's supposed to be *transparent* to the host. Which is the whole point of a "defect-free interface".
It cannot be transparent because the visible reallocated sector count would increase.
"Transparent to the interface" means that drive defects are hidden to the host (i.e. the drive appears with zero defects).
The *only* exception are SMART data, because SMART data is supposed to show a drive's health status.
"Re-allocated Sectors" indicate how many times the drive encountered a defective sector, was able to reconstruct the data and moved the data to a spare sector.
"Recoverable Errors" are soft errors where the drive encountered an error in a sector but was able to restore the data and successfully rewrite the sector (no re-allocation required).
Both are "correctable errors"
"Unrecoverable errors" are when the drive encounters bad data and is incapable of restoring the original data.
I would have noticed if the defect list grew without errors being reported. Many times I have done extended SMART surface scans and watched for this very thing and it never happened. Most recently I have done it multiple times in the past couple of weeks on a pair of 1TB WD Greens, but doing an external surface scan which includes writing had some results.
Because you don't seem to fully understand what you are seeing. Any high-level surface scan tool only scans the area that the drive reports as good (it has no access to the whole drive surface - "defect-free interface", remember?). Defects are hidden because defect management is completely *transparent*. The only way you can check what's actually going on in the drive is through SMART data.
When you're at the point where your tool can "see" defects, the drive has developed unrecoverable errors, is defective, and should be discarded; at the very least it should not be used to store anything of importance.
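For illustration, a host-side "surface scan" is nothing more than reading every LBA the drive chooses to expose and seeing whether the read succeeds. A minimal read-only sketch (Linux, run as root; the device name is just an example):

```python
# Minimal sketch (READ-ONLY): scan the LBA range the drive exposes.
# Remapped sectors and reserve areas are invisible to the host.
import os

def surface_scan(device: str = "/dev/sdb", chunk: int = 1024 * 1024) -> list[int]:
    bad_offsets = []
    fd = os.open(device, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)       # total size as reported by the drive
        offset = 0
        while offset < size:
            try:
                os.pread(fd, min(chunk, size - offset), offset)
            except OSError:
                bad_offsets.append(offset)        # unrecoverable error surfaced to the host
            offset += chunk
    finally:
        os.close(fd)
    return bad_offsets
```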
Did I say "high level surface scan" or "SMART surface scan"?
It doesn't matter, as the SMART Short Test and the SMART Extended Test both only read the user area (not the reserved area that contains the spare blocks, nor any other surface area, e.g. the firmware areas some drives have). The only difference is where the results go (to the host PC on a regular surface scan, to the disk controller on a SMART test).
On top of that, both the SMART Short Test and the Long Test are generally read-only, and they are so because on a healthy drive there shouldn't be any unrecoverable errors. Also, sector re-allocation normally only works for the user area; if there's a defect in the firmware area which can't be corrected by ECC, the drive is toast.
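For reference, this is how those self-tests are typically driven from the host, in a minimal sketch (assuming smartmontools installed, run as root). The test runs inside the drive and the result is read back from its self-test log:

```python
# Minimal sketch: start a SMART extended self-test and read the self-test log.
import subprocess

def start_extended_test(device: str = "/dev/sda") -> None:
    subprocess.run(["smartctl", "-t", "long", device], check=True)

def selftest_log(device: str = "/dev/sda") -> str:
    return subprocess.run(["smartctl", "-l", "selftest", device],
                          capture_output=True, text=True, check=True).stdout
```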
With modern drives, there is no way to perform a low-level format or even a low-level surface scan. Simple as that.
SSDs are a different matter, as they should not have any patrol reads done on them, let alone any writes. SSDs also have internal defect management (which is part of their Garbage Collection) and, unlike with hard drives, this normally runs without having to be triggered by a failed access (the host's contribution is the TRIM command, which tells the drive which blocks are no longer in use).
The difference is that SSDs must perform verification after every write as part of the write process, and they also scrub-on-read sectors which are still correctable but have too many soft errors. Many perform idle-time scrubbing because retention of modern flash memory is abysmal, or at least their specifications say they do. I sure know that many do not and will happily suffer bit rot while powered.
There's nothing in SSDs that matches "scrubbing", which on flash would be completely counterproductive: every read of a flash cell reduces the amount of charge stored in that cell, so if an SSD did regular "read-scrubs" the charge level would quickly become so low that the sector would have to be re-written to maintain its information, a process which causes wear on the flash cell.
Read disturbance is observable and needs to be taken into account, but is minor compared to other factors.
Correct, although the effects become more pronounced the smaller the flash structures and the more levels a cell has to store.
Read disturbance and write disturbance also affect cells which are not being accessed, making idle time scrubbing in some form more important.
No, it doesn't (Read Disturbance is captured by ECC). Also, read cycles are monitored, as is cell aging, and Wear Leveling moves data to other blocks when ECC detects an error or after a certain number of P/E or read cycles, which alleviates the issue completely.
No patrol reads required.
However, power loss protection is also required for *any* write operation, and possibly any erase operation. The reason for this is that interrupted writes can corrupt not only the data which is being written, but also data in other sectors, including the flash translation table, which can result in a non-recoverable situation.
There are two reasons data can be so easily corrupted by an incomplete write. With multi-level flash, the existing data in a sector is at risk while the sector is being updated, and it is easy to see why. It seems to me like this would be avoidable by only storing data from one sector across the multiple levels, but apparently flash chips are not organized that way.
First of all, SSDs internally are physically organized in blocks, not sectors. Remember the LBA I mentioned above? LBA is *exactly* how SSDs are structured internally. On SSDs, sectors are an artificial construct with zero relation to which flash cells the information is physically located in.
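A toy illustration of that point: the flash translation layer maps each logical block (what the host calls a sector/LBA) to whatever physical flash page happens to be free, so the same LBA can live anywhere on the media. This is a deliberately simplified sketch, not how any real controller is implemented:

```python
# Toy flash translation layer: logical block number -> physical page number.
class ToyFTL:
    def __init__(self):
        self.mapping = {}          # logical block -> physical page
        self.pages = {}            # physical page -> data
        self.next_free_page = 0

    def write(self, lba: int, data: bytes) -> None:
        # Flash can't be overwritten in place: every write goes to a fresh
        # page and the mapping is updated; the old page becomes garbage.
        self.pages[self.next_free_page] = data
        self.mapping[lba] = self.next_free_page
        self.next_free_page += 1

ftl = ToyFTL()
ftl.write(100, b"v1")
ftl.write(100, b"v2")              # same LBA, different physical page
print(ftl.mapping[100])            # -> 1, not 0
```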
That distinction is irrelevant for this discussion, which is not about write amplification.
It's relevant because there are no sectors in an SSD (there are blocks), while you keep talking about sectors, which have no bearing on where the data is physically located.
Garbage Collection and Wear Levelling also don't need PLP. If power is interrupted during the deletion of a block, the block will remain marked for deletion and the deletion will be repeated after power comes back up. For Wear Levelling, when data is moved from one block to another, the data from the old block is copied to a new block and only then is the old block marked for deletion. If that process is interrupted, the old data is still there and the block move is simply repeated.
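The ordering is what makes this safe. Sketched with hypothetical helper methods (not a real FTL API):

```python
# Copy first, remap second, erase last: a power cut at any point leaves
# at least one intact copy of the data.
def relocate_block(ftl, lba: int) -> None:
    old_page = ftl.mapping[lba]
    new_page = ftl.allocate_free_page()          # hypothetical helper
    ftl.program(new_page, ftl.read(old_page))    # 1. copy data to the new page
    ftl.mapping[lba] = new_page                  # 2. switch the mapping
    ftl.mark_for_erase(old_page)                 # 3. only now retire the old copy
    # Power loss after step 1 or 2: the old copy is still there, so the
    # relocation is simply repeated after power comes back.
```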
*Any* interrupted write can destroy data which is not in the current block.
No, it can't. Because the original block is erased only after the write process in the new block has completed. If that process is interrupted then the original data is still there.
The other reason is more insidious; the state machine which controls the write operation can glitch during power loss,
The part in an SSD which controls write operations (and everything else) is the SSD controller, and that is not a State Machine; it's a micro-controller running specific software (the drive's firmware) to perform its duties.
Oh good, they implemented the state machine in a micro-controller so all is solved!
I'm not sure you really understand the concept of "State Machine".
The flash controller in a typical SSD is not much different from the main processor of the host it is connected to. Many SSD controllers are ARM-based.
It was not only OCZ drives which failed, but it was *every* drive which did not have power loss protection. Of particular note is that all of the drives with Sandvine controllers, which were advertised as not requiring power loss protection, failed.
I really would like to see some evidence for that claim. Because it sounds bogus to me. Of course SSDs without PLP can lose data but they certainly shouldn't fail.
Besides, I don't know of any "Sandvine" SSD controllers (I know SandForce controllers, part of LSI), but maybe I just missed it.