we're talking about "reliability".
That should probably exclude RAID0 and RAID1, then. They provide zero protection against silent corruption; in fact, since RAID1 keeps two copies and serves reads from either mirror without comparing them, it roughly doubles the probability of encountering silent corruption.
Traditionally in Linux, with all types of software RAID, smartmontools is used to track the storage devices' own error logs and statistics. Background scrubbing, which is necessary to detect data degradation, is controlled by the system administrator, by writing check or repair to the md/sync_action sysfs pseudo-file; see the "Scrubbing and mismatches" section in man 4 md.
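As a minimal sketch (assuming the array is md0, and run as root):

echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt

The first command starts a read-only scrub, the second shows its progress, and the third reports the number of mismatched sectors found once the scrub completes.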
I'm sure you can see how it is quite logical for this kind of stack (per the Unix philosophy!) that the topmost filesystem (ext2/3/4) does not do any file integrity checking either: each layer relies on the layer below it functioning correctly, and each layer has its own tools and policies for detecting problems.
It is good that we do not all agree, though, and that there are competing filesystems and approaches that do include file integrity checks.
Knowing the real-world probabilities, I'm not really interested in having those on my own workstations (even though I do like to use RAID-0 and RAID-1), because there I prefer the higher throughput over integrity checks; but I might choose differently for certain servers and appliances.
Do note it is not a matter of not caring about integrity: it is about having other means to achieve sufficient practical probabilities.
(You [plural!] may have noticed that I often rail against programmers who assume syscalls succeed, and who ignore "rare" errors because they feel those are too rare to care about. I do want my tools to report any errors the kernel or hardware reports, but I still do not expect the tools to be perfect. What really bothers me is having the information about an error available and ignoring it; not that a tool is imperfect and may sometimes garble my precious data.)
Currently, with so few Intel/AMD desktop and laptop processors supporting ECC RAM, I do believe silent RAM bit-flips may in practice occur more often than storage-level corruption, for example during file copies. Since the data in RAM is then not protected by any checksum or error-correcting code, nothing can detect the corruption either, and e.g. on RAID-1, both copies will inherit the changed data without any error. (Very few file copy utilities actually read back the copied file to verify that the original and the new data match.)
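As a minimal sketch of what such read-back verification could look like (the file names here are just placeholders):

cp -a precious.dat /backup/precious.dat
cmp precious.dat /backup/precious.dat && echo "copy verified"

Do note that cmp may well be served from the page cache rather than the physical media, so this catches in-flight corruption but not necessarily a bad write that only exists on the platters.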
This is one reason why I like to use tar archives for backing up my important files: it adds a layer of checksum verification (tar checksums each file header, and a compression layer like gzip or xz adds a CRC over the entire stream). (Yes, you could achieve something similar by using e.g. ZFS or another filesystem with file integrity checks for the backups. I find tar archives suitable for my needs, that's all.)
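As a sketch of the kind of verification I mean (the archive name is just a placeholder; the options are GNU tar's):

gzip -t backup.tar.gz && echo "stream CRC OK"
tar --compare --gzip --file=backup.tar.gz

The first command verifies the gzip CRC-32 over the entire compressed stream; the second compares the archive members against the current files on disk and reports any differences.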
If I ever suspect any kind of silent file corruption, be it from hardware or from software (like a kernel driver), the following bash-find stanza can be very useful:
find root -type f -printf '%s %T+ %p\0' | while read -r -d "" size modified path ; do csum="$(sha256sum -b "$path")" ; csum="${csum%% *}" ; printf '%s %12s %s %s\n' "$modified" "$size" "$csum" "$path" ; done
which generates a listing of all files under root, with their last modification time (as YYYY-MM-DD+hh:mm:ss.nnnnnnnnn in local time, which sorts correctly), size in bytes, SHA-256 checksum, and full path. (If there are multiple file owners/groups or access modes, they're easily added to the find print pattern, the while read ... list, and the final printf output, as shown below.)
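For example, a sketch of adding the owner, group, and octal access mode (using GNU find's %u, %g, and %#m directives):

find root -type f -printf '%s %T+ %u %g %#m %p\0' | while read -r -d "" size modified user group mode path ; do csum="$(sha256sum -b "$path")" ; csum="${csum%% *}" ; printf '%s %12s %s %s %s %s %s\n' "$modified" "$size" "$user" "$group" "$mode" "$csum" "$path" ; done

(This assumes user and group names contain no spaces, which holds on practically all systems.)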
You could also use a SHA-512 checksum (both sha256sum and sha512sum are part of GNU coreutils, and thus available on all Linux distributions), but the output then becomes too wide for my liking. If you have files with strange file names, I recommend you reset the locale first (export LANG=C LC_ALL=C), and use NUL (\0) instead of newline as the record separator. You can then use tr '\0' '\n' < filename | less to view the listing.
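In the stanza above, only the final printf needs to change for NUL-separated output records, here redirected to a hypothetical files.list:

find root -type f -printf '%s %T+ %p\0' | while read -r -d "" size modified path ; do csum="$(sha256sum -b "$path")" ; csum="${csum%% *}" ; printf '%s %12s %s %s\0' "$modified" "$size" "$csum" "$path" ; done > files.list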
Redirecting or tee'ing that to a file lets one easily verify the files later, using either diff -Nabur or similar, or a simple awk scriptlet (with size, checksum, and modification timestamp kept in separate arrays keyed by the path, reporting only when conflicting information is read). If you use NUL instead of newline as the record separator, you can start with the following gawk/mawk snippet:
awk -v RS='\0' '{ modified=$1; size=$2; csum=$3; path=$0; sub(/^[^ ]+ +[^ ]+ +[^ ]+ */, "", path); if (path in fdate) { if (fdate[path] == modified && fcsum[path] != csum) printf "CORRUPTED? %s\n", path } else { fdate[path] = modified; fcsum[path] = csum; fsize[path] = size } }' files...
which correctly extracts the path part even when it contains spaces.
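To actually compare two listings, one can save that program into a file (say, a hypothetical compare.awk) and feed it the old listing first, so that its values populate the arrays, and the new listing second, to be checked against them:

awk -v RS='\0' -f compare.awk old.list new.list

Note the logic: a changed checksum with an unchanged modification time is exactly the signature of silent corruption, because legitimate edits also update the mtime.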
The combination of scanning and comparison can then easily be wrapped in a script that one triggers from crontab or similar, running only when the machine and storage devices are otherwise idle (e.g. nice -n 19 ionice -c 3 script...).
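A hypothetical crontab entry for such a script (the name and schedule are just placeholders):

# Weekly scan on Sunday at 03:15, at lowest CPU and I/O priority:
15 3 * * 0  nice -n 19 ionice -c 3 /usr/local/sbin/verify-files.sh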
DiTBho has a good point in that this is a completely different approach compared to verifying and reporting errors on a per-file basis as soon as they are noticed. One reason I personally prefer this opposite/offline method is that many current programs don't handle those error reports well, aborting and/or producing garbage: having a logically separate scrubber, or method of verification, lets those programs keep working, while still telling me whenever corruption or problems have occurred. It is not optimal, but given the current tools at hand, no optimal approach exists.
On server-class hardware, I do prefer to use proper hardware RAID (RAID-6, for example) and ECC RAM, so that the hardware layer does the monitoring for me. (My own data, hobby projects and such, don't currently warrant the cost, that's all.) There, too, it is important to ensure the hardware reports are monitored and any issues are quickly reviewed by a human; just having the hardware do the monitoring is not sufficient.