Author Topic: OpenZFS Data Corruption Issue  (Read 2913 times)


Offline SiliconWizard (Topic starter)

  • Super Contributor
  • ***
  • Posts: 15111
  • Country: fr
OpenZFS Data Corruption Issue
« on: December 13, 2023, 01:34:37 am »
For those using OpenZFS (or about to) and who haven't heard of this issue yet, this could be useful: https://www.phoronix.com/news/OpenZFS-2.2.2-Released

 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4214
  • Country: gb
Re: OpenZFS Data Corruption Issue
« Reply #1 on: December 16, 2023, 05:02:38 pm »
interesting ...  :o :o :o
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline SiliconWizard (Topic starter)

  • Super Contributor
  • ***
  • Posts: 15111
  • Country: fr
Re: OpenZFS Data Corruption Issue
« Reply #2 on: December 16, 2023, 08:24:49 pm »
There was also this recent issue with possible ext4 data corruption in kernel 6.1.64 - that would affect users of kernel 6.1 LTS, such as Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057843

It's interesting to note that LTS versions of the kernel can get backported code from more recent versions, potentially breaking stuff in a nasty way. Although to be fair, it's not that common.
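
For anyone who wants a quick check of whether a machine reports the 6.1.64 release named in the bug report above, here is a minimal sketch. The function names are made up for illustration. It also assumes that, on distributions like Debian, the upstream version may show up in the `uname -v` string rather than in `uname -r`, so it scans both. It only matches the exact release named in the report and says nothing about which other releases are affected or fixed.

```python
import platform
import re

# Release named in the Debian bug report linked above.
REPORTED = (6, 1, 64)

# Matches a standalone x.y.z triple (not part of a longer dotted number).
_TRIPLE = re.compile(r"(?<![\d.])(\d+)\.(\d+)\.(\d+)(?![\d.])")


def kernel_versions(text):
    """Return every x.y.z triple found in text as a tuple of ints."""
    return [tuple(int(part) for part in m.groups())
            for m in _TRIPLE.finditer(text)]


def matches_reported(*strings, reported=REPORTED):
    """True if any of the given strings mentions the reported release."""
    return any(reported in kernel_versions(s) for s in strings)


if __name__ == "__main__":
    info = platform.uname()
    # Scan both the release and the version string (assumption: the
    # distribution puts the upstream version in at least one of them).
    print(matches_reported(info.release, info.version))
```

A `False` here is not proof of safety; the distribution's own advisory is the authoritative source.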
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4214
  • Country: gb
Re: OpenZFS Data Corruption Issue
« Reply #3 on: December 17, 2023, 12:44:27 pm »
Quote
There was also this recent issue with possible ext4 data corruption in kernel 6.1.64 - that would affect users of kernel 6.1 LTS, such as Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057843

We are in severe regression-time.
Lately many things are breaking, many more, and with greater frequency, than in the past.
That's why I am still with kernel 5.*.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4214
  • Country: gb
Re: OpenZFS Data Corruption Issue
« Reply #4 on: December 17, 2023, 12:51:05 pm »
I'm currently working on a user space RAID. It's a complicated personal project, but if Linux has problems with a filesystem { Ext4, OpenZFS, ... }, I can move everything to { Open, Net, .. }BSD without too much trouble.

In March I will have to go back to Linux v6 kernels anyway, simply because I have no other Linux support for the new development boards that will arrive.

That's life, oh  :o :o :o
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 7045
  • Country: pl
Re: OpenZFS Data Corruption Issue
« Reply #5 on: December 17, 2023, 02:06:22 pm »
Quote
It's interesting to note that LTS versions of the kernel can get backported code from more recent versions, potentially breaking stuff in a nasty way. Although to be fair, it's not that common.
Yes, they can, for example when the really bad bug is introduced by a minor bugfix.

FWIW, I have submitted a patch to stable once and it went through without questions from anyone...
But my bugfixes don't make things worse ;D
 

Offline audiotubes

  • Regular Contributor
  • *
  • Posts: 176
  • Country: cz
Re: OpenZFS Data Corruption Issue
« Reply #6 on: December 17, 2023, 02:33:31 pm »
I've been using ZFS since the Sun days, on SPARC hardware. I intentionally stay behind now on FreeBSD, still running 12.something... I knew this would happen :(
I have taken apart more gear than many people. But I have put less gear back together than most people. So there is still room for improvement.
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2247
  • Country: 00
Re: OpenZFS Data Corruption Issue
« Reply #7 on: December 17, 2023, 02:52:02 pm »
Quote
There was also this recent issue with possible ext4 data corruption in kernel 6.1.64 - that would affect users of kernel 6.1 LTS, such as Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057843

It's interesting to note that LTS versions of the kernel can get backported code from more recent versions, potentially breaking stuff in a nasty way. Although to be fair, it's not that common.

To be fair, that wasn't a bug in ext4 but a bug in the kernel (outside the ext4 module) that affected ext4.
Ext4 is still the most reliable file system.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4214
  • Country: gb
Re: OpenZFS Data Corruption Issue
« Reply #8 on: December 17, 2023, 03:14:15 pm »
Quote
There was also this recent issue with possible ext4 data corruption in kernel 6.1.64 - that would affect users of kernel 6.1 LTS, such as Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057843

It's interesting to note that LTS versions of the kernel can get backported code from more recent versions, potentially breaking stuff in a nasty way. Although to be fair, it's not that common.

To be fair, that wasn't a bug in ext4 but a bug in the kernel (outside the ext4 module) that affected ext4.
Ext4 is still the most reliable file system.

so, is xfs affected too?
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4214
  • Country: gb
Re: OpenZFS Data Corruption Issue
« Reply #9 on: December 17, 2023, 03:31:37 pm »
Quote
the really bad bug is introduced by a minor bugfix

Quote
Data corruption on software RAID 0 when discard is used
2015-05-22 - Evangelos Foutras

Recent Linux kernels (4.0.2+, LTS 3.14.41+), pushed to the [core] repository in the past couple of weeks, suffered from a bug that can cause data corruption on file systems mounted with the discard option and residing on software RAID 0 arrays. Even if discard is not specified, the fstrim command can also trigger this bug. (If you do not use software RAID 0 or the discard option, then this issue does not affect you.)

The issue has been addressed in the linux 4.0.4-2 and linux-lts 3.14.43-2 updates. Due to the nature of the bug, however, it is likely that data corruption has already occurred on systems running the aforementioned kernels. It is strongly advised to verify the integrity of affected file systems using fsck and/or restore their data from known good backups.

LOL, this reminds me of the behavior of my PPC405 nodes while I was trying to solve a bug in u-boot (firmware) that caused the PCI not to be initialized correctly, resulting in the PCI-sATA controller occasionally losing data.

In my case, it was UNsupported hw, old stuff from 2001, which no one cared/cares about, and this is the reason why several defects accumulated that no one ever wanted to test or fix.

In the case of mainstream kernels, i.e. on modern hardware platforms or in any case of great public interest ...

... I just don't understand ...  :-// :-// :-//
 

Online Halcyon

  • Global Moderator
  • *****
  • Posts: 5857
  • Country: au
Re: OpenZFS Data Corruption Issue
« Reply #10 on: December 17, 2023, 11:28:13 pm »
I think it's important to keep in mind that it's a rare set of circumstances which would cause this bug to manifest. If you're using FreeBSD (and indeed TrueNAS) as intended, you're never going to experience this issue. You can completely mitigate this issue (until a fix is released) by disabling block cloning.
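
To see whether block cloning is currently switched on under Linux, a minimal sketch along these lines could help. It assumes the OpenZFS module exposes the tunable as `zfs_bclone_enabled` under `/sys/module/zfs/parameters/` (on FreeBSD the equivalent would be a sysctl, which this sketch does not read). The function name is made up for illustration, and it only reads the value, it does not change anything.

```python
from pathlib import Path

# Assumed location of the tunable on Linux; adjust if your build differs.
LINUX_BCLONE_PARAM = Path("/sys/module/zfs/parameters/zfs_bclone_enabled")


def block_cloning_enabled(param=LINUX_BCLONE_PARAM):
    """Return True/False for the tunable, or None if it cannot be read
    (module not loaded, parameter absent, or not a Linux system)."""
    try:
        raw = Path(param).read_text().strip()
    except OSError:
        return None
    # The parameter file is expected to hold "0" (off) or a non-zero value (on).
    return raw != "0"


if __name__ == "__main__":
    state = block_cloning_enabled()
    if state is None:
        print("zfs_bclone_enabled not found; cannot tell")
    else:
        print("block cloning enabled" if state else "block cloning disabled")
```

Returning `None` instead of guessing keeps "not readable" separate from "disabled", which matters on systems where the module is not loaded.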
 

Offline SiliconWizard (Topic starter)

  • Super Contributor
  • ***
  • Posts: 15111
  • Country: fr
Re: OpenZFS Data Corruption Issue
« Reply #11 on: December 17, 2023, 11:38:43 pm »
Quote
There was also this recent issue with possible ext4 data corruption in kernel 6.1.64 - that would affect users of kernel 6.1 LTS, such as Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057843

We are in severe regression-time.
Lately many things are breaking, many more, and with greater frequency, than in the past.
That's why I am still with kernel 5.*.

I don't know if many more things are breaking. Do you have figures to back this up?
That said, the more complex the kernel gets and the more people contribute, the higher the probability of something breaking. That's to be expected.
But broken stuff in the Linux kernel gets fixed very quickly (so far). So as long as you can afford that (I probably wouldn't for anything like servers, but for workstations it's fine for me), I personally prefer being on the latest kernel version and updating very regularly. I always look, at least in summary, at the list of changes before updating though, and sometimes skip a version. But I prefer having access to the latest fixes immediately if they are there, rather than being stuck on an older version. Each use case is different though.

Speaking of the future of the Linux kernel, I've listened to one of the latest Linus talks, and found the attitude of Linus a bit odd. Something appears to be changing. I don't know, like - he almost sounds like he doesn't care as much anymore.
If anyone else has noticed it, would be curious to discuss that. Is he preparing to "retire"?
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 7045
  • Country: pl
Re: OpenZFS Data Corruption Issue
« Reply #12 on: December 18, 2023, 12:09:54 am »
It's a complex function of the problem's severity, the number of affected users, their resources, publicity and so on.

If you encounter an obscure bug in something that either works for everybody else or no one else is known to use, you are sort of on your own. I mean, you will get some hints, but God forbid you don't know C :P Even if you come with a working patch, it won't quite have the same priority as the thousand other things that millions of others are loudly complaining about.

Not 100% sure, but I seem to recall that the recent deprecation of ReiserFS was directly motivated by somebody almost screwing it up (or having to spend time not to screw it up) and deciding to look for excuses to kill it instead.
« Last Edit: December 18, 2023, 12:16:00 am by magic »
 

