Author Topic: [solved] catastrophic event, file recovery on ext3  (Read 5805 times)

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #25 on: October 22, 2019, 12:26:51 pm »
the mistakes you made in the last month so maybe it only takes you 3 weeks re-coding :-/

Yup, written notes on paper have survived, and this helps  :D
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #26 on: October 22, 2019, 12:31:33 pm »
Quote from: I wanted a rude username
Newer files being [...] Have you tried running the more promising looking files through gzrt to see if at least the first part of them is recoverable?

yup, this way I recovered C files and assembly files. Small files, whose blocks have low fragmentation.

 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #27 on: October 22, 2019, 12:40:02 pm »
Gentoo in production? You're a braver man than I.

Gentoo in production and on experimental platforms.

The HPPA one is more stable than the MIPS4 one; anyway ... I never thought a bash shell could behave this way. The whole /etc/system/* relies on bash scripts, so if bash goes crazy after a recompile, this is really, really worrying.

Lesson learned: never trust a new bash shell before you have run ALL the tests. This stuff needs to stay in quarantine inside a sandbox until you are 100% sure it's safe.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14521
  • Country: fr
Re: catastrophic event, file recovery on ext3
« Reply #28 on: October 22, 2019, 01:18:23 pm »
You never back up?

The hard drive is mirrored, but this means the content is identical on both disks. We usually save a snapshot daily, it's automatically done before midnight, and then we burn one week of snapshots onto a DVD-RAM every Sunday afternoon; but in this case we didn't, because we are still moving our storage stuff to a new place, therefore everything was manual, and the backup was planned to be monthly.

Murphy's law rarely fails to hold true. ;D

I'm sure during the time you had daily backups, you probably hardly encountered any failure?
And now that you temporarily switched to a monthly backup, you have a fuck-up 30 days right after the last backup? That's Murphy almost to the letter! :-DD

I understand the temporary situation, but going from daily to monthly was a bit on the edge anyway...
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14521
  • Country: fr
Re: catastrophic event, file recovery on ext3
« Reply #29 on: October 22, 2019, 04:00:53 pm »
Just remembering one event... when it comes to fuck-ups...

Not too long ago, I issued a "rm -rf /boot" as root on one of my machines. :-DD
This was meant to be "rm -rf boot" (removing a local boot directory, but with root ownership), but "autopilot" plays tricks when you're used to typing '/boot' much more often than just 'boot'.

I had no recent full system backup, but I had a copy of all boot config files, so rebuilding the /boot directory was just a matter of a couple minutes, no harm done. But this is still one of those times when you are in awe for a few seconds for doing such a stupid thing...
 ;D
 

Offline sokoloff

  • Super Contributor
  • ***
  • Posts: 1799
  • Country: us
Re: catastrophic event, file recovery on ext3
« Reply #30 on: October 22, 2019, 06:24:44 pm »
I had an nfs volume mounted with no_root_squash and went to rm -rf the local copy of all home directories that I'd copied over to the nfs volume so I could re-create the mount point and mount user dirs over nfs.

Unfortunately, I picked the wrong target and rather than quickly deleting the modest size home directory of a couple local users, it began to slowly rm -rf all home dirs on the nfs volume. I gave it a few seconds to complete before the reality set in of what I'd just done. Also no backups...  |O
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #31 on: October 22, 2019, 06:50:52 pm »
I wonder how ReiserFS and XFS behave in this case.
Do they use the LAST(1) policy(2) for choosing the "free" inode on file creation?

Do they zero structures on file and folder deletion?
I will check it out.

(1) this means that if you delete a folder and do not issue "sync", its inode list is still in RAM
and will be reused for new files, which is catastrophic because everything you have just deleted is going to be overwritten

Today I have recovered a lot of files, and LAST is what I have observed for most of the data truncated/overwritten.

(2) can this policy be changed? I have to investigate
« Last Edit: October 22, 2019, 09:29:33 pm by legacy »
 

Offline I wanted a rude username

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: au
  • ... but this username is also acceptable.
Re: catastrophic event, file recovery on ext3
« Reply #32 on: October 22, 2019, 08:56:41 pm »
This all becomes moot with SSDs, anyway. We can't rely on implementation quirks to secure our data. A combination of online and offline backup methods, with good reporting by email (with a useful summary in the subject field), is the only way to be sure.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: catastrophic event, file recovery on ext3
« Reply #33 on: October 22, 2019, 09:41:23 pm »
I had an nfs volume mounted with no_root_squash and went to rm -rf the local copy of all home directories that I'd copied over to the nfs volume so I could re-create the mount point and mount user dirs over nfs.

Unfortunately, I picked the wrong target and rather than quickly deleting the modest size home directory of a couple local users, it began to slowly rm -rf all home dirs on the nfs volume. I gave it a few seconds to complete before the reality set in of what I'd just done. Also no backups...  |O

Whoops.

I once did sudo rm -rf and hit enter before I'd typed the full path on my work PC and blew away the whole bin folder which had all the system files like cp and whatnot. I ended up having to boot up from a live cd of the same distro and copy it over from that. Lesson learned.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #34 on: October 22, 2019, 09:42:16 pm »
I don't think an SSD makes any difference unless you modify the kernel driver. An SSD is managed as a random-access block device, the kernel addresses it by LBAs, and LBAs are chosen by LAST and its policy; therefore, if you format it as ext3, you get the same behavior as on an electromechanical disk.

In fact, there are two patches for SSDs, and they work at the low level (SATA controller), and they won't work on HPPA and MIPS. This is due to the nature of x86 SATA controllers used in non-x86 computers: they are "too modern" for the technology and rather "snubbed" by the industry. In fact, there are no working hw-raid cards, only "hacked" sw-raid controllers. You can imagine the ZERO support for SSD-optimized SATA controllers, which are also PCI Express only, while we are still on PCI-X and PCI64.

My team spent weeks checking and testing PCI SATA controllers, reporting and fixing bugs, and I know we'd better avoid SSDs; I do believe SSDs can have the same "shuffle" effect unless you seriously activate SSD-specific support in the kernel.


Anyway, now the point is: to stay with ext3, or to move to XFS or ReiserFS?

Backups will be planned as usual: weekly!
« Last Edit: October 22, 2019, 10:01:13 pm by legacy »
 

Offline sokoloff

  • Super Contributor
  • ***
  • Posts: 1799
  • Country: us
Re: catastrophic event, file recovery on ext3
« Reply #35 on: October 22, 2019, 10:19:26 pm »
HPPA, as in the pre-Itanium HP-PA?
That HP stopped selling about 10 years ago and dropped support for in datacenters in 2013?

In a case like that, you might want to stick with ext3 and backups.
If you're stuck on that platform, you might consider iSCSI or nfs targets hosted on a more modern platform.
I ran reiserfs for years, lost a few disks but never lost any data. When the Hans Reiser case broke and he was arrested, I figured that was the death of reiserfs as far as long term supportability went.

I switched to zfs about 2 years ago at home (first on FreeNAS, then on bare Linux). I like it and I'm happy to throw RAM at it for the security and performance (and hands-off ease, frankly) that it gives me. I ship backups locally with zfs send/receive and remotely a subset of mount points are backed to S3/glacier.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #36 on: October 22, 2019, 10:37:08 pm »
HPPA, as in the pre-Itanium HP-PA?
That HP stopped selling about 10 years ago and dropped support for in datacenters in 2013?

Yup. Even though its decommissioning was announced in 2008, it lasted until 2013.
But this stuff had HP-UX, while Linux on it has never been stable, nor supported.

So, in this combination, you like it only if you like wild stuff :D

If you're stuck on that platform, you might consider iSCSI or nfs targets hosted on a more modern platform.

Yup, iSCSI is in our plans :D


But first, we have to fix the evil Bash-shell which caused this mess.
 

Offline I wanted a rude username

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: au
  • ... but this username is also acceptable.
Re: catastrophic event, file recovery on ext3
« Reply #37 on: October 22, 2019, 11:58:41 pm »
SSD [...] if you format it ext3 you have the same behavior as on an electromechanical disk.

Not for deletion. If Linux detects that a device is an SSD it will periodically trim it (send it TRIM/UNMAP commands for the deleted ranges), causing the device itself to erase deleted extents. This is the default behaviour ... and you probably don't want to disable it, because then performance degrades (forcing erase-before-write mode).

If that disk were an SSD, you would not be finding any old deleted files on it now.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #38 on: October 23, 2019, 03:36:25 am »
Quote from: I wanted a rude username
If Linux detects that a device is an SSD it will periodically trim it (send it TRIM/UNMAP commands for the deleted ranges)

"If it detects" means the kernel driver needs specific support, and this is precisely what does not work with x86 SATA controllers on non-x86 machines(1)

This usually needs some "bios_setup" part made in the firmware, and it's critical because the Linux kernel expects the initialization of the controller already done by the BIOS, with a couple of BIOS_extension parts written in x86 and installed inside the controller into the ROM. All of these things suck on HPPA and MIPS, so SSDs are managed in "compatibility mode".

(1) perhaps, except ARM and POWER9. The last one mostly because it milks money from DARPA, hence it might have these things worked around, somehow ... but it seems they prefer AIX.

edit: when we ask to THE people with skills, the typical answer is
Quote
If you want our support, we need to talk about my paid-time
Here you can understand why there is so little support for non-x86 platforms: not because there are technical problems, but rather because hobbyists do not have enough skills to fix the stuff, while THE people with competence want money, a lot of money.

In short, because "if you are good at something, never do it for free"   :D
« Last Edit: October 23, 2019, 04:05:40 am by legacy »
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #39 on: October 23, 2019, 03:48:55 am »
This is the default behaviour ... and you probably don't want to disable it, because then performance degrades (forcing erase-before-write mode).

Yup, and it's one of the reasons why we are dealing with electromechanical hard drives. SSDs downgraded into "compatibility mode" have a very bad performance response, hence they are not worth the money, compared to electromechanical hard drives.

non-x86 workstations have a lot of limitations, especially with Linux.
 

Offline I wanted a rude username

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: au
  • ... but this username is also acceptable.
Re: catastrophic event, file recovery on ext3
« Reply #40 on: October 23, 2019, 04:07:59 am »
x86 SATA controllers on non-x86 machines

And I thought I was a masochist. True though, there just isn't the open source community support for workstations. Alphas were the same, back in the day.

Hopefully RISC-V based computers will fill the "powerful, popular, and not x86" gap in the next decade ... but they won't give us that nice big-endian flavour.
 

Offline Whales

  • Super Contributor
  • ***
  • Posts: 1900
  • Country: au
    • Halestrom
Re: catastrophic event, file recovery on ext3
« Reply #41 on: October 23, 2019, 11:01:35 am »
If Linux detects that a device is an SSD it will periodically trim it (send it TRIM/UNMAP commands for the deleted ranges),

N.B. I do not believe this is entirely true.

Linux won't auto-trim unless the filesystem is mounted with 'discard' option (and uses a filesystem that supports this feature, ext3 isn't one IIRC).  Some distro install/setup scripts may add this mount option to fstab automatically and/or schedule manual fstrim runs in userspace instead (eg via cronjob or systemd-timer-lizard thingy).

Otherwise: TRIM is a completely opt-in procedure.
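
For anyone who wants to check this on their own box, here is a minimal Python sketch (not from the thread) that scans text in /proc/mounts format and lists the mount points whose options include discard. The function name discard_mounts is made up for illustration, and treating a "discard=..." option as enabled is an assumption.

```python
def discard_mounts(mounts_text):
    """Return {mountpoint: fstype} for entries mounted with a discard option.

    `mounts_text` is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        for opt in options.split(","):
            # Exact match, so "nodiscard" is not counted as enabled.
            # Assumption: "discard=<mode>" also counts as enabled.
            if opt == "discard" or opt.startswith("discard="):
                result[mountpoint] = fstype
                break
    return result
```

Feeding it open("/proc/mounts").read() on a Linux machine shows which filesystems are mounted with online discard. An empty result means no filesystem uses it, so any trimming would have to come from a scheduled fstrim run.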
« Last Edit: October 23, 2019, 11:05:11 am by Whales »
 

Offline I wanted a rude username

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: au
  • ... but this username is also acceptable.
Re: catastrophic event, file recovery on ext3
« Reply #42 on: October 23, 2019, 11:55:48 am »
Some distro install/setup scripts may [...] schedule manual fstrim runs in userspace instead (eg via cronjob or systemd-timer-lizard thingy).

Yes ... perhaps I should have said "GNU/Linux".   ;D

Code:
$ systemctl status fstrim.timer
● fstrim.timer - Discard unused blocks once a week
   Loaded: loaded (/lib/systemd/system/fstrim.timer; enabled; vendor preset: enabled)
   Active: active (waiting) since Thu 2019-09-19 16:56:48 AEST; 1 months 3 days ago
  Trigger: Mon 2019-10-28 00:00:00 AEDT; 4 days left
     Docs: man:fstrim
 

Offline Whales

  • Super Contributor
  • ***
  • Posts: 1900
  • Country: au
    • Halestrom
Re: catastrophic event, file recovery on ext3
« Reply #43 on: October 23, 2019, 08:51:03 pm »
Code:
$ systemctl status fstrim.timer
● fstrim.timer - Discard unused blocks once a week
   Loaded: loaded (/lib/systemd/system/fstrim.timer; enabled; vendor preset: enabled)
   Active: active (waiting) since Thu 2019-09-19 16:56:48 AEST; 1 months 3 days ago
  Trigger: Mon 2019-10-28 00:00:00 AEDT; 4 days left
     Docs: man:fstrim

Thanks for the job snippet -- I would have thought they would schedule it daily, but I guess it doesn't matter too much.  A week should give you enough headroom to notice a data-loss problem and take things offline.

Quote
Yes ... perhaps I should have said "GNU/Linux".   ;D

No no, it's systemd/Linux these days  :P

Offline I wanted a rude username

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: au
  • ... but this username is also acceptable.
Re: catastrophic event, file recovery on ext3
« Reply #44 on: October 23, 2019, 09:11:33 pm »
No no, it's systemd/Linux these days  :P

This physically hurt to read. :'(
 
The following users thanked this post: Ampera

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #45 on: October 28, 2019, 12:14:24 pm »
But ... I was not so lucky to find the folder "/home/project/DT11", probably because it was overwritten by the resync. At least, the inode that describes the folder "project" is gone, but other inodes might still be there.
I believe ext3 journals writes to directory data blocks so full information about older revisions of this directory should be found in the journal.
So start with reviewing older revisions of /home inode. See what directory data blocks it points to (may be the same blocks as the current revision of /home). Find older revisions of those blocks in the journal. They will point you to your file inodes.

Of course no guarantee that file data are not overwritten.

In parallel, we cloned the hard drive for a second mac-mini/x86, which is still processing.

sort of for inode={2 to ....}, process(inode)

So, this approach is very very slow.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #46 on: October 28, 2019, 12:25:19 pm »
Two tools have been developed during this story:
  • a text viewer (similar to hexedit, but *text* and) able to open a giant file of 40GByte, processing data only if they look "printable" "strings"
  • a pattern matching searcher (similar to grep but) able to work within a region (given start address and end address regarding the seek position in the input big file, and able to) grab stuff when it sees a matching pattern, and stop grabbing when it sees a stop pattern

example
grab between 0x410FAD9A1 and 0x610FAD9A1, everything that begins with "/*" until you see '\0'

This generated some local files, most of them false positives, but this way it extracted 90% of the lost C files.
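
For readers curious what the second tool roughly does, a minimal Python sketch follows. It is not the tool described above, just an illustration under assumptions: the name carve_region, the use of mmap, and the choice to leave the stop byte out of each grabbed chunk are assumptions, not details from the post.

```python
import mmap


def carve_region(path, start, end, begin=b"/*", stop=b"\x00"):
    """Grab every chunk inside [start, end) of the file at `path` that
    starts with `begin` and runs up to (not including) the next `stop`.

    Returns a list of (offset, chunk_bytes). A chunk with no stop
    pattern before `end` is cut at `end`.
    """
    found = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        end = min(end, len(mm))  # never search past the end of the file
        pos = mm.find(begin, start, end)
        while pos != -1:
            # Look for the stop pattern after the begin pattern itself.
            stop_at = mm.find(stop, pos + len(begin), end)
            if stop_at == -1:
                stop_at = end
            found.append((pos, mm[pos:stop_at]))
            # Resume the search where the last chunk stopped.
            pos = mm.find(begin, stop_at, end)
    return found
```

With the example above, the call would look like carve_region("disk.img", 0x410FAD9A1, 0x610FAD9A1), where disk.img is a placeholder name for the 40 GByte image. Each returned chunk would then be written to its own local file and checked by hand, since most hits are expected to be false positives.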
 
The following users thanked this post: I wanted a rude username

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: catastrophic event, file recovery on ext3
« Reply #47 on: October 28, 2019, 12:45:50 pm »
Thanks, everyone, for the support. It's much more than appreciated  :D
 

Offline StillTrying

  • Super Contributor
  • ***
  • Posts: 2850
  • Country: se
  • Country: Broken Britain
Re: [solved] catastrophic event, file recovery on ext3
« Reply #48 on: October 28, 2019, 01:49:35 pm »
You're lucky there legacy, I was just about to quote your troll and log post. :)
It took me ages to decide where it was from, and the timestamps were out of order. :-//
.  That took much longer than I thought it would.
 

