Author Topic: DOS vs. Linux  (Read 24795 times)


Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6260
  • Country: fi
    • My home page and email address
Re: DOS vs. Linux
« Reply #100 on: December 05, 2020, 11:31:00 pm »
But is there something that already does all that?
:-//

Normal users should probably just use
    tar -cSJf $(date +%Y%m%d)-$(id -un).tar.xz -C ~/ .
to take a backup of their home directory and all its contents.  Because of the compression, it will be slow.  If you want it fast, omit compression (and sparse file detection):
    tar -cf $(date +%Y%m%d)-$(id -un).tar -C ~/ .
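Restoring such an archive is the mirror operation; a minimal sketch, assuming the naming scheme above (the date is an example), extracting into an empty directory rather than straight over the live home directory:
Code: [Select]
mkdir -p ~/restored
tar -xJf 20201206-$(id -un).tar.xz -C ~/restored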

Quote
Copy on write snapshots solve this.
Are these the same as the Windows volume shadow service?
Don't know, don't use windows.

But you can indeed continue using LVM volumes after the snapshot without affecting the snapshot, as long as the storage device has sufficient space for the changes after the snapshot.

Quote
BTW, is it really that slow to populate an empty filesystem with files from a tar archive, compared to restoring a low level image? On Linux?
I would think there is a difference since you couldn't really do faster than blatting sequential sectors onto a disk,
You obviously haven't used Linux and tar, and are basing your assumptions on how Windows behaves.  That is... unworkable.

On my system, tarballing Supertux2 sources (various types of files, binaries, images, lots of sources; about 8500 files, 1.1G of data) takes 4.2 seconds to tarball and 4.1 seconds to extract, with cold caches, if using uncompressed archives.  Of course, I'm using a fast Samsung 512G SSD (MZVLW512HMJP-000H1) for the storage.  (Note: I do mean cold caches, clearing out both metadata and cached file contents before and after compression and before and after decompression, using sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches ; sync ; echo 3 > /proc/sys/vm/drop_caches'.)
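To make the procedure concrete, a minimal sketch of such a cold-cache measurement (the source and scratch paths are just examples):
Code: [Select]
#!/bin/bash
SRC=~/src/supertux2    # example source tree
drop_caches() { sudo sh -c 'sync ; echo 3 > /proc/sys/vm/drop_caches ; sync ; echo 3 > /proc/sys/vm/drop_caches' ; }

drop_caches
time tar -cf /tmp/test.tar -C "$SRC" .

mkdir -p /tmp/extracted
drop_caches
time tar -xf /tmp/test.tar -C /tmp/extracted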

tar is pretty good at extracting files in a continuous manner, so it does not really "jump around".  The metadata updates are cached by the kernel page cache, and won't immediately hit storage.  Because of filesystem structure, you won't get raw block device speeds, but the difference is minor for the typical filesystems (ext4, xfs, zfs). And, if you use any sort of compression, the compression and decompression will typically be the speed bottleneck.  If the backup is on slow media, like a DVD-RW or similar, or comes via a non-local network connection, you'll want to use compression, because the data transfer rate will be the bottleneck, and less data means less time.  (You can safely pipe tar data through an SSH pipe, both ways.  Done that too for such backups.)
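A minimal sketch of that SSH-pipe idea (backuphost and the remote path are placeholders):
Code: [Select]
# Back up the home directory to a remote machine without a local temporary file:
tar -cf - -C ~/ . | ssh backuphost 'cat > backups/home.tar'
# Restore it into an empty local directory:
mkdir -p ~/restored
ssh backuphost 'cat backups/home.tar' | tar -xf - -C ~/restored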

I don't think that Linux has anything like this.  For large servers, it seems like a requirement.
I just explained LVM does that and more.  If you are saying that Linux needs GUI tools so that monkeys can use the LVM utilities, then ... :palm: ... Let's just say that everyone who is tasked to manage actual servers should know how to use and automate LVM.  It isn't complicated at all, if you are familiar with Linux admin tools.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: DOS vs. Linux
« Reply #101 on: December 05, 2020, 11:31:19 pm »
Often I find myself in front of systems on which installing additional packages is a problem, so making the most use of a basic system is an advantage.

And this is why everyone needs to know vi (not even vim) at least at a basic level :-(  (try "esc esc :q!", folks)

Quote
People that just want to try out such commands or need some functions on windows might want to check out cygwin.

NOOOO PLEEASE NOOOOO.

Cygwin is just endless pain.

And absolutely zero point to it now that you have WSL, which with WSL2 even performs really well as long as you stick to the Linux filesystem and don't want to access USB devices (this is the suckiest part .. WSL1 works for that)
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: DOS vs. Linux
« Reply #102 on: December 05, 2020, 11:45:05 pm »
Normal users should probably just use
    tar -cSJf $(date +%Y%m%d)-$(id -un).tar.xz -C ~/ .
to take a backup of their home directory and all its contents.  Because of the compression, it will be slow.  If you want it fast, omit compression (and sparse file detection):
    tar -cf $(date +%Y%m%d)-$(id -un).tar -C ~/ .

Try lz4 if you want a decent bit of compression and speed as well. Not as good compression as gzip or xz or bzip2 obviously, but a heck of a lot faster, and plenty enough to compress the blank sectors and text files.
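For example, piping tar through lz4 by hand looks roughly like this (archive and directory names are arbitrary):
Code: [Select]
tar -cSf - -C ~/ . | lz4 > backup.tar.lz4
mkdir -p ~/restored && lz4 -dc backup.tar.lz4 | tar -xf - -C ~/restored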

Quote
On my system, tarballing Supertux2 sources (various types of files, binaries, images, lots of sources; about 8500 files, 1.1G of data) takes 4.2 seconds to tarball and 4.1 seconds to extract, with cold caches, if using uncompressed archives.  Of course, I'm using a fast Samsung 512G SSD (MZVLW512HMJP-000H1) for the storage.  (Note: I do mean cold caches, clearing out both metadata and cached file contents before and after compression and before and after decompression, using sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches ; sync ; echo 3 > /proc/sys/vm/drop_caches'.)

tar is pretty good at extracting files in continuous manner, so it does not really "jump around".  The metadata updates are cached by the kernel page cache, and won't immediately hit storage.  Because of filesystem structure, you won't get raw block device speeds, but the difference is minor for the typical filesystems (ext4, xfs, zfs). And, if you use any sort of compression, the compression and decompression will typically be the speed bottleneck.

Try it with lz4 :-)
 

Offline PlainName

  • Super Contributor
  • ***
  • Posts: 6843
  • Country: va
Re: DOS vs. Linux
« Reply #103 on: December 05, 2020, 11:49:45 pm »
Quote
VSS is quite a package but I doubt that many single-user systems are running it.

AFAIK, if you're running Windows you have VSS available. Users, per se, don't touch it - it's the apps that get involved.

However, before VSS was widely available, some backup apps implemented their own version. Indeed, the wonderful and long-lived DriveSnapshot still allows you to choose either VSS or their proprietary solution.
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 6779
  • Country: pl
Re: DOS vs. Linux
« Reply #104 on: December 05, 2020, 11:55:33 pm »
VSS is quite a package but I doubt that many single-user systems are running it.  I'm not sure how they can limit the shadow copy creation to 10 seconds but they must have a scheme.
The relevant propadocumentation is incredibly confusing. There is no way they can back up an NTFS volume of any serious size in 10 seconds. I suppose it means that it takes less than 10 seconds to create a COW snapshot on NTFS and then the rest of the process churns in the background while I/O is unfrozen and applications can continue.

I don't think that Linux has anything like this.  For large servers, it seems like a requirement.
Filesystem snapshots.
What Linux doesn't have is a scheme to notify applications of the snapshot being taken so they can flush their buffers, but if your applications can't recover from an atomic snapshot then they can't recover from a hard reboot either, so ::)
 

Offline PlainName

  • Super Contributor
  • ***
  • Posts: 6843
  • Country: va
Re: DOS vs. Linux
« Reply #105 on: December 06, 2020, 12:08:12 am »
Quote
Normal users should probably just use
    tar -cSJf $(date +%Y%m%d)-$(id -un).tar.xz -C ~/ .
to take a backup of their home directory and all its contents.

I have different sorts of backups for explicit data, but what I am specifically interested in implementing (well, having some app implement) is efficient disaster recovery. Whilst I accept that this method gets your data back, it's  not quick and it's just your data. It's not really suitable for bare metal restore.

Quote
Quote from: dunkemhigh on Today at 10:43:45

    Quote

        BTW, is it really that slow to populate an empty filesystem with files from a tar archive, compared to restoring a low level image? On Linux?

    I would think there is a difference since you couldn't really do faster than blatting sequential sectors onto a disk,

You obviously haven't used Linux and tar, and are basing your assumptions on how Windows behaves.  That is... unworkable.

I am not a current Linux user, no. But I am basing my assumptions not just on Windows experience. (For future reference, I have previously built Linux, and the necessary cross-compiler and build tools, from sources, and additionally written Linux device drivers. I am not completely without Linux experience, it's just that I don't use it now and haven't for quite a while.)

Quote
On my system, tarballing Supertux2 sources (various types of files, binaries, images, lots of sources; about 8500 files, 1.1G of data) takes 4.2 seconds to tarball and 4.1 seconds to extract, with cold caches, if using uncompressed archives.  Of course, I'm using a fast Samsung 512G SSD (MZVLW512HMJP-000H1) for the storage.  (Note: I do mean cold caches, clearing out both metadata and cached file contents before and after compression and before and after decompression, using sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches ; sync ; echo 3 > /proc/sys/vm/drop_caches'.)

Well that's jolly impressive, but I have to say it is somewhat missing the point. To achieve that 4.1 seconds of extraction you first need to install and setup your Linux system, no? You could use a live CD perhaps, but you'd still need to do a fair amount of messing about to get to the point where you can start streaming your backed-up system onto the disk.

The ideal would be to insert boot media, optionally browse for the restore source (could be on the network or in a removable drive slot), click restore. Come back with a coffee and your system is exactly as it was before whatever disaster struck.

This isn't just me being lazy - every option or necessary thought during the process is an opportunity to screw up, and usually one is quite highly stressed at this time, so simpler and easier is better.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: DOS vs. Linux
« Reply #106 on: December 06, 2020, 12:13:21 am »
I don't have Supertux2 sources but I just tried it on an old llvm source tree of 988M I had lying around, using your cache clearing incantation. "find llvm | wc -l" gives 27402.

  time   size (bytes)  command
5:38.5   711417024     tar -cSJf llvm.tar llvm
0:26.5   763861086     tar -cSzf llvm.tar llvm
0:04.8   794145179     tar -cSf - llvm | lz4 >llvm.tar
0:03.7   981084160     tar -cSf llvm.tar llvm

And with hot caches (I won't bother with xz):

  time   command
0:25.9   tar -cSzf llvm.tar llvm
0:02.1   tar -cSf - llvm | lz4 >llvm.tar
0:00.7   tar -cSf llvm.tar llvm

So lz4 saves 180 MB (19%) while adding only a little over one second to the time.

gzip only saves 30 MB more, while adding more than an extra 20 seconds.

xz saves another 50 MB over gzip, but is soooo slow that it's only worth considering for very slow transmission channels, or large numbers of downloaders.

(tests on 4.2 GHz 2990WX Zen 1+, which is not all that quick these days)
« Last Edit: December 06, 2020, 12:18:05 am by brucehoult »
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: DOS vs. Linux
« Reply #107 on: December 06, 2020, 01:59:14 am »
Quote
Use
    man -s 1 -k term
Thanks. It doesn't work on my Mac (which has a 2005 version of "apropos") :-(  (but it does work on Ubuntu, so it's still useful advice.)
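A workaround on systems stuck with an older apropos is to filter its output by section instead; a sketch:
Code: [Select]
apropos term | grep '(1)'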
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6260
  • Country: fi
    • My home page and email address
Re: DOS vs. Linux
« Reply #108 on: December 06, 2020, 02:12:53 pm »
Normal users should probably just use
    tar -cSJf $(date +%Y%m%d)-$(id -un).tar.xz -C ~/ .
to take a backup of their home directory and all its contents.  Because of the compression, it will be slow.  If you want it fast, omit compression (and sparse file detection):
    tar -cf $(date +%Y%m%d)-$(id -un).tar -C ~/ .

Try lz4 if you want a decent bit of compression and speed as well. Not as good compression as gzip or xz or bzip2 obviously, but a heck of a lot faster, and plenty enough to compress the blank sectors and text files.
Good idea; pity it hasn't been integrated yet into tar, and is not included in default installations.  (I had to install liblz4-tools to get the command-line utilities.)
    tar -cf archive.tar.lz4 -I 'lz4 -1' -C directory .
takes four seconds to compress, with compressed size about 80% of original (thus saving about 20% of space, with this kind of mixed binary/text content).
(At higher compression levels, it takes 4.2 seconds with -2 and 30 seconds with -3, with only a modest increase in compression ratio; -9, the maximum, took 37 seconds, still with only a modest increase.)
As expected (because of LZ4 compression scheme), the decompression takes less than four seconds regardless of compression.

Thus, it is a very good idea to use lz4 compression instead of no compression at all. Just remember to install the necessary package first (from the standard repository of your Linux distribution; package liblz4-tools in Debian derivatives). Because lz4 defaults to speed, to back up one's home directory one would then use
    tar -cf $(date +%Y%m%d)-$(id -un).tar.lz4 -I lz4 -C ~/ .
and to restore (after creating and moving to the home directory)
    tar -xf path-to-archive.tar.lz4 -I unlz4
and as shown, it will be blazingly fast.

Note that if you only want to extract a specific file or subdirectory, just list their relative paths (starting with ./) in the restore command.  I personally like to create an index,
    tar -tvf path-to-archive.tar.lz4 -I unlz4 > path-to-archive.list
so that one can use grep, less, more, or any text editor to look for specific files, and especially their timestamps, to see if a particular file is included in the archive.
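For example, to check the index for a file and then restore just that file (the path here is purely hypothetical):
Code: [Select]
grep report path-to-archive.list
tar -xf path-to-archive.tar.lz4 -I unlz4 ./Documents/report.txt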

Do note that if you will be storing the backups on particularly slow media, consider using stronger compression (option 'z' for gzip, 'j' for bzip2, 'J' for xz): the media's read rate will be the bottleneck, so higher compression means less data to transfer, and thus a faster overall wall-clock restore.

Finally, if you have ample disk space, want the backup snapshot to take as little of your time as possible, but also want to compress it, you can create the tarball first, and then compress it afterwards (optionally niced down so compression is done only when the processor is otherwise idle enough), using
    archive=$(date +%Y%m%d)-$(id -un).tar
    tar -cf "$archive" -C ~/ .
after which you can continue using the files, and start compressing the archive using e.g.
    nice xz "$archive"


Let's say you are a GUI user, and you decide that the Backups directory in your home directory will contain only backup files (and will itself not be included in backups), you only need the following script:
Code: [Select]
#!/bin/bash

# Directory under home directory containing the backups
BACKUPDIR="Backups"

# Subtree of home directory to be backed up.  Use "." for entire home directory tree.
SUBTREE="."

# Archive base name (no extension)
ARCHIVE="$(date +%Y%m%d-%H%M%S)-$(id -un)"

# Compressor command and its extension to be used.  You can leave these empty.
COMPRESSOR="xz"
COMPRESSEXT=".xz"

# Quiet silly error messages from standard error
exec 2>/dev/null

# Make sure home directory exists and can be entered
cd || exit 1

# If it does not exist, create the Backups directory
mkdir -m 0700 -p ~/"$BACKUPDIR"
if ! cd ~/"$BACKUPDIR" ; then
    zenity --title "Backup" --error --no-wrap --text 'Cannot create "Backups" directory in your home directory!'
    exit 1
fi

# Tell the user to close all applications (so that the snapshot will be coherent):
zenity --title "Backup" --question --ok-label "Continue" --cancel-label="Cancel" --no-wrap --text $"Please close all applications, so that\nthe snapshot will be coherent." || exit

# Show a pulsating progress bar while creating the backup.
if ! ERRORS=$(exec 3>&1 ; tar -cf "$ARCHIVE.tar" --exclude "./$BACKUPDIR" -C ~/. "$SUBTREE" 2>&3 | zenity --title "Backup" --progress --pulsate --auto-close --no-cancel --text $"Creating backup $ARCHIVE ..." 2>/dev/null) ; then
    rm -f "$ARCHIVE.tar"
    zenity --title "Backup" --error --no-wrap --text $"Backup failed:\n$ERRORS"
    exit 1
fi

if [ -n "$COMPRESSOR" ]; then
    # Show a pulsating progress bar while compressing the backup.
    if ! ERRORS=$(exec 3>&1 ; $COMPRESSOR "$ARCHIVE.tar" 2>&3 | zenity --title "Backup" --progress --pulsate --auto-close --no-cancel --text $"Backup completed.\nFeel free to run applications.\nNow compressing the backup ..." 2>/dev/null) ; then
        rm -f "$ARCHIVE.tar$COMPRESSEXT"
        zenity --title "Backup" --error --no-wrap --text $"Backup failed.\nUncompressed backup $ARCHIVE.tar still exists.\n$ERRORS"
        exit 1
    fi
else
    COMPRESSEXT=""
fi

# Display an info dialog showing completion.
zenity --title "Backup" --info --no-wrap --text $"Backup $ARCHIVE.tar$COMPRESSEXT successfully generated."
exit 0
If you save it temporarily as, say, script, you can "install" it to your desktop by running
    install -m 0744 script ~/Desktop/Backup
It uses Zenity to provide GUI dialogs.

If we include the time it takes to mount and unmount a filesystem image, I do believe a script similar to above, using tar, will be less effort and less wall-clock time used.  (Note that closing all applications is not necessary; it is just there to remind users that if they take a backup while applications are modifying files, the backup may not have a coherent copy of the file.)

With a bit of effort, one could easily modify the above script (or convert it to Python3 + Qt5, for example), to detect if removable storage with a backup-y volume name is mounted, and offer to copy the compressed backup there; plus options to adjust compression, exclude directories, combine them into profiles, and so on.

The problem is, we'd really need to investigate first what kind of practices and use patterns would make this utility not only useful, but something that provides value to the user.  Most ordinary Linux users never take backups.  Advanced users, who are more likely to take backups, use their own scripts.  So, the true problem at hand is not the code, but the usability of such a tool.  Sure, it would be easy to optimize it for one specific user – say, myself – and publish it in the hopes that others might find it useful, but when we are talking about non-expert Linux users, such a tool should encourage better practices.  For example, it should be able to show the exact commands being run to the end user, so that if the user finds the tool not wieldy enough, they can adapt the commands to write their own scripts.

As always, this boils down to observing workflows and designing a better workflow, before implementing the tools.  I really, really hate the way proprietary applications and OSes impose a specific workflow on the user, and make them believe that the tool should describe the workflow to the user; whereas it is always the user who is wielding the tool, and responsible for the results.  Sure, some users just want something to click that should do an adequate job; but that is not what I want to facilitate.  If you are using Linux to accomplish things, you should be trying to find ways of accomplishing those things efficiently.  If not, you're better off using one of the proprietary OSes instead, as they do have a pattern for you to fit into.  I don't wanna, and I don't want to bring that sort of stuff to Linux either.  Others are free to do so, though.
 
The following users thanked this post: PlainName, DiTBho

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: DOS vs. Linux
« Reply #109 on: December 06, 2020, 02:26:27 pm »
There is also a LZ4-Bindings for Python, see here (homepage) and here (github)

p.s.
Python is great, for scripting I am also going to learn Ruby, so I hope in 2021 I will know enough about 3 scripting-languages, in this order:
  • bash
  • python
  • ruby
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6260
  • Country: fi
    • My home page and email address
Re: DOS vs. Linux
« Reply #110 on: December 06, 2020, 02:30:50 pm »
I have different sorts of backups for explicit data, but what I am specifically interested in implementing (well, having some app implement) is efficient disaster recovery. Whilst I accept that this method gets your data back, it's  not quick and it's just your data. It's not really suitable for bare metal restore.

What is the disaster you wish to recover from, and how do you use your machine overall?

Me, I don't want to do bare metal backups on my workstation, because I know I can install any Linux distribution in twenty-thirty minutes, and because of the rate of development, I know I'll want to switch distributions in the future.  Usually, I switch distributions only when I have multiple machines at hand, so I can try out various ones before deciding on which one.  Then, I can restore either my complete user profile, or only my files, in seconds.

If you need a High Availability workstation, you're better off having two, syncing the active one to the backup one.  That way you minimize your downtime.

If you do things that might mess up your storage device, but not the hardware in particular, use an external hard drive with a tarball or image of your operating system, and a separate partition for your user file tarballs.  (The example script I showed earlier can be trivially expanded to use Zenity to show a dialog of which backup(s) to remove, if the partition is say 75% full or more.  Or you can automate the entire thing, running it from crontab and rotating the backups.)
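As a rough sketch of that crontab idea (the schedule, script path, and retention count are all assumptions):
Code: [Select]
# Crontab entry (crontab -e), e.g. every Sunday at 03:00; the script path is hypothetical:
#   0 3 * * 0  $HOME/bin/home-backup.sh
# Inside that script, after creating the new archive, keep only the five newest:
cd ~/Backups && ls -1t *.tar* | tail -n +6 | xargs -r rm -f --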

You must understand that I am not denigrating your needs.  Quite the opposite, I applaud everyone who thinks about backups and how they utilize them.  It is good.

What I am concerned about is that you are fixated on a backup workflow that works for you in Windows, and are insisting on something similar being used in Linux, using really odd'ish arguments for why.  I am saying that that pattern is not workable, unless you implement it yourself, or buy an implementation off someone.
(The facilities already exist: LVM does cover your needs.  What you would need to buy, is a high level, easy graphical interface.  I know server farms have them, but haven't used them myself; and I and others have written command-line versions for our own servers.  I am not aware of any GUIs intended for workstation use, but they might exist.)

Instead, you should think about what you need out of those backups.  Instead of focusing on full filesystem imaging, if the 20-30 minute downtime to install a fresh version of your Linux distro is too much for you, then make a clean OS image without any user files, and a separate partition for user file tarballs.  Trying to get full filesystem imaging to work for you in Linux like it does in Windows is doomed to failure; the OSes are that different.  In particular, a full filesystem image will always contain all users' home directories.  Perhaps you are the only user on your machine, but that is not true for all Linux workstations.
« Last Edit: December 06, 2020, 02:38:14 pm by Nominal Animal »
 

Offline magic

  • Super Contributor
  • ***
  • Posts: 6779
  • Country: pl
Re: DOS vs. Linux
« Reply #111 on: December 06, 2020, 02:42:11 pm »
I was wondering if there are tools like xfs_copy for other filesystems and found that somebody wrote a program which can efficiently low-level copy a whole bunch of various filesystems.

https://partclone.org/

It uses some weird format by default, but its disk-to-disk mode (-b) can be abused to create mountable sparse file images :-+
 
The following users thanked this post: DiTBho

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6260
  • Country: fi
    • My home page and email address
Re: DOS vs. Linux
« Reply #112 on: December 06, 2020, 05:11:45 pm »
After re-reading my own posts, I see my tone can be interpreted as hostile.  Not my intention; I may be a bit frustrated, but not hostile/angry/peeved.  Once again, me fail Eglish.
To fix that, I shall try to outline how LVM2-based snapshotting and backups work, from a user's perspective.

LVM2 is the Linux Logical Volume Manager.  (The 2 at the end is usually omitted, since the first version of LVM is ancient, and no longer used.)
It comprises a group of userspace tools, and a library that interfaces to the Linux kernel Device Mapper, which does the actual work.

You have one or more hardware storage devices, with at least one partition on each.
If you are using software RAID, the md (multiple device) kernel driver will use those partitions, and export the RAID devices (/dev/mdN) as the partitions actually used.
LVM combines one or more of these partitions into one or more physical volumes. (Run sudo pvdisplay to see current physical volumes.)

One or more physical volumes – across hardware devices – are combined into a Volume Group.  You can have one or more Volume Groups.  (Run sudo vgdisplay to see these.  You may need to run sudo vgscan first, to scan all known LVM devices for Volume Groups, especially if you just attached external media.  It can take a bit of time.)

Each Volume Group contains one or more Logical Volumes.  Each logical volume is seen as a partition, and contains a filesystem or swap space, is used as a raw partition for some kind of experiments, or is unused. (Run sudo lvdisplay to see these.)  Logical Volumes can be resized within a Volume Group.

Do not use all space in a Volume Group for Logical Volumes.  Snapshots use up some space in the Volume Group.

When Logical Volumes are mounted, the device path (as shown in e.g. df -h or du -hs or mount) will show /dev/mapper/VolumeGroup-LogicalVolume as the device.

To take a snapshot of a Logical Volume, you create a new logical volume using lvcreate (8):
    sudo lvcreate -s VolumeGroup/LogicalVolume -L size -n NewLogicalVolume
This is atomic, and takes an exact snapshot of that Logical Volume at that point, and is okay to do even when it is mounted and active.  You'll want to ensure that applications are not writing to files in that Logical Volume right then, or the files will obviously be in an intermediate state.  (There are tricks on how to check if any writes have occurred after the snapshot, most being heuristic in the sense that they can give false positives, but not false negatives; quite reliable.)

Then, you can either mount the new Logical Volume (/dev/mapper/VolumeGroup-NewLogicalVolume, but with any dashes in NewLogicalVolume doubled; there is also copy-on-write /dev/mapper/VolumeGroup-NewLogicalVolume-cow device) and copy/compress/use its contents at your leisure, or save the image.

For storing the image on an external storage device, I recommend creating a partition the exact same size as the Logical Volume.  Since the partition will be a normal filesystem, it is usually auto-mounted when connected.  To update the backup, you can use the volume label to find the device corresponding to the partition:
    label="foo" ; user=$(id -un) ; dev=$(LANG=C LC_ALL=C mount | awk '$2 == "on" && $3 == "'"/media/$user/$label"'" { print $1 }')
If current user has media with volume label foo mounted, its corresponding device will be in shell variable dev; otherwise the shell variable will be empty.
Then, unmount the device using
    umount $dev
and copy the partition image over using e.g.
    sudo dd if=/dev/mapper/VolumeGroup-NewLogicalVolume of="$dev" conv=fdatasync,nocreat bs=1048576
Note that the device name is inserted by the current shell to the sudo command.
Finally, remove the no longer needed Logical Volume,
    sudo lvremove -y VolumeGroup/NewLogicalVolume
and the snapshot is done.
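Put together, the snapshot-to-external-partition steps above could be scripted roughly as follows (a sketch only: VolumeGroup, LogicalVolume, the 4G snapshot reserve, and the volume label foo are all placeholders):
Code: [Select]
#!/bin/bash
set -e
label="foo"
user=$(id -un)

# Atomic snapshot of the Logical Volume; reserve 4G for changes made meanwhile.
sudo lvcreate -s VolumeGroup/LogicalVolume -L 4G -n BackupSnap

# Find the device of the mounted backup partition by its volume label.
dev=$(LANG=C LC_ALL=C mount | awk -v mp="/media/$user/$label" '$2 == "on" && $3 == mp { print $1 }')
if [ -z "$dev" ]; then
    echo "Backup media '$label' is not mounted." >&2
    sudo lvremove -y VolumeGroup/BackupSnap
    exit 1
fi

# Unmount the backup partition and copy the snapshot image onto it.
umount "$dev"
sudo dd if=/dev/mapper/VolumeGroup-BackupSnap of="$dev" conv=fdatasync,nocreat bs=1048576 status=progress

# Drop the no longer needed snapshot.
sudo lvremove -y VolumeGroup/BackupSnap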

To restore an existing image, find the device corresponding to the partition (again as $dev).  Then, unmount the Logical Volume,
    sudo umount /dev/mapper/VolumeGroup-LogicalVolume
restore the image,
    sudo dd if="$dev" of=/dev/mapper/VolumeGroup-LogicalVolume oconv=fdatasync,nocreat bs=1048576
and remount the Logical Volume,
    sudo mount /dev/mapper/VolumeGroup-LogicalVolume

That's the short version.

As you can surely see, there are many different ways of automating this.  A simple udev rule and a shell script could be used to prompt you, when external backup media is attached, whether you want to back up the current Logical Volumes to that media.  For restoring, you'll want a command-line command (it's useful to add status=progress to dd commands in that case), since you might be in an emergency command line, and in any case you'll want to unmount the Logical Volume (making GUI use, uh, difficult – workarounds exist, of course) for the restoration.
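For what it is worth, a hedged sketch of such a udev hook (the volume label, rule file name, and script path are all assumptions; whatever RUN+= launches must return quickly, so it should only notify the user or set a flag, not run the backup itself):
Code: [Select]
# /etc/udev/rules.d/99-backup-media.rules (hypothetical)
ACTION=="add", SUBSYSTEM=="block", ENV{ID_FS_LABEL}=="backupdisk", RUN+="/usr/local/bin/backup-offer.sh"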

As to why I don't think this is a workable pattern for workstations: in my opinion, workstations are volatile and modular.  A monolithic backup is too rigid.

If you partition your system so that /home is a separate Logical Volume, then image-based backups become viable.  However, even then, I believe that instead of dd'ing the image itself, we should use tar (with or without compression, possibly niced and/or ioniced) to compress the filesystem contents to the external media.  That way, the external media can contain multiple backups.  With an index file (text file describing the file list inside the tar archive), one can trivially restore individual files or subdirectories, which can come in handy.

If the OS is on a separate Logical Volume or Volumes, then taking a snapshot after a clean install makes a lot of sense.  When tar'ing the user files, you can also save the currently installed package list, output of
    LANG=C LC_ALL=C dpkg -l | awk '$1 == "ii" { print $2, $3 }' > packages.list
so you can quickly upgrade an older image to newer by installing the difference in the packages – or if you get some sort of conflicts after installing a package, and purging the package doesn't fix it, you can compare incidental changes in the packages/versions.
Let's say packages.list contains the desired package list.  Then, running
    LANG=C LC_ALL=C dpkg -l | awk -v src=packages.list 'BEGIN { while (getline) { if ($1=="ii") pkg[$2] = $3 } while (getline < src) { if ($1 in pkg) { if (pkg[$1] != $2) printf "Update: %s from %s to %s\n", $1, pkg[$1], $2 } else { printf "Install: %s %s\n", $1, $2 } } }'
gives you a summary of which packages need updating/downgrading and which packages need installing.
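If you then want to act on that summary, a hedged follow-up would be to save the output of the command above to a file (summary.list here is an assumed name) and feed the "Install:" entries to apt:
Code: [Select]
awk '$1 == "Install:" { print $2 }' summary.list | xargs -r sudo apt-get install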
This, of course, could be automated in the backup user interface.

A simple GUI utility would make a lot of assumptions on how the system is partitioned, that this is essentially a single-human-user system, and so on.  Not a problem if you write your own GUI, but a problem when considering more general use backup tool.
« Last Edit: December 06, 2020, 05:18:00 pm by Nominal Animal »
 

Online golden_labels

  • Super Contributor
  • ***
  • Posts: 1209
  • Country: pl
Re: DOS vs. Linux
« Reply #113 on: December 06, 2020, 08:10:30 pm »
Try lz4 if you want a decent bit of compression and speed as well.
There is also zstd. libarchive supports zstd since 3.4.3, and GNU tar has zstd support since 1.31.
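With a new enough GNU tar, the zstd equivalent of the earlier home-directory backup would look roughly like this (a sketch; the naming scheme is borrowed from the earlier posts, and ~/restored is an arbitrary target):
Code: [Select]
tar --zstd -cf $(date +%Y%m%d)-$(id -un).tar.zst -C ~/ .
mkdir -p ~/restored && tar --zstd -xf path-to-archive.tar.zst -C ~/restored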

Python is great, for scripting I am also going to learn Ruby, so I hope in 2021 I will know enough about 3 scripting-languages, in this order:
Worth noting that they are not equivalent in terms of what they can be used for. E.g. bash is a shell and while it has great scripting capabilities, don’t get carried away while using it. Just because you can doesn’t mean you should. :D

People create amazing pieces of art like bashtop, but just like a 3D shooter written in gawk they should remain in the domain of art. ;)
People imagine AI as T1000. What we got so far is glorified T9.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6260
  • Country: fi
    • My home page and email address
Re: DOS vs. Linux
« Reply #114 on: December 06, 2020, 09:49:12 pm »
Try lz4 if you want a decent bit of compression and speed as well.
There is also zstd. libarchive version suports zstd since 3.4.3, GNU tar has zstd since 1.31.
I'm still running tar 1.29, and didn't think of checking the upstream man 1 tar manpage. Oops... :-[

Worth noting that [those scripting languages] are not equivalent in terms of what they can be used for. E.g. bash is a shell and while it has great scripting capabilities, don’t get carried away while using it. Just because you can doesn’t mean you should. :D
Fully agreed.

For the topic at hand, here are my typical use cases for different scripting languages:
  • bash: Shell scripts, automation.
    In rare cases, I use zenity to provide prompts or progress bars.  More commonly, just notify-send to inform of task completion.
  • python3: Graphical user interfaces, format conversion.
    Python has very nice built-in libraries to handle e.g. CSV format, plus lots of modules for e.g. spreadsheet formats.
  • awk: Text-format data collation and summarizing.
    Awk is designed for record and field-based data, and has fast associative arrays, and the syntax is relatively simple.
    A good example of awk snippets I use is the snippet a couple of messages earlier that summarizes changes needed from currently installed Debian packages to a stored package list.
    I also have a script named c written in bash and awk, for quick numerical calculations on the command line; like bc, but simpler.
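(A hedged sketch of the idea behind that c script – not the actual script – as a one-line shell function:)
Code: [Select]
c() { awk "BEGIN { print ($*) }" ; }
# usage, quoting so the shell does not glob or split the expression:
c '(4.7e3 / 3) * 2'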
For server-side scripts to be used on web pages, I've used both PHP and Python.  I've written more stuff in PHP over the decades, but I do like Python more.
I have also written security-sensitive web scripting stuff in C, but only when required for proper privilege separation.



There are two patterns I'd like to see used more when doing shell scripting.

First is automagically removed temporary work directories:
Code: [Select]
#!/bin/bash
Work="$(mktemp -d)" || exit 1
trap "cd /tmp ; rm -rf '$Work'" EXIT

# Put temporary files in "$Work"
The exit trap means the temporary directory is removed whenever the script exits, even if interrupted by Ctrl+C.  The path to the work directory, $Work, is evaluated when the trap is set, so even if you change "$Work" afterwards, the original temporary directory will be removed.  The temporary directory will be under /tmp, or under whatever directory is set in environment variable $TMPDIR.  The reason for cd /tmp is that if the working directory is in the temporary directory, deleting the tree could fail: changing to any directory known to be outside works here, and since the working directory is irrelevant when the script exits, it is completely safe to do here.  I like to use cd /tmp to ease my OCD/paranoia; it's a relatively harmless place to run rm -rf in.

The second is passing paths safely from a find operation.
To execute a command for each file in the current directory and any subdirectories:
Code: [Select]
find . -type f -exec command '{}' ';'
The single quotes ensure the special parameters, {} for the file name, and ; for the end of the command, are passed as parameters to find, and not interpreted by the shell as special tokens.
To run a command allowing multiple file names per command:
Code: [Select]
find . -type f -print0 | xargs -r0 command
To pass the file names to Awk, use
Code: [Select]
find . -type f -print0 | awk 'BEGIN { RS="\0" ; FS="\0" } { path=$0; /* use path */ }'
To pass the file names to a Bash while loop, use
Code: [Select]
IFS=$'\0'
find . -type f -print0 | while read -rd $'\0' filepath ; do
    # use "$filepath"
done
or, if you don't mind running the while loop in a subshell,
Code: [Select]
find . -type f -print0 | ( IFS=$'\0' ; while read -rd $'\0' filepath ; do
    # use "$filepath"
done )
The latter is preferred, since IFS is the Bash variable that controls word splitting.  Running the while loop in a subshell means word splitting in the parent shell is unaffected.

All these use ASCII NUL (zero byte) as the separator, and work without issues with all possible file names in Linux.
In Linux, file and directory names are opaque byte strings where only NUL ('\0', zero) and / ('/', 47) are not allowed.  You can have a file named newline, for example: touch $'\n' in Bash.

Many new script-writers assume whitespace does not occur in filenames.  That is fixed by putting the file name reference in double quotes.  But what about file names that have a newline or some other control character in them, including ASCII BEL?  Many Perl utilities fail with those, for basically no reason, causing all sorts of annoyances when you encounter filenames with odd characters in them.

I do like to additionally use export LANG=C LC_ALL=C in my shell scripts parsing the output of utility commands, because the locale can affect their output.  The C locale is the untranslated, raw locale.

To pass the file names with last modification date and time, and size in bytes, to a Bash script, I use
Code: [Select]
find . -type f -printf '%TY-%Tm-%Td %TT %s %p\0' | ( IFS=$'\0' ; while read -rd $'\0' filepath ; do
    filedate="${filepath%% *}" ; filepath="${filepath#* }"
    filetime="${filepath%% *}" ; filepath="${filepath#* }"
    filesize="${filepath%% *}" ; filepath="${filepath#* }"
    # use $filedate, $filetime, $filesize, and "$filepath"
done )
To an awk script,
Code: [Select]
find . -type f -printf '%TY-%Tm-%Td %TT %s %p\0' |  awk 'BEGIN { RS="\0" ; FS=" " }
      { filedate=$1 ; filetime=$2 ; filesize=$3 ; filepath=$0 ; sub(/^[^ ]* [^ ]* [^ ]* /, "", filepath);
        /* use filedate, filetime, filesize, and filepath */
      }'

Note: I do not expect a beginner to know the above.  I only wish they were more easily found, amid all the crap, when a beginner decides they want to do something like the above.  The only "trick" here is in the last awk snippet, using a regular expression to remove the other fields from the path using the entire record ($0 in awk), but using the standard field separation for the normal fields.  And perhaps the Bash and POSIX shell syntax for manipulating strings (${variable#pattern}, ${variable##pattern}, ${variable%pattern}, and ${variable%%pattern}).
For lack of a better place to put these examples, I posted them here.
« Last Edit: December 06, 2020, 09:53:42 pm by Nominal Animal »
 

Offline PlainName

  • Super Contributor
  • ***
  • Posts: 6843
  • Country: va
Re: DOS vs. Linux
« Reply #115 on: December 06, 2020, 10:28:09 pm »
I have different sorts of backups for explicit data, but what I am specifically interested in implementing (well, having some app implement) is efficient disaster recovery. Whilst I accept that this method gets your data back, it's  not quick and it's just your data. It's not really suitable for bare metal restore.

What is the disaster you wish to recover from, and how do you use your machine overall?

Good question! And the answer is... I don't know. There is the obvious catastrophic disk loss (sadly, now more likely without warning than it used to be). However, a few times recently I've elected to recover from some situation (crap install of something, perhaps) by just restoring the system drive. Also a few times recently, recovering single files that I've accidentally, er, mislaid (in fact, I did this yesterday).

I think tailoring to a specific type of disaster might well be asking for trouble. A disk image covers pretty much any restore requirement, from 'oops, didn't meant to press that key' through to infrastructure flattening. Worst case, where even replacement hardware isn't available, booting it into a VM is pretty simple.

Note that my backups are disk images as a file. I've done the tapes and disk clones and all that stuff. Wouldn't consider any of them appropriate.

Not sure what you mean about using the machine overall. It is my primary machine so I do most things on it.

Quote
Me, I don't want to do bare metal backups on my workstation, because I know I can install any Linux distribution in twenty-thirty minutes, and because of the rate of development, I know I'll want to switch distributions in the future.

I don't want to do any restore at all either! Nevertheless, in 20-30 minutes I can be recovered from a disaster (accepting the hopefully lesser disaster of missing the data since the last backup) and be on my way again. You've essentially got to a position where you can start dropping your data onto your new OS - I accept that it would take me longer to go that route (which is one reason I wouldn't consider it on Windows), but there is a deeper thing too: I have maybe 200-300 apps installed and each is likely set up in a way that suits me.

When I get a new machine I tend to start again from scratch: install a new copy of the OS, install each app only as I need it. Nine months to a year down the line I am still probably dicking around setting up things the way I want, and referring to the backup of the old system to recover various settings and data. You can see the attraction of slapping in a disk and coming back a half-hour later to a system that doesn't need any of that.

Quote
If you need a High Availability workstation, you're better off having two, syncing the active one to the backup one.

That's a thought, but not appropriate - it's basically RAID on a machine level, so I'd still need backups.

Quote
You must understand that I am not denigrating your needs.  Quite the opposite, I applaud everyone who thinks about backups and how they utilize them.

I understand that and appreciate the effort.

Quote
you are fixated on a backup workflow that works for you in Windows, and are insisting on something similar being used in Linux

Yes, entirely likely. However, it's something that's evolved over many years, if not decades, and which I've found to work reliably. Since no-one using Linux seems to do the same, there is clearly something I am missing. Nevertheless, I note that the typical last resort backup option is similar to what I do - look in the RPi forum, for instance, and backing up a system there is basically making a copy of the SDcard.
 

Offline PlainName

  • Super Contributor
  • ***
  • Posts: 6843
  • Country: va
Re: DOS vs. Linux
« Reply #116 on: December 06, 2020, 10:39:00 pm »
Quote
you are fixated on a backup workflow that works for you in Windows

Actually, yes, that is a big influence. Essentially because you have to do an image if you want to backup the system drive and end up with a bootable recovery (file-and-folder isn't good enough). So in that sense, Windows demands an image, at least for that drive.

However, I think that just points to the most recoverable technique. Linux may not have such a requirement, but if all else fails then a disk image is as good as you could ever get. Starting from that point gets you to wherever you need to go, whereas starting from a more 'reasonable' point might not.

Another way to look at it is that an image is an offline (in the disk sense) RAID copy. You're just syncing the array periodically instead of in real time.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: DOS vs. Linux
« Reply #117 on: December 07, 2020, 01:48:44 am »
php

I am learning PHP mostly because it can be invoked by web servers like Apache; I find that PHP is a very powerful language that can also be used for scripting, but I haven't yet understood how to properly debug sources.

How do you manage it?

Ruby also looks interesting and powerful, and seems easier to debug. But I am not sure about that; it's just my first feeling about them, based on some simple tutorials.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6260
  • Country: fi
    • My home page and email address
Re: DOS vs. Linux
« Reply #118 on: December 07, 2020, 02:38:51 am »
How do you [debug PHP]?
You do not.  (No, I'm not kidding.)

PHP is a very dangerous language.  It used to be even more dangerous, what with Magic Quotes being enabled by default (but fortunately completely removed in 5.4.0, released in 2012).

You write the code in small sections, unit testing each part.  Then you incrementally combine the parts, redoing the unit tests, and also testing the combined code.  I use highly aggressive test inputs (containing embedded zeros, quotes, newlines of different formats, invalid UTF-8 sequences) for all form-processing code; basically stuff that not even skript kiddies have thought of yet.

Validating and vetting existing PHP code is hellish.  You essentially have to rip it apart, and rewrite the functionality yourself, then compare the results to the original code, to be sure your understanding, the original developers' understanding, and the code all match.

A large part of the problem is that PHP has several ways of doing the same thing.  It has objects and methods, but also a huge library of non-OO functions by default.  Many PHP programmers mix these.  Unless you have the same background as the original developer(s), you'll be constantly wondering why they did stuff that way, when this way would have been shorter/easier/more robust.  That sort of thing gets on one's nerves, especially if you have an OCD/paranoid streak, which in my opinion is a prerequisite for anyone vetting security-sensitive code.  (And all code handling user passwords or personal information is security-sensitive to me.)

When you discover the PHP command-line interpreter, you might think that hey, I can run my code under that, and gdb the process, with Python accessors to make sense of the PHP data structures and such, which would work, except that the command-line interpreter is not the same environment as the Apache module or the FastCGI interpreter.  In Debian derivatives, they're even provided by separate packages, which means they may not even be the exact same version.
 
The following users thanked this post: DiTBho

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: DOS vs. Linux
« Reply #119 on: December 07, 2020, 04:00:26 am »
Ruby also looks interesting and powerful, and seems easier to debug.

I like Ruby much more than Python. It feels as if it was actually designed. Also, it has many of the goodies from Perl, but in a language with proper data structures and functions that is usable for programs longer than 10 lines.

The Python implementation is a bit faster than Ruby, but if that matters then you should probably be using C or Swift or Go or Rust anyway.
 
The following users thanked this post: DiTBho

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6260
  • Country: fi
    • My home page and email address
Re: DOS vs. Linux
« Reply #120 on: December 07, 2020, 07:20:21 am »
There is still room for a new, efficient scripting language out there: one that avoids Python's pitfalls, has proper thread support, offers similar C/C++ interfacing capability from the get-go, and, if possible, is as easy to embed as Lua.  As Python has shown, a comprehensive standard library is a necessity.

There are all these "new and advanced" programming languages implementing complex new abstractions and touting new computer science innovations, but nobody is willing to do the real grunt work: find out what works in existing languages, what makes them powerful or weak, and look at existing codebases to see what sort of features are actually used; then take all the good things, avoid the bad things, and design something provably better than existing languages.  I mean brute-force objective analysis, forgetting about one's own preferences.  It's nothing glamorous, just hard grunt work and lots of it, but the results would beat the new-language-du-jour by a mile, methinks.

We don't seem to appreciate that sort of design-to-last mentality anymore :'(.
« Last Edit: December 07, 2020, 07:21:56 am by Nominal Animal »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: DOS vs. Linux
« Reply #121 on: December 07, 2020, 11:05:05 am »
People *have* designed such languages, but the problem is they don't get any traction unless they find some "killer app", in which case the shittiest language in the world (PHP, JavaScript...) can take off.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: DOS vs. Linux
« Reply #122 on: December 07, 2020, 11:18:05 am »
Yesterday I found a very nice book, available as PDF, as epub for e-readers, and as mobi for Kindle.
It's priced $18.50 (USD) and can be ordered here
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline PKTKS

  • Super Contributor
  • ***
  • Posts: 1766
  • Country: br
Re: DOS vs. Linux
« Reply #123 on: December 07, 2020, 11:58:38 am »
There is still room for a new, efficient scripting language out there, that avoids Python's pitfalls, has proper thread support, but similar C/C++ interfacing capability from the get go; and if possible, as easy to embed as Lua.  As Python has shown, a comprehensive standard library is a necessity.
(.)

Actually, such a language already exists.

It has been powering things under the hood for decades.

Incredibly powerful and lightweight - properly written by people who actually do things properly, not by hidden interests.

Such a language is frequently mocked, like all proper *NIX environments, with those funky labels... mostly pushing "new", crappy, untested solutions.

Such a language is called PERL.

Paul
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: DOS vs. Linux
« Reply #124 on: December 07, 2020, 12:46:10 pm »


What I need:
  • Automation Tasks
  • Web programming

Umm, Perl? I think I will add it to my list  :D
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

