They actually didn’t notice they weren’t running and didn’t check for errors. Someone just went into the cupboard in the office; took the tape out and put another one in once a day. What happened between two consecutive events wasn’t their problem.
I worked for a company once. Well, I say worked for, but I accidentally landed the chief technical monkey position because I needed to eat. Turned out they had been blindly cycling tapes for about 2 years. When I reviewed the steaming turd I was landed with, I noticed that the external VS160 DLT drive wasn't even connected to the server. The SCSI cable was hanging down the back of the rack. I think someone had moved stuff around and left it like that. Still, the tapes ejected and got inserted, so they had the illusion of a backup.
Sometimes drones are all you can get.
Quote: Sometimes drones are all you can get.
Then you'd better make sure you train them properly, or give them the right checklists. If they all dance their appointed monkey dance, it's no problem, but someone needs to oversee the bigger picture.
That's what you get when you treat or pay people like drones.
Quote: That's what you get when you treat or pay people like drones.
Nice soundbite, but I've had this when the 'drone' has been the owner of the business (and rich to boot). He wasn't dumb and wanted it to work, so I don't think this example (just one of many) fits your bon mot.
He wasn't treated or paid like a drone, so that seems to work out.
Hardly! I still have the religiously changed but still-blank tapes, which I'd class as not working out.
No, my point wasn't that this was an exception to the rule, but that perhaps the rule is arse about face. That is, the supposition is that treating someone like a drone gets you bad service, which, on the face of it, seems reasonable. But the ultimate drone has to be a computer, and that works out just fine (except when it doesn't). Typically, it fails to work because you've neglected to cover some situation in its programming, and maybe that's the problem with the drone thing. Make something explicitly part of the job and it's covered.
So maybe the fix is to treat your drones as drones, being explicit in what they should do, and realising that if you haven't 'programmed' a situation it won't be covered. A failure is then a programming issue (i.e. you've failed to think things through enough to let your drone know how to react to a bad situation). If you tell 'em to change a tape every day and they do that, but haven't checked to see if the backup has been done, that isn't their fault but yours for not making that check part of the job (see the sketch below).
Which is not to say or imply that a 'drone' here is a brainless moron.
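For what it's worth, the 'check part of the job' can literally be a script the drone runs before signing off the tape swap. A minimal sketch, assuming a hypothetical status file and success marker; adapt it to whatever your backup software actually writes:

```python
#!/usr/bin/env python3
"""Sketch: verify the backup actually ran before the tape swap is signed off.
The status file location and the 'SUCCESS' marker are assumptions, not the
format of any particular backup product."""
import sys
import time
from pathlib import Path

STATUS_FILE = Path("/var/log/backup/last_run.status")  # hypothetical location
MAX_AGE_HOURS = 26  # a daily job should have run within roughly the last day

def backup_ok() -> bool:
    if not STATUS_FILE.exists():
        print("FAIL: no status file - the backup job may never have run")
        return False
    age_hours = (time.time() - STATUS_FILE.stat().st_mtime) / 3600
    if age_hours > MAX_AGE_HOURS:
        print(f"FAIL: last status is {age_hours:.0f}h old - job did not run")
        return False
    if "SUCCESS" not in STATUS_FILE.read_text():
        print("FAIL: last run reported an error - read the log before swapping")
        return False
    print("OK: last backup completed - safe to rotate the tape")
    return True

if __name__ == "__main__":
    sys.exit(0 if backup_ok() else 1)
```

Exit code zero means swap the tape; anything else means escalate to whoever oversees the bigger picture.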
Quote: Then you'd better make sure you train them properly, or give them the right checklists. If they all dance their appointed monkey dance, it's no problem, but someone needs to oversee the bigger picture.
I'm sure I addressed the rest of your comment a few posts back.
I just had an external drive die suddenly. I am now copying it back from a backup drive.
How often do you test your backups?
I don't test the individual PC backups as there is no need; nothing of great importance could be lost, because other backups provide redundancy.
The NAS unit I am perfectly happy with, as far as ANY (relatively) large RAID 5 system goes. The RAID is scrubbed every week. Recovering a large RAID 5 array from a single drive failure is always nail-biting, no matter what machine is used. The main RAID is backed up at a folder/file level, NOT as an image. The PCs' data is backed up to the NAS at a folder/file level (which is why the user directories are on a separate drive). The only 'sector image backups' are of the individual PCs to the GFS SD units, and again, I can safely recover those from other areas. The GFS system means there are always three independent image copies with little date difference. (And again, this only applies to the PCs/laptops.)
All the data is therefore stored on a file-by-file basis with multiple backups. There is simply no need to test; it is utterly pointless. Image backups are a different animal, which is why I limit them to PCs (and VMs, actually).
All external drives are checked occasionally for errors using the manufacturers' applications (non-destructive).
There is simply no need to 'test' anything. Relying on RAID, even with drive-failure 'fault tolerance', is very, very bad karma. RAID is not failsafe; it MUST be backed up. There is a very real chance of RAID not recovering effectively, and this is directly related to the amount of data our drives can store. I am 99.9999999% happy with the reliability and robustness of my systems. Delirious, in fact. If it were a commercial enterprise I would 'consider' using a real-time off-site link for storage/backup; however, it isn't, and I don't. Off-site storage 'by hand' is usually quite sufficient even for a moderately sized commercial enterprise, depending of course on the format of the data and the number of copies stored; a GFS (grandfather-father-son) method should be used in that case, as sketched below.
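For anyone unfamiliar with GFS, the rotation rule is simple enough to sketch. This assumes one illustrative calendar rule (monthly 'grandfather' on the 1st, weekly 'father' on Sundays, daily 'son' otherwise); real schemes vary, and the label names are made up:

```python
"""Sketch of grandfather-father-son (GFS) media selection under an assumed
calendar rule. Not any particular product's scheme."""
import datetime

def gfs_label(day: datetime.date) -> str:
    if day.day == 1:                       # first of the month -> grandfather
        return f"grandfather-{day:%Y-%m}"
    if day.weekday() == 6:                 # Sunday -> father, one per week
        return f"father-week{day.isocalendar()[1]}"
    return f"son-{day:%a}".lower()         # weekdays reuse a small daily pool

if __name__ == "__main__":
    print(f"Use media: {gfs_label(datetime.date.today())}")
```

The dailies get recycled quickly; the fathers and grandfathers survive long enough to give you the longer retention.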
I have a Synology appliance set up for RAID 1 that I back up to. I use Macrium Reflect for backups. I switched from Acronis (the business one, not the personal one) and dropped it because it was an absolute piece of junk that gave me headaches all the time. Macrium has been headache-free.
I also have a SATA dock and I occasionally take an image on a drive and keep it as an offsite backup.
I used to have a backup plan that did a full backup every 2 weeks, differentials every 2 days, and incrementals every couple of hours. Now I just do a full backup every week or so. At some point I'll kick off a differential every couple of days, but I haven't gotten around to it on my new system yet.
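If you're curious what that tiered plan looks like as logic, here's a rough sketch. The anchor date is hypothetical, and tools like Macrium Reflect schedule this internally; this just shows how the tiers nest:

```python
"""Sketch of a tiered schedule: full every two weeks, differential every two
days, incrementals in between. ANCHOR is an assumed date of some past full."""
import datetime

ANCHOR = datetime.date(2024, 1, 1)  # hypothetical date of a past full backup

def backup_type(day: datetime.date) -> str:
    days = (day - ANCHOR).days
    if days % 14 == 0:
        return "full"          # complete image, baseline for everything else
    if days % 2 == 0:
        return "differential"  # everything changed since the last full
    return "incremental"       # everything changed since the last backup

if __name__ == "__main__":
    print(backup_type(datetime.date.today()))
```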
Having many copies of the original copy doesn't protect you from a lot of failure modes. You really do need to test whether you can recover relevant files from your backups on a regular basis. If you don't test, you don't know what you have. Your stacks of copies could be full of perfect files full of perfect garbage.
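A spot check doesn't have to be elaborate. A minimal sketch, assuming the backup has been restored to a scratch directory; both paths are placeholders, and files that have legitimately changed since the backup ran will of course mismatch:

```python
"""Sketch of a restore spot check: pull a random sample of files out of a
restored backup and compare SHA-256 hashes against the live originals."""
import hashlib
import random
from pathlib import Path

LIVE = Path("/data")             # hypothetical live data root
RESTORED = Path("/mnt/restore")  # hypothetical restore target
SAMPLE = 20

def sha256(p: Path) -> str:
    h = hashlib.sha256()
    with p.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def spot_check() -> None:
    files = [p for p in LIVE.rglob("*") if p.is_file()]
    for src in random.sample(files, min(SAMPLE, len(files))):
        copy = RESTORED / src.relative_to(LIVE)
        if not copy.exists():
            print(f"MISSING in backup: {src}")
        elif sha256(src) != sha256(copy):
            # note: files modified since the backup ran will mismatch here
            print(f"MISMATCH: {src}")
    print("Spot check complete.")

if __name__ == "__main__":
    spot_check()
```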
Not at all.
It is not the gun, it's the gunner...
Static files are subject to hashing when written; as they are static by nature, any corruption will be flagged by a hash mismatch. A total non-issue.
Dynamic files are also subject to hashing, but the hash is of no use as a direct integrity check; the parent application should maintain that integrity check. Self-corruption is therefore a non-issue, provided multiple date/state copies exist. A total non-issue.
Dynamic files and the issue of 'user' corruption, invalid or incorrect data entry or deletion, or 'insert other PBCAK here', are managed by standard dynamic file methodology and also a far, FAR longer TBO regime (time before overwrite, if you are unfamiliar).
Dynamic files that are not self-managed for data integrity are also subject to extended TBO. Depending on needs, pockets, and regulations, TBO can be 5 years or more (7 in a lot of cases in the UK).
Again, with my system and my regime I am 99.999999% happy and secure. It is a total NON-issue. Simply unnecessary.
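For the record, hashing static files when written amounts to something like this. A minimal sketch with assumed paths and an assumed manifest file; build the manifest once when the files land, then re-verify on whatever schedule suits:

```python
"""Sketch of a hash manifest for static files: build once, verify later so
silent corruption gets flagged. Root and manifest paths are assumptions."""
import hashlib
import json
import sys
from pathlib import Path

ROOT = Path("/archive")                  # hypothetical static-file root
MANIFEST = Path("/archive/.hashes.json") # hypothetical manifest location

def sha256(p: Path) -> str:
    h = hashlib.sha256()
    with p.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build() -> None:
    manifest = {str(p.relative_to(ROOT)): sha256(p)
                for p in ROOT.rglob("*") if p.is_file() and p != MANIFEST}
    MANIFEST.write_text(json.dumps(manifest, indent=2))

def verify() -> bool:
    ok = True
    for rel, digest in json.loads(MANIFEST.read_text()).items():
        p = ROOT / rel
        if not p.exists() or sha256(p) != digest:
            print(f"CORRUPT or missing: {rel}")
            ok = False
    return ok

if __name__ == "__main__":
    if "--build" in sys.argv:
        build()
    else:
        sys.exit(0 if verify() else 1)
```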
What happens when your RAM goes bad and slowly corrupts data over time, only showing itself once the errors build up to critical mass? All the garbage is copied perfectly down the line, again and again, and your last clean backups will be months or even years old. Not testing anything is setting yourself up for failure. You're not the first and won't be the last. In the end, there has to be a monkey checking to see if the recovery process output lines up with the input.
Besides, anyone not nervous about his backups is complacent and will fall eventually.
Ahh, the NHS, they pay *really* well when they lose critical patient data because their backups failed to restore sensible data.
I think I went to Cornwall for two weeks in 5 star on one of those or was that the private cosmetic surgery place, I forget...