Author Topic: Forum Outage  (Read 55742 times)

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Forum Outage
« on: November 16, 2013, 10:29:46 pm »
The forum went down overnight.
There were some corrupted database tables, not sure of the cause yet. Hopefully nothing lost.
It's back up and running now thanks to gnif  :-+  :clap:
 

Offline digsys

  • Supporter
  • ****
  • Posts: 2209
  • Country: au
    • DIGSYS
Re: Forum Outage
« Reply #1 on: November 16, 2013, 10:34:25 pm »
Was just about to post - must have been monumental ! I was in the middle of scouring posts, then the crash.
No pings for an hour or so, then database corruption errors! Phewwww
Hello <tap> <tap> .. is this thing on?
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #2 on: November 16, 2013, 10:44:41 pm »
The server hard rebooted 9 hours ago and corrupted some database tables in the process. Not sure yet what caused the hard reboot.
 

Offline AF6LJ

  • Supporter
  • ****
  • Posts: 2902
  • Country: us
Re: Forum Outage
« Reply #3 on: November 16, 2013, 10:45:32 pm »
Glad to see it is back up...
I come here when things are slow over on QRZ (The Zed).
Sue AF6LJ
 

Offline kizzap

  • Supporter
  • ****
  • Posts: 477
  • Country: au
Re: Forum Outage
« Reply #4 on: November 16, 2013, 10:45:45 pm »
go gnif!
<MatCat> The thing with aircraft is murphy loves to hang out with them
<Baljem> hey, you're the one who apparently pronounces FPGA 'fuhpugger'
 

Offline samgab

  • Frequent Contributor
  • **
  • Posts: 423
  • Country: nz
Re: Forum Outage
« Reply #5 on: November 16, 2013, 10:49:49 pm »
...It's back up and running now thanks to gnif  :-+  :clap:

The Global Neuroscience Initiative Foundation helped you with this?  ???
 

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8517
  • Country: us
    • SiliconValleyGarage
Re: Forum Outage
« Reply #6 on: November 16, 2013, 11:24:05 pm »
mandatory eye roll 'linux'  tssss...
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline Alana

  • Frequent Contributor
  • **
  • Posts: 297
  • Country: pl
Re: Forum Outage
« Reply #7 on: November 16, 2013, 11:34:31 pm »
Glad it's running, I almost thought it was some black magic happening.
Corrupted MySQL tables happen all the time when you reboot any machine running MySQL. I've had it many times on Linux servers and once on Windoze, so the OS is not the problem.
 

Offline BFX

  • Frequent Contributor
  • **
  • Posts: 376
  • Country: sk
Re: Forum Outage
« Reply #8 on: November 16, 2013, 11:47:07 pm »
The OS has no problem, the application has the problem  ;D
 

Offline rolycat

  • Super Contributor
  • ***
  • Posts: 1101
  • Country: gb
Re: Forum Outage
« Reply #9 on: November 16, 2013, 11:54:39 pm »
mandatory eye roll 'linux'  tssss...
Someone's feeling trollish  >:D
 

Offline c4757p

  • Super Contributor
  • ***
  • Posts: 7799
  • Country: us
  • adieu
Re: Forum Outage
« Reply #10 on: November 17, 2013, 12:13:07 am »
mandatory eye roll 'linux'  tssss...

Nice one.

:-+
No longer active here - try the IRC channel if you just can't be without me :)
 

Offline Radio Tech

  • Frequent Contributor
  • **
  • Posts: 942
  • Country: us
  • KC4UMO Buddy
    • Hobby Forum
Re: Forum Outage
« Reply #11 on: November 17, 2013, 12:33:25 am »
Glad it is up.
 :-+

Glad to see it is back up...
I come here when things are slow over on QRZ (The Zed).

Hi Sue, good to see you

Offline baljemmett

  • Supporter
  • ****
  • Posts: 665
  • Country: gb
Re: Forum Outage
« Reply #12 on: November 17, 2013, 02:48:44 am »
Heh, I thought it seemed a bit quiet when I pulled up the unread posts, having not been around all day...
 

Offline dr.diesel

  • Super Contributor
  • ***
  • Posts: 2214
  • Country: us
  • Cramming the magic smoke back in...
Re: Forum Outage
« Reply #13 on: November 17, 2013, 02:58:10 am »
Wondered why I got so much done today...

Offline SeanB

  • Super Contributor
  • ***
  • Posts: 16276
  • Country: za
Re: Forum Outage
« Reply #14 on: November 17, 2013, 03:28:08 am »
mandatory eye roll 'linux'  tssss...
Unlike the monthly scheduled Windows reboot you mean. Can be mighty inconvenient on that second Tuesday when all the IIS servers do a simultaneous reboot. Linux server that does the real work stays up for months until you turn it off because the fans need cleaning or the hardware fails.
 

Offline walshms

  • Regular Contributor
  • *
  • Posts: 183
  • Country: us
Re: Forum Outage
« Reply #15 on: November 17, 2013, 03:38:16 am »
mandatory eye roll 'linux'  tssss...
Unlike the monthly scheduled Windows reboot you mean. Can be mighty inconvenient on that second Tuesday when all the IIS servers do a simultaneous reboot. Linux server that does the real work stays up for months until you turn it off because the fans need cleaning or the hardware fails.

I have a Linux cluster that's been up continuously for more than four years.  The one Windows VM running on it can't last a month without a reboot.

Odds are probably very good the outage was caused by something hardware-related causing the reboot and the database being caught by the reboot. 

Linux itself is stable... so stable that NASA relies on it almost exclusively, and the ISS is running nothing but Linux today; before the Shuttles stopped going there, they pulled all of the Windows machines off. 
 

Offline dr.diesel

  • Super Contributor
  • ***
  • Posts: 2214
  • Country: us
  • Cramming the magic smoke back in...
Re: Forum Outage
« Reply #16 on: November 17, 2013, 03:41:54 am »
Linux server that does the real work stays up for months.

I've got many Linux servers with uptimes > 4 years.  Hardware rotation is the only thing preventing longer durations.

Offline arekm

  • Supporter
  • ****
  • Posts: 165
  • Country: pl
Re: Forum Outage
« Reply #17 on: November 17, 2013, 05:57:14 am »
The server hard rebooted 9 hours ago and corrupted some database tables in the process.

Curse of MyISAM tables (tables get corrupted when a reboot happens without a proper MySQL shutdown). Switch to InnoDB if possible and that problem will be gone.

PS: and buy or set up some Nagios monitoring service to get notified early about problems ;-)
 

Offline nessatse

  • Regular Contributor
  • *
  • Posts: 99
  • Country: za
Re: Forum Outage
« Reply #18 on: November 17, 2013, 07:34:36 am »
Curse of MyISAM tables (tables get corrupted when a reboot happens without a proper MySQL shutdown). Switch to InnoDB if possible and that problem will be gone.


If anyone still uses MyISAM in this day and age, they deserve what they get.  I am pretty sure Dave must be using InnoDB, although the corrupted table symptoms sound suspiciously like MyISAM...
 

Offline jancumps

  • Supporter
  • ****
  • Posts: 1272
  • Country: be
  • New Low
Re: Forum Outage
« Reply #19 on: November 17, 2013, 09:06:29 am »
Youtube's revenge for the rant [:conspiracyemoticonhere:]
 

Offline Bloch

  • Supporter
  • ****
  • Posts: 453
  • Country: dk
Re: Forum Outage
« Reply #20 on: November 17, 2013, 09:11:52 am »
Youtube's revenge for the rant [:conspiracyemoticonhere:]


 :-DD
 

Offline rolycat

  • Super Contributor
  • ***
  • Posts: 1101
  • Country: gb
Re: Forum Outage
« Reply #21 on: November 17, 2013, 11:08:38 am »
The server hard rebooted 9 hours ago and corrupted some database tables in the process. Not sure yet what caused the hard reboot.
I used to work with a guy who, when called by the operators to say that a system was down, would instantly ask "Is there an engineer loose in the computer room with a screwdriver?" Sometimes he'd substitute "two fingers up his nose".  Actually I think it was mostly the other way around.

I worked at a company where our nemesis turned out to be a cleaner with a rotary floor polisher. She was very conscientious and used to run the edges of the machine right under the back of the rack cabinets. Any insecure cables got flicked straight out of the equipment. Of course it was never anything critical with proper cable management, so the outage typically wasn't discovered until she was long gone.

 

Offline AF6LJ

  • Supporter
  • ****
  • Posts: 2902
  • Country: us
Re: Forum Outage
« Reply #22 on: November 17, 2013, 04:40:54 pm »
Glad it is up.
 :-+

Glad to see it is back up...
I come here when things are slow over on QRZ (The Zed).

Hi Sue, good to see you
Good to see Y'all too.... :)
Sue AF6LJ
 

Offline ProfDecoy

  • Newbie
  • Posts: 3
  • Country: us
Re: Forum Outage
« Reply #23 on: November 17, 2013, 05:24:19 pm »
I worked at a company where our nemesis turned out to be a cleaner with a rotary floor polisher. She was very conscientious and used to run the edges of the machine right under the back of the rack cabinets. Any insecure cables got flicked straight out of the equipment. Of course it was never anything critical with proper cable management, so the outage typically wasn't discovered until she was long gone.

No outages because the cleaners unplugged a server to plug in the vacuum or buffer?  That's another common one.

I was rather obsessive about keeping all the cables in my rack(s) neat and tidy.  Never enjoyed working on a rack that didn't have proper cable management.  In those cases, I used a lot of one-wrap velcro ties to hack my own cable management together.
 

Offline SeanB

  • Super Contributor
  • ***
  • Posts: 16276
  • Country: za
Re: Forum Outage
« Reply #24 on: November 17, 2013, 05:33:39 pm »
Had the guys who came in to clean a carpet do that, despite the red colour of the socket, the label on it, the fact they had to use a hammer to put the plug into the polarised socket ( breaking it in the process), and the regular AC power socket right next to it. Not good for the UPS to get an extra 20A load from the carpet cleaner, it tripped out and all the computers including the server went dead. The trip probably was the cause for the one power supply in the server dying a week later. At least I am good at cold starting the network...... Good thing as well I use only 5A breakers on all the UPS feeds, as they trip showing where to look for who plugged something in.
 

Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7547
  • Country: 00
  • +++ ATH1
Re: Forum Outage
« Reply #25 on: November 17, 2013, 05:47:46 pm »
Letting an untrained, unmonitored, unattended cleaner mess up the server room is like letting an ex-rapist clean the bedroom unguarded while your mom/wife/girlfriend/daughter is sleeping on the bed.  :palm:

Offline SeanB

  • Super Contributor
  • ***
  • Posts: 16276
  • Country: za
Re: Forum Outage
« Reply #26 on: November 17, 2013, 06:23:44 pm »
Letting an untrained, unmonitored, unattended cleaner mess up the server room is like letting an ex-rapist clean the bedroom unguarded while your mom/wife/girlfriend/daughter is sleeping on the bed.  :palm:

This was on a feed from the UPS to the workstations on another floor. Central UPS and dedicated wiring for power to the computers.
 

Offline gnif

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: Forum Outage
« Reply #27 on: November 18, 2013, 06:46:52 am »
The outage was caused by a power failure at the data center, and yes, the db corruption was due to the cold reboot.
 

Offline eendje

  • Newbie
  • Posts: 9
Re: Forum Outage
« Reply #28 on: November 18, 2013, 11:58:07 am »
Hi Dave,

Glad it's working again. There are lots of different penguins, but I'd never heard of a repair penguin.

but it seems to be a cool bird  :clap: :clap: :clap:

Eendje
 

Offline gnif

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: Forum Outage
« Reply #29 on: November 19, 2013, 12:20:04 am »
The outage was caused by a power failure at the data center, and yes, the db corruption was due to the cold reboot.

Should I be surprised that they are not protected by a UPS and diesel generators? Is that an extra cost option?

Did you have to restore the DB from backups or just run uncommitted updates from a journal?

One would expect that they have that kind of equipment, but they have not provided a reason for the outage. We were able to just repair the tables; it was only uncommitted updates.
 

Offline xrunner

  • Super Contributor
  • ***
  • Posts: 7513
  • Country: us
  • hp>Agilent>Keysight>???
Re: Forum Outage
« Reply #30 on: November 19, 2013, 12:22:54 am »
You know SMF is at v 2.0.6 now right? (this forum is still at 2.0.4)
I told my friends I could teach them to be funny, but they all just laughed at me.
 

Offline gnif

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: Forum Outage
« Reply #31 on: November 19, 2013, 03:42:20 am »
You know SMF is at v 2.0.6 now right? (this forum is still at 2.0.4)

Normally that is for Dave to handle, but I believe that this is 2.0.6 and, due to the reboot, it seems to be a little confused about its version. When I find some time I will have a look into it.
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #32 on: November 19, 2013, 03:43:51 am »
You know SMF is at v 2.0.6 now right? (this forum is still at 2.0.4)

Normally that is for Dave to handle, but I believe that this is 2.0.6 and, due to the reboot, it seems to be a little confused about its version. When I find some time I will have a look into it.

I just upgraded to 2.0.6
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #33 on: November 21, 2013, 02:16:14 am »
Looks like another power failure or whatever happened to the server again, along with another database corruption.
All fixed now.
 

Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7547
  • Country: 00
  • +++ ATH1
Re: Forum Outage
« Reply #34 on: November 21, 2013, 02:20:58 am »
Is there any SLA or anything similar on the server uptime? This is getting worse.  :-\

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #35 on: November 21, 2013, 02:51:03 am »
And it just happened AGAIN 30 min later!
Two database tables this time.
The data centre really has some serious issues at present...
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #36 on: November 21, 2013, 02:57:07 am »
Is there any SLA or anything similar on the server uptime? This is getting worse.  :-\

Like most, it's like 99.9% or something. But when shit happens, what do you do, change hosts? That would be foolish.
This server has been very good to me reliability wise.
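
For a sense of scale, here is a rough sketch of what an availability figure like "99.9%" allows in downtime. The 30-day month, the 365-day year, and the 99.999% comparison row are assumptions for illustration; an actual hosting SLA defines its own measurement window and exclusions.

```python
# Downtime budget implied by an availability percentage.
# Assumption: 30-day month and 365-day year; real SLAs set their own windows.

HOURS_PER_MONTH = 30 * 24
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_minutes(availability_percent: float, period_hours: float) -> float:
    """Return the minutes of downtime permitted over a period of period_hours."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_hours * 60.0


if __name__ == "__main__":
    # 99.9% is the figure mentioned in the post; 99.999% is shown for comparison.
    for pct in (99.9, 99.999):
        per_month = allowed_downtime_minutes(pct, HOURS_PER_MONTH)
        per_year = allowed_downtime_minutes(pct, HOURS_PER_YEAR)
        print(f"{pct}%: {per_month:.1f} min/month, {per_year:.1f} min/year")
```

Under these assumptions 99.9% works out to roughly 43 minutes a month, so a single multi-hour outage can use up several months of budget at once.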
 

Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7547
  • Country: 00
  • +++ ATH1
Re: Forum Outage
« Reply #37 on: November 21, 2013, 03:00:47 am »
Yep, just saw it again; it said something like the forum member list or something is corrupted and crashed.  :o

Edit :
This -> "Table './eevblog_smp01/smf_members' is marked as crashed and should be repaired"
« Last Edit: November 21, 2013, 03:05:34 am by BravoV »
 

Offline digsys

  • Supporter
  • ****
  • Posts: 2209
  • Country: au
    • DIGSYS
Re: Forum Outage
« Reply #38 on: November 21, 2013, 03:07:55 am »
OUCH! Yup, saw something similar to that too .... maybe there's a dead forum member stuck in a post :-) ???
Hello <tap> <tap> .. is this thing on?
 

Offline nanofrog

  • Super Contributor
  • ***
  • Posts: 5446
  • Country: us
Re: Forum Outage
« Reply #39 on: November 21, 2013, 03:08:56 am »
FWIW, just got the following error message not long ago.

Quote
Table './eevblog_smp01/smf_members' is marked as crashed and should be repaired
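
For reference, a minimal sketch of turning that error text into the matching MySQL `REPAIR TABLE` statement. The helper names are made up for illustration, and the thread does not say how the admins actually repaired the tables. `REPAIR TABLE` applies to MyISAM tables, and taking a backup first is the usual advice.

```python
import re
from typing import Optional, Tuple

# Matches the error text quoted above: Table './<database>/<table>' is marked as crashed
CRASHED_RE = re.compile(
    r"Table '\./(?P<db>[^/']+)/(?P<table>[^/']+)' is marked as crashed"
)


def parse_crashed_table(message: str) -> Optional[Tuple[str, str]]:
    """Return (database, table) from a 'marked as crashed' error, or None."""
    match = CRASHED_RE.search(message)
    if match is None:
        return None
    return match.group("db"), match.group("table")


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def repair_statement(db: str, table: str) -> str:
    """Build the REPAIR TABLE statement; running it is left to the admin."""
    return f"REPAIR TABLE {quote_identifier(db)}.{quote_identifier(table)};"


if __name__ == "__main__":
    error = ("Table './eevblog_smp01/smf_members' is marked as crashed "
             "and should be repaired")
    parsed = parse_crashed_table(error)
    if parsed is not None:
        print(repair_statement(*parsed))
```

The sketch only builds the statement as a string; it does not connect to a database.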
 

Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7547
  • Country: 00
  • +++ ATH1
Re: Forum Outage
« Reply #40 on: November 21, 2013, 03:11:30 am »
Not an expert on this matter, just wondering: if it crashes and gets recovered, but this keeps happening again and again, will it affect the overall integrity of the SMF data in the long run?

Offline vk6zgo

  • Super Contributor
  • ***
  • Posts: 7585
  • Country: au
Re: Forum Outage
« Reply #41 on: November 21, 2013, 03:14:40 am »
Glad to see it is back up...
I come here when things are slow over on QRZ (The Zed).

Hi Sue!
 

Offline strangelovemd12

  • Regular Contributor
  • *
  • Posts: 102
  • Country: 00
Re: Forum Outage
« Reply #42 on: November 21, 2013, 03:18:44 am »
I just found this place a few days ago and I love it in every way, except the vBulletin flashbacks I'm getting at every corner.  The good news is that a Google search on the outage informed me that Dave tweets like a canary in orgasm.  Followed!
Please hit my ignorance with a big stick.
 

Offline Anks

  • Frequent Contributor
  • **
  • Posts: 252
  • Country: gb
    • www.krisanks.wordpress.com
Re: Forum Outage
« Reply #43 on: November 21, 2013, 03:28:43 am »
Is there any SLA or anything similar on the server uptime? This is getting worse.  :-\

Like most, it's like 99.9% or something. But when shit happens, what do you do, change hosts? That would be foolish.
This server has been very good to me reliability wise.

I'm with you on this, Dave. Changing hosts, in my experience, generally brings different issues, and this forum isn't the worst for outages I've seen.
 

Offline grumpydoc

  • Super Contributor
  • ***
  • Posts: 2905
  • Country: gb
Re: Forum Outage
« Reply #44 on: November 21, 2013, 08:10:33 am »
Quote
Like most, it's like 99.9% or something. But when shit happens, what do you do, change hosts? That would be foolish.
This server has been very good to me reliability wise.
A reasonable stance but it might be worth putting some thought into hardening the forum more against server outage since the next time it goes the database might not be repairable.

I'm sure you have good backups - have you investigated turning on full (data as well as metadata) journalling (might be a performance hit) on your filesystems or using a more robust database (as some suggested earlier in the thread)?
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26891
  • Country: nl
    • NCT Developments
Re: Forum Outage
« Reply #45 on: November 21, 2013, 08:50:45 am »
And it just happened AGAIN 30 min later!
Two database tables this time.
The data centre really has some serious issues at present...
Maybe it's time to get serious about software and hardware. Back in the old days when I was a sys-admin I more or less had this checklist for anything which needed to be reliable:
- Dell server or HP ProLiant server with ECC memory
- Database system with journaling / automatic recovery (preferably a real database like PostgreSQL and most certainly not MySQL with MyISAM tables)

I can imagine the database is tuned for performance and less for reliability, but as this forum is your income you'd better tune for reliability, even if that means setting up a second server to handle the load. Maybe even hire an expert to harden the database, but I'd check the quality of the server hardware first.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline arekm

  • Supporter
  • ****
  • Posts: 165
  • Country: pl
Re: Forum Outage
« Reply #46 on: November 21, 2013, 08:54:56 am »
Really try InnoDB instead of MyISAM (and there is always a way to go back if there are any problems).
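
A minimal sketch of what that switch looks like, assuming the change is done per table with MySQL's `ALTER TABLE ... ENGINE=` statement. The table list is illustrative only (`smf_members` appears in the error message in this thread, `smf_messages` is assumed here), and the helper names are hypothetical. Tables with FULLTEXT indexes need MySQL 5.6 or later before they can be converted to InnoDB.

```python
from typing import Iterable, List

ALLOWED_ENGINES = ("InnoDB", "MyISAM")


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def engine_change_statements(tables: Iterable[str], engine: str = "InnoDB") -> List[str]:
    """Build one ALTER TABLE statement per table to switch its storage engine."""
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return [f"ALTER TABLE {quote_identifier(t)} ENGINE={engine};" for t in tables]


if __name__ == "__main__":
    # Illustrative table names; a real forum would list its own tables.
    tables = ["smf_members", "smf_messages"]
    for statement in engine_change_statements(tables):
        print(statement)
    # Going back is the same statement with the other engine name.
    for statement in engine_change_statements(tables, engine="MyISAM"):
        print(statement)
```

The statements are only generated, not executed; a conversion rewrites the whole table, so it is best tried on a backup copy first.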
 

Offline Clint

  • Regular Contributor
  • *
  • Posts: 119
  • Country: gb
Re: Forum Outage
« Reply #47 on: November 21, 2013, 09:04:45 am »
Move the whole thing to Amazon, it's all I am working on now with superb results, never any worry about hardware :)
=-=-=-=-=-=-=-=-=
g33K5 L1k3 80085
 

alm

  • Guest
Re: Forum Outage
« Reply #48 on: November 21, 2013, 09:06:06 am »
Until recently MySQL didn't support full-text search in InnoDB tables, and I believe MariaDB still doesn't support it. This might be the reason for the MyISAM tables. But these crashes have been a good demonstration why ignoring the consistency from ACID, like MyISAM does, may be a bad idea for real databases.
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #49 on: November 21, 2013, 10:14:45 am »
Maybe it's time to get serious about software and hardware.

How much more serious can I get about hardware? I already run a multi-hundred dollar/month dedicated top shelf server box with a top provider.
Arguing over which host provider is best, and which hardware is best is pretty pointless, ask 100 experts (I have!), and you get 100 different answers.
IME, no matter which host provider you go with, even the "redundant cloud" type, your site can go down.
I do daily automated full site and database backups to a remote site (need a new solution for this, as autositebackup.com are ceasing)
And really, if the forum is down for a few hours (rare), or even a few days (never happened), is it the end of the world?

Quote
- Database system with journaling / automatic recovery (preferably a real database like PostgreSQL and most certainly not MySQL with MyISAM tables)

I believe I can only run what the SMF forum software uses?
« Last Edit: November 21, 2013, 10:16:26 am by EEVblog »
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #50 on: November 21, 2013, 10:43:11 am »
Do you expect to hear from the hosting company as to the true root cause of the server outages?

Of course not!

Quote
Do you have options under consideration that can give you some fallback to another server or that can insulate you from database corruption? Backups are necessary but if you experience a prolonged outage you might be glad you had a plan B.

I have 2 other servers (1 shared, 1 cloud) and can reroute the domain name in a few hours, and reinstall the backups with some work, but could be done in a day probably. Would not have the same performance, but would get the site back up and running.

But really, this is the eevblog forum, not a bank. How much effort and expense should I really go to in order to cover every contingency?

 

Offline apelly

  • Supporter
  • ****
  • Posts: 1061
  • Country: nz
  • Probe
Re: Forum Outage
« Reply #51 on: November 21, 2013, 10:51:07 am »
But really, this is the eevblog forum, not a bank. How much effort and expense should I really go to in order to cover every contingency?

Ah, the sweet voice of sanity!
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #52 on: November 21, 2013, 11:07:44 am »
Move the whole thing to Amazon, it's all I am working on now with superb results, never any worry about hardware :)

Altium did that. Their website and forum famously go down every week (I'm not kidding)
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26891
  • Country: nl
    • NCT Developments
Re: Forum Outage
« Reply #53 on: November 21, 2013, 11:12:42 am »
Maybe it's time to get serious about software and hardware.

How much more serious can I get about hardware? I already run a multi-hundred dollar/month dedicated top shelf server box with a top provider.
Arguing over which host provider is best, and which hardware is best is pretty pointless, ask 100 experts (I have!), and you get 100 different answers.
Yes and no. The best way is to ask for hands-on experience with server parks and not their PCs at home. That usually reduces 100 opinions to about 3  ;) I put the best one can get on my list. That list is based on over 15 years of experience with setups which need to be very reliable.

Anyway, databases don't corrupt themselves. That would be like software getting a mind of its own. The usual suspect is a power outage or bad hardware. I've come across several cases where bad hardware caused all kinds of subtle errors, like some particular software not working or large compilation runs failing every now and then. A couple of years ago it took me two days to find out which memory module was bad in one of my PCs. But without any hard evidence it's impossible to tell where and how a system goes wrong.

Just assume hardware isn't 100% perfect and one out of every trillion operations goes wrong. On a desktop PC this isn't much of a problem because it only runs for a couple of hours per day, and if it starts to behave oddly you click again or restart it. A server OTOH is running 24/7, so faults can accumulate. That makes it necessary for key software and hardware components to have some self-healing capabilities, like ECC memory and a database which adheres to the ACID rules.
Quote
And really, if the forum is down for a few hours (rare), or even a few days (never happened), is it the end of the world?
Uhhh... :'(
Quote
Quote
- Database system with journaling / automatic recovery (preferably a real database like PostgreSQL and most certainly not MySQL with MyISAM tables)

I believe I can only run what the SMF forum software uses?
It seems SMF also supports PostgreSQL.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline AndersAnd

  • Frequent Contributor
  • **
  • Posts: 572
  • Country: dk
Re: Forum Outage
« Reply #54 on: November 21, 2013, 11:16:50 am »
And really, if the forum is down for a few hours (rare), or even a few days (never happened), is it the end of the world?
Heh, reminds me of this.  ;D

« Last Edit: November 21, 2013, 12:27:58 pm by AndersAnd »
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #55 on: November 21, 2013, 11:27:22 am »
Yes and no. The best way is to ask for hands on experience with server parks. That usually reduces 100 opinions to about 3  ;)

Fraid not!
Before I got my dedicated server, I asked many people involved in various aspects of the industry (and many fans are also in the pro server industry that offered pro advice), all with that kind of professional experience you also claim. I got more conflicting advice than I could poke a stick at. So in the end I went with what I knew and what seemed good enough. In the end most grudgingly agreed that what I got was more than good enough for the job.

Quote
I put the best one can get on my list. That list is based on over 15 years of experience with setups which need to be very reliable.


Quote
Anyway, databases don't corrupt themselves.

No one has said they do. The problem has been identified as a hard reset, likely caused by some form of power loss.

 

Offline MrRedHat

  • Supporter
  • ****
  • Posts: 31
  • Country: us
Re: Forum Outage
« Reply #56 on: November 21, 2013, 11:46:12 am »
Fraid not!

Before I got my dedicated server, I asked many people involved in various aspects of the industry (and many fans are also in the pro server industry that offered pro advice), all with that kind of professional experience you also claim. I got more conflicting advice than I could poke a stick at.

I’ve been doing IT work for a long time and one thing that is for sure....

Opinions are like assholes. Everybody's got one and everyone thinks everyone else's stinks. :)
 

Offline Towger

  • Super Contributor
  • ***
  • Posts: 1645
  • Country: ie
Re: Forum Outage
« Reply #57 on: November 21, 2013, 05:22:14 pm »
Anyway, databases don't corrupt themselves.

Aaagh but they can...

I once had that problem with a newish release of a well-known database, and they said it was impossible.  I managed to reproduce the problem in its simplest form and uploaded the source code to their forum.  I created a blank database and added one simple table with 4 or 5 fields (they had to be of the correct type and in the correct order).  I added a unique index on one field, then added 3 records, and at this point it got interesting.  The next command was a simple SQL delete of one record by its unique primary key, followed by a simple Select * from the table... and lo and behold, only one record was left!  Several people who downloaded and tried it got the same result. When their own staff found the thread they initially acknowledged they could reproduce the problem, but the thread was then pulled from the forum and an update was soon released which did not mention it was a bug fix for my discovery... The feckers never even emailed me back to say they had fixed it...
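
The steps described above can be sketched as a small regression test. This uses Python's built-in `sqlite3` purely as a stand-in, since the post does not name the database in question; the table name, column names, and values are invented for illustration. On a correctly working engine, two of the three rows remain after the delete.

```python
import sqlite3
from typing import List, Tuple


def run_delete_scenario() -> List[Tuple]:
    """Create a table, add a unique index and 3 rows, delete 1, return the rest."""
    conn = sqlite3.connect(":memory:")  # blank throwaway database
    try:
        # One simple table with five fields (hypothetical schema).
        conn.execute(
            "CREATE TABLE items ("
            "id INTEGER PRIMARY KEY, code TEXT, name TEXT, qty INTEGER, note TEXT)"
        )
        # Unique index on one field.
        conn.execute("CREATE UNIQUE INDEX idx_items_code ON items (code)")
        # Three records.
        conn.executemany(
            "INSERT INTO items (id, code, name, qty, note) VALUES (?, ?, ?, ?, ?)",
            [
                (1, "A1", "first", 10, ""),
                (2, "B2", "second", 20, ""),
                (3, "C3", "third", 30, ""),
            ],
        )
        # Delete one record by its primary key.
        conn.execute("DELETE FROM items WHERE id = ?", (2,))
        # Select everything that is left.
        return conn.execute("SELECT * FROM items ORDER BY id").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    remaining = run_delete_scenario()
    print(f"{len(remaining)} rows remain: {[row[0] for row in remaining]}")
```

A script this small is easy for a vendor to run, which is probably why a minimal reproduction like the one in the post got confirmed so quickly.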
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7754
  • Country: de
  • A qualified hobbyist ;)
Re: Forum Outage
« Reply #58 on: November 21, 2013, 06:11:48 pm »
I was curious about the outage and I found this link.
http://newswire.net/newsroom/financial/00078336-bluehost-service-out.html
There are probably dozens of similarly derived news stories.

How do you have power damage between the servers and the backup diesel generators? Without it being self inflicted.

Sounds like the common backup power problem. The diesel engines are run regularly, but with a dummy load attached. If there's a problem with the switching or wiring it won't be detected during a test run. And some data centers forget to upgrade their backup systems to meet customer demand, e.g. the initial backup infrastructure supports 1MW but the actual load is 1.5MW. Those self-inflicted outages have happened at Amazon and all the others several times. The only way to detect such problems early and maybe prevent them is to take the risk of using the real servers as the load, instead of the dummy load, during the regular test runs.
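
The capacity mismatch in that example is simple arithmetic, sketched here with the 1MW and 1.5MW figures from the post. The function names and the optional safety margin are assumptions for illustration, not anything a data center is known to use.

```python
def backup_headroom_mw(capacity_mw: float, load_mw: float) -> float:
    """Spare backup capacity in MW; a negative value means the load exceeds it."""
    return capacity_mw - load_mw


def can_carry_load(capacity_mw: float, load_mw: float, margin_fraction: float = 0.0) -> bool:
    """True if the backup system covers the load plus an optional safety margin."""
    return load_mw * (1.0 + margin_fraction) <= capacity_mw


if __name__ == "__main__":
    capacity_mw = 1.0  # what the backup infrastructure was built for
    load_mw = 1.5      # what the customers actually draw today
    print("headroom (MW):", backup_headroom_mw(capacity_mw, load_mw))
    print("can carry load:", can_carry_load(capacity_mw, load_mw))
```

A check like this only catches the capacity problem; switching and wiring faults still need a test under real load, as the post says.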
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7754
  • Country: de
  • A qualified hobbyist ;)
Re: Forum Outage
« Reply #59 on: November 21, 2013, 06:43:08 pm »
Yes and no. The best way is to ask for hands on experience with server parks. That usually reduces 100 opinions to about 3  ;)

Fraid not!
Before I got my dedicated server, I asked many people involved in various aspects of the industry (and many fans are also in the pro server industry that offered pro advice), all with that kind of professional experience you also claim. I got more conflicting advice than I could poke a stick at. So in the end I went with what I knew and what seemed good enough. In the end most grudgingly agreed that what I got was more than good enough for the job.

The only thing you can do is to filter out the bad hosters (poor service, cheap hardware, poor connectivity, lots of outages). Anything else is simply luck because all hosters will tell you that the server is located in a first class datacenter. And even the first class ones got outages as I wrote before in another post. The good point about the two EEVblog outages is that the datacenter is under huge pressure now to fix the backup power problem ASAP. And that means it's more unlikely to happen again in the near future. But there might be a planned outage (maintenance window) for required work.
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #60 on: November 21, 2013, 07:57:49 pm »
An old IBM retiree that I worked with for a few years used to say "Computers are an art, science."
The larger the government, the smaller the citizen.
 

Online IanB

  • Super Contributor
  • ***
  • Posts: 11859
  • Country: us
Re: Forum Outage
« Reply #61 on: November 21, 2013, 08:06:43 pm »
Just to mention, there were many reports of service disruptions and stealth DDOS attacks over the past day or so. So the server going down at the same time may not have been a coincidence...
 

Offline xrunner

  • Super Contributor
  • ***
  • Posts: 7513
  • Country: us
  • hp>Agilent>Keysight>???
Re: Forum Outage
« Reply #62 on: November 21, 2013, 08:28:57 pm »
Just to mention, there were many reports of service disruptions and stealth DDOS attacks over the past day or so.

Maybe it was a Rigol company task force / power supply division trying to shut him down.  :)
I told my friends I could teach them to be funny, but they all just laughed at me.
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #63 on: November 21, 2013, 11:52:08 pm »
Just to mention, there were many reports of service disruptions and stealth DDOS attacks over the past day or so.

Maybe it was a Rigol company task force / power supply division trying to shut him down.  :)

Nah I think it was FLIR.
The larger the government, the smaller the citizen.
 

Offline walshms

  • Regular Contributor
  • *
  • Posts: 183
  • Country: us
Re: Forum Outage
« Reply #64 on: November 22, 2013, 12:45:08 am »
I’ve been doing IT work for a long time and one thing that is for sure....

Opinions are like assholes. Everybody's got one and everyone thinks everyone else's stinks. :)

Too true!  :-+   Here's mine:  :blah:

What Dave has is good enough.  If he can't protect any better for power outages (and Dave, might be worth asking if you can put your own UPS there...) then there's going to be DB corruption incidents, and not much he can do about it. 

File system journaling only takes you so far... the server has to have a battery-backed cache, UPS and UPS monitoring software in addition to those to ensure graceful shutdown on power outages, and the server should be configured to automatically power on when power is restored.  With all of that, you stand a pretty good chance of never seeing the issue -- however, it's still not a guarantee. 

The cost of a 99.999% reliable system isn't justified here, no matter how much you suffer from withdrawal...  :scared:   ;D
 

Offline os40la

  • Regular Contributor
  • *
  • Posts: 122
  • Country: us
Re: Forum Outage
« Reply #65 on: November 22, 2013, 02:25:51 am »

Maybe it was a Rigol company task force / power supply division trying to shut him down.  :)

Close,

What really happened was that Dave's server was powered by Rigol's DP832 power supply with the rev1 board. Google found out, remotely logged into it and shut off the fan. The second outage was when the backup with the rev2 board was yet again attacked by Google. They now have it running with the rev3 board, the one Dave now has. We should be good now..... :phew:
"No, but I did stay at a Holiday Inn Express"
 

Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7547
  • Country: 00
  • +++ ATH1
Re: Forum Outage
« Reply #66 on: November 22, 2013, 02:35:54 am »
LOL ... just didn't expect what this thread has turned into, it's so hilarious now.  :-DD

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #67 on: November 22, 2013, 02:41:31 am »
Where in Texas is this server located?
The larger the government, the smaller the citizen.
 

Offline xrunner

  • Super Contributor
  • ***
  • Posts: 7513
  • Country: us
  • hp>Agilent>Keysight>???
Re: Forum Outage
« Reply #68 on: November 22, 2013, 02:53:42 am »
Where in Texas is this server located?

Oh dear ...  :-\
I told my friends I could teach them to be funny, but they all just laughed at me.
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #69 on: November 22, 2013, 03:59:13 am »
Where in Texas is this server located?

Oh dear ...  :-\

What's that supposed to mean?

The larger the government, the smaller the citizen.
 

Offline Rigby

  • Super Contributor
  • ***
  • Posts: 1476
  • Country: us
  • Learning, very new at this. Righteous Asshole, too
Re: Forum Outage
« Reply #70 on: November 22, 2013, 04:22:10 am »
mandatory eye roll 'linux'  tssss...
Unlike the monthly scheduled Windows reboot you mean. Can be mighty inconvenient on that second Tuesday when all the IIS servers do a simultaneous reboot. Linux server that does the real work stays up for months until you turn it off because the fans need cleaning or the hardware fails.

I have a Linux cluster that's been up continuously for more than four years.  The one Windows VM running on it can't last a month without a reboot.

Odds are probably very good the outage was caused by something hardware-related causing the reboot and the database being caught by the reboot. 

Linux itself is stable... so stable that NASA relies on it almost exclusively, and the ISS is running nothing but Linux today; before the Shuttles stopped going there, they pulled all of the Windows machines off.

I have been running Windows servers since 1996, Linux since 1997, and FreeBSD since 2001.  I don't have problems with any of them.

Anyone can keep a server going if they want to be insecure and measure their OS quality solely with uptime stats.

Hearing or reading folks complain about Windows says little about Windows and a lot about the person doing the talking or typing.
 

Offline EEVblogTopic starter

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #71 on: November 22, 2013, 05:40:18 am »
Where in Texas is this server located?

I thought it was Provo Utah, but I think it actually uses Softlayer:
http://support.hostgator.com/articles/hosting-guide/hardware-software/hostgator-data-centers
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #72 on: November 22, 2013, 06:17:20 am »
Where in Texas is this server located?

I thought it was Provo Utah, but I think it actually uses Softlayer:
http://support.hostgator.com/articles/hosting-guide/hardware-software/hostgator-data-centers

Ok, I thought you mentioned one time that it was in Texas.
The larger the government, the smaller the citizen.
 

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8517
  • Country: us
    • SiliconValleyGarage
Re: Forum Outage
« Reply #73 on: November 22, 2013, 06:34:29 am »
Looks like another power failure or whatever happened to the server again, along with another database corruption.
All fixed now.

Time to send em those big APC UPS machines you have sitting in your lab ... haven't those guys heard about backup power
( diesel generators , bloomboxes , batteries ). it is inexcusable for a datacenter not to have alternate on-line power.

As for corrupted files ... we got journaling file systems these days ... it should recover ! If you get a RAID controller with an on-board battery backup ( 12 volt lead-gel pack) even in catastrophic power failure those controllers preserve the cached data in their buffers for up to 24 hours. When power is restored the controller applies the last file changes and all is well. Those controllers double-check that the requested changes are committed. This is protection of the file integrity on a hardware level. The operating system is unaware this happens.

Promise and 3Ware have such controllers.
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #74 on: November 22, 2013, 09:40:31 am »
Just get some of those Maxwell supercaps. They go up to 6000F. That ought to keep the server running a bit. 
« Last Edit: November 22, 2013, 09:42:52 am by Stonent »
The larger the government, the smaller the citizen.
 

Offline Baliszoft

  • Frequent Contributor
  • **
  • Posts: 277
  • Country: hu
Re: Forum Outage
« Reply #75 on: November 22, 2013, 10:10:55 am »
Already said that hostgator sucks.
 

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8517
  • Country: us
    • SiliconValleyGarage
Re: Forum Outage
« Reply #76 on: November 22, 2013, 10:08:29 pm »
boom. we had another one apparently.. it's just back up now
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline Alana

  • Frequent Contributor
  • **
  • Posts: 297
  • Country: pl
Re: Forum Outage
« Reply #77 on: November 22, 2013, 10:12:35 pm »
I had the same problem on a server and we solved it by adding this to the startup scripts:
Code: [Select]
mysqlcheck --auto-repair -A -u mythtv -pmysqlpass
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #78 on: November 22, 2013, 10:14:03 pm »
Might change that mythtv part though.
The larger the government, the smaller the citizen.
 

Offline xrunner

  • Super Contributor
  • ***
  • Posts: 7513
  • Country: us
  • hp>Agilent>Keysight>???
Re: Forum Outage
« Reply #79 on: November 22, 2013, 10:29:46 pm »
Man I'd be composing a real bitchy email to the people running the server about now ...
I told my friends I could teach them to be funny, but they all just laughed at me.
 

Offline sleemanj

  • Super Contributor
  • ***
  • Posts: 3024
  • Country: nz
  • Professional tightwad.
    • The electronics hobby components I sell.
Re: Forum Outage
« Reply #80 on: November 22, 2013, 10:33:06 pm »

As for corrupted files ... we got journaling file systems these days ... it should recover !

It's not that type of corruption.  The files are perfectly valid and consistent as far as the filesystem/OS is concerned (assuming journaled FS in use and that it recovered the journal on boot). 

It's due to the (non-transactional) nature of how MySQL's native data engine/table format (MyISAM) works.  MySQL has a number of engines/formats; MyISAM is usually the default. It's fast, but it's not transactional and not ACID compliant in any way, and as a result it's quite easy to get it into an inconsistent state: for example, if it updated the record count in the MyISAM header but the server crashed before it inserted the record itself. 

Nothing at the filesystem level, let alone the hardware level, will save you from this. The files are a perfectly valid series of bits; they just don't have all the bits that MySQL was going to write, because it didn't write them.
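
A rough way to picture it, as a toy Python sketch. This is not MyISAM's real on-disk format; the `ToyTable` class and its fields are made up purely for illustration. The header count and the row are written in two separate steps with nothing tying them together, so a crash in between leaves data that the filesystem is perfectly happy with but the database is not:

```python
class ToyTable:
    """Toy stand-in for a non-transactional table: a row count plus the rows."""

    def __init__(self):
        self.header_count = 0   # what the "header" claims
        self.rows = []          # what was actually written

    def insert(self, row, crash_before_row=False):
        # Step 1: update the header first (no transaction wraps both steps).
        self.header_count += 1
        if crash_before_row:
            # Simulated hard reboot: the row never reaches the table.
            return
        # Step 2: write the row itself.
        self.rows.append(row)

    def is_consistent(self):
        return self.header_count == len(self.rows)

    def repair(self):
        # Loosely what a repair pass does here: trust the rows, fix the header.
        self.header_count = len(self.rows)


table = ToyTable()
table.insert("post 1")
table.insert("post 2", crash_before_row=True)
print(table.is_consistent())  # False
table.repair()
print(table.is_consistent())  # True
```

A transactional engine would treat both steps as one unit and roll the half-finished insert back on restart, which is why no manual repair step is needed there.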
~~~
EEVBlog Members - get yourself 10% discount off all my electronic components for sale just use the Buy Direct links and use Coupon Code "eevblog" during checkout.  Shipping from New Zealand, international orders welcome :-)
 

Offline Corporate666

  • Supporter
  • ****
  • Posts: 2008
  • Country: us
  • Remember, you are unique, just like everybody else
Re: Forum Outage
« Reply #81 on: November 22, 2013, 10:34:45 pm »
But really, this is the eevblog forum, not a bank. How much effort and expense should I really go to ensure for every contingency?

As someone else said, thank you for the sanity.

I have CNC machines in our factory, and because they are big and expensive, people don't buy them to sit around doing nothing waiting for potential work.  So the CNC machines are usually running a lot.  Well, one went down - turned out a seal got compromised and coolant seeped in and destroyed an optical encoder.  A day to diagnose the problem, a day or two to get a replacement part, and a day to R&R the machine and get it back up and running.

We were running some parts for a group buy for customers, and when told that we would be delayed by a few days, some of the responses were borderline comical.  Some suggested we weren't a serious business because we obviously had no contingency plan.  Some flamed us for having a single point of failure.  Most "got it" and realized that shit happens.

Point being - whenever you talk technical with a bunch of tech guys, there will always be folks who will happily spend your money creating their dream system.. but those people 1) aren't signing the check and 2) aren't seeing all the facts like revenues/expenses and all the other factors that go into these decisions, so their opinions lack context and relevance.

In short, Dave, don't worry about the downtime... I got more work done today than usual, and I doubt you lost any users because of some hours of downtime on a FREE resource that delivers so much for that free price.
It's not always the most popular person who gets the job done.
 

Offline robrenz

  • Super Contributor
  • ***
  • Posts: 3035
  • Country: us
  • Real Machinist, Wannabe EE
Re: Forum Outage
« Reply #82 on: November 22, 2013, 10:38:41 pm »
I sure got a lot done today with the forum down :D

Offline Carrington

  • Super Contributor
  • ***
  • Posts: 1202
  • Country: es
Re: Forum Outage
« Reply #83 on: November 22, 2013, 10:46:23 pm »
Again, the forum comes back to life when it is time to sleep here (SPAIN).   :palm:
My English can be pretty bad, so suggestions are welcome. ;)
Space Weather.
Lightning & Thunderstorms in Real Time.
 

Offline arekm

  • Supporter
  • ****
  • Posts: 165
  • Country: pl
Re: Forum Outage
« Reply #84 on: November 22, 2013, 11:14:02 pm »
MySQL's native data engine/table format (MyISAM) works.  MySQL has a number of engines/formats, MyISAM is usually the default

I'll add that this is true only in ancient mysql versions. Current mysql 5.6 is using InnoDB engine by default.

MyISAM is the reason why database people called mysql a "toy".  Things have changed with InnoDB.


So switch to InnoDB. If not, then at least use the ugly mysqlcheck auto-repair "hack" at boot proposed earlier. Things will hopefully auto-fix after each hard reset even with MyISAM (so no need to wait until morning etc).
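
For what it's worth, the switch itself is one `ALTER TABLE ... ENGINE=InnoDB` statement per table. A minimal Python sketch that only generates the statements; the table names are hypothetical placeholders, and whether the forum software is happy on InnoDB (fulltext indexes in particular) is something to check first, with a backup in hand:

```python
def innodb_conversion_sql(table_names):
    """Build one ALTER TABLE statement per table (names are hypothetical)."""
    statements = []
    for name in table_names:
        # Backticks quote the identifier; double any backtick inside the name.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(f"ALTER TABLE {quoted} ENGINE=InnoDB;")
    return statements


# Placeholder table names, not taken from the real forum database.
for sql in innodb_conversion_sql(["forum_messages", "forum_topics"]):
    print(sql)
```

The output can be reviewed by hand and then fed to the mysql client, rather than letting a script alter tables unattended.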
 

Offline Bored@Work

  • Super Contributor
  • ***
  • Posts: 3932
  • Country: 00
Re: Forum Outage
« Reply #85 on: November 22, 2013, 11:32:51 pm »
haven't those guys heard about backup power
( diesel generators , bloomboxes , batteries ). it is inexcusable for a datacenter not to have alternate on-line power.

Ha! From the sparse information they gave about the first power outage, something broke between the data center and the backup generator.

http://newswire.net/newsroom/financial/00078336-bluehost-service-out.html
Quote
According to one Customer Service Representative at Bluehost, the disruption took place “between the back-up generators, and the server racks, necessitating a physical re-wiring”. For reasons that are not fully understood, the back-up generators at the facility were in-effective in maintaining power supply to power the co-location facility.
I delete PMs unread. If you have something to say, say it in public.
For all else: Profile->[Modify Profile]Buddies/Ignore List->Edit Ignore List
 

Offline sleemanj

  • Super Contributor
  • ***
  • Posts: 3024
  • Country: nz
  • Professional tightwad.
    • The electronics hobby components I sell.
Re: Forum Outage
« Reply #86 on: November 22, 2013, 11:38:31 pm »
I'll add that this is true only in ancient mysql versions. Current mysql 5.6 is using InnoDB engine by default.

That's true, but of course most off-the-shelf PHP web applications out there are using MyISAM, because it's ubiquitous, you can't rely on InnoDB being available on a mysql server (when it comes to shared hosting).

InnoDB's quite recent addition of fulltext indexing is also not that good compared to MyISAM's.
~~~
EEVBlog Members - get yourself 10% discount off all my electronic components for sale just use the Buy Direct links and use Coupon Code "eevblog" during checkout.  Shipping from New Zealand, international orders welcome :-)
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26891
  • Country: nl
    • NCT Developments
Re: Forum Outage
« Reply #87 on: November 22, 2013, 11:52:04 pm »
But really, this is the eevblog forum, not a bank. How much effort and expense should I really go to ensure for every contingency?

As someone else said, thank you for the sanity.
Well I'd like to hear anyone's boss after someone's car broke down for the third time this week  >:D Of course Dave's website is not a bank but still the website and its forum are part of Dave's income. If it were a bank he would have everything fully redundant down to the telephone lines (been there done that). I also depend on my gear for my income so I have 2 scopes, 2 soldering irons, a spare computer, backups and several other things I'd need to keep my business going. Then again I don't have a second house in case the current one burns down.
Quote
compromised and coolant seeped in and destroyed an optical encoder.  A day to diagnose the problem, a day or two to get a replacement part, and a day to R&R the machine and get it back up and running.

We were running some parts for a group buy for customers, and when told that we would be delayed by a few days, some of the responses were borderline comical.  Some suggested we weren't a serious business because we obviously had no contingency plan.  Some flamed us for having a single point of failure.  Most "got it" and realized that shit happens.
So you are not too serious about your customers then?  ;D When running a business you sometimes end up between a rock and a hard place to get a deal to go through. A couple of months ago one of my customers could get a very good deal but only if their equipment would pass CE testing within two weeks. To get it through CE testing I had to order some parts from Farnell which I really needed the next day. Unfortunately Farnell (for the first time) forgot to put the parts in the envelope so I got nothing. Some parts worth 20 cents suddenly became a potential deal breaker. Fortunately these parts were generic so I could buy them in a local shop.

The same goes for your customers. It may be that the parts you delayed by a few days could potentially cost them a lot of business so I do sympathise with their frustration.
« Last Edit: November 22, 2013, 11:55:36 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline EEVblogTopic starter

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #88 on: November 23, 2013, 12:03:49 am »
Of course Dave's website is not a bank but still the website and its forum are part of Dave's income.

Having issues like this even a few times a year does not really affect my income in any way.
It's only a long term thing, which I have had issues with before I moved to a dedicated server. i.e. the forum and website are slow and unreliable, and that eventually turns people away.
That is not the issue here, it's a simple major disruption issue that can happen with any host. Yes, if it keeps happening over a matter of weeks or months, then I need to look elsewhere, but only a fool would jump hosts every time there is an issue.
Once again, people conveniently forget how reliably this server has been running for the last couple of years.

 

Offline Mark_O

  • Frequent Contributor
  • **
  • Posts: 939
  • Country: us
Re: Forum Outage
« Reply #89 on: November 23, 2013, 12:20:10 am »
And really, if the forum is down for a few hours (rare)

Hours?!   :-BROKE  Oh, no!

Quote
...or even a few days (never happened), is it the end of the world?

Are you kidding?  Definitely!   :scared:   :scared:  Panic!  Panic! 

I think you may not realize how dependent some of your members are on their daily 'fix' here.  It's like an addiction.  They click their favs link, and... it's not there!  OMG!!   :'(   :'(  then  >:(  then  :rant:

It's rather amusing.

« Last Edit: November 23, 2013, 12:26:37 am by Mark_O »
 

Offline Mark_O

  • Frequent Contributor
  • **
  • Posts: 939
  • Country: us
Re: Forum Outage
« Reply #90 on: November 23, 2013, 12:24:25 am »
Once again, people conveniently forget how reliably this server has been running for the last couple of years.

Yes, it is convenient to forget the reliable times. 

However, this also explains the shock when things do go wrong.  If it had problems constantly, folks would be used to that, and just shrug off losing access for a few hours.  But when it's always there, and reliably so, day after day, month after month, and so forth, an outage becomes a surprising anomaly.

Resulting in 7 pages of discussions.  :)
 

Offline EEVblogTopic starter

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #91 on: November 23, 2013, 12:33:21 am »
Reply from HostGator:
Quote
Please know that the occurrences over the past few days have been isolated to the issue that caused our network outage, and we do not foresee any further issues once all of the current issues are resolved. We have taken measure not only to restore all of our servers, but have introduced new redundancy measures in order to prevent a future issue of the same nature. Thank you so very much for your patience while we restored everything to working order.
 

Offline Maxlor

  • Frequent Contributor
  • **
  • Posts: 565
  • Country: ch
Re: Forum Outage
« Reply #92 on: November 23, 2013, 12:35:09 am »
Yes, if it keeps happening over a matter of weeks or months, then I need to look elsewhere, but only a fool would jump hosts every time there is an issue.
Once again, people conveniently forget how reliably this server has been running for the last couple of years.
I had similar issues with a hosting company a few years ago, a month filled with interruptions and crashes because of a design fault in their setup. I stuck with them though. They fixed the problem, improved their procedures, and have had a perfect record since according to my NAGIOS. Judging by their announcements, they do a lot more testing before any kind of change these days.

What I'm saying is, your hosting company will probably end up being better (more reliable) once this is over, but without costing you more. I'd hold out too, for a while longer.

As for database corruption – it shouldn't happen, even when the computer crashes. Databases that allow it to happen fail one of their core tasks. So... if I have a choice, I pick something other than MySQL :)
 

Offline iceisfun

  • Regular Contributor
  • *
  • Posts: 140
  • Country: us
Re: Forum Outage
« Reply #93 on: November 23, 2013, 12:37:43 am »
Reply from HostGator:
Quote
Please know that the occurrences over the past few days have been isolated to the issue that caused our network outage, and we do not foresee any further issues once all of the current issues are resolved. We have taken measure not only to restore all of our servers, but have introduced new redundancy measures in order to prevent a future issue of the same nature. Thank you so very much for your patience while we restored everything to working order.

Just make sure you're taking off-site backups of the database data. It's pretty easy to set up a mysqldump and rsync to a remote location, and the day you need it you will be glad you took the time to set it up.
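
Something along these lines would do, as a minimal Python sketch rather than a finished backup script. The dump path and the remote target are made-up examples, and it assumes mysqldump can pick up its credentials from an option file such as `~/.my.cnf`:

```python
import subprocess
from datetime import date


def backup_commands(dump_path, remote_target):
    """Return the two command lines: dump everything, then copy it off site."""
    dump = ["mysqldump", "--all-databases", f"--result-file={dump_path}"]
    sync = ["rsync", "-az", dump_path, remote_target]
    return [dump, sync]


def run_backup(dump_path, remote_target):
    for cmd in backup_commands(dump_path, remote_target):
        # check=True stops at the first failure instead of syncing a bad dump.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical path and host; this only prints the commands for review.
    path = f"/var/backups/forum-{date.today().isoformat()}.sql"
    for cmd in backup_commands(path, "backup@offsite.example.com:forum/"):
        print(" ".join(cmd))
```

Calling `run_backup(...)` from a nightly cron job is the obvious next step. With MyISAM tables the dump is only consistent if the tables are locked or the forum is quiet while it runs.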


 

Offline EEVblogTopic starter

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #94 on: November 23, 2013, 01:31:11 am »
My reading of the Hostgator reply reveals the critical statement "introduced new redundancy measures in order to prevent a future issue of the same nature".
If they can introduce these "new" measures so quickly and easily then WHY weren't they in place already?

Because almost certainly what they really mean is "we plan on introducing". They haven't done it yet.
 

Offline JoannaK

  • Frequent Contributor
  • **
  • Posts: 336
  • Country: fi
    • Diytao making blog
Re: Forum Outage
« Reply #95 on: November 23, 2013, 02:03:06 am »
And really, if the forum is down for a few hours (rare), or even a few days (never happened), is it the end of the world?
Heh, reminds me of this.  ;D



Life?  :wtf:
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26891
  • Country: nl
    • NCT Developments
Re: Forum Outage
« Reply #96 on: November 23, 2013, 02:20:58 am »
My reading of the Hostgator reply reveals the critical statement "introduced new redundancy measures in order to prevent a future issue of the same nature".

It isn't a question (well, here it is) of how necessary or important the EEVBLOG forum server uptime is. These datacenters host on the same server racks servers for businesses where downtime can have a very severe impact.

Banks, Telcos and Insurance companies can expect and demand continuous availability (barring natural disasters) year in year out.
Those kinds of businesses usually have SLAs in place with severe penalties, so their websites are run on different hardware and separate UPSes with dual power feeds. SLAs come with a price tag though, so many hosting providers don't provide them by default. That said, the hosting provider I've been using for over a decade offers a minimal SLA of 99.8% (less than 18 hours of downtime per year) or money back on all their hosting services, even the ones they charge €6 per month for.
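
For anyone who wants to check the arithmetic: 99.8% of a year leaves roughly 17.5 hours of permitted downtime. A small Python sketch, assuming a 365-day year and counting all downtime the same (real SLAs often exclude planned maintenance):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours per year a service may be down and still meet the given SLA."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.8, 99.9, 99.999):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}% -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

The same function shows why "five nines" is a different league: it leaves only a few minutes per year.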
« Last Edit: November 23, 2013, 02:23:05 am by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Corporate666

  • Supporter
  • ****
  • Posts: 2008
  • Country: us
  • Remember, you are unique, just like everybody else
Re: Forum Outage
« Reply #97 on: November 23, 2013, 02:56:40 am »
Well I'd like to hear anyone's boss after someone's car broke down for the third time this week  >:D Of course Dave's website is not a bank but still the website and its forum are part of Dave's income. If it were a bank he would have everything fully redundant down to the telephone lines (been there done that). I also depend on my gear for my income so I have 2 scopes, 2 soldering irons, a spare computer, backups and several other things I'd need to keep my business going. Then again I don't have a second house in case the current one burns down.

That's sort of my point though... there is always a point of failure. 

Quote
So you are not too serious about your customers then?  ;D When running a business you sometimes end up between a rock and a hard place to get a deal to go through. A couple of months ago one of my customers could get a very good deal but only if their equipment would pass CE testing within two weeks. To get it through CE testing I had to order some parts from Farnell which I really needed the next day. Unfortunately Farnell (for the first time) forgot to put the parts in the envelope so I got nothing. Some parts worth 20 cents suddenly became a potential deal breaker. Fortunately these parts were generic so I could buy them in a local shop.

The same goes for your customers. It may be that the parts you delayed by a few days could potentially cost them a lot of business so I do sympathise with their frustration.

I would say the correct business decision is always to weigh the costs against the benefits, as with any business decision.  I could buy a whole new CNC machine and run two machines at 50% capacity, but considering the machines cost well into 6 figures, it would be very hard to justify that.  Will I make more money by investing in that second machine?  No.  Sure, when something goes wrong, customers can be upset... but that doesn't justify the expense.  In your Farnell example, you could have ordered multiple parts from multiple suppliers to be delivered to multiple locations (home and work), if it was so critical to pay for next-day from Farnell.  But as always, it comes down to cost vs. benefit.
« Last Edit: November 23, 2013, 04:07:48 am by Corporate666 »
It's not always the most popular person who gets the job done.
 

Offline walshms

  • Regular Contributor
  • *
  • Posts: 183
  • Country: us
Re: Forum Outage
« Reply #98 on: November 23, 2013, 03:33:01 am »
mandatory eye roll 'linux'  tssss...
Unlike the monthly scheduled Windows reboot you mean. Can be mighty inconvenient on that second Tuesday when all the IIS servers do a simultaneous reboot. Linux server that does the real work stays up for months until you turn it off because the fans need cleaning or the hardware fails.
Linux itself is stable... so stable that NASA relies on it almost exclusively, and the ISS is running nothing but Linux today; before the Shuttles stopped going there, they pulled all of the Windows machines off.
Hearing or reading folks complain about Windows says little about Windows and a lot about the person doing the talking or typing.

I could say one hell of a lot about Windows, actually.  From its internal kernel architecture right on up.  I was there at the beginning (Windows 1.0), and I've been supporting Windows servers since the first release of Windows Server.  Microsoft made mistakes in the kernel a long, long time ago and because of their insistence that they not break anything with future releases unless it was absolutely necessary, that architecture never actually changed.  Now they're actually trapped -- and unless they are willing to pull the plug on it and start fresh, they'll never fix it.

People far smarter than I am have pretty clear positions on it too, and they wholly agree that Windows is badly designed.  When it comes right down to it, when big business or life depends on it, Windows isn't the OS of choice.

Vince can roll his eyes, and you can try to defend it as well, but those of us who truly understand OS architecture and how to develop them wouldn't choose to use it when we have that choice to make.  We'd rather stick to Linux (and VxWorks, when it's appropriate.)

Maybe you could dig a little bit deeper... if you did, you might find yourself rethinking your position.
 

Offline Rigby

  • Super Contributor
  • ***
  • Posts: 1476
  • Country: us
  • Learning, very new at this. Righteous Asshole, too
Re: Forum Outage
« Reply #99 on: November 23, 2013, 04:03:10 am »
I could say one hell of a lot about Windows, actually.  From its internal kernel architecture right on up.  I was there at the beginning (Windows 1.0), and I've been supporting Windows servers since the first release of Windows Server.  Microsoft made mistakes in the kernel a long, long time ago and because of their insistence that they not break anything with future releases unless it was absolutely necessary, that architecture never actually changed.  Now they're actually trapped -- and unless they are willing to pull the plug on it and start fresh, they'll never fix it.

People far smarter than I am have pretty clear positions on it too, and they wholly agree that Windows is badly designed.  When it comes right down to it, when big business or life depends on it, Windows isn't the OS of choice.

Vince can roll his eyes, and you can try to defend it as well, but those of us who truly understand OS architecture and how to develop them wouldn't choose to use it when we have that choice to make.  We'd rather stick to Linux (and VxWorks, when it's appropriate.)

Maybe you could dig a little bit deeper... if you did, you might find yourself rethinking your position.

Poor design is one thing, poor usability another, and whether or not Windows was designed badly is arguable; anyone can pick a few things about any operating system and declare that entire operating system faulty.  Windows' market position says all anyone will ever need to hear about big business' trust in it, so your point there is completely without merit.  There are also a great many hospital systems run on Windows, so your entire "big business or life depends on it" argument is gone.

Windows is almost unilaterally the OS of choice for anyone that needs to get something done quickly.  You can argue technical merit all the live long day, but that doesn't change a thing.  All operating systems have plenty of technical faults.  Sound on Linux is STILL a giant pain in the ass.  Fonts in X STILL look like shit.  X in general is STILL a giant architectural nightmare.  We can each go on and on about whatever we want and say that's ample evidence that our point is more valid, but the proof lies in the pudding as they say. 

When I stop seeing Windows CE EVERYWHERE in industry, then we can talk.  When I start seeing industrial software tools released on Linux, then we'll talk.  When I start seeing anything that isn't complete and utter Windows dominance in the places where time is money, then we'll talk.

For consumer electronics, I'll gladly hand the trophy to Linux when it comes to embedded devices. 
 

Offline WarSim

  • Frequent Contributor
  • **
  • Posts: 514
Forum Outage
« Reply #100 on: November 23, 2013, 04:35:15 am »
Yes, despite the fact that Windows has huge issues, marketing in the Microsoft camp has won out.
Back in the day, when most computer users were sheep, there was little hope for other OSs.

Today people know more about computers and make more informed choices for themselves.
Unfortunately people insist on lashing out at others who choose differently. 

Over the last 35 years I have experienced many impressive and many horrid OSs. 
As a Systems Analyst I had to know far too much about all of them.
In summary, every OS had a purpose; no OS has ever been the be-all and end-all for everyone.

Today there are about 18 OS choices for enterprise, 5 OS categories for embedded and 3 OS choices for consumer computing. 
Just pick the one you want and enjoy your choice. 
Attacking another's choice is just childish. 
Defending an OS by pointing at the flaws of another just seems silly and too easy to me. 

Despite the OS or DB used, any DB should be able to recover after host failure if the programmer does his job right.  Unfortunately solid recovery code only seems to be used in the financial sector these days.  IMHO it is just laziness.


Sent from my iPad using Tapatalk
 

Offline walshms

  • Regular Contributor
  • *
  • Posts: 183
  • Country: us
Re: Forum Outage
« Reply #101 on: November 23, 2013, 06:09:08 am »
Windows' market position says all anyone will ever need to hear about big business' trust in it, so your point there is completely without merit.

Your opinion, no more valid than mine.  On market dominance: Windows Server is actually in the minority worldwide.  You might want to check that.  On the desktop -- yes, no competition.  On servers?  No, sorry, you'd be wrong.

Ask Google, Facebook, Yahoo, LinkedIn... hell, ask any large-scale service provider anywhere in the world what they're running and they'll tell you it's Linux.  Dave's server is running on Linux, as are most of the virtual hosts in the world, whether they're running on XenSource, Xen, KVM or VMware.   Take a look at the Top500 list sometime.  See what dominates it.  Technical merit *is* the reason, and the list grows day by day.  IBM and HP rely on it internally, and IBM's Z196 midrange is designed to run it.

Quote
There are also a great deal of hospital systems run on Windows, so your entire "big business or life depends on it" argument is gone.

My wife works for one of the largest hospital systems around, and I can tell you, she (and the rest of the IT team supporting them) wishes they could get rid of Windows.  The problem is the software, not the OS; the software hasn't been ported to Linux on the whole.  It's just a matter of time... it will be.  NASA, ESA and in fact most of the rest of the space industry went to Linux because they couldn't take the downtime anymore.

Quote
Sound on Linux is STILL a giant pain in the ass.  Fonts in X STILL look like shit.  X in general is STILL a giant architectural nightmare.

When you can show me what *servers* require sound, or for that matter even a GUI, I'll be happy to have that conversation with you.  In the meantime, I can only presume you're talking about desktops, and that's just not where the effort has been invested.

Granted, I wish some additional effort were invested there.

Quote
When I start seeing anything that isn't complete and utter Windows dominance in the places where time is money, then we'll talk.

NASDAQ, NYSE, DAX, FTSE... need I go on?

I think you really need to catch up.  Really not trying to tweak you here, but... honestly, you need to spend a little time and catch up to events.
 

Offline walshms

  • Regular Contributor
  • *
  • Posts: 183
  • Country: us
Re: Forum Outage
« Reply #102 on: November 23, 2013, 06:20:29 am »
Despite the OS or DB used, any DB should be able to recover after host failure if the programmer does his job right.  Unfortunately solid recovery code only seems to be used in the financial sector these days.  IMHO it is just laziness.

Actually, it's already there, and it's easy enough to implement.  A stock MySQL install today is robust enough, as long as you have a battery-backed cache on your controller and UPS monitoring.  All of our customers are set up this way, and none has ever had a database crash because of a power outage, or for that matter even a power supply failure (which a UPS can't help you with.)

Virtually every HP server sold is sold with a battery-backed cache, for example; it's nothing more than a tiny rechargeable cell pack that mounts inside the server and is connected to the controller.  Takes about three hours to fully charge, and it's good to keep the cache valid for about 36 hours in a power failure.

Heck, even Microsoft SQL Server will survive that as long as it's on similar hardware and similarly configured.   ;D
 

Offline jancumps

  • Supporter
  • ****
  • Posts: 1272
  • Country: be
  • New Low
Re: Forum Outage
« Reply #103 on: November 23, 2013, 08:33:08 am »
...
Quote
..I had to order some parts from Farnell which I really needed the next day. Unfortunately Farnell (for the first time) forgot to put the parts in the envelope so I got nothing. Some parts worth 20 cents suddenly became a potential deal breaker. Fortunately these parts were generic so I could buy them in a local shop.

...
...
 In your Farnell example, you could have ordered multiple parts from multiple suppliers to be delivered to multiple locations (home and work), if it was so critical to pay for next-day from Farnell.  But as always, it comes down to cost vs. benefit.

This example shows how difficult and costly it can be to get everything right. The Farnell deal has several non-guaranteed services embedded. The parcel service could have lost it, there could have been an accident, ...

But that was mitigated (maybe by luck, maybe by calculation on your part; the word "fortunately" suggests it was luck) by the fact that the incident was easily resolved because the parts were generally available in your neighbourhood. What if there had been a harder-to-find part in the order?
« Last Edit: November 23, 2013, 08:35:32 am by jancumps »
 

Offline AndersAnd

  • Frequent Contributor
  • **
  • Posts: 572
  • Country: dk
Re: Forum Outage
« Reply #104 on: November 23, 2013, 12:02:17 pm »
In your Farnell example, you could have ordered multiple parts from multiple suppliers to be delivered to multiple locations (home and work), if it was so critical to pay for next-day from Farnell.  But as always, it comes down to cost vs. benefit.
And order different brands in case there was a production fault in products from one manufacturer. That is, if alternative brands exist; otherwise you could also be in trouble if the manufacturer runs out of stock, or the factory is flooded, burns down or something. Because of this, some companies try to design only with parts where there's more than one manufacturer to choose from. That's not always easy, especially for the more specialized ICs.
 

Offline Rigby

  • Super Contributor
  • ***
  • Posts: 1476
  • Country: us
  • Learning, very new at this. Righteous Asshole, too
Re: Forum Outage
« Reply #105 on: November 23, 2013, 02:06:31 pm »
Windows' market position says all anyone will ever need to hear about big business' trust in it, so your point there is completely without merit.

Your opinion, no more valid than mine.  On market dominance: Windows Server is actually in the minority worldwide.  You might want to check that.  On the desktop -- yes, no competition.  On servers?  No, sorry, you'd be wrong.

Ask Google, Facebook, Yahoo, LinkedIn... hell, ask any large-scale service provider anywhere in the world what they're running and they'll tell you it's Linux.  Dave's server is running on Linux, as are most of the virtual hosts in the world, whether they're running on XenSource, Xen, KVM or VMware.   Take a look at the Top500 list sometime.  See what dominates it.  Technical merit *is* the reason, and the list grows day by day.  IBM and HP rely on it internally, and IBM's Z196 midrange is designed to run it.

Quote
There are also a great deal of hospital systems run on Windows, so your entire "big business or life depends on it" argument is gone.

My wife works for one of the largest hospital systems around, and I can tell you, she (and the rest of the IT team supporting them) wishes they could get rid of Windows.  The problem is the software, not the OS; the software hasn't been ported to Linux on the whole.  It's just a matter of time... it will be.  NASA, ESA and in fact most of the rest of the space industry went to Linux because they couldn't take the downtime anymore.

Quote
Sound on Linux is STILL a giant pain in the ass.  Fonts in X STILL look like shit.  X in general is STILL a giant architectural nightmare.

When you can show me what *servers* require sound, or for that matter even a GUI, I'll be happy to have that conversation with you.  In the meantime, I can only presume you're talking about desktops, and that's just not where the effort has been invested.

Granted, I wish some additional effort were invested there.

Quote
When I start seeing anything that isn't complete and utter Windows dominance in the places where time is money, then we'll talk.

NASDAQ, NYSE, DAX, FTSE... need I go on?

I think you really need to catch up.  Really not trying to tweak you here, but... honestly, you need to spend a little time and catch up to events.

quit changing your argument.  i started with end-user systems, you brought up architecture and business.  i address architecture and business, you switch to servers.  pick a damn point and stick to it.

we'll just disagree to agree.  have fun on slashdot.
 

Offline WarSim

  • Frequent Contributor
  • **
  • Posts: 514
Forum Outage
« Reply #106 on: November 23, 2013, 03:38:41 pm »

Despite the OS or DB used, any DB should be able to recover after host failure if the programmer does his job right.  Unfortunately solid recovery code only seems to be used in the financial sector these days.  IMHO it is just laziness.

Actually, it's already there, and it's easy enough to implement.  A stock MySQL install today is robust enough, as long as you have a battery-backed cache on your controller and UPS monitoring.  All of our customers are set up this way, and none has ever had a database crash because of a power outage, or for that matter even a power supply failure (which a UPS can't help you with.)

Virtually every HP server sold is sold with a battery-backed cache, for example; it's nothing more than a tiny rechargeable cell pack that mounts inside the server and is connected to the controller.  Takes about three hours to fully charge, and it's good to keep the cache valid for about 36 hours in a power failure.

Heck, even Microsoft SQL Server will survive that as long as it's on similar hardware and similarly configured.   ;D

Yes, MySQL is a proper DB that has the ability to recover from a crash if the programmer codes the ability to do so.  My comment was in no way a MySQL bash.  It was a disappointment that the forum's code was not able to recover faster and instill more confidence in the user.



 

alm

  • Guest
Re: Forum Outage
« Reply #107 on: November 23, 2013, 03:44:56 pm »
Yes, MySQL is a proper DB that has the ability to recover from a crash if the programmer codes the ability to do so.  My comment was in no way a MySQL bash.  It was a disappointment that the forum's code was not able to recover faster and instill more confidence in the user.
What are you talking about? The MyISAM storage engine is not robust or ACID compliant, but is the only one with properly implemented full text search (even today). How is this the programmer's or even the forum software's fault? How is the programmer supposed to write robust code if the storage engine does not support transactions and makes START TRANSACTION a no-op?
 

Offline arekm

  • Supporter
  • ****
  • Posts: 165
  • Country: pl
Re: Forum Outage
« Reply #108 on: November 23, 2013, 04:12:33 pm »
What are you talking about? The MyISAM storage engine is not robust or ACID compliant, but is the only one with properly implemented full text search (even today). How is this the programmer's or even the forum software's fault?

Current MySQL + InnoDB gives ACID and full text search. If you don't trust InnoDB for full text search then use Sphinx (mentioned in SMF performance hints).

So you see, it is possible to make improvements to this forum but the problem is that someone has to do it.

PS: the SMF forum itself uses Sphinx:
http://www.simplemachines.org/community/index.php?topic=203615.msg2961417#msg2961417
and there seems to be a hidden "big forum operator" area on the SMF forum (http://www.simplemachines.org/community/index.php?topic=418993.0). The Sphinx plugin should be available there.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7754
  • Country: de
  • A qualified hobbyist ;)
Re: Forum Outage
« Reply #109 on: November 23, 2013, 04:46:35 pm »
Banks, Telcos and Insurance companies can expect and demand continuous availability (barring natural disasters) year in year out.
Those kinds of businesses usually have SLAs in place with severe penalties, so their websites are run on different hardware and separate UPSs with dual power feeds. SLAs come with a price tag though, so many hosting providers don't provide them by default. That said, the hosting provider I've been using for over a decade offers a minimal SLA of 99.8% (less than 18 hours of downtime per year) or money back on all their hosting services, even the ones they charge €6 per month for.

... in different datacenters as a complete datacenter can fail too. The usual power setup for a datacenter is:
- two HV feeds with substations in-house (we're talking about MWs!)
- large UPS (maybe multiple) for short outages and giving the diesel engines time to start and the generators time to settle (good diesel generators can start and deliver power in about 10-20s, diesels are pre-heated)
- if there's a -48VDC supply for telco equipment it's buffered by batteries (usually much longer runtime than the AC UPS)
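
As a quick sanity check of the 99.8% SLA figure quoted above, here is a small Python sketch that converts an availability percentage into allowed downtime per year. The function name is my own and it assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return how many hours per year a service may be down under an SLA."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    for sla in (99.8, 99.9, 99.99):
        print(f"{sla}% availability -> {allowed_downtime_hours(sla):.2f} hours/year")
```

For 99.8% this works out to roughly 17.5 hours, consistent with the "less than 18 hours" quoted.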
 

Offline WarSim

  • Frequent Contributor
  • **
  • Posts: 514
Forum Outage
« Reply #110 on: November 23, 2013, 05:00:55 pm »

Yes, MySQL is a proper DB that has the ability to recover from a crash if the programmer codes the ability to do so.  My comment was in no way a MySQL bash.  It was a disappointment that the forum's code was not able to recover faster and instill more confidence in the user.
What are you talking about? The MyISAM storage engine is not robust or ACID compliant, but is the only one with properly implemented full text search (even today). How is this the programmer's or even the forum software's fault? How is the programmer supposed to write robust code if the storage engine does not support transactions and makes START TRANSACTION a no-op?
ISAM is an accessor option which the programmer selected.  Full stop.


 

Offline daveshah

  • Supporter
  • ****
  • Posts: 356
  • Country: at
    • Projects
Re: Forum Outage
« Reply #111 on: November 23, 2013, 05:08:13 pm »
ISAM is an accessor option which the programmer selected.  Full stop.
Unfortunately, if you want fully working fulltext searches it is the only option (although InnoDB is now starting to support full-text searches, software takes time to transition.)
 

Offline WarSim

  • Frequent Contributor
  • **
  • Posts: 514
Forum Outage
« Reply #112 on: November 23, 2013, 05:15:13 pm »
I have noticed that most people are talking about UPS to avoid the problem, instead of recovering from such issues. 
I guess it is a preference. 
I know power failures occur and some last longer than the UPS capabilities.
I worked in places that had UPS to span to weeks of generator power to span to mobile casually power.  Even in these cases recovery from crashes had to be safeguarded to ensure a minimum of 20 sec to availability and 15 min for full restore, ready for the next event.  Apparently most of the good coding practices used there are not applicable here, which I find disappointing.
Yes these clusters are a mixture of UNIXies and Linux.  No Windows because it is not capable of the required security, not because of the DB options. 


 

Offline walshms

  • Regular Contributor
  • *
  • Posts: 183
  • Country: us
Re: Forum Outage
« Reply #113 on: November 23, 2013, 09:16:59 pm »
quit changing your argument.  i started with end-user systems, you brought up architecture and business.  i address architecture and business, you switch to servers.  pick a damn point and stick to it.

The entire thread was about servers, friend... really, it started out with Dave's server being down, and that's all I've been talking about all along.  If the subject got changed, it wasn't by me.

Edit: fix quoting
« Last Edit: November 23, 2013, 09:23:01 pm by walshms »
 

Offline walshms

  • Regular Contributor
  • *
  • Posts: 183
  • Country: us
Re: Forum Outage
« Reply #114 on: November 23, 2013, 09:22:16 pm »
I have noticed that most people are talking about UPS to avoid the problem, instead of recovering from such issues. 
I guess it is a preference. 

For my part, it's more than just having a UPS... it's monitoring the UPS so you know when the power is out, and you shut down the database gracefully.

Quote
Apparently most of the good coding practices used there are not applicable here, which I find disappointing. 
Yes these clusters are a mixture of UNIXies and Linux.  No Windows because it is not capable of the required security, not because of the DB options. 

Good coding only takes you so far, as important as it is.  If power loss is imminent, you need to shut down the DB to preserve integrity -- no matter whose DB you're using.
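
A minimal sketch of the decision a UPS monitor has to make, assuming the monitoring daemon can report whether the UPS is on battery and how much runtime is left. The class, function and default timings here are hypothetical, not any particular UPS tool's API.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Hypothetical snapshot of what a UPS monitor reports."""
    on_battery: bool
    runtime_remaining_s: float


def should_shut_down_database(
    status: UpsStatus,
    shutdown_duration_s: float = 120.0,  # assumed time a clean DB stop takes
    safety_margin_s: float = 180.0,      # assumed extra headroom
) -> bool:
    """Return True once waiting any longer risks losing power mid-shutdown."""
    if not status.on_battery:
        return False  # mains power is present, nothing to do
    return status.runtime_remaining_s <= shutdown_duration_s + safety_margin_s


if __name__ == "__main__":
    print(should_shut_down_database(UpsStatus(on_battery=True, runtime_remaining_s=240.0)))
```

The point is that the database is stopped while there is still enough battery left to finish stopping it, rather than when the battery is already empty.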

Even Windows can be secured.  Unplug the network and lock the door.  :-DD
 

Offline WarSim

  • Frequent Contributor
  • **
  • Posts: 514
Forum Outage
« Reply #115 on: November 23, 2013, 11:20:04 pm »


Good coding only takes you so far, as important as it is.  If power loss is imminent, you need to shut down the DB to preserve integrity -- no matter whose DB you're using.

Even Windows can be secured.  Unplug the network and lock the door.  :-DD

I think good coding practices was the wrong term for me to use. 
I was referring to systems that need to operate right up to the point of power loss.  Shutting down beforehand is not an option.  Since this is a special case, good coding practices is not the right phrase in this context.
Just like most systems are allowed to stall if a RAID block is ripped out.

I would still like to see more robust methods used in all sectors, but accept that it is not the norm. 



 

Offline gnif

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: Forum Outage
« Reply #116 on: November 25, 2013, 12:47:18 am »
Ok, time to weigh in with what has been done and why, and what has caused these outages.

* The server is a dedicated server... with a twist. The actual server is a virtual machine on a dedicated server, it has 100% of the server's resources at its disposal and is not shared. The reason for this is the provider Dave chose has insisted that it be configured this way and will not allow us to run normally on the hardware. Their reason is so that if a hardware failure occurs, they can just move the virtual system to a new host and boot it up. IMO this reason is faulty since those involved with Linux know that it is not like windows, you can just move it to new hardware and short of having to adjust a few things, like fstab entries, or ethernet configuration, it will work.

* Because the server is virtual, there is another potential point of data loss, when IOs are pending in the virtual layer between the virtual machine and the physical hardware. Soft buffers if you will. There is nothing we can do about this short of move to another DC that will allow us to have full control over the dedicated hardware.

* The database is using MyISAM for most of its tables. We experimented earlier on with converting some of the larger tables to InnoDB to improve performance and reliability, which did help, and most of the critical tables are now on InnoDB. However, on tables that are thrashed, such as the 'users online' table, conversion to InnoDB caused performance penalties.

* The tables that are crashing are the low priority ones that only store temp data, such as the online table, again because these tables are thrashed. I may look at turning these into heap/memory tables to avoid this in future (ie, reboots will erase their contents).

* The provider does indeed have backup power hardware, but due to a fault in wiring, they had a catastrophic failure which caused them to have to re-wire a large portion of the data centre's backup systems. This affected multiple providers, not just Dave's, some of the biggest names in hosting were also taken offline, causing outages for tens of thousands of clients according to the report.

* Adding further redundancy by means of a UPS in the rack would be insanely expensive, firstly for the cost of the rack space, and then the hardware. I doubt that the provider would even allow Dave to do this with the package he currently has. While an outage of a few hours is a pain in the ass, it is sometimes unavoidable unless you have redundant servers and low TTL on your DNS, or a reverse proxy in another physical location (which also provides a point of failure). Every option here drives the cost up.
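
The storage engine plan described in the points above (critical tables on InnoDB, thrashed temp-data tables on heap/memory) could be expressed roughly like this. It is only a sketch: the table names and role labels are invented for illustration and are not the forum's actual schema. The generated `ALTER TABLE ... ENGINE=` statements are standard MySQL syntax.

```python
import re

# Assumed mapping from a table's role to the engine discussed in the post.
ENGINE_FOR_ROLE = {"critical": "InnoDB", "temporary": "MEMORY"}

_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def engine_migration_statements(table_roles: dict) -> list:
    """Build ALTER TABLE statements for tables whose role implies an engine.

    Tables with any other role are left on their current engine.
    """
    statements = []
    for table, role in table_roles.items():
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unsafe table name: {table!r}")
        engine = ENGINE_FOR_ROLE.get(role)
        if engine is None:
            continue  # no change planned for this table
        statements.append(f"ALTER TABLE `{table}` ENGINE={engine};")
    return statements


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    plan = {
        "forum_messages": "critical",
        "forum_log_online": "temporary",
        "forum_settings": "unchanged",
    }
    for statement in engine_migration_statements(plan):
        print(statement)
```

Two caveats: MEMORY tables lose their contents on restart (which is the intent here), and they cannot hold BLOB or TEXT columns, so not every temp table will qualify.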

Dave's server is monitored 24/7, but since I provide this service to him as a favour, I cannot prioritise it over my paying clients or my family time. If outages are detected and I am available to fix the issue, downtime will normally be much shorter. Both outages, including the database problems, were detected, since I check every 5 minutes that the forum and the website are loading correctly, but I was not available at the time to look into them.
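
For anyone curious what a periodic "is it loading" check looks like, here is a minimal sketch using only the Python standard library. It is not the actual monitoring setup; the URLs are placeholders and the function names are made up. A scheduler such as cron would call it every 5 minutes.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the site is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, etc.


def find_outages(urls, fetch=fetch_status) -> list:
    """Return the URLs that did not answer with HTTP 200."""
    return [url for url in urls if fetch(url) != 200]


if __name__ == "__main__":
    # Placeholder URLs; substitute the pages you actually want to watch.
    down = find_outages(["https://example.com/", "https://example.com/forum/"])
    print("outages:", down)
```

Note that a status check alone would not catch a page that returns 200 while displaying a database error, so a real check would also look for expected content in the response body.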

As for backups, I will discuss this with Dave. I should be able to accommodate him with an offsite backup to one of my spare servers in AU, again at no cost.
Edit: A MySQL slave is also being considered.

Edit 2: The outage was for tens of thousands according to the report.
http://newswire.net/newsroom/financial/00078336-bluehost-service-out.html
« Last Edit: November 25, 2013, 12:59:32 am by gnif »
 

Offline dr.diesel

  • Super Contributor
  • ***
  • Posts: 2214
  • Country: us
  • Cramming the magic smoke back in...
Re: Forum Outage
« Reply #117 on: November 25, 2013, 01:05:03 am »
IMO this reason is faulty since those involved with Linux know that it is not like windows, you can just move it to new hardware and short of having to adjust a few things, like fstab entries, or ethernet configuration, it will work.

Not surprising how many people in the industry, are not really knowledgeable "in" the industry.

Thanks for the help and keep up the good work.

Offline Rigby

  • Super Contributor
  • ***
  • Posts: 1476
  • Country: us
  • Learning, very new at this. Righteous Asshole, too
Re: Forum Outage
« Reply #118 on: November 25, 2013, 01:41:52 am »
IMO this reason is faulty since those involved with Linux know that it is not like windows, you can just move it to new hardware and short of having to adjust a few things, like fstab entries, or ethernet configuration, it will work.

Not surprising how many people in the industry, are not really knowledgeable "in" the industry.

Right, as a former sysadmin I certainly see where they are coming from.  They have lots and lots of customers and they can't count on any given customer knowing what they're doing, even if it's clear that they do.  If the datacenter hadn't wired their stuff wrong, that is to say "if there was never going to be an unexpected loss of power to the server hardware," the provider's failover plan would be sound.  It also usually happens that the hardware a VM runs in is not the same hardware the VM's disk resides in, and often block-level de-duplication or other optimizations are done on the SAN which help out the provider.  This is partially why they insist on a virtual machine.
 

Offline dr.diesel

  • Super Contributor
  • ***
  • Posts: 2214
  • Country: us
  • Cramming the magic smoke back in...
Re: Forum Outage
« Reply #119 on: November 25, 2013, 01:49:26 am »
knowing what they're doing

If the datacenter hadn't wired their stuff wrong

Knowing what you're doing means extensive testing, multiple times, each and every contingency.  Clearly something these amateurs forgot to cross off the checklist.  As the former admin of hundreds of boxes, this mistake would have gotten me instantly fired in my line of work.


Offline gnif

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: Forum Outage
« Reply #120 on: November 25, 2013, 01:58:44 am »
IMO this reason is faulty since those involved with Linux know that it is not like windows, you can just move it to new hardware and short of having to adjust a few things, like fstab entries, or ethernet configuration, it will work.

Not surprising how many people in the industry, are not really knowledgeable "in" the industry.

Right, as a former sysadmin I certainly see where they are coming from.  They have lots and lots of customers and they can't count on any given customer knowing what they're doing, even if it's clear that they do.  If the datacenter hadn't wired their stuff wrong, that is to say "if there was never going to be an unexpected loss of power to the server hardware," the provider's failover plan would be sound.  It also usually happens that the hardware a VM runs in is not the same hardware the VM's disk resides in, and often block-level de-duplication or other optimizations are done on the SAN which help out the provider.  This is partially why they insist on a virtual machine.

In this instance the disks are physically in the host on an Adaptec RAID controller configured as a RAID-1 array, so that does not explain why the machine is a VM. Regarding customers not knowing what they are doing, yes, I completely understand that, but this basic package does not include any support for the software layer; you are expected to know what you are doing, or be willing to pay someone to do it for you.

Knowing what you're doing means extensive testing, multiple times, each and every contingency.  Clearly something these amateurs forgot to cross off the checklist.  As the former admin of hundreds of boxes, this mistake would have gotten me instantly fired in my line of work.

Agreed! The DC I use in AU (Host Networks) does load testing of their backup system every 3 months, which involves real testing, not dummy load, which is one of the main reasons I chose them for my mission critical infrastructure. A failure of that magnitude would cause heads to roll.
 

Offline Rigby

  • Super Contributor
  • ***
  • Posts: 1476
  • Country: us
  • Learning, very new at this. Righteous Asshole, too
Re: Forum Outage
« Reply #121 on: November 25, 2013, 02:42:11 am »
I'm sure someone's been fired over this, but it still happened, didn't it?  Threat of job loss doesn't prevent stupidity, it just makes for some half-assed "If I did that I'd have been fired" story later on.  There is no job I've ever had where a single mistake would have resulted in termination of employment.  There are plenty of intentional things you can do to get fired, but anyone that fires someone without any other cause than the single mistake deserves to go out of business.  Maybe this hosting company will.

The datacenter management entity needs to find out how the hell ALL the contractors, ALL the employees, everyone missed that mistake, perhaps implement some process to prevent it in the future, conduct a bit of training or something, and stop firing people for mistakes.  Fire the guy that steals all the 48V bus bar copper for his meth addiction, yes.  Let people learn from their mistakes.
 

Offline xrunner

  • Super Contributor
  • ***
  • Posts: 7513
  • Country: us
  • hp>Agilent>Keysight>???
Re: Forum Outage
« Reply #122 on: November 25, 2013, 03:01:30 am »
Well all I have to say is these outages are just unconscionable.

There are people blowing up test equipment that post here and we should not have to miss any of those posts.  >:(
I told my friends I could teach them to be funny, but they all just laughed at me.
 

Offline EEVblogTopic starter

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #123 on: November 25, 2013, 03:24:12 am »
As for backups, I will discuss this with Dave. I should be able to accommodate him with an offsite backup to one of my spare servers in AU, again at no cost.
Edit: A MySQL slave is also being considered.

Thanks.
I will be needing a solution like this shortly, because my current automated backup provider, siteautobackup.com, has said they are no longer going to offer the service. I'm not sure of the cutoff date though. I currently have two backups, daily and weekly. Both are full cPanel backups of all the files and databases. At present I would still rely upon HostGator to restore my basic server should the actual redundant hard drives fail, before I could reinstall the cPanel backup.

 

Offline EEVblogTopic starter

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #124 on: November 25, 2013, 03:30:37 am »
Agreed! The DC I use in AU (Host Networks) does load testing of their backup system every 3 months, which involves real testing, not dummy load, which is one of the main reasons I chose them for my mission critical infrastructure. A failure of that magnitude would cause heads to roll.

But that's the trick, no matter how much testing you do, you can never guarantee that it's going to be fail-safe when the time comes.
It's like equipment calibration, it's all about a confidence level. One test every 12 months saying everything is fine is one confidence level, one test every 3 months is a higher confidence level again, but it's not 100%. A bit of test gear can drift out of calibration one day after the cal test, and likewise, in both cases, a server backup failure could be lurking, ready to strike one day after the test.
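
To put rough numbers on the confidence-level idea, here is a small sketch under a deliberately simple assumption: faults appear at a constant average rate (an exponential model), which real backup systems will not follow exactly. The function name and the example rate are hypothetical.

```python
import math


def probability_fault_since_last_test(
    days_since_test: float, mean_days_between_faults: float
) -> float:
    """Chance that a hidden fault has developed since the last passing test,
    assuming faults arrive at a constant average rate."""
    if days_since_test < 0 or mean_days_between_faults <= 0:
        raise ValueError("days must be >= 0 and the mean interval must be > 0")
    return 1.0 - math.exp(-days_since_test / mean_days_between_faults)


if __name__ == "__main__":
    mean_interval = 3650.0  # assumed: one hidden fault per ten years on average
    for label, days in (("1 day", 1), ("3 months", 90), ("12 months", 365)):
        p = probability_fault_since_last_test(days, mean_interval)
        print(f"{label} after a passing test: {p:.2%} chance of a hidden fault")
```

Testing every 3 months keeps the worst-case exposure lower than testing every 12 months, but even one day after a passing test the probability is not zero.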
 

Offline gnif

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: Forum Outage
« Reply #125 on: November 25, 2013, 04:20:09 am »
The heap/memory tables seem ideal for the users online table. How many other tables are there that get corrupted and are they as suitable for heap/memory? Are they all just temp data tables?

Were the performance issues related to the time earlier this year when RAM was inadequate? Would you expect the increased RAM to accommodate the increase in memory use after converting the tables or will still more RAM be needed?

What are the pros/cons of a MySQL slave? Would it facilitate a faster recovery in the event of a primary database outage? Would you need an extra level of network infrastructure to detect and switch to the slave if an automated switchover were to occur?

The heap/memory tables will only take up a few hundred kilobytes, not enough to worry about. We can only convert the tables that have temporary data in them, and even then, we will only bother with tables that are very active. Performance was RAM related earlier this year; we have enough RAM free to allocate some to a heap table, so this won't be a problem.

A slave database is running on a different server, every time an update occurs, the update gets replicated across to the slave giving us a database backup that is current as of the crash/failure of the master server. This means that in the event of data loss, we can recover pretty much up to the second of the outage, instead of X hours ago when the last backup was performed. This is not the only function of a slave database instance, but it would suit Dave's requirements quite nicely. Obviously we will still need to perform daily backups of the database in case a bad SQL statement is executed which gets replicated to the slave.
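
A toy model of why both the slave and the daily backups are needed. This is plain Python, not MySQL replication itself, and all class and function names are invented for illustration: every statement applied to the master is also applied to the slave, including a bad one.

```python
import copy


class ToyDatabase:
    """A dictionary standing in for a database table."""

    def __init__(self):
        self.rows = {}

    def apply(self, statement):
        op, key, value = statement
        if op == "put":
            self.rows[key] = value
        elif op == "delete_all":
            self.rows.clear()
        else:
            raise ValueError(f"unknown operation: {op!r}")


class ReplicatedPair:
    """Master plus slave: every statement is replayed on both."""

    def __init__(self):
        self.master = ToyDatabase()
        self.slave = ToyDatabase()

    def execute(self, statement):
        self.master.apply(statement)
        self.slave.apply(statement)  # replication copies mistakes too


def demo():
    pair = ReplicatedPair()
    pair.execute(("put", 1, "first post"))
    nightly_backup = copy.deepcopy(pair.master.rows)  # point-in-time copy
    pair.execute(("put", 2, "second post"))
    slave_before_mistake = dict(pair.slave.rows)
    pair.execute(("delete_all", None, None))  # the bad SQL statement
    return slave_before_mistake, pair.slave.rows, nightly_backup


if __name__ == "__main__":
    before, after, backup = demo()
    print("slave before mistake:", before)
    print("slave after mistake: ", after)
    print("nightly backup:      ", backup)
```

Before the mistake the slave is fully up to date, which is what saves you after a crash. After the mistake the slave is as empty as the master, and only the older backup still has data.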
 

Offline EEVblogTopic starter

  • Administrator
  • *****
  • Posts: 37730
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #126 on: November 25, 2013, 05:24:30 am »
A slave database is running on a different server, every time an update occurs, the update gets replicated across to the slave giving us a database backup that is current as of the crash/failure of the master server. This means that in the event of data loss, we can recover pretty much up to the second of the outage, instead of X hours ago when the last backup was performed. This is not the only function of a slave database instance, but it would suit Dave's requirements quite nicely. Obviously we will still need to perform daily backups of the database in case a bad SQL statement is executed which gets replicated to the slave.

That sounds like the go.
How easy is it to set up a slave database?
 

Offline gnif

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: Forum Outage
« Reply #127 on: November 25, 2013, 05:48:56 am »
A slave database is running on a different server, every time an update occurs, the update gets replicated across to the slave giving us a database backup that is current as of the crash/failure of the master server. This means that in the event of data loss, we can recover pretty much up to the second of the outage, instead of X hours ago when the last backup was performed. This is not the only function of a slave database instance, but it would suit Dave's requirements quite nicely. Obviously we will still need to perform daily backups of the database in case a bad SQL statement is executed which gets replicated to the slave.

That sounds like the go.
How easy is it to set up a slave database?

It is a bit of a fiddle, but not too bad, I will omit further information about the configuration publicly for security reasons :).
 

