Author Topic: Forum Outage  (Read 55812 times)

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37738
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #50 on: November 21, 2013, 10:43:11 am »
Quote
Do you expect to hear from the hosting company as to the true root cause of the server outages?

Of course not!

Quote
Do you have options under consideration that can give you some fallback to another server or that can insulate you from database corruption? Backups are necessary but if you experience a prolonged outage you might be glad you had a plan B.

I have 2 other servers (1 shared, 1 cloud) and can reroute the domain name in a few hours and reinstall the backups with some work; it could probably be done in a day. It would not have the same performance, but it would get the site back up and running.

But really, this is the eevblog forum, not a bank. How much effort and expense should I really go to in order to cover every contingency?

 

Offline apelly

  • Supporter
  • ****
  • Posts: 1061
  • Country: nz
  • Probe
Re: Forum Outage
« Reply #51 on: November 21, 2013, 10:51:07 am »
But really, this is the eevblog forum, not a bank. How much effort and expense should I really go to in order to cover every contingency?

Ah, the sweet voice of sanity!
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37738
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #52 on: November 21, 2013, 11:07:44 am »
Move the whole thing to Amazon; it's all I am working on now, with superb results. Never any worry about hardware :)

Altium did that. Their website and forum famously go down every week (I'm not kidding).
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Forum Outage
« Reply #53 on: November 21, 2013, 11:12:42 am »
Maybe it's time to get serious about software and hardware.

How much more serious can I get about hardware? I already run a multi-hundred dollar/month dedicated top shelf server box with a top provider.
Arguing over which host provider is best and which hardware is best is pretty pointless; ask 100 experts (I have!) and you get 100 different answers.
Yes and no. The best way is to ask for hands-on experience with server parks and not their PCs at home. That usually reduces 100 opinions to about 3  ;) I put the best one can get on my list. That list is based on over 15 years of experience with setups which need to be very reliable.

Anyway, databases don't corrupt themselves. That is like software getting a mind of its own. The usual suspect is a power outage or bad hardware. I've come across several cases where bad hardware caused all kinds of subtle errors, like some particular software not working or large compilation runs failing every now and then. A couple of years ago it took me two days to find out which memory module was bad in one of my PCs. But without any hard evidence it's impossible to tell where and how a system goes wrong.

Just assume hardware isn't 100% perfect and one out of every trillion operations goes wrong. On a desktop PC this isn't much of a problem because it only runs for a couple of hours per day, and if it starts to behave oddly then you click again or restart it. A server, OTOH, is running 24/7, so faults can accumulate. It is therefore necessary for key software and hardware components to have some self-healing capabilities, like ECC memory and a database which adheres to the ACID rules.
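To make the ACID point concrete, here is a minimal sketch of atomicity. It uses Python's built-in sqlite3 module purely as a stand-in (it is not the database the forum runs on), and the table names and the `post_with_failure` helper are invented for illustration. A failure part-way through a transaction should leave no half-written data behind.

```python
import sqlite3


def post_with_failure(conn, fail):
    """Insert a post and bump a counter inside one transaction.

    If `fail` is True, a crash is simulated between the two writes.
    The transaction is rolled back, so neither write survives.
    """
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("INSERT INTO posts (body) VALUES (?)", ("hello",))
            if fail:
                raise RuntimeError("simulated crash mid-transaction")
            conn.execute("UPDATE stats SET post_count = post_count + 1")
    except RuntimeError:
        pass  # the rollback has already happened at this point


def counts(conn):
    """Return (number of posts, stored post counter)."""
    posts = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    counter = conn.execute("SELECT post_count FROM stats").fetchone()[0]
    return posts, counter


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE stats (post_count INTEGER)")
    conn.execute("INSERT INTO stats VALUES (0)")
    conn.commit()

    post_with_failure(conn, fail=True)
    after_failure = counts(conn)
    post_with_failure(conn, fail=False)
    after_success = counts(conn)
    conn.close()
    return after_failure, after_success
```

After the simulated crash, both the post count and the counter should still agree; only the successful run changes them, and it changes them together.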
Quote
And really, if the forum is down for a few hours (rare), or even a few days (never happened), is it the end of the world?
Uhhh... :'(
Quote
Quote
- Database system with journaling / automatic recovery (preferably a real database like PostgreSQL and most certainly not MySQL with MyISAM tables)

I believe I can only run what the SMF forum software uses?
It seems the SMF forum software also supports PostgreSQL.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline AndersAnd

  • Frequent Contributor
  • **
  • Posts: 572
  • Country: dk
Re: Forum Outage
« Reply #54 on: November 21, 2013, 11:16:50 am »
And really, if the forum is down for a few hours (rare), or even a few days (never happened), is it the end of the world?
Heh, reminds me of this.  ;D

« Last Edit: November 21, 2013, 12:27:58 pm by AndersAnd »
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37738
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #55 on: November 21, 2013, 11:27:22 am »
Yes and no. The best way is to ask for hands on experience with server parks. That usually reduces 100 opinions to about 3  ;)

Fraid not!
Before I got my dedicated server, I asked many people involved in various aspects of the industry (many fans who are also in the pro server industry offered pro advice), all with that kind of professional experience you also claim. I got more conflicting advice than I could poke a stick at. So in the end I went with what I knew and what seemed good enough. In the end most grudgingly agreed that what I got was more than good enough for the job.

Quote
I put the best one can get on my list. That list is based on over 15 years of experience with setups which need to be very reliable.


Quote
Anyway, databases don't corrupt themselves.

No one has said they do. The problem has been identified as a hard reset, likely caused by some form of power loss.

 

Offline MrRedHat

  • Supporter
  • ****
  • Posts: 31
  • Country: us
Re: Forum Outage
« Reply #56 on: November 21, 2013, 11:46:12 am »
Fraid not!

Before I got my dedicated server, I asked many people involved in various aspects of the industry (and many fans are also in the pro server industry that offered pro advice), all with that kind of professional experience you also claim. I got more conflicting advice than I could poke a stick at.

I’ve been doing IT work for a long time and one thing that is for sure....

Opinions are like assholes. Everybody's got one and everyone thinks everyone else's stinks. :)
 

Offline Towger

  • Super Contributor
  • ***
  • Posts: 1645
  • Country: ie
Re: Forum Outage
« Reply #57 on: November 21, 2013, 05:22:14 pm »
Anyway, databases don't corrupt themselves.

Aaagh but they can...

I once had that problem with a newish release of a well-known database, and they said it was impossible.  I managed to reproduce the problem in its simplest form and uploaded the source code to their forum.  I created a blank database, added one simple table with 4 or 5 fields (they had to be in the correct type/order), and added a unique index on one field.  I then added 3 records, and at this point it got interesting.  The next command was a simple SQL delete of one record on its unique primary key, then a simple Select * from the table... and lo and behold, only one record was left!  Several people who downloaded and tried it got the same result. When their own staff found it, they initially acknowledged they could reproduce the problem; the thread was then pulled from the forum, and an update was soon released which did not mention it was a bug fix for my discovery... The feckers never even emailed me back to say they had fixed it...
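The steps described above can be sketched as follows. The post does not name the database or the schema, so this uses Python's built-in sqlite3 module as a stand-in, and the table name, field names, and values are all hypothetical. On a correctly working engine, deleting one of three records leaves two; the bug described would have left only one.

```python
import sqlite3


def rows_after_delete():
    """Replay the described steps and return the rows that remain."""
    conn = sqlite3.connect(":memory:")  # a blank database

    # One simple table with five fields (names are made up).
    conn.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, code TEXT, name TEXT, qty INTEGER, note TEXT)"
    )
    # A unique index on one field.
    conn.execute("CREATE UNIQUE INDEX idx_items_code ON items (code)")

    # Add three records.
    conn.executemany(
        "INSERT INTO items (id, code, name, qty, note) VALUES (?, ?, ?, ?, ?)",
        [
            (1, "A", "first", 1, ""),
            (2, "B", "second", 2, ""),
            (3, "C", "third", 3, ""),
        ],
    )

    # Delete one record by its unique primary key.
    conn.execute("DELETE FROM items WHERE id = ?", (2,))
    conn.commit()

    # Select everything that is left.
    remaining = conn.execute("SELECT * FROM items ORDER BY id").fetchall()
    conn.close()
    return remaining
```

A tiny script like this is also the kind of minimal reproduction that is easy for others to download and confirm.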
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7764
  • Country: de
  • A qualified hobbyist ;)
Re: Forum Outage
« Reply #58 on: November 21, 2013, 06:11:48 pm »
I was curious about the outage and I found this link.
http://newswire.net/newsroom/financial/00078336-bluehost-service-out.html
There are probably dozens of similarly derived news stories.

How do you have power damage between the servers and the backup diesel generators without it being self-inflicted?

Sounds like the common backup power problem. The diesel engines are run regularly, but with a dummy load attached. If there's a problem with the switching or wiring, it won't be detected during a test run. And some data centers forget to upgrade their backup systems to meet customer demand, e.g. the initial backup infrastructure supports 1 MW but the actual load is 1.5 MW. Those self-made outages have happened at Amazon and all the others several times. The only way to detect them early, and maybe prevent them, is to take the risk of using the real servers as the load instead of the dummy load during the regular test runs.
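The capacity mismatch in that example is simple arithmetic. Here is a small sketch using the 1 MW and 1.5 MW figures from the post; the function names are invented, and the idea that the dummy load is sized to the original rating is an assumption for illustration.

```python
def backup_shortfall_mw(backup_capacity_mw, actual_load_mw):
    """Return how many MW of real load the backup system cannot carry."""
    return max(0.0, actual_load_mw - backup_capacity_mw)


def dummy_load_test_passes(backup_capacity_mw, dummy_load_mw):
    """A dummy-load test only proves the generators can carry the dummy load."""
    return dummy_load_mw <= backup_capacity_mw


# Figures from the post: 1 MW of backup, 1.5 MW of actual load.
shortfall = backup_shortfall_mw(1.0, 1.5)

# Assumed: the dummy load matches the original 1 MW rating,
# so the routine test still passes even though the real load would not fit.
test_ok = dummy_load_test_passes(1.0, 1.0)
```

The test run passes while the real load exceeds capacity by 0.5 MW, which is why a dummy-load test alone can hide the problem.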
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7764
  • Country: de
  • A qualified hobbyist ;)
Re: Forum Outage
« Reply #59 on: November 21, 2013, 06:43:08 pm »
Yes and no. The best way is to ask for hands on experience with server parks. That usually reduces 100 opinions to about 3  ;)

Fraid not!
Before I got my dedicated server, I asked many people involved in various aspects of the industry (many fans who are also in the pro server industry offered pro advice), all with that kind of professional experience you also claim. I got more conflicting advice than I could poke a stick at. So in the end I went with what I knew and what seemed good enough. In the end most grudgingly agreed that what I got was more than good enough for the job.

The only thing you can do is to filter out the bad hosters (poor service, cheap hardware, poor connectivity, lots of outages). Anything else is simply luck, because all hosters will tell you that the server is located in a first-class datacenter. And even the first-class ones have outages, as I wrote before in another post. The good point about the two EEVblog outages is that the datacenter is now under huge pressure to fix the backup power problem ASAP. That means it's less likely to happen again in the near future. But there might be a planned outage (maintenance window) for the required work.
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #60 on: November 21, 2013, 07:57:49 pm »
An old IBM retiree that I worked with for a few years used to say "Computers are an art, science."
The larger the government, the smaller the citizen.
 

Offline IanB

  • Super Contributor
  • ***
  • Posts: 11882
  • Country: us
Re: Forum Outage
« Reply #61 on: November 21, 2013, 08:06:43 pm »
Just to mention, there were many reports of service disruptions and stealth DDOS attacks over the past day or so. So the server going down at the same time may not have been a coincidence...
 

Offline xrunner

  • Super Contributor
  • ***
  • Posts: 7517
  • Country: us
  • hp>Agilent>Keysight>???
Re: Forum Outage
« Reply #62 on: November 21, 2013, 08:28:57 pm »
Just to mention, there were many reports of service disruptions and stealth DDOS attacks over the past day or so.

Maybe it was a Rigol company task force / power supply division trying to shut him down.  :)
I told my friends I could teach them to be funny, but they all just laughed at me.
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #63 on: November 21, 2013, 11:52:08 pm »
Just to mention, there were many reports of service disruptions and stealth DDOS attacks over the past day or so.

Maybe it was a Rigol company task force / power supply division trying to shut him down.  :)

Nah I think it was FLIR.
The larger the government, the smaller the citizen.
 

Offline walshms

  • Regular Contributor
  • *
  • Posts: 183
  • Country: us
Re: Forum Outage
« Reply #64 on: November 22, 2013, 12:45:08 am »
I’ve been doing IT work for a long time and one thing that is for sure....

Opinions are like assholes. Everybody's got one and everyone thinks everyone else's stinks. :)

Too true!  :-+   Here's mine:  :blah:

What Dave has is good enough.  If he can't protect any better against power outages (and Dave, it might be worth asking if you can put your own UPS there...) then there are going to be DB corruption incidents, and there's not much he can do about it.

File system journaling only takes you so far... the server has to have a battery-backed cache, UPS and UPS monitoring software in addition to those to ensure graceful shutdown on power outages, and the server should be configured to automatically power on when power is restored.  With all of that, you stand a pretty good chance of never seeing the issue -- however, it's still not a guarantee. 
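As a rough sketch of what the UPS monitoring part has to decide, the function below starts a graceful shutdown once the remaining battery runtime gets close to the time the server needs to shut down cleanly. This is not the logic of any particular UPS software; the function name, parameters, and the 60 second margin are all assumptions for illustration.

```python
def should_shut_down(on_battery, runtime_left_s, shutdown_time_s, safety_margin_s=60):
    """Decide whether to begin a graceful shutdown.

    on_battery      -- True if mains power is lost and the UPS is supplying the load
    runtime_left_s  -- estimated battery runtime remaining, in seconds
    shutdown_time_s -- how long a clean shutdown is expected to take, in seconds
    safety_margin_s -- extra headroom, in seconds (hypothetical default)
    """
    if not on_battery:
        return False  # mains power is present, nothing to do
    return runtime_left_s <= shutdown_time_s + safety_margin_s
```

For example, with a 120 second shutdown, 600 seconds of battery left is fine, but 150 seconds left should trigger the shutdown.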

The cost of a 99.999% reliable system isn't justified here, no matter how much you suffer from withdrawal...  :scared:   ;D
 

Offline os40la

  • Regular Contributor
  • *
  • Posts: 122
  • Country: us
Re: Forum Outage
« Reply #65 on: November 22, 2013, 02:25:51 am »

Maybe it was a Rigol company task force / power supply division trying to shut him down.  :)

Close,

What really happened was that Dave's server was powered by Rigol's DP832 power supply with the rev1 board. Google found out, remotely logged into it, and shut off the fan. The second outage was when the backup with the rev2 board was yet again attacked by Google. They now have it running with the rev3 board, the one Dave now has. We should be good now..... :phew:
"No, but I did stay at a Holiday Inn Express"
 

Offline BravoV

  • Super Contributor
  • ***
  • Posts: 7547
  • Country: 00
  • +++ ATH1
Re: Forum Outage
« Reply #66 on: November 22, 2013, 02:35:54 am »
LOL ... I just didn't expect what this thread has turned into; it's so hilarious now.  :-DD

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #67 on: November 22, 2013, 02:41:31 am »
Where in Texas is this server located?
The larger the government, the smaller the citizen.
 

Offline xrunner

  • Super Contributor
  • ***
  • Posts: 7517
  • Country: us
  • hp>Agilent>Keysight>???
Re: Forum Outage
« Reply #68 on: November 22, 2013, 02:53:42 am »
Where in Texas is this server located?

Oh dear ...  :-\
I told my friends I could teach them to be funny, but they all just laughed at me.
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #69 on: November 22, 2013, 03:59:13 am »
Where in Texas is this server located?

Oh dear ...  :-\

What's that supposed to mean?

The larger the government, the smaller the citizen.
 

Offline Rigby

  • Super Contributor
  • ***
  • Posts: 1476
  • Country: us
  • Learning, very new at this. Righteous Asshole, too
Re: Forum Outage
« Reply #70 on: November 22, 2013, 04:22:10 am »
mandatory eye roll 'linux'  tssss...
Unlike the monthly scheduled Windows reboot you mean. Can be mighty inconvenient on that second Tuesday when all the IIS servers do a simultaneous reboot. Linux server that does the real work stays up for months until you turn it off because the fans need cleaning or the hardware fails.

I have a Linux cluster that's been up continuously for more than four years.  The one Windows VM running on it can't last a month without a reboot.

Odds are probably very good the outage was caused by something hardware-related causing the reboot and the database being caught by the reboot. 

Linux itself is stable... so stable that NASA relies on it almost exclusively, and the ISS is running nothing but Linux today; before the Shuttles stopped going there, they pulled all of the Windows machines off.

I have been running Windows servers since 1996, Linux since 1997, and FreeBSD since 2001.  I don't have problems with any of them.

Anyone can keep a server going if they want to be insecure and measure their OS quality solely with uptime stats.

Hearing or reading folks complain about Windows says little about Windows and a lot about the person doing the talking or typing.
 

Offline EEVblog (Topic starter)

  • Administrator
  • *****
  • Posts: 37738
  • Country: au
    • EEVblog
Re: Forum Outage
« Reply #71 on: November 22, 2013, 05:40:18 am »
Where in Texas is this server located?

I thought it was Provo, Utah, but I think it actually uses SoftLayer:
http://support.hostgator.com/articles/hosting-guide/hardware-software/hostgator-data-centers
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #72 on: November 22, 2013, 06:17:20 am »
Where in Texas is this server located?

I thought it was Provo, Utah, but I think it actually uses SoftLayer:
http://support.hostgator.com/articles/hosting-guide/hardware-software/hostgator-data-centers

Ok, I thought you mentioned one time that it was in Texas.
The larger the government, the smaller the citizen.
 

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8517
  • Country: us
    • SiliconValleyGarage
Re: Forum Outage
« Reply #73 on: November 22, 2013, 06:34:29 am »
Looks like another power failure or whatever happened to the server again, along with another database corruption.
All fixed now.

Time to send 'em those big APC UPS machines you have sitting in your lab ... haven't those guys heard about backup power (diesel generators, Bloom boxes, batteries)? It is inexcusable for a datacenter not to have alternate on-line power.

As for corrupted files ... we've got journaling file systems these days ... it should recover! If you get a RAID controller with an on-board battery backup (12 volt lead-gel pack), then even in a catastrophic power failure those controllers preserve the cached data in their buffers for up to 24 hours. When power is restored, the controller applies the last file changes and all is well. Those controllers double-check that the requested changes are committed. This is protection of file integrity at the hardware level; the operating system is unaware that it happens.

Promise and 3Ware have such controllers.
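The recovery idea behind journaling can be sketched in a few lines. This is a toy model only, not how any file system or RAID controller firmware actually works; the journal entry format and the `replay` function are invented for illustration. Changes are recorded first, and after a power loss only the changes that were fully recorded (marked as committed) are applied.

```python
def replay(journal, disk):
    """Apply journaled writes to `disk`, skipping transactions with no commit marker.

    journal -- list of entries, either ("write", txn_id, key, value)
               or ("commit", txn_id)
    disk    -- dict standing in for the on-disk state
    """
    # Find every transaction that made it to a commit marker before power was lost.
    committed = {entry[1] for entry in journal if entry[0] == "commit"}

    # Apply only the writes that belong to committed transactions, in order.
    for entry in journal:
        if entry[0] == "write" and entry[1] in committed:
            _, _txn_id, key, value = entry
            disk[key] = value
    return disk


# Transaction 1 was committed; transaction 2 was cut off by the power failure.
journal = [
    ("write", 1, "a", "x"),
    ("commit", 1),
    ("write", 2, "b", "y"),
]
recovered = replay(journal, {})
```

The half-finished transaction is simply dropped, so the recovered state is consistent even though the last change is lost.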
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline Stonent

  • Super Contributor
  • ***
  • Posts: 3824
  • Country: us
Re: Forum Outage
« Reply #74 on: November 22, 2013, 09:40:31 am »
Just get some of those Maxwell supercaps. They go up to 6000F. That ought to keep the server running a bit.
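For a rough idea of what "a bit" means, the stored energy is E = ½·C·V². The sketch below assumes a single 6000 F cell charged to 2.7 V, a 300 W server, and lossless conversion all the way down to 0 V; the voltage, the load, and the function name are all assumptions, not figures from the thread.

```python
def supercap_runtime_s(capacitance_f, v_full, v_min, load_w):
    """Seconds a capacitor can feed `load_w` while discharging from v_full to v_min.

    Ignores converter losses and internal resistance.
    """
    usable_joules = 0.5 * capacitance_f * (v_full**2 - v_min**2)
    return usable_joules / load_w


# Assumed figures: 6000 F cell, 2.7 V when full, drained to 0 V, 300 W load.
runtime = supercap_runtime_s(6000, 2.7, 0.0, 300)
```

Under those assumptions one cell holds about 21.9 kJ, which is roughly 73 seconds of runtime, so it would take a sizeable bank of them to ride out a real outage.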
« Last Edit: November 22, 2013, 09:42:52 am by Stonent »
The larger the government, the smaller the citizen.
 

