Author Topic: The BIG EEVblog Server Fire  (Read 10805 times)


Offline gnif

  • Administrator
  • *****
  • Posts: 1279
  • Country: au
The BIG EEVblog Server Fire
« on: April 08, 2021, 03:26:37 am »
After many days of stress, hair loss, and sleepless nights, we're back baby!
Please note that there may still be some disruptions over the next few days as we are still running in a degraded state.

Online xrunner

  • Super Contributor
  • ***
  • Posts: 5610
  • Country: us
  • hp>Agilent>Keysight>?
Re: The BIG EEVblog Server Fire
« Reply #1 on: April 08, 2021, 03:33:27 am »
Thank you.  :)
My friends say they're procrastinators. I say I've been meaning to tell them for years, but I just keep putting it off.
 
The following users thanked this post: gnif, fourtytwo42

Offline CatalinaWOW

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #2 on: April 08, 2021, 03:40:58 am »
I am sure all of us would like an after action report when you finally get back to some semblance of normalcy.

I am really curious about how use of backup generators caused a fire.  Everything after that is just the dominoes falling, with an extra helping of bad luck for the EEVBlog servers.
 
The following users thanked this post: LateLesley

Offline EEVblog

  • Administrator
  • *****
  • Posts: 33401
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #3 on: April 08, 2021, 03:41:53 am »
I figured this event needed its own thread, so moved it from the server reports thread.
HUGE thanks to gnif for handling this:
https://hostfission.com/

The server was down from 2021-04-04 21:13 UTC to 2021-04-08 03:36 UTC

It's currently still operating in a degraded state, and performance is currently impacted until the caches catch up.
Gorillaservers upgraded the server box (maybe the old box was water damaged?) to dual Xeon 2620V2 from the older dual L5630.
Presumably they'll upgrade the other redundant box too to match, but the 2nd box is not currently online yet.

The lesson here is, whilst it's great to have a fully redundant automatic backup server, it was kinda silly to have it in the same datacenter!
We are going to ask Gorillaservers if they can provision one of the boxes in their LA data center, so if a whole city/state goes out the server will still operate.

I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.
 
The following users thanked this post: SeanB, gnif, xrunner, xavier60, LateLesley, beanflying

Online edpalmer42

  • Super Contributor
  • ***
  • Posts: 1908
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #4 on: April 08, 2021, 03:44:45 am »
The standby power system caused a fire that shut down the data center with some servers expected to be offline for several weeks!!  :palm: :palm:

https://www.gorillaservers.com/outage.html

You just can't make this stuff up!!

 

Offline gnif

  • Administrator
  • *****
  • Posts: 1279
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #5 on: April 08, 2021, 03:45:45 am »
I am sure all of us would like an after action report when you finally get back to some semblance of normalcy.

I am really curious about how use of backup generators caused a fire.  Everything after that is just the dominoes falling, with an extra helping of bad luck for the EEVBlog servers.

This would have to be provided by GorillaServers first :).
We are not fully aware of all the details yet either as GS have given priority to restoring servers.
HostFission - Full Server Monitoring and Management Solutions.
https://hostfission.com/
https://twitter.com/HostFission
https://twitter.com/Geoffrey_McRae
 

Online Tomorokoshi

  • Super Contributor
  • ***
  • Posts: 1034
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #6 on: April 08, 2021, 03:46:42 am »
I expected there was progress when the "502 Gateway Not Found" message came up!
 

Offline gnif

  • Administrator
  • *****
  • Posts: 1279
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #7 on: April 08, 2021, 03:49:40 am »
I figured this event needed its own thread, so moved it from the server reports thread.

Scared the crap out of me when my post went missing, I thought we had a major DB issue, lol
HostFission - Full Server Monitoring and Management Solutions.
https://hostfission.com/
https://twitter.com/HostFission
https://twitter.com/Geoffrey_McRae
 
The following users thanked this post: The Soulman

Offline EEVblog

  • Administrator
  • *****
  • Posts: 33401
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #8 on: April 08, 2021, 03:50:05 am »
For those who weren't following along on Twitter, it was eventually confirmed that 2 of the three EEVblog boxes (the ones that handle the website and forum databases etc) were in the "splash zone".
So they took longer to get back up and running than my email/management server box which was in another part of the datacenter and another subnet (it's a different type of box, single xeon instead of dual xeon).
Given that they set us up on a new box (and presumably just pulled the old drives), it's likely the old boxes were either water damaged, or it was simply easier to give us a new box until such time as the old boxes can be evaluated properly.
 
The following users thanked this post: cdev

Online Algoma

  • Regular Contributor
  • *
  • Posts: 168
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #9 on: April 08, 2021, 03:51:18 am »
Good to see things back online, a bit at a time. I've certainly been there at the center of the NOC when things go sideways.
 
The following users thanked this post: cdev

Offline EEVblog

  • Administrator
  • *****
  • Posts: 33401
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #10 on: April 08, 2021, 03:51:42 am »
I figured this event needed its own thread, so moved it from the server reports thread.
Scared the crap out of me when my post went missing, I thought we had a major DB issue, lol

Sorry!  :scared:
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2482
  • Country: nz
Re: The BIG EEVblog Server Fire
« Reply #11 on: April 08, 2021, 03:54:10 am »
Good to see things back to normal.

I'm feeling for the DC guys... there but for the grace of god go I.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline tautech

  • Super Contributor
  • ***
  • Posts: 22232
  • Country: nz
  • Taupaki Technologies Ltd. NZ Siglent Distributor
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #12 on: April 08, 2021, 04:21:10 am »
Will be interesting to see if they honor their pledge of 1 day free hosting for every 15 minutes down.
96 free days for every 24 hours should give Dave a year or more free server hosting.
 :popcorn:

gnif, you poor bugger go and get some sleep !
« Last Edit: April 08, 2021, 04:29:07 am by tautech »
Avid Rabid Hobbyist
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 7476
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: The BIG EEVblog Server Fire
« Reply #13 on: April 08, 2021, 04:21:51 am »
For those who weren't following along on Twitter, it was eventually confirmed that 2 of the three EEVblog boxes (the ones that handle the website and forum databases etc) were in the "splash zone".
So they took longer to get back up and running than my email/management server box which was in another part of the datacenter and another subnet (it's a different type of box, single xeon instead of dual xeon).
Given that they set us up on a new box (and presumably just pulled the old drives), it's likely the old boxes were either water damaged, or it was simply easier to give us a new box until such time as the old boxes can be evaluated properly.

Are you going to make sure the "backup" will now be in another room or at least several racks away? (I think it was mentioned somewhere that there's enough bandwidth required for real time syncing that putting it in a separate building is not feasible.)
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Offline Whales

  • Super Contributor
  • ***
  • Posts: 1325
  • Country: au
    • Halestrom
Re: The BIG EEVblog Server Fire
« Reply #14 on: April 08, 2021, 04:41:01 am »
Will be interesting to see if they honor their pledge of 1 day free hosting for every 15 minutes down.
96 free days for every 24 hours should give Dave a year or more free server hosting.
 :popcorn:

I think I recall seeing a 30-day cap.

Super glad to see things back up.  Sick today, needed a happy reading escape :)

Offline tautech

  • Super Contributor
  • ***
  • Posts: 22232
  • Country: nz
  • Taupaki Technologies Ltd. NZ Siglent Distributor
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #15 on: April 08, 2021, 04:45:14 am »
Will be interesting to see if they honor their pledge of 1 day free hosting for every 15 minutes down.
96 free days for every 24 hours should give Dave a year or more free server hosting.
 :popcorn:

I think I recall seeing a 30-day cap.
You're right. I didn't go looking at the T&C's.  ::)
https://webnx.com/sla/
Credit shall not exceed 100% of billing in a thirty day cycle
Avid Rabid Hobbyist
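
As a rough check of the numbers in this exchange, here is a minimal Python sketch. It assumes the outage window Dave posted (2021-04-04 21:13 UTC to 2021-04-08 03:36 UTC), the pledge of one credit day per full 15 minutes of downtime, and it reads the SLA cap as at most 30 credit days per billing cycle. The function name, the rounding down of partial 15-minute intervals, and that reading of the cap are illustrative assumptions, not taken from the actual SLA text.

```python
from datetime import datetime, timezone

# Outage window as posted in this thread (UTC).
DOWN = datetime(2021, 4, 4, 21, 13, tzinfo=timezone.utc)
UP = datetime(2021, 4, 8, 3, 36, tzinfo=timezone.utc)


def sla_credit_days(outage_minutes, minutes_per_credit_day=15, cap_days=30):
    """Return (uncapped, capped) credit days for an outage.

    Assumption: only full 15-minute intervals earn a credit day, and the
    cap is interpreted as a maximum of 30 credit days per billing cycle.
    """
    uncapped = outage_minutes // minutes_per_credit_day
    return uncapped, min(uncapped, cap_days)


outage_minutes = int((UP - DOWN).total_seconds() // 60)
uncapped, capped = sla_credit_days(outage_minutes)

print(f"Outage: {outage_minutes} minutes")
print(f"Credit without cap: {uncapped} days")
print(f"Credit with 30-day cap: {capped} days")
```

Under these assumptions the outage comes to 4703 minutes, which would be 313 credit days uncapped, and the cap brings that down to 30.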
 

Offline graybeard

  • Frequent Contributor
  • **
  • Posts: 316
  • Country: us
  • Senior III-V RF/mixed signal/device engineer
    • Chris Grossman
Re: The BIG EEVblog Server Fire
« Reply #16 on: April 08, 2021, 04:49:08 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I have been relying on a single mail server for years.  I have two servers running.  One in a farm, and one on a fixed IP at home.   I should configure the home server as a secondary mail server.
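
For anyone wondering how a secondary mail server gets used: sending servers try a domain's MX records in order of preference, lowest number first, and fall back to the next one if the first is unreachable. A minimal Python sketch of that ordering follows. The hostnames, preference values, and helper names are hypothetical, and real senders also randomize among records with equal preference, which this sketch does not do.

```python
def delivery_order(mx_records):
    """Return hostnames in the order a sender would try them.

    mx_records is a list of (preference, hostname) tuples; a lower
    preference value is tried first.
    """
    return [host for _pref, host in sorted(mx_records)]


def first_reachable(mx_records, is_up):
    """Return the first host in preference order that is_up() accepts."""
    for host in delivery_order(mx_records):
        if is_up(host):
            return host
    return None


# Hypothetical records: the server in the farm is primary, home is backup.
mx = [(20, "mail-home.example.org"), (10, "mail-farm.example.org")]

# Simulate the farm (datacenter) server being offline.
farm_is_down = lambda host: host != "mail-farm.example.org"

print(delivery_order(mx))
print(first_reachable(mx, farm_is_down))
```

With the farm server down, delivery falls through to the home server instead of the mail being queued or bounced by the sender.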

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 2297
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #17 on: April 08, 2021, 04:54:06 am »
˙uʍop ǝpᴉsdn plɹoʍ ǝloɥʍ ǝɯ pǝuɹn┴
 

Offline tautech

  • Super Contributor
  • ***
  • Posts: 22232
  • Country: nz
  • Taupaki Technologies Ltd. NZ Siglent Distributor
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #18 on: April 08, 2021, 04:56:41 am »
For posterity....including the bad spelling:

As of approimately 2021-04-04 21:13:00 UTC there was a major outage at the datacenter in Odgen Utah where the EEVBlog servers are hosted. This outage was caused by a fire as a result of performing regular load testing of from the usage of the emergency generators as the result of a city power outage.

Unfortuantly due to the nature of the failure it may be some time before the EEVBlog website and forums are restored to normal operation. Thankfully we maintain several off-site backups of the entire EEVBlog infrastructure, as a result there is no need to worry about any loss of data.

At this time we simply have to sit and wait for more information from the datacenter, in the meantime you can follow Dave or HostFission on twitter for updates.

If you cant wait you can keep yourself entertained by watching Dave's videos on Youtube or watch them over on Odysee.

Alternatively you can join us on IRC on #EEVBlog over at irc.austnet.org

Updates:

2021-04-05 08:56:00 UTC - Current updates project that close to 90-95% of all hardware experienced zero damage. Electricians are working now to restore power to the Ogden Utah data center and are estimating power should return at approximately 3:00 pm MDT tomorrow.
2021-04-05 13:18:00 UTC - Power restored to one of the three servers restoring Dave's email along with some of the EEVBlog management infrastructure.
2021-04-06 03:45:43 UTC - Network restored to the management server
2021-04-07 08:25:46 UTC - Informational update above, fire was not caused by genset test but rather genset use during a city wide power outage.
2021-04-07 08:40:00 UTC - GorillaServers have confirmed that our servers "were located in the section with a high probability of water damage. So they will need to be physically inspected before they can be powered on."
2021-04-07 18:21:21 UTC - A contact at GorillaServers has confirmed that it is almost certain that the two webservers will not be restored today in their current state and are looking at the possibility of issuing us a temporary server in the Los Angeles DC in the meantime
2021-04-07 18:45:32 UTC - LA replacement hardware is not up to spec for our needs, GS has added our servers to a "quite extensive" priority list.
2021-04-07 22:33:00 UTC - GorillaServers have moved the EEVBlog servers to the top of the priority list and are working on them now!
Avid Rabid Hobbyist
 
The following users thanked this post: LateLesley, I wanted a rude username

Offline james_s

  • Super Contributor
  • ***
  • Posts: 16104
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #19 on: April 08, 2021, 05:05:38 am »
For those who weren't following along on Twitter, it was eventually confirmed that 2 of the three EEVblog boxes (the ones that handle the website and forum databases etc) were in the "splash zone".

I thought datacenters typically used Halon fire suppression systems? The last place I worked that had an onsite datacenter had one, there were warning strobes to indicate the system had discharged and asphyxiation warning signs.
 

Online bdunham7

  • Super Contributor
  • ***
  • Posts: 3422
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #20 on: April 08, 2021, 05:05:52 am »
The lesson here is, whilst it's great to have a fully redundant automatic backup server, it was kinda silly to have it in the same datacenter!
We are going to ask Gorillaservers if they can provision one of the boxes in their LA data center, so if a whole city/state goes out the server will still operate.

The thing is that if it is 20 feet away, a 10GBASE-T connection can maintain the sync with a very small initial investment and no recurring costs--and microseconds of latency.  A 10Gb/s connection to another state would cost lotsa bucks and would still have 1000X or more latency.
A 3.5 digit 4.5 digit 5 digit 5.5 digit 6.5 digit 7.5 digit DMM is good enough for most people.
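
To put rough numbers on the latency point above, here is a small Python sketch comparing propagation delay only. The 20 feet comes from the post. The 1000 km path is a hypothetical round figure for an out-of-state link, and the signal speed of 2e8 m/s (about two thirds of the speed of light) is an assumed typical value for copper and fiber. PHY, switch, and router delays are ignored, so real latencies on both links would be higher.

```python
FEET_TO_METRES = 0.3048
SIGNAL_SPEED_M_PER_S = 2.0e8  # assumption: roughly 2/3 of c in cable or fiber


def one_way_delay_s(distance_m, speed=SIGNAL_SPEED_M_PER_S):
    """Propagation delay in seconds over distance_m metres of cable."""
    return distance_m / speed


local_delay = one_way_delay_s(20 * FEET_TO_METRES)  # 20 ft, same room
remote_delay = one_way_delay_s(1_000_000)           # assumed 1000 km path
ratio = remote_delay / local_delay

print(f"Local link:  {local_delay * 1e9:.1f} ns")
print(f"Remote link: {remote_delay * 1e3:.1f} ms")
print(f"Ratio:       {ratio:,.0f}x")
```

On propagation alone the gap is far larger than 1000X. In practice the local figure is dominated by the network hardware rather than the cable, which is why microseconds is the realistic number for the 20-foot case.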
 

Online bdunham7

  • Super Contributor
  • ***
  • Posts: 3422
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #21 on: April 08, 2021, 05:15:34 am »
It's currently still operating in a degraded state, and performance is currently impacted until the caches catch up.
Gorillaservers upgraded the server box (maybe the old box was water damaged?) to dual Xeon 2620V2 from the older dual L5630.

I just noticed this--they're both pretty old tech, actually.  Spinning drives too? 
A 3.5 digit 4.5 digit 5 digit 5.5 digit 6.5 digit 7.5 digit DMM is good enough for most people.
 

Offline Halcyon

  • Global Moderator
  • *****
  • Posts: 4555
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #22 on: April 08, 2021, 05:30:41 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I'm actually surprised to learn that you weren't using Google Workspace or Office 365 Dave. For the sake of $8-9/month per user, you can have all of the Google services, redundancy, spam filtering and 30-something email aliases. I haven't run my own mail server for decades and it's a bit of a thing of the past.
 

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 2297
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #23 on: April 08, 2021, 05:33:10 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I'm actually surprised to learn that you weren't using Google Workspace or Office 365 Dave. For the sake of $8-9/month per user, you can have all of the Google services, redundancy, spam filtering and 30-something email aliases. I haven't run my own mail server for decades and it's a bit of a thing of the past.

That still seems like a lot of money.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 16104
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #24 on: April 08, 2021, 05:36:41 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I'm actually surprised to learn that you weren't using Google Workspace or Office 365 Dave. For the sake of $8-9/month per user, you can have all of the Google services, redundancy, spam filtering and 30-something email aliases. I haven't run my own mail server for decades and it's a bit of a thing of the past.

We used the Google suite for a while at my job, absolutely hated it since it was all crippled browser based stuff, they don't even have a proper desktop email client. When we were acquired we went back to Microsoft, which while I'm not the biggest fan of Microsoft, their Outlook for email and calendar blows the doors off of Google's clunky offerings. Google Docs is nice for shared documents but the lack of a desktop version really kills it for most other uses.

For email, yeah, I wouldn't bother hosting my own server, but for the client side, no way, I don't rent software, and I absolutely hate browser based productivity applications. They are a total pain in the ass and never offer the same functionality of a desktop application.
 
The following users thanked this post: cdev, peter-h, Jacon

