Author Topic: The BIG EEVblog Server Fire  (Read 19811 times)

Offline gnif (Topic starter)

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
The BIG EEVblog Server Fire
« on: April 08, 2021, 03:26:37 am »
After many days of stress, hair loss, and sleepless nights, we're back baby!
Please note that there may still be some disruptions over the next few days as we are still running in a degraded state.

Offline xrunner

  • Super Contributor
  • ***
  • Posts: 7512
  • Country: us
  • hp>Agilent>Keysight>???
Re: The BIG EEVblog Server Fire
« Reply #1 on: April 08, 2021, 03:33:27 am »
Thank you.  :)
I told my friends I could teach them to be funny, but they all just laughed at me.
 
The following users thanked this post: gnif, fourtytwo42

Offline CatalinaWOW

  • Super Contributor
  • ***
  • Posts: 5226
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #2 on: April 08, 2021, 03:40:58 am »
I am sure all of us would like an after action report when you finally get back to some semblance of normalcy.

I am really curious about how use of backup generators caused a fire.  Everything after that is just the dominoes falling, with an extra helping of bad luck for the EEVBlog servers.
 
The following users thanked this post: LateLesley

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37728
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #3 on: April 08, 2021, 03:41:53 am »
I figured this event needed its own thread, so moved it from the server reports thread.
HUGE thanks to gnif for handling this:
https://hostfission.com/

The server was down from 2021-04-04 21:13 UTC to 2021-04-08 03:36 UTC

It's currently still operating in a degraded state, and performance is currently impacted until the caches catch up.
Gorillaservers upgraded the server box (maybe the old box was water damaged?) to a dual Xeon 2620V2 from the older dual L5630.
Presumably they'll upgrade the other redundant box too to match, but the 2nd box is not online yet.

The lesson here is, whilst it's great to have a fully redundant automatic backup server, it was kinda silly to have it in the same datacenter!
We are going to ask Gorillaservers if they can provision one of the boxes in their LA data center, so if a whole city/state goes out the server will still operate.

I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.
 
The following users thanked this post: SeanB, gnif, xrunner, xavier60, LateLesley, beanflying

Online edpalmer42

  • Super Contributor
  • ***
  • Posts: 2268
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #4 on: April 08, 2021, 03:44:45 am »
The standby power system caused a fire that shut down the data center with some servers expected to be offline for several weeks!!  :palm: :palm:

https://www.gorillaservers.com/outage.html

You just can't make this stuff up!!

 

Offline gnif (Topic starter)

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #5 on: April 08, 2021, 03:45:45 am »
I am sure all of us would like an after action report when you finally get back to some semblance of normalcy.

I am really curious about how use of backup generators caused a fire.  Everything after that is just the dominoes falling, with an extra helping of bad luck for the EEVBlog servers.

This would have to be provided by GorillaServers first :).
We are not fully aware of all the details yet either as GS have given priority to restoring servers.
 

Offline Tomorokoshi

  • Super Contributor
  • ***
  • Posts: 1212
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #6 on: April 08, 2021, 03:46:42 am »
I expected there was progress when the "502 Gateway Not Found" message came up!
 

Offline gnif (Topic starter)

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #7 on: April 08, 2021, 03:49:40 am »
I figured this event needed its own thread, so moved it from the server reports thread.

Scared the crap out of me when my post went missing, I thought we had a major DB issue, lol
 
The following users thanked this post: The Soulman

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37728
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #8 on: April 08, 2021, 03:50:05 am »
For those who weren't following along on Twitter, it was eventually confirmed that 2 of the three EEVblog boxes (the ones that handle the website and forum databases etc) were in the "splash zone".
So they took longer to get back up and running than my email/management server box which was in another part of the datacenter and another subnet (it's a different type of box, single xeon instead of dual xeon).
Given that they set us up on a new box (and presumably just pulled the old drives), it's likely the old boxes were either water damaged, or it was simply easier to give us a new box until such time as the old boxes can be evaluated properly.
 
The following users thanked this post: cdev

Offline Algoma

  • Frequent Contributor
  • **
  • Posts: 291
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #9 on: April 08, 2021, 03:51:18 am »
Good to see things back online, a bit at a time. I've certainly been there at the center of the NOC when things go sideways.
 
The following users thanked this post: cdev

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37728
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #10 on: April 08, 2021, 03:51:42 am »
I figured this event needed its own thread, so moved it from the server reports thread.
Scared the crap out of me when my post went missing, I thought we had a major DB issue, lol

Sorry!  :scared:
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: The BIG EEVblog Server Fire
« Reply #11 on: April 08, 2021, 03:54:10 am »
Good to see things back to normal.

I'm feeling for the DC guys... there but for the grace of god go I.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 28323
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #12 on: April 08, 2021, 04:21:10 am »
Will be interesting to see if they honor their pledge of 1 day free hosting for every 15 minutes down.
96 free days for every 24 hours should give Dave a year or more free server hosting.
 :popcorn:

gnif, you poor bugger, go and get some sleep!
« Last Edit: April 08, 2021, 04:29:07 am by tautech »
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 9007
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: The BIG EEVblog Server Fire
« Reply #13 on: April 08, 2021, 04:21:51 am »
For those who weren't following along on Twitter, it was eventually confirmed that 2 of the three EEVblog boxes (the ones that handle the website and forum databases etc) were in the "splash zone".
So they took longer to get back up and running than my email/management server box which was in another part of the datacenter and another subnet (it's a different type of box, single xeon instead of dual xeon).
Given that they set us up on a new box (and presumably just pulled the old drives), it's likely the old boxes were either water damaged, or it was simply easier to give us a new box until such time as the old boxes can be evaluated properly.

Are you going to make sure the "backup" will now be in another room or at least several racks away? (I think it was mentioned somewhere that there's enough bandwidth required for real time syncing that putting it in a separate building is not feasible.)
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Online Whales

  • Super Contributor
  • ***
  • Posts: 1899
  • Country: au
    • Halestrom
Re: The BIG EEVblog Server Fire
« Reply #14 on: April 08, 2021, 04:41:01 am »
Will be interesting to see if they honor their pledge of 1 day free hosting for every 15 minutes down.
96 free days for every 24 hours should give Dave a year or more free server hosting.
 :popcorn:

I think I recall seeing a 30-day cap.

Super glad to see things back up.  Sick today, needed a happy reading escape :)

Online tautech

  • Super Contributor
  • ***
  • Posts: 28323
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #15 on: April 08, 2021, 04:45:14 am »
Will be interesting to see if they honor their pledge of 1 day free hosting for every 15 minutes down.
96 free days for every 24 hours should give Dave a year or more free server hosting.
 :popcorn:

I think I recall seeing a 30-day cap.
You're right. I didn't go looking at the T&Cs.  ::)
https://webnx.com/sla/
Credit shall not exceed 100% of billing in a thirty day cycle
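
As a rough sketch of the arithmetic being discussed: the pledge (one day of free hosting per 15 minutes down) applied to the outage window Dave posted earlier in the thread (2021-04-04 21:13 UTC to 2021-04-08 03:36 UTC). The function name, the rounding down of partial 15-minute blocks, and reading the "thirty day cycle" clause as a 30-day cap are all assumptions for illustration, not something the SLA text quoted here spells out.

```python
from datetime import datetime, timezone

def sla_credit_days(start, end, cap_days=30):
    """Return (uncapped, capped) credit days at 1 day per full 15 minutes down."""
    downtime_minutes = int((end - start).total_seconds() // 60)
    # Assumption: a partial 15-minute block earns no credit.
    uncapped = downtime_minutes // 15
    # Assumption: "100% of billing in a thirty day cycle" acts as a 30-day cap.
    return uncapped, min(uncapped, cap_days)

# Outage window as posted in the thread (UTC).
down = datetime(2021, 4, 4, 21, 13, tzinfo=timezone.utc)
up = datetime(2021, 4, 8, 3, 36, tzinfo=timezone.utc)

uncapped, capped = sla_credit_days(down, up)
print(f"uncapped: {uncapped} days, capped: {capped} days")
```

Under those assumptions the cap, not the per-15-minute rate, decides the outcome for any outage longer than about seven and a half hours.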
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 

Offline graybeard

  • Frequent Contributor
  • **
  • Posts: 431
  • Country: us
  • Consulting III-V RF/mixed signal/device engineer
    • Chris Grossman
Re: The BIG EEVblog Server Fire
« Reply #16 on: April 08, 2021, 04:49:08 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I have been relying on a single mail server for years.  I have two servers running.  One in a farm, and one on a fixed IP at home.   I should configure the home server as a secondary mail server.

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #17 on: April 08, 2021, 04:54:06 am »
˙uʍop ǝpᴉsdn plɹoʍ ǝloɥʍ ǝɯ pǝuɹn┴
iratus parum formica
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 28323
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #18 on: April 08, 2021, 04:56:41 am »
For posterity... including the bad spelling:

As of approimately 2021-04-04 21:13:00 UTC there was a major outage at the datacenter in Odgen Utah where the EEVBlog servers are hosted. This outage was caused by a fire as a result of performing regular load testing of from the usage of the emergency generators as the result of a city power outage.

Unfortuantly due to the nature of the failure it may be some time before the EEVBlog website and forums are restored to normal operation. Thankfully we maintain several off-site backups of the entire EEVBlog infrastructure, as a result there is no need to worry about any loss of data.

At this time we simply have to sit and wait for more information from the datacenter, in the meantime you can follow Dave or HostFission on twitter for updates.

If you cant wait you can keep yourself entertained by watching Dave's videos on Youtube or watch them over on Odysee.

Alternatively you can join us on IRC on #EEVBlog over at irc.austnet.org

Updates:

2021-04-05 08:56:00 UTC - Current updates project that close to 90-95% of all hardware experienced zero damage. Electricians are working now to restore power to the Ogden Utah data center and are estimating power should return at approximately 3:00 pm MDT tomorrow.
2021-04-05 13:18:00 UTC - Power restored to one of the three servers restoring Dave's email along with some of the EEVBlog management infrastructure.
2021-04-06 03:45:43 UTC - Network restored to the management server
2021-04-07 08:25:46 UTC - Informational update above, fire was not caused by genset test but rather genset use during a city wide power outage.
2021-04-07 08:40:00 UTC - GorillaServers have confirmed that our servers "were located in the section with a high probability of water damage. So they will need to be physically inspected before they can be powered on."
2021-04-07 18:21:21 UTC - A contact at GorillaServers has confirmed that it is almost certain that the two webservers will not be restored today in their current state and are looking at the possibility of issuing us a temporary server in the Los Angeles DC in the meantime
2021-04-07 18:45:32 UTC - LA replacement hardware is not up to spec for our needs, GS has added our servers to a "quite extensive" priority list.
2021-04-07 22:33:00 UTC - GorillaServers have moved the EEVBlog servers to the top of the priority list and are working on them now!
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 
The following users thanked this post: LateLesley, I wanted a rude username

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #19 on: April 08, 2021, 05:05:38 am »
For those who weren't following along on Twitter, it was eventually confirmed that 2 of the three EEVblog boxes (the ones that handle the website and forum databases etc) were in the "splash zone".

I thought datacenters typically used Halon fire suppression systems? The last place I worked that had an onsite datacenter had one, there were warning strobes to indicate the system had discharged and asphyxiation warning signs.
 

Online bdunham7

  • Super Contributor
  • ***
  • Posts: 7818
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #20 on: April 08, 2021, 05:05:52 am »
The lesson here is, whilst it's great to have a fully redundant automatic backup server, it was kinda silly to have it in the same datacenter!
We are going to ask Gorillaservers if they can provision one of the boxes in their LA data center, so if a whole city/state goes out the server will still operate.

The thing is that if it is 20 feet away, a 10GBASE-T connection can maintain the sync with a very small initial investment and no recurring costs--and microseconds of latency.  A 10Gb/s connection to another state would cost lotsa bucks and would still have 1000X or more latency.
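
A back-of-the-envelope sketch of the propagation part of that comparison. The cable velocity factors and the roughly 1,000 km route length to another state are assumptions picked for illustration; real links add PHY, switch, and router latency on top, which is why an in-rack figure lands in microseconds rather than nanoseconds.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def one_way_delay_s(distance_m, velocity_factor):
    """Propagation delay only; ignores PHY, switching, and routing latency."""
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * velocity_factor)

# Assumption: 20 ft of twisted-pair copper at about 0.65c.
local = one_way_delay_s(20 * 0.3048, 0.65)
# Assumption: about 1,000 km of fibre at about 0.67c.
remote = one_way_delay_s(1_000_000, 0.67)

print(f"local:  {local * 1e9:.0f} ns one way")
print(f"remote: {remote * 1e3:.1f} ms one way")
print(f"ratio:  {remote / local:,.0f}x")
```

Even before any equipment latency is counted, the long-haul link is several orders of magnitude slower, which is consistent with the "1000X or more" estimate.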
A 3.5 digit 4.5 digit 5 digit 5.5 digit 6.5 digit 7.5 digit DMM is good enough for most people.
 

Online bdunham7

  • Super Contributor
  • ***
  • Posts: 7818
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #21 on: April 08, 2021, 05:15:34 am »
It's currently still operating in a degraded state, and performance is currently impacted until the caches catch up.
Gorillaservers upgraded the server box (maybe the old box was water damaged?) to a dual Xeon 2620V2 from the older dual L5630.

I just noticed this--they're both pretty old tech, actually.  Spinning drives too? 
A 3.5 digit 4.5 digit 5 digit 5.5 digit 6.5 digit 7.5 digit DMM is good enough for most people.
 

Offline Halcyon

  • Global Moderator
  • *****
  • Posts: 5669
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #22 on: April 08, 2021, 05:30:41 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I'm actually surprised to learn that you weren't using Google Workspace or Office 365 Dave. For the sake of $8-9/month per user, you can have all of the Google services, redundancy, spam filtering and 30-something email aliases. I haven't run my own mail server for decades and it's a bit of a thing of the past.
 

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #23 on: April 08, 2021, 05:33:10 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I'm actually surprised to learn that you weren't using Google Workspace or Office 365 Dave. For the sake of $8-9/month per user, you can have all of the Google services, redundancy, spam filtering and 30-something email aliases. I haven't run my own mail server for decades and it's a bit of a thing of the past.

That still seems like a lot of money.
iratus parum formica
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #24 on: April 08, 2021, 05:36:41 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I'm actually surprised to learn that you weren't using Google Workspace or Office 365 Dave. For the sake of $8-9/month per user, you can have all of the Google services, redundancy, spam filtering and 30-something email aliases. I haven't run my own mail server for decades and it's a bit of a thing of the past.

We used the Google suite for a while at my job, absolutely hated it since it was all crippled browser based stuff, they don't even have a proper desktop email client. When we were acquired we went back to Microsoft, which while I'm not the biggest fan of Microsoft, their Outlook for email and calendar blows the doors off of Google's clunky offerings. Google Docs is nice for shared documents but the lack of a desktop version really kills it for most other uses.

For email, yeah, I wouldn't bother hosting my own server, but for the client side, no way, I don't rent software, and I absolutely hate browser based productivity applications. They are a total pain in the ass and never offer the same functionality of a desktop application.
 
The following users thanked this post: cdev, peter-h, Jacon

Online tautech

  • Super Contributor
  • ***
  • Posts: 28323
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #25 on: April 08, 2021, 05:39:33 am »
Better drop this here too:
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 

Offline hendorog

  • Super Contributor
  • ***
  • Posts: 1617
  • Country: nz
Re: The BIG EEVblog Server Fire
« Reply #26 on: April 08, 2021, 05:42:41 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I'm actually surprised to learn that you weren't using Google Workspace or Office 365 Dave. For the sake of $8-9/month per user, you can have all of the Google services, redundancy, spam filtering and 30-something email aliases. I haven't run my own mail server for decades and it's a bit of a thing of the past.

We used the Google suite for a while at my job, absolutely hated it since it was all crippled browser based stuff, they don't even have a proper desktop email client. When we were acquired we went back to Microsoft, which while I'm not the biggest fan of Microsoft, their Outlook for email and calendar blows the doors off of Google's clunky offerings. Google Docs is nice for shared documents but the lack of a desktop version really kills it for most other uses.

For email, yeah, I wouldn't bother hosting my own server, but for the client side, no way, I don't rent software, and I absolutely hate browser based productivity applications. They are a total pain in the ass and never offer the same functionality of a desktop application.

Pretty much 100% the opposite from me. Goes to show no one solution will please everyone.

Been whoring myself to google full time now for 10 years. Became permanent after I found out that the cat was sneakily pissing on my home servers. That shortened the life of all concerned.
 
The following users thanked this post: mathsquid

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37728
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #27 on: April 08, 2021, 05:45:46 am »
I'm actually surprised to learn that you weren't using Google Workspace or Office 365 Dave. For the sake of $8-9/month per user, you can have all of the Google services, redundancy, spam filtering and 30-something email aliases. I haven't run my own mail server for decades and it's a bit of a thing of the past.

I have Office 365, but only use it for Word/Excel etc. No idea it had other email server related stuff.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #28 on: April 08, 2021, 05:47:50 am »
Pretty much 100% the opposite from me. Goes to show no one solution will please everyone.

Been whoring myself to google full time now for 10 years. Became permanent after I found out that the cat was sneakily pissing on my home servers. That shortened the life of all concerned.

Server is one thing, but desktop productivity software? The browser based stuff is a joke, it doesn't even work at all if your internet goes down, and in my case I was constantly logged out every time I cleared my browser cache which is something I do often at work. Since the calendar was browser based it couldn't integrate into the OS and provide meeting reminders so I kept missing meetings, total pain in the ass. Not to mention Google is an advertising company, their products are almost all trojan horses in a sense, their function is to entice you into giving them your valuable personal information. I use them for email and nothing else, if I log into a browser I ALWAYS make sure I log out immediately when I'm done.
 
The following users thanked this post: peter-h, Jacon

Offline Halcyon

  • Global Moderator
  • *****
  • Posts: 5669
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #29 on: April 08, 2021, 06:30:29 am »
I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

I'm actually surprised to learn that you weren't using Google Workspace or Office 365 Dave. For the sake of $8-9/month per user, you can have all of the Google services, redundancy, spam filtering and 30-something email aliases. I haven't run my own mail server for decades and it's a bit of a thing of the past.

That still seems like a lot of money.

I guess it depends on the individual, but I use Google Workspace for my own private email on my own domain. It costs me AUD$8.32 per month and I have various email addresses for things, e.g.:
payments@ - Links to my bank account (in Australia you can transfer money instantly between bank accounts using email addresses/phone numbers), also used for Paypal
forums@ - Forum subscriptions
uber@ - Uber
sonos@ - Sonos

Email aliases are essentially "disposable" and I can create/delete them on a whim without having my private mailbox spammed because someone/a service has my personal email address.
« Last Edit: April 08, 2021, 06:32:10 am by Halcyon »
 

Offline TomS_

  • Frequent Contributor
  • **
  • Posts: 834
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #30 on: April 08, 2021, 06:35:03 am »
I thought datacenters typically used Halon fire suppression systems?
I don't think anyone uses Halon any more, it would be something like FM200 these days. And yes, this would usually be installed in the server rooms.

You would usually get something like 30 seconds from alarms and strobes activating to evacuate before it is dumped into the room because yes, its whole purpose is to displace oxygen to starve a fire.

But if the fire was caused by one of the generators, it's anyone's guess whether they installed the same fire suppression in that area. Presumably not if it got this bad!?

But there seems to be a missing piece of the puzzle (or at least I've missed it)... They say the fire was caused by a generator. The generators would not normally be installed in the same room as servers. So how does a generator fire end up leading to water damage to servers?? Unless it burned through a wall??

OVH also suffered a major fire recently. Interesting trend of events.
 

Offline Ranayna

  • Frequent Contributor
  • **
  • Posts: 861
  • Country: de
Re: The BIG EEVblog Server Fire
« Reply #31 on: April 08, 2021, 06:56:28 am »
Yay, EEVBlog is back :)

Interestingly, this is the second datacenter fire in only a couple of months to affect services that I use. According to statistics, the services I use should be safe now for some time :p

Anyway, always interesting to see what can happen. Makes me glad that my employer has definitely not skimped on the equipment and setup of our internal datacenter.
Our internal datacenter is quite small, we only have 26 racks. But it is set up very well: redundant cooling, redundant power (one leg UPS and generator protected). All the critical components are in separated rooms with at least 2 firewalls (the brick kind ;) ) separating them.
The room with the racks and the UPS room are equipped with a Novec fire suppression system. The generator room, on the other hand, is not. It was explained that, first, it was not really needed, since the generator has its own room, the fuel has its own room, and the datacenter itself is two rooms away. Secondly, the required early detection system might misfire on the slightest leak from the generator, and refilling the Novec would supposedly cost around 20,000 euros :D

This kind of setup is expensive, but at least for us, it seems to be worth it: we have not had a single outage beyond faults of individual servers. Yet...  :-BROKE
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #32 on: April 08, 2021, 07:18:29 am »
I don't think anyone uses Halon any more, it would be something like FM200 these days. And yes, this would usually be installed in the server rooms.

You would usually get something like 30 seconds from alarms and strobes activating to evacuate before it is dumped into the room because yes, its whole purpose is to displace oxygen to starve a fire.

But if the fire was caused by one of the generators, it's anyone's guess whether they installed the same fire suppression in that area. Presumably not if it got this bad!?

But there seems to be a missing piece of the puzzle (or at least I've missed it)... They say the fire was caused by a generator. The generators would not normally be installed in the same room as servers. So how does a generator fire end up leading to water damage to servers?? Unless it burned through a wall??

OVH also suffered a major fire recently. Interesting trend of events.

Ah, yeah FM200 was what we had at that place, I just assumed it was a brand name for a Halon system, I never really looked into it.

Yes I wondered about that too, I would have thought they'd have the generators in a separate structure, but who knows, maybe the datacenter is in a highrise or something. I actually spent a night in Ogden UT once but it was close to 15 years ago and I don't think I ever went into town.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Re: The BIG EEVblog Server Fire
« Reply #33 on: April 08, 2021, 08:03:43 am »
Well done on EEVBLOG for having kept good backups.

Not many people do that.

Also a lot of "hits" on forums are malicious, perhaps organised by people who got banned for behaving badly.
« Last Edit: April 08, 2021, 08:07:16 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline tszaboo

  • Super Contributor
  • ***
  • Posts: 7364
  • Country: nl
  • Current job: ATEX product design
Re: The BIG EEVblog Server Fire
« Reply #34 on: April 08, 2021, 08:08:41 am »
The lesson here is, whilst it's great to have a fully redundant automatic backup server, it was kinda silly to have it in the same datacenter!
We are going to ask Gorillaservers if they can provision one of the boxes in their LA data center, so if a whole city/state goes out the server will still operate.
I was taught by the back-end people that your second server shall be on another continent, with a different government in charge of the company running it.
 

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #35 on: April 08, 2021, 08:51:07 am »
The lesson here is, whilst it's great to have a fully redundant automatic backup server, it was kinda silly to have it in the same datacenter!
We are going to ask Gorillaservers if they can provision one of the boxes in their LA data center, so if a whole city/state goes out the server will still operate.
I was taught by the back-end people that your second server shall be on another continent, with a different government in charge of the company running it.

It might have sounded prudent some while ago but these days, with numerous jurisdictions in each other's pockets, I don't think it matters anymore.
iratus parum formica
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Re: The BIG EEVblog Server Fire
« Reply #36 on: April 08, 2021, 09:06:11 am »
Also a "mirror" server won't protect you from a clever attack, where only older data is deleted, perhaps over time.

One has to do several things at the same time, including a database snapshot every day or so and keep the snapshots for many months.
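
A minimal sketch of that retention idea, assuming one dated snapshot per day and a hypothetical 180-day retention window (the post only says "many months"). The function name and the window length are illustrative, not taken from any particular backup tool.

```python
from datetime import date, timedelta

def snapshots_to_prune(snapshot_dates, today, keep_days=180):
    """Return snapshot dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in snapshot_dates if d < cutoff)

# Hypothetical history: one snapshot per day for the last 200 days.
today = date(2021, 4, 8)
snapshots = [today - timedelta(days=n) for n in range(200)]

old = snapshots_to_prune(snapshots, today)
print(f"{len(snapshots) - len(old)} kept, {len(old)} pruned")
```

The point of the long window is that a slow, quiet deletion can still be rolled back from a snapshot that predates it, which a live mirror cannot offer.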

Hacking is a big thing these days - much bigger than servers blowing up.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Towger

  • Super Contributor
  • ***
  • Posts: 1645
  • Country: ie
Re: The BIG EEVblog Server Fire
« Reply #37 on: April 08, 2021, 09:23:33 am »
But there seems to be a missing piece of the puzzle (or at least I've missed it)... They say the fire was caused by a generator. The generators would not normally be installed in the same room as servers. So how does a generator fire end up leading to water damage to servers?? Unless it burned through a wall??

From the sounds of it, I would not be surprised if the generators were on the roof.  Normally I would have expected the generators to be in a separate building.  But I have heard of similar issues with mission critical UPS systems failing.  If you have a fire and you call any local fire department, they will want all power turned off.  They don't care about your business model etc, minimising risk to their firefighters is more important.  The same goes if you call out the lifeboat, their job is to save life, saving the vessel is of secondary importance.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37728
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #38 on: April 08, 2021, 09:30:08 am »
Yes I wondered about that too, I would have thought they'd have the generators in a separate structure, but who knows, maybe the datacenter is in a highrise or something. I actually spent a night in Ogden UT once but it was close to 15 years ago and I don't think I ever went into town.

it's here:
https://www.google.com/maps/place/119+600+W+Bldg+3B,+Ogden,+UT+84404,+USA/@41.2631939,-111.994884,17z/data=!3m1!4b1!4m5!3m4!1s0x87530c31eeede3af:0xe0843e2258f1e68c!8m2!3d41.2631939!4d-111.9926953

 
The following users thanked this post: james_s

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37728
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #39 on: April 08, 2021, 09:31:39 am »
Well done on EEVBLOG for having kept good backups.

Didn't need the backups in the end though.

Quote
Also a lot of "hits" on forums are malicious, perhaps organised by people who got banned for behaving badly.

It's happened.
 

Offline Towger

  • Super Contributor
  • ***
  • Posts: 1645
  • Country: ie
Re: The BIG EEVblog Server Fire
« Reply #40 on: April 08, 2021, 09:33:07 am »
Also a "mirror" server won't protect you from a clever attack, where only older data is deleted, perhaps over time.

Royal Bank of Scotland learned that the hard way, it took years to properly sort out: https://en.wikipedia.org/wiki/2012_RBS_Group_computer_system_problems
In their case it was incompetence. They could have pulled the plug on the main system before it was automatically replicated on the backup system, a couple of hours later.

 

Offline MadScientist

  • Frequent Contributor
  • **
  • Posts: 439
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #41 on: April 08, 2021, 09:49:25 am »
I’d be really interested to know what happened and see a layout of the building

Genset fires are not uncommon (IC engines etc.), and gensets are typically positioned well away from the devices they power: air-gapped by 100s of feet, walls, separate buildings etc. Gensets should be able to burn to the ground and not damage their associated systems. Not to mention comprehensive fire suppression.

Someone skimped here at Gorilla, methinks.
« Last Edit: April 08, 2021, 09:52:28 am by MadScientist »
EE's: We use silicon to make things  smaller!
 

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #42 on: April 08, 2021, 09:53:59 am »
I’d be really interested to know what happened and see a layout of the building

Genset fires are not uncommon (IC engines etc.), and gensets are typically positioned well away from the devices they power: air-gapped by 100s of feet, walls, separate buildings etc. Gensets should be able to burn to the ground and not damage their associated systems. Not to mention comprehensive fire suppression.

Someone skimped here at Gorilla, methinks.

My money's on switchboxes. They really can and do indeed release magic black smoke.
iratus parum formica
 

Online Ian.M

  • Super Contributor
  • ***
  • Posts: 12852
Re: The BIG EEVblog Server Fire
« Reply #43 on: April 08, 2021, 09:59:10 am »
EEVblog got an honorable mention in the article about the incident at 'The Register': https://www.theregister.com/2021/04/06/webnx_data_fire/

The best unofficial live discussion thread about the incident is at: https://www.webhostingtalk.com/showthread.php?t=1842301
It has a lot of stuff that WebNX and GorillaServers were either slow to, or did not, publicly release.
 
The following users thanked this post: duckduck

Offline MadScientist

  • Frequent Contributor
  • **
  • Posts: 439
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #44 on: April 08, 2021, 10:01:18 am »
I’d be really interested to know what happened and see a layout of the building

Genset fires are not uncommon (IC engines etc.), and gensets are typically positioned well away from the devices they power: air-gapped by 100s of feet, walls, separate buildings etc. Gensets should be able to burn to the ground and not damage their associated systems. Not to mention comprehensive fire suppression.

Someone skimped here at Gorilla, methinks.

My money's on switchboxes. They really can and do indeed release magic black smoke.

Hmm, normally the changeover systems are in the genset space, not the supported devices' space. The switchgear there would be in use daily on the grid supplies etc. and wouldn’t or shouldn’t fail like it did (if it did). Sometimes the changeover system burns out, but again this should be significantly air-gapped.
EE's: We use silicon to make things  smaller!
 

Offline Ultrapurple

  • Super Contributor
  • ***
  • Posts: 1027
  • Country: gb
  • Just zis guy, you know?
    • Therm-App Users on Flickr
Re: The BIG EEVblog Server Fire
« Reply #45 on: April 08, 2021, 10:06:35 am »
Whatever the cause of the fire and resultant outage, congratulations to all concerned for getting EEVblog (and presumably many other systems) back up and running - even at mildly reduced capacity - so quickly.
Rubber bands bridge the gap between WD40 and duct tape.
 
The following users thanked this post: Towger

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19463
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: The BIG EEVblog Server Fire
« Reply #46 on: April 08, 2021, 10:18:16 am »
The lesson here is, whilst it's great to have a fully redundant automatic backup server, it was kinda silly to have it in the same datacenter!

You're not the first to notice that, and won't be the last.

The Browns Ferry fire very nearly caused two nuke reactors to melt down. It was caused by a candle :)

Talking about the control circuits (my emphasis):
Quote
Speaking of tape recorders, there was one really interesting phone conversation between J. R. Calhoun, the chief of TVA’s Nuclear Generation Branch at the time and Frank Long of the NRC (and reported by a Canadian website):
    Calhoun: Yah, you know everything for those two units comes through that one room. It’s common to both units, just like the control room is common to both units.

https://hackaday.com/2018/12/06/fail-of-the-week-1975-the-browns-ferry-nuclear-incident/
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19463
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: The BIG EEVblog Server Fire
« Reply #47 on: April 08, 2021, 10:24:42 am »
One has to do several things at the same time, including a database snapshot every day or so and keep the snapshots for many months.

... and test the backups.

Old story: backups to tape drive completed successfully, but when needed the tapes were found to be blank. The tape head was electrically connected but hanging loose away from the tapes!
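
A minimal sketch of what "test the backups" can mean in practice: restore the backup somewhere and compare it against the source, and treat an empty restore (the blank-tape case) as a failure. The function names and the idea of comparing a single restored file are assumptions for illustration, not anything EEVblog or its host actually runs.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large snapshots don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_usable(source: Path, restored: Path) -> bool:
    """Hypothetical check: the restored copy exists, is non-empty, and matches the source."""
    if not restored.is_file() or restored.stat().st_size == 0:
        return False  # the "blank tape" case: the job reported success, nothing was written
    return sha256_of(source) == sha256_of(restored)
```

A check like this only proves the bytes came back; for a database you would still want to load the restored snapshot and run a query against it.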
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 
The following users thanked this post: tooki

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #48 on: April 08, 2021, 10:34:46 am »


My money's on switchboxes. They really can and do indeed release magic black smoke.

Hmm, normally the changeover systems are in the genset space, not the supported devices' space. The switchgear there would be in use daily on the grid supplies etc. and wouldn’t or shouldn’t fail like it did (if it did). Sometimes the changeover system burns out, but again this should be significantly air-gapped.

Don't underestimate the power of fire protection systems and the fire department.

A plant near me recently had the control room catch fire. Such a mess. The panel where the fire started was vaporized. The adjacent panels had the wiring behind get so hot that all that is left looks like the wiring in a cheap mattress. The auto fire suppression system went nuts and sprayed yellow/orange gunk all over other panels not near the fire. And the boys that were sent to fix it all now need counseling, because of 60+? years of control mods that nobody kept track of.

Management: "But will it be all working again by this time next week?"

 :popcorn:
iratus parum formica
 

Offline MadScientist

  • Frequent Contributor
  • **
  • Posts: 439
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #49 on: April 08, 2021, 10:35:35 am »
All their reports say they had a mechanical genset failure resulting in a fire, so I suspect this isn’t a switchgear fault. The issue seems to be the proximity of the servers to the gensets. It sounds like they were in the same space  :palm:
« Last Edit: April 08, 2021, 10:37:35 am by MadScientist »
EE's: We use silicon to make things  smaller!
 
The following users thanked this post: Ed.Kloonk

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #50 on: April 08, 2021, 10:52:16 am »
I thought datacenters typically used Halon fire suppression systems? The last place I worked that had an onsite datacenter had one, there were warning strobes to indicate the system had discharged and asphyxiation warning signs.

Halon was banned, but old fire suppression systems may still use it. And there are a few alternatives, e.g. FM-200. Anyhow, you'll also find classic water based fire suppression systems in many data centers. Some of them create a misty spray.
 

Offline YurkshireLad

  • Frequent Contributor
  • **
  • Posts: 365
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #51 on: April 08, 2021, 11:38:03 am »
My question is, are you staying with them or moving to another service provider?  :)
 

Offline Gyro

  • Super Contributor
  • ***
  • Posts: 9474
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #52 on: April 08, 2021, 11:50:50 am »
My question is, are you staying with them or moving to another service provider?  :)

That's a question of whether you stick with the provider that's already had the fire, and hopefully learned something from the experience - or go with a new provider that hasn't had a fire yet;)
Best Regards, Chris
 
The following users thanked this post: SeanB, Damianos

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8636
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #53 on: April 08, 2021, 11:55:01 am »
Halon was banned, but old fire suppression systems may still use it. And there are a few alternatives, e.g. FM-200. Anyhow, you'll also find classic water based fire suppression systems in many data centers. Some of them create a misty spray.
In the early 80s the company I worked for kept changing its insurers each year. These alternated between pro-halon insurers and pro-water insurers. We had halon and sprinkler systems alternately installed and ripped out of the computer room I oversaw for several years.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8636
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #54 on: April 08, 2021, 11:56:28 am »
My question is, are you staying with them or moving to another service provider?  :)
Why abandon one of the hottest operations in the business? Are you concerned their service may be cooling off now?
 

Offline YurkshireLad

  • Frequent Contributor
  • **
  • Posts: 365
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #55 on: April 08, 2021, 11:56:47 am »
My question is, are you staying with them or moving to another service provider?  :)

That's a question of whether you stick with the provider that's already had the fire, and hopefully learned something from the experience - or go with a new provider that hasn't had a fire yet;)

True, or do you stick with a provider that had a fire or move to one that hasn't.  ;D

It is a business after all.
 

Online Ian.M

  • Super Contributor
  • ***
  • Posts: 12852
Re: The BIG EEVblog Server Fire
« Reply #56 on: April 08, 2021, 12:04:37 pm »
My question is, are you staying with them or moving to another service provider?  :)
Better the devil you know.   At least you can be fairly certain they'll put measures in place to prevent or reduce the impact of similar future incidents, and have some idea of their actual disaster recovery capabilities.

It would be nice if WebNX added a large NAS with its own power management in a shipping container at both their locations, with high bandwidth connectivity into their data centers.   It's only a bit over 600 miles as the crow flies between their locations (729 miles by road, est. 10.5 h driving time), so if they had another extended outage, the whole NAS could be shipped to their other location to facilitate getting fall-back servers online without the bandwidth implications of either uploading from off-site backups, or live syncing over a significant geographical distance.  They'd also need a priority standby contract with a local shipping company at each end to provide a self-loading container truck and two drivers with N hours notice 24/7.
« Last Edit: April 08, 2021, 12:14:02 pm by Ian.M »
 

Offline wilfred

  • Super Contributor
  • ***
  • Posts: 1252
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #57 on: April 08, 2021, 12:10:25 pm »
For prosperity....including the bad spelling:
Also for posterity.
 
The following users thanked this post: Microdoser

Offline wilfred

  • Super Contributor
  • ***
  • Posts: 1252
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #58 on: April 08, 2021, 12:21:56 pm »
My question is, are you staying with them or moving to another service provider?  :)

That's a question of whether you stick with the provider that's already had the fire, and hopefully learned something from the experience - or go with a new provider that hasn't had a fire yet;)

True, or do you stick with a provider that had a fire or move to one that hasn't.  ;D

It is a business after all.

The real question is: were they prepared as they claimed they would be, and did they recover as they promised they would? They can only control how they plan and respond. No-one knows which disaster scenario will play out. It could have been a Texas-like power outage and the trucks delivering diesel for the generators may not have been able to get through to resupply.

I do think Dave could (should?) make preparations to restart from offsite backups. From what I read only 10% of servers were affected by water. How long would it have taken to recover from a total loss of the whole site? I can't imagine sufficient servers would be readily available.

 

Offline cdev

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #59 on: April 08, 2021, 12:44:14 pm »
They put their backup server in LA?

The city that's been cinematically annihilated more times than any other!

 Why, one only needs to look at maps from not so long ago to realize that California is likely to tear itself off of the mainland and strike out on its own at its earliest opportunity..

I figured this event needed its own thread, so moved it from the server reports thread.
HUGE thanks to gnif for handling this:
https://hostfission.com/


Fission? Degraded state?

The server was down from 2021-04-04 21:13 UTC to 2021-04-08 03:36 UTC

It's currently still operating in a degraded state, and performance is currently impacted until the caches catch up.
Gorillaservers upgraded the server box (maybe the old box was water damaged?) to dual Xeon 2620V2 from the older dual L5630

Presumably they'll upgrade the other redundant box too to match, but the 2nd box is not currently online yet.

The lesson here is, whilst it's great to have a fully redundant automatic backup server, it was kinda silly to have it in the same datacenter!
We are going to ask Gorillaservers if they can provision one of the boxes in their LA data center, so if a whole city/state goes out the server will still operate.


The city of LA is not a secure location. It's as insecure a location as there is in this country.

They suppress all news to the contrary, as a favor to the real estate industry.
I've been told by a relative who was in a position to know.

They have lots of tornadoes (and waterspouts, of salty water, which I've seen between the city and Santa Catalina Island).
And substantial numbers of earthquakes.

I also learned the importance of not relying on a single email server. I was surprised at the stuff I couldn't do that relied on my primary email for confirmations etc.

It's truly horrible how much we depend on easily broken computers and unreliable networks.

What if (insert unspeakable tragedy here) ? Huh?

what if.. We need more resilience and redundancy..
« Last Edit: April 08, 2021, 12:52:48 pm by cdev »
"What the large print giveth, the small print taketh away."
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #60 on: April 08, 2021, 01:21:06 pm »
It would be nice if WebNX added a large NAS with its own power management in a shipping container at both their locations, with high bandwidth connectivity into their data centers.   It's only a bit over 600 miles as the crow flies between their locations (729 miles by road, est. 10.5 h driving time), so if they had another extended outage, the whole NAS could be shipped to their other location to facilitate getting fall-back servers online without the bandwidth implications of either uploading from off-site backups, or live syncing over a significant geographical distance.  They'd also need a priority standby contract with a local shipping company at each end to provide a self-loading container truck and two drivers with N hours notice 24/7.

First off, SAN, not NAS. Secondly, such an off-site backup requires fat pipes anyway for read and write, e.g. 100 Gbps Ethernet. So moving the SAN around won't speed up things. Actually it would cause a delay by the transport and also would add the risk of being damaged during transport, by a traffic accident for example.
« Last Edit: April 08, 2021, 01:24:06 pm by madires »
 

Offline Ultrapurple

  • Super Contributor
  • ***
  • Posts: 1027
  • Country: gb
  • Just zis guy, you know?
    • Therm-App Users on Flickr
Re: The BIG EEVblog Server Fire
« Reply #61 on: April 08, 2021, 03:26:44 pm »
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

— Andrew S Tanenbaum, 1989
Rubber bands bridge the gap between WD40 and duct tape.
 
The following users thanked this post: Ed.Kloonk, Ian.M

Offline Microdoser

  • Frequent Contributor
  • **
  • Posts: 423
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #62 on: April 08, 2021, 03:33:27 pm »
This has made me consider my own data safety and the importance of having an off-site backup of at least the most critical things.
 

Offline Ultrapurple

  • Super Contributor
  • ***
  • Posts: 1027
  • Country: gb
  • Just zis guy, you know?
    • Therm-App Users on Flickr
Re: The BIG EEVblog Server Fire
« Reply #63 on: April 08, 2021, 03:40:44 pm »
I have in front of me a 1TB microSD card, 15 x 11 x 1mm thick (fascinating hi-res X-Rays here). That works out at roughly 6GB per cubic millimetre.

Assuming we have a VW Passat (which claims 1780 litres of cargo space with the rear seats down), our modern equivalent of the station-wagon-full of tapes is potentially something like 1780 x (100 x 100 x 100) x 6GB, or around 10 exabytes (10x10^9 GB).

And yes I know we wouldn't get that capacity, they're slow to write to, and so on and so on, but it does make one think.
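
For anyone who wants to check the sums, here is a rough sketch of the arithmetic above. It assumes decimal units (1 TB = 1000 GB, 1 EB = 10^9 GB) and perfect packing with no cases or padding; the constant and function names are made up for illustration.

```python
CARD_CAPACITY_GB = 1000               # 1 TB microSD, decimal units (assumption)
CARD_VOLUME_MM3 = 15 * 11 * 1         # card dimensions quoted above, in mm
MM3_PER_LITRE = 100 * 100 * 100       # 1 litre = 100 mm x 100 mm x 100 mm
GB_PER_EXABYTE = 10**9


def density_gb_per_mm3() -> float:
    """Storage density of one card: roughly 6 GB per cubic millimetre."""
    return CARD_CAPACITY_GB / CARD_VOLUME_MM3


def cargo_capacity_gb(litres: float, gb_per_mm3: float) -> float:
    """Capacity of a cargo space packed solid with cards (no packaging, no gaps)."""
    return litres * MM3_PER_LITRE * gb_per_mm3


# Passat with the seats down, using the rounded 6 GB/mm^3 figure.
passat_gb = cargo_capacity_gb(1780, 6)
print(f"{density_gb_per_mm3():.2f} GB/mm^3, {passat_gb / GB_PER_EXABYTE:.2f} EB")
```

With the rounded 6 GB/mm^3 this comes out at 10.68 EB, which matches the "around 10 exabytes" figure.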
« Last Edit: April 08, 2021, 03:45:35 pm by Ultrapurple »
Rubber bands bridge the gap between WD40 and duct tape.
 

Online Ian.M

  • Super Contributor
  • ***
  • Posts: 12852
Re: The BIG EEVblog Server Fire
« Reply #64 on: April 08, 2021, 03:45:21 pm »
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #65 on: April 08, 2021, 03:48:03 pm »
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

— Andrew S Tanenbaum, 1989

Or the old fashioned sneakernet (https://en.wikipedia.org/wiki/Sneakernet) ;)

Anyhow, do your risk assessment!
 

Offline Ultrapurple

  • Super Contributor
  • ***
  • Posts: 1027
  • Country: gb
  • Just zis guy, you know?
    • Therm-App Users on Flickr
Re: The BIG EEVblog Server Fire
« Reply #66 on: April 08, 2021, 04:16:19 pm »
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

— Andrew S Tanenbaum, 1989

Or the old fashioned sneakernet (https://en.wikipedia.org/wiki/Sneakernet) ;)

Yup, that was the source of my Tanenbaum quote.
Rubber bands bridge the gap between WD40 and duct tape.
 

Offline xmo

  • Regular Contributor
  • *
  • Posts: 193
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #67 on: April 08, 2021, 04:23:51 pm »
Day three without the forum....

TEA user:

"I have discovered something interesting.

I seem to be living with a woman!

Apparently, she is my wife.

She seems nice."
 
The following users thanked this post: schmitt trigger, Microdoser, duckduck

Offline tooki

  • Super Contributor
  • ***
  • Posts: 11471
  • Country: ch
Re: The BIG EEVblog Server Fire
« Reply #68 on: April 08, 2021, 04:31:22 pm »
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

— Andrew S Tanenbaum, 1989
Yes, or of actual messenger pigeons!

With that said, the critical downside of the station wagon or pigeon is the latency! :P
 

Offline Microdoser

  • Frequent Contributor
  • **
  • Posts: 423
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #69 on: April 08, 2021, 04:34:22 pm »
I have noticed the site, as we were told, seems a bit clunky and slow right now.

Still, gives me more time to fix up the i7-6700K, 24GB RAM, 256GB SSD machine that I found in the street fully working (except for the graphics card). It has a legitimate Windows 7 Ultimate OEM install on it, which I am currently bringing up to date with Windows updates in preparation to upgrade to Win 10 Pro.
 

Offline Microdoser

  • Frequent Contributor
  • **
  • Posts: 423
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #70 on: April 08, 2021, 04:35:58 pm »
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

— Andrew S Tanenbaum, 1989
Yes, or of actual messenger pigeons!

With that said, the critical downside of the station wagon or pigeon is the latency! :P

Or the time someone put a CD on the back of a snail and sent it across a table and got better data transmission rates than using the LAN (at the time)
 
The following users thanked this post: tooki

Offline Ultrapurple

  • Super Contributor
  • ***
  • Posts: 1027
  • Country: gb
  • Just zis guy, you know?
    • Therm-App Users on Flickr
Re: The BIG EEVblog Server Fire
« Reply #71 on: April 08, 2021, 04:37:40 pm »
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

— Andrew S Tanenbaum, 1989
Yes, or of actual messenger pigeons!

With that said, the critical downside of the station wagon or pigeon is the latency! :P

I'm sorry I'm dragging this so far off-topic; I promise to sit on the Naughty Step for a while and Think About What I've Done. But meanwhile, have a ponder about the effective latency in even, say, a thousand parallel, error-free, get-the-label-speed 10Gbps direct links for that amount of data. How many miles apart do your data centres have to be before it's quicker to send the data via wires than wheels?

And don't get me started on the effective data rate of a cargo plane worth of 6GB/mm3...

OK, I've arrived at the Naughty Step. Sitting in 3... 2... 1...
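
A back-of-envelope sketch of the wires-versus-wheels question, under loudly stated assumptions: the payload is the 10.68 EB carload from the microSD sums (decimal units), the links really do sustain their label speed, the car averages a hypothetical 60 mph, and the time to write and read the cards is ignored entirely. The names are invented for illustration.

```python
DATA_BYTES = 10_680_000_000 * 10**9    # 10.68 EB carload, decimal units (assumption)
LINKS = 1000                           # parallel links, as in the post
BITS_PER_SECOND_PER_LINK = 10 * 10**9  # 10 Gbps label speed, error-free


def link_transfer_seconds(data_bytes: int, links: int, bps_per_link: int) -> float:
    """Time to push the payload through all links running in parallel."""
    return data_bytes * 8 / (links * bps_per_link)


def breakeven_distance_miles(transfer_seconds: float, avg_speed_mph: float) -> float:
    """Distance the car can cover in the time the links need for the same data."""
    return transfer_seconds / 3600 * avg_speed_mph


seconds = link_transfer_seconds(DATA_BYTES, LINKS, BITS_PER_SECOND_PER_LINK)
print(f"links: {seconds / 86400:.1f} days")
print(f"break-even at 60 mph: {breakeven_distance_miles(seconds, 60):,.0f} miles")
```

On those assumptions the links need roughly 99 days, so the car wins for any pair of data centres on this planet. The ignored card write time is of course where the argument falls apart.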
« Last Edit: April 08, 2021, 04:42:18 pm by Ultrapurple »
Rubber bands bridge the gap between WD40 and duct tape.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #72 on: April 08, 2021, 04:49:49 pm »
Yes, or of actual messenger pigeons!

With that said, the critical downside of the station wagon or pigeon is the latency! :P

... or a Tesla with autopilot turned on, or Mrs. Eagle looking for a nice meal to feed her children. If that was your only backup you'll have a problem. >:D
 

Offline Hamster

  • Regular Contributor
  • *
  • Posts: 115
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #73 on: April 08, 2021, 04:54:52 pm »
New Article : https://datacenterfrontier.com/generator-catches-fire-causes-lengthy-data-center-outage-at-webnx/

Webhosting talk forum: https://www.webhostingtalk.com/showthread.php?t=1842301

---

I would abort continuing to host anything at this data center. Here in Florida, almost all data centers do NOT use water, they use an FM200 Fire Suppression System. Also, generators are not co-located with the data center, they are usually detached just in case something like this happens. Generators do tend to go, and when they go, they go violently. The worst part is, this data center probably had multiple gensets covering the load, and when one went down, it caused a cascade effect going down the line. Fuel should have also been shut off and isolated, but it appears not?  There should have been 0 water damage inside the facilities. This clearly is some kind of failed engineering design of a building that probably should not have been designed to be a data center.

---

Another popular site ( pinside.com ) is still down. They have recovered the data and restored it to a lower tiered server, but it's still down. They have to "order" new hardware for them.
Arcade Board Repair Guru.  [ twitch: HammysHangout , youTube: Hammy Builds ]
 
The following users thanked this post: duckduck

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #74 on: April 08, 2021, 05:05:20 pm »
And don't get me started on the effective data rate of a cargo plane worth of 6GB/mm3...

I'll take the bait. ;D So your data center #1 has burned down, nothing left, no network connectivity, no servers, no power. But you have a backup in data center #2. To which servers do you want to restore the backup?
 

Offline artag

  • Super Contributor
  • ***
  • Posts: 1064
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #75 on: April 08, 2021, 05:14:52 pm »

From the sounds of it, I would not be surprised if the generators were on the roof.  Normally I would have expected the generators to be in a separate building.  But I have heard of similar issues with mission critical UPS systems failing.  If you have a fire and you call any local fire department, they will want all power turned off.  They don't care about your business model etc., minimising risk to their firefighters is more important.  The same goes if you call out the lifeboat: their job is to save life, saving the vessel is of secondary importance.

All the reports seem to say that the generators caught fire, the remaining power was taken down by the fire department, and that some servers were in danger of water damage.

My conclusion is that the fire department applied plenty of water and a little of it got into a small area of servers. Perhaps near, but not colocated with the fiery genset.
 
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3711
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #76 on: April 08, 2021, 05:15:56 pm »
I thought datacenters typically used Halon fire suppression systems?
I don't think anyone uses Halon any more, it would be something like FM200 these days. And yes, this would usually be installed in the server rooms.

You would usually get something like 30 seconds from alarms and strobes activating to evacuate before it is dumped into the room because yes, its whole purpose is to displace oxygen to starve a fire.

Halon and FM200 *do not work* by displacing oxygen, that is a persistent myth. They work by neutralizing the free radicals that allow fire to propagate.  This allows them to be used in much smaller quantities than would be needed to extinguish a fire by oxygen displacement.  Typically they are used at <10% concentration so they only lower the oxygen concentration slightly.  You still want to evacuate because the distribution is not uniform and also because -- THERE IS A FIRE, but generally they are low risk to humans.

CO2 fire extinguisher systems do work by displacing oxygen, and full-room CO2 systems are not generally used in occupied areas because of this.
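
To put a number on "only lower the oxygen concentration slightly", here is a small sketch of the dilution arithmetic. It assumes normal air at about 20.9% oxygen and an agent that mixes uniformly and simply dilutes the air in proportion; real discharges are not uniform, so treat it as an illustration only. The function name is made up.

```python
AMBIENT_O2_PERCENT = 20.9  # typical oxygen fraction of dry air (assumption)


def diluted_oxygen_percent(agent_percent: float,
                           ambient_o2_percent: float = AMBIENT_O2_PERCENT) -> float:
    """Oxygen concentration after an agent makes up agent_percent of the room gas."""
    return ambient_o2_percent * (1 - agent_percent / 100)


for agent in (5, 10):
    print(f"{agent}% agent -> {diluted_oxygen_percent(agent):.1f}% oxygen")
```

Even at the 10% upper bound mentioned above, the room only drops to about 18.8% oxygen, which is why these agents cannot be working by oxygen starvation.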
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #77 on: April 08, 2021, 05:23:55 pm »
In the early 80s the company I worked for kept changing its insurers each year. These alternated between pro-halon insurers and pro-water insurers. We had halon and sprinkler systems alternately installed and ripped out of the computer room I oversaw for several years.

Seems like they could just leave both systems in place and disable the one they weren't using.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #78 on: April 08, 2021, 05:33:27 pm »
All their reports say they had a mechanical genset failure resulting in a fire, so I suspect this isn’t a switchgear fault. The issue seems to be the proximity of the servers to the gensets. It sounds like they were in the same space  :palm:

The pictures Dave posted look like a random light industrial park of the sort of place my friend's machine shop is in, there's even a sports bar & grill in one of the units. Looks like the genset has to be in the same space, there isn't anywhere else to put it.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #79 on: April 08, 2021, 05:56:43 pm »
Webhosting talk forum: https://www.webhostingtalk.com/showthread.php?t=1842301

---

I would abort continuing to host anything at this data center. Here in Florida, almost all data centers do NOT use water, they use an FM200 Fire Suppression System. Also, generators are not co-located with the data center, they are usually detached just in case something like this happens. Generators do tend to go, and when they go, they go violently. The worst part is, this data center probably had multiple gensets covering the load, and when one went down, it caused a cascade effect going down the line. Fuel should have also been shut off and isolated, but it appears not?  There should have been 0 water damage inside the facilities. This clearly is some kind of failed engineering design of a building that probably should not have been designed to be a data center.

---

Wishful thinking! In case of a fire the fire brigade decides what to do. If they think it's a good idea to hose down all servers despite an FM-200 fire suppression system, you can't do much about that. Local regulations might force you to use water. Or the management board tells you to use water because it's cheap and the insurance will pay for any damages. Fire is one of the more likely events, but there are many others, and in those cases it doesn't matter if the fire suppression system uses water or something more hardware friendly. If you need to keep your platform running 24x7 you have to design a redundant solution, i.e. you can't rely on a single data center. Complaining about a data center using a water based fire suppression system is laughable. Simply do your homework!
 

Offline Bud

  • Super Contributor
  • ***
  • Posts: 6903
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #80 on: April 08, 2021, 06:10:25 pm »
I am questioning the details of the culprit, that is  - what happened to the backup generator, did they buy it on Alibaba?
Facebook-free life and Rigol-free shack.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #81 on: April 08, 2021, 06:15:58 pm »
I am questioning the details of the culprit, that is  - what happened to the backup generator, did they buy it on Alibaba?

Generators fail. Even well known name brand stuff like Caterpillar, Detroit, etc. blows up now and then, especially older ones, which it was mentioned this was. These are big diesel engines, they require maintenance and occasionally stuff breaks. If it spins a bearing or has an injector problem, or something like a bad seal in a turbocharger lets the engine run away consuming its own lubricating oil, it can throw a rod through the side of the block and spill oil and/or fuel all over the hot exhaust system, and then you have a fire.
 

Offline Bud

  • Super Contributor
  • ***
  • Posts: 6903
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #82 on: April 08, 2021, 06:34:28 pm »
A backup generator's keyword is 'backup', isn't it? Some robustness is supposed to be embedded in it from the get-go, starting from the specs.
Facebook-free life and Rigol-free shack.
 

Offline PaulAm

  • Frequent Contributor
  • **
  • Posts: 938
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #83 on: April 08, 2021, 06:53:39 pm »
Don't forget that part of the backup power architecture is a humongous UPS.  Its job is to keep everything up until the genset is up and stable.  It's quite possible something went wrong in that gear.

My thought after the first day was that EEs everywhere across the world were showing an unexplained increase in productivity  :-DD
 
The following users thanked this post: james_s, Jacon

Offline SilverSolder

  • Super Contributor
  • ***
  • Posts: 6126
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #84 on: April 08, 2021, 07:09:41 pm »
I have in front of me a 1TB microSD card, 15 x 11 x 1mm thick (fascinating hi-res X-Rays here). That works out at roughly 6GB per cubic millimetre.

Assuming we have a VW Passat (which claims 1780 litres of cargo space with the rear seats down), our modern equivalent of the station-wagon-full of tapes is potentially something like 1780 x (100 x 100 x 100) x 6GB, or around 10 exabytes (10x10^9 GB).

And yes I know we wouldn't get that capacity, they're slow to write to, and so on and so on, but it does make one think.
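
The arithmetic in the quoted post can be checked with a short sketch. The card dimensions, the 1 TB capacity and the 1780-litre cargo figure come from the post; decimal units (1 TB = 1000 GB) and perfect packing with no air gaps are assumptions made here for illustration.

```python
# Back-of-envelope check of the microSD density estimate.
# Assumptions: decimal units and perfect packing (no gaps, no packaging).

CARD_MM = (15, 11, 1)        # card dimensions from the post, in mm
CARD_GB = 1000               # 1 TB card, decimal units
CARGO_LITRES = 1780          # quoted cargo space with rear seats down
MM3_PER_LITRE = 100 ** 3     # 1 litre = 100 mm x 100 mm x 100 mm

card_volume_mm3 = CARD_MM[0] * CARD_MM[1] * CARD_MM[2]
density_gb_per_mm3 = CARD_GB / card_volume_mm3

cargo_mm3 = CARGO_LITRES * MM3_PER_LITRE
total_gb = cargo_mm3 * density_gb_per_mm3
total_eb = total_gb / 1e9    # 1 EB = 10^9 GB

print(f"density:  {density_gb_per_mm3:.2f} GB/mm^3")
print(f"capacity: {total_eb:.1f} EB")
```

Under those assumptions the result lands at roughly 6 GB per cubic millimetre and a little over 10 EB, which agrees with the rounded figures in the post.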

1TB microSD still seems like sci-fi territory to me...   an amazing milestone.  Next:  10TB!  :D
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #85 on: April 08, 2021, 07:17:35 pm »
1TB microSD still seems like sci-fi territory to me...   an amazing milestone.  Next:  10TB!  :D

Makes me think of the old spy movies where somebody is trying to smuggle a microfilm containing information. These days you could fit an entire library on a thumbnail sized micro SD card that is easily hidden inside nearly anything.
 
The following users thanked this post: SilverSolder

Offline SilverSolder

  • Super Contributor
  • ***
  • Posts: 6126
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #86 on: April 08, 2021, 07:20:47 pm »
I know of a sizeable datacenter in east coast US serving the financial industry that went down due to the backup systems many years ago.  It is always Murphy that gets you: 


(1) External power failed, causing the building to switch to battery backup based on enormous banks of lead-acid batteries (a large room full) in anticipation of starting an array of big diesel generators.

(2) One of those batteries blew up under the sudden load, spraying an employee with acid.  The other men ran to his rescue and got him in a shower real fast (he was OK)...

(3) ...but sadly, this cost crucial minutes...  and the batteries ran out the second they tried to start the diesel generators...  BLINK, the center went pitch black, with no way to start the diesels!


 

Offline jonpaul

  • Super Contributor
  • ***
  • Posts: 3365
  • Country: fr
Re: The BIG EEVblog Server Fire
« Reply #87 on: April 08, 2021, 07:54:03 pm »
Hello: Thanks for the info,

Most pro server farms use a non-water fire suppression system, e.g. nitrogen inerting.

I wonder why EEVblog did not select a large hosting firm like AWS, Ionos, GoDaddy etc. as a host.

I had never heard of this firm.

Kind Regards,
Jon
Jean-Paul  the Internet Dinosaur
 

Offline 3roomlab

  • Frequent Contributor
  • **
  • Posts: 825
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #88 on: April 08, 2021, 08:06:03 pm »
maybe it is time to move to other countries with fewer power interruptions





 
The following users thanked this post: tooki, I wanted a rude username

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #89 on: April 08, 2021, 08:56:23 pm »
A backup generator's keyword is 'backup', isn't it? Some robustness is supposed to be embedded in it from the get-go, starting from the specs.

Complex mechanical devices fail, it's just a fact of life. Turbine engines on aircraft are highly critical life safety items that are engineered to be extremely reliable and impeccably maintained, they still fail catastrophically now and then, sometimes with loss of life. Some robustness IS embedded in backup generators, it isn't like they blow up every day, but occasionally they still fail, especially if maintenance has been less than stellar, which is often the case if a company is not compelled to do it religiously the way aviation is. It's easy to look at the bottom line when the budget is tight and margins are slim and think "do we REALLY need to invest $$$ in rebuilding or replacing the genset or can we defer that to next year?" It happens, and it's much easier to predict and point fingers with the benefit of hindsight. If this place had a history of generators failing I would be more critical, but a single catastrophic failure is much too small of a sample to judge by. It could have been a total fluke where something just blew up due to a manufacturing defect that was never caught, or it could be it wasn't properly maintained, or it could be somebody monkeyed with it at some point, we don't know.
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23018
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #90 on: April 08, 2021, 10:04:20 pm »
Interesting thread.

Also after looking at the photos and Google maps, an observation:

Don’t go with a facility that has no crash wall and no fence, and that parks a trailer, probably with a nice propane bottle, actually within the boundary of the building. If they do that then there are probably 9000 even worse things inside the building you can’t see which are waiting to take your business out.

May be better to leverage an IaaS cloud here. The whole availability zone concept makes these events into non events. If a facility burns then you lose an availability zone, not your entire deployment.

I originally made this observation after mistakenly hosting half a rack of shit at a redneck provider here.

Edit: also always host your email somewhere completely different as you might find it difficult to contact support if your email server is on fire (did that once)
« Last Edit: April 08, 2021, 10:10:36 pm by bd139 »
 

Offline CatalinaWOW

  • Super Contributor
  • ***
  • Posts: 5226
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #91 on: April 09, 2021, 12:01:46 am »
I am questioning the details of the culprit, that is: what happened to the backup generator, did they buy it on Alibaba?

Generators fail. Even well-known name-brand stuff like Caterpillar, Detroit, etc. blows up now and then, especially older units, which it was mentioned this one was. These are big diesel engines; they require maintenance and occasionally stuff breaks. A spun bearing, an injector problem, or something like a bad seal in a turbocharger can cause the engine to run away, consuming its own lubricating oil until it throws a rod through the side of the block and spills oil and/or fuel all over the hot exhaust system, and then you have a fire.

Failure is expected.  Failure with catastrophic results is what is surprising.  Aerospace uses something called FMECA (Failure Modes, Effects and Criticality Analysis) to try to prevent the latter.  A spun bearing, or low oil or ....   It isn't perfect.  Things still slip through the cracks, but apparently we don't care that much about data loss, or respirator failure or anything else that depends on backup generators.
 

Offline ve7xen

  • Super Contributor
  • ***
  • Posts: 1192
  • Country: ca
    • VE7XEN Blog
Re: The BIG EEVblog Server Fire
« Reply #92 on: April 09, 2021, 12:11:53 am »
"do we REALLY need to invest $$$ in rebuilding or replacing the genset or can we defer that to next year?" It happens, and it's much easier to predict and point fingers with the benefit of hindsight. If this place had a history of generators failing I would be more critical, but a single catastrophic failure is much too small of a sample to judge by. It could have been a total fluke where something just blew up due to a manufacturing defect that was never caught, or it could be it wasn't properly maintained, or it could be somebody monkeyed with it at some point, we don't know.

And like you say, sometimes you still get bitten. We're pretty good about maintaining our generators, every 6 months a guy comes from the service company and does all the scheduled maintenance items according to the manufacturers' recommendations, we do monthly full-load tests, and so on.

A few years back we suffered a ~12h outage anyway. The genset fired up and took the load, no problem, and a few hours later we got the overheat alarm and it shut itself down. We went to the bunker and found that one of the coolant hoses had burst and it had pumped all of its many litres of coolant onto the floor. We later learned that while the maintenance guy had been there two weeks before, and a 5-year replacement of the hoses was due, he happened to not have that one particular hose in his truck that day, and we allowed it to be deferred to the next maintenance. Murphy gets you every time.
« Last Edit: April 09, 2021, 12:22:31 am by ve7xen »
73 de VE7XEN
He/Him
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37728
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #93 on: April 09, 2021, 12:37:07 am »
I just got one month's credit. So they implemented the 30 day limit clause.
 

Online bdunham7

  • Super Contributor
  • ***
  • Posts: 7818
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #94 on: April 09, 2021, 12:43:58 am »
I just got one month's credit. So they implemented the 30 day limit clause.

At least they didn't invoke force majeure and weasel out entirely!
A 3.5 digit 4.5 digit 5 digit 5.5 digit 6.5 digit 7.5 digit DMM is good enough for most people.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #95 on: April 09, 2021, 01:53:29 am »
Failure is expected.  Failure with catastrophic results is what is surprising.  Aerospace uses something called FMECA (Failure Modes, Effects and Criticality Analysis) to try to prevent the latter.  A spun bearing, or low oil or ....   It isn't perfect.  Things still slip through the cracks, but apparently we don't care that much about data loss, or respirator failure or anything else that depends on backup generators.

We don't really know how much people care, because we know very little about the situation, only that a genset they refer to as "older" failed in some catastrophic manner and caused the outage. I've never shopped around for hosting of this sort but I would expect that prices vary widely and that to some extent, you get what you pay for in terms of reliability. There could be redundant sites, or at the very least, redundant gensets located in separate buildings, or newer/higher quality gensets, or any number of other things. There is a lot of web hosting, this forum included, which I would not call mission critical. Nobody dies if it goes down for a few days and very few people are even particularly inconvenienced, so is it worth paying more for higher reliability? Only Dave can answer that since it's his money, but given the rarity of outages I would vote no. I mean this is ONE outage, and it's impossible to measure reliability from a single failure. Maybe it was an accident waiting to happen and it is miraculous that things held on this long, or maybe it's all top notch high quality gear that is meticulously maintained and something still blew up because sometimes, no matter how careful you are, shit happens. We don't know. I'm going to guess that the truth is somewhere in the middle.
 
The following users thanked this post: Jacon

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #96 on: April 09, 2021, 01:58:04 am »
Interesting thread.

Also after looking at the photos and Google maps, an observation:

Don’t go with a facility that has no crash wall and no fence, and that parks a trailer, probably with a nice propane bottle, actually within the boundary of the building. If they do that then there are probably 9000 even worse things inside the building you can’t see which are waiting to take your business out.

The trailer wouldn't bother me particularly; it looks to be in good shape and those things don't often just go up unless someone is using one as a meth kitchen. Of all the details observable in the photos, the restaurant in the building is the most eyebrow-raising. It's not that rare for a restaurant to catch fire, although even that doesn't happen every day.
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 28323
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #97 on: April 09, 2021, 02:19:58 am »
Failure is expected.  Failure with catastrophic results is what is surprising.  Aerospace uses something called FMECA (Failure Modes, Effects and Criticality Analysis) to try to prevent the latter.  A spun bearing, or low oil or ....   It isn't perfect.  Things still slip through the cracks, but apparently we don't care that much about data loss, or respirator failure or anything else that depends on backup generators.

We don't really know how much people care, because we know very little about the situation, only that a genset they refer to as "older" failed in some catastrophic manner and caused the outage. I've never shopped around for hosting of this sort but I would expect that prices vary widely and that to some extent, you get what you pay for in terms of reliability. There could be redundant sites, or at the very least, redundant gensets located in separate buildings, or newer/higher quality gensets, or any number of other things. There is a lot of web hosting, this forum included, which I would not call mission critical. Nobody dies if it goes down for a few days and very few people are even particularly inconvenienced, so is it worth paying more for higher reliability? Only Dave can answer that since it's his money, but given the rarity of outages I would vote no. I mean this is ONE outage, and it's impossible to measure reliability from a single failure. Maybe it was an accident waiting to happen and it is miraculous that things held on this long, or maybe it's all top notch high quality gear that is meticulously maintained and something still blew up because sometimes, no matter how careful you are, shit happens. We don't know. I'm going to guess that the truth is somewhere in the middle.
Sure, but a year or two back Dave changed hosts to this crowd due to server unreliability, and in the 8 years I've been aboard this one was by far the largest outage.
We may have been inconvenienced some however Dave will have taken a financial hit due to his shop being down and loss of forum advertising.
Dave's very lucky gnif does most of his admin on the cheap.
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #98 on: April 09, 2021, 03:03:33 am »
I just got one month's credit. So they implemented the 30 day limit clause.

At least they didn't invoke force majeure and weasel out entirely!

It was an act of God, I tell you.

https://en.wikipedia.org/wiki/The_Man_Who_Sued_God
iratus parum formica
 

Offline tooki

  • Super Contributor
  • ***
  • Posts: 11471
  • Country: ch
Re: The BIG EEVblog Server Fire
« Reply #99 on: April 09, 2021, 05:18:10 am »
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

— Andrew S Tanenbaum, 1989
Yes, or of actual messenger pigeons!

With that said, the critical downside of the station wagon or pigeon is the latency! :P

I'm sorry I'm dragging this so far off-topic; I promise to sit on the Naughty Step for a while and Think About What I've Done. But meanwhile, have a ponder about the effective latency in even, say, a thousand parallel, error-free, get-the-label-speed 10Gbps direct links for that amount of data. How many miles apart do your data centres have to be before it's quicker to send the data via wires than wheels?

And don't get me started on the effective data rate of a cargo plane worth of 6GB/mm3...

OK, I've arrived at the Naughty Step. Sitting in 3... 2... 1...
Well, that’s not really what latency means... ;) (Latency is the minimum time for a single bit to go in one end of the “pipe” and come out the other end. As a metric, it isn’t dependent on the total size of the data you’re sending.)
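
The distinction can be put into numbers with a small sketch. It assumes the simplest possible model: delivery time equals latency plus size divided by bandwidth. The 1000 x 10 Gbps link count comes from the quoted post; the 5 ms link latency and the one-hour drive are illustrative assumptions, and the courier is modelled as delivering everything at once, ignoring the time to write and read the media.

```python
# Sketch: latency vs. total delivery time, under a simple linear model.
# All latency and drive-time figures are illustrative assumptions.

def transfer_time_s(size_bits: float, bandwidth_bps: float, latency_s: float) -> float:
    """Seconds until the last bit arrives: latency + size / bandwidth."""
    return latency_s + size_bits / bandwidth_bps


def break_even_bits(link_bps: float, link_latency_s: float, courier_latency_s: float) -> float:
    """Data size at which the link and the courier finish at the same time.

    The courier is modelled as pure latency (the whole load arrives together).
    """
    return (courier_latency_s - link_latency_s) * link_bps


PB_BITS = 8e15               # bits in one decimal petabyte
LINK_BPS = 1000 * 10e9       # a thousand parallel 10 Gbps links
LINK_LATENCY_S = 0.005       # assumed one-way latency of the link
DRIVE_S = 3600.0             # assumed one-hour drive between sites

one_pb_over_link_s = transfer_time_s(PB_BITS, LINK_BPS, LINK_LATENCY_S)
crossover_pb = break_even_bits(LINK_BPS, LINK_LATENCY_S, DRIVE_S) / PB_BITS

print(f"1 PB over the link: {one_pb_over_link_s / 60:.1f} minutes")
print(f"courier wins above: {crossover_pb:.1f} PB")
```

With these made-up numbers the link moves 1 PB in about 13 minutes, and the car only wins once the payload exceeds roughly 4.5 PB. The link's latency stays at 5 ms regardless of payload size, which is the point being made above.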
 

Offline tooki

  • Super Contributor
  • ***
  • Posts: 11471
  • Country: ch
Re: The BIG EEVblog Server Fire
« Reply #100 on: April 09, 2021, 05:23:34 am »
maybe it is time to move to other countries with fewer power interruptions






As someone who has lived in both North Carolina (the worst-ranked place in those graphs) and Switzerland (the best-ranked in those graphs), anecdotally, my experience completely agrees with that! (When I moved from USA to Switzerland the second time, I didn’t even bother buying a UPS, since the power here never goes out. Good enough for my home computing.)
 

Offline Nusa

  • Super Contributor
  • ***
  • Posts: 2416
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #101 on: April 09, 2021, 05:31:38 am »
Dave scares gnif.

Sounds normal.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: The BIG EEVblog Server Fire
« Reply #102 on: April 09, 2021, 05:44:04 am »
How many miles apart do your data centres have to be before it's quicker to send the data via wires than wheels?

With a stupid number of media devices (e.g. tape drives) at either end the answer is:

10km / 6 miles

That is about the limit for single-mode fibre running at 10G Ethernet....
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Re: The BIG EEVblog Server Fire
« Reply #103 on: April 09, 2021, 07:50:58 am »
There is no easy solution to server backup, because you never know what common vulnerabilities there are.

And how much do you want to pay? EEVBLOG probably doesn't make millions :) A fully redundant solution isn't cheap.

I run a few sites on virtual servers, with various backup policies (which I won't write about openly for obvious reasons) and if the virtual server company blew up and vanished for ever, I could start up a backup server which is a media PC running on an FTTP (80/30 Mbps) ADSL line :) That would actually be fast enough for EEVBLOG, on a bad day. Times have changed; this would not have been possible 10 years ago, and the bandwidth required for EEVBLOG is probably of the order of 100-500GB/month which is nothing. And the whole real server is rsynced (only changed files copied) to this media PC every night. There are other backups of course because you cannot mirror the whole server, due to many open files etc. I would merely need to manually edit the DNS panel which is hosted by a company different to the server company. So the worst case is losing a day's data. It is practically impossible to lose everything in this setup, and it is very cheap.


« Last Edit: April 09, 2021, 07:54:25 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ve7xen

  • Super Contributor
  • ***
  • Posts: 1192
  • Country: ca
    • VE7XEN Blog
Re: The BIG EEVblog Server Fire
« Reply #104 on: April 09, 2021, 07:57:18 am »
How many miles apart do your data centres have to be before it's quicker to send the data via wires than wheels?

With a stupid number of media devices (e.g. tape drives) at either end the answer is:

10km / 6 miles

That is about the limit for single-mode fibre running at 10G Ethernet....

100G at 80km is trivial with off the shelf gear these days, and I don't imagine many datacentre interconnects are only 10G.

Even so, the wheels win when it takes longer to transfer the data than it would to drive the media. At 100G you can transfer ~45TB/hr. That's only 4 or 5 hard drives or LTO-8 tapes so it's not hard for the car to win. If you fill a typical station wagon or small SUV (~2000L cargo area) with LTO-8 tapes (~275cm^3), you can fit about 7,500 tapes (x12TB native, no compression here) for 90PB. At 45TB/hr on your 100Gbps that would take 2000 hours during which you should be able to drive/sail anywhere on the planet. Of course it's usually much more practical to transfer it; copying the data to/from the media becomes a significant time sink itself, but that may or may not matter.

Fibre can only reasonably win this race when the data volume is relatively small, even when you start talking about 400G systems or multiple links, what you can transfer in an hour still fits in a suitcase.
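
For anyone who wants to rerun the numbers, here is a minimal sketch of the station wagon estimate. The 100 Gbps link, ~2000 L cargo area and ~275 cm^3 per cartridge are the figures from the post; 12 TB is the native (uncompressed) LTO-8 capacity. Perfect packing and decimal units are assumptions.

```python
# Sketch: station wagon full of LTO-8 tapes vs. a 100 Gbps link.
# Assumes perfect packing of cartridges and decimal units throughout.

LINK_GBPS = 100          # link speed, gigabits per second
CARGO_LITRES = 2000      # approximate cargo volume from the post
TAPE_CM3 = 275           # approximate cartridge volume from the post
TAPE_TB = 12             # LTO-8 native capacity per cartridge

# Gb/s -> GB/s (divide by 8) -> GB/h (x3600) -> TB/h (divide by 1000)
link_tb_per_hour = LINK_GBPS / 8 * 3600 / 1000

tapes = CARGO_LITRES * 1000 // TAPE_CM3      # 1 litre = 1000 cm^3
total_pb = tapes * TAPE_TB / 1000
link_hours = tapes * TAPE_TB / link_tb_per_hour

print(f"link rate:  {link_tb_per_hour:.0f} TB/h")
print(f"tapes:      {tapes}")
print(f"payload:    {total_pb:.1f} PB")
print(f"link time:  {link_hours:.0f} hours")
```

Without rounding, the tape count and payload come out slightly below the 7,500 tapes and 90 PB quoted, and the link time slightly below 2000 hours, so the rounded figures in the post hold up.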
« Last Edit: April 09, 2021, 07:58:55 am by ve7xen »
73 de VE7XEN
He/Him
 

Offline Syntax Error

  • Frequent Contributor
  • **
  • Posts: 584
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #105 on: April 09, 2021, 09:13:33 am »
@eevblog : Welcome back :)

The pictures Dave posted look like a random light industrial park of the sort of place my friend's machine shop is in, there's even a sports bar & grill in one of the units. Looks like the genset has to be in the same space, there isn't anywhere else to put it.

I agree. I was wondering how water ended up on the racks. And there I was thinking the backup genny was in a shipping container some 100 feet from the main building. Just in case it caught fire. Nar, that is extra rental space.

Maybe not odd in Utah, but certainly odd looking in the UK: the power control boxes, transformer and what I assume is the diesel fuelling point are not caged in collision-resistant fencing, or even behind a crash barrier. All it would take is a truck driver choking on a hotdog from the grill... and back to square one.

Moral of the story from both WebNX and OVH is, backup power systems are highly flammable! Just let the power go off and rebuild the filesystems, not the entire data center.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: The BIG EEVblog Server Fire
« Reply #106 on: April 09, 2021, 09:19:10 am »
Even so, the wheels win when it takes longer to transfer the data than it would to drive the media. At 100G you can transfer ~45TB/hr. That's only 4 or 5 hard drives or LTO-8 tapes so it's not hard for the car to win. If you fill a typical station wagon or small SUV (~2000L cargo area) with LTO-8 tapes (~275cm^3), you can fit about 7,500 tapes (x12TB native, no compression here) for 90PB. At 45TB/hr on your 100Gbps that would take 2000 hours during which you should be able to drive/sail anywhere on the planet. Of course it's usually much more practical to transfer it; copying the data to/from the media becomes a significant time sink itself, but that may or may not matter.

Fibre can only reasonably win this race when the data volume is relatively small, even when you start talking about 400G systems or multiple links, what you can transfer in an hour still fits in a suitcase.

LOL. Writing 7500 LTO-8 tapes and reading them back in under 2,000 hours.  :-DD

Sure, the channel bandwidth of an SUV is high, but you can't transmit or receive the data at anything like that rate.

Each LTO-8 tape can take 8 hours to write (source: https://en.wikipedia.org/wiki/Linear_Tape-Open) and I'm guessing about the same to read back, that is around 15 hours to move 30TB - about 2TB per hour per drive pair (about 5Gb/s), even just moving data across the room.

To fill an SUV with tapes using a single drive will take about 12.8 years, and maybe another 12.8 years to read it back.

I'll take the 2000 hours (85 days or so) using a 100Gb fibre...
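
The media-handling time can be sketched as below. The 7,500 tape count and the 8 and 15 hour figures are taken from the posts above; the helper names are hypothetical and the model ignores tape changes, verification passes and drive failures.

```python
import math

# Sketch: how long it takes to write (or read) a car-load of tapes.
# Simplified model: tapes processed back to back, no handling overhead.

HOURS_PER_YEAR = 24 * 365


def years_to_process(tapes: int, hours_per_tape: float, drives: int = 1) -> float:
    """Years needed for `drives` parallel drives to work through `tapes`."""
    return tapes * hours_per_tape / drives / HOURS_PER_YEAR


def drives_needed(tapes: int, hours_per_tape: float, deadline_hours: float) -> int:
    """Parallel drives required to finish within `deadline_hours`."""
    return math.ceil(tapes * hours_per_tape / deadline_hours)


TAPES = 7500

years_at_8h = years_to_process(TAPES, 8)      # 8 hours per tape
years_at_15h = years_to_process(TAPES, 15)    # 15 hours per tape
drives_for_2000h = drives_needed(TAPES, 8, 2000)

print(f"one drive, 8 h/tape:  {years_at_8h:.1f} years")
print(f"one drive, 15 h/tape: {years_at_15h:.1f} years")
print(f"drives to match 2000 h at 8 h/tape: {drives_for_2000h}")
```

In this model the 12.8-year figure corresponds to roughly 15 hours per tape; at 8 hours per tape a single drive would need a little under 7 years in each direction. Either way, matching the 2000-hour fibre transfer takes dozens of drives running in parallel at each end.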



Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: Someone

Offline Ultrapurple

  • Super Contributor
  • ***
  • Posts: 1027
  • Country: gb
  • Just zis guy, you know?
    • Therm-App Users on Flickr
Re: The BIG EEVblog Server Fire
« Reply #107 on: April 09, 2021, 09:37:37 am »
Whilst it's been fun discussing the merits and demerits of high-capacity microSD cards vs terabit fibre vs No 8 wire, I think we have lost sight of an important point.

Dave provides the world - us - with a fantastic meeting place to discuss our ideas, and he does it without any cost to us. I salute him, and also all those who work with him to make this wonderful place happen.

Thank you Dave.
« Last Edit: April 09, 2021, 09:43:29 am by Ultrapurple »
Rubber bands bridge the gap between WD40 and duct tape.
 
The following users thanked this post: gnif, hamster_nz, peter-h, SilverSolder, CatalinaWOW, RoGeorge, Damianos, Jacon, bd139, MIS42N

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #108 on: April 09, 2021, 11:47:45 am »
To fill an SUV with tapes using a single drive will take about 12.8 years, and maybe another 12.8 years to read it back.

I'll take the 2000 hours (85 days or so) using a 100Gb fibre...

Exactly! It's the same fallacy for moving a SAN with backups from one data center to another for restoring servers. The SAN has a limited read/write throughput, let's say 100 Gbps. So you would get a 100 Gbps link between both data centers to be able to backup with full throughput. The same is true for the other direction, i.e. restoring servers. Moving the SAN would cause a delay by the transport and would also add the risk of a traffic accident and possibly the complete loss of the backups. But it wouldn't speed up the restore process because its read/write throughput is still 100 Gbps.
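
A rough sketch of that argument: the restore runs at the speed of the slowest stage, so physically moving the SAN only adds the transport time. The 100 Gbps figures come from the example above; the 500 TB backup size and the 4-hour transport are made-up numbers for illustration.

```python
# Sketch: restoring over a link vs. trucking the SAN to the other site.
# The restore rate is limited by the slowest stage in the path.

def hours_to_move(data_tb: float, rate_gbps: float) -> float:
    """Hours to move `data_tb` terabytes at `rate_gbps` gigabits per second."""
    gigabits = data_tb * 8000            # 1 TB = 8000 Gb (decimal units)
    return gigabits / rate_gbps / 3600


def restore_over_link(data_tb: float, san_gbps: float, link_gbps: float) -> float:
    """Restore across the inter-site link; the bottleneck is the slower of the two."""
    return hours_to_move(data_tb, min(san_gbps, link_gbps))


def restore_after_trucking(data_tb: float, san_gbps: float, transport_hours: float) -> float:
    """Move the SAN first, then restore locally at the SAN's own throughput."""
    return transport_hours + hours_to_move(data_tb, san_gbps)


DATA_TB = 500            # assumed size of the backups
SAN_GBPS = 100           # SAN read/write throughput from the example
LINK_GBPS = 100          # inter-site link sized to match the SAN
TRANSPORT_HOURS = 4      # assumed time to ship the SAN

via_link = restore_over_link(DATA_TB, SAN_GBPS, LINK_GBPS)
via_truck = restore_after_trucking(DATA_TB, SAN_GBPS, TRANSPORT_HOURS)

print(f"restore over link:     {via_link:.1f} h")
print(f"restore after trucking: {via_truck:.1f} h")
```

As long as the link is at least as fast as the SAN, the trucked restore is slower by exactly the transport time. Trucking only starts to pay off when the link is considerably slower than the storage.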
 

Online Ian.M

  • Super Contributor
  • ***
  • Posts: 12852
Re: The BIG EEVblog Server Fire
« Reply #109 on: April 09, 2021, 12:01:34 pm »
That's not what Amazon claim for their AWS Snowmobile 100PB data transfer storage in a 45' shipping container.  They claim up to 1 Tb/s aggregated over multiple 40Gb/s interfaces.  See their FAQ for details: https://aws.amazon.com/snowmobile/faqs/
 
The following users thanked this post: bd139

Offline Algoma

  • Frequent Contributor
  • **
  • Posts: 291
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #110 on: April 09, 2021, 12:14:21 pm »
https://en.m.wikipedia.org/wiki/Terabit_Ethernet

112Gb/s SerDes in a single channel.. I can only imagine the rate that signal is being switched on and off, let alone trying to sample such a frequency.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #111 on: April 09, 2021, 12:23:55 pm »
That's not what Amazon claim for their AWS Snowmobile 100PB data transfer storage in a 45' shipping container.  They claim up to 1 Tb/s aggregated over multiple 40Gb/s interfaces.  See their FAQ for details: https://aws.amazon.com/snowmobile/faqs/

Sorry, but I don't get your point. :-//
 

Offline SilverSolder

  • Super Contributor
  • ***
  • Posts: 6126
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #112 on: April 09, 2021, 12:29:59 pm »
[...]  we had to patch on an old stupid design with the oldest technology. [...]


Some old technology was built really well, and has stood the test of time and proven reliable and inexpensive.  (Incandescent light bulbs?)

The problem is how to identify which of today's technologies will gain a good reputation and respect over the next several decades!
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23018
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #113 on: April 09, 2021, 12:52:10 pm »
The problem is how to identify which of today's technologies will gain a good reputation and respect over the next several decades!

That's fairly easy. Look out for stuff at conferences, then avoid the shit out of it. You want the stale unsexy things everyone takes for granted.

My favourite tools this week are SQLite and Python. Both ancient in the scale of things :)

100G at 80km is trivial with off the shelf gear these days, and I don't imagine many datacentre interconnects are only 10G.

Even so, the wheels win when it takes longer to transfer the data than it would to drive the media. At 100G you can transfer ~45TB/hr. That's only 4 or 5 hard drives or LTO-8 tapes so it's not hard for the car to win. If you fill a typical station wagon or small SUV (~2000L cargo area) with LTO-8 tapes (~275cm^3), you can fit about 7,500 tapes (x12TB native, no compression here) for 90PB. At 45TB/hr on your 100Gbps that would take 2000 hours during which you should be able to drive/sail anywhere on the planet. Of course it's usually much more practical to transfer it; copying the data to/from the media becomes a significant time sink itself, but that may or may not matter.

Fibre can only reasonably win this race when the data volume is relatively small, even when you start talking about 400G systems or multiple links, what you can transfer in an hour still fits in a suitcase.

Actually a surprising chunk of peering is only at 10G per link. The ISP I use has only got 320G aggregate across all links which isn't a lot in the scale of things and it only averages 100-200G. That has tens of thousands of leechers on it.

As for transit, as mentioned AWS snowball/snowmobile type solutions are best for moving stuff around in large chunks. You can get 100G into one of them without having to dig up any roads.

But better to boil the frog slowly. When I migrated 130TB over to S3 a couple of years back, we built a service abstraction over the SAN and S3 so it used S3 as read-write-through cache for the SAN. This allowed us to sling all the stuff up over a dedicated DirectConnect up to S3 over the space of a few months without introducing any link capacity problems or having to do any nasty switch overs.
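
One way to picture that kind of gradual migration is sketched below. This is only an illustration of the general read-through/write-through idea, not the actual service described above: the class and method names are hypothetical, and plain dicts stand in for the SAN and the S3 bucket.

```python
# Sketch of a facade that migrates objects lazily from an old store to a new one.
# Hypothetical names; dicts stand in for the SAN (legacy) and S3 (cloud).

class MigratingStore:
    def __init__(self, legacy: dict, cloud: dict):
        self.legacy = legacy
        self.cloud = cloud

    def write(self, key: str, data: bytes) -> None:
        """Write-through: every new write lands in both stores."""
        self.legacy[key] = data
        self.cloud[key] = data

    def read(self, key: str) -> bytes:
        """Read-through: serve from the cloud copy, else copy it up on first access."""
        if key in self.cloud:
            return self.cloud[key]
        data = self.legacy[key]      # raises KeyError if the object exists nowhere
        self.cloud[key] = data
        return data

    def migrated_fraction(self) -> float:
        """Share of legacy objects that already have a cloud copy."""
        if not self.legacy:
            return 1.0
        done = sum(1 for key in self.legacy if key in self.cloud)
        return done / len(self.legacy)


san = {"a": b"1", "b": b"2"}
s3 = {}
store = MigratingStore(san, s3)

store.read("a")              # first read copies "a" up to the cloud store
store.write("c", b"3")       # new data goes to both stores
print(sorted(s3), store.migrated_fraction())
```

Objects that are never read would still need a background copy job to finish the move, which a real migration would presumably throttle to fit the link.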
 
The following users thanked this post: SilverSolder

Offline schmitt trigger

  • Super Contributor
  • ***
  • Posts: 2218
  • Country: mx
Re: The BIG EEVblog Server Fire
« Reply #114 on: April 09, 2021, 01:13:03 pm »
This incident clearly indicates how dependent we have become on technology in general, and the WWW in particular.

As such I am sure, well almost sure, that Dave will release a video of the incident once the investigation has been completed.
It will be an extremely interesting episode.
 

Offline BrokenYugo

  • Super Contributor
  • ***
  • Posts: 1100
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #115 on: April 09, 2021, 03:52:03 pm »
A backup generator's keyword is 'backup', isn't it? Some robustness is supposed to be embedded in it from the get-go, starting from the specs.

On the other hand outside of some motorsports applications, where spectacular failure is just part of the game, I can't think of harder duty for an IC engine than standby generator service. It's always either sitting around or put on a heavy load from a cold start.

I once met a guy who rebuilt big diesel engines, and asked what most of them came from; the answer was mostly generators. I'm not sure how much was mandatory maintenance and how much was failures, but either way they live quite a hard life.
 
The following users thanked this post: tooki

Offline jmelson

  • Super Contributor
  • ***
  • Posts: 2765
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #116 on: April 09, 2021, 04:13:55 pm »
maybe it is time to move to other countries with fewer power interruptions
I'm in Missouri, current uptime is 177 days, no UPS.  I run my web server out of my house.

Jon
 

Offline jmelson

  • Super Contributor
  • ***
  • Posts: 2765
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #117 on: April 09, 2021, 04:18:08 pm »
Whilst it's been fun discussing the merits and demerits of high-capacity microSD cards vs terabit fibre vs No 8 wire, I think we have lost sight of an important point.

Dave provides the world - us - with a fantastic meeting place to discuss our ideas, and he does it without any cost to us. I salute him, and also all those who work with him to make this wonderful place happen.

Thank you Dave.
Yes, INDEED!  Thanks, Dave, and all the hard workers at webNX and GorillaServers, and glad to see eevblog back up!

Jon
 
The following users thanked this post: gnif, Ian.M

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #118 on: April 09, 2021, 06:12:50 pm »
As someone who has lived in both North Carolina (the worst-ranked place in those graphs) and Switzerland (the best-ranked in those graphs), anecdotally, my experience completely agrees with that! (When I moved from USA to Switzerland the second time, I didn’t even bother buying a UPS, since the power here never goes out. Good enough for my home computing.)

I suspect averages like that are misleading, it really depends on where you live. In my location for example the downtown urban areas may go years without a single power interruption while the outlying rural areas may lose power several times a month over the winter and the average of two extremes just a few miles apart is not a very useful number. If you were to look at my whole state, power outages are probably common, however at my house they have generally been rare, this last winter being an unusual exception where I had 3 significant outages. Downtown Seattle which is only about 12 miles from here I have never seen a power outage other than a very localized one due to something like a transformer failure that knocks out a specific building.
 

Offline ve7xen

  • Super Contributor
  • ***
  • Posts: 1192
  • Country: ca
    • VE7XEN Blog
Re: The BIG EEVblog Server Fire
« Reply #119 on: April 09, 2021, 07:14:52 pm »
To fill a SUV with tapes using a single drive will take about 12.8 years, and maybe another 12.8 years to read it back.

I'll take the 2000 hours (85 days or so) using a 100Gb fibre...

LOL yeah, of course, it's completely impractical, but really it's the same equation regardless of media, the practical limit is going to be how quickly you can read/write the media. You can swap the LTOs with hard drives (though probably can't take as many due to weight) if getting the data off the media faster is important, maybe you just plug them into a chassis on the other end that can use them directly, or even truck the entire chassis of storage in whatever form it's used in production. If you can provision the practical storage bandwidth on the fibre, then it's a wash, otherwise shipping it is going to win (though cost will likely push you to shipping way before you hit the practical limit). The bandwidth of the proverbial station wagon is still immense, even taking practical considerations into account.
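A rough back-of-envelope sketch of that equation. The payload size and the per-drive rate are assumptions, back-calculated so that the results land near the "2000 hours" and "12.8 years" figures quoted above; they are not media specs, and the helper names are made up for illustration:

```python
SECONDS_PER_HOUR = 3600.0
HOURS_PER_YEAR = 24 * 365.25


def link_hours(payload_bytes: float, link_bits_per_s: float) -> float:
    """Hours to push the payload over a link at full line rate."""
    return payload_bytes * 8 / link_bits_per_s / SECONDS_PER_HOUR


def shipping_hours(payload_bytes, drive_bytes_per_s, drives, drive_time_hours):
    """Hours to write the media, truck it over, and read it back.

    Writing and reading are each spread evenly across `drives` units.
    """
    one_pass = payload_bytes / (drive_bytes_per_s * drives) / SECONDS_PER_HOUR
    return 2 * one_pass + drive_time_hours


PAYLOAD = 90e15      # assumed: ~90 PB, i.e. 2000 hours at 100 Gb/s
DRIVE_RATE = 223e6   # assumed: bytes/s per drive, picked to give ~12.8 years

fibre = link_hours(PAYLOAD, 100e9)
single = shipping_hours(PAYLOAD, DRIVE_RATE, drives=1, drive_time_hours=0)
many = shipping_hours(PAYLOAD, DRIVE_RATE, drives=100, drive_time_hours=10)

print(f"100G fibre:          {fibre:8.0f} h")
print(f"1 drive, each way:   {single / 2 / HOURS_PER_YEAR:8.1f} years")
print(f"100 drives, 10h van: {many:8.0f} h")
```

The time in the van barely matters; what dominates is how many drives you can run in parallel at each end, which is the "practical limit" point above.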

Quote
Actually a surprising chunk of peering is only at 10G per link. The ISP I use has only got 320G aggregate across all links which isn't a lot in the scale of things and it only averages 100-200G. That has tens of thousands of leechers on it.

We're not talking about Internet, we're talking about private datacentre interconnect. You're certainly not going to run your PB storage migration over a 10Gb peering link and the Internet. I can't imagine any service provider that would be running lots of uncoloured 10G over dark fibre these days, it's too costly; for your 10G peerings it's either a leased wavelength or more usually just a patch cable within the data centre between cheap ports because you don't need more to that peer (with several peers / transit at the POP, and likely Nx100G to your core). As an SP, you're either leasing a wave on someone else's OTN or running your own WDM system that can carry at least 100Gb per pair ('cheap' bog standard systems do 40x10G, state of the art off the shelf systems do 12x100G or more). Of course there are small SPs that only ever lease 10G or even 1G waves/EPLs, but I thought we were discussing the capacity of the fibre not what a small business actually buys on it ;).

Quote
But better to boil the frog slowly. When I migrated 130TB over to S3 a couple of years back, we built a service abstraction over the SAN and S3 so it used S3 as read-write-through cache for the SAN. This allowed us to sling all the stuff up over a dedicated DirectConnect up to S3 over the space of a few months without introducing any link capacity problems or having to do any nasty switch overs.

Clever solution, I like it!

Quote
Whilst it's been fun discussing the merits and demerits of high-capacity microSD cards vs terabit fibre vs No 8 wire, I think we have lost sight of an important point.

Dave provides the world - us - with a fantastic meeting place to discuss our ideas, and he does it without any cost to us. I salute him, and also all those who work with him to make this wonderful place happen.

Thank you Dave.

+1000! I actually had to get work done this week  :-DD
« Last Edit: April 09, 2021, 07:17:31 pm by ve7xen »
73 de VE7XEN
He/Him
 
The following users thanked this post: bd139

Offline SilverSolder

  • Super Contributor
  • ***
  • Posts: 6126
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #120 on: April 09, 2021, 09:00:43 pm »
As someone who has lived in both North Carolina (the worst-ranked place in those graphs) and Switzerland (the best-ranked in those graphs), anecdotally, my experience completely agrees with that! (When I moved from USA to Switzerland the second time, I didn’t even bother buying a UPS, since the power here never goes out. Good enough for my home computing.)

I suspect averages like that are misleading, it really depends on where you live. In my location for example the downtown urban areas may go years without a single power interruption while the outlying rural areas may lose power several times a month over the winter and the average of two extremes just a few miles apart is not a very useful number. If you were to look at my whole state, power outages are probably common, however at my house they have generally been rare, this last winter being an unusual exception where I had 3 significant outages. Downtown Seattle which is only about 12 miles from here I have never seen a power outage other than a very localized one due to something like a transformer failure that knocks out a specific building.

The main issue in North America is the "pioneering spirit" electrical system where wires are strung up among the trees in rural / suburban areas...   what could possibly go wrong?  :D
 

Offline Red Squirrel

  • Super Contributor
  • ***
  • Posts: 2750
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #121 on: April 09, 2021, 09:33:34 pm »
Quite a crazy turn of events.  Comes to show though that even professional commercial setups can and do fail.  Always have to be prepared and have offsite backups.  In this case they were not needed but things could have been worse.

 

Offline eti

  • Super Contributor
  • ***
  • !
  • Posts: 1801
  • Country: gb
  • MOD: a.k.a Unlokia, glossywhite, iamwhoiam etc
Re: The BIG EEVblog Server Fire
« Reply #122 on: April 09, 2021, 11:28:45 pm »
Wow, EEVblog has a history of attracting water!
 

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #123 on: April 10, 2021, 01:13:17 am »
Wow, EEVblog has a history of attracting water!

It's from all the dissing of those poor, harmless God botherers.

Having a gripe about the Hillsong peeps in the solar upgrade vid during their Easter festivus..well karma is a bitch.

Thou shalt be baptised.

Praise be. And God bless you, Dave.
iratus parum formica
 
The following users thanked this post: Ian.M

Offline Microdoser

  • Frequent Contributor
  • **
  • Posts: 423
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #124 on: April 10, 2021, 02:40:23 am »
Whilst it's been fun discussing the merits and demerits of high-capacity microSD cards vs terabit fibre vs No 8 wire

You don't seem to have been here long, every thread descends into discussion of that type.

If you are trying to change that behaviour, you will be as effective as King Canute.
 

Offline jh15

  • Frequent Contributor
  • **
  • Posts: 561
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #125 on: April 10, 2021, 03:44:15 am »
Just put the servers in rice.

A few days before this, I heard to put something in rice on the NCIS tv show, a rerun. Now I know where most people got it from.

NO!

Best part is about 13 mins in.


Tek 575 curve trcr top shape, Tek 535, Tek 465. Tek 545 Hickok clone, Tesla Model S,  Ohio Scientific c24P SBC, c-64's from club days, Giant electric bicycle, Rigol stuff, Heathkit AR-15's. Heathkit ET- 3400a trainer&interface. Starlink pizza.
 
The following users thanked this post: SilverSolder, bd139

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37728
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #126 on: April 10, 2021, 12:00:20 pm »
@eevblog : Welcome back :)

The pictures Dave posted look like a random light industrial park of the sort of place my friend's machine shop is in, there's even a sports bar & grill in one of the units. Looks like the genset has to be in the same space, there isn't anywhere else to put it.

I agree. I was wondering how water ended up on the racks. And there I was thinking the backup genny was in a shipping container some 100 feet from the main building. Just in case it caught fire. Nar, that is extra rental space.

Maybe not odd in Utah, but certainly odd looking in the UK: the power control boxes, transformer and what I assume is the diesel fuelling point are not caged in collision-resistant fencing, or even a crash barrier. All it would take is a truck driver choking on a hotdog from the grill... and back to square one.

Moral of the story from both WebNX and OVH: backup power systems are highly flammable! Just let the power go off and rebuild the filesystems, not the entire data center.

I can confirm that the generator was only meters away from some of the racks in the same room. And that no servers were damaged due to the fire, it was all water.
 
The following users thanked this post: Syntax Error

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #127 on: April 10, 2021, 12:12:50 pm »
Diesel generator and racks in the same room? That's insane for many reasons!
 
The following users thanked this post: tooki

Offline SilverSolder

  • Super Contributor
  • ***
  • Posts: 6126
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #128 on: April 10, 2021, 12:54:48 pm »
Diesel generator and racks in the same room? That's insane for many reasons!

It does sound a bit how'ya doin'...
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23018
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #129 on: April 10, 2021, 01:01:00 pm »
I think that transcends that and into “what the hell were they thinking?” territory....

Suitable ORA book attached  :-DD
« Last Edit: April 10, 2021, 01:03:42 pm by bd139 »
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19463
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: The BIG EEVblog Server Fire
« Reply #130 on: April 10, 2021, 01:23:00 pm »
The main issue in North America is the "pioneering spirit" electrical system where wires are strung up among the trees in rural / suburban areas...   what could possibly go wrong?  :D

So, completely the opposite of this example, 4 miles from the centre of a major UK city

There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 
The following users thanked this post: bd139

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #131 on: April 10, 2021, 01:53:25 pm »
The main issue in North America is the "pioneering spirit" electrical system where wires are strung up among the trees in rural / suburban areas...   what could possibly go wrong?  :D

So, completely the opposite of this example, 4 miles from the centre of a major UK city


We are world famous here for our y-shaped gum trees on the side of the road.

https://www.smh.com.au/national/nsw/ausgrid-accused-of-street-tree-vandalism-by-sydney-councils-20150807-giugg9.html

« Last Edit: April 10, 2021, 01:57:58 pm by Ed.Kloonk »
iratus parum formica
 

Offline calzap

  • Frequent Contributor
  • **
  • Posts: 448
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #132 on: April 10, 2021, 02:46:59 pm »
I’m curious as to how many data center standby generators are powered by diesel versus propane.  In the U.S., diesel is usually more expensive per kWh produced than propane, especially because EPA requires such generators to use ultralow sulfur diesel fuel.  Diesel has shelf-life issues which require ongoing testing and maintenance, and replacement in some cases.  Propane has an infinite shelf-life.  Most building codes in the U.S. do not allow liquid propane inside buildings.  So, if there is no outside storage area, propane can’t be used.

I know of a biological lab near me that became so fed-up with diesel fuel issues with their generators that they switched to propane and have been happy with it.

Nice link:
https://www.csemag.com/articles/understanding-backup-power-system-fuel-choices/

At my ranch, we have a small (8 kW continuous) backup generator.  It’s just big enough to run one well plus a few refrigerators/freezers and a few low wattage items.  Forced to run it 4-6 times per year.  Longest run has been 2 days.  It’s powered by gasoline, and there have been no fuel issues (we use a preservative).  It’s in a “dog house” attached to the smallest building on the property.  Only real worry is that building is where electrical panels for incoming power are located.  In retrospect, I could have built the dog house 10 m away from any building and should have.  We have propane available, and dual-fuel installation is on my to-do list … has been for 15 years!

Mike in California
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 16604
  • Country: us
  • DavidH
Re: The BIG EEVblog Server Fire
« Reply #133 on: April 10, 2021, 03:24:27 pm »
I’m curious as to how many data center standby generators are powered by diesel versus propane.  In the U.S., diesel is usually more expensive per kWh produced than propane, especially because EPA requires such generators to use ultralow sulfur diesel fuel.  Diesel has shelf-life issues which require ongoing testing and maintenance, and replacement in some cases.  Propane has an infinite shelf-life.  Most building codes in the U.S. do not allow liquid propane inside buildings.  So, if there is no outside storage area, propane can’t be used.

Diesel also has the disadvantage of requiring special containment provisions because it makes a hell of a mess if it spills, which also brings the EPA into it.  For this reason and the others you mention, sites which are remote tend toward propane now.

For tall buildings, diesel may not be stored at the upper levels so provisions must be made to pump it from tanks near the ground.

In some cases requirements are so strict that backup generators of any type are precluded and the only solution is sufficient battery power.
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 16604
  • Country: us
  • DavidH
Re: The BIG EEVblog Server Fire
« Reply #134 on: April 10, 2021, 03:47:00 pm »
Also the control chassis is a mess with f*ing LM723 being abused as OPAMP, and no chip is decoupled properly.

I do not really consider that an abuse, except of course for the lack of proper decoupling.  723s do not make very good operational amplifiers but most applications do not require good operational amplifiers; consider all of the older regulator designs which only use discrete differential pairs for the voltage and current control loops.  723s also have several virtues including a built in reference and provisions for a high current output.

Quote
In short, the old design uses old parts which are no longer made. So maintenance had to look for alternative parts, and they found an Indian company making supposedly identical old parts for special customers. It turns out the Indian company made better parts, faster and stronger. Different process and die, same paper spec.

That is a common problem even when the same manufacturer changes processes.  Unspecified characteristics are unspecified and cannot be relied on.  Either test for them or design to handle them.

Quote
If the NRC had not been dissatisfied and contracted a friend of mine to fix it, who in turn contracted me for a small portion of the project, I wouldn't have believed how flaky something used as a last line of defense against a nuclear disaster can be.

...

And no, the NRC will not take a new control board or new 21st century parts. They insist that all digital control parts must be nuclear certified, and all critical power parts too. The only leeway we had was non-power analog parts; in other words, we had to patch an old stupid design with the oldest technology.

Non-power analog parts are actually more susceptible to radiation damage than power analog parts, but as long as the NRC follows the rules no matter how stupid, they are safe.

It is stupid but I am not surprised.  The "safest" option when authority is divorced from responsibility is to do nothing, and make sure the blame will fall on someone else, which is why having nothing to do with that sort of project is the best thing to do.  I have learned not to even notify them; they are not interested and doing so can create further jeopardy.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #135 on: April 10, 2021, 03:47:52 pm »
Diesel also has the disadvantage of requiring special containment provisions because it makes a hell of a mess if it spills, which also brings the EPA into it.  For this reason and the others you mention, sites which are remote tend toward propane now.

So propane doesn't need any special containment provisions? >:D Diesel is easy to handle, to refill and to get hold of. Anyhow, you simply use the fuel which is easily available and allowed by local regulations. If you go for a battery system only you need deep pockets, because it has to provide several MW for a few days.
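To put a rough number on "deep pockets": stored energy is just load times ride-through time. A tiny sketch, where the 3 MW load and the 72 hours are placeholder assumptions for "several MW" and "a few days", not figures for any real site:

```python
def battery_energy_mwh(load_mw: float, hours: float) -> float:
    """Energy the battery bank must deliver: power x time.

    Ignores inverter losses and depth-of-discharge limits, both of
    which would push the installed capacity higher.
    """
    return load_mw * hours


LOAD_MW = 3.0        # assumed stand-in for "several MW"
RIDE_THROUGH_H = 72  # assumed stand-in for "a few days"

needed = battery_energy_mwh(LOAD_MW, RIDE_THROUGH_H)
print(f"{needed:.0f} MWh of usable storage")
```

Even before losses that is hundreds of MWh, which is why a generator plus a short-duration UPS is the usual compromise.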
« Last Edit: April 10, 2021, 03:50:57 pm by madires »
 

Offline Tom45

  • Frequent Contributor
  • **
  • Posts: 556
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #136 on: April 10, 2021, 04:02:04 pm »
At a place I worked 50 years ago we had backup power for two large mainframes. The mainframes were fed by motor generator sets so switchover timing wasn't too critical. The top floor of the building was filled with lead acid batteries to hold over until the diesel generator got going.

The generator was buried in a concrete pit under the parking lot with a diesel tank above ground.

The whole thing worked well until one stormy period when power was so flaky they decided to just run continuously off the generator until the weather calmed down. No problems for a couple of days until someone forgot to order more diesel. Oops.

No, it wasn't me.
 

Offline calzap

  • Frequent Contributor
  • **
  • Posts: 448
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #137 on: April 10, 2021, 04:19:34 pm »
Diesel also has the disadvantage of requiring special containment provisions because it makes a hell of a mess if it spills, which also brings the EPA into it.  For this reason and the others you mention, sites which are remote tend toward propane now.

So propane doesn't need any special containment provisions? >:D Diesel is easy to handle, to refill and to get hold of. Anyhow, you simply use the fuel which is easily available and allowed by local regulations. If you go for a battery system only you need deep pockets, because it has to provide several MW for a few days.
Any fuel stored onsite obviously must have a primary container.  In the U.S., propane tanks containing liquid propane must be outdoors.  There are usually building setback requirements as well.  So, if there is a leak, it wafts away as a gas.  Large diesel tanks in most jurisdictions must have secondary containment in case of a leak.  If indoors, there are usually fire sprinkler or other suppression and protection requirements as well.

Mike in California
 

Online David Hess

  • Super Contributor
  • ***
  • Posts: 16604
  • Country: us
  • DavidH
Re: The BIG EEVblog Server Fire
« Reply #138 on: April 10, 2021, 04:26:34 pm »
Diesel also has the disadvantage of requiring special containment provisions because it makes a hell of a mess if it spills, which also brings the EPA into it.  For this reason and the others you mention, sites which are remote tend toward propane now.

So propane doesn't need any special containment provisions? >:D Diesel is easy to handle, to refill and to get hold of. Anyhow, you simply use the fuel which is easily available and allowed by local regulations. If you go for a battery system only you need deep pockets, because it has to provide several MW for a few days.

Propane requires a pressure tank but is actually *safer* if there is a leak or fire.  A diesel leak makes a hell of a mess which is where the EPA gets involved.  A propane leak leaves nothing to clean up.

Propane tanks handle fire just fine.  When the pressure relief valve activates, the propane exhaust is burned and evaporation cools the tank until the propane is exhausted.  Just make sure that the exhaust is directed in a safe direction.  I know of one case at a mountaintop repeater site where the exhaust was directed at the blockhouse.  When personnel showed up to find out why all of the repeaters had failed after a brush fire, they found that the blockhouse was completely destroyed (melted) by the jet of burning propane.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #139 on: April 11, 2021, 07:46:55 am »
I’m curious as to how many data center standby generators are powered by diesel versus propane.  In the U.S., diesel is usually more expensive per kWh produced than propane, especially because EPA requires such generators to use ultralow sulfur diesel fuel.  Diesel has shelf-life issues which require ongoing testing and maintenance, and replacement in some cases.  Propane has an infinite shelf-life.  Most building codes in the U.S. do not allow liquid propane inside buildings.  So, if there is no outside storage area, propane can’t be used.

I've seen natural gas fired backup generators which have the obvious advantage of fuel being piped in rather than stored on the premises. Propane, that I have not seen other than for small portable generators and some that are used in RVs where you already have propane available. I think diesel pretty much owns the large backup generator market, the engines are the same as used for things like semi trucks, motor yachts and locomotives. I don't think anybody is making huge spark ignition engines anymore although there have been some really big ones in the past.
 

Offline SilverSolder

  • Super Contributor
  • ***
  • Posts: 6126
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #140 on: April 11, 2021, 11:00:21 am »

The cost of diesel probably doesn't actually matter, since the generators are not used that often.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #141 on: April 11, 2021, 11:27:21 am »
Propane requires a pressure tank but is actually *safer* if there is a leak or fire.  A diesel leak makes a hell of a mess which is where the EPA gets involved.  A propane leak leaves nothing to clean up.

I'd think that propane is more prone to a nice BOOM than diesel. BTW, it's also heavier than air. And don't worry too much about the storage of diesel. Storage regulations for oil and oil-based fuels take care of that.
 

Offline TimFox

  • Super Contributor
  • ***
  • Posts: 7938
  • Country: us
  • Retired, now restoring antique test equipment
Re: The BIG EEVblog Server Fire
« Reply #142 on: April 11, 2021, 02:28:48 pm »
In the recent Texas cold-weather catastrophe, the natural gas delivery failed due to lack of power at the wellhead compressors and frozen distribution pipes.
 

Offline Ultrapurple

  • Super Contributor
  • ***
  • Posts: 1027
  • Country: gb
  • Just zis guy, you know?
    • Therm-App Users on Flickr
Re: The BIG EEVblog Server Fire
« Reply #143 on: April 11, 2021, 03:01:41 pm »

You don't seem to have been here long

I invite you to check my profile info.
Rubber bands bridge the gap between WD40 and duct tape.
 

Online bdunham7

  • Super Contributor
  • ***
  • Posts: 7818
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #144 on: April 11, 2021, 03:06:25 pm »
I don't think anybody is making huge spark ignition engines anymore although there have been some really big ones in the past.

Waukesha Engine is still around, under different ownership.

https://www.innio.com/en/products/waukesha
A 3.5 digit 4.5 digit 5 digit 5.5 digit 6.5 digit 7.5 digit DMM is good enough for most people.
 

Offline calzap

  • Frequent Contributor
  • **
  • Posts: 448
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #145 on: April 12, 2021, 01:22:55 am »
I've seen natural gas fired backup generators which have the obvious advantage of fuel being piped in rather than stored on the premises. Propane, that I have not seen other than for small portable generators and some that are used in RVs where you already have propane available. I think diesel pretty much owns the large backup generator market, the engines are the same as used for things like semi trucks, motor yachts and locomotives. I don't think anybody is making huge spark ignition engines anymore although there have been some really big ones in the past.
I agree diesel-powered industrial generators are probably the most common now.  However, I think that will change in favor of natural gas and propane where they are readily available.  Energy costs, emissions limits and fuel stability will bring it about.

Natural gas has the advantage of not having to store it on site.  In fact, storing it on site, as CNG or LNG is an expensive proposition.  However, as recent events in Texas have shown, natural disasters can stop the flow of natural gas.  That low temperatures did it is primarily a reflection of inappropriate penny-pinching in engineering and building pumping stations.  Even with appropriate design and installation, natural disasters can interrupt flow … like in an earthquake area where I live.

Large industrial standby generators powered by propane or natural gas are already available.  For example, Generac sells 150 kW propane generators and 500 kW dual fuel (natural gas/ diesel).  And no, separate engines aren’t required for natural gas/diesel gensets.  They are diesel engines modified to aspirate an air/gas mixture, which provides most of the energy.  Small diesel injections provide ignition, but the engines can run on diesel alone if necessary.  Generac’s largest natural gas generator is 1 MW and powered by a 12-cylinder, 49 L spark-ignited engine.

Mike in California
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 28323
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #146 on: April 12, 2021, 01:34:40 am »
Gas powered gensets are nothing special or new. Not far from me is a bank of 1MW gensets running on methane from a landfill.
https://www.terracat.co.nz/power-systems/new-power-systems/epg/gas-generator-sets
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 
The following users thanked this post: james_s

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #147 on: April 12, 2021, 01:53:26 am »
I'd think that propane is more prone to a nice BOOM than diesel. BTW, it's also heavier than air. And don't worry too much about the storage of diesel. Storage regulations for oil and oil-based fuels take care of that.

That's definitely true. You can throw a lighted match into a bucket of diesel fuel and it will go out. If you do manage to light the stuff, it burns pretty lethargically, it's similar to kerosene or salad oil as far as flammability. Propane on the other hand can be dangerous stuff. Unlike many fuels, it doesn't need a stoichiometric mixture to burn explosively, indeed a small engine will run pretty well if you just poke the end of a non-lit propane torch into the air intake, it's not like gasoline where the mixture has to be just right, propane will still go bang in very rich or very lean conditions.
 
The following users thanked this post: tooki

Offline floobydust

  • Super Contributor
  • ***
  • Posts: 6958
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #148 on: April 12, 2021, 02:15:51 am »
I looked at google street view and only one exhaust pipe for a generator, near the electrical room.
The facility appears to be in some old warehouse (military?) district with brick exterior walls and a wooden roof? If true that's a problem.

Had to laugh, not a solar panel in sight.
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 9007
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: The BIG EEVblog Server Fire
« Reply #149 on: April 12, 2021, 03:04:00 am »
How much can the backup power infrastructure be cut back if there's a system to force all CPUs to minimum frequency when running on backup power?
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 
The following users thanked this post: duckduck

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #150 on: April 12, 2021, 04:03:50 am »
I looked at google street view and only one exhaust pipe for a generator, near the electrical room.
The facility appears to be in some old warehouse (military?) district with brick exterior walls and a wooden roof? If true that's a problem.

Had to laugh, not a solar panel in sight.

Buildings like that are all over the place in light industrial areas. As I mentioned my friends have a machine shop in a similar building, they've had all sorts of different neighbors and they've moved a couple of times too. Auto mechanic, sign company, importer warehouse, cabinet maker, in their current spot the place next door sells and services air compressors, that kind of stuff. In most of these industrial parks the business rents one or more bays and outfits the interior as needed. Solar probably isn't really an option in most of those places, they don't own the building and the landlord doesn't care about the power bill, they aren't the one paying it. They don't want a bunch of holes drilled in their roof.
 

Offline schmitt trigger

  • Super Contributor
  • ***
  • Posts: 2218
  • Country: mx
Re: The BIG EEVblog Server Fire
« Reply #151 on: April 12, 2021, 03:05:16 pm »
Very interesting thread!

As I mentioned previously, hopefully Dave will do a video(s) regarding the subtle details of ensuring AC mains uptime.

With everything nowadays tied to the web, this issue has become more critical than ever.

Back in the mainframe days, I remember a motor-generator set where the blackout ride-through energy was stored in a gigantic flywheel.
To my surprise, they are still being used.
Google has plenty of examples.
.
 

Offline duckduck

  • Frequent Contributor
  • **
  • Posts: 408
  • Country: us
  • 20Hz < fun < 20kHz, and RF is Really Fun
Re: The BIG EEVblog Server Fire
« Reply #152 on: April 12, 2021, 06:41:00 pm »
How much can the backup power infrastructure be cut back if there's a system to force all CPUs to minimum frequency when running on backup power?

That's a great idea. One issue I can see for a hosting company is that they allow their customers to manage/reinstall the OS and apps. It would be difficult to enforce the installation of power-management software. It would be great if servers had a (let's say) 5 volt input, and when it dropped to below 1 volt, the BIOS would throttle the CPU down. Then you "just" run a 5 volt line off of non-UPS, non-generator mains to each server and you're golden.
 
The following users thanked this post: SilverSolder

Offline schmitt trigger

  • Super Contributor
  • ***
  • Posts: 2218
  • Country: mx
Re: The BIG EEVblog Server Fire
« Reply #153 on: April 12, 2021, 07:02:11 pm »
In the rotary UPS I mentioned above, it was explained to me that during the elapsed time between mains interruption and the diesel genset actually supplying power, certain non-critical processes were halted. Like printers and punched card readers.
There may have been others.
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23018
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #154 on: April 12, 2021, 07:10:03 pm »
How much can the backup power infrastructure be cut back if there's a system to force all CPUs to minimum frequency when running on backup power?

That probably wouldn't work. If you're running near your CPU provision, you can enter a thing called "load hysteresis" which may be irrecoverable. This is where your load average goes above the total capacity and the CPUs can never catch up with the workload.  It requires adding much, much more capacity than you had to start with before you can handle the demand you had originally. Either that or breaking a huge chunk of your incoming load to recover.

This is similar to halving the size of your cluster during peak demand, which never works out well. I was working for a very large stats company here a few years back and there was a sudden spike of packet loss over the inter-DC link they were running. The ops director at the time decided to fail the active-active over to active-standby and turned a 5 minute chunk of slowness into a 4 hour recovery job.  :palm:
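To put rough numbers on the "can never catch up" point, here is a minimal sketch of a queue where work arrives faster than a throttled cluster can serve it. All the figures (90 req/s arriving, 100 req/s full capacity, 50 req/s throttled) are made up for illustration, and the model ignores client retries and timeouts, which would only add load.

```python
def simulate_backlog(arrivals_per_s, capacity_per_s, seconds, backlog=0.0):
    """Return the queue length after each second of a simple fluid queue."""
    history = []
    for _ in range(seconds):
        # Work that cannot be served this second stays in the queue.
        backlog = max(0.0, backlog + arrivals_per_s - capacity_per_s)
        history.append(backlog)
    return history


# Hypothetical numbers, chosen only to illustrate the effect.
ARRIVALS = 90.0             # requests arriving per second
FULL_CAPACITY = 100.0       # requests served per second at full clocks
THROTTLED_CAPACITY = 50.0   # requests served per second when throttled

# Five minutes throttled: the backlog grows by 40 requests every second.
throttled = simulate_backlog(ARRIVALS, THROTTLED_CAPACITY, 300)
backlog_after_outage = throttled[-1]

# Back at full capacity there is only 10 req/s of headroom to drain it.
recovery = simulate_backlog(ARRIVALS, FULL_CAPACITY, 3600, backlog_after_outage)
seconds_to_drain = next(i + 1 for i, q in enumerate(recovery) if q == 0.0)

print(f"backlog after 5 min throttled: {backlog_after_outage:.0f} requests")
print(f"time to drain at full capacity: {seconds_to_drain / 60:.0f} minutes")
```

With these assumed figures, five minutes of throttling takes twenty minutes to clear, and the less headroom the cluster normally has, the longer the recovery gets.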
« Last Edit: April 12, 2021, 07:12:05 pm by bd139 »
 
The following users thanked this post: duckduck

Offline gnifTopic starter

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #155 on: April 13, 2021, 12:23:50 am »
I run a few sites on virtual servers, with various backup policies (which I won't write about openly for obvious reasons) and if the virtual server company blew up and vanished for ever, I could start up a backup server which is a media PC running on an FTTP (80/30mbps) ADSL line :) That would actually be fast enough for EEVBLOG, on a bad day.

And how would you know how much bandwidth EEVBlog uses each day? Outgoing traffic averages 90mbps peaking out at 200mbps during high activity periods such as giveaways, etc.

Quote
It is practically impossible to lose everything, in this setup, and it is very cheap.

What if someone deletes all the files on your server and rsync does its dutiful job and deletes them on your home machine? Or worse, some files are corrupted and you don't detect this for a few weeks? rsync is not a recommended enterprise grade backup solution, I suggest you look into BareOS (Free), or R1Soft (commercial).

EEVBlog is backed up daily to two remote locations with a 6 month data retention policy. Restoration to a bare metal server can be done in an hour or two (depending on network speed), and server configuration is performed via Puppet, which takes mere seconds. Total production-ready stand-up time from bare metal is restore time + 10-20 seconds. With a warm backup (which we will be looking into) downtime would be nearly zero in the event of a failure like this in the future.
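As a rough illustration of why dated backups with retention beat a plain mirror, here is a small sketch. It is not how the EEVBlog backups are actually implemented; the function names, the 183-day figure and the daily schedule are all assumptions made for the example.

```python
from datetime import date, timedelta

RETENTION_DAYS = 183  # assumption: "6 months" taken as roughly 183 days


def snapshots_to_prune(snapshot_dates, today, retention_days=RETENTION_DAYS):
    """Return the snapshots that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in snapshot_dates if d < cutoff)


def latest_snapshot_before(snapshot_dates, incident_date):
    """Return the newest snapshot taken strictly before the incident, if any."""
    candidates = [d for d in snapshot_dates if d < incident_date]
    return max(candidates) if candidates else None


# Hypothetical schedule: one snapshot per day for the last 200 days.
today = date(2021, 4, 13)
snapshots = [today - timedelta(days=n) for n in range(200)]

pruned = snapshots_to_prune(snapshots, today)

# Corruption that happened three weeks ago but was only noticed today.
incident = today - timedelta(days=21)
restore_point = latest_snapshot_before(snapshots, incident)

print(f"{len(pruned)} snapshots fall outside the retention window")
print(f"restore from the snapshot taken on {restore_point}")
```

A mirror only ever holds the latest state, so by the time the corruption is noticed the good copy is already gone. With dated snapshots there is still a clean restore point from before the incident.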

One has to weigh up the time vs cost in standing up a new server when things like this happen. Sure, Dave could have paid for a server elsewhere (in fact, we had several offers of temp servers), however this makes things more complex when it comes to decommissioning these servers when they are no longer needed, i.e. sync temp to primary servers, change over DNS records, and while waiting for DNS records to propagate, proxy traffic to the primary servers from the temp server. At the end of the day, it's up to the site owner to decide on the best course of action for their business, even if there are technical solutions that could be implemented here and now.

When you start hosting sites the size of EEVBlog you will quickly learn that you can't just cowboy things, because that 0.5s of downtime when you decided it won't hurt to just restart the HTTP service to make a config tweak will impact people.
« Last Edit: April 13, 2021, 12:38:07 am by gnif »
 
The following users thanked this post: Ed.Kloonk, xrunner, thm_w, Jacon, bd139

Offline SL4P

  • Super Contributor
  • ***
  • Posts: 2318
  • Country: au
  • There's more value if you figure it out yourself!
Re: The BIG EEVblog Server Fire
« Reply #156 on: April 13, 2021, 01:27:39 am »
Simple question.
Why was there water damage in the datacenter?  Surely the backup power was in an adjacent building or basement ?
Don't ask a question if you aren't willing to listen to the answer.
 

Offline gnifTopic starter

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #157 on: April 13, 2021, 01:29:20 am »
Simple answer, read through this thread. GS are still recovering and are yet to release details.
 

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 7992
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #158 on: April 13, 2021, 01:37:39 am »
I looked at google street view and only one exhaust pipe for a generator, near the electrical room.
The facility appears to be in some old warehouse (military?) district with brick exterior walls and a wooden roof? If true that's a problem.

Had to laugh, not a solar panel in sight.

What on earth makes you think that's a wooden roof?
 
The following users thanked this post: thm_w

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 9007
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: The BIG EEVblog Server Fire
« Reply #159 on: April 13, 2021, 02:45:44 am »
That's a great idea. One issue I can see for a hosting company is that they allow their customers to manage/reinstall the OS and apps. It would be difficult to enforce the installation of power-management software. It would be great if servers had a (let's say) 5 volt input, and when it dropped to below 1 volt, the BIOS would throttle the CPU down. Then you "just" run a 5 volt line off of non-UPS, non-generator mains to each server and you're golden.
I have hacked some PCs to do just that by using a small MOSFET to pull the PROCHOT line to ground.
That probably wouldn't work. If you're running near your CPU provision, you can enter a thing called "load hysteresis" which may be irrecoverable. This is where your load average goes above the total capacity and the CPUs can never catch up with the workload.  It requires adding much, much more capacity than you had to start with before you can handle the demand you had originally. Either that or breaking a huge chunk of your incoming load to recover.
That doesn't sound like something that should happen with a robustly designed service. Wouldn't that mean a hacker who wants to take it down can just DDoS it for a short time and let queue overflow continue it for much longer than the initial attack?
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Offline drussell

  • Super Contributor
  • ***
  • Posts: 1855
  • Country: ca
  • Hardcore Geek
Re: The BIG EEVblog Server Fire
« Reply #160 on: April 13, 2021, 03:04:51 am »
How much can the backup power infrastructure be cut back if there's a system to force all CPUs to minimum frequency when running on backup power?

That's a great idea.

No, it is not.

You don't randomly force-throttle a server...   :palm:
In many cases that would be just as bad of a scenario and you might as well just have pulled the power.
 

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #161 on: April 13, 2021, 03:16:16 am »
How much can the backup power infrastructure be cut back if there's a system to force all CPUs to minimum frequency when running on backup power?

That's a great idea.

No, it is not.

You don't randomly force-throttle a server...   :palm:
In many cases that would be just as bad of a scenario and you might as well just have pulled the power.

Software should tell the hardware what to do, not the other way around.
iratus parum formica
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 9007
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: The BIG EEVblog Server Fire
« Reply #162 on: April 13, 2021, 03:36:03 am »
You don't randomly force-throttle a server...   :palm:
In many cases that would be just as bad of a scenario and you might as well just have pulled the power.
Why not, if the software could recover once things are back to normal? Obviously, you wouldn't do that for critical real time applications like a VoIP server, but for something like a web server, I don't see why the software couldn't be designed to handle it gracefully. Perhaps whether or not a server gets throttled on backup power could be something that's reflected in the service fees (perhaps multiple levels that specify how much throttling and for how much time before partial refunds will be provided). I'd imagine that would do a lot to motivate the developers to make their programs able to tolerate it in order to take advantage of cheaper hosting.
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Offline drussell

  • Super Contributor
  • ***
  • Posts: 1855
  • Country: ca
  • Hardcore Geek
Re: The BIG EEVblog Server Fire
« Reply #163 on: April 13, 2021, 04:07:33 am »
Are you going to make sure the "backup" will now be in another room or at least several racks away?

Why not if the software could recover once things are back to normal? Obviously, you wouldn't do that for critical real time applications like a VoIP server, but for something like a web server, I don't see why the software couldn't be designed to handle it gracefully.

Oh, my....   :palm: 

You've obviously never had the joy of running any "real" servers.   :-DD  (even real web servers) 
 

Offline floobydust

  • Super Contributor
  • ***
  • Posts: 6958
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #164 on: April 13, 2021, 04:10:30 am »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?
 

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 7992
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #165 on: April 13, 2021, 04:27:54 am »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?

I see a structure which will have steel trusses. The deck could be anything up to and including pre-cast concrete panels.
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3711
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #166 on: April 13, 2021, 04:31:23 am »
Most cloud compute platforms offer some form of preemptable / flexible instance class that the provider can shut down at any time.  This allows them to reduce over-provisioning while serving peak loads to high paying customers but I'm sure it can also be used to throttle for power consumption purposes due to distribution capacity, backup generator size, or cooling.  You then pay less -- possibly much less -- per CPU hour.  It's not usually useful for a webserver since you can't control when users come to your site.  You can't get an HTTPS request and just say "I'll respond to that when the spot price drops."  There might be situations where a moderate amount of downtime isn't a big deal but it isn't the normal situation for commercial web hosting.

Cancelling low priority or low urgency jobs is much more effective than force throttling CPU frequency.  Even cancelling high priority jobs is sometimes the best approach if it gets you through a crunch without outright failure.  Throttling CPUs is not a great way to apply backpressure.  At first it does nothing but reduce the sleep/idle time fraction.  Then it generates congestion.  Only when the congestion degrades the service enough that people stop using it does it really reduce the workload.  Essentially you put human behavior in your feedback loop, and require poor quality of service to be effective.
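A minimal sketch of the "cancel low priority jobs" approach, as opposed to slowing everything down equally. The `Job` fields, the priority convention and the capacity numbers are all hypothetical; a real scheduler would be far more involved.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int   # assumption: lower number means more important
    cost: float     # CPU units the job needs


def shed_load(jobs, capacity):
    """Admit jobs in priority order until capacity runs out; cancel the rest."""
    admitted, cancelled = [], []
    remaining = capacity
    # sorted() is stable, so jobs with equal priority keep their input order.
    for job in sorted(jobs, key=lambda j: j.priority):
        if job.cost <= remaining:
            admitted.append(job)
            remaining -= job.cost
        else:
            cancelled.append(job)
    return admitted, cancelled


# Hypothetical workload: 150 CPU units wanted, only 80 available on backup power.
jobs = [
    Job("web", 0, 40.0),
    Job("report", 2, 30.0),
    Job("db", 0, 30.0),
    Job("batch", 3, 50.0),
]
admitted, cancelled = shed_load(jobs, capacity=80.0)

print("kept:     ", [j.name for j in admitted])
print("cancelled:", [j.name for j in cancelled])
```

The jobs that are kept run at full speed, so the user-facing ones never see congestion. Throttling every CPU would instead degrade all four jobs at once.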
 

Offline floobydust

  • Super Contributor
  • ***
  • Posts: 6958
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #167 on: April 13, 2021, 04:49:21 am »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?

I see a structure which will have steel trusses. The deck could be anything up to and including pre-cast concrete panels.

The roof overhang on the loading docks is all wood, for many of the buildings. They might be cheaply built, whatever era they are from. There's no rooftop A/C on any of the buildings so they might not support any weight beyond snow load.
Point is, a datacenter should be constructed entirely of non-combustible building materials IMHO.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #168 on: April 13, 2021, 06:06:40 am »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?

Wood roofs are typical on that sort of building. I don't recall the name for buildings of that style, but the walls are cast-in-place reinforced concrete with wooden beams running across, supporting a plywood roof that is covered in a weatherproof outer layer. All three of the light industrial complexes my friends' shop has been in were built that way.
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23018
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #169 on: April 13, 2021, 07:40:22 am »
That's a great idea. One issue I can see for a hosting company is that they allow their customers to manage/reinstall the OS and apps. It would be difficult to enforce the installation of power-management software. It would be great if servers had a (let's say) 5 volt input, and when it dropped to below 1 volt, the BIOS would throttle the CPU down. Then you "just" run a 5 volt line off of non-UPS, non-generator mains to each server and you're golden.
I have hacked some PCs to do just that by using a small MOSFET to pull the PROCHOT line to ground.
That probably wouldn't work. If you're running near your CPU provision, you can enter a thing called "load hysteresis" which may be irrecoverable. This is where your load average goes above the total capacity and the CPUs can never catch up with the workload.  It requires adding much, much more capacity than you had to start with before you can handle the demand you had originally. Either that or breaking a huge chunk of your incoming load to recover.
That doesn't sound like something that should happen with a robustly designed service. Wouldn't that mean a hacker who wants to take it down can just DDoS it for a short time and let queue overflow continue it for much longer than the initial attack?

DDoS is not something you handle at the service level. There’s no way to sink one. I mean how do you scale up something (taking our shit as an example) from aggregate average 700Mbit out to say 40Gbit out which is your saturation boundary? Even if you can scale up enough nodes horizontally it’s unlikely to be sensible or even reasonable to add that capacity. I mean we’re not going to add 57x the database capacity on the fly when our customer base is fairly static.

In this case we pay the provider to handle it. They do traffic analysis and if they detect an incoming DDoS then they drop the traffic from it at their network edge.

Edit: there is a mid ground though which is perfectly valid where you are unexpectedly successful. This can make or break a product. I’ve seen both outcomes.
« Last Edit: April 13, 2021, 07:44:36 am by bd139 »
 

Offline calzap

  • Frequent Contributor
  • **
  • Posts: 448
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #170 on: April 13, 2021, 05:25:36 pm »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?
Wood roofs are typical on that sort of building. I don't recall the name for buildings of that style, but the walls are cast-in-place reinforced concrete with wooden beams running across, supporting a plywood roof that is covered in a weatherproof outer layer. All three of the light industrial complexes my friends' shop has been in were built that way.
Often done as tilt-up construction.  Floor slab is poured.  After it hardens, forms for wall sections are put on the floor, and the wall section concrete is poured.  After wall sections are hard, they are hoisted into place with a crane.  Has been most popular in the U.S., Australia and NZ.   Wikipedia describes it pretty well.

Mike in California
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 28323
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #171 on: April 13, 2021, 06:20:50 pm »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?
Wood roofs are typical on that sort of building. I don't recall the name for buildings of that style, but the walls are cast-in-place reinforced concrete with wooden beams running across, supporting a plywood roof that is covered in a weatherproof outer layer. All three of the light industrial complexes my friends' shop has been in were built that way.
Often done as tilt-up construction.  Floor slab is poured.  After it hardens, forms for wall sections are put on the floor, and the wall section concrete is poured.  After wall sections are hard, they are hoisted into place with a crane.  Has been most popular in the U.S., Australia and NZ.   Wikipedia describes it pretty well.

Mike in California
Would be nice if it were so, but sadly not, as few slab layers are capable of laying a perfectly flat slab, so here we have had professional tilt slab companies for some decades. Finished and cured tilt slabs prefitted with lifting eyes are trucked on edge from the tilt slab plants to construction sites, where they are lifted from trucks and placed straight into position. There they are strapped to adjoining slabs and temporarily held vertical with adjustable building props until roofing trusses and 2nd story steelwork can be fitted, at which time they are considered a safe structure.
Mini tornado flattened some stood and braced slabs a couple years back killing some workers.
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 

Offline Nusa

  • Super Contributor
  • ***
  • Posts: 2416
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #172 on: April 13, 2021, 06:24:25 pm »
You're talking about modern construction techniques. The building in question was built during WWII using brick and steel construction. The whole complex (Defense Depot Ogden) remained military until about 25 years ago, and is now known as Business Depot Ogden. Some of the warehouses are still owned and operated by the military, but the complex as a whole is now commercial.

There's an interior picture of one of the old warehouses here, clearly showing the steel construction inside the shell:
https://www.boyerbdo.com/history/
 

Offline floobydust

  • Super Contributor
  • ***
  • Posts: 6958
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #173 on: April 13, 2021, 08:19:50 pm »
Yes, wood roof on steel trusses mostly, there are huge timbers in old pics but a newer pic (brewery) shows them replaced with new concrete and steel. I imagine the buildings are all renovated. Back then lumber was plentiful.
It's very difficult with old buildings to meet modern NFPA fire safety codes. They usually don't have enough exits or an updated fire-suppression system. Sprinklers being under the roof are useless and in the pic pipes extend outside, under the overhang - those pipes freeze here in winter.

The military history of the area is incredible: 5,000 POWs working there in WWII, B-17s being assembled.
"The soil and groundwater beneath Business Depot Ogden have been polluted with trichloroethylene, vinyl chloride, arsenic, lead, cadmium, mercury, barium and pesticides. The toxic mix of chemicals is the result of decades of cavalier disposal and burning of military-grade trash at the 1,100-acre former military facility, according to EPA records."
 

Offline gnifTopic starter

  • Administrator
  • *****
  • Posts: 1675
  • Country: au
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7752
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #175 on: April 14, 2021, 02:05:14 pm »
Looks like communication skills aren't a strength of GS.
 

Offline YurkshireLad

  • Frequent Contributor
  • **
  • Posts: 365
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #176 on: April 14, 2021, 02:08:02 pm »
Looks like communication skills aren't a strength of GS.

Time to switch?
 

