Author Topic: The BIG EEVblog Server Fire  (Read 20014 times)

Offline artag

  • Super Contributor
  • ***
  • Posts: 1077
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #75 on: April 08, 2021, 05:14:52 pm »

From the sounds of it, I would not be surprised if the generators were on the roof.  Normally I would have expected the generators to be in a separate building.  But I have heard of similar issues with mission critical UPS systems failing.  If you have a fire and call any local fire department, they will want all power turned off.  They don't care about your business model etc.; minimising risk to their firefighters is more important.  The same goes if you call out the lifeboat: their job is to save life, and saving the vessel is of secondary importance.

All the reports seem to say that the generators caught fire, the remaining power was taken down by the fire department, and that some servers were in danger of water damage.

My conclusion is that the fire department applied plenty of water and a little of it got into a small area of servers. Perhaps near, but not colocated with the fiery genset.
 
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3727
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #76 on: April 08, 2021, 05:15:56 pm »
I thought datacenters typically used Halon fire suppression systems?
I don't think anyone uses Halon any more, it would be something like FM200 these days. And yes, this would usually be installed in the server rooms.

You would usually get something like 30 seconds from alarms and strobes activating to evacuate before it is dumped into the room because yes, its whole purpose is to displace oxygen to starve a fire.

Halon and FM200 *do not work* by displacing oxygen, that is a persistent myth. They work by neutralizing the free radicals that allow fire to propagate.  This allows them to be used in much smaller quantities than would be needed to extinguish a fire by oxygen displacement.  Typically they are used at <10% concentration so they only lower the oxygen concentration slightly.  You still want to evacuate because the distribution is not uniform and also because -- THERE IS A FIRE, but generally they are low risk to humans.

CO2 fire extinguisher systems do work by displacing oxygen, and full-room CO2 systems are not generally used in occupied areas because of this.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #77 on: April 08, 2021, 05:23:55 pm »
In the early 80s the company I worked for kept changing its insurers each year. These alternated between pro-halon insurers and pro-water insurers. We had halon and sprinkler systems alternately installed and ripped out of the computer room I oversaw for several years.

Seems like they could just leave both systems in place and disable the one they weren't using.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #78 on: April 08, 2021, 05:33:27 pm »
All their reports say they had a mechanical genset failure resulting in a fire. So this isn’t a switchgear fault, I suspect. The issue seems to be the proximity of servers to the gensets. It sounds like they were in the same space  :palm:

The pictures Dave posted look like a random light industrial park, the sort of place my friend's machine shop is in; there's even a sports bar & grill in one of the units. Looks like the genset has to be in the same space, there isn't anywhere else to put it.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7770
  • Country: de
  • A qualified hobbyist ;)
Re: The BIG EEVblog Server Fire
« Reply #79 on: April 08, 2021, 05:56:43 pm »
Webhosting talk forum: https://www.webhostingtalk.com/showthread.php?t=1842301

---

I would stop hosting anything at this data center. Here in Florida, almost all data centers do NOT use water; they use an FM200 fire suppression system. Also, generators are not co-located with the data center; they are usually detached just in case something like this happens. Generators do tend to go, and when they go, they go violently. The worst part is, this data center probably had multiple gensets covering the load, and when one went down, it caused a cascade effect going down the line. Fuel should have also been shut off and isolated, but it appears not? There should have been zero water damage inside the facilities. This clearly is some kind of failed engineering design of a building that probably should not have been designed to be a data center.

---

Wishful thinking! In case of a fire the fire brigade decides what to do. If they think it's a good idea to hose down all servers despite an FM-200 fire suppression system, you can't do much about that. Local regulations might force you to use water. Or the management board tells you to use water because it's cheap and the insurance will pay for any damages. Fire is one of the more likely events, but there are also many more, and in those cases it doesn't matter if the fire suppression system uses water or something more hardware friendly. If you need to keep your platform running 24x7 you have to design a redundant solution, i.e. you can't rely on a single data center. Complaining about a data center using a water-based fire suppression system is laughable. Simply do your homework!
 

Offline Bud

  • Super Contributor
  • ***
  • Posts: 6913
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #80 on: April 08, 2021, 06:10:25 pm »
I am questioning the details of the culprit, that is: what happened to the backup generator, did they buy it on Alibaba?
Facebook-free life and Rigol-free shack.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #81 on: April 08, 2021, 06:15:58 pm »
I am questioning the details of the culprit, that is: what happened to the backup generator, did they buy it on Alibaba?

Generators fail. Even well known name brand stuff like Caterpillar, Detroit, etc. blows up now and then, especially older units, which this one was said to be. These are big diesel engines, they require maintenance and occasionally stuff breaks. A spun bearing, an injector problem, or something like a bad seal in a turbocharger can cause the engine to run away, consuming its own lubricating oil until it throws a rod through the side of the block and spills oil and/or fuel all over the hot exhaust system, and then you have a fire.
 

Offline Bud

  • Super Contributor
  • ***
  • Posts: 6913
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #82 on: April 08, 2021, 06:34:28 pm »
A backup generator's keyword is 'backup', isn't it? Some robustness is supposed to be embedded in it from the get-go, starting with the specs.
Facebook-free life and Rigol-free shack.
 

Offline PaulAm

  • Frequent Contributor
  • **
  • Posts: 938
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #83 on: April 08, 2021, 06:53:39 pm »
Don't forget that part of the backup power architecture is a humongous UPS.  Its job is to keep everything up until the genset is up and stable.  Quite possible something went wrong in that gear.

My thought after the first day was that EEs everywhere across the world were showing an unexplained increase in productivity  :-DD
 
The following users thanked this post: james_s, Jacon

Offline SilverSolder

  • Super Contributor
  • ***
  • Posts: 6126
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #84 on: April 08, 2021, 07:09:41 pm »
I have in front of me a 1TB microSD card, 15 x 11 x 1mm thick (fascinating hi-res X-Rays here). That works out at roughly 6GB per cubic millimetre.

Assuming we have a VW Passat (which claims 1780 litres of cargo space with the rear seats down), our modern equivalent of the station-wagon-full of tapes is potentially something like 1780 x (100 x 100 x 100) x 6GB, or around 10 exabytes (10x10^9 GB).

And yes I know we wouldn't get that capacity, they're slow to write to, and so on and so on, but it does make one think.

1TB microSD still seems like sci-fi territory to me...   an amazing milestone.  Next:  10TB!  :D
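
For anyone who wants to check the back-of-envelope numbers quoted above, here is a rough sketch. It only uses the figures from the quote (card size, 1TB capacity, 1780 litres) and assumes decimal units (1 TB = 1000 GB, 1 EB = 10^9 GB); the variable names are just illustrative.

```python
# Sanity check of the quoted microSD arithmetic (decimal units assumed).
CARD_CAPACITY_GB = 1000            # 1 TB card, taking 1 TB = 1000 GB
CARD_VOLUME_MM3 = 15 * 11 * 1      # 15 x 11 x 1 mm, as quoted
CARGO_LITRES = 1780                # quoted cargo space with rear seats down
MM3_PER_LITRE = 100 * 100 * 100    # 1 litre is a 100 mm cube
GB_PER_EXABYTE = 10**9

# Storage density of one card
density_gb_per_mm3 = CARD_CAPACITY_GB / CARD_VOLUME_MM3

# Cargo volume in cubic millimetres
cargo_mm3 = CARGO_LITRES * MM3_PER_LITRE

# Capacity using the unrounded density, and using the quoted "roughly 6GB"
exact_eb = cargo_mm3 * density_gb_per_mm3 / GB_PER_EXABYTE
rounded_eb = cargo_mm3 * 6 / GB_PER_EXABYTE

print(f"density: {density_gb_per_mm3:.2f} GB/mm^3")
print(f"capacity (unrounded density): {exact_eb:.2f} EB")
print(f"capacity (6 GB/mm^3):         {rounded_eb:.2f} EB")
```

So "around 10 exabytes" holds up, ignoring packing losses, which the quote already admits to.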
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #85 on: April 08, 2021, 07:17:35 pm »
1TB microSD still seems like sci-fi territory to me...   an amazing milestone.  Next:  10TB!  :D

Makes me think of the old spy movies where somebody is trying to smuggle a microfilm containing information. These days you could fit an entire library on a thumbnail sized micro SD card that is easily hidden inside nearly anything.
 
The following users thanked this post: SilverSolder

Offline SilverSolder

  • Super Contributor
  • ***
  • Posts: 6126
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #86 on: April 08, 2021, 07:20:47 pm »
I know of a sizeable datacenter on the US east coast serving the financial industry that went down due to its backup systems many years ago.  It is always Murphy that gets you:


(1) External power failed, causing the building to switch to battery backup based on enormous banks of lead-acid batteries (a large room full) in anticipation of starting an array of big diesel generators.

(2) One of those batteries blew up under the sudden load, spraying an employee with acid.  The other men ran to his rescue and got him in a shower real fast (he was OK)...

(3) ...but sadly, this cost crucial minutes...  and the batteries ran out the second they tried to start the diesel generators...  BLINK, the center went pitch black, with no way to start the diesels!


 

Offline jonpaul

  • Super Contributor
  • ***
  • Posts: 3366
  • Country: fr
Re: The BIG EEVblog Server Fire
« Reply #87 on: April 08, 2021, 07:54:03 pm »
Hello: Thanks for the info,

Most pro server farms use a non-water fire suppression system, e.g. nitrogen inerting.

Wonder why EEVblog did not select a large hosting firm like AWS, Ionos, GoDaddy etc. as a host.

I had never heard of this firm.

Kind Regards,
Jon
Jean-Paul  the Internet Dinosaur
 

Offline 3roomlab

  • Frequent Contributor
  • **
  • Posts: 825
  • Country: 00
Re: The BIG EEVblog Server Fire
« Reply #88 on: April 08, 2021, 08:06:03 pm »
maybe it is time to move to other countries with fewer power interruptions





 
The following users thanked this post: tooki, I wanted a rude username

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #89 on: April 08, 2021, 08:56:23 pm »
A backup generator's keyword is 'backup', isn't it? Some robustness is supposed to be embedded in it from the get-go, starting with the specs.

Complex mechanical devices fail, it's just a fact of life. Turbine engines on aircraft are highly critical life safety items that are engineered to be extremely reliable and impeccably maintained, they still fail catastrophically now and then, sometimes with loss of life. Some robustness IS embedded in backup generators, it isn't like they blow up every day, but occasionally they still fail, especially if maintenance has been less than stellar, which is often the case if a company is not compelled to do it religiously the way aviation is. It's easy to look at the bottom line when the budget is tight and margins are slim and think "do we REALLY need to invest $$$ in rebuilding or replacing the genset or can we defer that to next year?" It happens, and it's much easier to predict and point fingers with the benefit of hindsight. If this place had a history of generators failing I would be more critical, but a single catastrophic failure is much too small of a sample to judge by. It could have been a total fluke where something just blew up due to a manufacturing defect that was never caught, or it could be it wasn't properly maintained, or it could be somebody monkeyed with it at some point, we don't know.
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23033
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #90 on: April 08, 2021, 10:04:20 pm »
Interesting thread.

Also after looking at the photos and Google maps, an observation:

Don’t go with a facility that has no crash wall, no fence, and parks a trailer, probably with a nice propane bottle, actually inside the boundary of the building. If they do that then there’s probably 9000 even worse things inside the building you can’t see which are waiting to take your business out.

May be better to leverage an IaaS cloud here. The whole availability zone concept makes these events into non events. If a facility burns then you lose an availability zone, not your entire deployment.
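
A rough sketch of why spreading across availability zones helps. This assumes zones fail independently of each other, and the 1% per-zone outage probability is a made-up number purely for illustration, not a figure from any provider; `p_total_outage` is a hypothetical helper name.

```python
def p_total_outage(p_zone_down: float, zones: int) -> float:
    """Probability that every zone is down at once, assuming independent failures."""
    return p_zone_down ** zones

# Hypothetical per-zone outage probability over some period (illustrative only)
P_ZONE_DOWN = 0.01

for zones in (1, 2, 3):
    p = p_total_outage(P_ZONE_DOWN, zones)
    print(f"{zones} zone(s): chance of losing everything = {p:.6%}")
```

Real zones share some failure modes (same provider, same control plane, same operator mistakes), so the independence assumption is optimistic, but the direction of the effect is the point.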

I made this observation originally after mistakenly hosting half a rack of shit at a redneck provider here.

Edit: also always host your email somewhere completely different as you might find it difficult to contact support if your email server is on fire (did that once)
« Last Edit: April 08, 2021, 10:10:36 pm by bd139 »
 

Offline CatalinaWOW

  • Super Contributor
  • ***
  • Posts: 5239
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #91 on: April 09, 2021, 12:01:46 am »
I am questioning the details of the culprit, that is: what happened to the backup generator, did they buy it on Alibaba?

Generators fail. Even well known name brand stuff like Caterpillar, Detroit, etc. blows up now and then, especially older units, which this one was said to be. These are big diesel engines, they require maintenance and occasionally stuff breaks. A spun bearing, an injector problem, or something like a bad seal in a turbocharger can cause the engine to run away, consuming its own lubricating oil until it throws a rod through the side of the block and spills oil and/or fuel all over the hot exhaust system, and then you have a fire.

Failure is expected.  Failure with catastrophic results is what is surprising.  Aerospace uses something called FMECA (Failure Modes, Effects and Criticality Analysis) to try to prevent the latter.  A spun bearing, or low oil or ....   It isn't perfect.  Things still slip through the cracks, but apparently we don't care that much about data loss, or respirator failure or anything else that depends on backup generators.
 

Offline ve7xen

  • Super Contributor
  • ***
  • Posts: 1193
  • Country: ca
    • VE7XEN Blog
Re: The BIG EEVblog Server Fire
« Reply #92 on: April 09, 2021, 12:11:53 am »
"do we REALLY need to invest $$$ in rebuilding or replacing the genset or can we defer that to next year?" It happens, and it's much easier to predict and point fingers with the benefit of hindsight. If this place had a history of generators failing I would be more critical, but a single catastrophic failure is much too small of a sample to judge by. It could have been a total fluke where something just blew up due to a manufacturing defect that was never caught, or it could be it wasn't properly maintained, or it could be somebody monkeyed with it at some point, we don't know.

And like you say, sometimes you still get bitten. We're pretty good about maintaining our generators, every 6 months a guy comes from the service company and does all the scheduled maintenance items according to the manufacturers' recommendations, we do monthly full-load tests, and so on.

A few years back we suffered a ~12h outage anyway. The genset fired up and took the load, no problem, and a few hours later, we got the overheat alarm and it shut itself down. Go to the bunker and find that one of the coolant hoses has burst and it's pumped all of its many litres of coolant onto the floor. We later learned that while the maintenance guy had been there two weeks before, and a 5-year replacement of the hoses was due, he happened to not have that one particular hose in his truck that day, and we allowed it to be deferred to the next maintenance. Murphy gets you every time.
« Last Edit: April 09, 2021, 12:22:31 am by ve7xen »
73 de VE7XEN
He/Him
 

Online EEVblog

  • Administrator
  • *****
  • Posts: 37744
  • Country: au
    • EEVblog
Re: The BIG EEVblog Server Fire
« Reply #93 on: April 09, 2021, 12:37:07 am »
I just got one month's credit. So they implemented the 30 day limit clause.
 

Offline bdunham7

  • Super Contributor
  • ***
  • Posts: 7861
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #94 on: April 09, 2021, 12:43:58 am »
I just got one month's credit. So they implemented the 30 day limit clause.

At least they didn't invoke force majeure and weasel out entirely!
A 3.5 digit 4.5 digit 5 digit 5.5 digit 6.5 digit 7.5 digit DMM is good enough for most people.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #95 on: April 09, 2021, 01:53:29 am »
Failure is expected.  Failure with catastrophic results is what is surprising.  Aerospace uses something called FMECA (Failure Modes, Effects and Criticality Analysis) to try to prevent the latter.  A spun bearing, or low oil or ....   It isn't perfect.  Things still slip through the cracks, but apparently we don't care that much about data loss, or respirator failure or anything else that depends on backup generators.

We don't really know how much people care, because we know very little about the situation, only that a genset they refer to as "older" failed in some catastrophic manner and caused the outage. I've never shopped around for hosting of this sort but I would expect that prices vary widely and that to some extent, you get what you pay for in terms of reliability. There could be redundant sites, or at the very least, redundant gensets located in separate buildings, or newer/higher quality gensets, or any number of other things. There is a lot of web hosting, this forum included, which I would not call mission critical. Nobody dies if it goes down for a few days and very few people are even particularly inconvenienced, so is it worth paying more for higher reliability? Only Dave can answer that since it's his money, but given the rarity of outages I would vote no. I mean this is ONE outage, and it's impossible to measure reliability from a single failure. Maybe it was an accident waiting to happen and it is miraculous that things held on this long, or maybe it's all top notch high quality gear that is meticulously maintained and something still blew up because sometimes, no matter how careful you are, shit happens. We don't know. I'm going to guess that the truth is somewhere in the middle.
 
The following users thanked this post: Jacon

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #96 on: April 09, 2021, 01:58:04 am »
Interesting thread.

Also after looking at the photos and Google maps, an observation:

Don’t go with a facility that has no crash wall, no fence, and parks a trailer, probably with a nice propane bottle, actually inside the boundary of the building. If they do that then there’s probably 9000 even worse things inside the building you can’t see which are waiting to take your business out.

The trailer wouldn't bother me particularly, it looks in good shape and those things don't often just go up unless someone is using one as a meth kitchen. Of all the details observable in the photos, the restaurant in the building is the most eyebrow-raising. It's not that rare for a restaurant to catch fire, although even that doesn't happen every day.
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 28393
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #97 on: April 09, 2021, 02:19:58 am »
Failure is expected.  Failure with catastrophic results is what is surprising.  Aerospace uses something called FMECA (Failure Modes, Effects and Criticality Analysis) to try to prevent the latter.  A spun bearing, or low oil or ....   It isn't perfect.  Things still slip through the cracks, but apparently we don't care that much about data loss, or respirator failure or anything else that depends on backup generators.

We don't really know how much people care, because we know very little about the situation, only that a genset they refer to as "older" failed in some catastrophic manner and caused the outage. I've never shopped around for hosting of this sort but I would expect that prices vary widely and that to some extent, you get what you pay for in terms of reliability. There could be redundant sites, or at the very least, redundant gensets located in separate buildings, or newer/higher quality gensets, or any number of other things. There is a lot of web hosting, this forum included, which I would not call mission critical. Nobody dies if it goes down for a few days and very few people are even particularly inconvenienced, so is it worth paying more for higher reliability? Only Dave can answer that since it's his money, but given the rarity of outages I would vote no. I mean this is ONE outage, and it's impossible to measure reliability from a single failure. Maybe it was an accident waiting to happen and it is miraculous that things held on this long, or maybe it's all top notch high quality gear that is meticulously maintained and something still blew up because sometimes, no matter how careful you are, shit happens. We don't know. I'm going to guess that the truth is somewhere in the middle.
Sure, but a year or two back Dave changed hosts to this crowd due to server unreliability, and in the 8 years I've been aboard this one was by far the largest outage.
We may have been inconvenienced some; however, Dave will have taken a financial hit due to his shop being down and loss of forum advertising.
Dave's very lucky gnif does most of his admin on the cheap.
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 

Online Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #98 on: April 09, 2021, 03:03:33 am »
I just got one month's credit. So they implemented the 30 day limit clause.

At least they didn't invoke force majeure and weasel out entirely!

It was an act of God, I tell you.

https://en.wikipedia.org/wiki/The_Man_Who_Sued_God
iratus parum formica
 

Offline tooki

  • Super Contributor
  • ***
  • Posts: 11561
  • Country: ch
Re: The BIG EEVblog Server Fire
« Reply #99 on: April 09, 2021, 05:18:10 am »
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

— Andrew S Tanenbaum, 1989
Yes, or of actual messenger pigeons!

With that said, the critical downside of the station wagon or pigeon is the latency! :P

I'm sorry I'm dragging this so far off-topic; I promise to sit on the Naughty Step for a while and Think About What I've Done. But meanwhile, have a ponder about the effective latency in even, say, a thousand parallel, error-free, get-the-label-speed 10Gbps direct links for that amount of data. How many miles apart do your data centres have to be before it's quicker to send the data via wires than wheels?

And don't get me started on the effective data rate of a cargo plane worth of 6GB/mm3...

OK, I've arrived at the Naughty Step. Sitting in 3... 2... 1...
Well, that’s not really what latency means... ;) (Latency is the minimum time for a single bit to go in one end of the “pipe” and come out the other end. As a metric, it isn’t dependent on the total size of the data you’re sending.)
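
To put a number on the wires-vs-wheels question in the quote, here is a minimal sketch. Strictly this compares total delivery time (throughput), not latency in the sense above. It takes the "around 10 exabytes" and "a thousand parallel 10Gbps links" figures from the thread, and assumes an average road speed of 100 km/h, which is my own illustrative number; loading and unloading the cards is ignored.

```python
DATA_BYTES = 10 * 10**18     # ~10 exabytes, the figure from the thread
LINKS = 1000                 # "a thousand parallel" links
LINK_BPS = 10 * 10**9        # 10 Gbps each, running at full label speed
CAR_SPEED_KMH = 100.0        # assumption: average speed of the station wagon

def wire_transfer_seconds(data_bytes: int, links: int, link_bps: int) -> float:
    """Time to push the whole data set through the links, in seconds."""
    return data_bytes * 8 / (links * link_bps)

def break_even_km(data_bytes: int, links: int, link_bps: int, speed_kmh: float) -> float:
    """Distance the car covers in the time the wires need for the same data."""
    hours = wire_transfer_seconds(data_bytes, links, link_bps) / 3600
    return hours * speed_kmh

wire_s = wire_transfer_seconds(DATA_BYTES, LINKS, LINK_BPS)
distance_km = break_even_km(DATA_BYTES, LINKS, LINK_BPS, CAR_SPEED_KMH)

print(f"wire transfer time: {wire_s / 86400:.1f} days")
print(f"break-even distance at {CAR_SPEED_KMH:.0f} km/h: {distance_km:,.0f} km")
```

Under those assumptions the wires need roughly three months, so the car wins at any distance you could actually drive on this planet.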
 

