Author Topic: The BIG EEVblog Server Fire  (Read 19572 times)

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #150 on: April 12, 2021, 04:03:50 am »
I looked at google street view and only one exhaust pipe for a generator, near the electrical room.
The facility appears to be in some old warehouse (military?) district with brick exterior walls and a wooden roof? If true that's a problem.

Had to laugh, not a solar panel in sight.

Buildings like that are all over the place in light industrial areas. As I mentioned my friends have a machine shop in a similar building, they've had all sorts of different neighbors and they've moved a couple of times too. Auto mechanic, sign company, importer warehouse, cabinet maker, in their current spot the place next door sells and services air compressors, that kind of stuff. In most of these industrial parks the business rents one or more bays and outfits the interior as needed. Solar probably isn't really an option in most of those places, they don't own the building and the landlord doesn't care about the power bill, they aren't the one paying it. They don't want a bunch of holes drilled in their roof.
 

Offline schmitt trigger

  • Super Contributor
  • ***
  • Posts: 2205
  • Country: mx
Re: The BIG EEVblog Server Fire
« Reply #151 on: April 12, 2021, 03:05:16 pm »
Very interesting thread!

As I mentioned previously, hopefully Dave will do a video(s) regarding the subtle details of ensuring AC mains uptime.

With everything nowadays tied to the web, this issue has become more critical than ever.

Back in the mainframe days, I remember a motor-generator set where the blackout ride-through energy was stored in a gigantic flywheel.
To my surprise, they are still being used.
Google has plenty of examples.
 

Offline duckduck

  • Frequent Contributor
  • **
  • Posts: 407
  • Country: us
  • 20Hz < fun < 20kHz, and RF is Really Fun
Re: The BIG EEVblog Server Fire
« Reply #152 on: April 12, 2021, 06:41:00 pm »
How much can the backup power infrastructure be cut back if there's a system to force all CPUs to minimum frequency when running on backup power?

That's a great idea. One issue I can see for a hosting company is that they allow their customers to manage/reinstall the OS and apps. It would be difficult to enforce the installation of power-management software. It would be great if servers had a (let's say) 5 volt input, and when it dropped to below 1 volt, the BIOS would throttle the CPU down. Then you "just" run a 5 volt line off of non-UPS, non-generator mains to each server and you're golden.
 
The following users thanked this post: SilverSolder

Offline schmitt trigger

  • Super Contributor
  • ***
  • Posts: 2205
  • Country: mx
Re: The BIG EEVblog Server Fire
« Reply #153 on: April 12, 2021, 07:02:11 pm »
In the rotary UPS I mentioned above, it was explained to me that during the elapsed time between mains interruption and the diesel genset actually supplying power, certain non-critical processes were halted. Like printers and punched card readers.
There may have been others.
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23017
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #154 on: April 12, 2021, 07:10:03 pm »
How much can the backup power infrastructure be cut back if there's a system to force all CPUs to minimum frequency when running on backup power?

That probably wouldn't work. If you're running near your CPU provision, you can enter a thing called "load hysteresis" which may be irrecoverable. This is where your load average goes above the total capacity and the CPUs can never catch up with the workload.  It requires adding much much more capacity than you had to start with before you can bring back the demand you had originally. Either that or breaking a huge chunk of your incoming load to recover.
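
As a rough illustration of the backlog effect described above, here is a toy queue model in Python. It is only a sketch: the `simulate` function and every number in it (900 req/s arriving, 1000 req/s normal capacity, 500 req/s while throttled) are made up for the example, and it ignores client retries and timeouts, which tend to make a real overload worse.

```python
def simulate(arrival, normal_cap, throttled_cap, throttle_start, throttle_end, steps):
    """Return the number of queued requests after each one-second step."""
    backlog = 0
    history = []
    for t in range(steps):
        # Capacity drops only while the (hypothetical) throttle is active.
        throttled = throttle_start <= t < throttle_end
        cap = throttled_cap if throttled else normal_cap
        # Requests that cannot be served this second stay in the queue.
        backlog = max(0, backlog + arrival - cap)
        history.append(backlog)
    return history


THROTTLE_START = 10  # seconds, assumed
THROTTLE_END = 70    # seconds, assumed (a 60 s throttle)

history = simulate(arrival=900, normal_cap=1000, throttled_cap=500,
                   throttle_start=THROTTLE_START, throttle_end=THROTTLE_END,
                   steps=400)

peak = max(history)
# First second at or after the end of the throttle with an empty queue.
drained_at = next(t for t, b in enumerate(history) if t >= THROTTLE_END and b == 0)
recovery_seconds = drained_at - THROTTLE_END + 1

print(f"peak backlog: {peak} requests")
print(f"backlog cleared {recovery_seconds} s after the throttle ended")
```

With only 10% headroom, the queue in this model drains far more slowly than it filled, so the slowdown outlasts the throttle itself. The less spare capacity there is, the longer the tail.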

This is similar to halving the size of your cluster during peak demand, which never works out well. I was working for a very large stats company here a few years back and there was a sudden spike of packet loss over the inter-DC link they were running. The ops director at the time decided to fail the active-active over to active-standby and turned a 5 minute chunk of slowness into a 4 hour recovery job.  :palm:
« Last Edit: April 12, 2021, 07:12:05 pm by bd139 »
 
The following users thanked this post: duckduck

Offline gnif (Topic starter)

  • Administrator
  • *****
  • Posts: 1672
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #155 on: April 13, 2021, 12:23:50 am »
I run a few sites on virtual servers, with various backup policies (which I won't write about openly for obvious reasons) and if the virtual server company blew up and vanished for ever, I could start up a backup server which is a media PC running on an FTTP (80/30mbps) ADSL line :) That would actually be fast enough for EEVBLOG, on a bad day.

And how would you know how much bandwidth EEVBlog uses each day? Outgoing traffic averages 90 Mbps, peaking at 200 Mbps during high activity periods such as giveaways, etc.
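
To put those rates in perspective, a quick back-of-the-envelope conversion to data per day. This is a sketch only: it assumes the "80/30" in the quoted post means 30 Mbps upstream, uses decimal units, and `gigabytes_per_day` is a helper name invented for the example.

```python
def gigabytes_per_day(mbps):
    """Convert a sustained rate in megabits per second to gigabytes per day."""
    seconds_per_day = 24 * 60 * 60
    megabits = mbps * seconds_per_day
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes (decimal)


average_gb = gigabytes_per_day(90)  # average outgoing rate quoted above
uplink_gb = gigabytes_per_day(30)   # assumed upstream limit of the home line

print(f"90 Mbps sustained: {average_gb:.0f} GB/day")
print(f"30 Mbps uplink, fully saturated: {uplink_gb:.0f} GB/day")
```

On those assumptions the home line could carry at most about a third of the average outgoing traffic, before even considering the peaks.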

Quote
It is practically impossible to lose everything, in this setup, and it is very cheap.

What if someone deletes all the files on your server and rsync does its dutiful job and deletes them on your home machine? Or worse, some files are corrupted and you don't detect this for a few weeks? rsync is not a recommended enterprise grade backup solution, I suggest you look into BareOS (Free), or R1Soft (commercial).
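
The difference between a mirror and a backup with retention can be shown with a small in-memory model. This is a sketch under stated assumptions: the file names, `mirror_sync` and `take_snapshot` are hypothetical, the "mirror" stands in for any sync job that propagates deletions, and it is not how BareOS or R1Soft work internally.

```python
import copy


def mirror_sync(source, mirror):
    """Make the mirror identical to the source, deletions included."""
    mirror.clear()
    mirror.update(copy.deepcopy(source))


def take_snapshot(source, snapshots, keep):
    """Store a point-in-time copy and keep only the newest `keep` copies."""
    snapshots.append(copy.deepcopy(source))
    while len(snapshots) > keep:
        snapshots.pop(0)  # expire the oldest copy


# Hypothetical server contents.
server = {"index.php": "v1", "db.sql": "rows..."}
mirror, snapshots = {}, []

# Day 1: both strategies capture a healthy server.
mirror_sync(server, mirror)
take_snapshot(server, snapshots, keep=180)

# Day 2: someone wipes the server, then the nightly jobs run as usual.
server.clear()
mirror_sync(server, mirror)
take_snapshot(server, snapshots, keep=180)

print("mirror after wipe:", mirror)
print("oldest retained snapshot:", snapshots[0])
```

The mirror faithfully reproduces the wipe, while the retained snapshot from the day before still holds the files. The same applies to corruption that goes unnoticed for a few weeks, as long as the retention window is longer than the time to detection.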

EEVBlog is backed up daily to two remote locations with a 6 month data retention policy. Restoration to a bare metal server can be done in an hour or two (depending on network speed), and server configuration is performed via Puppet, which takes mere seconds. Total production-ready stand-up time from bare metal is restore time + 10-20 seconds. With a warm backup (which we will be looking into), downtime would be nearly zero in the event of a failure like this in the future.

One has to weigh up the time vs cost in standing up a new server when things like this happen. Sure, Dave could have paid for a server elsewhere (in fact, we had several offers of temp servers), however this makes things more complex when it comes to decommissioning these servers when they are no longer needed. I.e., sync temp to primary servers, change over DNS records and, while waiting for DNS records to propagate, proxy traffic to the primary servers from the temp server. At the end of the day, it's up to the site owner to decide on the best course of action for their business, even if there are technical solutions that could be implemented here and now.

When you start hosting sites the size of EEVBlog you will quickly learn that you can't just cowboy things, because that 0.5s of downtime when you decided it won't hurt to just restart the HTTP service to make a config tweak will impact people.
« Last Edit: April 13, 2021, 12:38:07 am by gnif »
 
The following users thanked this post: Ed.Kloonk, xrunner, thm_w, Jacon, bd139

Offline SL4P

  • Super Contributor
  • ***
  • Posts: 2318
  • Country: au
  • There's more value if you figure it out yourself!
Re: The BIG EEVblog Server Fire
« Reply #156 on: April 13, 2021, 01:27:39 am »
Simple question.
Why was there water damage in the datacenter?  Surely the backup power was in an adjacent building or basement ?
Don't ask a question if you aren't willing to listen to the answer.
 

Offline gnif (Topic starter)

  • Administrator
  • *****
  • Posts: 1672
  • Country: au
Re: The BIG EEVblog Server Fire
« Reply #157 on: April 13, 2021, 01:29:20 am »
Simple answer, read through this thread. GS are still recovering and are yet to release details.
 

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 7990
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #158 on: April 13, 2021, 01:37:39 am »
I looked at google street view and only one exhaust pipe for a generator, near the electrical room.
The facility appears to be in some old warehouse (military?) district with brick exterior walls and a wooden roof? If true that's a problem.

Had to laugh, not a solar panel in sight.

What on earth makes you think that's a wooden roof?
 
The following users thanked this post: thm_w

Online NiHaoMike

  • Super Contributor
  • ***
  • Posts: 8973
  • Country: us
  • "Don't turn it on - Take it apart!"
Re: The BIG EEVblog Server Fire
« Reply #159 on: April 13, 2021, 02:45:44 am »
That's a great idea. One issue I can see for a hosting company is that they allow their customers to manage/reinstall the OS and apps. It would be difficult to enforce the installation of power-management software. It would be great if servers had a (let's say) 5 volt input, and when it dropped to below 1 volt, the BIOS would throttle the CPU down. Then you "just" run a 5 volt line off of non-UPS, non-generator mains to each server and you're golden.
I have hacked some PCs to do just that by using a small MOSFET to pull the PROCHOT line to ground.
That probably wouldn't work. If you're running near your CPU provision, you can enter a thing called "load hysteresis" which may be irrecoverable. This is where your load average goes above the total capacity and the CPUs can never catch up with the workload.  It requires adding much much more capacity than you had to start with before you can bring back the demand you had originally. Either that or breaking a huge chunk of your incoming load to recover.
That doesn't sound like something that should happen with a robustly designed service, wouldn't that mean a hacker who wants to take it down can just DDoS it for a short time and let queue overflow continue it for much longer than the initial attack?
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Offline drussell

  • Super Contributor
  • ***
  • Posts: 1855
  • Country: ca
  • Hardcore Geek
Re: The BIG EEVblog Server Fire
« Reply #160 on: April 13, 2021, 03:04:51 am »
How much can the backup power infrastructure be cut back if there's a system to force all CPUs to minimum frequency when running on backup power?

That's a great idea.

No, it is not.

You don't randomly force-throttle a server...   :palm:
In many cases that would be just as bad of a scenario and you might as well just have pulled the power.
 

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: The BIG EEVblog Server Fire
« Reply #161 on: April 13, 2021, 03:16:16 am »
How much can the backup power infrastructure be cut back if there's a system to force all CPUs to minimum frequency when running on backup power?

That's a great idea.

No, it is not.

You don't randomly force-throttle a server...   :palm:
In many cases that would be just as bad of a scenario and you might as well just have pulled the power.

Software should tell the hardware what to do, not the other way around.
iratus parum formica
 

Online NiHaoMike

  • Super Contributor
  • ***
  • Posts: 8973
  • Country: us
  • "Don't turn it on - Take it apart!"
Re: The BIG EEVblog Server Fire
« Reply #162 on: April 13, 2021, 03:36:03 am »
You don't randomly force-throttle a server...   :palm:
In many cases that would be just as bad of a scenario and you might as well just have pulled the power.
Why not if the software could recover once things are back to normal? Obviously, you wouldn't do that for critical real time applications like a VoIP server, but for something like a web server, I don't see why the software couldn't be designed to handle it gracefully. Perhaps whether or not a server gets throttled on backup power could be something that's reflected in the service fees (perhaps multiple levels that specify how much throttling and for how much time before partial refunds will be provided). I'd imagine that would do a lot to motivate the developers to make their programs able to tolerate it in order to take advantage of cheaper hosting.
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Offline drussell

  • Super Contributor
  • ***
  • Posts: 1855
  • Country: ca
  • Hardcore Geek
Re: The BIG EEVblog Server Fire
« Reply #163 on: April 13, 2021, 04:07:33 am »
Are you going to make sure the "backup" will now be in another room or at least several racks away?

Why not if the software could recover once things are back to normal? Obviously, you wouldn't do that for critical real time applications like a VoIP server, but for something like a web server, I don't see why the software couldn't be designed to handle it gracefully.

Oh, my....   :palm: 

You've obviously never had the joy of running any "real" servers.   :-DD  (even real web servers) 
 

Offline floobydust

  • Super Contributor
  • ***
  • Posts: 6923
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #164 on: April 13, 2021, 04:10:30 am »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?
 

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 7990
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #165 on: April 13, 2021, 04:27:54 am »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?

I see a structure which will have steel trusses. The deck could be anything up to and including pre-cast concrete panels.
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3683
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #166 on: April 13, 2021, 04:31:23 am »
Most cloud compute platforms offer some form of preemptable / flexible instance class that the provider can shut down at any time.  This allows them to reduce over-provisioning while serving peak loads to high paying customers but I'm sure it can also be used to throttle for power consumption purposes due to distribution capacity, backup generator size, or cooling.  You then pay less -- possibly much less -- per CPU hour.  It's not usually useful for a webserver since you can't control when users come to your site.  You can't get an HTTPS request and just say "I'll respond to that when the spot price drops."  There might be situations where a moderate amount of downtime isn't a big deal but it isn't the normal situation for commercial web hosting.

Cancelling low priority or low urgency jobs is much more effective than force throttling CPU frequency.  Even cancelling high priority jobs is sometimes the best approach if it gets you through a crunch without outright failure.  Throttling CPUs is not a great way to apply backpressure.  At first it does nothing but reduce the sleep/idle time fraction.  Then it generates congestion.  Only when the congestion degrades the service enough that people stop using it does it really reduce the workload.  Essentially you put human behavior in your feedback loop, and require poor quality of service to be effective.
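
A minimal sketch of what cancelling by priority could look like, as opposed to slowing everything down equally. The job names, priorities and cost units are invented for the example, and `shed_load` is a hypothetical helper, not any scheduler's real API.

```python
def shed_load(jobs, capacity):
    """Admit the most important jobs that fit within `capacity`; reject the rest.

    jobs: list of (name, priority, cost) tuples, where a lower priority
    number means more important and cost is in arbitrary capacity units.
    """
    admitted, rejected, used = [], [], 0
    # sorted() is stable, so jobs with equal priority keep their input order.
    for name, priority, cost in sorted(jobs, key=lambda job: job[1]):
        if used + cost <= capacity:
            admitted.append(name)
            used += cost
        else:
            rejected.append(name)
    return admitted, rejected


# Hypothetical workload on reduced (backup power) capacity of 100 units.
jobs = [
    ("web requests", 0, 50),
    ("nightly report", 2, 30),
    ("thumbnail rebuild", 3, 40),
    ("payment callbacks", 0, 10),
    ("search reindex", 1, 25),
]

admitted, rejected = shed_load(jobs, capacity=100)
print("admitted:", admitted)
print("rejected:", rejected)
```

The work that is kept runs at full speed and the rejected jobs can be retried later, so no queue builds up behind a slowed-down CPU.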
 

Offline floobydust

  • Super Contributor
  • ***
  • Posts: 6923
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #167 on: April 13, 2021, 04:49:21 am »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?

I see a structure which will have steel trusses. The deck could be anything up to and including pre-cast concrete panels.

The roof overhang on the loading docks is all wood, for many of the buildings. They might be cheaply built, whatever era they are from. There's no rooftop A/C on any of the buildings so they might not support any weight beyond snow load.
Point is, a datacenter should be constructed entirely of non-combustible building materials IMHO.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #168 on: April 13, 2021, 06:06:40 am »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?

Wood roofs are typical on that sort of building, I don't recall the name they call buildings of that style but the walls are cast in place reinforced concrete with wooden beams running across, supporting a plywood roof that is covered in a weatherproof outer layer. All three of the light industrial complexes my friends' shop has been in were built that way.
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23017
  • Country: gb
Re: The BIG EEVblog Server Fire
« Reply #169 on: April 13, 2021, 07:40:22 am »
That's a great idea. One issue I can see for a hosting company is that they allow their customers to manage/reinstall the OS and apps. It would be difficult to enforce the installation of power-management software. It would be great if servers had a (let's say) 5 volt input, and when it dropped to below 1 volt, the BIOS would throttle the CPU down. Then you "just" run a 5 volt line off of non-UPS, non-generator mains to each server and you're golden.
I have hacked some PCs to do just that by using a small MOSFET to pull the PROCHOT line to ground.
That probably wouldn't work. If you're running near your CPU provision, you can enter a thing called "load hysteresis" which may be irrecoverable. This is where your load average goes above the total capacity and the CPUs can never catch up with the workload.  It requires adding much much more capacity than you had to start with before you can bring back the demand you had originally. Either that or breaking a huge chunk of your incoming load to recover.
That doesn't sound like something that should happen with a robustly designed service, wouldn't that mean a hacker who wants to take it down can just DDoS it for a short time and let queue overflow continue it for much longer than the initial attack?

DDoS is not something you handle at the service level. There’s no way to sink one. I mean how do you scale up something (taking our shit as an example) from aggregate average 700Mbit out to say 40Gbit out which is your saturation boundary? Even if you can scale up enough nodes horizontally it’s unlikely to be sensible or even reasonable to add that capacity. I mean we’re not going to add 57x the database capacity on the fly when our customer base is fairly static.

In this case we pay the provider to handle it. They do traffic analysis and if they detect an incoming DDoS then they drop the traffic from it at their network edge.

Edit: there is a mid ground though which is perfectly valid where you are unexpectedly successful. This can make or break a product. I’ve seen both outcomes.
« Last Edit: April 13, 2021, 07:44:36 am by bd139 »
 

Offline calzap

  • Frequent Contributor
  • **
  • Posts: 437
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #170 on: April 13, 2021, 05:25:36 pm »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?
Wood roofs are typical on that sort of building, I don't recall the name they call buildings of that style but the walls are cast in place reinforced concrete with wooden beams running across, supporting a plywood roof that is covered in a weatherproof outer layer. All three of the light industrial complexes my friends' shop has been in were built that way.
Often done as tilt-up construction.  Floor slab is poured.  After it hardens, forms for wall sections are put on the floor, and the wall section concrete is poured.  After wall sections are hard, they are hoisted into place with a crane.  Has been most popular in the U.S., Australia and NZ.   Wikipedia describes it pretty well.

Mike in California
 

Offline tautech

  • Super Contributor
  • ***
  • Posts: 28136
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: The BIG EEVblog Server Fire
« Reply #171 on: April 13, 2021, 06:20:50 pm »
What on earth makes you think that's a wooden roof?

I see a (modified) bitumen roof with wood fascia. A "fire suppression system" does nothing if that lights up.
It's a common problem in building fires here, the roof and trusses (which are above all the sprinklers) can start burning and the fire crawls along the roof. Firefighters have to dump water despite the interior not being on fire at all.
I would think buildings in the UK have similar issues, with flat tar roofs?
Wood roofs are typical on that sort of building, I don't recall the name they call buildings of that style but the walls are cast in place reinforced concrete with wooden beams running across, supporting a plywood roof that is covered in a weatherproof outer layer. All three of the light industrial complexes my friends' shop has been in were built that way.
Often done as tilt-up construction.  Floor slab is poured.  After it hardens, forms for wall sections are put on the floor, and the wall section concrete is poured.  After wall sections are hard, they are hoisted into place with a crane.  Has been most popular in the U.S., Australia and NZ.   Wikipedia describes it pretty well.

Mike in California
Would be nice if it were so, but sadly not, as few slab layers are capable of laying a perfectly flat slab, so here we have had professional tilt slab companies for some decades. Finished and cured tilt slabs prefitted with lifting eyes are trucked on edge from the tilt slab plants to construction sites, where they are lifted from trucks and placed straight into position, where they are strapped to adjoining slabs and temporarily held vertical with adjustable building props until roofing trusses and 2nd story steelwork can be fitted, at which time they are considered a safe structure.
Mini tornado flattened some stood and braced slabs a couple years back killing some workers.
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 

Offline Nusa

  • Super Contributor
  • ***
  • Posts: 2416
  • Country: us
Re: The BIG EEVblog Server Fire
« Reply #172 on: April 13, 2021, 06:24:25 pm »
You're talking about modern construction techniques. The building in question was built during WWII using brick and steel construction. The whole complex (Defense Depot Ogden) remained military until about 25 years ago, and is now known as Business Depot Ogden. Some of the warehouses are still owned and operated by the military, but the complex as a whole is now commercial.

There's an interior picture of one of the old warehouses here, clearly showing the steel construction inside the shell:
https://www.boyerbdo.com/history/
 

Offline floobydust

  • Super Contributor
  • ***
  • Posts: 6923
  • Country: ca
Re: The BIG EEVblog Server Fire
« Reply #173 on: April 13, 2021, 08:19:50 pm »
Yes, wood roof on steel trusses mostly, there are huge timbers in old pics but a newer pic (brewery) shows them replaced with new concrete and steel. I imagine the buildings are all renovated. Back then lumber was plentiful.
It's very difficult with old buildings to meet modern NFPA fire safety codes. They usually don't have enough exits or an updated fire-suppression system. Sprinklers being under the roof are useless and in the pic pipes extend outside, under the overhang - those pipes freeze here in winter.

The military history of the area is incredible: 5,000 POWs working there in WWII, B-17s being assembled.
"The soil and groundwater beneath Business Depot Ogden have been polluted with trichloroethylene, vinyl chloride, arsenic, lead, cadmium, mercury, barium and pesticides. The toxic mix of chemicals is the result of decades of cavalier disposal and burning of military-grade trash at the 1,100-acre former military facility, according to EPA records."
 
