
The BIG EEVblog Server Fire


gnif:

--- Quote from: peter-h on April 09, 2021, 07:50:58 am ---I run a few sites on virtual servers, with various backup policies (which I won't write about openly for obvious reasons) and if the virtual server company blew up and vanished for ever, I could start up a backup server which is a media PC running on an FTTP (80/30mbps) ADSL line :) That would actually be fast enough for EEVBLOG, on a bad day.

--- End quote ---

And how would you know how much bandwidth EEVBlog uses each day? Outgoing traffic averages 90Mbps, peaking at 200Mbps during high-activity periods such as giveaways, etc.


--- Quote ---It is practically impossible to lose everything, in this setup, and it is very cheap.
--- End quote ---

What if someone deletes all the files on your server and rsync does its dutiful job and deletes them on your home machine too? Or worse, some files are corrupted and you don't detect this for a few weeks? rsync is not a recommended enterprise-grade backup solution; I suggest you look into BareOS (free) or R1Soft (commercial).
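
To illustrate the difference: a blind mirror like rsync -a --delete faithfully replicates deletions and corruption into the backup, whereas date-stamped snapshots with a retention window keep history you can roll back to. Here is a rough Python sketch of the snapshot-plus-retention idea using rsync's --link-dest hard-link trick (the paths, host and 180-day window are made up for illustration, this is not how EEVBlog is backed up):

--- Code: ---
#!/usr/bin/env python3
"""Illustrative only: daily hard-linked rsync snapshots with a retention window.
The source, destination and 180-day retention are made-up example values."""
import datetime
import pathlib
import shutil
import subprocess

SOURCE = "backup@example.com:/var/www/"      # hypothetical remote source
DEST = pathlib.Path("/srv/backups/example")  # hypothetical local snapshot root
RETAIN_DAYS = 180                            # roughly 6 months of daily snapshots

def take_snapshot():
    snapshot = DEST / datetime.date.today().isoformat()
    previous = sorted(p for p in DEST.iterdir() if p.is_dir())
    cmd = ["rsync", "-a", SOURCE, str(snapshot) + "/"]
    if previous:
        # Unchanged files are hard-linked against the newest existing snapshot,
        # so each day only costs the space of what actually changed.
        cmd.insert(1, "--link-dest=" + str(previous[-1]))
    subprocess.run(cmd, check=True)

def prune_old():
    cutoff = (datetime.date.today() - datetime.timedelta(days=RETAIN_DAYS)).isoformat()
    for snap in DEST.iterdir():
        # ISO dates sort lexically, so a string comparison is enough here
        if snap.is_dir() and snap.name < cutoff:
            shutil.rmtree(snap)

if __name__ == "__main__":
    DEST.mkdir(parents=True, exist_ok=True)
    take_snapshot()
    prune_old()
--- End code ---

Even then, a scheme like this only limits the damage from propagated deletions; it does nothing about verification, cataloguing or bare-metal restore, which is exactly what the purpose-built tools handle for you.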

EEVBlog is backed up daily to two remote locations with a 6 month data retention policy. Restoration to a bare metal server can be done in an hour or two (depending on network speed), and server configuration is performed via Puppet, which takes mere seconds. Total production-ready stand-up time from bare metal is restore time + 10-20 seconds. With a warm backup (which we will be looking into) downtime would be nearly zero in the event of a failure like this in the future.

One has to weigh up the time vs cost of standing up a new server when things like this happen. Sure, Dave could have paid for a server elsewhere (in fact, we had several offers of temp servers), however this makes things more complex when it comes to decommissioning those servers once they are no longer needed: sync the temp server back to the primary, change over the DNS records, and, while waiting for DNS to propagate, proxy traffic from the temp server to the primary. At the end of the day, it's up to the site owner to decide on the best course of action for their business, even if there are technical solutions that could be implemented here and now.

When you start hosting sites the size of EEVBlog you quickly learn that you can't just cowboy things, because that 0.5s of downtime when you decided it won't hurt to just restart the HTTP service for a config tweak will impact people.

SL4P:
Simple question.
Why was there water damage in the datacenter? Surely the backup power was in an adjacent building or basement?

gnif:
Simple answer, read through this thread. GS are still recovering and are yet to release details.

Monkeh:

--- Quote from: floobydust on April 12, 2021, 02:15:51 am ---I looked at google street view and only one exhaust pipe for a generator, near the electrical room.
The facility appears to be in some old warehouse (military?) district with brick exterior walls and a wooden roof? If true that's a problem.

Had to laugh, not a solar panel in sight.

--- End quote ---

What on earth makes you think that's a wooden roof?

NiHaoMike:

--- Quote from: duckduck on April 12, 2021, 06:41:00 pm ---That's a great idea. One issue I can see for a hosting company is that they allow their customers to manage/reinstall the OS and apps. It would be difficult to enforce the installation of power-management software. It would be great if servers had a (let's say) 5 volt input, and when it dropped to below 1 volt, the BIOS would throttle the CPU down. Then you "just" run a 5 volt line off of non-UPS, non-generator mains to each server and you're golden.

--- End quote ---
I have hacked some PCs to do just that by using a small MOSFET to pull the PROCHOT line to ground.
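
Something along these lines; the board, pin numbers, divider ratio and threshold below are illustrative assumptions (MicroPython on a small dev board), not the exact circuit I used:

--- Code: ---
# Illustrative MicroPython sketch (assumed RP2040-style board): watch a
# mains-derived sense voltage and assert PROCHOT# via a MOSFET when it drops.
# Pin numbers, divider ratio and the 1 V threshold are assumptions for the example.
import time
from machine import ADC, Pin

SENSE = ADC(26)                            # ADC fed from the 5 V sense line via a divider
PROCHOT_GATE = Pin(15, Pin.OUT, value=0)   # drives the MOSFET gate; high = throttle

DIVIDER = 2.0          # 2:1 resistor divider keeps 5 V within the 3.3 V ADC range
THRESHOLD_V = 1.0      # throttle when the sense line falls below about 1 V

def sense_volts():
    # read_u16() returns 0..65535 for 0..3.3 V on MicroPython ADC objects
    return (SENSE.read_u16() / 65535) * 3.3 * DIVIDER

while True:
    PROCHOT_GATE.value(1 if sense_volts() < THRESHOLD_V else 0)
    time.sleep_ms(100)
--- End code ---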

--- Quote from: bd139 on April 12, 2021, 07:10:03 pm ---That probably wouldn't work. If you're running near your CPU provision, you can enter a thing called "load hysteresis", which may be irrecoverable. This is where your load average goes above the total capacity and the CPUs can never catch up with the workload. It requires adding much, much more capacity than you had to start with before you can serve the demand you had originally. Either that or breaking off a huge chunk of your incoming load to recover.

--- End quote ---
That doesn't sound like something that should happen with a robustly designed service. Wouldn't that mean a hacker who wants to take it down can just DDoS it for a short time and let the queue backlog keep it down for much longer than the initial attack?
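
A toy back-of-the-envelope model (all numbers invented) does show why the backlog can outlast the spike, even if "irrecoverable" seems a stretch for a service with sane timeouts and load shedding:

--- Code: ---
# Toy queueing model of the "backlog outlives the spike" effect described
# above. Every rate here is an invented number, purely for illustration.
CAPACITY = 100          # requests the service can complete per second
NORMAL_LOAD = 90        # steady-state arrivals per second
SPIKE_LOAD = 250        # arrivals per second during a 60 s spike/DDoS
SPIKE_START, SPIKE_END = 100, 160

backlog = 0
for t in range(600):
    arrivals = SPIKE_LOAD if SPIKE_START <= t < SPIKE_END else NORMAL_LOAD
    backlog = max(0, backlog + arrivals - CAPACITY)
    if t % 60 == 0:
        print(f"t={t:3d}s  backlog={backlog} queued requests")
--- End code ---

With only 10 requests/second of headroom, the 60 second spike leaves about 9000 queued requests that take a further 15 minutes to drain, and if clients time out and retry, the backlog can grow faster than it drains, which is presumably the runaway being described.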
