Author Topic: Website & Forum Outage 10th Dec 2016  (Read 7829 times)


Offline EEVblog

  • Administrator
  • *****
  • Posts: 25884
  • Country: au
    • EEVblog
Website & Forum Outage 10th Dec 2016
« on: December 10, 2016, 08:31:22 pm »
Down for over 12 hours.
It was the fault of Hostgator:
https://forums.hostgator.com/12-09-16-15-00-multiple-t345474.html

Cue the people who say "switch to a real host" in 3, 2, 1...
 
The following users thanked this post: jonovid

Offline ez24

  • Super Contributor
  • ***
  • Posts: 3052
  • Country: us
  • L.D.A.
Re: Website & Forum Outage 10th Dec 2016
« Reply #1 on: December 10, 2016, 08:38:04 pm »
It was tough going
YouTube and Website Electronic Resources ------>  http://www.eevblog.com/forum/other-blog-specific/a/msg1341166/#msg1341166
 

Offline NottheDan

  • Frequent Contributor
  • **
  • Posts: 271
  • Country: gb
Re: Website & Forum Outage 10th Dec 2016
« Reply #2 on: December 10, 2016, 08:38:26 pm »
Switch to a real gator, mate  :-DD


Just getting it out of the way.
 

Offline wilfred

  • Super Contributor
  • ***
  • Posts: 4976
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #3 on: December 10, 2016, 08:40:42 pm »
Why don't you switch to a real host?

Quote
We have identified a packet filtering problem in our core routing layer. We have worked closely with our vendor to develop a global fix and are in the process of applying it. While there will be clean up, we anticipate making quick progress to recover everyone and expect services to be restored soon.

What is that? How do they suddenly need to get a vendor to develop a fix for a problem they didn't have before? Did they ram in some network software upgrade and not have a way to revert to the previous software level?

A 12-hour outage that seems self-inflicted is pretty poor.
 

Online TheSteve

  • Supporter
  • ****
  • Posts: 2410
  • Country: ca
  • GHz or bust
Re: Website & Forum Outage 10th Dec 2016
« Reply #4 on: December 10, 2016, 08:47:19 pm »
I highly doubt they have any intention of admitting what really happened.
VE7FM
 

Offline Muttley Snickers

  • Supporter
  • ****
  • Posts: 1694
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #5 on: December 10, 2016, 09:21:07 pm »
I learnt how to change my nickname on the EEVblog IRC site, and then I learnt that I could join the IRC channel from my smart TV. This wouldn't have happened if the forum had been up and running, so not all was wasted really.

That's all I have to offer up in the way of positives.    ::)
One smart cookie, better make that two for good measure.
 

Offline SeanB

  • Super Contributor
  • ***
  • Posts: 14705
  • Country: za
Re: Website & Forum Outage 10th Dec 2016
« Reply #6 on: December 10, 2016, 09:28:01 pm »
Not a problem, just went and got through a little of the YT backlog, and went to bed.
 

Offline jonovid

  • Frequent Contributor
  • **
  • Posts: 684
  • Country: au
    • JONOVID
Re: Website & Forum Outage 10th Dec 2016
« Reply #7 on: December 10, 2016, 10:03:31 pm »
Quote
Down for over 12 hours.
It was the fault of Hostgator:
Dave, maybe we're gonna need a bigger server  ;D
Hobby of evil genius      basic knowledge of electronics
 

Offline timb

  • Super Contributor
  • ***
  • Posts: 2528
  • Country: us
  • Pretentiously Posting Polysyllabic Prose
    • timb.us
Re: Website & Forum Outage 10th Dec 2016
« Reply #8 on: December 10, 2016, 11:19:46 pm »
Jesus, I don't know how HostGator is still in business... They were a crappy, subpar host 10 years ago when I was in the hosting industry. I guess not much changes...

Dave, if you ever decide to rent a dedicated VM on a cluster somewhere, I've got some good recommendations. It's a lot cheaper than you think! I'd also be happy to administrate it for you.
Any sufficiently advanced technology is indistinguishable from magic; e.g., Cheez Whiz, Hot Dogs and RF.
 

Offline FrankBuss

  • Supporter
  • ****
  • Posts: 1890
  • Country: de
    • Frank Buss
Re: Website & Forum Outage 10th Dec 2016
« Reply #9 on: December 11, 2016, 12:45:26 am »
Well, why don't you switch to a real host? Did you at least negotiate unlimited traffic, as is standard e.g. at http://1and1.com , where my server runs? (I have never had a 12-hour outage there, just a short reboot in the last year. I recommend it to my customers, too; they would lose quite some money with 12-hour outages.) Same at http://www.ovh.com . Looks like many of the big providers offer this.

A long time ago I had a server with a traffic cap; exceeding it triggered a notification and additional costs. It got hacked and was used as a warez server for a day until I fixed it. The cost that month was multiple times the normal cost.
So Long, and Thanks for All the Fish
 

Offline Jay_Diddy_B

  • Super Contributor
  • ***
  • Posts: 1625
  • Country: ca
Re: Website & Forum Outage 10th Dec 2016
« Reply #10 on: December 11, 2016, 01:33:04 am »
Hi group,

When there is an outage like this, in my mind it just reinforces what a great thing we have here.  :-+

I have to admit that I was getting withdrawal...

Regards,

Jay_Diddy_B
 
The following users thanked this post: SeanB, dr.diesel, SL4P

Offline Brumby

  • Supporter
  • ****
  • Posts: 6963
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #11 on: December 11, 2016, 12:22:08 pm »
Aside from the separation anxiety, I noticed I knocked off a couple more items from my 'To Do' list than I expected.  Luckily SWMBO didn't.
 

Offline ez24

  • Super Contributor
  • ***
  • Posts: 3052
  • Country: us
  • L.D.A.
Re: Website & Forum Outage 10th Dec 2016
« Reply #12 on: December 11, 2016, 12:45:12 pm »
Is this outage a record?
YouTube and Website Electronic Resources ------>  http://www.eevblog.com/forum/other-blog-specific/a/msg1341166/#msg1341166
 

Offline SeanB

  • Super Contributor
  • ***
  • Posts: 14705
  • Country: za
Re: Website & Forum Outage 10th Dec 2016
« Reply #13 on: December 11, 2016, 04:53:06 pm »
Quote
Is this outage a record?

No, there were 40 years of Dave's life when the blog was not available. Don't know how he survived that.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 25884
  • Country: au
    • EEVblog
Re: Website & Forum Outage 10th Dec 2016
« Reply #14 on: December 15, 2016, 03:41:25 pm »
 

Offline djos

  • Supporter
  • ****
  • Posts: 778
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #15 on: December 15, 2016, 03:49:08 pm »
Quote
Down for over 12 hours.
It was the fault of Hostgator:
https://forums.hostgator.com/12-09-16-15-00-multiple-t345474.html

Cue the people who say "switch to a real host" in 3, 2, 1...

Maybe if you switched to a high-end hosting provider like AWS.... oh wait...  :-DD
The impossible often has a kind of integrity which the merely improbable lacks.
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 11740
  • Country: nz
    • Taupaki Technologies Ltd.
Re: Website & Forum Outage 10th Dec 2016
« Reply #16 on: December 15, 2016, 03:55:44 pm »
Avid Rabid Hobbyist & NZ Siglent Distributor
 

Offline djos

  • Supporter
  • ****
  • Posts: 778
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #17 on: December 15, 2016, 03:58:09 pm »
Condescending crap.

Mostly yes, but I don't doubt they broke their network and a spanning tree loop (or propagation) compounded it. I've seen it happen in several orgs while working as a Major Incident Manager.
The impossible often has a kind of integrity which the merely improbable lacks.
 

Offline wilfred

  • Super Contributor
  • ***
  • Posts: 4976
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #18 on: December 15, 2016, 04:07:23 pm »
I'm not an expert in networks but that sounds like baffling the customer with jargon.
I found this link which answered a few of my questions. http://www.enterprisenetworkingplanet.com/netsp/article.php/3580966/Networking-101-Understanding-Spanning-Tree.htm

"we are working diligently to proactively mitigate and prevent future outages" yeah NOW!

As I read that explanation, it seems they made a faulty network topology change that introduced a loop in the network, and it took 16 hours to find and eliminate it. And now they are going to introduce a more contemporary network topology configuration method that utilises the network infrastructure more efficiently and is less likely to fail totally.

It still seems self-inflicted, and I'm surprised there isn't a way of simulating network topology changes to test for introduced loops. Or, if a link fails, that there isn't an automated response to activate another link that is pre-configured and known not to lead to a loop. Any link can fail and there has to be a defined alternate for it.
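
On the idea of testing a topology change for loops before applying it: purely as an illustration, a proposed set of switch-to-switch links can be checked offline with a union-find pass. This is not how Spanning Tree Protocol itself works, and it says nothing about what HostGator actually runs. The switch names and the `find_loop_links` helper are made up for the example.

```python
def find_loop_links(links):
    """Return the links that would close a loop in an undirected topology.

    `links` is a list of (switch_a, switch_b) pairs. Hypothetical helper,
    for illustration only.
    """
    parent = {}

    def find(node):
        # Each unseen node starts as its own root.
        parent.setdefault(node, node)
        while parent[node] != node:
            # Path halving keeps the trees shallow.
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected, so this link closes a loop.
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant


# Made-up topology: the current links are loop-free, the proposed change is not.
current = [("core1", "dist1"), ("core1", "dist2"), ("dist1", "access1")]
proposed = current + [("dist2", "access1")]

print(find_loop_links(current))
print(find_loop_links(proposed))
```

A redundant link is not a fault in itself (it is how you get failover), but every link this reports is one that something has to keep blocked until it is needed.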

I'd be interested in the comments of someone with network experience.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 25884
  • Country: au
    • EEVblog
Re: Website & Forum Outage 10th Dec 2016
« Reply #19 on: December 15, 2016, 04:10:50 pm »
Quote
Down for over 12 hours.
It was the fault of Hostgator:
https://forums.hostgator.com/12-09-16-15-00-multiple-t345474.html
Cue the people who say "switch to a real host" in 3, 2, 1...
Maybe if you switched to a high-end hosting provider like AWS.... oh wait...  :-DD

Bingo.
Everyone thinks that their hosting provider's shit doesn't stink.
 

Offline djos

  • Supporter
  • ****
  • Posts: 778
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #20 on: December 15, 2016, 04:15:24 pm »
Quote
Maybe if you switched to a high-end hosting provider like AWS.... oh wait...  :-DD

Bingo.
Everyone thinks that their hosting provider's shit doesn't stink.

The amusing thing about that was their system's total inability to ride out external power sags on UPS and start the gensets before the batteries drained: a total DC 101 fail!
The impossible often has a kind of integrity which the merely improbable lacks.
 

Offline Muttley Snickers

  • Supporter
  • ****
  • Posts: 1694
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #21 on: December 15, 2016, 04:20:27 pm »
I don't know enough about ISPs and hosting services to make any logical comment but I do know one thing, at least they had the decency to offer up an explanation and apology which is more than some other mobs do.

On the day in question, the error message I kept getting showed that the EEVblog Cloudflare host located in Hong Kong was offline. I should have saved a snapshot to verify but didn't. I'm still getting the odd brief outage for a few minutes here and there, as recently as this morning.
One smart cookie, better make that two for good measure.
 

Offline wilfred

  • Super Contributor
  • ***
  • Posts: 4976
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #22 on: December 15, 2016, 04:50:41 pm »
Quote
I do know one thing, at least they had the decency to offer up an explanation and apology which is more than some other mobs do.
It is also more than this mob are doing. You won't find the words "apologise" or "sorry" in that letter. For a letter not intended for network gurus it has so much waffle and jargon that for mere customers it is no explanation at all.

It is a grovelling admission that they f&*$ked up big time.

Like insurance companies' and banks', ISPs' shit stinks, a lot. The stink is the 16 hours to sort it out, not that it happened.

"Please allow me to take this opportunity to thank you for your business and provide my personal assurance that we are dedicated to meeting our commitment to you."
If that is true today, it wasn't a week ago. Because if it was true, the proposed changes in response to this debacle would have been done long ago. And in that event this statement would also have been true "we are working diligently to proactively mitigate and prevent future outages". ie. this one that just happened.

No, HG's shit stinks, I can smell it in Australia.
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 11740
  • Country: nz
    • Taupaki Technologies Ltd.
Re: Website & Forum Outage 10th Dec 2016
« Reply #23 on: December 15, 2016, 05:12:21 pm »
Quote
I don't know enough about ISPs and hosting services to make any logical comment but I do know one thing, at least they had the decency to offer up an explanation and apology which is more than some other mobs do.

On the day in question, the error message I kept getting showed that the EEVblog Cloudflare host located in Hong Kong was offline. I should have saved a snapshot to verify but didn't. I'm still getting the odd brief outage for a few minutes here and there, as recently as this morning.
Hello, mine showed Sydney, what's going on here ?  :-//

Yep, I should have grabbed a screenshot too.
« Last Edit: December 15, 2016, 05:22:39 pm by tautech »
Avid Rabid Hobbyist & NZ Siglent Distributor
 

Online mtdoc

  • Super Contributor
  • ***
  • Posts: 3114
  • Country: us
Re: Website & Forum Outage 10th Dec 2016
« Reply #24 on: December 15, 2016, 05:21:34 pm »
All I know is that of the 4 forums I regularly visit, this is the only one that has regular outages....
 

Offline FrankBuss

  • Supporter
  • ****
  • Posts: 1890
  • Country: de
    • Frank Buss
Re: Website & Forum Outage 10th Dec 2016
« Reply #25 on: December 15, 2016, 07:24:57 pm »
Quote
Down for over 12 hours.
It was the fault of Hostgator:
https://forums.hostgator.com/12-09-16-15-00-multiple-t345474.html
Cue the people who say "switch to a real host" in 3, 2, 1...
Maybe if you switched to a high-end hosting provider like AWS.... oh wait...  :-DD

Bingo.
Everyone thinks that their hosting provider's shit doesn't stink.
Well, that was a 2 hour outage at AWS because of a power outage. Hostgator was a 16 hour outage because of a configuration problem. Bigger datacenters like 1and1 have UPS, backup UPS and diesel generators for longer power outages.
So Long, and Thanks for All the Fish
 

Offline djos

  • Supporter
  • ****
  • Posts: 778
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #26 on: December 15, 2016, 07:34:26 pm »

Quote
Well, that was a 2 hour outage at AWS because of a power outage. Hostgator was a 16 hour outage because of a configuration problem. Bigger datacenters like 1and1 have UPS, backup UPS and diesel generators for longer power outages.

The outage was caused by poor configuration of the DC BMS and UPS integration.

The BMS didn't know how to handle a grid voltage sag (aka brownout) and just ran the UPS batteries flat because it didn't know to call up the generators.

I managed a tier 3 spec Colo DC up till a few years ago and this sort of Charlie Foxtrot just boggles the mind.
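
To make that failure mode concrete, here is a minimal sketch of the decision a BMS has to get right, assuming it only sees grid voltage and how long the UPS has been on battery. The function name, the 230 V nominal, the 90% sag threshold and the 30 second limit are all hypothetical values picked for the example; none of them come from the AWS incident report.

```python
def generator_should_start(grid_volts, seconds_on_battery,
                           nominal_volts=230.0, sag_fraction=0.9,
                           max_battery_seconds=30.0):
    """Decide whether to call up the generators (illustrative logic only).

    A sag (brownout) is treated exactly like a full outage: anything below
    the threshold counts as an unhealthy grid. All defaults are assumptions.
    """
    grid_unhealthy = grid_volts < nominal_volts * sag_fraction
    return grid_unhealthy and seconds_on_battery >= max_battery_seconds


# Full outage, brownout, and healthy grid after 45 s on battery.
print(generator_should_start(0.0, 45.0))
print(generator_should_start(200.0, 45.0))
print(generator_should_start(228.0, 45.0))
```

The bug described above amounts to the brownout case returning False forever, so the batteries carry the load until they are flat.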
The impossible often has a kind of integrity which the merely improbable lacks.
 

Offline Ice-Tea

  • Frequent Contributor
  • **
  • Posts: 913
  • Country: be
    • Freelance Hardware Engineer
Re: Website & Forum Outage 10th Dec 2016
« Reply #27 on: December 15, 2016, 07:55:00 pm »
Just wondering, can't get to the "show new replies" page for a few days now... I get this:

Code:
502 - Bad Gateway

We are sorry for the inconvenience that this error may be causing you. We are aware of the issue and are working to resolve it, please be patient.

There is no need to report this error.

Thank you for your patience,
Dave & gnif

Is that part of this issue or something else?
An engineer never has a problem. He just needs more time.

FS: TTi TSX1820P, TDS754A (w SPC fail), Agilent Infinium 54815A, 54825A, R&S CMU 200 (multiple units, various options) UPL, SMIQ03,Tektronix CSA8000B, HP 8594E, 8595E, Marconi 6201B (8GHz), IFR 2390A (22GHz), 2383 (4GHz)
 

Online MK14

  • Super Contributor
  • ***
  • Posts: 1786
  • Country: gb
Re: Website & Forum Outage 10th Dec 2016
« Reply #28 on: December 15, 2016, 07:56:31 pm »
Quote
Just wondering, can't get to the "show new replies" page for a few days now... I get this:

Code:
502 - Bad Gateway

We are sorry for the inconvenience that this error may be causing you. We are aware of the issue and are working to resolve it, please be patient.

There is no need to report this error.

Thank you for your patience,
Dave & gnif

Is that part of this issue or something else?

I had the same problem and solved it (today) by clicking on the web browser's refresh button.

http://www.eevblog.com/forum/chat/show-new-reply-to-your-posts-causes-502-forum-error/msg1091765/?topicseen#msg1091765
« Last Edit: December 15, 2016, 07:58:03 pm by MK14 »
 

Online sleemanj

  • Super Contributor
  • ***
  • Posts: 2092
  • Country: nz
  • Professional tightwad.
    • The electronics hobby components I sell.
Re: Website & Forum Outage 10th Dec 2016
« Reply #29 on: December 15, 2016, 08:04:39 pm »

Quote
Maybe if you switched to a high-end hosting provider like AWS.... oh wait...  :-DD

Yes, but with AWS you have a lot of power at your direct command and you can structure your systems to assume the sort of risk you want, from "availability zone in this region goes down, site is dead until it's fixed" to "this thing will keep running unless every Amazon region around the world has died".

But of course, you have to be able to do that setup. If it's not your day job, better to leave somebody else to do it for you, if for no other reason than that there are a lot of assholes and botnets out there who will stop at nothing to try and kill your site/server/network.  Some are smarter than others.

Not all of them are as kind as one I caught today, which kindly announced its UserAgent as "WebFuck V2.1 T0PHackTeam www.t0p.xyz" as it searched through some sites for exploits.
~~~
EEVBlog Members - get yourself 10% discount off all my electronic components for sale just use the Buy Direct links and use Coupon Code "eevblog" during checkout.  Shipping from New Zealand, international orders welcome :-)
 

Offline djos

  • Supporter
  • ****
  • Posts: 778
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #30 on: December 15, 2016, 08:07:13 pm »

Quote
Maybe if you switched to a high-end hosting provider like AWS.... oh wait...  :-DD

Yes, but with AWS you have a lot of power at your direct command and you can structure your systems to assume the sort of risk you want, from "availability zone in this region goes down, site is dead until it's fixed" to "this thing will keep running unless every Amazon region around the world has died".

But of course, you have to be able to do that setup. If it's not your day job, better to leave somebody else to do it for you, if for no other reason than that there are a lot of assholes and botnets out there who will stop at nothing to try and kill your site/server/network.  Some are smarter than others.

Not all of them are as kind as one I caught today, which kindly announced its UserAgent as "WebFuck V2.1 T0PHackTeam www.t0p.xyz" as it searched through some sites for exploits.

You clearly didn't click on the link: they lost the whole Sydney DC due to a poorly configured BMS not handling a grid voltage sag.

Moral of the story: it can happen to anyone.
The impossible often has a kind of integrity which the merely improbable lacks.
 

Online sleemanj

  • Super Contributor
  • ***
  • Posts: 2092
  • Country: nz
  • Professional tightwad.
    • The electronics hobby components I sell.
Re: Website & Forum Outage 10th Dec 2016
« Reply #31 on: December 15, 2016, 08:22:41 pm »
Quote
You clearly didn't click on the link: they lost the whole Sydney DC due to a poorly configured BMS not handling a grid voltage sag.

My point was that the people who were using the Sydney DC could have used the AWS services at their disposal to make their systems resilient to such a failure of an entire region, if they wanted to invest the time, effort and expense.
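
As a back-of-the-envelope illustration of that trade-off: if each site fails independently (a big assumption) and has availability a, then n redundant sites give a combined availability of 1 - (1 - a)^n. The 99.9% per-site figure below is a made-up number for the example, not a quoted AWS SLA, and `combined_availability` is just an illustrative helper.

```python
def combined_availability(per_site, sites):
    """Availability of `sites` redundant sites, assuming independent failures."""
    return 1.0 - (1.0 - per_site) ** sites


HOURS_PER_YEAR = 24 * 365

# Hypothetical 99.9% per-site availability, 1 to 3 sites.
for n in (1, 2, 3):
    a = combined_availability(0.999, n)
    downtime_hours = (1.0 - a) * HOURS_PER_YEAR
    print(f"{n} site(s): availability {a:.9f}, ~{downtime_hours:.5f} h/year down")
```

Correlated failures (a shared network fault, a bad config pushed everywhere at once) break the independence assumption, so real-world numbers will be worse than this.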

~~~
EEVBlog Members - get yourself 10% discount off all my electronic components for sale just use the Buy Direct links and use Coupon Code "eevblog" during checkout.  Shipping from New Zealand, international orders welcome :-)
 

Offline djos

  • Supporter
  • ****
  • Posts: 778
  • Country: au
Re: Website & Forum Outage 10th Dec 2016
« Reply #32 on: December 15, 2016, 09:05:49 pm »
Quote
You clearly didn't click on the link: they lost the whole Sydney DC due to a poorly configured BMS not handling a grid voltage sag.

My point was that the people who were using the Sydney DC could have used the AWS services at their disposal to make their systems resilient to such a failure of an entire region, if they wanted to invest the time, effort and expense.

Geographic diversity is no guarantee of 99.9999% uptime. I've managed 2 Salesforce outages in 6 months caused by their Japanese DC suffering networking failures.
The impossible often has a kind of integrity which the merely improbable lacks.
 

