Author Topic: Server down 13th August  (Read 6849 times)

Offline EEVblogTopic starter

  • Administrator
  • *****
  • Posts: 37626
  • Country: au
    • EEVblog
Server down 13th August
« on: August 13, 2021, 07:24:37 am »
The server was brought down today because the 2nd mirror server FINALLY came back online at GorillaServers after the fire in April. Yes, it took that long.
This needs lots of tweaks to get working again; gnif is working furiously on it.
 
The following users thanked this post: SeanB, bd139

Offline EEVblogTopic starter

Re: Server down 13th August
« Reply #1 on: August 13, 2021, 07:38:02 am »
Work complete, should be fully operational again thanks to gnif  :-+
 
The following users thanked this post: SeanB, bingo600, gnif, Dr. Frank, thm_w, Andy Watson, Brumby, Ian.M, TheSteve, RoGeorge, 2N3055, bd139

Offline EEVblogTopic starter

Re: Server down 13th August
« Reply #2 on: August 14, 2021, 03:59:22 am »
You have to  :-DD
The new server has failed within 24 hours!
Power failed or was shut off it seems!
Forum still works because it's a redundant system, but may be a bit slower.
 
The following users thanked this post: bd139

Offline EEVblogTopic starter

Re: Server down 13th August
« Reply #3 on: August 14, 2021, 07:24:37 am »
Update:
Quote
Our tech replaced the memory, and reapplied the thermal paste on the CPU just to be safe. We ran our tests for a couple of hours and found no issues.

Should I send them a dumpster PC? It's probably more reliable  :-DD

They are damn lucky they are cheap, but now we know why...
« Last Edit: August 14, 2021, 07:26:10 am by EEVblog »
 
The following users thanked this post: SeanB, bd139

Online Ian.M

  • Super Contributor
  • ***
  • Posts: 12754
Re: Server down 13th August
« Reply #4 on: August 14, 2021, 07:51:35 am »
WTF??? If the memory was compromised due to the April 4th fire and water damage event, the whole motherboard will also be compromised.  It smells like they can't diagnose the failure so are 'shotgunning' it with unnecessary parts replacement.  Be glad it's their box you are renting!

OTOH, refitting heatsinks with fresh thermal paste is the proper thing to do if there is any possibility that a system has been handled roughly enough to disturb them.
« Last Edit: August 14, 2021, 07:54:11 am by Ian.M »
 

Offline EEVblogTopic starter

Re: Server down 13th August
« Reply #5 on: August 16, 2021, 10:38:41 am »
It's failed AGAIN!  :palm:
 

Offline EEVblogTopic starter

Re: Server down 13th August
« Reply #6 on: August 16, 2021, 10:44:26 am »
Reported error
 

Offline rsjsouza

  • Super Contributor
  • ***
  • Posts: 5974
  • Country: us
  • Eternally curious
    • Vbe - vídeo blog eletrônico
Re: Server down 13th August
« Reply #7 on: August 16, 2021, 10:50:04 am »
These situations always remind me of this classic:
 
The following users thanked this post: PlainName

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 6231
  • Country: ca
  • Non-expert
Re: Server down 13th August
« Reply #8 on: August 16, 2021, 08:44:15 pm »
Quote from: Ian.M
WTF??? If the memory was compromised due to the April 4th fire and water damage event, the whole motherboard will also be compromised.  It smells like they can't diagnose the failure so are 'shotgunning' it with unnecessary parts replacement.  Be glad it's their box you are renting!

OTOH, refitting heatsinks with fresh thermal paste is the proper thing to do if there is any possibility that a system has been handled roughly enough to disturb them.

Isn't the whole point of server gear that you just swap out everything, unless you are 100% certain of the problem?
Then test it outside on your own time if you really want to reuse it. Possibly for more than a few hours; I've had memtest issues show up at the 3hr+ mark.
 

Online Ian.M

  • Super Contributor
  • ***
  • Posts: 12754
Re: Server down 13th August
« Reply #9 on: August 16, 2021, 09:07:07 pm »
Quote from: thm_w
Isn't the whole point of server gear that you just swap out everything, unless you are 100% certain of the problem?
Then test it outside on your own time if you really want to reuse it. Possibly for more than a few hours; I've had memtest issues show up at the 3hr+ mark.

As you point out, full replacement would have been the best option for everything, but we know GorillaServers are 'bargain-basement', so 'swap out everything' probably isn't in their 'game-plan' unless a large customer is holding their nuts to the fire!

My point was that any mishap that could damage previously functioning memory would be equally likely to damage the motherboard.  OTOH if they were total muppets and removed all the RAM to clean it in bulk without appropriate ESD precautions they *could* be seeing a higher than normal incidence of (self-inflicted) RAM failures.
 
The following users thanked this post: thm_w

