Author Topic: How long should you run Server Hardware (just a friendly reminder)?

Offline peteb2 (Topic starter)

  • Regular Contributor
  • *
  • Posts: 242
  • Country: nz
Sure, the manufacturer will give you an idea because, well, they want to sell you new equipment. Sometimes, though, there isn't a budget, and sometimes things are forgotten and assumptions are made that they go on 'forever', or that if they are pulled out of service for a few months and reinstalled on another project they should be fine.

Today I witnessed first hand a reminder of just what's at risk if you leave a MAIN "mega-important" server (and similar hardware fail-over tray) running without sticking to what the manufacturer would have stipulated in their specs...

This is a standard 'workhorse IBM' e-Server tray... What concerns me is that for important gear like this, whose service life is being stretched, some best practice needs to be adopted, even something as minimal as a visual inspection, I guess.

Love to hear of others' experiences. (It was pure luck this failure was discovered and a massive issue averted)...

Have a look at the fantastic bend in the RAID controller strip module, all caused by the bulging Li-ion battery ...  ;)
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3642
  • Country: us
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #1 on: March 19, 2018, 09:52:59 am »
It seems rather that a depressingly low-budget shortcut was taken by the engineers (why am I not surprised that you said IBM?): using a pouch cell rather than an established product like an EaglePicher module.
 

Offline Towger

  • Super Contributor
  • ***
  • Posts: 1645
  • Country: ie
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #2 on: March 19, 2018, 09:57:02 am »
Depends. 
Buy servers with a full, all-singing 3 year warranty.  Replace once the warranty has 6 months left.

At the other end of the scale,
I switched off a 12 year old server a few months back.  Still going strong, but the caps on the PSU were getting weak.
 

Offline capt bullshot

  • Super Contributor
  • ***
  • Posts: 3033
  • Country: de
    • Mostly useless stuff, but nice to have: wunderkis.de
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #3 on: March 19, 2018, 10:18:44 am »
Quote from: peteb2
Have a look at the fantastic bend condition of the raid-controller strip module all caused by the bulging LiOn battery ...  ;)

That's quite the worst placement of a battery I've seen ...
Placing this battery in the middle of heat-generating electronics dooms it to fail early.

BTW, this design flaw is also commonly seen in low-end UPS units: they place heat-sensitive lead-acid batteries right next to the heat sources.

Safety devices hinder evolution
 

Offline Ian.M

  • Super Contributor
  • ***
  • Posts: 12860
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #4 on: March 19, 2018, 10:29:14 am »
+1

W.T.F. were the designers smoking?

It would have been better if they'd put it on the other side of the module.
With a well designed mounting frame and connector for the battery, it could even have been designed to self-disconnect if it bulged excessively.
 

Offline trys

  • Regular Contributor
  • *
  • Posts: 170
  • Country: gb
  • I started with the AC128
    • Trystan's Workbench
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #5 on: March 19, 2018, 10:30:36 am »
For one of our servers we bought another onsite warranty after the three year onsite warranty ran out. Then that warranty ran out too, and we bought a used spare server with the same spec and the same components so we could replace parts as they failed. Over the life of that particular server (Dell R610) the following items failed:

2x HDD (SAS, 15k)
RiLO card (the remote lights-out card); this brought the whole server down, and it would not boot
 

Offline xani

  • Frequent Contributor
  • **
  • Posts: 400
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #6 on: March 19, 2018, 12:57:09 pm »
In my experience, RAID controllers generally complain when their battery is in a bad state.

So it is not "was run too long", but "nobody bothered to fucking monitor the damn thing".

Servers, especially newer ones, generally have pretty good error propagation, and something like the IPMI log will contain everything from disk through RAM to failing-battery errors.

As for IBM, let's just say that you can clearly see the evolution when looking at newer models :D.
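
The monitoring point above can be sketched in a few lines of Python. This is only an illustration, assuming the event log has been dumped as text in the pipe-separated shape printed by `ipmitool sel list`. The sample rows and the keyword list are made up, and whether a RAID cache battery shows up in the IPMI log at all depends on the vendor and controller.

```python
import re

# Keywords that suggest a battery problem. This list is an assumption
# for illustration, not a vendor-defined set of sensor names.
BATTERY_PATTERN = re.compile(r"battery|bbu", re.IGNORECASE)


def find_battery_events(sel_text):
    """Return event-log rows whose sensor or event field mentions a battery."""
    events = []
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Expected row shape: id | date | time | sensor | event | state
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        if BATTERY_PATTERN.search(fields[3]) or BATTERY_PATTERN.search(fields[4]):
            events.append(" | ".join(fields))
    return events


# Hypothetical sample rows, not taken from a real server.
SAMPLE = """\
   1 | 03/01/2018 | 08:00:12 | Power Supply #0x63 | Presence detected | Asserted
   2 | 03/10/2018 | 14:22:40 | Battery #0x11 | Failed | Asserted
   3 | 03/12/2018 | 02:10:05 | Memory #0x53 | Correctable ECC | Asserted
"""

if __name__ == "__main__":
    for event in find_battery_events(SAMPLE):
        print(event)
```

Run from cron against the real log and mailed to whoever owns the box, something along these lines would probably have flagged the battery long before it bulged.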
 
The following users thanked this post: SeanB

Offline Mr. Scram

  • Super Contributor
  • ***
  • Posts: 9810
  • Country: 00
  • Display aficionado
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #7 on: March 19, 2018, 01:04:12 pm »
Your server comes with a little burrito tray? No fair, I want one too.
 

Offline krho

  • Regular Contributor
  • *
  • Posts: 223
  • Country: si
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #8 on: March 19, 2018, 06:43:35 pm »
If the data was not damaged then you are fine. An HP RAID controller overwrote approximately 10% of the array when the battery on its RAM developed a similar bulge.
AFAIR this happened around the 3 year mark.
 

Offline CJay

  • Super Contributor
  • ***
  • Posts: 4136
  • Country: gb
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #9 on: March 19, 2018, 07:31:53 pm »
Quote from: xani on March 19, 2018, 12:57:09 pm
From my experience RAID controllers generally complain when their battery is at bad state.

So it is not "was run too long", but "nobody bothered to fucking monitor the damn thing".

Servers, especially newer ones, generally have pretty good error propagation and something like IPMI log will contain everything from disk thru RAM to failing battery errors

As for IBM let's just say that you can clearly see evolution when looking at newer models :D.

What he said above: RAID controllers monitor their cache batteries; that server wasn't looked after or monitored properly by the owner.
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3719
  • Country: us
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #10 on: March 20, 2018, 04:34:46 am »
That is obviously a defective design/part.

Given that, I am not convinced there is any sense to a "replace after X years" rule for mass market computer hardware like this.  A battery fault like that does take some time to develop, but with a faulty design it could just as well happen after 6 months as after 5 years.  There are a few other examples of this, most notably "capacitor plague", but most of those capacitors have likely failed out already.

Beyond built-in defects like that, I don't think there is sufficient reason to think that a 3 or 5 year old computer is so much more likely to fail than a 1 year old computer that it is worth replacing.  Remember that replacing brings significant capital cost, labor, risk of downtime, and more exposure to the leading edge of the 'bathtub curve' of early failure, or even buying into a new design flaw like this or a bad batch of electrolytic capacitors.  Computers should be replaced when they fail or when they are obsolete.  We have basically passed the point where computers (at least 1-2 socket servers) quickly become obsolete.  Newer computers are faster/$ and more energy efficient than the previous generation, but not by enough that it makes sense to throw the old ones out rather than keep them running.

There is a completely separate issue: your operations have to be resilient against the failure of a computer.  But replacing the hardware doesn't really get that for you.  At best, you would get the same effect by deliberately formatting the drive every year and practicing your restore from scratch.  No need to buy new hardware for that.
 

Offline Daixiwen

  • Frequent Contributor
  • **
  • Posts: 352
  • Country: no
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #11 on: March 20, 2018, 07:23:36 am »
Quote from: capt bullshot on March 19, 2018, 10:18:44 am
Quote from: peteb2
Have a look at the fantastic bend condition of the raid-controller strip module all caused by the bulging LiOn battery ...  ;)

That's quite the worst placement of a battery I've seen ...
No, this is genius... By bulging, the battery makes the memory module pop out of its socket, making the controller fail and the technician open the server to see what's wrong. That's a poor man's integrated built-in test system.
 

Offline xani

  • Frequent Contributor
  • **
  • Posts: 400
Re: How long should you run Server Hardware (just a friendly reminder)?
« Reply #12 on: March 20, 2018, 12:31:29 pm »
Quote from: ejeffrey on March 20, 2018, 04:34:46 am
That is obviously a defective design/part.

Given that, I am not convinced there is any sense to a "replace after X years" for mass market computer hardware like this.  A battery fault like that does take some time to develop but with a faulty design it could happen as well after 6 months as 5 years.  There are a few other examples of this, most notably "capacitor plague", but most of those capacitors have likely failed out already.

Beyond built-in defects like that, I don't think there is a sufficient reason to think that a 3 or 5 year old computer is enough more likely to fail than a 1 year old computer to make it worth replacing.  Remember that replacing costs significant capital cost, labor, risk of downtime, and more exposure to the leading edge of the 'bathtub curve' of early failure, or even buying into a new design flaw like this or a bad batch of electrolytic capacitors.  Computers should be replaced when they fail or when they are obsolete.  We have basically passed the point where computers (at least 1-2 socket servers) become obsolete.  Newer computers are faster/$ and more energy efficient than the previous generation, but not enough that it makes sense to throw the old ones out rather than keep them running.

That is only true if you look at a very small server count. Yes, replacing a 1U box with another 1U box is rarely worth it in itself, but replacing 3-5 1U boxes with one is worth it, from both a power and a space perspective.

Especially if your 1U box isn't even getting close to CPU capacity: then instead of having a server per job you just put a bunch of VMs on one beefy machine.

And, depending on hosting space and power prices, it might very well be more expensive to keep some power-hungry box running than to replace it with a brand new one. At my current job, if we just bought the latest and greatest we could shrink our 6 racks to one, maybe two (but it is not my decision to make and we have enough work already, so...).

That is the reason why used servers are so much cheaper than new ones: for companies that have a ton of them, it is just cheaper to replace them every 2-3 CPU generations and get the savings on power and space.
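
The power side of that trade-off is simple arithmetic, sketched here in Python. All the figures in the usage note are hypothetical and not taken from this thread, and the function name is just for illustration. It deliberately ignores rack space, cooling, licensing, and migration labor, each of which can swing the answer either way.

```python
def payback_years(old_watts, new_watts, price_per_kwh, replacement_cost):
    """Years until power savings alone cover the replacement cost.

    Returns None when the new hardware does not draw less power.
    """
    hours_per_year = 24 * 365
    # Energy saved per year, converted from watt-hours to kWh.
    saved_kwh = (old_watts - new_watts) * hours_per_year / 1000
    yearly_saving = saved_kwh * price_per_kwh
    if yearly_saving <= 0:
        return None
    return replacement_cost / yearly_saving
```

For example, `payback_years(1000, 400, 0.20, 5000)` models four old 1U boxes drawing 250 W each being consolidated onto one 400 W machine costing 5000 at 0.20 per kWh; with those made-up numbers the power savings alone take a little under five years to pay for the new box, so the space and consolidation savings have to carry the rest of the argument.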

 

