Servers -- at least those that are in use and serving a purpose -- are intended to run constantly, not to be shut down every day.
Servers and appliances exist to provide a service their users need. Whether they are on or not depends on the users' needs.
NAS box for thin clients? Yep, probably needs to run 24/7.
End of the week backup archive server? Makes sense to turn that power hog on only when needed; plus, that reduces the chance someone "accidentally" messes with the backups in the meantime.
Like I mentioned before, spinny-disk startup causes wear on the power supply electronics. (I do not mean specifically PC power supplies, but the entire supply from the wall socket down to the BLDC motor.) Thermal cycling is a significant part of it, because as I recall, Google found out they could run their HDDs constantly at 40°C/104°F ambient without any degradation in lifetime (we're speaking statistically here, of course).
If you have 6 or more spinny drives in the same enclosure, it's time to worry about power-on sequencing, too, since the BLDC spin-up current draw is significant. (I'm still wondering why the darn things have to spin up immediately when power is applied. It'd be so much easier if they could be powered on, and then a separate "now spin up" message would spin them up. BIOS/EFI/Firmware already needs to support the drive controller anyway, it'd really be just one more message in the flow.)
Current machines support various power saving levels. The two we need to consider are idle and suspend.
Idle is when the machine is not doing any significant work, but is ready to immediately respond to requests.
Suspend, or more specifically suspend to RAM, is a state where the processors and most of the rest of the machine are unpowered, but RAM is kept refreshed.
Linux kernel-based suspend is stable on Intel/AMD and ARM hardware. (There may be some specific cases where hardware bugs or undocumented features prohibit suspend, but they're really, really rare.)
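As a minimal sketch of how to check this on a given box: the Linux kernel lists the sleep states it supports in /sys/power/state, where "mem" stands for suspend-to-RAM and "disk" for hibernation. The function names below are my own, picked for illustration, and the sketch only reads the file; it never triggers a suspend.

```python
from pathlib import Path


def supported_sleep_states(state_file="/sys/power/state"):
    """Return the sleep states the running kernel advertises.

    On a typical Linux machine the file contains something like
    "freeze mem disk". An unreadable or missing file yields [].
    """
    try:
        return Path(state_file).read_text().split()
    except OSError:
        return []


def can_suspend_to_ram(states):
    """'mem' is the kernel's name for suspend-to-RAM."""
    return "mem" in states


if __name__ == "__main__":
    states = supported_sleep_states()
    print("advertised sleep states:", states or "none found")
    print("suspend-to-RAM available:", can_suspend_to_ram(states))
```

Whether the resulting suspend is actually low-power is a separate question that only a wall-socket measurement answers.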
On many ARM boards, suspend consumes only a fraction of a watt of power. I already mentioned Odroids as examples. Nowadays there are even Intel/AMD server-class motherboards that support sub-watt suspend-to-RAM. As it is a motherboard feature (namely, the power supplies must be under software control, and the sequencing of power saving level changes must be bullet-proof), each motherboard manufacturer decides for themselves how important a low-power suspend feature is. As it is not something many mass customers demand, the exact power level achievable using typical server-class hardware varies. On desktop-class machines, and especially mini-ITX and micro-ATX Intel/AMD motherboards, suspend support gets much more focus, so low-power suspend tends to be more common.
At sub-watt suspend power levels, is there even a difference between "off" and "suspended"?
(Other than a suspended machine being accessible with a couple of seconds of latency, and a machine turned off being inaccessible unless it has external wakeup support, I mean.)
(Note: When the RAM-based image is put into storage, and the machine fully powered down, we're talking about suspend-to-disk or hibernation. This is well supported on tablet and laptop boards, and Linux can support it, but it obviously requires proper storage: you don't want to hibernate to an SD card, for example. Also, combined suspend+hibernate exists, in which case the initial suspended image is stored on disk, but power-off is delayed in case the machine is needed within some time window. Waking up from hibernation is "slow", because it involves a full hardware boot-up; only, instead of loading an OS, the hibernated image is loaded. A major part of the wake-up delay is hardware (BIOS/EFI/Firmware) boot-up, with the loading of the full system image from storage also important. The activation of the hibernated image is fast, basically the same as from suspension. Thus, hibernation wake-up time is dictated by hardware and firmware, not the OS.)
Because only the RAM (and some low-power subsystems) are powered during suspend (suspend-to-RAM; in hibernation, the entire machine is powered off), some sort of a notification is needed for when the system needs to come back up to idle/active power levels. In PC-class Intel/AMD hardware, power supplies and motherboards keep one of the +5V lines powered even during suspend, so that USB HID events (detected by the motherboard, involving no CPU activity!) will wake up the machine. Similarly, many network interfaces (cards and built-in ports) use that line to power their receive sides, so that when a suitable Wake-On-LAN packet is received, the NIC tells the machine to wake up. (Again, WOL involves hardware only.)
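For reference, the usual Wake-on-LAN "magic packet" is just six 0xFF bytes followed by the target MAC address repeated sixteen times, commonly sent as a UDP broadcast. Here is a minimal sketch of building and sending one; the MAC address, the broadcast address, and port 9 are placeholder assumptions (ports 7 and 9 are conventional), and the function names are mine.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a WOL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Placeholder MAC; nothing is sent here. To actually wake a machine:
    #   send_wol("00:11:22:33:44:55")
    print(len(build_magic_packet("00:11:22:33:44:55")), "byte magic packet")
```

The NIC matches the packet in hardware, so the sender only has to be on the same LAN (or VLAN) as the sleeping machine.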
WOL support for ARM hardware varies, but it is present on, for example, the Odroids I already mentioned. (And I mentioned them only because I have one, and know of them; I am NOT implying they're the best, or even the only ones to do that. They're examples and suggestions to look at, nothing more.)
In particular, basically all ARM-based NAS boards, and all ARM-based router/switch boards with suspend capability (low-power modes where RAM is kept refreshed but CPU and some other subsystems are not), have had WOL support. Some even expose a wakeup pin (although I suspect it is more common to have it as an undocumented pad or test point; but that's just unfounded suspicion, not knowledge). Whether USB is powered during suspend or not varies, mostly depending on whether the board is able to power off the USB ports under software control at all. (When it is supported, it's also supported by the Linux kernel; take a look at the /sys/bus/usb/devices/usbN/power/ pseudofiles. This file-like interface (under /sys and /proc) is how the Linux kernel exports things. They are "pseudo" in the sense that they do not actually exist at all, not even in RAM: the kernel only generates the structures when there is some process actually examining things.)
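A minimal sketch of poking at those pseudofiles: each USB device directory that can act as a wakeup source has a power/wakeup attribute containing "enabled" or "disabled". The function name and the idea of collecting the values into a dict are my own choices for illustration.

```python
from pathlib import Path


def usb_wakeup_settings(sysfs_root="/sys/bus/usb/devices"):
    """Map each USB device name to the contents of its power/wakeup pseudofile.

    Devices without a wakeup attribute, or whose attribute cannot be read,
    are simply left out.
    """
    settings = {}
    for wakeup in sorted(Path(sysfs_root).glob("*/power/wakeup")):
        device = wakeup.parent.parent.name  # <root>/<device>/power/wakeup
        try:
            settings[device] = wakeup.read_text().strip()
        except OSError:
            continue
    return settings


if __name__ == "__main__":
    for device, state in usb_wakeup_settings().items():
        print(f"{device}: wakeup {state}")
```

Changing a setting is a matter of writing "enabled" or "disabled" to the same pseudofile as root; the sketch deliberately stays read-only.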
So, what does all that above waffle mean when one considers a NAS box with spinny-rust HDD drives?
- You want your OS and related files on Flash (M.2 SSD).
Because software will do a lot of file accesses whenever not suspended, having an SSD for the OS minimizes the impact on the HDDs.
There are lots of tunables for such accesses in Linux and BSD; even when and how often file access timestamps are updated on reads is configurable. You don't want to have to mess with those on top of everything else to get the NAS box to perform best to your needs, and putting the OS and logs etc. on a separate SSD drive gives you that option.
- If you have a copy of the OS SSD you keep as a backup, updated say once a week, you can recover from Doh! moments by swapping the two.
It also means the HDDs and the NAS system itself are decoupled, letting you transfer HDDs in-and-out and between NAS boxes, for example when upgrading hardware, with minimal effort. If the OS is on the same HDDs as your data, when anything goes wrong, pain will ensue.
- You want suspend (suspend-to-RAM) and Wake-on-LAN support, with a sub-1W suspend power use.
This way, you do not need to power down the box unless you do not want anyone to be able to access it: at 1W, the energy use is just under 8.8 kWh/year (1 W × 8760 h).
My suggestion is to only power it down when you are away for several days, and do not want it accessible at all.
- Suspend works perfectly with HDD spin-up/down/parking.
HDDs will automatically spin up when read from or written to, and most can be configured to automatically spin down when not accessed for a specified time.
If your OS and related files are on an SSD, HDDs will only get spun up when information on them is actually accessed.
- On plain HDDs and software RAID setups, you'll want to run the smartd daemon (Linux, BSD) to periodically schedule extended (long) SMART self-tests, in which each drive reads its entire surface when otherwise idle, so that the drive hardware itself can detect deteriorating data and relocate failing blocks.
Because the rest of the system doesn't care about the data itself, only whether the drive reports success (with the data), such scanning is integrated into most RAID controllers, and you don't need to run such a daemon; you just run some kind of host RAID controller management daemon instead.
Now, to limit the wear on the HDDs when they're expected to be spun down for the majority of the time, you do need to configure smartd smartly. You want it to scan one drive at a time (to limit "idle" power use), as continuously as possible (when the machine is otherwise idle), with each drive fully scanned in a given period: I prefer about a month, but it varies depending on drives and people. What you don't want is having to spin up a disk just for smartd scanning.
If you run on mains electricity, you may prefer to set smartd to run only at night. If you run on solar, you may prefer to set smartd to run only during the day, when there is plenty of energy available.
- The above is not controlled by any script or GUI, but implicitly, by timing operations sensibly.
HDDs do not know or care whether a given access is because a user wants to open a file, or because smartd decides it's time to scan the drive.
HDDs spin up when needed, and spin down when their internal controller decides they should, or (for ATA/SATA) when OS or userspace sends a spin-down-now command to the drive. On Linux and BSD, you can typically use hdparm to send such commands, or configure drive idle spin-up/spin-down parameters. Most HDDs also have internal temperature sensors, which you can read with hddtemp. So, to make things happen like I described above, you need to consider the entire system and configure each subsystem in a suitable fashion. (I do not know if e.g. TrueNAS CORE, a FreeBSD-based system dedicated to NAS boxes, makes such configuration any easier.)
- In Linux and Android, the kernel has one or more power state governors (the CPU frequency scaling and power state management subsystems) that can be configured from userspace. Not all hardware supports all possible governors, and not all ready-made kernels have more than one governor compiled in, so to be able to select the one that best matches your use case, you may have to experiment and even recompile your own kernel (for testing; I do recommend using distro kernels for appliances if possible, because those get proper maintenance).
- Sensible "idle" power draw is nice, but when suspend and HDD spin-up/down are configured well, it only mixes with the active power draw in some duty ratio (depending on the behaviour of the above-mentioned governor), and is much less important than low suspended power draw.
Essentially, the NAS box will only sit idle in the gaps between active bursts that are shorter than the idle time limit before suspend. Do not forget that you can configure a different spin-down time limit for the HDDs than the suspend limit: it is perfectly okay to have a NAS box that wakes up from suspend within a couple of seconds suspend rather aggressively (say, after 10 minutes of idle), but at the same time have the HDDs spin down only after 30+ minutes of no accesses.
I like that, because it aggressively reduces power use and running temperatures, but is very gentle on the HDDs by avoiding unnecessary spin-ups/downs, and (as far as I understand) should also maximise the system lifetime, stressing it minimally overall.
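To put rough numbers on the list above, here is a small sketch of the arithmetic: yearly energy at a constant draw, and yearly energy for a box that spends given fractions of its time idle and active and the rest suspended. The 20 W idle and 40 W active figures and the 5% duty fractions are made-up illustration values, not measurements of any particular hardware.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_kwh(watts, fraction_of_time=1.0):
    """Energy per year, in kWh, for a draw of `watts` during a fraction of the year."""
    return watts * HOURS_PER_YEAR * fraction_of_time / 1000.0


def duty_cycle_kwh(suspend_w, idle_w, active_w, idle_fraction, active_fraction):
    """Yearly kWh when the box is suspended whenever it is not idle or active."""
    suspend_fraction = 1.0 - idle_fraction - active_fraction
    if suspend_fraction < 0:
        raise ValueError("idle_fraction + active_fraction must not exceed 1")
    return (annual_kwh(suspend_w, suspend_fraction)
            + annual_kwh(idle_w, idle_fraction)
            + annual_kwh(active_w, active_fraction))


if __name__ == "__main__":
    # 1 W around the clock: 8.76 kWh per year.
    print(f"1 W constant: {annual_kwh(1.0):.2f} kWh/year")
    # Hypothetical box: 1 W suspended, 20 W idle, 40 W active.
    print(f"mostly suspended: {duty_cycle_kwh(1.0, 20.0, 40.0, 0.05, 0.05):.2f} kWh/year")
    print(f"never suspended:  {annual_kwh(20.0, 0.95) + annual_kwh(40.0, 0.05):.2f} kWh/year")
```

Playing with the fractions shows why the suspended draw matters more than the idle draw once the box spends most of its time suspended.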
Okay, but we were talking about servers, weren't we?
The same argument applies to servers. If you have a motherboard that supports <1W suspend-to-RAM (as measured at the wall socket), and wakes up from suspend within a second or two, then the same energy savings apply, and hibernation or fully powering down the server is no longer necessary. (That wake-up time is easily achieved in Linux using Intel/AMD/ARM hardware when the OS is on an SSD; with the OS on an HDD, some accesses will always occur during the wakeup process, so wakeup is delayed by the HDD spin-up time.)
If you have a server motherboard that does not have low-power suspend (typically the case with legacy hardware!), then hibernation or powering off the darn thing when not needed is definitely a good option.
If you have such a server, one trick that few people realize is easily available, is to use a low-power ARM SBC with Ethernet connected to the same switch and LAN (and VLAN) as the server, just to receive wakeup-on-LAN and other commands to control server wakeup-from-hibernation. Power-down is not a problem, as even legacy server motherboards have software power-down support, and it is required anyway for proper hibernation; but you might want to add a normally closed (openable) contactor to cut off power for remote recovery from lock-ups. There is a soft power-on button on all server boards, which you can connect in parallel with an optocoupler (might need a transistor for signal inversion, as the pin is normally not connected, and connects to ground or voltage when the button is pushed) to an SBC output pin, so that the SBC can wake up the server from hibernation safely. A current sensor on a suitable server power bus would tell the SBC when the server is fully shut down (it's difficult to determine reliably otherwise), plus you can often sprinkle some temperature sensors read by the SBC to determine issues; the combination of both plus some watchdog software would allow even lockup detection automatically.
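The watchdog part of that could look something like the sketch below. It is purely hypothetical: it assumes the SBC already collects periodic current-sensor readings (in amperes) and ping results for the server, and the 0.05 A threshold, the five-sample window, and the state names are all invented for illustration. Reading the sensor and driving the optocoupler are hardware-specific and left out.

```python
def classify_server(current_amps, ping_ok, off_below_amps=0.05, window=5):
    """Guess the server state from the most recent `window` samples.

    current_amps: list of current-sensor readings, newest last (amperes).
    ping_ok:      list of booleans, newest last (did the server answer?).
    Returns "unknown", "off", "suspected-lockup", or "running".
    """
    if len(current_amps) < window or len(ping_ok) < window:
        return "unknown"  # not enough history yet
    recent_current = current_amps[-window:]
    recent_pings = ping_ok[-window:]
    if all(amps < off_below_amps for amps in recent_current):
        return "off"  # no meaningful draw: shut down or hibernated
    if not any(recent_pings):
        return "suspected-lockup"  # drawing power but never answering
    return "running"


if __name__ == "__main__":
    print(classify_server([0.0] * 5, [False] * 5))
    print(classify_server([1.2] * 5, [False] * 5))
    print(classify_server([1.2] * 5, [True] * 5))
```

The window has to be longer than a normal boot, otherwise a server that is merely starting up would be flagged as locked up and get its power cut by the contactor.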
So, as someone who has practical experience in HPC and servers, both software and hardware, I do fundamentally object to the idea that a server or appliance needs to be running 24/7. To me, they need to be running when I need them; and when I don't need them, I want them to not waste power (with my utterly arbitrarily set limit at around 1 watt). There are many ways of making that happen, including "hacks" like adding a low-power cheap ARM Linux SBC as your power controller for hardware that doesn't support power save or suspend, so use them when needed. Do not just adopt silly rules of thumb like "a server is on 24/7" without carefully considering the implications.