Author Topic: what's behind the infamous Seagate BSY bug?  (Read 5502 times)

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4954
  • Country: gb
what's behind the infamous Seagate BSY bug?
« on: April 30, 2022, 11:52:16 am »
Seagate was a serious company in the 90s and 2000s, but I don't know anything about their modern internal hard drive lines:

  • Seagate SkyHawk
  • Seagate IronWolf
  • Seagate IronWolf PRO
  • Seagate SkyHawk AI

But I am reading bad bad things about the hard-drives I bought:
  • qty=6, st3500418sas, fw cc44, 7200.12, 500GB, used in project myNAS, RAID
  • qty=4, st1000dm010, fw cc43, barracuda, 1TB, used in project SCSI-to-sATA RAID-box
  • qty=2, st1000dm010, fw cc43, barracuda, 1TB, used in a UNIX server, RAID-mirroring

I am not sure about the st1000dm010 Barracuda, but many people report that the 7200.12 can get stuck in the BSY state, an abnormal state of the disk that shows up when, one day, the disk is no longer recognized by the sATA controller.

Why does it happen? It's not clear to me ...

The BSY state is a bug, but what exactly is behind it? I haven't yet understood what causes the BSY bug, but it seems related to reliability. It seems the disk has poor-quality heads or platters, so the firmware continuously needs to reallocate, reallocate, reallocate until the table used for this gets full and the disk goes "busy" until you manually clear the reallocation table.

Which means that behind the BSY bug there is poor disk quality, and behind that there is also poor QA at Seagate, meaning the system of maintaining standards in manufactured products by testing a sample of the output against the specification.

I am seriously worried about those disks because they are used for sensitive data. I do backups on a regular basis, but it's seriously annoying that you cannot trust your storage devices, to the point that you have to change weekly backups into daily backups.

Bug life, always  :-//
« Last Edit: April 30, 2022, 10:07:03 pm by DiTBho »
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 8910
Re: what's behind the infamous Seagate BSY bug?
« Reply #1 on: April 30, 2022, 09:40:12 pm »
I'm not sure about the 7200.12 but the .11 definitely had a firmware bug with a circular event log buffer:

http://www.datarecoveryspecialists.co.uk/blog/firmware-bug-on-seagate-hard-drive

There's also this infamous one, which is more likely a physical problem than a firmware one:

https://en.wikipedia.org/wiki/ST3000DM001
 
The following users thanked this post: DiTBho

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4954
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #2 on: April 30, 2022, 10:09:59 pm »
Quote
Class action
In 2016, Seagate faced a class action over the failure rates of its ST3000DM001 3 TB drives. Law firm Hagens Berman filed the lawsuit on 1 February in the United States District Court for the Northern District of California, and primarily cited reliability data provided by Backblaze. The lawsuit also pointed to user reviews of the hard disk drive on Newegg, which totaled more than 700 reviews with 2 or fewer stars.

The lawsuit lists Christopher Nelson, who purchased a Seagate Backup Plus 3 TB drive and a Seagate Barracuda 3 TB hard disk drive in October 2011, as its plaintiff. Both products subsequently failed, and the lawsuit contended that Seagate replaced them with inherently faulty products.
[...]

Class action is what you get when your QA is poor.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline BradC

  • Super Contributor
  • ***
  • Posts: 2350
  • Country: au
Re: what's behind the infamous Seagate BSY bug?
« Reply #3 on: May 01, 2022, 02:20:53 am »
Quote
Which means that behind the BSY bug there is poor disk quality, and behind that there is also poor QA at Seagate, meaning the system of maintaining standards in manufactured products by testing a sample of the output against the specification.

Those disks are 10-12 years old now and provided it didn't bite you, the bug could be mitigated by a firmware upgrade which you could do at home.
If there was such a terrible QA issue with those disks, they wouldn't have made it this far.
Now having said that, the 10 I had all failed from other issues and were the least reliable disks I'd ever owned, but every manufacturer drops a clanger at some point.

This is old news (in IT lifetimes it's even ancient news).
 

Offline Monkeh

  • Super Contributor
  • ***
  • Posts: 8522
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #4 on: May 01, 2022, 02:23:04 am »
I don't know about the 7200.12s, but the 7200.11s were so mechanically crap I'd be amazed if many survived to run into firmware bugs..
 

Offline station240

  • Supporter
  • ****
  • Posts: 967
  • Country: au
Re: what's behind the infamous Seagate BSY bug?
« Reply #5 on: May 01, 2022, 06:20:54 am »
Quote
Why does it happen? It's not clear to me ...

The BSY state is a bug, but what exactly is behind it? I haven't yet understood what causes the BSY bug, but it seems related to reliability. It seems the disk has poor-quality heads or platters, so the firmware continuously needs to reallocate, reallocate, reallocate until the table used for this gets full and the disk goes "busy" until you manually clear the reallocation table.

Dunno about Seagate drive failures, but I had a Western Digital drive go nuts reallocating and properly brick itself.
It was a head-related issue: they had used silver plating on the PCB, and eventually the springy contacts from the head mechanism would make poor contact.
Stupid firmware would assume the random errors were bad sectors, and reallocate, reallocate, reallocate until the disk surface was worn out. Yeah, 24/7/365 disk activity.
I heard the noise and figured out something funny was going on, and recovered 99.9999% of the data. Couldn't save the disk; reformatting it properly bricked it.
Hence why the WD Green series vanished for a long time.
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1852
  • Country: 00
Re: what's behind the infamous Seagate BSY bug?
« Reply #6 on: May 01, 2022, 09:48:37 am »
I had the infamous ST3000DM001 die after a couple of years. Luckily I had already replaced it with a 3x4TB WD Red ZFS array. It was only used as a scratchpad, an unpack disk, and for the cable TV recordings that I was able to do at the time. So nothing critical was lost.

However, if you dive into the specs of HDDs, you'll see that the WD Red line-up isn't without issues either. There was the recent scandal of them swapping CMR for SMR technology, which is the last thing you need in a RAID array once a disk eventually fails. Then, going further, they are still releasing higher-capacity drives (up to 18 or 20TB now, I think) with the same workload rating as the smaller drives. I think the yearly workload rating is 180TB/yr, which counts both reads and writes. That means you can only fully write and then read back your 18TB drive 5 times per year. So if you do a minimal 1x write-then-readback test before RAID deployment, then that's already 1 out of 5 R/W cycles spent. ZFS also does monthly scrubbing of the data, so that *alone* would exceed the workload rating.
At that point you're really better off with an enterprise drive, or perhaps even solid-state storage; even QLC has better endurance, but it is still a lot more expensive per GB...
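
A rough back-of-the-envelope sketch of that arithmetic in Python, assuming the 180 TB/yr rating and the 18 TB capacity mentioned above (both are recollections, not datasheet values) and a pool full enough that a scrub reads the whole disk. The function name is just for illustration:

```python
def full_passes_per_year(workload_tb_per_year: float, capacity_tb: float) -> float:
    """How many complete passes (one full read OR one full write) fit in the rating."""
    return workload_tb_per_year / capacity_tb

WORKLOAD_TB_PER_YEAR = 180.0  # assumed rating; reads and writes both count
CAPACITY_TB = 18.0            # assumed drive size

passes = full_passes_per_year(WORKLOAD_TB_PER_YEAR, CAPACITY_TB)
write_then_read_cycles = passes / 2   # each cycle = 1 full write + 1 full read
monthly_scrub_tb = 12 * CAPACITY_TB   # 12 full reads per year on a full disk

print(f"full passes per year:       {passes:.0f}")
print(f"write+readback cycles:      {write_then_read_cycles:.0f}")
print(f"monthly scrubs read:        {monthly_scrub_tb:.0f} TB/yr")
print(f"scrubs alone exceed rating: {monthly_scrub_tb > WORKLOAD_TB_PER_YEAR}")
```

With these assumed figures, twelve full scrubs read 216 TB a year, which is already over a 180 TB/yr rating before any normal use.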

Anyhow, no disk has an infinite lifetime. I still have a Samsung F1 750GB disk that *works*, but its speeds are dropping every year. I'm still amazed it works after 13 years of daily use. I took it out of my machine last year, as I was moving my machine into a smaller chassis which only had two 3.5" drive bays.
« Last Edit: May 01, 2022, 09:52:40 am by hans »
 
The following users thanked this post: DiTBho

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4954
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #7 on: May 01, 2022, 11:10:25 am »
Quote
Dunno about Seagate drive failures, but I had a Western Digital drive go nuts reallocating and properly brick itself.
It was a head-related issue: they had used silver plating on the PCB, and eventually the springy contacts from the head mechanism would make poor contact.
Stupid firmware would assume the random errors were bad sectors, and reallocate, reallocate, reallocate until the disk surface was worn out. Yeah, 24/7/365 disk activity.

Yup, thanks, it's exactly what I was thinking about: a problem with plates or on the PCB, causing continuous reallocating, which then causes the rest  :D


kind of *Domino* effect  :o :o :o
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4954
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #8 on: May 01, 2022, 11:34:15 am »
Quote
Those disks are 10-12 years old now and provided it didn't bite you

They didn't bite, no problem. SMART reports less than 2000 hours; the six 7200.12 st3500418sas and the four st1000dm010 have rarely been used, but they contain crucial data.

Of course there are backups. I just can't easily replace the disks with modern ones for several reasons, but I'm thinking of doing it.

Quote
the bug could be mitigated by a firmware upgrade which you could do at home.

If behind the bug there is a physical problem, like the springy contacts from the head mechanism eventually making poor contact, or a badly engineered protective film on the platters that wears out the heads faster than usual, then the bug is only the tip of the iceberg.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 8819
  • Country: de
  • A qualified hobbyist ;)
Re: what's behind the infamous Seagate BSY bug?
« Reply #9 on: May 01, 2022, 12:10:06 pm »
Explanation and fix: http://fillwithcoolblogname.blogspot.com/2011/02/fixing-seagate-720011-bsy-0-lba-fw-bug.html

Someone from my former team had this issue. I've also seen several other problems with Seagate disks in the past. They seem to have a lucky hand in making turd drives.
 
The following users thanked this post: edavid, DiTBho

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4954
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #10 on: May 01, 2022, 12:20:43 pm »
What I have vaguely understood:

  • CMR, Conventional Magnetic Recording
  • SMR, Shingled Magnetic Recording

I know that if the drive has a lower cache, like 64MB, it is most likely an SMR drive, and to be honest, there are not that many drives that are being manufactured as SMR. At least, I can only count a few.

  • SMR
    Pros Of An SMR Hard Drive
    Cheaper
    good choice if they are used mostly for just data storage
    good for archiving tasks
    provide more storage capacity
    more energy-efficient

    Cons Of An SMR Hard Drive
    not particularly well suited if the drive is meant to be constantly and permanently performing writing operations as that can result in a cache overflow
    Slow Transfer
  • CMR
    Pros Of A CMR Hard Drive
    good choice when data is intended to be stored at high transfer rates
    good choice when extremely large amounts of data are intended to be stored
    activities ranging from music streaming, audio, video, image processing

    Cons Of A CMR Hard Drive
    not made for NAS servers

Quote
swapping CMR for SMR technology, which is the last thing you need in a RAID array once a disk eventually fails

can you explain this point?  :-//
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4954
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #11 on: May 01, 2022, 12:23:39 pm »
Quote
They seem to have a lucky hand in making turd drives.


"smile so as not to cry"
I think they should put a sticker like this on the top of their HDs.

or T-shirts?  :o :o :o
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1852
  • Country: 00
Re: what's behind the infamous Seagate BSY bug?
« Reply #12 on: May 01, 2022, 01:18:29 pm »
Quote
swapping CMR for SMR technology, which is the last thing you need in a RAID array once a disk eventually fails

Quote
can you explain this point?  :-//

CMR/SMR in essence has little to do with "NAS operation". CMR drives have been made for 24/7 operation for decades, including in NAS boxes. SMR is a relatively new tech in which magnetic tracks are recorded so closely together that you can't write one without affecting the others. Therefore, in order to write new data to a track, the disk needs to read the nearby data and write it back again. This means that an SMR drive may have good write speeds whilst unformatted, but as it gets fuller it gets slower and slower, eventually dropping to rubbish USB 2.0 thumb drive speeds. These disks are horrible for write-heavy workloads.

For many applications, this is not all bad. If your NAS is full of large video files which you never overwrite or delete, then you'll be happy with the extra capacity/$. However, if you're in a NAS environment, then it's likely to be running a RAID setup. That means that WHEN a disk fails, it will need to be replaced, and a full disk read/write needs to take place to rebuild from the parity data. It turns out that this is incredibly slow on SMR drives, like an order of magnitude slower. While the RAID array is rebuilding, it is less protected, or completely unprotected, against any further disk failures. And how much can go wrong in a 1-week rebuild instead of a half-day one?
Well, your guess is as good as mine! But all I know is that if a light bulb in my car fails and I go to the garage to replace it, then I will also replace the bulb of the other side, since it has the same amount of running time and 'wear'. That old bulb might fail on the trip back home, or 3 years down the line, who knows. But I wouldn't feel very comfortable with the array running in that state for a very long time.
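
A rough way to see where the gap between a half-day and a multi-day rebuild comes from: the rebuild has to write the whole replacement disk, so the time is roughly capacity divided by sustained write speed. The sketch below uses Python, and every figure in it is an illustrative assumption, not a measurement of any particular drive:

```python
def rebuild_hours(capacity_tb: float, sustained_mb_per_s: float) -> float:
    """Hours needed to write a whole disk once at a constant rate (decimal units)."""
    capacity_mb = capacity_tb * 1_000_000
    return capacity_mb / sustained_mb_per_s / 3600

CAPACITY_TB = 6.0     # assumed disk size
CMR_MB_PER_S = 150.0  # assumed sustained write speed of a CMR drive
SMR_MB_PER_S = 15.0   # assumed: an order of magnitude slower once the SMR drive bogs down

cmr = rebuild_hours(CAPACITY_TB, CMR_MB_PER_S)
smr = rebuild_hours(CAPACITY_TB, SMR_MB_PER_S)
print(f"CMR rebuild: {cmr:.1f} h")
print(f"SMR rebuild: {smr:.1f} h ({smr / 24:.1f} days)")
```

With these made-up numbers the CMR rebuild takes about 11 hours and the SMR one more than four days, and the array stays degraded for that whole window.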
« Last Edit: May 01, 2022, 01:21:02 pm by hans »
 
The following users thanked this post: DiTBho

Offline CJay

  • Super Contributor
  • ***
  • Posts: 4136
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #13 on: May 01, 2022, 01:25:34 pm »
Lots of Seagate bashing but WD are pretty awful, the most regular data loss I've encountered has been on WD drives being unable to read firmware from the platters, then there's the infamous IBM/Hitachi Deathstar drives and then the Fujitsu Hybrid drives which corrupt filesystems at random and etc. etc. etc.

Point being, all hard drives have failure modes, at least the Seagate BSY one could be 'fixed' via the serial console, you're shit out of luck if your WD drive fails to read its firmware and the less said about the Deathstars the better.
 
The following users thanked this post: DiTBho

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4954
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #14 on: May 01, 2022, 02:34:28 pm »
So, SMR as data storage, CMR for recurring write processes.

And it's not so easy because both technologies SMR and CMR have their justification and their respective fields of application, with benefits and issues

I am looking at
  • the Seagate IronWolf is a 4TB CMR NAS HDD.
  • the WD Red Plus WD60EFRX-CMR is a 6TB CMR NAS HDD.

meanwhile I have just ordered qty=16 Fujitsu Maw-Enterprise uwide-320 SCSI disks. Only 72GB of storage, four disks are something like 500GB, so less than what you can store in a common low-end sATA disc, but their specs look impressive and I paid only 20 euro each, old-new-stock, and this way I don't need to provide my customers any sATA-to-SCSI box for adapting disks, only a super simple 8x SCSI bay, two boxes, 8x bays each, connect each to a dedicated SCSI chain.

I hope these 2008 disks are worth the money. I bought the entire stock, physically the entire cargo of SCSI disks in a remote warehouse (add 60 euro for UPS, but thank god, no import fee).

Right, I will think again about the sATA SMR disks ... I also need something modern for editing and storing videos, and I am completely new to this.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 8819
  • Country: de
  • A qualified hobbyist ;)
Re: what's behind the infamous Seagate BSY bug?
« Reply #15 on: May 01, 2022, 02:48:11 pm »
Quote
Point being, all hard drives have failure modes, at least the Seagate BSY one could be 'fixed' via the serial console, you're shit out of luck if your WD drive fails to read its firmware and the less said about the Deathstars the better.

As someone already mentioned, all HDD manufacturers produce some self-destructing junk from time to time. From my personal experience Seagate seems to be exceptionally good at this. However, I've also seen many cases of failed disks caused by choosing the wrong model/series usage-wise (mostly to save some bucks).
« Last Edit: May 01, 2022, 02:49:53 pm by madires »
 
The following users thanked this post: DiTBho

Offline hans

  • Super Contributor
  • ***
  • Posts: 1852
  • Country: 00
Re: what's behind the infamous Seagate BSY bug?
« Reply #16 on: May 01, 2022, 03:48:16 pm »
Those 15k disks had impressive specs in 2002  :-// Nowadays they are mostly loud and hot disks that are far outmatched by solid-state. A 2TB SSD can be had for 150EUR, and it runs circles sized many orders of magnitude around any HDD in terms of performance, power consumption, noise and to some degree also reliability (given that you buy from a decent brand with solid firmware/controller).
You'd only buy spinning rust now if you need to store dozens of TBs. And then enterprise drives can make sense, because they come with extended warranty (5 years) and higher workload ratings. But I think they only make them at <=7200rpm speeds. HDDs for data density, SSDs for speed.

In the 12 yrs I've been using SSDs, I've only had 1 fail. And that was on day 1 in which the controller must have struck a bad FLASH sector.

Daily data backup is always a good idea. I run daily automated snapshots from my NAS and home server, which are stored outside the house in case any devastating disaster happens.
 

Offline CJay

  • Super Contributor
  • ***
  • Posts: 4136
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #17 on: May 01, 2022, 04:13:48 pm »
Quote
And then enterprise drives can make sense, because they come with extended warranty (5 years) and higher workload ratings. But I think they only make them at <=7200rpm speeds.

15K enterprise disks are still a thing.

Quote
Daily data backup is always a good idea. I run daily automated snapshots from my NAS and home server, which are stored outside the house in case any devastating disaster happens.

I've been involved in the recovery of a few *large* arrays for organisations that thought they had a decent backup/recovery plan, it's also important to *TEST* the plan because it's no fun finding out your backup wasn't doing what you thought it was after your array has failed.

Have also had plenty of SSDs fail in use, one only a month or so ago, it's a thing

Caveat, I spend ~£500K-600K per annum on end user laptops alone and that's not counting the other equipment so I handle more IT hardware in a year than some people see in their entire life, those numbers alone ensure I see more failures too.
« Last Edit: May 01, 2022, 04:20:41 pm by CJay »
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1852
  • Country: 00
Re: what's behind the infamous Seagate BSY bug?
« Reply #18 on: May 01, 2022, 05:33:49 pm »
Good points. At the rates that I'm buying computer equipment, my failure rate can't reach a resolution below 5%.
I imagine there are users and applications everywhere. My comments are from a power user perspective.

A backup isn't a backup if you haven't tested it. Reminds me that I should run that test again some day. I mount my backup repo from time to time to grab some files I have deleted, so I know the repo is intact and gets updated. But I should schedule a complete recovery from scratch next time I reinstall my lab workstation.
Hypothetically, say my house has burned down, I just got a new laptop in and I want to login on my accounts again. See if I can get all my data back without using any local resources.
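
One way to automate part of such a test is to restore into a scratch directory and compare checksums against the live data. A minimal sketch in Python, assuming both trees are reachable as local paths; the function names are hypothetical and not tied to any particular backup tool:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_trees(original: Path, restored: Path) -> list[str]:
    """Return the relative paths that are missing or differ in the restored copy."""
    problems = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel}")
    return problems
```

An empty list from something like compare_trees(Path("/data"), Path("/mnt/restore-test")) (both paths made up) means every live file came back bit-identical. Files edited after the snapshot was taken will show up as "differs", so this is only meaningful against data that has not changed since the backup.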
 

Offline tooki

  • Super Contributor
  • ***
  • Posts: 14715
  • Country: ch
Re: what's behind the infamous Seagate BSY bug?
« Reply #19 on: May 02, 2022, 11:52:44 am »
Quote
Cons Of A CMR Hard Drive
not made for NAS servers

LOL what?  :-DD

Until a few years ago, CMR was all that existed! Basically, all else held equal, SMR is suitable only for low-write situations, while CMR can be used universally.
 
The following users thanked this post: DiTBho

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4954
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #20 on: May 02, 2022, 06:16:01 pm »
Quote
LOL what?  :-DD

Until a few years ago, CMR was all that existed! Basically, all else held equal, SMR is suitable only for low-write situations, while CMR can be used universally.

Eh, you are right, tons of balls of confusion in my mind.
These two are CMR, and are advertised as "for NAS"
  • Seagate IronWolf is a 4TB CMR NAS HDD
  • WD Red Plus WD60EFRX-CMR is a 6TB CMR NAS HDD
I got confused by the fact that CMR should have large disk-cache, say more than 64MByte.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online Buriedcode

  • Super Contributor
  • ***
  • Posts: 1872
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #21 on: May 02, 2022, 06:36:45 pm »
Quote
But I am reading bad bad things about the hard-drives I bought:
  • qty=6, st3500418sas, fw cc44, 7200.12, 500GB, used in project myNAS, RAID
  • qty=4, st1000dm010, fw cc43, barracuda, 1TB, used in project SCSI-to-sATA RAID-box
  • qty=2, st1000dm010, fw cc43, barracuda, 1TB, used in a UNIX server, RAID-mirroring

I am not sure about the st1000dm010 Barracuda, but many people report that the 7200.12 can get stuck in the BSY state, an abnormal state of the disk that shows up when, one day, the disk is no longer recognized by the sATA controller.

You're using DM010.  The affected drives were mostly DM001.  That bug was fixed a long time ago.
 
The following users thanked this post: DiTBho

Offline DiTBhoTopic starter

  • Super Contributor
  • ***
  • Posts: 4954
  • Country: gb
Re: what's behind the infamous Seagate BSY bug?
« Reply #22 on: May 02, 2022, 06:44:20 pm »
@Buriedcode
thanks for your clarification  :-+
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline tooki

  • Super Contributor
  • ***
  • Posts: 14715
  • Country: ch
Re: what's behind the infamous Seagate BSY bug?
« Reply #23 on: May 03, 2022, 08:13:12 am »
Quote
Eh, you are right, tons of balls of confusion in my mind.
These two are CMR, and are advertised as "for NAS"
  • Seagate IronWolf is a 4TB CMR NAS HDD
  • WD Red Plus WD60EFRX-CMR is a 6TB CMR NAS HDD
I got confused by the fact that CMR should have large disk-cache, say more than 64MByte.

A large cache benefits any drive, but it’s SMR drives where a large cache is critical because they must write entire large sections at once, and can’t do small, random writes.
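
As a toy illustration of why small random writes hurt, here is a worst-case estimate in Python. It assumes that changing any data in place forces the drive to rewrite one whole shingled section, and the section size is a made-up figure, since actual sizes vary by drive:

```python
def worst_case_amplification(update_bytes: int, zone_bytes: int) -> float:
    """Bytes rewritten on the platter per byte the host changed, assuming
    the drive must rewrite one whole shingled zone for an in-place update."""
    return zone_bytes / update_bytes

ZONE_BYTES = 256 * 1024 * 1024  # assumed zone size; actual sizes vary by drive
UPDATE_BYTES = 4 * 1024         # one small random write

factor = worst_case_amplification(UPDATE_BYTES, ZONE_BYTES)
print(f"worst case: {factor:.0f}x more data written than requested")
```

A cache lets the drive collect many small writes and flush them into a section in one go, which is how it avoids paying that worst case on every write, at least until the cache fills up.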
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1872
  • Country: se
Re: what's behind the infamous Seagate BSY bug?
« Reply #24 on: May 03, 2022, 12:09:35 pm »
Quote
I know that if the drive has a lower cache, like 64MB, it is most likely an SMR drive, and to be honest, there are not that many drives that are being manufactured as SMR. At least, I can only count a few.

  • SMR
    Pros Of An SMR Hard Drive
    Cheaper
    good choice if they are used mostly for just data storage
    good for archiving tasks
    provide more storage capacity
    more energy-efficient

    Cons Of An SMR Hard Drive
    not particularly well suited if the drive is meant to be constantly and permanently performing writing operations as that can result in a cache overflow
    Slow Transfer
  • CMR
    Pros Of A CMR Hard Drive
    good choice when data is intended to be stored at high transfer rates
    good choice when extremely large amounts of data are intended to be stored
    activities ranging from music streaming, audio, video, image processing

    Cons Of A CMR Hard Drive
    not made for NAS servers

Not to mince words, DiTBho, you got many points exactly backwards!
In RED what I find objectionable.

In order of appearance:
  • Drives with larger caches are often SMR. This is because their write performance is very bad, and large caches mitigate that somewhat
  • a CMR drive will have less capacity than an SMR drive with the same physical structure
  • CMR drives are suitable for NAS, not SMR ones.
    Simplifying a bit: given the abysmal performance of SMR on large writes, if a NAS needs to resilver, this will take an inordinate amount of time and make total failure much more probable. CMR drives have no penalty for writes relative to reads, so they are fine in this situation.

SMR is a horrible gimmick, justified by greed (news at 11) - at least now most vendors are a bit more forthcoming with the information.
Nandemo wa shiranai wa yo, shitteru koto dake.
 
The following users thanked this post: tooki

