EEVblog Electronics Community Forum

Topic started by: DiTBho on May 27, 2022, 10:17:50 pm

Title: tools to understand when a hard drive is close to death
Post by: DiTBho on May 27, 2022, 10:17:50 pm

So, I bought eight SCSI hard disks. One doesn't even respond, and the other seven show bad values when I query them with smartctl.

But the seller insists that the records are acceptable ... and that I'm too picky and fussy

so, let's see what I have used and seen on the console

Code: [Select]

# smartctl -a /dev/sda
=== START OF INFORMATION SECTION ===
Vendor:               FUJITSU
Product:              MAW3147NC
Revision:             0104
User Capacity:        147,086,327,808 bytes [147 GB]
Logical block size:   512 bytes
Rotation Rate:        10025 rpm
Serial number:        DAA0P6801RGB
Device type:          disk
Transport protocol:   Parallel SCSI (SPI-4)
Local Time is:        Fri May 27 21:33:04 2022 -00
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     35 C
Drive Trip Temperature:        65 C

Manufactured in week 32 of year 2006
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  129
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       28         0         0          0      24327.545           0
write:         0       65         0         0          0       6244.290           0

Non-medium error count:      142

No self-tests have been logged

This disk is considered "acceptable", but it's not as described. It's not "brand new" because
1) it has a Non-medium error count > 1 (which means that the connection with the HBA has failed several times)
2) it has processed 24 terabytes of data
3) it probably worked non-stop for months: 24 terabytes of data with only 129 power on/off cycles

ok? So, that's why I say "not brand new"

This disk is the best of the whole lot; there are disks with terrible values. It may be acceptable (but not for the price I paid as "brand new") because it has ZERO "uncorrected errors" and ZERO "Elements in grown defect list".

I consider these two values an indicator of hardware failure, specifically of how close to death the disk's read/write heads and platters are.
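
A minimal sketch of how those counters could be pulled out of the smartctl text so they can be compared across all eight disks. It assumes the SCSI output layout shown above (other drives or smartctl versions may print different fields), and the function name `parse_scsi_health` is made up for this illustration.

```python
import re

# Sample text copied from the smartctl output earlier in this post.
SAMPLE = """\
Accumulated start-stop cycles:  129
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       28         0         0          0      24327.545           0
write:         0       65         0         0          0       6244.290           0

Non-medium error count:      142
"""


def parse_scsi_health(text):
    """Extract the counters discussed in this thread from SCSI `smartctl -a` text."""
    info = {}
    simple_fields = {
        "start_stop_cycles": r"Accumulated start-stop cycles:\s+(\d+)",
        "grown_defects": r"Elements in grown defect list:\s+(\d+)",
        "non_medium_errors": r"Non-medium error count:\s+(\d+)",
    }
    for key, pattern in simple_fields.items():
        match = re.search(pattern, text)
        info[key] = int(match.group(1)) if match else None
    # In the read:/write: rows, the last two columns are
    # "Gigabytes processed" and "Total uncorrected errors".
    for op in ("read", "write"):
        match = re.search(rf"^{op}:\s+(.*)$", text, re.MULTILINE)
        if match:
            cols = match.group(1).split()
            info[f"{op}_gb"] = float(cols[-2])
            info[f"{op}_uncorrected"] = int(cols[-1])
    return info


health = parse_scsi_health(SAMPLE)
print(health)
```

Fields that are missing from the text come back as `None` rather than 0, so a disk that simply does not report a counter is not mistaken for a clean one.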

Am I wrong about this? What do you think?

And isn't there any other health-testing software that is simpler and gives an explicit assessment of the hard disk's state of health, something PayPal employees can understand?

We are talking about 600 euro, and when you file a claim and open a dispute, PayPal wants to read documentation :o :o :o

I used smartctl, part of the smartmontools (http://www.smartmontools.org) project by Bruce Allen, Christian Franke.

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on May 27, 2022, 10:31:06 pm

Can confirm that the errors there are ECC and non medium which suggests problems with the HBA interface rather than the actual disk. Usually shitty firmware in the controller (HP I'm looking at you). They look fine to me from a "used disk" perspective.

If you paid for them as new though, show PayPal the power cycle count and say they have been turned on and off 129 times and that they are definitely not new.

Title: Re: tools to understand when a hard drive is close to death
Post by: tunk on May 27, 2022, 10:32:56 pm

You could also try this:
smartctl -t short /dev/sdX
Wait until finished, then run smartctl -a again and the
disk run time should show up in the list of self-tests
(NB: for some disks the hours will turn over at 65535).

Update Nov 24th 2022:
Just wiped some SAS drives, and the hours were stuck at
65535 (both before and after wiping which took ~2 days).
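
A small sketch of why 65535 is the magic number. It assumes the hours field is a 16-bit unsigned counter (inferred from the value 65535, not confirmed from any drive documentation) and shows the two behaviours mentioned above: rolling over, or sticking at the maximum.

```python
HOURS_PER_YEAR = 24 * 365.25
MAX_16BIT = 0xFFFF  # 65535, the largest value a 16-bit unsigned field can hold


def wrapped(true_hours):
    """Value shown if the 16-bit field rolls over to 0 after 65535."""
    return true_hours % (MAX_16BIT + 1)


def saturated(true_hours):
    """Value shown if the field sticks at 65535 instead of rolling over."""
    return min(true_hours, MAX_16BIT)


# How many years of continuous running fit into the field.
years_until_limit = MAX_16BIT / HOURS_PER_YEAR
print(round(years_until_limit, 2))

# A drive with 70000 real hours, under each behaviour.
print(wrapped(70000), saturated(70000))
```

Either way, a reading at or near 65535, or a suspiciously small one on an old drive, cannot be taken at face value on such disks.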

Title: Re: tools to understand when a hard drive is close to death
Post by: BrokenYugo on May 28, 2022, 01:00:36 am

They're clearly used and were advertised as new, no need to go farther, refund or replace time.

Title: Re: tools to understand when a hard drive is close to death
Post by: edpalmer42 on May 28, 2022, 02:23:40 am

Shouldn't there be a count of power-on hours? SATA drives report that via smartctl.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 08:16:19 am

disk{1,2,4,5,8} don't, but disk{2,3,6,7} report the following when tested with badblocks

Code: [Select]

[39374.181225] sd 1:0:1:0: [sda] tag#217 CDB: opcode=0x28 28 00 00 00 00 08 00 00 08 00
[39374.181228] print_req_error: I/O error, dev sda, sector 8 flags 0
[39374.181231] Buffer I/O error on dev sda, logical block 1, async page read
[39374.437524] sd 1:0:1:0: [sda] tag#218 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39374.437531] sd 1:0:1:0: [sda] tag#218 CDB: opcode=0x28 28 00 11 1f 83 98 00 00 08 00
[39374.437535] print_req_error: I/O error, dev sda, sector 287277976 flags 0
[39374.437539] Buffer I/O error on dev sda, logical block 35909747, async page read
[39374.694139] sd 1:0:1:0: [sda] tag#219 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39374.694148] sd 1:0:1:0: [sda] tag#219 CDB: opcode=0x28 28 00 11 1f 83 98 00 00 08 00
[39374.694153] print_req_error: I/O error, dev sda, sector 287277976 flags 0
[39374.694157] Buffer I/O error on dev sda, logical block 35909747, async page read
[39374.950455] sd 1:0:1:0: [sda] tag#220 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39374.950462] sd 1:0:1:0: [sda] tag#220 CDB: opcode=0x28 28 00 11 1f 83 98 00 00 08 00
[39374.950467] print_req_error: I/O error, dev sda, sector 287277976 flags 0
[39374.950471] Buffer I/O error on dev sda, logical block 35909747, async page read
[39375.206805] Buffer I/O error on dev sda, logical block 35909747, async page read
[39375.463100] Buffer I/O error on dev sda, logical block 35909747, async page read
[39375.719385] Buffer I/O error on dev sda, logical block 35909747, async page read
[39377.769922] sd 1:0:1:0: [sda] tag#199 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39377.769929] sd 1:0:1:0: [sda] tag#199 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39377.769934] print_req_error: I/O error, dev sda, sector 0 flags 0
[39378.026255] sd 1:0:1:0: [sda] tag#200 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39378.026262] sd 1:0:1:0: [sda] tag#200 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39378.026265] print_req_error: I/O error, dev sda, sector 0 flags 0
[39378.026270] Buffer I/O error on dev sda, logical block 0, async page read
[39378.282605] sd 1:0:1:0: [sda] tag#201 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39378.282614] sd 1:0:1:0: [sda] tag#201 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39378.282618] print_req_error: I/O error, dev sda, sector 0 flags 0
[39378.282622] Buffer I/O error on dev sda, logical block 0, async page read
[39378.538992] sd 1:0:1:0: [sda] tag#202 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39378.539000] sd 1:0:1:0: [sda] tag#202 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39378.539004] print_req_error: I/O error, dev sda, sector 0 flags 0
[39378.539008] Buffer I/O error on dev sda, logical block 0, async page read
[39378.795364] sd 1:0:1:0: [sda] tag#203 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39378.795372] sd 1:0:1:0: [sda] tag#203 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39378.795375] print_req_error: I/O error, dev sda, sector 0 flags 0
[39378.795378] Buffer I/O error on dev sda, logical block 0, async page read
[39379.051683] sd 1:0:1:0: [sda] tag#204 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39379.051693] sd 1:0:1:0: [sda] tag#204 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39379.051698] print_req_error: I/O error, dev sda, sector 0 flags 0
[39379.051702] Buffer I/O error on dev sda, logical block 0, async page read
[39379.308079] sd 1:0:1:0: [sda] tag#205 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39379.308086] sd 1:0:1:0: [sda] tag#205 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39379.308090] print_req_error: I/O error, dev sda, sector 0 flags 0
[39379.308093] Buffer I/O error on dev sda, logical block 0, async page read
[39379.564437] sd 1:0:1:0: [sda] tag#206 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39379.564444] sd 1:0:1:0: [sda] tag#206 CDB: opcode=0x28 28 00 00 00 08 00 00 00 08 00
[39379.564447] print_req_error: I/O error, dev sda, sector 2048 flags 0
[39379.564450] Buffer I/O error on dev sda, logical block 256, async page read
[39379.820776] sd 1:0:1:0: [sda] tag#207 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39379.820783] sd 1:0:1:0: [sda] tag#207 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39379.820786] print_req_error: I/O error, dev sda, sector 0 flags 0
[39379.820789] Buffer I/O error on dev sda, logical block 0, async page read
[39380.077078] sd 1:0:1:0: [sda] tag#208 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39380.077087] sd 1:0:1:0: [sda] tag#208 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39380.077091] print_req_error: I/O error, dev sda, sector 0 flags 0
[39380.077095] Buffer I/O error on dev sda, logical block 0, async page read
[39380.333454] Buffer I/O error on dev sda, logical block 0, async page read
[39382.896769] sd 1:0:1:0: [sda] tag#219 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39382.896781] sd 1:0:1:0: [sda] tag#219 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39382.896787] print_req_error: I/O error, dev sda, sector 0 flags 0
[39383.153148] sd 1:0:1:0: [sda] tag#220 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39383.153158] sd 1:0:1:0: [sda] tag#220 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39383.153163] print_req_error: I/O error, dev sda, sector 0 flags 0
[39383.153170] Buffer I/O error on dev sda, logical block 0, async page read
[39383.409579] sd 1:0:1:0: [sda] tag#221 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39383.409588] sd 1:0:1:0: [sda] tag#221 CDB: opcode=0x28 28 00 00 00 00 08 00 00 08 00
[39383.409593] print_req_error: I/O error, dev sda, sector 8 flags 0
[39383.409597] Buffer I/O error on dev sda, logical block 1, async page read
[39383.665940] sd 1:0:1:0: [sda] tag#222 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39383.665948] sd 1:0:1:0: [sda] tag#222 CDB: opcode=0x28 28 00 00 00 00 08 00 00 08 00
[39383.665952] print_req_error: I/O error, dev sda, sector 8 flags 0
[39383.665956] Buffer I/O error on dev sda, logical block 1, async page read
[39383.922295] sd 1:0:1:0: [sda] tag#223 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39383.922302] sd 1:0:1:0: [sda] tag#223 CDB: opcode=0x28 28 00 00 00 00 08 00 00 08 00
[39383.922305] print_req_error: I/O error, dev sda, sector 8 flags 0
[39383.922309] Buffer I/O error on dev sda, logical block 1, async page read
[39384.178593] sd 1:0:1:0: [sda] tag#192 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39384.178600] sd 1:0:1:0: [sda] tag#192 CDB: opcode=0x28 28 00 00 00 00 08 00 00 08 00
[39384.178604] print_req_error: I/O error, dev sda, sector 8 flags 0
[39384.178608] Buffer I/O error on dev sda, logical block 1, async page read
[39384.434949] sd 1:0:1:0: [sda] tag#193 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39384.434956] sd 1:0:1:0: [sda] tag#193 CDB: opcode=0x28 28 00 00 00 00 18 00 00 08 00
[39384.434959] print_req_error: I/O error, dev sda, sector 24 flags 0
[39384.434962] Buffer I/O error on dev sda, logical block 3, async page read
[39384.691285] sd 1:0:1:0: [sda] tag#194 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39384.691292] sd 1:0:1:0: [sda] tag#194 CDB: opcode=0x28 28 00 00 00 00 18 00 00 08 00
[39384.691295] print_req_error: I/O error, dev sda, sector 24 flags 0
[39384.691299] Buffer I/O error on dev sda, logical block 3, async page read
[39384.947577] sd 1:0:1:0: [sda] tag#195 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39384.947584] sd 1:0:1:0: [sda] tag#195 CDB: opcode=0x28 28 00 00 00 00 18 00 00 08 00
[39384.947588] print_req_error: I/O error, dev sda, sector 24 flags 0
[39384.947591] Buffer I/O error on dev sda, logical block 3, async page read
[39385.203916] sd 1:0:1:0: [sda] tag#196 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39385.203923] sd 1:0:1:0: [sda] tag#196 CDB: opcode=0x28 28 00 00 00 00 18 00 00 08 00
[39385.203926] print_req_error: I/O error, dev sda, sector 24 flags 0
[39385.203929] Buffer I/O error on dev sda, logical block 3, async page read
[39385.460250] Buffer I/O error on dev sda, logical block 7, async page read
[39388.023414] sd 1:0:1:0: [sda] tag#207 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39388.023423] sd 1:0:1:0: [sda] tag#207 CDB: opcode=0x28 28 00 00 00 00 08 00 00 08 00
[39388.023428] print_req_error: I/O error, dev sda, sector 8 flags 0
[39388.279753] sd 1:0:1:0: [sda] tag#208 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39388.279759] sd 1:0:1:0: [sda] tag#208 CDB: opcode=0x28 28 00 00 00 00 08 00 00 08 00
[39388.279763] print_req_error: I/O error, dev sda, sector 8 flags 0
[39388.279768] Buffer I/O error on dev sda, logical block 1, async page read
[39388.536087] sd 1:0:1:0: [sda] tag#209 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39388.536094] sd 1:0:1:0: [sda] tag#209 CDB: opcode=0x28 28 00 00 00 00 18 00 00 08 00
[39388.536097] print_req_error: I/O error, dev sda, sector 24 flags 0
[39388.536100] Buffer I/O error on dev sda, logical block 3, async page read
[39388.792427] sd 1:0:1:0: [sda] tag#210 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39388.792436] sd 1:0:1:0: [sda] tag#210 CDB: opcode=0x28 28 00 00 00 00 18 00 00 08 00
[39388.792439] print_req_error: I/O error, dev sda, sector 24 flags 0
[39388.792443] Buffer I/O error on dev sda, logical block 3, async page read
[39389.048754] sd 1:0:1:0: [sda] tag#211 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39389.048763] sd 1:0:1:0: [sda] tag#211 CDB: opcode=0x28 28 00 00 00 00 38 00 00 08 00
[39389.048767] print_req_error: I/O error, dev sda, sector 56 flags 0
[39389.048771] Buffer I/O error on dev sda, logical block 7, async page read
[39389.305132] sd 1:0:1:0: [sda] tag#212 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39389.305140] sd 1:0:1:0: [sda] tag#212 CDB: opcode=0x28 28 00 00 00 00 38 00 00 08 00
[39389.305143] print_req_error: I/O error, dev sda, sector 56 flags 0
[39389.305147] Buffer I/O error on dev sda, logical block 7, async page read
[39389.561501] sd 1:0:1:0: [sda] tag#213 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39389.561509] sd 1:0:1:0: [sda] tag#213 CDB: opcode=0x28 28 00 00 00 00 78 00 00 08 00
[39389.561512] print_req_error: I/O error, dev sda, sector 120 flags 0
[39389.561516] Buffer I/O error on dev sda, logical block 15, async page read
[39389.817870] sd 1:0:1:0: [sda] tag#214 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39389.817878] sd 1:0:1:0: [sda] tag#214 CDB: opcode=0x28 28 00 00 00 00 78 00 00 08 00
[39389.817881] print_req_error: I/O error, dev sda, sector 120 flags 0
[39389.817885] Buffer I/O error on dev sda, logical block 15, async page read
[39390.074197] sd 1:0:1:0: [sda] tag#215 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39390.074206] sd 1:0:1:0: [sda] tag#215 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39390.074211] print_req_error: I/O error, dev sda, sector 0 flags 0
[39390.074215] Buffer I/O error on dev sda, logical block 0, async page read
[39390.330529] sd 1:0:1:0: [sda] tag#216 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[39390.330537] sd 1:0:1:0: [sda] tag#216 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
[39390.330541] print_req_error: I/O error, dev sda, sector 0 flags 0
[39390.330546] Buffer I/O error on dev sda, logical block 0, async page read
[39390.586850] Buffer I/O error on dev sda, logical block 0, async page read
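
For anyone reading the kernel log above: opcode 0x28 is the SCSI READ(10) command, and its CDB carries the block address in bytes 2 to 5 and the transfer length in bytes 7 to 8, both big-endian. A minimal sketch that decodes those lines follows; the helper name `decode_read10` is made up for this illustration.

```python
def decode_read10(cdb_hex):
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, blocks


lba, blocks = decode_read10("28 00 11 1f 83 98 00 00 08 00")
print(lba, blocks)  # 287277976 8

# The "logical block" in the Buffer I/O error lines appears to be counted
# in 4096-byte units (8 sectors of 512 bytes), which matches the log.
print(lba // 8)  # 35909747
```

The decoded address matches the "sector 287277976" line, so these are plain 8-sector reads failing. As far as I know, hostbyte=0x01 corresponds to DID_NO_CONNECT in the Linux SCSI headers, which would point at the drive not answering on the bus rather than at a reported medium error, but treat that reading as an assumption to verify.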

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 08:21:46 am

Quote from: edpalmer42 on May 28, 2022, 02:23:40 am

Shouldn't there be a count of power-on hours? SATA drives report that via smartctl.

Yes, here is an example: Fujitsu MHZ2120BH G1, SATA

Code: [Select]

# smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Model Family:     Fujitsu MHZ BH
Device Model:     FUJITSU MHZ2120BH G1
Serial Number:    K64PT93260WG
LU WWN Device Id: 5 00000e 043724022
Firmware Version: 00810009
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 3f
SATA Version is:  SATA 2.5, 1.5 Gb/s
Local Time is:    Sat May 28 09:07:16 2022 -00
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  487) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  69) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   046    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   030    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0003   100   100   025    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   091   091   000    Old_age   Always       -       40392
  5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   047    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   019    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   051   051   000    Old_age   Always       -       24958
 10 Spin_Retry_Count        0x0013   100   100   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2356
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       94
193 Load_Cycle_Count        0x0032   077   077   000    Old_age   Always       -       472338
194 Temperature_Celsius     0x0022   100   085   000    Old_age   Always       -       51 (Min/Max 9/63)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000f   100   100   060    Pre-fail  Always       -       0
203 Run_Out_Cancel          0x0002   100   100   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

these two would have been interesting if only they had been implemented on the SCSI disk :-//

Code: [Select]

  9 Power_On_Hours          0x0032   051   051   000    Old_age   Always       -       24958
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2356

Some disks immediately report it, some disks require "-t short" to calculate the POH value :-//
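
A minimal sketch of reading those two attributes out of the ATA attribute table and putting them in human terms. It assumes the ten-column layout shown above with a plain integer in RAW_VALUE (some drives print raw values in other formats, which this does not handle); `parse_ata_attributes` is a made-up helper name.

```python
HOURS_PER_YEAR = 24 * 365.25

# The two rows quoted above, copied from the smartctl output.
ROWS = """\
  9 Power_On_Hours          0x0032   051   051   000    Old_age   Always       -       24958
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2356
"""


def parse_ata_attributes(text):
    """Map attribute name -> raw value for rows of the ATA SMART attribute table."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # A data row has at least 10 columns and starts with the numeric attribute ID.
        if len(parts) >= 10 and parts[0].isdigit():
            attrs[parts[1]] = int(parts[9])
    return attrs


attrs = parse_ata_attributes(ROWS)
years_on = attrs["Power_On_Hours"] / HOURS_PER_YEAR
hours_per_cycle = attrs["Power_On_Hours"] / attrs["Power_Cycle_Count"]
print(f"{years_on:.2f} years powered on, {hours_per_cycle:.1f} hours per power cycle")
```

For this laptop drive that works out to a bit under three years of powered-on time, at roughly ten hours per power cycle.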

Title: Re: tools to understand when a hard drive is close to death
Post by: sokoloff on May 28, 2022, 08:35:00 am

If sold as brand new, I agree that I’d focus on the power cycle count (and hours where implemented) as the primary thrust of the argument.

You could include the other data (TBW, etc), saying that it corroborates the drives as not brand new, but I’d try to keep one clear and obvious line of reasoning that any customer service person can read and understand as “yeah, this isn’t new”.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 08:43:04 am

Quote from: tunk on May 27, 2022, 10:32:56 pm

You could also try this:
Code: [Select]

smartctl -t short /dev/sdX

Wait until finished, then run smartctl -a again and the disk run time should show up in the list of self-tests

Code: [Select]

# smartctl -t short /dev/sda

... after 5 minutes ...

Code: [Select]

# smartctl -a /dev/sda
=== START OF INFORMATION SECTION ===
Vendor:               FUJITSU
Product:              MAW3147NC
Revision:             0104
User Capacity:        147,086,327,808 bytes [147 GB]
Logical block size:   512 bytes
Rotation Rate:        10025 rpm
Serial number:        DAA0P6300NEG
Device type:          disk
Transport protocol:   Parallel SCSI (SPI-4)
Local Time is:        Sat May 28 08:41:06 2022 -00
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     28 C
Drive Trip Temperature:        65 C

Manufactured in week 10 of year 2006
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  98
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       12         0         0          0      36945.009           0
write:         0       15         0         0          0      11436.043           0

Non-medium error count:      965

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   57251                 - [-   -    -]

Long (extended) Self Test duration: 3432 seconds [57.2 minutes]

Code: [Select]

Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   57251                 - [-   -    -]

so the "brand new" disk has 57251 hours in its logs :o :o :o
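
To put that number in perspective, here is a rough sketch that pulls the LifeTime column out of the self-test row and compares it with the drive's age. It assumes the row layout shown above, and it assumes "week 10 of year 2006" can be treated as an ISO week, which is only an approximation of whatever week numbering the manufacturer uses.

```python
import re
from datetime import date

HOURS_PER_YEAR = 24 * 365.25

# Row copied from the self-test log above.
SELFTEST_ROW = "# 1  Background short  Completed                   -   57251                 - [-   -    -]"


def lifetime_hours(row):
    """Return the LifeTime (hours) column of one SCSI self-test log row."""
    match = re.match(r"^#\s*\d+\s+(.+?)\s+(\S+)\s+(\d+)\s+(\S+)\s+\[", row)
    if not match:
        raise ValueError("unrecognised self-test row")
    return int(match.group(3))


hours = lifetime_hours(SELFTEST_ROW)

manufactured = date.fromisocalendar(2006, 10, 1)  # Monday of ISO week 10, 2006 (assumption)
tested = date(2022, 5, 28)                        # date of the smartctl run above
elapsed_days = (tested - manufactured).days
duty = hours / (elapsed_days * 24)

print(f"{hours} h = {hours / HOURS_PER_YEAR:.2f} years powered on")
print(f"powered on for about {duty:.0%} of the time since manufacture")
```

So the log is consistent with a drive that ran for about six and a half years in total, roughly 40% of its existence.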

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 08:59:47 am

Stupid me, I didn't read the feedback (https://www.ebay.co.uk/fdbk/feedback_profile/apress24?filter=feedback_page%3ARECEIVED_AS_SELLER&_trksid=p2047675.m3561.l2560)

The eBay seller apress24 seems to make a habit of selling used stuff as new :o :o :o

Quote

messages
DiTBho: can you check that hard-disks haven't been used and are "new just opened for testing"?
Seller: Yes, I can confirm that they have just 1 turn on/ or even 0, depends if drive was tested or not

I trusted him! Big mistake!

Quote

eBay feedback

The seller lies in the description -> drives have 0 Power_On_Hours, but also show SMART tests that were done at 20K, 30K or 60 000 hours. So these are not NEW or OPEN BOX drives, but OLD USED drives with cleared SMART values.
HGST UltraStar 7K4000 4TB 7.2k 64MB SATA III 3.5'' HUS724040ALE641 (#175065142498)
....

S.M.A.R.T. shows the disks have been used for over 7 years, yet sold as new, seller doesn't reply to messages about the issue. Caveat emptor
Fujitsu MAX3147NC 147GB Ultra320 80-PIN 15K RPM (#172697878385)
....

BEWARE! Supposedly "NEW" HDD already showed 35817 running hours, so it was at end of life and definitely not "new".
HGST Ultrastar 7K6000 2TB 7.2K 128MB SATA III 3.5'' HUS726020ALE614
....

This item was described "as new in original packaging" on delivery, it was found to be a "pull" from an old computer. The item was returned to the seller, but he did not co-operate with the shipping company. It's been in "limbo" now for over a month. And as of today, I have not received a refund. My worst transaction on eBay since I have traded for over 18 years on eBay. Avoid this seller.
IBM DORS-32160 2.1GB 5.4k SCSI 46H6135 3.5'' S26361-H281-V100 (#173756882675)

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on May 28, 2022, 09:07:49 am

Have found the best thing to do is to pay with eBay pay directly, not through PayPal, and use a credit card or Apple Pay. eBay are mortally afraid of you raising a chargeback. Just chuck it in a box, return it and send it tracked. Ignore the seller comms entirely past telling them you are rejecting the goods.

Notably, NEVER buy anything consumable on eBay. It’s hard enough finding stuff on Amazon that isn’t used (just got an unsealed box Samsung 1TB evo plus this morning which was shipped from Amazon :palm: )

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 09:31:02 am

Quote from: sokoloff on May 28, 2022, 08:35:00 am

If sold as brand new, I agree that I’d focus on the power cycle count (and hours where implemented) as the primary thrust of the argument.

You could include the other data (TBW, etc), saying that it corroborates the drives as not brand new, but I’d try to keep one clear and obvious line of reasoning that any customer service person can read and understand as “yeah, this isn’t new”.

Yes, thank you!

I opened this topic because I needed a tool or a procedure to check the health of disks and produce a document.

I think the "smartctl -t short" is a great trick to get the "LifeTime" value of the disk reported in a log :D

This way, I will create a PDF document with all the logs from disk testing and a summary table like this

Code: [Select]

disk_ID , serial number , LifeTime    , Power_Cycle_Count , note                       ;
#7      , DAA0P6300NEG  , 57251 hours , 98                , used disk with 57251 hours ;

Pdf structure:

page01: messages with the seller, where he declared "brand new disks"
page02: summary table of tests on the received disks
page03: some feedback on eBay, proving it's his behavior pattern
page04: test procedure with some photos
page05: disk1 log
page06: disk2 log
page07: disk3 log
page08: disk4 log
page09: disk5 log
page10: disk6 log
page11: disk7 log
page12: disk8 log

It should be enough for them, I think and hope :-//
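
A minimal sketch of how the summary table described above could be generated once the values have been collected from each log, in the same comma-separated layout. The record fields and the function name `summary_rows` are made up for this illustration; the one data row is disk #7 from this thread.

```python
def summary_rows(disks):
    """Format disk records as aligned ' , '-separated lines ending in ' ;'."""
    header = ("disk_ID", "serial number", "LifeTime", "Power_Cycle_Count", "note")
    rows = [header]
    for d in disks:
        rows.append((
            f"#{d['id']}",
            d["serial"],
            f"{d['hours']} hours",
            str(d["cycles"]),
            f"used disk with {d['hours']} hours",
        ))
    # Pad every column to its widest cell so the table lines up in a monospace font.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return [
        " , ".join(cell.ljust(width) for cell, width in zip(row, widths)) + " ;"
        for row in rows
    ]


disks = [
    {"id": 7, "serial": "DAA0P6300NEG", "hours": 57251, "cycles": 98},
]
lines = summary_rows(disks)
print("\n".join(lines))
```

Adding the other seven disks is then just a matter of appending one record per disk to the list.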

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 09:59:35 am

Quote from: bd139 on May 28, 2022, 09:07:49 am

Notably, NEVER buy anything consumable on eBay

Yup, the problem in my case is where to find SCA-SCSI disks :-//

sATA and SAS disks are easier to find, but with SCSI ... well, you have to trust the seller.

Title: Re: tools to understand when a hard drive is close to death
Post by: tunk on May 28, 2022, 10:00:58 am

Out of curiosity, you could try this to see if there's any data on the disks:
od -Ad -tc /dev/sdX | more

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 10:04:40 am

Quote from: tunk on May 28, 2022, 10:00:58 am

Out of curiosity, you could try this to see if there's any data on the disks:
od -Ad -tc /dev/sdX | more

All Wiped.
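
For a check that does not rely on eyeballing the od output, a short sketch that reads a device or image in chunks and reports whether every byte is zero. It assumes "wiped" means zero-filled (a random-pattern wipe would fail this test); `is_all_zero` is a made-up helper name, and /dev/sdX in the comment is a placeholder as in the posts above.

```python
import io


def is_all_zero(stream, chunk_size=1024 * 1024):
    """Return True if every byte read from the binary stream is 0x00."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return True   # reached the end without seeing a non-zero byte
        if chunk.count(0) != len(chunk):
            return False  # at least one byte in this chunk is not zero


# Demonstration on in-memory data; on a real disk (as root) it would be:
#   with open("/dev/sdX", "rb") as dev:
#       print(is_all_zero(dev))
print(is_all_zero(io.BytesIO(bytes(4096))))
print(is_all_zero(io.BytesIO(bytes(4095) + b"\x01")))
```

The function stops at the first non-zero chunk, so a disk that still holds data is detected quickly, while a fully wiped one has to be read to the end.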

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on May 28, 2022, 11:30:30 am

Quote from: DiTBho on May 28, 2022, 09:59:35 am

Quote from: bd139 on May 28, 2022, 09:07:49 am
Notably, NEVER buy anything consumable on eBay

Yup, the problem in my case is where to find SCA-SCSI disks :-//

sATA and SAS disks are easier to find, but with SCSI ... well, you have to trust the seller.

I’d replace the device or the backplane :-DD

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 01:46:19 pm

Quote from: bd139 on May 28, 2022, 11:30:30 am

I’d replace the device or the backplane :-DD

Cables, terminators, and backplane are brand new. Professional kits.
The setup works perfectly with smaller (8GB) Fujitsu disks.

Device ... which one? These crappy disks? Sure, they are crappy used disks with hundreds of hours of usage, close to death.

In conclusion: I'd replace the seller ;D

Title: Re: tools to understand when a hard drive is close to death
Post by: Monkeh on May 28, 2022, 02:20:51 pm

I'd replace the buyer expecting to find 2006 era hardware in unused condition and believing what sellers on eBay claim.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 02:40:56 pm

Quote from: Monkeh on May 28, 2022, 02:20:51 pm

I'd replace the buyer expecting to find 2006 era hardware in unused condition and believing what sellers on eBay claim.

One month ago, for a different project, I bought seven hard drives from a different seller. We are still talking about 2004-2008 era hardware. Unfortunately they don't have SCA-SCSI disks in stock, only 68-pin (U160 and U320), but anyway I got them *EXACTLY* as described:
- low LifeTime hours (<20)
- low StartStopCycles (<10)
- zero errors

At 75 euro per disk, the price can't be justified for used disks with several hundred hours of activity, because with these values the disks are not even close to "brand new" but rather close to "end of life"!

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 02:56:55 pm

Quote from: Monkeh on May 28, 2022, 02:20:51 pm

unused condition and believing what sellers on eBay claim.

ummm, and what about new old stock, for example? Newark sold me some old boards in mint condition.
but, I seriously wonder, if that's the way to think, what are we doing on eBay? Stupid games?

I honestly don't like the way people like you think, because you assume it's okay to have shitty sellers on eBay, so the only idiot is supposed to be someone like me who trusts them!

Well, for me eBay is serious business, with serious sellers and serious buyers, hence all a matter of trust and respect.

Title: Re: tools to understand when a hard drive is close to death
Post by: Monkeh on May 28, 2022, 03:41:38 pm

Quote from: DiTBho on May 28, 2022, 02:56:55 pm

Quote from: Monkeh on May 28, 2022, 02:20:51 pm
unused condition and believing what sellers on eBay claim.

ummm, and what about new old stock, for example? Newark sold me some old boards in mint condition.
but, I seriously wonder, if that's the way to think, what are we doing on eBay? Stupid games?

I honestly don't like the way people like you think, because you assume it's okay to have shitty sellers on eBay, so the only idiot is supposed to be someone like me who trusts them!

Well, for me eBay is serious business, with serious sellers and serious buyers, hence all a matter of trust and respect.

I assume there's always someone out to scam me to make a buck, yes, it's called life experience. I never said it's okay - it's just reality.

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on May 28, 2022, 05:19:46 pm

Actually been thinking about this. Why are you buying these disks and spending €600 on the damn things?!?!? You could buy a couple of mid range 1TB SSDs and RAID those for less money. And they are far faster, have a higher MTBF, take less power and have better failure modes. :-//

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 05:42:15 pm

Quote from: bd139 on May 28, 2022, 05:19:46 pm

You could buy a couple of mid range 1TB SSDs and RAID those for less money

I have a vintage hardware project in mind and it can work with neither sATA nor SAS.
I have to play with SCSI HBAs, hence with SCSI disks.

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on May 28, 2022, 05:52:39 pm

Ah explained. Fair enough :-+

I'd have noped that idea away super quick as I am recently allergic to anything which involves friction or risk :-DD

Title: Re: tools to understand when a hard drive is close to death
Post by: james_s on May 28, 2022, 06:08:39 pm

Quote from: bd139 on May 28, 2022, 05:52:39 pm

I'd have noped that idea away super quick as I am recently allergic to anything which involves friction or risk :-DD

Is there anything in life that doesn't involve friction or risk?

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 06:10:13 pm

Quote from: bd139 on May 28, 2022, 05:19:46 pm

(SSD) [...] better failure modes

Faster, larger, cheaper ... all true, but "better failure modes"? :o :o :o

nahhhh, That's false with SSD!

When an SSD fails it simply pulls the plug, and it's gone with all its data; I described something similar in my topic about how unreliable Flash is.

It happened several times, the last one with an SSD on a MIPS laptop. In the end I gave up on resurrecting it. Luckily, I had a backup on hand, so I bought a new SSD from Amazon and restored the image.

When a SCSI drive fails ... well, my Fujitsu 2GB has its defect list half full of bad blocks, but you can still load a kernel from it (yes, I am doing it ... well, it's the only 50-pin SCSI disk I have here at the moment), and its electronics are still alive; it's just a problem with its platters. As long as the read/write heads are not completely dead, some bad blocks on the platters don't affect the entire disk.

With SSD I have always had bad luck, and on a couple of occasions all the data on the disk just vanished without any warning, and since then the disk has ZERO bytes of capacity, simply because some flash cells must be so damaged that the firmware has decided to pull the plug.

Also, SSD sucks for RAID! For a real modern RAID, you have to buy NAS-SATA disks, specifically made for NAS. I opened a discussion here on the forum about that.

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on May 28, 2022, 06:15:02 pm

I disagree. I have data from over 2000 M2 SSDs used in production datacentres. They are at least two orders of magnitude more reliable than any mechanical disk. To the point we didn't actually impact storage reliability by not bothering with RAID. All redundancy was moved to logical machine level.

As for the data disappearing I have never seen this. Not once. The only failure mode I have seen is it dropping into read only mode and that was after an insane TBW in a database server. But that's not an issue as there was a hot replica node.

It of course depends what vendor you go for. Samsung 850 series and later Evo and Pro are fine as are Hitachi enterprise at least. Others, not so much.

When you get to the TBW limit, replace it. I still have some of the exceeded TBW limit drives sitting around. They are fine for low reliability desktops (admin etc) and will probably last 3-4 years fine anyway.

Title: Re: tools to understand when a hard drive is close to death
Post by: wraper on May 28, 2022, 06:20:03 pm

I had an NVMe SSD failure in my computer. It dropped into read only mode, which allowed me to slowly read all of the data with no corruption as far as I'm aware.

Title: Re: tools to understand when a hard drive is close to death
Post by: Monkeh on May 28, 2022, 06:24:19 pm

I've seen one or two SSDs just up and vanish. It's not common these days, but it happens. I've had several fail with data corruption, most are after 2-3x their minimum TBW spec on very low cost drives, but I've seen a few lemons which are just plain faulty.. Ah yes, this Micron 1100 is one of those. I'd plug it in for the relevant stats but my USB adapter mysteriously karked it since I last needed it.

I've also got some junk 'industrial' SSDs which just corrupt data in the background, and one which at a random point around a month of uptime, will just cease operating and require a power cycle - that was fun to pin.

They seem mostly reliable, but the lower cost TLC and QLC units unsurprisingly have the higher failure rates IME, and I feel like a lot of them are firmware bugs than actual failure of the flash.

Title: Re: tools to understand when a hard drive is close to death
Post by: wraper on May 28, 2022, 06:28:19 pm

Quote from: Monkeh on May 28, 2022, 06:24:19 pm

They seem mostly reliable, but the lower cost TLC and QLC units unsurprisingly have the higher failure rates IME, and I feel like a lot of them are firmware bugs than actual failure of the flash.

Yes AFAIK most of the faults are SSD locking out due to firmware freaking out, no actual hardware failure. Was especially prominent with early SSDs. If you have the right tools they can be restored to working condition.

Title: Re: tools to understand when a hard drive is close to death
Post by: Monkeh on May 28, 2022, 06:32:28 pm

Quote from: wraper on May 28, 2022, 06:28:19 pm

Quote from: Monkeh on May 28, 2022, 06:24:19 pm
They seem mostly reliable, but the lower cost TLC and QLC units unsurprisingly have the higher failure rates IME, and I feel like a lot of them are firmware bugs than actual failure of the flash.
Yes AFAIK most of the faults are SSD locking out due to firmware freaking out, no actual hardware failure. Was especially prominent with early SSDs. If you have the right tools they can be restored to working condition.

I find the right tool is the warranty and backups, myself. If it's already out of warranty it wasn't something I planned on lasting anyway.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 07:08:54 pm

Synology NAS does not suggest using SSD for storage, it uses up to eight SATA disks.
It can optionally also use two SSDs but only for parity.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 10:24:15 pm

So, thanks to the smartctl -t short trick, I automated it with badblocks and completed the tests; these are the results:

Defective disks purchased from Apress24
Model FUJITSU MAW3147NC, SCSI SCA, 147 GB

Code: [Select]

Disk , Serial Number , Version , LifeTime    , note
#1   , DAA0P76047MH  , "0104"  , 53439 hours
#2   , DAA0P7704G9V  , "0104"  , 58667 hours , Badblocks blocks after 40 min
#3   , DAF4P7400M8S  , "3701"  , 52228 hours , 2K blocks are Defective
#4   , DN00P820195T  , "0104"  , 39292 hours
#5   , DAA0P64013V3  , "0104"  , 37093 hours
#6   , na            , na      , na          , the disk is dead, doesn't respond
#7   , DAA0P6300NEG  , "0104"  , 57251 hours
#8   , DAA0P6801RCS  , "0104"  , 31692 hours

50,000 hours means almost 6 years of continuous operation (and #2, at 58,667 hours, is close to 7) .... :o :o :o

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 28, 2022, 10:36:14 pm

Quote

Service life
The service life is depending on the environment temperature. Therefore, the user must design the
system cabinet so that the average DE surface temperature is as low as possible.
+DE surface temperature: 40°C or less 5 years
+DE surface temperature: 41°C to 45°C 4.5 years

50,000 hours -> past the 5-year (43,800 hours) service life -> close to death.
Indeed three of eight units already manifest failures.
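
To put numbers on that, here is a rough sketch in C. It assumes the drives ran 24/7 and takes the 5-year figure (DE surface at 40°C or less) from the datasheet quote above; the function names are made up for this example and do not come from any existing tool.

```c
/* Hours in one year of continuous (24/7) operation. */
#define HOURS_PER_YEAR (24.0 * 365.0)

/* Convert a SMART power-on-hours counter into years of 24/7 operation. */
double power_on_years(unsigned long power_on_hours)
{
    return (double)power_on_hours / HOURS_PER_YEAR;
}

/* Return 1 if the counter exceeds the rated service life, 0 otherwise.
   service_life_years is whatever the datasheet states (assumed 5.0 here). */
int past_service_life(unsigned long power_on_hours, double service_life_years)
{
    return power_on_years(power_on_hours) > service_life_years;
}
```

By this measure disks #1, #2, #3 and #7 from the table above are already past the 5-year figure (43,800 hours), while #4, #5 and #8 sit at roughly 3.6 to 4.5 years.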

Title: Re: tools to understand when a hard drive is close to death
Post by: james_s on May 29, 2022, 07:59:24 pm

Quote from: wraper on May 28, 2022, 06:20:03 pm

I had an NVMe SSD failure in my computer. It dropped into read only mode, which allowed me to slowly read all of the data with no corruption as far as I'm aware.

I have seen this several times with workstations used at a local business. For whatever reason that prevented Windows 10 from booting but I was able to copy it to a new drive and it was back up and running. It confused me the first time I encountered it, I could boot off a rescue image and see the data, I could run error checks and everything looked fine, but whenever I tried to change something the change wouldn't stick. After that it was easy to identify the symptom.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on May 30, 2022, 10:28:33 am

No one in that company(1) has yet apologized for the issue they caused, nor have they replied to emails. PayPal is handling everything, and today's email from them looks like good news.

Quote

You’ve received a full refund. To receive the refund, you need to send the item back to the seller.

I am preparing a return package for UPS; however, due to this public feedback on eBay

Quote

This item was described "as new in original packaging" on delivery, it was found to be a "pull" from an old computer. The item was returned to the seller, but he did not co-operate with the shipping company. It's been in "limbo" now for over a month. And as of today, I have not received a refund. My worst transaction on eBay since I have traded for over 18 years on eBay. Avoid this seller.
IBM DORS-32160 2.1GB 5.4k SCSI 46H6135 3.5''

I have also already informed my lawyer, just in case these dudes will not co-operate with the shipping company.

We will see. I will not publish more on this, until the conclusion of this sad story.

(1) aPress Pawel Sinkiewicz (https://apress24.pl/pl), Damian Odulinski
Located in Poland, Gospodarcza 9 Lubuskie Żary 68-200 PL (public address of the company)
They also sell on eBay with the nickname apress24

Title: Re: tools to understand when a hard drive is close to death
Post by: DavidAlfa on June 02, 2022, 08:38:27 am

Whenever you suspect something is wrong with the HDD, especially if you hear any strange noise pattern coming from it, don't leave it all to SMART; it doesn't always report a bad status.
Download hdd scan (https://hddscan.com/) and run a full verification pass (it won't destroy data).
If you see zones where it stalls for a bit (>500ms), think about migrating your data to a new disk.
If it marks some blocks as bad, hurry! Hopefully the damage will affect only OS files, nothing critical.
Used this method for years, it was one of the first tests when a customer reported random hangs, stalling, freezing... It did the job great.
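
For anyone who cannot run hdd scan on their platform, the same idea can be sketched in portable POSIX C: read every block, time each read, and count the ones that stall or fail. This is only an illustration, not how hdd scan itself works; the names and the 500 ms constant are taken from the rule of thumb above or invented for the example. On a real device you would open it read-only and probably bypass the page cache (for example with O_DIRECT and an aligned buffer), which is left out here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define STALL_MS 500L   /* stall threshold, from the rule of thumb above */

typedef struct {
    unsigned long blocks;  /* blocks attempted */
    unsigned long slow;    /* reads that took longer than STALL_MS */
    unsigned long failed;  /* reads that errored or came back short */
} scan_stats_t;

/* Milliseconds elapsed between two CLOCK_MONOTONIC timestamps. */
long elapsed_ms(const struct timespec *t0, const struct timespec *t1)
{
    return (long)(t1->tv_sec - t0->tv_sec) * 1000L
         + (t1->tv_nsec - t0->tv_nsec) / 1000000L;
}

/* Read total_bytes from fd in block_size chunks, timing every read.
   Nothing is written, so the data on the disk is left untouched. */
void scan_read_only(int fd, unsigned char *buf, size_t block_size,
                    off_t total_bytes, scan_stats_t *st)
{
    struct timespec t0, t1;
    ssize_t n;
    off_t off;

    st->blocks = st->slow = st->failed = 0;
    for (off = 0; off + (off_t)block_size <= total_bytes;
         off += (off_t)block_size) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        n = pread(fd, buf, block_size, off);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        st->blocks++;
        if (n != (ssize_t)block_size)
            st->failed++;               /* unreadable block */
        else if (elapsed_ms(&t0, &t1) > STALL_MS)
            st->slow++;                 /* readable, but only after a stall */
    }
}
```

A non-zero `failed` count is the "hurry!" case; a non-zero `slow` count is the "think about migrating" case.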
Avoid tools like HDD Regenerator. I've tried it several times: it seems to work, so you store 500GB of data on the disk thinking it's fixed, but when you try to read it back 2 months later, Samuel L. Jackson appears, surprise MF!! Cyclic error, your data is screwed up!

Title: Re: tools to understand when a hard drive is close to death
Post by: Jeroen3 on June 02, 2022, 09:16:57 am

I have used HD Sentinel in the past, it keeps track of changes in smart data and warns you when numbers start to look bad.
On today's high density disks one or two bad sectors are not something to panic about. You panic when the number increases.
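
The "panic when it increases" rule is easy to express in code if you keep the previous reading around. A minimal sketch in C, with hypothetical type and function names (this is not HD Sentinel's logic, just the comparison described above):

```c
/* One SMART reading, taken at some point in time. */
typedef struct {
    unsigned long power_on_hours;
    unsigned long bad_sectors;  /* reallocated sectors, or grown defects on SCSI */
} smart_sample_t;

typedef enum { TREND_STABLE, TREND_GROWING, TREND_INVALID } trend_t;

/* Compare two readings of the same disk, older first. */
trend_t bad_sector_trend(const smart_sample_t *older,
                         const smart_sample_t *newer)
{
    if (newer->power_on_hours < older->power_on_hours)
        return TREND_INVALID;   /* samples swapped, or counters were reset */
    if (newer->bad_sectors > older->bad_sectors)
        return TREND_GROWING;   /* this is the case to worry about */
    return TREND_STABLE;        /* a constant non-zero count is tolerated */
}
```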

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 02, 2022, 06:05:02 pm

Quote from: DavidAlfa on June 02, 2022, 08:38:27 am

Download hdd scan (https://hddscan.com/) and run a full verification pass

Thanks for the link :D

I need to compile it for non x86 servers. Yes, I can move the disks, but I'd rather avoid that.
If that's not possible, I'll use hdd-scan, as you suggest.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 02, 2022, 06:09:45 pm

Quote from: Jeroen3 on June 02, 2022, 09:16:57 am

I have used HD Sentinel in the past, it keeps track of changes in smart data and warns you when numbers start to look bad.

hdsentinel (https://www.hdsentinel.com) :o :o :o

looks great! Is there anything similar but OpenSource? So I can compile it for non-x86 computers.

Title: Re: tools to understand when a hard drive is close to death
Post by: Jeroen3 on June 02, 2022, 07:37:14 pm

It is available for a few arm architectures on linux (https://www.hdsentinel.com/hard_disk_sentinel_linux.php).

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 03, 2022, 09:36:24 am

Quote from: Jeroen3 on June 02, 2022, 07:37:14 pm

It is available for a few arm architectures on linux (https://www.hdsentinel.com/hard_disk_sentinel_linux.php).

I need it for POWER10 :o :o :o

Title: Re: tools to understand when a hard drive is close to death
Post by: BradC on June 03, 2022, 10:07:46 am

smartmontools

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 03, 2022, 10:12:20 am

Looking for other sellers, and I am willing to guide (teach?) them how to use a Linux computer with a SCSI HBA to test disks.

My typical question

Quote

are your hard-drives really brand new, opened only for testing?
How many PowerOn_hours do they have?
(just to avoid misunderstanding)

Typical answer

Quote

sorry we can not test the HDDs

(so how can they be "used only for a few hours, only for testing"?!? HOW?!?)

Quote

sorry we don't have any equipment to test the HDDs

(so how can they be "used only for a few hours, only for testing"?!? HOW?!?)

I am massively using Amazon now
- order
- test(1)
- keep|return

(1) I am working on a C program that integrates
- SCSI query, to directly get access to the disk_serial_number
- smart queries (smartmontools is written in C++, my version is pure C code)
- "badblocks" (Linux tool) functionalities

All in one program, written in portable C/89, able to be compiled on Linux k2.6.19 ... k5.19

Code: [Select]

boolean_t is_ok_level1
(
    p_disk_t p_disk
)
{
    boolean_t ans;
    boolean_t is_ok;

    smart_short_test(p_disk); /* it does smartctl -t short, and waits for test completion */
    smart_lifetime_get(p_disk);

    is_ok = True;
    is_ok = ((is_ok) AND (p_disk->lifetime < 100)); /* less than 100 hours */
    /* other checks?!? for the health conditions, hence of acceptability, of the hard disk? */
    /* is_ok = ... */
    ans = is_ok;

    return ans;
}

boolean_t is_ok_level2
(
    p_disk_t p_disk
)
{
    boolean_t ans;

    ans = disk_blocks_is_ok(p_disk); /* it does the same as badblocks -w ... -p1 */
    return ans;
}

...
    is_disk_ok = False;
    if (is_ok_level1(p_disk))
    {
        if (is_ok_level2(p_disk))
        {
            is_disk_ok = True;
        }    
    }

    if (is_disk_ok)
    {
        hinv_keep_it(p_disk);
    }
    else
    {
        hinv_return_it(p_disk);
    }
...

Disks are marked with a progressive number { #001, #002, ... }, which is automatically associated with the disk serial_number and its log. hinv_return_it() does nothing but prepare a text file listing every "#disk_number" that I have to drop off at the Amazon hub.

hinv_keep_it() prepares a similar file (disk_inventory.txt) listing the disks that are good enough for hobby projects, not only for me, but also for my three friends ;D

Title: Re: tools to understand when a hard drive is close to death
Post by: Ed.Kloonk on June 03, 2022, 10:51:18 am

Which SCSI interface/adapter are you running?

Back in the mid '90s, I had SCSI II gear. Big HDD and Tape backup. Started out with a ISA -> SCSI adapter (adaptec?). Whilst it worked fine on 16-bit winders, the win 95 drivers or implementation was cancer. It was hogging the DMA and causing all sorts of mischief until I got the fancy PCI -> SCSI (1540?) adapter. Happy days.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 03, 2022, 12:11:27 pm

Quote from: Ed.Kloonk on June 03, 2022, 10:51:18 am

Which SCSI interface/adapter are you running?

- LSI PCI-X SCSI U320 HBA (brand new)
- Amphenol LVD 2m cable (brand new)
- Amphenol U320 LVD terminator (brand new)
- Rax 3xSCA disks with temperature control (brand new)

Why do you ask this? Assuming the kernel driver is working fine(2), if { HBA, cable, terminator } is the problem, then you should see the "non-medium error" value increasing.

It didn't happen with the new setup, hence the LVD-setup is fine; indeed, there are Fujitsu 10K rpm and Seagate 15K rpm disks that have perfectly passed all the "badblocks" tests (8 hours burn-in) :o :o :o

Good point, however ;D

I will add a function to check the "non-medium error" value, in order to stop the test program if the value is seen to increase due to a bad physical SCSI configuration rather than due to a physical problem(1) with the disk under testing.
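
A first sketch of that check, assuming the counters are sampled once before and once after the badblocks run (how they are read from the disk is not shown, and all names here are placeholders, not the real ones from my program):

```c
typedef enum {
    VERDICT_PASS,         /* no new errors of either kind */
    VERDICT_DISK_FAIL,    /* new uncorrected errors while the bus stayed quiet */
    VERDICT_BUS_SUSPECT   /* non-medium errors grew: check HBA/cable/terminator */
} verdict_t;

typedef struct {
    unsigned long non_medium_errors;
    unsigned long uncorrected_errors;
} error_counters_t;

/* Compare the counters sampled before and after a test run. */
verdict_t judge_run(const error_counters_t *before,
                    const error_counters_t *after)
{
    /* A growing non-medium count points at the physical SCSI setup, so the
       run says nothing reliable about the disk itself: stop and fix the bus. */
    if (after->non_medium_errors > before->non_medium_errors)
        return VERDICT_BUS_SUSPECT;
    if (after->uncorrected_errors > before->uncorrected_errors)
        return VERDICT_DISK_FAIL;
    return VERDICT_PASS;
}
```

The bus check comes first on purpose: a disk should not be sent back because of a bad terminator.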

(1) physical problems that I cannot test directly appear on the SCSI interface (hence to my testing program) like a communication/service disruption

worn out bearings -> read/write delayed corrections or I/O abort
worn out read/write heads -> read/write delayed corrections or I/O abort
worn out brushless motor -> read/write delayed corrections or I/O abort
worn out SCA connector -> wrong "tag phase" reported by the Linux kernel, disk not recognized, channel too noisy with too many retries (you get exactly this symptom if you use a bad cable or a bad SCSI terminator)
worn out electronic board with semi-fried chips -> read/write delayed corrections or I/O abort

edit:
Yup, I also have to write a function to monitor what the kernel complains about. Not yet done.

(2) there are certain SCSI devices { CDROM, DVDRAM, MO, CD-Jbox, Tape{DDS, LTO, ...}, Scanner } that have SCSI-quirks, but I have never observed anything similar with SCSI disks.

When a quirk arises, you see the kernel complain about phases and tags, with a lot of verbosity.
If you don't see it, it's mostly 99.97% ok.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 03, 2022, 12:23:24 pm

(
What I really find annoying ... sATA/SAS and SCSI disks have a different way to respond to queries.

When you simply want to read the serial number, you have to write two different pieces of code, one for SCSI and one for sATA/SAS; the same applies to every other variable whose value you may want to read.

smartmontools, somehow, hides this under the same user interface
)
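
For the SCSI side, the serial number comes back in the "Unit Serial Number" VPD page (INQUIRY with EVPD set, page code 0x80). A small sketch of the parsing step only, assuming the raw response is already in a buffer; the function name is made up, and issuing the command itself (e.g. through the Linux SG_IO ioctl) is not shown:

```c
#include <stddef.h>

/* Extract the serial number from a Unit Serial Number VPD page (0x80).
   Returns 0 on success, -1 if the buffer does not look like that page. */
int vpd80_serial(const unsigned char *page, size_t page_len,
                 char *out, size_t out_size)
{
    size_t len, i, n = 0;

    if (page_len < 4 || page[1] != 0x80 || out_size == 0)
        return -1;

    len = page[3];              /* PAGE LENGTH: bytes after the 4-byte header */
    if (len > page_len - 4)
        len = page_len - 4;     /* response was truncated by the caller */

    i = 0;
    while (i < len && page[4 + i] == ' ')
        i++;                    /* skip leading padding */
    for (; i < len && n + 1 < out_size; i++)
        out[n++] = (char)page[4 + i];
    while (n > 0 && (out[n - 1] == ' ' || out[n - 1] == '\0'))
        n--;                    /* trim trailing padding */
    out[n] = '\0';
    return 0;
}
```

On ATA the serial lives somewhere else entirely (IDENTIFY DEVICE data, words 10 to 19, with the two characters of each word swapped), which is why a second code path is needed. As far as I know, SAS disks answer the same SCSI INQUIRY as parallel SCSI ones, so the ATA path is the odd one out.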

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 06, 2022, 09:58:32 am

Code: [Select]

    is_ok = True;
    is_ok = ((is_ok) AND (p_disk->lifetime < 100)); /* less than 100 hours */
    /* other checks?!? for the health conditions, hence of acceptability, of the hard disk? */
    /* is_ok = ... */

If you think this is a pretty weak acceptance test, please let me know other S.M.A.R.T. things to look at :o :o :o

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on June 06, 2022, 10:02:42 am

Code: [Select]

is_ok &= bought_it_new_from_respectable_distributor && didnt_turn_up_loose_in_jiffy_bag;

:-DD

Title: Re: tools to understand when a hard drive is close to death
Post by: Karel on June 06, 2022, 10:14:50 am

Quote from: james_s on May 28, 2022, 06:08:39 pm

Quote from: bd139 on May 28, 2022, 05:52:39 pm
I'd have noped that idea away super quick as I am recently allergic to anything which involves friction or risk :-DD

Is there anything in life that doesn't involve friction or risk?

Marriage? 8)

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 06, 2022, 11:21:23 am

Quote from: bd139 on June 06, 2022, 10:02:42 am

Code: [Select]
is_ok &= bought_it_new_from_respectable_distributor && didnt_turn_up_loose_in_jiffy_bag; :-DD

new is_OK_conditions added :o :o :o

... but is_respectable_distributor() needs more study; look at the conversation with the seller: it's not enough to trust someone's words

[attachimg=1]

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on June 06, 2022, 12:41:54 pm

That seller's email is hilarious. He either doesn't know or is lying. Either way incompetence is not an excuse :palm:

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 06, 2022, 12:47:11 pm

Quote from: bd139 on June 06, 2022, 12:41:54 pm

That seller's email is hilarious. He either doesn't know or is lying. Either way incompetence is not an excuse :palm:

And he didn't reply to further emails.
People like him are a plague on e-commerce.

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on June 06, 2022, 12:47:47 pm

Just remember that bad reviews travel faster than the speed of light on the internet :-DD

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 06, 2022, 12:48:14 pm

Anyway, discussing the development of the disk testing program, there are other interesting fields that may be worth considering

accumulated start-stop cycles (currently ignored)
total errors corrected, correction algorithm invocations (currently ignored)
total uncorrected errors (currently ignored)
elements in grown defect list (currently ignored)

In this case, I need trigger values :o :o :o

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on June 06, 2022, 12:54:19 pm

I would throw something like the following in. That gives a chance for the vendor to actually test the disk before sending it out as it's probably not shipped from the manufacturer.

Start-stop < 20

corrected errors > 10

uncorrected errors > 0

grown defect list > 0

Also to note I had some problems with actual HPE shipped SAS disks a couple of years back. The bearings seized after a few power cycles because they were stuck in a warehouse for half a decade. So storage is a consideration that needs to be made too. This put me right off any mechanical disks. In fact we have no mechanical disks at all now.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 06, 2022, 01:34:29 pm

I will add your acceptance criteria

Quote from: bd139 on June 06, 2022, 12:54:19 pm

Start-stop < 20
corrected errors > 10
uncorrected errors > 0
grown defect list > 0

Thanks, this way I can set a solid QA level of acceptance ;D
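
Roughly how the level-1 check could look with these numbers plugged in. I am reading "Start-stop < 20" as a condition to pass and the three ">" lines as rejection triggers; that reading, like the struct and function names below, is my assumption, and the 100-hour limit is the one from my earlier snippet.

```c
/* Values collected from the disk's SMART/log pages before the burn-in. */
typedef struct {
    unsigned long lifetime_hours;
    unsigned long start_stop_cycles;
    unsigned long corrected_errors;
    unsigned long uncorrected_errors;
    unsigned long grown_defects;
} disk_report_t;

/* Return 1 if the disk can be accepted as "new", 0 if it should go back. */
int is_acceptable_as_new(const disk_report_t *r)
{
    if (r->lifetime_hours >= 100)    return 0;  /* more than a test run */
    if (r->start_stop_cycles >= 20)  return 0;  /* Start-stop < 20 */
    if (r->corrected_errors > 10)    return 0;  /* corrected errors > 10 */
    if (r->uncorrected_errors > 0)   return 0;  /* uncorrected errors > 0 */
    if (r->grown_defects > 0)        return 0;  /* grown defect list > 0 */
    return 1;
}
```

Keeping one early return per criterion makes it easy to log which rule rejected the disk later on.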

Hope to have enough time to also support SAS, sATA, and FC disks (they have different queries).
Maybe I will release the final program as open source, once it is polished and complete.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 06, 2022, 01:43:38 pm

Quote from: bd139 on June 06, 2022, 12:54:19 pm

stuck in a warehouse for half a decade. So storage is a consideration that needs to be made too

I can consider this only by monitoring the phase-tag errors reported by the kernel.

It only works for SCSI. It's not rocket science, but it can somehow summarize some possible problems related to the aging of components: oxidation of connectors, or deterioration of electronics or heads.
They can manifest in a sneaky way, not enough to cause a failure, but enough to cause too many retries, and in this case the kernel can report the problem via the SCSI verbose module.

I will add it soon :D

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on June 06, 2022, 01:45:50 pm

Worth pointing out active monitoring here.

Last place I worked were using this: https://github.com/prometheus-community/smartctl_exporter

That then gets scraped by prometheus and alerted on by alertmanager. So when the DB SQL write master disks finally gave out, it'd autonomously wake up a troll who went and failed the node over and swapped the disk out.

But they moved to EC2 / EBS where disk failure is Amazon's problem as it's a logical volume abstraction :-DD

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 06, 2022, 05:55:04 pm

Quote from: bd139 on June 06, 2022, 01:45:50 pm

Worth pointing out active monitoring here.
Last place I worked were using this: https://github.com/prometheus-community/smartctl_exporter

Yup, that is step3

step1: acquire disks
step2: put disks inside the Rack
step3: monitor disks

it's interesting code, unfortunately written in GoLang, a language that I cannot use because there is no Clang support for { HPPA2 , MIPS }.

I will study the code, anyway. Thank you for posting ;D

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on June 06, 2022, 06:54:13 pm

You love to hurt yourself don't you :-DD

SNMP + smartctl hackery it is :-DD

Title: Re: tools to understand when a hard drive is close to death
Post by: MrMobodies on June 06, 2022, 07:57:21 pm

~~DiTBho where did you get it from?~~ Sorry I didn't see there were three pages already.

I take pictures from the time I receive stuff to the testing. For something like that I'd start a surface scan with something like MHDD. I have a machine dedicated just for that for many years with a Scsi controller that works with it.

https://hddguru.com/software/2005.10.02-MHDD/

No I don't need to test it if there are signs of use but I do it to add weight to my argument of how used it is and I had a quite a few that were very well used so they can't come close to being described as "new". I expect it to help speed up a returns delivery form without any arguments and to eliminate having to hear the following claptrap excuses such as; well I have used it a little here and there, it's an ex demo etc...

Title: Ebay scammer Apress24 Paweł Sinkiewicza selling used drives as brand new
Post by: MrMobodies on June 06, 2022, 08:24:55 pm

Quote from: Ed.Kloonk on June 03, 2022, 10:51:18 am

Which SCSI interface/adapter are you running?

Back in the mid '90s, I had SCSI II gear. Big HDD and Tape backup. Started out with a ISA -> SCSI adapter (adaptec?). Whilst it worked fine on 16-bit winders, the win 95 drivers or implementation was cancer. It was hogging the DMA and causing all sorts of mischief until I got the fancy PCI -> SCSI (1540?) adapter. Happy days.

I still have an old RAID system like this intact and stored away. I got them in 2006. It is in a Coolermaster stacker case with a Tyan S2720 serverboard, Dual Xeon 3Ghz and 6gb ram with a Megaraid Enterprise 1600 (128mb battery backed up sdram) SCSI 2 with 6x 32gb Fujitsu 15000rpm drives in raid 5 and many cooling fans in the front to cool the drives. I got the board and controller second hand but I think I got the drives new old stock in 2007 for £30 each. I had some IDE drives in there too and was using it as a file server and to run experiments on. Seemed quite reliable over the years I was using it. I did have two Seagates in there, both 36gb 15,000rpm, as part of the raid at the beginning but they kept on going offline and causing all sorts of problems. When I tested the Seagates individually they seemed okay, and when used alone they are okay, so I find that strange despite them being the same specification. I stopped using it in 2017 when I bought an HP DL380 G5. That one has 6x 72gb scsi3 drives in raid 5. Two of the drives were failing but I picked a couple up for £10 each.

Just looking up the name on the email: according to his LinkedIn profile at the company he works for, it looks like he's a salesman:
https://www.linkedin.com/in/damian-oduli%C5%84ski-1b0814135/

Quote

Damian Oduliński
CIO aPRESS Paweł Sinkiewicz
Sales Manager
aPRESS Paweł Sinkiewicza
Apr 2014 - Present · 8 yrs 3 mos
Żary, woj. lubuskie, Polska

I just found the sellers profile:
https://www.ebay.co.uk/usr/apress24

Not looking good.
Other buyers have complained about being sold used as new:

Quote

Selling hard drives as new even when they have more than 9000 hours on them. Mine even has the previous owners data on it! Got eBay involved for deceptive listings and ended up getting a full refund. Seller doesn't reply to messages. Give this company a hard pass! Should be ashamed of themselves.
QUANTUM FIREBALL CR 6.4GB 5.4K ATA 3.5'' CR64A02H 655-0695 (#173772525138)
Buyer: o***y (139) £55.00 Past month

Quote

liar the HDD not working kłamca
SAMSUNG SpinPoint T166 500GB 7.2K 16MB SATA II 3.5'' HD501LJ (#173793516446)
Buyer: i***o (84) £33.00 Past month

Quote

The seller lies in the description -> drives have 0 Power_On_Hours, but also show SMART tests that were done at 20K, 30K or 60 000 hours. So these are not NEW or OPEN BOX drives, but OLD USED drives with cleared SMART values.
HGST UltraStar 7K4000 4TB 7.2k 64MB SATA III 3.5'' HUS724040ALE641 (#175065142498)
Buyer: n***o (102) US $600.00 Past 6 months

Quote

S.M.A.R.T. shows the disks have been used for over 7 years, yet sold as new, seller doesn't reply to messages about the issue. Caveat emptor
Fujitsu MAX3147NC 147GB Ultra320 80-PIN 15K RPM (#172697878385) Buyer: k***k (79) £46.00 Past 6 months
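
The third complaint above points at a cheap consistency check: if the self-test log holds entries stamped at a lifetime later than the current power-on hours, the counter has probably been cleared. A rough sketch in C, with hypothetical names and assuming both values have already been read from the drive (it ignores any wrap-around of the logged hours):

```c
#include <stddef.h>

/* Return 1 if any logged self-test claims to have run at a lifetime
   greater than the power-on hours the drive reports now, 0 otherwise. */
int counters_look_reset(unsigned long power_on_hours,
                        const unsigned long *selftest_hours, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (selftest_hours[i] > power_on_hours)
            return 1;   /* a test cannot have run "in the future" */
    }
    return 0;           /* consistent, or no self-tests were logged */
}
```

An empty self-test log proves nothing either way, so this only catches sloppy resets.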

Just found their ebay account is in his name:
https://www.ebay.co.uk/usr/apress24
(https://i.imgur.com/dmbArS1.jpg)

Quote

aPRESS - We sell mainly disks to servers and server components.
We provide the highest quality products for a decent price! - We ship all over the world.
===== FAST SHIPPING COURIER UPS ===== WRITTEN GUARANTEE ===== INVOICE FOR EACH PURCHASE =====

Business details:
Business name:   aPress Paweł Sinkiewicz
First name:   Paweł
Last name:   Sinkiewicz
Address:   Ul. Gospodarcza 9
68-200
Żary
PL
Phone:   48684539221
Email:   ebay@apress24.pl

OOps 12 months:
2,570 Positive
27 Neutral
36 Negative

So they have been going since 2011. Top seller rated. If it is true they have been selling all used/damaged drives as brand new look at the amount of people they scammed so far, 2,570.
Many people may not know what to look out for, or may not realize a drive is used as long as it works. I have seen this sort of thing before, and even sellers that go to lengths to have negative feedback removed.

If you look at a few of their listings, all I see are what look like STOCK photos, and I'd be wary when they do that.
I'd only accept stock photos when I am buying from somewhere like Ebuyer or Scan, or somewhere I know is reputable and gets them direct from the manufacturer.

https://www.ebay.co.uk/sch/apress24/m.html?_nkw=&_armrs=1&_ipg=&_from=

https://www.ebay.co.uk/itm/173483626670
Wait a minute, looking at the decorations, the large stock photos and the affiliates.
(https://i.imgur.com/uaYoFOB.jpg)
I suppose I could also get fooled into thinking I am getting old stock or stuff directly from the manufacturers with these people.

Title: Re: Ebay scammer Apress24 Paweł Sinkiewicza selling used drives as brand new
Post by: DiTBho on June 06, 2022, 10:15:28 pm

Quote from: MrMobodies on June 06, 2022, 08:24:55 pm

32gb Fujitsu 15000rpm drives in raid 5 and many cooling fans in the front to cool the drives. I got the board and controller second hand but I think I got the drives new old stock in 2007 for £30 each

You were lucky. I paid 600 euros for 8 disks (~75 euro/disk), and wasted another 32 euros to return them to the seller via UPS because I have a deadline to meet; I hope Paypal will refund me, and I have yet to find some good SCA disks.

Moral of the story: although it is a very frustrating situation, the upside is that at least it motivated me to write a test program, which is a very educational experience, and very useful :)

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 09, 2022, 04:27:30 pm

I shipped the disks back to the seller, and he (aPress Paweł Sinkiewicz) has escalated the dispute to a claim.
They are definitely not serious! Paypal is now reviewing the information they provided, which means the claim may take longer than 30 days to be resolved.

Title: Re: tools to understand when a hard drive is close to death
Post by: bd139 on June 09, 2022, 05:07:55 pm

PayPal will review in your favour. I have had this a few times with sellers. They are usually quite quick as well, i.e. within a couple of days.

If they work out they’re not making any money out of him after the refund or his bank doesn’t honour the refund transaction they will freeze him, close his account instantly and go “lol” and he’s out of business.

Title: Re: tools to understand when a hard drive is close to death
Post by: DiTBho on June 15, 2022, 11:25:47 am

Quote

Case closed in your favor
We reviewed the case you filed on 29 May 2022 and have decided in your favor.
We've issued a refund to you on . It may take up to 5 days for this refund to be reflected on your PayPal account or bank. If you paid using a credit or debit card, the money will be refunded to your card. Depending on your card issuer, it can take up to 30 days for the refund to appear on your card statement.

It took me a bottle of good wine to persuade my lawyer friend to write and send a letter of judicial formal notice, but it's closed with 100% money back :-+

Title: Re: tools to understand when a hard drive is close to death
Post by: Ed.Kloonk on June 15, 2022, 11:29:56 am

Quote from: DiTBho on June 15, 2022, 11:25:47 am

It took me a bottle of good wine to persuade my lawyer friend to write and send a letter

I can sympathize. Sometimes you have to be as drunk as them to be able to communicate with them.

;)
