Author Topic: what do you use to test harddrive, SSD, and NvME units?  (Read 1858 times)

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4217
  • Country: gb
what do you use to test harddrive, SSD, and NvME units?
« on: February 07, 2021, 02:12:46 pm »
Last week I tested a lot of 160GB hard-drives with the program "badblocks" and it took 1038 minutes  :o :o :o

Code: [Select]
# time badblocks -swv /dev/sdb
Checking for bad blocks in read-write mode
From block 0 to 156290903
Testing with pattern 0xaa: done, Reading and comparing: done
Testing with pattern 0x55: done, Reading and comparing: done
Testing with pattern 0xff: done, Reading and comparing: done
Testing with pattern 0x00: done, Reading and comparing: done
Pass completed, 0 bad blocks found.

1037m58.106s

So I wrote my own testing program, simpler than badblocks (only 2 passes) and it took 760 minutes, still "a lot of time", but less than badblocks.

Code: [Select]
# time ./disktest /dev/sdb
 disk_size=160041885696
 testing 19536363 slices of 8192 byte each
 Completed, 0 defect found.

759m59.631s

The USB2 High-Speed link can sustain about 40 Mbytes/s (measured with time dd if=/dev/sdb of=/dev/null), but for some reason badblocks is slower. OK, it has to write/read/write/read/write/read each block 3 times, but for 160 Gbytes it takes 17 hours, ~2.5 Mbytes/s

Weird, and bad  :o


Is there anything faster?
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline tunk

  • Super Contributor
  • ***
  • Posts: 1044
  • Country: no
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #1 on: February 07, 2021, 03:54:28 pm »
It runs four write/read passes, so your average write/read speed is around 20MB/s.
You could try the -b and -c options to see if that speeds it up, i.e. raising them above their defaults.
There's also the -t option if you want one (or several) pass(es).
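
As a rough check of those figures, the arithmetic can be sketched in C. The helper name effective_mb_per_s is made up for illustration. The byte count and run times are the ones posted above; the pass counts (8 for badblocks -w, i.e. four patterns each written and read back, and 3 for disktest, i.e. two writes plus one read) are assumptions based on the posts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: average device throughput in MB/s (10^6 bytes/s),
 * given the device size, the number of full passes over it, and the
 * wall-clock time of the whole run. */
static double effective_mb_per_s(double device_bytes, int full_passes,
                                 double seconds)
{
    assert(seconds > 0.0 && full_passes > 0);
    return device_bytes * full_passes / seconds / 1e6;
}

/* Prints the figures for the two runs posted above. */
static void print_run_summary(void)
{
    const double bytes = 160041885696.0;             /* disk_size from the disktest output */
    const double t_badblocks = 1037 * 60 + 58.106;   /* 1037m58.106s */
    const double t_disktest  = 759 * 60 + 59.631;    /* 759m59.631s  */

    /* badblocks -w: 4 patterns, each written and read back = 8 passes */
    printf("badblocks: %.1f MB/s\n",
           effective_mb_per_s(bytes, 8, t_badblocks));
    /* disktest (assumed): 2 writes + 1 read = 3 passes */
    printf("disktest:  %.1f MB/s\n",
           effective_mb_per_s(bytes, 3, t_disktest));
}
```

Counting one pass instead of eight gives the ~2.5 MB/s figure from the first post; counting all eight gives roughly the 20 MB/s mentioned here.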
 

Offline CJay

  • Super Contributor
  • ***
  • Posts: 4136
  • Country: gb
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #2 on: February 07, 2021, 04:55:42 pm »
Manufacturer's diags and patience.
 

Offline Halcyon

  • Global Moderator
  • *****
  • Posts: 5870
  • Country: au
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #3 on: February 07, 2021, 10:57:30 pm »
I wouldn't bother doing that with SSDs. The blocks are completely transparent to the operating system. If there are any bad blocks, they will be marked unusable and a block is reallocated from the spare area. Check the SMART status to see if that has occurred. Not to mention you're also probably dealing with wear levelling. Whilst the sector reported to the OS remains the same, it doesn't mean that it's in the same physical place on a particular flash chip (or even on the same flash chip) as the previous write.
« Last Edit: February 08, 2021, 02:26:02 am by Halcyon »
 
The following users thanked this post: DiTBho

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1323
  • Country: pl
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #4 on: February 08, 2021, 12:26:58 am »
DiTBho:
You haven’t provided the source of your program, or even the smallest bit of information about how it works. However, assuming that it works as your dd command does(1), I suspect it’s faster because you have mostly tested your RAM, not the storage medium. You never or nearly never read anything from the flash memory itself; instead you write and read to/from in-RAM buffers of your kernel. And that is optimistic: a worse, though unlikely, scenario is that you never left your CPU.

Ensuring that writes reach the underlying medium is a bit tricky. At the very minimum you should flush buffers after a write and drop the related caches before reading. You must also avoid TRIM being issued on all-zero writes, which may happen if you write to a file in a file system. Then you can at least hope you are testing your flash memory. Preferably use direct I/O, as that solves the above issues; it is, of course, harder to use, and you then see the real random-access write/read speed of the device. From that point you only need to solve the problem of the controller on the flash memory interfering with your test. ;)
____
(1) Which is pointless in that invocation and puts you at unnecessary risk.
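
A minimal sketch of the "flush after write, drop caches before read" part, assuming Linux/POSIX. The function name write_flush_verify is hypothetical. It uses fsync() plus posix_fadvise(POSIX_FADV_DONTNEED), which is only advice to the kernel; opening the device with O_DIRECT is the stricter route but needs aligned buffers and is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes at offset, pushes them out of the page cache,
 * then reads them back and compares.
 * Returns 0 on match, 1 on mismatch, -1 on I/O error (including a
 * short read or write, which this sketch does not retry). */
static int write_flush_verify(int fd, off_t offset,
                              const unsigned char *pattern,
                              unsigned char *readback, size_t len)
{
    assert(pattern != NULL && readback != NULL && len > 0);

    if (pwrite(fd, pattern, len, offset) != (ssize_t)len)
        return -1;

    /* Flush dirty pages for this descriptor to the device. */
    if (fsync(fd) != 0)
        return -1;

    /* Best effort: ask the kernel to drop its cached copy of the range,
     * so the following read has a chance of hitting the device. */
    (void)posix_fadvise(fd, offset, (off_t)len, POSIX_FADV_DONTNEED);

    if (pread(fd, readback, len, offset) != (ssize_t)len)
        return -1;

    return memcmp(pattern, readback, len) == 0 ? 0 : 1;
}
```

Even with this, caches inside the drive or the USB bridge can still answer the read, which is the controller problem mentioned above.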
People imagine AI as T1000. What we got so far is glorified T9.
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4217
  • Country: gb
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #5 on: February 08, 2021, 04:59:17 am »
You haven’t provided the source of your program, or even the smallest bit of information about how it works.

badblocks is opensource, mine is a simple piece of code that works like badblocks but with less passes

Code: [Select]
        /*
         * fill with a fixed pattern
         */
        disk_block_clean(disk_block0, disk_block_size, 0xff);
        test_wr0 = IO_block_wr_sync(d_file, disk_addr, disk_block0, disk_block_size);

        /*
         * write "fancy" pattern
         */
        disk_block_fancy(disk_block0, disk_block_size);
        test_wr1 = IO_block_wr_sync(d_file, disk_addr, disk_block0, disk_block_size);

        /*
         * read back
         */
        test_rd0 = IO_block_rd_sync(d_file, disk_addr, disk_block1, disk_block_size);

        /*
         * compare
         */
        test_cmp = block_compare(disk_block0, disk_block1, disk_block_size);


disk_block_size is calculated by a function that considers a lot of things and checks whether a proposed value satisfies the constraints, so it chooses the best compromise (mostly because the program has limited internal buffers).

Code: [Select]
disk_size=160041852928
    152627 slices of    1048576 byte ~    843776
    305255 slices of     524288 byte ~    319488
    610511 slices of     262144 byte ~     57344
   1221022 slices of     131072 byte ~     57344
   2442044 slices of      65536 byte ~     57344
   4884089 slices of      32768 byte ~     24576
   9768179 slices of      16384 byte ~      8192
  19536359 slices of       8192 byte ~         0
Checking for bad 512-byte blocks in read-write mode
From 0 to 160041852927, by 8192 bytes at time
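
Reading the table above, the last column looks like the number of bytes left over at the end of the disk for each slice size, and the chosen size (8192) is the largest one with nothing left over. A guess at that selection rule, with a hypothetical function name and no claim that it matches the real disktest code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction: start from the largest buffer the program
 * can afford and halve the slice size until it divides the disk size
 * exactly. Returns 0 if no size down to min_size fits. */
static uint64_t pick_slice_size(uint64_t disk_size, uint64_t max_size,
                                uint64_t min_size)
{
    assert(min_size > 0 && max_size >= min_size);

    for (uint64_t size = max_size; size >= min_size; size /= 2) {
        if (disk_size % size == 0)
            return size;   /* no leftover bytes at the end of the disk */
    }
    return 0;
}
```

For disk_size=160041852928, a 1 MiB maximum and a 512-byte minimum, this walks the same sizes as the table and stops at 8192.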

However, assuming that it works as your dd command does(1), I suspect it’s faster because you have mostly tested your RAM, not the storage medium. You never or nearly never read anything from the flash memory itself — instead you write and read to/from in-RAM buffers of your kernel. And that is optimistic — a worse, though unlikely scenario is that you never left your CPU.

You are talking only about flash-based storage devices; badblocks was designed for electromechanical hard drives, and so was my program. But you made a good point: both are inadequate for the new technology :D
« Last Edit: February 08, 2021, 05:09:13 am by DiTBho »
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1323
  • Country: pl
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #6 on: February 09, 2021, 03:13:14 am »
badblocks is opensource, mine is a simple piece of code that works like badblocks but with less passes
From your post I understood you wrote something unrelated to badblocks. Now it is clearer. So I suppose the answer is indeed hidden in “less passes”.

You are talking only about flash-based storage devices; badblocks was designed for electromechanical hard drives, and so was my program.
Yes, because this is what’s written in the subject of this thread. I assumed we’re talking about running such programs on flash-based media.

But you made a good point: both are inadequate for the new technology :D
There can be a problem with even the newest HDDs with more cache than my first PC had RAM ;). But the problem with the approach shown in the dd invocation (again: dd makes no sense there; simply use cat. Yes, I will rant about that each time I see someone using dd just because they deal with storage media) is not with the hardware. It fails at the kernel level due to system caching. That’s the same situation that makes writes to slow USB sticks appear to go at 200MB/s, only to drop suddenly to “ETA: next x-mas”. And it is the same reason they must be properly unmounted before removal. :)

If you are seeking a sure method of testing: fill the whole storage with data from /dev/urandom, hashing that data on the fly, sync it (for removable media: eject to ensure power cycling), drop caches (unless reading in a manner that ignores them), hash the whole storage, compare hashes. If there is a mismatch, you know the device is damaged or counterfeit. That is particularly well suited for fake USB sticks that advertise sizes larger than the really available memory: it performs a proof-of-space attack, which they can’t deal with. The same attack is what minimizes the risk of any type of on-device caching getting in the way of the test. The drawback is that you will not know where exactly the damage is and can’t test blocks that are not currently addressable. But I would argue that since the 2010s the distinction between “working” and “broken” has become much more binary, so the first issue is of minor importance. The second problem can only be solved by either physically operating on the chip or some manufacturer’s tools that use hidden functionality of the device to access them (but I know of no such tools and their existence is a bad sign).
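
The fill, hash, sync, re-read, compare sequence could be sketched like this, assuming Linux/POSIX. All names are hypothetical. To keep the sketch free of external libraries it uses 64-bit FNV-1a as a stand-in for a real cryptographic hash such as SHA-256, and it only advises the kernel to drop caches rather than ejecting the medium.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 65536u
#define FNV1A_INIT 0xcbf29ce484222325ULL

/* 64-bit FNV-1a: a stand-in, NOT a cryptographic hash. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Fills the first total bytes behind fd with data from /dev/urandom,
 * hashing while writing, then flushes. Returns 0 on success, -1 on error
 * (short reads and writes are treated as errors in this sketch). */
static int fill_and_hash(int fd, uint64_t total, uint64_t *hash_out)
{
    static unsigned char buf[CHUNK];
    uint64_t h = FNV1A_INIT, done = 0;
    int rnd = open("/dev/urandom", O_RDONLY);

    assert(hash_out != NULL);
    if (rnd < 0)
        return -1;
    while (done < total) {
        size_t n = (total - done < CHUNK) ? (size_t)(total - done)
                                          : (size_t)CHUNK;
        if (read(rnd, buf, n) != (ssize_t)n ||
            pwrite(fd, buf, n, (off_t)done) != (ssize_t)n) {
            close(rnd);
            return -1;
        }
        h = fnv1a_update(h, buf, n);
        done += n;
    }
    close(rnd);
    if (fsync(fd) != 0)
        return -1;
    *hash_out = h;
    return 0;
}

/* Re-reads the same range and hashes it. Returns 0 on success, -1 on error. */
static int read_and_hash(int fd, uint64_t total, uint64_t *hash_out)
{
    static unsigned char buf[CHUNK];
    uint64_t h = FNV1A_INIT, done = 0;

    assert(hash_out != NULL);
    /* Best effort only; ejecting or power cycling is the stronger option. */
    (void)posix_fadvise(fd, 0, (off_t)total, POSIX_FADV_DONTNEED);
    while (done < total) {
        size_t n = (total - done < CHUNK) ? (size_t)(total - done)
                                          : (size_t)CHUNK;
        if (pread(fd, buf, n, (off_t)done) != (ssize_t)n)
            return -1;
        h = fnv1a_update(h, buf, n);
        done += n;
    }
    *hash_out = h;
    return 0;
}
```

The device passes if both calls succeed and the two hashes are equal. On a real device, total would be the full reported size, so that a stick with less real memory than advertised has nowhere to keep the data.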
People imagine AI as T1000. What we got so far is glorified T9.
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4217
  • Country: gb
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #7 on: February 09, 2021, 11:51:00 am »
If you are seeking a sure method of testing: fill the whole storage with data from /dev/urandom, hashing that data on the fly, sync it (for removable media: eject to ensure power cycling), drop caches (unless reading in a manner that ignores them), hash the whole storage, compare hashes. If there is a mismatch, you know the device is damaged or counterfeit.

That sounds interesting, and last night I wrote a new C89 testing program that does exactly this  :D

It accesses the storage device at a low level, with all the control handled through ioctl, then checks what ioctl reports as the number of blocks (here the firmware can lie), then fills all the blocks with random data and calculates the SHA hash!

Next step? It invokes data flush, metadata flush, and kernel sync, reads back all the blocks to recalculate the SHA hash, and finally compares the "expected hash" (computed during the block writes) with the "actual hash" (computed during the block read-back).

test_passed = cmp(expected-hash, actual-hash)

During these steps a lot of unexpected things happened, including some USB2-sATA adapters being unable to sustain a highly intensive data flow. I am not sure what happened, but I see weird error messages in the kernel's dmesg. I have not investigated yet; once I replaced them with USB3-sATA adapters, everything worked as expected.

That is particularly well suited for fake USB sticks that advertise sizes larger than the really available memory: it performs a proof-of-space attack, which they can’t deal with. The same attack is what minimizes the risk of any type of on-device caching getting in the way of the test. The drawback is that you will not know where exactly the damage is and can’t test blocks that are not currently addressable. But I would argue that since the 2010s the distinction between “working” and “broken” has become much more binary, so the first issue is of minor importance.

Yup. I own an 80Gbyte pATA hard-drive that has a couple of unrecoverable bad-blocks above the first 40Gbyte of space (if you look at it like a linear array of blocks), so I hacked the firmware in order to reduce the number of blocks it reports in response to the ioctl request.

It was "n-blocks", now it's "n-blocks/2"; just a couple of bytes in the firmware can do it :D

I did it just for fun, and this way I saved this hard-drive, but it makes no sense if you think that you can buy a new one on eBay/Amazon/etc for something like 10 euro, while I spent something like two weeks to hack it.

The second problem can only be solved by either physically operating on the chip or some manufacturer’s tools that use hidden functionality of the device to access them (but I know of no such tools and their existence is a bad sign).

So ... it will be "Samsung-specific". I think. I am going to buy a lot of Samsung SSDs, and a couple of NVMe devices.


Thank you for your amazing testing idea!  :D
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1323
  • Country: pl
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #8 on: February 09, 2021, 12:50:05 pm »
Yup. I own an 80Gbyte pATA hard-drive that has a couple of unrecoverable bad-blocks above the first 40Gbyte of space (if you look at it like a linear array of blocks), so I hacked the firmware in order to reduce the number of blocks it reports in response to the ioctl request.
Oh, have you ever published a description of that project? Not the modified firmware (that would be illegal), but a report of how you did it, which tools you used, etc.
People imagine AI as T1000. What we got so far is glorified T9.
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4217
  • Country: gb
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #9 on: February 09, 2021, 01:15:00 pm »
Oh, have you ever published a description of that project? Not the modified firmware (that would be illegal), but a report of how you did it, which tools you used, etc.

No, it was done just for fun and to save a personal hard-drive that I had worked hard for when I was a student. I remember I prepared hundreds of McDonald's sandwiches to earn enough money for a new (second-hand) laptop with a new hard-drive. To hack it I didn't use IDA or similar tools because they were too expensive to buy. I think I was lucky because I found an SH RISC chip, so I desoldered the flash chip, read it, and disassembled its firmware with GNU tools. I understood very little of it (less than 10%), but I found the point where it was reporting the number of blocks to ioctl.

Code: [Select]
ioctl(fd, BLKGETSIZE, &numblocks);

Lucky find! Exactly the number reported by fdisk, which internally uses ioctl to read the number of blocks! That's a specific question that the kernel driver asks the unit, and to which the unit replies accordingly.
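
For reference, a small Linux-only sketch of asking the kernel for that size. The function name storage_size_of_path is hypothetical. BLKGETSIZE (used above) returns the count of 512-byte sectors in an unsigned long; this sketch uses the sibling BLKGETSIZE64 request, which returns the size in bytes as a 64-bit value, and falls back to the file size for regular files so it can be tried on a disk image.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE64 (Linux-specific) */

/* Stores the size in bytes of the block device (or regular file) at path.
 * Returns 0 on success, -1 on error. */
static int storage_size_of_path(const char *path, uint64_t *size_out)
{
    struct stat st;
    int rc = -1;
    int fd;

    assert(path != NULL && size_out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) == 0) {
        if (S_ISBLK(st.st_mode)) {
            /* The kernel's view of the capacity, which ultimately comes
             * from what the drive itself reports. */
            rc = (ioctl(fd, BLKGETSIZE64, size_out) == 0) ? 0 : -1;
        } else if (S_ISREG(st.st_mode)) {
            *size_out = (uint64_t)st.st_size;
            rc = 0;
        }
    }
    close(fd);
    return rc;
}
```

Called on something like /dev/sdb (usually needs root), it should agree with the byte count that fdisk prints.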

Luckily there was no checksum for the "data area", so I burned a second flash, resoldered it, and that's it. Half the space, but working again with no more "bad-blocks" once I installed it back to the laptop I used to write my thesis years ago :D
« Last Edit: February 09, 2021, 01:29:14 pm by DiTBho »
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4217
  • Country: gb
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #10 on: February 09, 2021, 01:15:56 pm »
Code: [Select]

usb 1-1: reset high speed USB device number 2 using ehci_hcd
usb 1-1: device descriptor read/64, error -71
usb 1-1: device descriptor read/64, error -71
usb 1-1: reset high speed USB device number 2 using ehci_hcd
usb 1-1: device descriptor read/64, error -71
usb 1-1: device descriptor read/64, error -71
usb 1-1: reset high speed USB device number 2 using ehci_hcd
usb 1-1: device not accepting address 2, error -71
usb 1-1: reset high speed USB device number 2 using ehci_hcd
usb 1-1: device not accepting address 2, error -71
usb 1-1: USB disconnect, device number 2
sd 0:0:0:0: Device offlined - not ready after error recovery
sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda]  Result: hostbyte=0x01 driverbyte=0x00
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 00 4b 40 00 00 80 00
end_request: I/O error, dev sda, sector 19264
Unable to handle kernel paging request for data at address 0x00000030
Faulting instruction address: 0xc02f8b10
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT PowerMac
last sysfs file: /sys/devices/virtual/net/br0/broadcast
Modules linked in:
NIP: c02f8b10 LR: c02fdd64 CTR: 00000000
REGS: db59d9b0 TRAP: 0300   Not tainted  (2.6.39-apple-minimac-G4)
MSR: 00009032 <EE,ME,IR,DR>  CR: 24084424  XER: 20000000
DAR: 00000030, DSISR: 40000000
TASK = df44e000[4168] 'badblocks' THREAD: db59c000
GPR00: c02fdd64 db59da60 df44e000 df88d908 db6a0000 00000010 db6a00d4 00000000
GPR08: fffffffc 00000000 ffffffff 00000001 24084448 1001d0dc 00000040 00000001
GPR16: 805e2000 00000008 df88d908 c0817344 c0817394 db59db50 00000000 00000010
GPR24: 00000001 db59c000 db6a0000 00000005 00000000 00000001 df88d908 02200000
NIP [c02f8b10] elv_set_request+0x14/0x50
LR [c02fdd64] get_request+0x338/0x358
Call Trace:
[db59da60] [00000001] 0x1 (unreliable)
[db59da70] [c02fdd64] get_request+0x338/0x358
[db59daa0] [c02fddd0] get_request_wait+0x4c/0x1b8
[db59db00] [c02fdfcc] __make_request+0x90/0x370
[db59db40] [c02fc03c] generic_make_request+0x36c/0x438
[db59dbc0] [c02fc1b8] submit_bio+0xb0/0x174
[db59dc10] [c00f15fc] submit_bh+0x150/0x190
[db59dc30] [c00f3918] ll_rw_block+0x100/0x144
[db59dc60] [c00f4060] __block_write_begin+0x1f8/0x46c
[db59dcd0] [c00f44cc] block_write_begin+0x74/0xc8
[db59dd00] [c00fa0dc] blkdev_write_begin+0x1c/0x2c
[db59dd10] [c0084edc] generic_perform_write+0x1bc/0x29c
[db59dd80] [c0085030] generic_file_buffered_write+0x74/0xd0
[db59ddb0] [c0086d00] __generic_file_aio_write+0x284/0x554
[db59de30] [c00f920c] blkdev_aio_write+0x48/0xc4
[db59de50] [c00bec9c] do_sync_write+0xb8/0x144
[db59def0] [c00bfad4] vfs_write+0xcc/0x184
[db59df10] [c00bfcc4] sys_write+0x58/0xc8
[db59df40] [c0015084] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfed6718
    LR = 0x10002ff8
Instruction dump:
4e800421 7c601b78 7c030378 80010014 38210010 7c0803a6 4e800020 9421fff0
7c0802a6 90010014 8123000c 81290000
 2f800000 419e001c 7c0903a6
---[ end trace 8e039fa70a761634 ]---
usb 1-1: new high speed USB device number 3 using ehci_hcd
usb 1-1: device descriptor read/64, error -71
usb 1-1: device descriptor read/64, error -71
usb 1-1: new high speed USB device number 4 using ehci_hcd
usb 1-1: device descriptor read/64, error -71
usb 1-1: device descriptor read/64, error -71
usb 1-1: new high speed USB device number 5 using ehci_hcd
usb 1-1: device not accepting address 5, error -71
usb 1-1: new high speed USB device number 6 using ehci_hcd
usb 1-1: device not accepting address 6, error -71
hub 1-0:1.0: unable to enumerate USB device on port 1
usb 3-1: new full speed USB device number 2 using ohci_hcd
usb 3-1: device descriptor read/64, error -62
usb 3-1: device descriptor read/64, error -62
usb 3-1: new full speed USB device number 3 using ohci_hcd
usb 3-1: device descriptor read/64, error -62
usb 3-1: device descriptor read/64, error -62
usb 3-1: new full speed USB device number 4 using ohci_hcd
usb 3-1: device not accepting address 4, error -62
usb 3-1: new full speed USB device number 5 using ohci_hcd
usb 3-1: device not accepting address 5, error -62
hub 3-0:1.0: unable to enumerate USB device on port 1



This is what happens with some USB2-sATA adapters ...
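
The numbers after "error" in the log are negated Linux errno values, so they can be decoded with a few lines of C. The helper names are hypothetical. On Linux 71 is EPROTO and 62 is ETIME, which in the USB stack generally point at a low-level protocol failure or a missing response from the device rather than at bad sectors; that reading is an assumption about this particular log, not a diagnosis.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* The kernel logs negative errno values; strerror() wants the positive one. */
static const char *kernel_error_text(int logged_code)
{
    assert(logged_code <= 0);
    return strerror(-logged_code);
}

/* Prints the text for the two codes seen in the dmesg output above. */
static void print_usb_errors(void)
{
    printf("-71: %s\n", kernel_error_text(-71));
    printf("-62: %s\n", kernel_error_text(-62));
}
```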
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4217
  • Country: gb
Re: what do you use to test harddrive, SSD, and NvME units?
« Reply #11 on: February 09, 2021, 01:20:53 pm »
The above messages were triggered by an old 20Gbyte hard-drive unit, which passes all the badblocks tests when directly connected to a pATA controller. But it fails when connected to a USB2-sATA adapter.

Does the USB-sATA adapter add more trouble, or is it only a USB2 bulk/mass-storage problem? Not yet clear ...


Anyway, this afternoon I will test my new and modern SSD-2015 and NVMe-2019 units with the new SHA-1 testing program I wrote yesterday :D
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

