Author Topic: Things you hope you don't hear...  (Read 14948 times)


Online Brumby

  • Supporter
  • ****
  • Posts: 9407
  • Country: au
Re: Things you hope you don't hear...
« Reply #25 on: April 06, 2017, 03:59:55 am »
So much for "hot-swap".
 

Offline Mr.B

  • Supporter
  • ****
  • Posts: 1042
  • Country: nz
Re: Things you hope you don't hear...
« Reply #26 on: April 06, 2017, 04:28:35 am »
So much for "hot-swap".

“Hot-Stop” capability.
Very common in devices configured by morons.
I had some gear at my work once that was configured in a similar way.
We changed infrastructure support partners shortly after discovery.
Time is the overseer of all things.
 
The following users thanked this post: Kilrah

Offline BradC

  • Super Contributor
  • ***
  • Posts: 1649
  • Country: au
Re: Things you hope you don't hear...
« Reply #27 on: April 06, 2017, 04:47:36 am »
Turns out every server in the business had been built the same way.

If it's any consolation, when I was considerably younger and not quite as experienced I did a similar thing. RAID0 for the OS & RAID1 for the data. Luckily for me it was the first test system in development and 3 days in one of the drives suffered an early life failure. Taught me a valuable lesson with no harm done.

The thing I heard at the end of the phone that I never really wanted to hear was the scream of a fully functioning server room descend into silence when the tech hard selected a redundant UPS selector to the wrong UPS and then opened the output breaker. Still, we all make mistakes.
 

Offline noidea

  • Supporter
  • ****
  • Posts: 202
  • Country: au
Re: Things you hope you don't hear...
« Reply #28 on: April 06, 2017, 05:53:32 am »
So, I do a lot of phone support for field engineers, as somewhat of a grey-beard type (without the facial hair)... the most knee knocking thing you don't want to hear in a critical phase of instruction is..

Field guy: "Uh-oh ... " 
to which the conversation quickly escalates to..
Me: "Uh-oh? .. What the (insert expletive) do you mean "Uh-oh?"

Nothing like that conversation to send your imagination off on a riotous romp.
So I'm not a greybeard, more a long grey haired hippy type that spends a big part of his day doing a similar thing. Actually I think it can be more accurately described as "Playing drive a human ROV by playing twenty questions" than technical support but I digress. The products I support are mains connected both single and three phase (240/415VAC) with working DC voltages in the 300-500VDC range with decent sized capacitors in them.

I actually think the worst is when the conversation goes along the lines of this:

Me: "Can you please tell me what the voltage is between testpoint X and Y?"
Field guy: "It's.........." then utter silence and the call drops out
Me: "Oh F$%^" whilst I frantically try to ring back and find out what happened.

So far fingers crossed it's always been sorry the phone went flat or something along those lines, but it does make you think about liability with the way our current society is heading.
 
The following users thanked this post: yuzuha

Offline Halcyon

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: au
Re: Things you hope you don't hear...
« Reply #29 on: April 06, 2017, 11:35:42 am »
My current lab ISP (at $400/month) is trying to scare me out of moving to the new NBN connection, because, you know, the lack of service level agreement boogieman

Run! Run as fast as you can! Unless of course you're like Geoff who happened to win "Node lotto". Although you should be getting fibre to pretty much your door even on NBN where you are? In that case, you should be good.

Worst case, a few hundred bucks of Ubiquiti gear and Geoff will host your connection and beam it down the mountain for you. ;-)

You're right about lack of SLA however. A mate of mine in Cranebrook has 100Mbps (downstream) FTTH, unlimited plan with a large RSP (I won't mention the name because I'm not 100% sure which one it is). But they cut him off temporarily due to "abuse" of his account when he decided to queue up 1200 files to download (no, it wasn't a dodgy torrent or anything like that). Apparently they didn't expect FTTH customers to actually make use of their "unlimited" accounts... funny that.

That said, a few choice words to them over the phone soon got his account reactivated.
« Last Edit: April 06, 2017, 11:41:12 am by Halcyon »
 

Online gnif

  • Administrator
  • *****
  • Posts: 1119
  • Country: au
Re: Things you hope you don't hear...
« Reply #30 on: April 06, 2017, 01:28:29 pm »
My current lab ISP (at $400/month) is trying to scare me out of moving to the new NBN connection, because, you know, the lack of service level agreement boogieman

Run! Run as fast as you can! Unless of course you're like Geoff who happened to win "Node lotto". Although you should be getting fibre to pretty much your door even on NBN where you are? In that case, you should be good.

Worst case, a few hundred bucks of Ubiquiti gear and Geoff will host your connection and beam it down the mountain for you. ;-)

You're right about lack of SLA however. A mate of mine in Cranebrook has 100Mbps (downstream) FTTH, unlimited plan with a large RSP (I won't mention the name because I'm not 100% sure which one it is). But they cut him off temporarily due to "abuse" of his account when he decided to queue up 1200 files to download (no, it wasn't a dodgy torrent or anything like that). Apparently they didn't expect FTTH customers to actually make use of their "unlimited" accounts... funny that.

That said, a few choice words to them over the phone soon got his account reactivated.

For the record, Geoff is me :P.
HostFission - Full Server Monitoring and Management Solutions.
https://hostfission.com/
https://twitter.com/HostFission

I volunteer my time to manage this server, if you would like to support this work I have a patreon here:
https://www.patreon.com/gnif
 
The following users thanked this post: SeanB

Offline rrinker

  • Super Contributor
  • ***
  • Posts: 1917
  • Country: us
Re: Things you hope you don't hear...
« Reply #31 on: April 06, 2017, 01:55:52 pm »
 Bunch of years back I was working on some large database recovery at a client. They had several racks of Windows servers, plus a big HP mini. There was also a tech from the UPS company working in the data center. There was some maintenance required, so he went to put the UPS in bypass mode. Only he turned the wrong switch, instantly depowered the entire data center. Oops.

 Then there's the other client, who at the time wasn't a client. They had an HP storage array with a failing disk (amber light). The previous support company sent out a tech to replace the drive. The tech CLAIMS he pulled the correct drive, but the best thing we can come up with was he pulled a different one that hadn't failed, and the array being configured as RAID 5 was not tolerant of TWO bad disks. They lost their entire Exchange database as well as about 6 other VMs from their ESXi farm. The VMs they got back fairly quickly, but the Exchange data took a month. I happened to be there when the president of the support company was on a call with the IT director. He told her, and I quote, that the whole reason for the outage was "a glitch". Needless to say, cleaning up the aftermath of their "glitch" was the last thing that company ever did for this client, and now we are their vendor of choice.
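For anyone wondering why RAID 5 shrugs off one missing disk but not two, here is a minimal Python sketch of a single parity stripe. It is a simplification under stated assumptions: real controllers rotate parity across the members and work on much larger blocks, and the function names and sample data here are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (one simplified RAID 5 stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe):
    """Rebuild a stripe in place; missing members are represented by None."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("two or more members missing: stripe is unrecoverable")
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving members (data and parity) yields the lost one.
        stripe[missing[0]] = xor_blocks(survivors)
    return stripe

stripe = make_stripe([b"EXCH", b"ANGE", b"DATA"])

one_gone = stripe.copy()
one_gone[1] = None                      # the failing disk drops out
print(rebuild(one_gone) == stripe)      # single loss is recomputed from the rest

two_gone = stripe.copy()
two_gone[0] = None                      # failing disk
two_gone[2] = None                      # healthy disk pulled by mistake
try:
    rebuild(two_gone)
except ValueError as err:
    print("lost:", err)
```

With one member gone the XOR of the survivors gives the missing block back; with two gone there is one equation and two unknowns, which is the situation described above.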



 

Online Kilrah

  • Supporter
  • ****
  • Posts: 1764
  • Country: ch
Re: Things you hope you don't hear...
« Reply #32 on: April 06, 2017, 02:21:50 pm »
While I was the tech with the uh-oh, I can legitimately roll out the 'I was only following orders' defence.
[...]

That is VERY worthy of https://www.reddit.com/r/talesfromtechsupport/ !
 

Online CJay

  • Super Contributor
  • ***
  • Posts: 3421
  • Country: gb
  • M0UAW
Re: Things you hope you don't hear...
« Reply #33 on: April 06, 2017, 05:04:33 pm »
The tech CLAIMS he pulled the correct drive, but the best thing we can come up with was he pulled a different one that hadn't failed, and the array being configured as RAID 5 was not tolerant of TWO bad disks. They lost their entire Exchange database as well as about 6 other VMs from their ESXi farm.

To be fair, even if he did pull the wrong drive, that's not a common failure with a 'failing' drive left in the array on an HP. They generally survive, and powering down the connected server(s) then replacing the drive will often get you back to the start even if you haven't got the ACU boot CD.

I've seen lots of RAID arrays go down because someone decided reseating a predictive fail drive was a good idea, you'd usually get away with it on a HP but it was absolute poison to some of  the Dell machines of the era and utterly pointless as the machines were under support and usually had four hour parts response.

One of the worst fails I saw was on a partitionable SCSI shelf which had been misconnected to two Dell servers after a weekend data centre move and some genius had decided to 'accept new configuration'.

I managed to get it back by powering it all down, correcting the cabling errors, powering back up, deleting the configured arrays and then recreating the original configuration, it was then simple to get to the data but the ESEUTIL Exchange check took almost a day and a half with ~3000 users champing at the bit to get their email back.

While that ESEUTIL scan was working I'd configured two replacement servers, a new shelf and was restoring data from 24 hours before the 'crash' so there was a fallback position.

The single worst fail I saw was caused by a contractor who reinitialised a ~200 disk EVA5000 from the controller front panel because he 'thought the controller had crashed' (it was a single FC disk which had deadlocked the loop and hung the array, seen it happen 2 maybe 3 times in three years).

That one took six days to snapshot/replicate over a high speed fibre link from their head office.

It was always amusing to 'Identify' the whole array from the EVA management appliance and leave it ID'd, would scare the proverbial out of a lot of people to see ~150 Amber LEDs flashing...

M0UAW
 

Online Kilrah

  • Supporter
  • ****
  • Posts: 1764
  • Country: ch
Re: Things you hope you don't hear...
« Reply #34 on: April 06, 2017, 10:03:20 pm »
To be fair, even if he did pull the wrong drive that's not a common failure with a 'failing' drive left in the array on a HP

Except you can be sure the clueless tech noticed he fucked up and removed the wrong drive, plugged it back in to "fix his mistake" then promptly pulled the "real" bad one out... just after a rebuild attempt started upon reconnection of the first one. You now have 2 bad drives and you're SOL.

Any basic human who doesn't really understand the system (which he seems to be) will think the best thing to do is try to cover up/fix his mess ASAP instead of asking for help, and will do that when the correct answer would be to let the thing spend a day rebuilding before touching anything else (and check/prepare your backups so you're ready to restore while it happens, just in case...) but that would mean owning up to a mistake when you think there's still a way to get out of it...
« Last Edit: April 06, 2017, 10:16:37 pm by Kilrah »
 
The following users thanked this post: TheDane

Online CJay

  • Super Contributor
  • ***
  • Posts: 3421
  • Country: gb
  • M0UAW
Re: Things you hope you don't hear...
« Reply #35 on: April 07, 2017, 05:35:44 am »
To be fair, even if he did pull the wrong drive that's not a common failure with a 'failing' drive left in the array on a HP

Except you can be sure the clueless tech noticed he fucked up and removed the wrong drive, plugged it back in to "fix his mistake" then promptly pulled the "real" bad one out... just after a rebuild attempt started upon reconnection of the first one. You now have 2 bad drives and you're SOL.

Ah, yes, I didn't factor in the arse covering
M0UAW
 

Offline Housedad

  • Frequent Contributor
  • **
  • Posts: 512
  • Country: us
Re: Things you hope you don't hear...
« Reply #36 on: April 07, 2017, 06:34:12 am »
"Put him out!  Put him out!  God damn it!!  NOW!!"

What I remember from waking up lying on my stomach gagging on the airway while trying to scream during surgery to fix my spine that was shattered in an elevator accident.  38 years ago
At least I'm still older than my test equipment
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 30130
  • Country: au
    • EEVblog
Re: Things you hope you don't hear...
« Reply #37 on: April 07, 2017, 09:15:25 am »
 :palm:

 
The following users thanked this post: TheDane

Online Kilrah

  • Supporter
  • ****
  • Posts: 1764
  • Country: ch
Re: Things you hope you don't hear...
« Reply #38 on: April 07, 2017, 10:16:54 am »
Damn, was so much hoping he'd be putting the sleeves on, tried the remote again - with no change at all, and a loooong painful scream as a result  >:D

Alas no, just a stupid shill  |O
 

Offline HwAoRrDk

  • Frequent Contributor
  • **
  • Posts: 633
  • Country: gb
Re: Things you hope you don't hear...
« Reply #39 on: April 07, 2017, 12:38:27 pm »
I've seen lots of RAID arrays go down because someone decided reseating a predictive fail drive was a good idea, you'd usually get away with it on a HP but it was absolute poison to some of  the Dell machines of the era and utterly pointless as the machines were under support and usually had four hour parts response.

Ugh, that brings back a bad memory of RAID controllers on early '00s Dell servers that were a terrible pain in the arse.

I remember on one PowerEdge 4000-something (I forget which exactly) we'd configured a RAID 5 array using some old 10GB drives as temporary scratch space. But, after a few months one of the drives failed, and it wasn't worth it to get a matching replacement drive (the choice was either pay a king's ransom for a new one from Dell or buy a used one). So we decided to just get rid of that array. Except on this particular RAID controller's BIOS, there was literally no facility to delete an array when it was in degraded state. Dell's tech support confirmed it - such an option didn't exist! :wtf:

A further kick in the balls was that the RAID controller didn't remember its setting for the alarm buzzer, so from that point on, every single time the server was restarted, the alarm would sound (which was bloody loud) and you'd have to go into the RAID controller BIOS and turn the alarm off. We had to run that server for 3 years like that, all because nobody had the common sense foresight to realise someone might want to delete an array when it was in a degraded state. :palm:
 

Offline macboy

  • Super Contributor
  • ***
  • Posts: 1981
  • Country: ca
Re: Things you hope you don't hear...
« Reply #40 on: April 07, 2017, 02:57:28 pm »

The tech CLAIMS he pulled the correct drive, but the best thing we can come up with was he pulled a different one that hadn't failed, and the array being configured as RAID 5 was not tolerant of TWO bad disks. They lost their entire Exchange database as well as about 6 other VMs from their ESXi farm.

...

One of the worst fails I saw was on a partitionable SCSI shelf which had been misconnected to two Dell servers after a weekend data centre move and some genius had decided to 'accept new configuration'.

I managed to get it back by powering it all down, correcting the cabling errors, powering back up, deleting the configured arrays and then recreating the original configuration, it was then simple to get to the data but the ESEUTIL Exchange check took almost a day and a half with ~3000 users champing at the bit to get their email back.

..

This is why I love ZFS. You can survive a soft double failure like disconnecting the wrong disk. The array (pool) will go down of course, but just plug that disk back in, and the pool will re-silver within minutes, and you will be back in business. You can also reseat a disk, and it won't cause issues because the pool will re-silver that disk back into the pool (while remaining online), fixing new or changed data within seconds or minutes instead of requiring a full offline re-build of the array. You can move disks around at will, and the system will identify their new locations/connections, and adjust accordingly. All data is checksummed so if a data error (e.g. unreadable sector) occurs, even a silent one, the system figures out which combination of primary and redundant data to re-combine into the original file, then attempts to re-write the bad data (which should trigger the disk to reallocate the bad sector). In this way, the simple act of reading data can improve the health of it.
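The "reading data can improve its health" idea can be sketched in a few lines of Python. This is a toy two-way mirror, not ZFS internals: the class and method names are invented, SHA-256 is used purely for illustration, and the only point is that a checksum kept apart from the data lets a read pick the good copy and rewrite the bad one.

```python
import hashlib

class Mirror:
    """Toy two-way mirror that keeps each block's checksum apart from the data."""

    def __init__(self):
        self.copies = [{}, {}]   # one dict per "disk": block id -> bytes
        self.checksums = {}      # block id -> SHA-256 digest of the intended data

    def write(self, block_id, data):
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for copy in self.copies:
            copy[block_id] = data

    def read(self, block_id):
        expected = self.checksums[block_id]
        good = None
        damaged = []
        for copy in self.copies:
            data = copy.get(block_id)
            if data is not None and hashlib.sha256(data).digest() == expected:
                good = data
            else:
                damaged.append(copy)
        if good is None:
            raise IOError("no copy of block %r matches its checksum" % (block_id,))
        for copy in damaged:
            copy[block_id] = good    # self-heal: rewrite the bad copy from the good one
        return good

pool = Mirror()
pool.write(7, b"important data")
pool.copies[0][7] = b"imp0rtant data"      # silent corruption on one side
print(pool.read(7))                         # caller still gets the original bytes
print(pool.copies[0][7] == pool.copies[1][7])
```

A plain mirror without checksums cannot do this, because when the two copies disagree it has no way of knowing which one is right.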

It's really easy to play/experiment with these failure scenarios by creating a pool from several vdevs that are files instead of disks. Then you can create a filesystem or two on the pool, put some test data on there, and start screwing around. My favorite: write random data to a vdev to create silent data corruption and marvel at how the pool remains online, silently heals the data and does it while never delivering a single incorrect byte. Or disconnect a vdev, write some data to the pool, reconnect the vdev, and marvel at how quickly redundancy is restored (instantly for all old data, quickly for new data). I just can't stomach the idea of old school RAID any longer.  This is so far ahead in every way.
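The "disconnect a vdev, write some data, reconnect" experiment is quick because only the writes the disk missed need copying. Below is a rough Python model of that idea, assuming each block remembers the transaction number it was written in; the class names and the bookkeeping are my own simplification, not how ZFS actually stores this.

```python
class Disk:
    def __init__(self):
        self.blocks = {}       # block id -> (birth transaction, data)
        self.online = True

class MirrorPool:
    """Toy mirror that tracks which transactions an absent disk missed."""

    def __init__(self, n_disks=2):
        self.disks = [Disk() for _ in range(n_disks)]
        self.txg = 0               # transaction counter, bumped on every write
        self.missed_from = {}      # disk index -> first transaction it missed

    def write(self, block_id, data):
        self.txg += 1
        for index, disk in enumerate(self.disks):
            if disk.online:
                disk.blocks[block_id] = (self.txg, data)
            else:
                self.missed_from.setdefault(index, self.txg)

    def disconnect(self, index):
        self.disks[index].online = False

    def reconnect(self, index):
        """Bring a disk back and copy only what it missed; return blocks copied."""
        disk = self.disks[index]
        disk.online = True
        first_missed = self.missed_from.pop(index, None)
        if first_missed is None:
            return 0
        source = next(d for i, d in enumerate(self.disks) if i != index and d.online)
        copied = 0
        for block_id, (birth, data) in source.blocks.items():
            if birth >= first_missed:      # written while the disk was away
                disk.blocks[block_id] = (birth, data)
                copied += 1
        return copied

pool = MirrorPool()
for i in range(1000):
    pool.write(i, b"old data")
pool.disconnect(1)
for i in range(1000, 1005):
    pool.write(i, b"new data")
print(pool.reconnect(1))    # only the 5 missed blocks are copied, not all 1005
```

A traditional full rebuild would walk the whole disk regardless of how little changed, which is the difference being described above.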
 
The following users thanked this post: evb149

Online CJay

  • Super Contributor
  • ***
  • Posts: 3421
  • Country: gb
  • M0UAW
Re: Things you hope you don't hear...
« Reply #41 on: April 07, 2017, 05:22:45 pm »

Ugh, that brings back a bad memory of RAID controllers on early '00s Dell servers that were a terrible pain in the arse.

I remember on one PowerEdge 4000-something (I forget which exactly) we'd configured a RAID 5 array using some old 10GB drives as temporary scratch space. But, after a few months one of the drives failed, and it wasn't worth it to get a matching replacement drive (the choice was either pay a king's ransom for a new one from Dell or buy a used one). So we decided to just get rid of that array. Except on this particular RAID controller's BIOS, there was literally no facility to delete an array when it was in degraded state. Dell's tech support confirmed it - such an option didn't exist! :wtf:

A further kick in the balls was that the RAID controller didn't remember its setting for the alarm buzzer, so from that point on, every single time the server was restarted, the alarm would sound (which was bloody loud) and you'd have to go into the RAID controller BIOS and turn the alarm off. We had to run that server for 3 years like that, all because nobody had the common sense foresight to realise someone might want to delete an array when it was in a degraded state. :palm:

Hell yes, I remember those too, horrible things, some arcane command line interface to the RAID controller that seemed to be so poorly documented as to be undocumented and they had the amazing 'facility' where it was a crapshoot dependant on which BIOS version was on them as to what 'scrubbing' meant, on one it was checking and cleaning up, on the other it was wiping the array.

The nasty continued a couple of generations up, onto PERC4 I think (maybe PERC5?) and it was again often a crapshoot as to what happened when a disk was replaced, it was possible to get them into a state where it complained bitterly about a disk failing but absolutely and resolutely refused to allow you to replace the damn thing and have it rebuild, the only way to resolve it was to completely trash the system, rebuild the system and restore the data.

More than one occasion where Dell support got confused and had given customers instructions for disk replacement which trashed their arrays, on one occasion I sat with the IT director of an NHS trust as Dell Support talked the IT tech through that process and took down their live webserver.

I developed a deep and abiding love for HP RAID after a short time working on Dell equipment, nowadays I believe the Dell gear is better (and to be fair, it wasn't too bad by the PE2900 and PERC5 SAS controllers).

M0UAW
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 9851
  • Country: us
Re: Things you hope you don't hear...
« Reply #42 on: April 07, 2017, 06:48:40 pm »
"Put him out!  Put him out!  God damn it!!  NOW!!"

What I remember from waking up lying on my stomach gagging on the airway while trying to scream during surgery to fix my spine that was shattered in an elevator accident.  38 years ago

Gah!
*cringe*
 

Offline Howardlong

  • Super Contributor
  • ***
  • Posts: 4787
  • Country: gb
Re: Things you hope you don't hear...
« Reply #43 on: April 07, 2017, 07:57:39 pm »
Two quite recent ones at a Wintel hosting provider who is tasked with managing everything up to and including the OS, i.e., SAN, Virtualisation etc.

During a scheduled DR test, we couldn't access the high availability database server at the DR hosting site, but we could access everything else. As the production and hosting sites are on a stretched VLAN we have no visibility of physically where the hosts are. After three hours, I suggested that the symptoms, as far as I could see from various primary and secondary indications, strongly implied that both nodes were in fact located at the now offline site rather than being geographically disparate as specified in the design. Two minutes later, we had an embarrassing call from the hosting provider.

Only this week, on Monday, we had an outage of about 20 clusters after a scheduled change on some Xsigo fabric by our esteemed hosting provider on a shared SAN that "shouldn't affect us". Most of the clusters failed over and recovered correctly in a matter of seconds, but three didn't because... all those cluster nodes resided on the same physical hosts on the same failed fabric.

Then there was the one many years ago where an erstwhile colleague plugged both redundant PSUs of a production box into the same PDU strip, and a few months down the line the inevitable happened. He learned.
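All three of these are the same mistake: redundant members sharing one failure domain (site, physical host, PDU). If you can get an inventory of where things actually live, the check is trivial to automate. A minimal Python sketch, where the inventory format and every name in it are hypothetical:

```python
from collections import defaultdict

def colocated_members(placement):
    """Return {group: {domain: [members]}} for every failure domain that
    holds more than one member of the same redundant group.

    placement maps a group name to {member name: failure domain}.
    """
    problems = {}
    for group, members in placement.items():
        by_domain = defaultdict(list)
        for member, domain in members.items():
            by_domain[domain].append(member)
        shared = {d: sorted(m) for d, m in by_domain.items() if len(m) > 1}
        if shared:
            problems[group] = shared
    return problems

# Hypothetical inventory: cluster nodes by physical host, PSUs by PDU strip.
inventory = {
    "db-cluster":  {"node-a": "host-17", "node-b": "host-17"},
    "web-cluster": {"node-a": "host-03", "node-b": "host-22"},
    "prod-box":    {"psu-1": "pdu-left", "psu-2": "pdu-left"},
}

for group, shared in colocated_members(inventory).items():
    print(group, "shares a failure domain:", shared)
```

The hard part in practice is the one described above: getting trustworthy placement data out of a provider when a stretched VLAN hides where the hosts are.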
« Last Edit: April 07, 2017, 07:59:59 pm by Howardlong »
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 3657
  • Country: gb
Re: Things you hope you don't hear...
« Reply #44 on: April 07, 2017, 09:00:18 pm »
This is why I love ZFS. You can survive a soft double failure like ...

...  This is so far ahead in every way.

Yup, ZFS works very well. The one thing wrong with it is the arcane, random, scattered mess that is the command line interface. ZFS works so well that by the time you have a fault that actually requires any human intervention you've forgotten the peculiarities of the command line interface and have to re-learn it all over again.

I had two servers at home (retired kit from the office) and had one be essentially an identical copy of the other. At regular intervals each day the live one would take some ZFS snapshots, remotely power up the standby machine, copy the snapshot changes across, bring the filesystems on the standby system up to date and the standby machine would power itself off. This worked faultlessly for ages, including failover tests. Then I needed to make some changes and it took me almost as long as it took when I first set it up just to re-work out how to do all the ZFS stuff. Great filesystem, lousy human interface.
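One way to avoid re-learning the incantations every few years is to keep them in a small script. The sketch below only builds the command strings for one snapshot-and-replicate cycle and does not run anything; it assumes the usual `zfs snapshot`, `zfs send -i` and `zfs receive -F` commands piped over ssh, and the dataset, snapshot and host names are placeholders. It is not the setup described above, and the power-up and power-down of the standby machine are left out entirely.

```python
import shlex

def replication_commands(dataset, prev_snap, new_snap, standby_host):
    """Build (but do not run) the shell commands for one incremental cycle.

    dataset      -- e.g. "tank/home" (placeholder)
    prev_snap    -- name of the last snapshot the standby already has
    new_snap     -- name for the snapshot taken now
    standby_host -- ssh target of the standby machine (placeholder)
    """
    previous = shlex.quote(f"{dataset}@{prev_snap}")
    current = shlex.quote(f"{dataset}@{new_snap}")
    target = shlex.quote(dataset)
    host = shlex.quote(standby_host)
    return [
        # 1. Take the new snapshot on the live machine.
        f"zfs snapshot {current}",
        # 2. Send only the changes between the two snapshots to the standby.
        f"zfs send -i {previous} {current} | ssh {host} zfs receive -F {target}",
    ]

for command in replication_commands("tank/home", "2017-04-07", "2017-04-08", "standby"):
    print(command)
```

Printing the commands first and running them by hand a few times is a cheap way to check them against your own pool layout before trusting them in cron.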
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline Red Squirrel

  • Super Contributor
  • ***
  • Posts: 2320
  • Country: ca
Re: Things you hope you don't hear...
« Reply #45 on: April 08, 2017, 12:20:23 am »
I use mdadm raid and I've gotten to know the commands pretty well and also made myself a cheat sheet, but I might look at ZFS at some point too.   One thing nice about Linux/command line is that it tends to be easier to automate stuff if you want to, or to write a front end to it.  For some reason though it seems no one in the Linux world seems to want to write front ends for anything or the existing ones are garbage.  Though Linux is so fragmented when it comes to GUI tools, I don't really blame people.   Web based is the way to go IMO though.
 

Offline timb

  • Super Contributor
  • ***
  • Posts: 2528
  • Country: us
  • Pretentiously Posting Polysyllabic Prose
    • timb.us
Re: Things you hope you don't hear...
« Reply #46 on: April 08, 2017, 12:31:40 am »
Her reply was "What do you mean software?........ What lights are on at the moment?". I think I even said to her, "You know, software, that's running on the modem?... The configuration you're trying to get me to reset?... The thing where you stick the pen in the hole and such?".

 :palm:

Just ship me my damn modem!
First line support often have little domain knowledge, their main ability/task is to run down a flow-chart of standard questions & responses.

Thing is - it works for most punters who have even less knowledge about how stuff works.

About 8 or 9 years ago I had an issue with my cable internet, so I called up and, while on hold, went ahead and completely unplugged the modem, as I knew they'd have me do that. So I tell the girl my problem and she proceeds to run a "remote diagnostic" on my modem. I let her. After about 30 seconds she tells me that my modem appears to be connected to the network, so she wants me to reboot my computer. I tell her, "That's very interesting, because my modem is currently unplugged, so I'm curious how you ran a diagnostic on it?"

A bit flustered, she explains that power is sent to certain parts of the modem over the cable line, so even if I've got the power cord unplugged she can still talk to it. I replied, "That's even more interesting, because I've had the modem *completely* disconnected during this entire call..." She goes silent and then asks me to hold.

A minute later I get a level 2 technician who dispatches a worker bee. They end up finding a problem with the signal booster on the pole at the end of my street. I still have no idea what the purpose of lying to the customer about a remote modem diagnostic (which is possible, they can see SNR and signal dB from their terminals) is supposed to accomplish.

Oh, a few months back I was staying at my parents place for a week. One of the DirecTV remotes was having an issue with the power button. I took the remote apart, cleaned it and put it back together, but the issue persisted (I was hoping the carbon contact on that button was just dirty, but it was apparently worn out). So I tell my dad to call them up and they'll send a new remote. He calls and I leave for the grocery store.

45 minutes later I get back and he's *still* on the phone with them. The tech is trying to get him to test *every* freaking button on the remote, in a procedure that takes about 25 seconds per button. Seriously. I'm not making this up or exaggerating. If I hadn't been there I wouldn't believe it myself.

So, I motion to him and he gives me the phone, I politely interrupt the tech and explain that I'm the son, the remote is working except for the power button, blah blah blah and the guy tries to get me to test the buttons! I explain it again, but the guy keeps trying to complete his flow chart...

Finally, losing my patience a bit, I explain that I'm an EE, I've taken the remote apart, the contact on that button is worn, we just need a new remote. So the tech pushes back a bit, asking how we know it's the remote and not the box, the test he's trying to get us to do will tell him (it won't). I tell him that remotes from other boxes in the house work fine on this box, yet this remote won't work on other boxes so it's clearly the remote! Finally, after 10 minutes of wearing him down with logic and a stern voice, he relents and orders a new remote for us. Finally! But I'm not out of the woods just yet...

Then the up-sell starts! He says, "I'm waiting for a confirmation code on the order for the new remote, but hey, while we're waiting let me tell you about this exciting new offer!" Bullshit, he should have that order number right away, it doesn't take 5 minutes. He's basically holding me hostage. So, I have to let him get through his spiel, decline twice and explain why before he'll give me the damn code. Ugh.

It took an hour of my time and their time to get a new remote control that in and of itself costs maybe $1 to make. *Bangs head on desk.*
 
Any sufficiently advanced technology is indistinguishable from magic; e.g., Cheez Whiz, Hot Dogs and RF.
 

Online blueskull

  • Supporter
  • ****
  • Posts: 12479
  • Country: cn
  • Power Electronics Guy
Re: Things you hope you don't hear...
« Reply #47 on: April 08, 2017, 12:37:46 am »
"This phone call is to tell you your credit card..."
I never heard the complete sentence as I will just hang up at that moment.
I don't even have a credit card, how can anyone tell me my credit card has an issue?
« Last Edit: April 08, 2017, 02:22:43 am by blueskull »
 
The following users thanked this post: SeanB, TiN

Offline evb149

  • Super Contributor
  • ***
  • Posts: 1666
  • Country: us
Re: Things you hope you don't hear...
« Reply #48 on: April 08, 2017, 02:20:39 am »
mdadm based can be handy for a quick setup, maybe a non critical mirroring setup or something.
ZFS is so much better, though several "gotchas" have popped up about risk factors.
IIRC there's some risk when you turn on deduplication that you'll "overflow" the physical storage and have some issues around that.  Maybe something like having lots of stored files with partially deduplicated contents that fit in your pool but then modifying some of them to now be different and suddenly a file of the same size that just fit on your disc is now not able to be written because it'd make the pool full etc.  Maybe a problem with snapshots etc. too, not sure.
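The dedup scenario is easier to see with a toy model. The Python sketch below counts unique blocks against a fixed capacity; it is an illustration of the accounting only, with invented names and tiny numbers, and says nothing about how ZFS actually implements its dedup table.

```python
import hashlib

class DedupStore:
    """Toy block store: identical blocks are kept once and reference counted."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.refcount = {}    # block digest -> number of references
        self.files = {}       # file name -> list of block digests

    def used(self):
        return len(self.refcount)     # physical blocks actually stored

    def write_file(self, name, blocks):
        digests = [hashlib.sha256(b).digest() for b in blocks]
        trial = dict(self.refcount)
        for digest in self.files.get(name, []):   # release the old contents
            trial[digest] -= 1
            if trial[digest] == 0:
                del trial[digest]
        for digest in digests:                    # reference the new contents
            trial[digest] = trial.get(digest, 0) + 1
        if len(trial) > self.capacity:
            raise OSError("pool full: %d unique blocks needed, %d available"
                          % (len(trial), self.capacity))
        self.refcount = trial
        self.files[name] = digests

store = DedupStore(capacity_blocks=3)
for name in ("a", "b", "c"):
    store.write_file(name, [b"AAAA", b"BBBB"])    # 6 logical blocks, 2 physical
print(store.used())

try:
    store.write_file("c", [b"CCCC", b"DDDD"])     # same size, different contents
except OSError as err:
    print(err)
```

File "c" did not grow at all, yet rewriting it fails, because its blocks stopped being shared and the pool has to find room for them.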
IIRC there are some memory tuning possible problems as well where it uses in some ways / cases a proportional amount of memory to the dataset sizes (not surprising) and anyway you can get into cases where the recommended amount of ram is higher than most boxes actually have (>= 8-16GBy) given the sizes of discs that are easily available.
They never finished the security layer for it unfortunately.
And IIRC there were said to be some vulnerabilities to corruption during the host's processing of data in host RAM, e.g. exacerbated greatly by lack of ECC or whatever and then corruption happening despite the hashed on disc checksumming.  Too bad there don't seem to be "here's a buffer of data, and also here's the checksum for it" APIs all the way from the high level file I/O APIs all the way down to the disc so that it gets checked for corruption as soon as the write is queued and thereafter optionally as a confirming step of the writing and of course with every read operation.
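To make the "here's a buffer, and here's its checksum" idea concrete, here is a hypothetical Python sketch of what such an API could look like if every layer re-verified the checksum the application supplied. None of these classes correspond to a real filesystem or driver interface, and CRC32 is used only because it is in the standard library.

```python
import zlib

class ChecksumMismatch(Exception):
    pass

def verify(buf, checksum, layer):
    """Raise if the buffer no longer matches the checksum computed at the source."""
    if zlib.crc32(buf) != checksum:
        raise ChecksumMismatch("corruption detected at " + layer)

class Disk:
    def __init__(self):
        self.blocks = {}     # block address -> (data, checksum)

    def write(self, lba, buf, checksum):
        verify(buf, checksum, "disk write")
        self.blocks[lba] = (bytes(buf), checksum)

    def read(self, lba):
        buf, checksum = self.blocks[lba]
        verify(buf, checksum, "disk read")
        return buf, checksum

class FileLayer:
    def __init__(self, disk):
        self.disk = disk

    def write(self, lba, buf, checksum):
        verify(buf, checksum, "file layer write")   # checked as soon as it is queued
        self.disk.write(lba, buf, checksum)

    def read(self, lba):
        buf, checksum = self.disk.read(lba)
        verify(buf, checksum, "file layer read")
        return buf

fs = FileLayer(Disk())
data = bytearray(b"payroll records")
checksum = zlib.crc32(data)          # computed by the application at the source

fs.write(0, data, checksum)
print(fs.read(0))

data[0] ^= 0x01                      # simulated bit flip in host RAM
try:
    fs.write(1, data, checksum)
except ChecksumMismatch as err:
    print(err)
```

The flipped bit is caught before anything reaches the disk, which is the property that checksumming only at the filesystem layer cannot give you if the damage happened earlier in RAM.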

Seems like the best way to set up ZFS for casual use is still a mirror rather than something too fancy with a raidz since less can go wrong no matter how unstable your host / ram or discs are.  You are able to set up a pool including multiple mirrored sets so that's a good way to get simple extra fail safe redundancy and a large pool.
Quote
Virtual devices are specified one at a time on the command line, separated by whitespace. The keywords "mirror" and "raidz" are used to distinguish where a group ends and another begins. For example, the following creates two root vdevs, each a mirror of two disks:
# zpool create mypool mirror da0 da1 mirror da2 da3

I think there might be some potential gotchas about device replacement along the lines of what happens to create problems with other RAID systems.  If you have multiple "identical" discs in a mirror or RAIDZ and one fails and you go to get a replacement "of the same model" or a different model "of the same size" I'm not sure that will always work since due to tiny actual capacity differences between two units (e.g. bad sector table size or very slightly different formatted capacity) the resilvering could fail to find enough space on the replacement device.  Seems like that might be an easy way to get any existing mirror drive to become bad as well just due to a small error that does not exist on the paired mate.  I wonder if it only cares about the "used storage" on the drives so if you leave at least say 5% free then you're OK.
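As far as I recall, the zpool documentation says a replacement device must be at least as large as the smallest device already in the mirror or raidz group, so it is the device size that matters rather than how much of the pool is in use. Treat that as my reading rather than gospel and check the man page for your platform. Under that assumption the pre-purchase check is a one-liner; the sector counts below are made-up examples of two "same size" drives.

```python
SECTOR_BYTES = 512

def can_replace(member_sizes, replacement_size):
    """True if the replacement is at least as big as the smallest current member.

    Sizes are in bytes. This mirrors the documented rule as I understand it;
    it does not look at how much data is actually stored.
    """
    return replacement_size >= min(member_sizes)

# Hypothetical sector counts for nominally identical drives.
existing = [7_814_037_168 * SECTOR_BYTES, 7_814_037_168 * SECTOR_BYTES]
slightly_smaller = 7_814_000_000 * SECTOR_BYTES
slightly_larger = 7_814_040_000 * SECTOR_BYTES

print(can_replace(existing, slightly_smaller))   # a few MB short is still short
print(can_replace(existing, slightly_larger))
```

If that rule holds, leaving 5% of the pool free would not help with an undersized replacement.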


The problem with GUI front ends is that almost nobody wants to run a windowing system on their server, so as you suggested a web based GUI would be particularly effective since you could do that over ssh / https or whatever and then manage the server's disc stuff from another machine.  Maybe FreeNAS has some nice ZFS admin tools these days; IIRC they may have been working on such way back when they started supporting ZFS.



I use mdadm RAID and I've gotten to know the commands pretty well and also made myself a cheat sheet, but I might look at ZFS at some point too.   One nice thing about Linux/command line is that it tends to be easier to automate stuff if you want to, or to write a front end to it.  For some reason though, no one in the Linux world seems to want to write front ends for anything, or the existing ones are garbage.  Though Linux is so fragmented when it comes to GUI tools, I don't really blame people.   Web based is the way to go IMO though.
 

Online Brumby

  • Supporter
  • ****
  • Posts: 9407
  • Country: au
Re: Things you hope you don't hear...
« Reply #49 on: April 08, 2017, 03:09:48 am »
Her reply was "What do you mean software?........ What lights are on at the moment?". I think I even said to her, "You know, software, that's running on the modem?... The configuration you're trying to get me to reset?... The thing where you stick the pen in the hole and such?".

 :palm:

Just ship me my damn modem!
First line support often have little domain knowledge, their main ability/task is to run down a flow-chart of standard questions & responses.

Thing is - it works for most punters who have even less knowledge about how stuff works.

About 8 or 9 years ago I had an issue with my cable internet, so I called up and, while on hold, went ahead and completely unplugged the modem, as I knew they'd have me do that. So I tell the girl my problem and she proceeds to run a "remote diagnostic" on my modem. I let her. After about 30 seconds she tells me that my modem appears to be connected to the network, so she wants me to reboot my computer. I tell her, "That's very interesting, because my modem is currently unplugged, so I'm curious how you ran a diagnostic on it?"

A bit flustered, she explains that power is sent to certain parts of the modem over the cable line, so even if I've got the power cord unplugged she can still talk to it. I replied, "That's even more interesting, because I've had the modem *completely* disconnected during this entire call..." She goes silent and then asks me to hold.

A minute later I get a level 2 technician who dispatches a worker bee. They end up finding a problem with the signal booster on the pole at the end of my street. I still have no idea what lying to the customer about a remote modem diagnostic is supposed to accomplish (the diagnostic itself is possible; they can see SNR and signal dB from their terminals).

Oh, a few months back I was staying at my parents' place for a week. One of the DirecTV remotes was having an issue with the power button. I took the remote apart, cleaned it and put it back together, but the issue persisted (I was hoping the carbon contact on that button was just dirty, but it was apparently worn out). So I tell my dad to call them up and they'll send a new remote. He calls and I leave for the grocery store.

45 minutes later I get back and he's *still* on the phone with them. The tech is trying to get him to test *every* freaking button on the remote, in a procedure that takes about 25 seconds per button. Seriously. I'm not making this up or exaggerating. If I hadn't been there I wouldn't believe it myself.

So, I motion to him and he gives me the phone, I politely interrupt the tech and explain that I'm the son, the remote is working except for the power button, blah blah blah and the guy tries to get me to test the buttons! I explain it again, but the guy keeps trying to complete his flow chart...

Finally, losing my patience a bit, I explain that I'm an EE, I've taken the remote apart, the contact on that button is worn, we just need a new remote. So the tech pushes back a bit, asking how we know it's the remote and not the box, the test he's trying to get us to do will tell him (it won't). I tell him that remotes from other boxes in the house work fine on this box, yet this remote won't work on other boxes so it's clearly the remote! Finally, after 10 minutes of wearing him down with logic and a stern voice, he relents and orders a new remote for us. Finally! But I'm not out of the woods just yet...

Then the up-sell starts! He says, "I'm waiting for a confirmation code on the order for the new remote, but hey, while we're waiting let me tell you about this exciting new offer!" Bullshit, he should have that order number right away, it doesn't take 5 minutes. He's basically holding me hostage. So, I have to let him get through his spiel, decline twice and explain why before he'll give me the damn code. Ugh.

It took an hour of my time and their time to get a new remote control that in and of itself costs maybe $1 to make. *Bangs head on desk.*

I've had the same myself.  It's very frustrating when the bull they roll out just does not fit the symptoms.

A slightly different experience of mine was a couple of years ago when I was trying to find out about broadband availability for an address that a friend was considering moving to.  I had done some basic research and had a little bit of an idea of what was in the area before I called and asked.  I was told "No" - but then I asked an impossible question: "Why?"

The response was underwhelming.

Over a period of 5 days, it took me four calls, several transfers and a total of nearly 3 hours to finally find someone that was prepared to talk to me - and I'm sure that only happened because I was asking technically answerable questions.  In the end I got an answer.  It was still "No", but the reasoning made sense.  (It wasn't so much a technical impossibility, but one of administrative inertia.)

My friend did eventually move, but to a different property - one that already had an Optus coax drop to the house.  That was an easy one to answer.
« Last Edit: April 08, 2017, 03:11:24 am by Brumby »
 

