Author Topic: Proof that software as a service / cloud-based will never work for the long term ...  (Read 152247 times)


Online tautech

  • Super Contributor
  • ***
  • Posts: 29410
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
https://www.theregister.com/2024/07/19/microsoft_365_azure_outage_central_us/
LOL, it's worldwide!

But smart dudes, previously active members here, are fixing it bit by bit.  :-X

Well, yesterday was fun. Thankfully we are 50/50 Microsoft/Apple, so our business operations weren't impacted too much. Each team could still do their job, albeit at reduced capacity. This is why I always say not to put all your eggs in one basket. Those guys who rely on Windows endpoints and servers and Hyper-V will still be picking up the pieces this weekend.
And well into next week, once they hit their desks only to find their systems still down.
Avid Rabid Hobbyist.
Some stuff seen @ Siglent HQ cannot be shared.
 

Online Halcyon

  • Global Moderator
  • *****
  • Posts: 5926
  • Country: au
https://www.theregister.com/2024/07/19/microsoft_365_azure_outage_central_us/
LOL, it's worldwide!

But smart dudes, previously active members here, are fixing it bit by bit.  :-X

Well, yesterday was fun. Thankfully we are 50/50 Microsoft/Apple, so our business operations weren't impacted too much. Each team could still do their job, albeit at reduced capacity. This is why I always say not to put all your eggs in one basket. Those guys who rely on Windows endpoints and servers and Hyper-V will still be picking up the pieces this weekend.
And well into next week, once they hit their desks only to find their systems still down.

The fix is pretty easy. It just depends on how well they've planned and designed their systems. It'll be a learning curve for many, that's for sure.
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 29410
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Avid Rabid Hobbyist.
Some stuff seen @ Siglent HQ cannot be shared.
 
The following users thanked this post: madires

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15323
  • Country: fr
 

Offline mikeb1279

  • Newbie
  • Posts: 6
  • Country: au
Quote
As someone pointed out above, anti-virus software has been screwing up OSes since the dawn of time.

Which is actually preferable to being hit by ransomware or worse. At least problems with AV (and they are actually few) aren't malicious.

Quote
On-prem IT can manage this by patching non-critical boxes first to test, etc.

With other stuff, sure. But AV is often pushing out critical patches for 0-day exploits, and if you hang around a week for IT to try it on some spare kit, you might be too late to apply it. While your IT bods are having a good play, the bad guys are deconstructing it to find the hole it patches, hoping they get to your setup before your IT people finally give the OK and think about rolling it out.

It's a matter of risk, and effectively you're outsourcing the testing and stuff to a third party who should know their onions - your local IT bods generally don't have a clue because they don't have the mindset of do-badders. Just think of how many security holes there are all over the place (requiring AV to stop them being exploited) - the developers don't have the mindset to see them, and IT support are not really any different (and if they were, you wouldn't want to be employing them).

Well, I mean, if your IT people are incompetent then sure. Otherwise they would apply the AV patch as an expedited change, starting with the non-critical boxes, no? This should be their bread-and-butter kind of stuff in a moderate to large organization.

That's the theory, but every experience I have had with SaaS has resulted in inferior solutions with more downtime and buggy features. What you call an advantage, I call a liability. Maybe we have just had different experiences though?
 

Offline PA0PBZ

  • Super Contributor
  • ***
  • Posts: 5202
  • Country: nl
Well, I mean, if your IT people are incompetent then sure. Otherwise they would apply the AV patch as an expedited change, starting with the non-critical boxes, no? This should be their bread-and-butter kind of stuff in a moderate to large organization.

CrowdStrike is fully in control of the patches/updates; there's nothing local IT can do about it, and there's no setting like in Windows where you can delay the update.
Keyboard error: Press F1 to continue.
 
The following users thanked this post: tautech

Offline Marco

  • Super Contributor
  • ***
  • Posts: 6947
  • Country: nl
Since Microsoft is getting blamed anyway, they should just give themselves the power to fix it in the future. Have a PXE server store optional patches and let the boot manager apply them. A BIOS update could still brick things below that level, but at least drivers would be fixable.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 8143
  • Country: de
  • A qualified hobbyist ;)
The cause of the CrowdStrike disaster is a missing pointer check:
- Crowdstrike causes the largest IT outage in history, massive questions about testing regime (https://techau.com.au/crowdstrike-causes-the-largest-it-outage-in-history-massive-questions-about-testing-regime/)
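
For illustration, a minimal C sketch (hypothetical, nothing like CrowdStrike's actual driver code) of how a missing pointer check on a malformed data file becomes a crash. In user space it kills one process; in a boot-time kernel driver the same mistake blue-screens the whole machine:

Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule record; stands in for one entry of a content update. */
struct rule {
    char name[32];
    int  action;
};

/* Returns NULL when the input is not a valid record. */
static struct rule *parse_rule(const char *line)
{
    if (line == NULL || strlen(line) < 3)
        return NULL;                        /* malformed input */

    struct rule *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;

    snprintf(r->name, sizeof r->name, "%s", line);
    r->action = 1;
    return r;
}

int main(void)
{
    const char *bad_input = "";             /* stands in for a corrupt update file */
    struct rule *r = parse_rule(bad_input);

    /* BUG: using r without a NULL check crashes on malformed input:
     *     printf("action = %d\n", r->action);
     * FIX: check the pointer first and fail gracefully. */
    if (r == NULL) {
        fprintf(stderr, "rejecting malformed rule, carrying on without it\n");
        return 0;
    }

    printf("action = %d\n", r->action);
    free(r);
    return 0;
}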
 

Offline BravoVTopic starter

  • Super Contributor
  • ***
  • Posts: 7549
  • Country: 00
  • +++ ATH1
The cause of the CrowdStrike disaster is a missing pointer check:
- Crowdstrike causes the largest IT outage in history, massive questions about testing regime (https://techau.com.au/crowdstrike-causes-the-largest-it-outage-in-history-massive-questions-about-testing-regime/)

What testing regime? Clearly the new update was never tested before being deployed, not even once.  :-DD

C'mon, a blue screen problem is easily spotted by either manual or automated testing.
 
The following users thanked this post: 2N3055

Offline m k

  • Super Contributor
  • ***
  • Posts: 2444
  • Country: fi
Any idea how that old boot over network is doing?

This case would be a bit easier if that kind of "mainframe" were available, at least with all these "terminals" that are around.
Advance-Aneng-Appa-AVO-Beckman-Danbridge-Data Tech-Fluke-General Radio-H. W. Sullivan-Heathkit-HP-Kaise-Kyoritsu-Leeds & Northrup-Mastech-OR-X-REO-Simpson-Sinclair-Tektronix-Tokyo Rikosha-Topward-Triplett-Tritron-YFE
(plus lesser brands from the work shop of the world)
 

Offline iMo

  • Super Contributor
  • ***
  • Posts: 5153
  • Country: bt
The simplest improvement to CrowdStrike's deployment process would be to deploy their updates to their own company first. The day after, they may push the update out to the masses (if they're still able to, that is).. :)
Readers discretion is advised..
 

Offline Marco

  • Super Contributor
  • ***
  • Posts: 6947
  • Country: nl
Any idea how that old boot over network is doing?
Doesn't help without IPMI for remote reboot.
 

Online TimFox

  • Super Contributor
  • ***
  • Posts: 8413
  • Country: us
  • Retired, now restoring antique test equipment
The simplest improvement to CrowdStrike's deployment process would be to deploy their updates to their own company first. The day after, they may push the update out to the masses (if they're still able to, that is).. :)

I didn't get hit by this disaster, but with other, smaller dumpster-fire software upgrades I always wondered whether anyone had deployed the updates to a reasonable-sized system first, before foisting them onto the customer base.
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2265
  • Country: 00
Google continues to "ruin" Fitbit for users by discontinuing web app interface

Seventy-nine pages of mostly negative customer feedback with no acknowledgments to voiced concern

https://www.techspot.com/news/103874-google-continues-ruin-fitbit-users-discontinuing-web-app.html

When Google was a lowly upstart, its motto was "Don't be evil." It even listed the phrase prominently in its corporate code of conduct.
After the Alphabet restructuring in 2015, it was changed to the tamer-sounding "Do the right thing."
It's telling that by 2018, Google no longer had a motto and had removed both phrases from the company CoC.
It makes sense, considering the company no longer lives by either creed.
 

Online coppercone2

  • Super Contributor
  • ***
  • Posts: 10605
  • Country: us
  • $
They're not being ethical to the shareholders with those mottos.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 9416
  • Country: gb
Google continues to "ruin" Fitbit for users by discontinuing web app interface

Seventy-nine pages of mostly negative customer feedback with no acknowledgments to voiced concern

https://www.techspot.com/news/103874-google-continues-ruin-fitbit-users-discontinuing-web-app.html

When Google was a lowly upstart, its motto was "Don't be evil." It even listed the phrase prominently in its corporate code of conduct.
After the Alphabet restructuring in 2015, it was changed to the tamer-sounding "Do the right thing."
It's telling that by 2018, Google no longer had a motto and had removed both phrases from the company CoC.
It makes sense, considering the company no longer lives by either creed.
The next step will be more honesty with the motto "do the far right thing".
 

Online coppercone2

  • Super Contributor
  • ***
  • Posts: 10605
  • Country: us
  • $
The simplest improvement to CrowdStrike's deployment process would be to deploy their updates to their own company first. The day after, they may push the update out to the masses (if they're still able to, that is).. :)

I didn't get hit by this disaster, but with other, smaller dumpster-fire software upgrades I always wondered whether anyone had deployed the updates to a reasonable-sized system first, before foisting them onto the customer base.

Buddy, maybe you did not get the memo, but someone is getting promotions for being a 'go getter' who can 'assert risk' to maintain business operations, because they had a hunch and figured that testing costs money and we don't need it! I 'asked' (in the Tony Soprano sense) the guy if he was sure (10 seconds after he submitted the final revision of the code) it would release OK!

Don't you love those 'cost sensitive' deadlines that supposedly do something more than make your boss feel relaxed, because he's not sure if he feels like asking for an extension based on new information?
« Last Edit: July 20, 2024, 06:14:23 pm by coppercone2 »
 

Offline m k

  • Super Contributor
  • ***
  • Posts: 2444
  • Country: fi
Any idea how that old boot over network is doing?
Doesn't help without IPMI for remote reboot.

It shouldn't need a reboot, nor management.

PXE, and what came before it, didn't need any special ports.

But a special watchdog there must be, no matter what.
After that, the style is pretty much irrelevant and the management can be automated.
Maybe some motherboards already have something like that.
Advance-Aneng-Appa-AVO-Beckman-Danbridge-Data Tech-Fluke-General Radio-H. W. Sullivan-Heathkit-HP-Kaise-Kyoritsu-Leeds & Northrup-Mastech-OR-X-REO-Simpson-Sinclair-Tektronix-Tokyo Rikosha-Topward-Triplett-Tritron-YFE
(plus lesser brands from the work shop of the world)
 

Online tautech

  • Super Contributor
  • ***
  • Posts: 29410
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Avid Rabid Hobbyist.
Some stuff seen @ Siglent HQ cannot be shared.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15323
  • Country: fr
Since Microsoft is getting blamed anyway, they should just give themselves the power to fix it in the future. Have a PXE server store optional patches and let the boot manager apply them. A BIOS update could still brick things below that level, but at least drivers would be fixable.

Well, both should get the blame.

The fact that some security software could even bring the OS to its knees is a sign of a severe OS design flaw overall. But obviously that's not something they can easily fix unless they redesigned it almost entirely.

After that, a large part of the blame should be on the customers' shoulders - decent sysadmins should never allow third-party companies to remotely deploy such a low-level update on a large scale without testing it first locally. That's insane. They just get what they deserve here, and CrowdStrike should be thanked for making it obvious to everyone.
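
To make "testing it first locally" concrete, here's a minimal C sketch of ring-based gating (hypothetical rings and soak times, not any vendor's real mechanism): canary boxes take an update immediately, and each later ring waits until the ring ahead of it has soaked:

Code: [Select]
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical deployment rings: canaries first, production last. */
enum ring { RING_CANARY, RING_NONCRITICAL, RING_PRODUCTION };

/* Hours each ring waits after the vendor publishes an update. */
static const int soak_hours[] = { 0, 4, 24 };

static bool update_allowed(enum ring r, int hours_since_release)
{
    return hours_since_release >= soak_hours[r];
}

int main(void)
{
    int elapsed = 6;    /* hours since the update was published */

    printf("canary:       %s\n", update_allowed(RING_CANARY, elapsed)      ? "install" : "wait");
    printf("non-critical: %s\n", update_allowed(RING_NONCRITICAL, elapsed) ? "install" : "wait");
    printf("production:   %s\n", update_allowed(RING_PRODUCTION, elapsed)  ? "install" : "wait");
    return 0;
}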
 

Offline fzabkar

  • Super Contributor
  • ***
  • Posts: 2613
  • Country: au
What testing regime? Clearly the new update was never tested before being deployed, not even once.  :-DD

C'mon, a blue screen problem is easily spotted by either manual or automated testing.

Some years ago I uncovered a bug in Seagate's firmware update for their ST3000DM001 HDD. No-one was able to apply the update. I identified the bug, posted a workaround in Seagate's own forum, and made certain Seagate personnel aware of the problem. Four years later the bug was still there. Worse still, the forum had been totally deleted on April Fools' Day.

It turned out that the update was never tested prior to being released. I can confidently say this because the payload files that were bundled with the update package had the wrong filenames. The updater tool expected different file names, so it errored out when those files were not present. The solution was to rename the payloads with their correct names.
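
For what it's worth, here's a minimal C sketch (hypothetical filenames, not Seagate's actual ones) of the pre-flight check such an updater should do; even a single test run of the packaged tool would have tripped on it:

Code: [Select]
#include <stdio.h>

/* Hypothetical payload names hard-coded into the updater; the shipped
 * package presumably contained files with different names, so the very
 * first fopen() would have failed. */
static const char *payloads[] = {
    "firmware.lod",
    "loader.bin",
};

int main(void)
{
    /* Verify every expected payload exists before touching the drive. */
    for (size_t i = 0; i < sizeof payloads / sizeof payloads[0]; i++) {
        FILE *f = fopen(payloads[i], "rb");
        if (f == NULL) {
            fprintf(stderr, "missing payload '%s' - aborting before flashing\n",
                    payloads[i]);
            return 1;
        }
        fclose(f);
    }
    puts("all payloads present, safe to proceed");
    return 0;
}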

ISTM that the testing process must be long and tedious. Presumably some employee decided that a particular change was too insignificant to affect the integrity of the update, so it was decided not to retest the package. I wonder if that is what happened at CrowdStrike.
« Last Edit: July 21, 2024, 02:30:43 pm by fzabkar »
 

Offline rsjsouza

  • Super Contributor
  • ***
  • Posts: 6059
  • Country: us
  • Eternally curious
    • Vbe - vídeo blog eletrônico
:)


That was awesome! A cautionary tale indeed...

That brought me back to the days of those "DoubleSpace" DOS disk utilities that compressed your data, but where a single glitch on the disk would flush your data into oblivion... A mistake my dad and I made just once.
Vbe - vídeo blog eletrônico http://videos.vbeletronico.com

Oh, the "whys" of the datasheets... The information is there not to be an axiomatic truth, but instead each speck of data must be slowly inhaled while carefully performing a deep search inside oneself to find the true metaphysical sense...
 

Offline Marco

  • Super Contributor
  • ***
  • Posts: 6947
  • Country: nl
The fact that some security software could even bring the OS to its knees is a sign of a severe OS design flaw overall.
A driver which operates during boot (antivirus, storage, network, whatever) can always do that.

Automatically reverting is not an option because that might be used as part of a downgrade attack. There is no way to design the OS to protect against the initial hang, short of mobile-phone-OS levels of control: when there are no third-party drivers, they can't interfere with the boot process. But then they'd need to be designing the hardware too, not just the OS.

What they can do is add a mechanism for remote updates, via PXE, which kicks in before the drivers are running.
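
Roughly what I have in mind, as a C sketch (purely hypothetical, not a real Windows mechanism): the boot manager fetches a quarantine list over the network, say via PXE/TFTP, before any third-party boot driver loads, and skips whatever is flagged as broken:

Code: [Select]
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical quarantine list, imagined as fetched from a PXE server
 * before third-party boot drivers are loaded. */
static const char *quarantined[] = { "badagent.sys" };

static bool is_quarantined(const char *driver)
{
    for (size_t i = 0; i < sizeof quarantined / sizeof quarantined[0]; i++)
        if (strcmp(driver, quarantined[i]) == 0)
            return true;
    return false;
}

int main(void)
{
    const char *boot_drivers[] = { "storage.sys", "network.sys", "badagent.sys" };

    for (size_t i = 0; i < sizeof boot_drivers / sizeof boot_drivers[0]; i++) {
        if (is_quarantined(boot_drivers[i]))
            printf("skipping %s (on remote quarantine list)\n", boot_drivers[i]);
        else
            printf("loading  %s\n", boot_drivers[i]);
    }
    return 0;
}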
« Last Edit: July 21, 2024, 09:53:53 am by Marco »
 

Online themadhippy

  • Super Contributor
  • ***
  • Posts: 2977
  • Country: gb
Don't blame Microsoft for international IT failure day, don't blame CrowdStrike; it's all the fault of that pesky EU.

https://www.euronews.com/next/2024/07/22/microsoft-says-eu-to-blame-for-the-worlds-worst-it-outage
 

Offline coromonadalix

  • Super Contributor
  • ***
  • Posts: 6599
  • Country: ca
blame "stupid" people(s) who rely on these cloud based softwares  loll    would be a lot   ... loll
 

