Author Topic: How the 60-Year-Old IRS Computer System Failed on Tax Day  (Read 3540 times)


Offline Homer J SimpsonTopic starter

  • Super Contributor
  • ***
  • Posts: 1240
  • Country: us
How the 60-Year-Old IRS Computer System Failed on Tax Day
« on: April 19, 2019, 12:45:32 am »


 

Offline wilfred

  • Super Contributor
  • ***
  • Posts: 1337
  • Country: au
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #1 on: April 19, 2019, 01:50:38 am »
It paints a very misleading picture. The failure was effectively a firmware bug in an SSD. Albeit a really huge one. But an SSD all the same, and it has no link of any kind back to the old code of the 1960s. Code that continued to work and continues to work on a family of software-backward-compatible mainframes. Quite an amazing achievement, meeting customer demands to preserve vast investments in software.

Good video for those interested in pictures of old mainframes.

Spoiler alert: One of the IRS's IT contractors deliberately chose to NOT install a fix for a firmware bug known for some time. Shocked I was.
 
The following users thanked this post: thm_w, Tomorokoshi

Offline blacksheeplogic

  • Frequent Contributor
  • **
  • Posts: 532
  • Country: nz
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #2 on: April 19, 2019, 03:43:28 am »
Spoiler alert: One of the IRS's IT contractors deliberately chose to NOT install a fix for a firmware bug known for some time. Shocked I was.

I've had to work with clients that essentially cannot apply updates once a system moves to prod. One system I remember had a kernel memory leak and I was asked to estimate how long before it crashed so they could schedule a maint window to reboot it. That was the only option available until a new environment was certified.
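For anyone wondering what such an estimate might look like: below is a minimal sketch, not what was actually done on that system. It assumes the leak grows roughly linearly and that you have a handful of (hours since boot, bytes used) samples; the function name and the sample numbers are made up for illustration.

```python
def hours_until_exhaustion(samples, limit_bytes):
    """Estimate hours remaining before memory use reaches limit_bytes.

    samples: list of (hours_since_boot, bytes_used) tuples, oldest first.
    Fits a least-squares straight line through the samples and extrapolates.
    Returns None if usage is not growing. Assumes linear growth (hypothetical).
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("need samples taken at two or more distinct times")
    slope = num / den  # bytes leaked per hour
    if slope <= 0:
        return None  # no upward trend, nothing to extrapolate
    intercept = mean_m - slope * mean_t
    t_limit = (limit_bytes - intercept) / slope  # hour at which the limit is hit
    last_t = samples[-1][0]
    return max(0.0, t_limit - last_t)


# Made-up readings: usage grows by 10 units per hour from 100, limit is 200.
remaining = hours_until_exhaustion([(0, 100), (1, 110), (2, 120)], 200)
print(f"about {remaining:.1f} hours left before the limit")
```

In practice you would schedule the reboot well before the estimate, since real leaks are rarely this well behaved.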
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #3 on: April 19, 2019, 04:51:47 am »
It sounds silly but in some cases it makes sense to stick with the devil you know. Any time you fix a bug it very often creates new bugs.
 

Offline soldar

  • Super Contributor
  • ***
  • Posts: 3582
  • Country: es
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #4 on: April 19, 2019, 07:13:25 am »
It sounds silly but in some cases it makes sense to stick with the devil you know. Any time you fix a bug it very often creates new bugs.


It does not sound silly at all. If you have a minor bug and you can live with it and work around it you don't need to risk causing greater damage with repairs.
All my posts are made with 100% recycled electrons and bare traces of grey matter.
 

Offline tooki

  • Super Contributor
  • ***
  • Posts: 12960
  • Country: ch
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #5 on: April 19, 2019, 11:55:59 am »
And people don't seem to understand that code doesn't go bad. It doesn't ferment. Good code written 60 years ago will still run reliably today. There's a reason the financial industry (among others) updates their existing systems and doesn't replace them: replacing is exceedingly risky. The old code has correctly modeled every situation they've encountered so far, and is a history of every bug they've fixed. Starting from scratch, you risk modeling situations differently, introducing new bugs, removing fixes to old ones, etc. And for what? Bragging rights that your code is young?
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #6 on: April 19, 2019, 06:40:12 pm »
It sounds silly but in some cases it makes sense to stick with the devil you know. Any time you fix a bug it very often creates new bugs.


It does not sound silly at all. If you have a minor bug and you can live with it and work around it you don't need to risk causing greater damage with repairs.

I can see how a situation like this would sound silly to a layman though, to put a bandaid over a serious memory leak by just forcing a reboot rather than fixing the leak. To understand why the bandaid may be preferable to actually fixing the bug requires a deeper understanding of how software works and the potential consequences of fixing an issue like that. A deeper understanding than the average non-software person has.

 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9955
  • Country: us
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #7 on: April 19, 2019, 09:29:06 pm »
So why do they have a picture of an analog computer at 9:11?
Mostly fluff...

Banks are still using COBOL and there aren't enough new-grads that know anything about it.
https://thenextweb.com/finance/2017/04/10/ancient-programming-language-cobol-can-make-you-bank-literally/
 

Offline Gyro

  • Super Contributor
  • ***
  • Posts: 10107
  • Country: gb
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #8 on: April 19, 2019, 09:33:36 pm »
Quote
Banks are still using COBOL and there aren't enough new-grads that know anything about it.

Damn, I'm marketable again!  ::)

COBOL is about the most simple language to learn but it's very verbose. As I remember, the biggest challenge was being able to spell 'Environment' correctly for the Environment Division - not so funny when your run got a syntax error and you had to go back and re-punch the card! It's nicely 'structured' though (in the old sense of the word). It required you to declare all permanent and temporary variables and even print output formats before you actually got around to calculating anything.

I was taught it back in the 70's by a delightful old lady spinster (who must have been in computing from the very dawn) - she described it as being designed by Grace Hopper "to be easy enough to use by th*ck American sailors". I guess political correctness has moved on but I'll never forget that phrase coming out of her mouth!  ;D

Easy money to be made folks!
« Last Edit: April 20, 2019, 09:58:08 am by Gyro »
Best Regards, Chris
 
The following users thanked this post: SeanB

Offline guenthert

  • Frequent Contributor
  • **
  • Posts: 767
  • Country: de
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #9 on: April 20, 2019, 08:29:03 pm »
And people don't seem to understand that code doesn't go bad.
Oh yes, it does, it's called bit-rot.  And while code which flawlessly performed a given task on a given system in the past, will continue to do so, that's an exceedingly unrealistic scenario.  The code might not change, but everything around it will: systems it is meant to run on, requirements, assumptions it is based on and community.  Besides that it is actually really, really difficult to create flawless code.  As a general rule one must assume what isn't tested, doesn't work.

Kudos to IBM who had the foresight (if one can call it that way -- at the time they implemented it, they had already gone through several computer generations) to design their mainframes with long-term backwards compatibility in mind.

And I'm sure the original programmers anticipated changes in tax code and assured that the source code would read and interpret such tables and wouldn't have to be annually changed.  But were they able to foresee other changes in requirements, like information exchange with treasury department and DHS, double-taxation avoidance agreements with foreign countries, on-line tax filing?

Stories of COBOL programmers hired out of retirement to fix y2k bugs are famous and while there might still be some fresh ones, it clearly is these days a fringe language.  Maintaining old code is always costly, and code in an unpopular language doubly so.

It's a furiously frustrating misunderstanding that software is at one point 'done'.  Unmaintained code is a liability, not an asset. 
« Last Edit: April 20, 2019, 08:36:46 pm by guenthert »
 
The following users thanked this post: NiHaoMike, cjs

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #10 on: April 21, 2019, 12:29:51 am »
Code doesn't go bad, occasionally the storage medium does but that can be mitigated by copying the code onto newer media. When you copy a digital file you get a perfect copy, it doesn't degrade from one generation to the next.
 
The following users thanked this post: tooki

Offline tooki

  • Super Contributor
  • ***
  • Posts: 12960
  • Country: ch
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #11 on: April 21, 2019, 12:46:45 am »
And people don't seem to understand that code doesn't go bad.
Oh yes, it does, it's called bit-rot.  And while code which flawlessly performed a given task on a given system in the past, will continue to do so, that's an exceedingly unrealistic scenario.  The code might not change, but everything around it will: systems it is meant to run on, requirements, assumptions it is based on and community.  Besides that it is actually really, really difficult to create flawless code.  As a general rule one must assume what isn't tested, doesn't work.

Kudos to IBM who had the foresight (if one can call it that way -- at the time they implemented it, they had already gone through several computer generations) to design their mainframes with long-term backwards compatibility in mind.

And I'm sure the original programmers anticipated changes in tax code and assured that the source code would read and interpret such tables and wouldn't have to be annually changed.  But were they able to foresee other changes in requirements, like information exchange with treasury department and DHS, double-taxation avoidance agreements with foreign countries, on-line tax filing?

Stories of COBOL programmers hired out of retirement to fix y2k bugs are famous and while there might still be some fresh ones, it clearly is these days a fringe language.  Maintaining old code is always costly, and code in an unpopular language doubly so.

It's a furiously frustrating misunderstanding that software is at one point 'done'.  Unmaintained code is a liability, not an asset.
Bit rot is media failure. That’s not the issue in question. We’re talking about the code itself (not a specific copy of it).

Whatever the Cobol code for “1+1=2” is, will work as correctly now as in 1960. This is what I mean by “code doesn’t go bad”.

As for code maintenance: Um, yeah, duh. Nobody claimed that the code is frozen. But it does contain the entire history of bug fixes, exceptions to exceptions to exceptions (like obscure rules, which must not change in how they’re run, lest old data suddenly produces new results), etc. Reproducing all of those behaviors in new code is almost impossible to achieve. And that’s why financial institutions (tax authorities, banks, insurance companies, etc) are exceedingly reluctant to replace their core accounting code. They simply use lots of middleware to connect those core systems to modern interfaces (both user interfaces and software interfaces).
 

Offline cjs

  • Contributor
  • Posts: 49
  • Country: jp
  • Software geek
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #12 on: April 21, 2019, 04:01:09 am »
It sounds silly but in some cases it makes sense to stick with the devil you know. Any time you fix a bug it very often creates new bugs.

Very true, to a point. The problem is, there usually comes a point in any long-lived system where (usually triggered by external factors) you have to change something, and at that point it's quite possible you're faced with a massive cascade of changes that you're not at all prepared to handle or test.

Let's say you've got an internal web-based app that uses a certain weird feature of Internet Explorer 4.x and your IT department comes along and finally says, "As of next year we will no longer be able to supply your users with desktop systems that can run IE4." Well, you can easily replace the feature you used with a different feature from a modern web browser, but the developer now needs a newer version of a library, which triggers needing new versions of three other libraries, which in turn requires a compiler upgrade, and so on and so forth. Given that the system hardly ever changes and it's been massively tested in production, it's unlikely you've invested in a comprehensive test and validation suite for your code, so now you have a real problem and you're likely to kick out a release that could take years to settle down and get back to being as stable as it was before.

Dealing with this is part of the core of "agile development:" rather than putting off changes to collect them together and do them as one huge change, instead aggressively integrate changes as soon as possible, and design your system to handle this. The catchphrase for this is, "If it hurts, do it more often."

If you're having to frequently test a system, the most cost-effective way to do that is to automate as much as possible of it. And the most cost-effective way to automate testing is (as all hardware engineers know) to design the system to be easily tested by automated tools. That's usually a stumbling block with legacy systems; adding comprehensive automated tests can be even more costly than a full rewrite.

But once you've got that in place, dribbling in frequent changes works a lot better than trying to do all the changes at once because you find out about problems earlier and you've got a much, much smaller problem to solve when one comes up because the extent of each change is much more limited.

This also allows you to keep the software itself less complex, smaller and easier to maintain and modify because it allows you easily to do things like changing internal APIs and their behaviour to simplify things even when that change touches things throughout the system.

This isn't to say that existing or even new projects can do this easily, or even at all. The whole concept of "Agile" and embracing continuous change really started coming to popular attention less than twenty years ago, and at this point we've only just passed through the stage of most developers and managers saying, "that's crazy and can never work." Actually building systems this way requires both developers and managers to have a fair amount of expertise in this area and there has to be good co-operation between them. Only a relatively small minority of each have this expertise at a level where they could use it in greenfield projects, and fewer yet the expertise and skill to move existing large projects to agile.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #13 on: April 21, 2019, 04:13:27 am »
Well there's no perfect solution, it really depends on what you're trying to do. I work in an agile development environment and it works well for what we do, our service integrates with numerous external services that change frequently so we have to adapt and change quickly ourselves to keep up. This does indeed result in new bugs frequently but the nature of what we do means that these bugs are rarely more than a minor inconvenience.

On the other hand, if we were talking about the firmware in a fly by wire computer in an airliner there is no way in hell I'd get on that plane if agile development was used to create it. If you have something mission critical, you never change anything without very careful evaluation and after doing so you go through a full test and certification process to ensure that it is safe. You have to walk a fine line between making enough changes to keep up with user needs and minimizing the frequency of expensive certification passes. There is a LOT of mission critical stuff like this out there where a new bug or unexpected behavior could cost billions of dollars or lives.
 

Online NiHaoMike

  • Super Contributor
  • ***
  • Posts: 9281
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #14 on: April 21, 2019, 05:14:44 am »
Bit rot is media failure. That’s not the issue in question. We’re talking about the code itself (not a specific copy of it).

Whatever the Cobol code for “1+1=2” is, will work as correctly now as in 1960. This is what I mean by “code doesn’t go bad”.

As for code maintenance: Um, yeah, duh. Nobody claimed that the code is frozen. But it does contain the entire history of bug fixes, exceptions to exceptions to exceptions (like obscure rules, which must not change in how they’re run, lest old data suddenly produces new results), etc. Reproducing all of those behaviors in new code is almost impossible to achieve. And that’s why financial institutions (tax authorities, banks, insurance companies, etc) are exceedingly reluctant to replace their core accounting code. They simply use lots of middleware to connect those core systems to modern interfaces (both user interfaces and software interfaces).
The problem of code becoming unfit for purpose is also referred to as "bit rot", but the proper term is code rot or software rot.
https://en.wikipedia.org/wiki/Software_rot
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Offline Rerouter

  • Super Contributor
  • ***
  • Posts: 4704
  • Country: au
  • Question Everything... Except This Statement
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #15 on: April 21, 2019, 05:58:37 am »
The part I don't fully understand is: it's written in assembly, and there is a known input / output vs time matrix for all of those commands, so surely at a minimum they would be able to run analysis to recreate that in modern assembly, then test the crap out of it for race conditions and orphaned branches?

Or is there something I am missing? Instead of brute forcing, they could use smart analysis to refine this stuff; they certainly have the money to run this kind of analysis.
Start at the lowest functional blocks, replace their functionality over their input range, and keep replacing blocks until most of it has been replaced / refined.
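To make the "replace a block and check it over its input range" idea concrete, here is a minimal sketch. Both routines are hypothetical stand-ins, not anything from the IRS system: the "legacy" one rounds tenths of a cent to whole cents with round-half-up, and the "replacement" uses Python's built-in round(), which rounds halves to the nearest even number.

```python
def legacy_round_cents(mills):
    """Hypothetical legacy block: tenths of a cent to cents, rounding half up."""
    return (mills + 5) // 10


def replacement_round_cents(mills):
    """Hypothetical rewrite: looks equivalent, but round() rounds half to even."""
    return round(mills / 10)


def find_mismatches(old, new, inputs):
    """Run both implementations over the inputs and collect every disagreement."""
    return [(x, old(x), new(x)) for x in inputs if old(x) != new(x)]


# Sweep the whole (small) input range and report where the rewrite differs.
for x, expected, got in find_mismatches(
    legacy_round_cents, replacement_round_cents, range(100)
):
    print(f"input {x}: legacy gives {expected}, replacement gives {got}")
```

The two agree on 95 of the 100 inputs and differ only on some exact halves, which is the kind of quiet behavioural change that makes people nervous about rewrites. An exhaustive sweep works here because the range is tiny; for real blocks the input space is usually far too large to enumerate, so you are back to choosing test cases.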
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23099
  • Country: gb
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #16 on: April 21, 2019, 10:37:43 am »
Sounds like “fear of failure” here. Basically the testing process is so difficult or time consuming to execute or so unreliable they don’t make changes any more. These are always unconditionally human issues. Usually politics, QA protectionism and incompetence.

They need to go full Hipp here and build test automation so change impact is visible. https://www.sqlite.org/testing.html

The original issue of the SSD firmware bug was a fuck up. A big one and they’re playing CYA.

And there’s no fucking way anyone should be building new stuff in COBOL today. The only reason it still exists is more fear.

It’s my day job to go into companies and fix these broken cultures for ref.
 

Offline cjs

  • Contributor
  • Posts: 49
  • Country: jp
  • Software geek
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #17 on: April 21, 2019, 12:44:08 pm »
Sounds like “fear of failure” here. Basically the testing process is so difficult or time consuming to execute or so unreliable they don’t make changes any more. These are always unconditionally human issues. Usually politics, QA protectionism and incompetence.

Well, I'm sure that there are some situations where it's simply a good business decision. But I have yet to see one personally. :-/

And more often than not, in my experience, there isn't a fear of failure in this particular area but a total lack of understanding that anything other than the status quo could even exist or that things could be done better.

Quote
They need to go full Hipp here and build test automation so change impact is visible. https://www.sqlite.org/testing.html

My guess would be that, even if they had management that could imagine this rather than dismissing it as "not practical" or other typical management misunderstandings, they simply don't have the skill to take an old system like that and put it through the expensive, many-year process to get it to a good level of automated testing and ability to accept change. This is one of the things I do for a living as well, and even in finance where we can pay a lot more than government-level wages we have few developers with the ability to do things like this.

Quote
It’s my day job to go into companies and fix these broken cultures for ref.

I have several times been ostensibly hired to fix broken cultures, bring in "agility" and "testing," or similar things, only to find out that they didn't actually want things fixed, they just wanted to feel like they were doing something about a problem they felt bad about.
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23099
  • Country: gb
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #18 on: April 21, 2019, 01:06:11 pm »
Your last point is horribly true. Seen that a few times.

The worst thing I see is the risk register approach. Company puts all risks in a list, usually on a non-backed-up SharePoint install, so they know about them. Then they accept these risks and carry on doing stupid things over and over again.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 8218
  • Country: de
  • A qualified hobbyist ;)
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #19 on: April 21, 2019, 01:20:46 pm »
It's a furiously frustrating misunderstanding that software is at one point 'done'.  Unmaintained code is a liability, not an asset.

Words of wisdom! :-+ And don't forget to add lots of comments to the source code.
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23099
  • Country: gb
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #20 on: April 21, 2019, 01:23:30 pm »
And don't forget to add lots of comments to the source code.

Unless you want job security  8)
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 8218
  • Country: de
  • A qualified hobbyist ;)
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #21 on: April 21, 2019, 01:43:05 pm »
Code doesn't go bad, occasionally the storage medium does but that can be mitigated by copying the code onto newer media. When you copy a digital file you get a perfect copy, it doesn't degrade from one generation to the next.

But with a changing outside world code has to be adapted too. OS's change, APIs change and back then input sanitization wasn't checked with fuzzers. You also have to deal with artificial limits of all sorts, e.g. bit depth of variables or the size of arrays. Or the year 2038 problem, when the signed 32-bit Unix time will roll over. What about old problematic library functions which are replaced by more secure and safe versions?
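For the 2038 rollover, a short sketch of the arithmetic. Python integers don't overflow, so the 32-bit wraparound is simulated by hand; the helper names are just for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Simulate storing a value in a signed 32-bit integer (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


last_good = to_utc(INT32_MAX)                   # last second a 32-bit time_t can hold
after_tick = to_utc(wrap_int32(INT32_MAX + 1))  # one second later, after wraparound

print("last representable second:", last_good.isoformat())
print("one tick later reads as:  ", after_tick.isoformat())
```

The last representable second is 2038-01-19 03:14:07 UTC; one tick later the counter wraps negative and reads as 1901-12-13 20:45:52 UTC.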
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 8218
  • Country: de
  • A qualified hobbyist ;)
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #22 on: April 21, 2019, 01:50:02 pm »
And don't forget to add lots of comments to the source code.

Unless you want job security  8)

Another myth. It simply takes longer to get into the code, but it doesn't prevent other softdevs from doing their job.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3275
  • Country: ca
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #23 on: April 21, 2019, 02:37:04 pm »
I'd say, if it is not broken, don't fix it.

Instead of re-writing the old software trying to duplicate what you already have, you should rather develop new, better ways of doing things, which will naturally require new software.

For example, there's no need to re-write software which works with SWIFT money transfers. Rather, a better way to do money transfers should emerge to replace SWIFT.

If you keep using the old methods, there's nothing wrong with using the old software. Moreover, re-developing the software which already works fine is complete idiocy and waste of resources.
 
The following users thanked this post: tooki, james_s, bd139

Offline cjs

  • Contributor
  • Posts: 49
  • Country: jp
  • Software geek
Re: How the 60-Year-Old IRS Computer System Failed on Tax Day
« Reply #24 on: April 21, 2019, 05:12:19 pm »
Or the year 2038 problem when the signed 32 bit unix time will roll over.

That one will be easy to deal with. Since taxes from 68 years ago will be of only historical interest (most people filing will not even have been alive that long ago!) they simply ask everyone, starting in 2038, to write "1970" on their tax forms anywhere they need to fill in the year.
 

