Unlike the serious problems caused by stack/heap overflows and the like, this bug is much subtler, and it's mostly an issue with the specification.
Fortunately, it's not that the specification is wrong: the protocol is fine and it has been formally verified. The specification is just incomplete.
Yesterday I came up with an example for a service note we sent to our customers. I'm sharing it here to try to clarify things.
Imagine that someone pays you with a check. You have never seen one, so you are told to go to the bank in order to cash it. Banks, of course, are required to pay the check holder as long as the signature is valid and the account has enough funds.
So, we have a check paying protocol.
"Go to the bank, hand your check to the clerk and wait for your money".
Now, imagine that someone comes up with a way to make perfect copies of a check. So, let's say two people go to the bank with perfect copies of the same check. The bank clerks execute the check paying protocol, with the end result that two different people have just cashed the same check.
This is a problem, obviously, so someone notices that there is a vulnerability in the way checks are handled.
Let's look at it in more detail. We have a simple protocol for cashing checks. "Hand your check to the clerk and wait for your money".
Along with this protocol, the bank does some amount of internal processing. But that's not necessarily written down in the standards for handling checks because it can be more or less obvious and, of course, different banks might have different internal procedures. In order to ease interoperability it's often advisable not to specify more detail than needed.
So how are banks handling the check in our simple, vulnerable example?
Internal procedure (vulnerable)
Clerk receives check
Clerk verifies signature
Clerk checks account
If there are enough funds, clerk withdraws the amount and pays the check holder.
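The vulnerable procedure above can be sketched in a few lines of Python. This is only an illustration of the analogy: the names Check, VulnerableBank and cash are made up for this example, and the signature_valid flag simply stands in for whatever the clerk does to verify a signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    # All fields are hypothetical, chosen only to mirror the analogy.
    number: str
    account_id: str
    amount: int
    signature_valid: bool  # stands in for the clerk's signature verification


class VulnerableBank:
    def __init__(self, balances):
        self.balances = dict(balances)

    def cash(self, check):
        """Return the amount paid to the holder (0 if the check is refused)."""
        # Clerk verifies signature
        if not check.signature_valid:
            return 0
        # Clerk checks account
        if self.balances.get(check.account_id, 0) < check.amount:
            return 0
        # Enough funds: withdraw and pay. Nothing records that this
        # particular check has been presented before.
        self.balances[check.account_id] -= check.amount
        return check.amount


bank = VulnerableBank({"alice": 300})
original = Check("0001", "alice", 100, True)
perfect_copy = Check("0001", "alice", 100, True)

paid_first = bank.cash(original)
paid_second = bank.cash(perfect_copy)  # the copy is paid as well
```

Both calls pay out, because every step the clerk performs looks only at the check being presented, never at the history of checks already cashed.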
But we said there is a security problem in that internal procedure. So we fix it by adding an extra step. If the check can be cashed (the signature is correct and there are funds), you add one more verification: was that check number cashed already? If it was not, you add a note to the customer's account stating that "check number XXXXXXX has been cashed" and you pay the holder. Now you have protection against duplicates.
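As a rough sketch of that fix in Python (again, Check, Bank, cash and the cashed set are hypothetical names invented for this illustration, not anything taken from a real banking system or from the WPA2 code), the only change is a record of the check numbers already paid for each account:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    # All fields are hypothetical, chosen only to mirror the analogy.
    number: str
    account_id: str
    amount: int
    signature_valid: bool  # stands in for the clerk's signature verification


class Bank:
    def __init__(self, balances):
        self.balances = dict(balances)
        # The "note on the customer's account": (account, check number) pairs.
        self.cashed = set()

    def cash(self, check):
        """Return the amount paid to the holder (0 if the check is refused)."""
        if not check.signature_valid:
            return 0
        if self.balances.get(check.account_id, 0) < check.amount:
            return 0
        # Extra step: was this check number cashed already?
        key = (check.account_id, check.number)
        if key in self.cashed:
            return 0
        self.cashed.add(key)
        self.balances[check.account_id] -= check.amount
        return check.amount


fixed_bank = Bank({"alice": 300})
first_payment = fixed_bank.cash(Check("0001", "alice", 100, True))
duplicate_payment = fixed_bank.cash(Check("0001", "alice", 100, True))
```

From the holder's point of view nothing has changed: the call is the same and an honest holder gets the same result. Only the bank's internal bookkeeping is different, which is the point of the analogy.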
As a second measure against this kind of problem, some body governing the banking industry updates its guidelines/standards for check handling, stating that checks are numbered and the check numbers should be verified.
Pretty obvious; it's a very silly example. But the purpose is to show how it is possible to say that the WPA2 protocol doesn't need to be changed (the protocol that describes how to cash a check at the bank hasn't been altered, and the holder does exactly the same thing) and that only the internal procedures have been changed.
What has happened with WPA2 is roughly the same. Some internal processing was too lenient. And it's not necessarily a case of sheer sloppiness; most probably it was a matter of being cautious. When dealing with standards and multiple manufacturers you have to be extremely careful about adding extra checks, because you might end up with serious interoperability problems. Unless those checks are specified in the standard, you can't make assumptions about how other implementations behave. Moreover, wireless networks are tricky: there are errors and retransmissions.
So, probably the WPA2 standard will be updated. According to the author of the discovery it should be, and I agree of course. The protocol itself won't change. Just the internal processing.
Hope it helps clarify the nature of this problem. Sorry if the example is too silly, but in this case the real problem lies in the boundary between regulated and unregulated behavior in communication standards.