All of this is going too fast.
What is the likelihood that the big players have training algorithms and/or hardware that are much more efficient than what has been revealed publicly? I have a hard time believing that gradient descent-like algorithms (Adam, etc.), which are theoretically inefficient (much slower convergence than second-order Newton-type methods) but fit in memory, are a good use of resources to train these humongous models. Do they really just throw obscene amounts of computing power at these kinds of algorithms with no, or very little, secret sauce? Or did they figure out how to run something with a higher rate of convergence on these big models? Are they using some kind of analog hardware (not even talking quantum here)?
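The convergence gap being alluded to is a textbook fact, easy to see on a toy problem. The sketch below is generic and says nothing about how the big labs actually train; it just shows plain gradient descent crawling on an ill-conditioned quadratic f(x) = 0.5·(x₁² + 100·x₂²) while Newton's method lands on the optimum in one step:

```python
# Toy illustration of first-order vs second-order convergence.
# f(x) = 0.5 * (x1^2 + 100 * x2^2); gradient is [x1, 100*x2],
# Hessian is the constant diagonal matrix diag(1, 100).

def grad(x):
    return [x[0], 100.0 * x[1]]

def gradient_descent(x, lr=0.009, steps=100):
    # lr must stay below 2/100 = 0.02 (the curvature in x2) or GD diverges.
    for _ in range(steps):
        g = grad(x)
        x = [x[0] - lr * g[0], x[1] - lr * g[1]]
    return x

def newton_step(x):
    # Newton update: x - H^-1 * grad; here H^-1 = diag(1, 1/100).
    g = grad(x)
    return [x[0] - g[0] / 1.0, x[1] - g[1] / 100.0]

x0 = [1.0, 1.0]
print(gradient_descent(list(x0)))  # x1 has only decayed to ~0.4 after 100 steps
print(newton_step(list(x0)))       # [0.0, 0.0] in a single step
```

The catch, and the reason everyone uses first-order methods anyway, is the one mentioned above: the Hessian for an n-parameter model is n×n, so for a billion-parameter network even storing it (let alone inverting it) is hopeless, which is why the practical research goes into approximations (quasi-Newton, K-FAC-style curvature estimates) rather than true second-order steps.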
Also, I wouldn't be surprised if TLAs already had, 10 to 15 years ago, technology comparable to what we have today: specifically, something at least like OpenAI's Whisper, and for images maybe something similar to the first versions of Stable Diffusion. After all, we know they had voice keyword spotting systems in the 90s.
How good was the accuracy of classical, publicly known "non-AI" voice recognition algorithms back then? It has to have been pretty damn good to be usable for mass surveillance without triggering a deluge of false alarms that have to be cleared by tens of thousands of human operators.
In other words, I think the AI we're seeing now was recently technology-transferred to the public, though some of the secret sauce (for fast training) might still be held back as classified. Heck, maybe OpenAI subcontracts the NSA to train their models on their massive farms and/or non-classical computers. Someone call the WTO, I sense unfair government subsidies!