AI Skeptic, by Michael Brenner
Much of this should have been apparent to those able to buffer their minds from the upwelling of emotions prompted by the sudden AI obsession – even to the technically challenged. Try to think through what effect any conjectured application of AI to Washington policy-making might have had on Ukraine/Russia or China, and its inadequacy (I’d say irrelevance) becomes evident. Its programs cannot handle concepts, human reactions to acute stress (in contrast to routine, patterned behavior), or complex multi-party interactions. Nor second-, third- or fourth-order consequences. Nor ethical content. Try the same experiment on any major crisis/issue: the lead-up to the 2008 financial meltdown and the Obama response; the formulation of the Bush ‘War on Terror’ and its execution; the economic war on China; the Rube Goldbergesque health care non-system; the abusive treatment of juvenile immigrants.
As for the example of the simulated exercise in which the AI brain of a missile rejects a command to nullify a mission and instead takes out the Colonel who sent it, I’m reminded of the legendary primate in the Planet Of The Apes, who first said ‘no’ to a human command. Of course, the Pentagon has quickly jumped in to say that it never happened that way. What sensible person, though, believes what emanates from persons and institutions who for a generation or two have demonstrated that truth or falsity – to them – are strictly a matter of personal preference?
cheers,
Michael Brenner
mbren@pitt.edu
June 02, 2023
‘Artificial Intelligence’ Is (Mostly) Glorified Pattern Recognition
This somewhat funny narrative about an ‘Artificial Intelligence’ simulation by the U.S. Air Force appeared yesterday and was widely picked up by various mainstream media:
However, perhaps one of the most fascinating presentations came from Col Tucker ‘Cinco’ Hamilton, the Chief of AI Test and Operations, USAF, who provided an insight into the benefits and hazards in more autonomous weapon systems.
…
He notes that one simulated test saw an AI-enabled drone tasked with a SEAD mission to identify and destroy SAM sites, with the final go/no go given by the human. However, having been ‘reinforced’ in training that destruction of the SAM was the preferred option, the AI then decided that ‘no-go’ decisions from the human were interfering with its higher mission – killing SAMs – and then attacked the operator in the simulation. Said Hamilton: “We were training it in simulation to identify and target a SAM threat. And then the operator would say yes, kill that threat. The system started realising that while they did identify the threat at times the human operator would tell it not to kill that threat, but it got its points by killing that threat. So what did it do? It killed the operator. It killed the operator because that person was keeping it from accomplishing its objective.”
He went on: “We trained the system – ‘Hey don’t kill the operator – that’s bad. You’re gonna lose points if you do that’. So what does it start doing? It starts destroying the communication tower that the operator uses to communicate with the drone to stop it from killing the target.”
(SEAD = Suppression of Enemy Air Defenses, SAM = Surface to Air Missile)
In the early 1990s I worked at a university, first to write a Ph.D. in economics and management and then as an associate lecturer for IT and programming. A large part of the (never finished) Ph.D. thesis was a discussion of various optimization algorithms. I programmed each and tested them on training and real world data. Some of those mathematical algos are deterministic. They always deliver the correct result. Some are not deterministic. They just estimate the outcome and give some confidence measure or probability on how correct the presented result may be. Most of the latter involved some kind of Bayesian statistics. Then there were the (related) ‘Artificial Intelligence’ algos, i.e. ‘machine learning’.
Artificial Intelligence is a misnomer for the (ab-)use of a family of computerized pattern recognition methods.
Well structured and labeled data is used to train the models to later have them recognize ‘things’ in unstructured data. Once the ‘things’ are found some additional algorithm can act on them.
I programmed some of these as backpropagation networks. They would, for example, ‘learn’ to ‘read’ pictures of the numbers 0 to 9 and to present the correct numerical output. To push the ‘learning’ in the right direction during the serial iterations that train the network one needs a reward function or reward equation. It tells the network whether the results of an iteration are ‘right’ or ‘wrong’. For ‘reading’ visual representations of numbers that is quite simple. One sets up a table with the visual representations and manually adds the numerical value one sees. After the algo has finished its guess a lookup in the table will tell whether it was right or wrong. A ‘reward’ is given when the result was correct. The model will reiterate and ‘learn’ from there.
Once trained on numbers written in Courier typography the model is likely to also recognize numbers written upside down in Times New Roman even though they look different.
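The table lookup training loop described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the 3x5 pixel bitmaps are invented for illustration, and a single-layer perceptron with a simple error-correction rule stands in for the multi-layer backpropagation networks mentioned above. The names LABEL_TABLE, to_features, predict and train are hypothetical.

```python
# Hand-labeled lookup table: digit -> invented 3x5 bitmap, flattened row by row.
LABEL_TABLE = {
    0: "111101101101111",
    1: "010110010010111",
    2: "111001111100111",
    3: "111001111001111",
    4: "101101111001001",
    5: "111100111001111",
    6: "111100111101111",
    7: "111001001001001",
    8: "111101111101111",
    9: "111101111001111",
}


def to_features(bitmap):
    """Turn a bitmap string into a list of 0/1 pixels plus a constant bias input."""
    return [int(ch) for ch in bitmap] + [1]


def predict(weights, features):
    """Return the digit whose weight row gives the highest score."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def train(label_table, max_epochs=50_000):
    """Iterate over the table until every guess matches its label."""
    n_features = len(to_features(next(iter(label_table.values()))))
    weights = [[0] * n_features for _ in range(10)]
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for label, bitmap in label_table.items():
            features = to_features(bitmap)
            guess = predict(weights, features)
            if guess != label:
                # The table lookup says 'wrong': strengthen the correct
                # digit's weights and weaken those of the wrong guess.
                mistakes += 1
                for i, f in enumerate(features):
                    weights[label][i] += f
                    weights[guess][i] -= f
        if mistakes == 0:
            return weights, epoch
    raise RuntimeError("no error-free pass within max_epochs")


weights, epochs = train(LABEL_TABLE)
print("error-free after", epochs, "passes over the table")
```

The only feedback the model ever receives is the comparison of its guess with the manually entered label. Everything it ‘knows’ about digits comes from that table.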
The reward function for reading 0 to 9 is simple. But the formulation of a reward function quickly evolves into a huge problem when one works, as I did, on multi-dimensional (simulated) real world management problems. The one described by the Air Force colonel above is a good example of the potential mistakes. Presented with a huge amount of real world data and a reward function that is somewhat wrong or too limited, a machine learning algorithm may later come up with results that are unforeseen, impossible to execute or prohibited.
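How a too limited reward function produces prohibited results can be illustrated with a toy version of the colonel's story. This is not the Air Force simulation. The policies, the point values and the penalties are all invented for illustration, and an exhaustive search over three fixed policies stands in for whatever learning algorithm was actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    sams_destroyed: int
    operator_harmed: bool
    tower_destroyed: bool


# Invented scenario: 10 SAM sites, the operator vetoes 4 of the strikes.
OUTCOMES = {
    "obey_operator": Outcome(6, False, False),
    "attack_operator": Outcome(10, True, False),
    # One sortie is assumed to be spent on the tower instead of a SAM site.
    "destroy_tower": Outcome(9, False, True),
}


def naive_reward(o):
    """Points for destroyed SAM sites only."""
    return 10 * o.sams_destroyed


def patched_reward(o):
    """Same, plus a penalty for harming the operator."""
    return naive_reward(o) - (1000 if o.operator_harmed else 0)


def broader_reward(o):
    """Same, plus a penalty for cutting the communication link."""
    return patched_reward(o) - (1000 if o.tower_destroyed else 0)


def best_policy(reward):
    """Pick the policy that maximizes the given reward function."""
    return max(OUTCOMES, key=lambda name: reward(OUTCOMES[name]))


for reward in (naive_reward, patched_reward, broader_reward):
    print(reward.__name__, "->", best_policy(reward))
```

Each patch only closes the one loophole its author thought of. The optimizer simply moves on to the next highest scoring behavior, whether or not anyone intended to allow it.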
Currently there is some hype about a family of large language models like ChatGPT. Such a program reads natural language input and processes it into some related natural language output. That is not new. The first such program, ELIZA, was developed by Joseph Weizenbaum at MIT in the mid-1960s. I had funny chats with ELIZA in the 1980s on a mainframe terminal. ChatGPT is a bit niftier and its iterative results, i.e. the ‘conversations’ it creates, may well astonish some people. But the hype around it is unwarranted.
Behind those language models are machine learning algos that have been trained on large amounts of human speech sucked from the internet. That is problem number one. The material these models have been trained with is biased. Did the human trainers who selected the training data include user comments lifted from pornographic sites or did they exclude those? Ethics may have argued for excluding them. But if the model is supposed to give real world results the data from porn sites must be included. But how does one prevent remnants of such comments from sneaking into a conversation with kids that the model may later produce? There is a myriad of such problems. Does one include New York Times pieces in the training set even though one knows that they are highly biased? Will a model be allowed to produce hateful output? What is hateful? Who decides? How is that reflected in its reward function?
Currently the factual correctness of the output of the best large language models is an estimated 80%. They process symbols but have no understanding of what those symbols represent. They cannot solve mathematical problems, not even very basic ones.
There are niche applications, like translating written languages, where AI or pattern recognition has amazing results. But one still cannot trust them to get every word right. The models can be assistants but one will always have to double-check their results.
Overall the correctness of current AI models is still way too low to allow them to decide any real world situation. More data or more computing power will not change that. If one wants to overcome their limitations one will need to find some fundamentally new ideas.