One of the latest breakthroughs came in May 2020 with OpenAI’s GPT-3. This system behaves more broadly than its predecessors. It was trained on Internet text data and has learned to learn; it is a multitasking meta-learner. It can learn to do a new task from just a few examples written in natural language. Although the debate over whether GPT-3 is AGI has been settled (it is nowhere near), the system did raise the question.
In July 2020, OpenAI released a beta API for developers to play with the system, and they soon started finding unexpected results that even its creators hadn’t thought of. Given a set of English instructions, GPT-3 was found to be able to write code, poetry, fiction, songs, guitar tabs, and LaTeX. As a result, the hype grew wild and GPT-3 made headlines in major media outlets.
And where hype appears, anti-hype is not far behind. Experts and not-so-expert commentators tried to tone down the hype. GPT-3 was being described as an all-powerful artificial intelligence, but it wasn’t, and that had to be said. Even OpenAI CEO Sam Altman said the hype was too much: “[GPT-3 is] impressive […] but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse.”
People started looking for the limitations of GPT-3: where it failed, what tasks it couldn’t do, what its weaknesses were. They found many, maybe even too many. That, at least, was the view of tech blogger Gwern Branwen. GPT-3 was neither perfect nor AGI, but people were reporting failures at tasks where GPT-3 should have succeeded.
In a display of scientific rigor, Gwern compiled a large number of published examples and retested those that had appeared to be too difficult for GPT-3. He argued that prompts (the descriptions or examples given to GPT-3 as input) were often poorly defined. According to him, prompting was better understood as a new programming paradigm and had to be approached accordingly:
“Sampling can prove the presence of knowledge but not the absence
GPT-3 may “fail” if a prompt is poorly written, does not include enough examples, or bad sampling settings are used. I have demonstrated this many times when someone shows a “failure” of GPT-3: the failure was their own. The question is not whether a given prompt works, but whether any prompt works.”
He showed that a good portion of the weaknesses people perceived in GPT-3 were really their own failures to understand how to communicate with the system. People could not find the limits of GPT-3 because those limits lay beyond their testing methods.
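Gwern’s point can be made concrete with a small Python sketch. It does not call GPT-3 or any real API: `toy_model` is a hypothetical stand-in that only answers correctly once the prompt contains at least two worked examples, and the helper names (`build_few_shot_prompt`, `first_working_prompt`) and the `Input:`/`Output:` layout are assumptions chosen for illustration. The idea it shows is that a single failing prompt says little; a fair test tries several prompts before declaring failure.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Lay out an instruction, worked examples, and a new query as plain text."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


def first_working_prompt(
    complete: Callable[[str], str],
    prompts: Iterable[str],
    is_correct: Callable[[str], bool],
) -> Optional[str]:
    """Return the first prompt whose completion passes the check, else None."""
    for prompt in prompts:
        if is_correct(complete(prompt)):
            return prompt
    return None


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model (NOT GPT-3).

    It reverses the queried word only if the prompt has two or more examples.
    """
    shots = prompt.count("Output:") - 1  # the final "Output:" is the open slot
    query = prompt.rsplit("Input: ", 1)[1].split("\n", 1)[0]
    return query[::-1] if shots >= 2 else "?"


def is_reversed_hello(output: str) -> bool:
    return output.strip() == "olleh"


zero_shot = build_few_shot_prompt("Reverse the word.", [], "hello")
few_shot = build_few_shot_prompt(
    "Reverse the word.", [("abc", "cba"), ("dog", "god")], "hello"
)

# Testing only the bare prompt suggests the task is impossible...
print(first_working_prompt(toy_model, [zero_shot], is_reversed_hello) is None)
# ...but a better-specified prompt shows the ability was there all along.
print(first_working_prompt(toy_model, [zero_shot, few_shot], is_reversed_hello) == few_shot)
```

Both lines print `True` with this toy model. With a real model, `complete` would wrap an actual API call, and the set of candidate prompts (and sampling settings) would be much larger, which is exactly why a lone failed prompt cannot prove that the knowledge is absent.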