New research from a Chinese university offers insight into why generative natural language processing models such as GPT-3 tend to “cheat” when asked a difficult question, producing answers that may be technically correct but come without any real understanding of why they are correct, and why these models show little or no ability to explain the logic behind their “easy” answers. The researchers also propose some new methods to help systems “study harder” during the training phase.

The problem is twofold. First, we design systems that aim to achieve results quickly and with optimal use of resources. Even where, as with GPT-3, the resources may be significantly greater than the average NLP research project can muster, this culture of results-driven optimization still permeates the methodology, because it has come to dominate academic practice.

Thus, our training architectures reward models that converge quickly and produce apparently fitting answers to questions, even if the NLP model is later unable to justify its answer or show how it came to its conclusions.

An early way to cheat

This happens because the model learns “shortcut” responses much earlier in training than it learns more complex types of knowledge acquisition. Because increased accuracy is often rewarded quite indiscriminately throughout training, the model then prioritizes any approach that allows it to answer a question “glibly” and without real insight.

Because shortcut learning inevitably represents the first successes during training, a session will naturally drift away from the more difficult task of gaining a useful and more comprehensive epistemological perspective, which may include deeper and more insightful layers of attribution and logic.

Feeding AI the ‘easy’ answers

The second problem is that, although recent research initiatives have studied the tendency of artificial intelligence to ‘cheat’ in this way and have recognized the phenomenon of ‘shortcuts’, so far no attempt has been made to classify ‘shortcut’-enabling material in the contributing data. That would be a logical first step in dealing with what may prove to be a major architectural flaw in machine reading comprehension (MRC) systems.

The new paper, a collaboration between the Wangxuan Institute of Information Technology and the MOE Key Laboratory of Computational Linguistics at Peking University, tests different language models against a recently annotated dataset that includes classifications for ‘easy’ and ‘difficult’ solutions to a possible question.


The dataset uses paraphrasing as the criterion for more complex and deeper responses, because semantic understanding is necessary to reformulate the information obtained. Shortcut responses, by contrast, can use tokens such as dates and other encapsulating keywords to produce a response that is factually accurate, but without context or justification.

The shortcut component of the annotations covers question word matching (QWM) and simple matching (SpM). In QWM, the model uses entities extracted from the provided text data and jettisons context; in SpM, the model identifies overlap between answer sentences and questions, both of which are given in the training data.
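The two shortcut types can be pictured with a small toy sketch. This is only an illustration of the idea as described above, not the paper's annotation procedure or any model's actual behavior: the function names, the regular expressions, and the mapping from question words to patterns are all hypothetical.

```python
import re

# Hypothetical mapping from a question word to a crude "entity type" pattern.
# A real system would use a proper entity recognizer; this is only a toy.
QUESTION_WORD_PATTERNS = {
    "when": re.compile(r"\b\d{4}\b"),     # treat any four-digit number as a year
    "how many": re.compile(r"\b\d+\b"),   # treat any integer as a count
}


def qwm_answer(question, passage):
    """Question word matching (toy): return the first span in the passage
    whose type fits the question word, ignoring all surrounding context."""
    q = question.lower()
    for word, pattern in QUESTION_WORD_PATTERNS.items():
        if q.startswith(word):
            match = pattern.search(passage)
            return match.group(0) if match else None
    return None


def _tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def spm_sentence(question, passage):
    """Simple matching (toy): return the passage sentence that shares
    the most words with the question."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    q_tokens = _tokens(question)
    return max(sentences, key=lambda s: len(_tokens(s) & q_tokens))


passage = "The bridge opened in 1932. It carries eight lanes of traffic."
print(qwm_answer("When did the bridge open?", passage))
print(spm_sentence("How many lanes of traffic does it carry?", passage))
```

Neither function needs to understand the passage: one matches a surface type to the question word, the other counts shared words. That is what makes such tricks cheap to learn.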

Shortcut data is almost ‘viral’ in a dataset

The researchers argue that datasets usually contain a high proportion of shortcut questions, which lead trained models to rely on shortcut tricks.

The two models used in the experiments were BiDAF and Google’s BERT-base. The researchers point out that even when trained on dataset variations with a higher proportion of “difficult” questions, both models still perform better on shortcut questions than on more difficult paraphrased questions, despite the small number of shortcut examples in the datasets.

This presents “shortcut data” almost as a kind of virus: very little of it needs to be present in a dataset for it to be adopted and prioritized in training under standard NLP conventions and practices.

Exposing the cheat

One method the study uses to demonstrate the fragility of a shortcut answer is to replace an ‘easy’ entity word with an anomalous word. Where a shortcut method has been used, the logic of the ‘cheated’ answer cannot be provided; but where the answer was derived from deeper context and a semantic evaluation of the broader supporting text, it is possible for the system to deconstruct the error and reconstruct the correct answer.

Replacing “America” (a location) with “Beyoncé” (a person) reveals whether the model has any underlying logic for its answer.
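A substitution probe of this kind might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation protocol: `substitute_entity`, `probe` and `toy_shortcut_answerer` are hypothetical names, and the toy answerer merely stands in for a model that relies on a surface pattern.

```python
import re


def substitute_entity(passage, original, replacement):
    """Swap one entity word for an anomalous one (e.g. a location for a person)."""
    if original not in passage:
        raise ValueError(f"{original!r} does not occur in the passage")
    return passage.replace(original, replacement)


def probe(answer_fn, question, passage, original, replacement):
    """Return the answers before and after the substitution."""
    before = answer_fn(question, passage)
    after = answer_fn(question, substitute_entity(passage, original, replacement))
    return before, after


def toy_shortcut_answerer(question, passage):
    """Hypothetical stand-in for a shortcut-reliant model: it returns the
    capitalized word after "in" and never checks whether it is a place."""
    match = re.search(r"\bin ([A-Z]\w+)", passage)
    return match.group(1) if match else None


question = "Where was the festival held?"
passage = "The festival was held in America last year."
before, after = probe(toy_shortcut_answerer, question, passage, "America", "Beyoncé")
print(before, after)
```

The toy answerer returns the swapped-in person's name as the answer to a “where” question without registering that the type no longer fits. One reasonable reading of such a result is that the answer came from surface position rather than from understanding.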

Shortcuts due to financial necessity

The authors comment on some of the architectural reasons why shortcuts are so heavily prioritized in NLP training workflows: ‘MRC models can learn shortcut tricks, such as QWM, with less computational resources than comprehension challenges, such as identifying paraphrasing’.

This could thus be an unintended result of the usual optimization and resource-saving philosophy in approaches to machine reading comprehension, and of the pressure to get results with limited resources within a tight timeframe.

The researchers also point out:

‘[Since] the shortcut tricks can answer most training questions correctly, the limited remaining unsolved questions may not motivate models to explore sophisticated solutions that require challenging skills.’

If the results of the paper are later confirmed, it seems that the vast and ever-growing field of data preprocessing may have to treat ‘hidden cribs’ in data as a problem that needs to be addressed in the long run, or otherwise revise NLP architectures to prioritize more challenging routines for data ingestion.

