From Computational to Cognitive: ChatGPT and Natural Language Models

Wolff Gilligan

Illustrations by Maddie Turner

Have you ever wondered if ChatGPT can really think and speak like you do? The science behind it seems to raise more questions than answers. To start answering those questions, we must explore how current research is attempting to enable computers to comprehend and generate human language [1, 2, 3]. With the advent of OpenAI's ChatGPT, it may seem like natural-sounding computer-generated language is finally within reach. ChatGPT, while not flawless, demonstrates an impressive use of language, engaging users with responses that are often remarkably human-like. Despite these advances, there is still a long road ahead before language processing models fully grasp the intricacies of human language. One key issue has emerged: the division between the engineering-based development of these systems and the academic study of how humans comprehend and use language. The separation of these two once-close disciplines has led to a world where computer models like ChatGPT often interpret and respond to language in ways that, while sophisticated, lack the depth of human cognition [2, 4, 5]. Investigating the current limitations of natural language processing and the ways in which we can take a more multidisciplinary approach to the field has the power to close the linguistic gap between humans and computational models.

Nodes, Neural Networks, and Nuance. Oh My! 

Language is both an integral tool for communication and a vastly complicated one, which makes breaking it down for computers seem like an impossible challenge. However, this is precisely the endeavor of Natural Language Processing (NLP): the field dedicated to enabling computers to understand human language for use in a myriad of tasks [6]. One common task for NLP involves analyzing sections of text or individual words to determine whether they convey a positive, negative, or neutral sentiment — a process known as sentiment prediction [7]. Given the word ‘melancholic,’ a sentiment prediction model might guess that the sentiment is 0.4 on a scale from ‘very negative’ (0) to ‘very positive’ (1), and it might give the word ‘happy’ a 0.9. The ability to efficiently separate positive from negative across large volumes of online data, such as product reviews or social media posts, makes sentiment prediction models valuable tools [7]. Sentiment prediction models are the ancestors of the modern ChatGPT, but their techniques and frameworks are still used today [8].
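To make the task concrete, here is a minimal sketch in Python of what a sentiment prediction system's inputs and outputs look like. The word scores and category cutoffs below are invented for illustration; they are not the output of any real trained model.

```python
# Toy illustration of the sentiment prediction task: words in, 0-1 scores out.
# The scores and cutoffs here are made up for illustration only.
toy_scores = {"melancholic": 0.4, "happy": 0.9, "table": 0.5}

def categorize(score: float) -> str:
    """Map a 0-1 sentiment score onto a coarse category (cutoffs are arbitrary)."""
    if score < 0.45:
        return "negative"
    if score > 0.55:
        return "positive"
    return "neutral"

for word, score in toy_scores.items():
    print(f"{word}: {score} -> {categorize(score)}")
```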

Sentiment prediction models and other language processing systems can seem miraculous; to understand how they actually work, we must look under the hood. At the core of NLP models lies a fundamental computational building block: the simple neural network [9, 10, 11]. Put plainly, a neural network is a system of connected processing units called nodes. The connections between nodes are mathematically fine-tuned by an algorithm to achieve a desired output when presented with a given input. For example, a sentiment prediction model would have words as the input and the predicted sentiment of those words as the output. The nodes are structured into three distinct ‘layers’: the input layer (which receives the raw data), the hidden layer (where the computations take place), and the output layer (which presents the final results of the computations). The network is fed data (generated and digitized by groups of people) in successive rounds, a process called ‘training’ [7]. In each round, the network extrapolates patterns from the input and produces a guess at the output, representing what it believes the real answer to be. The network then checks its work, refines its connections if needed to become more precise, and tries again. Over time, the model ‘learns’ patterns from the data it receives in order to produce more accurate outputs. The learning process takes place in the hidden layer, which is virtually impossible for humans to directly interpret. A well-constructed neural network eventually learns to accurately predict the correct output based solely on the input [9, 10, 11].
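For the curious, the short Python sketch below shows the bare bones of such a network: an input layer that receives a word encoded as numbers, a hidden layer of connections, and an output layer that emits a 0-1 sentiment guess. The layer sizes and random starting connections are placeholders; real models are vastly larger.

```python
# A bare-bones neural network sketch: input layer -> hidden layer -> output layer.
# The layer sizes and random starting connections are placeholders for illustration.
import numpy as np

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(8, 4))  # connections from the 8-number input to 4 hidden nodes
W_output = rng.normal(size=(4, 1))  # connections from the hidden nodes to the single output

def predict(word_vector):
    """Forward pass: the computations happen in the hidden layer."""
    hidden = np.tanh(word_vector @ W_hidden)        # hidden-layer activity
    score = 1 / (1 + np.exp(-(hidden @ W_output)))  # squash the result onto a 0-1 scale
    return float(score[0])

# A made-up 8-number encoding standing in for a word; before training,
# the network's guess is essentially arbitrary.
print(predict(rng.normal(size=8)))
```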

Predicting outputs for a specific problem begins with designing a new neural network model. For example, to perform the aforementioned sentiment prediction, we would first need a dataset of words paired with numbers representing their sentiments on a scale of 0 (very negative) to 1 (very positive); for instance, ‘melancholic = 0.4’ or ‘happy = 0.9.’ To train our model, we feed it words from our dataset, which make up the input layer, and, based on the connections in its hidden layer, the model guesses the value of the sentiment, the output [11]. The first guess might be very wrong, for example, a prediction of 0.8 (fairly positive) for the word ‘melancholic.’ At this point, the model checks its work by comparing its guess to the human-generated answer, and it adjusts the connections in its hidden layer using the algorithm [11]. Through thousands of guesses and recalibrations, the hidden layer’s connections become increasingly accurate at predicting the sentiment of a given word from the dataset [9, 10, 11]. The process of training models, adapting the hidden layer to achieve greater accuracy over time, is the same for more complex networks, such as those in the GPT family [12]. ChatGPT was trained on a dataset called the Common Crawl, an open-source compilation of billions of pages from across the Internet [12]. The ChatGPT model features far more intricate hidden layers and algorithms, enabling it to discern more nuanced patterns in language [9, 10, 11], and a more complex model can generate more complex outputs, like the ones ChatGPT produces [12]. At all levels of complexity, pure pattern recognition is the engine of neural networks; the networks are limited by their algorithmic complexity and self-contained processes.
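The guess-check-adjust cycle described above can be sketched in a few lines of Python. Everything below is a toy: the four-number word encodings, the two-word dataset, and the tiny hidden layer are all invented for illustration, and the adjustment rule is ordinary gradient descent rather than the far more elaborate machinery behind GPT-style models.

```python
# Toy training loop: guess the sentiment, check against the human label, adjust connections.
# The word encodings, labels, and network size are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
dataset = {  # word -> (a made-up 4-number encoding, its human-assigned sentiment)
    "melancholic": (np.array([1.0, 0.2, 0.0, 0.5]), 0.4),
    "happy":       (np.array([0.0, 0.9, 1.0, 0.1]), 0.9),
}

W1 = rng.normal(scale=0.5, size=(4, 3))  # input layer -> hidden layer
W2 = rng.normal(scale=0.5, size=(3, 1))  # hidden layer -> output layer
rate = 0.5                               # how strongly each correction nudges the connections

for _ in range(2000):                    # thousands of guess-and-recalibrate rounds
    for word, (x, target) in dataset.items():
        h = np.tanh(x @ W1)                            # hidden-layer computations
        guess = (1 / (1 + np.exp(-(h @ W2)))).item()   # the model's 0-1 guess
        error = guess - target                         # check the guess against the human answer
        # Adjust the connections (gradient descent on the squared error).
        d_out = 2 * error * guess * (1 - guess)
        d_hidden = (W2.flatten() * d_out) * (1 - h ** 2)
        W2 -= rate * (h[:, None] * d_out)
        W1 -= rate * np.outer(x, d_hidden)

for word, (x, target) in dataset.items():
    h = np.tanh(x @ W1)
    print(word, "guess:", round((1 / (1 + np.exp(-(h @ W2)))).item(), 2), "target:", target)
```

After a couple thousand rounds, the toy network's guesses should land close to the human labels of 0.4 and 0.9, purely through repeated pattern adjustment.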

Syntax Sorcery: Divining the Basics of Cognitive Linguistics

When we ‘talk’ with ChatGPT, it is easy to assume we are using a computer model that truly understands human language. Yet this assumption overlooks a crucial question: does ChatGPT process and generate language in the same way humans do? While NLP models are able to produce and read human language, that does not mean they interpret and formulate words in the same manner as humans [2]. We need to consider the fact that computers are fundamentally different agents from humans, starting with their basic ‘hardware.’ Although both computers and humans produce words, the neurons in the human brain and the computational neural networks of computers are simply not the same. Consequently, the ways in which computers and humans process language must also differ. Therefore, in order to grasp the mechanics of how we understand, produce, and interpret language, we need to explore the field of cognitive linguistics [13]. While the development of computer models may seem disconnected from the goal of understanding human language, the two are deeply intertwined [14]. The first algorithms used for NLP were developed back in the 1960s, although they saw limited practical use due to the lack of computational power at the time [15]. As a result, research in the field of NLP was often restricted to a theoretical discussion of abstract ideas about language that could not yet be actualized [15]. Similar to cognitive linguists, early NLP researchers were exploring language at a conceptual level [2]. As processing power grew, allowing for the large-scale application of pre-existing algorithms, NLP became less theoretical and more practical, leading to a separation between the fields [14]. While the fields have diverged, their goals remain the same; therefore, it is necessary to discuss cognitive linguistics and NLP in relation to each other.

At the core of cognitive linguistics lie the basic grammatical units of language, such as form-meaning pairings [16]. Form-meaning pairings refer to the connection between how a linguistic element, such as a word, sounds and the idea it represents [16, 17]. For instance, consider the connection between the word ‘bucket’ (form) and the physical container it represents (meaning). Nearly all linguistic theories recognize that language is made up of form-meaning pairings, including the theory of construction grammar [17]. According to the general theory of construction grammar, form-meaning pairings apply to a variety of linguistic elements collectively called constructions, including words, idioms, and phrases [17]. Let’s revisit our form-meaning pairing example from before, using the word ‘bucket.’ Think now about the phrase ‘kick the bucket’: it does not mean to physically boot a pail, but is instead an idiom meaning ‘to die.’ Yet, notice the individual words ‘kick’ and ‘bucket.’ Despite their original meanings, these words have come together to form an entirely different meaning. Within the framework of construction grammar, the idiom ‘kick the bucket’ would be considered a construction since its form is paired with a meaning that differs from the literal meanings of its individual words [16, 17]. Concepts like these are very useful for breaking down language; nonetheless, cognitive linguistics is not just about dissecting language into tiny components. The field helps us to understand the rich tapestry of connections, meanings, and experiences that language encompasses.

It is crucial to acknowledge the differences and potential synergies between traditional NLP models that utilize neural networks and systems inspired by cognitive linguistics. Neural networks used in NLP tasks often approach language as an array of patterns [10, 11]. Recall that sentiment analysis models achieve their prediction of ‘positive,’ ‘negative,’ or ‘neutral’ by recognizing patterns in large datasets and then applying this knowledge to analyze new inputs. Neural networks operate by adjusting their predicted output based on patterns they have previously seen, without any inherent understanding of what the patterns signify in human cognition [11]. As a result, the network output is only based on the form of language rather than its meaning. In contrast, cognitive linguistics is not only about patterns, but also the intricacies of meaning, context, and their relation to the human experience [2]. Traditional NLP models — like sentiment analysis neural networks — might tell you that a sentence is negative because they have seen similar patterns before. Alternatively, a cognitive linguistics-inspired model would be able to discern the negativity based on various other factors, possibly taking into account the underlying constructions, context, and human experiences associated with words [4]. While humans consider words and phrases within the context of their own lives and thoughts, neural networks reduce words and phrases to their form [2, 5, 16]. Going back to our ‘kick the bucket’ idiom, a neural network would solely focus on the form of the phrase and not take note of its morbid connotation [2, 16]. Neural networks may detach forms from the human experience of language, resulting in a loss or misinterpretation of the word’s meaning [2, 5, 14]. While the neural network cannot explain its answer, the predictions of a cognitive linguistics-inspired model are easier for humans to understand [4, 5]. Though a difference in explainability may seem insignificant, it is the first sign of how NLP models fail to capture the true essence of human language. 

Parsing Predictability: Bridging Gaps with Cognitive Linguistics 

While the objectives of NLP and cognitive linguistics require similar knowledge and research, the relationship between the two is not as strong as one might assume [5, 14]. Despite NLP’s recent advancements with miracle models such as ChatGPT, future strides to address weaknesses in NLP research can only be made by better integrating cognitive linguistics concepts [5, 14]. Some of the issues associated with NLP systems become evident when examining a common language task: reading. Cognitive linguists theorize that when people read or process sentences, the level of effort required depends on how predictable each word or phrase is, based on the context [18]. For instance, the flowery and irregular writing of Hamlet may take more effort to read than the more commonplace vocabulary and sentence structure of Frog and Toad. While we know that predictability plays a role in reading, the way in which humans naturally anticipate what follows in a sentence remains unknown [19]. To mimic this behavior in computers, researchers use algorithms that incorporate rules of grammar and structure to predict the next word in a sentence [18]. One example is the left corner parser, a method of computational language processing that works through a flowchart of grammatical rules humans can examine and understand. There is no hidden layer here! To figure out which method of language processing is more accurate, we can compare the left corner parser and a highly complex neural network model (GPT-2) against a baseline set of human data collected by observing people reading. While both systems performed worse than humans, the left corner parser achieved equivalent or superior results compared to GPT-2 in predicting the next word [18]. The findings suggest that large neural language models are not as proficient at grasping the more nuanced aspects of human language, which become more pronounced in complex tasks like predicting the next word in a sentence [19]. In other words, GPT-2, a large neural network model, underperformed in comparison to an algorithm designed using insights from cognitive linguistics [18].
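The quantity being compared in that study is, roughly, how surprising each word is given the words before it. The toy Python sketch below illustrates the idea with a handful of invented word-pair counts; it is not the left corner parser or GPT-2, just a way of seeing how ‘predictability’ can be turned into a number.

```python
# Toy illustration of 'predictability' as surprisal: how surprising a word is
# given the word before it. The word-pair counts below are invented.
import math

pair_counts = {  # (previous word, next word) -> how often the pair was seen
    ("the", "frog"): 8,
    ("the", "rub"): 1,
    ("frog", "hopped"): 6,
}

def surprisal(previous_word: str, word: str) -> float:
    """Higher values mean the word is harder to predict (and, in theory, to read)."""
    seen_after = sum(c for (prev, _), c in pair_counts.items() if prev == previous_word)
    count = pair_counts.get((previous_word, word), 0.5)  # small count for unseen pairs
    return -math.log2(count / seen_after)

print(surprisal("the", "frog"))  # a common continuation: low surprisal, easy reading
print(surprisal("the", "rub"))   # a rarer, Hamlet-flavored continuation: higher surprisal
```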

Beyond GPT-2, systems that solve complex problems related to the understanding of human language are often based on cognitive linguistics research [4]. Despite advances in complex neural networks, these models often struggle with analyzing sentence structure, grammar, and meaning, leading to difficulties in performing complex tasks like inferencing [13, 20]. Inferences, which are conclusions drawn from context rather than explicit statements, require an understanding of nuanced interactions and sequences in language. Humans make inferences quite naturally, even with minimal context; for example, the statement 'John dried the clothes' implies that the clothes were wet to begin with and have transitioned to a dry state [21]. However, even with immense amounts of data — such as the Common Crawl, which ChatGPT was trained on — NLP models struggle with making these types of inferences [20, 22]. The difficulty of complex language processing for NLP models, along with the desire for improved performance, provides a major opportunity for the integration of cognitive linguistics into NLP. 

Integrated systems that incorporate insights from cognitive linguistics into NLP models have been developed to perform a more nuanced analysis of language [23]. An example of one such system is VerbNet, a cognitive linguistics-inspired tool that provides a comprehensive classification of verbs based on their meaning and grammar [21, 24, 25]. VerbNet is not a neural network, but a top-down system that uses explicitly stated grammatical rules from human language, just like the left corner parser from earlier. While a neural network reduces a verb to its form, VerbNet incorporates NLP and cognitive linguistics to consider both the form and meaning of a given verb. VerbNet does not just look at the surface structure of sentences; it also analyzes the different meanings that each verb can have in different contexts and how these meanings interact with each other. What makes VerbNet unique is that it pays particular attention to an aspect of verbs called diathesis alternations. A diathesis alternation is a change in a sentence that switches the focus of the verb from its subject to its object, or vice versa, which often involves rearranging the sentence structure [21, 24, 25]. For instance, a sentence like ‘John broke the window’ can be rephrased as ‘the window broke’ without losing its essential meaning. VerbNet’s capability to capture these nuances in verb meaning allows it to better understand and represent the changing relationships of verb subjects (John) and objects (the window) in sentence structure. This method of representing smaller components of sentences and their connected conditions gives deeper insight into language meaning than neural network models that mainly focus on surface-level text analysis [21, 25]. So, how can this help advance NLP research? By integrating VerbNet’s nuanced, cognitive linguistics-driven approach to understanding verbs and their roles in events, NLP models can understand language more accurately [21, 25]. The ability of NLP systems to make inferences and understand subtle changes in meaning, which these systems currently struggle with, could then improve significantly.
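As a rough picture of what such a resource encodes, the Python sketch below represents a ‘break’-type verb class with its participants and two related surface forms. The field names and structure are invented and greatly simplified for illustration; they are not VerbNet's actual format.

```python
# A heavily simplified, invented sketch of a VerbNet-style verb class entry;
# this is not VerbNet's actual format, just an illustration of the idea.
from dataclasses import dataclass

@dataclass
class VerbClass:
    members: list  # verbs that behave alike
    roles: list    # the participants the verb relates
    frames: list   # (surface form, shared underlying meaning) pairs

break_class = VerbClass(
    members=["break", "shatter", "crack"],
    roles=["Agent", "Patient"],
    frames=[
        ("Agent V Patient", "Agent causes Patient to change state"),  # 'John broke the window'
        ("Patient V",       "Patient changes state"),                 # 'the window broke'
    ],
)

# Both surface forms point to the same underlying change of state, which is what
# lets a system treat the diathesis alternation as one meaning, not two patterns.
for form, meaning in break_class.frames:
    print(f"{form:16} -> {meaning}")
```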

Kicking the AI Bucket: The Future of NLP

Research into the field of NLP is incredibly advanced, as exemplified by models such as ChatGPT; however, the field still has room to grow. Inferencing tasks and other similarly nuanced language problems that current NLP models face are significant hurdles that can be overcome by more extensive integration of cognitive linguistic concepts into computational models [5, 14]. The current lack of integration arises from neural networks’ focus on pure pattern recognition, a focus exacerbated by training on massive datasets while neglecting the intricacies present in human language [2, 10, 11]. A crucial step towards developing more accurate and wide-ranging models is incorporating concrete, human-designed rulesets into the training of complex neural networks [4, 26]. Systems like VerbNet can be integrated with current neural network models, and if properly integrated, such cognitive linguistics-inspired enhancements could greatly improve the accuracy of NLP models, as well as expand our understanding of human language [21, 25]. A composite model would thus be able to perform nuanced tasks, such as inferencing, more precisely and efficiently. By harnessing the power of cognitive linguistics and taking a more multidisciplinary and integrative approach, NLP models have the potential to become refined producers of human language [4].

References

  1. Hovy, D., & Prabhumoye, S. (2021). Five sources of bias in natural language processing. Language and Linguistics Compass, 15(8), e12432. doi: 10.1111/lnc3.12432

  2. Bisk, Y., Holtzman, A., Thomason, J., Andreas, J., Bengio, Y., Chai, J., Lapata, M., Lazaridou, A., May, J., Nisnevich, A., Pinto, N., & Turian, J. (2020). Experience grounds language. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (B. Webber, T. Cohn, Y. He, & Y. Liu, Eds.; pp. 8718–8735). Association for Computational Linguistics. doi: 10.18653/v1/2020.emnlp-main.703

  3. Ion, R., & Tufiş, D. (2009). Multilingual versus monolingual word sense disambiguation. International Journal of Speech Technology, 12(2-3), 113. doi: 10.1007/s10772-009-9053-5

  4. Krishnaswamy, N., & Pustejovsky, J. (2022). Affordance embeddings for situated language understanding. Frontiers in Artificial Intelligence, 5, 774752. doi: 10.3389/frai.2022.774752

  5. Lenci, A., & Padó, S. (2022). Perspectives for natural language processing between AI, linguistics and cognitive science. Frontiers in Artificial Intelligence, 5, 1059998. doi: 10.3389/frai.2022.1059998

  6. Liddy, E. D. (2001). Natural language processing.

  7. Hasan, M. R., Maliha, M., & Arifuzzaman, M. (2019, July). Sentiment analysis with NLP on Twitter data. In 2019 international conference on computer, communication, chemical, materials and electronic engineering (IC4ME2) (pp. 1-4). IEEE. doi: 10.1109/ic4me247184.2019.9036670

  8. Roumeliotis, K. I., & Tselikas, N. D. (2023). ChatGPT and open-AI models: a preliminary review. Future Internet, 15(6), 192. doi: 10.3390/fi15060192

  9. Raaijmakers, S., Sappelli, M., & Kraaij, W. (2017, September). Investigating the interpretability of hidden layers in deep text mining. In Proceedings of the 13th International Conference on Semantic Systems (pp. 177–180). doi: 10.1145/3132218.3132240

  10. Otter, D. W., Medina, J. R., & Kalita, J. K. (2020). A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems, 32(2), 604–624. doi: 10.1109/tnnls.2020.2979670

  11. Han, S. H., Kim, K. W., Kim, S., & Youn, Y. C. (2018). Artificial neural network: understanding the basic concepts without mathematics. Dementia and Neurocognitive Disorders, 17(3), 83–89. doi: 10.12779/dnd.2018.17.3.83

  12. Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems, 3, 121–154. doi: 10.1016/j.iotcps.2023.04.003

  13. Evans, V. (2007). Glossary of cognitive linguistics. Edinburgh University Press. doi: 10.1515/9780748629862

  14. Neustein, A. (2012). Think before you talk: the role of cognitive science in natural language processing. Proceedings of NLPCS, 3–11.

  15. Tsujii, J. (2021). Natural language processing and computational linguistics. Computational Linguistics, 47(4), 707–727. doi: 10.1162/coli_a_00420

  16. Christiansen, M. H., & Monaghan, P. (2006). Why form-meaning mappings are not entirely arbitrary in language. Proceedings of the Annual Meeting of the Cognitive Science Society, 28, 1838–1843. escholarship.org/uc/item/970998zr

  17. Hoffmann, T. (2020). Construction grammar and creativity: evolution, psychology, and cognitive science. Cognitive Semiotics, 13(1), 20202018. doi: 10.1515/cogsem-2020-2018

  18. Oh, B. D., Clark, C., & Schuler, W. (2022). Comparison of structural parsers and neural language models as surprisal estimators. Frontiers in Artificial Intelligence, 5, 777963. doi: 10.3389/frai.2022.777963

  19. Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., ... & Fedorenko, E. (2021). The neural architecture of language: integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45), e2105646118. doi: 10.1073/pnas.2105646118

  20. Feder, A., Keith, K. A., Manzoor, E., Pryzant, R., Sridhar, D., Wood-Doughty, Z., Eisenstein, J., Grimmer, J., Reichart, R., Roberts, M. E., Stewart, B. M., Veitch, V., & Yang, D. (2022). Causal inference in natural language processing: estimation, prediction, interpretation and beyond. Transactions of the Association for Computational Linguistics, 10, 1138–1158. doi: 10.1162/tacl_a_00511

  21. Brown, S. W., Bonn, J., Kazeminejad, G., Zaenen, A., Pustejovsky, J., & Palmer, M. (2022). Semantic representations for NLP using VerbNet and the generative lexicon. Frontiers in Artificial Intelligence, 5, 821697. doi: 10.3389/frai.2022.821697

  22. Bernardy, J. P., & Chatzikyriakidis, S. (2019). What kind of natural language inference are NLP systems learning: Is this enough? In ICAART (2) (pp. 919–931). doi: 10.5220/0007683509190931

  23. Lindes, P., & Laird, J. E. (2016, August). Toward integrating cognitive linguistics and cognitive language processing. In Proceedings of the 14th International Conference on Cognitive Modeling (ICCM). doi: 10.1609/aimag.v38i4.2745

  24. Sun, L., Korhonen, A., Poibeau, T., & Messiant, C. (2010). Investigating the cross-linguistic potential of VerbNet-style classification. In COLING 2010 (p. 94). HAL: hal.science/hal-00539036f

  25. Shi, L., & Mihalcea, R. (2005). Putting pieces together: Combining FrameNet, VerbNet and WordNet for robust semantic parsing. In Computational Linguistics and Intelligent Text Processing: 6th International Conference, CICLing 2005, Mexico City, Mexico, February 13-19, 2005. Proceedings 6 (pp. 100–111). Springer Berlin Heidelberg. doi: 10.1007/978-3-540-30586-6_9

  26. Gururangan, S., Marasović, A., Swayamdipta, S., Lo, K., Beltagy, I., Downey, D., & Smith, N. A. (2020). Don't stop pretraining: adapt language models to domains and tasks. arXiv preprint arXiv:2004.10964. doi: 10.18653/v1/2020.acl-main.740
