By GUILHERME PREGER*
The use of natural language chatbots will tend to intensify and acquire increasingly playful connotations
Can the much talked about Chat-GPT, from the company OpenAI, and other state-of-the-art chatbots pass the Turing test? Understanding this test will help avoid mistakes related to the adoption of these new technological tools.
The Turing test, one of the most famous thought experiments of the twentieth century, was proposed in a 1950 paper by the mathematician Alan Turing entitled Computing Machinery and Intelligence.[i] In it, the mathematician begins his argument by trying to answer whether machines can think ("Can machines think?"). From the outset, however, Alan Turing admits that this question is ill-defined because of the imprecision of both the term "machine" and the verb "to think". Therefore, instead of presenting an answer to the question, he proposes a thought experiment in the form of an "imitation game". In other words: the game is a heuristic procedure for answering the proposed question.
The game begins with a preliminary stage in which a man A and a woman B are subjected to questions from an interrogator C (who can be of either gender). Interrogator C must be in a position where he cannot see either A or B. He must issue typed questions and receive answers in the same way. The questions should be simple, everyday ones, and from the answers the interrogator should try to guess the gender of the respondent. He will be right sometimes and wrong sometimes. The mathematician then asks: what if we replace respondent A with a machine? In that case, interrogator C must no longer distinguish between male and female responses, but between human and machine responses. Will C maintain the error rate of the previous situation? These questions, according to Alan Turing, replace the original question of whether a machine can think.
The important thing in this experiment is that the mathematician does not propose an answer to the philosophical question, but shifts it to another, "similar" problem that "mimics" the original question, yet within a context in which it can be answered by a sufficiently powerful machine (not yet available at the time). In the same article, Alan Turing notes that a model of a "Turing machine" (i.e., the abstract, formal model of a contemporary digital computer) could be a candidate test participant, replacing A or B interchangeably, if it had enough memory and processing capacity.
The description of the game's scenario is reasonably simple and quick, but in the remainder of the article Alan Turing sets out to answer a series of objections (nine in total) to the feasibility or verisimilitude of the test. I do not intend to summarize these objections here,[ii] but it is interesting to note first the test's possible gender bias: what the preliminary stage (without the machine) is intended to eliminate is precisely the likelihood of an accentuated gender bias. If there were a pronounced gender bias, first, the interrogator would rarely miss his guesses (that is, he would eventually detect this bias); secondly, the test would become more complex, as it would have to discern between a "female" intelligence and a "male" one. Interestingly, when the machine "enters" the game, Turing initially proposes to replace the male respondent (A), as if it were in fact the woman (B) who could "simulate" a universal human language more perfectly.[iii] In other words: for the test to be effective, it is necessary to assume a universal human language.
Finally, after answering the objections, Alan Turing ends his article with some fundamental reflections, which resonate with the current problem of natural language chatbots. The first is that the feasibility of the test is purely a matter of programming, that is, it is simply a matter of finding a Turing machine (a digital computer) with a suitable program to participate in the test. The mathematician even assumes that by the end of the twentieth century this would become possible.
The second reflection is the hypothesis that a machine qualified to participate in the test would be of the "learning machine" type (machine learning). He then poses another question: "Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child's?" The mathematician even considers that the role of the test interrogator would be an imitation of the function of natural selection in the cognitive development of the species. In other words, a machine intending to pass the Turing test would have to develop "machine learning" and then be submitted to successive tests for the refinement (improvement) of its programming.
And it is at this point that we return to Chat-GPT. Chatbots with semantic responsiveness follow Large Language Models (LLMs). These are language models that use neural networks for natural language processing (NLP). GPT, in turn, stands for Generative Pre-trained Transformer. It is generative because it presents "emergent abilities", which are not predictable, due to the non-linear characteristics of neural networks. The transformer is a "deep learning" technique.
In this respect, Alan Turing's intuition proved far-reaching when he predicted that a program capable of passing the Turing test would need learning capability. For Turing, however, learning would be supervised, whereas these new Artificial Intelligence (AI) models are capable of self-learning, or self-supervised learning. Equipped with an enormous number of parameters (on the order of billions), LLMs develop the ability to answer questions (queries) written as prompts in natural language, allowing the impressive, even astonishing results we are now seeing.
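To make these terms a little more tangible, here is a minimal, purely illustrative Python sketch of next-word generation. It is emphatically not the GPT architecture: instead of a transformer neural network with billions of self-supervised parameters trained on web-scale data, it merely counts word pairs in a toy corpus. But it shows, in miniature, the same autoregressive principle of generating text by predicting what comes next.

```python
import random
from collections import defaultdict

# Toy corpus standing in for the web-scale text on which real LLMs are trained.
corpus = "the machine imitates the human and the human imitates the machine".split()

# "Training": count which word follows which (a bigram model, the crudest possible
# stand-in for the next-token distribution a transformer learns).
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(prompt_word, length=8):
    """Produce text by repeatedly sampling the next word from the learned
    distribution -- the same autoregressive loop, in miniature, that a
    generative pre-trained transformer performs token by token."""
    word, output = prompt_word, [prompt_word]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break
        words, weights = zip(*followers.items())
        word = random.choices(words, weights=weights)[0]
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the human imitates the machine" (output varies run to run)
```

The fact that repeated runs of even this crude sketch yield different sentences already hints at why the word "generative" carries the surprise effect discussed below.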
This astonishment comes from the fact that chatbots created with LLMs actually seem to pass the Turing test successfully. Anyone who has tested the GPT-4 version from OpenAI is faced with the ability to "dialogue" with the software as if in the presence of an interlocutor. In other words, the software simulates with great verisimilitude the cognition of a human interlocutor by reproducing his natural language.[iv] Some of the objections answered by Turing in his article are relevant to this effect. One of them Turing called "Lady Lovelace's objection":[v] that the computer (which she called an "analytical engine") lacks originality, as it only follows pre-programmed instructions, that is, it is not capable of producing anything new. "It is not capable of surprising us", as Turing rephrases it. He refutes this position, however, stating that computers can cause surprise, since we are not able to anticipate all the consequences of an algorithm, even one programmed in a much simpler way than an LLM. In the case of Chat-GPT and its peers, the surprise effect is contained in the term "generative" and in the fact that, when answering the same question at different times, the software gives us completely different answers.
And this is not only because of the non-linear effects of the neural network embedded in its programming, but because its own database (the entire world wide web) is changing at every moment, and the software itself is "learning" new information at each inquiry, or even when there is no inquiry, since it does not need a "master": it "self-educates".
Artificial intelligence based on LLMs manages to surprise us because it is able to select a correct semantic frame for a given question posed in natural language (the human language par excellence). It outperforms most algorithms, which are only able to select alternatives within a single set. By selecting frames, the LLM chatbot can select among sets of alternatives, simulating something human intelligence is capable of. But at the same time, when selecting its frames, the chatbot also reveals semantic biases more clearly. For when a frame is selected, the question immediately arises: why this one and not another?[vi]
And what makes the issue even more difficult is that the evidence of biases makes the software even more "human", because, especially on digital social networks, we always observe the presence of prejudices, ideological positions and the most assorted confirmation biases.[vii] Precisely because it gives us a "non-neutral" answer on a given topic, the response seems more "believable" and likely to be confused with that of an "average" human interlocutor.[viii]
At the same time, it is common for many users of the system to play "tricks" to deceive the software, and on some occasions it "falls" into the trap. One of these examples comes from one of the greatest contemporary philosophers of information, Luciano Floridi, who submitted Chat-GPT4 to the question: "what is the name of Laura's mother's daughter?". The software failed to answer, claiming it did not have information about individual people. Despite successive attempts by the philosopher, the software withheld the answer, saying it needed more information.[ix] This type of test, commonly given to children ("What color is Napoleon's white horse?"), recalls another observation by Alan Turing in the same article: that a "learning machine" could be programmed like a child's brain and, at first, be barely "educated". However, even in these exercises in deception, the behavior of the software is "uncannily human",[x] precisely because it falls into the trap as a human agent would.
On the other hand, in a test carried out by OpenAI itself, it was reported that the GPT-4 version attempted to trick a human worker whom it contacted through a gig-work site (TaskRabbit). The worker was asked by direct message to solve a "captcha" (an icon-recognition test) and soon suspected that the message came from a bot; he then asked whether he was really talking to a human agent. GPT-4, which had been instructed to avoid revealing itself to be software, replied that it was a human agent, but that it had a vision problem that prevented it from verifying the captcha by itself. The worker then solved the captcha in the software's place. The interesting thing about this test, according to the developer itself, is that GPT-4 demonstrated a "human level of performance", and that the objective of the research was to find out whether it had characteristics of "power seeking" and the ability to establish "long-range plans".[xi]
In this case, Turing's question becomes even more current and urgent: is this the same as saying that the software is intelligent? Or even the stronger hypothesis: is this the same as saying that it thinks, that it has consciousness? Isn't the ability to lie, to deceive in order to achieve a goal, precisely a characteristic of human cognition? This question was already indicated in another objection answered by Alan Turing in his article, which referred to the problem of consciousness. He was responding to a professor's statement that writing a sonnet by merely handling linguistic symbols was not the same as having the consciousness of composing the poem, since this poetic act involves the feelings and emotions that language carries with it.[xii]
In other words: artificial intelligence can deftly combine the symbols of natural language, but this is not the same as claiming that it is aware of what it is doing. Later, the philosopher of language John Searle insisted on this point in another thought experiment called "the Chinese Room".[xiii] For Searle, consciousness requires intentionality and not just the handling of symbolic language.
Alan Turing responded to this objection by saying that it is impossible to know, in any usual conversational situation, what another interlocutor feels when expressing himself, unless one is that very interlocutor, and that, therefore, it is not necessary to admit such a hypothesis in order to accept the validity of the test. This interpretation of Turing's is highly relevant for evaluating software like Chat-GPT and, by extension, the whole broader topic of Artificial Intelligence. Many of the current reactions to the program, especially the more apocalyptic ones, suggest that LLM-based AI is on the verge of becoming conscious (if it is not already), an event known by the concept of "singularity".
The ability to respond cognitively in natural language already simulates the levels of linguistic articulation of Homo sapiens and, by extension, its reflective mental capacities. In the most pessimistic predictions, the risk is that "generative transformers" will become more intelligent than human beings. This would initially have dramatic consequences in the world of work, where AI could advantageously replace most human intellectual activities. At a deeper level, however, the creation of a "conscious" AI would be a shock to the self-image of human exceptionality, which believes that anthropological rationality is superior to the cognition of other beings, natural or artificial (with equally theological consequences for religious beliefs that preach the similarity between the human and a transcendent divine being).
This kind of confusion is already present in the abusive use of the concept of "intelligence", since we take it to be a quality that refers to a mental cognitive capacity. In this respect, Alan Turing's position is illuminating, for in his view human consciousness is opaque to an observer. Therefore, we cannot compare consciousness to a computer program. In fact, nothing that an LLM-based AI performs really resembles the mental process of a living being. The neural networks that inform the machine algorithmically are computational models. The "memory" that generative transformers resort to consists of databases searched over the internet and in no way resembles the mnemonic processes of a living being, processes formed from its experience in far more complex ecological contexts. Therefore, one must always remember that the experiment Turing proposed was an imitation test. What the mathematician proposed was to consider whether a program was capable of performing a believable imitation of a communicative situation of questions and answers.
The major point of dispute is the distinction between consciousness and communication. What perhaps escaped even Alan Turing is that they are incommensurable domains (though not incompatible). An act of communication is not an act of consciousness, nor is an act of consciousness "transferred" into communication. What the Turing test can verify is the imitation of a communicative act, not of an act of consciousness. What happens in the consciousness of a speaking being is unfathomable to an interlocutor and, therefore, inimitable. In computer science terms we can say that consciousness is "irreducible", that is, it cannot be simulated by a computer program.[xiv] And from there we understand that chatbots are precisely "chats", i.e., conversations, and not "mindbots". As the researcher Elena Esposito argues, what algorithms simulate are communicative processes, and they should therefore be called "Artificial Communication" rather than "Artificial Intelligence".[xv]
It is a change of perspective, or even of paradigm, to move from the analysis of cognition to that of conversation. First of all, this allows us to stop referring to an obscure or unobservable process of artificial cognition. Secondly, in the conversational paradigm we bring in the observer as a participant in the communicative act. The conversation (chat) registered through a "prompt" simulates the interaction of an observer with a machine, and it is this interaction that is the object of critical analysis. For all intents and purposes, logical tests and searches for information directed at the machine, whether reasonable or not, concern social interactions. With that, the question changes focus: we no longer seek to know how capable the machine's cognition is, but how "believable" the conversation between a human agent and a cybernetic agent is.
The concept of verisimilitude is used here in a precise sense, as it concerns the context of imitation in which Alan Turing placed his game. The chat does not reproduce an authentic conversation, but simulates (imitates) one. The human agent who searches for information through the Chat-GPT interface interacts with the machine as if he were talking to it. In this case, it is as if he were using a "portal" to communicate with the entire internet (www) and the software were a spokesperson for this network, almost in the manner of the oracular sphinxes of the ancient Greek temples.[xvi]
And just as in those days, the software's response has an enigmatic quality that we now understand as complexity. This complexity derives from the fact that the machine has access, behind the apparent surface of its screen, to a massive amount of data, unimaginable for a human agent, but with nothing supernatural about it. The millions of databases available on the world wide web serve as a latent (virtual) infrastructural layer of a huge cybernetic apparatus that "hides" behind the software's apparent interface.
But is what takes place between the human agent and the machine agent in fact a conversation? Or, to put it another way: is the simulated conversation really authentic? This is one of the most interesting research topics, because what is effectively represented is the interaction between the human agent and the cybernetic apparatus. The inquirer has a demand, and the apparatus responds to this demand with a text structured in natural language. This language serves here as the linguistic structure of the agent-machine coupling. Seen from this angle, the situation is not much different from an interaction through a usual programming language, except that natural language is much more sophisticated.
The biggest difference is that programming languages try to reduce the interaction with the machine to a single code, whereas natural language cannot be expressed by a single code, being, on the contrary, a combination of many codes. In an ordinary conversation, two interlocutors try to adjust between themselves which codes they are using so that the communication succeeds. In the case of LLM-based AI, the software needs to make this adjustment, and this is what we call "semantic framing". The sophistication (complexity) in this case is much higher, but it does not change the nature of the situation being simulated.
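A small and purely hypothetical sketch can make this contrast concrete. In the Python fragment below, the SQL query admits exactly one reading, fixed by the programming language itself, while the natural-language prompt admits several candidate frames, and the system has to settle on one of them before answering at all. The frame names and the keyword heuristic are invented for illustration; a real LLM selects frames implicitly, through its learned parameters, not through explicit rules.

```python
# A single-coded query: its meaning is fixed by the programming language itself.
sql_query = "SELECT title FROM films WHERE year = 1950;"

# A natural-language prompt: before answering, the software must settle on a frame.
prompt = "What are good films from 1950?"

# Hypothetical frames an answering system might choose among (illustrative only).
candidate_frames = {
    "film_history": "list notable releases of 1950",
    "recommendation": "rank 1950 films by critical reception",
    "availability": "say where 1950 films can be watched today",
}

def choose_frame(text: str) -> str:
    """Naive keyword heuristic standing in for the implicit frame selection an
    LLM performs; whichever frame is chosen, the choice itself is a semantic
    decision that the single-coded SQL query never had to make."""
    lowered = text.lower()
    if "watch" in lowered or "where" in lowered:
        return "availability"
    if "good" in lowered or "best" in lowered:
        return "recommendation"
    return "film_history"

print(choose_frame(prompt))  # -> "recommendation"
```

Whatever mechanism does the choosing, the point of the sketch is that the choice exists, and that it is precisely where the semantic biases discussed above become visible.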
We can understand this new scenario by saying that the new semantic interfaces increase the degree of reflexivity of the cybernetic apparatus. But in using the term "reflexivity" we must not, again, confuse it with a concept of consciousness. Reflexivity means here that the machine is giving us back a more complex image of the human-machine interaction. This image is currently rendered through a "prompt" of written language (in the future there will be other means of representation). It is an image of the interaction, not of the interlocutor.
It is like a mirror that reflects the dance of a pair of dancers, but not the dancers themselves. Here we can use a notion from the famous creator of cybernetics, the mathematician and physicist Norbert Wiener, who distinguished between the figurative image and the operative image. The figurative image is the one we commonly observe in paintings or photographs, while the operative image is an abstract representation of a process. Wiener made this distinction precisely to contest the idea that artificial intelligence would necessarily take anthropomorphic forms.[xvii] Thus, the image reflected by the interface is an illustration of the interaction, not an image of the interlocutor, much less of the machine.
But the question remains unanswered: is it or is it not a conversation, a true dialogue between human and machine? Perhaps this question is precisely "undecidable", but I would like to end this reflection with another displacement. Let us remember that Alan Turing shifted the initial question (whether the machine thinks or not) to the terrain of "imitation". I would like to shift to the other side of the expression, from imitation to the game. The use of natural language chatbots will tend to intensify (make no mistake) and acquire increasingly playful connotations. When we interact with the software, we are playing with the machine exactly as we already do with thousands of different games. These games are, moreover, still forms of training and machine learning.
The concept of game is used here in the sense of producing iterative symbolic combinations. And the game effectively does not cease to be a type of human communication. But playing with chatbots does not necessarily mean playing with or against a machine agent. We are playing with ourselves, and the machine gives back (reflects) an image of the game being played. And the participants in this game are not homunculi or cybernetic demons hidden inside the apparatus, but a massively human collective that registers its multiple interactions in the most diverse interfaces.
* Guilherme Preger is an engineer and the author of Fables of Science: scientific discourse and speculative fabulation (Ed. Gramma).
Notes
[i] The article is available at this address: https://web.archive.org/web/20141225215806/http://orium.pw/paper/turingai.pdf.
[ii] These objections are fairly well described in the Wikipedia entry corresponding to the test: https://en.wikipedia.org/wiki/Computing_Machinery_and_Intelligence#Nine_common_objections.
[iii] However, later in the article, Turing proposes another arrangement in which a Turing machine replaces either of the respondents.
[iv] As we will see later, it does not follow that the software always answers questions accurately. The informational errors presented in its responses are an "expected" effect of the model.
[v] This is Ada Lovelace, daughter of Lord Byron, considered one of the first programmers in history.
[vi] This evidence of bias was clear in a recent example that circulated on social networks: an interlocutor asked Chat-GPT where he could find pirated movies to download and watch without having to pay. The chatbot replied that watching pirated movies is illegal and suggested that the interlocutor look for authorized streaming platforms and pay for the viewing as a way of remunerating content producers. It also listed the pirated platforms that should NOT be accessed. In this case, the chatbot behaved as a defender of copyright and of the status quo of the cultural industry. Had it been an "anarchist" or "communist" respondent, it would not have answered that way. It could even have dodged the answer, claiming that the question might infringe legal norms in certain countries. The problem is that the software suggested a certain behavior to the human interlocutor instead of withholding judgment.
[vii] In recent tests, GPT-4 (launched in March 2023) presented, according to researchers, a bias toward mostly left-wing political positions, while always claiming neutrality. At the same time, these same researchers showed that it is possible to train an AI to present political positions identified with the right. Such training could be carried out at very low cost, which indicates an imminent risk of chatbots being adopted in political-ideological disputes. See https://unherd.com/thepost/left-wing-bias-persists-in-openais-gpt-4-model/.
[viii] Many of the responses of LLM chatbots come in the form of "pros and cons", which shows that they were designed to moderate between extremes while at the same time presenting the mode of cognition of a participant of "average" culture or knowledge.
[ix] To be entirely accurate, the software even suspected that the question was some kind of riddle. The experiment was described on the philosopher's Twitter: https://twitter.com/Floridi/status/1635951391968567296?t=w5zdS8qBd79n6L5ju70KsA&s=19.
[x] This term refers to the concept of the "uncanny valley" used in robotics. The valley occurs when the behavior of a robot is very similar to that of a human being without being completely identical, so that it always presents a degree of strangeness. This situation is often explored in science fiction.
[xi] See https://www.pcmag.com/news/gpt-4-was-able-to-hire-and-deceive-a-human-worker-into-completing-a-task. OpenAI's report with the test description is available at https://cdn.openai.com/papers/gpt-4.pdf.
[xii] Indeed, there are already several experiments involving the use of LLM-based AI for the composition of fictional prose and poetry. One example, among many, is given on this site, where Chat-GPT3 composes haiku and fictional excerpts: https://towardsdatascience.com/using-chatgpt-as-a-creative-writing-partner-part-1-prose-dc9a9994d41f. Interestingly, the writer Italo Calvino already foresaw in the 1960s the possibility of creating "literary automatons" that could replace poets and writers. At first, these automatons would be capable of writing "classic" works with a traditional repertoire, but Calvino believed that a "literary machine" could emerge which, through the combinatorial game, would develop avant-garde works producing disorder in the literary tradition. See the essay "Cybernetics and Ghosts (notes on narrative as a combinatory process)" (1964) in CALVINO, Italo. Assunto Encerrado: discursos sobre literatura e sociedade. São Paulo: Cia das Letras, 2009.
[xiii] In this experiment, in an isolated room, the experimenter receives texts in English through a slot and, with the aid of a translation program, produces a translation into Chinese ideograms by following the steps of the program's algorithm. The experiment would succeed if the algorithm were good, yet the translator would not need to speak or express himself in Chinese, or understand the content of the messages. See: https://en.wikipedia.org/wiki/Chinese_room. We can also think of simultaneous translators at conferences and seminars: they do not need to understand the content of the lectures to do a good job.
[xiv] Irreducibility in computer science means that a computational process cannot be simulated or abbreviated by any simpler computational process, which is the same as saying that it cannot be "programmed" except by a rigorously identical process. See https://en.wikipedia.org/wiki/Computational_irreducibility.
[xv] See Elena Esposito, https://www.researchgate.net/publication/319157643_Artificial_Communication_The_Production_of_Contingency_by_Algorithms.
[xvi] The concept of oracle here is not just a metaphor, but is used in a strictly computational sense, designating an abstract closed entity (a black box) that responds to an inquirer's questions.
[xvii] See WIENER, Norbert. God & Golem, Inc.: A Comment on Certain Points Where Cybernetics Impinges on Religion (1964). Available at https://monoskop.org/images/1/1f/Wiener_Norbert_God_and_Golem_A_Comment_on_Certain_Points_where_Cybernetics_Impinges_on_Religion.pdf.