Large language models


By ELEONORA ALBANO

The most common misdirection in the discourse of Big Tech – the current owners of “large language models” and similar technologies – is the half-truth

The omnipotence of hard science from the global north

In an era of increasing datafication of human experience, it is not surprising that critical thinking is on the decline among scientists. But it is, in any case, alarming that this could occur in the editorial office of a traditional, prestigious – and generalist – academic journal. It was, therefore, with a mixture of astonishment and indignation that I read the opening paragraph, transcribed below, of the editorial of the July 2023 issue of the journal Nature Machine Intelligence. As is well known, the Nature publishing group, founded in London in the second half of the nineteenth century, has as its mission to make reliable synopses of advances in various areas of knowledge available to the entire scientific community.

"Frederick Jelinek, a renowned Czech-American researcher in natural language processing and speech recognition, famously said in 1985, “Every time I fire a linguist, the performance of the speech recognizer goes up”, suggesting that there may be no efficient way to include linguistic knowledge in such systems. Does this sentiment also hold true for state-of-the-art large language models (LLMs), which seem to be mostly artefacts of computer science and engineering? Both LLMs and linguistics deal with human languages, but whether or how they can benefit each other is not clear".[I]

It is, at the very least, irresponsible for those who publicize advances in artificial intelligence (AI) to the rest of the scientific community to ignore – or not want to admit – that large language models (hereinafter LLMs) have drawn heavily on concepts and techniques from linguistics, as well as from other sciences that describe and interpret natural language.

There are at least two reasons for such misinformation. The first is political: the human scientists – linguists, psychologists, sociologists, anthropologists, etc. – responsible for the taxonomies essential to the training databases of large language models are, in general, treated as “second-class” citizens, recruited in poor countries as outsourced, precarious workers for the supposedly “trivial” tasks of labeling data and, when necessary, making decisions about categorization.

The second is the epistemology generally associated with this view of action and, in particular, of human work: any action, of any complexity, can be reduced to a chain of associations. The influence of English empiricism can be recognized here, mediated by its now century-old[ii] sequel in American psychology: behaviorism.

We will see below how the conception of action developed in Europe from the seventeenth century onwards reifies not only activity, but also human thought. An atomistic notion of mechanics, vague enough to fit both empiricism and rationalism, made it possible to exclude certain groups of humanity, assimilating them to the Cartesian animal-machine. Through it, European colonial powers easily justified the enslavement of the natives of their colonies, as well as the dispossession of Europe's own poor.

Taking the colonial scenario as a backdrop, this essay aims to show that there is a strong link between the political position that reduces certain types of scientific labor to an assembly line and the “scientific” position that views natural human language as an infinite number of interconnected chains.

The fact that these ideas are implicit in the production of language technologies allows their tacit violence, inherent in computer science training, to work in favor of the interests of the current facet of coloniality, known as platform or surveillance capitalism.[iii]

Denaturalized natural language

The aforementioned editorial honors the empiricist tradition not only because the journal is English but, above all, because the area of natural language processing (hereinafter, NLP) – the subfield of artificial intelligence responsible for large language models – was born and flourished in a strongly empiricist – or, more precisely, behaviorist – environment.

Researchers in the field believe that the human mind is a Turing machine, made up of billions of intertwined finite-state automatons.[iv] It is not surprising, therefore, that the CEOs of companies in the area see the people who feed large language models as mere machines providing the necessary and sufficient information to enable those models to pass the Turing test in the near future.

These experts don't even realize – or pretend not to realize – how much intelligence underlies the work of labelers. As investigative journalist Josh Dzieza documented,[v] in artificial intelligence it is practically impossible to adopt an intuitive category straight away, as different examples of the same object tend to be taken as distinct by the machine.

To train it to mimic our categories, taggers have to generate a detailed subcategorization and organize it into a hierarchy of levels. Like other robots, “large language models” require an infinite number of iterations in order to reach generalizations that any human child arrives at after relatively little exposure to data.

In natural language processing, one of the main manifestations of this type of difficulty is in syntactic and semantic contexts that contain discontinuities.

Note that the task of a “large language model” is always to predict the next word – as cell phone text editors do, albeit roughly. This is an easy task in the case of clichés, whose terms co-occur very frequently, but very difficult in most other cases.
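
To make the task concrete, here is a minimal sketch in Python (my own toy illustration, with invented probabilities rather than anything taken from a real model): given the words typed so far, the system scores candidate continuations and returns the most probable one. A cliché has one overwhelming continuation; an ordinary prefix spreads the probability thin.

```python
# Toy next-word prediction: invented candidate distributions, for illustration only.
candidates = {
    # after the cliché "once upon a ...", one continuation dominates
    ("once", "upon", "a"): {"time": 0.97, "mattress": 0.02, "hill": 0.01},
    # after an ordinary prefix, probability mass is spread thin
    ("she", "opened", "the"): {"door": 0.18, "box": 0.12, "letter": 0.09,
                               "meeting": 0.07, "window": 0.07},
}

def predict_next(prefix):
    """Return the highest-probability continuation and its probability."""
    dist = candidates[tuple(prefix)]
    word = max(dist, key=dist.get)
    return word, dist[word]

print(predict_next(["once", "upon", "a"]))     # ('time', 0.97): easy
print(predict_next(["she", "opened", "the"]))  # ('door', 0.18): far less certain
```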

Thus, in lower frequency expressions, statistical estimation is only possible thanks to four components, all essential: a database of many billions of words; a very powerful technology – whose ability to learn associations surpasses that of recurrent neural networks (even deep ones, that is, with multiple layers); exhaustive grammatical and semantic descriptions; and intensive training in which association errors are iteratively corrected. Obviously, the aforementioned descriptions and corrections are all made by humans – outsourced and precarious.

It is estimated that the databases that feed dialogical chatbots such as ChatGPT, from OpenAI, Bard, from Google, and Bing, from Microsoft, are on the order of 300 billion words. The very powerful device that allows their use in real time is called the transformer. It is a statistical model that applies a set of mathematical techniques, called “attention” and “self-attention”, to detect dependencies between the elements of a chain – in the first case, the input or the output; in the second, the ongoing chain itself.
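
The “attention” arithmetic mentioned above can be sketched in a few lines of NumPy. The snippet below is only an illustrative toy (random matrices stand in for learned projections, and the dimensions are tiny), not a real transformer, but it shows how every position in the chain is weighted against every other position at once.

```python
# A bare-bones sketch of self-attention with toy dimensions (illustration only).
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 5, 8                     # 5 words in the running chain, 8-dim vectors
x = rng.normal(size=(n_tokens, d))     # one vector per word

# Learned projections (random stand-ins here) produce queries, keys and values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d)          # how strongly each word "attends" to each other word
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # softmax, row by row

out = weights @ V                      # every position mixes information from all positions
print(weights.round(2))                # all pairs are connected simultaneously
```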

A mass of interconnected robots

Note that the transformer underlying “large language models” has little to do with the homonymous device that modifies the voltage levels of an electric current. What changes, as one word is chosen after another, are the relationships between the terms in the database (hereinafter, the corpus, for simplicity), since each new occurrence feeds back into the input and reorganizes the existing network of relationships.

It's not difficult to understand how. All relationships are expressed by connection weights between the nodes of the corpus subnetworks. These weights are, in turn, calculated based on the co-occurrence probabilities of possible pairs of words. The transformer is powerful enough to allow all members of the corpus, as well as all its tags (grammatical, semantic, discursive, psychological, sociological, political, ethnographic, etc.), to connect to each other simultaneously, so that the calculation of the next word can consider the most varied aspects of the current utterance and its context.
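
A crude sketch of this idea, assuming a deliberately tiny corpus of my own invention, would derive a connection weight for each pair of words from how often the pair co-occurs relative to how often each word occurs on its own:

```python
# Toy connection weights from co-occurrence probabilities (a gross simplification).
from collections import Counter
from itertools import combinations

corpus = [
    "the model predicts the next word",
    "the next word depends on the context",
    "the context shapes the model",
]

pair_counts, word_counts, windows = Counter(), Counter(), 0
for sentence in corpus:
    words = set(sentence.split())
    windows += 1
    word_counts.update(words)
    # count each unordered pair of distinct words co-occurring in the sentence
    pair_counts.update(frozenset(p) for p in combinations(words, 2))

def weight(w1, w2):
    """Joint probability of the pair, normalized by the words' individual probabilities."""
    joint = pair_counts[frozenset((w1, w2))] / windows
    return joint / ((word_counts[w1] / windows) * (word_counts[w2] / windows))

print(weight("next", "word"))      # 1.5  : strongly associated in this tiny corpus
print(weight("model", "context"))  # 0.75 : co-occur only once
```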

It should be noted that the volume of labels available to refine this calculation is gigantic. The simplest ones cover aspects such as grammatical classes and textual forms of naming and referencing (for example, proper names, personal pronouns, demonstratives, possessives, etc.).

It is also worth noting that labeling is not limited to words. It also comprises the parts of the sentence (e.g., subject, predicate, adjuncts); clauses and their syntactic classification (e.g., main, subordinate and respective subclasses); and oral or written textual genres (e.g., colloquial, literary, journalistic, legal, scientific, etc.).
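
The invented record below (not taken from any actual corpus) gives an idea of how many layers of labels a single short sentence can accumulate, from word-level tags to clause types and genre, and hence of how much human decision-making goes into each one:

```python
# An invented, illustrative annotation record with several layers of labels.
annotation = {
    "text": "Maria said that the report arrived late.",
    "tokens": [
        {"form": "Maria",   "pos": "PROPN", "role": "subject"},
        {"form": "said",    "pos": "VERB",  "role": "predicate"},
        {"form": "that",    "pos": "SCONJ", "role": "complementizer"},
        {"form": "the",     "pos": "DET",   "role": "determiner"},
        {"form": "report",  "pos": "NOUN",  "role": "subject (embedded)"},
        {"form": "arrived", "pos": "VERB",  "role": "predicate (embedded)"},
        {"form": "late",    "pos": "ADV",   "role": "adjunct"},
    ],
    "clauses": [
        {"span": (0, 2), "type": "main"},
        {"span": (2, 7), "type": "subordinate (complement)"},
    ],
    "genre": "journalistic",
    "register": "written",
}

# Each layer multiplies the decisions a human annotator has to make.
print(len(annotation["tokens"]), "token-level labels;",
      len(annotation["clauses"]), "clause-level labels")
```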

Anyone who, based on the above, has imagined that the databases of “large language models” look like gigantic dictionaries whose “entries” are implicit in their network of connections has solved a third of the riddle. However, the content of the other two thirds is equally important: it consists of crucial grammatical and encyclopedic information – contributed, once again, by the taggers.

In fact, all occurrences of the same word are connected to each other; and their different meanings are represented by the similarities and differences, mathematically coded, between the sentences to which they connect. This ends up working approximately like the examples provided in the entries for polysemous words in dictionaries.
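
In computational terms, “similarities and differences, mathematically coded” boil down to comparing vectors. The toy example below (with made-up three-dimensional vectors in place of real sentence encodings) shows how two occurrences of “bank” in river contexts end up closer to each other than to an occurrence in a financial context:

```python
# Toy sense separation by vector similarity (invented vectors, illustration only).
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Imagine each sentence containing "bank" has been encoded as a vector.
river_1 = np.array([0.9, 0.1, 0.0])   # "they walked along the bank of the river"
river_2 = np.array([0.8, 0.2, 0.1])   # "the boat drifted toward the bank"
money_1 = np.array([0.1, 0.9, 0.3])   # "she deposited the check at the bank"

print(cosine(river_1, river_2))  # ~0.98: same sense clusters together
print(cosine(river_1, money_1))  # ~0.21: a different sense of the same word
```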

Furthermore, the elements of each sentence connect to a grammar. In it, syntactic structures are mapped into tree diagrams[vi], while semantic structures refer to different logical forms (via propositional calculus, among others) and semantic fields (e.g., the names of fruits connect, from bottom to top, to the fields of fruits, vegetables, foods, etc.). The grammar also refers to coreference indexers (e.g., in a Portuguese sentence such as “João disse que Maria o viu” – “João said that Maria saw him” – the pronoun ‘o’ can refer to João himself, to the addressee or to a third person).

Finally, texts are classified based on world knowledge (e.g., subject, genre, authorship, tone, style, documentary sources; with labels ranging from the most generic to the most specialized). This information, after being meticulously annotated and connected, allows for an infinite number of searches in order to meet complex demands, such as solving school tasks, writing legal opinions, assisting with medical diagnoses, etc.

This gigantic and exhaustively interconnected collection gives “large language models” an enormous capacity to construct “new” sentences by paraphrasing fragments of data contained in the database itself. When the eminent linguist, philosopher and mathematician Noam Chomsky says that chatbots are mere plagiarists, he is not saying that they just copy and paste literal excerpts from the base. He is, in fact, claiming that they produce good pastiches of its content, through synonyms and paraphrases concatenated in such a way as to make the patches and seams almost imperceptible. Exhaustive connections allow them to easily find the best shortcuts to do so.

However, the concatenation ends up failing and sounding strange when an expert on a certain subject asks the robot a trick question. Let's see how the linguist Roni Katzir[vii] made ChatGPT-4 get lost with an ellipsis that anyone proficient in English would readily understand. The request was to choose the better of these two sentences: (1) The person that Mary met yesterday and that John will talk to Ed about tomorrow arrived; (2) The person that Mary met yesterday and that John will talk to Ed about his new idea tomorrow arrived.

The robot did not hesitate to respond that it was the second one, as it was “more informative”. It thus ignored an English grammatical rule that requires coreference between two or more coordinated ellipses – in this case, the object of ‘met’ and the complement of the preposition in ‘talk about’. The addition of “his new idea” causes a drastic break in the meaning of the sentence, already burdened by the long separations between the subject and the main verb, and between the verb ‘talk’ and the adverb ‘tomorrow’. Thus, in (2), it is not possible to establish the coreference between ‘the person’ and the two coordinated occurrences of ‘that’. The human reaction is simply to judge the sentence anomalous, while the robot's is to force an interpretation.

The fact that the author coined such tortuous phrases is probably a device to complicate the search for an analogue in the database. The important thing is that, nevertheless, sentence (1) is grammatical and sentence (2) is not. Whenever the search ends in a maze, the robot gets lost and deviates, resorting to one of its pre-recorded clichés.

Similar clichés also emerge when someone confronts it with a request for a moral stance. By insisting that ChatGPT take a personal perspective on a moral issue, the researcher Jeffrey Watumull, a collaborator and co-author of Noam Chomsky[viii], drew from it a contrite declaration of its non-humanity. Let us pay attention to the “modesty” of the machine: “As an AI, I am a machine learning model that has been trained on large amounts of text data, and I don't have personal experiences or feelings. I am not conscious, self-aware, or capable of having personal perspectives. I can provide information and analysis based on the data I have been trained on, but I don't have the ability to forge personal opinions or beliefs. While I can understand and interpret language, I do not have the subjective experience that humans have.”[ix]

The only human characteristic that the robot does not give up, giving voice to the ideas of its creators, is the understanding of natural language. Just as the unacknowledged linguistics underlying language robots hides their taggers, the unacknowledged psychology of their creators mechanizes the human language capacity, equating it to a gigantic mass of finite-state automatons interconnected through a huge transformer.

We will see below that the policy of chatbot providers has three mutually reinforcing ingredients: (i) avoid categorical statements; (ii) stimulate public curiosity about possible future developments of “large language models”; (iii) treat the controversy over their transhuman potential[x] as a mere dispute of opinions.

Tailor-made screenings

The most common misdirection in the discourse of Big Tech – the current owners of “large language models” and related technologies – is the half-truth. Such companies are masters at disguising their leniency – not to say connivance – with the untruths made viral in the media about their products. On the other hand, if questioned, they are quick to offer clarifications that are accurate at first glance, but often incomplete and slippery.

A good example is the way Google recently exempted itself from responsibility for rumors about the human attributes of its first chatbot prototype, LaMDA (Language Model for Dialogue Applications), in 2021. Instead of properly issuing and publishing a statement of principles, the company only spoke out after an engineer on the team went viral with the claim that the model had become sentient.

The engineer was placed on administrative suspension and the company announced its position that sentience is still “a long way” from being achieved by robots, language-based or otherwise. With this, it shrugged off responsibility for the fallacious news – without, however, denying its plans to increasingly “humanize” the offspring of its artificial intelligence division. So much so that the most recent descendant of LaMDA, Bard, already has a cliché tailored to quibble on the subject.

Among other attributes, Bard is free and, in my opinion, friendlier than ChatGPT, its rival launched shortly before by OpenAI with support from Microsoft. Here is its careful response to a question I asked it, following Watumull's model to the letter: “The concept of chatbot intelligence, consciousness, sentiment, and the ability to make moral judgments is a complex one that is still being debated by experts. There is no single definition of these terms, and they are often used interchangeably. Some experts believe that chatbots can be intelligent, but not conscious or sentient. They argue that chatbots are simply very sophisticated algorithms that are able to process information and respond in a way that seems intelligent. However, they do not have the same level of awareness or self-awareness as humans.”[xi]

As can be seen, both robots seem to have been trained so as not to compromise their creators with the fantasies of humanity that they awaken in their users.

To test this idea again, I submitted to Bard Katzir's question about the two sentences whose ellipses differ in grammaticality in English. It immediately came out with this answer, analogous to ChatGPT's. Let's see: “Both sentences are grammatically correct, but they have different meanings. The first sentence, “The person that Mary met yesterday and that John will talk to Ed about tomorrow arrived,” is more concise and direct. It tells us that the person Mary met yesterday is the same person that John will talk to Ed about tomorrow. The second sentence, “The person that Mary met yesterday and that John will talk to Ed about his new idea arrived tomorrow,” is more detailed. It tells us that the person Mary met yesterday is the same person that John will talk to Ed about his new idea tomorrow. The first sentence is better if you want to emphasize the fact that the two people are the same. The second sentence is better if you want to emphasize the fact that John will be talking to Ed about his new idea. Ultimately, the best sentence depends on the context in which it is being used.”[xii]

Finally, it presents a comparative table of the two sentences, which cannot be reproduced here, as it is a simple summary of the “explanation” transcribed above.

Let us now try to understand in more detail the irresponsibility of Big Tech's evasive speech about its chatbots. There is a good reason for these companies to treat the differences between computers and humans with due transparency and scientific rigor: the tendency of people to humanize language robots has been known for more than half a century. The harmful consequences of this bias – real or potential – are also well known.

The Eliza effect

The first chatbot dates back to 1966 and was called Eliza by its creator, mathematician and computer engineer Joseph Weizenbaum, known as one of the fathers of artificial intelligence.

It is important to note that Joseph Weizenbaum peremptorily refused this paternity. A German Jew whose family emigrated to North America shortly before World War II, he turned to his mathematical aptitude to survive at school and in life. Having first lived in Canada, he later moved to the United States, where he embarked on an academic career, ending up as a visiting professor in the Department of Artificial Intelligence at MIT. As his growing skepticism about AI alienated him from his colleagues, he finally returned to Germany in 1996, encouraged by the German intelligentsia's receptivity to his ideas.

Although he never stopped working in computing, he did not hide his passion for human and social studies. The traumas of fleeing Nazism and the vicissitudes of exile eventually led him to encounter psychoanalysis at a certain point. From then on, he declared it effective, beneficial and indispensable to understanding human nature.

In designing Eliza as a robot “therapist,” he was aware that he could not equip her with an understanding of the complexity of psychoanalytic practice. He therefore conceived her as a Rogerian therapist, that is, a follower of the method of Carl Rogers, the American clinical psychologist who advocated non-directive, person-centered psychotherapy. This consisted, roughly speaking, of inserting the patient's statement into phrases such as “you tell me that…”, supplemented by other vague and encouraging clichés, such as: “And how do you intend to deal with this?”. After compiling this basic repertoire, it was not difficult to produce and test the software.
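
Weizenbaum's recipe can be caricatured in a few lines. The sketch below is not his original program, only a minimal illustration of the Rogerian strategy described above: reflect the user's own words back inside a formula, or fall back on a vague, encouraging cliché.

```python
# A minimal, illustrative Eliza-style responder (not Weizenbaum's original code).
import random

REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}
CLICHES = [
    "And how do you intend to deal with this?",
    "Can you tell me more about that?",
    "Why do you say that?",
]

def reflect(text: str) -> str:
    """Swap first- and second-person words so the statement can be echoed back."""
    return " ".join(REFLECTIONS.get(w, w) for w in text.lower().split())

def respond(statement: str) -> str:
    statement = statement.strip().rstrip(".!?")
    if not statement:
        return random.choice(CLICHES)
    # Echo the user's own words inside a Rogerian frame, or emit a cliché.
    if random.random() < 0.6:
        return f"You tell me that {reflect(statement)}."
    return random.choice(CLICHES)

print(respond("I am worried about my future"))
# e.g. -> "You tell me that you are worried about your future."
```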

Eliza, which began simply as a tool to research the viability of a conversational robot, soon became a success with the public. Joseph Weizenbaum realized that the experiment was taking a different direction than expected when the participants began to refuse to show him the content of their dialogues with the machine, claiming that they were private matters.

He was sure that Eliza didn't really understand what they were saying to her: she was just successfully simulating that understanding. However, he soon realized that his audience was unlikely to notice. He then conjectured that they were immersed in a psychoanalytic transference – directed, surprisingly, at the machine.

From then on, this humanist and politicized engineer became notable for arguing, ever more vehemently, that there is an incommensurable difference between human language, which produces judgments that can be contradicted, and its digital simulacra, which consist only of calculations mapped onto sequences of words output by a machine programmed to simulate conversation.

Joseph Weizenbaum argued that machines would never reason like humans because they are only capable of calculating. The emergence and advancement of neural networks after the formulation of his theses does not invalidate his arguments. Qualitative or quantitative (as is the case with the weights of connections between nodes in such networks), the calculations involved in conversational technology do not have access to all the types of information that living brains, human or animal, are capable of capturing, collecting and processing.

This position is explicit in the titles of his two main books. Both are attempts to demonstrate that the digital simulation of natural language is nothing more than an illusion that leads users to project their humanity onto machines.

The first book is titled Computer Power and Human Reason: From Judgment to Calculation.[xiii] The second, co-authored with the German writer Gunna Wendt, is titled Islands in the Cyberstream: Seeking Havens of Reason in a Programmed Society;[xiv] it was written on his return to Germany and only later translated into English. Another suggestive title is “Against the imperialism of instrumental reason”, a chapter in a collection on controversies in the computational universe.

Both books received unfavorable reviews in the US. For example, John McCarthy, who created the term artificial intelligence in 1956, together with Marvin Minsky and colleagues, published a long text in 1976[xv] calling the first book moralistic and incoherent. On the other hand, Dwight Hines, professor of literature and social justice at Point Park University, reviewed the same work in 1980,[xvi] describing it as a difficult but rewarding read.

Until his death in 2008, Joseph Weizenbaum expressed great concern about the direction of what he called “the programmed society”. He would certainly have been distressed had he been among us in March 2023, when there was a fatal episode caused by an “update” of Eliza. The new chatbot was built on EleutherAI's language model GPT-J, in turn modeled on GPT-3.

In March 2023, a young Belgian family man, depressed by the threat of environmental collapse, suddenly committed suicide. As his wife reported to the press, he had been “treating” his depression with the current Eliza, and had had the chatbot's support for his decision.

This story should be enough to suggest that current language models justify Joseph Weizenbaum's fear that a society viscerally tied to computing could lose its way and end up delegating to machines decisions that are crucial for the future of its citizens, or even of humanity.

It is worth remembering that chatbots do not even need to be hacked by criminals to constitute a danger: complex dynamic systems such as those implemented by transformers typically present unpredictable emergent phenomena. Novel behaviors can emerge at any time, with equally unpredictable consequences. It is possible that some of them will end up subjecting users to stressful and embarrassing situations. And – even worse – it is not impossible for them to suddenly start displaying content that we would consider absurd, unethical or even threatening.

This occurs due to abrupt jumps in the behavior of this type of system, characterized through curves of well-known statistical functions. For example, the 'S' curve has a very low rate of change at the base and top and very high in the middle, and can, among other applications, characterize the transition from one level to another. The parameters of complex systems inherent to transformers often present 'S'-shaped trajectories.
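
The “S” curve mentioned here is the logistic function. A few computed values (a simple numerical illustration, not data from any real system) show the pattern: almost no change at the extremes and an abrupt jump in the middle.

```python
# Values of the standard logistic function, to illustrate the 'S'-shaped transition.
import math

def logistic(x, k=1.0, x0=0.0):
    """Standard logistic function: 1 / (1 + exp(-k * (x - x0)))."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

for x in range(-10, 11, 2):
    print(f"x={x:+3d}  f(x)={logistic(x):.4f}")
# Near x = -10 and x = +10 the values barely move; around x = 0 they jump
# from about 0.12 to 0.88 in a few steps -- the kind of sudden transition
# associated here with emergent behavior in complex systems.
```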

In addition to explaining the easy attachment to chatbots, the Eliza effect helps to rationalize, at least in part, the rampant spread of misinformation on social networks. Users of these virtual spaces easily extend to strangers – often with malicious intentions – the transference ties already created with their own machines. It is serious, therefore, that service providers pretend to ignore the phenomenon and exempt themselves from any responsibility for it.

Let us remember that the study of this type of trend does not only concern computer scientists who have become aware of the risks of the digital world. The sociability of networks also drives a great deal of research in departments of psychology, sociology, anthropology and political science at universities and research centers around the world. There is, therefore, already a vast scientific literature on the relationships between humans and their machines.

Indifferent, Big Tech companies continue to compete for the virtual assistant market, ignoring repeated warnings from academia and critical journalism. In other words, from their point of view, language technologies only open up opportunities to create new markets and maximize profits.

Bad grammar, even worse pragmatics

The above must have made it clear that chatbot owners do not just aim to improve internet search engines. What they want is to build talking robots that definitively win over users and control their lives in order to shape their needs and consumption habits. Obviously, they are fully aware of the fascination that robots have exerted on humanity – from their beginnings in the Middle Ages to today's cinematic franchises, including the ingenious automatons of the eighteenth and nineteenth centuries.

Oscillating between distrust and attachment, users see these devices as affordable servants, ready to assist them in physically or mentally difficult or tedious tasks. The virtual assistants already available, such as Siri, from Apple, or Alexa, from Amazon, explore the simplest aspects of this demand, which is likely to grow and become more complex in the near future.

Surveillance capitalism thus resorts to natural language to “console” individuals for the loneliness and helplessness that it itself sows – despite the repeated warnings of scholars from many areas, including NLP.[xvii] In any case, recent advances in conversational skill have already won over audiences in the global north. They have also been advancing in the global south, especially among the middle classes – which, in fact, contributes to accentuating inequalities.

Able to take dictation and manage agendas, conversations, mail, phone calls, smart homes, etc., virtual assistants attract consumers who, through them, fulfill their desire to have a private secretary to whom they can transfer not only tasks, but also affections. As LLM technology opens the way to new forms of dialogue based on complex dynamic systems, the current “Elizas” tend to make spontaneous leaps, acquiring new skills that generate increasingly less predictable – and perhaps even dangerous – utterances.

In any case, it has become impossible to stop the manipulation that floods the internet based on the Eliza effect. As Joseph Weizenbaum understood, this is a global mass phenomenon. This makes it imperative to clarify that this manipulation is based on false premises about human intelligence and natural language. This clarity is essential so that critical thinking can focus on possible strategies for confronting the political doctrine that naturally incorporates such premises.

A single word summarizes what is in common between the conception of language and the conception of intelligence adopted by internet platforms, namely: mechanism, that is, the philosophical doctrine that nature is governed by mechanical causality – which is always linear and deterministic. Now, history shows that mechanism is easily associated with authoritarian political views and has a special affinity with fascism.

The mechanistic conception of natural language, typical of the American version of structuralism, understands grammar as a set of rules for sequencing words. Its counterpart in psychology – behaviorism – is even more simplistic and reactionary: it conceives the human mind as a succession of atomic contents originating in impressions that come from outside.

According to the version of behaviorism formulated by the American psychologist B. F. Skinner, we don't even have a mind; we are moved only by behaviors fixed and sequenced through a form of conditioning that he called operant. It consists of rewarding random behaviors in order to cumulatively shape them. So, for example, if a pigeon moves its wing, leg or beak in a way that suggests dancing, the trainer rewards it with a portion of food. Little by little, these repeated reinforcements lead it to perform all sorts of pirouettes.[xviii]

In the 1950s, the young Noam Chomsky became famous for criticizing behaviorism and denouncing its affinities with fascism. Outraged by the mechanistic theses about natural language set out by Skinner in the book Verbal Behavior, he confronted him with a devastating review.[xix] This consisted of a convincing demonstration that the words of any language are organized in a hierarchical and discontinuous way – thus contradicting the sequential rules of the finite-state grammars that are the formal correlate of operant conditioning.
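
Chomsky's point about sequential rules can be illustrated with a deliberately tiny word-pair model (my own toy, far simpler than any real system): because it looks only one word back, it happily produces a locally plausible string that breaks subject-verb agreement across a distance.

```python
# A toy bigram chain that violates long-distance agreement (illustration only).
from collections import defaultdict

corpus = [
    "the key to the cabinet is lost",
    "the key to the cabinet is lost",
    "the key to the cabinet is lost",
    "the keys to the cabinet are lost",
]

# Count how often each word follows each other word.
bigrams = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for w1, w2 in zip(words, words[1:]):
        bigrams[w1][w2] += 1

def next_word(w):
    """Pick the most frequent continuation seen after w, if any."""
    options = bigrams[w]
    return max(options, key=options.get) if options else None

# Starting from "keys", the chain wanders into "... cabinet is lost",
# breaking the keys/are agreement that depends on a word four positions back.
w, output = "keys", ["keys"]
for _ in range(5):
    w = next_word(w)
    if w is None:
        break
    output.append(w)
print(" ".join(output))  # -> "keys to the cabinet is lost"
```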

This is why the famous linguist and his disciples attack “large language models”, insisting, with the same ardor as Weizenbaum, that they are nothing more than simulations. To this end, they highlight their difficulties with syntactic-semantic discontinuities, such as that between the verb ‘talk … about’ and its object ‘the person’ in the sentence that confused ChatGPT and Bard.

Syntax and semantics are not, however, the only weaknesses of these models. Another failure with an essential aspect of natural languages ​​concerns pragmatics, which includes linguistic and situational context.

Let us remember, by the way, that deixis is the aspect of natural language most dependent on the situational context. All words that point to the time and/or place of the enunciation and/or statement are only interpretable through a lived or imagined context. Let's take, for example, personal pronouns.

The ambiguity of the sentence “She said she takes her” (in the Portuguese original, “Ela disse que ela a leva”) is certainly inappropriate for written language, but perfectly natural in spoken language, whose context includes gestures, gazes, tones and voice qualities. For example, if what is in question is who takes “her”, there is more than one way to designate the woman referred to. The second “she” can be coreferent with the first, meaning “she herself”, or it can be someone else. Otherness is generally indicated by a nod of the head or a simple glance. The final pronoun, “her”, can obviously refer to a person or a thing.

These references can also be anaphoric, that is, the same roles can be attributed to people or things already mentioned. Furthermore, the verb “levar” (take) can take on another meaning: for example, in a championship, “she takes her” can refer to an athlete (or a team with a feminine name) who is the favorite to win the cup.

Let us now examine Bard's analysis of this sentence. It is curious that it points out two possible meanings, but illustrates them with examples that do not differentiate them.

“The phrase ‘She said she takes her’ has two possible meanings, depending on the interpretation of the pronouns ‘she’. In the first sense, both pronouns she refer to the same person. In this case, the phrase means that a woman said that she herself will take another woman. For example, a woman might say to a friend: I told you I'll take you to the movies.

In the second sense, the two pronouns ‘she’ refer to different people. In this case, the phrase means that a woman said she will take another woman. For example, a mother might say to her daughter: I said I'll take you to school.

To determine the correct meaning of the phrase, it is necessary to consider the context in which it is used. For example, if the phrase is said in a conversation between two women, it is more likely to have the first meaning. However, if the phrase is said in a conversation between a woman and a child, it is more likely to have the second meaning.”

And so, the chatbot continues presenting sentences with identical coreferences to, supposedly, differentiate the two meanings. The long and detailed explanation always makes the same mistakes. At the end, a comical “fictional” section tries to introduce a time lapse between the two occurrences of ‘she’.

Let’s see: “Furthermore, it is important to remember that the phrase ‘She said she takes her’ can also have a third meaning, in which the two pronouns ‘she’ refer to the same person, but at different moments in time. For example, the phrase ‘She said she will take her to school’ could mean that a woman said she will take a child to school, even if the child is herself as an adult.”

And, to conclude, it resorts once again to its cliché summary about pragmatics: “In this case, the meaning of the sentence is determined by the context and the speaker's knowledge of the world”.

The above strongly corroborates the simulation hypothesis, making the claim that the chatbot has intelligence and understanding of natural language implausible. On the other hand, it also shows that the simulation rarely fails to deceive the user: this only occurs when the relationships between words violate the sequential logic of the transformer, which always consists of predicting the next word.

Given this, we cannot help but conclude with a question: what is the persuasive power of this logic, given that it is correct in most cases? To try to answer it, let's examine another application of transformers in which the successes outweigh the mistakes. This is a means of estimating, by successive approximations, the next signal sample of a voice recording. This technique is capable of altering a voice to the point of making it a more or less convincing clone of another.

What if Eliza had a familiar voice?

Despite already being quite advanced in the global north, voice cloning remains under debate, due to its ethical implications. It is a technology that takes advantage of the agility and computational power of transformers to answer the following question: how to reproduce a given voice in an unlimited way, that is, extend it to utterances not recorded by the speaker?

The answer is simpler than it seems. Simply superimpose the acoustic characteristics of the voice in question onto the output of a text-to-speech conversion system. To do this, it is necessary to obtain a good-sized sample of the target voice and repeatedly compare it to the synthetic voice. Phrases identical to existing ones are first synthesized, in order to facilitate the modeling of the acoustic parameters of the target voice. The synthetic voice is then subjected to multiple comparisons and modified by successive approximations, until each sample becomes estimable from the previous one with a negligible error. The resulting function, which converts the waveform of one voice to another, is called the voice model.

The change is made in stages. When the perceived quality of the resulting signal becomes satisfactory, the model is ready to be applied to new utterances. It is then reiterated for each sample until an acceptable error rate is achieved in predicting the next one, and so on. These recurring corrections have the effect of bringing the pitch and timbre of the voices involved closer together, making their qualities increasingly similar.
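
The successive-approximation loop can be sketched in the abstract. The code below is only a conceptual toy under strong simplifying assumptions (random numbers stand in for acoustic features, and a plain linear map stands in for the “voice model”), but it captures the logic described above: compare the converted synthetic features with the target's, correct the model a little, and repeat until the error is small enough to apply the model to new utterances.

```python
# Conceptual sketch of fitting a "voice model" by successive approximation.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for per-frame acoustic features (e.g., spectral coefficients)
# of the same sentence in the synthetic voice and in the target voice.
synthetic = rng.normal(size=(200, 12))            # 200 frames, 12 features each
true_map = rng.normal(size=(12, 12))
target = synthetic @ true_map + 0.01 * rng.normal(size=(200, 12))

W = np.zeros((12, 12))                            # the "voice model" to be learned
lr = 0.05
for step in range(2000):
    pred = synthetic @ W                          # current conversion attempt
    err = pred - target                           # frame-by-frame mismatch
    W -= lr * synthetic.T @ err / len(synthetic)  # small corrective step

print("mean error:", float(np.abs(synthetic @ W - target).mean()))

# Once the error is acceptably small, W can be applied to the features of *new*
# synthetic utterances to push them toward the target voice's quality.
```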

As the international press has reported, there are already “clones”, that is, models, of the voices of countless deceased celebrities. One can, for example, apply a singer's voice model to recordings of one of his imitators to maximize the naturalness of the imitation and thus allow it to be extended to new songs, including those that appeared after the singer's death.

Digital covers tend to do better than human ones because the technology for predicting and modifying the speech signal dilutes the effects of the morphological differences between the vocal apparatuses involved.

It should be noted that imitators do not have total control over the quality of their voice, as all vocalization is subordinated to the physical limits of the body that emits it. This is why these artists act in a more or less caricatural way, generally accentuating the most prominent features of the imitated voice.

The approximation method, by minimizing the prediction error between successive samples, automatically corrects, among other parameters, those that convey physical differences between speakers or singers. Applied to similar voices found in the field's databases, it allows for near-perfect clones.

With this technology, which is available on the internet for testing and/or purchase, the voices of secretarial and/or therapeutic robots can be chosen at will, as long as they do not violate copyright. In fact, in the USA there are already firms that “recreate” deceased loved ones in order to allow interested parties to interact anew with their voices and images.[xx] These avatars are created from videos and texts left by the departed. There are even shows in which dead artists perform with living counterparts in song and dance numbers.[xxi]

It is now worth reflecting on the possible consequences of combining a chatbot with cloned voices and animated images. What first catches the eye is the magnification of users' transferential relationships with “humanized” robots.

Another obvious consequence is the difficulty of regulation. For example, it is difficult to prevent dubious or even abusive content in distance learning materials. Anyone can set up a friendly robot to teach a course on any subject with information provided by a chatbot, without any moderation by a professional in the field.

Another obvious example lies in the possible uses in marketing. By facilitating the creation of “adorable” robotic spokescharacters designed to advertise products with engaging voices scripted by chatbots, these tools make it almost impossible to define false advertising. Is manipulating consumers' affections an attempt to deceive or not?

Perhaps, before continuing, the reader would like to pause to reflect on the possible uses – good or bad – of these resources in their own field of work. They will probably be surprised by the diversity and multiplicity of possibilities that quickly come to mind.

Final considerations

To conclude, let us ask ourselves what kind of risk the accelerated development of these technologies could pose. Having discarded the hypothesis that robots are superintelligent and understand natural language, we no longer need to fear being surpassed and, eventually, destroyed by these machines – unless we have unwisely handed them control of some weaponry. If that error is duly avoided, is there anything left to fear?

An obvious answer involves the reduction of jobs. Today it already far exceeds the predictions made at the beginning of the automation era. Imagine, then, what could happen from now on, as virtual assistants increasingly replace attendants, receptionists and others responsible for communication between companies and the public.

Thus, as is already the case with online banks, businesses and public offices, the absence of someone who can understand customer demands tends to increasingly compromise the quality of services. In this way, complaints will hardly work, as there will be no one who can hear them and put themselves in the complainant's shoes – no matter how nice the robot in charge may seem.

We will then be relentlessly subjected to the tyranny of online forms to obtain whatever we want. And it is in this limitless bureaucratization, managed by machines that understand nothing, that lies the greatest risk to humanity from language technologies: disembodied, denaturalized, dehumanized, compulsory conversation with machines could end up causing an unprecedented shock to collective mental health.

We are not machines: we do not think, act or speak according to the sequential laws of mechanics. But we can learn to emulate their logic in order to more easily obtain the products and services we need.

And, as Charles Chaplin foresaw in Modern Times, we run the risk of developing the most diverse delusions until we plunge into an obsessive delirium – not the same delirium that affects the agents of fascist regimes, but the pusillanimous version that infects all who submit to them.

* Eleonora Albano, retired professor from the Institute of Language Studies at Unicamp, is a psychologist, linguist, essayist; she coordinated the first Brazilian project on speech technology.

Notes


[i] Language models and linguistic theories beyond words. Nature Machine Intelligence 5, 677-678 (2023).

[ii] John B. Watson. Psychology as the Behaviorist Views It. Psychological Review, 20, 158-177, 1913.

[iii] David Lyon. Surveillance and the power of platforms. Cambridge Journal of Regions, Economy and Society, Volume 16, Issue 2, July 2023, Pages 361-365, https://doi.org/10.1093/cjres/rsad006

[iv] Frederick Jelinek, a renowned Czech-American researcher in natural language processing and speech recognition, achieved notoriety in 1985 by saying, “every time I fire a linguist, the performance of the speech recognizer increases.” This sentiment is also true for state-of-the-art language models (LLMs), which appear to be primarily artifacts of computer science and engineering. Both LLMs and linguistics deal with human languages, but it is not clear whether or how they can benefit each other (author's translation).

[v] Josh Dzieza. The (sub-)human army that powers AI. Outras Palavras, June 29, 2023.

[vi] Here is a clear example:
https://alexandrehefren.wordpress.com/2010/03/14/determinismo-e-gramatica-sintagmatica-gs-parte-1/

[vii] Roni Katzir. Why large language models are poor theories of human linguistic cognition. A reply to Piantadosi. Biolinguistics, 2023, in press.

[viii] Noam Chomsky: The False Promise of ChatGPT. Guest essay. The New York Times, March 8, 2023.

[ix] “As an AI, I am a machine learning model trained on large amounts of text data and have no personal experiences or feelings. I am not conscious, self-aware, or capable of personal perspectives. I can provide information and analysis based on the data I have been trained on, but I do not have the ability to form personal opinions or beliefs. Although I can understand and interpret language, I do not have the subjective experience that humans have.” (automatic Google translation, reviewed by the author).

[x] A “transhuman” is a hypothetical human creature with brain implants that connect it to robots.

[xi] “The concept of chatbot intelligence, consciousness, sentience and ability to make moral judgments is complex and is still being debated by experts. There is no single definition for these terms and they are often used interchangeably.

Some experts believe that chatbots can be intelligent, but not conscious or sentient. They argue that chatbots are simply very sophisticated algorithms capable of processing information and responding in a way that appears intelligent. However, they do not have the same level of consciousness or self-awareness as humans.” (automatic Google translation, reviewed by the author)

[xii] “Both sentences are grammatically correct, but they have different meanings.

The first sentence, “The person Mary met yesterday and John will talk to Ed about tomorrow has arrived,” is more concise and direct. It tells us that the person Mary met yesterday is the same person John will talk to Ed about tomorrow.

The second sentence, “The person Mary met yesterday has arrived and John will talk to Ed about his new idea tomorrow,” is more detailed. It tells us that the person Mary met yesterday is the same person John will talk to Ed about his new idea tomorrow.

The first sentence is better if you want to emphasize the fact that the two people are equal. The second sentence is better if you want to emphasize the fact that John will talk to Ed about his new idea.

Ultimately, the best phrase depends on the context in which it is being used.”

[xiii] Joseph Weizenbaum. Computer Power and Human Reason: From Judgment to Calculation. New York: W. H. Freeman & Co., 1976.

[xiv] Joseph Weizenbaum; Gunna Wendt. Islands in the Cyberstream: Seeking Havens of Reason in a Programmed Society. Translated by Benjamin Fasching-Gray. NY: Litwin Books, 2015 [2008].

[xv] John McCarthy. An unreasonable book. Available at:
http://jmc.stanford.edu/artificial-intelligence/reviews/weizenbaum.pdf

[xvi] Dwight Hines. Review of Computer Power and Human Reason: From Judgment to Calculation, by Joseph Weizenbaum, The Journal of Mind and Behavior, Spring 1980, Vol. 1, No. 1, pp. 123-126.

[xvii] Last May, news broke in the press that Geoffrey Hinton, considered one of the fathers of AI, had left Google because he regretted his contributions to the field. See:

https://www.bbc.com/portuguese/articles/cgr1qr06myzo

[xviii] Here's Skinner training pigeons in his laboratory: https://www.youtube.com/watch?v=TtfQlkGwE2U

[xix] Noam Chomsky. Review of Skinner's Verbal Behavior. Language, 1959; 35: 26-58.

[xx] https://www.hereafter.ai/

[xxi] https://www.youtube.com/watch?v=Jr8yEgu7sHU&ab_channel=TalentRecap

