What and why to resist



Hijacked by financial capitalism, the knowledge society does not offer safe working conditions even to those included

We are immersed in a model of society that has gradually distanced us from our humanity. Our personal and professional lives depend on multiple digital systems, by which we often end up letting ourselves be carried along. Coexisting constructively with this state of affairs requires some preparation to react against passivity, and that preparation tends to be more within reach of those who have access to the forms of knowledge that legitimize inclusion in that same society.

Incidentally, it should be noted that the progress of digital science, while favoring the included, deepens exclusion: it supports the growth of the financial system, which causes the productive sector to contract and, consequently, mass unemployment. It is a worldwide phenomenon, with many distinct regional specificities.

That is why it is the responsibility of scientists to react against passivity, reflecting on the resistance that their work can offer to the dehumanization of global society. It is also up to us to share this knowledge as widely and transparently as possible.

For language scholars (artists, critics, philosophers, scientists, etc.) this task should not be difficult. We all know that the possession of natural language is what humanizes us: witness the cognitive and emotional difficulties caused by language deprivation in children cut off from human interaction by abuse, abandonment or disrespect for difference.

However, the experts themselves are perplexed by the manipulation of discourses fostered by new digital technologies. We tend to see these technologies as a devastating wave that is beyond our power to confront. In fact, all we can do is work day by day, like ants. One way to do so is to try to present our acts of resistance in terms that are accessible to an informed, but not necessarily academic, public.

Below, therefore, is a brief summary of my fifty years of academic study of language, focusing on the choices that today can be seen as resistance to the advance of anti-humanism, already visible in the 1960s, when novels about dystopian societies such as Brave New World, by Aldous Huxley, and 1984, by George Orwell, were in circulation.


Speech technology and knowledge society

In that same decade, the computer metaphor had definitively invaded language and mind studies. Authors from the most diverse schools adopted a vocabulary that included terms such as input, output, module, processor, etc.

Now, for a philosophy buff like me, this soon became a matter of reflection. In fact, during my undergraduate and graduate studies, it was fascinating to watch the emergence of technologies that emulate the human production of my object of study – speech sounds. It was both exciting and surprising that the machine metaphor, previously applied only to the body, had begun to spread to different aspects of the mind.

But it wasn't until well after my doctorate that I decided to contribute to the dissemination of this scientific trend in Brazil, building, with colleagues from the Faculty of Electrical and Computing Engineering at Unicamp, the first concatenative text-to-speech conversion system for Brazilian Portuguese, the Aiuruetê – Brazilian parrot.

After overcoming the lack of funding for research in the 1980s, we proposed to funding agencies a joint project in the area of speech synthesis and recognition. The aim was to help reduce the risk of the country becoming marginalized in the knowledge society.

In the 1990s, this society, based on information and communication technologies, expanded rapidly throughout the world. Here, however, it had only just begun to stimulate research in written digital communication. It seemed urgent to us to lay the foundations for research in spoken digital communication as well.

Initially, this was for us an act of resistance to the telecommunications multinationals, which were already interested in our language because of its mass of speakers, seen as potential consumers. Imagine our surprise and indignation when we had to fight against a Brazilian company, which we had joined not by choice, but because of a pre-existing agreement with Unicamp.

At that time, Unicamp still did not regulate intellectual property. The associated companies always took the lion's share. The result was that our “collaborators” in the market appropriated the first prototype of our system and, after some modifications, sold it to a foreign company, without any consideration for the University.

Aiuruetê, created after the break with the company, was an attempt to occupy the field and pressure Unicamp to safeguard its intellectual property. The project aimed not only to build a synthesis system, but also to form a team of young speech scientists – that is, professionals with training in linguistics and telecommunications engineering. Thanks to funding from Fapesp, the system was up to the state of the art at the time and our effort managed to multiply the trainers in the area, who spread across the country.

Fortunately and unfortunately, we didn't stay in the field for long. The happy side is that we started to do more creative and challenging work. The unfortunate side is that the interest in speech science that we fostered in the country served to feed a market with which we had no affinity.

The reason is that the production of speech synthesis and recognition systems has become increasingly automatic and dependent on machine learning. It was no longer about working with rules, but with statistical patterns that the machine discovered through repetitive training on large databases, segmented and labeled by humans, who later lost their usefulness.

Behind these technological advances are new versions of tools already used in an artisanal way by pioneering systems in the field. For example, Aiuruetê used a neural network to learn prosodic patterns from a small manually segmented and labeled database. This automatically assigned prosodic structure to the input text, allowing another module in the system to adjust the pitch, duration, and volume of concatenated speech snippets. At the time, this procedure achieved, if not high naturalness, at least an intelligibility compatible with contemporary systems in the rest of the world.

It is noteworthy that the first speech technology systems were all handmade, that is, they depended on rules and criteria based on linguistic and/or engineering knowledge. In contrast, the ones that now inhabit our cars, computers or cell phones were made much more automatically, with the help of various types of machine learning. They are the result of projects funded by giants such as Apple, Google, Amazon, Microsoft and some of the largest banks in the world.

Machine learning systems detect statistical patterns in huge databases, usually compiled from components provided by specialized third-party companies. In most of these companies, a highly qualified workforce, responsible for the segmentation, classification and organization of data, holds jobs that are very well paid but precarious, because they are temporary. Incidentally, all of these workers have a scientific background, i.e., they are linguists, psychologists, computer engineers, telecommunications engineers, etc.

This simple example is enough to demonstrate that the knowledge society does not offer safe working conditions even to those included. The reason is that it was hijacked, some time ago, by financial capitalism, which diverts part of the profit obtained from its products to investments in speculative markets.

Thus, plunder is everywhere, including in offices and laboratories. At this juncture, scientists are faced with the mission of organizing themselves to protect not only the fruits of their work, but also the welfare state, without which any worker is fated to function as a cog in an impersonal and inhuman machine.


Humanistic uses of dynamical systems

The glimmer of hope for some scientists in my age group is that we know that none of these advances were intended to serve the market, but to solve fundamental basic research problems. This indicates that they are conceptually very powerful and can continue to sow socially constructive progress in other areas of knowledge.

The undue appropriation of work that we carried out in favor of the sovereignty of national scientific and technological development cost my engineer partner an early retirement and cost me many attacks from uncomprehending and/or opportunistic colleagues, which I resisted as best I could.

But this did not mean that I could disown the undertaking. Despite everything, it had brought me a transdisciplinary experience, that is, a passage along the frontier between the humanities and the exact/technological sciences. In that effort, I ended up needing to study the fundamentals of certain tools that had played a crucial role in the advent of the knowledge society.

Some of them are directly linked to my later acts of resistance, aimed at internationalizing my laboratory and research group. I have always understood science as a heritage of humanity, whose local appropriation must be sensitive, at the same time, to the state of the art and to global and regional socio-political injunctions. In my opinion, knowing what the world sees as cutting edge is a condition sine qua non to advance, innovate or revolutionize anywhere in the world.

To illustrate this position, an example will suffice. One of the concepts used in digital technology that has direct implications for the study of speech sounds is the dynamic system. Dynamic systems are mathematical objects used to model physical phenomena whose instantaneous description changes over time. Although they originated in physics, they are applicable to many other fields, namely: economics, finance, ecology, social sciences, diagnostic medicine, and so on.

The basic idea is that every dynamic system has a state, that is, an instantaneous description, sufficient to predict its future states without resorting to previous states. Thus, for example, an oscillator is a dynamic system, as it describes a movement in which any state, once described, allows predicting the following ones. Furthermore, the temporal evolution of these states can be understood as a continuous sequence or trajectory through a space constituted by the possible states of the system, called state space.

These two properties make it possible to model dynamic systems using a well-known mathematical tool: differential equations. These systems therefore have a high predictive power – forwards and backwards in the timeline. To use them as a diagnosis, just invert the temporal direction.
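For readers who like to see such ideas at work, here is a minimal sketch of the oscillator just described. It is only an illustration under stated assumptions: the function names `step` and `trajectory`, the frequency of one cycle per second and the initial state are all hypothetical choices, not taken from any system mentioned in this text. The code uses the known exact solution of the differential equation x'' = -omega² x, so that each new state is computed from the current state alone, and a negative time step runs the same rule backwards.

```python
import math

def step(state, dt, omega=2.0 * math.pi):
    """Advance an undamped oscillator's state (position, velocity) by dt.

    Uses the exact solution of x'' = -omega**2 * x, so the next state
    depends only on the current one. A negative dt steps back in time.
    """
    x, v = state
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    return (x * c + (v / omega) * s, -x * omega * s + v * c)

def trajectory(state, dt, n_steps, omega=2.0 * math.pi):
    """Return the list of states visited: a path through state space."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, dt, omega)
        states.append(state)
    return states

# Hypothetical initial state: displaced by one unit and at rest.
start = (1.0, 0.0)

# Predict forwards: 100 steps of 0.01 s, i.e. one full cycle at 1 Hz.
forward = trajectory(start, 0.01, 100)

# "Diagnose" backwards: run the same rule from the final state with dt < 0.
backward = trajectory(forward[-1], -0.01, 100)
```

Each pair in `forward` is one point of the trajectory through state space, and the last pair in `backward` should coincide with `start` up to rounding error, which is what inverting the temporal direction amounts to in this simple case.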

It should be noted that, for a humanist, working with the notion of a dynamic system does not necessarily mean modeling it with differential equations. It is perfectly possible to leave this task to a transdisciplinary partner – mathematician, engineer, computer scientist, etc.

In the human sciences, what is essential is to be interested in the temporal evolution of the object of study and to be able to use the term “dynamic system” not as a vague metaphor, but with a precise heuristic sense. For that, it is necessary to know how to see, in the imaginary trajectory of the object, properties of some kind of known dynamic system. It is also desirable to know how to collect at least some quantitative data, in order to interpret them in the light of the concept and feed the modeling, when it is possible and opportune.

In the study of speech sounds, there is an important class of objects that behave like oscillators: the articulatory gestures that produce those sounds. It was familiarity with this idea that allowed me to join a line of thought that was taking off in the 1990s and is now recognized as one of the leading international strands in the field. It is called gestural or articulatory phonology.

Gradually, this theoretical position attracted to my laboratory the sympathy and respect of some foreign colleagues. It also attracted the enthusiasm of a generation of new talents, with the help of which I built a conception of language acquisition that presupposes the integration of motor, cognitive and social skills. This position rejects the traditional view that cognition commands action in favor of another, more daring one, which assumes that cognition is constructed from shared, explicit or implicit action.

This paves the way for certain fine-grained analyses of speech sounds, which in turn make it possible to debunk certain myths. For example, it is possible to uncover kinship relationships between standard and stigmatized pronunciations. Thus, in a sound change, one can trace the trajectory from a conservative gesture to an innovative gesture, or vice versa. This shatters the myth of “wrong” pronunciation.

Likewise, in the so-called speech disorders, it is possible to discover kinship relationships between typical and atypical pronunciations. Thus, one can observe the speaker's attempts to approach the target pronunciation, sometimes even insistently. Even when the differences between these attempts are inaudible, physical and conceptual tools make their trajectory observable. This shatters the myth of the crippling deficit.

In conclusion, I must say that today it is a joy to have managed to form scholars of first and second language acquisition capable of detecting and interpreting small differences in the movement of articulatory organs. Many people are no longer branded as “abnormal” thanks to this approach, which committed professionals trained in my laboratory are taking from academia to classrooms and offices.

Another joy is to have encouraged the natural tendency of these talents to resist the forms of conservatism of the surrounding environment. For this, it was essential to make use of some broader theoretical tools, which I will briefly set out below.


Return to the philosophies of action

My passion for philosophy is also a passion for freedom of thought. Many philosophers influenced the gestures of resistance to anti-humanism that were part of my scientific trajectory. It should be explained that this trajectory began during the military dictatorship and included a long stay in the country of the leaders of the coup that instituted it.

Among the philosophers who inspired me, Ludwig Wittgenstein is undoubtedly the most useful to my work as a teacher, as he allows me to base my choices both on psychology and phonetics, two disciplines that are part of my field of work on a daily basis.

Being a Vygotskian as a psychologist and a Stetsonian as a phonetician are consistent choices, as they converge on the assumption that the roots of cognition reside in action. The Russian psychologist Lev Vygotsky spoke of knowledge as shared and internalized action. The American psychologist and phonetician Raymond Stetson spoke of audible movements as constitutive of oral language. It is this kind of thinking that underlies the version of gestural phonology that has been practiced in my laboratory for over two decades.

But perhaps Vygotsky and Stetson would not sound so convincing today if they had not had a philosopher – Wittgenstein – as a contemporary willing to demolish some of the most solid myths of the theory of knowledge, namely: defining traits of classes; fixed rules; private language.

The three expressions above speak for themselves: a line of thought that rejects them lends obvious force to acts of resistance. So this narrative can stop here.


The question that never goes away: how to resist the current obscurantism?

Finally, I must confess that not even 50 years of academic experience have prepared me to face current forms of obscurantism. During the military dictatorship, this threat surrounded academia and claimed many victims. But it has never been as globalized and organized as it is now.

The incessant attacks on Universities and development agencies by the current government call for collective reflection and action. In a context in which many scientists have already become entrepreneurs or outsourced workers at the service of the market, the last stronghold of resistance to anti-humanism seems to reside in trade unions, student organizations and scientific associations.

* Eleonora Albano is a professor of phonetics and phonology at the Institute of Language Studies at Unicamp. She is the author, among other books, of The audible gesture: phonology as pragmatics (Cortez).

