By EWOUT TER HAAR & OTAVIANO HELENE*
A simple standardized exam like Enade is not suitable for evaluating an innovative and diverse higher education system, which is why no country in the world uses a similar methodology.
The issue of USP's participation in the National Student Performance Exam (Enade) is once again on the agenda, as it has been several times over the past twenty years – this time starting with the teacher-training (licenciatura) courses. Systematic monitoring and evaluation of the University's courses are welcome and necessary. USP should certainly take responsibility for this task and create mechanisms for it. However, USP's eventual participation in Enade should be treated with great caution, for three main reasons.
First, we will argue that a simple standardized exam like Enade is not suitable for evaluating an innovative and diverse higher education system, which is why no country in the world uses a similar methodology. A second reason is that, by participating in Enade, USP would unnecessarily subject its undergraduate programs to homogenizing regulations designed for other types of institutions, with the risk of narrowing their curricula. Finally, for USP courses, Enade would generate highly unstable numbers, inappropriate for supporting good educational policies.
It is necessary to know the objectives of an evaluation
It is widely recognized in the educational assessment community that the validity of an examination depends on evidence regarding the use and interpretation of its results. The value and validity of Enade should therefore always be judged in relation to how USP will use and interpret its results. Indeed, any evaluation process must be preceded by work that defines the purpose of what is being evaluated.
A university, for example, has multiple functions: training professionals, fostering social and cultural development, contributing to the advancement of production processes, and producing and appropriating scientific knowledge, among others. Beyond listing these purposes, it is necessary to know the objective of each one and how far the institution intends to go with it. A simplistic answer would be that a university has all of these purposes and that the sky is the limit. Such an answer, however, is irresponsible: if more effort is directed toward one purpose, less effort is left for the others.
Therefore, the question that should precede any discussion of evaluation is: what is the objective of what is being evaluated? There are several examples of this question being settled before any evaluation takes place. Perhaps the discussions surrounding the large universities of the state of California, USA, can help illuminate the process. A well-known and striking example was the planning of that state's higher education system in the 1960s.
Currently, this system has three components: the community colleges, with nearly two million students, providing professional and cultural training and serving as a stepping stone for students who wish to continue their higher education at a university; California State University, with more than 400,000 students, whose main purpose is to train professionals, without excluding, of course, scientific and cultural production; and the University of California, with nearly 300,000 students, whose main objectives include the production of scientific knowledge, without excluding the training of professionals.
Obviously, in this example, the evaluation criteria for each part of the higher education system cannot be the same. Enade, by contrast, even in its new version, is a single exam, the same for all courses in a given area, which seeks to characterize, with about one hundred multiple-choice questions, the four years of training of graduates along just one or two dimensions. Let us ask ourselves: what is the role of USP courses in society? Does Enade help USP assess whether its courses contribute to “training leading professionals and citizens aware of their social role”?
Theodore Porter, in his classic book on the history of the construction of objectivity in public policy, points to the political advantages that supposedly objective metrics offer managers (“it is the algorithm and the metric that decide”). In societies like Brazil, where there is a widespread feeling that human judgment and expertise are particularly suspect and subject to corruption, the use of objective tests in evaluation processes such as civil-service exams or public tenders is seen as indispensable. In education, the culture of external evaluation and comparability through standardized tests has produced important advances, such as the School Census, Saeb, and the management indicators derived from them, such as Ideb, which monitor basic education throughout Brazil. However, it is important to recognize the limits of validity of standardized exams such as Enade.
The inadequacy of standardized exams to assess higher education
Brazil is the only country in the world that uses an objective test for the dual purpose of evaluating and regulating its higher education system; almost no other country even seeks to monitor the quality of its tertiary education through a standardized exam given to graduates at the end of their courses. The reasons for this are clearly explained in an OECD report that, at the request of the Brazilian government, investigated Brazil's system of “quality assurance” for courses. The report, despite being written by an organization known for its commitment to managing educational systems through quantitative evidence, is very critical of the current system and shows that Enade's objective – measuring the learning of graduates of higher education courses – is completely unattainable, for three main reasons.
First, unlike an assessment at the end of basic education, it is not clear what to measure. There is not, and should not be, a set of common skills and abilities that every graduate must acquire, as there is in basic education. Great diversity of curricula and learning objectives is a defining characteristic of higher education. An Enade that assesses only the most generic skills devalues precisely those courses that develop the more specialized skills expected of a higher education course. Yet any assessment of more specific skills would necessarily impose a single view of a course's objectives, to the detriment of the plurality of views valued at the more advanced stages of the educational system.
A second challenge is technical: how can a relatively short exam reliably assess the content and skills acquired over four years of training? For regulatory purposes, the new Enade (for now, covering the teacher-training courses) proposes to use Item Response Theory (IRT), with 45 multiple-choice questions for the General Training dimension and 60 for a dimension specific to each area. Even assuming that teaching competence is a construct that can be captured on a numerical scale of one or two dimensions – a highly problematic proposition for the reasons explained above – it is even more doubtful that about one hundred multiple-choice questions can fairly assess four years of training.
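To make the measurement model concrete – a minimal sketch only, since the exact specification adopted by Inep is not detailed here – the common two-parameter logistic (2PL) model of IRT gives the probability that a student with latent ability $\theta$ answers item $i$ correctly as

$$P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}},$$

where $b_i$ is the item's difficulty and $a_i$ its discrimination. Placing students and items on this common scale is what permits comparability across years; but note that $\theta$ is, by construction, a single number per dimension – precisely the one- or two-dimensional summary of four years of training being questioned here.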
Third, by establishing a standardized exam, which for practical reasons is necessarily short and focused on a very limited set of skills, there is a huge risk that courses will train their students only in these skills. This narrowing of the curriculum would hamper the possibilities for innovation and the ability of courses to adapt to changes and local circumstances.
All of these reasons cited by the OECD experts severely limit the reasonable inferences that can be drawn from standardized tests in higher education. This is why no other country in the world assesses its higher education courses through this type of process. A pilot project promoted by the OECD in 2013, the AHELO initiative, was abandoned after being deemed unviable. In Europe and the US, rather than relying on a single form of assessment, higher education systems are evaluated by accreditation agencies that employ broad, holistic forms of assessment.
The new Enade will increase the number of questions and use IRT to achieve comparability across years of application and to improve precision. Like any standardized exam, the new Enade has the potential to provide valuable evidence to help improve the direction of courses and public policies. But experts in educational assessment warn that a test does not necessarily measure what its title says, and that a little equating magic does not make two tests equivalent. The assessment literature warns especially of the near impossibility of using a single test for both diagnostic and regulatory purposes.
The functioning, validity, reliability, and interpretation of the new Enade must be studied before the numbers it generates are used for regulatory and other high-stakes purposes. It is very worrying that Inep, the body responsible for Enade, miscalculated the most important indicator derived from the exam between 2014 and 2021, assigning essentially random numbers to courses. The error was finally corrected in 2024, but the fact that no one with a stake in the evaluation process noticed it in all those years casts doubt both on the real regulatory role of Inep's indicators and on Inep's ability to control the quality of its own indicators.
Specific reasons for USP not to adopt Enade
In addition to the general difficulties highlighted by the OECD and evaluation experts, in the case of USP there are other reasons to be skeptical about participating in Enade.
First, a technical point. In the current Enade, after discounting the effect of incoming students' Enem scores, the variance between courses is small: only around 10% of the variance in student grades. In other words, the variation in Enade grades among students within a given course is much greater than the variation between course averages, and this will remain true for the new Enade. This means that any course indicator derived from students' Enade grades must be computed over many participating students to be statistically reliable.
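A worked sketch of the consequence, using the standard formula for the reliability of group means and assuming, as the figures above suggest, an intraclass correlation of $\rho \approx 0.10$ (the between-course share of the variance): the reliability of a course average computed from $n$ examinees is approximately

$$\lambda(n) = \frac{n\rho}{1 + (n-1)\rho}.$$

With $\rho = 0.10$, a course with 50 graduates reaches $\lambda \approx 0.85$; with 20 graduates, $\lambda \approx 0.69$; and with 10 graduates, $\lambda \approx 0.53$ – meaning nearly half of the observed variation in the indicator would be sampling noise.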
It turns out that, according to the Higher Education Census, only five of USP's 34 teacher-training (licenciatura) courses have more than 50 graduates per year (averaging over the last five years). For most of these courses (and for other courses as well), any average or other indicator derived from Enade will be quite unstable, and it would be a mistake to base policies on it.
Second, in principle the exam could provide interesting evidence about USP courses if its results were used for diagnostic purposes. But one argument now in circulation is that participation in Enade could exempt courses from having to renew their recognition by the State Education Council, which makes plain the desire to use the results for regulatory purposes.
In that case, there will be almost inevitable pressure for courses to follow the narrow Enade framework instead of the National Curriculum Guidelines, which deliberately leave courses free to design their programs according to local realities. Likewise, adherence to Enade would weaken USP's own efforts to implement a broader program for evaluating its courses, one better suited to its context.
Conclusion
It is unlikely that USP will benefit from participating in Enade; on the contrary, submitting to the federal regulatory system as it is implemented today carries great risks for the quality of our courses. There is a real risk of curricular narrowing through the side effects (teaching to the test) of being evaluated by an exam that is too simple and ill-suited to the diversity of USP's courses.
Furthermore, by adopting a shallow form of regulation, based on unstable indicators of poor validity, USP would forgo the opportunity to design its own rich, multifaceted assessments of its courses. And for the majority of USP courses, with fewer than 50 graduates per year, Enade would yield indicators that are nearly random – too unstable to support educational policies.
USP should strive to implement its own evaluation system for its courses and use its expertise to press higher education quality-assurance bodies to adopt more valid methods for regulatory purposes. To be useful, an Enade-type exam could serve only a diagnostic role, applied to units of analysis larger than individual courses and on a sample basis, to avoid the side effects of assessment highlighted above.
*Ewout ter Haar is a professor at the USP Physics Institute.
*Otaviano Helene is a senior professor at the USP Physics Institute.
Originally published in Jornal da USP.