Settling for Scores

In 1979, the psychologist Donald Campbell proposed an axiom. “The more any quantitative social indicator is used for social decision-making,” he wrote, “the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” He also wrote: “Achievement tests may well be valuable indicators of general school achievement under conditions of normal teaching aimed at general competence. But when test scores become the goal of the teaching process, they both lose their value as indicators of educational status and distort the educational process in undesirable ways.”

Put simply, when the measure becomes the goal, and when people are punished or rewarded for meeting or not meeting the goal, the measure is corrupted. As Richard Rothstein has shown in his superb monograph, “Holding Accountability to Account,” tying high stakes to measurable goals affects behavior in negative ways in every field, not just education. Judge heart surgeons by the mortality rate of their patients, and they will turn away risky patients. The classic (and probably apocryphal) illustrations of Campbell’s law come from the Soviet Union. When workers were told that they must produce as many nails as possible, they produced vast quantities of tiny and useless nails. When told they would be evaluated by the weight of the nails, they produced enormous and useless nails. The lesson of Campbell’s law: Do not attach high stakes to evaluations, or both the measure and the outcome will become fraudulent.

For the last 16 years, American education has been trapped, stifled, strangled by standardized testing. Or, to be more precise, by federal and state legislators’ obsession with standardized testing. The pressure to raise test scores has produced predictable corruption: Test scores were inflated by test preparation focused on what was likely to be on the test. Some administrators gamed the system by excluding low-scoring students from the tested population; some teachers and administrators cheated; some schools dropped other subjects so that more time could be devoted to the tested subjects.

In his new book, Daniel Koretz, an eminent testing expert at Harvard University, has skillfully dissected the multiple negative consequences of the education reforms of the 2000s, most of them unintended. His title, The Testing Charade: Pretending to Make Schools Better, sums up his conclusion that the reform movement failed badly because of its devotion to high-stakes testing as the infallible measure of educational quality. Koretz says the testing produced inflated scores that were not valid. But reformers did not withdraw their support for testing even when the harm it inflicted on children and public schools became evident. Some were ignorant of the evidence of failure; others believed tests provided valuable information, despite the corruption of the data by high stakes. Since federal law required states to label schools with low scores as failing, and since those schools were often turned into charter schools, a whole industry benefited from this system, even though the same measures labeled many charters as failing, too. The Every Student Succeeds Act (ESSA), passed in 2015, still requires that every child be tested every year, a practice unknown in any high-performing nation.

Legislators’ and policymakers’ obsession with testing has been locked into place since January 8, 2002, when President George W. Bush signed into law his signature domestic legislation, the No Child Left Behind Act. Before NCLB, every state had its own tests and its own accountability measures, but none was as harsh, punitive, and unrealistic as NCLB. None required every school to reach 100 percent proficiency or face mass firings or closure or both.

THE TESTING CHARADE: PRETENDING TO MAKE SCHOOLS BETTER by Daniel Koretz
University of Chicago Press, 288 pp., $25.00

When Bush campaigned for the presidency, he portrayed himself as a “compassionate conservative” who knew how to overcome “the soft bigotry of low expectations.” He said he knew how to raise test scores, raise graduation rates, and close the achievement gaps between children of different races and classes. Test every child every year from grades three to eight, publish the scores for every subgroup by race, gender, disability status, and so on, then reward the schools that raised scores, embarrass those that didn’t, and voilà! Problem solved. Test scores and graduation rates go up, achievement gaps close. It was all common sense: Get the incentives and sanctions right, and results were sure to follow.

Bush’s surrogates claimed there had been a “Texas miracle,” that annual testing had dramatically raised scores and graduation rates in the state while he was governor. In fact, there was no miracle. Texas was not a model for the nation in 2000, nor is it now. But Congress bought it, and NCLB required every public school in the nation to test every child every year in reading and mathematics from grades three to eight. Schools were required to reach 100 percent proficiency on standardized tests by the year 2014, only 12 years later, or face dire sanctions, including closure or privatization. Not only did the Bush administration vastly expand the federal government’s role in education, venturing where no other administration of either party had dared to go, it set a goal that was literally impossible for schools, districts, and states to meet. Democrats as well as Republicans supported this massive federal intrusion into pedagogical and local matters. Senator Ted Kennedy and Democratic Congressman George Miller of California, both noted liberals, proudly supported Bush’s folly.

When Obama took office in 2009, the failure of NCLB was already apparent, but the president and his secretary of education, Arne Duncan, were true believers in George W. Bush’s vision, and they doubled down on standardized test scores as the definition of success. Under their Race to the Top program, thousands of educators were fired, and countless public schools were closed or handed over to private charter operators. The devout belief in standardized tests as the measure of learning and the equally devout belief in turning public schools over to charter entrepreneurs came to be known as the “reform movement.” It was, in reality, a giant federal wrecking ball that did immeasurable damage to students, teachers, and public education.

Koretz excoriates the reform movement for its indifference to the harm it caused. He criticizes its iconic leaders, Joel Klein, Michelle Rhee, and Arne Duncan, for their fanatical devotion to standardized tests without regard to how the scores were obtained. Duncan demanded that all teachers raise the test scores of their students, that they be rated by their ability to raise scores, and that those who couldn’t consistently raise scores year on year be fired. No excuses. Teachers of the gifted, whose students were already at the top, would be rated ineffective if they didn’t raise those scores even higher, say from 3.97 out of 4 to 3.98. Teachers of art and physical education and other subjects that were not tested were assigned a rating based on the scores of students they didn’t teach, in subjects they didn’t teach. Duncan applauded when Los Angeles and New York City publicly released the rankings of teachers, even though the rankings were rife with errors. Rigoberto Ruelas, a teacher in a tough Los Angeles neighborhood, committed suicide after his average ranking was made public. No one apologized.

The reformers’ obsession with test scores, Koretz writes, prepares students to take tests, but it does not prepare them to apply what they have learned to real life situations. Typically, students are prepped to take a specific test. Switch to another test in the same subject for which the students have not prepped, and their scores are likely to plummet. Placing so much importance on test scores, Koretz writes, was certain to produce score inflation, not better education. Teachers were told to give “interim assessments” frequently during the year to prepare for the real test, but doing so took time away from nontested subjects like history, the arts, civics, even physical education and recess. Reformers point to higher scores as “proof” that their reforms worked. This is circular reasoning.

Will the Common Core standards fix this mess? Koretz says no. The standards mistakenly assume that one curriculum is right for everyone. The whole reform package was mandated without regard for evidence. Reformers suffer, Koretz writes, from an “arrogant assumption that we know so much that we don’t have to bother evaluating our ideas before imposing them on teachers and students.” This assumption is especially startling when you realize how few of the reformers ever taught at all, or taught for longer than a two-year stint as members of Teach for America.

Koretz is not anti-testing. He is not even anti–standardized testing. He opposes the misuse of tests and would prefer to see them used as diagnostic tools, disconnected from rewards and punishments. Koretz proposes that standardized tests should be coupled with teacher tests and other measures of student performance, and that teachers get extra help to improve classroom instruction. He recognizes that nearly two decades of test-driven accountability, attached to harsh sanctions, has deeply embedded the power of standardized testing in the psyche of teachers and principals and that it will take years to make policymakers and educators aware of the pernicious effects of high-stakes testing.

But there are, Koretz finds, two reasons to feel hopeful about the future. One is that 2015’s ESSA is somewhat less punitive than NCLB. The other is the opt-out movement; in New York, for example, one-fifth of parents refused to allow their children to take the federally mandated state tests. The students sit for the tests in the spring, sometimes for as long as 18 hours—longer than the bar exams—over two weeks, which is unreasonable, especially for children so young. The test results are not reported until summer or fall, when the students have different teachers. Neither the students nor the teachers are allowed to discuss the test questions to find out what they got wrong and how they can do better. The students are given a numerical ranking: They learn how they compare to others of their age. But because of test secrecy, the tests have no diagnostic value. None.

Worse, they teach young children to look for the right answer instead of looking for the right question. Some test questions may have two right answers or no right answer. The thoughtful child may choose an answer that is plausible but judged “wrong.” As a teaching tool, the tests are deeply flawed because they quash imagination, creativity, and divergent thinking. These are mental habits we should encourage, not punish.

Since test scores are highly correlated with parental income and education, children from affluent homes learn they did well. Children from poor homes learn they did poorly. The British author Michael Young wrote in the introduction to the revised edition of his classic book The Rise of the Meritocracy about the pernicious social effects of standardized tests. The children from elite homes are convinced by their test scores that they deserve their high status; their scores demonstrate their superiority. And children of the poor learn early on that they rank poorly; their test scores confirm their lowly status.

Despite the clear failure of test-based accountability, which Koretz amply documents, policymakers cling stubbornly to this corrosive doctrine. When Betsy DeVos says she will leave decisions about testing to the states, what she means is that the status quo of federally mandated annual testing in grades three to eight will remain undisturbed. Supporters of privatization appreciate the testing regime because every year it produces failing schools, the bottom 5 percent that can be closed and handed over to entrepreneurs and charter chains. Testing taps into Americans’ love of competition, incentives, and scores. It makes perfect sense to rank baseball players and teams by their wins and losses, but it doesn’t transfer to children or schools. Children may be talented in the arts or sports or other areas, and it won’t show on the tests. Education is a developmental process, a deliberate cultivation of knowledge and skills, a recognition of each child’s unique talents, not a race.