There’s a lot of confusion among policymakers, educators, and the general public about states’ current options for their K–12 accountability assessment programs. Why? To explain the background of this confusion and the reasons for it, I recently published a paper titled “Proficient, Eligible to Graduate, College-Ready? The mystery of achievement-level assessment results.” Here’s a quick summary of that paper.
In the 1990s, largely because of requirements in the Elementary and Secondary Education Act (ESEA), states shifted the focus of their assessment results from scaled scores and percentile ranks to achievement levels. To do this, they went through a process called standard setting, which produced “cut scores”: the minimum test scores students needed to be classified into levels such as Advanced, Proficient, and Needs Improvement. Achievement-level reporting seems simple to understand, much like passing or not passing a teacher’s test. Yet it can be extremely confusing, particularly when results from different assessment programs are compared.
Tests whose scores are highly correlated can still produce very different percentages of students categorized as “Proficient,” simply because of where on their score scales the programs have set their cut scores. Obvious as this may seem, many have jumped to erroneous conclusions when state results on the National Assessment of Educational Progress (NAEP) have differed from the states’ own test results, when commercial test results have differed from the states’ own results, and, even today, when states compare different program options to decide on their future programs. “Percent proficient” results are simply not comparable across programs. The discrepancies do not necessarily reflect differences in test content, test quality, or even test difficulty. Furthermore, assessment programs can and do adjust where they set the bar over time.
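To make the cut-score point concrete, here is a minimal Python sketch with invented numbers. It assumes two hypothetical programs testing the same ten students, where Program B’s scale is an exact linear transform of Program A’s (so the scores are perfectly correlated and rank the students identically), and each program has chosen its own Proficient and Advanced cut scores. The helper names `classify` and `percent_proficient`, and all scores and cuts, are illustrative assumptions, not data from any real program.

```python
from bisect import bisect_right

# Achievement levels from lowest to highest (level names as used in this post).
LEVELS = ["Needs Improvement", "Proficient", "Advanced"]


def classify(score, cuts):
    """Return the achievement level for a score.

    `cuts` lists the minimum scores for Proficient and Advanced in
    ascending order; a score equal to a cut score meets that level.
    """
    return LEVELS[bisect_right(cuts, score)]


def percent_proficient(scores, cuts):
    """Percentage of students classified Proficient or above."""
    at_or_above = sum(1 for s in scores if classify(s, cuts) != "Needs Improvement")
    return 100 * at_or_above / len(scores)


# Hypothetical scores for the same ten students on Program A's scale.
SCORES_A = [205, 218, 231, 238, 244, 252, 259, 263, 271, 288]
# Program B's scale is assumed to be an exact linear transform of A's,
# so the two sets of scores are perfectly correlated.
SCORES_B = [2 * s - 100 for s in SCORES_A]

# Hypothetical cut scores (Proficient, Advanced) chosen by each program.
CUTS_A = [240, 270]
CUTS_B = [420, 460]  # 420 on B's scale corresponds to 260 on A's scale

print(f"Program A: {percent_proficient(SCORES_A, CUTS_A):.0f}% proficient or above")
print(f"Program B: {percent_proficient(SCORES_B, CUTS_B):.0f}% proficient or above")
```

With these made-up values, Program A reports 60% of students Proficient or above and Program B reports 30%, even though the two tests order the students identically. The only difference is where each program placed its bar: Program B’s Proficient cut of 420 corresponds to 260 on Program A’s scale, not 240.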
How, then, should one evaluate the relative merits of testing programs? By looking at direct evidence of test content and quality, along with costs, the utility of results relative to program purposes, buy-in from various “consumers,” and other pertinent qualities of the programs. For more on this topic, see the full paper on our Opinion page.