Dr. Steve Ferrara is a Senior Advisor for Measurement Solutions at Measured Progress. The Measured Progress Press Room team sat down with Steve to discuss the standard-setting process he led during the week of August 21, 2017, for eMPower Assessments™.
[This conversation has been edited for clarity and length.]
Tell us more about standard setting in general. What is it, exactly?
Standard setting is the process by which an organization determines the achievement standards for students on any given assessment. It is the methodology that guides the assembled team of experts in content standards, students, and student learning.
Tell us more about achievement levels and standards.
The levels of achievement identify the cut-scores that correspond to those levels. For eMPower, we’re setting standards to identify the Basic, Proficient, and Advanced achievement levels. Proficient is the level that indicates whether students are on track to be college- and career-ready by the 8th grade. What we mean is that 3rd and 4th graders and so forth who reach Proficient are on the right path to being on track for college- and career-readiness when they finish grade 8. Proficient at grade 8 is based on the PSAT 8/9 College and Career Readiness Benchmark for 9th graders.
Why is standard setting important for assessment?
It’s important because it provides a systematic process for testing program leaders and educators to define levels of achievement that account for both policy considerations like, “What aspirations do we have for our children?” and instruction and learning considerations like, “What are reasonable and attainable goals for student achievement at each grade, for the next grade, and for the one after that?” It helps everyone be on the same page.
Once the achievement levels are defined, we can make meaningful inferences about what students know and can do in a content area and grade. We’ll know what steps we can take to best support struggling students to help them reach the Proficient standard.
As an assessment company focused on student learning, Measured Progress takes a principled approach to test development and to standard-setting design, to make sure the resulting score reports meet the needs of parents, teachers, and district officials. We focus all decisions on the intended interpretations and uses of the eMPower score reports, based on the Achievement Level Descriptors, or ALDs.
A key to being principled is to write the ALDs at the beginning of the test-development process. It helps guide us as we create a new assessment. ALDs summarize the knowledge, skills, and abilities expected of students at each achievement level.
What are the elements that go into standard setting?
There are three elements that work together to produce the achievement standards:
- Content standards
- The process for setting scores—for example, the ID Matching method.
We also use benchmarking to guide the standard setting. The teachers who conduct the standard setting tell us whether that eMPower score makes sense from a test-content point of view—and they recommend a different cut-score, if necessary.
I understand you used the Item-Descriptor Matching method last week to do standard setting. Tell us more about the other methods you could have chosen.
Sure. There are numerous ways to do standard setting.
- There is the Body of Work method that was actually invented at Measured Progress by our founder, Stuart Kahl, and Neal Kingston, another prominent researcher.
- Another famous method is the Angoff method, which was developed 35–40 years ago. It’s named for its inventor, the esteemed William Angoff.
- The most widely used method in educational testing is called the Bookmark method, which is used in many state assessments.
- The ID Matching method is something I developed with Ross Green when I was state assessment director in Maryland. I’ve worked with additional colleagues over the years to refine ID Matching.
You can learn more about each of these methods in the textbook, Setting Performance Standards. I’ve used all of these methods previously, and they each have relative advantages and disadvantages.
So why choose ID Matching?
ID matching exploits a skill that humans are naturally very good at, which is matching. Educators are experts at content standards and learning, which makes them naturally very good at matching what knowledge and skills test items require of kids to the knowledge and skills in ALDs, which represent what they learn in school. Some of the other methods require skills that humans are less good at, like probability judgments.
How does ID matching work?
Well, we have multiple groups of tables, each with nine teachers. These teachers have been selected for their content knowledge and their specific grade-level experience. Teachers come from all over the country, to make sure we have multiple regions, ethnicities, and teaching experiences represented.
Every participating teacher receives an Ordered Item Book. All the items in the test are included, but not in the order in which they’re presented in the test. The items are presented in order of increasing difficulty. The first item in the Ordered Item Book is the easiest one on the test, and the last item is the most difficult. The teacher’s task is to determine where the cut-scores go, between items.
This activity requires people to match the response demands of the items with the ALDs. It’s a cognitive/judgmental task that requires expertise. But because the teachers are experts in the content standards and the students at that grade level, they are really good at this task. Teachers work independently first, and then discuss their decisions with others.
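For readers who think in code, here is a minimal Python sketch of the Ordered Item Book idea described above. It is an illustration only, not the procedure Measured Progress runs. The `Item` class, the `difficulty` values, the function names, and the sample matches are all hypothetical, and the sketch assumes one teacher's matches never drop back to a lower level as the items get harder, which real panels would not guarantee.

```python
from dataclasses import dataclass

# Achievement level names taken from the interview, lowest to highest.
LEVELS = ["Basic", "Proficient", "Advanced"]


@dataclass
class Item:
    item_id: str
    difficulty: float  # hypothetical difficulty estimate; larger means harder


def ordered_item_book(items):
    """Return the items sorted from easiest to most difficult."""
    return sorted(items, key=lambda item: item.difficulty)


def cut_positions(book, matches):
    """Locate each cut between two adjacent items in the ordered book.

    `matches` maps item_id -> the level one teacher matched that item to.
    Assumes (for simplicity) that matched levels never decrease through the book.
    Returns {level: (last item below the cut, first item at that level)}.
    """
    ranks = [LEVELS.index(matches[item.item_id]) for item in book]
    cuts = {}
    for i in range(1, len(book)):
        if ranks[i] > ranks[i - 1]:
            cuts[LEVELS[ranks[i]]] = (book[i - 1].item_id, book[i].item_id)
    return cuts


# Made-up items, listed in test order rather than difficulty order.
items = [Item("A", 0.4), Item("B", -1.2), Item("C", 1.5), Item("D", -0.3), Item("E", 0.9)]
book = ordered_item_book(items)

# One teacher's hypothetical item-to-ALD matches.
matches = {"B": "Basic", "D": "Basic", "A": "Proficient", "E": "Proficient", "C": "Advanced"}
cuts = cut_positions(book, matches)
print([item.item_id for item in book])
print(cuts)
```

In this toy example the Proficient cut falls between items D and A, and the Advanced cut falls between items E and C. The sketch covers a single teacher's independent judgment and says nothing about how recommendations from many teachers are combined.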
What’s it like to participate in an ID matching session?
It’s fascinating to hear the discussions. It can get lively at times, because educators are passionate about clarity and making supportable decisions. It’s important to note here that standard setting is not about reaching consensus and it’s not voting. Educators share their insights about the items, ALDs, and students and then make independent recommendations about where the cut-scores should be.
How long does the matching process take?
We expect the process to last two-and-a-half days per group. Part of the efficiency of this process is that participants don’t have to match all the items, just those in the cut-score ranges. That cuts down on the cognitive workload for the participants.
Why should people care about the process we’re using to set standards for eMPower Assessments?
When we bring the scientific methods together with the teachers’ expertise, we create solid procedures to ensure outcomes you can trust.