Thursday, June 14, 2018
If Classroom Observations Are Biased and Value Added Assessments Are Flawed, Where Do We Go Next to Measure Teacher Quality?
Shanyce L. Campbell and Matthew Ronfeldt have published a new study: Observational Evaluation of Teachers: Measuring More Than We Bargained for? They "provide the strongest evidence to date that teachers’ ratings are significantly related to the sociodemographic characteristics of the students they teach apart from differences in teacher quality." Their abstract offers this longer explanation:
Our secondary analysis of Measures of Effective Teaching data contributes to growing evidence that observation ratings, used as part of comprehensive teacher evaluation systems across the nation, may measure factors outside of a teacher’s performance or control. Specifically, men and teachers in classrooms with high concentrations of Black, Hispanic, male, and low-performing students receive significantly lower observation ratings. By using various methodological approaches and a subsample of teachers randomly assigned to classrooms, we demonstrate that these differences are unlikely due to actual differences in teacher quality. These results suggest that policymakers consider the unintended consequences of using observational ratings to evaluate teachers and consider ways to adjust ratings to ensure they are fair.
This is bad news for those who would replace the flawed statistical assessments of teacher quality with in-person evaluations. As I detail here, attempts to measure teacher effectiveness based on how students perform on standardized exams haven't worked. While these statistical evaluations make intuitive sense, the devil is in the details. It is very difficult, if not impossible, to isolate the effect that a single teacher has on students in a single subject matter area. This is not to say that teaching effectiveness doesn't matter. Of course it does. But pinpointing precise effects and isolating them from what students may have learned from another teacher this year, last year, and in other classes is not easy. This is to say nothing of the factors that are not even included in the data but likely explain a lot: home life, peer groups, etc.
Now this. We can't even accurately assess teacher quality when we go into the room and watch teachers, because racial and socioeconomic biases appear to get in the way.
To their credit, Campbell and Ronfeldt seem to have found a way to filter the bias out. Once they took classroom demographics into account and adjusted the scores that teachers received based on the demographics of the students they taught, the evaluations made more sense. But this is not something a state would likely ever do.
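One common way to make that kind of adjustment, and I am only guessing at what Campbell and Ronfeldt actually did, is to regress observation ratings on classroom composition and keep the part of each rating the demographics can't explain. The sketch below illustrates the idea with invented numbers and made-up variables; nothing here comes from their data.

```python
# Illustrative sketch of a demographic adjustment: regress observation
# ratings on classroom composition, then treat the residual (plus the
# overall mean) as the adjusted score. All numbers are invented.
import numpy as np

# Toy data: one row per teacher-classroom. Columns are hypothetical:
# share of Black/Hispanic students, share of male students, and the
# class's prior achievement (in standard-deviation units).
ratings = np.array([3.1, 2.6, 3.4, 2.9, 3.0, 2.5])
demographics = np.array([
    [0.20, 0.50,  0.10],
    [0.80, 0.55, -0.60],
    [0.10, 0.45,  0.40],
    [0.60, 0.50, -0.20],
    [0.30, 0.48,  0.05],
    [0.90, 0.60, -0.70],
])

# Ordinary least squares: rating ~ intercept + classroom composition
X = np.column_stack([np.ones(len(ratings)), demographics])
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Adjusted score = overall mean + residual, i.e. the part of the
# rating not explained by who is sitting in the classroom.
residuals = ratings - X @ coef
adjusted = ratings.mean() + residuals
print(np.round(adjusted, 2))
```

The adjusted scores average to the same overall mean as the raw ratings, so the adjustment reranks teachers rather than inflating everyone; that is exactly the step a state evaluation system would have to be willing to take.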
Unfortunately, I don't have an answer to the question I posed in the title of this blog post, but I welcome your thoughts or suggestions regarding other new research.
--image by Dscot018 at en.wikibooks