Kill the teaching evaluation

(c) Copyright 2014 AAMFT California Division
I’m doing a great job, apparently.
Kudos to the online magazine Slate for saying out loud what university faculty have known for years: Teaching evaluations are unfair to women (and likely to minorities as well), and reflect a view of students as “consumers” that isn’t good for learning. They might actually undermine the learning process.

It’s time to put an end to them.

It has become quite clear that these evaluations don’t measure what they claim to measure. Students use teaching evaluations to voice pleasure or displeasure about many aspects of a class that have little to do with the teaching itself, from its content (over which faculty, especially adjuncts, may have little control) to its room location and its schedule. Teaching evaluations are also particularly vulnerable to affiliation effects: a student who likes a professor will rate that professor’s teaching highly regardless of whether the student actually learned anything.

Now, you could rightly argue that likability and teaching performance aren’t fully independent. Students often like their professors precisely because they see them as good teachers. But universities typically make no effort to separate the two. As a result, the professors whom students like more wind up more likely to be promoted and given tenure.

A couple of years ago, in an excellent debate on the topic at the New York Times, former Duke University professor Stuart Rojstaczer put the problem succinctly:

Students often conflate good instruction with pleasant ambience and low expectations. As a result they also reward instructors who grade easily, require little work, are glib and chatty, wear nice clothes, and are physically attractive. […] Plus, students will penalize demanding professors or professors who have given them a bad grade, regardless of the quality of instruction that a professor provides. In the end, deans and tenure committees are using bad data to evaluate professor performance, while professors feel pressure to grade easier and reduce workloads to receive higher evaluations.

The misrepresentation inherent in teaching evaluation data is of particular concern in the psychotherapy professions. Faculty members in these fields are expected both to be especially personable, given our chosen career path (suggesting even stronger affiliation effects), and to challenge students on biases and behaviors that can negatively affect therapy (leading students to punish faculty who actually do their jobs). All of this statistical noise makes it impossible to extract from a teaching evaluation what is arguably the most important thing about teaching: whether students are actually learning.

At universities where I have taught, we have routinely seen otherwise excellent faculty members receive lower evaluations in our diversity-related coursework. It isn’t because they teach the material poorly. It’s because they challenge students to understand the impacts of the parts of their identity that are socially privileged and the parts that are socially oppressed, which can be emotionally demanding. Students who truly engage with the process come out as better therapists, but they may associate their struggles in the course with the faculty member and punish that person as a result. I have no doubt that the class could be taught in a simple lecture format, giving students textbook knowledge about cultures around the world. Students would rate the professor of such a class more highly, since the class would be less demanding. And those students would enter the professional world as less culturally aware, less effective therapists.

Universities should, of course, gather data from students about their perceptions of all parts of the program. Student feedback can and should be used routinely to improve program content and, yes, those mundane things like course scheduling, which often sneak into teaching evaluations precisely because many universities offer students no other systematic way to give that kind of feedback. But a university that is truly aware of the limits of its data gathering should not use student evaluations of teaching when deciding whether to retain and promote faculty members. It’s hard to argue that teaching evaluations should be given much weight anywhere else, for that matter.

Leadership in evaluating faculty more objectively is coming from online universities. Because online activity is carefully tracked, online universities can use metrics such as how quickly faculty respond to student inquiries, how much time they spend logged into their classes, and how students perform on tests to determine, with far less bias, which teachers are truly engaged with their work in ways that benefit students. Online universities can also make fairer comparisons between faculty members, because students are far more likely to be randomly assigned to an instructor rather than choosing one based on class timing (or, as sometimes happens, the sound of the faculty member’s name). These data are imperfect too, since professors arguably can game some of those metrics with immediate but terse responses to student questions, but they are certainly less subject to bias than the teaching evaluations used at most brick-and-mortar universities. Indeed, it is a clever study from an online program that shows just how severely students punish female faculty in a more traditional “teaching evaluation.”

Kill the teaching evaluation. It does more harm than good.