Can A Computer Grade Essays As Well As A Human? Maybe Even Better, Study Says

By Steve Mullis

Published April 24, 2012 at 3:30 PM MDT

Computers have been grading multiple-choice tests in schools for years. To the relief of English teachers everywhere, essays have been tougher to gauge. But look out, teachers: A new study finds that software designed to automatically read and grade essays can do as good a job as humans, maybe even better.

The study, conducted at the University of Akron, ran more than 16,000 essays from both middle school and high school tests through automated systems developed by nine companies. The essays, from six different states, had originally been graded by humans.

In a piece in The New York Times, education columnist Michael Winerip described the outcome:

Computer scoring produced "virtually identical levels of accuracy, with the software in some cases proving to be more reliable," according to a University of Akron news release.

"In terms of consistency, the automated readers might have done a little better even," Winerip tells All Things Considered host Melissa Block.

The automated systems look for a number of things in order to grade, or rate, an essay, Winerip says. Among them are sentence structure, syntax, word usage and subject-verb agreement.

"[It's] a lot of the same things a human editor or reader would look for," he says.

What the automated readers aren't good at, he says, is comprehension and judging whether a sentence is factually true. They also have a hard time with other forms of writing, like poetry. One example of such software is e-rater, developed by Educational Testing Service.

Les Perelman, a director of writing at the Massachusetts Institute of Technology, was allowed to test e-rater. He told Winerip that the system has biases that can be easily gamed.

E-Rater prefers long essays. A 716-word essay [Perelman] wrote that was padded with more than a dozen nonsensical sentences received a top score of 6; a well-argued, well-written essay of 567 words was scored a 5.

"You could say the War of 1812 started in 1925," Winerip says. "There are all kinds of things you could say that have little or nothing to do in reality that could receive a high score."

Efficiency is where the automated readers excel, Winerip says. The e-rater engine can grade 16,000 essays in about 20 seconds, according to ETS. An average teacher might spend an entire weekend grading 150 essays, he says, and that efficiency is what drives more education companies to create automated systems.

"Virtually every education company has a model, and there's lots of money to be made on this stuff," he says.

A greater focus on standardized testing and homogenized education only serves to increase the development of automated readers to keep up with demand, Winerip says.

Winerip says that what worries him is that if automated readers become the standard way of grading essays, then teachers will begin teaching to them, removing a lot of the "juice" of the English language.

"If you're not allowed to use a sentence fragment ... [or] a short paragraph ... then you're going to get a very homogenized form of writing," he says. "The joy of writing is surprise."