The Problem with Computer Grammar Checkers [Updated]

When I moved this blog over to WordPress, I noticed that under the Users > Personal Settings page there is an option to turn on a computer proofreader. The program is from Automattic (the same people that make WordPress) and it’s called After the Deadline. While an automatic proofreader isn’t anything spectacular in itself, the grammar and style mistakes that this proofreader can supposedly prevent you from making are eye-popping:

bias language, cliches, complex phrases, diacritical marks, double negatives, hidden verbs, jargon, passive voice, phrases to avoid, and redundant phrases.

It’s an impressive looking list, but anyone with even mediocre writing skills and experience with computer proofreaders is likely to be wary. How often has Microsoft Word mistakenly underlined some of your text? How many times has your smartphone autocorrected you into incomprehension?

The thing is, when presented with such a list, even a confident writer couldn’t be blamed for being curious. Are you unwittingly making grammar mistakes in your carefully crafted prose? Have you been straying outside the accepted limits of complex and redundant phrases? Are there verbs hiding in your text? And holy shit, what the hell are diacritical marks?

Let’s put those ridiculous questions aside for a moment. Many people have pointed out what’s wrong with automatic spelling and grammar checkers. What I want to do here is show you why there are problems with these programs by using some highly regarded prose.

Let’s fire up the incinerator.

"To the Lighthouse" by Virginia Woolf*

At the first green line, After the Deadline suggests, “Did you mean… ‘its fine tomorrow?’” Things are not off to a good start. The three other green lines warn me (or Ms. Woolf) about the Dreaded Passive Voice™. The blue line flags a “Complex Expression” and suggests changing it to “plans.” But perhaps the worst suggestion is given by clicking on the red line – “Did you mean… ‘sense,’ ‘cents,’ ‘scents?’” Moving on…

"Sense and Sensibility" by Jane Austen

The blue line is another “Complex Expression,” which After the Deadline suggests be changed to “way.” That’s not so bad. The green line, however, is (according to the proofreader) an example of a “Hidden Verb.” What’s a hidden verb, you ask? As After the Deadline explains, “A hidden verb (aka nominalization) is a verb made into a noun. They often need extra words to make sense. Strong verbs are easier to read and use less words.” But this doesn’t make any sense. Constant had not been nominalized, while had is one of the most common (and easiest to read) verbs in English. I’m told to “revise ‘had a constant’ to bring out the verb,” but I don’t know what that means. Alert readers will begin to see the problem here. So will everyone else.

"Great Expectations" by Charles Dickens

Here’s the Dreaded Passive Voice™ again. Geoffrey Pullum would have a fit with this program (comments are open, Geoff! Let us know how you really feel!). I guess the proofreader wants me to change the sentence to something like, “So I called myself Pip, and people called me Pip?” It was the best of times, it was the blurst of times.

"The Jungle" by Upton Sinclair

All I really need to say here is that the second green line says “Hyphen Required” and suggests I change the phrase to “out-of-the-way.” Really? Yes, really.

To be thorough, I ran other kinds of writing through After the Deadline, including Pulitzer Prize winners, and got the same results. You’re welcome to run anything you want through there, but I’ve got 20 bucks that says you’re going to get the same nonsense I did.

Getting back to those ridiculous questions, the answers are all irrelevant. If you have understood this article so far, you already know more about writing than After the Deadline. It will not improve your writing. It will most likely make it worse. Contrary to what is claimed on its homepage, you will not write better and you will spend more time editing.

I can’t believe anyone except the most inexperienced writers would be fooled by After the Deadline’s “corrections.” This isn’t exactly surprising when it comes to grammar checkers because they are at best useless and at worst harmful. But the way in which we rely on technology threatens to undermine our own writing. Insecure writers might be tricked into believing that After the Deadline’s suggestions are legit. And that is the real problem with these programs. The probability that they will do more harm than good approaches one, since it’s almost impossible for them to do any good.

Finally, I’d just like to add that when I used After the Deadline on this post, two terms were underlined in the explanation of hidden verbs:

“A hidden verb (aka nominalization) is a verb made into a noun. They often need extra words to make sense. Strong verbs are easier to read and use less words.”

The program says that nominalization isn’t a word and that I should write “fewer words” instead of “less words.” But that is a quote from the program itself! If even the makers of After the Deadline can’t (or won’t) follow their own guidelines, why should you?

And so I have decided to destroy the machine. Feeding this next piece of prose into your grammar checker is equivalent to setting its controls for the heart of the sun.

riverrun, past Eve and Adam’s, from swerve of shore to bend of bay, brings us by a commodius vicus of recirculation back to Howth Castle and Environs.
"Finnegans Wake" by James Joyce


[Update – Feb. 28, 2011] It’s always nice when someone with first-hand knowledge weighs in on the discussion. In this case, former After the Deadline developer Raphael Mudge was kind enough to stop by and leave his thoughts, to which I responded below.

[Update – Mar. 16, 2012] I heard from the WordPress staff about why they chose to incorporate After the Deadline into their software. Actually, I was directed to the post on the WordPress.com blog about the incorporation. I’m a bit disappointed in this, however. First, although the WordPress staff tells me that “There are many reasons to explain why we chose this service to help WordPress.com users with their writing, but you can read our announcement post for the full details,” their post is not full of “details.” Second, neither the email I got nor the WordPress blog post addresses any of the problems with automatic grammar or spell checkers. Oh well.

But most importantly, I don’t think the author of the post is serious when he says he “was blown away” by After the Deadline. Did he run his own post through there? What the hell did it look like before he did? And why didn’t he accept all of the suggestions? And, judging by the comments on the post, when will a psychologist run a study using an automatic grammar checker that makes incorrect suggestions, just to see how blindly people will obey their master?

By the way, running this update through AtD underlines “incorporate,” “was directed,” and “all of the.” Feel free to guess why if you really have nothing better to do.


*So much for only using said to carry dialogue, amIright, Elmore? Way to go, Virginia, you dope.


Author: Joe McVeigh

I'm a linguist who researches email marketing. I also teach at the University of Jyväskylä in Finland. I write about language and linguistics on my blog, ...And Read All Over, and I write about language and marketing on my other blog, Email and Linguistics.

4 thoughts on “The Problem with Computer Grammar Checkers [Updated]”

  1. Dr. McVeigh,
    Thank you for your comments on the open source After the Deadline proofreading system. Without people like you contributing to the open source community, I don’t know how a field could move forward.

    As you know, Natural Language Processing is not a trivial undertaking. Language is ambiguous and computers fare poorly in these situations. Although, Doctor, maybe you’re working on something the world doesn’t know about yet. I admire the hidden inventors, like yourself, who come from nowhere.

    When I developed After the Deadline, I had to balance false positives with actually finding errors. Dr. McVeigh, I’m sure you know these systems perform best in the situations they were made for. I built After the Deadline to help bloggers, not Virginia Woolf, Charles Dickens, or Upton Sinclair.

    A better test would include correct and incorrect text. If you want to give the system a fair shake, you might try running it against a random sample of blog posts, written by writers with varying styles and competencies.

    This is not to say that you will not see false positives or missed errors. You will. Does that mean the system is worthless? Maybe, compared to what you’re working to create. But, until then, it catches some errors and saves people time. That’s not a bad thing.

    The code for After the Deadline lives at:

    http://open.afterthedeadline.com/

    I have documented my methods and tests at:

    http://aclweb.org/anthology-new/W/W10/W10-0404.pdf

    Again, I am humbled to have a linguistic mind as great as yours review my trivial attempt to contribute to the field and help others. Dr. McVeigh, thank you a thousand times for your work and contributions!

    Respectfully,

    — Raphael Mudge

    (former) After the Deadline Developer

  2. Hi Raphael,

    Thanks for stopping by and reading my post. You can call me Joe, since I’m not a real doctor (yet). That’s just a little joke I have going on here. I write a lot of satirical pieces about homeopathy, so Dr. Joe McVeigh seemed like the right way to sign them. It emphasized the notion that I have no idea what I’m talking about when it comes to medicine. I’ve got to do something to clear that up. Unless, of course, you got it and were messing with me. In that case, I like it.

    I do know a little bit about language though. I’m working on a Master’s degree in corpus linguistics and my thesis is expansive enough that I’ll probably be able to work it into a doctorate (here’s hoping).

    I agree with you that Natural Language Processing is not a trivial undertaking. Not at all. My hat’s off to anyone working on that because I don’t have the courage.

    You rightfully point out that After the Deadline (along with other computer grammar checkers) is not and should not be aimed at professional writers. These programs are not going to take the jobs of editors at publishing companies anytime soon. So I’ll admit that setting After the Deadline against classic literature was unfair.

    I think the main point to consider is that I probably overestimate how confident writers (or bloggers) are. A confident writer doesn’t need computer grammar checkers for a variety of reasons, so it’s the uncertain writers that matter. They may have perfect grammar, but be led astray by a computer grammar checker. That was the point I was trying to make. The effect that computer grammar checkers could have on uncertain writers may be worth more than the time that they save other writers, especially if too many false positives are presented and accepted by the unsure bloggers.

    This is even more important when we think of running After the Deadline against a random sample of blog posts, as you suggest. While that would be fairer than what I did, it wouldn’t necessarily tell us anything. What’s needed is a second step of deciding which editing suggestions will be accepted. If we accept only the correct suggestions, we assume an extremely capable author who is therefore not in need of the program. As the threshold for accepting suggestions lowers, however, we will begin to see a muddying of the waters – the more poorly written posts will be made better, but the better-written posts will be made worse. The question then becomes: where do we draw the line on accepting suggestions to ensure that the program is not doing more harm than good? That will decide the program’s worth, in my opinion.

    Nevertheless, programs like After the Deadline are extremely interesting and I commend you and your fellow programmers. I just don’t think these programs are ready for the public yet, hence my review. Since After the Deadline is optional and since WordPress has over 10 million writers, I wonder if the best thing to do would be to somehow gather information on the program’s usage by bloggers over time. That way you could answer some empirical linguistic questions about its viability.

    Thanks again, Raphael, for your comment and for the PDF on your methods and tests. I read through it quickly but will devote some more time to it soon. I recommend that my other readers do the same, and feel free to offer your feedback in the comments below.

    Regards,

    Joe McVeigh
