Google – …And Read All Over

Google’s Bad Voice

January 16, 2026 · Joe McVeigh · 2 Comments

Google has a style guide for technical writers and from what I’ve read, it’s pretty good. It’s short and sweet on the details, but that’s fine. The guide is direct and it’s not meant as a general writer’s guide.

But it makes comments on the passive voice, so you can probably guess what’s coming.

First, the good stuff. The guide correctly identifies the passive voice. Progress! The guide doesn’t identify all the ways that the passive voice can appear, just the BE + past participle way, but it gets that way right. Good job, Google.
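To make the BE + past participle pattern concrete, here is a rough Python sketch of a detector for it. This is my own toy, not anything from Google's guide: the function name `looks_passive` and the tiny `IRREGULAR` word list are made up for illustration. It only knows that one pattern, so it misses get-passives and will wrongly flag adjectives like "was tired".

```python
import re

# Hypothetical, deliberately tiny list of irregular past participles.
IRREGULAR = {"shot", "taken", "given", "written", "done", "made", "seen", "known"}

# A form of BE followed by whitespace; the lookahead captures the next word
# without consuming it, so chains like "was being taken" are checked step by step.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
PATTERN = re.compile(rf"\b{BE_FORMS}\s+(?=(\w+))", re.IGNORECASE)


def looks_passive(sentence):
    """Return True if the sentence contains BE + something shaped like a past participle."""
    for match in PATTERN.finditer(sentence):
        word = match.group(1).lower()
        if word.endswith("ed") or word in IRREGULAR:
            return True
    return False


for example in [
    "Maggie Simpson shot Mr. Burns.",
    "Mr. Burns was shot.",
    "Mr. Burns got shot.",
]:
    print(example, "->", looks_passive(example))
```

The get-passive in the last example slips through, which is the point: BE + past participle is one way to form the passive, not the only one.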

Then the guide gives advice that writers should use the active voice instead of the passive voice. This is where Google’s guide makes some questionable claims. It says:

I have never heard about that first bullet point. How do they know that people convert sentences in their head? How could they even know? If Google has that kind of technology, they need to give it to linguists. It would answer a lot of questions in our field.

I guess we could say that active voice is the default or canonical way of forming a sentence in English, but this is a categorical decision made to aid grammatical analysis. We don’t know whether people mentally convert passive voice to active voice. Do they do that with other types of clauses? Do they convert questions or imperatives? What about the middle voice – do they convert those clauses too? Probably not.

Passive voice does not necessarily obfuscate ideas, nor does it turn sentences on their head. You can obfuscate sentences with the passive voice, but you can also do that with the active voice. Lazy writers like to scapegoat the passive voice for obfuscation, but smarter people know better. Check it: which one of these sentences would you say is the most obfuscatory?

Maggie Simpson shot Mr. Burns.

Someone shot Mr. Burns.

Mr. Burns was shot.

I would say the middle one is the most unclear, but it’s in the active voice. The third sentence, which is the passive one, doesn’t tell us who shot Mr. Burns, but there’s a reason that people write sentences like that. Because sometimes one element is more important than the other. We’ll get to this more in a bit, but for now, consider:

Lee Harvey Oswald shot President John F. Kennedy.

President John F. Kennedy was shot by Lee Harvey Oswald.

President John F. Kennedy was shot.

Not for nothing, the passive voice sentence is the shortest one here. But more importantly, President Kennedy is more important than the person who shot him!

Let’s keep the third bullet point in mind while we look at the next piece of advice.

The Google Technical Writing Manual says

The Google Technical Writing Manual then takes some digs at academic writing (in a section marked “optional”):

So they start off with a swipe at “certain scientific research reports”. But which ones? I thought we were supposed to be joining “the quest for clarity” smh. Let’s think about this for a minute though. If the passive voice is used more often in scientific publishing, could there be a reason for that? Look again at the example sentences the Google manual gives:

It has been suggested that…
Data was taken…
Statistics were calculated…
Results were evaluated…

They claim that we don’t know who is doing what to whom, but with the exception of the first example, this is clearly not true. When a research report says “Data was taken…,” we know who took the data. It was the researchers! The authors of the research report, they took the data! Why on earth would it be anyone else? And if it was, the authors would say that. “Statistics were calculated…,” “Results were evaluated…” The authors are calculating the statistics and evaluating the results. That’s how research reports work. And the statistics and results are more important than the authors. That’s the objectivity in scientific research that the Google manual is clamoring for. Neither the active voice nor the passive voice versions of these examples is more or less objective:

Active: We took the data… vs. Passive: Data was taken…
Active: We calculated the statistics… vs. Passive: Statistics were calculated…
Active: We evaluated the results… vs. Passive: Results were evaluated…

If the author(s) of this Google manual weren’t so hung up on hating the passive, they would notice that three of their four example sentences disprove their point. Instead they just look silly.

So let’s edit the advice from the Google manual for clarity and truth:

Do we know who is doing what to whom? No Yes. Does the passive voice somehow make the information more objective? No, but neither does the active voice.

Read a book, Google

Look, here’s what’s really going on. It’s not about being bold, or who is doing what to whom, or any of that. It’s about the way English works. Huddleston and Pullum explain:

In English there is a broad preference for packaging information so that SUBJECTS REPRESENT OLD INFORMATION. […] while [active and passive clauses] normally have the same core meaning, they are NOT FREELY INTERCHANGEABLE. They differ in how the information is presented, and one important factor in the choice between them concerns the status of the two major NPs as representing old or new information. (2005: 242-243)

You can’t just switch every passive clause into an active one. You will sound strange. Because you will be disobeying the rules of English. The quote above is from a book called A Student’s Introduction to English Grammar. This is basic stuff. But it does require that the writers of the Google Technical Writing Guide read a book about grammar before making proclamations about it. And that’s asking too much, I guess.

Google doesn’t know what a subject is

March 11, 2022 · December 31, 2025 · Joe McVeigh · 3 Comments

Ok, the title of this post is a bit misleading. Google doesn’t “know” anything. It just grabs some text from a website and puts it up top to give people an answer to their question. The problem here is that the answer they give you is wrong. Because the website that Google uses is wrong. But there’s more than that. The answer that Google gives has been called a “massive overgeneralization” by Huddleston and Pullum. And if that’s not bad enough, all of the results in the Google search give you the exact same incorrect answer. What the what?


Don’t Go Down the Google Books Garden Path

February 9, 2014 · Joe McVeigh · 3 Comments

When Google’s Ngram Viewer was the topic of a post on Science-Based Medicine, I knew it was becoming mainstream. No longer happy to only be toyed with by linguists killing time, the Ngram Viewer had entranced people from other walks of life. And I can understand why. Google’s Ngram Viewer is an impressive service that allows you to quickly and easily search for the frequency of words and phrases in millions of books. But I want to warn you about Google’s Ngram Viewer. As a corpus linguist, I think it’s important to explain just what Ngram Viewer is, what it can be used to do, how I feel about it, and the praise it has been receiving since its inception. I’ll start out simple: despite all its power and what it seems to be capable of, looks can be deceiving.

Have we learned nothing?

Jann Bellamy wrote a post at Science-Based Medicine about using Google’s Ngram Viewer (GNV) to research some terms used to describe the very unscientific practice of Complementary and Alternative Medicine (CAM). Although an article of this type is unusual for the SBM site, it does show how intriguing GNV can be. And Ms. Bellamy does a good job by explaining a few of the caveats of GNV:

The database only goes through 2008, so searches have to end there. Also, the searches have to assume that the word or phrase has only one definition, or perhaps one definition that dominates all others. We also have to remember that only books were scanned, not, for example, academic journals or popular magazines. Or blog posts, for that matter.

Ms. Bellamy then goes on the search for some CAM terms. After noting which terms are more common and when they started to rise in usage, she does a very good job at explaining the reasons that certain terms have a higher frequency than others. At the end, however, she is left with more questions than answers. Although she discovered that alternative medicine appears more frequently than complementary medicine in the Google Books database, and although she did further research (outside of Google Books) to explain why, she is still left right where she started. Just looking at the numbers from GNV, she can’t say what kind of impact CAM has had on our (English-speaking) world or culture. So what was the point of looking at GNV at all (besides the pretty colors)?

In her post, Ms. Bellamy links to an article in the New York Times by Natasha Singer. In what is essentially an exposition of GNV, with quotes from two of its founders, Ms. Singer places a lot more stock in the value and capability of the program. But from a corpus linguist’s perspective, she leaps a bit too far to her conclusions.

Ms. Singer’s article begins with the phrase “Data is the new oil” and then goes on to explain the comparison between these two words offered by GNV. She writes:

I started my data-versus-oil quest with casual one-gram queries about the two words. The tool produced a chart showing that the word “data” appeared more often than “oil” in English-language texts as far back as 1953, and that its frequency followed a steep upward trajectory into the late 1980s. Of course, in the world of actual commerce, oil may have greater value than raw data. But in terms of book mentions, at least, the word-use graph suggests that data isn’t simply the new oil. It’s more like a decades-old front-runner.

But with the Google Books corpus (the set of texts that GNV analyzes), we need to remember what the corpus contains, i.e. what “book mentions” means. This lets us know how representative both the corpus and our analysis are. The Google Books corpus does not contain speech, newspapers, tweets, magazine articles, business letters, or financial reports. Sure, oil is important to our culture, and certainly to global and political history, but do people write books about it? We cannot directly extrapolate the findings from Google Books to Culture any more than we can tell people about the world of 16th Century England by studying the plays of Shakespeare. With GNV we can merely study the culture of books (or the culture of publishing). And there are many ways that GNV can mislead you. For example, are the hits in Ms. Singer’s search talking about crude oil, olive oil, or oil paintings? Google Ngrams will not tell you. Just for fun, here’s Ms. Singer’s search redone with some other terms. Feel free to draw your own conclusions.

Search for “data, oil, chocolate, love” on GNV. (Just to be clear, searching for oil_NOUN doesn’t change things much; oil as a verb is almost non-existent in the corpus. Take that as you will.)

Research casual

The second article I want to talk about comes from Ben Zimmer. While I don’t think Mr. Zimmer needs to be told anything that’s in this post, his article in The Atlantic gets to the heart of my frustration with GNV. It features a more complex search on GNV to find out which nouns modify the word mogul and how they have changed over the last 100 years. In the following passage, he alludes to the reality of GNV without coming right out and saying it.

It’s possible to answer these questions using the publicly available corpora compiled by Mark Davies at Brigham Young University, but the peculiar interface can be off-putting to casual users. With the Ngram Viewer, you just need to enter a search like “*_NOUN mogul” or “ragtag *_NOUN” and select a year range. It turns out that in 20th-century sources, media moguls are joined by movie moguls, real estate moguls, and Hollywood moguls, while the most likely things to be ragtag are armies, groups, and bands.

There are a few points to make about this. First, the interface of the publicly available corpora compiled by Mark Davies could be described as “peculiar”, but that’s only because it’s not the lowest common denominator. And there’s the rub because researchers are capable of so much more using Mark Davies’ corpora. While the interface isn’t immediately intuitive, it certainly isn’t hard to learn. As a bad comparison, think about the differences between Windows, OSX, and a Linux OS. Windows is the lowest common denominator – easiest to use and most intuitive. OSX and Linux, on the other hand, take a bit of getting used to. But how many of us have learned OSX or Linux and willingly gone back to Windows?

The second point is not so much about casual users as it is about casual searches. I think Mr. Zimmer is right to talk about casual users since it’s probable that most of the people who use GNV will be looking for a quick and easy stroll down the cultural garden path. But more to the point, I think he’s right to offer different types of moguls as a search example because that’s about as far as GNV will take you. Can you see which types of moguls people are talking about? No. How about which types of moguls are being used in magazines? Nope. Newspapers? Nuh-uh. You have to turn to one of Mark Davies’ corpora for that. In fact, less casual users are even able to access Google Books (and other corpora) via Mark Davies’ site, and this allows them to conduct more complex searches (For a much more detailed comparison of GNV and some of the corpora offered on Mark Davies’ site, see here). So again the question is what’s the point of looking at GNV at all?

Final thoughts – Almost right but not quite

All this picking on GNV is not without reason. Even though what the people at Google have done is truly impressive, we have seen that the practical use of GNV is limited. As the saying in corpus linguistics goes, “Compiling is only half the battle”. GNV does not offer users a way to really measure what they are (usually) looking for. As an example, a quote from Ms. Singer’s article will suffice:

The system can also conduct quantitative checks on popular perceptions. Consider our current notion that we live in a time when technology is evolving faster than ever. Mr. Aiden and Mr. Michel [two of GNV’s creators] tested this belief by comparing the dates of invention of 147 technologies with the rates at which those innovations spread through English texts. They found that early 19th-century inventions, for instance, took 65 years to begin making a cultural impact, while turn-of-the-20th-century innovations took only 26 years. Their conclusion: the time it takes for society to learn about an invention has been shrinking by about 2.5 years every decade.

While this may be true, it’s not proven by looking at Google Books. For example, ask yourself these questions: what was the rate of literacy in the early 19th-century? How many books did people read (or have read to them) in the early 19th-century compared to the turn of the 20th century? What was the difference between the rate of dissemination of information in the two time periods? How about the rate of publishing? And what exactly qualifies as technology – farm equipment or fMRI machines? Or does it have to be more closely related to culture and Culturomics – like Facebook?

And most importantly, are books the best way to measure the cultural impact of an idea or technology? The fact is that the system can not really conduct quantitative checks on popular perceptions. But it can make you think it can.

So GNV has a long way to go. I hesitate to say that they will get there because Google does not really have an interest in offering this kind of service to the public (I didn’t see any ads on the GNV page, did you?). While it may be fun to play around with GNV, I would advise against drawing any (serious) conclusions from what it spits out. Below are some other searches I ran. Again, feel free to draw your own conclusions about how these terms and the things they describe relate to human culture.

Notice how the middle initial of some presidents complicates things in the above search. It would be nice to be able to combine the frequencies for “John Fitzgerald Kennedy”, “John F Kennedy”, “John Kennedy”, and “JFK” into one line, and exclude hits like “John S Kennedy” from the results completely, but that’s not possible. You could, however, search GNV for the different ways to refer to President Kennedy and see the differences, for whatever that will tell you.
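If you had the per-year frequencies for each variant in hand, however you got them, merging them into one line is simple arithmetic. Here is a minimal Python sketch of that idea. The function name `combine_variants` is my own, and the numbers are invented toy values, not real Ngram Viewer output.

```python
from collections import defaultdict


def combine_variants(series_by_variant, exclude=()):
    """Sum per-year frequencies across name variants, skipping excluded ones."""
    combined = defaultdict(float)
    for variant, series in series_by_variant.items():
        if variant in exclude:
            continue  # e.g. a different person who happens to match the pattern
        for year, freq in series.items():
            combined[year] += freq
    return dict(sorted(combined.items()))


# Invented toy numbers for illustration only, NOT real Ngram Viewer data.
toy = {
    "John F Kennedy": {1960: 2.0, 1970: 4.0},
    "John Kennedy": {1960: 1.0, 1970: 1.5},
    "JFK": {1970: 0.5},
    "John S Kennedy": {1960: 0.25, 1970: 0.25},
}

combined = combine_variants(toy, exclude={"John S Kennedy"})
print(combined)  # {1960: 3.0, 1970: 6.0}
```

Even with one tidy line per president, the result would still only tell you about mentions in books.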

Noam Chomsky has had a bigger effect on our culture than Audrey Hepburn, Bob Marley, and Brad Pitt? You be the judge!

Smithers

January 19, 2012 · Joe McVeigh · Leave a comment

Last week, Google unveiled Search plus Your World, their latest attempt to make the Internet shittier. Search plus Your World “helps you find personal results that are relevant to you.” What this means is that (when you are signed into your Google account), Google sorts their search results to more accurately reflect what they think you want to know. Which means that Google is trying to tell you what you want to hear. Cute, but not helping.

Eli Pariser coined the term “Filter Bubble” to describe the ways that some online designs can overreach and become detrimental to user experience. This is most evident in the ways that Google orders its search results and how Facebook decides which of your friends are important to you. I highly recommend viewing Pariser’s TED talk on Filter Bubbles.

Based on your physical location, Internet history, and now your Google+ friends, your search results in Google will be different from anyone else’s, even if they are sitting right next to you. Facebook, for its part, will do things like remove friends from your news feed if you don’t click on the articles and pictures they post often enough.

Both of these things suck. Here’s why: Think about when you ask someone a question. Would you rather they gave you an honest answer, or would you rather they told you what they thought you wanted to hear? It’s the Smithers Predicament. Google could give equal value to their results, but they choose not to. When you’re shopping for shoes, it’s harmless. When you want information about the world, it’s ridiculous. Pariser, in his TED talk, shows screen caps from two friends’ search results on Egypt. The more left-leaning friend got results about the Arab Spring. The more right-leaning friend got results about travel and accommodations. Why? Because Google would rather be Waylon Smithers than Professor Frink.

The Federal Trade Commission has added Google to an antitrust investigation to see whether Google is unfairly promoting its services since results now feature Google+ hits more prominently. I’m not sufficiently versed in antitrust law to speak to that, but I can say that Google’s actions have made me seek out new ways to find information on the Internet. I used to think of Google as the index to an encyclopedia and I’m pretty sure most people feel that way. Now I realize it’s just a Yes Man.

There are ways to get out of the Filter Bubble, but it’s not easy. Pariser offers some tips, while Duck Duck Go is a whole search engine dedicated to not tracking or bubbling you. Their take on the Filter Bubble and its problems is much cooler than this article (check it).

The Internet is one of the single greatest human accomplishments and Google helps millions of people every day. But a bit of skepticism still goes a long way.

[Update – Jan. 26, 2012] Google sees your concern and raises you a No Opt Out. The “Don’t be evil” company is placing even more of its services under a privacy policy that allows them to share information about you. Why would they want to do this? Google’s reason: so they can offer you a better user experience, dear. The real reason: so they can make more money, dummy.

So… don’t be evil, just greedy?

There are a few things to do, besides what I mentioned earlier in this post. You could close your Google account. Also, if your rad-ass blog is on Blogger, like mine is, you could move it to another host, like WordPress or DreamHost or any other one that isn’t out there sucking at not being evil.

[Update #2 – Jan. 26, 2012] Just to be sure, I was wrong when I said Google would rather be like Smithers. It’s pretty clear they are aspiring to be Monty Burns. Also, that’s it for Simpsons references. Promise.
