The language of the environment: climate change vs global warming

Peter Friederici, in a recent article in the Bulletin of the Atomic Scientists, reminds us that “the language used to characterize the climate problem is far more important than is generally recognized”. Mr. Friederici’s article links to a CBS piece which states things more bluntly:

If you’re trying to get someone to care about the way the environment is changing, you might want to refer to it as “global warming,” rather than “climate change,” according to a new study

The idea is that global warming sounds more dire than climate change. Global warming is more likely to inspire people to do something drastic or to push their government to take major steps, while climate change suggests a problem that only minor steps can solve. So tree-hugging liberals will want to use global warming to fire up their base, while the term climate change is more amenable to the conservative approach of letting the free market sort things out. This idea has been floating around for just over ten years. It was inspired by the American political pollster Frank Luntz. While consulting for the Republican Party in 2002, Luntz wrote a memo to President George W. Bush’s staff which read in part:

It’s time for us to start talking about “climate change” instead of global warming […] “Climate change” is less frightening than “global warming.” […] While global warming has catastrophic connotations attached to it, climate change suggests a more controllable and less emotional challenge.

Similar ideas about the differences between these seemingly synonymous terms have been raised in other news outlets. The two articles above also report the results of the Yale Project on Climate Change Communication, which found that:

the term “global warming” is associated with greater public understanding, emotional engagement, and support for personal and national action than the term “climate change.” […] Our findings strongly suggest that the terms global warming and climate change are used differently and mean different things in the minds of many Americans.

The report also says that:

Americans are four times more likely to say they hear the term global warming in public discourse than climate change.

The crucial element missing from all of these news articles and reports is any actual data about how often these terms are used. So let’s see if we can find that out.

Easier said than done

There are a few things to think about before we get started with the data. First, although Luntz’s recommendations were informed by his discussions with voters, we don’t know if President Bush or the Republican Party actually listened to him. Reporting that Republicans were advised to use climate change instead of global warming doesn’t mean that they actually did so. Perhaps we don’t know because Bush seems to have used neither term much. He didn’t use either one in his debates with Democratic presidential candidate John Kerry, and he used the phrase global climate change only once in each of his 2007 and 2008 State of the Union addresses:

And these technologies will help us be better stewards of the environment, and they will help us to confront the serious challenge of global climate change. – George W. Bush, State of the Union 2007

The United States is committed to strengthening our energy security and confronting global climate change. – George W. Bush, State of the Union 2008

So it’s hard to report on something happening when it didn’t happen. Ironically, Kerry used global warming once in his debate in St. Louis and twice in Coral Gables, so maybe he also got Luntz’s memo?
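Counting how often a politician uses a term is easy to automate, with one trap worth knowing about. The sketch below is a minimal Python illustration, not the method used for this post: `count_phrase` is a hypothetical helper, and the only data it sees are the two State of the Union sentences quoted above, not the full addresses.

```python
import re

def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a multiword phrase."""
    # \b stops a word from matching inside a longer word; \s+ tolerates
    # line breaks or double spaces between the words of the phrase.
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Only the two sentences quoted above (hypothetical variable names).
sotu_2007 = ("And these technologies will help us be better stewards of the "
             "environment, and they will help us to confront the serious "
             "challenge of global climate change.")
sotu_2008 = ("The United States is committed to strengthening our energy "
             "security and confronting global climate change.")

for year, text in [("2007", sotu_2007), ("2008", sotu_2008)]:
    print(year,
          count_phrase(text, "global climate change"),
          count_phrase(text, "climate change"),
          count_phrase(text, "global warming"))
```

Note that a search for climate change also matches inside global climate change, so each sentence scores 1 for both phrases. Whether that overlap should count is a decision to make before comparing terms.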

The second thing to think about is that reporting that Americans claim they hear global warming more often than climate change doesn’t mean that they actually do. People are really bad at accurately reporting things like this. For example, before I present the data to you, I want you to ask yourself which term you think is more common on various American news outlets. Based on the information above, do you think Fox News uses global warming or climate change more often? How about NPR and MSNBC? We’ll see whether the numbers back you up in a bit.

Finally, I’m going to take my data from the Corpus of Contemporary American English (COCA), which is a 450 million word database of speech and writing that is “suitable for looking at current, ongoing changes in the language”. I wrote about why it is better to use corpora like COCA instead of the Google N-gram viewer here.

Crunching the numbers

Let’s first see how common each of these terms is. COCA allows us to split up our data into different genres depending on where the texts come from – Spoken, Fiction, Magazine, Academic, and Newspaper – so we can look at only the genres we are interested in. For the purposes of this blog post, I’m going to look at news texts, magazine texts and spoken language data. We could also look at academic genres, but that might be problematic since according to the CBS article “Scientists have largely started using the term climate change because it more accurately describes the myriad changes to the climate […] while global warming refers to a single phenomenon.” So academics are very particular in the terms they use (seriously, we write whole sections of our theses just to define our terms and we love doing it).

Climate change
SECTION ALL SPOKEN MAGAZINE NEWSPAPER
FREQ 3136 806 1510 820
PER MIL 6.77 8.43 15.8 8.94


Climate change
SECTION 1990-1994 1995-1999 2000-2004 2005-2009 2010-2012
FREQ 156 174 390 1541 883
PER MIL 1.5 1.68 3.79 15.1 17.01

Here we can see the raw count (FREQ) for climate change in the Spoken, Magazine, and Newspaper sections of COCA, as well as for the term in different time periods. This is basically the number of times that the term appears in each section. We also have the frequency per million words (PER MIL), which is a way of normalizing the various sections because they each have a different amount of total words. Looking at this more accurate stat, we can see that climate change is most common in the Magazine genre and that its usage (in all genres taken together) increases over time.

Global warming
SECTION ALL SPOKEN MAGAZINE NEWSPAPER
FREQ 4031 1063 1801 1147
PER MIL 8.68 11.12 18.85 12.51


Global Warming
SECTION 1990-1994 1995-1999 2000-2004 2005-2009 2010-2012
FREQ 519 375 763 1854 520
PER MIL 4.99 3.63 7.41 18.17 10.02

Here we have the same stats for global warming. They show that the term is more common in all of the genres and time periods, except for 2010–2012, when the normalized frequency drops down to 10.02. In the same time period, the frequency for climate change is 17.01. Conservatives are winning!
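The period-by-period comparison can be checked mechanically from the two time-period tables. A minimal sketch, using only the per-million figures printed above (the helper name `leading_term` is my own):

```python
# Per-million frequencies copied from the two COCA time-period tables above.
periods = ["1990-1994", "1995-1999", "2000-2004", "2005-2009", "2010-2012"]
climate_change = [1.5, 1.68, 3.79, 15.1, 17.01]
global_warming = [4.99, 3.63, 7.41, 18.17, 10.02]

def leading_term(cc: float, gw: float) -> str:
    """Name the term with the higher normalized frequency in one period."""
    return "climate change" if cc > gw else "global warming"

for period, cc, gw in zip(periods, climate_change, global_warming):
    ratio = gw / cc  # above 1 means global warming is the more frequent term
    print(f"{period}: {leading_term(cc, gw):15} (GW/CC = {ratio:.2f})")
```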

Not so fast, tiger. We still don’t know who is using these words. Remember that global warming only refers to one of the many changes happening to our planet. Maybe those in the media picked up on this and started using climate change where it was more appropriate. So let’s cut up the genres.

Didn’t you get the memo?

So President Bush hardly used climate change or global warming. But perhaps this idea that the opposing sides of the debate should use different terms has filtered down to the talking heads on TV. Recall the finding that Americans believe they hear global warming more often than climate change in public discourse; we can look at the Spoken section of the corpus to check this claim. Here is where you can check your guesses about which term is more common on various news outlets. Below are the frequencies for climate change in the different sections of the Spoken corpus.

Climate change
Spoken # PER MILLION # TOKENS # WORDS
FOX 19.51 123 6,302,918
NPR 18.45 321 17,399,724
PBS 12.1 80 6,612,202
CNN 5.37 111 20,656,861
NBC 4.41 28 6,348,632
MSNBC 3.68 3 814,156
CBS 3.41 44 12,887,290
ABC 3.29 51 15,514,463
Indep 0.23 1 4,343,343

So climate change occurs about 19 times per million words on Fox News and fewer than 4 times per million words on MSNBC. # TOKENS refers to the actual number of times the term appears in each subsection, while # WORDS refers to how many words make up each subsection.
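The PER MILLION column is just the token count divided by the size of the subsection, multiplied by one million. A small Python sketch (the function name `per_million` is hypothetical) recomputes the climate change column from the # TOKENS and # WORDS figures in the table:

```python
# Token counts and subsection sizes (in words) from the COCA Spoken table above.
climate_change_tokens = {
    "FOX": 123, "NPR": 321, "PBS": 80, "CNN": 111, "NBC": 28,
    "MSNBC": 3, "CBS": 44, "ABC": 51, "Indep": 1,
}
subsection_words = {
    "FOX": 6_302_918, "NPR": 17_399_724, "PBS": 6_612_202,
    "CNN": 20_656_861, "NBC": 6_348_632, "MSNBC": 814_156,
    "CBS": 12_887_290, "ABC": 15_514_463, "Indep": 4_343_343,
}

def per_million(tokens: int, words: int) -> float:
    """Normalize a raw count to occurrences per million words."""
    return tokens / words * 1_000_000

for source, tokens in climate_change_tokens.items():
    rate = per_million(tokens, subsection_words[source])
    print(f"{source:6} {rate:6.2f}")
```

Recomputing the column this way reproduces the table's values to two decimal places. It also shows why small subsections deserve some caution: MSNBC contributes fewer than a million words, so a handful of tokens moves its rate a lot.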

Here are the same stats for global warming:

Global warming
Spoken # PER MILLION # TOKENS # WORDS
FOX 36.33 229 6,302,918
MSNBC 31.93 26 814,156
NPR 17.82 310 17,399,724
PBS 13.16 87 6,612,202
CNN 8.37 173 20,656,861
ABC 6.96 108 15,514,463
Indep 6.22 27 4,343,343
CBS 4.03 52 12,887,290
NBC 3.15 20 6,348,632

Interestingly enough, Fox News tops both lists. What’s strange, though, is that we would have expected a conservative/Republican news outlet like Fox to use climate change much more than global warming, but that is not the case (they really are fair and balanced!). NPR and PBS use the terms with almost equal frequency, while the commie pinkos over at MSNBC use global warming at a much higher rate than climate change (they’re coming for your guns too!).

Everybody chill

But hold on a second. What do these numbers really tell us? First, in terms of the spoken data in COCA, global warming really is more frequent. That doesn’t account for all of the language people hear every day, but it is representative of the public discourse they are likely to hear. Only NBC and NPR used climate change more often, and in NPR’s case only barely.
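Setting the two Spoken tables side by side makes the outlet-by-outlet comparison explicit. The sketch below, a hypothetical helper working only on the per-million rates already shown, lists the outlets where climate change is the more frequent term and ranks all outlets by the ratio of global warming to climate change:

```python
# Per-million rates copied from the two COCA Spoken tables above:
# (climate change, global warming) for each source.
rates = {
    "FOX": (19.51, 36.33), "NPR": (18.45, 17.82), "PBS": (12.1, 13.16),
    "CNN": (5.37, 8.37), "NBC": (4.41, 3.15), "MSNBC": (3.68, 31.93),
    "CBS": (3.41, 4.03), "ABC": (3.29, 6.96), "Indep": (0.23, 6.22),
}

def favors_climate_change(cc: float, gw: float) -> bool:
    """True when climate change is the more frequent of the two terms."""
    return cc > gw

cc_leaning = sorted(src for src, (cc, gw) in rates.items() if favors_climate_change(cc, gw))
print(cc_leaning)  # ['NBC', 'NPR']

# Rank every source by how strongly it prefers global warming (ratio above 1).
for src, (cc, gw) in sorted(rates.items(), key=lambda item: item[1][1] / item[1][0], reverse=True):
    print(f"{src:6} GW/CC = {gw / cc:5.2f}")
```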

While we can say that the issue of climate change or global warming seems to feature more prominently on Fox News compared to CBS or ABC, we don’t really have a way of saying how these terms are used on any channel.

For that we have to look at the concordances (the passages from the texts where our search terms appear). There we can see things like Fox News’s Sean Hannity saying:

Al Gore has a financial stake in spreading global warming hysteria…
 
Al Gore’s friends in the liberal media jumped on the global warming bandwagon…
 
And finally tonight, Al Gore’s global warming manipulation isn’t just affecting food prices…

Could it be possible that Fox News uses global warming in its scare tactics and/or liberal bashing?

We can compare this with Hannity’s use of climate change:

the University of Alaska at Fairbanks used 50,000 stimulus dollars to send 11 students to Copenhagen for the failed climate change conference…
 
Jones findings have been used for years to bolster the U.N.’s findings on climate change….
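The concordance lines quoted above are what corpus tools call keyword-in-context (KWIC) displays: each hit of the search term is shown with a window of surrounding text. COCA’s interface produces these for you; the Python below is only a minimal sketch of the idea, with a hypothetical `kwic` function and a sample string stitched together from the Hannity snippets above.

```python
import re

def kwic(text: str, term: str, width: int = 30) -> list[str]:
    """Return keyword-in-context lines: `width` characters either side of each hit."""
    lines = []
    for match in re.finditer(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the search term lines up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = ("Al Gore has a financial stake in spreading global warming hysteria. "
          "Jones findings have been used for years to bolster the U.N.'s "
          "findings on climate change.")
for line in kwic(sample, "global warming") + kwic(sample, "climate change"):
    print(line)
```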

But this is probably nitpicking and it misses the larger point. The words around global warming and climate change say more about their meaning than anything else. We know how Sean Hannity feels about climate change. He says so right here:

HANNITY: Carol, I love you. You’re a great liberal. You defend your side well. If it is hot, it is global warming. If it is cold, it is global warming. If it rains, it’s global warming. If it hails, it is global warming.
 
CAROLINE HELDMAN: Gingrich and Romney are both saying that climate change is happening, are you behind them on this one?
 
HANNITY: I disagree. I don’t think the science is conclusive. Now, I do believe man has an impact on the environment. I want clean air. I want clean water. I want to leave a good planet for our kids and grandkids. But I’m not going to buy lies that are perpetrated by people […] with a political agenda.

I can’t tell if that last line was tongue in cheek, but Hannity seems to opt for another message that was in Luntz’s memo: stress that the scientific jury is still out on global warming. This has also become a conservative talking point. Obviously, the science is firmly in favor of man-made climate change, but even if we replace climate change with global warming in any of the quotes from Sean Hannity, the meaning will not change. The same goes for any of the news outlets above, because the difference between these two terms is not that vast. We can all think of pairs of terms that mean roughly the same thing but are not perfectly interchangeable, and climate change and global warming are one such pair. (To his credit, Frank Luntz realizes the complex nature of language, and his advice to President Bush on how to talk about environmental issues was nuanced and erudite.)

The idea here is to make sure not to put the cart before the horse. Frank Luntz advised President Bush to start using climate change instead of global warming as one way to swing the environmental issue in the Republicans’ favor. This idea would presumably trickle down to other Republicans in the government and to members of the media sympathetic to Republican views. So the first step would be to look at whether the frequency of climate change rose above that of global warming. Judging from the data in COCA, I would say this is not what happened, at least not in the years right after the memo. Global warming was already more common than climate change before Luntz issued his memo to President Bush, and both terms were on the rise. Luntz’s advice could well have been a contributing factor to climate change’s gain in usage, but it is certainly not the only one. And global warming is still more common on major American news outlets.

I don’t doubt that the terms have a difference in meaning for many people. No matter how small, there is always some semantic difference between even the closest of synonyms. These differences in meanings are based on many different factors, such as the hearer’s education, social background, nationality, familiarity with the speaker, and the context of the situation. What this boils down to is that it doesn’t matter what we call global warming. Focusing on who uses what term misses the point, even if people have more emotional reactions to one term or the other. Climate change is happening and all that matters is that we do something about it.

In the next post, I’ll do a more in-depth quantitative analysis of President Bush’s use of these terms. I’ll also look at the problems with reporting Google Search statistics in research on language, which was a method employed by the Yale Project on Climate Change Communication (the same project that studied people’s feelings about the terms).

Analyzing language – You’re doing it wrong

Dan Zarrella, the “social media scientist” at HubSpot, has an infographic on his website called “How to: Get More Clicks on Twitter”. In it he analyzes 200,000 link-containing tweets to find out which ones had the highest clickthrough rates (CTRs), which is another way of saying which tweets got the most people to click on the link in the tweet. Now, you probably already know that infographics are not the best form of advice, but Mr. Zarrella did a bit of linguistic analysis, and I want to point out where he went wrong so that you won’t be misled. It may sound like I’m picking on Mr. Zarrella, but I’m really not. He’s not a linguist, so any mistakes he made are simply due to the fact that he doesn’t know how to analyze language, and nor should he be expected to.

But there’s the rub. Analyzing the language of your tweets, your marketing, your copy, and your emails is how you find out what language works better for you, so it is extremely important that you do the analysis right. To use a bad analogy, I could tell you that teams wearing the color red have won six out of the last ten World Series, but that’s probably not information you want if you’re placing your bets in Vegas. You’d probably rather know who the players are, wouldn’t you?

Here’s a section of Mr. Zarrella’s infographic called “Use action words: more verbs, fewer nouns”:

Copyright Dan Zarrella

That’s it? Just adverbs, verbs, nouns, and adjectives? That’s only four parts of speech. Your average linguistic analysis is going to be able to differentiate between at least 60 parts of speech. But there’s another reason why this analysis really tells us nothing. The word less is an adjective, adverb, noun, and preposition; run is a verb, noun, and adjective; and check, a word which Mr. Zarrella found to be correlated with higher CTRs, is a verb and a noun.
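To see why counting parts of speech by word form alone is shaky, here is a toy illustration in Python. The lexicon `POSSIBLE_TAGS` is hypothetical and contains only the three words discussed above; the example tweet is made up. A real tagger such as CLAWS uses the surrounding words to pick one tag per token, which is exactly what a word-list approach cannot do.

```python
# Toy lexicon built only from the examples in the paragraph above;
# a real tagger would use a full dictionary plus the surrounding context.
POSSIBLE_TAGS = {
    "less": {"adjective", "adverb", "noun", "preposition"},
    "run": {"verb", "noun", "adjective"},
    "check": {"verb", "noun"},
}

def ambiguous_words(tokens: list[str]) -> list[str]:
    """Return the tokens whose part of speech cannot be read off the word alone."""
    return [t for t in tokens if len(POSSIBLE_TAGS.get(t.lower(), set())) > 1]

tweet_tokens = ["Check", "this", "out", "run", "less", "earn", "more"]  # made-up tweet
print(ambiguous_words(tweet_tokens))
```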

I don’t really know what to draw from his oversimplified picture. He says, “I found that tweets that contained more adverbs and verbs had higher CTRs than noun and adjective heavy tweets”. The image seems to show that tweets that “contained more adverbs” had 4% higher CTRs than noun heavy tweets and 5-6% higher CTRs than adjective heavy tweets. Tweets that “contained more verbs” seem to have slightly lower CTRs in comparison. But what does this mean? How did the tweets contain more adverbs? More adverbs than what? More than tweets which contained no adverbs? This doesn’t make any sense.

The thing is that it’s impossible to write a tweet that has more adverbs and verbs than adjectives and nouns. I mean that. Go ahead and try to write a complete sentence that has more verbs in it than nouns. You can’t do it because that’s not how language works. You just can’t have more verbs than nouns in a sentence (with the exception of some one- and two-word-phrases). In any type of writing – academic articles, fiction novels, whatever – about 37% of the words are going to be nouns (Hudson 1994). Some percentage (about 5-10%) of the words you say and write are going to be adjectives and adverbs. Think about it. If you try to remove adjectives from your language, you will sound like a Martian. You will also not be able to tell people how many more clickthroughs you’re getting from Twitter or the color of all the money you’re making.

I know it’s easy to think of Twitter as one entity, but we all know it’s not. Twitter is made up of all kinds of people, who tweet about all kinds of things. While anyone is able to follow anyone else, people of similar backgrounds and/or professions tend to group together. Take a look at the people you follow and the people who follow you. How many of them do you know personally, and how many are in a similar business to yours? These people probably make up the majority of your Twitter world. So what we need to know from Mr. Zarrella is which Twitter accounts he analyzed. Who are these people? Are they on Twitter for professional or personal reasons? What were they tweeting about and where did the links in their tweets go – to news stories or to dancing cat videos? And who are their followers (the people who clicked on the links)? This is essential information to put the analysis of language in context.

Finally, what Mr. Zarrella’s analysis should be telling us is which kinds of verbs and adverbs equal higher CTRs. As I mentioned in a previous post, marketers would presumably favor some verbs over others. They want to say that their product “produces results” and not that it “produced results”. What we need is a type of analysis that can tell shit (noun and verb) from Shinola (just a noun). And this is what I can do – it’s what I invented Econolinguistics for. Marketers need to be able to empirically study the language that they are using, whether it be in their blog posts, their tweets, or their copy. That’s what Econolinguistics can do. With my analysis, you can forget about meaningless phrases like “use action words”. Econolinguistics will allow you to rely on a comprehensive linguistic analysis of your copy to know what works with your audience. If this sounds interesting, get in touch and let’s do some real language analysis (joseph.mcveigh (at) gmail.com).


Other posts on marketing and linguistics

How Linguistics can Improve your Marketing by Joe McVeigh

Adjectives just can’t get a break by Joe McVeigh

Adjectives just can’t get a break

Everyone loves verbs, or so you would be led to believe by writing guides. Zack Rutherford, a professional freelance copywriter, posted an article on .eduGuru about how to write better marketing copy. In it he says:

Verbs work better than adjectives. A product can be quick, easy, and powerful. But it’s a bit more impressive if the product speeds through tasks, relieves stress, and produces results. Adjectives describe, while verbs do. People want a product or service that does. So make sure you provide them with one. [Emphasis his – JM]

If you’re a copy writer or marketer, chances are that you’ve heard this piece of advice. It sort of makes sense, right? Well, as a linguist who studies marketing (and a former copy writer who was given this advice), I want to explain to you why it is misleading at best and flat out wrong at worst. These days it is very easy to check whether verbs actually work better than adjectives in copy. You simply take many pieces of copy (texts) and use computer programs to tag each word for its part of speech. Then you can see whether the better, i.e. more successful, pieces of copy use more verbs than adjectives. This type of analysis is what I’m writing my PhD on (marketers and copy writers, you should get in touch).
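The tag-and-compare procedure described above can be sketched in a few lines. This assumes the tagger’s output is in the horizontal word_TAG format that CLAWS can produce, treats any tag starting with V as a verb and any tag starting with JJ as an adjective, and excludes punctuation from the word count. The function name and the two tagged snippets are hypothetical, hand-tagged for illustration rather than real CLAWS output.

```python
def verb_minus_adjective(tagged_text: str) -> float:
    """Verb share minus adjective share, in percentage points, for one tagged text.

    Assumes horizontal "word_TAG" output; tags starting with "V" count as verbs,
    tags starting with "JJ" as adjectives, and punctuation tags are ignored.
    """
    tags = [tok.rsplit("_", 1)[1] for tok in tagged_text.split() if "_" in tok]
    word_tags = [tag for tag in tags if tag[:1].isalpha()]  # drop "." and "," tags
    if not word_tags:
        return 0.0
    verbs = sum(tag.startswith("V") for tag in word_tags)
    adjectives = sum(tag.startswith("JJ") for tag in word_tags)
    return (verbs - adjectives) / len(word_tags) * 100

# Hand-tagged, hypothetical snippets echoing Mr. Rutherford's examples.
copy_a = "Our_APPGE app_NN1 speeds_VVZ through_II tasks_NN2 and_CC relieves_VVZ stress_NN1 ._."
copy_b = "Our_APPGE quick_JJ ,_, easy_JJ and_CC powerful_JJ app_NN1 ._."
print(verb_minus_adjective(copy_a), verb_minus_adjective(copy_b))
```

Run over a set of copy texts with known performance figures, this gives one verb-minus-adjective score per text, which can then be compared against how well each text did.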

Don’t heed your own advice

So being the corpus linguist that I am, I decided to check whether Mr. Rutherford follows his own advice. His article has the following frequencies of usage for nouns, verbs, adjectives, and adverbs:

Nouns Verbs Adjectives Adverbs Word count
Total 275 208 135 90 1195
% of all words 23.01% 17.41% 11.30% 7.53%

Hooray! He uses more verbs than adjectives. The only thing is that those frequencies don’t tell the whole story. They would if all verbs were equal, but those of us who study language know that some verbs are more equal than others. Look at Mr. Rutherford’s advice again. He singles out the verbs speeds through, relieves, and produces as being better than the adjectives quick, easy, and powerful. Disregarding the fact that the first verb in there is a phrasal verb, what his examples have in common is that the verbs are all -s forms of lexical verbs (gives, takes, etc.) and the adjectives are all general adjectives (according to CLAWS, the part-of-speech tagger I used). This is important because a good copy writer would obviously want to say that their product produces results and not that it produced results. Or as Mr. Rutherford says, “People want a product or service that does” and presumably not one that did. So what do the numbers look like if we compare his use of -s form lexical verbs to general adjectives?

-s form of lexical verbs General adjectives
Total 24 135
% of all words 2.01% 11.30%

Uh oh. Things aren’t looking so good. Those frequencies exclude all forms of the verbs BE, HAVE, and DO, as well as modals and past tense verbs. So maybe this is being a bit unfair. What would happen if we included the base forms of lexical verbs (relieve, produce), the -ing participles (relieving, producing) and verbs in the infinitive (to relieve, it will produce)? The idea is that there would be positive ways for marketers to write their copy using these forms of the verbs. Here are the frequencies:

Verbs (base, -ing part., infin., and -s forms) General adjectives
Total 127 135
% of all words 10.63% 11.30%

Again, things don’t look so good. The verbs are still less frequent than the general adjectives. So is there something to writing good copy other than just “use verbs instead of adjectives”? I thought you’d never ask.
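The percentages in the three tables above are each count divided by the article’s 1,195-word total. The sketch below reproduces them. The grouping of CLAWS C7 tags (VVZ for -s forms, VV0 base forms, VVG -ing participles, VVI infinitives, JJ general adjectives) is an assumption about which tags the columns correspond to, and the names are hypothetical.

```python
WORD_COUNT = 1195  # total words in Mr. Rutherford's article, from the first table

# Assumed CLAWS C7 tags behind each column, with the counts reported above.
groups = {
    "-s form of lexical verbs": ({"VVZ"}, 24),
    "Verbs (base, -ing part., infin., -s forms)": ({"VV0", "VVG", "VVI", "VVZ"}, 127),
    "General adjectives": ({"JJ"}, 135),
}

def share(count: int, total: int = WORD_COUNT) -> float:
    """Percentage of all words, rounded to two decimals as in the tables."""
    return round(count / total * 100, 2)

for label, (tags, count) in groups.items():
    print(f"{label:45} {'/'.join(sorted(tags)):17} {count:4} {share(count):6.2f}%")
```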

Some good advice on copy writing

I wrote this post because the empirical research of marketing copy is exactly what I study. I call it Econolinguistics. Using this type of analysis, I have found that using more verbs or more adjectives does not relate to selling more products. Take a look at these numbers.

Copy text Performance Verbs – Adjectives
1 42.04 3.94%
2 11.82 0.63%
3 11.81 6.22%
4 10.75 -0.40%
5 2.39 3.21%
6 2.23 -0.78%
7 2.23 4.01%
8 1.88 1.14%
9 (baseline) 5.46%

These are the frequencies of verbs and adjectives in marketing texts ordered by how well they performed. The ninth text is the worst and the rest are ranked based on how much better they performed than this ninth text. The third column shows the difference between the verb frequency and adjective frequency for each text (verb % minus adjective %). If it looks like a mess, that’s because it is. There is not much to say about using more verbs than adjectives in your copy. You shouldn’t worry about it.
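One way to put a number on how messy the table above is would be a rank correlation between performance and the verb-minus-adjective difference. Spearman’s rho is my choice here, not part of the analysis described in this post, and text 9 is left out because it has no performance score of its own. A minimal, dependency-free Python sketch:

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share the average rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Texts 1-8 from the table above (text 9 is the baseline and has no score).
performance = [42.04, 11.82, 11.81, 10.75, 2.39, 2.23, 2.23, 1.88]
verbs_minus_adjectives = [3.94, 0.63, 6.22, -0.40, 3.21, -0.78, 4.01, 1.14]

rho = spearman(performance, verbs_minus_adjectives)
print(f"Spearman's rho = {rho:.2f}")
```

On these eight texts the result is about 0.22, a weak relationship that eight data points cannot distinguish from noise. That fits the point above: the verb/adjective balance on its own tells you little.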

There is, however, something to say about the combination of nouns, verbs, adjectives, adverbs, prepositions, pronouns, etc., etc. in your copy. The ways that these kinds of words come together (and the frequencies at which they are used) will spell success or failure for your copy. Trust me. It’s what Econolinguistics was invented for. If you want to know more, I suggest you get in touch with me, especially if you’d like to check your copy before you send it out (email: joseph.mcveigh(at)gmail.com).

In order to really drive the point home, think about this: if you couldn’t use adjectives to describe your product, how would you tell people what color it is? Or how big it is? Or how long it lasts? You need adjectives. Don’t give up on them. They really do matter. And so do all the other words.



Don’t Go Down the Google Books Garden Path

When Google’s Ngram Viewer was the topic of a post on Science-Based Medicine, I knew it was becoming mainstream. No longer content to be toyed with only by linguists killing time, the Ngram Viewer had entranced people from other walks of life. And I can understand why. Google’s Ngram Viewer is an impressive service that allows you to quickly and easily search for the frequency of words and phrases in millions of books. But I want to warn you about Google’s Ngram Viewer. As a corpus linguist, I think it’s important to explain just what Ngram Viewer is, what it can be used to do, how I feel about it, and what to make of the praise it has been receiving since its inception. I’ll start out simple: despite all its power and what it seems to be capable of, looks can be deceiving.

Have we learned nothing?

Jann Bellamy wrote a post at Science-Based Medicine about using Google’s Ngram Viewer (GNV) to research some terms used to describe the very unscientific practice of Complementary and Alternative Medicine (CAM). Although an article of this type is unusual for the SBM site, it does show how intriguing GNV can be. And Ms. Bellamy does a good job of explaining a few of the caveats of GNV:

The database only goes through 2008, so searches have to end there. Also, the searches have to assume that the word or phrase has only one definition, or perhaps one definition that dominates all others. We also have to remember that only books were scanned, not, for example, academic journals or popular magazines. Or blog posts, for that matter.

Ms. Bellamy then goes on to search for some CAM terms. After noting which terms are more common and when they started to rise in usage, she does a very good job of explaining the reasons that certain terms have a higher frequency than others. At the end, however, she is left with more questions than answers. Although she discovered that alternative medicine appears more frequently than complementary medicine in the Google Books database, and although she did further research (outside of Google Books) to explain why, she is still left right where she started. Just looking at the numbers from GNV, she can’t say what kind of impact CAM has had on our (English-speaking) world or culture. So what was the point of looking at GNV at all (besides the pretty colors)?

In her post, Ms. Bellamy links to an article in the New York Times by Natasha Singer. In what is essentially an exposition of GNV, with quotes from two of its founders, Ms. Singer places a lot more stock in the value and capability of the program. But from a corpus linguist’s perspective, she leaps to her conclusions a bit too quickly.

Ms. Singer’s article begins with the phrase “Data is the new oil” and then goes on to explain the comparison between these two words offered by GNV. She writes:

I started my data-versus-oil quest with casual one-gram queries about the two words. The tool produced a chart showing that the word “data” appeared more often than “oil” in English-language texts as far back as 1953, and that its frequency followed a steep upward trajectory into the late 1980s. Of course, in the world of actual commerce, oil may have greater value than raw data. But in terms of book mentions, at least, the word-use graph suggests that data isn’t simply the new oil. It’s more like a decades-old front-runner.

But with the Google Books corpus (the set of texts that GNV analyzes), we need to remember what the corpus contains, i.e. what “book mentions” means. This tells us how representative both the corpus and our analysis are. The Google Books corpus does not contain speech, newspapers, tweets, magazine articles, business letters, or financial reports. Sure, oil is important to our culture, and certainly to global and political history, but do people write books about it? We cannot directly extrapolate the findings from Google Books to Culture any more than we can tell people about the world of 16th Century England by studying the plays of Shakespeare. With GNV we can merely study the culture of books (or the culture of publishing). And there are many ways that GNV can mislead you. For example, are the hits in Ms. Singer’s search talking about crude oil, olive oil, or oil paintings? Google Ngrams will not tell you. Just for fun, here’s Ms. Singer’s search redone with some other terms. Feel free to draw your own conclusions.

Search for “data, oil, chocolate, love” on GNV. (Just to be clear, searching for oil_NOUN doesn’t change things much; oil as a verb is almost non-existent in the corpus. Take that as you will.)

Research casual

The second article I want to talk about comes from Ben Zimmer. While I don’t think Mr. Zimmer needs to be told anything that’s in this post, his article in The Atlantic gets to the heart of my frustration with GNV. It features a more complex search on GNV to find out which nouns modify the word mogul and how they have changed over the last 100 years. In the following passage, he alludes to the reality of GNV without coming right out and saying it.

It’s possible to answer these questions using the publicly available corpora compiled by Mark Davies at Brigham Young University, but the peculiar interface can be off-putting to casual users. With the Ngram Viewer, you just need to enter a search like “*_NOUN mogul” or “ragtag *_NOUN” and select a year range. It turns out that in 20th-century sources, media moguls are joined by movie moguls, real estate moguls, and Hollywood moguls, while the most likely things to be ragtag are armies, groups, and bands.

There are a few points to make about this. First, the interface of the publicly available corpora compiled by Mark Davies could be described as “peculiar”, but that’s only because it’s not aimed at the lowest common denominator. And that matters, because researchers can do so much more with Mark Davies’ corpora. While the interface isn’t immediately intuitive, it certainly isn’t hard to learn. As a bad comparison, think about the differences between Windows, OSX, and a Linux OS. Windows is the lowest common denominator – easiest to use and most intuitive. OSX and Linux, on the other hand, take a bit of getting used to. But how many of us have learned OSX or Linux and willingly gone back to Windows?

The second point is not so much about casual users as it is about casual searches. I think Mr. Zimmer is right to talk about casual users since it’s probable that most of the people who use GNV will be looking for a quick and easy stroll down the cultural garden path. But more to the point, I think he’s right to offer different types of moguls as a search example because that’s about as far as GNV will take you. Can you see which types of moguls people are talking about? No. How about which types of moguls are being used in magazines? Nope. Newspapers? Nuh-uh. You have to turn to one of Mark Davies’ corpora for that. In fact, less casual users are even able to access Google Books (and other corpora) via Mark Davies’ site, and this allows them to conduct more complex searches (For a much more detailed comparison of GNV and some of the corpora offered on Mark Davies’ site, see here). So again the question is what’s the point of looking at GNV at all?
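A query like *_NOUN mogul amounts to counting which nouns sit immediately before mogul. The sketch below does that over a tiny hypothetical tagged corpus in which every text carries a genre label, which is the extra dimension a corpus like COCA offers and GNV does not. Tags follow the CLAWS convention (NN* common nouns, NP* proper nouns); the corpus and the function name are made up for illustration, and this is not how either interface works internally.

```python
from collections import Counter, defaultdict

def noun_mogul_counts(tagged_corpus):
    """Count which nouns directly precede "mogul(s)", broken down by genre.

    `tagged_corpus` is a list of (genre, tokens) pairs, where each token is a
    (word, tag) tuple; tags starting with "NN" or "NP" count as nouns here.
    """
    counts = defaultdict(Counter)
    for genre, tokens in tagged_corpus:
        for (prev_word, prev_tag), (word, _) in zip(tokens, tokens[1:]):
            if word.lower() in {"mogul", "moguls"} and prev_tag.startswith(("NN", "NP")):
                counts[genre][prev_word.lower()] += 1
    return counts

# Hypothetical, hand-tagged mini corpus.
corpus = [
    ("MAGAZINE", [("the", "AT"), ("media", "NN1"), ("mogul", "NN1"), ("spoke", "VVD")]),
    ("NEWSPAPER", [("two", "MC"), ("movie", "NN1"), ("moguls", "NN2"), ("and", "CC"),
                   ("a", "AT1"), ("Hollywood", "NP1"), ("mogul", "NN1")]),
    ("NEWSPAPER", [("a", "AT1"), ("media", "NN1"), ("mogul", "NN1")]),
]
for genre, counter in noun_mogul_counts(corpus).items():
    print(genre, counter.most_common())
```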

Final thoughts – Almost right but not quite

All this picking on GNV is not without reason. Even though what the people at Google have done is truly impressive, we have seen that the practical use of GNV is limited. As the saying in corpus linguistics goes, “Compiling is only half the battle”. GNV does not offer users a way to really measure what they are (usually) looking for. As an example, a quote from Ms. Singer’s article will suffice:

The system can also conduct quantitative checks on popular perceptions. Consider our current notion that we live in a time when technology is evolving faster than ever. Mr. Aiden and Mr. Michel [two of GNV’s creators] tested this belief by comparing the dates of invention of 147 technologies with the rates at which those innovations spread through English texts. They found that early 19th-century inventions, for instance, took 65 years to begin making a cultural impact, while turn-of-the-20th-century innovations took only 26 years. Their conclusion: the time it takes for society to learn about an invention has been shrinking by about 2.5 years every decade.

While this may be true, it’s not proven by looking at Google Books. For example, ask yourself these questions: what was the rate of literacy in the early 19th century? How many books did people read (or have read to them) in the early 19th century compared to the turn of the 20th century? What was the difference between the rate of dissemination of information in the two time periods? How about the rate of publishing? And what exactly qualifies as technology – farm equipment or fMRI machines? Or does it have to be more closely related to culture and Culturomics – like Facebook?

And most importantly, are books the best way to measure the cultural impact of an idea or technology? The fact is that the system cannot really conduct quantitative checks on popular perceptions. But it can make you think it can.

So GNV has a long way to go. I hesitate to say that they will get there because Google does not really have an interest in offering this kind of service to the public (I didn’t see any ads on the GNV page, did you?). While it may be fun to play around with GNV, I would advise against drawing any (serious) conclusions from what it spits out. Below are some other searches I ran. Again, feel free to draw your own conclusions about how these terms and the things they describe relate to human culture.

Search for “blood, sugar, sex, magik”. Click here to see the results on GNV.
Search for “Bobby Fischer, Jay Z”. Click here to see the results on GNV.
Search for “Bobby Fischer, Jay Z, Eminem, Dr Dre, Run DMC, Noam Chomsky”. Click here to see the results on GNV.
Search for “Johnny Carson, Conan O’Brien, Jay Leno, David Letterman, Jimmy Kimmel, Jimmy Fallon, Big Bird, Saturday Night Live” (from the year 1950). Click here and then “Search lots of books” to see the results on GNV.
Search for “Superman, Batman, Wonder Woman, Buffy, King Arthur, Robin Hood, Hercules, Sherlock Holmes, Pele”. Click here to see the results on GNV.

Search for “Barack Obama, George Bush, Bill Clinton, Ronald Reagan, Richard Nixon, John * Kennedy, Dwight * Eisenhower, Harry * Truman, Franklin * Roosevelt, Abraham Lincoln, Beatles”. Click here to see the results on GNV.

Notice how the middle initial of some presidents complicates things in the above search. It would be nice to be able to combine the frequencies for “John Fitzgerald Kennedy”, “John F Kennedy”, “John Kennedy”, and “JFK” into one line, and exclude hits like “John S Kennedy” from the results completely, but that’s not possible. You could, however, search GNV for the different ways to refer to President Kennedy and see the differences, for whatever that will tell you.
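If you work with the downloadable Google Books Ngram data instead of the web viewer, merging the variants yourself is straightforward. Here is a minimal Python sketch, assuming you already have per-year counts for each way of naming Kennedy in a dictionary; every count below is an invented placeholder, not a real Google Books figure.

```python
from collections import Counter

# Hypothetical per-year counts for each variant (placeholder numbers only).
counts_by_variant = {
    "John Fitzgerald Kennedy": {1961: 120, 1962: 95},
    "John F Kennedy": {1961: 900, 1962: 870},
    "John Kennedy": {1961: 400, 1962: 380},
    "JFK": {1961: 50, 1962: 60},
    "John S Kennedy": {1961: 7, 1962: 4},  # a different person: leave it out
}

# The variants we decide refer to the president.
PRESIDENT_VARIANTS = {"John Fitzgerald Kennedy", "John F Kennedy", "John Kennedy", "JFK"}


def combine_variants(counts, keep):
    """Sum per-year counts across the variants in `keep` into one series."""
    combined = Counter()
    for variant, per_year in counts.items():
        if variant in keep:
            combined.update(per_year)  # Counter.update adds counts year by year
    return dict(sorted(combined.items()))


print(combine_variants(counts_by_variant, PRESIDENT_VARIANTS))
# {1961: 1470, 1962: 1405}
```

Even this only removes the variants you know about: a plain “John Kennedy” will still sweep up other people with that name.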
 
 
Search for “Brad Pitt, Audrey Hepburn, Noam Chomsky, Bob Marley”. Click here to see the results on GNV.

Noam Chomsky has had a bigger effect on our culture than Audrey Hepburn, Bob Marley, and Brad Pitt? You be the judge!

Unsurprisingly, corpus linguists have already answered your question

This post is a response to a corpus search done on another blog. Over on What You’re Doing Is Rather Desperate, Neil Saunders wanted to research how adverbs are used in academic articles, specifically the sentence adverb, or as he says, adverbs which are used “with a comma to make a point at the start of a sentence”. I’m not trying to pick on Mr. Saunders (because what he did was pretty great for a non-linguist), but I think his post, and the media reports on it, make a great excuse to write about the really, really awesome corpus linguistics resources available to the public. I’ll go through what Mr. Saunders did, and list what he could have done had he known about corpus linguistics.

Mr. Saunders wanted to know about sentence adverbs in academic texts so he wrote a script to download abstracts from PubMed Central. Right off the bat, he could have gone looking for either (1) articles on sentence adverbs or (2) already available corpora. As I pointed out in a comment on his post (which has mysteriously disappeared, probably due to the URLs I included in it), there are corpora with science texts from as far back as 1375 AD. There are also modern alternatives, such as the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC), both of which (and much, much more) are available through Mark Davies’ awesome site.

I bring this up because there are several benefits of using these corpora instead of compiling your own, especially if you’re not a linguist. The first is time and space. Saunders says that his uncompressed corpus of abstracts is 47 GB (!) and that it took “overnight” (double !) for his script to comb through the abstracts. Using an online corpus drops the space required on your home machine down to 0 GB. And running searches on COCA, which contains 450 million words, takes a matter of seconds.

The second benefit is a pretty major one for linguists. After noting that his search only looks for words ending in -ly, Saunders says:

There will of course be false positives – words ending with “ly,” that are not adverbs. Some of these include: the month of July, the country of Italy, surnames such as Whitely, medical conditions such as renomegaly and typographical errors such as “Findingsinitially“. These examples are uncommon and I just ignore them where they occur.

This is a big deal. First of all, the idea of using “ly” as a way to search for adverbs is profoundly misguided. Saunders seems to realize this, since he notes that not all words that end in -ly are adverbs. But where he really goes wrong, as we’ll soon see, is in disregarding all of the adverbs that do not end in -ly. If Saunders had used a corpus that already had each word tagged for its part of speech (POS), or if he had run a POS-tagger on his own corpus, he could have had an accurate measurement of the use of adverbs in academic articles. This is because POS-tagging allows researchers to find adverbs, adjectives, nouns, etc., as well as words that end in -ly – or even just adverbs that end in -ly. And remember, it can all be done in a matter of moments (even the POS tagging). You won’t even have time to make a cup of coffee, although consumption of caffeinated beverages is highly recommended when doing linguistics (unless you’re at a conference, in which case you should substitute alcohol for caffeine).
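To make the difference concrete, here is a toy comparison in Python. The sentence is invented and hand-tagged with CLAWS7-style tags, in which adverb tags begin with R (RR is a general adverb); in real work the tags would come from a tagger such as CLAWS, and punctuation tagging is simplified here. The sketch only shows that a suffix search and a tag search return different words.

```python
# A toy, hand-tagged sentence as (word, tag) pairs, using CLAWS7-style tags.
tagged = [
    ("Interestingly", "RR"), (",", ","), ("the", "AT"), ("patients", "NN2"),
    ("in", "II"), ("Italy", "NP1"), ("recovered", "VVD"), ("quickly", "RR"),
    ("in", "II"), ("July", "NPM1"), (".", "."),
    ("However", "RR"), (",", ","), ("the", "AT"), ("effect", "NN1"),
    ("was", "VBDZ"), ("small", "JJ"), (".", "."),
]


def suffix_search(tokens, suffix="ly"):
    """Naive approach: every word ending in the suffix counts as an adverb."""
    return [word for word, _ in tokens if word.lower().endswith(suffix)]


def tag_search(tokens, tag_prefix="R"):
    """POS approach: every word whose tag marks it as an adverb."""
    return [word for word, tag in tokens if tag.startswith(tag_prefix)]


def tag_and_suffix_search(tokens, suffix="ly", tag_prefix="R"):
    """Only adverbs that also end in the suffix."""
    return [w for w, t in tokens if t.startswith(tag_prefix) and w.lower().endswith(suffix)]


print(suffix_search(tagged))          # ['Interestingly', 'Italy', 'quickly', 'July']
print(tag_search(tagged))             # ['Interestingly', 'quickly', 'However']
print(tag_and_suffix_search(tagged))  # ['Interestingly', 'quickly']
```

The suffix search both over-counts (Italy, July) and under-counts (However), while the tag-based filter handles both cases and can still be narrowed to adverbs ending in -ly.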

Here is where I break from following Saunders’ method. I’d like to show you what’s possible with some of the publicly available corpora online, or how a linguist would conduct an inquiry into the use of adverbs in academia.

Looking for sentence-initial adverbs in academic texts, I went to COCA. I know the COCA interface can seem a bit daunting to the uninitiated, but there are very clear instructions (with examples) of how to do everything. Just remember: if confusion persists for more than four hours, consult your local linguist.

On the COCA page, I searched for adverbs coming after a period, that is, sentence-initial adverbs, in the Medical and Science/Technology texts in the Academic section (Click here to rerun my exact search on COCA. Just hit “Search” on the left when you get there). Here’s what I came up with:
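Conceptually, the search is simple: count words with an adverb tag that sit right after a period. A minimal Python sketch over CLAWS7-style (word, tag) pairs follows. It assumes a period alone marks a sentence boundary and treats the start of the text as one; it ignores question marks and exclamation points, and it is not how COCA implements its search. Multiword adverbs such as “for example” (REX21 REX22) are counted by their first word, which is why “for” can show up.

```python
from collections import Counter


def sentence_initial_adverbs(tokens, adverb_prefix="R"):
    """Count adverbs (tags starting with R) that directly follow a period."""
    counts = Counter()
    previous_word = "."  # treat the start of the text as a sentence boundary
    for word, tag in tokens:
        if previous_word == "." and tag.startswith(adverb_prefix):
            counts[word.lower()] += 1
        previous_word = word
    return counts


# Invented, hand-tagged example text.
tagged = [
    ("Finally", "RR"), (",", ","), ("we", "PPIS2"), ("stopped", "VVD"), (".", "."),
    ("For", "REX21"), ("example", "REX22"), (",", ","), ("rates", "NN2"),
    ("fell", "VVD"), (".", "."),
    ("Finally", "RR"), ("it", "PPH1"), ("ended", "VVD"), (".", "."),
]

print(sentence_initial_adverbs(tagged).most_common(10))
# [('finally', 2), ('for', 1)]
```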

Top ten sentence initial adverbs in medical and science academic texts in COCA.

You’ll notice that only one of the adverbs on this list (“finally”) ends in “ly”. That word also happens to be the top word on Saunders’ list. Notice also that the list above includes the kind of sentence adverbs that Saunders’ search deliberately excludes, namely those not ending in -ly, such as “for” and “in”, despite the examples of such adverbs given on the Wikipedia page that Saunders linked to in his post. (For those wondering, the POS-tagger treated these as parts of adverbial phrases, hence the “REX21” and “RR21” tags.)

Searching for only those sentence initial adverbs that end in -ly, we find a list similar to Saunders’, but with only five of the same words on it. (Saunders’ top ten are: finally, additionally, interestingly, recently, importantly, similarly, surprisingly, specifically, conversely, consequentially)

Top ten sentence initial adverbs ending in -ly in medical and science academic texts in COCA.

So what does this tell us? Well, for starters, my shooting-from-the-hip research is insufficient to draw any great conclusions from, even if it is more systematic than Saunders’. Seeing what adverbs are used to start sentences doesn’t really tell us much about, for example, what the journals, authors, or results of the papers are like. This is the mistake that Mr. Saunders makes in his conclusions. After ranking the usage frequencies of surprising by journal, he writes:

The message seems clear: go with a Nature or specialist PLoS journal if your results are surprising.

Unfortunately for Mr. Saunders, a linguist would find the message anything but clear. For starters, the relative use of surprising in a journal does not tell us that the results in the articles are actually surprising, but rather that the authors wish to present their results as surprising. And even that holds only if surprising is not preceded by something like Our results are not. This is another problem with Mr. Saunders’ conclusions – not placing his results in context – and it is something that linguists would research, perhaps by scrolling through the concordances using corpus linguistics software, or software designed exactly for the type of research that Mr. Saunders wished to do.
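A concordance (keyword-in-context, or KWIC) view is the standard way to check this kind of context. Below is a bare-bones Python sketch that lists every occurrence of a word with a few words on either side and flags a negator just before it. The negator list, window size, and sample sentences are assumptions for illustration; dedicated concordancers (AntConc, for example) do far more.

```python
import re

NEGATORS = {"not", "no", "never", "hardly"}  # assumed, deliberately short list


def kwic(text, keyword, window=4):
    """Return (left context, keyword, right context, negated?) for each hit."""
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = words[max(0, i - window):i]
            right = words[i + 1:i + 1 + window]
            # Flag the hit if one of the two preceding words is a negator.
            negated = any(w.lower() in NEGATORS for w in left[-2:])
            hits.append((" ".join(left), word, " ".join(right), negated))
    return hits


sample = ("These findings are surprising given earlier work. "
          "Our results are not surprising in light of the model.")

for left, kw, right, negated in kwic(sample, "surprising"):
    flag = "NEG" if negated else "   "
    print(f"{flag} {left:>30} [{kw}] {right}")
```

Only the second line gets flagged, which is exactly the distinction a raw frequency count of surprising cannot make.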

The second thing to notice about my results is that they probably look a whole lot more boring than Saunders’. Such is the nature of researching things that people think matter (like those nasty little adverbs), but professionals know really don’t. So it goes.

Finally, what we really should be looking at is how scientists use adverbs in comparison to other writers. I chose to contrast the frequencies of sentence-initial adverbs in the medical and science/technology articles with the frequencies found in academic articles from the (oft-disparaged) humanities. (Here is the link to that search.)
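One detail matters whenever subcorpora are compared: they differ in size, so raw counts are usually normalized to a rate per million words first. A quick Python illustration follows; the counts and corpus sizes are invented placeholders, not COCA figures.

```python
def per_million(count, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    return count / corpus_size * 1_000_000


# Hypothetical raw counts of one sentence-initial adverb in two subcorpora
# of different sizes (placeholder numbers, not real COCA data).
science = per_million(count=1_200, corpus_size=8_000_000)
humanities = per_million(count=900, corpus_size=4_500_000)

print(round(science, 1), round(humanities, 1))  # 150.0 200.0
```

With these made-up numbers, the raw counts favor the science texts, but the normalized rates favor the humanities texts.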

Top ten sentence initial adverbs in humanities academic texts in COCA.

Six of the top ten sentence initial adverbs in the humanities texts are also on the list for the (hard) science texts. What does this tell us? Again, not much. But we can get an idea that either the styles in the two subjects are not that different, or that sentence initial adverbs might be similar across other genres as well (since the words on these lists look rather pedestrian). We won’t know, of course, until we do more research. And if you really want to know, I suggest you do some corpus searches of your own because the end of this blog post is long overdue.

I also think I’ve picked on Mr. Saunders enough. After all, it’s not really his fault if he didn’t do as I have suggested. How was he supposed to know all these corpora are available? He’s a bioinformatician, not a corpus linguist. And yet, sadly, he’s the one who gets written up in the Smithsonian’s blog, even though linguists have been publishing about these matters since at least the late 1980s.

Before I end, though, I want to offer a word of warning. Although I said that anyone who knows where to look can and should do their own corpus linguistic research, and although I tried to keep my searches as simple as possible, I couldn’t have done them without my background in linguistics. Doing linguistic research on Big Data is tempting. But doing linguistic research on a corpus, especially one that you compiled yourself, can be misleading at best and flat out wrong at worst if you don’t know what you’re doing. The problem is that Mr. Saunders isn’t alone. I’ve seen other non-linguists try this type of research. My message here is similar to the one in my previous post, which was directed to marketers: linguistic research is interesting and it can tell you a lot about the subject of your interest, but only if you do it right. So get a linguist to do it or see if a linguist has already done it. If neither of these is possible, then feel free to do your own research, but tread lightly, young padawans.

If you’re wondering whether academia overuses adverbs (hint: it doesn’t) or just how much adverbs get tossed into academic articles, I recommend reading papers written by Douglas Biber and/or Susan Conrad. They have published extensively on the linguistic nature of many different writing genres. Here’s a link to a Google Scholar search to get you started. You can also have a look at the Longman Grammar, which is probably available at your library.

How Linguistics Can Improve Your Marketing

This post is intended to show what linguistics can offer marketing. I’ll be using corpus linguistics tools to analyze a few pieces of advice about how to write better marketing copy. The idea is to empirically test the ideas of what makes for more profitable marketing. But first, a quick note to the marketers. Linguists, please leave the room.

Note to marketers: Corpus linguistics works by annotating texts according to the linguistic features that one wishes to study. One of the most common ways is to tag each word for its part of speech (noun, verb, etc.) and that is what I’ve done here. Corpus linguistics generally works better on longer texts or larger banks of texts, since the results of the analysis become more accurate with more data. In this post I’m going to do a surface analysis of email marketing texts, which are each 250-300 words long, using corpus linguistic methods. If you’re interested in knowing more, please feel free to contact me (joseph.mcveigh@gmail.com). In fact, I really hope you’ll get in touch because I’ve tried again and again to get email marketers to work with me and come up with bupkis. I’m writing this post to show you exactly what I have to offer, which is something you won’t find anywhere else.

Welcome back, linguists. Here’s what I’ve done: I gathered ten email marketing texts and ranked them based on how well they performed, which I measured by dividing the number of units sold by the number of emails sent. I then ran each text through a part-of-speech tagger (CLAWS7). Now we’re ready for action.
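The ranking step is simple arithmetic. Here is a small Python sketch of it; the three campaigns and their numbers are placeholders, not the ten texts behind the charts.

```python
# Hypothetical campaign results: (text id, units sold, emails sent).
campaigns = [
    ("A", 42, 10_000),
    ("B", 15, 8_000),
    ("C", 60, 12_000),
]


def conversion_rate(units_sold, emails_sent):
    """Units sold per email sent, the performance measure described above."""
    return units_sold / emails_sent


# Highest conversion rate first, so rank 1 is the most successful text.
ranked = sorted(campaigns, key=lambda c: conversion_rate(c[1], c[2]), reverse=True)
for rank, (text_id, sold, sent) in enumerate(ranked, start=1):
    print(rank, text_id, f"{conversion_rate(sold, sent):.2%}")
```

Dividing by emails sent keeps a campaign that went out to a bigger list from looking better just because it was bigger.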

Let’s start with a few pieces of advice about how to write good marketing copy. I want to see if the successful and unsuccessful marketing texts show whether the advice really translates into better sales.

1. Don’t BE yourself

The first piece of advice goes like this: Don’t use BE verbs in your writing. This means copywriters should avoid is, are, was, were, etc. because these verbs apparently promote insanity (test results pending) and because “we never can reduce ourselves to single concepts”. If it sounds crazy, that’s because it is. And even the people who promote this advice can’t follow it (three guesses as to what the fourth word in the section on that page introducing this advice is). But let’s see what the marketing texts tell us. Who knows, maybe “to be or not to be” is the most memorable literary phrase in English because it’s actually really, really bad writing.

In the chart below, the percentage of BE verbs used in each text is listed (1 = most successful text). The differences seem pretty staggering, right?
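For readers wondering how a figure like this comes out of tagged text, here is a minimal Python sketch. It relies on the fact that in the CLAWS7 tagset every form of BE has a tag beginning with VB (VBZ is, VBR are, VBDZ was, VBDR were, and so on). It assumes punctuation is left out of the word count; the post doesn’t say how the denominator was computed, and the example sentence is invented.

```python
def percentage(tagged, is_target):
    """Share of word tokens (punctuation excluded) that satisfy is_target, in %."""
    words = [(w, t) for w, t in tagged if w[0].isalnum()]
    if not words:
        return 0.0
    hits = sum(1 for w, t in words if is_target(w, t))
    return 100 * hits / len(words)


def is_be(word, tag):
    # In CLAWS7, every form of BE has a tag beginning with "VB".
    return tag.startswith("VB")


text = [("This", "DD1"), ("shirt", "NN1"), ("is", "VBZ"), ("soft", "JJ"),
        ("and", "CC"), ("it", "PPH1"), ("was", "VBDZ"), ("cheap", "JJ"), (".", ".")]

print(f"{percentage(text, is_be):.1f}%")  # 25.0%
```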

Chart showing the percentage of be verbs in the marketing texts

Well, they would be staggering if I didn’t tell you that each horizontal axis line represents half a percentage point, or 0.5%. Now we can see that the differences between the texts, and especially between the best and worst texts, are practically non-existent. So much for being BE-free.

2. You can’t keep a good part of speech down

The second piece of advice is about the misuse of adjectives. According to some marketing/writing experts, copywriters should avoid using adjectives at all because “They are, in fact one of the worst [sic] elements of speech and even make a listener or reader lose trust”. Sounds serious. Except for the fact that linguists have long known that avoiding adjectives is not only bad advice but impossible to do, especially in marketing. How’s that? Well, first, this is another piece of advice which is given by people who can’t seem to follow it. But let’s say you’re trying to sell a t-shirt (or a car or a sofa or whatever). Now try to tell me what color it is without using an adjective. The fact is that different writing styles (sometimes called genres or text types), such as academic writing, fiction, or journalism, use adjectives to different extents. Some styles use more adjectives, some use fewer, but all of them use adjectives because (and I can’t stress this enough) adjectives are a natural and necessary part of language. So writers should use neither too many nor too few adjectives, depending on the style they are writing in.

But we’re here to run some tests. Let’s take the advice at face value and see if using fewer (or no) adjectives really means sales will increase.

Chart showing the percentage of adjectives in the marketing texts

Again, the differences in the results look drastic and again looks can be deceiving. In this case, the horizontal axis lines represent two percentage points (2%). The percentage of adjectives used in the three most successful and three least successful marketing texts is nearly identical. In fact, they are within two percentage points of each other. Another one bites the dust.
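A stretched chart axis is exactly how small differences end up looking big. Comparing the group averages directly sidesteps that. Here is a short Python sketch with invented numbers, not the values in the chart, that compares the three most and three least successful texts.

```python
from statistics import mean

# Hypothetical adjective percentages for ten texts, ordered from most (1)
# to least (10) successful. Placeholder values, not the data in the chart.
adjective_pct = [7.8, 8.4, 7.1, 9.9, 6.5, 8.8, 10.2, 7.5, 8.1, 7.6]

top_three = mean(adjective_pct[:3])
bottom_three = mean(adjective_pct[-3:])
gap = abs(top_three - bottom_three)

print(f"top 3: {top_three:.2f}%  bottom 3: {bottom_three:.2f}%  gap: {gap:.2f} points")
```

With these made-up values, the individual texts range from 6.5% to 10.2%, yet the top and bottom groups differ by only a few hundredths of a percentage point.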

UPDATE August 22, 2013 – I’d like to mention that the use of modifiers, such as adjectives, is a good way of showing the depth of my research and what it can really offer marketers. While we saw that adjectives in general, or as a class, do not tell us much about which marketing texts will perform better, there are other ways to look into this. For example, there may be certain types of adjectives common to the successful marketing texts, but not found in the unsuccessful ones. Likewise, the placement of an adjective and whether it is preceded by, say, a determiner (the, an, etc.), may also be indicative of more successful texts. And in a similar fashion, texts which use nouns as modifiers instead of adjectives may be more successful than those that do not. The important thing for marketers reading this to know is that I can research all of these aspects and more. It’s what I do.

3. It’s not all about you, you, you

The final piece of advice concerns the use of the word you, which is apparently one of the most persuasive words in the English language (see #24 on that page). Forget about the details on this one because I don’t feel like getting into why this is shady advice. Let’s just get right to the results.

Chart showing the percentage of the word you in the marketing texts

Does this chart look familiar? Once again, the horizontal axis lines represent half a percentage point. And once again, less than two percentage points separate the best and the worst marketing texts. In fact, the largest difference in the use of you between texts is 1.5 percentage points. That means that each one of the marketing texts I looked at – the good, the bad, and the in between – uses the word you at practically the same rate as the others. It would behoove you to disregard this piece of advice.
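Counting a single word is even simpler than counting a tag class. Here is a minimal Python sketch that computes the share of word tokens that are exactly you in each text, then the spread between the highest and lowest rate. The two snippets are made-up stand-ins for full texts, and counting only you (not your, yours, or you’ll) is an assumption of the sketch.

```python
import re


def you_percentage(text):
    """Percentage of word tokens that are exactly 'you' (case-insensitive)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    return 100 * sum(w.lower() == "you" for w in words) / len(words)


# Made-up snippets standing in for full marketing texts.
texts = {
    "best": "You will love it. We made it for you and your family.",
    "worst": "Get yours today. You deserve a coat that works for you.",
}

rates = {name: you_percentage(t) for name, t in texts.items()}
spread = max(rates.values()) - min(rates.values())
print({name: round(r, 2) for name, r in rates.items()}, f"spread: {spread:.1f} points")
```

Here the two snippets use you at 16.67% and 18.18% of their words, a spread of about 1.5 percentage points, the same kind of difference the chart shows.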

So what?

I’ll admit that I picked some low-hanging fruit for this post. But the point was not to shoot down marketing tips. The point was to show email marketers what corpus linguists (like me!) have to offer. Looking for specific words or adjectives is not the only thing that corpus linguistics can do. What if I could analyze your marketing and find a pattern among your more successful texts? Wouldn’t you like to know what it was so you could apply it when creating copy in the future? On the other hand, what if there wasn’t any specific pattern among the more successful (or less successful) texts? What if something besides your copy predicted your sales? Wouldn’t you like to know that as well so you could save time poring over your copy in the future?

Really, if you’re an email marketer, I think you should get in touch with me (joseph.mcveigh@gmail.com). I’m about to start my PhD studies, which means that all my knowledge and all that corpus linguistics has to offer could be yours.

How about letting me analyze – and probably finding an innovative way to improve – your marketing? Sound like a good deal? If so, contact me here: joseph.mcveigh@gmail.com.

Book review: Cross-cultural Pragmatics by Anna Wierzbicka

If you study linguistics, you will probably come across Anna Wierzbicka’s Cross-Cultural Pragmatics, perhaps as an undergrad, but definitely if you go into the fields of pragmatics or semantics. It’s a seminal work for reasons I will get into soon. The problem is that most of the data used to draw the conclusions are oversimplifications. This review is written for people who encounter this book in their early, impressionable semesters.

What’s it all about?

With Cross-cultural pragmatics, Wierzbicka was able to change the field of pragmatics for the better. Her basic argument runs like this: the previous “universal” rules of politeness that govern speech acts are wrong. The rules behind speech acts should instead be formulated in terms of culture-specific conversational strategies. Also, the mechanisms of speech acts are culture-specific, meaning that they reflect the norms and assumptions of a culture. Wierzbicka argues that language-specific norms of interaction should be linked to specific cultural values.

At the time Cross-cultural pragmatics was written, this needed to be said. There was more involved in speech acts than scholars were acknowledging. And the explanations used for speech acts in English were not entirely appropriate to explain speech acts in other languages or even other English-speaking cultures, although they were being used that way. So Wierzbicka gets credit for helping to advance the field of linguistics.

So what’s wrong with that?

The problem I have with this book is that Wierzbicka lays out a research method designed to avoid oversimplifications, but then oversimplifies her data to reach conclusions. Wierzbicka’s method in Cross-cultural pragmatics is what can be seen as a step in the development of semantic primes, which aims to explain all of the words in a language using a set of terms or concepts (do, say, want, etc.) that cannot be simplified, their meanings being innately understood and their existence being cross-cultural.

For example, Wierzbicka analyzes self-assertion in Japanese and English. She says that Japanese speakers DO NOT say “I want/think/like X”, while English speakers DO. She then translates the Japanese term enryo (restraint) like this:

X thinks: I can’t say “I want/think/like this” or “I don’t want/think/like this”
   Someone could feel bad because of this
X doesn’t say it because of this
X doesn’t do some things because of this

This is all fine and good, but you can probably see how such an analysis has the potential to unravel. Just taking polysemy and context into account means that each and every term must be thoroughly explained using the above system.

But whatever. Let’s just say that it’s possible to do so. Semantic primes are still discussed in academia and I’m not here to debate their usefulness. What I want to talk about is how Wierzbicka oversimplifies the language and cultures that she compares. Although there are many examples to choose from, I’ll only list a few that come in quick succession.


Those manly Aussies

In describing Australian culture, Wierzbicka says that “Shouting is a specifically Australian concept” (173). And yet she doesn’t explain how it is any different from buying a round or why this concept is “specifically Australian”. She then describes the Australian term dob in but does not tell us how it differs from snitch. Finally, she notes that Australians use the term whinge an awful lot. Whinge is used to bolster Wierzbicka’s claim that Australians value “tough masculinity, gameness, and resilience” and that they refer to British people as whingers.

First of all, how Wierzbicka misses the obvious similarities between whinging and whining is beyond me. She instead compares whinge to complain. Second, British people refer to other British people as “whingers”, so how exactly is whinge “marginal” in “other parts of the English-speaking world” (180)? Finally, wouldn’t using a negative term like whinge show more about the strained relations between the Australians and British than it would about any sort of heightened “masculine” Australian identity? Does stunad prove that Italian-Americans have a particular or peculiar dislike of morons compared to other cultures?

We should have used a corpus

In other parts of Cross-cultural pragmatics, Wierzbicka seems to be cherry-picking the speech acts that she uses to evaluate the norms and values of the cultures she compares. This can be seen from the following passage on the differences between (white) Anglo-American culture and Jewish or black American culture:

The expansion of such expressions [Nice to have met you, Lovely to see you, etc.] fits in logically with the modern Anglo-American constraints on direct confrontation, direct clashes, direct criticisms, direct ‘personal remarks’ – features which are allowed and promoted in other cultures, for example, in Jewish culture or in Black American culture, in the interest of cultural values such as ‘closeness’, ‘sponteneity’, ‘animation’, or ‘emotional intensity’, which are given in these cultures priority over ‘social harmony’.
This is why, for example, one doesn’t say freely in (white) English, ‘You are wrong’, as one does in Hebrew or ‘You’re crazy’, as one does in Black English. Of course some ‘Anglos’ do say fairly freely things like Rubbish! or even Bullshit!. In particular, Bullshit! (as well as You bastard!) is widely used in conversational Australian English. Phrases of this kind, however, derive their social force and their popularity partly from the sense that one is violating a social constraint. In using phrases of this kind, the speaker defies a social constraint, and exploits it for an expressive purpose, indirectly, therefore, he (sometimes, she) acknowledges the existence of this constraint in the society at large. (pp. 118–9)

Do we know that white Anglo-Americans don’t say “You are wrong” or that they say it less than Jewish people do? I heard a white person say it today, but that is just anecdotal evidence. Obviously, large representative corpora were not around to consult when Wierzbicka wrote Cross-cultural pragmatics, but it would be nice to see at least some empirical data points. Instead we’re left with just the assertion that black Americans’ “You’re crazy” and Anglo-Americans’ “Bullshit!” are not equal, which to me is confusing and misguided. Also, aren’t black people violating a social norm by saying “you’re crazy”?

Wierzbicka’s inability to consult a corpus (because there wasn’t one available at the time, granted) is why I am not consulting one right now, but just off the top of my head, I can think of other (common) expressions from both cultures that would say the exact opposite of what Wierzbicka claims. For example, as Pryor (1979) pointed out, whites have been known to say things like “Cut the shit!” How is this different from Black English’s “You’re crazy!”?

This leads me to the final major problem I have with Cross-cultural pragmatics: While classifications of speech acts based on “directness,” etc. were insufficient for the reasons that Wierzbicka points out, her classifications suffer from not being able to group similar constructions together, which is one of the goals in describing a large system such as language. They are too simplistic and specific to each construction. There are always certain constructions that don’t fit the mold that Wierzbicka lays out, which seems to me a similar problem to the one she’s trying to solve. So the problem gets shifted instead of solved.

Still, I think Wierzbicka was justified in changing the ways that researchers talked about speech acts. I also think she was right in shattering the Anglo-American and English language bias which was prevalent at the time. It’s those points that make Cross-cultural pragmatics an important work. The lack of empirical data and the over-generalizations are unfortunate, but so are lots of other things. Welcome to academia, folks.

 

 

 

Up next: Superman: The High-Flying History of America’s Most Enduring Hero by Larry Tye

Book review: Punctuation..? by User Design

This is by far the hippest book on punctuation I’ve ever read. That may sound strange, but I study linguistics, so I’ve read a few good books on punctuation.

Front and back covers of Punctuation..?

Punctuation..? intends to explain the “functions and correct uses of 21 of the most used punctuation marks.” I say “intends” because it’s always a toss-up with grammar books. Some people get very picky about what is verboten in written and spoken English. The problem is that when these people get bent out of shape one too many times, they start convincing publishers to bind their rantings and ravings.

But Punctuation..? takes a different approach. The slick, minimalist artwork matches the concise and reasonable explanations of punctuation marks. This book will not tell you that you’re going to die poor and lonely if you don’t use an Oxford comma. Instead it very succinctly explains what a comma is and how it is used.

According to the book’s website, Punctuation..? is for “a wide age range (young to ageing) and intelligence (emerging to expert).” As someone who probably resides on the more expert end of punctuation intelligence, or who at least doesn’t need to be told what an ellipsis is, I still found this book enjoyable for two reasons.

First, the explanations are not only easy to understand, they’re also correct. This is kind of important for educational books. While it was nice that the interpunct (·) and pilcrow (¶) were included, it was even better that the semicolon got some (well deserved) respect and that the exclamation point came with a word of caution.

Pages 34 and 35, which feature some semicolon love.

Second, although Punctuation..? is of more practical benefit to learners of English, it’s probably more of a joy to language enthusiasts because the book is actually funny. If a punctuation book has you laughing, I think that’s a good sign.

I guess the only problem I had with this book was its definition of a noun, which was a little too traditional for my tastes (you know the one). But I think that’s neither here nor there, since if you have another definition for a noun, you’re probably a linguist. And in that case you’ll just be glad to see such a cool book about punctuation aimed at a wide audience.

Check out the User design website for more info and links to where you can buy it.

 

 

Up next: A twenty-years-too-late look at a seminal work in pragmatics, Cross-cultural pragmatics: the semantics of human interaction by Anna Wierzbicka.

Meta Book Review: Reviews of Sampson’s The Language Instinct Debate

When I last left you*, we had just talked about how Geoffrey Sampson’s The Language Instinct Debate is a remarkable take-down of Steven Pinker’s The Language Instinct and the nativist argument, or the idea that language is genetic. I came down pretty hard on the nativists, whom I termed “Chomskers” (CHOMsky + PinKER + otherS), and rightly so, since their theory amounts to a bunch of smoke and mirrors. For this post, I’m going to review the reviews of Sampson’s book. It’ll be like what scholars call a meta-analysis, except nowhere near as lengthy or peer-reviewed. To make up for the absence of those, I promise more swear words. For those just joining us, here are my reviews of Pinker’s The Language Instinct and Sampson’s The Language Instinct Debate, the first two parts of this three-part series of posts. If you’re new to the subject matter (linguistic nativism), they’ll help you understand what this post is all about. If you already know all about Universal Grammar (and have read my totally bitchin’ reviews of the aforementioned books), then let’s get on with the show.

I know you are, but what am I?

Victor M. Longa’s review of The Language Instinct Debate

Longa’s review would be impressive if it weren’t written in classic Chomskers style. He seems to address Sampson’s book in a thoughtful, step-by-step way, but his arguments boil down to nothing more than “Sampson’s wrong because language is innate.” I know this sounds bad, but it’s the truth. A good example of Longa’s typical nativist style can be found here:

To sum up, S[ampson] tries, with difficulty, to explain the convergence between different languages by resorting only to the cultural nature of language. (Longa 1999: 338)

The disregard for other explanations is something to expect from the linguistic nativists. “You’re not considering that language is innate!” they protest. But innateness is all they consider. We must remember that linguistic nativism (or UG) is an unfalsifiable hypothesis. Any attempt to engage the theory logically, as Sampson has done, should be praised, given how much harm the proponents of the Universal Grammar Hypothesis (UGH) have done to the field of linguistics.

The belief that language is innate has become something more than an assumption to the nativists. This can be seen from Longa’s conclusion:

What is more, as I pointed out at the beginning of the paper, from the common-sense point of view, it is perfectly possible to conceive of a capacity such as language having been fixed in our species as a genetic endowment… (Longa 1999: 340)

It’s common sense, goddammit! What’s wrong with you people?! Why can’t everyone just see that something we have no evidence for is real? How many times do we have to say it? Language is innate. Never mind that it’s perfectly possible to conceive of just about anything (it’s called, you know, imagination), or that the arguments for linguistic nativism fall down easier than an elephant on ice skates; just trust us when we say that language is innate. OK?

Longa goes on about the innateness of language:

To deny this possibility a priori, claiming that it sounds almost mad, suggests a biased perspective that has little to offer to the scientific study of language.

Know what else has little to offer the scientific study of language (or the scientific study of anything, for that matter)? Unfalsifiable theories. That’s why linguistic nativism has been denied. Scientific hypotheses are accepted only so long as they stand up to the tests meant to falsify them. But first (and I can’t stress this enough) they have to be falsifiable, or they’re not scientific theories. Linguistic nativism has been considered for so long only because Chomskers won’t stop writing bullshit books about it and forcing it down students’ throats. My fellow budding scholars who had to write about UGH, I feel for you.

Longa’s review is followed by a reply from Sampson, which offers a simple way to see how unfalsifiable nativism is. Sampson quite rightly points out that the speed-of-acquisition argument made by Chomskers, which says that language is innate because children learn language remarkably fast, is ridiculous because Chomskers have never said how long it should take children to learn language in the absence of an innate UGH. They just say it’s innate and that kids learn language, like, really fast bro, and we’re supposed to take these claims as common-sense truth. This is par for the nativist course.

What he said

Stephen J. Cowley’s review of both books

Cowley’s review of both Pinker’s The Language Instinct and Sampson’s The Language Instinct Debate is a wonderful read and I want to quote the whole damn thing. While Cowley agrees that Sampson successfully refutes linguistic nativism, and that Pinker’s argument is akin to “saying that, because angels exist, miracles happen” (75), he rejects Sampson’s alternative account of the origin of language, a topic I have not addressed in these reviews. Fortunately, I don’t have to quote the whole paper because it’s available online. And you should go read it here:
http://www.psy.herts.ac.uk/pub/sjcowley/docs/baby%26bathwater.pdf (PDF).

John H. McWhorter’s review in Language

Like Cowley, McWhorter writes that Sampson successfully refutes Chomskers’ theory, saying that he “makes a powerful case that linguistic nativism […] has been grievously underargued, and risks looking to scientists in a hundred years like the search for phlogiston does to us now” (434). That’s putting it nicely, I think.

McWhorter raises concerns about some of Sampson’s methods, such as his discussion of hypotaxis and complexity, his refutation of Berlin and Kay’s classic color-term study, and his treatment of WH-movement. McWhorter also worries that since Sampson only covers Chomsky’s writings up to 1980, his take-down of linguistic nativism may not be as strong as could be hoped, because of the post-1980 development of the Principles and Parameters theory and minimalism (two theories which are meant to deal with, you guessed it, problems with linguistic nativism. Surprise!). While I agree that it would have been nice to see Sampson discuss these theories (since they have their own typical nativism problems), I don’t believe their absence is as critical as McWhorter claims. He questions Sampson’s justification for stopping at 1980, namely that there is nothing “solider to be pulled out of the bag” (Sampson 2005: 165), arguing that “certainly we would question a refutation of physics that used that justification to stop before string theory” (436). While I can see where he’s coming from, I think the bad analogy (which is something I’m pretty good at too) is particularly problematic here. Physics is founded on testable and falsifiable theories. Thanks to the contagious nature of nativism, linguistics these days is not.

What I especially like about McWhorter’s review is his acknowledgment that nativism has become something of a religion in linguistics. Commenting on the suspicious lack of response to Sampson’s book by nativists, McWhorter writes:

It may well be that Chomskyans harbor an argumentational firepower that would leave S[ampson] conclusively out-debated just as Chomsky’s detractors were in the 1960s and 1970s. But if such engagement is not even ventured, then claims that linguistic nativism is less a theory than a cult start looking plausible. (McWhorter 2008: 437)

Further Reading

This series of posts is by no means a review of all that has been said about UG or linguistic nativism. For those who wish to learn more, I suggest the following books.

The cultural origins of human cognition by Michael Tomasello

Tomasello’s book is a wonderful explanation of how children learn to speak and how human cognition does not need any innate language faculty. The theory he lays out has been called the Theory of Mind, which is an awful name, but it makes much more sense than anything I have ever read by nativists. Tomasello even has a few words for the nativists:

It is very telling that there are essentially no people who call themselves biologists who also call themselves nativists. When developmental biologists look at the developing embryo, they have no use for the concept of innateness. This is not because they underestimate the influence of genes – the essential role of the genome is assumed as a matter of course – but rather because the categorical judgment that a characteristic is innate simply does not help in understanding the process. (Tomasello 2000: 49)

If Chomskers’ theory left you shaking your head, and Sampson’s didn’t quite measure up, I highly recommend checking out Tomasello. As a bonus, this book is very much aimed at a wide audience, so three years of linguistics courses are not required.

What counts as evidence in linguistics, ed. by Martina Penke and Anette Rosenbach

This book is a collection of essays which address how the opposing camps in linguistics, formalism (or UG proponents) and functionalism, treat evidence in their research. The papers are excellent, not only because the authors are preeminent scholars in their fields, but also because each paper is followed by a response from an author of the opposing camp. Even better, the responses are followed by replies from the original author(s). It’s definitely on the hard-core linguistics side, so dabblers in this debate beware. As an example of what it contains, however, here is a link to Michael Tomasello’s response to one of the articles: http://www.eva.mpg.de/psycho/pdf/Publications_2004_PDF/what_kind_of_evidence_04.pdf (PDF). Not to toot his horn for him, but it really lays bare what scholars are up against when they attempt to engage nativists.

 

 

References

Cowley, Stephen J. 2001. “The baby, the bathwater and the ‘language instinct’ debate”. Language Sciences 23: 69–91. http://www.psy.herts.ac.uk/pub/sjcowley/docs/baby%26bathwater.pdf

Longa, Victor M. 1999. “Review article”. Linguistics 37(2): 325–343. http://dx.doi.org/10.1515/ling.37.2.325 (requires access to Linguistics).

McWhorter, John H. 2008. “The ‘language instinct’ debate (review)”. Language 84(2): 434–437. http://www.jstor.org/stable/40071054 or http://dx.doi.org/10.1353/lan.0.0008 (requires access to either JSTOR or Project MUSE).

Penke, Martina and Anette Rosenbach (eds.). 2007. What counts as evidence in linguistics: The case of innateness. Amsterdam & Philadelphia: John Benjamins. http://benjamins.com/#catalog/books/bct.7/main

Sampson, Geoffrey. 1999. “Reply to Longa”. Linguistics 37(2): 345–350. http://dx.doi.org/10.1515/ling.37.2.345 (requires access to Linguistics, but a “submitted” online version can be found on Sampson’s site here: http://www.grsampson.net/ARtl.html)

Tomasello, Michael. 2000. The cultural origins of human cognition. Cambridge: Harvard University Press.

 

 

Up next: Punctuation..? by User design.

 

 

* A long, long time ago, I know. But I decided to focus all my powers on writing my Master’s thesis, which meant this blog got the shaft. Now that’s done and we’re back in business, baby. Go back up for the sweet, sweet linguistic goodness.

Book Review: The Language Instinct Debate by Geoffrey Sampson

The following is a book review and the second post in a series. The first post discussed Steven Pinker’s The Language Instinct. This post discusses Geoffrey Sampson’s The Language Instinct Debate, which is a critique of Pinker’s book. The third post will discuss some of the critiques and reviews of Sampson’s book.

In a comment on the first post in this series, linguischtick (who has an awesome gravatar, by the way) pointed out that I didn’t mention two key points of the Chomskers (Chomsky + Pinker + their followers. Nom.) theory. As this post is about a book which is a direct “response to Steven Pinker’s The Language Instinct and Noam Chomsky’s nativism,” it would be good to remind ourselves of the claims that nativists make. Below are the claims along with some comments on them.

1. Speed of acquisition

Chomskyan linguists claim that kids learn language remarkably fast, so fast that it must be innate. But fast compared to what? How do we know kids don’t learn language very slowly? Chomskers have no answer. Sampson says this and then very cleverly points out that Chomsky has never supplied an amount of time it should take kids to learn language because “he argues that the data available to a language learner are so poor that accurate language learning would be impossible without innate knowledge – that is, no amount of time would suffice” (37, emphasis his).

2. Age dependence

Chomskers claim that the language instinct theory is supported by how our ability to learn a language diminishes greatly around puberty. Sampson quickly refutes this claim by showing how the evidence on which Chomskers based their claim fails “to distinguish language learning from any other case of learning” and that it is “perfectly compatible with the view that learning as a general process is for biological reasons far more rapid before puberty than later.” (41, emphasis his) So we see that leap of faith again. The evidence doesn’t suggest a language instinct, but that doesn’t stop Chomskers from jumping to that conclusion.

3. Poverty of the Stimulus

This is a major part of the Chomskers argument (and the only one that can be shortened into a perfectly applicable acronym – POS). Put simply, it goes like this: kids are not supplied with enough language info by their community to enable them to learn to speak. This is what Pinker was talking about when he snidely called Motherese – the style adults use when speaking to children – “folklore”. The poverty of the stimulus is a crazy idea, but don’t worry, it’s completely wrong. First, once linguists started researching Motherese, they found that it was much more “proper” than anyone had assumed. Sampson references one study that found “only one utterance out of 1500 spoken to the children was a disfluency.” (43) Chomskers also claim that some linguistic features never occur in spoken language and yet children learn the rules for them anyway. But wait a minute, have Chomskers ever looked for these mysterious linguistic features that supposedly never occur? Of course not. That’s not how they roll.

Sampson gives them a taste of their own medicine by writing:

‘Hang on a minute,’ I hear the reader say. ‘You seem to be telling us that this man [Chomsky] who is by common consent the world’s leading living intellectual, according to Cambridge University a second Plato, is basing his radical reassessment of human nature largely on the claim that a certain thing never happens; he tells us that it strains his credulity to think that this might happen, but he has never looked, and people who have looked find that it happens a lot.’
Yes, that’s about the size of it. Funny old world, isn’t it! (47)

Another aspect of this piece-of-shit poverty of the stimulus argument is the so-called lack of negative evidence. This idea claims that kids aren’t given evidence of which types of constructions are not possible in language. That leads one to wonder: how could children possibly learn which sentences to exclude as non-language? Sounds pretty interesting, huh? There must be a language instinct then, right? Sampson bursts Chomskers’ bubble:

The trouble with this argument is that, if it worked, it would not just show that language learning without innate knowledge is impossible: it would show that scientific discovery is impossible. We can argue about whether or not children get negative evidence from their elders’ language; but a scientist certainly gets no negative evidence from the natural world. When a heavy body is released near the surface of the Earth, it never remains stationary or floats upwards, displaying an asterisk or broadcasting a message ‘This is not how Nature works – devise a theory which excludes this possibility!’ (90)

4. Convergence of grammars

This claim asks how both smart and dumb people grow up speaking essentially the same language.
Except they don’t, so forget it. Other linguists – the kind that like evidence and observable data – have proven that people don’t speak the same.

5. Language universals

This is the idea that there are some structural properties which are found across every language in the world, even though there is no reason why they should be (since they’re not necessary to language). This is where Universal Grammar comes in. Sampson devotes a chapter to this broad argument and in one of the many parts that make this book an excellent read, he very cleverly takes the argument down by pointing out that universals are better evidence of the cultural development of language than they are of the biological, innate theory of language. Using a theory developed by Herbert Simon, Sampson shows that, basically, the structural dependencies that Chomskers are so fond of arose out of normal evolutionary development, because evolution favors hierarchical structure. Complex evolutionary systems – something Sampson argues language is – are hierarchically structured for a reason; that structure does not have to be innate.

If this is the crux of the language instinct argument, it’s almost laughable how easily it falls. As Sampson notes, even Chomskers don’t think it carries much weight.

Steven Pinker himself has suggested that nativist arguments do not amount to much. In a posting on the electronic LINGUIST List (posting 9.1209, 1 September 1998), he wrote: ‘I agree that U[niversal] G[rammar] has been poorly defended and documented in the linguistics literature.’ Yet that literature comprises the only grounds we are given for believing in the language universals theory. If the theory is more a matter of faith than evidence and reasoned argument even for its best-known advocate, why should anyone take it seriously? If it were not that students have to deal with this stuff in order to get their degrees, how many takers would there be for it? (166)

Even a blind squirrel finds a nut sometimes

The really sad thing is that Universal Grammar is the crux of the Chomskers argument. Sampson writes that “at heart linguistic nativism is a theory about grammatical structure.” (71) More importantly, it’s a theory that gathers all the “evidence” it thinks supports its beliefs and dismisses any that does not. It is Confirmation Bias 101.

But don’t take my word for it. Just before he knocks down the innatist belief that tree structures prove there’s a language instinct, Sampson points out that Chomskers don’t even know how to follow through with their own thoughts. He writes:

Ironically, though, having been the first to realize that tree structure in human grammar is a universal feature that is telling us something about how human beings universally function, Chomsky failed to grasp what it is telling us. The universality of tree structuring tells us that languages are systems which human beings develop in the gradual, guess-and-test style by which, according to Karl Popper, all knowledge is brought into being. Tree structuring is the hallmark of gradual evolution. (141)

Hey-o!

So don’t violate or you’ll get violated

OK, right now the reader might think I’ve been too hard on Chomskers. Let me assuage your concerns. I’m a firm believer in treating people with the respect they deserve. So when I say that Chomskers have their heads stuck firmly up their own asses, it’s because saying “the facts don’t support their claims” is not what they deserve. A group of scientists that hates facts deserves derision. Researchers in every field use observable data to come to conclusions. Their publications are part of an ongoing debate among other researchers, who can support or refute their claims based on more data. Everyone plays by these rules because they are in everyone’s best interest. All infamous academic quarrels aside, Chomskers would prefer not to back up their claims with observable data or engage in any kind of debate with scientists. The bum on the street shouting that the world is going to end has the advantage of being bat-shit crazy. What’s Chomskers’ excuse?

I suppose they could say that they are well-established. But in my mind that just points out the reasons for their unscientific actions. What’s going to happen to those grants and faculty positions if people stop believing in Chomskers’ witchcraft? Sampson writes:

“Nativist linguistics is now the basis of so many careers and so many university departments that it feels itself entitled to a degree of reverence. Someone who disagrees is expected to pull his punches, to couch his dissent in circumspect and opaquely academic terms – and of course, provided he does that, the nativist community is adept at verbally glossing over the critique in such a way that, for the general reader, not a ripple is left disturbing the public face of nativism. But reverence is out of place in science. The more widespread and influential a false theory has become, the more urgent it is to puncture its pretensions. Taxpayers who maintain the expensive establishment of nativist linguistics do not understand themselves to be paying for shrines of a cult: they suppose that they are supporting research based on objective data and logical argument.” (129)

Chomskers have been selling you snake oil for 60 years; they can’t give it up now. They have to double down. Now’s the time to really push the limits of decency in academia. Take a look:

“Paul Postal discusses in his Foreword the fact that my critique of linguistic nativism has been left unanswered by advocates of the theory. I am not alone there: various stories go the rounds about refusals by leading figures of the movement to engage with their intellectual opponents in the normal academic fashion, for fear that giving the oxygen of publicity to people who reject nativist theory might encourage the public to read those people and find themselves agreeing. […] The interesting point here is a different one. Nowhere in Words and Rules does Pinker say that he is responding to my objection. My book introduced the particular examples of Blackfoot and pinkfoot into this debate, and they are such unusual words that Pinker’s use of the same examples cannot be coincidence. He is replying to my book; but he does not mention me.” (127–8)

I don’t think I need to point out the shamefulness of such actions.

I read Steven Pinker and all I got was this lousy blog post

Reading Sampson after reading Pinker is a lesson in frustration, but not because of any problems with Sampson’s book. On the contrary, The Language Instinct Debate is very well written. Sampson not only clearly points out why Chomsky and Pinker’s theories are wrong, but he does so in a seemingly effortless way. Sometimes this is easy because Chomskers didn’t even look at the evidence; they just made something up and held out their hands. Sometimes this is frustrating because I wasted time reading Pinker’s 450-page sand castle that Sampson crumbled in less than half of that. The Language Instinct Debate may leave you wondering how you ever thought Chomskers were on to something when Sampson makes the counter-evidence seem so blatantly obvious.

In the next and final post of this series, I’ll talk about some of the reviews and critics of Sampson’s book. For now, I’ll leave you with how Chomskers’ refusal to check the evidence or believe anyone who has, along with their outstretched hand and their demand that you believe them, has inspired me to write a book of my own. It’s called Paris is the Capital of Germany, China is in South America, and Other Reasons Why I Hate Maps.

It’s due out at the end of never because ugh.

 

 

References

Sampson, Geoffrey. 2005. The Language Instinct Debate. London & New York: Continuum.