My Corpus Brings All the Boys to the Yard

In two recent papers, one by Kloumann et al. (2012) and the other by Dodds et al. (2015), a group of researchers created a corpus to study the positivity of the English language. I looked at some of the problems with those papers here and here. For this post, however, I want to focus on one of the registers in the authors’ corpus – song lyrics. There is a problem with taking language such as lyrics out of context and then judging them based on the positivity of the words in the songs. But first I need to briefly explain what the authors did.

In the two papers, the authors created a corpus based on books, New York Times articles, tweets and song lyrics. They then created a list of the 10,000 most common word types in their corpus and had voluntary respondents rate how positive or negative they felt the words were. They used this information to claim that human language overall (and English) is emotionally positive.

That’s the idea anyway, but song lyrics exist as part of a multimodal genre. There are lyrics and there is music. These two modalities operate simultaneously to convey a message or feeling. This is important for a couple of reasons. First, the other registers in the corpus do not work like song lyrics. Books and news articles are black text on a white background with few or no pictures. And tweets are not always multimodal – it’s possible to include a short video or picture in a tweet, but it’s not necessary (Side note: I would like to know how many tweets in the corpus included pictures and/or videos, but the authors do not report that information).

So if we were to do a linguistic analysis of an artist or a genre of music, we would create a corpus of the lyrics of that artist or genre. We could then study the topics that are brought up in the lyrics, or even common words and expressions (lexical bundles or n-grams) that are used by the artist(s). We could perhaps even look at how the writing style of the artist(s) changed over time.
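
As a rough illustration of the lexical-bundle idea, here is a minimal Python sketch that counts n-grams in a couple of lines from a song. The function names (tokenize, ngram_counts) and the simple regular-expression tokenizer are assumptions made up for this example, not tools used in either paper, and a real study would need a more careful tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (simplistic, hypothetical tokenizer)."""
    # Keep word-internal apostrophes (straight or curly) so "I’m" stays one token.
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def ngram_counts(tokens, n=2):
    """Count every run of n consecutive tokens."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


lines = "I’m sorry for what I did\nI did what my body told me to"
tokens = tokenize(lines)
bigrams = ngram_counts(tokens, n=2)

# Show the most frequent two-word sequences.
print(bigrams.most_common(3))
```

Run over a whole discography instead of two lines, the same counts would start to show which expressions an artist leans on.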

But if we wanted to perform an analysis of the positivity of the songs in our corpus, we would need to incorporate the music. The lyrics and music go hand in hand – without the music, you only have poetry. To see what I mean, take a look at the following word list. Do the words in this list look particularly positive or negative to you?

a

ain’t

all

and

as

away

back

bitch

body

breast

but

butterfly

can

can’t

caught

chasing

comin’

days

did

didn’t

do

dog

down

everytime

fairy

fantasy

for

ghost

guess

had

hand

harm

her

his

i

i’m

if

in

it

looked

lovely

jar

makes

mason

life

live

maybe

me

mean

momma’s

more

my

need

nest

never

no

of

on

outside

pet

pin

real

return

robin

scent

she

sighing

slips

smell

sorry

that

the

then

think

to

today

told

up

want

wash

went

what

when

with

withered

woke

would

yesterday

you

you’re

your

If we combine these words as Rivers Cuomo did in his song “Butterfly”, they average out to a positive score of 5.23. Here are the lyrics to that song.

Yesterday I went outside
With my momma’s mason jar
Caught a lovely Butterfly
When I woke up today
And looked in on my fairy pet
She had withered all away
No more sighing in her breast

I’m sorry for what I did
I did what my body told me to
I didn’t mean to do you harm
But everytime I pin down what I think I want
it slips away – the ghost slips away

I smell you on my hand for days
I can’t wash away your scent
If I’m a dog then you’re a bitch
I guess you’re as real as me
Maybe I can live with that
Maybe I need fantasy
A life of chasing Butterfly

I’m sorry for what I did
I did what my body told me to
I didn’t mean to do you harm
But everytime I pin down what I think I want
it slips away – the ghost slips away

I told you I would return
When the robin makes his nest
But I ain’t never comin’ back
I’m sorry, I’m sorry, I’m sorry

Does this look like a positive text to you? Does it look moderate, neither positive nor negative? I would say not. It seems negative to me, a sad song based on the opera Madame Butterfly, in which a man leaves his wife because he never really cared for her. When we take the music into consideration, the non-positivity of this song is clear.

[youtube https://www.youtube.com/watch?v=rCoGkMlfz9I]

Let’s take a look at another list. How does this one look?

above

absence

alive

an

animal

apart

are

away

become

brings

broke

can

closer

complicate

desecrate

down

drink

else

every

everything

existence

faith

feel

flawed

for

forest

from

fuck

get

god

got

hate

have

help

hive

honey

i

i’ve

inside

insides

is

isolation

it

it’s

knees

let

like

make

me

my

myself

no

of

off

only

penetrate

perfect

reason

scraped

sell

sex

smell

somebody

soul

stay

stomach

tear

that

the

thing

through

to

trees

violate

want

whole

within

works

you

your

Based on the ratings in the two papers, this list is slightly more positive, with an average happiness rating of 5.46. When the words were used by Trent Reznor, however, they expressed “a deeply personal meditation on self-hatred” (Huxley 1997: 179). Here are the lyrics for “Closer” by Nine Inch Nails:

You let me violate you
You let me desecrate you
You let me penetrate you
You let me complicate you

Help me
I broke apart my insides
Help me
I’ve got no soul to sell
Help me
The only thing that works for me
Help me get away from myself

I want to fuck you like an animal
I want to feel you from the inside
I want to fuck you like an animal
My whole existence is flawed
You get me closer to god

You can have my isolation
You can have the hate that it brings
You can have my absence of faith
You can have my everything

Help me
Tear down my reason
Help me
It’s your sex I can smell
Help me
You make me perfect
Help me become somebody else

I want to fuck you like an animal
I want to feel you from the inside
I want to fuck you like an animal
My whole existence is flawed
You get me closer to god

Through every forest above the trees
Within my stomach scraped off my knees
I drink the honey inside your hive
You are the reason I stay alive

As Reznor (the songwriter and lyricist) sees it, “Closer” is “supernegative and superhateful” and the song’s message is “I am a piece of shit and I am declaring that” (Huxley 1997: 179). You can see what he means when you listen to the song (minor NSFW warning for the imagery in the video). [1]

[vimeo 3554226 w=500 h=377]

Nine Inch Nails: Closer (Uncensored) (1994) from Nine Inch Nails on Vimeo.

Then again, meaning is relative. Tommy Lee has said that “Closer” is “the all-time fuck song. Those are pure fuck beats – Trent Reznor knew what he was doing. You can fuck to it, you can dance to it and you can break shit to it.” And Tommy Lee should know. He played in the studio for NIИ and he is arguably more famous for fucking than he is for playing drums.

Nevertheless, the problem with the positivity rating of songs keeps popping up. The song “Mad World” was a pop hit for Tears for Fears, then reinterpreted in a more somber tone by Gary Jules and Michael Andrews. But it is rated a positive 5.39. Gotye’s global hit about failed relationships, “Somebody That I Used to Know”, is rated a positive 5.33. The anti-war and protest ballad “Eve of Destruction”, made famous by Barry McGuire, rates just barely on the negative side at 4.93. I guess there should have been more depressing references besides bodies floating, funeral processions, and race riots if the songwriter really wanted to drive home the point.

For the song “Milkshake”, Kelis has said that it “means whatever people want it to” and that the milkshake referred to in the song is “the thing that makes women special […] what gives us our confidence and what makes us exciting”. It is rated less positive than “Mad World” at 5.24. That makes me want to doubt the authors’ commitment to Sparkle Motion.

Another upbeat jam that the kids listen to is the Ramones’ “Blitzkrieg Bop”. This is the energetic and exciting anthem of punk rock. It’s rated a negative 4.82. I wonder if we should even look at “Pinhead”.

Then there’s the old American folk classic “Where Did You Sleep Last Night”, which Nirvana performed a haunting version of on their album MTV Unplugged in New York. The song (also known as “In the Pines” and “Black Girl”) was first made famous by Lead Belly and it includes such catchy lines as

My girl, my girl, don’t lie to me
Tell me where did you sleep last night
In the pines, in the pines
Where the sun don’t ever shine
I would shiver the whole night through

And

Her husband was a hard working man
Just about a mile from here
His head was found in a driving wheel
But his body never was found

This song is rated a positive 5.24. I don’t know about you, but neither the Lead Belly version nor the Nirvana cover would give me that impression.

Even Pharrell Williams’ hit song “Happy” rates only 5.70. That’s a song so goddamn positive that it’s called “Happy”. But it’s only 0.03 points more positive than Eric Clapton’s “Tears in Heaven”, which is a song about the death of Clapton’s four-year-old son. Harry Chapin’s “Cat’s in the Cradle” was voted the fourth saddest song of all time by readers of Rolling Stone but it’s rated 5.55, while Willie Nelson’s “Always on My Mind” rates 5.63. So they are both sadder than “Happy”, but not by much. How many lyrics must a man research, before his corpus is questioned?

Corpus linguistics is not just gathering a bunch of words and calling it a day. The fact that the same “word” can have several meanings (known as polysemy) is a major feature of language. So before you ask people to rate a word’s positivity, you will want to make sure they at least know which meaning is being referred to. On top of that, words do not work in isolation. Spacing is an arbitrary construct in written language (remember that song lyrics are mostly heard, not read). The back used in the Ramones’ lines “Piling in the back seat” and “Pulsating to the back beat” is not a body part. The Weezer song “Butterfly” uses the word mason, but it’s part of the compound noun mason jar, not a reference to a bricklayer. Words are also conditioned by the words around them. A word like eve may normally be considered positive as it brings to mind Christmas Eve and New Year’s Eve, but when used in a phrase like “the eve of destruction” our judgment of it is likely to change. In the corpus under discussion here, eat is rated 7.04, but that doesn’t consider what’s being eaten and so cannot account for lines like “Eat your next door neighbor” (from “Eve of Destruction”).

We could go on and on like this. The point is that the authors of both of the papers didn’t do enough work with their data before drawing conclusions. And they didn’t consider that some of the language in their corpus is part of a multimodal genre where there are other things affecting the meaning of the language used (though technically no language use is devoid of context). Whether or not the lyrics of a song are “positive” or “negative”, the style of singing and the music that they are sung to will strongly affect a person’s interpretation of the lyrics’ meaning and emotion. That’s just the way that music works.

This doesn’t mean that any of these songs are positive or negative based on their rating; it means that the system used by the authors of the two papers to rate the positivity or negativity of language seems to be flawed. I would have guessed that a rating system which took words out of context would be fundamentally flawed, but viewing the ratings of the songs in this post is a good way to visualize that. The fact that the two papers were published in reputable journals and picked up by reputable publications, such as the Atlantic and the New York Times, only adds insult to injury for the field of linguistics.

You can see a table of the songs I looked at for this post below, and a spreadsheet with the ratings of the lyrics is here. I calculated the positivity ratings by averaging the scores for the word tokens in each song, rather than the types.
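
To make the token/type distinction concrete, here is a minimal sketch in Python. The ratings in it are made-up placeholder values on the 1–9 scale, not the scores from the two papers, and the function names are hypothetical. Words without a rating are simply skipped, which is also an assumption.

```python
from statistics import mean

# Placeholder ratings on the 1-9 scale; NOT the values from the papers.
ratings = {"help": 6.0, "me": 6.5, "hate": 2.0}


def token_average(tokens, ratings):
    """Average over every occurrence of a rated word (repeats count each time)."""
    return mean(ratings[t] for t in tokens if t in ratings)


def type_average(tokens, ratings):
    """Average over each distinct rated word once, however often it occurs."""
    return mean(ratings[t] for t in set(tokens) if t in ratings)


tokens = ["help", "me", "help", "me", "hate"]

print(round(token_average(tokens, ratings), 2))  # (6.0 + 6.5 + 6.0 + 6.5 + 2.0) / 5
print(round(type_average(tokens, ratings), 2))   # (6.0 + 6.5 + 2.0) / 3
```

A word that is repeated (in a chorus, say) pulls the token average toward its own rating, which is why the two figures can differ for the same song.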

(By the way, Tupac is rated 4.76. It’s a good thing his attitude was fuck it ‘cause motherfuckers love it.)

Song                                                    Positivity score (1–9)
“Happy” by Pharrell Williams                            5.70
“Tears in Heaven” by Eric Clapton                       5.67
“Always on My Mind” by Willie Nelson                    5.63
“Cat’s in the Cradle” by Harry Chapin                   5.55
“Closer” by NIN                                         5.46
“Mad World” by Gary Jules and Michael Andrews           5.39
“Somebody That I Used to Know” by Gotye feat. Kimbra    5.33
“Waitin’ for a Superman” by The Flaming Lips            5.28
“Milkshake” by Kelis                                    5.24
“Where Did You Sleep Last Night” by Nirvana             5.24
“Butterfly” by Weezer                                   5.23
“Eve of Destruction” by Barry McGuire                   4.93
“Blitzkrieg Bop” by The Ramones                         4.82

Footnotes

[1] Also, be aware that listening to these songs while watching their music videos has an effect on the way you interpret them.

References

Kloumann, Isabel M., Christopher M. Danforth, Kameron Decker Harris, Catherine A. Bliss, and Peter Sheridan Dodds. 2012. “Positivity of the English Language”. PLoS ONE. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0029484

Dodds, Peter Sheridan, Eric M. Clark, Suma Desu, Morgan R. Frank, Andrew J. Reagan, Jake Ryland Williams, Lewis Mitchell, Kameron Decker Harris, Isabel M. Kloumann, James P. Bagrow, Karine Megerdoomian, Matthew T. McMahon, Brian F. Tivnan, and Christopher M. Danforth. 2015. “Human language reveals a universal positivity bias”. PNAS 112:8. http://www.pnas.org/content/112/8/2389

Huxley, Martin. 1997. Nine Inch Nails. New York: St. Martin’s Griffin.

Is Your Mother a Geek? Linguistics and the Ramones*

Besides being a great song on a great album (Leave Home), “Suzy is a Headbanger” by the Ramones also has a very interesting line:

Suzy is a headbanger,
Her mother is a geek.

These lyrics puzzled me because they didn’t really make any sense. It wasn’t until I read the term “feed the geek” in Babel No More by Michael Erard (review forthcoming) that I decided to look into it. It turns out that the reason the lyrics didn’t make sense to me was that I was thinking of geek in its contemporary sense, the one Macmillan defines as “n. Someone who is boring, especially because they seem to be interested only in computers.” Even more recently, as we all know, the term has come to mean something like enthusiast or to describe a particular way of practicing some activity (as in geek sex). But since “Suzy” was written around 1976, those are obviously not the intended meanings.

The problem is, I’m having a hard time believing that Joey Ramone meant the other, older sense of the word. I looked at multiple dictionaries, but all of them basically defined this sense as “n. A carnival performer whose show consists of bizarre acts, such as biting the head off a live chicken.” Mmm! Mothers bring your daughters, fathers bring your sons!

So what’s going on? Did he really mean to sing geek? If he didn’t mean that Suzy’s mom is a circus freak and he couldn’t have meant that she’s a computer nerd, what did he mean? The song obviously portrays Suzy in a positive light, so was he doing something like Bob Dylan did in his song “Ballad of a Thin Man” and questioning what we think of as normalcy?

You hand in your ticket
And you go watch the geek
Who immediately walks up to you
When he hears you speak
And says, “How does it feel
To be such a freak?”
And you say, “Impossible”
As he hands you a bone.

– Bob Dylan, Ballad of a Thin Man

Or did geek have a meaning specific to punks in New York (or punks anywhere) in the 1970s?

There was a 70s Australian punk band in Perth called the Geeks, but knowing what I know about punk rockers, they tend to relish classifying themselves as outcasts. It’s a way to welcome someone in and strengthen group identity (the Ramones’ chant, “Gabba! Gabba! We accept you, we accept you! One of us!” perfectly encapsulates this notion). So were they saying that Suzy’s mom was one of them?

Besides the Perth band, I couldn’t find any connection of geek to the 1970s punk rock scene, so I decided to look at the Corpus of Contemporary American English. If geek was being used by the punks in the 70s, I assumed it was also being used by at least the music journalists. I just hoped it was being used in the same way. There were two hits for “geek.[nn1]” in the 1970s, both being the carnival kind of geek. The 1980s is where things start to change since there are five hits – two carnival geeks, two nerd geeks, and one I’m not really sure of (it’s hard to tell from the bit of context). After that, the usage really takes off with thirty-three hits in the 1990s and 104 hits in the 2000s.
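
For a sense of how fast the usage grows, the decade counts reported above can be compared with a few lines of Python. This is only a sketch: the function name is hypothetical, and it works on the raw hit counts, so it assumes nothing about how large each decade’s slice of the corpus is.

```python
# Raw hit counts for geek as a noun, by decade, as reported in this post.
hits_by_decade = {"1970s": 2, "1980s": 5, "1990s": 33, "2000s": 104}


def growth_factors(counts):
    """Return how many times larger each decade's count is than the previous decade's."""
    decades = list(counts)  # dicts keep insertion order, so this is chronological
    return {
        later: counts[later] / counts[earlier]
        for earlier, later in zip(decades, decades[1:])
    }


for decade, factor in growth_factors(hits_by_decade).items():
    print(f"{decade}: {factor:.1f}x the previous decade")
```

Because these are raw counts, a fairer comparison would use hits per million words for each decade. Those figures are not given here, so the ratios should be read as a rough indication only.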

This still hasn’t answered my question, however. Certainly another word besides geek would fit there just as easily, especially if you’re rhyming it with “ooh ooh wee.” But there’s the catch. I want to conclude that Joey was speaking positively of Suzy’s mother, but that’s not realistic. Joey was most likely using geek to describe a disapproving mother.

Wordnik, which is a great site, lists one definition of geek as “n. An unfashionable or socially undesirable person.” Today there might be wide agreement on which type of person is a geek (because we’re all so cool, you know, man?), but what’s interesting is that in “Suzy is a Headbanger” we have a counter-culture band, who by no means owned the majority stake of Cool, using geek to insult a member of another group. But the reason he’s doing so is that he sees Suzy’s mother as being judgmental, which is not a trait often attributed to geeks. So there’s a disconnect between the connotations of the two meanings of geek, which suggests the term was in flux. Notice also that insulting someone else by calling them a geek is simultaneously an attempt to prove one’s cool, but that’s beside the point.

I think geek is a great case of how quickly words can change their meanings, something linguists call “semantic shift.” It’s also a simple example of what linguists mean when they say, “We’re not sure.” Words are tricky things to pin down, especially when they are ones that are used infrequently. Add to that someone using the word in a novel way (or at least with a slightly different meaning) and things get even trickier. Had Joey’s meaning taken off, we might today be using geek to describe older people who disapprove of the younger generation’s activities. Geek then would have a decidedly uncool meaning. Instead, being a geek is an aspiration since it means not only enthusiasm, but knowledge and mastery of a certain area. The success of this meaning of geek is, of course, due to the success of computers and the success of geek as an adjective to be applied to any and all activities. In this way, later in Babel No More, Michael Erard can write, “Indeed, boasting about the languages one has studied or can speak is a display of geek machismo,” and everyone understands the meaning.

As a side note, for those interested in linguistics, semantic shift, or the etymology of contemptuous words, I recommend checking out Slate.com’s new podcast Lexicon Valley. They have two episodes and both are excellent. What’s even better, and even more pertinent to this article, is that in the second episode, entitled “The Other F-Word,” you get to hear linguist Arnold Zwicky reference Pansy Division. What a headbanger.

[Update – July 24, 2012] The Oxford Dictionaries blog has written twice about geek. The most recent post compares the collocates of geek to nerd in their corpus, while the older post explains the transformation in the meaning of geek. Sadly, there is no mention of the Ramones. Maybe it’s time for them to update their corpus?

Here are the posts:
Embrace Your Geekness – July 13, 2012
Are You Calling Me a Geek? Why, *Thank You* – March 4, 2011

*These are a few of my favorite things.

The Atomic Number of “Blackened”

Some albums have opening tracks that both blow you away and tell you how awesome the rest of the record will be. It’s like the opening track is warning you that you’re in for some serious ear banging. Like the first track is merely an opening salvo that lands like a kick to the teeth. The Ramones’ first album and Screeching Weasel’s Boogadaboogadaboogada jump to mind.

But by far, an album that best fits this description is Metallica’s …And Justice For All. “Blackened” is a mind-fuckingly awesome opening track on the band’s pinnacle record. It’s arguable that when Armageddon comes, it will arrive to the sound of “Blackened.”

As YouTube commenter abvflux put it best:
THIS FUCKING SONG NEEDS TO BE ON THE PERIODIC TABLE.

Amen.

There is no other way to convey how nails “Blackened” is. Never!

[youtube http://www.youtube.com/watch?v=DU_ggFovJNo&w=480&h=390]