My Corpus Brings All the Boys to the Yard

In two recent papers, one by Kloumann et al. (2012) and the other by Dodds et al. (2015), a group of researchers created a corpus to study the positivity of the English language. I looked at some of the problems with those papers here and here. For this post, however, I want to focus on one of the registers in the authors’ corpus – song lyrics. There is a problem with taking language such as lyrics out of context and then judging it based on the positivity of the individual words. But first I need to briefly explain what the authors did.

In the two papers, the authors created a corpus based on books, New York Times articles, tweets and song lyrics. They then created a list of the 10,000 most common word types in their corpus and had voluntary respondents rate how positive or negative they felt the words were. They used this information to claim that human language overall (and English) is emotionally positive.

That’s the idea anyway, but song lyrics exist as part of a multimodal genre. There are lyrics and there is music. These two modalities operate simultaneously to convey a message or feeling. This is important for a couple of reasons. First, the other registers in the corpus do not work like song lyrics. Books and news articles are black text on a white background with few or no pictures. And tweets are not always multimodal – it’s possible to include a short video or picture in a tweet, but it’s not necessary (Side note: I would like to know how many tweets in the corpus included pictures and/or videos, but the authors do not report that information).

So if we were to do a linguistic analysis of an artist or a genre of music, we would create a corpus of the lyrics of that artist or genre. We could then study the topics that are brought up in the lyrics, or even common words and expressions (lexical bundles or n-grams) that are used by the artist(s). We could perhaps even look at how the writing style of the artist(s) changed over time.

But if we wanted to perform an analysis of the positivity of the songs in our corpus, we would need to incorporate the music. The lyrics and music go hand in hand – without the music, you only have poetry. To see what I mean, take a look at the following word list. Do the words in this list look particularly positive or negative to you?

a

ain’t

all

and

as

away

back

bitch

body

breast

but

butterfly

can

can’t

caught

chasing

comin’

days

did

didn’t

do

dog

down

everytime

fairy

fantasy

for

ghost

guess

had

hand

harm

her

his

i

i’m

if

in

it

looked

lovely

jar

makes

mason

life

live

maybe

me

mean

momma’s

more

my

need

nest

never

no

of

on

outside

pet

pin

real

return

robin

scent

she

sighing

slips

smell

sorry

that

the

then

think

to

today

told

up

want

wash

went

what

when

with

withered

woke

would

yesterday

you

you’re

your

If we combine these words as Rivers Cuomo did in his song “Butterfly”, they average out to a positive score of 5.23. Here are the lyrics to that song.

Yesterday I went outside
With my momma’s mason jar
Caught a lovely Butterfly
When I woke up today
And looked in on my fairy pet
She had withered all away
No more sighing in her breast

I’m sorry for what I did
I did what my body told me to
I didn’t mean to do you harm
But everytime I pin down what I think I want
it slips away – the ghost slips away

I smell you on my hand for days
I can’t wash away your scent
If I’m a dog then you’re a bitch
I guess you’re as real as me
Maybe I can live with that
Maybe I need fantasy
A life of chasing Butterfly

I’m sorry for what I did
I did what my body told me to
I didn’t mean to do you harm
But everytime I pin down what I think I want
it slips away – the ghost slips away

I told you I would return
When the robin makes his nest
But I ain’t never comin’ back
I’m sorry, I’m sorry, I’m sorry

Does this look like a positive text to you? Does it look moderate, neither positive nor negative? I would say not. It seems negative to me, a sad song based on the opera Madame Butterfly, in which a man leaves his wife because he never really cared for her. When we take the music into consideration, the non-positivity of this song is clear.

[youtube https://www.youtube.com/watch?v=rCoGkMlfz9I]

Let’s take a look at another list. How does this one look?

above

absence

alive

an

animal

apart

are

away

become

brings

broke

can

closer

complicate

desecrate

down

drink

else

every

everything

existence

faith

feel

flawed

for

forest

from

fuck

get

god

got

hate

have

help

hive

honey

i

i’ve

inside

insides

is

isolation

it

it’s

knees

let

like

make

me

my

myself

no

of

off

only

penetrate

perfect

reason

scraped

sell

sex

smell

somebody

soul

stay

stomach

tear

that

the

thing

through

to

trees

violate

want

whole

within

works

you

your

Based on the ratings in the two papers, this list is slightly more positive, with an average happiness rating of 5.46. When the words were used by Trent Reznor, however, they expressed “a deeply personal meditation on self-hatred” (Huxley 1997: 179). Here are the lyrics for “Closer” by Nine Inch Nails:

You let me violate you
You let me desecrate you
You let me penetrate you
You let me complicate you

Help me
I broke apart my insides
Help me
I’ve got no soul to sell
Help me
The only thing that works for me
Help me get away from myself

I want to fuck you like an animal
I want to feel you from the inside
I want to fuck you like an animal
My whole existence is flawed
You get me closer to god

You can have my isolation
You can have the hate that it brings
You can have my absence of faith
You can have my everything

Help me
Tear down my reason
Help me
It’s your sex I can smell
Help me
You make me perfect
Help me become somebody else

I want to fuck you like an animal
I want to feel you from the inside
I want to fuck you like an animal
My whole existence is flawed
You get me closer to god

Through every forest above the trees
Within my stomach scraped off my knees
I drink the honey inside your hive
You are the reason I stay alive

As Reznor (the songwriter and lyricist) sees it, “Closer” is “supernegative and superhateful” and the song’s message is “I am a piece of shit and I am declaring that” (Huxley 1997: 179). You can see what he means when you listen to the song (minor NSFW warning for the imagery in the video). [1]

[vimeo 3554226 w=500 h=377]

Nine Inch Nails: Closer (Uncensored) (1994) from Nine Inch Nails on Vimeo.

Then again, meaning is relative. Tommy Lee has said that “Closer” is “the all-time fuck song. Those are pure fuck beats – Trent Reznor knew what he was doing. You can fuck to it, you can dance to it and you can break shit to it.” And Tommy Lee should know. He played in the studio for NIИ and he is arguably more famous for fucking than he is for playing drums.

Nevertheless, the problem with the positivity rating of songs keeps popping up. The song “Mad World” was a pop hit for Tears for Fears, then reinterpreted in a more somber tone by Gary Jules and Michael Andrews. But it is rated a positive 5.39. Gotye’s global hit about failed relationships, “Somebody That I Used To Know”, is rated a positive 5.33. The anti-war and protest ballad “Eve of Destruction”, made famous by Barry McGuire, rates just barely on the negative side at 4.93. I guess there should have been more depressing references besides bodies floating, funeral processions, and race riots if the songwriter really wanted to drive home the point.

For the song “Milkshake”, Kelis has said that it “means whatever people want it to” and that the milkshake referred to in the song is “the thing that makes women special […] what gives us our confidence and what makes us exciting”. It is rated less positive than “Mad World” at 5.24. That makes me want to doubt the authors’ commitment to Sparkle Motion.

Another upbeat jam that the kids listen to is the Ramones’ “Blitzkrieg Bop”. This is the energetic and exciting anthem of punk rock. It’s rated a negative 4.82. I wonder if we should even look at “Pinhead”.

Then there’s the old American folk classic “Where Did You Sleep Last Night”, which Nirvana performed a haunting version of on their album MTV Unplugged in New York. The song (also known as “In the Pines” and “Black Girl”) was first made famous by Lead Belly and it includes such catchy lines as

My girl, my girl, don’t lie to me
Tell me where did you sleep last night
In the pines, in the pines
Where the sun don’t ever shine
I would shiver the whole night through

And

Her husband was a hard working man
Just about a mile from here
His head was found in a driving wheel
But his body never was found

This song is rated a positive 5.24. I don’t know about you, but neither the Lead Belly version nor the Nirvana cover would give me that impression.

Even Pharrell Williams’ hit song “Happy” rates only 5.70. That’s a song so goddamn positive that it’s called “Happy”. But it’s only 0.03 points more positive than Eric Clapton’s “Tears in Heaven”, which is a song about the death of Clapton’s four-year-old son. Harry Chapin’s “Cat’s in the Cradle” was voted the fourth saddest song of all time by readers of Rolling Stone but it’s rated 5.55, while Willie Nelson’s “Always on My Mind” rates 5.63. So they are both sadder than “Happy”, but not by much. How many lyrics must a man research, before his corpus is questioned?

Corpus linguistics is not just gathering a bunch of words and calling it a day. The fact that the same “word” can have several meanings (known as polysemy) is a major feature of language. So before you ask people to rate a word’s positivity, you will want to make sure they at least know which meaning is being referred to. On top of that, words do not work in isolation. Spacing is an arbitrary construct in written language (remember that song lyrics are mostly heard, not read). The back in the Ramones’ lines “Piling in the back seat” and “Pulsating to the back beat” is not about a body part. The Weezer song “Butterfly” uses the word mason, but it’s part of the compound noun mason jar, not a reference to a bricklayer. Words are also conditioned by the words around them. A word like eve may normally be considered positive as it brings to mind Christmas Eve and New Year’s Eve, but when used in a phrase like “the eve of destruction” our judgment of it is likely to change. In the corpus under discussion here, eat is rated 7.04, but that doesn’t consider what’s being eaten and so cannot account for lines like “Eat your next door neighbor” (from “Eve of Destruction”).
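To make the point about eat concrete, here is a minimal sketch of a word-by-word scorer. It is not the authors' code. The only number taken from this post is the 7.04 rating for eat; the function name, the decision to skip unrated words, and the phrase "Eat your vegetables" are my own assumptions for illustration.

```python
from statistics import mean

# The only rating taken from the post is "eat" = 7.04. Every other word is
# left unrated and skipped, which is an assumption of this sketch.
RATINGS = {"eat": 7.04}


def word_by_word_score(line, ratings):
    """Average the ratings of the rated words in a line, ignoring context."""
    scores = [ratings[w] for w in line.lower().split() if w in ratings]
    return mean(scores) if scores else None


# "Eat your vegetables" is an invented comparison phrase, not from any song.
harmless = word_by_word_score("Eat your vegetables", RATINGS)
grim = word_by_word_score("Eat your next door neighbor", RATINGS)

# Both lines get the same score, because the scorer never sees the object.
print(harmless, grim)
```

A scorer like this has no way to tell the two lines apart, since each word carries its rating with it no matter what comes next.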

We could go on and on like this. The point is that the authors of both of the papers didn’t do enough work with their data before drawing conclusions. And they didn’t consider that some of the language in their corpus is part of a multimodal genre where there are other things affecting the meaning of the language used (though technically no language use is devoid of context). Whether or not the lyrics of a song are “positive” or “negative”, the style of singing and the music that they are sung to will strongly affect a person’s interpretation of the lyrics’ meaning and emotion. That’s just the way that music works.

This doesn’t mean that any of these songs are positive or negative based on their rating; it means that the system used by the authors of the two papers to rate the positivity or negativity of language seems to be flawed. I would have guessed that a rating system which took words out of context would be fundamentally flawed, but viewing the ratings of the songs in this post is a good way to visualize that. The fact that the two papers were published in reputable journals and picked up by reputable publications, such as the Atlantic and the New York Times, only adds insult to injury for the field of linguistics.

You can see a table of the songs I looked at for this post below and a spreadsheet with the ratings of the lyrics is here. I calculated the positivity ratings by averaging the scores for the word tokens in each song, rather than the types.
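For anyone unsure what averaging over tokens rather than types means in practice, here is a rough sketch. The ratings in it are hypothetical numbers I made up for illustration; they are not the values from the papers' word list, and the tokenizer is a simple assumption rather than whatever the authors used.

```python
import re
from statistics import mean

# Hypothetical ratings on the 1-9 scale, invented for illustration only.
# They are NOT the values from the papers' word list.
TOY_RATINGS = {"i'm": 5.0, "sorry": 3.0, "butterfly": 7.0, "lovely": 8.0}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping apostrophes."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return re.findall(r"[a-z']+", text)


def token_average(text, ratings):
    """Average over every occurrence (token) of a rated word."""
    return mean(ratings[w] for w in tokenize(text) if w in ratings)


def type_average(text, ratings):
    """Average over each distinct rated word (type) exactly once."""
    return mean(ratings[w] for w in set(tokenize(text)) if w in ratings)


line = "Caught a lovely Butterfly. I'm sorry, I'm sorry, I'm sorry"
print(token_average(line, TOY_RATINGS))  # repeated words count every time
print(type_average(line, TOY_RATINGS))   # repeated words count once
```

With these made-up numbers, the repeated "I'm sorry" pulls the token average below the type average, which is why the choice between the two matters for songs with a lot of repetition.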

(By the way, Tupac is rated 4.76. It’s a good thing his attitude was fuck it ‘cause motherfuckers love it.)

| Song | Positivity score (1–9) |
| --- | --- |
| “Happy” by Pharrell Williams | 5.70 |
| “Tears in Heaven” by Eric Clapton | 5.67 |
| “Always on My Mind” by Willie Nelson | 5.63 |
| “Cat’s in the Cradle” by Harry Chapin | 5.55 |
| “Closer” by NIN | 5.46 |
| “Mad World” by Gary Jules and Michael Andrews | 5.39 |
| “Somebody That I Used to Know” by Gotye feat. Kimbra | 5.33 |
| “Waitin’ for a Superman” by The Flaming Lips | 5.28 |
| “Milkshake” by Kelis | 5.24 |
| “Where Did You Sleep Last Night” by Nirvana | 5.24 |
| “Butterfly” by Weezer | 5.23 |
| “Eve of Destruction” by Barry McGuire | 4.93 |
| “Blitzkrieg Bop” by The Ramones | 4.82 |

Footnotes

[1] Also, be aware that listening to these songs while watching their music videos has an effect on the way you interpret them.

References

Kloumann, Isabel M., Christopher M. Danforth, Kameron Decker Harris, Catherine A. Bliss, and Peter Sheridan Dodds. 2012. “Positivity of the English Language”. PLoS ONE. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0029484

Dodds, Peter Sheridan, Eric M. Clark, Suma Desu, Morgan R. Frank, Andrew J. Reagan, Jake Ryland Williams, Lewis Mitchell, Kameron Decker Harris, Isabel M. Kloumann, James P. Bagrow, Karine Megerdoomian, Matthew T. McMahon, Brian F. Tivnan, and Christopher M. Danforth. 2015. “Human language reveals a universal positivity bias”. PNAS 112:8. http://www.pnas.org/content/112/8/2389

Huxley, Martin. 1997. Nine Inch Nails. New York: St. Martin’s Griffin.