Strange etymologies are afoot at Psychology Today

Last week I was on the twitters talking about “untranslatable” words. The discussion was about Dr. Tim Lomas’ work on “untranslatable words,” his term for words in one language that don’t have exact equivalents in other languages (usually English). Right around the time I published my blog post, Lomas wrote an article in Psychology Today. Let’s have a look at it. If you want to see my thoughts on “untranslatable” words, go see my post on them and then come back.

Lomas claims that many concepts are non-English in origin. What this means is that the words used to describe these concepts are from other languages. I think this is opening a whole can of worms, but I’m willing to go with the idea that concepts can be “from another language”. For a bit. Let’s move on.

To prove his point, Lomas analyzes an article on positive psychology by Seligman and Csikszentmihalyi (2000). He looks for the etymology of every word in the text.

According to Lomas, there are:

1333 distinct lexemes

‘Native’ English words – belonging either to the Germanic language from which English emerged, or originating as neologisms in English itself – comprise only 39.4% of the sample (and 38% of the psychological words). Thus, over 60% of the general words (and 62% of psychological words) are loanwords, borrowed from other languages at some point in the development of English.

First, Lomas has a strange definition for “‘native’ English words”. Which “Germanic language” does he mean? Proto-Germanic? One of the other West Germanic languages? Old English? It’s also strange because Lomas’ definition means that these words are not native English words: they, table, blue, and orange. [Britney Spears gif says “huh?!” Oprah gif says “hrmmm?!”]

Lomas also doesn’t say exactly how he counted the words in the C&S article. He says that there are 1,333 “distinct lexemes”. The term lexeme is used in linguistics to talk about all the inflected forms of a word: singular and plural forms for nouns, present and past tense forms for verbs, etc. So runner and runners would be a part of the same lexeme RUNNER, and run, runs, ran, running are a part of RUN. Lexemes are also sometimes called “lemmas” in linguistics.

If Lomas really went through every single word in the article, then he spent a whole lotta time on this. The C&S article is 8,124 words long (not including the References section). He doesn’t say how he did the work, but I used some corpus linguistics methods and got different results. I checked the C&S article against the Someya lemma list in AntConc and found 1,750 lemmas, or 417 more lexemes than Lomas found. This is a large difference and I’m not sure how to explain it. Maybe Lomas didn’t divide his words based on parts of speech? So he counted ran and runner as part of the same lexeme? I don’t know.
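To make the lexeme counting concrete, here is a minimal sketch of grouping word forms under a lemma and counting them. It is not the AntConc workflow used for this post: the tiny `LEMMA_OF` table and the sample sentence are invented for illustration and stand in for a real lemma list such as Someya's.

```python
from collections import Counter

# Hypothetical mini lemma table (NOT the Someya list): word form -> lemma.
LEMMA_OF = {
    "run": "RUN", "runs": "RUN", "ran": "RUN", "running": "RUN",
    "runner": "RUNNER", "runners": "RUNNER",
    "is": "BE", "are": "BE", "was": "BE", "were": "BE", "am": "BE",
}

def lemma_counts(tokens):
    """Count occurrences per lemma; forms missing from the table count as their own lemma."""
    return Counter(LEMMA_OF.get(t.lower(), t.upper()) for t in tokens)

# Invented sample sentence, split on whitespace.
tokens = "The runner ran and the runners were running".split()
counts = lemma_counts(tokens)

print(len(counts))           # number of distinct lemmas
print(counts.most_common())  # lemmas ranked by frequency
```

Note that `ran` and `runner` land in different lemmas here. Merging them into one entry would lower the distinct-lemma count, which is one possible (but unconfirmed) source of the gap between the two totals.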

Second, let’s look at counting the words in language. Lomas seems to do a straight count. That means one instance of one form of a lexeme is equal to all the other instances. For Lomas, it doesn’t matter how many times a word occurs. In corpus linguistics, however, frequency is a big deal. I’m not going to go through the theoretical points here, but basically if a word is more frequent then it is more important or worthy of being looked at (hehe, fight me, corpus linguists).

So, Lomas claims that only 39% of the lexemes in the article are “native English words”. I took the lexemes in the article and ranked them based on frequency (using AntConc). Then I went through the 100 most frequent lexemes on the list and looked at their etymology. My numbers look much different than Lomas’. I found that 85% of the occurrences of the 100 most frequent lexemes are English in origin. That is, the 100 most frequent lexemes occur a total of 4,440 times in the article (so the lexeme the occurs 442 times, the lexeme of occurs 308 times, the lexeme BE occurs 300 times, and so on) and of these occurrences, 3,767 are English words. This isn’t particularly intriguing – you’ll probably find a similar percentage with any text in English. [See the bottom of this post for my data.]

Looking at this from another angle, we could treat each of the 100 most frequent lexemes as equal – forgetting about how often they occur. Then we find that 70 of them are English, while 30 of them come from another language. This is closer to Lomas’ numbers, but still pretty far off: 70 of the 100 most common lexemes in the article are still English words.
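The difference between the two ways of counting can be sketched in a few lines. This is only an illustration: the counts for the, of and BE come from the post, but the two loanword rows and their counts are invented, so the resulting shares are not the article's real figures.

```python
# (lexeme, occurrences, is_english). The first three counts are from the post;
# the loanword rows and their counts are invented for illustration.
rows = [
    ("the", 442, True),
    ("of", 308, True),
    ("BE", 300, True),
    ("psychology", 40, False),  # invented count
    ("positive", 30, False),    # invented count
]

def type_share(rows):
    """Share of English lexemes when every lexeme counts once."""
    return sum(1 for _, _, english in rows if english) / len(rows)

def token_share(rows):
    """Share of English lexemes when every occurrence counts."""
    english_tokens = sum(n for _, n, english in rows if english)
    total_tokens = sum(n for _, n, _ in rows)
    return english_tokens / total_tokens

print(f"by lexeme:     {type_share(rows):.1%}")
print(f"by occurrence: {token_share(rows):.1%}")
```

Because the most frequent lexemes in English text tend to be function words, weighting by occurrence pushes the English share up, while a straight count of lexemes pulls it down.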

Of course, words in language do not really occur in the way that we’re looking at them. The most common word is the with 442 instances, but the first 442 words of the article are not all the. The word the is sprinkled around the article (you know, where the grammar of English calls for it). I’m not sure how to get to Lomas’ numbers. We could assume that every lexeme outside the 100 most frequent were non-English, but that only gets us down to 46% of the words in the article as being English lexemes. Lomas’ ratio was 40% English to 60% non-English.

Later in the article, Lomas says that 234 words were treated as English in origin in his analysis. But this means that only about 18% of the words in his counting are English in origin (234/1,333≈0.18). What’s going on here? If 39.4% of the lexemes in the article are English in origin, and there are 1,333 total lexemes in the article (according to Lomas), then there should be 525 English words. Where he gets 234, I don’t know. Let’s move on.
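For anyone who wants to check the arithmetic, the figures quoted so far can be recomputed directly. Every number below is taken from the post; only the helper name `pct` is made up.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

top100_share = pct(3767, 4440)    # English occurrences among the top-100 lexemes
article_share = pct(3767, 8124)   # the same occurrences against all 8,124 words
lomas_share = pct(234, 1333)      # Lomas' 234 English words against 1,333 lexemes
implied_english = round(0.394 * 1333)  # lexemes implied by 39.4% of 1,333
lemma_gap = 1750 - 1333           # AntConc lemma count minus Lomas' count

print(top100_share, article_share, lomas_share, implied_english, lemma_gap)
```

The 234 figure works out to roughly 17.6% of 1,333, nowhere near the 39.4% that the 525 implied lexemes would represent.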

Lomas includes two graphs to visualize his findings but they’re pretty weird. The graph below “shows the influx of words according to the language of origin (with the century in which they entered English as stacks within them)”. Look at the third column.

[Graph from Lomas’ article: Lomas_PT_graph_1]

English words entered English? I don’t get it. Or Germanic words from before the 12th century are not English words? What’s going on here? I guess in Lomas’ counting, Germanic and English lexemes are English lexemes, but then he splits them up in the graph? Are the words me, myself and I not English words? It seems very strange to me to cut things up like this and I would like to see his list of etymologies, or his rationale for doing so.

Agree to disagree?

But there are places that I can agree with Lomas. At the end of the article, he writes:

In these ways does our understanding of life become complexified and enriched. In that respect, one can make the case that English-speaking psychology would do well to more consciously and actively engage with other languages and cultures. Its understanding of the mind has benefited greatly from English incorporating loanwords over the centuries. If one accepts that premise, it follows that psychology would continue to develop from this kind of cross-cultural engagement and borrowing – including, of course, through collaboration with scholars from non-English speaking cultures themselves. One such way in which the field might develop is through inquiring into untranslatable words, since these constitute clear candidates for borrowing (given that they lack an exact equivalent in English). I myself have sought to promote this kind of endeavor, with my ongoing creation of a cross-cultural lexicography of untranslatable words relating to well-being.

I definitely agree with the first part of this. We should engage with speakers of other languages and people from other cultures (although Lomas’ wording seems to present all English speakers as a monolithic culture). I find it hard to see how anyone could reject the premise that English (not just “English-speaking psychology”) has benefited greatly from incorporating loanwords. That’s kind of just a fact of language – borrowing words is one of the things that living languages do, and it’s part of the reason English is still a living language. But I totally agree that people should collaborate with people from different cultures (although again, Lomas’ wording blurs the distinction between language and culture too much for me and again presents English speakers as one culture).

When Lomas goes into the sales pitch in the second to last sentence, I can’t sign on, particularly based on what I’ve seen of his research into “untranslatable” words (in my last post and in this one and in a later one to come).

Lomas’ claims are true – we should reach out to people who speak other languages. But he should perhaps recognize that the reason that English has so many words from Latin and Ancient Greek is because these were once prestigious languages (and to a large extent still are in academia). It wasn’t because the Latin-speaking or Greek-speaking cultures had anything more special than other cultures, but it was believed that by using these languages people would be more civilized. Of course, we know what happened to the Latin-speaking and (Ancient) Greek-speaking cultures. They dead.

We in English-speaking cultures could just as easily have adopted Finnish words to use in the fields of psychology and linguistics, but Finnish was never considered a prestigious language. Or consider German: once German raised its standing, we got words from German to describe abstract concepts because the texts describing them were written in German and people were supposed to know German to engage in the debate.

There’s more to say about all this and I’ll be back at cha with a later post. I’ll link to it when I write it.

 

Data

Spreadsheet with my analysis. The first sheet is the Someya lemma list analysis. I counted words from Anglo-Norman as not being English. I’m including the 3rd person plural pronouns (they, them, their, themselves) as being English. Illness counts as English. The second sheet uses AntConc’s Word List tool, so it’s not a lexeme/lemma analysis, it treats every “word” as separate (that is, was, am, and is are separate words, not part of the lexeme BE).

Link to download the C&S article as a plain text file (.txt) which was used with AntConc in the analysis. The References section is excluded. And here’s a link to download a POS-tagged version of the article (using CLAWS7).


Steven Pinker’s Dog Whistles

So. Steven Pinker.

Yeah I wrote about him way back in the day when I reviewed his book The Language Instinct and how it was garbage. But if linguistic nativism is your thing, then fine. You do you. Geoffrey Sampson presents a valid argument against Pinker’s claims and Pinker responded… never. Because The Language Instinct is still making that money yo.

But Steven Pinker has branched out now. And things have not gone so well. Scholars from other fields are learning that he’s kinda bad at scholarship.

So here’s a rundown of why you should not follow what Steven Pinker says or writes.

First up we got Pinker’s garbage tweet about words not having power. He links to an article in Quillette (which we’ll get to later). I’m not going to embed the tweet here, but I’ll quote it. Pinker says “The first insight of linguistics, going back to Plato, is that words are conventions, without magical powers. That’s being nullified by PC/SJW attacks on mentioning taboo words, even ironically or in works of art.” Many people pointed out how stupid this is and, indeed, it is very stupid. The first insight of linguistics? Even historians know more about linguistics than this. But what it’s really about is how Steven Pinker really wants to be able to say the n-word. Like really bad. And preferably with impunity, if that’s not too much to ask. Why does everyone have to be so uptight about Steven Pinker saying the n-word? iT’s jUsT a wOrD

Let’s not dwell on it because things get worse (somehow).

Pinker has published a book called Enlightenment Now. In the book, Pinker argues that the world is actually a better place than you think it is because of the Enlightenment. It’s too bad Pinker totally fucks up the scholarship in his book. As this article by Aaron R. Hanlon shows, Pinker doesn’t even know what the Enlightenment was all about.

History scholars starting to feel like linguists.

Do you think we’re done? We’re not done. (I wish we were done. Those three paragraphs alone were draining. On we go! Into the shit!)

Pinker’s Enlightenment Now is bad for other reasons. Here’s Samuel Moyn pointing out one of the problems:

Or take inequality. Sure, some perceive a rampant crisis in most nations, but it is all sort of boring and overblown, by Pinker’s lights. “I need a chapter on the topic,” he writes, apparently willing himself to push through his fatigue with the subject, “because so many people have been swept up in the dystopian rhetoric and see inequality as a sign that modernity has failed to improve the human condition.” In his cursory treatment, Pinker tries to downplay currently exploding levels of national inequality, by pointing out that global inequality is declining: Even if the gap between the richest and the rest in individual countries is widening, on a world scale inequality is falling slightly. Never mind that it is within their individual countries that most people are experiencing and responding to inequality, and wreaking havoc because of it. In any case, Pinker argues, it does not matter morally if some people get extremely wealthy, so long as poverty decreases.

Just as in his somewhat literal understanding of violence, Pinker simply cannot see something so straightforward as class rule, which has been massively reestablished in our time of inequality, with all the baleful effects it has had on politics. In a world in which the outsized gains of the rich allow them to live a separate existence from the rest—stooping only to buy elections with dark money and even induce populists to act in their interest—rage is not only an expected but also an understandable result. The fact that these forms of domination and hierarchy are features of the very modernity he wants to lionize is not a possibility Pinker pauses to contemplate. Each of his arguments on the subject is a way of saying he doesn’t think inequality is that important—even as populists across the world are reaping gains from the obvious conclusion that it is.

“But, Joe,” I hear you saying, “those are just scholars who know more about the Enlightenment than Steven Pinker. So what if he got some stuff wrong? It’s not like he’s a leading thinker in society!” He is a leading thinker in society. He learned how to get things wrong and not care about it in linguistics. Now he’s moved on to other fields and he also sucks at them. And he’s also an asshole about it. Here’s Jennifer Szalai:

Steven Pinker doesn’t just want you to be happy; he wants you to be grateful too. His new book, “Enlightenment Now,” is a spirited and exasperated rebuke to anyone who refuses to concede that the world is becoming a better place. “None of us are as happy as we ought to be, given how amazing our world has become,” he writes. “People seem to bitch, moan, whine, carp and kvetch as much as ever.”

The world has become amazing for Steven Pinker, so why don’t you all just shut your pie holes, huh? You want another article showing that Pinker fucks up his argument? You got it! In fact, here’s two! Go nuts! Because this nonsense of Steven Pinker writing things and people paying for his hot garbage is getting tiring. Linguists knew it first. Sorry, historians. He’s yours now. (Please take him) [Update July 30: Here’s a third article pointing out how wrong and misleading Pinker is in Enlightenment Now. It’s by Phil Torres in Salon.]

And here’s Jason Hickel asking Pinker to debate him. Hahahaha, good luck, bro. Call some linguists if you get Pinker to respond. Because he ain’t ever done that. But he still gets puff pieces in the Chronicle. I see you, Chronicle. Do better. Be more like Mehdi Hasan and don’t fall for Pinker’s bullshit.

[Update June 5] Don in the comments pointed me to a piece in Current Affairs by Nathan J. Robinson which is a thorough takedown of Pinker, his writings and his ideology. Well, almost thorough – there’s not much in there about how Pinker also sucks at linguistics. If you want something less acerbic than what I’ve written here, then check that article out. But if you want the really despicable stuff Pinker has written, read on and check that piece out later.

But friends, things get much worse. Steven Pinker promotes the website Quillette, which is a website all about “free speech”. I’m putting that in scare quotes because it’s 2019 and you know what that means. Quillette likes to publish racists and sexists. They’ll even let these people publish anonymously because why should they have to own up to their bigotry? Steven Pinker has aligned himself with them. Even more so, Pinker said that campus rape is a “moral panic,” an “extraordinary popular delusion” and something akin to a witch hunt. And all because the rate of rapes on college campuses is not as high as it is in “the world’s most savage war zones”. Fuck you, Steven Pinker. Maybe you’ll listen to me because I’m also a straight white man. Steven Pinker has never had to cross the street on campus because he was walking alone and there were men walking toward him and he was worried about being attacked. Steven Pinker has never had to worry about what he’ll do while he’s out for a jog on campus and there’s a man running behind him – is he fast enough to outrun that man? Is he strong enough to overpower him? Are there enough other people around to hear him scream? Steven Pinker has never had to worry about having something slipped into his drink at a campus party. The only reason I know that women have to worry about these things is because they have told me. There are other things they have to worry about – things that neither me nor Steven Pinker have ever been forced to think about. And there are women who have been raped on campus. But Steven Pinker doesn’t care because there aren’t as many rape victims on college campuses as there are in some hypothetical war zone. Ugh. Get fucked, Pinker.

I can’t go on. I don’t want to. Go read this thread. And when someone cites Steven Pinker, tell them to get a real source for their claims. If he would act right, academia would take him seriously. If he would do actual scholarship, he wouldn’t be a problem. But every field he goes into rejects his claims. Why? Because he’s shit.

Update on that F-K paper

Three months ago I posted about a paper in PLoS ONE called “Liberals lecture, conservatives communicate: Analyzing complexity and ideology in 381,609 political speeches”. I noted that there are serious problems with that study. For the tl;dr:

1. The authors confuse written language with spoken language
2. The study uses an ineffectual test for written language on spoken language
3. The paper does not take into account how transcriptions and punctuation affect the data
4. The authors cite almost no linguistic sources in a study about language
5. They use a test developed for English on other languages

After I posted on here, I also commented on the article with my concerns. The PLoS ONE journal allows commenting on their articles, but I’ll admit that my first comment was neither appropriate nor helpful. It was more of a troll than anything. The editors removed my comment, and to their credit, they emailed me with an explanation why. They also told me what a comment should look like. So I posted a grown-up comment on the article. This started an exchange between me and the authors of the article. Here’s the skinny:

The authors tried to respond to my points about why their methodology is wrong, but there are some things that they just couldn’t argue their way out of (such as points 1, 2, 3 and 5 above).

Behind the scenes, I was talking with the editors of the journal. They told me that they were taking my criticisms seriously and looking into the issue themselves. In my comments on the paper, I provided multiple sources to back up my claims. The authors did not do so in their replies to me, but that’s because they can’t – there aren’t studies to back up their claims. However, my last email with the editors of the journal was over a month ago. I understand that these things can take time (and the editors told me this much) but a few of the criticisms that I raised are pretty cut and dried. The authors also stopped replying to my comments, the last one of which was posted on April 9, 2019 (can’t say I blame them though).

So I’m not very positive that anything is going to change. But I’ll let you know if it does.

Stop using the Flesch-Kincaid test

Before Language Log beats me to it, I want to hip you to another Bad Linguistics study out there. This one is called “Liberals lecture, conservatives communicate: Analyzing complexity and ideology in 381,609 political speeches” and it’s written by Martijn Schoonvelde, Anna Brosius, Gijs Schumacher and Bert Bakker. It was published in PLoS ONE (doi:10.1371/journal.pone.0208450).

The study analyzes almost 400,000 political speeches from different countries using a method called the Flesch-Kincaid Grade Score. The authors want to find out how complex the language in the speeches is and whether conservative or liberal politicians use more complex language. But hold up: what’s the Flesch-Kincaid score, you ask. Well, it’s a measure based on the average number of words per sentence and the average number of syllables per word. The test gives a number that in theory can be correlated to how many years of education someone would need in order to understand the text. This is called the “readability” of the text.
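To see what the score actually rewards, here is a minimal sketch of the Flesch-Kincaid grade level as it is usually stated: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a crude vowel-group heuristic made up for this example (it overcounts words like store), and nothing here is the code used in the paper. The two sample texts are also invented.

```python
import re

def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Flesch-Kincaid grade level, splitting sentences on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# The same fifteen words, punctuated two different ways by a hypothetical transcriber.
one_sentence = "We went to the store and we bought some milk and then we went home."
three_sentences = "We went to the store. And we bought some milk. And then we went home."

print(round(fk_grade(one_sentence), 2))
print(round(fk_grade(three_sentences), 2))
```

The words and syllables are identical in both versions, so the whole gap comes from sentence length: 0.39 × (15 − 5) = 3.9 grade levels, decided entirely by where the periods were placed. That is why punctuation choices in a transcript matter so much for this kind of score.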

So what’s the problem? Well, rather than spend too much time on it, I’ll listicle-ize the problems with this paper.


When the econs do some lingua, drop it like it’s hot

Last week I did a twitter and it got a big response (for me, that is). It was about a recent paper on language that appeared in an economics journal and it lit a fire under other people as well. The paper is called “Do Linguistic Structures Affect Human Capital? The Case of Pronoun Drop” and it’s by Horst Feldmann. I thought that in addition to dunking on that paper on Twitter, I’d spell out some of the fundamental problems with it. Here goes.


Fluency and linguistics in the news

There was some press recently about a new study which seems to claim that you can’t become fluent in a second language if you start learning it after age 10. In fact, the study* did not talk about fluency at all. As this article in the Conversation UK by Prof. Monika Schmid points out, the media misinterpreted what the study showed. I’m glad Schmid wrote this piece, which not only clears up the media’s confusion with the study, but also explains some other things about fluency in linguistics. I read the study in question and it seemed pretty legit. I have some misgivings about the idea of nativeness in language learning and about how the questionnaire says that India isn’t a “traditional English speaking country”. And also how the quiz said that “Canadians, Irish, and Scottish accept I’m finished my homework instead of with my homework,” when this is also very common in and around Philadelphia**.

[Screenshot from the quiz: games_with_words_done_my_homework]

But all in all, it seems to be an interesting linguistics study that got blown out of proportion by the media. File it with the rest.

* The title of the study is “A critical period for second language acquisition: Evidence from 2/3 million English speakers”. Does anyone else find “2/3 million English speakers” ungrammatical?

**It might just be me, but the phrase “Canadians, Irish, and Scottish accept X” also seems ungrammatical. “Canadians accept X” is ok, but “Irish accept X” and “Scottish accept X” are not, at least not in my variety of English. The latter two need articles before them or the word people after them: “The Irish accept X”, “Scottish people accept X”. I don’t know of any variety where “Canadians, Irish, and Scottish accept X” is correct. This is just a bit of irony in a quiz about the grammaticality of different clauses.

Sam Smith’s conservative linguistics

In researching a book on English usage (called Junk English; review coming soon), I came across an article from 2007 by Sam Smith, the journalist, essayist and co-founder of the Green Party. Smith’s article is a lesson in how to NOT write about language, as he gets a number of things wrong. One day I’ll write a general post about these kinds of articles, but for now, let’s go through Smith’s post and see where the train goes off the tracks.

The article starts with this:

Sitting in Manhattan across from an editor at one of best regarded publishing houses, I asked, “Does good writing still matter?”

Ugh. Like, gag me with a spoon. This kind of comment is a red flag inside a bell inside a whistle telling me that what is about to come is going to be a bunch of pretentious crap about the good ol’ days when people knew how to use The Language (a time which was probably also when Sam Smith was in his thirties; when he was looking forward to his life, not back on it).