translation – …And Read All Over

Strange etymologies are afoot at Psychology Today

June 12, 2019 Joe McVeigh

Last week I was on the twitters talking about “untranslatable” words. The discussion was about Dr. Tim Lomas’ work on “untranslatable words,” his term for words in some languages that don’t have exact equivalents in other languages (usually English). Right around the same time I posted my blog post, Lomas wrote an article in Psychology Today. Let’s have a look at it. If you want to see my thoughts on “untranslatable” words, go see my post on it and then come back.

Lomas claims that many concepts are non-English in origin. What this means is that the words used to describe these concepts are from other languages. I think this is opening a whole can of worms, but I’m willing to go with the idea that concepts can be “from another language”. For a bit. Let’s move on.

To prove his point, Lomas analyzes an article on positive psychology by Seligman and Csikszentmihalyi (2000). He looks for the etymology of every word in the text.

According to Lomas, there are:

1333 distinct lexemes

‘Native’ English words – belonging either to the Germanic language from which English emerged, or originating as neologisms in English itself – comprise only 39.4% of the sample (and 38% of the psychological words). Thus, over 60% of the general words (and 62% of psychological words) are loanwords, borrowed from other languages at some point in the development of English.

First, Lomas has a strange definition for “‘native’ English words”. Which “Germanic language” does he mean? Proto-Germanic? One of the other West Germanic languages? Old English? It’s also strange because Lomas’ definition means that these words are not native English words: they, table, blue, and orange. [Britney Spears gif says “huh?!” Oprah gif says “hrmmm?!”]

Lomas also doesn’t say exactly how he counted the words in the C&S article. He says that there are 1,333 “distinct lexemes”. The term lexeme is used in linguistics to talk about all the inflected forms of a word: singular and plural forms for nouns, present and past tense forms for verbs, etc. So runner and runners would be a part of the same lexeme RUNNER, and run, runs, ran, running are a part of RUN. Lexemes are also sometimes called “lemmas” in linguistics.

If Lomas really went through every single word in the article, then he spent a whole lotta time on this. The C&S article is 8,124 words long (not including the References section). He doesn’t say how he did the work, but I used some corpus linguistics methods and got different results. I checked the C&S article against the Someya lemma list in AntConc and found 1,750 lemmas, or 417 more lexemes than Lomas found. This is a large difference and I’m not sure how to explain it. Maybe Lomas didn’t divide his words based on parts of speech? So he counted ran and runner as part of the same lexeme? I don’t know.
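To make the counting question concrete, here is a minimal sketch of how a lemma count works. This is not the Someya list or AntConc itself: the tiny `LEMMA_MAP`, the `count_lemmas` helper and the sample sentence are all made up for illustration.

```python
from collections import Counter
import re

# Hypothetical, tiny form -> lemma mapping for illustration only.
# A real lemma list has many thousands of entries.
LEMMA_MAP = {
    "runs": "run", "ran": "run", "running": "run",
    "runners": "runner",
    "is": "be", "was": "be", "am": "be", "are": "be",
}

def count_lemmas(text: str) -> Counter:
    """Count lemma frequencies; forms missing from the map count as their own lemma."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(LEMMA_MAP.get(tok, tok) for tok in tokens)

sample = "The runner ran. The runners are running, and one runner runs."
counts = count_lemmas(sample)
print(len(counts), "distinct lemmas")
print(counts.most_common(3))
```

With this mapping, runner and run stay separate lemmas. If the map also sent runner and runners to run, the number of distinct lemmas would drop, which is the kind of choice that could shrink a total like 1,750 toward 1,333.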

Second, let’s look at counting the words in language. Lomas seems to do a straight count. That means one instance of one form of a lexeme is equal to all the other instances. For Lomas, it doesn’t matter how many times a word occurs. In corpus linguistics, however, frequency is a big deal. I’m not going to go through the theoretical points here, but basically if a word is more frequent then it is more important or worthy of being looked at (hehe, fight me, corpus linguists).

So, Lomas claims that only 39% of the lexemes in the article are “native English words”. I took the lexemes in the article and ranked them based on frequency (using AntConc). Then I went through the 100 most frequent lexemes on the list and looked at their etymology. My numbers look much different than Lomas’. I found that, counted by occurrence, 85% of the 100 most frequent lexemes are English in origin. That is, the 100 most frequent lexemes occur a total of 4,440 times in the article (so the lexeme the occurs 442 times, the lexeme of occurs 308 times, the lexeme BE occurs 300 times, and so on) and of these occurrences, 3,767 are English words. This isn’t particularly intriguing – you’ll probably find a similar percentage with any text in English. [See the bottom of this post for my data.]

Looking at this from another angle, we could treat each of the 100 most frequent lexemes as equal – forgetting about how often they occur. Then we find that 70 of them are English, while 30 of them come from another language. This is closer to Lomas’ numbers, but still pretty far off: 70 of the 100 most common lexemes in the article are still English words.
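The two ways of counting (by occurrence and by lexeme) can be sketched in a few lines. The counts for the, of and BE are the ones quoted in this post. The other two rows, including their English/non-English labels, are hypothetical placeholders and not taken from the spreadsheet. The `shares` helper is also just an illustrative name.

```python
# Each row: (lexeme, occurrences, is_english).
rows = [
    ("the", 442, True),          # count quoted in the post
    ("of", 308, True),           # count quoted in the post
    ("be", 300, True),           # count quoted in the post
    ("psychology", 60, False),   # hypothetical count and label
    ("positive", 50, False),     # hypothetical count and label
]

def shares(rows):
    """Return (share of occurrences, share of lexemes) that are English."""
    total_tokens = sum(n for _, n, _ in rows)
    english_tokens = sum(n for _, n, eng in rows if eng)
    english_types = sum(1 for _, _, eng in rows if eng)
    return english_tokens / total_tokens, english_types / len(rows)

token_share, type_share = shares(rows)
print(f"by occurrence: {token_share:.1%}, by lexeme: {type_share:.1%}")

# The totals reported in the post for the full top-100 list:
print(f"{3767 / 4440:.1%} of occurrences, {70 / 100:.0%} of lexemes")
```

Frequent function words are almost all English, so the share by occurrence will usually come out higher than the share by lexeme.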

Of course, words in language do not really occur in the way that we’re looking at them. The most common word is the with 442 instances, but the first 442 words of the article are not all the. The word the is sprinkled around the article (you know, where the grammar of English calls for it). I’m not sure how to get to Lomas’ numbers. We could assume that every word outside the 100 most frequent lexemes was non-English, but that only gets us down to 46% of the words in the article (3,767 of 8,124) as being English lexemes. Lomas’ ratio was 40% English to 60% non-English.

Later in the article, Lomas says that 234 words were treated as English in origin in his analysis. But this means that only about 18% of the words in his counting are English in origin (234/1,333≈0.176). What’s going on here? If 39.4% of the lexemes in the article are English in origin, and there are 1,333 total lexemes in the article (according to Lomas), then there should be 525 English words. Where he gets 234, I don’t know. Let’s move on.
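For anyone who wants to check the arithmetic, here is a quick sketch using only figures quoted in this post. The variable names are mine, and the script says nothing about how Lomas did his own counting.

```python
TOTAL_WORDS = 8124       # words in the article, References excluded
LOMAS_LEXEMES = 1333     # Lomas' count of distinct lexemes
ANTCONC_LEMMAS = 1750    # lemma count from the Someya list in AntConc
TOP100_ENGLISH = 3767    # English occurrences among the 100 most frequent lexemes
LOMAS_NATIVE_SHARE = 0.394
LOMAS_ENGLISH_WORDS = 234

# Difference between the two lexeme counts
lemma_gap = ANTCONC_LEMMAS - LOMAS_LEXEMES

# English share if every word outside the top 100 lexemes were non-English
floor_share = TOP100_ENGLISH / TOTAL_WORDS

# How many English lexemes 39.4% of 1,333 would be
expected_english = LOMAS_NATIVE_SHARE * LOMAS_LEXEMES

# What share 234 words is of 1,333 lexemes
share_234 = LOMAS_ENGLISH_WORDS / LOMAS_LEXEMES

print(f"gap: {lemma_gap} lemmas")
print(f"floor on English share: {floor_share:.1%}")
print(f"expected English lexemes: {expected_english:.0f}")
print(f"234 out of 1,333: {share_234:.1%}")
```

The last two lines are the mismatch in question: 39.4% of 1,333 lexemes is around 525, while 234 is well under a fifth of 1,333.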

Lomas includes two graphs to visualize his findings but they’re pretty weird. The graph below “shows the influx of words according to the language of origin (with the century in which they entered English as stacks within them)”. Look at the third column.

[Image: Lomas_PT_graph_1]

English words entered English? I don’t get it. Or Germanic words from before the 12th century are not English words? What’s going on here? I guess in Lomas’ counting, Germanic and English lexemes are English lexemes, but then he splits them up in the graph? Are the words me, myself and I not English words? It seems very strange to me to cut things up like this and I would like to see his list of etymologies, or his rationale for doing so.

Agree to disagree?

But there are places that I can agree with Lomas. At the end of the article, he writes:

In these ways does our understanding of life become complexified and enriched. In that respect, one can make the case that English-speaking psychology would do well to more consciously and actively engage with other languages and cultures. Its understanding of the mind has benefited greatly from English incorporating loanwords over the centuries. If one accepts that premise, it follows that psychology would continue to develop from this kind of cross-cultural engagement and borrowing – including, of course, through collaboration with scholars from non-English speaking cultures themselves. One such way in which the field might develop is through inquiring into untranslatable words, since these constitute clear candidates for borrowing (given that they lack an exact equivalent in English). I myself have sought to promote this kind of endeavor, with my ongoing creation of a cross-cultural lexicography of untranslatable words relating to well-being.

I definitely agree with the first part of this. We should engage with speakers of other languages and people from other cultures (although Lomas’ wording seems to present all English speakers as a monolithic culture). I find it hard to see how anyone could reject the premise that English (not just “English-speaking psychology”) has benefited greatly from incorporating loanwords. That’s kind of just a fact of language – borrowing words is one of the things that living languages do, and that is partly why English is still a living language. But I totally agree that people should collaborate with people from different cultures (although again, Lomas’ wording blurs the distinction between language and culture too much for me and again presents English speakers as one culture).

When Lomas goes into the sales pitch in the second to last sentence, I can’t sign on, particularly based on what I’ve seen of his research into “untranslatable” words (in my last post and in this one and in a later one to come).

Lomas’ claims are true – we should reach out to people who speak other languages. But he should perhaps recognize that the reason English has so many words from Latin and Ancient Greek is that these were once prestigious languages (and to a large extent still are in academia). It wasn’t because the Latin-speaking or Greek-speaking cultures had anything more special than other cultures; rather, it was believed that using these languages would make people more civilized. Of course, we know what happened to the Latin-speaking and (Ancient) Greek-speaking cultures. They dead.

We in English-speaking cultures could just as easily have adopted Finnish words to use in the fields of psychology and linguistics, but Finnish was never considered a prestigious language. Or consider German: once German raised its standing, we got words from German to describe abstract concepts because the texts describing them were written in German and people were supposed to know German to engage in the debate.

There’s more to say about all this and I’ll be back at cha with a later post. I’ll link to it when I write it.

Data

Spreadsheet with my analysis. The first sheet is the Someya lemma list analysis. I counted words from Anglo-Norman as not being English. I’m including the 3rd person plural pronouns (they, them, their, themselves) as being English. Illness counts as English. The second sheet uses AntConc’s Word List tool, so it’s not a lexeme/lemma analysis, it treats every “word” as separate (that is, was, am, and is are separate words, not part of the lexeme BE).

Link to download the C&S article as a plain text file (.txt) which was used with AntConc in the analysis. The References section is excluded. And here’s a link to download a POS-tagged version of the article (using CLAWS7).

Thoughts on “untranslatable” words

June 1, 2019 Joe McVeigh

There’s an article in the New Yorker about a glossary of “untranslatable” words. The glossary is put together by Tim Lomas, a psychologist who got interested in the idea of untranslatable words after hearing a talk about the Finnish word sisu. Of course, “untranslatable” doesn’t mean what it looks like it means, as I was quick to point out on Twitter:

https://twitter.com/EvilJoeMcVeigh/status/1133392037892567042

So we can clearly translate these words. There just may not be a 1:1 translation for each of them. But as anyone who has ever done any translating will tell you, that’s so obvious that it barely needs mentioning. But there’s something else behind this idea and I want to open it up a little bit.

In layman’s terms

Alas, the folk linguistic meaning of 'untranslatable' (i.e. 'not lexicalised') is different from the linguistic. It is a fact that we will have to accept…

— Marten vd Meulen (@MartenvdMeulen) May 28, 2019

Marten van der Meulen pointed out on Twitter that Lomas and the New Yorker mean something different with “untranslatable” than a linguist or translator would. What they mean is that there’s no equivalent single word in other languages (usually English) which means the same thing that the “untranslatable” word does. So there’s no way we can “translate” the Finnish word sisu into English because it means many things and it is uniquely tied up into Finnish culture and identity (we’ll get to that in a second). Instead, the meaning of sisu is context dependent – sometimes it means perseverance, sometimes it means grit, sometimes it means “the ability to grin and bear it” – but it is a Finnish version of all these things.

This is why linguists would probably scoff at the idea that we can’t translate sisu. All language is context dependent. The word grit means different things when it’s used in a Clint Eastwood movie than when it’s used in a boardroom. Language is context. Or meaning depends on context.

I actually use the word sisu in my Semantics class as an exercise to understand connotation, denotation and meaning. My students, who are almost all L1 Finnish speakers, give me examples of what sisu means to them. Then we talk about the core meanings of sisu and some peripheral meanings. That is, there is a list of ideas that most people would agree fits the definition of sisu. But that’s the thing – most people would agree, not all. You can do this with any concept in any language (Probably. Don’t quote me on that). Ask a few people what grit means and see how many different answers you get. But we can approach an agreed upon definition of what sisu includes. When we start to put the word in context, then the meaning starts to shift. The classroom exercise is fun because sisu is a popular word in these kinds of discussions and Finns are ready to talk about it. They see it as something very Finnish (more on that in a bit).

I think Marten is right, though. “Untranslatable” does have a different meaning for Lomas and the New Yorker. I would argue that linguists and translators probably wouldn’t use the term untranslatable, but it’s nothing new for the public to have a different definition of a word than specialists. To many people, the word grammar means punctuation and spelling. To language specialists, however, grammar means morphology and syntax; punctuation and spelling are in the realm of orthography. I like Marten’s notion of specialists understanding that “untranslatable” means something different to non-linguists and non-translators, and I think it’s something we should keep in mind. And I agree that the definition of “untranslatable” for Lomas and the New Yorker is “not lexicalized” or “there’s no single word for it”.

Translating morphology

Speaking of morphology, many “untranslatable” words are “words” because of the morphology and spelling norms of the language. For example, another popular “untranslatable” word from Finnish is kalsarikännit. It means “getting drunk at home in your underwear, with no intention of going out”.

[Image of Homer Simpson: kalsaritkannit]

The word is a compound noun formed from kalsarit “underwear” and känni “drunk”. The Finnish writing system requires that kalsarikännit is written as one “word” – that is, without a space in between the two words which form the compound noun. This is not a particularly interesting thing about the Finnish language – it just does things like that. English sometimes does that too, such as in the word bedroom, but also sometimes does not, such as in the very similar two-word term living room. We could easily have the term “underwear drunk” or “underwear drinking” or even the word “underweardrunk” or “boxersdrunk” in English. And indeed, as the image of Homer Simpson shows, English speakers have a notion of what underweardrunk is. On the flip side, English doesn’t have a “word” for couch potato like Finnish does (sohvaperuna, literally “sofa+potato”), but that’s due to the writing system, not some cultural notions that Finnish speakers have but English speakers do not. Finnish would not seem to have a word for nothing. Instead the two-word phrases ei mitään and ei mikään are required in certain cases. This is a case where English orthography has merged no+thing into one “word” while Finnish has not.

I wonder how many of the words on Lomas’ list are compound nouns, or words which are one “word” because of the writing systems of the language that they come from. We could sort of say that they were invented because speakers saw a need for a term to describe the concept or action, but that hardly makes them “untranslatable”. Rather, if speakers of another language were doing a similar thing, they could easily coin their own “word” for it. Or they could translate the word, as in the case of Finnish speakers taking couch potato and translating both words to Finnish to get sohvaperuna (these kinds of words are called calques). Or speakers could simply borrow/steal the word for the concept or action, as in the case of schadenfreude, an idea that English speakers immediately understand but don’t have a “word” for.

[Image: nelson_muntz_schadenfreude. Caption: English has a word for schadenfreude. It’s schadenfreude.]

Identity and what’s on these lists

So which words are good enough for these kinds of lists? That would be a very interesting research topic – and in an alternate universe, Marten and I are working on that question right now. Sticking with Finnish, the language has the word jääkiekko. It refers to the sport played on ice where players use sticks to try to push a small rubber disc into the net or goal of the opposing team. English doesn’t really have a word for it. The closest term is ice hockey. Does this mean that Finnish speakers somehow understand the sport of ice hockey better than English speakers? If so, I think the English speakers in Canada would like to have a word with you. (This idea is very timely since the one-word-having Finns just won the Ice Hockey World Championships. And they beat Canada in the finals. #mörkö). The thing is jääkiekko isn’t sexy enough to make these kinds of lists. French speakers don’t have a “word” for please and instead use the phrase s’il vous plaît (In certain cases? Correct me in the comments if I’m wrong!), whereas Finnish speakers don’t have a “word” for please because they either attach –isi to the verb or use ole hyvä or they use the word which also means “thank you” (kiitos). But you’re unlikely to see please on these lists. And if we want to get really boring, we can talk about how other languages don’t have a “word” for the and a and an. But these aren’t sexy enough either. Only linguists check out language for the articles. (Seriously, though, click that link. It’s a hilarious satire of these lists.)

Instead, what we’re likely to see are words that somehow fit into an identity-shaping role of the speakers. If we’re egalitarian, the words are chosen by the speakers in order to shape and control the collective identity of what it means to be a speaker of a certain language. That is, Finns put sisu on the “untranslatable” word lists because Finns generally see sisu as a positive thing and it helps to create the identity of Finnish speakers – they have perseverance and grit, in the way that British English speakers have a stiff upper lip (but go ask 10 Brits what “stiff upper lip” means and whether it’s positive). Finnish speakers can put kalsarikännit on the list because the idea of lying around drinking beer in your underwear is silly and fun (until it’s not, of course).

Yeah, identity makes sense, words that confirm a kind of perception of Other. I mean, no Dutchmen would care if some Papoean language has a specific word for 'dyke', because that's too normal. You need things that tie in with our pre-existing perception of certain cultures.

— Marten vd Meulen (@MartenvdMeulen) May 28, 2019

These words then help shape the identity of speakers for those who do not know the language that they come from. That is, learning about sisu helps shape English speakers’ perception of Finnish people. This is where we cross over from language to culture. The word sisu doesn’t shape our perception of Finnish speakers but rather Finnish people. I can speak Finnish (kinda sorta), but sisu doesn’t apply to me because I’m not Finnish – my parents weren’t Finnish and I wasn’t born and raised in Finland. If we’re less egalitarian with this idea, then the words that get put on the list have to fit our ideas or stereotypes of the speakers of the language. This perpetuates the myths about Inuit people having 50 words for snow. The language isn’t usually mentioned, it’s just “those natives in northern-ish Canada have a bunch of words for snow because they live in igloos”. That’s what’s behind that myth, so stop using it. Or the idea that the Chinese word for “crisis” is composed of “danger” and “opportunity”. This is incorrect, but it fits the stereotype in the West that people in the East are somehow smart and cunning and they can easily use these traits to their advantage, especially in the business and political world, which is where this language myth lives and thrives.

Untranslatable words from other languages?

I want to end on an idea that doesn’t usually get brought up in these discussions. Languages borrow words from other languages all the time. It would seem that English is especially guilty of this, but any time you get people who speak different languages living in close proximity to each other, you’re going to have language transfer. People are going to trade words and sometimes even grammar. But when a word transfers from one language to another, the meaning and connotations don’t always come along. And as it gets used in the “new” language, it can acquire other meanings. Consider what John Waters said in a recent interview with Terry Gross on NPR’s Fresh Air:

GROSS: This fits into something else you write, which is, I realize now how hard it must’ve been for my parents to understand my early eccentricities. So in addition to your terror at seeing hammers, what were some of your eccentricities when you were really young?

WATERS: Well, I was obsessed by car accidents. And I played car accidents. And my mother would take me to junkyards and walk around with me. And I’d be like, oh, there’s been a terrible one over here. Look at this.

GROSS: (Laughter).

WATERS: And I think, what did the junk man think? Well, what is this little ghoul? So that kind of thing.

The word ghoul comes from Arabic. It’s first attested in English in 1786 (according to the OED). It originally referred to an evil spirit that robbed graves, but later it came to mean a person “who shows morbid interest in things considered shocking or repulsive” (MW). Here Waters applies it to himself when he was a young child. Do Arabic speakers use ghoul this way (and I’m not even bringing up the fact that there are vast differences between local varieties of Arabic)? If not, can we say that the word ghoul is “untranslatable” from English to Arabic?

What I mean to say is that, again, language and meaning are context-dependent. And they are also dependent on time. If English adopted the word sisu from Finnish, it wouldn’t really mean sisu in the same way it does for Finnish speakers who were raised in Finnish culture (the same way that English “sauna” doesn’t really mean Finnish “sauna”). It would mean something slightly different. And in time it could mean something totally different.

So those are just a few of the thoughts I had on this topic. I’ll try to get my hands on Lomas’ books to have a deeper look at what he means by “untranslatable”. And I’ll take a look at his list.

Online etiquette in a different language

December 30, 2018 Joe McVeigh

Here’s a very interesting article by Aviya Kushner about translation and internet language conventions. It talks about formality in language and how tricky that can be when moving from writing to spoken interactions or vice versa, as well as how quickly formalities fall away in emails and texting. And it does a great job explaining the politeness required (or expected) in different mediums. Check it out:

“Why Online Etiquette is an International Conundrum”

(h/t to @LangPol_JER)
