Three months ago I posted about a paper in PLoS ONE called “Liberals lecture, conservatives communicate: Analyzing complexity and ideology in 381,609 political speeches”. I noted that there were serious problems with that study. Here’s the tl;dr of what has happened since.
After I posted on here, I also commented on the article itself with my concerns. PLoS ONE allows comments on its articles, but I’ll admit that my first comment was neither appropriate nor helpful; it was more of a troll than anything. The editors removed my comment and, to their credit, they emailed me to explain why and to tell me what a comment should look like. So I posted a grown-up comment on the article, which started an exchange between me and the authors. Here’s the skinny:
1. The authors confuse written language with spoken language
2. The study applies a test designed for written language to spoken language
3. The paper does not take into account how transcription and punctuation choices affect the data (see the sketch just after this list)
4. The authors cite almost no linguistic sources in a study about language
5. They apply a test developed for English to other languages
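To make the punctuation point (number 3) concrete, here is a minimal sketch. The paper’s exact formula isn’t named here, so a Flesch–Kincaid-style measure from the textstat package stands in as an example, and the sample sentence is made up. The point is only that formulas of this family lean on sentence counts, so the transcriber’s choice of where to put periods moves the score:

```python
# Hypothetical illustration: the same spoken words, punctuated two ways
# by a transcriber, get different readability scores purely because
# readability formulas divide by the number of sentences.
# Assumes: pip install textstat
import textstat

# One transcriber hears a single run-on sentence:
one_sentence = ("We need jobs we need schools we need roads "
                "and we need them now not next year.")

# Another hears four short sentences:
four_sentences = ("We need jobs. We need schools. We need roads. "
                  "And we need them now, not next year.")

# Same words, same speaker; only the punctuation differs.
print(textstat.flesch_kincaid_grade(one_sentence))    # higher grade level
print(textstat.flesch_kincaid_grade(four_sentences))  # lower grade level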
The authors tried to respond to my points about their methodology, but there are some things that they just couldn’t argue their way out of (such as points 1, 2, 3 and 5 above).
Behind the scenes, I was talking with the editors of the journal. They told me that they were taking my criticisms seriously and looking into the issue themselves. In my comments on the paper, I provided multiple sources to back up my claims. The authors did not do the same in their replies to me, but that’s because they can’t – there aren’t studies to back up their claims. However, my last exchange with the editors of the journal was over a month ago. I understand that these things can take time (and the editors told me as much), but a few of the criticisms that I raised are pretty cut and dried. The authors have also stopped replying to my comments, the last of which was posted on April 9, 2019 (can’t say I blame them, though).
So I’m not very optimistic that anything is going to change. But I’ll let you know if it does.
It is now October 2019, so I assume you are never going to hear anything. I do machine learning research (specifically on classifying reviews as fake or not) and have used a multitude of readability formulas to extract features from text. They are useful, but only up to a point, and the limitations become apparent once you get into how they are calculated and really understand how and why they were made.

I have not read the article yet, but your points make obvious sense. I am surprised the study was so sloppy: characteristics of a sentence’s structure are easily calculated (via dependency and constituency parsing), and those certainly are a measure of a text’s (or a speech’s) cognitive complexity. I’d have picked something more sophisticated than a single readability measure (and why not use multiple ones? That would probably have revealed inconsistent results). It sounds like the researchers did not want to put in the effort to really understand linguistics, so they just picked “something” that hadn’t been done before. OTOH, this study is at least more proof that conservatives tend not to be as smart as liberals…. (lots of studies are available on that point)
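To make those two suggestions concrete, here is a minimal sketch in Python, assuming the textstat and spaCy packages (with the en_core_web_sm model) are installed. The passage and the particular measures are illustrative choices, not what the paper actually did:

```python
# Sketch of the two ideas above: (1) score one passage with several
# readability formulas, which may disagree with each other, and
# (2) measure syntactic complexity via dependency-parse depth,
# something word/syllable-based formulas ignore entirely.
# Assumes: pip install textstat spacy
#          python -m spacy download en_core_web_sm
import textstat
import spacy

text = ("My fellow citizens, the challenges we face today demand both "
        "courage and patience. We will meet them together, as we always have.")

# (1) Several readability formulas on the same text.
scores = {
    "Flesch reading ease": textstat.flesch_reading_ease(text),
    "Flesch-Kincaid grade": textstat.flesch_kincaid_grade(text),
    "Gunning fog": textstat.gunning_fog(text),
    "SMOG": textstat.smog_index(text),
    "Coleman-Liau": textstat.coleman_liau_index(text),
    "ARI": textstat.automated_readability_index(text),
}
for name, score in scores.items():
    print(f"{name:22s} {score:6.1f}")

# (2) Dependency-parse depth: a crude structural measure of complexity.
nlp = spacy.load("en_core_web_sm")

def depth(token):
    """Height of the dependency subtree rooted at `token`."""
    return 1 + max((depth(child) for child in token.children), default=0)

doc = nlp(text)
for sent in doc.sents:
    print(f"depth={depth(sent.root):2d}  {sent.text}")
```

If the formulas in part (1) ranked speeches inconsistently, that alone would be a red flag for hanging conclusions on any single one of them; part (2) is one example of the kind of structural measure the comment has in mind, not a claim that parse depth is the right complexity metric either.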