
The Return of Flesch-Kincaid

My recent post on Flesch-Kincaid reading statistics generated a lot of discussion and comment in the so-called blogosphere as well as in my own blog. I promised a follow-up, so here goes.

The first point I want to make has to do with misperceptions out there. A lot of people seem to think that it was I who advocated using the Flesch-Kincaid statistics, or, even worse, advocated relying solely on the statistics, above anything else, to determine how "good" a piece of writing is. This is not the case. As I said, I found the recommendation in a book by James V. Smith, Jr., and it's his ideal writing standard that I cited, not something I made up myself. Furthermore, most of what I was trying to do was to get a debate started and to get people to think about their writing.

That seems to have happened.

I'm also approaching this as a scientist, as my background and training are in science. Smith did an experiment with bestselling fiction, and came up with what look like fairly consistent results. I applied the same tool to a selection from Neil Gaiman's next book, which I expect to be a bestseller, and came up with similar results. But there's still a lot of experimenting to be done before anyone can claim that these results are significant, and that was the point I was trying to make. Smith didn't test nonbestsellers, for example. To do an accurate scientific experiment on the Flesch-Kincaid scale would require analyzing a lot more writing and observing the results.

I also should stress that I don't know much about Flesch-Kincaid other than what I read in Smith's book, and some of it does seem odd to me. For example, the formula they use seems to create a definite inverse relationship between grade level and readability. By their definition, the more readable a piece of writing is, the lower its grade level. And I think most of us would agree that there are some very readable works out there which deal with themes that are only appropriate at higher grade levels.
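For the curious, the published Flesch formulas make that inverse relationship explicit: both scores are built from the same two ratios, words per sentence and syllables per word, with opposite signs. A minimal sketch follows; the syllable counter is a rough vowel-group heuristic of my own, not whatever Word actually uses, so exact numbers will differ from Word's.

```python
import re

def count_syllables(word):
    # Rough heuristic: count runs of consecutive vowels, with a
    # common adjustment for a silent trailing "e".
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1 and not word.endswith(("le", "ee")):
        n -= 1
    return max(n, 1)

def flesch_scores(text):
    # Count sentences by terminal punctuation; treat at least one sentence.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # words per sentence
    spw = syllables / len(words)   # syllables per word
    # Flesch Reading Ease: higher = more readable.
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    # Flesch-Kincaid Grade Level: lower = more readable.
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade
```

Because the same two ratios appear in both formulas with flipped signs, anything that raises the reading-ease score necessarily lowers the grade level, which is exactly the inverse relationship described above.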

But I'm getting ahead of myself. Last weekend, the World Science Fiction Convention released the Hugo ballot for 2005, and I was very gratified to find two of my stories nominated in two different categories. I decided to apply Flesch-Kincaid to the nominees in Best Short Story and see what I got. Here are the results:

The Best Christmas Ever by James Patrick Kelly
Characters per word: 4.3
Passive voice: 1%
Readability: 84.8
Grade level: 3.6

Decisions by Michael A. Burstein
Characters per word: 4.3
Passive voice: 0%
Readability: 81.7
Grade level: 4.0

A Princess of Earth by Mike Resnick
Characters per word: 4.1
Passive voice: 0%
Readability: 89.5
Grade level: 2.0

Shed Skin by Robert J. Sawyer
Characters per word: 4.4
Passive voice: 2%
Readability: 76.2
Grade level: 5.3

Travels With My Cats by Mike Resnick
Characters per word: 4.1
Passive voice: 0%
Readability: 88.2
Grade level: 2.2

(By the way, the four non-Burstein stories on the ballot are all excellent, and I highly recommend tracking them down and reading them.)

So what do these results tell us? All these stories fall within or very close to Smith's ideal writing standard. Should we assume that to write a Hugo nominee, one should use the Flesch-Kincaid statistics and make sure one's work falls within Smith's recommendations? Should we also assume that the four of us on the ballot are deliberately writing for an audience of elementary school kids? Or that we're deliberately writing down to our readers?

Well, I'd say that the answers to all of these questions are no. First of all, my analysis completely omits all the other stories out there that didn't make it onto the ballot. I suspect that if you were to run the analysis on any random published set of stories, say an entire issue of Analog, Asimov's, or F&SF, or one month's worth of stories from Sci Fiction, you'd find that most of them fall within the standard. And as for the last two questions, I can't speak for all of the writers, but I strongly doubt that Jim, Mike, or Rob sit down at their desks intent on writing to a child audience. I think the readability score is more significant than the so-called grade level. (I do know that Mike preaches the virtues of readability, but had never heard of Flesch-Kincaid until I pointed these statistics out to him. I haven't yet asked Rob or Jim what they think of all this.)

And now for the final point. As I said, other people have been debating this post, and even running other works through the analysis to see what they get. One of these people is writer Stephen Leigh, who maintains a LiveJournal at http://www.livejournal.com/users/sleigh. Leigh came to find my post through Tobias S. Buckell's blog and commented on it himself at http://www.livejournal.com/users/sleigh/78967.html. Well, he decided to run all of Shakespeare's sonnets through the analysis and invited me to do the same, and the results were as follows:

Shakespeare's sonnets:
Passive voice: 0% (we're okay there)
Characters per word: 4.2 (reasonable)
Readability: 92.8 (Hmmm...)
Grade level: 1.4 (What?!?)


Leigh concludes: "What this says to me is that the Flesch-Kincaid stats checker in MS Word is not only useless, it's utter and complete GARBAGE... I think (but can't prove) that what it does is kick out or ignore any words or sentences that the grammar/spell checker has deemed to be 'incorrect.' Thus, it's booting all the Elizabethan usage and looking only at the few words left. I'm sorry, but the sonnets are not understandable by someone reading at less than a second grade level."

I don't think I would go so far as Leigh to dismiss the statistics as "utter and complete garbage," but I think he's made a very good point. The statistics should not be anyone's be-all and end-all of what makes good writing.

However, I still feel that they can be a useful tool in analyzing one's work. And I continue to invite people to add to the debate. If you've got a piece of writing out there that you think will disprove Flesch-Kincaid, by all means, run the statistics and report on your results. I'm wondering where all this will finally lead.


I remember that the book I finished at the beginning of last year came in at a 7th or 8th grade reading level. I keep forgetting to test the one I just finished; I'm especially curious about that one since it's specifically written to be a YA novel.

the Flesch-Kincaid stats checker in MS Word is not only useless, it’s utter and complete GARBAGE

i believe mr. leigh may have hit upon the nub, as it were, of the problem. such results scream “implementation error” at me.

here’s another tool for performing Flesch-Kincaid analysis of text, along with background documentation and source code. if i weren’t at the office now, i’d be inclined to spend more time feeding some texts through this tool to see if the results are more reasonable.


Well, running the Shakespearean sonnets through the tool listed above gets these results:

Readability Scores
The text you entered has been checked, and scored as follows:

Flesch-Kincaid Reading Ease: 44
Ideally, web page text should be around the 60 to 80 mark on this scale. The higher the score, the more readable the text.

Flesch-Kincaid Grade Level: 13
Ideally, web page text should be around the 6 to 7 mark on this scale. The lower the score, the more readable the text.

Gunning-Fog Index: 19
Ideally, web page text should be between 11 and 15 on this scale. The lower the score, the more readable the text. (Anything over 22 should be considered the equivalent of post-graduate level text).

That does suggest MS Word has botched the implementation (also, that no one has complained because no one uses it).
It is possible (as Toby Buckell pointed out to me) that the reason the F-K stats do strange things to Shakespeare is that it's poetry, not prose. I think that unlikely, since the sonnets are still in sentence structure except for line breaks. Someone with more time, energy, and inclination than me might try taking out the line breaks in the sonnets and running them again to see if the F-K engine parses it differently...

I think it's far more likely that the software is designed to ignore anything it deems "incorrect" and only evaluate what's left -- and after it tosses all the Elizabethan spellings and structure, there isn't much left. If that suspicion is true, though, then the F-K software will also do some pretty strange stuff to science fiction and fantasy, with all the unusual terminology we tend to use.

Bottom line -- I think that the F-K stats are something we can all safely ignore. I certainly intend to do so. My advice: just write the best damn story you can and let it fall where it may... :-)
By their definition, the more readable a piece of writing is, the lower its grade level. And I think most of us would agree that there are some very readable works out there which deal with themes that are only appropriate at higher grade levels.

As far as I'm aware, the grade level measurement doesn't deal with theme at all. It addresses only form -- how many words in a sentence, how many syllables in a word, how many clauses, how many sentences in a paragraph, etc. The idea is not whether it's appropriate, or even emotionally comprehensible, to someone at that grade level, but merely how well they could take in the literal meaning of what it says.
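That point about form can be made concrete with the published grade-level formula. The two sentences and hand-tallied counts below are my own illustration, not drawn from any of the works discussed:

```python
def fk_grade(words, sentences, syllables):
    # Flesch-Kincaid grade level, computed directly from the counts.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# "He died. She lied. The court did not care." -- grim theme, simple form:
# 9 one-syllable words across 3 sentences.
grim = fk_grade(words=9, sentences=3, syllables=9)

# "Fluffy bunnies frolicked merrily underneath enormous oak trees." --
# light theme, heavier form: 8 words, 17 syllables, 1 sentence.
bunnies = fk_grade(words=8, sentences=1, syllables=17)
```

The grim sentence scores far lower (in fact, below grade zero) than the bunny sentence, even though its subject matter is the more adult of the two: the formula sees only word and syllable counts.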

I don't know all the metrics they use, but I think, if one were so inclined, one could write a story on the themes of death, sex, betrayal, and constitutional law that would score low and a story about fluffy bunnies playing in the forest that would score quite high.

Of course, I also think some things which are quite complex in form are readable in practice, because they go together so smoothly, and some things which are simple are also stilted. Readability is not perfectly reduced to a series of quantitative measurements, but as a first approximation it does well enough.

All other things being equal, it makes sense to me that a transparent storytelling prose which doesn't draw attention to itself is most likely to be bestselling, since it doesn't jar anyone out of their suspension of disbelief, discourage anyone who finds it too difficult, dense, or distracting, or otherwise get in the way of the story.

The more games you play with your language, the more you restrict your audience to those who have the patience and ability to appreciate those games. But you also add more layers of pleasure and nuance for those that do. So yes, go further up the grade level scale and you're likely to lose folks.

At some point you even lose me -- I can't for the life of me make it through the New Sun books, not because they're not good, but because it feels like Wallace Stevens wrote War and Peace. By the time I've parsed the dense lyrical description to figure out what's actually, mundanely happening, I've forgotten what went before.

That said, and with all honor to Gaiman, Bujold, and all the other writers of pellucidly clear and deceptively simple prose among us, who hide their nuance in plain sight -- there's something to be said for attempting the complicated sometimes too. I'd hate to see it become too much accepted as a truism that spare and straightforward prose is better, rather than just broader in its appeal.
a story about fluffy bunnies playing in the forest that would score quite high

I think that one was called WATERSHIP DOWN. :-)
All I can think of is a quote from a Battlestar Galactica episode:

"These days, everyone passes"

In addition, before I read of the Shakespeare results, I'd have said that it wasn't the subject matter that the grade level was reporting. How could a program parse paragraphs for subject matter accurately? Even search engines can't do that well. I'd have guessed readability described how long sentences tended to be, while grade level would be how many syllables the average word contained. So sonnets, containing short sentences which in turn contain words with a small number of syllables, read high on both meters. No pun intended.

That's my opinion anyway.
Googling for "Flesch-Kincaid validity" turned up this interesting piece.

Name your target...

I think, in the main, it depends on what one is writing.

If I were writing my thesis on, say, bubble growth and expansion, I'd certainly want it to score high on the grade level. Indeed, I suspect modern profs might occasionally run theses through just to check. (Back when I was writing a thesis, IBM Selectrics were state-of-the-art.) But, unless I didn't know the subject matter well and was trying to hide that fact, I'd still aim for clear, direct prose--no passive sentences in science!

If I were writing my memoirs, on the other hand, I'd probably be a lot more passive, and aim for a lower grade level so that even my youngest descendants might grasp the (un)importance of my life.

It would be an interesting experiment to run through some of my old APA entries, versus, say, my recent trip report. I'll see if I have time.
