Letter and character frequencies of Faulkner and Hemingway
Alec Jacobson
April 06, 2010
As a freshman at college I took a class called Randomness and Chaos, taught by Mark Nelkin. During the section on power-law probability distributions, I remember becoming obsessed with trying to find these in nature. Later, when I had learned only a little Java programming, I wrote an (albeit horribly inefficient) character frequency counter program that I ran over plain-text versions of William Faulkner's The Sound and the Fury and Ernest Hemingway's The Old Man and the Sea. I made some charts with the intention of adding them to the letter frequency Wikipedia article, but the wikimilitia users removed them on the grounds that they were "original research". Hardly, I thought. Hardly more than snapping a picture of John Kerry holding a baby is original research.
Anyway, I repost them here so at least I know where to find them, and because I think they are an interesting seed for a discussion of recognizing authorship from the frequencies of certain features in an author's writing (probably not of characters). Also, it is nice to examine these distributions "in nature".
Latin letter frequency in The Old Man and the Sea
Latin letter frequency in The Sound and the Fury
Latin letter frequency in English, from Wikipedia
Character frequency in The Old Man and the Sea
Character frequency in The Sound and the Fury
The character frequencies look much more like a power-law distribution than the letter frequencies do, mostly because of the very common space character at one end and the uncommon punctuation marks and digits at the other.
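For anyone who wants to produce counts like these, here is a minimal sketch of a character frequency counter in Java. It is not the original program (which is not reproduced here); the class name `CharFrequency` and the methods `count` and `ranked` are hypothetical names chosen for illustration. It assumes the whole text is already in a `String`, and that "Latin letters" means the 26 letters a to z with case folded together.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the program used for the charts above.
final class CharFrequency {

    // Counts occurrences of each character in the text.
    // With lettersOnly set, characters are lowercased and anything
    // outside a-z (spaces, punctuation, digits) is skipped.
    static Map<Character, Integer> count(String text, boolean lettersOnly) {
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (lettersOnly) {
                c = Character.toLowerCase(c);
                if (c < 'a' || c > 'z') {
                    continue;
                }
            }
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the entries sorted from most to least frequent.
    // Ties are broken by character so the order is deterministic.
    static List<Map.Entry<Character, Integer>> ranked(Map<Character, Integer> counts) {
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Character.compare(a.getKey(), b.getKey());
        });
        return entries;
    }
}
```

Calling `CharFrequency.ranked(CharFrequency.count(text, true))` gives the letter ranking, and passing `false` gives the ranking over all characters. To eyeball a power law, plot the count against the rank on log-log axes; a distribution that follows a power law falls roughly on a straight line.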