Monday, August 6, 2012
Ever wonder how autocorrect works? When you type "Kofee," how does it decide whether you meant the caffeinated drink or the former United Nations secretary-general? Like a fine wine, does autocorrect improve with age as it gets to know the user? How do you make it stop? Check out this interesting and humorous essay from Sunday's Op-Ed page, and then go to DamnYouAutoCorrect.com to find an archive of the best autocorrect gaffes, based on visitor submissions.
[Autocorrect is] [a]ctually . . . an assortment of competing algorithms . . . . Autocorrect is not a single entity but a hodgepodge, from different vendors, chief among them Apple, Google and Microsoft. All their algorithms start with the low-hanging fruit. They know what to do when you type “hte.” After that, their goals vary, and so do their capabilities. On most devices and applications, Autocorrect can be switched off, for those who prefer to go naked. It’s not always easy to find the switch.
On mobile phones, where our elephant thumbs tramp across tiny keypads, the idea is to free us from backtracking and drudgery. The iPhone’s Autocorrect function loves to insert apostrophes. You can rely on it: type “dont” and get “don’t.” Type “cant” and get “can’t” — but is that what you wanted? Autocorrect is just playing the odds. Even “ill” turns to “I’ll” and “id” to “I’d” (sorry, Dr. Freud).
When Autocorrect can reach out from the local device or computer to the cloud, the algorithms get much, much smarter. I consulted Mark Paskin, a longtime software engineer on Google’s search team. Where a mobile phone can check typing against a modest dictionary of words and corrections, Google uses no dictionary at all.
“A dictionary can be more of a liability than you might expect,” Mr. Paskin says. “Dictionaries have a lot of trouble keeping up with the real world, right?” Instead Google has access to a decent subset of all the words people type — “a constantly evolving list of words and phrases,” he says; “the parlance of our times.”
. . . .
It uses a probabilistic algorithm with roots in work done at AT&T Bell Laboratories in the early 1990s. The probabilities are based on a “noisy channel” model, a fundamental concept of information theory. The model envisions a message source — an idealized user with clear intentions — passing through a noisy channel that introduces typos by omitting letters, reversing letters or inserting letters.
“We’re trying to find the most likely intended word, given the word that we see . . . .”
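For the curious, here is a rough Python sketch of the "noisy channel" idea described in the essay. It is only an illustration, not Google's or anyone else's actual code: the tiny corpus, the ERROR_RATE value, the two-edit cutoff and every function name are assumptions made up for this example. The idea is to pick the word w that maximizes P(w) times P(typed | w), where P(w) comes from how often people type w, and P(typed | w) shrinks with each omitted, swapped, replaced or inserted letter.

```python
from collections import Counter

# Hypothetical toy corpus standing in for "the words people type".
CORPUS = "the coffee was hot the coffee shop the kofi annan speech the toffee"
WORD_COUNTS = Counter(CORPUS.split())
TOTAL = sum(WORD_COUNTS.values())

ERROR_RATE = 0.01  # assumed cost of each typo; not a measured value
MAX_EDITS = 2      # assumed cutoff: ignore words more than two typos away


def edit_distance(a, b):
    """Count the inserts, deletes, replacements and adjacent swaps
    needed to turn a into b (optimal string alignment distance)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a letter
                          d[i][j - 1] + 1,         # insert a letter
                          d[i - 1][j - 1] + cost)  # replace (or match)
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap letters
    return d[len(a)][len(b)]


def prior(word):
    """P(word): how often the word shows up in the corpus."""
    return WORD_COUNTS[word] / TOTAL


def channel(typed, intended):
    """Crude, unnormalized stand-in for P(typed | intended)."""
    distance = edit_distance(typed, intended)
    if distance > MAX_EDITS:
        return 0.0
    return ERROR_RATE ** distance


def correct(typed):
    """Return the most likely intended word, or the input if nothing fits."""
    typed = typed.lower()
    best = max(WORD_COUNTS, key=lambda w: prior(w) * channel(typed, w))
    if prior(best) * channel(typed, best) == 0.0:
        return typed
    return best


print(correct("hte"))
print(correct("Kofee"))
```

With this toy corpus, "coffee", "kofi" and "toffee" are all two typos away from "kofee", so the more frequently typed word wins. A real system would learn both sets of probabilities from far more data, which is why the cloud-based versions described above can do so much better than a fixed dictionary.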
Continue reading here.