Thursday, July 2, 2009

Cryptologist Cracks a Presidential Code

Robert Patterson

For more than 200 years, buried deep within Thomas Jefferson's correspondence and papers, there lay a mysterious cipher -- a coded message that appears to have remained unsolved. Until now.

The cryptic message was sent to President Jefferson in December 1801 by his friend and frequent correspondent, Robert Patterson, a mathematics professor at the University of Pennsylvania. President Jefferson and Mr. Patterson were both officials at the American Philosophical Society -- a group that promoted scholarly research in the sciences and humanities -- and were enthusiasts of ciphers and other codes, regularly exchanging letters about them.

The 1801 letter from Robert Patterson to Thomas Jefferson

In Mr. Patterson's view, a perfect code had four properties: It should be adaptable to all languages; it should be simple to learn and memorize; it should be easy to write and to read; and most important of all, "it should be absolutely inscrutable to all unacquainted with the particular key or secret for decyphering."

Mr. Patterson then included in the letter an example of a message in his cipher, one that would be so difficult to decode that it would "defy the united ingenuity of the whole human race," he wrote.

The cipher finally met its match in Lawren Smithline, a 36-year-old mathematician. Dr. Smithline has a Ph.D. in mathematics and now works professionally in cryptology, the study of codes and code-breaking, at the Center for Communications Research in Princeton, N.J., a division of the Institute for Defense Analyses.

The code, Mr. Patterson made clear in his letter, was not a simple substitution cipher, in which each letter of the alphabet is replaced with another. The problem with substitution ciphers is that they can be cracked with what's termed frequency analysis, the study of how often each letter occurs in a message. For instance, "e" is the most common letter in English, so if a coded message is sufficiently long, whatever letter appears most often is likely a substitute for "e." Because frequency analysis was already well known in the 19th century, cryptographers of the time turned to other techniques. One was called the nomenclator: a catalog of numbers, each standing for a word, syllable, phrase or letter.
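To see why a simple substitution cipher falls to frequency analysis, consider the minimal Python sketch below. It is an illustration only: the shift cipher, the sample sentence and the helper names `shift_cipher` and `most_common_letter` are assumptions chosen for the demonstration, not anything taken from Mr. Patterson's letter. The sketch enciphers a sentence by shifting every letter three places, then guesses the shift by assuming the most common ciphertext letter stands for "e."

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase

def shift_cipher(text, shift):
    """A very simple substitution: move every letter `shift` places along the alphabet."""
    table = str.maketrans(ALPHABET, ALPHABET[shift:] + ALPHABET[:shift])
    return text.lower().translate(table)

def most_common_letter(text):
    """Return the letter that occurs most often, ignoring spaces and punctuation."""
    counts = Counter(c for c in text.lower() if c in ALPHABET)
    return counts.most_common(1)[0][0]

# Hypothetical sample message, long enough for "e" to dominate.
plaintext = ("when in the course of human events it becomes necessary for one "
             "people to dissolve the political bands which have connected them "
             "with another")
ciphertext = shift_cipher(plaintext, 3)

# Assume the most frequent ciphertext letter is a stand-in for "e".
top = most_common_letter(ciphertext)
guessed_shift = (ord(top) - ord("e")) % 26
recovered = shift_cipher(ciphertext, -guessed_shift)
print(top, guessed_shift)
print(recovered)
```

A general substitution cipher needs more than one letter's worth of guessing, but the principle is the same: letter counts survive the substitution and give the message away.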

But Mr. Patterson had a few more tricks up his sleeve. He wrote the message text vertically, in columns from left to right, using no capital letters or spaces. The writing formed a grid, in this case of about 40 lines of some 60 letters each.

Then, Mr. Patterson broke the grid into sections of up to nine lines, numbering each line in the section from one to nine. In the next step, Mr. Patterson transcribed each numbered line to form a new grid, scrambling the order of the numbered lines within each section. Every section, however, repeated the same jumbled order of lines.

Solving the puzzle, as Mr. Patterson explained in his letter, required knowing three things: the number of lines in each section, the order in which those lines were transcribed and the number of random letters added to each line.


The key to the code consisted of a series of two-digit numbers. The first digit of each indicated the line number within a section, while the second was the number of random letters added to the beginning of that row. For instance, if the key were 58, 71, 33, that meant that Mr. Patterson moved row five to the first line of a section and added eight random letters; then moved row seven to the second line and added one letter; and then moved row three to the third line and added three random letters. Mr. Patterson estimated that the number of potential combinations to solve the puzzle was "upwards of ninety millions of millions."
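The scheme as described here can be sketched in a few lines of Python. This is a rough reconstruction from the description in this article, not Mr. Patterson's exact procedure: the function names are hypothetical, the sketch assumes the grid divides evenly into sections, and it assumes any gap in the last column is filled with random letters. The key used is the seven-number key that Dr. Smithline eventually recovered; the short message and the 14-row grid are chosen only to keep the example small.

```python
import random
import string

LETTERS = string.ascii_lowercase

def parse_key(numbers):
    """Split two-digit key numbers into (line_number, random_letter_count) pairs."""
    return [divmod(n, 10) for n in numbers]

def encipher(message, key, n_rows, rng):
    size = len(key)  # lines per section
    if n_rows % size:
        raise ValueError("this sketch assumes whole sections only")
    text = [c for c in message.lower() if c in LETTERS]
    while len(text) % n_rows:  # assumption: pad the last column with random letters
        text.append(rng.choice(LETTERS))
    # Writing the text down the columns means row r holds every n_rows-th letter.
    grid = ["".join(text[r::n_rows]) for r in range(n_rows)]
    lines = []
    for start in range(0, n_rows, size):
        section = grid[start:start + size]
        for line_no, n_random in key:  # the same jumbled order in every section
            junk = "".join(rng.choice(LETTERS) for _ in range(n_random))
            lines.append(junk + section[line_no - 1])
    return lines

def decipher(lines, key):
    size = len(key)
    grid = [""] * len(lines)
    for start in range(0, len(lines), size):
        for offset, (line_no, n_random) in enumerate(key):
            # Drop the random prefix and put the line back where it came from.
            grid[start + line_no - 1] = lines[start + offset][n_random:]
    # Read down the columns, left to right, to recover the message.
    return "".join("".join(column) for column in zip(*grid))

key = parse_key([13, 34, 57, 65, 22, 78, 49])
message = "In Congress, July Fourth, one thousand seven hundred and seventy six"
cipher_lines = encipher(message, key, n_rows=14, rng=random.Random(0))
recovered = decipher(cipher_lines, key)
plain = "".join(c for c in message.lower() if c in LETTERS)
print(recovered.startswith(plain))
```

With the key in hand, undoing the cipher is mechanical. Without it, every line's position and the length of its random prefix must be guessed.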

Undaunted, Dr. Smithline decided to tackle the cipher by analyzing the probability of digraphs, or pairs of letters. Certain pairs of letters, such as "dx," don't exist in English, while some letters almost always appear next to a certain other letter, such as "u" after "q".

To get a sense of language patterns of the era, Dr. Smithline studied the 80,000 letter-characters contained in Jefferson's State of the Union addresses, and counted how often each pair of letters, from "aa," "ab," "ac," through "zz," occurred.
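Counting digraphs is simple to sketch in Python. The example below is illustrative only: the function name `digraph_counts` is hypothetical, a single phrase stands in for the 80,000 characters of Jefferson's addresses, and it assumes that spaces and punctuation are stripped before pairs are counted (the cipher itself used no spaces, but the article does not say how the tally was done).

```python
from collections import Counter
import string

def digraph_counts(text):
    """Count every pair of adjacent letters, ignoring spaces and punctuation."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    return Counter(a + b for a, b in zip(letters, letters[1:]))

# Tiny stand-in corpus; a real tally would use a far larger body of text.
sample = "when in the course of human events"
counts = digraph_counts(sample)

print(counts["he"], counts["en"], counts["dx"])
print(counts.most_common(3))
```

Pairs that never occur, such as "dx," simply come back with a count of zero.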

Dr. Smithline then made a series of educated guesses, such as the number of rows per section, which two rows belong next to each other, and the number of random letters inserted into a line.
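One way such guesses could be checked against digraph statistics is sketched below in Python. It is a simplified illustration, not Dr. Smithline's program. Because the message was written down the columns, a letter in one row of the original grid is followed in the text by the letter directly beneath it, so two rows that belong together should line up into plausible English pairs once their random prefixes are removed. The function names, the tiny corpus and the add-one smoothing are all assumptions made for the example.

```python
import math
import string
from collections import Counter

def make_digraph_scorer(corpus):
    """Build a log-probability function for letter pairs from a sample text."""
    letters = [c for c in corpus.lower() if c in string.ascii_lowercase]
    counts = Counter(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values()) + 26 * 26  # add-one smoothing (an assumption)
    def log_prob(pair):
        return math.log((counts[pair] + 1) / total)
    return log_prob

def score_pairing(upper, lower, pad_upper, pad_lower, log_prob):
    """Average digraph score if `lower` sat directly beneath `upper` in the grid."""
    pairs = list(zip(upper[pad_upper:], lower[pad_lower:]))
    if not pairs:
        return float("-inf")
    return sum(log_prob(a + b) for a, b in pairs) / len(pairs)

# Tiny stand-in corpus; the real tally used 80,000 characters.
log_prob = make_digraph_scorer(
    "when in the course of human events it becomes necessary for one "
    "people to dissolve the political bands")

# "thecourse" written down two rows gives "teore" and "hcus";
# two and one random letters are then stuck on the front.
upper, lower = "kzteore", "qhcus"

# Each guess is (random letters on the upper line, random letters on the lower).
guesses = [(0, 0), (1, 2), (2, 1)]
best = max(guesses, key=lambda g: score_pairing(upper, lower, g[0], g[1], log_prob))
print(best)
```

A full attack would have to try every pairing of lines and every plausible prefix length, which is where a more systematic search becomes necessary.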

To help vet his guesses, he turned to a tool not available during the 19th century: a computer program. He used what's called "dynamic programming," which solves large problems by breaking them down into smaller pieces and linking together the solutions.

Solving the puzzle took fewer than 100,000 calculations overall, which Dr. Smithline says would be "tedious in the 19th century, but doable."

After about a week of working on the puzzle, Dr. Smithline saw the numerical key to Mr. Patterson's cipher emerge -- 13, 34, 57, 65, 22, 78, 49. Using that numerical key, he was able to unlock the cipher's text:

"In Congress, July Fourth, one thousand seven hundred and seventy six. A declaration by the Representatives of the United States of America in Congress assembled. When in the course of human events..."

That, of course, is the beginning -- with a few liberties taken -- of the Declaration of Independence, written at least in part by Jefferson himself. "Patterson played this little joke on Thomas Jefferson," says Dr. Smithline. "And nobody knew until now."
