Book Review: Cracking Codes with Python

Cracking Codes with Python, by Al Sweigart, 2018, No Starch Press.
This was the second of two books I received for Christmas, along with The Puzzle Palace. I’ve mentioned before that I had to upgrade to a Win 10 PC, which no longer supports the VBScript solver tools I’d written, and I wanted to find out if Python would be a better port-over path than Microsoft Power Shell. After spending an hour on the Microsoft support site and not finding any references to Power Shell scripting, I pretty much gave that a big fat 0 on desirability. Python, on the other hand, seems to do what I want very quickly, but it’s got a big learning curve. My hope was that Cracking Codes with Python might be a good introduction to the language.

The cover is a bit misleading on three counts. First, it’s not about codes (words used to hide the meanings of other words), it’s about ciphers (changing the order of letters in a message, or changing letters into something else, to make the message unreadable). Second, while the cover has an illustration of an Enigma machine, Enigma is not mentioned anywhere in the book. Third, you’re not really going to be building and breaking ciphers so much as you are taking a couple example cipher types, playing around with them a bit, and then looking at how Python can be used to implement the examples. If you want a good introduction to cryptography, join the American Cryptogram Association, which the author never mentions, either in the book, or on the NoStarch Cracking Codes references page. But, if you want to learn Python programming, this book is as good of a start as any.

Cracking Codes introduces the Caesar cipher, and the Caesar wheel, the Reverse Cipher (just flipping the message string back to front), presents a Caesar-shift program, and follows this with a simple Caesar bruteforce solver (just printing out the text with all 25 shifts). The Transposition Cipher can be thought of as Incomplete Columnar Transposition with a key of 12345, or a Route Transposition using “on by rows” and “off by columns.” The “encryption key” is simply the choice of whatever column width you want to use. The following chapters go into using random numbers to generate different key widths, encrypting the message and then decrypting it to ensure that the code works right, and then the presentation of an isEnglish() module, which just breaks a message up by words and calculates the percentage of text that appears in a short 45,000 word dictionary. If the message consists of people, country and place names, then isEnglish() will fail to catch that. Regardless, isEnglish() is used to support a Transposition cipher bruteforce tester, which just runs through the various potential key widths, decrypts the text for each width, then asks the user if what passes isEnglish() is really the correct message.

The Affine cipher combines the Caesar shift with a multiplication, instead of addition, shift. It’s almost as easy to break as the Caesar shift is, but is presented in the book simply to justify introducing the greatest common divisor (gcd()) and inverseModulus() functions, which get used later in the public key encryption program.

The Simple Substitution Cipher is what the ACA calls the Aristocrat, which is the same thing as the crypto quips in the newspapers. This cipher changes each letter in the message to a different letter (one-to-one associations), while retaining punctuation and word spacing. The solver uses an interesting approach of combining word patterns with lists of cipher letters and their associated potential plaintext letters based on the words in the English dictionary that match each pattern. This approach works best on long message texts, and with smaller dictionaries (the bigger the dictionaries, the harder it is to narrow down individual substitution matches). The character set includes upper and lower letters, digits and punctuation. Word spacing must be retained, meaning that the solver given here works on Aristocrats, but not Patristocrats (simple substitutions with no punctuation or spaces). A cipher is “solved” when the plaintext to cipher letter associations can’t be narrowed down any further from “all possible letters” to “no other letter is possible for this one cipher letter.” The solver has three levels of “solution”: 90-95% of the plaintext recovered; 10-15% recovered; nothing recovered.

The next section is on Vigenere, with no mention of anything else in the family, including Variant, Beaufort or Gronsfeld. It’s treated as essentially a string of upper and lower case letters, where the string gets shifted by the value of the associated keyword letter. Ct = (Pt + Kt) % 26 for encryption. Ct = (Pt – Kt) % 26 for decryption. The example program is only 1.5 pages long, including comments. The solver is much more interesting, since it uses letter frequency analysis with a “high-low” test to get individual keyletter shift values. And checking for Kasiski letter grouping repetition distances for determining the keyword width. Then, for any given test width, the program tests all possible combinations of the top four potential keyletter shifts for each key letter substring, and then uses isEnglish() to determine if a specific letter shift combination produces something that could be close to the original plaintext.

Example:
Say the key width is two, and the top four most likely Caesar shifts per key letter are ABCD and ESIO. The letter shift combinations would be:

AE, AS, AI AO, BE, BS, BI, BO, CE, CS, CI, CO, DE, DS, DI, DO.

The program would try to decrypt the ciphertext using each of the 16 keys, and the ones that produce a reasonable amount of English words out in the plaintext would be displayed to screen for the user to confirm. Unfortunately, this approach only works because the book’s implementation of the Vigenere symbol set includes punctuation and spacing.

I modified the book’s Vigenere program to use my regular n-gram test for scoring instead of isEnglish(), which lets me bruteforce solve the Vigenere crypts in the Cm. I then ran it on the crypt in a previous Cm issue that I’d already solved with VBScript. The program failed to find the correct solution because the function used for looking for individual letter group shifts is not very accurate. It’s a “high/low” approach, which just measures the existence of the 5 most frequent, and 5 least frequent letters for any given shift, and one of the groups in the crypt I was testing had a very low ranking compared to the scores for other shifts. In other words, the high/low scoring method ranked the correct shift as #10 in the results lists. The default setting for the program is to just use the top 4 results for each letter group.

There’s a short chapter on One-Time Pads, which Al identifies as its own unique cipher type (it’s just Vigenere with a very long key consisting of random symbols that gets used once and only once). Bottom line – if done right, it’s unbreakable. And, generally, everyone using OTPs does it right.

The last three chapters cover testing random numbers to determine if they’re prime, using two very large prime numbers to find the public and private keys for public key encryption, and how to write a “textbook RSA” encryption program. Textbook in that the Rabin-Miller text used for determining if a number is prime isn’t always accurate, and that the algorithm used here can be broken by an expert. A number of the functions written for the earlier cipher-types get reused here for public key, so there is a rationale behind the book’s choice of cipher types to cover. The final program uses 1024-bit keys, and runs relatively fast.

The last chapter talks about how to use the Python IDLE development environment’s debugger.

Since I’m a member of the ACA, I look at the implementations of the ciphers in the book from the point of view of the ACA guidelines. That is, can any of these programs be used as-is for solving the crypts that appear in The Cryptogram (Cm) newsletter without modifications? Short answer, no.

Well, the Caesar shift cipher is fine as-is, but the ACA only uses this to encrypt hints for the easier crypts for the benefit of the lower-level members. The Transposition Cipher is either 25% of a Columnar Transposition cipher, or 5% of a Route Transposition. For Columnar, there needs to be a key number that rearranges the order of the columns as they’re taken off (i.e. – 35214 for a block 5 columns wide). For a Route Tramp, every one of the routes needs to be added (on by rows, columns, diagonals, spirals; vertical or horizontal flips; off by rows, columns, spirals or diagonals).

The ACA doesn’t use Affine, One-Time Pads or Public Key Encryption in the Cm.

For Aristocrats (crypto quips with punctuation and spacing kept intact), the ACA doesn’t use mixed case letters in the plaintext. The book’s approach to a solver (pattern matching on individual “words” and building up potential Ct to Pt letter associations based on a small dictionary) works fine for long plaintexts (400-800) letters. Many ACA Aristos are 140 characters or less, and of the 25 crypts I tested from the most recent Cm, the Cracking Code’s program only 90-95% solved 5 of them. It 5-10% solved another 4, which was enough to help me hand-solve those using my own Free Pascal solver. The remaining 16 crypts were 0% solved. The book’s solver doesn’t work on Patristocrats (Aristos minus word spacing and punctuation), because there’s no way to get word lengths for use with the dictionary file. The primary fixes here would be to either adopt a dictionary attack against the keywords (only good for K1, K2 or K3 key alphabets), plus the isEnglish() dictionary test for Aristos. Or, to scrap isEnglish() and replace it with a count of how many common 2-, 3-, 4- and 5-letter groupings (i.e. – n-grams, etc.) appear in the deciphered message for a given keyword, or to test for the presence of a crib word for Pats.

The remaining cipher program given in the book is for Vigenere. The ACA only uses lowercase letters for the plaintext, and no punctuation or word spacing. The cipher text is arranged in 5-letter groups. This means that, again, the isEnglish() test won’t work. The Kasiski method for finding the key width is fine, though, as is the key letter combination test method. The high-low test for letter shifts needs to be replaced with something more robust, or maybe an Index of Coincidence test. And isEnglish() needs to be replaced by the n-gram counter, or a check for the presence of a crib. Beyond that, though, there’s no support for Variant, Beaufort or Gronsfeld, which use the same algorithm, but different look-up tables (which can easily be emulated via simple relationship equations).

Overall, Cracking Codes with Python is a decent reference for getting started in the Python language, and it does have sufficiently useful descriptions of the Vigenere, Simple Substitution and Public Key Encryption methods. It’s not really great right “out-of-the-box” for writing ACA-compliant solver tools, but that’s an easy enough work-around if you do intend to write your own tools for solving crypts in the Cm. And, it’s a fast read, even at 280 pages.

One side note: The No Starch Cracking Codes page has a link to an Additional Online Resources page, which includes corrections to typos in the book. My copy of the book is a third printing, and all of those corrections were already in my copy. However, I did find a superficial error (it prints an error to the screen, but doesn’t affect the results of the public key encryption program). I sent an email about it to Al, but I never received a reply back. This typo is not included on the resources page.

print(‘The private key is a %s and a %s digit number.’ % (len(str(publicKey[0])), len(str(publicKey[1]))))

“publicKey[0]” and “publicKey[1]” should be “privateKey[0]” and “privateKey[1]”.

Additionally, something that’s mentioned on Al’s own website inventwithpython.com is that you can read his books online for free. So, if you just want to see if Cracking Codes is worth buying in paper, check it out first online.

Finally, would I recommend Cracking Codes? If you want to learn Python, yes. If you want to learn how to write programs for encrypting and decrypting messages, a qualified yes. It’s better to get a dedicated resource on cryptography first, and then use Cracking Codes to see how to get started on the program and modify that for the cipher-type you want to employ. As a textbook for a university cryptography/cryptanalysis mathematics course? Probably not. Check out the online copy on Al’s website first, or write your code in C++ or Pascal.

Published by The Chief

Who wants to know?

One thought on “Book Review: Cracking Codes with Python

Leave a comment

Design a site like this with WordPress.com
Get started