Aristocrats and Patristocrats – Refinements on solving

Simple substitution refers to cipher types where one letter is used to represent another letter, consistently throughout the message. Caesar ciphers are simple subs where you just shift the entire alphabet a fixed number of positions to the left or right (shift 2: a = C, b = D, c = E, etc.). To decipher, just flip the letters over (C = a, D = b, E = c, etc.). Aristocrats and Patristocrats are both simple subs; the only difference is that Aristos keep the word separations and punctuation, while Pats lose the separators and punctuation, and the letters are collected in groups of 5. This last point needs to be kept in mind for later. (By convention, plaintext is in lowercase and ciphertext is in uppercase.)

abcdefghijklmnopqrstuvwxyz - Plain alphabet
CDEFGHIJKLMNOPQRSTUVWXYZAB - Cipher alphabet


this! is a "test." - Formatted plaintext
VJKU! KU C "VGUV." - Aristo ciphertext
VJKUK UCVGU V - Pat ciphertext
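
As a rough sketch of the shift-2 example above, here is one way to do it in Python. The function names (caesar_encipher, to_patristocrat) are made up for illustration and are not from any of my scripts.

```python
import string

def caesar_encipher(plaintext: str, shift: int) -> str:
    """Shift every letter by `shift` places; leave spaces and punctuation alone."""
    out = []
    for ch in plaintext.lower():
        if ch in string.ascii_lowercase:
            idx = (ord(ch) - ord("a") + shift) % 26
            out.append(chr(ord("A") + idx))  # ciphertext is uppercase by convention
        else:
            out.append(ch)
    return "".join(out)

def to_patristocrat(ciphertext: str, group: int = 5) -> str:
    """Drop everything but letters, then regroup in blocks of `group`."""
    letters = [ch for ch in ciphertext if ch.isalpha()]
    blocks = ["".join(letters[i:i + group]) for i in range(0, len(letters), group)]
    return " ".join(blocks)

aristo = caesar_encipher('this! is a "test."', 2)
print(aristo)                   # VJKU! KU C "VGUV."
print(to_patristocrat(aristo))  # VJKUK UCVGU V
```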

The American Cryptogram Association (ACA) generally employs one of four techniques in generating keyed alphabets for Aristos and Pats.
K1 – The plain alphabet is keyed.
K2 – The cipher alphabet is keyed.
K3 – Both alphabets use the same keyword.
K4 – Different keywords are used for both alphabets.

Note that the ACA guidelines say that no letter may map to itself (a != a), so the two alphabets need to be shifted relative to each other. The amount of shift, as well as the type of keyword(s), are at the author’s discretion.

teachrbdfgijklmnopqsuvwxyz - K1
UVWXYZABCDEFGHIJKLMNOPQRST


uvwxyzabcdefghijklmnopqrst - K2
TEACHRBDFGIJKLMNOPQSUVWXYZ


teachrbdfgijklmnopqsuvwxyz - K3
WXYZTEACHRBDFGIJKLMNOPQSUV


teachrbdfgijklmnopqsuvwxyz - K4
VWXYZSTUDENABCFGHIJKLMOPQR
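
Building the keyed rows is mechanical, so here is a small Python sketch of it. The helper names (keyed_alphabet, rotate_right, self_mapped) are my own inventions for this example, and the shift amounts are simply the ones that reproduce the K3 and K4 rows shown above.

```python
import string

STANDARD = string.ascii_lowercase

def keyed_alphabet(keyword: str) -> str:
    """Keyword letters first (duplicates dropped), then the unused letters in order."""
    seen = []
    for ch in keyword.lower() + STANDARD:
        if ch in STANDARD and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def rotate_right(alphabet: str, n: int) -> str:
    """Move the last n letters to the front."""
    n %= len(alphabet)
    return alphabet[-n:] + alphabet[:-n] if n else alphabet

def self_mapped(plain: str, cipher: str) -> list:
    """Letters that would encipher to themselves (the ACA rule forbids these)."""
    return [p for p, c in zip(plain, cipher) if p == c.lower()]

# K3: the same keyword in both rows, cipher row shifted 4 places.
k3_plain = keyed_alphabet("teacher")
k3_cipher = rotate_right(k3_plain, 4).upper()
print(k3_plain)   # teachrbdfgijklmnopqsuvwxyz
print(k3_cipher)  # WXYZTEACHRBDFGIJKLMNOPQSUV

# K4: a second keyword for the cipher row, shifted 5 places.
k4_cipher = rotate_right(keyed_alphabet("students"), 5).upper()
print(k4_cipher)  # VWXYZSTUDENABCFGHIJKLMOPQR

print(self_mapped(k3_plain, k3_cipher))  # []
```

If self_mapped returns anything, pick a different shift.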

One significant element in solving Aristos is that you do get to keep the word breaks, and you can tell not only how long the words are, but also where the repeating letters are in those words. The first line of attack is to identify the single-letter words (“a” or “I”), which is usually easy based on the positioning of the words in the message; in the ACA crypts, the word “a” generally outnumbers the word “I” by maybe 10 to 1. After this, look for words with apostrophes (“DBO’U”, “J’MM”, “DPVMEO’U”); the right-hand part is most likely to be “t”, but can also be “s”, “d” or “ll”. Next, check for pattern words from your dictionary (“UIBU” (that), “QBUUFSO” (pattern), “TUVEFOUT” (students)).

With Pats, you lose the sense of which letters make up a word, so the above attack is less useful. Fortunately, though, the ACA will generally give you a crib or tip, which is a word that appears in the message plaintext. The Cryptogram newsletter usually has 12 normal Pats, and maybe two harder specials, ordered from easiest (1) to hardest (12), and the easier cribs will be pattern words. This makes placing the crib (figuring out which letters are which) more obvious. Aristos already have tons of hints, so the ACA doesn’t give you cribs for those. For “VJKUK UCVGU V” and the crib “test”:

VJKUK UCVGU V
t..s. s.tes t
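
Placing a pattern-word crib like this can be automated by sliding the crib along the crypt and keeping only the positions where the repeated letters line up. The following is a minimal Python sketch of that idea, not my actual script; pattern, place_crib and fill_in are hypothetical names.

```python
def pattern(word):
    """Number each letter by order of first appearance: 'test' -> (0, 1, 2, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word)

def place_crib(crypt, crib):
    """Return every offset (spaces ignored) where the crib could sit."""
    letters = [ch for ch in crypt.upper() if ch.isalpha()]
    hits = []
    for start in range(len(letters) - len(crib) + 1):
        window = letters[start:start + len(crib)]
        same_shape = pattern(window) == pattern(crib)
        # ACA rule: no cipher letter stands for itself.
        no_self_map = all(c.lower() != p for c, p in zip(window, crib))
        if same_shape and no_self_map:
            hits.append(start)
    return hits

def fill_in(crypt, crib, start):
    """Show the letters learned from the crib, with dots for the unknowns."""
    letters = [ch for ch in crypt.upper() if ch.isalpha()]
    known = dict(zip(letters[start:start + len(crib)], crib))
    return "".join(known.get(ch, ".") if ch.isalpha() else ch for ch in crypt.upper())

print(place_crib("VJKUK UCVGU V", "test"))  # [7]
print(fill_in("VJKUK UCVGU V", "test", 7))  # t..s. s.tes t
```

A crib with no repeated letters will match almost everywhere, which is why the pattern-word cribs are the easy ones.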

As the crypts get tougher, the techniques for solving them have to become more sophisticated. For Aristos, look for the longest pattern word in the message. “QBSFOUIFUJDBMMZ” has a pattern of _1_2_3_23__144_ and is almost unique in the dictionary (“parenthetically”), compared to “UIBU”, 1__1, which can generate an overwhelmingly large number of potential hits (that, test, text, sows, sews…). If there are a large number of hits, or if none of the individual words have patterns, then do a multi-word dictionary search using words with shared letters: TFBSDI + SFBDI = TFBSDISFBDI = _1234531245 (“search” + “reach”). For the really hard crypts with convoluted grammar, all big words (or all short words of the same length (4-5 letters)), and no pattern words, you have to resort to a 3- or 4-word dictionary search. For me, in VBScript, this means testing up to a × b × c × d word combinations (where a, b, c, and d are the word counts in each file for the words of a specific length). This can take hours for non-pattern words, but if the words are in your dictionary, you WILL get the solution eventually (my dictionary doesn’t include proper nouns, place names or common foreign words, yet).
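
Here is a small Python sketch of the pattern notation and the multi-word search, under the assumption that the dictionary is just a list of lowercase words. The word list below is a tiny made-up stand-in, and pattern and search are illustrative names rather than anything from my VBScript code.

```python
from collections import Counter
from itertools import product

def pattern(word: str) -> str:
    """'_' for letters that occur once; repeated letters numbered by first appearance.
    (With more than 9 different repeated letters the digits get ambiguous.)"""
    counts = Counter(word)
    numbers = {}
    out = []
    for ch in word:
        if counts[ch] == 1:
            out.append("_")
        else:
            numbers.setdefault(ch, str(len(numbers) + 1))
            out.append(numbers[ch])
    return "".join(out)

def search(cipher_words, dictionary):
    """Try every combination of same-length dictionary words against the joined pattern."""
    target = pattern("".join(cipher_words))
    lists = [[w for w in dictionary if len(w) == len(c)] for c in cipher_words]
    # product() walks all a x b x ... combinations, which is why this gets slow.
    return [combo for combo in product(*lists) if pattern("".join(combo)) == target]

WORDS = ["that", "test", "text", "sows", "sews", "this",
         "search", "starch", "reach", "teach", "chart", "parenthetically"]

print(pattern("QBSFOUIFUJDBMMZ"))          # _1_2_3_23__144_
print(search(["UIBU"], WORDS))             # five hits: that, test, text, sows, sews
print(search(["TFBSDI", "SFBDI"], WORDS))  # [('search', 'reach')]
```

The single short word gives five hits even in this toy list, while the two-word search narrows to one.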

Ok, so what about Pats? Well, the biggest weakness here is actually in the keyword the author chose. At least half the time, keys will consist of common single words that are in the dictionary. We don’t know the length of the keys, but we are told which of the keyed alphabets the author used. This makes a big difference for crypts using K1, K2 or K3 alphabets.

A bruteforce attack on the keyword just consists of reading your dictionary file (I store it in an array), and then stepping through it one word at a time. For each word, I create the plain and cipher alphabets based on the K-type, loop on the relative shifts (shift one alphabet one letter to the left, up to 25 times), and attempt to decipher the crypt with the alphabets obtained. If I have a crib, I check the output text to see if the crib exists in the string, and if it does, print the string out. Maybe.

Ok, this last “maybe” part can be important. If the crib uses letters that appear in the keyword, the resulting output string may be more or less unique, and only the one string may come out. Otherwise, you might end up with hundreds of largely garbled strings that all just happen to contain the crib, but are otherwise unreadable.

To weed out the unreadable strings, if the test string contains the crib, I then do a trigram count (common 3-letter groupings, like “the”, “ist”, “ion”, “ing”), and I set the minimum threshold for printing the results out. This threshold depends on the length of the crypt, and other uncontrollable factors, so I end up tweaking it sometimes to even get any results to be displayed. To help with this, my script has a running maximum count that is displayed as part of my progress counter.
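
A trigram count along these lines might look like the Python sketch below. The trigram set here is a short sample I picked for the example (a real list would be much longer), and trigram_count is a hypothetical name.

```python
# Short sample list for illustration only; a real list would be far longer.
COMMON_TRIGRAMS = {"the", "and", "ing", "ion", "ent", "tha", "her", "ist",
                   "thi", "his", "est", "ate", "for", "ter", "hat"}

def trigram_count(text: str, trigrams=COMMON_TRIGRAMS) -> int:
    """Count how many 3-letter windows of the text (letters only) are common trigrams."""
    letters = "".join(ch for ch in text.lower() if ch.isalpha())
    return sum(letters[i:i + 3] in trigrams for i in range(len(letters) - 2))

print(trigram_count("this is a test"))  # 4
print(trigram_count("thrs rs a test"))  # 2
print(trigram_count("xqzvk jwpfm b"))   # 0
```

Because spaces are stripped first, trigrams that straddle two words count too, which is what you want for a Pat.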

Sometimes, the program looks like it’s locked up, so I compare the first two letters of the current word against the contents of a variable. If they don’t match, I print out that word and the array index to show what number word I’m working on, and then update the variable with the first two letters of the new word. I have also appended the maximum trigram count to the progress output string, which lets me know if my trigram threshold is set too low or high.
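
The progress trick is easy to sketch in Python. This version assumes the per-word trigram counts are already available as a list, which is a simplification for the example; progress_lines is a made-up name.

```python
def progress_lines(words, counts):
    """Yield a status line whenever the first two letters of the word change."""
    last_prefix = None
    best = 0
    for index, (word, count) in enumerate(zip(words, counts)):
        best = max(best, count)  # running maximum trigram count
        prefix = word[:2]
        if prefix != last_prefix:
            yield f"{index}: {word} (max trigram count so far: {best})"
            last_prefix = prefix

for line in progress_lines(["aardvark", "abacus", "abandon", "baby"], [1, 3, 2, 0]):
    print(line)
# 0: aardvark (max trigram count so far: 1)
# 1: abacus (max trigram count so far: 3)
# 3: baby (max trigram count so far: 3)
```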

I can attack the key for both Aristos and Pats, but the approaches need to be slightly different. First, Pats.

Patristocrats have the 5-letter groupings, which get in the way. So, strip out the spaces between groups, decipher the message with the current keyed alphabet, and check whether the crib exists in the output string. If it doesn’t, shift the alphabets relative to each other up to 25 times, testing again each time, then go to the next dictionary word. If the crib does exist, do a trigram count, output the string if the count is above the threshold, then go directly to the next dictionary word.

1. Strip spaces
2. Load dictionary and store first word.
2a. Build up the keyed alphabets (for K1-K3)
2a1. Decipher the crypt with the current alphabets.
2a2. Check for crib.
2a2a. If no crib, go to step 2a3.
2a2b. Otherwise, count trigrams.
2a2b1. If count below threshold, go to step 3.
2a2b2. Otherwise, print out results and go to step 3
2a3. If # of shifts is 25, go to step 3.
2a4. Otherwise, shift one alphabet one position to left, go to step 2a1.
3. If not at end of array, get next word and go to step 2a.
4. Otherwise, quit.
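
The steps above can be sketched in Python roughly as follows. This is not my VBScript code, just a minimal illustration: the three-word dictionary, the short trigram list, and all of the function names are assumptions made up for the example.

```python
import string

STANDARD = string.ascii_lowercase
# Tiny stand-in trigram list; a real one would be much longer.
TRIGRAMS = {"the", "and", "ing", "ion", "ent", "tha", "thi", "his", "est", "ate"}

def keyed_alphabet(keyword):
    """Keyword letters first (duplicates dropped), then the unused letters."""
    seen = []
    for ch in keyword.lower() + STANDARD:
        if ch in STANDARD and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def build_alphabets(keyword, ktype):
    """Return (plain row, cipher row) before any shifting."""
    keyed = keyed_alphabet(keyword)
    if ktype == "K1":
        return keyed, STANDARD
    if ktype == "K2":
        return STANDARD, keyed
    if ktype == "K3":
        return keyed, keyed
    raise ValueError("only K1-K3 use a single keyword")

def decipher(text, plain, cipher):
    table = {c.upper(): p for p, c in zip(plain, cipher)}
    return "".join(table.get(ch, ch) for ch in text)

def trigram_count(text):
    return sum(text[i:i + 3] in TRIGRAMS for i in range(len(text) - 2))

def attack(crypt, words, ktype, crib, threshold):
    text = "".join(ch for ch in crypt.upper() if ch.isalpha())  # step 1
    hits = []
    for word in words:                                   # steps 2 and 3
        plain, cipher = build_alphabets(word, ktype)     # step 2a
        for shift in range(26):                          # unshifted, then 25 shifts
            shifted = cipher[shift:] + cipher[:shift]    # step 2a4 (shift left)
            candidate = decipher(text, plain, shifted)   # step 2a1
            if crib in candidate:                        # step 2a2
                count = trigram_count(candidate)         # step 2a2b
                if count >= threshold:
                    hits.append((word, shift, count, candidate))
                break                                    # crib found: next word
    return hits

print(attack("XNGWG WIXKW X", ["teacher", "students", "crib"], "K1", "test", 2))
# [('crib', 4, 4, 'thisisatest')]
```

The break is the step that makes Pats fast: once the crib shows up for a word, there is no point in trying the remaining shifts.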

For Aristos, we can keep the word and punctuation formatting, but there is going to be a speed hit, in that we’re stepping through a longer string and skipping over the spaces and punctuation every time we decipher the text with the next alphabets. The advantage of keeping the punctuation is that if part of the output is garbled (because the key used for the crypt is not in the dictionary, but there’s enough readable output text to let us solve the crypt manually anyway), it’s easier to align the cipher letters to the solved plaintext letters for manual solution. But, solving Aristos through bruteforce this way is going to take longer than for Pats for another reason, too.

If you look at the above steps for Pats, you can see that if the crib exists in the deciphered text, we stop shifting the alphabets and automatically go to the next dictionary word. For Aristos, there are no cribs, so we’ll be doing the full 25 shifts every single time for every single dictionary word. It’s just much faster to do pattern word/non-pattern word/multiple pattern word tests for Aristos and manually solve them using whatever method you like – pencil and paper, or a software-assisted tool (I use the program I wrote in Free Pascal for this).

Ok, so why can we stop shifting the alphabets for Pats if we find the crib? In general, 25 of the 26 letter arrangements between the plain and cipher alphabets will produce unreadable garbage. Only the correct number of shifts (somewhere from 0 to 25) will give us the crib, and it will do so even if some of the keyword letters are in the wrong places, as long as the crib doesn’t use the letters in the keyword.

dircabefghjklmnopqstuvwxyz - Close, but not completely correct.
EFGHIJKLMNOPQRSTUVWXYZABCD


XNGW GW I XKWX
thrs rs a test - Crib shows up, and we can tweak G = r to get G = i.

cribadefghjklmnopqstuvwxyz - Correct word, plus correct shift
EFGHIJKLMNOPQRSTUVWXYZABCD


XNGW GW I XKWX
this is a test

A shift of 4 will give us our crib, while any other number of shifts wouldn’t give us anything.
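
To check the near-miss example, here is a short Python sketch that deciphers the same crypt with both keyed alphabets. It assumes a K1 setup (keyed plain row, straight cipher row shifted left), and decipher_k1 is a name invented for this example.

```python
import string

STANDARD = string.ascii_lowercase

def keyed_alphabet(keyword):
    """Keyword letters first (duplicates dropped), then the unused letters."""
    seen = []
    for ch in keyword.lower() + STANDARD:
        if ch in STANDARD and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def decipher_k1(text, keyword, shift):
    """K1: keyed plain row against a straight cipher row shifted left by `shift`."""
    plain = keyed_alphabet(keyword)
    cipher = STANDARD[shift:] + STANDARD[:shift]
    table = {c.upper(): p for p, c in zip(plain, cipher)}
    return "".join(table.get(ch, ch) for ch in text)  # spaces pass through

crypt = "XNGW GW I XKWX"
print(decipher_k1(crypt, "dirc", 4))  # thrs rs a test
print(decipher_k1(crypt, "crib", 4))  # this is a test

# Which shifts produce the crib at all?
print([s for s in range(26) if "test" in decipher_k1(crypt, "crib", s)])  # [4]
print([s for s in range(26) if "test" in decipher_k1(crypt, "dirc", s)])  # [4]
```

Both keys find the crib at the same single shift, because t, e and s sit in the same positions in both plain rows.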

As a recap, it’s just faster to solve Aristos through single-letter word and apostrophe identification, and dictionary searches on pattern and non-pattern words and combinations of words.

For simpler Pats with K4 alphabets, or keys that aren’t single words in the dictionary, manually place the pattern word cribs in the crypt, and manually solve. Otherwise, for K1-K3 alphabets, and single-word key words that ARE in the dictionary, use the above bruteforce attack against the key.

Right now, this just leaves me with K4 alphabets, and keys (made of two or more words, proper names or place names, or common foreign words) not in the dictionary. If I’m lucky, the key is close enough to something in the dictionary (e.g., “blackridge” comes close to “black”) to give me output that I can degarble manually (“sxtyrn” = “saturn”?). Otherwise, I’m out of luck with K4 Pats, because trying every pair of keywords can mean going through the dictionary array 220,000 × 220,000 times (about 48 billion keyword pairs, before even counting the shifts), which could take years for just one crypt. There are several other methods I can use, such as identifying vowels and consonants in the crypt, but I haven’t started digging into those methods yet.

I was prompted to write up this entry as I began tackling the CONs in the current issue of the Cm. It had 25 Aristos, 12 normal Pats, and 2 specials. 7 of the Aristos used K4 alphabets, and almost half of the others used keywords that weren’t in my dictionary. I solved all of those, but all of the work was manual, and I mostly relied on pattern word searches. Only one of the Pats was a K4 (lucky this time!), but 5-6 used keywords not in my dictionary. I could suss out enough of the garbled text to solve some of them manually, including the first special. But, I ultimately failed to solve 3 of the normal Pats and the other special.

I was really hoping for a rare “complete” for both Aristos AND Pats, but not this time. Given that I still haven’t tried tackling any of the Quagmires (I – IV), Bazeries, Phillips, Porta and a few other types that are also in the current Cm, I’m going to set aside the last four Pats right now, and see what else I can break with simple bruteforce attacks on their keys, and try writing simple encipher and decipher scripts for those types just for the learning process. I also want to keep beefing up my dictionaries…

Published by The Chief
