Python GUI 032 – Transpo Groundwork 3

Still trying to get ahead of things. I got the encryption/decryption plus “show worksheet” functions running for Sequence Transposition. With luck, maybe I’ll get to Nihilist Transposition this week.

In the meantime, one of the issues I encountered in the Sequence program was related to keywords, and I might as well document that now.

First, what is a “key?”

The short answer is: “A key is whatever is used to make a plaintext message harder to read for the unintended recipients of a secret message.”

The long answer is: “It depends.”

Let’s start out with steganography, and the Bacon cipher concept. Francis Bacon came up with a binary system in 1605. In this system, any two things (fonts, colors, shapes, etc.) can represent the letters “a” or “b”, or in more modern terms “0” or “1”. The alphabet is traditionally only 24 letters long, with i and j, and u and v doubling up. In this sense, the key could be something like “a” is the color red, and “b” is the color blue. Then, the message could be a grid of red and blue squares or color splotches. As the ACA uses the Baconian cipher, one “letter” (a five-character collection of a’s and b’s) is represented by a 5-letter English word. To make “color” represent “ababa” (the letter “L” in the traditional 24-letter alphabet, or “K” if all 26 letters get their own code), the key would at a minimum have to include “a=CLR; b=O”.
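To make the Bacon mapping concrete, here is a minimal Python sketch (all function names are hypothetical, not from the author’s code). It assumes the traditional 24-letter alphabet by default, with a switch for the 26-letter variant, and reads a cover word as a’s and b’s using a key like “a=CLR; b=O”. Note that ababa decodes to L in the 24-letter alphabet and to K in the 26-letter one.

```python
import string

# Traditional 24-letter Baconian alphabet: I/J and U/V share codes.
ALPHABET_24 = "ABCDEFGHIKLMNOPQRSTUWXYZ"
ALPHABET_26 = string.ascii_uppercase


def letter_to_bacon(letter: str, letters24: bool = True) -> str:
    """Encode one letter as a five-character group of a's and b's."""
    letter = letter.upper()
    alphabet = ALPHABET_24 if letters24 else ALPHABET_26
    if letters24:
        letter = {"J": "I", "V": "U"}.get(letter, letter)
    index = alphabet.index(letter)
    # Five-bit binary, with 0 written as 'a' and 1 written as 'b'.
    return format(index, "05b").replace("0", "a").replace("1", "b")


def bacon_to_letter(group: str, letters24: bool = True) -> str:
    """Decode a five-character a/b group back to a letter."""
    alphabet = ALPHABET_24 if letters24 else ALPHABET_26
    index = int(group.lower().replace("a", "0").replace("b", "1"), 2)
    return alphabet[index]


def word_to_group(word: str, a_letters: str, b_letters: str) -> str:
    """Read a cover word as a's and b's using a key such as a=CLR; b=O."""
    group = []
    for ch in word.upper():
        if ch in a_letters:
            group.append("a")
        elif ch in b_letters:
            group.append("b")
        else:
            raise ValueError(f"{ch!r} is not covered by the key")
    return "".join(group)


# "COLOR" with a=CLR, b=O reads as ababa.
group = word_to_group("COLOR", "CLR", "O")
print(group, bacon_to_letter(group), bacon_to_letter(group, letters24=False))
```

The cover-word key and the letter alphabet are kept separate on purpose, since the ACA-style word cover and the red/blue grid idea only differ in how a and b are read.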

Moving to Caesar-shift, the “key” is the number of characters the cipher alphabet is shifted relative to the plain alphabet. That is, a key of 4, equivalent to A=E, gives us:

abcdefghijklmnopqrstuvwxyz - plain
EFGHIJKLMNOPQRSTUVWXYZABCD - cipher
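A Caesar key is just a rotation of the alphabet. A minimal sketch, assuming lowercase plaintext and an uppercase cipher alphabet as in the table above (function names are illustrative only):

```python
import string


def caesar_alphabet(shift: int) -> str:
    """Cipher alphabet for a Caesar key: plain 'a' maps to the letter `shift` places on."""
    upper = string.ascii_uppercase
    shift %= 26
    return upper[shift:] + upper[:shift]


def caesar_encrypt(plaintext: str, shift: int) -> str:
    """Encipher lowercase plaintext; characters outside a-z pass through unchanged."""
    table = str.maketrans(string.ascii_lowercase, caesar_alphabet(shift))
    return plaintext.lower().translate(table)


print(string.ascii_lowercase, "- plain")
print(caesar_alphabet(4), "- cipher")
```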

Taking the next step, to simple substitution, we have a few choices. We could use meaningless symbols, or random letter assignments, in which case the key would be a letter-by-letter assignment table. Or, we can use a keyword or key phrase. A keyed alphabet consists of the keyword minus any repeated letters, followed by the rest of the alphabet. Either the plain or the cipher alphabet (or both) can be keyed, and for the ACA, the two can be shifted relative to each other such that no letter maps to itself (that is, we can’t have something like r=R).

cipherabdfgjklmnoqstuvwxyz - plain (key = "cipher")
ETCDFGIJKMNOQRSUVWXYZALPHB - cipher (key = "alphabet")
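Building keyed alphabets can be sketched as follows: the keyword with repeats dropped, then the unused letters, plus a check for self-matches like r=R. The shift of 5 used here simply reproduces the cipher line above, and the helper names are hypothetical.

```python
import string


def keyed_alphabet(keyword: str) -> str:
    """Keyword with repeated letters removed, followed by the unused letters A-Z."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)


def rotate(alphabet: str, shift: int) -> str:
    """Slide an alphabet left by `shift` positions."""
    shift %= len(alphabet)
    return alphabet[shift:] + alphabet[:shift]


def has_self_match(plain: str, cipher: str) -> bool:
    """True if any letter would encipher as itself (e.g. r=R)."""
    return any(p == c for p, c in zip(plain, cipher))


plain = keyed_alphabet("cipher")
cipher = rotate(keyed_alphabet("alphabet"), 5)
print(plain.lower(), "- plain")
print(cipher, "- cipher")
```

Without the rotation, CIPHER… and ALPHBET… both have P in the third position, which is exactly the kind of self-match the ACA rule excludes.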

Going even further, we get into the Vigenere table family, where the key can be a word, an expression, a string of random letters, or a quote from a book. Now we want to keep the repeating letters. I don’t want to reproduce the entire table here; you can find it on the Wikipedia page. Alternatively, just think of each letter in the key string as representing a Caesar shift of that amount (e.g., key letter K means a=K), going letter by letter:

KEYWORDKEYWORDKEYW - Key
thisisatestmessage - plain
------------------
DLGOWJDDIQPAVVCEEA - cipher message

Counting A as 0, K is 10 and T is 19. K + T = 10 + 19 = 29, and 29 - 26 = 3, which is the letter D.
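The letter-by-letter Caesar view of Vigenere translates almost directly into code. A minimal sketch, assuming A=0 through Z=25 and that non-letters are dropped from the plaintext (the function name is hypothetical):

```python
def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Vigenere: each key letter is a Caesar shift (A=0 ... Z=25) for the matching plaintext letter."""
    letters = [ch for ch in plaintext.lower() if "a" <= ch <= "z"]
    key = key.upper()
    out = []
    for i, ch in enumerate(letters):
        shift = ord(key[i % len(key)]) - ord("A")  # K -> 10, so a=K
        out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("A")))
    return "".join(out)


print(vigenere_encrypt("thisisatestmessage", "KEYWORD"))
```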

Changing directions, we could look at book ciphers using a dictionary. Here, we also have a lot of choices. Let’s keep it simple and follow a ppp-c-www pattern, where ppp is the page number, c is the column number (1 or 2), and www is the number of the desired word counting from the top line of the column. Such as:

99-1-45 = page 99, left-hand column, 45th word from top

The “key” could then be thought of as the book to use (name and edition of the dictionary), and possibly an offset (e.g., add the day of the month to the page number, and the month of the year to the word number). For Feb. 7th, 99-1-45 becomes:

106-1-47
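The date offset is simple arithmetic on the three fields. A sketch assuming the ppp-c-www format and the day/month rule described above (offset_reference and remove_offset are made-up names):

```python
from datetime import date


def offset_reference(ref: str, day: int, month: int) -> str:
    """Shift a ppp-c-www reference: page += day of month, word += month number."""
    page, column, word = (int(part) for part in ref.split("-"))
    return f"{page + day}-{column}-{word + month}"


def remove_offset(ref: str, day: int, month: int) -> str:
    """Undo offset_reference on the receiving end."""
    page, column, word = (int(part) for part in ref.split("-"))
    return f"{page - day}-{column}-{word - month}"


when = date(2024, 2, 7)  # any Feb. 7th
print(offset_reference("99-1-45", when.day, when.month))
```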

Going back to Vigenere, one of the other family members is Gronsfeld, which uses a numeric key of random digits. The theory here is that Gronsfeld is harder to solve because the key is not a recognizable word or phrase in any given language (compare something like 9174293681 to “WASHINGTON”). In fact, Gronsfeld is easier, because you’re only using 10 of the available 26 alphabets, and “0” is equivalent to “a=A”, or “no shift.”

917429368191742936 - Key
thisisatestmessage - plain
------------------
CIPWKBDZMTCNLWUJJK - cipher message
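Gronsfeld is the same loop as Vigenere with digits in place of key letters. A minimal sketch, assuming the key is a string of digits and the plaintext is reduced to a-z (hypothetical function name):

```python
def gronsfeld_encrypt(plaintext: str, key: str) -> str:
    """Vigenere with a digit key: each digit 0-9 is a Caesar shift, so only 10 alphabets are used."""
    letters = [ch for ch in plaintext.lower() if "a" <= ch <= "z"]
    out = []
    for i, ch in enumerate(letters):
        shift = int(key[i % len(key)])  # '0' means no shift (a=A)
        out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("A")))
    return "".join(out)


print(gronsfeld_encrypt("thisisatestmessage", "9174293681"))
```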

Making a sharp left turn, let’s jump to transposition ciphers. Generally, transposition keys are numeric, in that we’re moving characters around in some kind of repeatable fashion. The classic example is Columnar Transposition:

2413 - key
----
this
isat
estx

IATTI ESTXH SS
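A sketch of the complete columnar example, assuming the short last row is padded with x as in the table above and the columns are read in ascending key order (columnar_encrypt and group are hypothetical helpers):

```python
def columnar_encrypt(plaintext: str, key: list[int], pad: str = "x") -> str:
    """Complete columnar transposition: write rows under the key, read columns in key order."""
    width = len(key)
    text = [ch for ch in plaintext.lower() if "a" <= ch <= "z"]
    while len(text) % width:
        text.append(pad)  # fill out the last row
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    out = []
    for number in sorted(key):
        col = key.index(number)  # which column carries this key number
        out.extend(row[col] for row in rows)
    return "".join(out).upper()


def group(text: str, size: int = 5) -> str:
    """Split ciphertext into groups of `size` letters."""
    return " ".join(text[i:i + size] for i in range(0, len(text), size))


print(group(columnar_encrypt("thisisatest", [2, 4, 1, 3])))
```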

Traditionally, column numbering starts at 1, but we could start at 0 and the results would be the same. To make it easier for the sender and recipient to remember the key, it’s presented as a mnemonic in the form of a word, with the letters numbered in alphabetical order, and repeated letters numbered from left to right.

ITEM
2413

PEOPLE
514632
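Numbering a keyword can be done by sorting letter positions, using the position as a tie-breaker so repeated letters are numbered left to right. A sketch under that assumption (numeric_key is a hypothetical name); applied literally, this rule numbers PEOPLE as 514632.

```python
def numeric_key(keyword: str, start: int = 1) -> list[int]:
    """Number keyword letters in alphabetical order; repeated letters go left to right."""
    letters = keyword.upper()
    # Sort positions by (letter, position) so ties are broken left to right.
    order = sorted(range(len(letters)), key=lambda i: (letters[i], i))
    key = [0] * len(letters)
    for rank, pos in enumerate(order, start=start):
        key[pos] = rank
    return key


print(numeric_key("ITEM"), numeric_key("PEOPLE"))
```

The `start` parameter covers the 0-or-1 question: the column order is the same either way, only the labels change.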

However, as mentioned before, the Myszkowski transposition system uses pattern words to generate its keys, and identical letters get identical digits.

PEOPLE
413421
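For Myszkowski, identical letters share a number, so a sketch only needs to rank the distinct letters (the function name is illustrative):

```python
def myszkowski_key(keyword: str, start: int = 1) -> list[int]:
    """Pattern-word key: identical letters share the same number."""
    letters = keyword.upper()
    ranks = {ch: n for n, ch in enumerate(sorted(set(letters)), start=start)}
    return [ranks[ch] for ch in letters]


print(myszkowski_key("PEOPLE"))
```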

In general, it doesn’t really matter if numbering starts at 0 or 1, as long as you’re consistent, especially if the correspondents are just exchanging keywords. Numbering then becomes just a matter of personal taste. Further, if the key length exceeds 10, using a keyword is more convenient (below, the tens digit of each number sits above its units digit):

0011010000001
1701329852463
CORRESPONDENT

compared to having to remember:

1 7 10 11 3 12 9 8 5 2 4 6 13
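Writing a two-digit key as a tens row over a units row keeps one character per column. A small sketch that produces the two rows from a numeric key, assuming every number is below 100 (two_row_digits is a hypothetical name):

```python
def two_row_digits(key: list[int]) -> tuple[str, str]:
    """Split a numeric key into a tens row and a units row for one-character-per-column display."""
    tens = "".join(str(n // 10) for n in key)
    units = "".join(str(n % 10) for n in key)
    return tens, units


for row in (*two_row_digits([1, 7, 10, 11, 3, 12, 9, 8, 5, 2, 4, 6, 13]), "CORRESPONDENT"):
    print(row)
```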

This brings me to Sequence Transposition. We have a keyword, and we have a primer (which could also be a word). The primer is 5 digits long, and gets extended to the full length of the message by adding the first two digits of the most recent five, and appending the units value of the sum to the end (ignoring carry-overs). If the primer is 84137:

To get the 6th digit, we add 8 and 4 to get 12, drop the 1, and append the 2 to the right of the primer:

841372

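Then slide one digit to the right and repeat (4 + 1 = 5, 1 + 3 = 4, 3 + 7 = 10, append 0, and so on) until the sequence is as long as the message.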

84137254097 - sequence
thisisatest - message
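The primer extension is a running sum of the digits five and four places back, mod 10. A minimal sketch, assuming a 5-digit primer given as a string (extend_primer is a hypothetical name):

```python
def extend_primer(primer: str, length: int) -> str:
    """Extend a 5-digit primer: each new digit is the units digit of the sum
    of the digits five and four places back (no carrying)."""
    digits = [int(d) for d in primer]
    while len(digits) < length:
        digits.append((digits[-5] + digits[-4]) % 10)
    return "".join(str(d) for d in digits[:length])


message = "thisisatest"
print(extend_primer("84137", len(message)), "- sequence")
print(message, "- message")
```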

Next, we need a 10-letter keyword. Number its letters alphabetically starting at 1, and write the tenth one as 0 (10 mod 10).

TRANSPOSED - A=1
0714865932
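A sketch of turning a 10-letter keyword into the digit row, assuming ties go left to right and only the units digit is kept; `start` selects A=1 or A=0 numbering (sequence_key is a hypothetical name):

```python
def sequence_key(keyword: str, start: int = 1) -> str:
    """Number a 10-letter keyword alphabetically (ties left to right), keeping units digits only."""
    letters = keyword.upper()
    if len(letters) != 10:
        raise ValueError("Sequence Transposition expects a 10-letter key")
    order = sorted(range(10), key=lambda i: (letters[i], i))
    digits = [0] * 10
    for rank, pos in enumerate(order, start=start):
        digits[pos] = rank % 10  # with start=1, the 10th letter becomes 0
    return "".join(str(d) for d in digits)


print("TRANSPOSED")
print(sequence_key("TRANSPOSED"))
```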

Looking at the number sequence above the message, write each corresponding message letter under the matching digit in the key:

TRANSPOSED
0714865932
EIIHT.ASSS
.T.T

Write the columns out in the order they appear in the table and group in 5s.

EITIH TTASS S

Here, the decision to start numbering at 0 or 1 has a big impact:

TRANSPOSED
9603754821 - starting with A=0
S.ESIAHTSI
....T.T

SESIT AHTTS I
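Once the sequence and key digits exist, the placement step is a dictionary of columns. A hedged sketch, assuming the message is already cleaned to letters and the sequence is exactly as long as the message (sequence_encrypt is a hypothetical name); it reproduces both tables above.

```python
def sequence_encrypt(message: str, sequence: str, key_digits: str) -> str:
    """Drop each message letter into the key column whose digit matches the
    sequence digit above it, then read the columns left to right."""
    columns = {d: [] for d in key_digits}
    for ch, d in zip(message.upper(), sequence):
        columns[d].append(ch)
    return "".join("".join(columns[d]) for d in key_digits)


print(sequence_encrypt("thisisatest", "84137254097", "0714865932"))  # A=1
print(sequence_encrypt("thisisatest", "84137254097", "9603754821"))  # A=0
```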

To create our final message, we prepend the primer, and append the last digit of the sequence as a form of checksum.

84137 EITIH TTASS S 7

Going back to the “legal letters” argument: any software solver either has to allow digits here, or has to strip off the primer and checksum before “cleaning” the message (removing the spaces) and starting the auto-solve steps. That is, we want to parse sequence messages as follows:

Primer:  84137
Check:   7
Message: EITIHTTASSS
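Parsing could look like the sketch below, which assumes the first five digits are always the primer and the final digit is the checksum (both function names are hypothetical). Since the checksum is the last digit of a sequence as long as the message, it can also be re-derived from the primer and the message length.

```python
def parse_sequence_message(text: str) -> dict:
    """Split '84137 EITIH TTASS S 7' into primer, checksum, and letters."""
    compact = "".join(text.split())  # drop all whitespace
    primer, body, check = compact[:5], compact[5:-1], compact[-1]
    if not (primer.isdigit() and check.isdigit()):
        raise ValueError("expected a 5-digit primer and a 1-digit checksum")
    message = "".join(ch for ch in body.upper() if "A" <= ch <= "Z")
    return {"primer": primer, "check": check, "message": message}


def checksum_ok(primer: str, check: str, message: str) -> bool:
    """Rebuild the sequence to the message length and compare its last digit."""
    digits = [int(d) for d in primer]
    while len(digits) < len(message):
        digits.append((digits[-5] + digits[-4]) % 10)
    return digits[len(message) - 1] == int(check)


parts = parse_sequence_message("84137 EITIH TTASS S 7")
print(parts, checksum_ok(**parts))
```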

I mentioned above that we could use a mnemonic for the sequence primer as well. Obviously, we’re not going to build the number from a keyword the way we did above, because numbering a 5-letter word only gives us the digits 1 to 5; we’d need a really long primer word to get 6 through 10. The easiest way to represent the primer as a string is to use actual digits, like “84137.” Or, we can treat the letters A-J as the digits 0-9 (or 1-10; starting with 0 or 1 is not important here, we just need consistency). If A=1, then:

1234567890
ABCDEFGHIJ

84137 = HDACG

This may not be as easy to remember as FADED (61454), but our options are limited if we restrict ourselves to spellable 5-letter words. Instead, we can use modulus arithmetic and the full alphabet (or even the full ASCII set). Just take the units digit of any position 10 or higher (J=10 gives 0, R=18 gives 8).

12345678901234567890123456
ABCDEFGHIJKLMNOPQRSTUVWXYZ

84137 = RXAWG (still a stretch, but maybe you see my point)
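Both primer mnemonics reduce to “position mod 10” with A=1. A minimal sketch, assuming plain A-Z input (hypothetical names); letters_from_primer only produces the A-J form.

```python
import string


def primer_from_letters(word: str) -> str:
    """A=1, B=2, ... keeping only the units digit, so J=0, K=1, ... Z=6."""
    digits = []
    for ch in word.upper():
        position = string.ascii_uppercase.index(ch) + 1  # A=1
        digits.append(str(position % 10))
    return "".join(digits)


def letters_from_primer(primer: str) -> str:
    """Inverse using only A-J (A=1 ... I=9, J=0)."""
    return "".join("JABCDEFGHI"[int(d)] for d in primer)


for word in ("HDACG", "FADED", "RXAWG"):
    print(word, primer_from_letters(word))
```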

So what’s this all mean? Why go through all this when half of it has been described before, or will be some months from now?

It’s all about standardization, or the lack of it. I don’t know of a way to write one single function in Python that could cleanly handle all of the above use cases, especially when it comes to “unknowns” (that is, when you’re not told in advance which cipher system was used to encipher the message).

Some forms of keys are common across more than one system, such as:

Keyed alphabets: Aristo, Pat, Fractionated Morse, Ragbaby
Numeric: Most transpositions
Key expressions: Vigenere, Variant, Beaufort, Running key

But then again, there are types that are more-or-less unique to a specific cipher type:

Binary: Baconian
Pattern word: Myszkowski
Many 10-letter words: Grandpre

The question becomes, “where do I put key-generating functions?” Do I try to systematize all of them and shoehorn every single one into one mammoth key_utils.py file? Do I break them up into families, and make separate utilities files, like transposition_most_key_utils.py, simplesub_key_utils.py, etc.? Or do I approach this mess on a case-by-case basis, with simple sub key handling going into a generic simplesub_utils.py file, and the Myszkowski key handler staying safely in transpo_myszkowki.py with the rest of the Myszkowski-specific functions?

My decision at the moment: follow personal preference (mine).

Next up: Don’t know.

Published by The Chief

Who wants to know?
