Python GUI 018 – Transposition Solver, Part 1, Introduction

There are two major types of ciphers – transpositions and substitutions.

Transpositions change the ordering of the characters in the message:

This is a test message

361245
thisis
atestm
essage

IESSS ATAEI TGSME HTS

Substitutions turn the characters into other characters

abcdefghijklmnopqrstuvwxyz
GHIJKLMNOPQRSTUQWXYZABCDEF

This is a test message
ZNOY OY G ZKYZ SKYYGMK

ZNOYO YGZKY ZSKYY GMK

This next project is going to be my most ambitious yet, involving up to possibly 10 different transposition types at once, and multiple languages.

First, a bit more on transpositions as a concept. The basic idea is to create a set of rules that can be followed by the sender and recipient in as a consistent, “fool-proof” way as possible. In this sense, “fool-proof” means that the rules are not going to be too outrageous, and would generally be usable by a moderately-trained soldier on the front lines in combat situations.

One of the easiest rules systems is the one demonstrated above – columnar transposition.

Choose a key number of a particular length (usually referred to as the transposition width, or period).
Write the plaintext message under the number, in rows.
Read off the plaintext in columns, in ascending numeric order, in uppercase.
Group the resulting cipher text in 5s.

Decryption would be the reverse process.

Write the key number.
Divide the message length by the cipher width to get the column lengths.
Write the cipher in column lengths under the key number, in segment order.
Read the plaintext off in rows.

.1. .2. .3. .4. .5. .6.
IES SSA TAE ITG SME HTS

361245
------
thisis
atestm
essage

this is a test message

Encrypting positions

Note that the order of the letters coming out of the encryption phase is completely independent of the original message. If we have a different plaintext message, but the same key:

361245
differ
entpla
intext

FTTFP EDEIE LXRAT INN

We’re still writing in the text in rows and taking it off in the ascending order of the numeric key. What this means is that any transposition system can be modeled by its encrypted positions, and the details for encryption, decryption and bruteforce solving all become the same. That is, for Columnar with a message length of 20 and the key 361245:

.3 .6 .1 .2 .4 .5
00 01 02 03 04 05
06 07 08 09 10 11
12 13 14 15 16 17
18 19

00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 Cipher
-----------------------------------------------------------
02 08 14 03 09 15 00 06 12 18 04 10 16 05 11 17 01 07 13 19 Plain

Once we have the encrypted positions, we can apply encryption by going through the plaintext from left to right and just plugging the letters into a list object based on where the plaintext letter goes into the list according to its original positioning. That is, the first letter will go into list element 06, the second into element 16, the third into 00, etc.

Conversely, if we have the cipher text:

FTTFP EDEIE LXRAT INN

The first cipher text letter is from plaintext position 2, the second is from plain position 8, the third is from plain position 14, etc.

This makes bruteforce solving attacks against the key much easier. Increment the key, try encrypting the message positions with that key value, and apply the decryption step against the cipher text. Test the output for readability. If it’s readable, we’ve got the plaintext out. If not, increment the key and try again.

The next step is to describe each of the transposition systems we’ll be implementing in the GUI solver tool.

But, first, other stuff. Like, languages

One of the interesting things about transposition is that it can be applied to plaintexts in most major languages. Because of this, Xenocrypts (recreational ciphers in other languages) often include the more common transposition types, such as railfence, columnar, AMSCO and Myszkowski, for languages including French, Spanish, Italian, and German.

So, the question becomes, how do we test for readability?

There are tons of resources “out there” (i.e. – books, web sites, wikipedia) that list the top 10 or 20 letter pairs, triplets and 4-grams for the more common languages. For English those would be:

DIGRAMS = 'th er on an re he in ed nd ha at en es of or nt ea ti to it'
TRIGRAMS = 'the and tha ent ion tio for nde has nce edt tis oft sth men are but not you all'
TETRAGRAMS = 'that with have this will your from they know want been good much some time'

Obviously, the more of each length we have, the closer we can come to determining whether the results of a random key number is giving us good English or not. It’s pretty easy to whip up a Python script to go through a large body of sample text to collect the top 600 or 1,000 2-grams, but in our case more is not really better.

If we have too many of any particular n-gram size, it will slow the program down, and allow random text to pass the readability test. That is, if we count the number of 2-grams appearing in some sentence, we’re going to get really high totals if we don’t restrict the number of 2-grams that we are testing for.

I’m finding that for most cases, having between 20 and 100 of each size n-gram is adequate.

The problem comes in with collecting n-grams of languages for Portuguese or Esperanto. Right now, my language dictionaries are not very complete, and the n-gram lists are even more spotty. I need to fix that.

Anyway, the code for counting n-grams in the target text will be something like this:

grams_holder = []

def init_grams(list_name = 'TRITET'):
... ret = ''
... global gram_holder
... if list_name == 'DI':
....... ret = DIGRAPHS
... elif list_name == 'TRI':
....... ret = TRIGRAPHS
... elif list_name == 'TET':
....... ret = TETRAGRAPHS
... elif list_name == 'TRITET':
....... ret = TRIGRAPHS + ' ' + TETRAGRAPHS
... elif list_name == 'ALL':
....... ret = DIGRAPHS + ' ' + TRIGRAPHS + ' ' + TETRAGRAPHS
... else:
....... print('%s is an unrecognized parameter name. Use DI, TRI, TET, TRITET, or ALL.' % list_name)

... gram_holder = ret.upper().split(' ')

def count_grams(message):
... global gram_holder
... ret = 0

... for gram in gram_holder:
....... ret += message.count(gram) * len(gram)

... return ret

Most of the more complex transposition types will also include a crib, or hint, which is one or more words, or word fragments, that appear in the plaintext. As part of the gram count I’ll also check whether the crib appears in the target text.

The main program body might look something like this:

init_grams('ALL')
min_cnt = 100
crib = 'HELLO'

test_str = decrypt(msg, key)

cnt = count_grams(test_str)

if len(crib) > 0 and crib in test_str:
... cnt += 50

if cnt > 100:
... print(test_str)

While I have all of these pieces already in my existing solvers, I don’t have everything incorporating multiple language dictionaries and n-gram lists, and what I do have for other languages needs to be beefed up. But that will happen as I go along.

Making Keys

As mentioned above, transposition types generally use numeric keys. The easiest way to remember a key is to pick a keyword of the right number of letters, and then number the letter positions in ascending alphabetical order. If a letter appears more than once in the word, number the positions starting from left to right. Examples:

keyword
3276451

people
514632

One important exception is for Myszkowski, which is based on pattern words. Here, the idea is to give the same numbers to matching letters. Then, during the encryption phase, group the matching digits together from left to right, and read the cipher text in column blocks.

people
413421
thisis
atestm
messag
e

11 2 3 44
---------
hs i i ts
tm t e as
eg a s ms
.. . . e

HSTME GITAI ESTSA SMSE

Attacks

Encryption and decryption will follow the rules as given for each cipher type. It’s when we try to solve the cipher text through software that our options open up. The first question will be “what’s the transposition width?”

There’s no clear-cut answer to this. We can use a statistical test for getting a rough width indicator for Complete Columnar, but that won’t work for the other systems that try to make the transpositions look more random. Instead, we can start with the ACA guidelines, which state for Complete Columnar that the message length is in the range of the width (or “period”) times 8 to 15 lines deep. That is, if the CON is 100 letters long, the expected width (period) could be between 7 and 13. Note that when the width gets up over 10, an incrementer attack is going to start taking hours, and we really need to switch to hillclimbing.

Incrementing the key:
So, the most obvious attack would be to start a permutating incrementer at the low end width and have it automatically run through all of the intervening widths up to the upper end, and print out potential plaintexts based on a minimum n-gram count.

Hillclimbing:
Then, we have hillclimbing. The drawback here is that it’s going to be difficult to tell whether we missed getting the correct plaintext out because we have the wrong width, the original message has skewed letter frequencies, or if we were just unlucky on the current run and should try again.

When we looked at hillclimbing for cryptarithms, we had the benefit of knowing that the ultimate “best case solution” was to have the differences between the right and left sides of the equal sign equal to 0. We don’t have that guarantee of a solid hit on the correct plaintext with transpositions or substitutions. In many cases, we can only hope that we’ll get “close enough” to work out the full solution manually. Additionally, I’m not sure of a good hillclimbing approach for Myszkowski.

Dictionary:
Because Myszkowski is based on pattern words, it does make sense to run through the dictionary and try applying pattern words as the key to see if anything decrypts the CON properly, and combine that with an n-gram test for “correctness.” This approach will fail if the author used a phrase or expression that’s not in the dictionary. For other cipher types, if the period is 10 or less it just makes more sense to use an incrementing key attack. Above 10, try hillclimbing for specific widths and hope that you get lucky on the first attempt.

Recognized ACA Transposition Systems
Ok, the following systems are all transpositions (sometimes called “tramps”) that have been at least mentioned in the Cryptogram (Cm) newsletter. All but Fixed Distance and SudC (a proposed transposition based on the Sudoku table) routinely appear in the Cipher Exchange department. Grille and Swagman don’t really lend themselves to bruteforce software solving, but I hope to include them in the Transpo Solver project eventually.

AMSCO
Cadenus
Complete Columnar Transposition
Fixed Distance
Grille
Incomplete Columnar Transposition
Myszkowski
Nihilist Transposition
Railfence
Redefence
Route Transposition
Sequence Transposition
SudC
Swagman

AMSCO
This is a kind of Columnar Transposition. Start out by writing the message in rows under a numbered key. Alternate by writing in single letters and pairs. The starting position can be a singlet or a digram. Then, the rows in the first column will alternate between singlets and digrams. Take the text off in columns in ascending order, and group in 5s.

.4 .1 .2 .3
-----------
.t hi .s is
at .t es .t
.m es .s ag
.e

HITES SESSI STAGT ATME

Cadenus
The message is written in rows and taken off in columns based on an alphabetic key (not required; a numeric key also works). One weakness is that the messages have to be a factor of 25 in length, and generally there are only 4, or maybe 5, columns. The column indicator is usually given as AZYXW… etc (V and W double up). The columns are taken off starting at the row that matches the key letter for that column.

“Once upon a time there were three wolves. The father wolf, the mother wolf and the little baby wolf. One day, a big bad pig came X.”

WOLF
4321
----
Once A
upon Z
atim Y
ethe X
rewe V/W
reth U
reew T
olve S
sThe R
fath Q
erwo P
lfth O
emot N
herw M
olfa L
ndth K
elit J
tleb I
abyw H
olfO G
neda F
yabi E
gbad C
pigc D
amex B

We take off column 4 first, starting on the F row (letter “A”). Then column 3, “L” row (letter “F”), etc. For readability, group in 5s.

AIDCX ENMEE HWEEH OHTWA HTBWO
FTIEY FDBAG ECOIH WTEVH TWTOR
FMELD LLBLE ABIMN PTTEE ELTAR
RRROS FELEH ONETA ONYGP AOUAE

Complete and Incomplete Columnar Transposition
“Complete” refers to messages that can fill a rectangular box, “Incomplete” just means that whatever box size you choose, the last row will have at least one open space. For our purposes, there’s no difference between the two types – Complete is just a special case of Incomplete. The distinction comes in when you try to solve them with paper and pencil. Complete is the easier of the two, while Incomplete requires a special approach.

Write the plaintext under the key digits in rows, and take them off in columns in ascending order.

31254
-----
THISI
ISATE
STMES
SAGE

HSTAI AMGTI SSIES STEE

Fixed Distance
Also known as Skip. Pick a skip step size that is relatively prime to the message length (that is, no shared factors). Either pick the letter and then count, or count and then pick the letter.

000000000111111111
123456789012345678
------------------
thisisatestmessage

Step 7, pick then count.

TTSST EASIS GSEHE AIM

Step 7, count then pick is just a rotated variant.

ASISG SEHEA IMTTS STE

Grille
The plaintext message length needs to be a square number (9, 16, 25, 36, etc.) Make a grille (generally a piece of graph paper with holes cut out), where the number of holes is one-fourth of the message length. Place the grille on a clean sheet of paper, and write the plaintext in the holes left to right, top to bottom. Rotate the grille and repeat, and continue for the other two rotations. Remove the grille and write off the cipher text in rows, grouped in 5s. If the length is odd, the middle letter of the text is written into the central hole

Text: a message z
Grille (showing rotations):

X.. | ..X | ... | .X.
..X | ... | X.. | ...
... | .X. | ..X | X..

A.. | ..E | ... | .E.
..M | ... | A.. | ...
... | .S. | ..G | Z..

AEE
MSA
SGZ

AEEM S ASGZ

Myszkowski
Myszkowski is described above. Pick a pattern word (with repeated letters). Number the letters in ascending alphabetic order, where the repeated letters get the same digit. Write the plaintext under the key, then reorder the columns in ascending digit order, and combine those with the same digit. Take the columns off in blocks.

people
------
413421
------
thisis
atestm
messag
e

11 2 3 44
---------
hs i i ts
tm t e as
eg a s ms
.. . . e

HSTME GITAI ESTSA SMSE

Nihilist Transposition
This system uses a square grid with the same numbered key for the row and column headers. Pick a key with no repeated digits, and write it across the top and left of the grid. Write the plaintext in rows (“This is a test msg XX”).

. 4312
. ----
4|THIS
3|ISAT
1|ESTM
2|SGXX

Reorder the rows or columns in ascending order.

. 4312
. ----
1|ESTM
2|SGXX
3|ISAT
4|THIS

Then reorder the columns or rows in ascending order.

. 1234
. ----
1|TMSE
2|XXGS
3|ATSI
4|ISHT

Finally, take the cipher text off either in rows or columns and group in 5s (off by rows).

TMSEX XGSAT SIISH T

There’s an accepted variant, which is actually the method for decrypting the above message. Write the row and column headers in ascending numeric order. Write in the plaintext either in rows or columns.

. 1234
. ----
1|THIS
2|ISAT
3|ESTM
4|SGXX

Reorder the row or column in key order.

. 1234
. ----
4|SGXX
3|ESTM
1|THIS
2|ISAT

Reorder the column or row in key order.

. 4312
. ----
4|XXSG
3|MTES
1|SITH
2|TAIS

Take the cipher text out in rows or columns and group in 5s.

XMSTX TIASE TIGSH S

Railfence
Railfence is so-called because of a casual resemblance to a fence. The rows are referred to as rails. Encryption is based on the number of rails, and an offset (the row you choose to start writing in the message on the zigzag). As with Nihilist Transposition, there are two versions of Railfence, which are inverses of each other. The first is to write the message in on the zigzag and take it off by rows. The other is to write it in rows and take it off on the zigzag.

Say we have “this is a test message,” we want to use 3 rails, and we’ll use an offset of 1.

.. S . T . M . A
T I I A E T E S G
.H . S . S . S . E

STMAT IIAET ESGHS SSE

Alternatively, if we have an offset of 0, write the message in on rows and take it off on the zigzag:

T . H . I . S . I
.S A T E S T M E S
. S . A . G . E

TSSAH TAEIS GTSME EIS

Redefence
Redefence is Railfence, but with a numeric row key:

3 .. S . T . M . A
1 T I I A E T E S G
2 .H . S . S . S . E

TIIAE TESGH SSSES TMA

Route Transposition
Sigh. Route Tramps could be an entire blog entry on their own. They’re particularly important because they’re often used for writing the alphabet into 5×5 or 6×6 grids for substitution types, such as Playfair and Twosquare. The important point is that the message length needs to allow for the formation of a complete rectangle (which is taller than it is wide). The plaintext can be written in an agreed-upon route, and then taken off on a different route. Mirror reflections, and vertical and horizontal flips are allowed. Note though that many of the routes can result in the same cipher text out.

Recognized routes include

Rows:
THIS
ISAT
ESTM
SGXX

Alternating rows:
THIS
TASI
ESTM
XXGS

Columns:
TIES
HSSG
IATX
STMX

Alternating columns:
TTEX
HASX
ISTG
SIMS

Diagonals (up or down):
TISS
HIES
STMX
ATGX

Alt. Diagonals:
TISS
HIET
STMX
ASGX

Spiral (CW or CCW):
THIS
MSGI
TXXS
SETA

Reverse Spiral (CW or CCW):
ASIX
TTSX
EHIG
STMS

Combining them, we can write our plaintext in alternating diagonals, and take it off in a CW spiral.

TISS
HIET
STMX
ASGX

TISST XXGSA SHIEM T

Sequence Transposition
This is one of the more fun systems, but it doesn’t appear in the Cm very often. The general guideline is that the message needs to be 100 to 150 letters long. First, pick a keyword or phrase 10 letters long, and a 5-digit “primer”. Write out the primer and extend it to the length of the message by adding the first two digits and appending the result (without the 10’s digit) at the end, and continue the process:

Primer: 96058

960585653311864294
thisisatestmessage

Write the keyword and under it number the letters in ascending alphabetic order, starting with 1:

CONTROLLER
1650873429

Under that, list the letters that appear under the digits of the primer sequence:

CONTROLLER
1650873429
----------
THSII ESAT
MAS E SE G
S

Pull the text off in columns in ascending numeric order and group in 5s:

TMAES SESSH ASIET GI

Put the primer at the beginning of the message, and the last sequence digit at the end as a checksum.

96058 TMAES SESSH ASIET GI 4

SudC
Sudoku Cryptogram hasn’t been used much in the Cm so far, so there’s little point to giving the full details here. Start with a completed Sudoku puzzle, and write the letters into each 3×3 in ascending digit order. Take the cipher text off in rows, and reduce the Sudoku puzzle to its regular form for the user to solve.

Partial example (top three rows):

135 246 789
246 789 135
789 135 246

“This is a much much longer messag”

TII HUH SAG
HSS LON GRE
AMU CMC EMS

TIIHU HSAGH SSLON GREAM UCMCE MS

Swagman
In effect, Swagman is a little like SudC. Pick a square size (i.e. – 4×4). The plaintext length has to be a factor of the square size (the ACA uses 3 to 6 times the square size). Fill in the square with the numbers 1-n such that the same number isn’t repeated on any of the rows or columns. Repeat the square as necessary.

Plain: “This could be much funnier if I had a joke planned out first.”

1234 1234 1234
2341 2341 2341
3412 3412 3412
4123 4123 4123

Write the text in rows across the squares:

THIS COUL DBEM
UCHF UNNI ERIF
IHAD AJOK EPLA
NNED OUTF IRST

Reorder the columns individually in ascending digit order:

TNAD CNOF DRLT
UCES UJTL EPSM
IHIF AUUI EREF
NNHD OONK IBIA

Take the cipher text off in columns and group in 5s:

TUINN CHNAE IHDSF DCUAO NJUOO TUNFL
IKDEE IRPRB LSEIT MFA

Just as a quick comment – up to this point I have been using Bion’s Swagman and Grille gadgets for “hand-solving” those ciphers. I may at some point write my own solvers for them.

======

Next up, Digression 1 – The Mammoth Book.

Published by The Chief

Who wants to know?

One thought on “Python GUI 018 – Transposition Solver, Part 1, Introduction

Leave a comment

Design a site like this with WordPress.com
Get started