Python GUI 035 – Transpo Groundwork 5 – Width Determination

I’m in the process of starting the first transposition autosolver, for Columnar transposition. I wanted to do a bit of testing first, and realized that I needed to get more groundwork out of the way before trying to write up the implementation of the solver.

So… widths
Several of the transposition types are based on “templates” that are more or less rectangular, such as Columnar, AMSCO and Myszkowski. Attacks against these types start with determining the width of the rectangle (the ACA guidelines state that rectangular forms are taller than they’re wide). For paper and pencil solvers, this may either be intuitive, or bruteforce (i.e. – writing out the cipher using various widths and seeing if anything looks promising). There is a statistical method that can be used for Columnar (both complete and incomplete), but a lot of people dislike math, and will avoid using statistics if possible.

One solving aid is the ACA guidelines for the cipher types. For Complete Columnar, the cipher should be between 7 to 15 times the length of the key. For Incomplete, it’s between 15 to 18 times. (In ACA terminology, 7 to 15 times the “period”, or 15 to 18 lines “deep”.)

That is, if the message is 80 characters long, a “complete” rectangle would allow for 5×16 or 8×10. Following the ACA guidelines, the “best” choice would be a key length of 8. For an “incomplete” rectangle, 80/15 = 5.3; 80/16 = 5.0; 80/17 = 4.7; 80/18 = 4.4.

The first thing to remember is that we can’t have a key with half a letter. Key lengths (or widths, depending on your preferences) are discrete integers. The second thing is that these are only guidelines, and a length of 4, or a depth of 14, aren’t absolutely impossible. On the other hand, a depth of 16 gives us a rectangle 5×16 = 80, which is “complete,” and can be ruled out; and a width of 4 results in another “complete” rectangle (4×20), and can be ruled out. This leaves us with one “questionable” option:

Width of 6, depth of 13(.333) lines

13 is less than the recommended minimum depth of 15 for Incomplete Columnar, but could be acceptable if the plaintext is entertaining enough.

But, I’m here for the autosolver approach, and a bruteforce attack in Python for widths up to 10 or 11 will take less than an hour. I could let the program run, go do dishes, and come back to see if I have a solution yet. Anything over 11 and I’ll use a hillclimber, although I’d rather not have to run the hillclimber individually for each width from 12 on up to 15. I’d prefer to have a good “guestimate” for the width with the most likely chance of being the correct one.

This is where statistics comes in. (Unfortunately, the following statistical method is good only for Columnar ciphers.)

In English, the vowels appear in regular sentences about 40% of the time. Including “Y,” it’s very close to 40%. We can use this knowledge to compare the percentages of letters that are vowels in each row of the rectangle for the various widths, and use the width that gives us the average deviation closest to 0 (or, the largest inverse square deviation, as shown below).

Say we have the following message:

TSNEN RYLNO YNSOU ERTOB URRWU LEIEO OEEON NSOUO
RIIEP TFSHN GIGIO LTBLB URRRE HBEPT FMEIS NRYRL
IILIE OEENT IGIOM F

The length is 96, and we’re not told if this is Complete or Incomplete. If it is Complete, we could have a rectangle 2×48, 4×24, 6×16 or 8×12. The guidelines allow for period times 7 to 15, which gives us a key width of 8 as a good start.

For Incomplete, the guidelines are a depth of 15 to 18. 96/15 = 6.4; 96/16 = 6; 96/17 = 5.6; 96/18 = 5.3.

Since we have to use integer widths, let’s test out 5, 6 and 7.

First though, let’s look at something a bit shorter.

FIRSTTHOUGH -- 3
LETSLOOKATS -- 4
OMETHINGABI -- 5
TSHORTER ----- 2

The first three lines are 11 long, and the fourth is 8. 40% of 11 is 4.4. 40% of 8 is 3.2. Taking the absolute differences of each number from the expected value, we get the deviations.

4.4 - 3 = 1.4
4.4 - 4 = 0.4
4.4 - 5 = 0.6 (absolute)
3.2 - 2 = 1.2
-------------
Total ... 3.6

Divide by the number of rows (4) and we get the average deviation of 3.6/4 = 0.9.

The closer to 0 we are, the more promising the test is.

According to William Friedman, who first described this test in his Military Cryptanalysis texts, the more accurate measure is to divide the expected number of vowels per row by the square of the average deviation. The bigger the result, the better.

4.4 / (0.9^2) = 4.4 / 0.81 = 5.4

Granted, the numbers are bit skewed for such a short message, but we can try with a different width just for the LULZ. A width of 8 gives an expected count of 8 x .4 = 3.2.

FIRSTTHO - 2 = 1.2
UGHLETSL - 2 = 1.2
OOKATSOM - 4 = 0.8
ETHINGAB - 3 = 0.2
ITSHORTE - 3 = 0.2
R -------- 0 = 0.4
Total ........ 4.0

4 / 6 = 0.67
3.2 / (0.67^2) = 7.2

Obviously, we’re working with plaintext, which isn’t scrambled. Even so, the above examples point out two things. First, the results we get will depend on the actual messages we use, showing that variations are a part of the game, and statistics are only going to get us so far. Second, statistics can help us set up an hypothesis that we can then prove or disprove. That is, for Columnar Transpositions, the widths that give us the smallest average deviation, or the largest inverse-squared value, have a better chance of being the correct width for the message under test.

Let’s go back to our test message.

TSNEN RYLNO YNSOU ERTOB URRWU LEIEO OEEON NSOUO
RIIEP TFSHN GIGIO LTBLB URRRE HBEPT FMEIS NRYRL
IILIE OEENT IGIOM F

We’ve already established that our expected width range is from 5 to 7 (plus 8 for Complete). We need to write the cipher column-wise (depth-wise), and we’ll start with 5 columns. Expected = 5 * 0.4 = 2.

TUOLY . 3 . 1.0
SRRBR . 0 . 2.0
NRIUL . 2 . 0.0
EWIRI . 3 . 1.0
NUERI . 3 . 1.0
RLPRL . 0 . 2.0
YETEI . 4 . 2.0
LIFHE . 2 . 0.0
NESBO . 2 . 0.0
OOHEE . 4 . 2.0
YONPE . 3 . 1.0
NEGTN . 1 . 1.0
SEIFT . 2 . 0.0
OOGMI . 3 . 1.0
UNIEG . 3 . 1.0
ENOII . 4 . 2.0
RSLSO . 1 . 1.0
TOTNM . 1 . 1.0
OUBRF . 2 . 0.0 - Depth = 19
---------------
Total ---- 19.0

Expected = 6 * 0.4 = 2.4

TREHEI . 3 . 0.6
STONHI . 2 . 0.4
NONGBL . 1 . 1.4
EBNIEI . 4 . 1.6
NUSGPE . 2 . 0.4
RROITO . 3 . 0.6
YRUOFE . 4 . 1.6
LWOLME . 2 . 0.4
NURTEN . 2 . 0.4
OLIBIT . 3 . 0.6
YEILSI . 4 . 1.6
NIEBNG . 2 . 0.4
SEPURI . 3 . 0.6
OOTRYO . 4 . 1.6
UOFRRM . 2 . 0.4
EESRLF . 2 . 0.4 - Depth 16
----------------
Total ----- 13

Expected = 7 * 0.4 = 2.8

TUEITFI . 4 . 1.2
SEOEBME . 4 . 1.2
NROPLEO . 3 . 0.2
ETETBIE . 4 . 1.2
NOEFUSE . 4 . 1.2
RBOSRNN . 1 . 1.8
YUNHRRT . 2 . 0.8
LRNNRYI . 2 . 0.8
NRSGERG . 1 . 1.8
OWOIHLI . 4 . 1.2
YUUGBIO . 5 . 2.2
NLOIEIM . 4 . 1.2
SEROPLF . 2 . 0.8 - Depth 13
-----------------
Total ------ 15.6

Expected = 8 * 0.4 = 3.2

TSUSHUEE . 4 . 0.8
SOLONRIO . 4 . 0.8
NUEUGRSE . 4 . 0.8
EEIOIRNE . 6 . 2.8
NRERGERN . 2 . 1.2
RTOIIHYT . 4 . 0.8
YOOIOBRI . 6 . 2.8
LBEELELG . 3 . 0.2
NUEPTPII . 4 . 0.8
OROTBTIO . 4 . 0.8
YRNFLFLM . 1 . 2.2
NWNSBMIF . 1 . 2.2 - Depth 12
-----------------
Total ------ 16.2

Average Deviations
------------------
5) 19.0 / 19 = 1
6) 13.0 / 16 = 0.8125
7) 15.6 / 13 = 1.2
8) 16.2 / 12 = 1.35

Inverse Square
--------------
5) 2.0 / (1.0^2) ==== 2.00
6) 2.4 / (0.8125^2) = 3.64
7) 2.8 / (1.2^2) ==== 1.94
8) 3.2 / (1.35^2) === 1.76

Our hypothesis will be based on the width of 6 having the smallest average deviation of 0.8, and the largest inverse square of 3.64. The next step would be to anagram the six 16-deep columns of the message to try to pull the correct plaintext out (which is the task of the autosolver).

I’ll state right now that the key was derived from the keyword ERNEST. You can solve the message yourself if you like.

— Disclaimer —

The one or two of you reading this write-up may be pulling your hair out and screaming “those aren’t incomplete rectangles!” and you’d be right. I’ve simplified the displays for widths 5, 6 and 7 because I didn’t want the extra work of formatting them. But, the numbers for Incomplete are approximations anyway, because I’m simply assuming that the first “column segment” in the message is one of the longer ones. If that’s not true (which is most likely), then the vowel counts will be off in all of the calculations. But, as was seen in the plaintext examples, the real world is messy, and even straight plaintext when calculated correctly can give weird numbers during the statistical testing. (I mean, regardless of whether I used a width of 8 or 11, plaintext is plaintext, yet I got inverse square results of 5.4 and 7.2 for the same “correct” texts.)

This is a case where “good enough for jazz” is good enough. If a width of 6 turned out to be wrong, we’d just have to go down the list and try each of the other widths one at a time until we got to the correct answer. And as I wrote earlier, an autosolver bruteforce-attacking the key will finish off widths up to 10 pretty quickly.

Next up: The Python test code.

Published by The Chief

Who wants to know?

One thought on “Python GUI 035 – Transpo Groundwork 5 – Width Determination

Leave a comment

Design a site like this with WordPress.com
Get started