Python GUI 040 – Transpo – Transposition Utils

I lied. I’m not quite ready to show the Route main file. Instead, I need to cover the general transpo_utils.py file. It’s general in that it includes functions that are used by at least a few of the other transposition cipher types, and not just Route transposition. I’m not done with all of the transposition solvers yet, so I may end up adding a few more functions by the end of this.

Most critically, it contains the TranspoSolverClass, which is the main communications path between the transpo_gui.py caller, and the individual auto-solvers that do all of the work in time slices.

As with the Cryptarithm solver, the TranspoSolverClass object holds the current value of the key during processing, the cipher type, the message, and whether the processing has finished. In addition, we have the minimum and maximum test widths, the solver option selector (brute force, or hillclimber), the hillclimber options, the ngram counter options, and the suggestions for potentially correct output message text.

I should mention here that I ran into an “issue” when testing the railfence solver just now. In the below member list for TranspoSolverClass, I have something called “cnt.” cnt is used to tell the solver what the highest n-gram count has been in previous time slices, and how many solutions to return back to transpo_gui.py (“top offset”). (cnt = [0, 5])

That is, say I’m solving for Cadenus, and it’s taken me 20 seconds to get this far. Since the solver times out after one second, I’ve finished 20 time slices. Every time I test a solution, I check the number of n-grams in the proposed plaintext, and the higher scores get appended to solutions to return to transpo_gui.py. cnt[0] then stores the highest n-gram count so far (say, 100). But, the actual correct plaintext may not be the one with the highest n-gram count (in fact, in the railfence cipher I was testing, the top score was 59, and the correct solution was a 51). I use offset (cnt[1]) to tell the solver to append every solution with an n-gram count >= cnt_max – offset.

Normally, I use cnt = [0, 5]. But, I set cnt in two places – in the member definitions, and in init(). I forgot that, and when I tried changing offset to 10 (cnt = [0, 10]), I only changed the definition and not init(). So now, I’m declaring cnt_offset in the definitions, and using that to set cnt offset elsewhere. It’s still normally 5, but for the problematic railfence cipher, I needed to set it to 10.

Class members:

cnt_offset. - Master value for selecting the top n solutions
cipher_type - String name for current cipher
key ....... - Either initialized, or last key value tested
done ...... - Testing status, finished or still going
attack_type - Bruteforce, hillclimbing, wordlist, etc.
solutions . - Solutions with high ngram counts
msg ....... - The cipher message as a list
hint ...... - Provided crib text, if any
width_min . - Starting width for rectangular types
width_max . - Stopping width for rectangular types
option .... - Used for cipher types with multiple options
hill ...... - Hillclimbing min. limit, loop count, etc.
cnt ....... - n-gram count data (count max, top offset)

Functions in this file:

make_key_from_string() : Convert any key format to a list
check_len() .......... : Return the length of the cipher text
get_factors() ........ : Get factors of message length
defactor() ........... : Take pairs of factors and separate them
ravel() .............. : Encrypt message based on position data
unravel() ............ : Decrypt message based on position data
key_to_str() ......... : Convert key list to a string
suggested_widths() ... : Suggest rectangle widths based on message

........................ length
weight_widths() ...... : Suggest width for Columnar transpositions

####################
# transpo_utils.py #
####################

# Imports.

from crypt_utils import string_cleaner, sort_second
from math import sqrt, floor, ceil
from inc_utils import make_key

# Initialization.
# Constant used for testing a range of factors for
# columnar transpositions.

MIN_FACTOR_WIDTH = 3

# Definition for the main data messaging class.

class TranspoSolverClass():
... cnt_offset = 5
... cipher_type = ''
... key = []
... done = False
... attack_type = 0
... solutions = []
... msg = ''
... hint = ''
... width_min = 3
... width_max = 10
... option = 0
... hill = [11, 5000, 500]
... cnt = [0, cnt_offset] # [ngram max, ngram offset]

# Object initialization, the starting values come from
# transpo_gui, which makes the call just before beginning
# the auto-solve process
.

... def init(self, entries, hillsettings):
....... msg_raw = entries['cipher_area'].get(1.0, 'end').strip()
....... hint_raw = entries['hint'].get().upper()

....... self.con_type = entries['type'].get().upper()
....... self.msg = string_cleaner.clean(msg_raw)
....... self.hint = string_cleaner.clean(hint_raw)
....... self.width_min = entries['min_val'].get()
....... self.width_max = entries['max_val'].get()
....... self.option = entries['option'].get()
....... self.hill = hillsettings
....... self.cnt = [0, self.cnt_offset]
....... self.ret_msg = ''

....... self.done = False
....... self.solutions = []

# Route, Fixed, Myszkowski and Cadenus don't use the
# standard keys with non-repeating digits. For everything
# else, initialize the key as ascending digits for the
# given minimum width of the key.

....... if self.con_type not in
.......... 'ROUTE TRANSPOSITION FIXED MYSZKOWSKI CADENUS':
........... self.key = make_key(self.width_min)

# For Cadenus, if we're doing a wordlist attack, just
# set the key to 0. For a bruteforce attack, set the
# key to a list of 0's ([0, 0, 0]) for the minimum width
# of the key.

....... elif self.con_type == 'CADENUS':
........... if self.attack_type == 0:
............... self.key = 0
........... else:
............... if len(entries['key'].get().strip()) == 0:
................... self.key = [0] * self.width_min
............... else:

# However, if the key field in the GUI is not blank, use
# what the user entered for picking up solving at a
# desired point in the key space.

................... self.key = []
................... for ch in entries['key'].get().upper():
....................... self.key.append('AZYXVUTSRQPONMLKJIHGFEDCB'.\

....................................... find(ch))

# Put the object values into a format for pretty
# printing, used for testing.

... def format_settings(self):
....... ret = []
....... ret.append('Con Type: %s' % self.con_type)
....... ret.append('Key: %s' % self.key)
....... ret.append('Attack type: %s' % self.attack_type)
....... ret.append('Hint: %s' % self.hint)
....... ret.append('Option: %s' % self.option)
....... ret.append('Hill: %s' % self.option)
....... ret.append('Done: %s' % self.done)
....... ret.append('Width Min: %s' % self.width_min)
....... ret.append('Width Max: %s' % self.width_max)
....... ret.append('Cnt settings: %s' % self.cnt)
....... ret.append('Return Msg: %s' % self.ret_msg)
....... ret.append('Message: %s' % self.msg)
....... ret.append('Solutions: %s' % self.solutions)

....... return '\n'.join(ret)

# get() function for returning the variable data to
# the auto-solver for initialization prior to doing
# the solve run for that time slice.

... def get(self):
....... return self.msg, self.key, self.hint, self.attack_type, \

.............. self.option, self.width_min, self.width_max, self.cnt

# Make the transpo_solver_data object global

transpo_solver_data = TranspoSolverClass()

"""
make_key_from_string()
This turned into kind of an all-purpose key reformatting operation. The user can enter a key in the GUI in one of a few different forms:
12345 - Simple digits, no spaces. Allows for 0-9.
1 2 3 - Simple digits, w/ spaces. Allows for 0-infinity.
ABCDE - Simple letters. Allows for a keyword for conversion.

In all cases, take the key and put it in a list of unique integers.
"""

def make_key_from_string(s):

# If the key is blank, return.

... if len(s.strip()) == 0:
....... return ''

# Test if the key contains spaces.

... ret = []
... offset = 0
... temp = s.split(' ')

# If it does, assume that it's numeric. If there's
# no 0, assume that it starts numbering at 1, set
# the offset to 1, and just append the values in the
# key to a list as ints.

... if len(temp) > 1:
....... if '0' not in s: offset = 1
....... for e in temp:
........... ret.append(int(e) - offset)
....... return ret

# We have a key that's not space-delimited. Test if it's alphabetic.

... elif s[0].isalpha():
....... ret = [0] * len(s)
....... cntr = 0

# It is. Remove any characters that aren't in the legal
# list of letters for the current language, and build up
# the key, assigning unique values to repeating letters.

....... for a in string_cleaner.letters:
........... if a in s:
............... for ptr in range(len(s)):
................... if a == s[ptr]:
....................... ret[ptr] = cntr
....................... cntr += 1
....... return ret

# We have a numeric key. Make sure numbering starts at
# 0 (as above) and convert the key characters to ints.

... else:
....... if '0' not in s: offset = 1
....... for e in s:
........... ret.append(int(e) - offset)
....... return ret

"""
check_len()
Some of the cipher types, like Cadenus and Nihilist, have to be particular lengths. For Cadenus, it's a factor of 25, and some of the others need to be square or rectangular. The GUI will have a "Check Length" button which will display the current plaintext length in the Message field at the top of the screen.
"""

def check_len(entries):
... msg = string_cleaner.clean(entries['in_area'].get(1.0, 'end'))

... entries['message'].set('Message length: %s' % (len(msg)))

"""
get_factors()
A couple cipher types, like Complete Columnar, need to be complete rectangles. For auto-solving, when the user is entering the starting and stopping key widths, determine the factors of the message length in advance and return them as matched pairs for display in the Message window. That is, if the message length is 36, return the pairs as [3, 12], [4, 9], [6, 6]. Start at the value in MIN_FACTOR_WIDTH, and only display as "width smaller than height."
"""

def get_factors(lenval):

... results = []
... sr = int(sqrt(lenval)) + 1
... for n in range(MIN_FACTOR_WIDTH, sr):
....... if lenval % n == 0:
........... results.append([n, int(lenval/n)])

... return results

"""
defactor()
As mentioned above, get_factors() returns the factors of the message length as matched pairs. If we need to use the pairs for calculating the best widths for Columnar types, return them as a flat list.
"""

def defactor(factor_pairs):
... ret = []
... for pair in factor_pairs:
....... for e in pair:
........... ret.append(e)

... return ret

"""
ravel()
As described in the transpo introduction, transposition ciphers encrypt the positions of the plaintext message. Therefore, if we number the positions and perform the encryption step, both encryption and decryption become virtually identical. This is a process I call "raveling" and "unraveling."

ravel() - Take the message characters and put them in encrypted position order. Return as an enciphered string.
unravel() - Take the encrypted position order and put the cipher message characters back into their original order. Return as a plaintext string.

"""

def ravel(msg, order):
... ret = [''] * len(msg)

... for i in range(len(msg)):
....... ret[order.index(i)] = msg[i]

... return ''.join(ret)

def unravel(msg, order):
... ret = [''] * len(msg)

... for i in range(len(msg)):
....... ret[i] = msg[order.index(i)]

... return ''.join(ret)

"""
key_to_str()
Take a numeric list and return it as a string.
Unless the caller specifies a starting value of 0, offset all of the digits by 1 as a default.
"""

def key_to_str(key, offset=1):
... ret = ''
... for i in key:
....... ret += str(i+offset)
... return ret

"""
suggested_widths()
The American Cryptogram Association (ACA) has suggested widths for each of their transposition cipher types. I'm using these suggestions along with the message length to tell the user what the more probable minimum and maximum test key widths should be.
"""

def suggested_widths(con_type, msg, msglen):

# amin, amax: ACA suggested values
# skip_calc: Skip the calculation if there can only be 1 value
# retmsg: Return the results as a string.

... amin, amax = -1, -1
... skip_calc = True
... retmsg = ''

... if con_type == 'AMSCO':
....... amin, amax, skip_calc = 8, 12, False

# Cadenus length needs to be a factor of 25.

... elif con_type == 'CADENUS':
....... if msglen % 25 == 0:
........... amin, amax = int(msglen/25), int(msglen/25)
....... else:

........... retmsg = \
...............'Message length (%s) not a factor of 25, between 4-6' \
............... % (msglen)

# For Complete and Incomplete Columnar, use the statistical
# distribution of vowels per row to suggest the best
# starting width to test with.

... elif con_type == 'COMPLETE COLUMNAR':

# Start as normal.
....... amin, amax = floor(msglen / 15), ceil(msglen / 8)

# Get the suggested widths in descending order of probability
....... weighted_order = weight_widths(msg, msglen, amin, amax)

# Get the factors of a complete rectangle
....... factors = defactor(get_factors(msglen))
....... retmsg = ['Weighted factors of %s:' % (msglen)]

# Use only those widths in the ACA suggested range.
....... for e in weighted_order:
........... if e in factors:
............... retmsg.append(str(e))

....... retmsg = ' '.join(retmsg)

... elif con_type == 'INCOMPLETE COLUMNAR':
....... amin, amax = floor(msglen / 18), ceil(msglen / 15)

....... retmsg = []
....... weighted_order = weight_widths(msg, msglen, amin, amax)
....... for e in weighted_order:
........... retmsg.append(str(e))
....... retmsg = 'Weighted order of probability: ' + \

................ (', '.join(retmsg))

... elif con_type == 'MYSZKOWSKI':
....... amin, amax, skip_calc = 12, 15, False

# Nihilist needs to be in a square.

... elif con_type == 'NIHILIST TRANSPOSITION':
....... if sqrt(msglen) != int(sqrt(msglen)):
........... retmsg = 'Message length (%s) not a square' % (msglen)
....... else:
........... amin, amax = int(sqrt(msglen)), int(sqrt(msglen))
........... if amin > 10:
............... retmsg = \

................ 'WARNING: ACA guidelines say a max of 10x10 (is %s)'\
................ % (amin)

# Railfence and Redefence both use a range between 3 and 7 rails.

... elif con_type == 'RAILFENCE':
....... amin, amax = 3, 7

... elif con_type == 'REDEFENCE':
....... amin, amax = 3, 7

# Route needs to be a complete rectangle.

... elif con_type == 'ROUTE TRANSPOSITION':
....... factors = defactor(get_factors(msglen))
....... f = []
....... for e in factors:
........... f.append(str(e))
....... retmsg = 'Weighted factors of %s: %s' % (msglen, ', '.join(f))

# If skip_calc is False, calculate the suggested key
# widths from amax and amin.

... if not skip_calc:
....... amin, amax = floor(msglen / amax), ceil(msglen / amin)

... return amin, amax, retmsg

"""
weight_widths()
Calculate the most probable key width for Columnar transpositions based on the distribution of vowels in each row of the rectangle, given that AEIOUY represents 40% of the message text, on average.
As given by William Friedman in his MILCRYPT textbooks.
"""

def weight_widths(msg, msglen, testmin, testmax):

... devs_list = []

# Test each width in the range.

... for width in range(testmin, testmax + 1):

# Calculate the size of a complete rectangle, plus the residue.

....... lengthnorm = int(msglen/width)
....... leftover = msglen - lengthnorm * width

....... offsets = [lengthnorm] * width
....... for i in range(leftover):
........... offsets[i] += 1

# Build the rectangle and populate it with the message.

....... box = []
....... ptr = 0
....... for i in range(width):
........... box.append(msg[ptr: ptr + offsets[i]])
........... ptr += offsets[i]

# Set up the deviation calculations.

....... devs = []
....... expected = width * 0.4
....... total_dev = 0

# Go through each line and get the deviation of counted
# number of vowels from the expected.

....... for i in range(lengthnorm):
........... vowelcnt = 0
........... s = ''
........... for j in range(width):
............... ch = box[j][i]
............... s += ch
............... if ch in 'AEIOUY':
................... vowelcnt += 1

# Add to the total deviations for the rectangle.

........... total_dev += abs(expected - vowelcnt)
........... devs.append(abs(expected - vowelcnt))

# Calculate the statistics for the current width.

....... sumdevs = sum(devs)
....... avgsumdevs = sumdevs/lengthnorm
....... sqravgdevs = avgsumdevs ** 2
....... sqrdevs = expected/sqravgdevs

# Add to the list of deviation per width.

....... devs_list.append([width, sqrdevs])

# Descending sort on size of squared deviation.

... devs_list.sort(key=sort_second, reverse=True)
... ret = []

# Just return the widths, with best suggestion at the top.

... for elem in devs_list:
....... ret.append(elem[0])

... return ret

Next up: No idea

Published by The Chief

Who wants to know?

Leave a comment

Design a site like this with WordPress.com
Get started