Python GUI 033 – Transpo Groundwork 4

Time to get philosophical again. But I’m not digging for filler right now.

I’ve finished the encryption/decryption and show worksheet functions for the last two cipher types (nihilist and route), and I’m preparing to add the multi-language support functions. This is aiming me directly at a wall I’d thought I knew how to work around.

Step 1 is to add the language selection list to the main menu.
This is done.

Step 2 is to implement a function for returning the legal characters list for each language.
This is done.

Step 3 is to implement a function called cleanstring() that takes a message (plain or cipher text) and the legal string, and returns only the characters in the message that are also in the legal string.
This had been done, but now I’m hitting the wall.

Step 4 is to decide how to deal with the wall.

Ok, here’s what I’d had.
cleanstring() and LETTERS were both in a module called crypt_utils.py. To use them, I’d import both at the beginning of every file as needed, such as:

from crypt_utils import cleanstring, LETTERS

I’d call the function as:

msg = 'This is a test message.'
cleaned = cleanstring(msg.upper())

In crypt_utils.py, I had:

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def cleanstring(msg, alphabet=LETTERS):
... ret = []
... for ch in msg:
....... if ch in alphabet:
........... ret.append(ch)

... return ''.join(ret)

Printing cleaned, I’d get out:

'THISISATESTMESSAGE'

Although, technically LETTERS didn’t need to be imported unless I wanted to do something with a Polybius square, where there’s no J in the alphabet. Example:

cleaned = cleanstring(msg.upper(), LETTERS.replace('J', ''))

All very well and good. If I wanted to switch languages, I could do something like:

cleaned = cleanstring(msg.upper(), FRENCH_LETTERS)
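To make the three calls above concrete, here is a minimal, self-contained sketch. FRENCH_LETTERS is never defined in this post, so the value used below is an assumption for illustration only, and cleanstring() is written as a one-line equivalent of the loop version.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Assumption: a made-up French alphabet that simply adds some accented
# capitals; the real FRENCH_LETTERS may look quite different.
FRENCH_LETTERS = LETTERS + 'ÀÂÇÉÈÊËÎÏÔÙÛÜ'


def cleanstring(msg, alphabet=LETTERS):
    """Keep only the characters of msg that appear in alphabet."""
    return ''.join(ch for ch in msg if ch in alphabet)


msg = 'This is a test message.'
print(cleanstring(msg.upper()))                                 # default alphabet
print(cleanstring('Just a jest'.upper(), LETTERS.replace('J', '')))  # no J
print(cleanstring('Ça va très bien'.upper(), FRENCH_LETTERS))   # accented letters kept
```

Note that dropping J from the alphabet removes every J from the message; merging J into I instead would need a replace() on the message before cleaning.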

Unfortunately, by having a GUI file (transpo_gui.py), I’m introducing a whole lot of levels of interaction that I’ve never had to deal with before. The case in point is transpo_route.py, which has a supporting module called transpo_route_utils.py.

transpo_gui.py uses cleanstring() for the encrypter/decrypter to preprocess the input message prior to calling the appropriate cipher type e/d function. transpo_route.py uses cleanstring() to clean up the keys (keywords or expressions) as a part of ensuring that they are in a valid format (each cipher type has different format requirements). transpo_route_utils.py also uses cleanstring() as a kind of redundancy to make sure that I don’t accidentally let invalid characters slip through to some operation that could throw an error.

As I’m typing this up, I’m realizing that I have these redundancies, and part of the philosophical question is: “Do I keep them, or do I try to simplify the text processing path so much that calls to cleanstring() only take place in transpo_gui.py?” I don’t think I can foresee every possible situation and isolate cleanstring() that completely.

One thought is to turn LETTERS in crypt_utils.py into letters, and do something like the following:

crypt_utils.py

letters = ''

def set_alphabet(l):
... global letters
... letters = l

def cleanstring(msg):
... global letters

... ret = []
... for ch in msg:
....... if ch in letters:
........... ret.append(ch)

... return ''.join(ret)

It’s a minor change, but it means that I don’t have to pass the legal alphabet to a bunch of intermediary functions in order to get it down to the level where the cleanstring() call is made.
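A rough sketch of how that might look from the calling side. Everything is kept in one file so it runs as-is; in the real project the first two functions would sit in crypt_utils.py. key_is_valid() is a hypothetical stand-in for a low-level helper of the kind found in transpo_route_utils.py, not a function from the actual code.

```python
# Stand-in for crypt_utils.py: the alphabet is module-level state
letters = ''


def set_alphabet(new_letters):
    """Rebind the module-level alphabet used by cleanstring()."""
    global letters
    letters = new_letters


def cleanstring(msg):
    # Reading a global needs no 'global' statement; only rebinding does.
    return ''.join(ch for ch in msg if ch in letters)


# Hypothetical low-level helper: the alphabet is never passed in as an argument
def key_is_valid(key):
    return len(key) > 0 and cleanstring(key) == key


# Stand-in for the GUI layer: set the alphabet once, when a language is chosen
set_alphabet('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
print(key_is_valid('ZEBRA'))     # every character is legal
print(key_is_valid('ZEBRA 1'))   # the space and digit are not
```

One caveat with this pattern: a `from crypt_utils import letters` in another file copies the value as it is at import time, so a later set_alphabet() call would not be visible through that imported name. Going through cleanstring(), or reading crypt_utils.letters, avoids the problem.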

This is in contrast to using a class approach:

class CleanString():
... letters = ''

... def clean(self, msg):
....... ret = []
....... for ch in msg:
........... if ch in self.letters:
............... ret.append(ch)

....... return ''.join(ret)

... def __init__(self, l):
....... self.letters = l

On its surface, the class approach leaves me where I was at the beginning, because I somehow have to get the legal character string from transpo_gui.py down into crypt_utils.py. Before worrying about that detail, there’s a different issue – that of speed.

One of the biggest factors in deciding how to implement a specific task is what kind of time hit any given implementation will give me. cleanstring() isn’t going to be called millions of times in a loop, under normal conditions, but still…

In the cryptarithm solver, I had cases where I’d be running through every permutation of a key from 0,1,2… to …,2,1,0. I.e. –

012
021
102
120
201
210

With a 10-digit key, that’s a maximum of 10!, or 3,628,800 loops through whatever process a given task requires. No matter how fast the individual iteration of the process is, delays will add up eventually. Transposition ciphers usually use numeric keys that can get up over 10 digits wide. Is there an approach to something like cleanstring() that is faster or slower than the alternatives?
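The permutation count can be checked with the standard library. This is only a sketch of the counting argument, not the code used in the cryptarithm solver.

```python
from itertools import permutations
from math import factorial

# Every ordering of a 3-digit key, in the same order as the list above
for perm in permutations('012'):
    print(''.join(perm))

# Number of orderings of a 10-digit key
print(factorial(10))
```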

The test: Increment an integer enough times (10,000 × 10,000, or 100 million, per run) to get a decent approximation of the execution time for a single operation. Run the test 10 times, and take the average.

1) Class approach:

from time import time  # time() is used by the timing loops in every test below

class Tri():
... def reset(self):
....... self.val = 0

... def inc(self):
....... self.val += 1

... def __init__(self, val):
....... self.val = val

tri = Tri(0)
times = []

for loop in range(10):
... start = time()

... for k in range(10000):
....... tri.reset()

....... for j in range(10000):
........... tri.inc()

... times.append(time() - start)

print('Average: %s' % (sum(times)/len(times)))

Average time: 14.564 seconds.

2) Simplified class approach:

class Tri():
... val = 0

tri = Tri()
times = []

for loop in range(10):
... start = time()

... for k in range(10000):
....... tri.val = 0

....... for j in range(10000):
........... tri.val += 1

... times.append(time() - start)

print('Average: %s' % (sum(times)/len(times)))

Average time: 8.492 seconds

3) Dictionary approach with string index:

tri = {}
times = []

for loop in range(10):
... start = time()

... for k in range(10000):
....... tri['i']= 0

....... for j in range(10000):
........... tri['i'] += 1

... times.append(time() - start)

print('Average: %s' % (sum(times)/len(times)))

Average time: 7.775 seconds

4) Dictionary approach with digit variable index:

tri = {}
times = []
i = 0

for loop in range(10):
... start = time()

... for k in range(10000):
....... tri[i]= 0

....... for j in range(10000):
........... tri[i] += 1

... times.append(time() - start)

print('Average: %s' % (sum(times)/len(times)))

Average time: 9.053 seconds

5) Multi-element list with digit variable selector:

tri = [0, 0]
times = []
i = 0

for loop in range(10):
... start = time()

... for k in range(10000):
....... tri[i]= 0

....... for j in range(10000):
........... tri[i] += 1

... times.append(time() - start)

print('Average: %s' % (sum(times)/len(times)))

Average time: 7.838 seconds

6) Multi-element list with hardcoded selector:

tri = [0, 0]
times = []

for loop in range(10):
... start = time()

... for k in range(10000):
....... tri[0]= 0

....... for j in range(10000):
........... tri[0] += 1

... times.append(time() - start)

print('Average: %s' % (sum(times)/len(times)))

Average time: 7.229 seconds

7) Simple numeric variable:

times = []

for loop in range(10):
... start = time()

... for k in range(10000):
....... tri = 0

....... for j in range(10000):
........... tri += 1

... times.append(time() - start)

print('Average: %s' % (sum(times)/len(times)))

Average time: 6.071 seconds

Class 1) 14.564 s
Class 2) .8.492 s
Dict 1) ..7.775 s
Dict 2) ..9.053 s
List 1) ..7.838 s
List 2) ..7.229 s
Simple) ..6.071 s
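Since each timed run performs 10,000 × 10,000 increments, the averages above can be turned into a rough per-increment cost and a ratio against the simple variable. The sketch below only does arithmetic on the numbers already listed; no new measurements are involved.

```python
# Averages reported above, in seconds per run
results = {
    'Class 1': 14.564,
    'Class 2': 8.492,
    'Dict 1': 7.775,
    'Dict 2': 9.053,
    'List 1': 7.838,
    'List 2': 7.229,
    'Simple': 6.071,
}

INCREMENTS = 10000 * 10000   # inner-loop iterations in one timed run
baseline = results['Simple']

for name, seconds in results.items():
    ns_per_inc = seconds / INCREMENTS * 1e9   # nanoseconds per increment
    ratio = seconds / baseline                # relative to the plain variable
    print('%-8s %6.1f ns/increment  %.2fx' % (name, ns_per_inc, ratio))
```

The per-increment figures include the cost of the for loop itself, so they are upper bounds on the increment alone.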

I already knew from the cryptarithm solvers that intensive processing with data stored in objects was slow. These tests just confirm that. As mentioned above though, cleanstring() isn’t going to be called all that often under normal conditions.
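For anyone wanting to repeat the comparison, the standard library's timeit module is an alternative to hand-rolled time() loops. The sketch below covers just the method-call and plain-variable cases. The loop count N is an assumption chosen to finish quickly, so its numbers are not comparable with the averages above.

```python
import timeit


class Tri:
    def __init__(self):
        self.val = 0

    def inc(self):
        self.val += 1


def run_method(n):
    """Increment through a method call on an object."""
    tri = Tri()
    for _ in range(n):
        tri.inc()
    return tri.val


def run_local(n):
    """Increment a plain local variable."""
    val = 0
    for _ in range(n):
        val += 1
    return val


N = 100_000   # increments per run (assumption: far fewer than 10,000 x 10,000)
RUNS = 10     # same "run it 10 times and average" idea

for func in (run_method, run_local):
    # repeat=RUNS gives a list of RUNS timings, each for one call of func(N)
    times = timeit.repeat(lambda: func(N), number=1, repeat=RUNS)
    print('%s average: %.4f s' % (func.__name__, sum(times) / len(times)))
```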

However, there is one potential complication – that’s when I get to substitution cipher types that modify what’s considered “legal.” Such as a 5×5 Playfair that combines J with I; a 6×6 Playfair that keeps the J, and adds 0-9. Or, numeric-only ciphers, where I still need to remove spaces, periods and newline characters.
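A minimal sketch of what those three "legal" sets could look like, assuming English letters. The function names here are hypothetical and are not part of crypt_utils.py.

```python
import string

LETTERS = string.ascii_uppercase   # 'A'..'Z'
DIGITS = string.digits             # '0'..'9'


def cleanstring(msg, alphabet):
    """Upper-case msg and keep only the characters found in alphabet."""
    return ''.join(ch for ch in msg.upper() if ch in alphabet)


def clean_playfair_5x5(msg):
    # 25-letter square: fold J into I first, then clean against A-Z minus J
    return cleanstring(msg.upper().replace('J', 'I'), LETTERS.replace('J', ''))


def clean_playfair_6x6(msg):
    # 36-symbol square: all 26 letters plus the digits 0-9
    return cleanstring(msg, LETTERS + DIGITS)


def clean_numeric(msg):
    # Numeric-only text: spaces, periods and newlines are dropped
    return cleanstring(msg, DIGITS)


print(clean_playfair_5x5('Jump at 9.'))
print(clean_playfair_6x6('Jump at 9.'))
print(clean_numeric('12 34.\n56'))
```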

[Edit: I’d written some verbiage about how I could approach cleanstring(), but I then went ahead the next day and attacked the problem. A few days later, I’d succeeded in debugging everything, and the code seems to work ok. This is what I decided on.]

crypt_utils.py

class StringCleaner():

# Set up some globals

... letters = ''
... nospace = ''
... tospace = ''
... substitute = ''
... digits = '0123456789'

# Define the standard cleaner function

... def clean(self, msg):
....... ret = []
....... for ch in msg.upper():
........... if ch in self.letters:
............... ret.append(ch)
....... return ''.join(ret)

# Define the cleaner to include digits in the legal string

... def clean_wdigits(self, msg):
....... ret = []
....... valid = self.letters + self.digits

....... for ch in msg.upper():
........... if ch in valid:
............... ret.append(ch)
....... return ''.join(ret)

# Define a function to set the globals for a new language

... def set(self, l, n, t, s):
....... self.letters = l
....... self.nospace = n
....... self.tospace = t
....... self.substitute = s

# Define a function to display the values of the global vars
# Right now, substitute isn't being used

... def show(self):
....... sub = self.substitute
....... if len(sub) == 0:
........... sub = 'Not set'
....... return (
........... 'letters=|%s|\nnospace=|%s|\ntospace=|%s|\nsubstitute=|%s|'
........... % (self.letters, self.nospace, self.tospace, sub)
....... )

# Just set up initialization as a dummy operation

... def __init__(self):
....... self.letters = 'not initialized'

# string_cleaner is a module-level instance to be used in place of
# the old cleanstring() function

string_cleaner = StringCleaner()

"""
In transpo_gui.py, I set string_cleaner to English by default by first calling set_language_params(), which is a new function that I'm not quite finished with (I need to add n-gram handling next).
"""

In transpo_gui.py __init__():

... default_language = 'English'
... language_list = get_filelist('resources/', 'dictionary_', '.txt')
... set_language_params(default_language)

def set_language_params(language):
... global entries

# get_character_lists() is in cons_parser_update_wordlists.py

... nospace, tospace, legal, substitute = \
....... get_character_lists(language)
... entries['language'] = language # Keep for n-gram functions
... string_cleaner.set(legal.strip(), nospace, tospace, substitute)

... print(string_cleaner.show())
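A compact sketch of how the shared instance behaves once it has been set. The class here is a trimmed-down stand-in (set() takes only the letters, and nospace/tospace/substitute are left out), and check_key() is a hypothetical lower-level helper, so treat it as an illustration of the pattern rather than the real module.

```python
class StringCleaner:
    """Trimmed-down stand-in for the class above (cleaning methods only)."""
    digits = '0123456789'

    def __init__(self):
        self.letters = ''

    def set(self, letters):
        # Assumption: simplified signature; the real set() takes four values
        self.letters = letters

    def clean(self, msg):
        return ''.join(ch for ch in msg.upper() if ch in self.letters)

    def clean_wdigits(self, msg):
        valid = self.letters + self.digits
        return ''.join(ch for ch in msg.upper() if ch in valid)


string_cleaner = StringCleaner()   # one shared, module-level instance


def check_key(key):
    # Hypothetical helper: uses the shared instance, no alphabet argument needed
    return string_cleaner.clean(key)


# The GUI layer sets the language once; every later call sees the change
string_cleaner.set('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
print(check_key('route 66'))
print(string_cleaner.clean_wdigits('route 66'))
```

Because every module imports the same string_cleaner object and only its attributes change, a language switch in the GUI reaches the lower-level helpers without passing the alphabet through each call.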

Next up: Maybe examples of the encryption/decryption GUI screens.

Published by The Chief
