Python GUI 030 – Transpo Groundwork 2

The harder I try to catch up, the further behind I fall. I’ve currently got code for encrypting, decrypting and displaying worksheet data for all but three planned transposition cipher types (Nihilist, Route and Sequence), and I’m holding off on addressing three others (SudC, Grille and Swagman). I’m pretty sure I could implement Sequence in a couple hours, but there are too many interruptions right now. What I do have are: AMSCO, Cadenus, Columnar, Fixed, Myszkowski, Railfence and Redefence. Once I get the next three implemented, I need to go back and write bruteforce and/or hillclimber solvers for all but Myszkowski (which requires a wordlist attack).

For right now, for filler, I’ll write a little about implementing language support for Xenocrypts (ciphers in other languages).

As mentioned before, language support has two basic parts: vocabulary (wordlists) and letter pattern frequencies (n-gram counts). Underlying both are the letters that make up that language’s “alphabet”. For transposition ciphers, it’s not as important to limit the alphabet to 26 letters, but it is still necessary to declare which characters are legal for a given alphabet, if only so that spaces and periods can be removed from formatted cipher text while umlauts are kept.


(Menu bar)

When Transpo Solver first opens, it defaults to English. Originally, I wanted the pulldown menu to update dynamically, but I discovered that wouldn’t work. I’d collected the filenames of my different language wordlist files, then appended them to the submenu, but when I ran the program and opened the submenu, all of the entries were duplicates of the last item entered. Instead, I just have the one entry, initially for English.

... lang_menu = Menu(menubar, tearoff=0)
... lang_menu.add_command(label='English', command = lambda: select_language(lang_menu))

... menubar.add_cascade(label='Languages', menu=lang_menu, underline=0)
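
One likely cause of the duplicated entries, assuming the submenu was built in a loop with a lambda for each command, is Python's late binding of closures: every lambda looks up the loop variable when it is called, not when it is created. I never confirmed that this was the culprit, so treat the following as a sketch only. The names list is made up, and plain lambdas stand in for the menu commands.

```python
# Hypothetical language names; the real ones come from the wordlist files.
names = ['english', 'french', 'german']

def make_late_bound(names):
    # Each lambda looks up `name` when it is called, after the loop has ended,
    # so every callback sees the last value.
    callbacks = []
    for name in names:
        callbacks.append(lambda: name)
    return callbacks

def make_early_bound(names):
    # The default argument captures the current value of `name` on each pass.
    callbacks = []
    for name in names:
        callbacks.append(lambda n=name: n)
    return callbacks

late = [cb() for cb in make_late_bound(names)]
early = [cb() for cb in make_early_bound(names)]
print(late)   # every callback reports the last name
print(early)  # each callback reports its own name
```

If that was the problem, writing the menu command as lambda n=name: ... inside the loop would probably have allowed a dynamic submenu after all.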

Then, I call the function that reads my resource directory for all filenames starting with “wordlist_”:

from cons_parser_update_wordlists import get_filelist

# In cons_parser_update_wordlists.py:

import os

def get_filelist(path, filepre, filetype):
... filelist = []

... for root, dirs, files in os.walk(path):
....... for file in files:
........... file = file.lower()
........... if file.startswith(filepre) and file.endswith(filetype):

# Keep only the language part of the filename.

............... temp = file.replace(filepre, '').replace(filetype, '')
............... filelist.append(temp)
... return filelist

The idea here is to get the names of the wordlists, but only display the part that has the specific language names.
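
To show what comes back, here is a small self-contained variant that slices off whatever prefix and suffix were passed in and sorts the result. It is only a sketch: list_languages() is a made-up name, and the files are dummies created in a temporary folder instead of my real resource directory.

```python
import os
import tempfile

def list_languages(path, filepre='wordlist_', filetype='.txt'):
    # Hypothetical variant of get_filelist(): returns just the language part
    # of each matching filename, in alphabetical order.
    names = []
    for root, dirs, files in os.walk(path):
        for file in files:
            file = file.lower()
            if file.startswith(filepre) and file.endswith(filetype):
                # Slice off the prefix and suffix instead of hard-coding them.
                names.append(file[len(filepre):-len(filetype)])
    return sorted(names)

# Build a throwaway resource folder with two wordlists and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ('wordlist_english.txt', 'wordlist_french.txt', 'notes.txt'):
        open(os.path.join(tmp, fname), 'w').close()
    demo_languages = list_languages(tmp)

print(demo_languages)
```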

language_list is a global variable, initialized in _init() as follows:

language_list = get_filelist('c:/resources/', 'wordlist_', '.txt')

Finally, when the user (me) clicks the language entry under “Languages” in the menu bar, I call select_language().

def select_language(menu):
... global language, language_list

... answer= GotoRecordBox(app, wtitle='Pick language',
........... recordset=language_list, width=20)
... if answer is not None:
....... language = language_list[answer.result[0]].capitalize()
....... menu.entryconfig(0, label = language)

GotoRecordBox() is one of the message box types defined in button_utils.py (described in a previous blog entry). Here is the same function again, with comments:

def select_language(menu):

# language will hold the user's selection.
# language_list has the list of language names taken from the
# wordlist file names in the resource folder.

... global language, language_list

# Use GotoRecordBox() to display our list of languages, and return the
# user's selection (e.g. "French" or "German").

... answer = GotoRecordBox(app, wtitle='Pick language',
........... recordset=language_list, width=20)

# If the user selected something, continue.

... if answer is not None:
....... language = language_list[answer.result[0]].capitalize()

# Change the value of the language label to display the new selection.

....... menu.entryconfig(0, label = language)


(Languages list)


(After selecting French)

What follows here is strictly speculative, because I haven’t been able to implement it yet.

Currently, my thought is to add three new items to the entries{} dict object: ngrams, wordlist and letters (the legal characters).

cons_parser_update_wordlists.py already has the function get_character_lists(), which returns legal along with three other values.

def get_character_lists(language):
... lang = language.lower()

... if lang == 'english':
....... nosp = "'’"
....... tosp ='\n#,.!?¿*+"();:--—“”/1234567890$‘=…_'
....... legal = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ '
....... sub = {}

... return nosp, tosp, legal, sub

I could use this to clean the message string as a preparatory step to encryption or decryption, although I’d have to strip out the trailing space first.

For simple encryption and decryption:

entries[‘wordlist’] isn’t a requirement.
entries[‘ngrams’] isn’t a requirement.
entries[‘letters’] = legal.strip(), where legal is the third value returned by get_character_lists().
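
A minimal sketch of that cleaning step, assuming the message is uppercased first and that anything outside the legal set is simply dropped. clean_message() is a hypothetical name, and the German alphabet string is only an illustration, not what get_character_lists() actually returns for German.

```python
ENGLISH_LEGAL = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ '
GERMAN_LEGAL = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ '   # assumed, for illustration

def clean_message(msg, legal):
    # Strip the trailing space from legal so that spaces are removed too.
    allowed = set(legal.strip())
    # Keep only the characters that belong to the alphabet.
    return ''.join(ch for ch in msg.upper() if ch in allowed)

print(clean_message('Hello, world.', ENGLISH_LEGAL))
print(clean_message('Müde Füchse.', GERMAN_LEGAL))
```

With the English alphabet the umlauts would be thrown away along with the punctuation, which is the reason for declaring the legal characters per language.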

Right now, encrypt_decrypt(action) starts with:

def encrypt_decrypt(action):
... global entries

... language = LETTERS

Obviously, this is bad. LETTERS is a global from crypt_utils.py that contains ‘ABCDEFGHIJKLMNOPQRSTUVWXYZ’. I need to change language to something else, most likely legal. I.e.:

def encrypt_decrypt(action):
... global entries

... legal = entries['letters']

For auto-solving:

entries[‘wordlist’] wouldn’t strictly be needed for most of the transposition cipher types. The exception is Myszkowski, which is based on having pattern words for the keyword. A bruteforce attack on the Myszkowski keyword just consists of trying every word in the wordlist to see which one comes closest to producing readable plaintext for that language. One difficulty is an English keyword on a xenocrypt message, which requires entries[‘wordlist’] to be in English but entries[‘ngrams’] to be for the target language.

entries[‘ngrams’] is critical for the auto-solvers, because counts of the ngrams in each attempted decryption will be used for determining how close the result comes to being a valid string in that language. I expect counting will be handled as described previously:

def get_ngram_cnt(msg, ngrams):
... cnt = 0
... for g in ngrams:
....... cnt += msg.count(g)

... return cnt

The function call would look like:

decrypt_text = decrypt(msg, key)

cnt = get_ngram_cnt(decrypt_text, entries['ngrams'])
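
One thing to keep in mind is that str.count() only counts non-overlapping matches, so repeated letters are undercounted. The sketch below restates get_ngram_cnt() with normal indentation and compares it to a sliding-window counter. count_overlapping() is a hypothetical helper, not something already in the program.

```python
def get_ngram_cnt(msg, ngrams):
    # Sum the non-overlapping occurrences of each n-gram in the message.
    cnt = 0
    for g in ngrams:
        cnt += msg.count(g)
    return cnt

def count_overlapping(msg, ngrams):
    # Hypothetical alternative: slide a window over the message for each
    # n-gram length, so overlapping matches are counted as well.
    wanted = set(ngrams)
    total = 0
    for size in {len(g) for g in wanted}:
        for i in range(len(msg) - size + 1):
            if msg[i:i + size] in wanted:
                total += 1
    return total

print(get_ngram_cnt('THETHE', ['TH', 'HE', 'THE']))
print(get_ngram_cnt('AAAA', ['AA']), count_overlapping('AAAA', ['AA']))
```

For ranking candidate decryptions against each other the difference may not matter much, since every candidate is scored the same way.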

The only facet missing now is the function for reading the ngrams from the language_ngram.txt file. I need to add a new Options menu item for specifying how many 2-, 3- and 4-grams to load, or for loading only 2-grams and skipping the other two. The defaults will probably be the 50 most frequent 2-grams, 3-grams and 4-grams for the language, all combined into one list object. I imagine the call will look like:

entries['ngrams'] = init_ngrams(language, params)
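
Since init_ngrams() doesn't exist yet, the following is only a guess at its core. It assumes each line of the n-gram file holds an n-gram and its count separated by whitespace, which may not match the real file format. top_ngrams(), its parameters and the sample counts are all made up for illustration.

```python
def top_ngrams(lines, sizes=(2, 3, 4), top_n=50):
    # Group the (count, ngram) pairs by n-gram length.
    by_size = {size: [] for size in sizes}
    for line in lines:
        parts = line.split()
        # Skip anything that isn't an "NGRAM COUNT" pair (assumed format).
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        gram, count = parts[0].upper(), int(parts[1])
        if len(gram) in by_size:
            by_size[len(gram)].append((count, gram))

    # Take the top_n most frequent of each size and combine into one list.
    combined = []
    for size in sizes:
        ranked = sorted(by_size[size], reverse=True)
        combined.extend(gram for count, gram in ranked[:top_n])
    return combined

# Made-up counts, just to exercise the function.
sample = ['TH 100', 'HE 90', 'IN 80', 'THE 70', 'AND 60', 'TION 50', '# comment']
demo_ngrams = top_ngrams(sample, top_n=2)
print(demo_ngrams)
```

Loading only 2-grams would then just be a matter of passing sizes=(2,).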

We’ll see what happens when I get that far.

Next up: Don’t know.

Published by The Chief

Who wants to know?
