Levenshtein squares: symmetrical word puzzles embedded in Connections

Posted 2024-10-23 #gamedev

Earlier this week, I had a terrible idea for a Connections puzzle. I found a lovely tool to create them and sent my monstrosity of a board to my friends for laughs. We've been bouncing silly puzzles back and forth ever since then. A 1-letter puzzle led to a 2-letter puzzle, which then begat a 3-letter puzzle. This made me wonder: what would a 4-letter puzzle that takes full advantage of its inherent symmetry with the game of Connections look like?

My 'Soup with crackers' Connections puzzle, featuring 12 one-letter clues and 4 animal names
Soup with crackers, my first Connections puzzle (definitely not a Levenshtein square)
My 'Daily Double' Connections puzzle, featuring all two-letter clues
Daily Double, a 2-letter Connections puzzle (still not a Levenshtein square)

To be clear, the goal here is not to design a traditional Connections puzzle around 4-letter words. I wanted something a little more abstract - something that leans hard into the maximal four-ness of the format (fourmat?). This design space grants us the opportunity to establish 1:1 mappings between the 4 word groups (and/or the 4 words in each group) and the 4 letters in each word. Swapping letters to create new words seems like a somewhat canonical mechanic to try here; let's give it a shot.

We'll start with a secret word for each group, then derive from it 4 clue words for the group that each have a different letter index substituted:

Example of a Levenshtein group based on the secret word 'core'

To ensure the full puzzle only has one valid solution, let's choose secret words that have a different letter in each position (e.g. ruling out BODY and BRAT as secret words, but not RATE and TEAR). Here's an example:

Four Levenshtein groups with secret words 'word', 'flag', 'daft', and 'punk'

The clue words are shown in order here for illustration, but they'll all be shuffled together for the actual puzzle (and the secret words won't be shown, of course).

A troublesome strategy that emerges from this design is to look for exactly 3 of a letter in a given position across all the clue words. Those 3 words are guaranteed to belong to the same group, leaving you with one more word to find. 75% of the puzzle is trivialized by this realization! Letter distribution analysis turns out to be very powerful for this kind of puzzle - we need to nerf it somehow.

One option here is to look for secret words whose clue words can borrow letters from the other secret words. Here's a simplified example of what we're looking for:

Two Levenshtein groups, each with clue words formed from the other group's secret word

Choosing which secret word to borrow from is trivial when there are only two groups; when there are four, the swaps for each index have to form a permutation of the hidden words' letters at that same index. (Thus, the swaps within a group typically won't spell out a hidden word as illustrated above.) Borrowing letters for substitutions like this perfectly balances the letter distributions, meaning the player can no longer make trivial inferences based on single columns. The resulting structure is also pleasantly symmetrical; I've tentatively (and non-authoritatively) named it a Levenshtein square.

All of that took me hours to figure out, by the way. I had to spend a long time rotating four-dimensional word clusters in my head to convince myself that this was (1) possible and (2) tractable to search for valid puzzles. Initially, it seemed like the combinatorial space was immense, perhaps involving 16! (factorial) permutations to evaluate per possible puzzle. The answer thankfully turned out to be a lot lower - just 9 permutations times 4 positions. Together with some hashmap optimizations and early pruning of dead-ends, it's possible to brute-force the entire search space for a 1,000-word dictionary in just a few seconds on my computer.

Unfortunately, there are no perfectly balanced puzzles contained in my 1,000-word dictionary. I had to resort to an additional, ~5,000-word dictionary full of archaic and obscure words to find boards that met my criteria. But I found them, and now you can experience them for yourself!

My first Levenshtein Square puzzle, featuring all 4-letter clues
Levenshtein Square #1: Mason's Delight
My second Levenshtein Square puzzle
Levenshtein Square #2: Superhero Strike
My third Levenshtein Square puzzle
Levenshtein Square #3: Blunt Translation

These puzzles were optimized to contain as many "common" clue words as possible; only ~30 of the boards my script found had 10 or more common words.¹ Aside from the mandatory archaic words, these puzzles meet my definition of a Levenshtein Square perfectly. I think there's good potential for a daily game based around this concept.

The perfect 4x4 Levenshtein square requires some compromises, though. Ideally, I'd like to limit the board to the smaller dictionary and compromise on something other than the word bank. Deviating from the perfect definition generally seems like a good way to make the problem space more interesting. Maybe the puzzle should allow swaps of two indexes², or multiple words within a group with the same substitution index. Maybe the hidden words can be permitted to share letters at a given index; we can check for uniqueness of a solution procedurally instead. Much remains to be explored!

¹ The most common words I've found in a puzzle is 13, but those puzzles all require some uncommon words that are tough to accept.
² This is known as the Damerau-Levenshtein distance.