Word games have a problem. There are too many words.
Tauggle is a word search game where the player finds words that follow twisting paths through a randomly generated board of letter tiles. This is a crowded category, to say the least, but I started to develop Tauggle because I was dissatisfied with every existing similar game that I tried.
What’s the point?
The main problem that I wanted to solve is the lack of a realistically achievable, satisfying goal. In a word search game like Tauggle, the objective is to find as many words as possible on a given board. The logical end point for a game should be finding all of the words, but in the games I tried this is almost never possible.
One reason it’s impossible to find all the words is a peculiar feature of word games: they depend on the dictionary. There are around 100,000 words in the Scrabble dictionary, but most people know at most 35,000 words or so, and every person knows a slightly different set of words.
This is strange game design! A seemingly complex game like chess has fewer than 100 rules. By contrast, Scrabble effectively has around 100,000 rules, since every individual word can be considered a rule like “The word POESY is valid.” While its mechanical rules are simple and don’t lead to as much emergent complexity as the chess rules do, Scrabble derives a large amount of complexity from this huge list of trivial rules.
Large dictionaries can work fairly well for multiplayer games such as Scrabble, as long as the players have roughly similar vocabularies. A word that isn’t known by all players is implicitly disallowed, and players develop social conventions to decide which set of words they will allow. At high levels of play, Scrabble players do literally memorize the entire dictionary, but this isn’t representative of the vast majority of casual players.
By contrast, large dictionaries present a game design problem for single player word games. Very roughly, if a word search game uses a dictionary similar to the Scrabble dictionary, the average player will know only about a third of the words that exist on the board. As a result, even with perfect play, the player will reach around 30% completion and then have to give up.
This is very unsatisfying and, in my experience playing many word search games, it is what happens in practice. It’s not a question of whether I give up, but of when I give up. I will typically consider my effort good enough if I pass, say, 50% completion, and then start a new game.
A game should be winnable
With Tauggle, I wanted to create a word search game that is actually completable. Players should have the satisfying experience of reaching 100% completion and knowing definitively that it was time to start a new game. I wanted this to work for most players, on most boards.
There are a couple of game design challenges with this:
As described above, any dictionary word that the player doesn’t know might prevent them from getting 100% on a board.
Even if the player knows all the words, they may not be able to find them all on the board in a reasonable amount of time.
This article discusses Tauggle’s solution to the first problem, but the second is also an interesting challenge that Tauggle tackles, and is discussed in a separate article on Tauggle’s progressive hint system.
Designing the dictionary
Before I started creating a word game, I didn’t realize the extent to which dictionary selection and curation is both difficult and extremely important. After some analysis, I realized Tauggle’s dictionary needed to have a few specific properties.
First, every word in the dictionary should be known by the vast majority of people. In other words, the dictionary should contain no overly obscure words. This will mean that players will, with perfect play, be able to find 100% of the words on each board and complete every game.
Note that this first property is a deliberate tradeoff. Leaving obscure words out of the dictionary reduces the chance for the player to feel clever. For instance, there is never a chance for the player to enter a word like POESY and feel good about knowing such an obscure word. I feel that given how critical it is for most players to be able to reach 100% completion of a board, this is the right tradeoff to make. The player can feel clever by finding all the words, rather than by finding particularly obscure words.
Second, every common word should be in the dictionary. In other words, the dictionary should be complete. The player should never have the experience of finding a word that most people know, and then having the game reject it. This property is much less important than the first, since violating it doesn’t prevent 100% completion, but it contributes significantly to how satisfying the game feels.
Put together, the first and second properties give us a dictionary that contains every word the average player knows, and doesn’t contain any words the average player doesn’t know.
Last, the dictionary needs to be free or affordable to license. Many high quality dictionaries, such as the Scrabble dictionary, are expensive or impossible to license for commercial use.
While the completeness property is less important than the obscurity property, it is the harder one to add after the fact. It is possible to build a complete, non-obscure dictionary from a complete, obscure dictionary, but not from an incomplete, non-obscure one, because it is much easier to remove unwanted words from a dictionary than it is to add missing words.
Therefore, the first step in building an affordable, complete, non-obscure dictionary is to find an affordable, complete, obscure dictionary.
The starting point for Tauggle’s dictionary
Fortunately, such a dictionary exists. 12Dicts is a collection of free English word lists by Alan Beale. After some investigation, I settled on the union of the 2of12inf and 3of6game word lists as a good starting point for Tauggle’s dictionary. In my experiments, this union seemed to satisfy the completeness property, and included both American and British variants of words (which, as a Canadian, is important to me). Almost never would I check a common word and have it not be in one of these word lists.
But these word lists definitely did contain large numbers of obscure words.
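As a rough illustration of the merge step (not the actual tooling used for Tauggle), the union of two word lists can be sketched in a few lines of Python. The file names and function names here are hypothetical. The sketch also assumes that entries may carry trailing annotation characters, which it strips, and it skips anything that still contains non-letters, since such entries can't be spelled with letter tiles.

```python
from typing import Iterable


def clean_entry(line: str) -> str:
    """Normalize one raw line to a lowercase word, or "" if it is unusable."""
    word = line.strip().lower()
    # Assumption: some lists mark entries with trailing symbols; drop them.
    while word and not word[-1].isalpha():
        word = word[:-1]
    # Skip entries with apostrophes, hyphens, spaces, or non-ASCII letters.
    return word if word.isalpha() and word.isascii() else ""


def merge_word_lists(*word_lists: Iterable[str]) -> list[str]:
    """Return the sorted union of several word lists, without duplicates."""
    merged: set[str] = set()
    for lines in word_lists:
        for line in lines:
            word = clean_entry(line)
            if word:
                merged.add(word)
    return sorted(merged)


if __name__ == "__main__":
    # Hypothetical file names; point these at wherever the lists are stored.
    with open("2of12inf.txt", encoding="utf-8") as first, \
         open("3of6game.txt", encoding="utf-8") as second:
        merged = merge_word_lists(first, second)
    with open("merged.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(merged) + "\n")
```

Because the result is a set union, a word that appears in both lists, or in only one of them, ends up in the merged file exactly once.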
Curating the dictionary
My software engineer brain told me to find some scalable way to evaluate words for how common they are. For instance, for every word in the dictionary, could I check how common it is on the internet, and then remove the 20% most obscure words?
My software engineer brain also knows that such an approach is very fraught. Not only might it fail to remove many obscure words that most people don’t know, but it might also remove many common words, thus breaking the completeness criterion. There are many words that are considered well known, but are rarely used, and also many words that are not well known, but may be used commonly in other contexts.
The problem I faced is that the rate of obscure words in the dictionary has to be extremely low. If the player finds 199 out of 200 words, but is stuck on the last one, then they can’t reach 100% completion and Tauggle has failed at its goal of allowing satisfying completion.
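To see why the rate has to be so low, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each word on a board independently has the same chance of being obscure, which real boards won't satisfy exactly. The function name and the example rates are made up; the 200-word board size comes from the example above.

```python
def completable_probability(obscure_rate: float, words_on_board: int) -> float:
    """Chance that a board contains no obscure words at all.

    Assumes every word is obscure independently with probability
    `obscure_rate`, so the board is completable only if all
    `words_on_board` words happen to be well known.
    """
    return (1.0 - obscure_rate) ** words_on_board


if __name__ == "__main__":
    # Illustrative rates only; these are not measurements of any dictionary.
    for rate in (0.20, 0.01, 0.001):
        chance = completable_probability(rate, 200)
        print(f"obscure rate {rate:.1%}: {chance:.1%} of boards completable")
```

Under these assumptions, even a 1% obscure rate leaves only about 13% of 200-word boards fully completable, which is why a rough statistical cutoff isn't good enough.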
Scaling is overrated
Eventually, I had the thought that perhaps I didn’t need a scalable approach. Dictionaries are finite in size. They are large but not internet large. They are closer to the size of a novel. What’s more, once dictionary curation is done, it doesn’t need to be repeated. Thus was born the idea of simply reading the entire dictionary and manually curating it.
The result of merging the 2of12inf and 3of6game word lists contains about 87,000 words. The idea would be to put all of these words, one per line, in a text file, and put a “#” symbol at the end of each word that I wanted to remove. Then, I’d run a script over this file to produce the curated word list: the original file minus every word marked with a trailing “#”.
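A script like the one described can be very small. The following is a minimal sketch in Python rather than the script actually used for Tauggle; the file names and the function name are hypothetical.

```python
from typing import Iterable, Iterator

REMOVE_MARKER = "#"  # appended by hand to every word that should be dropped


def curated_words(lines: Iterable[str]) -> Iterator[str]:
    """Yield every word that was not marked for removal."""
    for line in lines:
        entry = line.strip()
        # Skip blank lines and any entry ending with the removal marker.
        if not entry or entry.endswith(REMOVE_MARKER):
            continue
        yield entry


if __name__ == "__main__":
    # Hypothetical file names for the annotated input and curated output.
    with open("annotated.txt", encoding="utf-8") as src, \
         open("curated.txt", "w", encoding="utf-8") as dst:
        for word in curated_words(src):
            dst.write(word + "\n")
```

Keeping the marker in the annotated file, instead of deleting lines outright, means every removal decision stays visible and can be reversed later by deleting a single character.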
I started by manually reviewing a few hundred words to get an idea of how long this would take. I believe my initial estimate was around 20-40 hours. This is about 1-2 seconds per word. This is quite long, but not insurmountably long for one person. By this time, I had convinced myself that manual curation was the only way to make a satisfying word search game, so I decided to do it.
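The arithmetic behind that estimate is easy to check. This small Python sketch assumes a constant pace per word, which is optimistic for borderline words; the function name is made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def review_hours(word_count: int, seconds_per_word: float) -> float:
    """Total hours needed to review every word once at a constant pace."""
    return word_count * seconds_per_word / SECONDS_PER_HOUR


if __name__ == "__main__":
    # About 87,000 words in the merged list, at 1 and 2 seconds per word.
    for pace in (1.0, 2.0):
        print(f"{pace:.0f} s/word: {review_hours(87_000, pace):.1f} hours")
```

At 1-2 seconds per word, 87,000 words come to roughly 24 to 48 hours, in the same ballpark as the 20-40 hour estimate.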
But the challenges didn’t stop there. I still had to decide on a specific methodology. What was the exact bar for obscurity of a word that would warrant exclusion? Would I exclude rude words?
I went back to my original goal of players being able to complete games. I settled on this standard:
While imagining that I'm the average person: if I 99% completed a board, and this word was the only one I didn't find, would I blame myself for missing it, or the app for trying to make me find a word I didn't know?
With that in mind, I set to work manually curating the dictionary.
The slog and the self doubt
This standard sounds decent in the abstract, but I still encountered a number of problems.
This approach unfortunately excludes a lot of fun words that I know. The standard above sounds good to word enthusiasts until they encounter some of the specific words that I’ve decided to remove. For example, I’ve removed the word IAMB, which might offend some word enthusiasts. Ultimately, though, I had a goal in mind, that most people should be able to complete most games, and this word just isn’t in the vocabulary of plenty of people. Completing a game feels much better than finding an obscure-ish word. In addition, there are still plenty of interesting, common words.
Also, what makes me think I can effectively imagine the average person’s vocabulary? People from other countries, ethnic groups, genders, socioeconomic backgrounds, or with different language backgrounds, might have quite different opinions on many borderline words. For instance, I was quite conscious of the idea, from the last few years, that men and women tend to know different words. As a practical matter, should I include TAFFETA or not?
Another significant issue for me, separate from the results, was that this was a slog, much worse and more tedious than I’d imagined. Sometimes investigating whether I should include a borderline word would take a minute or two, instead of the second or two I’d estimated for it. I didn’t meticulously track how long the overall process took, but it required many hours across many days over a period of months, and ultimately was probably well over 40 hours.
Finally, this took long enough that by the time I was done, I had significantly refined my technique, and so the end of the dictionary is much better and more efficiently curated than the beginning. I find myself puzzled over some of the choices for inclusion and exclusion of A and B words. I’ll probably need to do another pass of the first half of the dictionary once I’ve forgotten how much work it is.
But… did it work?
Ultimately, you have to be the judge of that. In the end I removed about 15% of the 87,000 words in the original dictionary.
In my testing, and the testing of friends, we have found it’s now possible to get 100% completion on many or all boards, with a reasonable investment of effort. I get occasional complaints about specific words (either to add or remove), but overall, the quality of the dictionary seems good.
If you do play the game, feel free to send me words that you think I should add or remove at firstname.lastname@example.org. Remember that they have to meet or miss the standard that I laid out above. Rarely do I add words, but I regularly remove obscure words that are pointed out to me.