What is the best word to start a game of Wordle

If you ended up here, you are probably hooked on Wordle. You wait patiently for the new daily puzzle, and when it finally arrives, you think about your first five-letter word. Some players use the same word every time, while others pick whatever comes to mind. Which strategy is better? Today we will use some statistics to find the best word to start a game of Wordle.

Letter distribution

Letter frequency analysis is common in cryptanalysis and in data compression, for example[1]. Knowing which letters are the most common will help us find the best word to start a game of Wordle. Of course, these frequencies differ from one language to another (in case you want to play French Wordle).

On macOS and many Linux distributions, your computer probably has a dictionary at /usr/share/dict/words (or /usr/dict/words on some older systems). This file is a newline-delimited list of dictionary words. If you open it, you will see that proper nouns, such as names, start with a capital letter.

We are only interested in five-letter words that do not start with a capital letter. With a little Python, we can filter the words with a regular expression and count how often each letter occurs with the handy collections.Counter.

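import re
import string
from collections import Counter

# Match 5-letter words made only of lowercase ASCII letters (this skips names)
five_letter_word_pattern_re = re.compile(r"^[a-z]{5}$")

# Start every letter at zero so that all 26 letters appear in the results
letter_frequencies = Counter(dict.fromkeys(string.ascii_lowercase, 0))

# Filter the dictionary, stripping the trailing newline before matching
with open("/usr/share/dict/words") as file_words:
    words = [word.strip() for word in file_words if five_letter_word_pattern_re.match(word.strip())]

# Count every letter occurrence in every matching word
for word in words:
    letter_frequencies.update(word)

# Print the ten most frequent letters
print([letter for letter, _ in letter_frequencies.most_common(10)])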

When we run this code, we get the ten most frequent letters, sorted by decreasing popularity. A good starting word should therefore contain the letters a and e, plus a few of the others.

['a', 'e', 'r', 'o', 'i', 's', 't', 'l', 'n', 'u']
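To see exactly what Counter.update does with a word, here is a small self-contained sketch that runs on a hard-coded word list instead of the system dictionary. The function name count_letters and its distinct flag are hypothetical, added only for illustration: by default every occurrence of a letter is counted (as in the script above), while distinct=True counts each letter at most once per word, i.e. the number of words containing it.

```python
from collections import Counter


def count_letters(words: list[str], distinct: bool = False) -> Counter:
    """Count letters across words (hypothetical helper).

    distinct=False: a letter repeated inside a word is counted every time,
    which matches Counter.update(word) in the main script.
    distinct=True: each letter counts at most once per word.
    """
    counts: Counter = Counter()
    for word in words:
        counts.update(set(word) if distinct else word)
    return counts


sample = ["arose", "speed", "eerie"]
print(count_letters(sample)["e"])                 # 1 + 2 + 3 occurrences
print(count_letters(sample, distinct=True)["e"])  # e appears in all 3 words
```

Both choices are defensible; the main script uses the first one, so a word like "eerie" pushes the count for e up three times.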

Here is the histogram of letter frequencies: a is the most common letter in five-letter words, and q is the least common. Sorted from most to least frequent, the counts form a steadily decreasing curve.

Letter histogram
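If you don't have a plotting library at hand, a rough text version of the histogram can be printed with the standard library alone. This is a sketch: the helper name text_histogram and the 40-character bar width are arbitrary choices, and the sample uses the counts for the three most frequent letters from the shell output in the Bonus section; in the main script you would pass letter_frequencies instead.

```python
from collections import Counter


def text_histogram(counts: Counter, width: int = 40) -> list[str]:
    """Render counts as text bars, most frequent letter first (hypothetical helper)."""
    ranked = counts.most_common()
    if not ranked:
        return []
    top = ranked[0][1]
    lines = []
    for letter, count in ranked:
        # Scale each bar relative to the most frequent letter (avoid dividing by zero)
        bar = "#" * round(width * count / top) if top else ""
        lines.append(f"{letter} {bar} {count}")
    return lines


# Counts taken from the shell output in the Bonus section
sample = Counter({"a": 4467, "e": 4255, "r": 3043})
for line in text_histogram(sample):
    print(line)
```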

Finding words

Now that we have a list of letters sorted by popularity, we can search our dictionary for words that contain them. We want a word with no letter repetition and with high-frequency letters.

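We can devise an algorithm that takes the whole list of five-letter words from our dictionary and narrows it down by elimination. First, we keep only the words that contain a; among those, we keep the words that contain e; and so on down the ranking. If a letter would eliminate every remaining word, we skip it and move on to the next one.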

matches = list(words)
for best_letter, _ in letter_frequencies.most_common():
    new_matches = [word for word in matches if best_letter in word]
    # If we can't find a word we skip to the next letter
    if new_matches:
        matches = new_matches

print("The best words are:", matches)

With this dictionary, only one result is left! The answer to our question is AROSE.
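The elimination above is greedy: it follows the letter ranking strictly. As a cross-check, here is a different heuristic, not the one used in this post: score each word by summing the frequencies of its distinct letters, so a repeated letter earns nothing extra, and keep the highest-scoring words. The function name best_words is hypothetical, and the tiny word list is only for illustration; the counts are the six from the shell output in the Bonus section. With the real dictionary you would pass words and letter_frequencies from the scripts above, and the ranking may then differ from the elimination result (for example, anagrams always tie).

```python
from collections import Counter


def best_words(words: list[str], frequencies: Counter) -> list[str]:
    """Return the words with the highest distinct-letter score (hypothetical helper)."""

    def score(word: str) -> int:
        # set() ensures a repeated letter is only counted once
        return sum(frequencies[letter] for letter in set(word))

    if not words:
        return []
    top = max(score(word) for word in words)
    return [word for word in words if score(word) == top]


# Counts for the six most frequent letters, from the shell output in the Bonus section
frequencies = Counter({"a": 4467, "e": 4255, "r": 3043, "o": 2801, "i": 2581, "s": 2383})
print(best_words(["arose", "raise", "arise", "eerie", "stare"], frequencies))
```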

Conclusion

According to this analysis, the best word to start a game of Wordle is AROSE. It won't make the game easier, but because it tests five of the six most common letters, it gives you a good chance of finding some initial clues. It would be interesting to compare game sessions started with AROSE against sessions started with a random word. Maybe you would do better with a random word?

Games started with AROSE (three examples)
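The clues mentioned above are Wordle's colored tiles, so comparing openers such as AROSE against random words starts with computing that feedback yourself. Below is a minimal sketch assuming the commonly known Wordle rules: green for the right letter in the right spot, yellow for a letter that is in the word but elsewhere, gray otherwise, with a repeated letter marked yellow only as many times as unmatched copies remain in the answer. The function name feedback and the G/Y/- encoding are hypothetical.

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Return Wordle-style feedback: G = green, Y = yellow, - = gray (hypothetical encoding)."""
    result = ["-"] * len(guess)
    # Answer letters not matched exactly; these are available for yellows
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)

    # First pass: exact matches
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"

    # Second pass: misplaced letters, limited by how many copies remain
    for i, g in enumerate(guess):
        if result[i] != "G" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1

    return "".join(result)


print(feedback("arose", "stare"))
print(feedback("speed", "abide"))  # only the first e turns yellow
```

Running a loop of such feedback over many answers, once with AROSE and once with random openers, would be one way to try the comparison suggested above.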

Bonus

If you are a fan of the command line, grep, and awk, here is the same approach as a shell pipeline.

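First, find the six most frequent letters. Setting FS to the empty string makes awk split each word into single characters; GNU awk and the macOS awk support this, although POSIX leaves it unspecified: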

$ grep -E '^[a-z]{5}$' /usr/share/dict/words | awk '{ for (i = 1; i <= NF; i++) array[$i]++ } END { for (char in array) print char, array[char] }' FS="" | sort -n -k 2 -r | head -n 6
a 4467
e 4255
r 3043
o 2801
i 2581
s 2383

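Then filter the dictionary for words that contain each of those letters. The letter i is left out because, as the Python version shows, no five-letter word in this dictionary contains a, e, r, o, and i together: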

$ grep -E '^[a-z]{5}$' /usr/share/dict/words | grep a | grep e | grep r | grep o | grep s
arose

Footnotes

  1. Letter frequency