Disclaimer: The opinions stated here are my own, not necessarily those of my employer.
Does AI make us dumber? Multiple studies now suggest that using AI might impede cognitive growth (see this one), not to mention the numerous hot takes on the internet that pose an even more serious threat to our cognitive growth than AI itself. I’m not too worried about it, as someone who sees AI as a tool to unblock my mind and not as a substitute for thinking. But how do I put this belief to the test? Instead of my “regular programming” style of daily usage, like refining comms, sounding out ideas, or implementing boring workflows, I wanted to do something more stimulating: I wanted to solve a new problem by partnering with AI.
The problem
I’m a huge fan of The NYT Spelling Bee. For those who are unfamiliar: it’s a puzzle where you identify as many words as you can from seven randomly chosen letters. The middle letter (see image below) must be in every word, and each word needs to be at least four letters long. You progress through different levels by finding more and more words, from Nice → Genius → Queen Bee, the supposedly secret level and the usual goal of regular players.
One day, after getting stuck on an annoying puzzle with an obscure pangram, a question hit me: How long can the NYT keep doing this? How many Spelling Bees are left?
The answer is obviously finite, but it’s also much smaller than, say, the number of possible Sudokus (see here). But how do we get to an actual number?
First attempt
I gave the exact problem statement to Gemini (2.5 Flash) and asked for a solution. Here’s what it came up with:
To be fair, this is not a bad initial solution and serves as an upper bound on the answer. However, it’s a gross overestimation for a simple reason: not every 7-letter combination can form real words of four or more letters. Here’s an example: [V, W, X, Y, Q, Z, J].
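For context, the naive upper bound is easy to compute yourself: choose any 7 of the 26 letters, then pick any one of them as the mandatory middle letter. A quick sanity check (my own arithmetic, not Gemini’s output):

```python
from math import comb

# Naive upper bound: any 7 of the 26 letters, times 7 choices for the
# mandatory middle letter. This ignores whether the letters can actually
# form valid words, which is exactly why it's a gross overestimate.
print(comb(26, 7) * 7)  # 4,604,600
```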
Of course, I could’ve pointed out this mistake to Gemini and gotten a “You’re right! Here’s the new solution…” reply with another incorrect answer. I could’ve switched to a more elaborate prompt or a more advanced reasoning model that would’ve triggered tool use and found a solution from someone else’s analysis. Instead, I thought this would be a good moment to do the thinking on my own and delegate to the model only when I got stuck.
Digging deeper
The observation about infeasible combinations made me realize that the problem can’t be solved with combinatorics alone. Refinements like “7-letter combinations that contain at least one vowel” might help, but they’re hard to calculate exhaustively. I’d rather write some code and check individual combinations against a dictionary. But checking over four million combinations felt impractical with my personal compute, and rather wasteful. It was time to use more properties of Spelling Bee puzzles to narrow down the candidate set. Here are three such properties:
1. Every puzzle contains at least one pangram (a word that uses all 7 letters at least once). Sure, NYT might violate this rule in the future, but IMHO as an avid player, a puzzle without a pangram is against the very spirit of the game. Using this assumption, we can flip the problem: How many unique combinations of 7 letters can form at least one pangram?
2. I have never seen a puzzle with the letter ‘S’ in it. I googled and found that there has been only one puzzle with the letter ‘S’ so far (source). I decided to compute answers both with and without ‘S’ puzzles.
3. Every puzzle contains at least 15-20 words and never more than 70-80. I wasn’t sure these bounds would always hold, so I ignored them in my solution; we can later make assumptions about the number of puzzles they might cut.
Okay, enough talk. Let’s vibe code.
First prompt:
“write code to take a large word list as an input file. compute the number of words that contain only 7 unique letters.”
I tried coding the whole thing with one prompt, but it failed a couple of times. Here’s a general vibe-coding rule I follow: if you know the solution outline, it’s easier to get the LLM to generate a loose version, mistakes and all, and add the solution details yourself. Instructing the model to make small, precise changes takes longer than making them on your own (at least as of this writing).
In this case, I used the generated code as a base and then added a couple of lines to count the number of unique 7-letter combinations among words that contain exactly 7 unique letters. For example, both “DIABOLIC” and “DIABOLICAL” use the same 7 letters ⇒ they belong to the same puzzle. This can be achieved by collecting the letter combinations in a standard Python set.
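The full script is linked at the end, but the core of this step looks roughly like the sketch below (the file name is a placeholder, and I’m assuming a plain-text word list with one word per line):

```python
from pathlib import Path

# Placeholder file name; any plain-text list with one word per line works.
words = Path("wordlist.txt").read_text().splitlines()

combinations = set()
for word in words:
    letters = frozenset(word.strip().lower())
    # A word with exactly 7 unique letters is a potential pangram, and its
    # letter set identifies one candidate puzzle. The set dedupes words like
    # DIABOLIC and DIABOLICAL that share the same 7 letters.
    if len(letters) == 7:
        combinations.add(letters)

print(len(combinations), "unique 7-letter combinations")
```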
Second prompt:
“suggest open source word lists that are similar to the NYT Spelling Bee dictionary. share links from which I can download them”
The final answer will vary depending on the underlying dictionary. The puzzle’s editor mentioned using a couple of dictionaries, but they weren’t available for download. Instead, I got this suggestion from Gemini and downloaded it for my solution: https://github.com/wordnik/wordlist.
Finishing touches and getting to the answer
A quick condition to ignore puzzles containing the letter ‘S’, plus remembering to multiply the final count by 7 (any of the seven letters can be the required middle letter), were the last touches needed.
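Continuing the rough sketch from earlier (again, not the exact final code):

```python
# Drop combinations containing 's', then multiply by 7, since any of the
# seven letters can serve as the required middle letter.
no_s = [c for c in combinations if "s" not in c]
print(len(no_s) * 7, "possible puzzles without the letter 's'")
```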
Then I ran the script against my downloaded dictionary and here’s what I got:
This means that despite launching its daily version in 2018, The Spelling Bee could still have another 161 years of puzzles. Even if we remove half of these puzzles for having too many or too few words (Property 3), there’s no need to worry about exhaustion in the 21st century.
If we relax the ‘S’ condition, we get 119,609 days or 327 years in total (FYI to the NYT team wondering how to double this game’s run).
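For the curious, the year figure is just the day count divided by the average year length:

```python
print(119609 / 365.25)  # ≈ 327 years
```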
Here’s the final code, which I pushed to GitHub from Colab.
Closing thoughts
Apart from knowing for sure that the NYT isn’t going to run out of puzzles in my lifetime, this exercise reaffirmed my belief in AI’s role in supercharging cognition and turning it into action. As someone who hasn’t written production-grade code in 7 years, I would likely have abandoned ship after realizing this wasn’t a purely combinatorial problem. Instead, I could see the solution through and even share the code with minimal effort.
I could’ve prompted more elaborately and gotten a reasoning model like Gemini 2.5 Pro to try to do all the steps for me, but that would be like using Stockfish to solve a fun chess puzzle. Computers have been playing better chess than the greatest human players ever (Kasparov/Carlsen) for almost 30 years now. That hasn’t led to a decline in the sport; in fact, more people play and watch chess right now than ever before. Can the same apply to other games and puzzles that AI can solve? I’d like to think so, even at the cost of sounding dumb.