How it works

Inside the engine

This page is for anyone who wants to understand how the search actually works. There is no marketing here. If you are looking for the search, it is at the top of every page.

The Verisign .com zone file

Every .com domain in the world is recorded in the registry run by Verisign, the company that operates the .com top-level domain. You buy the name from a registrar, but the record itself lives with Verisign. Verisign publishes a daily file containing every currently registered .com domain. The file is large. The version I work with is 4.27 GB compressed and contains 162,562,292 unique second-level domains. That file is my source of truth for which domains are taken right now.

I download the zone file through ICANN's Centralized Zone Data Service, which is a free programme any domain professional can apply to. The application process takes a few weeks. The file is updated daily.

The Bloom filter

Loading 162 million domain names into memory and querying them naively would require about 15 GB of RAM. That is too much memory to host cheaply. Instead I load the file once into a probabilistic data structure called a Bloom filter, which stores set membership in a much smaller space at the cost of a small false-positive rate.
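To make the data structure concrete, here is a minimal Bloom filter sketch. It is an illustration of the general technique, not the site's actual code: the class name, the SHA-256 based hashing, and the tiny sizes are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash positions per item."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from two 64-bit slices of one SHA-256 digest
        # (double hashing). This hashing scheme is an assumption.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # avoid a zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably in the set"; False means "definitely not".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Toy example with made-up sizes; the real filter covers the whole zone.
registered = BloomFilter(num_bits=10_000, num_hashes=7)
for name in ("example", "google", "verisign"):
    registered.add(name)

print("google" in registered)  # True: an added name is always found
```

A name that was added always comes back as present, because adding it set every bit the lookup checks. A name that was never added usually comes back as absent, but can collide with bits set by other names, which is where false positives come from.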

The Bloom filter for the full .com zone is 183 MB. The false-positive rate is tuned to 1 percent. A false positive means the filter occasionally says a domain is registered when it is in fact available, so roughly 1 in 100 available names is wrongly hidden. False negatives are impossible: if the filter says a domain is available, it is genuinely available. The whole point of this asymmetry is that I never want to lie to you about availability.
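The size can be sanity-checked with the textbook Bloom filter formulas: m = -n ln(p) / (ln 2)^2 bits for n items at false-positive rate p, and k = (m / n) ln 2 hash functions. The sketch below applies them to the numbers on this page. The function name is illustrative, and whether the real filter uses exactly these parameters is an assumption.

```python
import math


def bloom_parameters(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) from the standard optimal-sizing formulas."""
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


# n and p are the figures quoted in the text.
bits, hashes = bloom_parameters(162_562_292, 0.01)
print(f"{bits / n_bits_per_mib:.0f} MiB" if (n_bits_per_mib := 8 * 2**20) else "")
print(f"{bits / 162_562_292:.2f} bits per domain, {hashes} hash functions")
```

The result lands in the same range as the 183 MB quoted above. A small gap is to be expected if the real filter uses a different hash count or rounds its size differently.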

The filter handles roughly 2 million lookups per second on a laptop. At that rate a few thousand candidate domains take a couple of milliseconds to check, which leaves most of a 100-millisecond search budget for everything else.

The suggestion engine

When you type a keyword, the engine pairs it with several thousand prefix and suffix morphemes. Those morphemes come from two sources. The first is a 2012 study of the most common prefixes and suffixes in registered .com names. The second is my own mining of the current zone file, which surfaces modern naming patterns that did not exist in 2012, such as ai+, fin+, bio+, +studio, +hub, +pro, and +ing.

The result is roughly 4,000 candidate domain names per keyword search. Each candidate is checked against the Bloom filter. The ones that come back as available are scored and returned.
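The generate-then-filter step can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the morpheme lists are just the handful quoted above, and a plain set with made-up contents stands in for the Bloom filter.

```python
def generate_candidates(
    keyword: str, prefixes: list[str], suffixes: list[str]
) -> list[str]:
    """Pair one keyword with every prefix and every suffix morpheme."""
    keyword = keyword.lower()
    names = {prefix + keyword for prefix in prefixes}
    names |= {keyword + suffix for suffix in suffixes}
    return sorted(names)


def keep_available(candidates: list[str], registered) -> list[str]:
    """Keep candidates the membership structure does not report as registered."""
    return [name for name in candidates if name not in registered]


# Morphemes quoted in the text; the real lists hold several thousand entries.
prefixes = ["ai", "fin", "bio"]
suffixes = ["studio", "hub", "pro", "ing"]

# Stand-in for the Bloom filter: a plain set with made-up contents.
registered = {"aicast", "casthub"}

candidates = generate_candidates("cast", prefixes, suffixes)
print(keep_available(candidates, registered))
```

Anything supporting the `in` operator works as the `registered` argument, so a Bloom filter object can be dropped in without changing the filtering function.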

The brandability score

Every available domain gets a grade — A, B, or C. The grade combines several signals.

Length. Shorter is better. Names under 8 characters score highest. The decay is steep above 11 characters.

Opening phoneme energy. Different opening letters carry different psychological weight. V is the most energetic opener in English. Z cuts through crowded categories. B signals reliability. The score reflects how much work the first letter is doing for the brand.

Processing fluency. Names containing familiar morphemes — fragments like -ver, -cel, -son, -ify, -hub — are easier for the brain to process and feel more legitimate. The score reflects how many fluent fragments appear in the candidate.

Vowel ratio. Names with a vowel-to-consonant ratio between roughly 0.3 and 0.5 are easiest to pronounce. Names with too few vowels (Abrdn, kvtch) lose points. Names with too many (Eiou, Aeiou) also lose points.
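A minimal sketch of the vowel check, taking the description literally as vowels divided by consonants. The function names are hypothetical, treating y as a consonant is an assumption, and the size of the penalty is not stated in the text, so the sketch only reports whether a name falls inside the band.

```python
VOWELS = set("aeiou")  # assumption: y counts as a consonant


def vowel_consonant_ratio(name: str) -> float:
    """Vowels divided by consonants, ignoring anything that is not a letter."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    vowels = sum(ch in VOWELS for ch in letters)
    consonants = len(letters) - vowels
    return float("inf") if consonants == 0 else vowels / consonants


def in_pronounceable_band(name: str, low: float = 0.3, high: float = 0.5) -> bool:
    """True when the ratio falls inside the band quoted in the text."""
    return low <= vowel_consonant_ratio(name) <= high


for name in ("Windsurf", "Abrdn", "kvtch", "Aeiou"):
    print(name, in_pronounceable_band(name))
```

With these definitions Abrdn and kvtch fall below the band and Aeiou falls above it, matching the examples in the paragraph.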

Compound detection. If both parts of the candidate are real English words, the score gets a heavy boost. Two real words create a multiplicative association effect (Windsurf, BlackBerry, PowerBook).
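One simple way to detect a two-word compound is to try every split point and look both halves up in a word list. The sketch below assumes that approach; the function name is hypothetical and the six-word dictionary is a stand-in for a real English word list.

```python
from typing import Optional


def split_compound(name: str, words: set[str]) -> Optional[tuple[str, str]]:
    """Return (head, tail) if the name splits into two known words, else None."""
    name = name.lower()
    for i in range(1, len(name)):
        head, tail = name[:i], name[i:]
        if head in words and tail in words:
            return head, tail
    return None


# Stand-in dictionary built from the examples in the text.
words = {"wind", "surf", "black", "berry", "power", "book"}

print(split_compound("Windsurf", words))    # ('wind', 'surf')
print(split_compound("BlackBerry", words))  # ('black', 'berry')
print(split_compound("Abrdn", words))       # None
```

With a full dictionary it would probably make sense to require a minimum length for each half, since very short words produce accidental splits.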

Word-boundary safety. Some letter sequences hide unintended readings. The score penalises candidates with dangerous substrings.
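One way such a check could work is to look for unwanted words that straddle the join between the two parts of a candidate. The sketch below assumes that interpretation. The function name and the one-word blocklist are hypothetical, and the actual list of dangerous substrings is not given here.

```python
def boundary_hazards(head: str, tail: str, blocklist: set[str]) -> list[str]:
    """Return blocklisted words that span the join between head and tail."""
    joined = (head + tail).lower()
    boundary = len(head)
    hits = []
    for bad in sorted(blocklist):
        start = joined.find(bad)
        while start != -1:
            # The word must begin in the head and end in the tail.
            if start < boundary < start + len(bad):
                hits.append(bad)
                break
            start = joined.find(bad, start + 1)
    return hits


# Hypothetical blocklist with a deliberately mild example.
blocklist = {"stink"}

print(boundary_hazards("best", "ink", blocklist))  # ['stink']
print(boundary_hazards("ink", "best", blocklist))  # []
```

Here "best" and "ink" are both harmless, but joined they read as "be stink", which is the kind of hidden reading the penalty is meant to catch.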

The pre-rendered pages

For the 1,000 most-searched English keywords, plus 60 industries, 26 letters, 6 length categories, and 60 morpheme patterns, the engine pre-computes the suggestions and writes them to static pages. That is why /search/cast loads instantly — it is just an HTML file generated overnight against the latest zone file.

If you search a keyword that has not been pre-rendered, the engine runs live and returns results in a few hundred milliseconds. Both paths use the same Bloom filter and the same scoring.

What this site does not do

I do not store your searches. I do not log your IP address against your searches. I do not share search data with registrars. I do not front-run searches, and I have no incentive to, since I have no registrar relationship that would benefit from it.

I do not check trademark availability. That is a legal exercise you should run before registering anything you plan to use commercially.

I do not check availability of TLDs other than .com. That is a feature in progress.

If you want the data

The morpheme lists, the synonym engine, the industry data, and the article content are all open. The Bloom filter binary is not part of that. It is generated on demand, because ICANN's terms of service prohibit redistribution of zone files. If you want to build something similar, the scripts that build the filter are public.