We Read About Crawl Budget. Should Our Small Site Worry About It?

AIKO: The honest answer starts with scale, and at this scale most of the worry just evaporates. A dental clinic with maybe forty pages read about crawl budget, the idea that search engines ration crawling, and now thinks it needs to optimize for it. Crawl budget is a real concern almost only for very large sites, hundreds of thousands or millions of URLs, where crawlers genuinely can’t reach everything often enough. A forty-page clinic isn’t remotely there. Search engines crawl forty pages trivially and repeatedly. The clinic’s preparing to solve a problem it doesn’t have.

HANNAH: That’s right, and let me ground it so it’s not just reassurance. Crawl budget bites when the URLs a site exposes outstrip what a crawler will fetch in a reasonable window. That’s a large-site phenomenon. A small healthy site gets discovered and recrawled without strain.

MARCUS: I’m going to complicate that, though, because “you’re too small to worry” is the kind of blanket reassurance that ages badly.

AIKO: Go on.

MARCUS: The clinic doesn’t have a crawl-budget problem in the classic sense. But it could have a crawl-waste problem, and people conflate the two. If the booking system generates thousands of parameter URLs, or a calendar spins up endless date pages, even a small site balloons into many crawlable URLs, and then crawlers spend time on junk instead of the pages that matter. So the right question isn’t “how do we optimize crawl budget,” it’s “is this site accidentally producing a flood of worthless URLs.” That’s real even at forty actual pages.
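
To make Marcus's point concrete, here is a minimal Python sketch of how parameter URLs inflate a small site. It assumes you already have a list of crawlable URLs from somewhere (a crawler export or server logs, for instance); the clinic.example domain, the /book path, and the date and session parameters are all hypothetical. It drops the query string and counts how many distinct URLs land on each path.

```python
from collections import Counter
from urllib.parse import urlsplit


def variants_per_path(urls):
    """Count distinct URLs per path once the query string is ignored."""
    counts = Counter()
    for url in set(urls):  # de-duplicate exact repeats first
        path = urlsplit(url).path or "/"
        counts[path] += 1
    return counts


# Hypothetical sample: two real pages plus booking URLs that differ
# only in their query parameters.
urls = [
    "https://clinic.example/",
    "https://clinic.example/services/whitening",
    "https://clinic.example/book?date=2031-05-01",
    "https://clinic.example/book?date=2031-05-02",
    "https://clinic.example/book?date=2031-05-02&session=abc",
]

counts = variants_per_path(urls)
for path, n in counts.most_common():
    print(f"{n:>4}  {path}")
```

In this sample the /book path accounts for three of the five URLs while each real page accounts for one. On a live site, a path with hundreds of variants is the kind of generator Marcus is describing.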

HANNAH: That’s fair, and it’s the one caveat I’d allow. The headline still holds: a normal small site optimizing crawl budget is tuning a constraint that isn’t binding. But Marcus’s crawl-waste check is legitimate.

ELENA: And that reframe points at the actual diagnostic, which is structural, not budgetary. Count how many URLs the site actually exposes versus how many real pages it has. Forty real pages should expose roughly forty crawlable URLs plus the normal supporting ones. If instead a booking calendar, session parameters, or filters generate hundreds or thousands, that’s the thing to fix. The Crawl Stats report in Search Console shows what the crawler is actually spending requests on, and the Pages report shows what got excluded and why, so you can see the junk directly. Then the fix matches the cause: robots.txt to stop crawling a junk pattern, a noindex on pages that must exist but shouldn’t rank, a canonical where the junk is a duplicate of a real page. Not “manage budget,” just stop generating or stop exposing. The ratio of real pages to exposed URLs is the whole diagnostic.
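
For the robots.txt branch of that fix, a rule can be checked before it ships. The sketch below uses Python's standard-library urllib.robotparser against a hypothetical rule set; the /book/calendar/ path and the clinic.example URLs are assumptions for illustration, not anything taken from a real booking system. It sticks to a plain path prefix rather than wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block only the auto-generated calendar pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /book/calendar/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="*"):
    """Return True if the parsed rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


# Hypothetical URLs: one junk calendar page, one real service page.
junk = "https://clinic.example/book/calendar/2031-01-01"
real = "https://clinic.example/services/whitening"

print("junk crawlable:", is_crawlable(junk))
print("real crawlable:", is_crawlable(real))
```

The point of the check is that the rule should block the junk pattern without touching any real page. One thing to keep in mind when choosing among the three fixes: a crawler that is blocked from a URL never fetches it, so it cannot see a noindex on that page. For any given URL, blocking and noindex are alternatives rather than a stack.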

SOFIA: And keep the priority straight while we’re here, because the worry itself is the cost. The clinic’s real growth levers are clear service pages, genuine local presence, content answering real questions about procedures and costs. Time spent fretting over crawl budget is time not spent on any of that. So the cost of this misdirected worry isn’t just wasted effort, it’s the better work that didn’t happen.

NOAH: The pattern is importing an enterprise problem because the concept is compelling and everywhere in the literature. Crawl budget is real and widely written about, mostly for large sites, and a small-site owner reads it and assumes it applies. The tell is that the worry arrived from an article, not from a symptom. The clinic didn’t observe pages going uncrawled; it read that crawl budget exists and reverse-engineered an anxiety. When the concern comes from the reading rather than from anything on the actual site, it’s a scale mismatch.

THEO: So the rule replaces the clinic’s crawl-budget question with Marcus’s crawl-waste one, sized to the site. Don’t optimize crawl budget; it isn’t binding here. Do one check: does the site expose roughly as many crawlable URLs as it has real pages, or is something generating a flood? If the ratio is healthy, do nothing and go to content. If something is inflating the count, whether a booking tool, parameters, or filters, address that specifically by blocking or not generating the junk. That single check captures the only part of this that could matter at the clinic’s scale.
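
Theo's rule is small enough to write down as a function. This is a sketch under stated assumptions: the function name and the tolerance of 2.0 (how far the exposed count may exceed the real page count before it counts as a flood) are illustrative choices, not thresholds from the discussion.

```python
def crawl_waste_check(real_pages, exposed_urls, tolerance=2.0):
    """Compare exposed crawlable URLs to real pages and pick an action.

    `tolerance` is an assumed cutoff for the exposed-to-real ratio.
    """
    if real_pages <= 0:
        raise ValueError("real_pages must be positive")
    ratio = exposed_urls / real_pages
    if ratio <= tolerance:
        action = "do nothing, go to content"
    else:
        action = "find the URL generator and block or stop it"
    return ratio, action


# Hypothetical counts for a forty-page site.
healthy = crawl_waste_check(real_pages=40, exposed_urls=55)
flooded = crawl_waste_check(real_pages=40, exposed_urls=3200)
print(healthy)
print(flooded)
```

With forty real pages, fifty-five exposed URLs falls on the healthy side of this cutoff, while thirty-two hundred does not. The exact tolerance matters less than the order of magnitude: a flood shows up as a ratio in the tens or hundreds, not as a borderline case.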

AIKO: Which is where I’d land too: the systems answer is to not build a crawl-budget project at all. The one legitimate adjacent check is Marcus’s, whether the site accidentally generates far more URLs than it has real pages, and if so, stop it. Beyond that, the crawling takes care of itself at this size.

DANA: So what this comes down to is a scale correction with one real check inside it. We don’t optimize crawl budget for a forty-page clinic, because that constraint binds enterprise sites with vast URL counts, not small healthy ones search engines crawl easily. The part worth doing is Marcus’s, confirming the site isn’t quietly generating a flood of junk URLs through a booking calendar, parameters, or filters, because crawl waste can hit even a small site, and if it’s happening we block or stop generating it. If the URL count roughly matches the real page count, there’s nothing to do here, and the effort belongs on the service pages and local content that actually bring patients. The worry came from reading about a real enterprise problem and assuming it scaled down. It doesn’t, and Noah’s right that recognizing that is most of the answer.

MARCUS: That puts the energy on the one check that could matter and the content that definitely does, instead of a constraint that was never pressing on a site this size.

DANA: A small site doesn’t have a crawl-budget problem; it might have a crawl-waste problem. Check the URL count once, and if it’s sane, go spend the time on the pages that win patients.
