
What is robots.txt? Crawl Control for Search Engines

Direct Answer

The robots.txt file is a text file at the root of your website that instructs search engine crawlers which pages or sections to crawl and which to avoid. It controls crawler access but does not prevent indexing directly.

Key Takeaways

  • Access your robots.txt at yourdomain.com/robots.txt
  • Verify critical pages are NOT blocked (test using Google Search Console URL Inspection)
  • Block admin, search result, and faceted navigation URL patterns
  • Add Sitemap: directive pointing to your XML sitemap
  • Test changes before deploying (Search Console's robots.txt report has replaced the legacy robots.txt Tester)

The robots.txt file is one of the first places Googlebot checks when visiting your site. While it cannot guarantee pages will not be indexed, it effectively controls which areas of your site are crawled — and misconfigurations are among the most catastrophic (and common) technical SEO mistakes.

robots.txt Syntax

User-agent specifies which crawler the rule applies to (* means all crawlers). Disallow specifies paths to block. Allow explicitly permits specific paths within a blocked directory. Sitemap: directive points crawlers to your XML sitemap location.
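A minimal file that uses all four directives might look like the sketch below (the paths and sitemap URL are placeholders, not a recommendation for your site):

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Sitemap: https://www.example.com/sitemap.xml

For Googlebot, the most specific (longest) matching rule wins, which is why the Allow line above overrides the broader Disallow for that one file.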

What NOT to Block

Never block CSS and JavaScript files that affect rendering (Google needs to render pages to understand them), your important service and product pages, your blog and content pages, or your XML sitemap. Blocking critical resources causes Google to misrender and misunderstand your pages.
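A common way this goes wrong is blocking an entire directory that also holds theme and script files. A hedged sketch (the directory names are illustrative):

    # Risky: hides CSS/JS that Google needs to render the page
    # Disallow: /wp-content/

    # Safer: block only subfolders crawlers never need
    User-agent: *
    Disallow: /wp-content/cache/

If a blanket Disallow is unavoidable, carve rendering assets back out with rules such as Allow: /wp-content/*.css and Allow: /wp-content/*.js.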

What to Block

Block admin areas (/wp-admin/, /admin/), internal search result pages (/search?q=), filter and faceted navigation parameters that generate thousands of near-duplicate URLs, login and user account pages, and staging or development URLs if they are publicly accessible.
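Translated into directives, that list might look like this (every path and query parameter here is a placeholder; match them to your own URL structure before deploying):

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /admin/
    Disallow: /search
    Disallow: /*?q=
    Disallow: /*?filter=
    Disallow: /login/
    Disallow: /account/

For a staging subdomain, serve a separate robots.txt that disallows everything (Disallow: /), since rules only apply to the host they are served from.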

robots.txt vs noindex

Key difference: robots.txt prevents crawling, not indexing; a blocked page can still appear in results (usually without a description) if other pages link to it. A noindex meta tag prevents indexing, but the page can still be crawled. Use robots.txt to save crawl budget and noindex to keep pages out of search results.
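To make the contrast concrete, here are the two mechanisms side by side (the /thank-you/ path is hypothetical):

    # robots.txt: stops crawling, does not guarantee de-indexing
    User-agent: *
    Disallow: /thank-you/

    <!-- On-page meta tag: allows crawling, prevents indexing -->
    <meta name="robots" content="noindex">

One caveat worth repeating: if a URL is blocked in robots.txt, Google cannot crawl it and will never see its noindex tag, so do not combine both on the same URL. For non-HTML files, the same signal can be sent with an X-Robots-Tag: noindex HTTP header.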

Step-by-Step Action Plan

  1. Access your robots.txt at yourdomain.com/robots.txt
  2. Verify critical pages are NOT blocked (test using Google Search Console URL Inspection; a quick scripted check is sketched after this list)
  3. Block admin, search result, and faceted navigation URL patterns
  4. Add a Sitemap: directive pointing to your XML sitemap
  5. Test changes before deploying (Search Console's robots.txt report, the successor to the robots.txt Tester, shows how Google fetches and parses your file)
  6. Monitor Google Search Console for unexpected crawl drops after changes
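For step 2, a minimal sketch in Python using the standard-library robotparser (the domain and URL list are placeholders for your own critical pages):

    from urllib.robotparser import RobotFileParser

    # Hypothetical domain and pages; substitute your own.
    ROBOTS_URL = "https://www.example.com/robots.txt"
    CRITICAL_URLS = [
        "https://www.example.com/",
        "https://www.example.com/services/",
        "https://www.example.com/blog/",
    ]

    parser = RobotFileParser()
    parser.set_url(ROBOTS_URL)
    parser.read()  # fetches and parses the live robots.txt

    for url in CRITICAL_URLS:
        allowed = parser.can_fetch("Googlebot", url)
        print(("OK      " if allowed else "BLOCKED ") + url)

Note that robotparser does not replicate Google's wildcard handling exactly, so treat this as a smoke test and confirm anything surprising with the URL Inspection tool.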

Frequently Asked Questions

Can I use robots.txt to block Googlebot?

Yes. You can block Googlebot from specific paths or from the entire site. However, this does not remove already-indexed pages from Google's index. To remove indexed content, use a noindex meta tag or Google's URL Removal Tool in Search Console.
