The robots.txt file is one of the first places Googlebot checks when visiting your site. While it cannot guarantee pages will not be indexed, it effectively controls which areas of your site are crawled — and misconfigurations are among the most catastrophic (and common) technical SEO mistakes.
robots.txt Syntax
User-agent specifies which crawler the rules apply to (* means all crawlers). Disallow blocks the listed paths from being crawled. Allow explicitly permits specific paths within a blocked directory. The Sitemap directive points crawlers to your XML sitemap location.
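A minimal file illustrating all four directives (the paths and sitemap URL are placeholders, not recommendations):

```
User-agent: *
Disallow: /private/
Allow: /private/annual-report.pdf
Sitemap: https://www.example.com/sitemap.xml
```

Rules are grouped under the User-agent line they follow, and the most specific matching rule wins for Google, so the Allow line carves one file out of the blocked directory.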
What NOT to Block
Never block CSS and JavaScript files that affect rendering (Google renders pages to understand them), your important service and product pages, your blog and content pages, or your XML sitemap. Blocking critical resources causes Google to render your pages incorrectly and misjudge their content.
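A common WordPress pattern shows how to block an admin area without breaking rendering: the admin-ajax.php endpoint is used by front-end scripts, so it gets an explicit Allow (paths assume a standard WordPress install):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```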
What to Block
Admin areas (/wp-admin/, /admin/), search result pages (/search?q=), filter/faceted navigation generating thousands of near-duplicate URLs, login and user account pages, and staging/development URLs if they are publicly accessible.
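Putting those cases together, a sketch of what such a robots.txt might look like — the paths are illustrative, and the * wildcard is supported by Google but not by every crawler:

```
User-agent: *
Disallow: /wp-admin/
Disallow: /admin/
Disallow: /search?q=
Disallow: /*?filter=
Disallow: /login/
Disallow: /account/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml
```

For staging sites, password protection or a noindex header is safer than relying on robots.txt alone, since the file itself advertises the URLs it blocks.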
robots.txt vs noindex
Key difference: robots.txt prevents crawling, not indexing — blocked pages can still appear in search results if other sites link to them. A noindex meta tag (<meta name="robots" content="noindex">) prevents indexing, but the page can still be crawled. Use robots.txt to save crawl budget and noindex to keep pages out of search results. Don't combine them on the same page: if robots.txt blocks a page, Google never crawls it and therefore never sees the noindex tag.
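The crawling side of this distinction can be checked programmatically. A small sketch using Python's standard urllib.robotparser — the rules and URLs here are hypothetical, for illustration only:

```python
from urllib import robotparser

# Hypothetical ruleset: block the admin area, explicitly allow assets.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /assets/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch answers "may this user-agent crawl this URL?"
print(rp.can_fetch("*", "https://example.com/admin/login"))    # False: matches Disallow
print(rp.can_fetch("*", "https://example.com/assets/app.css")) # True: matches Allow
```

Note that can_fetch only answers "may this URL be crawled?" — whether the page is indexed is governed by noindex, which Google can only see on pages it is allowed to crawl.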