Robots.txt Generator

Build your robots.txt visually. Add User-agent blocks with Allow/Disallow rules, set Crawl-delay, and declare Sitemap URLs. Choose from 5 presets or build from scratch. Live preview with syntax highlighting, validation warnings, and one-click download.

How to Use This Tool

  1. Choose a preset to start quickly. Select Allow All, Block All, Standard Website, WordPress, or Block AI Crawlers. Each preset pre-fills User-agent blocks and rules for common configurations.
  2. Add User-Agent blocks by clicking the "+ Add User-Agent Block" button. Select a crawler from the dropdown (Googlebot, Bingbot, GPTBot, etc.) or enter a custom user-agent name.
  3. Add Allow/Disallow rules within each block. Click "+ Add Rule" and specify the path you want to allow or block. Use paths like /admin/, /private/, or /wp-includes/.
  4. Set Crawl-delay (optional) if you need to throttle crawler request frequency. Enter the number of seconds between requests.
  5. Add Sitemap URLs to help search engines discover your pages. Click "+ Add Sitemap URL" and enter the full URL of your XML sitemap.
  6. Review and export. Check the live preview with syntax highlighting, review any validation warnings, then click Copy or Download to get your robots.txt file.
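Following the steps above, a typical exported file might look like this (the paths and domain are placeholders, not output of the tool):

```text
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /private/
Crawl-delay: 5

# Sitemap declaration (sits outside any User-agent block)
Sitemap: https://yourdomain.com/sitemap.xml
```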

About the Robots.txt Generator

The robots.txt file is one of the most fundamental files in technical SEO. It sits in the root directory of your website and communicates with search engine crawlers about which parts of your site they should and should not access. Every major search engine, including Google, Bing, Yahoo, Yandex, and Baidu, reads this file before crawling your pages. Despite its simplicity as a plain text file, mistakes in robots.txt can have serious consequences, from accidentally blocking your entire site from indexing to exposing private directories to crawlers.

The Robots Exclusion Protocol was introduced in 1994 and has remained remarkably stable. The core directives are User-agent (which crawler the rules apply to), Disallow (paths the crawler should not access), and Allow (paths explicitly permitted, useful for overriding broader Disallow rules). Additional directives like Crawl-delay help manage server load, while the Sitemap directive points crawlers to your XML sitemap for efficient page discovery. The Host directive, supported by some search engines like Yandex, specifies the preferred domain version.
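As a sketch of how these core directives combine (the crawler name is real; the paths and domain are hypothetical), an Allow line can carve an exception out of a broader Disallow:

```text
User-agent: Googlebot
Disallow: /drafts/          # Googlebot should not crawl /drafts/...
Allow: /drafts/published/   # ...except this subdirectory

Sitemap: https://example.com/sitemap.xml
```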

With the rise of AI crawlers, robots.txt has gained new importance. Companies like OpenAI (GPTBot), Anthropic (ClaudeBot), Google (Google-Extended), ByteDance (Bytespider), and Common Crawl (CCBot) use web crawlers to collect training data for large language models. Website owners who want to prevent their content from being used for AI training can block these specific user agents in their robots.txt file. Each AI crawler requires its own User-agent block with a Disallow: / directive, as robots.txt rules are applied per user-agent.
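Put together, a file that opts out of all five of these AI crawlers looks like this:

```text
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /
```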

Common mistakes include using an empty Disallow directive (which allows everything rather than blocking it), omitting a Sitemap declaration, blocking CSS and JavaScript files that Google needs to render pages properly, and setting Crawl-delay values too high, which can severely slow indexing. It is also important to understand that robots.txt is advisory, not a security measure. Malicious bots can ignore it, so sensitive content should be protected with authentication, not just robots.txt rules.
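The most dangerous mix-up is the empty Disallow versus Disallow: /. Shown side by side as two separate fragments, they behave in opposite ways:

```text
# Allows everything: an empty Disallow blocks nothing
User-agent: *
Disallow:

# Blocks everything: a single slash matches every path
User-agent: *
Disallow: /
```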

This tool runs entirely in your browser with no server processing. Your configuration is auto-saved to localStorage so you can return and continue editing. The live preview updates instantly as you make changes, and validation warnings alert you to common configuration issues before you deploy your robots.txt file.

Frequently Asked Questions

What is robots.txt?

Robots.txt is a plain text file placed in the root directory of a website that tells search engine crawlers which pages or sections they are allowed or not allowed to access. It follows the Robots Exclusion Protocol and is the first file crawlers check before indexing a site. Every major search engine, including Google, Bing, and Yahoo, respects robots.txt directives.

Where should I place the robots.txt file?

The robots.txt file must be placed in the root directory of your website, accessible at https://yourdomain.com/robots.txt. It must sit at the root of the host, not in a subdirectory, and it applies per host: a subdomain such as blog.yourdomain.com needs its own robots.txt. If the file is not at the root, search engine crawlers will not find or respect it.

What is User-agent in robots.txt?

User-agent is the directive that specifies which search engine crawler the rules apply to. Using User-agent: * applies rules to all crawlers. You can target specific crawlers like Googlebot, Bingbot, GPTBot, or ClaudeBot with their own rule sets. Each User-agent block can have different Allow and Disallow rules.
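For example (paths are placeholders), a wildcard block can coexist with a crawler-specific one; a crawler follows the most specific User-agent group that names it:

```text
# Rules for every crawler
User-agent: *
Disallow: /tmp/

# Stricter rules just for GPTBot
User-agent: GPTBot
Disallow: /
```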

What is the difference between Allow and Disallow?

Disallow tells crawlers they should not access a specific URL path or directory. Allow explicitly permits access to a path, which is useful for overriding a broader Disallow rule. For example, you can Disallow: /admin/ but then Allow: /admin/public-page.html to let crawlers access just that one page within the blocked directory.
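The example from the answer above, written out as a complete block:

```text
User-agent: *
Disallow: /admin/
Allow: /admin/public-page.html
```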

What is Crawl-delay?

Crawl-delay is a directive that tells crawlers to wait a specified number of seconds between requests. It helps prevent server overload from aggressive crawling. Google does not officially support Crawl-delay (use Google Search Console instead), but Bing, Yandex, and other crawlers respect it. A Crawl-delay of 1-5 seconds is common; anything above 10 may significantly slow indexing.
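For instance, to ask Bingbot to wait five seconds between requests:

```text
User-agent: Bingbot
Crawl-delay: 5
```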

Should I add Sitemap to robots.txt?

Yes, adding a Sitemap directive to robots.txt is recommended. It helps search engines discover your XML sitemap file, which lists all the pages you want indexed. The Sitemap directive is placed outside any User-agent block and uses the full URL: Sitemap: https://yourdomain.com/sitemap.xml. You can list multiple sitemaps.
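Multiple Sitemap lines are allowed; each stands on its own outside any User-agent block (the URLs are placeholders):

```text
Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/sitemap-news.xml
```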

How do I block AI crawlers with robots.txt?

To block AI crawlers, add separate User-agent blocks for each AI bot with Disallow: /. Common AI crawlers include GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google AI training), Bytespider (ByteDance), and CCBot (Common Crawl). Each needs its own User-agent block because robots.txt rules are per user-agent.

Does robots.txt affect SEO rankings?

Robots.txt does not directly affect SEO rankings, but it plays a critical role in crawl management. Blocking important pages can prevent them from being indexed, removing them from search results entirely. Conversely, a well-structured robots.txt helps search engines focus their crawl budget on your most important pages, indirectly improving SEO performance. Always ensure you are not accidentally blocking CSS, JS, or image files that Google needs to render your pages.

Get Expert Technical SEO Configuration

Our Technical SEO team configures your crawl directives, sitemap strategy, and indexation controls to maximize search engine visibility and crawl budget efficiency.

Let's Talk