Robots.txt
Robots.txt is a text file at a website's root that tells search engine crawlers which pages or sections they may access and which are off-limits.
Robots.txt provides crawler directives using a simple syntax of User-agent, Allow, and Disallow rules. It is the first file search engine bots check when visiting a domain, making it a critical control point for managing crawl behavior and budget.
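For illustration, a minimal robots.txt might look like the sketch below; the user agent, paths, and sitemap URL are placeholders rather than recommendations for any particular site.

```
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
```

Each User-agent group scopes the rules that follow it to the named crawler (or * for all crawlers), and major search engines generally apply the most specific matching Allow or Disallow rule to a given URL.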
Common use cases include blocking admin pages, staging environments, duplicate content paths, and resource-heavy pages that consume crawl budget without providing SEO value. However, robots.txt blocks crawling, not indexing: if other pages link to a disallowed URL, it may still appear in search results.
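As a rough sketch of how such rules can be checked (not a description of GenGrowth's implementation), Python's standard urllib.robotparser can test whether a given URL would be crawlable under a rule set; the rules, user agent, and URLs below are hypothetical.

```python
from urllib import robotparser

# Hypothetical rules for illustration only; a real audit would typically
# fetch the live file from the site's /robots.txt instead.
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /staging/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Check a few representative paths against the rules.
for path in ("/wp-admin/", "/wp-admin/admin-ajax.php", "/blog/robots-txt-guide"):
    url = "https://example.com" + path
    verdict = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{path}: {verdict}")
```

Parsing the rules from a string keeps the check offline; calling set_url() and read() on the parser would fetch and evaluate a site's live robots.txt instead.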
How GenGrowth Helps
GenGrowth audits robots.txt configurations to ensure valuable content is not accidentally blocked and crawl budget is spent efficiently. The platform generates optimized robots.txt rules based on site structure and content priorities.
Related Terms
Canonical URL is the preferred version of a web page specified via a link element that tells search engines which URL to index when duplicate or similar content exists.
XML Sitemap is a file that lists all important URLs on a website to help search engines discover and crawl content efficiently.