Article

The New robots.txt for the LLM Era

We’re entering a new phase of the web. One where your audience isn’t just people clicking around a browser. It’s language models crawling your content, distilling it, and regurgitating it in chats, summaries, snippets, and who knows what else.

And yet, most websites are still designed for people and Google's crawlers.

That’s why we built an open-source tool to generate llms.txt files: the first step to making your site legible, and useful, in a world where LLMs are the new search engine.

A Brief History of llms.txt

The concept of llms.txt was introduced by Jeremy Howard, co-founder of Answer.AI, to address a specific technical challenge: AI systems have limited context windows, which makes it hard for them to take in large documentation sites. Traditional SEO techniques are optimized for search crawlers rather than reasoning engines, so they don’t solve this limitation. When AI systems process HTML pages directly, they get bogged down in navigation elements, JavaScript, CSS, and other non-essential markup that eats into the space available for actual content. llms.txt addresses that by giving the AI the information it needs in a format it can read easily.

In November 2024, Mintlify added llms.txt support to its docs platform. In one move, it made the docs of thousands of dev tools, including Anthropic and Cursor, LLM-friendly. Anthropic and others quickly posted on X about their llms.txt support. More Mintlify-hosted docs joined in, creating a wave of visibility for the proposed standard.

The momentum sparked new community sites and tools. @ifox created a directory to index LLM-friendly technical docs. @screenfluent followed shortly with another directory. Mot, who made dotenvx, built and shared an open-source generator tool for dotenvx’s docs site. Eric Ciarla of Firecrawl created a tool that scrapes your website and creates the file for you. 

Why llms.txt matters

We’ve had robots.txt for decades. It tells search engines what they can and can’t crawl. But that’s where the guidance ends. Once a page is crawled, you have very little control over how it's interpreted. Titles, descriptions, context—all of it gets inferred, often badly.

Enter llms.txt. It’s a simple idea: give LLMs a structured, curated list of your best content. Tell them what it’s about. Make it easier for them to answer questions with accurate, up-to-date information pulled directly from your site.

Just like a sitemap helps search engines navigate your site, an llms.txt file helps language models understand what each page is, why it exists, and when to use it.

And we wanted to make it as easy as possible to get started.

Introducing the llms.txt Generator

We built the llms.txt Generator to help marketers, developers, or product teams spin up a clean, structured file in seconds. No manual copy-pasting, no guessing.

Here’s what it does:

1. Input your URLs or sitemap

You can start with one or more URLs, or point the tool at your sitemap.xml file directly. This is ideal if you already have a sitemap configured—it saves a ton of work.
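To make the sitemap route concrete, here is a minimal sketch of pulling page URLs out of a sitemap with Python’s standard library. It is an illustration, not necessarily how the generator does it: the function name `urls_from_sitemap` and the sample sitemap are made up for this example, and fetching the file over the network is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset>, <url>, and <loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap, in document order.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Made-up sitemap used only to exercise the function.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/articles/hello-world</loc></url>
</urlset>"""

print(urls_from_sitemap(SAMPLE_SITEMAP))
```

A list of plain URLs, typed in by hand, can feed the same downstream steps, which is why both inputs are interchangeable.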

2. Parse and discover your pages

The tool crawls the sitemap and parses every URL. For transparency, it logs each page as it goes, so you can follow along and see what’s being picked up.
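As a rough sketch of the “parse and log” step, the snippet below extracts a page’s `<title>` from already-downloaded HTML and logs what it found. The names `TitleParser` and `page_title` and the logger name are assumptions for this example, and the fallback to the URL when no title exists is our own choice, not documented behavior of the tool.

```python
import logging
from html.parser import HTMLParser

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("llms_txt_sketch")  # hypothetical logger name


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore later <title> tags, e.g. inside SVG

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> str:
        # Collapse runs of whitespace and trim the ends.
        return " ".join("".join(self._parts).split())


def page_title(url: str, html: str) -> str:
    """Return the page's title (or the URL if none) and log the result."""
    parser = TitleParser()
    parser.feed(html)
    title = parser.title or url
    log.info("Parsed %s -> %s", url, title)
    return title
```

Calling `page_title("https://example.com/about", html)` for each discovered URL produces one log line per page, which is the kind of running commentary described above.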

3. Generate a structured file

We break the content into logical sections:

  • ## Website section for pages at the root level (like /home, /about, and /contact)

  • Subsections based on your URL structure. For example, everything under /articles/ goes into ## Articles

  • We avoid clutter by only creating sections for the base paths, not every subpage
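The grouping rules above can be sketched in a few lines. This is one possible reading of them, under stated assumptions: a URL with zero or one path segment counts as root level, and the section name is the first path segment with hyphens turned into spaces. `section_for` and `group_by_section` are hypothetical names.

```python
from urllib.parse import urlparse


def section_for(url: str) -> str:
    """Map a URL to a section name using only its first path segment."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) <= 1:
        # "/", "/about", "/contact" and other root-level pages.
        return "Website"
    # "/articles/2024/launch" -> "Articles"; deeper segments are ignored,
    # so subpages never get a section of their own.
    return segments[0].replace("-", " ").title()


def group_by_section(urls: list[str]) -> dict[str, list[str]]:
    """Bucket URLs by section, keeping the order in which they were seen."""
    sections: dict[str, list[str]] = {}
    for url in urls:
        sections.setdefault(section_for(url), []).append(url)
    return sections
```

Keying on the first segment alone is what keeps the file short: a site with hundreds of articles still ends up with a single Articles heading.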

Each entry in the file is a Markdown list item: a link, optionally followed by a short description:

- [Page Title](URL) - Optional description

This gives language models an easy way to understand not just where to find the content, but what it is and why it matters.

4. Clean, readable output

Here’s a sample of what the file might look like:

# Acme Docs

> The official documentation for Acme's suite of developer tools

## Website
- [Home](/home) - The homepage for Acme Docs
- [About](/about) - Learn more about the team and mission
- [Contact](/contact) - Get in touch with support

## Docs
- [Getting Started](/docs/getting-started) - Guide for new users
- [API Reference](/docs/api) - Complete API documentation
- [Tutorials](/docs/tutorials) - Step-by-step guides

## Optional Resources
- [Community Forum](/community) - Get help from other users
- [Change Log](/changelog)
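A file in the shape shown above could be assembled with something like the following. Treat it as a sketch: `render_llms_txt` and its argument layout are invented for this example, and the ` - ` separator before descriptions simply mirrors the sample.

```python
def render_llms_txt(title, summary, sections):
    """Build llms.txt text from a title, a one-line summary, and sections.

    `sections` maps a heading to a list of (page_title, url, description)
    tuples; description may be None. Hypothetical helper for illustration.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        for page_title, url, description in pages:
            entry = f"- [{page_title}]({url})"
            if description:
                entry += f" - {description}"
            lines.append(entry)
        lines.append("")  # blank line between sections
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


print(render_llms_txt(
    "Acme Docs",
    "The official documentation for Acme's suite of developer tools",
    {
        "Website": [("Home", "/home", "The homepage for Acme Docs")],
        "Optional Resources": [("Change Log", "/changelog", None)],
    },
))
```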

Drop that file at the root of your website and boom. You’re now speaking the same language as the tools parsing your site.
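“At the root” means the file should be reachable at `/llms.txt` on your domain, whatever page you start from. A tiny sketch, with `llms_txt_url` as a made-up helper name:

```python
from urllib.parse import urljoin


def llms_txt_url(any_page_url: str) -> str:
    """Return the root-level llms.txt URL for the site that hosts the page."""
    # An absolute path replaces the page's own path and drops its query string.
    return urljoin(any_page_url, "/llms.txt")


print(llms_txt_url("https://example.com/docs/getting-started?ref=nav"))
```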

Why we made it open source

To be honest, it wasn’t rocket science to build. It’s not the kind of thing that’ll differentiate Released or move the needle on growth. But if it saves even 100 people a few hours of tedious work, that’s enough to give us the warm and fuzzies.

So we’ve made the generator fully open source. Use it, fork it, improve it. We’ll be maintaining it and making it better over time, but the core idea is this: the more sites that publish llms.txt files, the better the ecosystem becomes.

LLMs are only as good as the data they’re trained on and the data they retrieve. By making that data clearer, we raise the bar for everyone.

A new layer of visibility

When someone searches for your site on Google, they get a few links and a snippet. When they ask ChatGPT or another assistant, they might get a summary with no attribution at all. Or worse, a hallucination.

llms.txt gives you a chance to fix that. It gives LLMs a clearer picture of what your content is, which parts matter most, and how they should describe it.

It won’t solve every problem. But it’s a low-effort, high-impact way to take back a bit of control.

How to get started

  1. Clone the project from GitHub

  2. Run it locally or on your server

  3. Point it at your site or sitemap

  4. Generate and publish the file

You’ll find setup instructions, usage examples, and contribution guidelines all in the repo.

→ Check it out on GitHub ←

We built this because we needed it for ourselves. But we made it public because everyone should be thinking about how LLMs see their site.


Keep your customers and stakeholders in the loop
