llms.txt: What It Is and How to Set It Up

llms.txt is a proposed plain-text convention — robots.txt-style — that points AI/LLM crawlers to your most important content in clean Markdown. This guide explains what llms.txt is, how it works, how to create one (with a copyable example), whether AI engines actually use it, and how it differs from robots.txt.

By HowWorks Team

Key takeaways

  • llms.txt is a proposed convention — a single plain-text Markdown file at yoursite.com/llms.txt — that gives AI assistants a curated, clean-text map of your most important pages, so a model reading your site at answer time doesn't have to crawl and parse your whole HTML site. It was proposed by Jeremy Howard in September 2024 and is hosted at llmstxt.org.
  • It is a proposal, not an adopted industry standard. As of 2026, no major AI engine — OpenAI, Google, Anthropic, Meta — has publicly committed to reading llms.txt in production, and Google has said plainly that it does not use it. Plenty of sites publish one anyway; that is the publishing side, not proof that engines consume it.
  • It is not robots.txt. robots.txt is access control: it tells crawlers what they may and may not fetch, and major search engines honor it. llms.txt contains no permissions and blocks nothing; it is a curated guide, closer to a hand-made sitemap written for language models.
  • The format is simple Markdown: an H1 with your site's name (the only required part), a blockquote summary, then optional H2 sections listing links as `[name](url): notes`. An expanded `llms-full.txt` can inline the actual page text for models that want the full content in one file.
  • Because adoption is unproven, treat llms.txt as low-cost, low-risk housekeeping — not a ranking or citation lever. The durable way to be found and cited by AI is still crawlable, well-structured, authoritative content. Publish llms.txt if it's cheap; don't expect it to move AI visibility on its own.

llms.txt is a proposed convention for a single plain-text file — placed at yoursite.com/llms.txt — that hands AI assistants a curated, clean-text map of your most important pages. Think robots.txt's location and simplicity, but a completely different job: instead of telling crawlers what they can't touch, llms.txt points language models toward the content you want them to read, in a format that's easy for a model to ingest at answer time.

It's a genuinely useful idea, and it's spreading. It is also, as of 2026, a proposal — not a standard any major AI engine has committed to following. This guide explains what llms.txt is, how it's meant to work, exactly how to create one (with a block you can copy), how it differs from robots.txt, and the honest answer to the question everyone actually has: does it do anything yet?


What Is llms.txt?

llms.txt is a proposed standard for a Markdown file, located at the root of your site (/llms.txt), that provides information to help large language models use your website. It was proposed by Jeremy Howard — co-founder of Answer.AI and fast.ai — on September 3, 2024, and the spec lives at llmstxt.org.

The problem it targets is specific. When an AI assistant tries to use your website to answer a question, it runs into two walls: first, as the proposal notes, "context windows are too small to handle most websites in their entirety"; and second, "converting HTML pages into LLM-friendly content" — stripping out navigation, ads, scripts, and markup — is "both difficult and imprecise." llms.txt sidesteps both by offering, in the spec's words, "more concise, expert-level information gathered in a single, accessible location."

A few defining points to anchor the term:

  • It lives at a fixed path: the spec is for a file "located in the root path /llms.txt of a website" — the same convention that makes robots.txt easy to find.
  • It's Markdown, not a new syntax: the file is ordinary Markdown, so it's readable by both humans and machines without a parser.
  • It's for inference, not training: the proposal's "expectation is that llms.txt will mainly be useful for inference, i.e. at the time a user is seeking assistance" — when a model is answering a question, not when it's being trained.

So llms.txt is, in plain English, a hand-curated cheat sheet to your best content, written for the AI systems that increasingly answer questions on your behalf.


How Does llms.txt Work?

The intended flow is straightforward:

  1. An AI assistant needs information about your site or product.
  2. Instead of crawling and parsing your full HTML site, it fetches /llms.txt.
  3. It reads your short summary, then follows the curated links to clean, readable (ideally Markdown) versions of your key pages.
  4. It uses that focused content to answer the user — with far less noise than scraping raw HTML.

That's the design. The honest caveat — which we'll expand on below — is that "works" here describes intent, not guaranteed behavior. The file only matters if an engine chooses to fetch it and act on it. llms.txt doesn't push anything to anyone; it just sits at a predictable URL, the way robots.txt and sitemap.xml do, waiting to be read.
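
As a rough illustration of step 2 in that flow, here is a minimal Python sketch of how a client could locate and request the file. The function names (`llms_txt_url`, `http_get`, `load_site_guide`) are hypothetical, and nothing here describes how any real AI engine behaves; it only shows that the file sits at a predictable root path and does nothing until someone asks for it.

```python
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


def llms_txt_url(site_url: str) -> str:
    """Return the root-path /llms.txt URL for any page on a site."""
    parts = urlsplit(site_url)
    root = f"{parts.scheme}://{parts.netloc}/"
    return urljoin(root, "llms.txt")


def http_get(url: str) -> Optional[str]:
    """Fetch a URL as text; return None on a network or HTTP error."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return None


def load_site_guide(
    site_url: str,
    fetch: Callable[[str], Optional[str]] = http_get,
) -> Optional[str]:
    """Try /llms.txt first; None means the caller falls back to crawling HTML."""
    return fetch(llms_txt_url(site_url))
```

The `fetch` parameter is injectable so the logic can be exercised without a network call, for example by passing the `.get` method of a dict that maps URLs to text.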

There's also an expanded variant worth knowing. The slim llms.txt is an index of links. Many sites also publish an llms-full.txt (and tooling like the spec's own llms_txt2ctx generates llms-ctx.txt / llms-ctx-full.txt) that inlines the actual page text into one big Markdown file — handy for a model that wants the entire documentation set in a single fetch rather than chasing links. Anthropic's developer docs, for example, publish both a slim index and a full export.
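
To make the slim-versus-full distinction concrete, the sketch below inlines the text of every linked page after the index. It is not the spec's llms_txt2ctx tool, and the output layout (a heading and a "Source:" line per page) is an assumption chosen for readability, since sites lay out their full exports differently. `build_full_text` and its `fetch` parameter are hypothetical names.

```python
import re
from typing import Callable, Optional

# Matches the start of a link list item such as "- [Name](https://example.com/page.md)".
LINK_ITEM = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)")


def build_full_text(index_text: str, fetch: Callable[[str], Optional[str]]) -> str:
    """Return the slim index followed by the text of each linked page."""
    parts = [index_text.strip()]
    for line in index_text.splitlines():
        match = LINK_ITEM.match(line.strip())
        if not match:
            continue
        name, url = match.groups()
        body = fetch(url)
        if body is None:  # skip pages that could not be retrieved
            continue
        parts.append(f"## {name}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"
```

The trade-off is size: one fetch gets everything, but the result can easily outgrow a model's context window, which is the problem the slim index was meant to avoid.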


llms.txt vs robots.txt: Not the Same Thing

This is the single most common confusion, so let's settle it. The two files share a naming style and a root-level location, which makes people assume they're variations on a theme. They aren't — they do opposite jobs.

robots.txt is about permission. It tells crawlers which URLs they may and may not fetch (User-agent, Allow, Disallow), and major search engines honor it as part of their crawl protocol. It's a gate.

llms.txt is about curation. It contains no directives, grants no permissions, and blocks nothing — it cannot stop a crawler or hide a page. It's a guide. As Search Engine Land put it, "Robots.txt is about exclusion. Sitemap.xml is about discovery. Llms.txt is about curation" — closer to "a curated sitemap.xml" than to an access-control file.

| Dimension | robots.txt | llms.txt |
| --- | --- | --- |
| Job | Access control: what crawlers may fetch | Curation: which content to read, and in what order |
| Contains | Directives (Allow / Disallow, User-agent) | A summary + curated Markdown links, no directives |
| Can it block or hide a page? | Yes (by convention) | No; it grants and denies nothing |
| Format | Plain-text rules | Markdown (H1, blockquote, link lists) |
| Enforcement | Honored by major search crawlers | No engine is committed to acting on it |
| Analogy | A gate | A treasure map / hand-made sitemap |

The practical takeaway: these aren't either/or. Keep robots.txt doing its access-control job, and add llms.txt as a separate, optional curation layer. One is not a replacement for the other.
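
The "gate" side of that contrast is easy to see in code. Python's standard library ships a robots.txt parser, and the sketch below feeds it a made-up robots.txt to show directives producing yes/no answers per crawler. The rules, the acme.example.com URLs, and the ExampleBot name are illustrative assumptions; there is no equivalent check for llms.txt because it has no directives to evaluate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep GPTBot out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
print(rules.can_fetch("GPTBot", "https://acme.example.com/private/report"))      # False
print(rules.can_fetch("GPTBot", "https://acme.example.com/docs/quickstart"))     # True
print(rules.can_fetch("ExampleBot", "https://acme.example.com/private/report"))  # True
```

The same helper works as a quick audit: run it against your real robots.txt with the user agents of the AI crawlers you care about to confirm you are not blocking them by accident.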


How to Create an llms.txt File (Format + Steps)

The format is deliberately minimal. Per the spec, a compliant file contains these sections, as Markdown, in order:

  • An H1 with the name of the project or site. This is "the only required section."
  • A blockquote (">") with a short summary "containing key information necessary for understanding the rest of the file."
  • Optional free-text Markdown (paragraphs, lists — anything except more headings) with extra context.
  • Zero or more H2 sections, each holding a "file list" of links. Each list item is "a required markdown hyperlink [name](url), then optionally a : and notes about the file."
  • An optional ## Optional section: links there "can be skipped if a shorter context is needed" — use it for secondary material.
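
Because the format is plain Markdown with a fixed order, the rules above can be read with a few lines of code. The following is a minimal sketch, not a reference parser: the `Link` and `LlmsTxt` types and the `parse_llms_txt` function are hypothetical, and it deliberately ignores the optional free-text section between the summary and the first H2.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# "- [name](url)" with optional ": notes", as described in the format rules.
LINK_ITEM = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)


@dataclass
class Link:
    name: str
    url: str
    notes: str = ""


@dataclass
class LlmsTxt:
    title: Optional[str] = None
    summary: str = ""
    sections: Dict[str, List[Link]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Extract the H1 title, blockquote summary, and H2 link lists."""
    doc = LlmsTxt()
    current: Optional[str] = None  # name of the H2 section being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc.title is None:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and current is None:
            doc.summary = (doc.summary + " " + line[2:].strip()).strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_ITEM.match(line)
            if match:
                doc.sections[current].append(
                    Link(match["name"], match["url"], match["notes"] or "")
                )
    return doc
```

A consumer that is short on context could then drop `doc.sections["Optional"]` before following any links, which is the behavior the Optional section is meant to allow.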

Here are the numbered steps to ship one:

  1. Create the file. Make a plain-text file named llms.txt and host it so it resolves at https://yoursite.com/llms.txt (root path).
  2. Add the H1 and summary. Start with # Your Site Name, then a > blockquote that says, in one or two sentences, what you do and what a model most needs to know.
  3. List your key pages under H2 sections. Group related links (e.g. ## Docs, ## Products, ## Guides) and write each as - [Page name](https://yoursite.com/page): short, descriptive note. Write descriptive notes — "Payments API: charges and refunds," not "API reference."
  4. Point links at clean content. Where you can, link to readable or Markdown (.md) versions of pages, since the whole point is giving the model clean text instead of cluttered HTML.
  5. (Optional) add ## Optional. Put nice-to-have links here so models can drop them when context is tight.
  6. (Optional) generate llms-full.txt. If you want models to grab everything in one fetch, also publish an expanded file that inlines the page text.
  7. Keep it current. Treat it like sitemap.xml: regenerate it when your important pages change.
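
If you would rather generate the file than hand-edit it, steps 1 through 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official tool: `render_llms_txt` and the shape of its `sections` argument are hypothetical, and the page list would come from wherever your site keeps its canonical URLs.

```python
from typing import Dict, List, Tuple

# Each link is (name, url, notes); notes may be an empty string.
Section = List[Tuple[str, str, str]]


def render_llms_txt(title: str, summary: str, sections: Dict[str, Section]) -> str:
    """Render an H1, a blockquote summary, and H2 link lists as Markdown."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, notes in links:
            item = f"- [{name}]({url})"
            lines.append(f"{item}: {notes}" if notes else item)
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"
```

Writing the result with `pathlib.Path("llms.txt").write_text(text, encoding="utf-8")` as part of your build or deploy step covers step 7: the file is regenerated whenever the page list changes.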

You don't have to hand-write it. Several WordPress plugins generate and auto-update llms.txt from your existing content (respecting your noindex/nofollow rules), and many documentation platforms produce one automatically — Mintlify, for instance, generates llms.txt for the docs it hosts. For a small site, hand-writing it is often faster and gives you tighter control over what you surface.

A copyable example

Here's a minimal, spec-faithful llms.txt you can adapt — replace the placeholders with your own pages:

# Acme Analytics

> Acme Analytics is a privacy-first product analytics platform for SaaS teams. It tracks events, funnels, and retention without third-party cookies. This file points to the docs and pages most useful when answering questions about Acme.

Key things to know:

- Acme is self-serve; there is a free tier and usage-based paid plans.
- Acme is GDPR-compliant and does not sell user data.

## Docs

- [Quickstart](https://acme.example.com/docs/quickstart.md): Install the SDK and send your first event in 5 minutes
- [Event tracking API](https://acme.example.com/docs/events.md): Reference for the track(), identify(), and group() methods
- [Funnels & retention](https://acme.example.com/docs/funnels.md): How to build conversion funnels and cohort retention reports

## Product

- [Pricing](https://acme.example.com/pricing.md): Plan tiers, usage limits, and what's included on the free tier
- [Integrations](https://acme.example.com/integrations.md): Supported sources and destinations (warehouses, CDPs, webhooks)

## Optional

- [Changelog](https://acme.example.com/changelog.md): Recent releases and breaking changes

The structure mirrors the spec's own FastHTML example: one H1, a blockquote summary, some free-text context, then H2 link lists with descriptive notes — and an ## Optional section a model can skip.
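
Before publishing, it can help to run a quick sanity check on the file. The sketch below is a hypothetical `lint_llms_txt` helper that encodes this guide's reading of the format (H1 first, only one H1, link-shaped list items under H2 sections). It is not an official validator and it skips finer points such as heading levels below H2.

```python
import re
from typing import List

# A file-list item: "- [name](url)" with optional ": notes".
LINK_ITEM = re.compile(r"^-\s*\[[^\]]+\]\([^)\s]+\)(:\s*.*)?$")


def lint_llms_txt(text: str) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems: List[str] = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_empty = [line for line in lines if line.strip()]
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line must be an H1 (# Name)")
    if sum(1 for line in lines if line.startswith("# ")) > 1:
        problems.append("more than one H1 found")
    in_section = False  # True once the first H2 has been seen
    for number, line in enumerate(lines, start=1):
        if line.startswith("## "):
            in_section = True
        elif in_section and line.startswith("-") and not LINK_ITEM.match(line):
            problems.append(f"line {number}: list item is not '[name](url): notes'")
    return problems
```

List items that appear before the first H2 are left alone on purpose, because the free-text section may contain ordinary bullet points like the "Key things to know" list in the example.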


Does llms.txt Actually Work? An Honest Look at Adoption

Here's the part that separates a useful guide from hype. As of 2026, llms.txt is a proposal that no major AI engine has publicly committed to using in production — and at least one, Google, has explicitly said it does not use it.

The evidence is fairly one-directional:

  • Google has been blunt. Google's John Mueller compared llms.txt to the keywords meta tag — a publisher-controlled tag search engines abandoned long ago because it's too easy to game — and said "AFAIK none of the AI services have said they're using LLMs.TXT." He's also pointed out that, from server logs, "you can tell ... that they don't even check for it." At a Google Search Central event in July 2025, Gary Illyes stated that Google "doesn't support LLMs.txt and isn't planning to."
  • Other major providers haven't committed either. A July 2025 analysis reported that "no major LLM provider currently supports llms.txt. Not OpenAI. Not Anthropic. Not Google," and that AI crawlers were not observed requesting the file during normal site visits.
  • Publishing it ≠ consuming it. This is the nuance most coverage gets wrong. Many well-known companies publish an llms.txt — Anthropic, Vercel, Stripe, Cloudflare, and others all expose one for their docs. But publishing a file is the author's side of the convention. Notably, even Anthropic "publishes its own llms.txt, but doesn't state that its crawlers use" it. A long directory of sites that publish llms.txt tells you the idea is popular with publishers — not that engines read it.
  • A fetch is not an endorsement. You may see an AI crawler occasionally request /llms.txt in your logs. That means a bot fetched a file at a known path; it does not establish that the file changed how that engine sourced, ranked, or cited your content.
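
If you want to check your own logs rather than take anyone's word for it, counting requests is straightforward. The sketch below assumes your server writes the common "combined" access log format; the regular expression and the `count_llms_txt_hits` name are illustrative and may need adjusting for your setup. Keep the last bullet in mind when reading the result: a count shows that a bot asked for the file, not that the file influenced any answer.

```python
import re
from collections import Counter
from typing import Iterable

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(log_lines: Iterable[str]) -> Counter:
    """Count requests for /llms.txt, grouped by user-agent string."""
    hits: Counter = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        # Ignore any query string when comparing the requested path.
        if match and match["path"].split("?", 1)[0] == "/llms.txt":
            hits[match["agent"]] += 1
    return hits
```

Passing an open log file works directly, for example `count_llms_txt_hits(open("access.log", encoding="utf-8", errors="replace"))`, since the function only needs an iterable of lines.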

None of this means llms.txt is wrong or that it won't be adopted later — conventions sometimes start exactly this way, with publishers ahead of platforms. It means you should calibrate expectations: today, llms.txt is low-cost, low-risk housekeeping, not a proven lever for AI visibility. If publishing one is cheap (a plugin, a docs platform, ten minutes by hand), there's little downside. Just don't expect it, on its own, to get you cited.


So What Actually Gets You Cited by AI?

If llms.txt isn't the switch, what is? The same fundamentals that make content findable and trustworthy in the first place — the work that AI answer engines genuinely rely on, because they retrieve from the same indexed, crawlable web:

  • Be crawlable and allow the right bots. Make sure your pages are accessible and that the AI crawlers you want (for example, OpenAI's GPTBot and OAI-SearchBot) aren't blocked in robots.txt. This is the access-control layer llms.txt deliberately doesn't touch.
  • Write clean, extractable answers. Put a direct answer near the top, use clear headings and lists, and keep facts self-contained with units and sources — content a model can lift and quote confidently.
  • Earn authority. Depth on a topic and references from across the web make you a more likely source for AI answers.

This is the territory of Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO) — optimizing not just to rank a link, but to be the source an AI quotes. If you're weighing how much of your effort belongs in classic search versus AI answers, our GEO vs SEO guide lays out the split. llms.txt fits inside that bigger picture as one small, optional, publisher-side tactic — useful to have, not a substitute for being genuinely the best, most retrievable answer.

If you want to see where you actually stand, our SEO & GEO solution audits both at once — classic ranking signals and AI-citation readiness — so you spend effort on what moves AI visibility rather than on files engines may never read.


Bottom Line

llms.txt is a proposed convention — a simple Markdown file at /llms.txt — that gives AI assistants a curated, clean-text map of your most important pages. It's easy to create: an H1, a summary, and a few link lists, hand-written or generated by a plugin. It's not robots.txt — it curates rather than controls, and blocks nothing.

The honest status, as of 2026: it's a publisher-led idea that major AI engines have not committed to using, with Google saying outright that it doesn't. Publish one if it's cheap — it's harmless housekeeping and may matter more later — but don't mistake it for a citation lever. The durable way to be found and quoted by AI is still crawlable, well-structured, authoritative content.


FAQ

What is llms.txt?

llms.txt is a proposed convention — introduced by Jeremy Howard in September 2024 and documented at llmstxt.org — for a single Markdown file placed at the root of your site (yoursite.com/llms.txt) that helps large language models use your website at inference time. Its purpose is to give an AI a concise, curated, clean-text map of your most important pages, because, as the proposal puts it, "context windows are too small to handle most websites in their entirety" and converting full HTML pages into LLM-friendly text is "both difficult and imprecise." In short, it's a hand-curated guide to your best content, written for machines that answer questions.

How does llms.txt work?

The idea is that an AI assistant that needs information about your site can fetch /llms.txt, read your short summary, and follow the curated links to clean Markdown versions of your key pages, instead of crawling and parsing your entire HTML site. The file is meant to be used at inference time (when a user is asking for help), not as training data. Crucially, "works" describes the intended design, not guaranteed behavior: the file only does anything if an AI engine chooses to fetch and act on it, and as of 2026 no major engine has publicly committed to doing so.

How do I create an llms.txt file?

Write a plain-text Markdown file and host it at yoursite.com/llms.txt. The spec requires only an H1 with your site or project name; from there you add a blockquote summary, then optional H2 sections (like ## Docs or ## Products) listing links in the form `- [Page name](https://yoursite.com/page): short note`. Point those links at clean, readable pages, ideally Markdown (.md) versions where you have them. You can write the file by hand for a small site, or generate it automatically: several WordPress plugins and documentation platforms produce and update llms.txt for you. The copyable example earlier in this guide shows the full layout.

Does llms.txt actually work — do AI engines use it?

Honestly: it is unproven, and the evidence so far is skeptical. As of 2026, no major AI provider — OpenAI, Google, Anthropic, or Meta — has publicly committed to reading llms.txt in production. Google has said directly that it does not use the file: Google's John Mueller compared it to the long-ignored keywords meta tag and noted that, from server logs, "you can tell ... that they don't even check for it," and Google's Gary Illyes said Google "doesn't support LLMs.txt and isn't planning to." Some sites report AI crawlers occasionally fetching the file, but fetching a file is not the same as letting it influence how an engine sources or cites content. Treat llms.txt as low-cost housekeeping, not a reliable visibility lever.

Is llms.txt the same as robots.txt?

No. They share a naming style and a location at the site root, but they do opposite jobs. robots.txt is access control: it tells crawlers which URLs they may or may not fetch, and major search engines honor it as part of their crawl protocol. llms.txt contains no directives and grants or denies nothing, so it cannot block a crawler or hide a page. It's a curation file: a hand-made guide that points language models toward your most useful content, closer in spirit to a sitemap than to a gatekeeper. You can (and often should) have both, because they solve different problems.