@25xcodes/llmstxt-parser

Parse and validate llms.txt files with RAG utilities.

Installation

bash

npm install @25xcodes/llmstxt-parser

Quick Start

typescript

import { parseLLMSTxt, validateLLMSTxt, fetchLLMSTxt } from '@25xcodes/llmstxt-parser'

// Parse markdown content
const doc = parseLLMSTxt(`# My Project
> A brief description.

## Documentation
- [Getting Started](https://example.com/docs): Quick start guide
`)

console.log(doc.title)   // "My Project"
console.log(doc.summary) // "A brief description."
console.log(doc.links)   // [{ title: "Getting Started", ... }]

// Validate
const result = validateLLMSTxt(doc)
console.log(result.valid) // true
console.log(result.score) // 100

// Fetch from URL or domain
const remoteDdc = await fetchLLMSTxt('example.com')

API Reference

`parseLLMSTxt(markdown: string): LLMSTxtDocument`

Parse llms.txt markdown into a structured document.

Parameters:

markdown - Raw markdown content

Returns: LLMSTxtDocument with title, summary, sections, and links.

`validateLLMSTxt(doc: LLMSTxtDocument): LLMSTxtValidationResult`

Validate a parsed document per the llmstxt.org specification.

Returns: Validation result with:

valid - Whether document passes validation
score - Quality score (0-100)
errors - Spec violations
warnings - Recommendations

`fetchLLMSTxt(urlOrDomain: string, options?): Promise<LLMSTxtDocument>`

Fetch and parse llms.txt from a URL or domain.

Tries well-known paths in order:

/llms.txt
/llms-full.txt
/.well-known/llms.txt

Options:

timeout - Request timeout in ms (default: 10000)
checkFull - Also check for llms-full.txt (default: true)
corsProxy - CORS proxy URL for browser environments

`discoverLLMSTxtFiles(domain: string, options?): Promise<DiscoveredFile[]>`

Discover all llms.txt files available for a domain.

`estimateTokens(doc: LLMSTxtDocument): TokenEstimate`

Estimate token count (~4 chars per token).

`toRAGFormat(doc: LLMSTxtDocument): string`

Convert to plain text format optimized for RAG/embedding.

`extractLinksForIndex(doc: LLMSTxtDocument): RAGLinkEntry[]`

Extract structured link data for vector database indexing.

`parseAndValidate(markdown: string): { document, validation }`

Parse and validate in one call.

Types

typescript

interface LLMSTxtDocument {
  title: string
  summary?: string
  sections: LLMSTxtSection[]
  links: LLMSTxtLink[]
  raw: string
  sourceUrl?: string
  isFull?: boolean
}

interface LLMSTxtLink {
  title: string
  url: string
  description?: string
  section?: string
  optional?: boolean
}

interface LLMSTxtValidationResult {
  valid: boolean
  score: number
  errors: LLMSTxtValidationError[]
  warnings: LLMSTxtValidationWarning[]
}

Browser Usage

For browser environments, use the corsProxy option:

typescript

const doc = await fetchLLMSTxt('example.com', {
  corsProxy: 'https://your-cors-proxy.workers.dev'
})

Features

✅ Zero runtime dependencies
✅ Full TypeScript support
✅ Dual ESM/CJS exports
✅ Works in Node.js and browsers
✅ Quality scoring per llmstxt.org spec
✅ RAG utilities for AI workflows

Source Code

View on GitHub

@25xcodes/llmstxt-parser ​

Installation ​

Quick Start ​

API Reference ​

parseLLMSTxt(markdown: string): LLMSTxtDocument ​

validateLLMSTxt(doc: LLMSTxtDocument): LLMSTxtValidationResult ​

fetchLLMSTxt(urlOrDomain: string, options?): Promise<LLMSTxtDocument> ​

discoverLLMSTxtFiles(domain: string, options?): Promise<DiscoveredFile[]> ​

estimateTokens(doc: LLMSTxtDocument): TokenEstimate ​

toRAGFormat(doc: LLMSTxtDocument): string ​

extractLinksForIndex(doc: LLMSTxtDocument): RAGLinkEntry[] ​

parseAndValidate(markdown: string): { document, validation } ​

Types ​

Browser Usage ​

Features ​

Source Code ​

@25xcodes/llmstxt-parser

Installation

Quick Start

API Reference

`parseLLMSTxt(markdown: string): LLMSTxtDocument`

`validateLLMSTxt(doc: LLMSTxtDocument): LLMSTxtValidationResult`

`fetchLLMSTxt(urlOrDomain: string, options?): Promise<LLMSTxtDocument>`

`discoverLLMSTxtFiles(domain: string, options?): Promise<DiscoveredFile[]>`

`estimateTokens(doc: LLMSTxtDocument): TokenEstimate`

`toRAGFormat(doc: LLMSTxtDocument): string`

`extractLinksForIndex(doc: LLMSTxtDocument): RAGLinkEntry[]`

`parseAndValidate(markdown: string): { document, validation }`

Types

Browser Usage

Features

Source Code