feat(webapp): add llms.txt and markdown page versions for AI agent visibility #5369

nimrodkra · 2026-01-25T10:11:48Z

Summary

Implements the llms.txt specification to improve AI agent visibility and discoverability for daily.dev.

- Add /llms.txt with comprehensive platform documentation for AI agents
- Add markdown versions of key directory pages (/sources.md, /tags.md, /squads/discover.md)
- Configure proper headers and rewrites for all new endpoints

Changes

llms.txt Implementation

- Created /packages/webapp/public/llms.txt with platform overview, features, products, and sitemap references
- Added robots.txt reference to llms.txt for crawler discovery
- Configured Content-Type: text/plain; charset=utf-8 headers per spec

Markdown Page Versions

- Created API routes for generating markdown versions of directory pages:
  - /api/md/sources - Sources directory
  - /api/md/tags - Tags directory
  - /api/md/squads - Squads directory
- Added rewrites to serve at .md URLs (/sources.md, /tags.md, /squads/discover.md)
- CDN-cached responses (1 hour TTL with stale-while-revalidate)

Test plan

- Verify /llms.txt is accessible and returns correct Content-Type
- Verify /sources.md, /tags.md, /squads/discover.md return markdown content
- Verify Cache-Control headers are set correctly
- Test with AI tools (Claude, ChatGPT) to ensure discoverability
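
The header checks in this test plan could be automated along these lines. This is a minimal sketch, not part of the PR: `parseCacheControl` and `checkTextHeaders` are hypothetical helper names, and the check only asserts what the description states (a text/plain Content-Type and some max-age or s-maxage directive).

```typescript
// Parse a Cache-Control value into a map of directive -> value.
// Directives without a value (e.g. "public") map to true.
const parseCacheControl = (value: string): Map<string, string | true> => {
  const directives = new Map<string, string | true>();
  for (const part of value.split(',')) {
    const [name, arg] = part.trim().split('=');
    if (name) {
      directives.set(name.toLowerCase(), arg ?? true);
    }
  }
  return directives;
};

// Hypothetical check for the test plan: returns the list of problems found,
// so an empty array means the headers look as expected.
const checkTextHeaders = (
  contentType: string | null,
  cacheControl: string | null,
): string[] => {
  const problems: string[] = [];
  if (!contentType?.toLowerCase().startsWith('text/plain')) {
    problems.push(`unexpected Content-Type: ${contentType}`);
  }
  const directives = parseCacheControl(cacheControl ?? '');
  if (!directives.has('max-age') && !directives.has('s-maxage')) {
    problems.push('Cache-Control has no max-age or s-maxage');
  }
  return problems;
};
```

Against the preview deployment, the two arguments would come from `response.headers.get('content-type')` and `response.headers.get('cache-control')` on a `fetch` response for each endpoint.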

Preview domain

https://feat-llms-txt.preview.app.daily.dev

Add llms.txt following the llmstxt.org specification to improve discoverability by AI agents and LLMs. The file includes:
- Platform overview and key capabilities
- Core features (Feed, Squads, Sources, Tags, Posts, Search)
- Products (Plus, Jobs, Recruiter, DevCard, Extension)
- Developer resources (GitHub, docs, changelog)
- Sitemap references (static and dynamic API sitemaps)

Add API routes that generate markdown versions of key pages:
- /api/md/sources - Sources directory in markdown
- /api/md/tags - Tags directory in markdown
- /api/md/squads - Public Squads directory in markdown

Features:
- CDN-cached responses (1 hour TTL with stale-while-revalidate)
- Proper Content-Type headers for llms.txt compliance
- On-demand generation (no build time impact)
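
The behaviour described for these routes (GET only, text/plain, CDN caching on success) can be sketched as a pure function. This is an illustration under stated assumptions, not the PR's handler: `buildMarkdownResponse` and `MarkdownResponse` are hypothetical names, rendering is synchronous here for brevity whereas the real routes fetch data asynchronously, and the exact header string and the 405 body text are assumed.

```typescript
// Minimal result shape; a real Next.js API route would write these to `res`.
type MarkdownResponse = {
  status: number;
  headers: Record<string, string>;
  body: string;
};

// 1 hour at the CDN, then serve stale for up to a day while revalidating.
// The durations come from the review; the exact string is an assumption.
const CACHE_CONTROL = 'public, s-maxage=3600, stale-while-revalidate=86400';

const buildMarkdownResponse = (
  method: string,
  render: () => string,
): MarkdownResponse => {
  const headers = { 'Content-Type': 'text/plain; charset=utf-8' };
  if (method !== 'GET') {
    // Body text is an assumption; the source only mentions the 405 status.
    return { status: 405, headers, body: 'Method not allowed' };
  }
  try {
    return {
      status: 200,
      headers: { ...headers, 'Cache-Control': CACHE_CONTROL },
      body: render(),
    };
  } catch {
    // Failures carry no Cache-Control header, so they are not CDN-cached.
    return { status: 500, headers, body: 'Internal server error' };
  }
};
```

Keeping the cache header off the error path is a design choice in this sketch: a transient backend failure should not be served to agents for the next hour.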

Map .md URLs to markdown API routes:
- /sources.md -> /api/md/sources
- /tags.md -> /api/md/tags
- /squads/discover.md -> /api/md/squads

This follows the llms.txt spec for providing markdown versions of pages at the same URL with .md appended.
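
The mapping above could be expressed roughly as follows. This is a minimal sketch rather than the PR's actual next.config.ts: the `Rewrite` type and the `resolveMarkdownRoute` helper are hypothetical and only mirror the `{ source, destination }` shape that Next.js rewrite entries use.

```typescript
// Local stand-in for a Next.js rewrite entry (assumption: only `source`
// and `destination` are needed here).
type Rewrite = { source: string; destination: string };

// The three .md URLs from this PR and the API routes that back them.
const markdownRewrites: Rewrite[] = [
  { source: '/sources.md', destination: '/api/md/sources' },
  { source: '/tags.md', destination: '/api/md/tags' },
  { source: '/squads/discover.md', destination: '/api/md/squads' },
];

// Hypothetical helper: look up which API route serves a given .md URL.
// Returns undefined for paths that have no markdown version.
const resolveMarkdownRoute = (pathname: string): string | undefined =>
  markdownRewrites.find((rewrite) => rewrite.source === pathname)?.destination;
```

In a Next.js config, an array like `markdownRewrites` would be returned from the async `rewrites()` function.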

Reference the new .md page versions in llms.txt:
- /sources.md - Sources directory in markdown
- /tags.md - Tags directory in markdown
- /squads/discover.md - Squads directory in markdown

This helps AI agents discover the markdown versions of key pages.

vercel · 2026-01-25T10:11:53Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
daily-webapp	Ready	Preview	Jan 25, 2026 10:22am

1 Skipped Deployment

Project	Deployment	Review	Updated (UTC)
storybook	Ignored		Jan 25, 2026 10:22am

nimrodkra · 2026-01-25T10:14:33Z

@claude please review this pr

claude · 2026-01-25T10:14:49Z

Claude finished @nimrodkra's task in 1m 23s

Code Review Complete ✓

Todo List

- Read and understand all changed files
- Review llms.txt implementation
- Review markdown API routes (sources, tags, squads)
- Review next.config.ts changes
- Review robots.txt changes
- Provide comprehensive feedback

Overall Assessment

This is a solid implementation of the llms.txt specification. The code is clean, well-structured, and follows the spec correctly. I found a few minor issues and suggestions for improvement.

Critical Issues

None found! 🎉

Major Issues

1. Incorrect Cache-Control Header in next.config.ts

Location: packages/webapp/next.config.ts:285

{
  source: '/llms.txt',
  headers: [
    { key: 'Content-Type', value: 'text/plain; charset=utf-8' },
    { key: 'Cache-Control', value: 'public, max-age=86400' },
  ],
},

Issue: The llms.txt file has a 24-hour cache (max-age=86400), but it's a static file that won't change frequently. This is fine, but it's inconsistent with the markdown API routes which use s-maxage=3600 (1 hour) with stale-while-revalidate=86400.

Recommendation: Either:

- Use consistent caching between static llms.txt and the markdown endpoints, OR
- Add stale-while-revalidate to the llms.txt header for better performance: public, max-age=86400, stale-while-revalidate=604800
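
To make the two windows in that header concrete, the sketch below classifies a cached copy by its age. It is illustrative only: `classifyFreshness` is a hypothetical helper, and it models the general stale-while-revalidate semantics (fresh until max-age, then served stale while a background refresh runs), not any particular CDN's behaviour.

```typescript
type Freshness = 'fresh' | 'stale-while-revalidate' | 'expired';

// Classify a cached response by its age in seconds, given the two windows
// from a header such as "max-age=86400, stale-while-revalidate=604800".
const classifyFreshness = (
  ageSeconds: number,
  maxAge: number,
  staleWhileRevalidate: number,
): Freshness => {
  if (ageSeconds < maxAge) {
    return 'fresh'; // served from cache, no revalidation needed
  }
  if (ageSeconds < maxAge + staleWhileRevalidate) {
    return 'stale-while-revalidate'; // served stale, refreshed in background
  }
  return 'expired'; // must be fetched again before serving
};
```

With the suggested values, a copy of llms.txt is fresh for the first day and may be served stale for a further seven days while a refresh happens in the background.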

Minor Issues

1. Missing Error Type in Catch Blocks

Locations:

packages/webapp/pages/api/md/sources.ts:65-67
packages/webapp/pages/api/md/tags.ts:92-94
packages/webapp/pages/api/md/squads.ts:103-105

Issue: All three API routes catch errors without typing them. While console.error works with unknown, it's better practice to type error handling explicitly.

Current:

} catch (error) {
  // eslint-disable-next-line no-console
  console.error('Error generating sources markdown:', error);
  res.status(500).send('Internal server error');
}

Suggested:

} catch (error: unknown) {
  // eslint-disable-next-line no-console
  console.error('Error generating sources markdown:', error);
  res.status(500).send('Internal server error');
}

Or if you need to access error properties:

} catch (error: unknown) {
  const message = error instanceof Error ? error.message : 'Unknown error';
  console.error('Error generating sources markdown:', message);
  res.status(500).send('Internal server error');
}

Note: The eslint-disable-next-line no-console comments are fine for server-side logging in API routes.

2. Potential XSS Risk in Markdown Generation

Locations:

packages/webapp/pages/api/md/sources.ts:13-16
packages/webapp/pages/api/md/squads.ts:19-25

Issue: Source/squad names and descriptions from the API are directly interpolated into markdown without sanitization. If an attacker could inject malicious content into source/squad metadata in the backend, it could alter the structure of the generated markdown (for example, by injecting links), and could lead to XSS if a consumer later renders that markdown as HTML.

Risk Level: Low (requires backend compromise), but worth noting.

Suggestion: Consider using a markdown escaping utility or sanitization function for user-generated content. Example:

const escapeMarkdown = (text: string): string => {
  return text.replace(/[\\`*_{}[\]()#+\-.!]/g, '\\$&');
};

const formatSource = (source: Source): string => {
  const name = escapeMarkdown(source.name);
  const description = source.description ? `: ${escapeMarkdown(source.description)}` : '';
  return `- [${name}](/sources/${source.handle})${description}`;
};
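
To illustrate the effect, the sketch below runs the suggested helper on a hostile name. The `Source` type is a hypothetical minimal shape (the real type has more fields) and the input value is made up for the example.

```typescript
// Hypothetical minimal shape; the real Source type has more fields.
type Source = { name: string; handle: string; description?: string };

// Same escaping rule as suggested above: prefix markdown punctuation
// with a backslash so it is treated as literal text.
const escapeMarkdown = (text: string): string =>
  text.replace(/[\\`*_{}[\]()#+\-.!]/g, '\\$&');

const formatSource = (source: Source): string => {
  const name = escapeMarkdown(source.name);
  const description = source.description
    ? `: ${escapeMarkdown(source.description)}`
    : '';
  return `- [${name}](/sources/${source.handle})${description}`;
};

// A made-up name that tries to close the link text early and smuggle in
// its own link target.
const hostile: Source = {
  name: 'News](https://evil.example) [x',
  handle: 'news',
};
const line = formatSource(hostile);
```

After escaping, the brackets and parentheses in the name are backslash-prefixed, so the only unescaped link target left in `line` is `/sources/news`. Note that `source.handle` is interpolated into the URL as-is in this sketch.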

3. Inconsistent Content-Type Configuration

Location: packages/webapp/next.config.ts:289-293

Issue: You configure headers for /:path*.md to set Content-Type: text/plain, but this pattern will match any .md file in any path. This could affect legitimate markdown files in the future.

Current:

{
  source: '/:path*.md',
  headers: [
    { key: 'Content-Type', value: 'text/plain; charset=utf-8' },
  ],
},

Suggestion: Be more specific to only match your intended routes:

{
  source: '/:path(sources|tags|squads/discover).md',
  headers: [
    { key: 'Content-Type', value: 'text/plain; charset=utf-8' },
  ],
},
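
The difference in scope between the two patterns can be illustrated with plain regular expressions. These are hand-written approximations for illustration only; they are not what Next.js generates from the `source` strings, and `samplePaths` and `matchedBy` are hypothetical names.

```typescript
// Approximation of '/:path*.md': any path ending in .md.
const broadPattern = /^\/(?:.*)\.md$/;
// Approximation of '/:path(sources|tags|squads/discover).md'.
const narrowPattern = /^\/(?:sources|tags|squads\/discover)\.md$/;

const samplePaths = [
  '/sources.md',
  '/tags.md',
  '/squads/discover.md',
  '/docs/readme.md', // an unrelated markdown file added in the future
];

// Return the sample paths that a given pattern would apply headers to.
const matchedBy = (pattern: RegExp): string[] =>
  samplePaths.filter((path) => pattern.test(path));
```

Here `matchedBy(broadPattern)` includes `/docs/readme.md`, while `matchedBy(narrowPattern)` returns only the three intended routes.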

Positive Observations

✅ Excellent adherence to llms.txt spec: Content-Type headers are correct, structure follows guidelines
✅ Good caching strategy: Using s-maxage with stale-while-revalidate for CDN efficiency
✅ Clean separation of concerns: API routes are well-structured and focused
✅ Proper HTTP method validation: All routes correctly reject non-GET requests
✅ Good error handling: Consistent error responses across all routes
✅ Comprehensive documentation: The llms.txt file is thorough and well-organized
✅ Smart URL structure: Using rewrites to serve API routes at clean .md URLs
✅ Robots.txt integration: Correctly references llms.txt per spec

Suggestions for Enhancement

Add TypeScript return type annotations to the format functions in the API routes (though they're inferred correctly)
Consider adding a sitemap index for the markdown pages in sitemap.txt
Add JSDoc comments to the API route handlers explaining their purpose
Consider rate limiting on these endpoints if they become heavily used by AI agents

Testing Recommendations

Per your test plan, ensure you verify:

✅ Content-Type headers are correct
✅ Cache-Control headers work as expected
⚠️ Test with special characters in source/squad names and descriptions
✅ Verify markdown formatting renders correctly in AI tools
✅ Test error scenarios (GraphQL failures, network issues)

Summary

This is a well-implemented feature that correctly follows the llms.txt specification. The code is production-ready with only minor suggestions for improvement. The main issues are:

Consider consistent caching strategy or add stale-while-revalidate to llms.txt
Consider escaping user-generated content in markdown to prevent potential XSS
Make the Content-Type header pattern more specific

Great work! 🚀

- Add stale-while-revalidate to llms.txt cache header for consistency
- Add explicit error: unknown type to catch blocks in all API routes
- Add escapeMarkdown utility to sanitize user-generated content (XSS prevention)
- Make Content-Type header pattern more specific to only match intended routes

nimrodkra · 2026-01-25T10:18:49Z

@claude let's do a 2nd round of review

claude · 2026-01-25T10:19:00Z

Claude finished @nimrodkra's task in 1m 6s

Second Round Review Complete ✅

Todo List

Read all changed files in the PR
Review recent commits to understand what was addressed
Analyze current implementation for any remaining or new issues
Provide comprehensive second-round review feedback

Excellent Progress! 🎉

All major and minor issues from the first review have been properly addressed in commit fea94c5. The implementation is now production-ready.

What Was Fixed ✅

1. Cache-Control Header for llms.txt

Status: ✅ FIXED

Location: packages/webapp/next.config.ts:287

{
  key: 'Cache-Control',
  value: 'public, max-age=86400, stale-while-revalidate=604800',
}

✅ Now includes stale-while-revalidate=604800 for better CDN performance and consistency with markdown routes.

2. Explicit Error Typing

Status: ✅ FIXED

Locations: All three API routes now have proper error typing

} catch (error: unknown) {
  // eslint-disable-next-line no-console
  console.error('Error generating sources markdown:', error);
  res.status(500).send('Internal server error');
}

✅ Explicit error: unknown type annotation added to all catch blocks.

3. XSS Prevention with Markdown Escaping

Status: ✅ FIXED

Locations:

packages/webapp/pages/api/md/sources.ts:17-19, 22-26
packages/webapp/pages/api/md/squads.ts:23-25, 28-35, 85

const escapeMarkdown = (text: string): string => {
  return text.replace(/[\\`*_{}[\]()#+\-.!|]/g, '\\$&');
};

✅ Comprehensive markdown escaping utility added with proper character escaping
✅ Applied to all user-generated content (source names/descriptions, squad names/descriptions, category titles)
✅ Prevents potential XSS attacks through malicious markdown injection

4. Content-Type Header Pattern

Status: ✅ FIXED

Location: packages/webapp/next.config.ts:292

{
  source: '/:path(sources|tags|squads/discover).md',
  headers: [
    { key: 'Content-Type', value: 'text/plain; charset=utf-8' },
  ],
},

✅ Pattern is now specific to only intended routes (sources, tags, squads/discover)
✅ Won't accidentally match other .md files in the future

Additional Observations

Tags Route - No Escaping Applied

Location: packages/webapp/pages/api/md/tags.ts:12-14, 58

The tags route formats tag values but doesn't use the escapeMarkdown utility:

const formatTag = (tag: { value: string }): string => {
  return `- [${tag.value}](/tags/${tag.value})`;
};

Assessment: This is acceptable because:

Tag values are controlled/validated by the backend system
Tags typically contain only alphanumeric characters and hyphens
They're not free-form user input like source/squad descriptions

However, for defense-in-depth, you could apply escaping here too. Current implementation is fine though.
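
If the defense-in-depth option were taken, it might look like the sketch below. This is an assumption about one possible shape, not code from the PR: the visible label is escaped with the same `escapeMarkdown` rule shown earlier, and percent-encoding the URL segment with `encodeURIComponent` is an additional idea that the review did not propose.

```typescript
// Same escaping rule as the utility added to the sources and squads routes.
const escapeMarkdown = (text: string): string =>
  text.replace(/[\\`*_{}[\]()#+\-.!|]/g, '\\$&');

// Escape the visible label and percent-encode the URL segment separately,
// since link text and link destinations have different rules.
const formatTag = (tag: { value: string }): string =>
  `- [${escapeMarkdown(tag.value)}](/tags/${encodeURIComponent(tag.value)})`;
```

One side effect worth knowing: hyphens and dots in labels also get a backslash (for example `react\-native`), which renders as the plain character in CommonMark but makes the raw text slightly noisier for agents that read it unrendered.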

JSDoc Comments

While the escapeMarkdown functions have good inline comments explaining their purpose, the main handler functions could benefit from JSDoc comments. This is a nice-to-have, not a requirement:

/**
 * API route handler that generates a markdown version of the sources directory.
 * Returns a formatted markdown document with trending, popular, recent, and video sources.
 * 
 * @route GET /api/md/sources
 * @returns {200} Markdown formatted sources directory
 * @returns {405} Method not allowed for non-GET requests
 * @returns {500} Internal server error if GraphQL query fails
 */
const handler = async (req: NextApiRequest, res: NextApiResponse): Promise<void> => {
  // ...
};

This is purely a documentation enhancement and not critical.

Final Assessment

Code Quality: Excellent ✅

All review feedback properly addressed
Security concerns resolved with proper escaping
Consistent caching strategy across all endpoints
Clean, maintainable code structure

Spec Compliance: Perfect ✅

Follows llms.txt specification correctly
Proper Content-Type headers
Appropriate cache directives
Good integration with robots.txt

Security: Strong ✅

XSS prevention implemented
Error handling is safe
No sensitive data exposure

Performance: Optimized ✅

CDN-friendly caching with stale-while-revalidate
Efficient GraphQL queries
No unnecessary data fetching

Recommendation

Approved for merge! 🚀

This implementation is production-ready. All critical issues have been resolved, and the remaining suggestions (JSDoc comments, potential tag escaping) are optional enhancements that don't impact functionality or security.

Great work addressing all the feedback quickly and thoroughly!

nimrodkra added 7 commits January 25, 2026 12:03

feat(webapp): configure headers for llms.txt

656bfb3

Add Content-Type and Cache-Control headers for /llms.txt endpoint to comply with llmstxt.org specification requirements:
- Content-Type: text/plain; charset=utf-8
- Cache-Control: public, max-age=86400 (24 hours)

feat(webapp): add llms.txt reference to robots.txt

0ee3498

Add AI agent resources section to robots.txt pointing to llms.txt for improved discoverability by AI crawlers and agents.

feat(webapp): add Content-Type headers for .md routes

97338cd

Add text/plain; charset=utf-8 Content-Type header for all .md routes to ensure proper handling by AI agents per llms.txt specification.

nimrodkra requested a review from a team as a code owner January 25, 2026 10:11

nimrodkra requested review from AmarTrebinjac, capJavert, omBratteng and rebelchris January 25, 2026 10:11

vercel bot had a problem deploying to Preview – daily-webapp January 25, 2026 10:12 Failure

fix(webapp): fix lint errors in markdown API routes

48feac8

- Fix type-only import for SourceCategory in squads.ts
- Fix prettier formatting in tags.ts (ternary operator line breaks)
- Add curly braces to if statements in tags.ts

vercel bot deployed to Preview – daily-webapp January 25, 2026 10:22 View deployment

idoshamun approved these changes Jan 25, 2026
