@nimrodkra nimrodkra commented Jan 25, 2026

Summary

Implements the llms.txt specification to improve AI agent visibility and discoverability for daily.dev.

  • Add /llms.txt with comprehensive platform documentation for AI agents
  • Add markdown versions of key directory pages (/sources.md, /tags.md, /squads/discover.md)
  • Configure proper headers and rewrites for all new endpoints

Changes

llms.txt Implementation

  • Created /packages/webapp/public/llms.txt with platform overview, features, products, and sitemap references
  • Added robots.txt reference to llms.txt for crawler discovery
  • Configured Content-Type: text/plain; charset=utf-8 headers per spec

Markdown Page Versions

  • Created API routes for generating markdown versions of directory pages:
    • /api/md/sources - Sources directory
    • /api/md/tags - Tags directory
    • /api/md/squads - Squads directory
  • Added rewrites to serve at .md URLs (/sources.md, /tags.md, /squads/discover.md)
  • CDN-cached responses (1 hour TTL with stale-while-revalidate)
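
A minimal sketch of what one of these routes might look like, assuming a handler that fetches a list, renders it as markdown, and sets the cache header described above. The request/response interfaces, `SourceEntry`, `renderSources`, and `createSourcesHandler` are hypothetical stand-ins so the example runs without Next.js; the real routes use the project's own GraphQL queries and types.

```typescript
// Minimal request/response shapes so the sketch runs without Next.js installed.
// In the real route these would be NextApiRequest / NextApiResponse.
interface MdRequest {
  method?: string;
}

interface MdResponse {
  status(code: number): MdResponse;
  setHeader(name: string, value: string): void;
  send(body: string): void;
}

// Hypothetical data shape; the real fields come from the backend API.
interface SourceEntry {
  name: string;
  handle: string;
}

// 1 hour fresh at the CDN, then stale responses may be served while revalidating.
const MD_CACHE_CONTROL = 's-maxage=3600, stale-while-revalidate=86400';

// Escaping of names is omitted here to keep the sketch short.
const renderSources = (sources: SourceEntry[]): string =>
  ['# Sources', '', ...sources.map((s) => `- [${s.name}](/sources/${s.handle})`)].join('\n');

// fetchSources is injected so the handler can be exercised without a backend.
const createSourcesHandler =
  (fetchSources: () => Promise<SourceEntry[]>) =>
  async (req: MdRequest, res: MdResponse): Promise<void> => {
    if (req.method !== 'GET') {
      res.setHeader('Allow', 'GET');
      res.status(405).send('Method not allowed');
      return;
    }
    try {
      const sources = await fetchSources();
      res.setHeader('Content-Type', 'text/plain; charset=utf-8');
      res.setHeader('Cache-Control', MD_CACHE_CONTROL);
      res.status(200).send(renderSources(sources));
    } catch (error: unknown) {
      console.error('Error generating sources markdown:', error);
      res.status(500).send('Internal server error');
    }
  };
```

Because the markdown is generated per request and cached at the CDN, nothing is added to the build step.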

Test plan

  • Verify /llms.txt is accessible and returns correct Content-Type
  • Verify /sources.md, /tags.md, /squads/discover.md return markdown content
  • Verify Cache-Control headers are set correctly
  • Test with AI tools (Claude, ChatGPT) to ensure discoverability
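
The header checks in this test plan could be automated with a small validation function. This is a sketch under assumptions: `checkLlmsHeaders` and `HeaderCheck` are hypothetical names, and the headers object is assumed to come from whatever HTTP client is used against the preview domain.

```typescript
interface HeaderCheck {
  name: string;
  ok: boolean;
}

// Validates response headers for /llms.txt or a .md endpoint.
const checkLlmsHeaders = (headers: Record<string, string>): HeaderCheck[] => {
  // Header names are case-insensitive, so normalize before lookup.
  const normalized: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    normalized[key.toLowerCase()] = headers[key] ?? '';
  }
  const contentType = normalized['content-type'] ?? '';
  const cacheControl = normalized['cache-control'] ?? '';
  return [
    { name: 'content-type is text/plain', ok: contentType.toLowerCase().startsWith('text/plain') },
    { name: 'charset is utf-8', ok: /charset=utf-8/i.test(contentType) },
    { name: 'cache-control sets max-age or s-maxage', ok: /(^|[,\s])(s-maxage|max-age)=\d+/i.test(cacheControl) },
  ];
};
```

Any entry with `ok: false` points at the header that needs attention.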

Preview domain

https://feat-llms-txt.preview.app.daily.dev

Add llms.txt following the llmstxt.org specification to improve
discoverability by AI agents and LLMs. The file includes:
- Platform overview and key capabilities
- Core features (Feed, Squads, Sources, Tags, Posts, Search)
- Products (Plus, Jobs, Recruiter, DevCard, Extension)
- Developer resources (GitHub, docs, changelog)
- Sitemap references (static and dynamic API sitemaps)

Add Content-Type and Cache-Control headers for /llms.txt endpoint
to comply with llmstxt.org specification requirements:
- Content-Type: text/plain; charset=utf-8
- Cache-Control: public, max-age=86400 (24 hours)

Add AI agent resources section to robots.txt pointing to llms.txt
for improved discoverability by AI crawlers and agents.

Add API routes that generate markdown versions of key pages:
- /api/md/sources - Sources directory in markdown
- /api/md/tags - Tags directory in markdown
- /api/md/squads - Public Squads directory in markdown

Features:
- CDN-cached responses (1 hour TTL with stale-while-revalidate)
- Proper Content-Type headers for llms.txt compliance
- On-demand generation (no build time impact)

Map .md URLs to markdown API routes:
- /sources.md -> /api/md/sources
- /tags.md -> /api/md/tags
- /squads/discover.md -> /api/md/squads
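
The mapping above can be sketched as data in the `{ source, destination }` shape that Next.js rewrites use. The `resolveRewrite` helper is hypothetical and does exact matching purely for illustration; in `next.config.ts` the array would be returned from `rewrites()` and Next.js would do the matching itself.

```typescript
// Same { source, destination } shape that Next.js rewrites use.
interface Rewrite {
  source: string;
  destination: string;
}

const markdownRewrites: Rewrite[] = [
  { source: '/sources.md', destination: '/api/md/sources' },
  { source: '/tags.md', destination: '/api/md/tags' },
  { source: '/squads/discover.md', destination: '/api/md/squads' },
];

// Exact-match resolver for illustration only; not how Next.js matches paths.
const resolveRewrite = (path: string): string | undefined =>
  markdownRewrites.find((r) => r.source === path)?.destination;
```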

This follows the llms.txt spec for providing markdown versions
of pages at the same URL with .md appended.

Add text/plain; charset=utf-8 Content-Type header for all .md routes
to ensure proper handling by AI agents per llms.txt specification.

Reference the new .md page versions in llms.txt:
- /sources.md - Sources directory in markdown
- /tags.md - Tags directory in markdown
- /squads/discover.md - Squads directory in markdown

This helps AI agents discover the markdown versions of key pages.

vercel bot commented Jan 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project       Deployment  Review   Updated (UTC)
daily-webapp  Ready       Preview  Jan 25, 2026 10:22am

1 Skipped Deployment

Project       Deployment  Review   Updated (UTC)
storybook     Ignored              Jan 25, 2026 10:22am


- Fix type-only import for SourceCategory in squads.ts
- Fix prettier formatting in tags.ts (ternary operator line breaks)
- Add curly braces to if statements in tags.ts
@nimrodkra
Member Author

@claude please review this pr


claude bot commented Jan 25, 2026

Claude finished @nimrodkra's task in 1m 23s


Code Review Complete ✓

Todo List

  • Read and understand all changed files
  • Review llms.txt implementation
  • Review markdown API routes (sources, tags, squads)
  • Review next.config.ts changes
  • Review robots.txt changes
  • Provide comprehensive feedback

Overall Assessment

This is a solid implementation of the llms.txt specification. The code is clean, well-structured, and follows the spec correctly. I found one major issue and a few minor ones, with suggestions for each.

Critical Issues

None found! 🎉

Major Issues

1. Inconsistent Cache-Control Header in next.config.ts

Location: packages/webapp/next.config.ts:285

{
  source: '/llms.txt',
  headers: [
    { key: 'Content-Type', value: 'text/plain; charset=utf-8' },
    { key: 'Cache-Control', value: 'public, max-age=86400' },
  ],
},

Issue: llms.txt is a static file that changes rarely, so its 24-hour cache (max-age=86400) is reasonable on its own. It is inconsistent, however, with the markdown API routes, which use s-maxage=3600 (1 hour, shared caches only) with stale-while-revalidate=86400.

Recommendation: Either:

  • Use consistent caching between static llms.txt and the markdown endpoints, OR
  • Add stale-while-revalidate to the llms.txt header for better performance: public, max-age=86400, stale-while-revalidate=604800
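
To make the comparison concrete, here is a small sketch that reads the relevant directives from a Cache-Control value and reports, from a shared cache's point of view, how long a response stays fresh and how long it may be served stale while revalidating. `readDirective` and `sharedCacheWindow` are hypothetical helpers, not part of the PR.

```typescript
interface CacheWindow {
  freshSeconds: number;
  staleSeconds: number;
}

// Reads a numeric directive such as max-age=86400; returns undefined if absent.
const readDirective = (header: string, name: string): number | undefined => {
  for (const part of header.split(',')) {
    const [key = '', value] = part.trim().split('=');
    if (key.toLowerCase() === name && value !== undefined) {
      const seconds = Number(value);
      return Number.isFinite(seconds) ? seconds : undefined;
    }
  }
  return undefined;
};

// For a shared cache (CDN), s-maxage takes precedence over max-age.
const sharedCacheWindow = (header: string): CacheWindow => ({
  freshSeconds: readDirective(header, 's-maxage') ?? readDirective(header, 'max-age') ?? 0,
  staleSeconds: readDirective(header, 'stale-while-revalidate') ?? 0,
});
```

Applied to the two header values discussed here, the llms.txt option gives 86400 fresh seconds plus 604800 stale seconds, while the markdown routes give 3600 plus 86400.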


Minor Issues

1. Missing Error Type in Catch Blocks

Locations:

  • packages/webapp/pages/api/md/sources.ts:65-67
  • packages/webapp/pages/api/md/tags.ts:92-94
  • packages/webapp/pages/api/md/squads.ts:103-105

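Issue: All three API routes catch errors without an explicit type annotation. When useUnknownInCatchVariables is enabled (part of strict since TypeScript 4.4) the catch variable is already unknown, and console.error accepts unknown, so this is a readability point rather than a bug: annotating error: unknown makes the intent explicit.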

Current:

} catch (error) {
  // eslint-disable-next-line no-console
  console.error('Error generating sources markdown:', error);
  res.status(500).send('Internal server error');
}

Suggested:

} catch (error: unknown) {
  // eslint-disable-next-line no-console
  console.error('Error generating sources markdown:', error);
  res.status(500).send('Internal server error');
}

Or if you need to access error properties:

} catch (error: unknown) {
  const message = error instanceof Error ? error.message : 'Unknown error';
  console.error('Error generating sources markdown:', message);
  res.status(500).send('Internal server error');
}

Note: The eslint-disable-next-line no-console comments are fine for server-side logging in API routes.

2. Potential XSS Risk in Markdown Generation

Locations:

  • packages/webapp/pages/api/md/sources.ts:13-16
  • packages/webapp/pages/api/md/squads.ts:19-25

Issue: Source/squad names and descriptions from the API are interpolated directly into markdown without escaping. Malicious content in source/squad metadata could inject markdown (for example, spoofed links or broken list structure) into what AI agents read, and could become XSS if a downstream consumer renders this markdown to HTML without sanitizing it.

Risk Level: Low (requires backend compromise), but worth noting.

Suggestion: Consider using a markdown escaping utility or sanitization function for user-generated content. Example:

const escapeMarkdown = (text: string): string => {
  return text.replace(/[\\`*_{}[\]()#+\-.!]/g, '\\$&');
};

const formatSource = (source: Source): string => {
  const name = escapeMarkdown(source.name);
  const description = source.description ? `: ${escapeMarkdown(source.description)}` : '';
  return `- [${name}](/sources/${source.handle})${description}`;
};
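
As an illustration of what the escaping does on concrete input, the sketch below repeats the helper and additionally percent-encodes the handle used in the link target. The handle encoding, `SourceLike`, and `formatSourceSafe` are assumptions added for the example, not part of the suggestion above; whether handles can actually contain such characters depends on backend validation.

```typescript
const escapeMarkdown = (text: string): string =>
  text.replace(/[\\`*_{}[\]()#+\-.!]/g, '\\$&');

// Hypothetical shape with only the fields the formatter needs.
interface SourceLike {
  name: string;
  handle: string;
  description?: string;
}

// encodeURIComponent leaves parentheses as-is, so encode them separately:
// an unencoded ")" would end the markdown link target early.
const encodeLinkSegment = (segment: string): string =>
  encodeURIComponent(segment).replace(
    /[()]/g,
    (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`,
  );

const formatSourceSafe = (source: SourceLike): string => {
  const name = escapeMarkdown(source.name);
  const description = source.description ? `: ${escapeMarkdown(source.description)}` : '';
  return `- [${name}](/sources/${encodeLinkSegment(source.handle)})${description}`;
};
```

With this, a name such as `C# [beta]` is emitted as `C\# \[beta\]`, so the brackets can no longer close the link label.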


3. Inconsistent Content-Type Configuration

Location: packages/webapp/next.config.ts:289-293

Issue: You configure headers for /:path*.md to set Content-Type: text/plain, but this pattern will match any .md file in any path. This could affect legitimate markdown files in the future.

Current:

{
  source: '/:path*.md',
  headers: [
    { key: 'Content-Type', value: 'text/plain; charset=utf-8' },
  ],
},

Suggestion: Be more specific to only match your intended routes:

{
  source: '/:path(sources|tags|squads/discover).md',
  headers: [
    { key: 'Content-Type', value: 'text/plain; charset=utf-8' },
  ],
},
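
The intent of the narrowed pattern can be checked with a plain regular expression. This is only an approximation for illustration: `MD_ROUTE` and `isMarkdownRoute` are hypothetical, and Next.js uses its own path matcher rather than this regex.

```typescript
// Plain-RegExp approximation of the narrowed pattern; not Next.js's own matcher.
const MD_ROUTE = /^\/(sources|tags|squads\/discover)\.md$/;

const isMarkdownRoute = (path: string): boolean => MD_ROUTE.test(path);
```

Only the three intended URLs match; a path like `/docs/readme.md` does not.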


Positive Observations

  • Excellent adherence to llms.txt spec: Content-Type headers are correct, structure follows guidelines
  • Good caching strategy: Using s-maxage with stale-while-revalidate for CDN efficiency
  • Clean separation of concerns: API routes are well-structured and focused
  • Proper HTTP method validation: All routes correctly reject non-GET requests
  • Good error handling: Consistent error responses across all routes
  • Comprehensive documentation: The llms.txt file is thorough and well-organized
  • Smart URL structure: Using rewrites to serve API routes at clean .md URLs
  • Robots.txt integration: Correctly references llms.txt per spec

Suggestions for Enhancement

  1. Add TypeScript return type annotations to the format functions in the API routes (though they're inferred correctly)
  2. Consider adding a sitemap index for the markdown pages in sitemap.txt
  3. Add JSDoc comments to the API route handlers explaining their purpose
  4. Consider rate limiting on these endpoints if they become heavily used by AI agents
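
For suggestion 4, a fixed-window limiter is one simple option. The sketch below is an assumption-heavy illustration: `createRateLimiter` is a hypothetical helper, and its in-memory state lives in a single process, so on a serverless or multi-instance deployment a shared store or CDN-level limiting would be needed instead.

```typescript
interface WindowState {
  start: number;
  count: number;
}

// Allows up to `limit` calls per key in each window of `windowMs` milliseconds.
// `now` is injectable so the limiter can be tested with a fake clock.
const createRateLimiter = (limit: number, windowMs: number, now: () => number = Date.now) => {
  const windows = new Map<string, WindowState>();
  return (key: string): boolean => {
    const t = now();
    const state = windows.get(key);
    // Start a new window on first use or once the previous window has elapsed.
    if (!state || t - state.start >= windowMs) {
      windows.set(key, { start: t, count: 1 });
      return true;
    }
    if (state.count >= limit) {
      return false;
    }
    state.count += 1;
    return true;
  };
};
```

A handler would call the returned function with a client identifier and respond with 429 when it returns false.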

Testing Recommendations

Per your test plan, ensure you verify:

  • ✅ Content-Type headers are correct
  • ✅ Cache-Control headers work as expected
  • ⚠️ Test with special characters in source/squad names and descriptions
  • ✅ Verify markdown formatting renders correctly in AI tools
  • ✅ Test error scenarios (GraphQL failures, network issues)

Summary

This is a well-implemented feature that correctly follows the llms.txt specification. The code is production-ready with only minor suggestions for improvement. The main issues are:

  1. Consider consistent caching strategy or add stale-while-revalidate to llms.txt
  2. Consider escaping user-generated content in markdown to prevent potential XSS
  3. Make the Content-Type header pattern more specific

Great work! 🚀


- Add stale-while-revalidate to llms.txt cache header for consistency
- Add explicit error: unknown type to catch blocks in all API routes
- Add escapeMarkdown utility to sanitize user-generated content (XSS prevention)
- Make Content-Type header pattern more specific to only match intended routes
@nimrodkra
Member Author

@claude let's do a 2nd round of review


claude bot commented Jan 25, 2026

Claude finished @nimrodkra's task in 1m 6s


Second Round Review Complete ✅

Todo List

  • Read all changed files in the PR
  • Review recent commits to understand what was addressed
  • Analyze current implementation for any remaining or new issues
  • Provide comprehensive second-round review feedback

Excellent Progress! 🎉

All major and minor issues from the first review have been properly addressed in commit fea94c5. The implementation is now production-ready.

What Was Fixed ✅

1. Cache-Control Header for llms.txt

Status: FIXED

Location: packages/webapp/next.config.ts:287

{
  key: 'Cache-Control',
  value: 'public, max-age=86400, stale-while-revalidate=604800',
}

✅ Now includes stale-while-revalidate=604800 for better CDN performance and consistency with markdown routes.

2. Explicit Error Typing

Status: FIXED

Locations: All three API routes now have proper error typing

} catch (error: unknown) {
  // eslint-disable-next-line no-console
  console.error('Error generating sources markdown:', error);
  res.status(500).send('Internal server error');
}

✅ Explicit error: unknown type annotation added to all catch blocks.

3. XSS Prevention with Markdown Escaping

Status: FIXED

Locations:

  • packages/webapp/pages/api/md/sources.ts:17-19, 22-26
  • packages/webapp/pages/api/md/squads.ts:23-25, 28-35, 85

const escapeMarkdown = (text: string): string => {
  return text.replace(/[\\`*_{}[\]()#+\-.!|]/g, '\\$&');
};

✅ Comprehensive markdown escaping utility added with proper character escaping
✅ Applied to all user-generated content (source names/descriptions, squad names/descriptions, category titles)
✅ Reduces the risk of markdown injection, and of XSS where the markdown is later rendered to HTML

4. Content-Type Header Pattern

Status: FIXED

Location: packages/webapp/next.config.ts:292

{
  source: '/:path(sources|tags|squads/discover).md',
  headers: [
    { key: 'Content-Type', value: 'text/plain; charset=utf-8' },
  ],
},

✅ Pattern is now specific to only intended routes (sources, tags, squads/discover)
✅ Won't accidentally match other .md files in the future

Additional Observations

Tags Route - No Escaping Applied

Location: packages/webapp/pages/api/md/tags.ts:12-14, 58

The tags route formats tag values but doesn't use the escapeMarkdown utility:

const formatTag = (tag: { value: string }): string => {
  return `- [${tag.value}](/tags/${tag.value})`;
};

Assessment: This is acceptable because:

  • Tag values are controlled/validated by the backend system
  • Tags typically contain only alphanumeric characters and hyphens
  • They're not free-form user input like source/squad descriptions

However, for defense-in-depth, you could apply escaping here too. Current implementation is fine though.
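
A defense-in-depth variant might look like the sketch below, which escapes the link label and percent-encodes the tag in the URL. The URL encoding is an assumption added for illustration; whether tag routes expect encoded values, and whether tags with characters such as `+` exist, depends on the backend.

```typescript
const escapeMarkdown = (text: string): string =>
  text.replace(/[\\`*_{}[\]()#+\-.!|]/g, '\\$&');

// Escapes the visible label and percent-encodes the path segment.
const formatTag = (tag: { value: string }): string =>
  `- [${escapeMarkdown(tag.value)}](/tags/${encodeURIComponent(tag.value)})`;
```

For purely alphanumeric tags the output is unchanged; a hyphenated tag only gains a backslash before the hyphen in the label, which markdown renderers display as a plain hyphen.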

JSDoc Comments

While the escapeMarkdown functions have good inline comments explaining their purpose, the main handler functions could benefit from JSDoc comments. This is a nice-to-have, not a requirement:

/**
 * API route handler that generates a markdown version of the sources directory.
 * Returns a formatted markdown document with trending, popular, recent, and video sources.
 * 
 * @route GET /api/md/sources
 * @returns {200} Markdown formatted sources directory
 * @returns {405} Method not allowed for non-GET requests
 * @returns {500} Internal server error if GraphQL query fails
 */
const handler = async (req: NextApiRequest, res: NextApiResponse): Promise<void> => {
  // ...
};

This is purely a documentation enhancement and not critical.

Final Assessment

Code Quality: Excellent ✅

  • All review feedback properly addressed
  • Security concerns resolved with proper escaping
  • Caching with stale-while-revalidate on all endpoints
  • Clean, maintainable code structure

Spec Compliance: Perfect ✅

  • Follows llms.txt specification correctly
  • Proper Content-Type headers
  • Appropriate cache directives
  • Good integration with robots.txt

Security: Strong ✅

  • XSS prevention implemented
  • Error handling is safe
  • No sensitive data exposure

Performance: Optimized ✅

  • CDN-friendly caching with stale-while-revalidate
  • Efficient GraphQL queries
  • No unnecessary data fetching

Recommendation

Approved for merge! 🚀

This implementation is production-ready. All issues raised in the first review have been resolved, and the remaining suggestions (JSDoc comments, potential tag escaping) are optional enhancements that don't impact functionality or security.

Great work addressing all the feedback quickly and thoroughly!

