feat(webapp): add llms.txt and markdown page versions for AI agent visibility #5369
base: main
Add llms.txt following the llmstxt.org specification to improve discoverability by AI agents and LLMs. The file includes:
- Platform overview and key capabilities
- Core features (Feed, Squads, Sources, Tags, Posts, Search)
- Products (Plus, Jobs, Recruiter, DevCard, Extension)
- Developer resources (GitHub, docs, changelog)
- Sitemap references (static and dynamic API sitemaps)
Add Content-Type and Cache-Control headers for /llms.txt endpoint to comply with llmstxt.org specification requirements:
- Content-Type: text/plain; charset=utf-8
- Cache-Control: public, max-age=86400 (24 hours)
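As a rough illustration of the header rule this commit describes, the sketch below expresses it in the shape that Next.js's `headers()` config function returns. The local types `HeaderEntry` and `HeaderRule` and the names `llmsTxtHeaderRule` and `nextConfig` are assumptions made so the example compiles on its own; they are not taken from the PR.

```typescript
// Minimal local types so the sketch compiles without importing from 'next'.
type HeaderEntry = { key: string; value: string };
type HeaderRule = { source: string; headers: HeaderEntry[] };

// The rule for /llms.txt, using the two header values named in the commit message.
const llmsTxtHeaderRule: HeaderRule = {
  source: '/llms.txt',
  headers: [
    { key: 'Content-Type', value: 'text/plain; charset=utf-8' },
    // 86400 seconds = 24 hours
    { key: 'Cache-Control', value: 'public, max-age=86400' },
  ],
};

// Hypothetical excerpt of a Next.js config object; a real next.config.ts
// would export this as the default export alongside its other settings.
const nextConfig = {
  async headers(): Promise<HeaderRule[]> {
    return [llmsTxtHeaderRule];
  },
};
```

Keeping the rule in a named constant makes it easy to unit-test the values without starting a Next.js server.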
Add AI agent resources section to robots.txt pointing to llms.txt for improved discoverability by AI crawlers and agents.
Add API routes that generate markdown versions of key pages:
- /api/md/sources - Sources directory in markdown
- /api/md/tags - Tags directory in markdown
- /api/md/squads - Public Squads directory in markdown
Features:
- CDN-cached responses (1 hour TTL with stale-while-revalidate)
- Proper Content-Type headers for llms.txt compliance
- On-demand generation (no build time impact)
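A route of this kind could look roughly like the sketch below. It is not the PR's code: `MinimalRequest` and `MinimalResponse` stand in for `NextApiRequest` and `NextApiResponse`, `LoadSources` is a hypothetical data loader, and the exact Cache-Control value (in particular `s-maxage` and the stale-while-revalidate window) is an assumption consistent with the "1 hour TTL" note above.

```typescript
// Minimal stand-ins for NextApiRequest / NextApiResponse (assumed shapes).
interface MinimalRequest {
  method?: string;
}
interface MinimalResponse {
  setHeader(name: string, value: string): void;
  status(code: number): MinimalResponse;
  send(body: string): void;
}

interface SourceItem {
  name: string;
  handle: string;
}

// Hypothetical loader; a real route would query the backend here.
type LoadSources = () => Promise<SourceItem[]>;

// Render the directory as a markdown heading followed by a link list.
const renderSourcesMarkdown = (sources: SourceItem[]): string =>
  ['# Sources', '', ...sources.map((s) => `- [${s.name}](/sources/${s.handle})`)].join('\n');

const createSourcesHandler =
  (loadSources: LoadSources) =>
  async (req: MinimalRequest, res: MinimalResponse): Promise<void> => {
    if (req.method !== 'GET') {
      res.status(405).send('Method not allowed');
      return;
    }
    try {
      const sources = await loadSources();
      res.setHeader('Content-Type', 'text/plain; charset=utf-8');
      // One hour of CDN caching; the revalidation window is an assumed value.
      res.setHeader('Cache-Control', 'public, s-maxage=3600, stale-while-revalidate=86400');
      res.status(200).send(renderSourcesMarkdown(sources));
    } catch {
      // A real route would also log the error before responding.
      res.status(500).send('Internal server error');
    }
  };
```

Injecting the loader keeps the handler testable without a network call, and the markdown is generated per request rather than at build time.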
Map .md URLs to markdown API routes:
- /sources.md -> /api/md/sources
- /tags.md -> /api/md/tags
- /squads/discover.md -> /api/md/squads
This follows the llms.txt spec for providing markdown versions of pages at the same URL with .md appended.
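The mapping above can be sketched in the shape that Next.js's `rewrites()` config function returns. The type `RewriteRule`, the constant `markdownRewrites`, the helper `resolveMarkdownRoute`, and the object name `nextConfigWithRewrites` are illustrative assumptions, not names from the PR.

```typescript
type RewriteRule = { source: string; destination: string };

// One rewrite per .md URL listed in the commit message.
const markdownRewrites: RewriteRule[] = [
  { source: '/sources.md', destination: '/api/md/sources' },
  { source: '/tags.md', destination: '/api/md/tags' },
  { source: '/squads/discover.md', destination: '/api/md/squads' },
];

// Hypothetical excerpt of a Next.js config object.
const nextConfigWithRewrites = {
  async rewrites(): Promise<RewriteRule[]> {
    return markdownRewrites;
  },
};

// Illustrative helper: look up where a public .md path is served from.
const resolveMarkdownRoute = (path: string): string | undefined => {
  for (const rule of markdownRewrites) {
    if (rule.source === path) {
      return rule.destination;
    }
  }
  return undefined;
};
```

A rewrite serves the API route's response under the public `.md` URL without redirecting, so the address an agent requested stays unchanged.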
Add text/plain; charset=utf-8 Content-Type header for all .md routes to ensure proper handling by AI agents per llms.txt specification.
Reference the new .md page versions in llms.txt:
- /sources.md - Sources directory in markdown
- /tags.md - Tags directory in markdown
- /squads/discover.md - Squads directory in markdown
This helps AI agents discover the markdown versions of key pages.
- Fix type-only import for SourceCategory in squads.ts
- Fix prettier formatting in tags.ts (ternary operator line breaks)
- Add curly braces to if statements in tags.ts
@claude please review this pr
Claude finished @nimrodkra's task in 1m 23s
Code Review Complete ✓
Overall Assessment
This is a solid implementation of the llms.txt specification. The code is clean, well-structured, and follows the spec correctly. I found a few minor issues and suggestions for improvement.
Critical Issues
None found! 🎉
Major Issues
1. Incorrect Cache-Control Header in next.config.ts
Location:
{
source: '/llms.txt',
headers: [
{ key: 'Content-Type', value: 'text/plain; charset=utf-8' },
{ key: 'Cache-Control', value: 'public, max-age=86400' },
],
},
Issue: The
Recommendation: Either:
Minor Issues
1. Missing Error Type in Catch Blocks
Locations:
Issue: All three API routes catch errors without typing them. While
Current:
} catch (error) {
// eslint-disable-next-line no-console
console.error('Error generating sources markdown:', error);
res.status(500).send('Internal server error');
}
Suggested:
} catch (error) {
console.error('Error generating sources markdown:', error);
res.status(500).send('Internal server error');
}
Or if you need to access error properties:
} catch (error: unknown) {
const message = error instanceof Error ? error.message : 'Unknown error';
console.error('Error generating sources markdown:', message);
res.status(500).send('Internal server error');
}
Note: The
2. Potential XSS Risk in Markdown Generation
Locations:
Issue: Source/squad names and descriptions from the API are directly interpolated into markdown without sanitization. If an attacker could inject malicious content into source/squad metadata in the backend, it could result in XSS when AI agents parse this markdown.
Risk Level: Low (requires backend compromise), but worth noting.
Suggestion: Consider using a markdown escaping utility or sanitization function for user-generated content.
Example:
const escapeMarkdown = (text: string): string => {
return text.replace(/[\\`*_{}[\]()#+\-.!]/g, '\\$&');
};
const formatSource = (source: Source): string => {
const name = escapeMarkdown(source.name);
const description = source.description ? `: ${escapeMarkdown(source.description)}` : '';
return `- [${name}](/sources/${source.handle})${description}`;
};
3. Inconsistent Content-Type Configuration
Location:
Issue: You configure headers for
Current:
{
source: '/:path*.md',
headers: [
{ key: 'Content-Type', value: 'text/plain; charset=utf-8' },
],
},
Suggestion: Be more specific to only match your intended routes:
{
source: '/:path(sources|tags|squads/discover).md',
headers: [
{ key: 'Content-Type', value: 'text/plain; charset=utf-8' },
],
},
Positive Observations
✅ Excellent adherence to llms.txt spec: Content-Type headers are correct, structure follows guidelines
Suggestions for Enhancement
Testing Recommendations
Per your test plan, ensure you verify:
Summary
This is a well-implemented feature that correctly follows the llms.txt specification. The code is production-ready with only minor suggestions for improvement. The main issues are:
Great work! 🚀
- Add stale-while-revalidate to llms.txt cache header for consistency
- Add explicit error: unknown type to catch blocks in all API routes
- Add escapeMarkdown utility to sanitize user-generated content (XSS prevention)
- Make Content-Type header pattern more specific to only match intended routes
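To make the escaping change concrete, here is a small self-contained sketch of such a utility, using the character class quoted in the second review round (including `|`). The `SourceLike` type and the `formatSourceLine` helper are hypothetical names for illustration, and the sketch assumes handles are already URL-safe, which the PR discussion does not confirm.

```typescript
// Prefix every markdown-significant character with a backslash.
const escapeMarkdown = (text: string): string =>
  text.replace(/[\\`*_{}[\]()#+\-.!|]/g, '\\$&');

// Hypothetical shape of a source entry, for illustration only.
interface SourceLike {
  name: string;
  handle: string;
  description?: string;
}

// Build one markdown list item; only free-text fields are escaped.
// Assumption: `handle` is already URL-safe, so it is interpolated as-is.
const formatSourceLine = (source: SourceLike): string => {
  const name = escapeMarkdown(source.name);
  const description = source.description
    ? `: ${escapeMarkdown(source.description)}`
    : '';
  return `- [${name}](/sources/${source.handle})${description}`;
};

// A name containing link syntax is rendered as literal text:
// escapeMarkdown('[x](y)') returns '\[x\]\(y\)'
```

Escaping at the formatting boundary means a name containing link or emphasis syntax cannot change the structure of the generated document.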
@claude let's do a 2nd round of review
Claude finished @nimrodkra's task in 1m 6s
Second Round Review Complete ✅
Excellent Progress! 🎉
All major and minor issues from the first review have been properly addressed in commit
What Was Fixed ✅
1. Cache-Control Header for llms.txt
Status: ✅ FIXED
Location:
{
key: 'Cache-Control',
value: 'public, max-age=86400, stale-while-revalidate=604800',
}
✅ Now includes
2. Explicit Error Typing
Status: ✅ FIXED
Locations: All three API routes now have proper error typing
} catch (error: unknown) {
// eslint-disable-next-line no-console
console.error('Error generating sources markdown:', error);
res.status(500).send('Internal server error');
}
✅ Explicit
3. XSS Prevention with Markdown Escaping
Status: ✅ FIXED
Locations:
const escapeMarkdown = (text: string): string => {
return text.replace(/[\\`*_{}[\]()#+\-.!|]/g, '\\$&');
};
✅ Comprehensive markdown escaping utility added with proper character escaping
4. Content-Type Header Pattern
Status: ✅ FIXED
Location:
{
source: '/:path(sources|tags|squads/discover).md',
headers: [
{ key: 'Content-Type', value: 'text/plain; charset=utf-8' },
],
},
✅ Pattern is now specific to only intended routes (sources, tags, squads/discover)
Additional Observations
Tags Route - No Escaping Applied
Location:
The tags route formats tag values but doesn't use the
const formatTag = (tag: { value: string }): string => {
return `- [${tag.value}](/tags/${tag.value})`;
};
Assessment: This is acceptable because:
However, for defense-in-depth, you could apply escaping here too. Current implementation is fine though.
JSDoc Comments
While the
/**
* API route handler that generates a markdown version of the sources directory.
* Returns a formatted markdown document with trending, popular, recent, and video sources.
*
* @route GET /api/md/sources
* @returns {200} Markdown formatted sources directory
* @returns {405} Method not allowed for non-GET requests
* @returns {500} Internal server error if GraphQL query fails
*/
const handler = async (req: NextApiRequest, res: NextApiResponse): Promise<void> => {
// ...
};
This is purely a documentation enhancement and not critical.
Final Assessment
Code Quality: Excellent ✅
Spec Compliance: Perfect ✅
Security: Strong ✅
Performance: Optimized ✅
Recommendation
Approved for merge! 🚀 This implementation is production-ready. All critical issues have been resolved, and the remaining suggestions (JSDoc comments, potential tag escaping) are optional enhancements that don't impact functionality or security. Great work addressing all the feedback quickly and thoroughly!
Summary
Implements the llms.txt specification to improve AI agent visibility and discoverability for daily.dev.
- /llms.txt with comprehensive platform documentation for AI agents
- Markdown page versions (/sources.md, /tags.md, /squads/discover.md)
Changes
llms.txt Implementation
- /packages/webapp/public/llms.txt with platform overview, features, products, and sitemap references
- robots.txt reference to llms.txt for crawler discovery
- Content-Type: text/plain; charset=utf-8 headers per spec
Markdown Page Versions
- /api/md/sources - Sources directory
- /api/md/tags - Tags directory
- /api/md/squads - Squads directory
- .md URLs (/sources.md, /tags.md, /squads/discover.md)
Test plan
- /llms.txt is accessible and returns correct Content-Type
- /sources.md, /tags.md, /squads/discover.md return markdown content
Preview domain
https://feat-llms-txt.preview.app.daily.dev
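The checks in the test plan could be scripted along the following lines. This is only a sketch: `MinimalFetch`, `HeaderReader`, `isPlainTextUtf8`, and `findFailingPaths` are hypothetical names, and the fetch function is passed in as a parameter so the example does not depend on any particular runtime.

```typescript
// Assumed minimal shapes; the standard fetch/Response types satisfy them.
type HeaderReader = { get(name: string): string | null };
type MinimalFetch = (url: string) => Promise<{ status: number; headers: HeaderReader }>;

// True when the header is text/plain with a UTF-8 charset,
// ignoring letter case and whitespace around the semicolon.
const isPlainTextUtf8 = (contentType: string | null): boolean =>
  contentType !== null &&
  contentType.toLowerCase().replace(/\s+/g, '') === 'text/plain;charset=utf-8';

// Return the paths that do not respond with 200 and the expected Content-Type.
const findFailingPaths = async (
  fetchFn: MinimalFetch,
  baseUrl: string,
  paths: string[],
): Promise<string[]> => {
  const failures: string[] = [];
  for (const path of paths) {
    const res = await fetchFn(`${baseUrl}${path}`);
    if (res.status !== 200 || !isPlainTextUtf8(res.headers.get('content-type'))) {
      failures.push(path);
    }
  }
  return failures;
};
```

In a runtime that provides a global `fetch`, it could be passed as `fetchFn` together with the preview domain and the four paths from the test plan; an empty result would mean every path passed.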