Conversation

@ascorbic
Contributor

@ascorbic ascorbic commented May 3, 2025

Summary

Adds support for live data to content collections. Defines a new type of content loader that fetches data at runtime rather than build time, allowing users to query the data with an API similar to existing content collections.

import { defineCollection } from "astro:content";

const products = defineCollection({
  type: "live",
  loader: {
    name: "store-loader",
    loadCollection: async ({ filter }) => {
      // ...
      return {
        entries: products.map((product) => ({
          id: product.id,
          data: product,
        })),
      };
    },
    loadEntry: async ({ filter }) => {
      // ...
      return {
        id: filter.id,
        data: product,
      };
    },
  },
});
export const collections = { products };

Links

@ascorbic ascorbic force-pushed the feat/live-loaders branch 3 times, most recently from fcd8968 to 06aeff3 on May 3, 2025 07:19
@ascorbic ascorbic force-pushed the feat/live-loaders branch from 06aeff3 to 9502a4f on May 3, 2025 07:24
@ascorbic ascorbic marked this pull request as draft May 3, 2025 07:25
@ascorbic ascorbic changed the title from Add live content loaders RFC to Live content loaders on May 3, 2025
@ascorbic ascorbic self-assigned this May 3, 2025
@ematipico
Member

I noticed that the RFC doesn't cover error handling. Can we cover that part? Things that could go wrong:

  • timeout while fetching a collection
  • invalid data
  • parsing error of data
  • more?

Do live collections provide a means to handle these errors gracefully? If so, how? If not, how can users mitigate them? Looking forward to seeing this part covered by the RFC.

@sarah11918
Member

I know everyone's super excited for real-time, updating data! 🎉 Just some thoughts from me that came to mind while reading!

You've mentioned user confusion between "similar but different" APIs in the drawbacks, but I worry about using identically-named helper functions that do (and take? and return?) different things.

getEntry() and getCollection() are familiar to existing content collections users. (But, not familiar to someone who hasn't used existing collections!) But with different implementations under the hood, I wonder whether the similarity is actually an advantage, or just a greater chance for confusion? (Assuming I'm understanding this properly.)

From a docs/support standpoint:

"What arguments does getCollection() take?" "Oh, depends which kind of collection you're querying. It can take a filtering function, or a query object."

"What does getCollection() return?" "Oh, it depends. There might be a cacheHint object included."

At that point, I'd probably prefer to document a getLiveEntry() and getLiveCollection() and then you can do whatever you want with those that makes sense for their specific context, and not have to shoehorn them into existing functions. 😄

I could see keeping identical naming for e.g. people being able to keep their existing querying code when swapping out an existing collection for a live content loader collection. But it doesn't seem like this is a smooth (or even envisioned?) path anyway? They aren't even configured in the same file, so treating these as a completely new and different thing seems reasonable?

I'm also assuming that creating new functions probably also means we run less risk of introducing unexpected behaviour in existing projects by changing getCollection() and getEntry() to now be "smart" and know which kind of collection they are querying?

(Feel free to ignore the rest, but I would be remiss if I didn't at least bring it up...)

There's even a world where "Content Collections" remain "at build/request time" and this is an entirely new paradigm (just to open up the world of possibilities)! These functions feel like an example of being superimposed onto our existing collections where maybe they want to be something different. And if you were allowed to operate "outside the collections/loader box", maybe there are more cases where you'd feel the freedom of flexibility? (e.g. trying to follow the established pattern of a config file for something that isn't really a config...) Maybe it's not even a "loader", but a Firehose!

(Would it also be easier to launch/promote/write about? Easier for content creators not to have to update their existing "content collections" material yet again, but instead get to talk about a "new thing" now: real-time, live, updating data? Avoiding the need to further distinguish between "which version of content collections are you using?")

(Again, assuming I'm understanding everything! This just feels like some pretty significant differences that are, as you said, close to, but not exactly the same as, some existing things. And from a docs / support standpoint, these are always the most challenging things to handle!)

@ascorbic
Contributor Author

@sarah11918 my rationale is that they are returning the same thing. We already do a lot of smart stuff under the hood so that the same function works for content layer collections, compat-mode collections, and legacy data and content collections. These all have very different internal implementations, but for the user they are all queried in the same way. I think the best argument for separating them is that these are "more different" than the others.

@ascorbic
Contributor Author

I've updated the RFC with two changes based on @ematipico and @sarah11918's feedback:

  • change to use separate exports: getLiveCollection and getLiveEntry
  • add explicit error handling. It now returns an object with { data, error } instead of directly returning the data or throwing
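
For illustration, here's a minimal usage sketch of that pattern in a page's frontmatter. The property names (entries, entry) follow the examples later in this thread, and the error handling shown is just one option; the exact shape may differ from the final API.

// Hypothetical usage sketch of the { data, error } result shape described above.
import { getLiveCollection, getLiveEntry } from "astro:content";

const { entries: products, error } = await getLiveCollection("products");
if (error) {
  // Handle the failure gracefully instead of throwing during render,
  // e.g. log it and show a fallback UI.
  console.error(error);
}

const { entry: product, error: entryError } = await getLiveEntry("products", Astro.params.id);
if (entryError || !product) {
  return Astro.redirect("/404");
}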

@stipsan

stipsan commented May 27, 2025

  • support for user-defined Zod schemas, executed at runtime, to validate or transform the data returned by the loader

Is it possible for an Astro integration to define generated schemas and plug them into the Astro typegen pipeline?

For Sanity we're interested in providing a loader creator that can hook into our TypeGen pipeline in order to allow a minimal API that is fully typed:

// src/sanity.types.ts (generated by TypeGen, simplified here)
export type Product = {
  _id: string
  _type: 'product'
  slug: string
  title: string
  category: string[]
}


// src/live.config.ts
import {defineLiveCollection, type ExtractFilterFromType} from '@sanity/astro'

const products = defineLiveCollection({type: 'product'});

export const collections = { products };

With the inference we can type it so that TS would throw on a type that doesn't exist in the Sanity Schema:

const products = defineLiveCollection({type: 'produc'});
                                             // ^? 'produc' doesn't exist. Did you mean 'product'?

A valid type lets us infer the return type of defineLiveCollection({type: 'product'}) to be:

// somehow `@sanity/astro` feeds Astro this typegen
import {type InferData, type InferEntryFilter, type InferCollectionFilter} from '@sanity/astro'
import type { LiveLoader } from "astro/loaders";
import type {Product} from './sanity.types'

type ProductData = InferData<Product>
//   ^? {_id:string; _type:'product'; slug: string; title: string; category: string[]}
type ProductEntryFilter = InferEntryFilter<Product>
//   ^? {slug: string; title: string; category: string[]}
type ProductCollectionFilter = InferCollectionFilter<Product>
//   ^? {_id: string | string[]; slug: string | string[]; title: string | string[]; category: string[]}

interface SanityLoaders {
  'product': LiveLoader<ProductData, ProductEntryFilter, ProductCollectionFilter>
}

export function defineLiveCollection<DocumentType extends keyof SanityLoaders>(options: { type: DocumentType }): SanityLoaders[DocumentType]

And all the usage sites would be fully typed:

const { entries: allProducts } = await getLiveCollection("products");
//               ^? {_id:string; _type:'product'; slug: string; title: string; category: string[]}

const { entries: clothes } = await getLiveCollection("products", {
  categories: ["clothes"]
  // ^? `categories` doesn't exist on `ProductCollectionFilter`, did you mean `category`?
});

const { entry: productById } = await getLiveEntry("products", Astro.params.id);
//             ^? {_id:string; _type:'product'; slug: string; title: string; category: string[]}
const { entry: productBySlug } = await getLiveEntry("products", { id: Astro.params.slug });
//                                                              ^? `id` doesn't exist on `ProductEntryFilter`, did you mean `slug`?

@ascorbic
Contributor Author

@stipsan that's interesting. Which part of that would be missing now? Have you seen the existing injectTypes helper that lets you generate a d.ts? The Zod schema is optional, so as it stands, the way you've implemented it there with the generic would type it correctly for the user. You can presumably trust the Sanity API to return the correct data?
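
For context, a rough sketch of how an integration can use the existing injectTypes helper from the astro:config:done hook; the module augmentation content and integration name here are placeholders, not a real Sanity API:

// Hypothetical integration sketch using the existing injectTypes helper.
import type { AstroIntegration } from "astro";

export function sanityTypegen(): AstroIntegration {
  return {
    name: "sanity-typegen",
    hooks: {
      "astro:config:done": ({ injectTypes }) => {
        // In a real integration this content would come from the TypeGen pipeline.
        injectTypes({
          filename: "sanity.d.ts",
          content: `declare module "@sanity/astro" { /* generated types */ }`,
        });
      },
    },
  };
}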

@stipsan

stipsan commented May 28, 2025

@ascorbic Thanks! I’ll check out that helper and give it a spin in a POC 🙌

You can presumably trust the Sanity API to return the correct data?

Yes and no. We don’t yet enforce server-side schema validation on writes to Content Lake; it’s currently handled client-side by Sanity Studio.

However, Sanity Studio isn’t the only entity that can write to Content Lake. Content Lake accepts any valid JSON document. Users frequently automate content imports from external systems (Shopify, Salesforce, Mux, etc.) directly into Content Lake.

The only strict guarantee Content Lake provides is compliance with its query language rules (GROQ). For example, if I query:

*[_type == "post"]{_type, _id, title}

We know upfront, per GROQ rules, this query returns either an array of objects with keys _type, _id, and title, or an empty array. _id and _type are special system properties (_type must be a string, and _id must be globally unique). From the filter, we know _type is 'post'. But title could be any valid JSON. At runtime, the typesafe declaration becomes:

type Json =
  | string
  | number
  | boolean
  | null
  | { [key: string]: Json | undefined }
  | Json[];

type QueryResult = {
  _id: string;
  _type: 'post';
  title: Json;
}[];

The TypeGen can analyze your sanity.config.ts from Sanity Studio to narrow down types:

type QueryResult = {
  _id: string;
  _type: 'post';
  title: string | null;
}[];

However, since this isn’t enforced, parser codegen becomes beneficial.

It seems achievable by using Astro’s createCodegenDir alongside generating Zod parsers when calling injectTypes. I’ll explore implementing Astro’s Content Loader API to verify this.
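
As a rough sketch of that idea (file names are placeholders, and the generated Zod source is assumed to come from a separate TypeGen step), an integration could write the generated parsers into the directory returned by createCodegenDir during astro:config:setup:

// Hypothetical sketch: write generated Zod parsers into Astro's codegen directory
// so loaders can import them at runtime. The generated source is produced
// elsewhere by a TypeGen step and passed in here.
import { writeFileSync } from "node:fs";
import type { AstroIntegration } from "astro";

export function sanityCodegen(generatedZodSource: string): AstroIntegration {
  return {
    name: "sanity-codegen",
    hooks: {
      "astro:config:setup": ({ createCodegenDir }) => {
        const codegenDir = createCodegenDir(); // URL to the integration's .astro/ directory
        writeFileSync(new URL("schemas.mjs", codegenDir), generatedZodSource, "utf-8");
      },
    },
  };
}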

I imagine it’d be helpful for integration authors to have a dedicated API for schema codegen, similar to injectTypes, generating Zod schemas usable by both build-time and live loaders. This is particularly beneficial when dealing with live content previews during publishing workflows.

By default, our TypeGen creates types reflective of draft document states. Drafts may be incomplete or temporarily invalid—like images awaiting captions. Hence, customization options for userland are valuable. Users might be fine with handling string | null, or they might prefer coercion via parsers:

// src/sanity.types.ts (simplified example reflecting live preview states)
export type Product = {
  _id: string;
  _type: 'product';
  slug: string | null;
  title: string | null;
  category: string[] | null;
};

import { z } from 'astro:content';
import { defineLiveCollection } from '@sanity/astro';

const products = defineLiveCollection({
  type: 'product',
  schema: (schema) => schema.extend({
    title: schema.shape.title.unwrap().default('Untitled'),
    category: z.preprocess((val) => (Array.isArray(val) ? val : []), schema.shape.category.unwrap()),
  }),
// ^? the type of Product is now narrowed to `{...Product, title: string | 'Untitled'; category: string[]}`
});

export const collections = { products };

We frequently see this challenge with live preview-capable applications, especially as the studio schema evolves over time.

@boutell

boutell commented Jun 4, 2025

I'm coming at this from a slightly different perspective. For ApostropheCMS, we decided our first priority was to make on-page editing possible within Astro.

To do that, we decided to invert the control flow. While Astro is of course the front end, in a combined ApostropheCMS / Astro project there is a [...slug].astro route that does a lot of the lifting. That Astro route makes a call to ApostropheCMS, which responds with the information Astro needs to render that page.

And even after Astro starts rendering a page template, it still often needs to insert user-edited content at a particular point, in a way that maintains editability. So we do that by providing an ApostropheArea Astro component, which in turn invokes various widget-specific Astro components to render different types of content widgets.

This allows ApostropheCMS development to proceed as normal, but with Astro as the rendering engine, mapping the concepts of our CMS one to one to various folders of Astro templates that produce 100% of the actual markup.

Since the CMS must know and manage the structure of all of the data, it makes sense to make those representations in one place (e.g. the CMS is the model layer), and then let Astro concern itself exclusively with presentation (Astro is the view layer).

So at least for right now, we probably wouldn't use live content loaders very much.

However, it could make sense for us to support them in the future, particularly if we find ourselves talking to customers who are less interested in ApostropheCMS "driving the bus" as it were, and more interested in static builds with the addition of some dynamic content from the CMS, managed via the CMS back end.

This isn't a criticism of the new API, which looks well-suited to its purpose, as long as error handling is taken into account. I bring all this up just to create awareness of different perspectives and use cases that might not map to it as expected.

@ascorbic ascorbic changed the title from Live content loaders to Live content collections on Jun 11, 2025
@gingerchew

gingerchew commented Jun 14, 2025

I just watched the vod of the TBD the other day and saw that the object passed to defineLiveCollection has a property type: 'live'. Is that redundant now that the live collections are defined in a function of their own as opposed to piggybacking on defineCollection? Feels like a footgun I would gladly end up firing multiple times myself 😅

After talking with Fryuni in the discord, I have a better understanding of why the type property is there. Ignore my previous comment :)

@Alynva

Alynva commented Jun 19, 2025

I've read the announcement and the docs, but haven't tested it yet. Is this feature exclusive to SSR? I always find it confusing that the docs don't explicitly state which rendering methods a feature is compatible with. I mean, after understanding what the feature is and what its capabilities are, it's not hard to guess. But it would be way better if it was just stated at the start of the docs page, kinda like the "added in" field...

@ascorbic
Contributor Author

Having spent some time working on building loaders, I think it would be good to add lastModified to cacheHint, and possibly remove maxAge because that's not something that loaders can normally decide.
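
For illustration only, a loadEntry result using that proposed shape might look like the sketch below. fetchProduct and the product fields are placeholders, and this is not the current API.

// Hypothetical sketch of the proposed cacheHint change: report lastModified,
// and omit maxAge since the loader usually can't decide how long to cache.
declare function fetchProduct(id: string): Promise<{ id: string; updatedAt: string }>;

const storeLoader = {
  name: "store-loader",
  loadEntry: async ({ filter }: { filter: { id: string } }) => {
    const product = await fetchProduct(filter.id);
    return {
      id: product.id,
      data: product,
      cacheHint: {
        tags: [`product-${product.id}`],
        lastModified: new Date(product.updatedAt),
      },
    };
  },
};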

@palockocz

Hi, based on a thread on Discord:
https://discord.com/channels/830184174198718474/1390746735290355723

Would it be possible to return other/custom data as well? For example:

return {
  entries: posts.map((post) => ({
    id: post.slug.current,
    data: post,
  })),
  totalPage: 10 // This extra field is not passed through
};

@deslunes

Hi! I'm really interested in content collections, and making them live extends them to more use cases. But I'm a bit concerned about on-request fetching: depending on the collection, it could end up costing more than my budget, or hit service limits.

I don't really know if this should be part of live content collections or a separate suggestion, but I think it could be a great addition to be able to fetch every X amount of time, in a cron-like way, and cache the data until the next fetch happens.

I think it's a great middle ground between on-request fetching and build-time fetching: it's not as responsive, but it gives the developer enough control for the kind of data they want to implement. The content collection could be updated every minute, every hour, or at a specific time on a specific day.
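
In the meantime, one userland mitigation could be a small time-based cache inside the loader itself. This is just a sketch, assuming the loader module stays alive between requests in a long-lived server process; fetchProducts is a placeholder.

// Sketch: cache the upstream response for a fixed interval so repeated
// requests don't each hit the external service.
declare function fetchProducts(): Promise<Array<{ id: string }>>;

const TTL_MS = 60_000; // refetch at most once per minute
let cached: { entries: Array<{ id: string; data: unknown }>; fetchedAt: number } | undefined;

const cachedStoreLoader = {
  name: "cached-store-loader",
  loadCollection: async () => {
    if (cached && Date.now() - cached.fetchedAt < TTL_MS) {
      return { entries: cached.entries };
    }
    const products = await fetchProducts();
    cached = {
      entries: products.map((product) => ({ id: product.id, data: product })),
      fetchedAt: Date.now(),
    };
    return { entries: cached.entries };
  },
};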

@tumes

tumes commented Oct 3, 2025

I apologize because I'm just now starting to wrap my head around the internals of Astro and this may be a moot question because I am just not seeing something but: Is there a mechanism to expose bindings from adapters into loaders? Or perhaps that particular burden will fall more on the adapters than the base loader implementation? Mostly curious because Cloudflare requires service bindings for calls within the same zone so the most immediate solution to get around that is to separate things with custom domains but I reckon exposing those bindings for loaders would be useful even outside of that restriction since most of the bindings are for data stores anyway.

edit: It’s super late where I am so I can’t test it until tomorrow, but my last thought when I was drifting to sleep was whether I could hack around this by passing a binding in as a filter. Consequently would it make sense to have an argument where an unexposed data source can be passed down the chain from a runtime call for a collection or entry, or would that be too clunky or infeasible?

Edit 2: Yep, super hacky but it does work to pass it in as a filter. Do you think there's merit to adding another argument for injecting a dependency like a binding to loaders?
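
For reference, a sketch of that workaround is below. Every name here (STORE, ServiceBinding, the internal URL) is a hypothetical placeholder for a Cloudflare service binding, not an Astro or Cloudflare API guarantee.

// Sketch: pass a runtime binding through the filter argument so the loader can use it.
type ServiceBinding = { fetch: (input: string) => Promise<Response> };

const storeLoader = {
  name: "store-loader",
  loadCollection: async ({ filter }: { filter?: { binding?: ServiceBinding } }) => {
    // Called from a page as, e.g.:
    //   getLiveCollection("products", { binding: Astro.locals.runtime.env.STORE })
    const response = await filter?.binding?.fetch("https://internal/products");
    const products: Array<{ id: string }> = response ? await response.json() : [];
    return { entries: products.map((product) => ({ id: product.id, data: product })) };
  },
};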

@JonathonRP

I don't see any statement anywhere about whether this works fine with just a static build or whether SSR is needed.
So does this work fine with just a static build and deployment?

@delucis
Member

delucis commented Dec 2, 2025

So does this work fine with just static build and deployment?

It does, but there are no advantages over the current content collections.

@JonathonRP

It does, but there are no advantages over the current content collections.

Interesting, so it should work fine as if it were just a regular content collection? But to get the benefits of live, SSR is needed, correct?

@delucis
Member

delucis commented Dec 2, 2025

Interesting, so it should work fine as if it were just a regular content collection? But to get the benefits of live, SSR is needed, correct?

Correct. There may be subtle cases where there are advantages either way, but basically that’s right, yes.

For example, maybe you have a really big data source, and only need some of its data to build the site — a live collection would avoid loading all that data up front in a traditional collection loader. On the flip side, because live loaders load content at render time, if you are building a static site it might be less efficient to repeatedly request that data across multiple pages with a live loader than to load it once upfront.
