Commit 3f3129c

Pijukatel and vdusek authored
feat: Add iterate methods for paginated collections (#771)
Support more convenient iteration through paginated endpoints of collection clients.

Closes: #539
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
1 parent b590b5e commit 3f3129c

26 files changed

Lines changed: 1996 additions & 157 deletions

docs/02_concepts/08_pagination.mdx

Lines changed: 24 additions & 6 deletions
@@ -12,9 +12,10 @@ import ApiLink from '@site/src/components/ApiLink';
 
 import PaginationAsyncExample from '!!raw-loader!./code/08_pagination_async.py';
 import PaginationSyncExample from '!!raw-loader!./code/08_pagination_sync.py';
-
 import IterateItemsAsyncExample from '!!raw-loader!./code/08_iterate_items_async.py';
 import IterateItemsSyncExample from '!!raw-loader!./code/08_iterate_items_sync.py';
+import IterateCollectionAsyncExample from '!!raw-loader!./code/08_iterate_collection_async.py';
+import IterateCollectionSyncExample from '!!raw-loader!./code/08_iterate_collection_sync.py';
 
 Most methods named `list` or `list_something` in the Apify client return a <ApiLink to="class/ListPage">`ListPage`</ApiLink> object. This object provides a consistent interface for working with paginated data and includes the following properties:
 
@@ -45,21 +46,38 @@ The <ApiLink to="class/ListPage">`ListPage`</ApiLink> interface offers several k
 
 ## Generator-based iteration
 
-For most use cases, `iterate_items()` is the recommended way to process all items in a dataset. It handles pagination automatically using a Python generator, fetching items in batches behind the scenes so you don't need to manage offsets or limits yourself.
+For collection clients, the `iterate` method returns an iterator that lazily fetches as many pages as needed
+to retrieve every item matching the filters. For dataset, key-value store and request queue clients, the
+matching helpers are `iterate_items`, `iterate_keys` and `iterate_requests`. They handle pagination
+automatically, so you don't need to manage offsets, limits or cursors yourself.
+
+The example below iterates over every Actor owned by the current user using a collection client's `iterate`
+method:
 
 <Tabs>
 <TabItem value="AsyncExample" label="Async client" default>
 <CodeBlock className="language-python">
-{IterateItemsAsyncExample}
+{IterateCollectionAsyncExample}
 </CodeBlock>
 </TabItem>
 <TabItem value="SyncExample" label="Sync client">
 <CodeBlock className="language-python">
-{IterateItemsSyncExample}
+{IterateCollectionSyncExample}
 </CodeBlock>
 </TabItem>
 </Tabs>
 
-`iterate_items()` accepts the same filtering parameters as `list_items()` (`clean`, `fields`, `omit`, `unwind`, `skip_empty`, `skip_hidden`), so you can combine automatic pagination with data filtering.
+The next example uses `iterate_items` on a dataset client to stream items past a given offset:
 
-Similarly, `KeyValueStoreClient` provides an `iterate_keys()` method for iterating over all keys in a key-value store without manual pagination.
+<Tabs>
+<TabItem value="AsyncExample" label="Async client" default>
+<CodeBlock className="language-python">
+{IterateItemsAsyncExample}
+</CodeBlock>
+</TabItem>
+<TabItem value="SyncExample" label="Sync client">
+<CodeBlock className="language-python">
+{IterateItemsSyncExample}
+</CodeBlock>
+</TabItem>
+</Tabs>
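
The lazy page fetching that the docs describe for `iterate` and `iterate_items` can be pictured with a plain Python generator. The sketch below is not the Apify client's implementation. `iterate_lazily`, `fetch_page`, `page_size` and the fake in-memory store are hypothetical names used only for illustration, and the sketch assumes that `limit` means the total number of items to yield and `offset` the starting position, as the example comments in this commit suggest.

```python
from collections.abc import Callable, Iterator
from typing import Optional


def iterate_lazily(
    fetch_page: Callable[[int, int], list[str]],
    *,
    offset: int = 0,
    limit: Optional[int] = None,
    page_size: int = 1000,
) -> Iterator[str]:
    """Yield items one by one, requesting the next page only when it is needed.

    `fetch_page(offset, limit)` is a hypothetical callable standing in for a
    single paginated API call.
    """
    yielded = 0
    while limit is None or yielded < limit:
        # Never ask for more items than are still missing from `limit`.
        chunk = page_size if limit is None else min(page_size, limit - yielded)
        page = fetch_page(offset + yielded, chunk)
        if not page:
            return  # The backing store has no more items.
        yield from page
        yielded += len(page)


# Fake in-memory store standing in for a paginated endpoint (assumption).
ITEMS = [f'item-{i}' for i in range(25)]
CALLS: list[tuple[int, int]] = []


def fake_fetch_page(offset: int, limit: int) -> list[str]:
    """Record each simulated call and return one slice of the fake store."""
    CALLS.append((offset, limit))
    return ITEMS[offset:offset + limit]


RESULT = list(iterate_lazily(fake_fetch_page, offset=5, limit=12, page_size=5))
```

With these inputs the generator issues three simulated calls and stops as soon as twelve items have been yielded. A caller that breaks out of the loop early never triggers the remaining calls, which is the practical benefit of lazy iteration over collecting every page up front.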

docs/02_concepts/code/08_iterate_collection_async.py

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+from apify_client import ApifyClientAsync
+
+TOKEN = 'MY-APIFY-TOKEN'
+
+
+async def main() -> None:
+    apify_client = ApifyClientAsync(TOKEN)
+
+    # Iterate over all Actors owned by the current user, lazily fetching
+    # as many pages as needed under the hood.
+    async for actor in apify_client.actors().iterate(my=True):
+        print(actor.id)

docs/02_concepts/code/08_iterate_collection_sync.py

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+from apify_client import ApifyClient
+
+TOKEN = 'MY-APIFY-TOKEN'
+
+
+def main() -> None:
+    apify_client = ApifyClient(TOKEN)
+
+    # Iterate over all Actors owned by the current user, lazily fetching
+    # as many pages as needed under the hood.
+    for actor in apify_client.actors().iterate(my=True):
+        print(actor.id)
+
+
+if __name__ == '__main__':
+    main()

docs/02_concepts/code/08_iterate_items_async.py

Lines changed: 8 additions & 3 deletions
@@ -7,6 +7,11 @@ async def main() -> None:
     apify_client = ApifyClientAsync(TOKEN)
     dataset_client = apify_client.dataset('dataset-id')
 
-    # Iterate through all items automatically.
-    async for item in dataset_client.iterate_items():
-        print(item)
+    # Define the pagination parameters
+    limit = 1500  # Number of items in total
+    offset = 100  # Starting offset
+
+    # Iterate through items automatically, lazily sending as many API calls
+    # as needed and receiving items in chunks.
+    async for item in dataset_client.iterate_items(limit=limit, offset=offset):
+        print(item)  # Process the item as needed

docs/02_concepts/code/08_iterate_items_sync.py

Lines changed: 8 additions & 3 deletions
@@ -7,9 +7,14 @@ def main() -> None:
     apify_client = ApifyClient(TOKEN)
     dataset_client = apify_client.dataset('dataset-id')
 
-    # Iterate through all items automatically.
-    for item in dataset_client.iterate_items():
-        print(item)
+    # Define the pagination parameters
+    limit = 1500  # Number of items in total
+    offset = 100  # Starting offset
+
+    # Iterate through items automatically, lazily sending as many API calls
+    # as needed and receiving items in chunks.
+    for item in dataset_client.iterate_items(limit=limit, offset=offset):
+        print(item)  # Process the item as needed
 
 
 if __name__ == '__main__':

docs/02_concepts/code/08_pagination_async.py

Lines changed: 8 additions & 19 deletions
@@ -10,26 +10,15 @@ async def main() -> None:
     dataset_client = apify_client.dataset('dataset-id')
 
     # Define the pagination parameters
-    limit = 1000  # Number of items per page
+    limit = 1000  # Number of items to request from the API
     offset = 0  # Starting offset
-    all_items = []  # List to store all fetched items
 
-    while True:
-        # Fetch a page of items
-        response = await dataset_client.list_items(limit=limit, offset=offset)
-        items = response.items
-        total = response.total
+    # Send a single API call to fetch paginated items.
+    # (The number of items per call can be limited by the API.)
+    paginated_items = await dataset_client.list_items(limit=limit, offset=offset)
 
-        print(f'Fetched {len(items)} items')
+    # Inspect the pagination metadata returned by the API
+    print(paginated_items.total)
 
-        # Add the fetched items to the complete list
-        all_items.extend(items)
-
-        # Exit the loop if there are no more items to fetch
-        if offset + limit >= total:
-            break
-
-        # Increment the offset for the next page
-        offset += limit
-
-    print(f'Overall fetched {len(all_items)} items')
+    for item in paginated_items.items:
+        print(item)  # Process the item as needed

docs/02_concepts/code/08_pagination_sync.py

Lines changed: 8 additions & 19 deletions
@@ -10,26 +10,15 @@ def main() -> None:
     dataset_client = apify_client.dataset('dataset-id')
 
     # Define the pagination parameters
-    limit = 1000  # Number of items per page
+    limit = 1000  # Number of items to request from the API
     offset = 0  # Starting offset
-    all_items = []  # List to store all fetched items
 
-    while True:
-        # Fetch a page of items
-        response = dataset_client.list_items(limit=limit, offset=offset)
-        items = response.items
-        total = response.total
+    # Send a single API call to fetch paginated items.
+    # (The number of items per call can be limited by the API.)
+    paginated_items = dataset_client.list_items(limit=limit, offset=offset)
 
-        print(f'Fetched {len(items)} items')
+    # Inspect the pagination metadata returned by the API
+    print(paginated_items.total)
 
-        # Add the fetched items to the complete list
-        all_items.extend(items)
-
-        # Exit the loop if there are no more items to fetch
-        if offset + limit >= total:
-            break
-
-        # Increment the offset for the next page
-        offset += limit
-
-    print(f'Overall fetched {len(all_items)} items')
+    for item in paginated_items.items:
+        print(item)  # Process the item as needed

docs/04_upgrading/upgrading_to_v3.mdx

Lines changed: 19 additions & 0 deletions
@@ -320,3 +320,22 @@ from apify_client._literals import WebhookEventType
 
 events: list[WebhookEventType] = ['ACTOR.RUN.SUCCEEDED', 'ACTOR.RUN.FAILED']
 ```
+
+## Async `iterate_*` methods are plain functions, not async generators
+
+Async iteration helpers, <ApiLink to="class/DatasetClientAsync#iterate_items">`DatasetClientAsync.iterate_items()`</ApiLink> and <ApiLink to="class/KeyValueStoreClientAsync#iterate_keys">`KeyValueStoreClientAsync.iterate_keys()`</ApiLink>, were previously declared as `async def` (async generator functions). They are now plain `def` functions that return an `AsyncIterator` produced by a shared pagination helper.
+
+Consumer-side iteration is unchanged. `async for item in client.iterate_items(...)` works the same in both versions:
+
+```python
+# Works in both v2 and v3
+async for item in client.dataset('my-dataset').iterate_items():
+    print(item)
+```
+
+The difference matters only if your code inspects the function itself:
+
+- The method is no longer an async generator function: `inspect.isasyncgenfunction(client.iterate_items)` now returns `False`, where it returned `True` while the method was an async generator function. `inspect.iscoroutinefunction(client.iterate_items)` returns `False` in both versions, because neither an async generator function nor a regular function that returns an async iterator is a coroutine function.
+- Type checkers see `def (...) -> AsyncIterator[T]` instead of `async def (...) -> AsyncIterator[T]`. Annotations on variables that hold the call's result may need to change from `AsyncGenerator[T, None]` to `AsyncIterator[T]`.
+
+A new <ApiLink to="class/RequestQueueClientAsync#iterate_requests">`RequestQueueClientAsync.iterate_requests()`</ApiLink> helper is also introduced and follows the same `def ... -> AsyncIterator[T]` shape.
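
The introspection difference described in the bullet points above can be reproduced with the standard library alone. The sketch below does not use the Apify client. `old_style_iterate`, `new_style_iterate` and `_paginate` are hypothetical stand-ins for the two declaration styles, chosen only to show how `inspect` classifies each of them.

```python
import asyncio
import inspect
from collections.abc import AsyncIterator, Callable


async def old_style_iterate(count: int) -> AsyncIterator[int]:
    """Hypothetical stand-in for the old shape: an async generator function."""
    for i in range(count):
        yield i


async def _paginate(count: int) -> AsyncIterator[int]:
    """Hypothetical shared helper that produces the async iterator."""
    for i in range(count):
        yield i


def new_style_iterate(count: int) -> AsyncIterator[int]:
    """Hypothetical stand-in for the new shape: a plain function returning an async iterator."""
    return _paginate(count)


def describe(func: Callable[..., object]) -> tuple[bool, bool]:
    """Return (is coroutine function, is async generator function)."""
    return inspect.iscoroutinefunction(func), inspect.isasyncgenfunction(func)


async def collect(iterator: AsyncIterator[int]) -> list[int]:
    """Consume an async iterator the same way `async for` in user code would."""
    return [item async for item in iterator]


OLD_FLAGS = describe(old_style_iterate)
NEW_FLAGS = describe(new_style_iterate)
OLD_ITEMS = asyncio.run(collect(old_style_iterate(3)))
NEW_ITEMS = asyncio.run(collect(new_style_iterate(3)))
```

Only the async generator flag differs between the two shapes, while both are consumed identically with `async for`. Code that merely iterates needs no change, and only code that branches on `inspect.isasyncgenfunction` or annotates the result as `AsyncGenerator` is affected.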
