-
Notifications
You must be signed in to change notification settings - Fork 702
SqlStorageClient silently swallows write errors, causing data loss #1831
Copy link
Copy link
Open
Labels
t-toolingIssues with this label are in the ownership of the tooling team.Issues with this label are in the ownership of the tooling team.
Description
Background
I've been experimenting with using SqlStorageClient to store dataset data within a SQLite database. During my testing, I've seen quite a few database lock errors, like the following, in my logs:
[crawlee.storage_clients._sql._client_mixin] WARN Error occurred during session transaction: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO key_value_store_records (key_value_store_id, "key", value, content_type, size) VALUES (?, ?, ?, ?, ?) ON CONFLICT (key_value_store_id, "key") DO UPDATE SET value = excluded.value, content_type = excluded.content_type, size = excluded.size]
[parameters: ('u4IehavotWGCQRNEj', 'xxxxxxx', <memory at 0x7f73dd7b68c0>, 'application/json; charset=utf-8', 62)]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Looking at the crawlee source code, I see that these errors seem to get "swallowed up" and the transaction is rolled back, causing data loss.
@asynccontextmanager
async def get_session(self, *, with_simple_commit: bool = False) -> AsyncIterator[AsyncSession]:
"""Create a new SQLAlchemy session for this storage."""
async with self._storage_client.create_session() as session:
# For operations where a final commit is mandatory and does not require specific processing conditions
if with_simple_commit:
try:
yield session
await session.commit()
except SQLAlchemyError as e:
logger.warning(f'Error occurred during session transaction: {e}')
await session.rollback()
else:
yield session
I do not see any existing mechanism to retry failed transaction nor any obvious way for me to implement transaction retries in my client code.
Expected behavior
A transient database error during a write operation should either:
- Re-raise the exception after logging, so the caller (and ultimately the crawler's retry logic) can handle it, or
- Retry the transaction internally with backoff, which is especially important for SQLite's common
SQLITE_BUSY/database is lockederrors.
Actual behavior
The error is caught, logged as a warning, and silently discarded. The data is lost with no retry.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
t-toolingIssues with this label are in the ownership of the tooling team.Issues with this label are in the ownership of the tooling team.
Type
Fields
Give feedbackNo fields configured for Bug.