carloea2 (Contributor) commented Jan 29, 2026

What changes were proposed in this PR?

This PR delivers an end-to-end resumable multipart upload experience for datasets (backend + frontend) by tightening concurrency rules, adding restart/resume support, and automatically restarting an upload when its configuration changes.

Backend changes:

  • Added a new multipart operation type: type=list
    • Lists active multipart upload file paths (within the physical address expiration window) so clients can discover resumable uploads.
  • Updated type=init to deny concurrent uploads for the same file
    • Uses DB row locking (FOR UPDATE NOWAIT) to fail fast with 409 CONFLICT if another client is currently uploading/initializing parts for the same file.
  • Updated the init response to return all missing parts, plus completedPartsCount
    • Enables clients to resume by uploading only the missing part numbers, without trial and error.
    • Added a restart parameter (default false) so the frontend can specify whether the upload session should start from scratch.
  • Added a delete-then-restart flow for when the requested fileSizeBytes or partSizeBytes differs from an existing session
    • If an existing session is found but its config mismatches the incoming request:
      • Delete the DB upload session (and part rows via cascade),
      • Abort the previous lakeFS multipart upload,
      • Start a fresh session with the new parameters.
  • Added/updated tests covering:
    • type=list behavior
    • init concurrency denial (conflict when rows are locked)
    • restart behavior when fileSizeBytes or partSizeBytes changes
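
The init-response bookkeeping described above can be sketched as a small pure function. This is a minimal illustration, not the actual Texera implementation; the names `computeResumeInfo`, `ResumeInfo`, and its fields are hypothetical:

```typescript
// Sketch of computing the init response fields described above.
// Field and function names are illustrative, not the real schema.
interface ResumeInfo {
  missingParts: number[];       // part numbers the client still has to upload
  completedPartsCount: number;  // parts already persisted server-side
}

function computeResumeInfo(completedParts: number[], totalParts: number): ResumeInfo {
  const done = new Set(completedParts);
  const missingParts: number[] = [];
  // Part numbers are 1-based, as in S3-style multipart uploads.
  for (let part = 1; part <= totalParts; part++) {
    if (!done.has(part)) missingParts.push(part);
  }
  return { missingParts, completedPartsCount: done.size };
}
```

Returning the full sorted list of missing parts (rather than just a count) is what lets the client resume deterministically, without probing.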
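
The delete-then-restart decision can likewise be sketched as a pure function. Again a hedged sketch: `decideInitAction` and the `UploadConfig` shape are illustrative assumptions, not the PR's actual types:

```typescript
// Sketch of the delete-then-restart decision described above.
// Names are illustrative, not the actual session schema.
interface UploadConfig {
  fileSizeBytes: number;
  partSizeBytes: number;
}

type InitAction = "resume" | "restart";

function decideInitAction(existing: UploadConfig | null, requested: UploadConfig): InitAction {
  if (existing === null) return "restart"; // no prior session: start fresh
  const mismatch =
    existing.fileSizeBytes !== requested.fileSizeBytes ||
    existing.partSizeBytes !== requested.partSizeBytes;
  // On mismatch the server deletes the DB session (part rows cascade),
  // aborts the previous lakeFS multipart upload, and starts over.
  return mismatch ? "restart" : "resume";
}
```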

Frontend changes:

  • Added a Resume confirmation dialog for multipart uploads
    • When resumable uploads are detected, the dialog lets the user choose which items to:
      • Resume (continue by uploading only the missing parts; if resuming is impossible, restart automatically),
      • Restart (discard the detected resumable upload and start from scratch).
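
The detection step that feeds this dialog can be sketched as matching the user's queued files against the paths returned by the new type=list operation. A minimal sketch; `findResumablePaths` is a hypothetical helper name:

```typescript
// Sketch: mark which queued files have an active (resumable) upload,
// using the file paths returned by the backend's type=list operation.
function findResumablePaths(queuedPaths: string[], activeUploadPaths: string[]): string[] {
  const active = new Set(activeUploadPaths);
  return queuedPaths.filter(path => active.has(path));
}
```

Only the files returned here are shown in the resume dialog; everything else uploads normally.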

This results in a safer and clearer resumable upload UX:

  • No silent concurrent uploads for the same file.
  • Full visibility into which parts are missing.
  • A deterministic restart path when upload parameters change.
  • Explicit user choice in the UI about what to resume vs restart.

Any related issues, documentation, discussions?

How was this PR tested?

Backend:

  • Added automated tests covering:
    • type=list
    • type=init concurrency denial (409 on concurrent lock)
    • missing parts reporting (returns all missing parts, sorted)
    • delete-then-restart behavior when fileSizeBytes or partSizeBytes changes

Frontend:

  • Manually verified resume dialog behavior:
    • Detected recoverable uploads render in the dialog
    • “Recover” continues uploading only missing parts and triggers the restart path when parameters differ
    • “Skip” leaves the upload untouched and proceeds without recovering

Closes #4183

Was this PR authored or co-authored using generative AI tooling?

Yes, co-authored using ChatGPT.

@github-actions bot added the ddl-change, fix, and frontend labels Jan 29, 2026
@aicam aicam self-requested a review January 29, 2026 21:06
@carloea2 changed the title from "feat(backend): Recoverable Uploads" to "feat(backend): Resumable Uploads" Jan 30, 2026