
perf: improve performance with part sizing #28

Open
designcode wants to merge 4 commits into main from perf/throughput

Conversation

@designcode (Collaborator) commented Mar 4, 2026

Note

Medium Risk
Changes multipart upload behavior (part sizing/queueing and stream buffering) and introduces refreshable OAuth credential providers for long-running S3 operations, which could impact large transfers or auth edge cases.

Overview
Improves cp, mv, and objects put performance/reliability for larger transfers by introducing calculateUploadParams (tiered multipart part sizes with a 10,000-part safety cap) and applying it across uploads/copies/moves, plus increasing file stream buffer size via highWaterMark.

Updates OAuth-based storage auth to optionally provide a refreshable credentialProvider (and uses it for long-running commands) while keeping short-lived operations on a static token for better client caching. Dependency bump: @tigrisdata/storage to 2.15.2 (plus related lockfile updates).

Written by Cursor Bugbot for commit 76fca5b. This will update automatically on new commits.

  • Tiered part sizes: 5 MB for files under 1 GB (max parallelism), 16 MB for 1-10 GB (fewer parts, less per-part overhead), 32 MB for files over 10 GB.
  • Stream buffering: bumped from Node's default 64 KB highWaterMark to 1 MB, cutting syscalls from roughly 80 per 5 MB part to about 5.
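A sketch of the scheme described above — the helper name `calculateUploadParams` comes from the diff, but its body here is reconstructed from this description, and the queue size and single-part cutoff are assumptions:

```typescript
const MB = 1024 * 1024;
const GB = 1024 * MB;
const MAX_PARTS = 10_000; // S3 hard limit on parts per multipart upload

interface UploadParams {
  multipart: boolean;
  partSize?: number;
  queueSize?: number;
}

// Tiered part sizes with a safety cap so no file ever needs
// more than MAX_PARTS parts.
function calculateUploadParams(fileSize?: number): UploadParams {
  if (fileSize === undefined) return { multipart: false }; // size unknown
  let partSize: number;
  if (fileSize <= 1 * GB) partSize = 5 * MB;        // max parallelism
  else if (fileSize <= 10 * GB) partSize = 16 * MB; // fewer, larger parts
  else partSize = 32 * MB;
  // Safety cap: grow the part size if the tier would exceed 10,000 parts.
  partSize = Math.max(partSize, Math.ceil(fileSize / MAX_PARTS));
  // Files that fit in a single part skip multipart entirely (assumption).
  return { multipart: fileSize > partSize, partSize, queueSize: 10 };
}

// Stream buffering: Node's default highWaterMark is 64 KB; 1 MB means
// ~5 reads per 5 MB part instead of ~80. Pass to fs.createReadStream.
const streamOptions = { highWaterMark: 1 * MB };
```

With these tiers the 10,000-part cap only binds above 32 MB × 10,000 ≈ 312 GiB.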
@designcode changed the title from "perf: improve performance with tiered sizing" to "perf: improve performance with part sizing" on Mar 4, 2026

@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Recursive getStorageConfig call drops options parameter
    • The auto-login recursion now forwards the original options argument so withCredentialProvider is preserved after interactive login.
  • ✅ Fixed: Tiered sizing not applied to mv.ts moveObject
    • moveObject now uses calculateUploadParams(fileSize) instead of a hardcoded multipart threshold, aligning it with tiered upload sizing behavior.

Create PR

Or push these changes by commenting:

@cursor push 1e11c1c480
Preview (1e11c1c480)
diff --git a/src/auth/s3-client.ts b/src/auth/s3-client.ts
--- a/src/auth/s3-client.ts
+++ b/src/auth/s3-client.ts
@@ -150,7 +150,7 @@
 
   // No valid auth method found — try auto-login in interactive terminals
   if (await triggerAutoLogin()) {
-    return getStorageConfig();
+    return getStorageConfig(options);
   }
   throw new Error(
     'Not authenticated. Please run "tigris login" or "tigris configure" first.'

diff --git a/src/lib/mv.ts b/src/lib/mv.ts
--- a/src/lib/mv.ts
+++ b/src/lib/mv.ts
@@ -10,6 +10,7 @@
 import { getOption } from '../utils/options.js';
 import { getStorageConfig } from '../auth/s3-client.js';
 import { formatSize } from '../utils/format.js';
+import { calculateUploadParams } from '../utils/upload.js';
 import { get, put, remove, list, head } from '@tigrisdata/storage';
 
 async function confirm(message: string): Promise<boolean> {
@@ -362,12 +363,9 @@
     return { error: getError.message };
   }
 
-  // Use multipart for files larger than 100MB
-  const useMultipart = fileSize !== undefined && fileSize > 100 * 1024 * 1024;
-
   // Put to destination
   const { error: putError } = await put(destKey, data, {
-    multipart: useMultipart,
+    ...calculateUploadParams(fileSize),
     onUploadProgress: showProgress
       ? ({ loaded }) => {
           if (fileSize !== undefined && fileSize > 0) {
This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.


@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

-    multipart: useMultipart,
-    partSize: useMultipart ? 16 * 1024 * 1024 : undefined,
-    queueSize: useMultipart ? 8 : undefined,
+    ...calculateUploadParams(fileSize),


Remote-to-remote batch copies skip multipart upload sizing

Medium Severity

In copyObject in cp.ts, the fileSize is only fetched via head() when showProgress is true. For batch remote-to-remote copies (the common path), showProgress defaults to false, so fileSize stays undefined and calculateUploadParams(undefined) always returns { multipart: false }. This means large files in batch copies never use multipart upload. The equivalent function in mv.ts was correctly updated to always perform the head() call regardless of showProgress, but cp.ts was not updated the same way.
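The gating bug can be reproduced in miniature (a hypothetical reduction: `calculateUploadParams` is stubbed down to its undefined-size branch, and `buggyCopyParams`/`fixedCopyParams` stand in for the cp.ts and mv.ts code paths):

```typescript
// Stub: only the branch relevant to the bug; the tiering details are elided.
function calculateUploadParams(fileSize?: number): { multipart: boolean } {
  if (fileSize === undefined) return { multipart: false }; // size unknown
  return { multipart: fileSize > 100 * 1024 * 1024 };      // illustrative threshold
}

// cp.ts pattern (buggy): head() is only called when progress is shown,
// so batch copies pass undefined and large files silently lose multipart.
function buggyCopyParams(showProgress: boolean, actualSize: number) {
  const fileSize = showProgress ? actualSize : undefined; // head() skipped
  return calculateUploadParams(fileSize);
}

// mv.ts pattern (fixed): the size is always looked up first.
function fixedCopyParams(actualSize: number) {
  return calculateUploadParams(actualSize);
}
```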

Additional Locations (1)



There is a server-side CopyObject API; are we using that for copying objects within the bucket?

@designcode (Collaborator, Author)


I wasn't aware of it. Is that what we use for renaming in the web console? Also, can it be used to move objects between buckets?

const MAX_PARTS = 10_000; // S3 hard limit
const DEFAULT_QUEUE_SIZE = 10; // match AWS CLI

// Tiered part sizes to balance parallelism vs per-part overhead


What do you mean by per-part overhead? The parallelism controls how many parts get uploaded concurrently.


function tieredPartSize(fileSize: number): number {
if (fileSize <= ONE_GB) return 5 * 1024 * 1024; // 5 MB — max parallelism
if (fileSize <= TEN_GB) return 16 * 1024 * 1024; // 16 MB — fewer parts, less overhead


I think there is no overhead in play when you use a 5 MB part size with 10 GB files either. The main thing is to make sure that whatever part size you choose allows us to upload the file given the limit of 10K parts. So the calculation you need to do is: define a minimum part size (such as 5 MB), and then make sure we always have fewer than 10K parts regardless of the file size.
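The suggested calculation fits in a couple of lines (a sketch; the constant and function names here are made up):

```typescript
const MB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MB; // S3's minimum part size (all parts but the last)
const MAX_PARTS = 10_000;     // S3's hard limit on parts per upload

// Fixed minimum part size, grown only when a file would otherwise
// need more than MAX_PARTS parts.
function partSizeFor(fileSize: number): number {
  return Math.max(MIN_PART_SIZE, Math.ceil(fileSize / MAX_PARTS));
}
```

With a 5 MB minimum, the part size only starts growing past 5 MB × 10,000 ≈ 48.8 GiB; every smaller file keeps maximum parallelism.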

