
feat: chunked image push via OCI compatible push#2760

Merged
markphelps merged 26 commits into main from mphelps/chunked-image-push on Feb 26, 2026

Conversation

@markphelps
Contributor

@markphelps markphelps commented Feb 23, 2026

Summary

Replace Docker's monolithic ImagePush with chunked layer uploads, gated behind COG_PUSH_OCI=1. The image is exported from Docker via ImageSave, loaded lazily using go-containerregistry's tarball.ImageFromPath (layers are read on-demand from a temp file, not into memory), then each layer is pushed through the existing RegistryClient.WriteLayer chunked upload path — the same infrastructure weight artifacts already use. This bypasses request body limits that block Docker's native push for large layers.

Off by default. Set COG_PUSH_OCI=1 to enable. When enabled, any OCI push error other than context cancellation/timeout or an authentication error (401/403) falls back to Docker push, so the default push path is unchanged and regression risk is low.

What changed

Chunked image push

  • ImagePusher tries OCI chunked push first when COG_PUSH_OCI=1, falls back to Docker push
  • Exports image via docker.ImageSave() → tar → lazy v1.Image (via tarball.ImageFromPath) → concurrent layer upload
  • shouldFallbackToDocker() blocks fallback on context cancellation/timeout and authentication errors (401/403) — auth errors won't be fixed by Docker fallback, so the original error is surfaced directly
  • Added ImageSave(ctx, imageRef) to the Docker Command interface
  • ImagePusher.Push() takes *ImageArtifact directly (not a raw string ref), preserving the resolved reference from the Model type
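The fallback rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: statusError is a hypothetical stand-in for whatever error type the registry client returns, and shouldFallback is an illustrative name rather than the real shouldFallbackToDocker().

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// statusError is a hypothetical stand-in for a registry transport error
// that carries the HTTP status code of the failed request.
type statusError struct {
	StatusCode int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("registry returned HTTP %d", e.StatusCode)
}

// shouldFallback reports whether a failed OCI push should be retried
// through Docker's native push. Cancellation, timeouts, and auth errors
// are surfaced directly because Docker push would not fix them.
func shouldFallback(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		return false
	}
	var se *statusError
	if errors.As(err, &se) {
		switch se.StatusCode {
		case http.StatusUnauthorized, http.StatusForbidden:
			return false
		}
	}
	return true
}

func main() {
	// A wrapped 403 is still detected through errors.As.
	wrapped := fmt.Errorf("push layer: %w", &statusError{StatusCode: http.StatusForbidden})
	fmt.Println(shouldFallback(wrapped))
	// A 413 from a proxy is the kind of error where fallback makes sense.
	fmt.Println(shouldFallback(&statusError{StatusCode: http.StatusRequestEntityTooLarge}))
}
```

Matching on typed errors with errors.Is / errors.As, rather than on error strings, keeps the check robust when errors are wrapped on the way up.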

Unified push infrastructure

  • Single PushProgress type replaces separate image/weight progress types
  • writeLayerWithProgress() helper deduplicates progress channel boilerplate
  • GetPushConcurrency() (default 5, COG_PUSH_CONCURRENCY) shared by image layers, weight pushes, and CLI progress
  • BundlePusher.pushWeights() uses errgroup with concurrency limit
  • NewBundlePusher takes docker.Command + registry.Client and creates both sub-pushers (image + weight) internally — unified construction
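For reference, a shared concurrency helper along the lines of GetPushConcurrency() might look like the sketch below. The default of 5 and the COG_PUSH_CONCURRENCY variable come from this PR; the function name pushConcurrency and the choice to fall back to the default on empty, non-numeric, or non-positive values are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultPushConcurrency mirrors the default described in this PR.
const defaultPushConcurrency = 5

// pushConcurrency parses a raw COG_PUSH_CONCURRENCY value.
// Assumption: invalid or non-positive values fall back to the default.
func pushConcurrency(raw string) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return defaultPushConcurrency
	}
	return n
}

func main() {
	fmt.Println(pushConcurrency(os.Getenv("COG_PUSH_CONCURRENCY")))
}
```

The returned value would then be handed to whatever limits the goroutines, for example errgroup's SetLimit as used by BundlePusher.pushWeights().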

Progress rendering

  • Replaced the mpb progress bar library with Docker's jsonmessage.DisplayJSONMessagesStream — the same rendering code used by docker push
  • Progress output now matches docker push exactly: layer ID, status, and progress bar on each line
  • Handles terminal resizing correctly: each line is erased and rewritten individually (ESC[2K + per-line cursor up/down), rather than relying on a bulk cursor-up count that desyncs when lines wrap after a resize
  • Terminal width is re-queried via ioctl(TIOCGWINSZ) on every render, so progress bars adapt dynamically
  • ProgressWriter adapter in pkg/docker/progress.go converts PushProgress callbacks into JSON-encoded JSONMessage structs fed to DisplayJSONMessagesStream via an io.Pipe
  • Retry messages are only logged via console.Warnf in non-TTY mode — in TTY mode the progress writer handles display inline, avoiding duplicate output
  • Removed mpb dependency entirely
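The pipe-based adapter described above can be illustrated with the standard library alone. In this sketch the message and progressDetail structs are hypothetical mirrors of the JSON shape Docker's renderer consumes, and the render callback stands in for DisplayJSONMessagesStream; none of these names are taken from pkg/docker/progress.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

// progressDetail and message are hypothetical mirrors of the JSON
// progress stream format; they are not the Docker types themselves.
type progressDetail struct {
	Current int64 `json:"current,omitempty"`
	Total   int64 `json:"total,omitempty"`
}

type message struct {
	ID       string          `json:"id,omitempty"`
	Status   string          `json:"status,omitempty"`
	Progress *progressDetail `json:"progressDetail,omitempty"`
}

// progressWriter converts progress callbacks into JSON messages on a pipe.
type progressWriter struct {
	mu   sync.Mutex // serializes Encode calls from concurrent uploads
	pw   *io.PipeWriter
	enc  *json.Encoder
	done chan struct{}
}

// newProgressWriter starts render in a goroutine reading from the pipe.
// render must read until EOF, otherwise Update would block.
func newProgressWriter(render func(io.Reader)) *progressWriter {
	pr, pw := io.Pipe()
	w := &progressWriter{pw: pw, enc: json.NewEncoder(pw), done: make(chan struct{})}
	go func() {
		defer close(w.done)
		render(pr)
	}()
	return w
}

// Update emits one progress message for the given layer ID.
func (w *progressWriter) Update(id, status string, current, total int64) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.enc.Encode(message{ID: id, Status: status,
		Progress: &progressDetail{Current: current, Total: total}})
}

// Close signals EOF to the renderer and waits for it to finish.
func (w *progressWriter) Close() {
	_ = w.pw.Close()
	<-w.done
}

// demo pushes two updates through the pipe and returns what was rendered.
func demo() []string {
	var lines []string
	w := newProgressWriter(func(r io.Reader) {
		dec := json.NewDecoder(r)
		for {
			var m message
			if err := dec.Decode(&m); err != nil {
				return // io.EOF after Close
			}
			lines = append(lines, fmt.Sprintf("%s: %s %d/%d",
				m.ID, m.Status, m.Progress.Current, m.Progress.Total))
		}
	})
	_ = w.Update("layer-a", "Pushing", 50, 100)
	_ = w.Update("layer-a", "Pushing", 100, 100)
	w.Close()
	return lines
}

func main() {
	for _, l := range demo() {
		fmt.Println(l)
	}
}
```

Closing the writer and waiting for the renderer goroutine is also what makes it possible to clear OCI progress output before another renderer takes over the terminal.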

HTTP/1.1 transport for all registry operations

  • All registry operations (push, inspect, HEAD, etc.) now use HTTP/1.1 instead of allowing HTTP/2 negotiation
  • Problem: Go's default http.Transport negotiates HTTP/2 via TLS ALPN. High-throughput HTTP/2 suffers from head-of-line blocking when multiplexing large blob uploads on a single connection. Additionally, certain CDN/proxy edges can send RST_STREAM INTERNAL_ERROR on large requests — killing uploads before they reach the origin.
  • Fix: RegistryClient stores an http1OnlyTransport() and passes it to all remote.* calls via a remoteOptions() helper. Multiple concurrent HTTP/1.1 connections outperform a single multiplexed HTTP/2 connection for large blob uploads.
  • HTTP/2 stream errors (RST_STREAM) are also recognized by isRetryableError() as a belt-and-suspenders measure, so they trigger the existing retry-from-scratch logic in WriteLayer.

Retry and resource management

  • Retry jitter is now properly randomized (rand.Float64()) to avoid thundering herd when multiple clients retry simultaneously
  • Chunk buffers (up to 96 MB each) are pooled via sync.Pool to reduce memory pressure when pushing multiple layers concurrently
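Both ideas can be sketched in a few lines. The names backoffWithJitter, chunkPool, and withChunkBuffer are hypothetical, the exponential schedule is an assumption (the PR only states that jitter uses rand.Float64()), and the buffer is 1 MiB here to keep the example light, whereas the PR describes buffers of up to 96 MB.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// backoffWithJitter returns a delay in [full/2, full), where
// full = base * 2^attempt. The random half spreads out clients that
// would otherwise retry at the same instant.
func backoffWithJitter(base time.Duration, attempt int) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	if attempt > 10 {
		attempt = 10 // cap the exponent so the shift cannot overflow
	}
	full := base << uint(attempt)
	half := full / 2
	return half + time.Duration(rand.Float64()*float64(half))
}

// chunkBufSize is deliberately small for this sketch.
const chunkBufSize = 1 << 20

// chunkPool reuses chunk buffers across uploads. Pointers to slices are
// stored to avoid an extra allocation on every Put.
var chunkPool = sync.Pool{
	New: func() any {
		b := make([]byte, chunkBufSize)
		return &b
	},
}

// withChunkBuffer lends a pooled buffer to fn and returns it afterwards.
// fn must not retain buf after it returns.
func withChunkBuffer(fn func(buf []byte)) {
	bp := chunkPool.Get().(*[]byte)
	defer chunkPool.Put(bp)
	fn(*bp)
}

func main() {
	fmt.Println(backoffWithJitter(time.Second, 2))
	withChunkBuffer(func(buf []byte) {
		fmt.Println(len(buf))
	})
}
```

Pooled buffers are not zeroed between uses in this sketch; that is safe as long as each upload only sends the bytes it actually read into the buffer.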

Server-driven chunk sizing (OCI-Chunk-Min/Max-Length)

  • initiateUpload() now parses OCI-Chunk-Min-Length and OCI-Chunk-Max-Length headers from the registry's 202 response
  • The server-advertised maximum always takes precedence over client defaults — chunk size is set to max - 64KB margin to stay safely under the limit
  • The result is clamped to be at least the server-advertised minimum
  • COG_PUSH_DEFAULT_CHUNK_SIZE is only used as a fallback when the registry doesn't advertise limits
  • New uploadSession struct carries location + chunk constraints from initiation through to upload
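The precedence rules above amount to a small calculation. The sketch below is an illustration under stated assumptions: chunkSize and chunkSizeFromHeaders are hypothetical names, unparsable or missing headers are treated as "not advertised", and a maximum that is not larger than the 64 KB margin is used as-is.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// chunkMargin is the safety margin kept below the advertised maximum.
const chunkMargin = 64 * 1024

// chunkSize picks the upload chunk size. A value of 0 for minLen or
// maxLen means the registry did not advertise that limit.
func chunkSize(minLen, maxLen, fallback int64) int64 {
	size := fallback
	if maxLen > 0 {
		size = maxLen
		if maxLen > chunkMargin {
			size = maxLen - chunkMargin
		}
	}
	if size < minLen {
		size = minLen
	}
	return size
}

// headerInt parses a positive integer header, returning 0 if absent or invalid.
func headerInt(h http.Header, key string) int64 {
	n, err := strconv.ParseInt(h.Get(key), 10, 64)
	if err != nil || n <= 0 {
		return 0
	}
	return n
}

// chunkSizeFromHeaders applies chunkSize to the upload initiation response headers.
func chunkSizeFromHeaders(h http.Header, fallback int64) int64 {
	return chunkSize(
		headerInt(h, "OCI-Chunk-Min-Length"),
		headerInt(h, "OCI-Chunk-Max-Length"),
		fallback,
	)
}

func main() {
	h := http.Header{}
	h.Set("OCI-Chunk-Max-Length", "52428800") // 50 MiB advertised by the server
	fmt.Println(chunkSizeFromHeaders(h, 100663296))
}
```

With a 50 MiB advertised maximum, the 96 MiB fallback is ignored and the chunk size becomes 52428800 - 65536 = 52363264 bytes.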

Registry config

  • Extracted getDefaultChunkSize() and getMultipartThreshold() env var helpers into pkg/registry/config.go
  • Default chunk size: 96 MiB (a multiple of 8), which stays under common CDN/proxy request body limits; used only when the registry doesn't advertise OCI-Chunk-Max-Length
  • Default multipart threshold: 128 MiB (higher than the chunk size, so blobs that fit in a single chunk skip multipart entirely)
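As a rough illustration of how the two defaults interact, the sketch below counts how many upload bodies a blob would be split into. chunkCount is a hypothetical helper, and treating a blob exactly at the threshold as a single request is an assumption.

```go
package main

import "fmt"

const (
	mib                       = int64(1) << 20
	defaultChunkSize          = 96 * mib  // 100663296 bytes
	defaultMultipartThreshold = 128 * mib // 134217728 bytes
)

// chunkCount returns how many request bodies a blob is split into:
// one for blobs at or below the threshold, otherwise ceil(size/chunk).
func chunkCount(blobSize, chunkSize, threshold int64) int64 {
	if blobSize <= threshold {
		return 1
	}
	return (blobSize + chunkSize - 1) / chunkSize
}

func main() {
	for _, size := range []int64{64 * mib, 129 * mib, 300 * mib} {
		fmt.Println(size/mib, "MiB ->",
			chunkCount(size, defaultChunkSize, defaultMultipartThreshold), "request(s)")
	}
}
```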

Cleanup

  • Deleted pkg/oci/ package — OCI image loading inlined into ImagePusher
  • Deleted tools/uploader/ — unused S3 multipart uploader (363 lines, zero imports)
  • Removed Pusher interface, OCIImagePusher, pushImageWithFallback() — consolidated into single ImagePusher
  • Unexported DefaultFactory → defaultFactory, NewImagePusher → newImagePusher

Environment variables

| Variable | Default | Description |
|---|---|---|
| COG_PUSH_OCI | unset (off) | Set to 1 to enable OCI chunked image push |
| COG_PUSH_CONCURRENCY | 5 | Max concurrent layer/weight uploads |
| COG_PUSH_DEFAULT_CHUNK_SIZE | 100663296 (96 MiB) | Fallback chunk size when registry doesn't advertise OCI-Chunk-Max-Length |
| COG_PUSH_MULTIPART_THRESHOLD | 134217728 (128 MiB) | Blobs above this size use chunked upload; below this they upload in a single request |

@markphelps markphelps force-pushed the mphelps/chunked-image-push branch 2 times, most recently from 8b01c53 to 88ff363 Compare February 23, 2026 20:45
Replace Docker's monolithic ImagePush with a chunked push path for
container image layers. Images are exported from the Docker daemon to
OCI layout via ImageSave, then pushed through the registry client's
existing chunked upload infrastructure (WriteLayer with 256MB chunks).

This bypasses the ~500MB Cloudflare Workers request body limit that
blocks Docker's native push for large layers.

Key changes:
- Add OCIImagePusher to pkg/registry/ with concurrent layer uploads
- Export images from Docker daemon to OCI layout via ImageSave + tarball
- Integrate into Resolver.Push and BundlePusher with Docker push fallback
- Add ImageSave method to command.Command interface
- Delete unused tools/uploader/ S3 multipart code (363 lines)
…pkg/model

Move OCI layout utilities to pkg/oci/, extract registry transport config
(chunk size, multipart threshold env vars) to pkg/registry/config.go, and
relocate OCIImagePusher to pkg/model/ alongside ImagePusher and WeightPusher.

- pkg/oci/: pure OCI format utilities (Docker tar <-> OCI layout), no registry deps
- pkg/registry/config.go: configurable chunk size and multipart threshold
- pkg/model/oci_image_pusher.go: push orchestration with shared pushImageWithFallback()
- Deduplicate fallback logic between resolver.go and pusher.go
- Add error discrimination: no fallback on auth errors or context cancellation
- Create OCIImagePusher once in NewResolver, not per-push call
…weight pushers

- Unify ImagePushProgress and WeightPushProgress into shared PushProgress type
- Extract writeLayerWithProgress() helper to deduplicate progress channel
  boilerplate between OCIImagePusher and WeightPusher
- Unify push concurrency: both image layer pushes and weight pushes use
  GetPushConcurrency() (default 4, overridable via COG_PUSH_CONCURRENCY)
- Fix BundlePusher.pushWeights() which had no concurrency limit (launched
  all goroutines at once); now uses errgroup.SetLimit
- Implement auth error detection in shouldFallbackToDocker() to match its
  documented behavior (don't fall back on UNAUTHORIZED/DENIED errors)

String-based error detection is fragile. Fall back to Docker push on any
error except context cancellation/timeout.

Replace ~200 lines of custom ANSI escape progress rendering with the mpb
(multi-progress-bar) library, which was already a dependency but unused.

mpb handles TTY detection, cursor management, concurrent bar updates, and
size formatting natively. Retry status is shown via a dynamic decorator.
…etTotal completion

When bars are created with total > 0, mpb sets triggerComplete=true
internally. This causes SetTotal(n, true) to early-return without
triggering completion, so bars never finish and p.Wait() deadlocks.

Creating bars with total=0 leaves triggerComplete=false, allowing
explicit completion via SetTotal(current, true) after push finishes.
The real total is still set dynamically via ProgressFn callbacks.

Merge OCIImagePusher (OCI chunked push) and the old ImagePusher (Docker
push) into a single ImagePusher type that tries OCI first and falls back
to Docker push on non-fatal errors.

- ImagePusher.Push() handles OCI→Docker fallback internally
- Delete OCIImagePusher type and oci_image_pusher.go
- BundlePusher takes *ImagePusher directly instead of separate oci/docker pushers
- Resolver stores single imagePusher field instead of ociPusher
- Remove dead Pusher interface
- Consolidate tests into image_pusher_test.go

ImagePusher now calls p.docker.ImageSave() directly instead of going
through the oci.ImageSaveFunc indirection. The OCI layout export logic
is inlined into ImagePusher.ociPush(). The pkg/oci package is deleted
entirely since it had no other consumers.

OCI push is now opt-in rather than always-on when a registry client
is present. Requires COG_PUSH_OCI=1 to activate.

Signed-off-by: Mark Phelps <mphelps@cloudflare.com>

- Add dynamic mpb progress bars for per-layer upload progress during OCI push
- Wire ImageProgressFn through PushOptions → Resolver → ImagePusher
- Force HTTP/1.1 for registry chunked uploads to avoid HTTP/2 RST_STREAM errors
- Add HTTP/2 stream errors to isRetryableError for retry resilience
- Reduce default chunk size from 256MB to 95MB to stay under CDN body limits
@markphelps markphelps force-pushed the mphelps/chunked-image-push branch from 9930c18 to cd606d8 Compare February 24, 2026 17:47
@markphelps markphelps marked this pull request as ready for review February 24, 2026 19:01
@markphelps markphelps requested a review from a team as a code owner February 24, 2026 19:01
@markphelps markphelps marked this pull request as draft February 25, 2026 16:54
Parse OCI-Chunk-Min-Length and OCI-Chunk-Max-Length headers from the
registry's upload initiation response (POST /v2/.../blobs/uploads/).
The server-advertised maximum always takes precedence over client
defaults, and the result is clamped to be at least the server minimum.

Rename COG_PUSH_CHUNK_SIZE to COG_PUSH_DEFAULT_CHUNK_SIZE to clarify
that it is only a fallback for registries that don't advertise limits.

Replace the mpb progress bar library with Docker's jsonmessage rendering
(the same code used by `docker push`) for OCI layer and weight upload
progress. This fixes terminal corruption when the terminal is resized
during a push.

Root cause: mpb writes all bars then uses a bulk cursor-up (CUU N) to
reposition. When the terminal shrinks, previously rendered lines wrap to
occupy more visual lines, but the cursor-up count stays at the logical
count, leaving ghost copies of progress bars on screen.

Docker's jsonmessage avoids this by erasing and rewriting each line
individually (ESC[2K + per-line cursor up/down), and re-querying
terminal width on every render via ioctl(TIOCGWINSZ).

Also removes the mpb dependency entirely from go.mod.
…back

Strip HTML response bodies from transport errors (e.g., Cloudflare 413
pages) before displaying to user. Add OnFallback callback to close the
progress writer before Docker push starts, preventing stale OCI progress
bars from lingering above Docker's output.
@markphelps
Contributor Author

Latest: sanitize error output and clear progress on Docker fallback

Two fixes for terminal output quality when OCI push fails and falls back to Docker push:

  1. HTML response bodies no longer leak into error messages — When a registry/CDN returns an HTML error page (e.g., Cloudflare 413), transport.Error includes the entire raw body in its Error() string. Added sanitizeError() that extracts just the HTTP status code and text (e.g., HTTP 413 Request Entity Too Large) instead of dumping a full HTML page.

  2. OCI progress bars are cleared before Docker fallback starts — Added OnFallback callback that closes the progressWriter before Docker push begins its own jsonmessage output. Without this, stale OCI progress lines remained on screen above Docker's progress bars.

Both fixes are covered by new tests (TestSanitizeError, TestImagePusher_OnFallback).
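The first fix can be pictured with a small sketch. httpError below is a hypothetical stand-in for a transport error that embeds the raw response body, and the HTML detection heuristic is an assumption; the actual sanitizeError() in this PR may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// httpError is a hypothetical stand-in for a transport error whose
// Error() string includes the raw response body.
type httpError struct {
	StatusCode int
	Body       string
}

func (e *httpError) Error() string {
	return fmt.Sprintf("unexpected status code %d: %s", e.StatusCode, e.Body)
}

// looksLikeHTML is a simple heuristic for HTML error pages.
func looksLikeHTML(body string) bool {
	b := strings.ToLower(strings.TrimSpace(body))
	return strings.HasPrefix(b, "<!doctype html") || strings.Contains(b, "<html")
}

// sanitizeError replaces an HTML-bearing error with just the status
// code and its standard text. Other errors are returned unchanged.
func sanitizeError(err error) error {
	var he *httpError
	if errors.As(err, &he) && looksLikeHTML(he.Body) {
		return fmt.Errorf("HTTP %d %s", he.StatusCode, http.StatusText(he.StatusCode))
	}
	return err
}

func main() {
	raw := &httpError{
		StatusCode: http.StatusRequestEntityTooLarge,
		Body:       "<html><head><title>413</title></head><body>...</body></html>",
	}
	fmt.Println(sanitizeError(raw)) // HTTP 413 Request Entity Too Large
}
```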

@markphelps markphelps marked this pull request as ready for review February 25, 2026 20:27
mfainberg-cf
mfainberg-cf previously approved these changes Feb 25, 2026
@bfirsh
Member

bfirsh commented Feb 25, 2026

This is incredibly exciting.

- Remove unused OCI layout directory creation that doubled disk I/O (C1)
- Randomize retry jitter to avoid thundering herd (W2)
- Skip Docker fallback on 401/403 auth errors since they'd fail identically (W3)
- Pool chunk buffers via sync.Pool to reduce memory pressure (W4)
- Suppress duplicate retry log messages in TTY mode (S5)
@markphelps markphelps changed the title feat: chunked image push via OCI layout feat: chunked image push via OCI registry API Feb 25, 2026
@markphelps markphelps changed the title feat: chunked image push via OCI registry API feat: chunked image push via OCI compatible push Feb 25, 2026
Member

@michaeldwan michaeldwan left a comment

Strong overall, but there are a couple of design quirks that would be easy to correct now, so the spirit of the OCI artifact & model resolver refactor isn't lost.

- Fix multipart threshold/chunksize inversion: raise DefaultMultipartThreshold
  from 50MB to 128MB so blobs that fit in a single chunk avoid unnecessary
  multipart overhead
- Use HTTP/1.1 for all registry operations, not just chunked uploads, to avoid
  HTTP/2 head-of-line blocking and stream errors on large uploads
- Move ProgressWriter from pkg/cli to pkg/docker since it wraps Docker's
  jsonmessage rendering and belongs with Docker concerns
- Replace manual semaphore+WaitGroup in weights push with errgroup for
  bounded concurrency and first-error cancellation
- Refactor ImagePusher.Push to accept *ImageArtifact instead of a raw string,
  preserving the resolved reference from the Model type
- Unify BundlePusher construction: NewBundlePusher now takes docker+registry
  clients and creates both sub-pushers internally
- Clarify that tarball.ImageFromPath is lazy (reads layers on-demand from the
  temp file, not into memory)
@markphelps
Contributor Author

Addressed review feedback in 6e0ed39

Worked through all unresolved comments from @mfainberg-cf and @michaeldwan. Here's what changed:

Threshold/ChunkSize inversion (pkg/registry/config.go)

Raised DefaultMultipartThreshold from 50MB → 128MB so it's above DefaultChunkSize (95MB). Blobs that fit in a single chunk now skip multipart entirely, avoiding unnecessary overhead.

HTTP/1.1 everywhere (pkg/registry/registry_client.go)

Per the discussion about HTTP/2 head-of-line blocking — all registry operations now use HTTP/1.1, not just chunked uploads. Added a transport field to RegistryClient and a remoteOptions() helper so every remote.* call uses the same HTTP/1.1 transport.

Move ProgressWriter to docker package (pkg/cli/progress.go → pkg/docker/progress.go)

Moved and exported as docker.ProgressWriter since it's built on Docker's jsonmessage rendering. CLI now imports it from there.

Errgroup in weights push (pkg/cli/weights.go)

Replaced the manual semaphore + WaitGroup + results channel with errgroup.Group + g.SetLimit(), matching the pattern already used in pusher.go. First error cancels remaining pushes.

ImagePusher.Push takes *ImageArtifact instead of string (pkg/model/image_pusher.go)

Addresses the regression concern — Push() now takes the artifact directly instead of re-parsing a string ref. Removed the now-redundant PushArtifact() method. Callers pass the already-resolved *ImageArtifact from the Model.

Unified BundlePusher construction (pkg/model/pusher.go)

NewBundlePusher now takes docker.Command + registry.Client and creates both sub-pushers (image + weight) internally. No more passing a pre-built ImagePusher from outside.

Memory concern — already handled

tarball.ImageFromPath is lazy — it indexes the tar and reads layer data on-demand from the temp file during Compressed() calls. Added a comment clarifying this. The image data is never held in memory all at once.

All tests pass, golangci-lint reports 0 issues.

Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
Replace the variadic struct slice ([]ImagePushOptions) with idiomatic
functional options (WithProgressFn, WithOnFallback). The resolved config
struct is now unexported, and callers compose options cleanly:

  pusher.Push(ctx, artifact, WithProgressFn(fn), WithOnFallback(fn))
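For readers unfamiliar with the pattern, a stripped-down version of functional options looks roughly like this. The option names WithProgressFn and WithOnFallback come from the commit message; everything else (the PushProgress fields, pushConfig, and the simulated push function) is hypothetical and only illustrates how options are applied.

```go
package main

import "fmt"

// PushProgress is a hypothetical progress payload for this sketch.
type PushProgress struct {
	Layer    string
	Complete int64
	Total    int64
}

// pushConfig is the unexported, resolved configuration.
type pushConfig struct {
	progressFn func(PushProgress)
	onFallback func()
}

// PushOption mutates the resolved configuration.
type PushOption func(*pushConfig)

// WithProgressFn registers a per-layer progress callback.
func WithProgressFn(fn func(PushProgress)) PushOption {
	return func(c *pushConfig) { c.progressFn = fn }
}

// WithOnFallback registers a callback fired before falling back to Docker push.
func WithOnFallback(fn func()) PushOption {
	return func(c *pushConfig) { c.onFallback = fn }
}

// push is a simulated stand-in for ImagePusher.Push: it applies the
// options, reports one progress event, then pretends the OCI path
// failed so the fallback callback fires.
func push(ref string, opts ...PushOption) error {
	var cfg pushConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	if cfg.progressFn != nil {
		cfg.progressFn(PushProgress{Layer: ref, Complete: 1, Total: 1})
	}
	if cfg.onFallback != nil {
		cfg.onFallback()
	}
	return nil
}

func main() {
	_ = push("example/image",
		WithProgressFn(func(p PushProgress) { fmt.Println(p.Layer, p.Complete, p.Total) }),
		WithOnFallback(func() { fmt.Println("falling back to docker push") }),
	)
}
```

Because every option is optional and nil callbacks are skipped, callers that need no progress reporting can call push with no options at all.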
…mments

Remove isHTTP2StreamError() and its test cases — HTTP/2 is no longer
possible since we force HTTP/1.1 on all registry operations. Fix stale
comments referencing 95 MB chunk size (now 96 MB), 50 MB multipart
threshold (now 128 MiB), and concurrency 4 (now 5).
- Remove outdated 'Requires registry client' comment from canOCIPush
  (nil registry is no longer a concern)
- Simplify sanitizeError doc comment
- Add context to DefaultPushConcurrency matching Docker's default
- Document why pooled chunk buffers don't need zeroing
markphelps and others added 2 commits February 26, 2026 13:27
Signed-off-by: Mark Phelps <mphelps@cloudflare.com>
Member

@michaeldwan michaeldwan left a comment

looks good, concerns addressed, ship it!

@markphelps markphelps merged commit dfa4fc3 into main Feb 26, 2026
31 checks passed
@markphelps markphelps deleted the mphelps/chunked-image-push branch February 26, 2026 19:58