linstor: add LinstorDataMotionStrategy for live migration + storpool: fix qemu-img copy to NFS #1
Open
DennisKonrad wants to merge 2 commits into LINBIT:linstor-backport-4.17.2.0
Add a new DataMotionStrategy implementation that enables VM live migration
when the destination storage pool is Linstor (DRBD). Without this strategy,
CloudStack's storage migration framework had no code path to handle live
migrations *to* Linstor pools, leaving three scenarios broken:
- Linstor -> Linstor: blocked (no strategy claimed it)
- SMP -> Linstor: broken (KvmNonManagedDMS claimed it but generated the
                  wrong DiskType=FILE/DriverType=QCOW2 and an invalid
                  device path using the resource group name instead of
                  a DRBD block device path)
- StorPool -> Linstor: blocked (no strategy claimed it)
How strategy selection works:
CloudStack iterates all DataMotionStrategy beans and picks the one
returning the highest StrategyPriority from canHandle(). The existing
strategies return:
- StorPoolDMS: HIGHEST only when ALL dest pools are StorPool
- KvmNonManagedDMS: HYPERVISOR only for {NFS, SMP, Filesystem}
- StorageSystemDMS: only for managed (isManaged=true) pools
- AncientDMS: DEFAULT (fallback, copies via secondary storage)
LinstorDataMotionStrategy returns HIGHEST when ALL destination pools are
Linstor, giving it priority over KvmNonManagedDMS (HYPERVISOR=2) and
AncientDMS (DEFAULT=1), while not conflicting with StorPoolDMS.
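The selection mechanism can be sketched as follows. This is a minimal, hypothetical model, not CloudStack's actual code: the StrategyPriority values and class names follow the real ones, but the interface is reduced to a single method and the pool types are plain strings.

```java
import java.util.Comparator;
import java.util.List;

// Simplified stand-ins for CloudStack's StrategyPriority enum and
// DataMotionStrategy interface; enum order encodes priority
// (CANT_HANDLE=0 ... HIGHEST=4), matching the values cited above.
enum StrategyPriority { CANT_HANDLE, DEFAULT, HYPERVISOR, PLUGIN, HIGHEST }

interface DataMotionStrategy {
    StrategyPriority canHandle(List<String> destPoolTypes);
}

public class StrategySelection {
    // Pick the strategy whose canHandle() returns the highest priority,
    // mirroring how the factory iterates all strategy beans.
    static DataMotionStrategy pick(List<DataMotionStrategy> strategies,
                                   List<String> destPoolTypes) {
        return strategies.stream()
                .filter(s -> s.canHandle(destPoolTypes) != StrategyPriority.CANT_HANDLE)
                .max(Comparator.comparing((DataMotionStrategy s) -> s.canHandle(destPoolTypes)))
                .orElse(null);
    }

    public static void main(String[] args) {
        DataMotionStrategy ancient = pools -> StrategyPriority.DEFAULT;
        DataMotionStrategy kvmNonManaged = pools ->
                pools.stream().allMatch(p -> p.equals("NFS") || p.equals("SMP")
                        || p.equals("Filesystem"))
                        ? StrategyPriority.HYPERVISOR : StrategyPriority.CANT_HANDLE;
        DataMotionStrategy linstor = pools ->
                pools.stream().allMatch("Linstor"::equals)
                        ? StrategyPriority.HIGHEST : StrategyPriority.CANT_HANDLE;

        List<DataMotionStrategy> all = List.of(ancient, kvmNonManaged, linstor);
        // All-Linstor destinations select the Linstor strategy over the others.
        System.out.println(pick(all, List.of("Linstor")) == linstor);
        // NFS destinations still go to the KVM non-managed strategy.
        System.out.println(pick(all, List.of("NFS")) == kvmNonManaged);
    }
}
```

Because HIGHEST outranks HYPERVISOR and DEFAULT in the enum order, simply returning it for all-Linstor destinations is enough to win the selection without touching the other strategies.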
canHandle semantics:
Offline (DataObject, DataObject):
Always returns CANT_HANDLE. Offline volume copies continue to use
existing paths (AncientDMS or driver canCopy). A native Linstor
offline copy (e.g. DRBD clone) can be added in a future commit.
Live (Map<VolumeInfo,DataStore>, Host, Host):
Returns HIGHEST when ALL destination DataStores are Linstor pools.
The source pools can be anything (Linstor, StorPool, SMP, NFS, ...),
enabling cross-storage live migration *to* Linstor.
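The live canHandle predicate reduces to "every destination pool is Linstor". A hypothetical, simplified version, with a plain map of volume name to destination provider standing in for the real VolumeInfo/DataStore map:

```java
import java.util.Map;

public class LinstorCanHandle {
    enum StrategyPriority { CANT_HANDLE, HIGHEST }

    // Return HIGHEST only when every destination pool is Linstor.
    // Source pools are never inspected, which is exactly what permits
    // cross-storage live migration *to* Linstor from any provider.
    static StrategyPriority canHandleLive(Map<String, String> volumeToDestProvider) {
        boolean allLinstor = !volumeToDestProvider.isEmpty()
                && volumeToDestProvider.values().stream().allMatch("Linstor"::equals);
        return allLinstor ? StrategyPriority.HIGHEST : StrategyPriority.CANT_HANDLE;
    }

    public static void main(String[] args) {
        // Mixed destinations must not be claimed; a single non-Linstor
        // pool pushes the whole migration back to other strategies.
        System.out.println(canHandleLive(Map.of("ROOT-42", "Linstor", "DATA-7", "Linstor")));
        System.out.println(canHandleLive(Map.of("ROOT-42", "Linstor", "DATA-7", "NFS")));
    }
}
```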
Live migration flow (copyAsync with volumeMap):
For each volume in the migration set:
1. Create a destination VolumeVO record in the database
(duplicateVolumeOnAnotherStorage).
2. If cross-storage (src is not Linstor, or different Linstor controller):
create a new DRBD resource via the Linstor REST API
(resourceGroupSpawn on the destination pool's resource group).
3. Ensure the resource is available on the destination KVM host
(resourceMakeAvailableOnNode). For same-controller Linstor->Linstor,
DRBD already has the data replicated so this is a lightweight diskless
attach or no-op.
4. Set DRBD allow-two-primaries so both source and destination hosts can
have the device open read-write simultaneously during migration.
Uses ResourceDefinition-level properties when both nodes are diskless
(DRBD client topology), or ResourceConnection-level properties when
nodes are hyperconverged (have local disks).
5. Build MigrateDiskInfo with DiskType=BLOCK, DriverType=RAW, Source=DEV,
and destPath=/dev/drbd/by-res/<rscName>/0. This tells libvirt's
replaceStorage() to modify the VM's disk XML for block-copy migration.
6. Send PrepareForMigrationCommand to destination host, then
MigrateCommand (with migrateStorageManaged=true) to source host.
Libvirt performs the actual block copy using VIR_MIGRATE_NON_SHARED_DISK.
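Step 5 can be sketched as below. The MigrateDiskInfo record here is a toy stand-in (the real class lives in the agent API and carries more fields), and the resource name "cs-a1b2" is an invented example; the point is the shape of the DRBD device path handed to libvirt's block-copy migration.

```java
public class MigrateDiskInfoSketch {
    // Simplified value class; field names mirror the fields set in step 5.
    record MigrateDiskInfo(String serial, String diskType, String driverType,
                           String source, String path) {}

    static MigrateDiskInfo forLinstorVolume(String volumeUuid, String rscName) {
        // DRBD exposes one volume (index 0) per CloudStack volume,
        // hence the trailing /0 in the by-res device path.
        String destPath = "/dev/drbd/by-res/" + rscName + "/0";
        return new MigrateDiskInfo(volumeUuid, "BLOCK", "RAW", "DEV", destPath);
    }

    public static void main(String[] args) {
        MigrateDiskInfo info = forLinstorVolume("a1b2", "cs-a1b2");
        System.out.println(info.path()); // /dev/drbd/by-res/cs-a1b2/0
    }
}
```

Using BLOCK/RAW/DEV (instead of the FILE/QCOW2 the KVM non-managed strategy generated) is what makes libvirt treat the destination as a raw block device during the copy.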
Post-migration success:
- Remove allow-two-primaries from all resources
- Swap volume UUIDs between source and destination (updateUuid)
- Destroy and expunge source volumes
- Update snapshot references to point to new volume IDs
Post-migration failure:
- Remove allow-two-primaries
- Delete destination Linstor resources unconditionally (not just diskless)
- Delete resource definitions if no resources remain
- Expunge destination volumes (DB records)
- Rollback PrepareForMigration on destination host
Error handling:
- On early failure (before MigrateCommand), handlePostMigration(false)
is called from the catch block to ensure Linstor resources are cleaned
up and not left orphaned.
- viewResources API errors are logged as warnings and creation is
attempted regardless (rather than silently assuming resource absence).
- applyAuxProps errors are logged but non-fatal.
Spring context:
Register the LinstorDataMotionStrategy bean in
spring-storage-volume-linstor-context.xml so StorageStrategyFactoryImpl
discovers it via auto-wiring.
Files:
- NEW: plugins/storage/volume/linstor/src/main/java/org/apache/
cloudstack/storage/motion/LinstorDataMotionStrategy.java
- MOD: plugins/storage/volume/linstor/src/main/resources/META-INF/
cloudstack/storage-volume-linstor/
spring-storage-volume-linstor-context.xml
Fix qemu-img convert failures when copying StorPool volumes to secondary
storage (NFS) during offline volume migration.
Symptom:
Offline migration of a StorPool volume to another primary storage (e.g.
Linstor) fails with:
qemu-img: error while writing sector 4202495: Invalid argument
This happens in StorPoolCopyVolumeToSecondaryCommand which creates a
temporary StorPool snapshot, attaches it as a block device, and copies
it via qemu-img convert to a file on NFS secondary storage.
Root cause:
The basic QemuImg(timeout) constructor creates qemu-img convert commands
without any I/O mode flags. On certain NFS configurations, buffered I/O
can cause EINVAL errors at specific sector boundaries when writing large
volumes from a raw block device source.
Fix:
Use the 3-parameter constructor QemuImg(timeout, skipZero=false, noCache=true):
- skipZero=false: Do NOT enable --target-is-zero. This flag is only safe
when the target device is guaranteed pre-zeroed (e.g. thin-provisioned
block devices like LVM_THIN or ZFS_THIN). NFS files are NOT pre-zeroed,
so enabling this flag would cause silent data corruption by skipping
zero-filled sectors that the target still contains stale data for.
The Linstor storage adaptor handles this correctly by checking
LinstorUtil.resourceSupportZeroBlocks() before enabling skipZero.
- noCache=true: Enable direct I/O (-t none) which bypasses the kernel
page cache. This ensures writes are flushed directly to the NFS server,
avoiding cache-related EINVAL errors at sector boundaries and improving
reliability for large volume copies.
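How the two flags map onto qemu-img convert arguments can be illustrated with a toy builder. This is not CloudStack's QemuImg class (which assembles the command differently); it only models the flag semantics described above.

```java
import java.util.ArrayList;
import java.util.List;

public class QemuConvertFlags {
    // Illustrative only: translate the skipZero/noCache booleans into
    // the qemu-img convert arguments they correspond to.
    static List<String> convertArgs(boolean skipZero, boolean noCache,
                                    String src, String dst) {
        List<String> args = new ArrayList<>(List.of("qemu-img", "convert"));
        if (noCache) {
            args.add("-t");   // target cache mode:
            args.add("none"); // direct I/O, bypassing the kernel page cache
        }
        if (skipZero) {
            // Only safe when the target is guaranteed pre-zeroed;
            // never true for a fresh file on NFS secondary storage.
            args.add("--target-is-zero");
        }
        args.add(src);
        args.add(dst);
        return args;
    }

    public static void main(String[] args) {
        // The fixed call path: skipZero=false, noCache=true
        System.out.println(convertArgs(false, true,
                "/dev/storpool-snap", "/mnt/secondary/vol.raw"));
    }
}
```

With skipZero=false and noCache=true the resulting command carries "-t none" and omits "--target-is-zero", which is exactly the combination the fix selects for the NFS target.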
Impact:
Only affects the StorPoolCopyVolumeToSecondaryCommandWrapper code path,
which is used during offline volume migration when StorPool is the source
and the copy goes through secondary (NFS) storage. StorPool-to-StorPool
copies use native StorPool cloning and are not affected.
Files:
- MOD: plugins/storage/volume/storpool/src/main/java/com/cloud/hypervisor/
kvm/resource/wrapper/StorPoolCopyVolumeToSecondaryCommandWrapper.java
Summary
Two fixes for VM migration with Linstor and StorPool storage:
Commit 1: LinstorDataMotionStrategy
A new DataMotionStrategy enabling VM live migration to Linstor pools.

Problem
CloudStack had no DataMotionStrategy claiming live migrations when the
destination is a Linstor pool. Three scenarios were broken:
- Linstor -> Linstor: blocked (no strategy claimed it)
- SMP -> Linstor: KvmNonManagedDMS claimed it but generated wrong
  DiskType=FILE, DriverType=QCOW2, and used the resource group name as
  device path instead of /dev/drbd/by-res/...
- StorPool -> Linstor: blocked (no strategy claimed it)

Solution
New LinstorDataMotionStrategy bean that returns HIGHEST priority when all
destination pools are Linstor. Works for any source storage type (Linstor,
StorPool, NFS, SMP, ...).

Live migration flow per volume:
1. Create the destination VolumeVO in the DB
2. resourceGroupSpawn (cross-storage case)
3. resourceMakeAvailableOnNode
4. Set allow-two-primaries (ResourceDefinition or ResourceConnection
   level, depending on topology)
5. Build MigrateDiskInfo with DiskType=BLOCK, DriverType=RAW,
   destPath=/dev/drbd/by-res/<rscName>/0
6. PrepareForMigrationCommand + MigrateCommand (with
   migrateStorageManaged=true)

Post-migration: UUID swap, source cleanup, allow-two-primaries removal.
On failure: full rollback of Linstor resources and DB records.

Offline migration: returns CANT_HANDLE; existing paths (AncientDMS)
continue to work. Native Linstor offline copy can be added later.

Files
- NEW: plugins/storage/volume/linstor/.../motion/LinstorDataMotionStrategy.java
- MOD: plugins/storage/volume/linstor/.../spring-storage-volume-linstor-context.xml (bean registration)

Commit 2: StorPool qemu-img direct I/O

Problem
Offline migration of StorPool volumes to other storage (e.g. Linstor) fails:

    qemu-img: error while writing sector 4202495: Invalid argument

This happens in StorPoolCopyVolumeToSecondaryCommandWrapper when copying a
StorPool snapshot to NFS secondary storage via qemu-img convert.

Solution
Changed new QemuImg(timeout) to new QemuImg(timeout, false, true):
- skipZero=false: NFS target files are NOT pre-zeroed. Enabling
  --target-is-zero would cause silent data corruption by skipping
  zero-filled sectors.
- noCache=true: enables direct I/O (-t none), bypassing the kernel page
  cache. Fixes EINVAL errors on certain NFS configurations.

Files
- MOD: plugins/storage/volume/storpool/.../wrapper/StorPoolCopyVolumeToSecondaryCommandWrapper.java