Skip to content

KVM: Enable HA heartbeat on ShareMountPoint#12773

Open
weizhouapache wants to merge 16 commits intoapache:4.22from
weizhouapache:4.22-kvm-ha-on-shared-mount-point
Open

KVM: Enable HA heartbeat on ShareMountPoint#12773
weizhouapache wants to merge 16 commits intoapache:4.22from
weizhouapache:4.22-kvm-ha-on-shared-mount-point

Conversation

@weizhouapache
Copy link
Member

Description

This PR improves KVM HA heartbeat to support SharedMountPoint

Tested

  • add shared mount point pool
  • kvm hosts write heartbeat to the mount point
  • deploy vms
  • set global settingforce.ha to true
  • force shutdown one of the kvm hosts
  • after 3 mins, the VMs are started on other kvm hosts.

Example of heartbeat files

[root@kvm1 ~]# ls -l /mnt/cloudstack*/KVMHA
/mnt/cloudstack1/KVMHA:
total 1
-rw-r--r--. 1 root root 11 Mar  9 12:28 hb-10.1.34.125
-rw-r--r--. 1 root root 11 Mar  9 12:09 hb-10.1.35.76

/mnt/cloudstack2/KVMHA:
total 1
-rw-r--r--. 1 root root 11 Mar  9 12:28 hb-10.1.34.125
-rw-r--r--. 1 root root 11 Mar  9 12:09 hb-10.1.35.76

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@weizhouapache weizhouapache force-pushed the 4.22-kvm-ha-on-shared-mount-point branch from 296835a to 93b95aa Compare March 9, 2026 12:47
@codecov
Copy link

codecov bot commented Mar 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.70%. Comparing base (c3d6a8c) to head (a361b28).
⚠️ Report is 5 commits behind head on 4.22.

❗ There is a different number of reports uploaded between BASE (c3d6a8c) and HEAD (a361b28). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (c3d6a8c) HEAD (a361b28)
unittests 1 0
Additional details and impacted files
@@              Coverage Diff              @@
##               4.22   #12773       +/-   ##
=============================================
- Coverage     17.61%    3.70%   -13.91%     
=============================================
  Files          5917      448     -5469     
  Lines        531402    38028   -493374     
  Branches      64971     7036    -57935     
=============================================
- Hits          93586     1409    -92177     
+ Misses       427262    36432   -390830     
+ Partials      10554      187    -10367     
Flag Coverage Δ
uitests 3.70% <ø> (-0.01%) ⬇️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@weizhouapache
Copy link
Member Author

@blueoranguran package

@weizhouapache weizhouapache requested a review from Copilot March 9, 2026 16:02
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enhances KVM HA heartbeat to support SharedMountPoint primary storage by introducing a new heartbeat script and extending HA support checks across KVM storage and monitor components.

Changes:

  • Added a new KVM “SharedMountPoint” heartbeat script (kvmsmpheartbeat.sh) that writes heartbeat files to a mountpoint.
  • Expanded HA-supported storage pool types to include SharedMountPoint in driver, monitor, and pool logic.
  • Added a setType hook to KVMStoragePool and set the pool type after creation / retrieval to ensure HA detection works.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
scripts/vm/hypervisor/kvm/kvmsmpheartbeat.sh New local mountpoint-based heartbeat writer/checker script for SharedMountPoint HA.
plugins/storage/volume/default/.../CloudStackPrimaryDataStoreDriverImpl.java Allows HA support for SharedMountPoint in primary storage capability logic.
plugins/hypervisors/kvm/.../LibvirtStoragePool.java Enables HA support for SharedMountPoint and selects the correct heartbeat script.
plugins/hypervisors/kvm/.../KVMStoragePoolManager.java Sets pool type after creation; exposes getStorageAdaptor.
plugins/hypervisors/kvm/.../KVMStoragePool.java Introduces a default setType method for backward compatibility.
plugins/hypervisors/kvm/.../LibvirtCheckVMActivityOnStoragePoolCommandWrapper.java Ensures retrieved pool has its type set before HA checks.
plugins/hypervisors/kvm/.../KVMHAMonitor.java Expands heartbeat monitoring to include SharedMountPoint pools.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✖️ debian ✔️ suse15. SL-JID 17048

@weizhouapache
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17054

@weizhouapache
Copy link
Member Author

@blueorangutan test

@blueorangutan
Copy link

@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@weizhouapache weizhouapache added this to the 4.22.1 milestone Mar 10, 2026
@weizhouapache
Copy link
Member Author

@sureshanaparti @rajujith
added this to 4.22.1 milestone, a small improvement but may benefit several users

cc @DaanHoogland @NuxRo

@blueorangutan
Copy link

[SF] Trillian test result (tid-15586)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 55643 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12773-t15586-kvm-ol8.zip
Smoke tests completed. 146 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
ContextSuite context=TestListIdsParams>:teardown Error 1.12 test_list_ids_parameter.py
test_01_snapshot_root_disk Error 7.74 test_snapshots.py
test_02_list_snapshots_with_removed_data_store Error 47.53 test_snapshots.py
test_02_list_snapshots_with_removed_data_store Error 47.54 test_snapshots.py
ContextSuite context=TestSnapshotStandaloneBackup>:teardown Error 31.34 test_snapshots.py
test_01_snapshot_usage Error 25.61 test_usage.py
test_01_vpn_usage Error 1.08 test_usage.py

weizhouapache and others added 3 commits March 10, 2026 17:01
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@rajujith rajujith moved this to In Progress in Apache CloudStack 4.22.1 Mar 11, 2026
Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStoragePoolManager.java:399

  • isPoolSupportHA() now includes SharedMountPoint, so this block will add SharedMountPoint primary pools to _haMonitor. However, deleteStoragePool(...) below still removes pools from _haMonitor only when type == NetworkFilesystem, which can leave SharedMountPoint pools registered in the HA monitor after deletion/disconnect. Update the removal condition to stay consistent with the expanded HA-support set.
    private synchronized KVMStoragePool createStoragePool(String name, String host, int port, String path, String userInfo, StoragePoolType type, Map<String, String> details, boolean primaryStorage) {
        StorageAdaptor adaptor = getStorageAdaptor(type);
        KVMStoragePool pool = adaptor.createStoragePool(name, host, port, path, userInfo, type, details, primaryStorage);
        pool.setType(type);

        // LibvirtStorageAdaptor-specific statement
        if (pool.isPoolSupportHA() && primaryStorage) {
            KVMHABase.HAStoragePool storagePool = new KVMHABase.HAStoragePool(pool, host, path, PoolType.PrimaryStorage);
            _haMonitor.addStoragePool(storagePool);
        }

plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/KVMHAMonitor.java:145

  • The debug log messages in this method are now used for both NetworkFilesystem and SharedMountPoint, but still say "Found NFS storage pool". Also, this "Found ... continuing" message is logged even after the pool is marked for removal (e.g. when storage == null). Adjust the log text/placement to reflect "libvirt storage pool" and only log "found" when the pool is actually present and running.
    private void checkForNotExistingLibvirtStoragePools(Set<String> removedPools, String uuid) {
        try {
            Connect conn = LibvirtConnection.getConnection();
            StoragePool storage = conn.storagePoolLookupByUUIDString(uuid);
            if (storage == null || storage.getInfo().state != StoragePoolState.VIR_STORAGE_POOL_RUNNING) {
                if (storage == null) {
                    logger.debug(String.format("Libvirt storage pool [%s] not found, removing from HA list.", uuid));
                } else {
                    logger.debug(String.format("Libvirt storage pool [%s] found, but not running, removing from HA list.", uuid));
                }

                removedPools.add(uuid);
            }

            logger.debug(String.format("Found NFS storage pool [%s] in libvirt, continuing.", uuid));


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@sureshanaparti sureshanaparti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sureshanaparti
Copy link
Contributor

@weizhouapache any doc update required for this?

weizhouapache and others added 3 commits March 12, 2026 10:27
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

weizhouapache and others added 3 commits March 12, 2026 11:01
…/storage/LibvirtStoragePool.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStoragePoolManager.java:399

  • Now that pool.isPoolSupportHA() can be true for SharedMountPoint (via LIBVIRT_STORAGE_POOL_TYPES_WITH_HA_SUPPORT) and such pools will be added to _haMonitor here, the corresponding deleteStoragePool(...) methods in this class still only call _haMonitor.removeStoragePool(uuid) for StoragePoolType.NetworkFilesystem. This can leave stale entries for deleted SharedMountPoint pools and keep running unnecessary heartbeat attempts. Consider updating the delete path to remove HA pools for all HA-supported libvirt storage pool types.
        // LibvirtStorageAdaptor-specific statement
        if (pool.isPoolSupportHA() && primaryStorage) {
            KVMHABase.HAStoragePool storagePool = new KVMHABase.HAStoragePool(pool, host, path, PoolType.PrimaryStorage);
            _haMonitor.addStoragePool(storagePool);
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

weizhouapache and others added 2 commits March 12, 2026 12:54
@weizhouapache weizhouapache requested a review from Copilot March 12, 2026 11:56
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@weizhouapache
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@weizhouapache weizhouapache marked this pull request as ready for review March 12, 2026 12:25
@weizhouapache
Copy link
Member Author

thanks @DaanHoogland @sureshanaparti , and Copilot for review

@NuxRo
I think this is ready for testing now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

KV Host HA/instance HA with shared mountpoint

5 participants