Skip to content

feat(search): reindex-drift + index health checks in /system/validate#28856

Merged
pmbrull merged 6 commits into
mainfrom
pmbrull/search-reindex-health
Jun 10, 2026
Merged

feat(search): reindex-drift + index health checks in /system/validate#28856
pmbrull merged 6 commits into
mainfrom
pmbrull/search-reindex-health

Conversation

@pmbrull

@pmbrull pmbrull commented Jun 9, 2026

Copy link
Copy Markdown
Member

fix #28849

Screenshot 2026-06-09 at 11 53 08

Summary

Adds a "Search Reindex Status" step to /system/validate that detects when a deployed search index was not built from the current code index-mapping — so operators know a reindex is pending (stale search silently breaks RBAC / Data Quality / facets).

  • IndexMappingVersionTracker.computeDrift() classifies each entity CURRENT / STALE / UNTRACKED by comparing the current code-mapping hash to the hash stored when the index was last (re)built.
  • The stored hash is now stamped at every successful reindex seam, so the signal is reliable — SearchIndexApp.execute() (UI / scheduled / CLI) and SearchRepository.createIndexes() (fresh install / drop-create), gated on a COMPLETED run and only the entities actually rebuilt. The redundant CLI-only stamp in OpenMetadataOperations is removed.
  • The check fails on stale or missing indexes (both reindex-fixable) and additionally reports orphan indexes (zero-alias rebuild leftovers) and a degraded cluster as non-failing warnings.

Replaces an earlier curated-field "canary" check that only inspected a hardcoded subset of fields on a hardcoded subset of entities.

Test Plan

  • IndexMappingVersionTrackerTest (13), SystemRepositoryReindexStatusTest (8), SearchRepositoryBehaviorTest (110) — all green; mvn -pl openmetadata-service spotless:check clean.
  • Manual on a live stack: run a reindex, confirm /system/status "Search Reindex Status" passes and index_mapping_versions is populated; tamper one stored hash and re-check → that entity reported as reindex-pending.

🤖 Generated with Claude Code

Add a "Search Reindex Status" step to /system/validate that detects when a
deployed search index was not built from the current code index-mapping:

- IndexMappingVersionTracker.computeDrift() classifies each entity as
  CURRENT / STALE / UNTRACKED by comparing the current code-mapping hash
  to the hash stored when the index was last (re)built.
- The stored hash is now stamped at every successful reindex seam so the
  signal is reliable: SearchIndexApp.execute() (UI/scheduled/CLI) and
  SearchRepository.createIndexes() (fresh install/drop-create), gated on a
  COMPLETED run and the entities actually (re)built. The redundant CLI-only
  stamp in OpenMetadataOperations is removed.
- The check fails on stale or missing indexes (both reindex-fixable) and
  additionally reports orphan indexes (zero-alias rebuild leftovers) and a
  degraded cluster as non-failing warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 9, 2026 09:09

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

✅ PR checks passed

The linked issue has a description and all required Shipping project fields set. Thanks!

@github-actions github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Jun 9, 2026
- Resolve index-mapping version from /catalog/VERSION via a shared
  IndexMappingVersionTracker.create() factory instead of the
  always-null project.version sysprop + "1.8.0-SNAPSHOT" fallback,
  removing the duplicated literal from all four call sites.
- Surface drift-computation failure as a failed reindex-status step
  ("Could not determine reindex status") instead of a false all-clear.
- Exclude vectorEmbedding from drift classification when semantic
  search is disabled, mirroring findMissingIndexes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 9, 2026 11:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Resolve conflicts from #28885 (recreate indexes on major/minor upgrade):
- IndexMappingVersionTracker: keep both drift detection
  (computeDrift/classifyState) and version-upgrade full reindex
  (requiresFullReindexForVersionUpgrade); unify subset stamping on
  Collection<String> so Set and List callers all compile.
- OpenMetadataOperations: adopt main's planSmartReindex refactor and
  post-reindex version stamping (re-add shouldUpdateVersions).
- IndexMappingVersionTrackerTest: keep both drift and version-upgrade
  test suites (23 tests pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move reindex-drift version stamping from SearchIndexApp/SearchRepository
into DefaultRecreateHandler, where staged-index promotion happens. Each
entity is stamped at its successful alias swap in finalizeReindex and
promoteEntityIndex, so all promotion paths (single-node createIndexes,
distributed coordinator finalize, distributed per-entity promote) stamp
at the seam instead of a blanket job-end pass.

Add IndexMappingVersionTracker.updateMappingVersion(String) to hash only
the promoted entity rather than rehashing every entity per promotion.

Fixes the prior whole-job COMPLETED gate: partial jobs no longer leave
promoted entities unstamped, and an entity whose alias swap failed is no
longer falsely stamped as current.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 10, 2026 08:00

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

…onTracker

Resolve the /catalog/VERSION read and the JSON ObjectMappers once as
static finals instead of per-entity, since stampPromoted now creates a
tracker per promotion. getOpenMetadataServerVersion never throws (returns
"unknown" on failure), so eager static init is safe.

Also document at stampPromoted that the mapping-version stamp tracks
schema currency (the index was built from the current indexMapping.json),
not content completeness, so partial promotions are intentionally stamped
current; an incomplete reindex surfaces in the reindex job stats.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sonarqubecloud

Copy link
Copy Markdown

@pmbrull pmbrull merged commit 1cd919e into main Jun 10, 2026
58 of 66 checks passed
@pmbrull pmbrull deleted the pmbrull/search-reindex-health branch June 10, 2026 13:45
@gitar-bot

gitar-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown
Code Review ✅ Approved 5 resolved / 5 findings

Implements a robust search index health and reindex-drift validation in /system/validate, ensuring indexing alignment via reliable version stamping. Resolved concerns regarding silent drift failures, redundant version fallbacks, and improper mapping promotion triggers.

✅ 5 resolved
Quality: Hardcoded project-version fallback duplicated in 3 sites

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexApp.java:82-84 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:347-349 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:1016-1018
The fallback version literal "1.8.0-SNAPSHOT" (and the "system" updatedBy literal) is duplicated in three new code paths that construct an IndexMappingVersionTracker: SearchIndexApp.stampReindexedMappings (line 82), SearchRepository.stampRecreatedMappings (line 347), and SystemRepository.computeMappingDrift (line 1016). This violates the project's "no magic strings" guideline and is a maintenance hazard: when the release version moves past 1.8.0-SNAPSHOT, any environment that does not set the project.version system property will stamp/compare with a stale hardcoded version. While drift detection itself keys off the mapping hash (not the version) so this won't cause false drift, the persisted version column becomes misleading. Extract a single shared constant (or a small helper) for the default version and the system actor and reuse it across all three call sites.

Edge Case: Drift compute failure silently reports indexes as up-to-date

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:1012-1026 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:886-900
computeMappingDrift initializes status to an empty ReindexStatus(new ArrayList<>(), 0) and, on any exception (e.g. DB read of index_mapping_versions fails, mapping load fails), only logs a warning and returns that empty status. Combined with getReindexStatusValidation, an empty drift means reindexNeeded() is false and buildReindexStatusMessage emits "All deployed indexes were built from the current code mappings." with the step marked passed=true. So a failure to actually compute drift is indistinguishable from a genuinely healthy state — the validation can report a clean bill of health while having verified nothing. Since the stated goal of this feature is a reliable reindex-pending signal, consider surfacing the compute failure in the message (e.g. "could not determine reindex status") rather than defaulting to the all-clear text. The same defensive defaulting applies to findOrphanIndexes (returns empty) and isSearchClusterHealthy (returns healthy) on exception, but those are explicitly non-failing warnings so the impact there is lower.

Edge Case: vectorEmbedding may be flagged stale when semantic search off

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:964-978 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:1012-1026 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:861-875
findMissingIndexes deliberately skips the vectorEmbedding index when semantic search is disabled, so it is never added to missingIndexes. As a result computeMappingDrift does NOT remove vectorEmbedding from existingIndexes (line 1019-1020), and computeDrift() classifies it from the full IndexMappingLoader set. If a stored hash exists for vectorEmbedding and its code mapping has since changed, it will be classified STALE and reported in stalePending, causing the entire "Search Reindex Status" step to FAIL even though semantic/vector search is disabled and that index is not in use. This may produce a false-positive validation failure on deployments that ran a reindex while semantic search was enabled and later disabled it. Consider applying the same isVectorEmbeddingEnabled() exclusion to the drift classification (e.g. drop vectorEmbedding from existingIndexes when semantic search is off), mirroring the logic already in findMissingIndexes.

Edge Case: Mapping version stamped even on partial (failed) reindex promotion

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/DefaultRecreateHandler.java:104-118 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/DefaultRecreateHandler.java:204 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/DefaultRecreateHandler.java:386
In finalizeReindex (and the analogous block in promoteEntityIndex), stampPromoted(entityType) runs after any successful alias swap — including the "partial data > no data" path where reindexSuccess == false but the staged index had documents (or an unknown doc count). In that case the entity's mapping hash is recorded as CURRENT, so /system/validate reindex-drift will report the index as healthy even though the reindex did not actually complete and the index content may be incomplete. The PR description states stamping is "gated on a COMPLETED run", but at this seam the gate is promotion success, not reindex success.

This is acceptable if drift detection is intended to track only mapping-schema currency (the index was built with the current mapping), but it can mask an incomplete reindex from operators who rely on the validate check. Consider either passing reindexSuccess through and only stamping on a fully successful reindex, or documenting explicitly that drift tracks schema version and not content completeness.

Performance: Per-entity stampPromoted re-reads VERSION + 4 mapping files each promotion

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/DefaultRecreateHandler.java:416-425 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/IndexMappingVersionTracker.java:42-48 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/IndexMappingVersionTracker.java:163-170 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/IndexMappingVersionTracker.java:241-255
stampPromoted is invoked once per promoted entity and creates a fresh IndexMappingVersionTracker via IndexMappingVersionTracker.create(), which calls VersionUtils.getOpenMetadataServerVersion("/catalog/VERSION") (a classpath resource read) on every call. updateMappingVersion then calls loadMappingForEntity, which constructs a new ObjectMapper() and reads/parses up to 4 language mapping files from the classpath for that entity before hashing. Across a full reindex of ~50 entity types this multiplies the resource I/O and parsing that previously happened once at job end. This is not a tight inner loop (once per entity at promotion time), so impact is modest, but caching the server version and/or reusing a single tracker/ObjectMapper would remove the redundancy.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Search - Add index drift validation & multi-lang tests

3 participants