feat(search): reindex-drift + index health checks in /system/validate #28856
Add a "Search Reindex Status" step to /system/validate that detects when a deployed search index was not built from the current code index-mapping:
- IndexMappingVersionTracker.computeDrift() classifies each entity as CURRENT / STALE / UNTRACKED by comparing the current code-mapping hash to the hash stored when the index was last (re)built.
- The stored hash is now stamped at every successful reindex seam so the signal is reliable: SearchIndexApp.execute() (UI/scheduled/CLI) and SearchRepository.createIndexes() (fresh install/drop-create), gated on a COMPLETED run and the entities actually (re)built. The redundant CLI-only stamp in OpenMetadataOperations is removed.
- The check fails on stale or missing indexes (both reindex-fixable) and additionally reports orphan indexes (zero-alias rebuild leftovers) and a degraded cluster as non-failing warnings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
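The classification described in this commit can be sketched as follows. This is a minimal illustration, not the code in IndexMappingVersionTracker: the class name DriftSketch, the method signatures, and the use of plain string maps for the hashes are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names throughout; only the three state names come from the PR text.
enum DriftState { CURRENT, STALE, UNTRACKED }

final class DriftSketch {
  private DriftSketch() {}

  /** Classifies one entity by comparing the code-mapping hash with the stored hash. */
  static DriftState classify(String currentHash, String storedHash) {
    if (storedHash == null) {
      return DriftState.UNTRACKED; // no hash was ever stamped for this entity
    }
    return storedHash.equals(currentHash) ? DriftState.CURRENT : DriftState.STALE;
  }

  /** Classifies every entity that has a mapping in the current code. */
  static Map<String, DriftState> computeDrift(
      Map<String, String> currentHashes, Map<String, String> storedHashes) {
    Map<String, DriftState> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : currentHashes.entrySet()) {
      String stored = storedHashes.get(entry.getKey());
      result.put(entry.getKey(), classify(entry.getValue(), stored));
    }
    return result;
  }
}
```

In this sketch an entity whose stored hash differs from the code hash is STALE, and an entity with no stored hash at all is UNTRACKED, which matches the distinction the commit message draws.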
✅ PR checks passed
The linked issue has a description and all required Shipping project fields set. Thanks!
- Resolve index-mapping version from /catalog/VERSION via a shared
IndexMappingVersionTracker.create() factory instead of the
always-null project.version sysprop + "1.8.0-SNAPSHOT" fallback,
removing the duplicated literal from all four call sites.
- Surface drift-computation failure as a failed reindex-status step
("Could not determine reindex status") instead of a false all-clear.
- Exclude vectorEmbedding from drift classification when semantic
search is disabled, mirroring findMissingIndexes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
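Two of the behaviours in this commit, failing the step when drift cannot be computed and skipping vectorEmbedding when semantic search is off, can be illustrated with a small sketch. The class ReindexStatusSketch, the Step type, and every message other than "Could not determine reindex status" are hypothetical; drift states are modelled as plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical sketch; not the SystemRepository implementation.
final class ReindexStatusSketch {
  static final class Step {
    final boolean passed;
    final String message;

    Step(boolean passed, String message) {
      this.passed = passed;
      this.message = message;
    }
  }

  private ReindexStatusSketch() {}

  /** Builds the validation step from a drift source that may throw. */
  static Step check(Callable<Map<String, String>> driftSource, boolean semanticSearchEnabled) {
    Map<String, String> drift;
    try {
      drift = driftSource.call();
    } catch (Exception e) {
      // A failed computation must not look like an all-clear.
      return new Step(false, "Could not determine reindex status");
    }
    List<String> pending = new ArrayList<>();
    for (Map.Entry<String, String> entry : drift.entrySet()) {
      if (!semanticSearchEnabled && "vectorEmbedding".equals(entry.getKey())) {
        continue; // this index is not expected to exist when semantic search is disabled
      }
      if (!"CURRENT".equals(entry.getValue())) {
        pending.add(entry.getKey());
      }
    }
    Collections.sort(pending);
    return pending.isEmpty()
        ? new Step(true, "All search indexes are current")
        : new Step(false, "Reindex pending for: " + String.join(", ", pending));
  }
}
```

The point of the design is that the exception path returns a failed step rather than an empty drift result, since an empty result would be indistinguishable from "nothing is stale".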
Resolve conflicts from #28885 (recreate indexes on major/minor upgrade):
- IndexMappingVersionTracker: keep both drift detection (computeDrift/classifyState) and version-upgrade full reindex (requiresFullReindexForVersionUpgrade); unify subset stamping on Collection<String> so Set and List callers all compile.
- OpenMetadataOperations: adopt main's planSmartReindex refactor and post-reindex version stamping (re-add shouldUpdateVersions).
- IndexMappingVersionTrackerTest: keep both drift and version-upgrade test suites (23 tests pass).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move reindex-drift version stamping from SearchIndexApp/SearchRepository into DefaultRecreateHandler, where staged-index promotion happens. Each entity is stamped at its successful alias swap in finalizeReindex and promoteEntityIndex, so all promotion paths (single-node createIndexes, distributed coordinator finalize, distributed per-entity promote) stamp at the seam instead of a blanket job-end pass.
Add IndexMappingVersionTracker.updateMappingVersion(String) to hash only the promoted entity rather than rehashing every entity per promotion.
Fixes the prior whole-job COMPLETED gate: partial jobs no longer leave promoted entities unstamped, and an entity whose alias swap failed is no longer falsely stamped as current.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
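A rough sketch of stamping at the alias-swap seam, as opposed to a single pass at job end, might look like the following. PromotionStampSketch and its members are hypothetical and stand in for the interaction between DefaultRecreateHandler and the tracker; the alias swap is modelled as a predicate that returns true on success.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch; the real handler and tracker have different signatures.
final class PromotionStampSketch {
  private final Map<String, String> storedHashes = new HashMap<>();
  private final Function<String, String> mappingHashOf;

  PromotionStampSketch(Function<String, String> mappingHashOf) {
    this.mappingHashOf = mappingHashOf;
  }

  /** Promotes each entity and stamps it only if its own alias swap succeeded. */
  void promote(Collection<String> entityTypes, Predicate<String> aliasSwap) {
    for (String entityType : entityTypes) {
      boolean swapped;
      try {
        swapped = aliasSwap.test(entityType);
      } catch (RuntimeException e) {
        swapped = false; // a failed swap must not be stamped as current
      }
      if (swapped) {
        // Hash only the promoted entity instead of rehashing every entity.
        storedHashes.put(entityType, mappingHashOf.apply(entityType));
      }
    }
  }

  /** Returns the stamped hash, or null if the entity was never stamped. */
  String storedHash(String entityType) {
    return storedHashes.get(entityType);
  }
}
```

Because the stamp is tied to each entity's own swap result, a job that promotes some entities and fails on others leaves exactly the promoted ones marked current.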
…onTracker
Resolve the /catalog/VERSION read and the JSON ObjectMappers once as static finals instead of per-entity, since stampPromoted now creates a tracker per promotion. getOpenMetadataServerVersion never throws (returns "unknown" on failure), so eager static init is safe.
Also document at stampPromoted that the mapping-version stamp tracks schema currency (the index was built from the current indexMapping.json), not content completeness, so partial promotions are intentionally stamped current; an incomplete reindex surfaces in the reindex job stats.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
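The reasoning about eager static initialization can be shown with a short sketch: a static final field is safe to initialize at class-load time only if the initializer cannot throw. VersionHolderSketch is a hypothetical class, and the assumption that /catalog/VERSION is a properties-style resource with a "version" key is made only for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical sketch; the resource format and the "version" key are assumptions.
final class VersionHolderSketch {
  /** Resolved once per class load rather than once per tracker instance. */
  static final String SERVER_VERSION = readVersionOrUnknown();

  private VersionHolderSketch() {}

  /** Never throws: any failure falls back to "unknown", so static init cannot fail. */
  static String readVersionOrUnknown() {
    try (InputStream in = VersionHolderSketch.class.getResourceAsStream("/catalog/VERSION")) {
      if (in == null) {
        return "unknown"; // resource not on the classpath
      }
      Properties properties = new Properties();
      properties.load(in);
      return properties.getProperty("version", "unknown");
    } catch (IOException | RuntimeException e) {
      return "unknown";
    }
  }
}
```

If the initializer could throw, the first use of the class would fail with ExceptionInInitializerError, which is why the never-throws guarantee matters here.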
Code Review ✅ Approved (5 resolved / 5 findings)
Implements a robust search index health and reindex-drift validation in /system/validate, ensuring indexing alignment via reliable version stamping. Resolved concerns regarding silent drift failures, redundant version fallbacks, and improper mapping promotion triggers.
✅ 5 resolved
✅ Quality: Hardcoded project-version fallback duplicated in 3 sites
✅ Edge Case: Drift compute failure silently reports indexes as up-to-date
✅ Edge Case: vectorEmbedding may be flagged stale when semantic search off
✅ Edge Case: Mapping version stamped even on partial (failed) reindex promotion
✅ Performance: Per-entity stampPromoted re-reads VERSION + 4 mapping files each promotion



fix #28849
Summary
Adds a "Search Reindex Status" step to /system/validate that detects when a deployed search index was not built from the current code index-mapping, so operators know a reindex is pending (stale search silently breaks RBAC / Data Quality / facets).
- IndexMappingVersionTracker.computeDrift() classifies each entity as CURRENT / STALE / UNTRACKED by comparing the current code-mapping hash to the hash stored when the index was last (re)built.
- The stored hash is stamped at every successful reindex seam: SearchIndexApp.execute() (UI / scheduled / CLI) and SearchRepository.createIndexes() (fresh install / drop-create), gated on a COMPLETED run and only the entities actually rebuilt. The redundant CLI-only stamp in OpenMetadataOperations is removed.

Replaces an earlier curated-field "canary" check that only inspected a hardcoded subset of fields on a hardcoded subset of entities.
Test Plan
- IndexMappingVersionTrackerTest (13), SystemRepositoryReindexStatusTest (8), SearchRepositoryBehaviorTest (110): all green; mvn -pl openmetadata-service spotless:check clean.
- /system/status "Search Reindex Status" passes and index_mapping_versions is populated; tamper one stored hash and re-check → that entity is reported as reindex-pending.

🤖 Generated with Claude Code