feat(search): consumer-field contract test + index mapping health check #28751
pmbrull wants to merge 21 commits into
Guard the denormalized search-index fields that RBAC, Data Quality, Incident Manager, Lineage, and Data Insights query (tags.tagFQN, tier, certification, owners, domains, fqnParts, testCaseStatus, ...). Renaming, retyping, or dropping one silently breaks those consumers with no compile- or boot-time failure.

- SearchConsumerFields: single source of truth for the consumer contract.
- SearchConsumerFieldContractTest: fails CI on a type change or a dropped field in any mapping (across all 4 languages), naming the affected consumers.
- "Index Mapping Consistency" StepValidation on /system/validate: non-blocking health check that reads the live deployed mapping and warns "reindex required" when a core data-asset index is missing these fields (e.g. stale after upgrade). Adds a read-only getIndexFieldNames to the SearchClient (Elasticsearch + OpenSearch).
- Fix jp/topic mapping missing top-level `domains` (found by the new test): JP-locale topic search had lost domain-based RBAC/DQ filtering.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
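A minimal sketch of what a consumer-field contract check can look like, assuming the index `mappings` object has already been parsed into nested `Map`s. The class name, the three-field subset and the consumer labels are illustrative assumptions, not the real `SearchConsumerFields` API; the sketch only shows the idea of resolving each dotted path through nested `properties` blocks and naming the consumers that a missing or retyped field would break.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract check. The field subset and consumer labels are
// illustrative; they are not taken from the real SearchConsumerFields class.
final class ConsumerFieldContractSketch {
  // Dotted field path -> consumers that query it (illustrative subset).
  static final Map<String, String> KEYWORD_FIELDS = new LinkedHashMap<>();

  static {
    KEYWORD_FIELDS.put("tags.tagFQN", "RBAC, Data Quality");
    KEYWORD_FIELDS.put("tier.tagFQN", "RBAC, Data Insights");
    KEYWORD_FIELDS.put("fqnParts", "Lineage");
  }

  // Walks nested "properties" blocks along a dotted path; null if any hop is absent.
  @SuppressWarnings("unchecked")
  static Map<String, Object> resolve(Map<String, Object> mappings, String dottedPath) {
    Map<String, Object> node = mappings;
    for (String part : dottedPath.split("\\.")) {
      Object props = node.get("properties");
      if (!(props instanceof Map)) {
        return null;
      }
      Object child = ((Map<String, Object>) props).get(part);
      if (!(child instanceof Map)) {
        return null;
      }
      node = (Map<String, Object>) child;
    }
    return node;
  }

  // One message per contract field that is missing or not mapped as keyword.
  static List<String> violations(Map<String, Object> mappings) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> e : KEYWORD_FIELDS.entrySet()) {
      Map<String, Object> field = resolve(mappings, e.getKey());
      if (field == null) {
        out.add(e.getKey() + " is missing (breaks " + e.getValue() + ")");
      } else if (!"keyword".equals(field.get("type"))) {
        out.add(e.getKey() + " is " + field.get("type")
            + ", expected keyword (breaks " + e.getValue() + ")");
      }
    }
    return out;
  }
}
```

Running such a check over every language's mapping file is what turns a silent rename into a CI failure that names the affected feature.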
…ract

Index a document into the real en/jp/ru/zh index mapping for topic and test_case, then run the actual search/filter/aggregation each feature uses (tag/tier/certification/domain term filters, owners nested RBAC query, testCaseResult.testCaseStatus aggregation, fqnParts term) and assert it returns the document. A failure reads as a broken feature in a specific language, not "index is missing a mapping", and would have caught the jp/topic domains drop.

Runs against a real OpenSearch testcontainer. Text analyzers are stripped before index creation (the consumer fields are keyword/nested and analyzer-independent), so each language index is creatable without analysis plugins while the real field structure is still validated per language.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
30 of 55 jp index mappings referenced an analyzer the file does not define: an
incomplete om_analyzer <-> om_analyzer_jp split left field-level analyzer /
search_analyzer references dangling, so those jp indexes fail to create
("analyzer [...] has not been configured") whenever searchIndexMappingLanguage is
set to jp. It is invisible on en-default deployments, which never load these
files — the same reason the jp/topic domains drop went unnoticed.
Add the missing analyzer definition to each file, preserving every field's
existing reference: 29 files gain the generic om_analyzer (letter + lowercase +
om_stemmer, matching the other jp mappings) used by identifier fields, and
api_collection gains the kuromoji om_analyzer_jp used by its text fields. No field
reference is changed, so prose stays kuromoji-analyzed while identifier fields
(tagFQN, column names, custom-property text) stay on the generic analyzer,
consistent with en and the already-correct jp mappings. jp still requires the
analysis-kuromoji plugin. Surfaced by SearchConsumerFieldBehaviorIT.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
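Dangling analyzer references of this kind can be detected mechanically. The sketch below is a hypothetical helper, not code from this PR: it collects every `analyzer` / `search_analyzer` name referenced by fields (including object sub-properties and multi-fields) and subtracts the analyzers defined under `settings.analysis.analyzer` plus a caller-supplied set of built-in analyzer names. It assumes the mapping JSON is already parsed into nested `Map`s.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check: analyzer names referenced by fields but defined neither
// in settings.analysis.analyzer nor in the caller's set of built-in analyzers.
final class DanglingAnalyzerCheck {
  static Set<String> dangling(
      Map<String, Object> definedAnalyzers, Map<String, Object> properties, Set<String> builtIns) {
    Set<String> refs = new TreeSet<>();
    collect(properties, refs);
    refs.removeAll(definedAnalyzers.keySet());
    refs.removeAll(builtIns);
    return refs;
  }

  @SuppressWarnings("unchecked")
  private static void collect(Map<String, Object> properties, Set<String> refs) {
    for (Object value : properties.values()) {
      if (!(value instanceof Map)) {
        continue;
      }
      Map<String, Object> field = (Map<String, Object>) value;
      for (String key : new String[] {"analyzer", "search_analyzer"}) {
        Object name = field.get(key);
        if (name instanceof String) {
          refs.add((String) name);
        }
      }
      // Recurse into object/nested sub-properties and multi-fields.
      for (String sub : new String[] {"properties", "fields"}) {
        Object child = field.get(sub);
        if (child instanceof Map) {
          collect((Map<String, Object>) child, refs);
        }
      }
    }
  }
}
```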
Force-pushed from 6016bad to 73282d0.
Build the integration-test OpenSearch container from an image with the language analysis plugins installed (analysis-kuromoji for Japanese) via a shared SearchTestImages helper, wired into both TestSuiteBootstrap and ContainerizedServer. The plugins are installed unconditionally (an English-only run never references them), so the search IT suite can be run for non-English mapping languages. That is what would have caught the jp analyzer drift earlier: CI only ever ran en on a vanilla image, where the jp mappings (kuromoji) could never even be created.

Add OpenSearchLanguageAnalyzerIT: boots the plugin image and creates the real en/ru/jp topic index mappings with their own analyzers (no analysis stripping), asserting that analysis-kuromoji is installed and that the jp mappings, which reference kuromoji and previously had dangling analyzer references, now create cleanly. zh (third-party analysis-ik) is intentionally out of scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…h IT

Run the per-language consumer-contract checks directly in the search IT, so the whole integration suite does not need to run once per language. The container now uses the analysis-plugin image (SearchTestImages), so en/ru/jp index with their real analyzers (jp = kuromoji), which also exercises jp analyzer resolution end to end; only zh (third-party analysis-ik, not installed) is validated structurally with its analysis stripped. Fold the plugin-present and real-jp-index creation checks into this IT and drop the standalone OpenSearchLanguageAnalyzerIT.

Aggregation assertions now compare bucket keys case-insensitively: real keyword fields carry lowercase_normalizer (testCaseResult.testCaseStatus in every language, tier.tagFQN in en only), so the bucket key is lowercased while the feature still returns the asset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
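The case-insensitive bucket comparison amounts to a check like the following sketch (class and method names are illustrative). It assumes the aggregation's bucket keys have already been extracted as strings. `equalsIgnoreCase` only approximates what `lowercase_normalizer` does for non-ASCII text, which is adequate for ASCII keys such as tier and status values.

```java
import java.util.List;

// Hypothetical helper: a keyword field with lowercase_normalizer returns
// lowercased bucket keys, so an exact comparison against the indexed value
// would fail even though the feature still finds the asset.
final class BucketKeys {
  static boolean containsIgnoringCase(List<String> bucketKeys, String expected) {
    return bucketKeys.stream().anyMatch(key -> key.equalsIgnoreCase(expected));
  }
}
```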
en uses lowercase_normalizer on 45 keyword field paths (tier.tagFQN, displayName.keyword, owners.name, dataProducts.*, ...) so those fields filter, sort and aggregate case-insensitively. jp/ru/zh were missing the normalizer on those fields, so the same Explore facets, RBAC rules and Data Quality widgets behaved case-sensitively (and inconsistently with en) for non-English deployments.

Add lowercase_normalizer to the matching keyword fields in jp/ru/zh wherever en has it (216 occurrences across 122 files). The normalizer is already defined in every file, so this only adds references. Surfaced by SearchConsumerFieldBehaviorIT (tier aggregation diverged en vs jp/ru).

Side effects: 3 files (jp/zh test_case, jp test_suite) had a duplicate settings.index key collapsed. 57 further en-only normalizers could not be aligned because the field is missing or non-keyword in the other language, a separate structural drift left for follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
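The en-versus-other-language comparison described above can be sketched as follows, assuming each mapping has already been flattened into a `dotted path -> field definition` map (that flattening step is not shown). Paths where en declares a normalizer and the other language has the same keyword field without it are the fixable cases; paths that are missing or non-keyword in the other language are reported separately as structural drift. All names here are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical drift report between the en mapping and one other language.
final class NormalizerDrift {
  record Report(Set<String> missingNormalizer, Set<String> structuralDrift) {}

  static Report compare(
      Map<String, Map<String, String>> en, Map<String, Map<String, String>> other) {
    Set<String> missing = new TreeSet<>();
    Set<String> structural = new TreeSet<>();
    for (var entry : en.entrySet()) {
      String normalizer = entry.getValue().get("normalizer");
      if (normalizer == null) {
        continue; // en has no normalizer here, nothing to align
      }
      Map<String, String> o = other.get(entry.getKey());
      if (o == null || !"keyword".equals(o.get("type"))) {
        structural.add(entry.getKey()); // cannot just add a normalizer
      } else if (!normalizer.equals(o.get("normalizer"))) {
        missing.add(entry.getKey()); // fixable by adding the reference
      }
    }
    return new Report(missing, structural);
  }
}
```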
Add the third-party IK plugin (analysis-ik, ik_max_word/ik_smart) to the IT OpenSearch image alongside analysis-kuromoji, with the release URL derived from the base image tag so it always matches the OpenSearch version under test. Chinese mappings now index with their real analyzers in CI.

Drop the analysis stripping from SearchConsumerFieldBehaviorIT entirely: all four languages now create and query with their real analyzers (en/ru built-in, jp kuromoji, zh IK), and the test asserts both plugins are installed. Every per-language search-consumer check now runs against real analyzers in one IT, with no need to run the whole integration suite once per language.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
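Deriving the plugin URL from the image tag can be as simple as the sketch below. It assumes the image reference has the form `repository:version`; the URL template is a placeholder supplied by the caller, not the actual analysis-ik release URL, and the class name is hypothetical.

```java
// Hypothetical helper: pull the version out of an image reference and
// substitute it into a caller-supplied plugin URL template.
final class PluginUrls {
  static String versionFromImage(String image) {
    int colon = image.lastIndexOf(':');
    int slash = image.lastIndexOf('/');
    // A colon before the last slash belongs to a registry port, not a tag.
    if (colon < 0 || colon < slash) {
      throw new IllegalArgumentException("image has no tag: " + image);
    }
    return image.substring(colon + 1);
  }

  static String pluginUrl(String image, String urlTemplate) {
    return urlTemplate.replace("{version}", versionFromImage(image));
  }
}
```

The resulting URL would then be passed to `opensearch-plugin install` when building the image, so bumping the base tag bumps the plugin version with it.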
…e enum

Replace the hardcoded ["en","jp","ru","zh"] list with IndexMappingLanguage.values() mapped through the same toString().toLowerCase() the loader uses to pick a per-language mapping file. A newly supported language is then exercised automatically, and a declared language with no mapping files fails fast in createIndex.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
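A sketch of deriving the language list from the enum. `IndexMappingLanguageStandIn` is a stand-in whose constants (`EN`, `JP`, `RU`, `ZH`) are assumed to mirror the real `IndexMappingLanguage`; what matters is that the test and the loader apply the same `toString().toLowerCase()` transformation, so they cannot disagree on a file name.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for the real IndexMappingLanguage enum (constants assumed).
enum IndexMappingLanguageStandIn { EN, JP, RU, ZH }

final class Languages {
  // Same transformation the mapping loader is described as using.
  static List<String> mappingLanguages() {
    return Arrays.stream(IndexMappingLanguageStandIn.values())
        .map(language -> language.toString().toLowerCase())
        .toList();
  }
}
```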
…rage

Extend SearchConsumerFieldBehaviorIT with the real queries the DQ dashboard, test case list and Incident Manager run, validated across every language with real analyzers:

- test_case index: nested testSuites.id filter, entityLink.nonNormalized per-entity execution-summary aggregation, dataQualityDimension and testPlatforms filters.
- test_case_resolution_status index (new): testCaseResolutionStatusType, testCaseResolutionStatusDetails.assignee.name, and the testCase.fullyQualifiedName.keyword / testCase.entityFQN.keyword filters.

Field paths and query shapes mirror the backend SearchListFilter / TestSuiteRepository and the UI DQ/incident filter builders. 16 capabilities now run per language.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
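For readers unfamiliar with the nested filter mentioned above: because `testSuites` is a `nested` field in the test_case mapping, a term filter on `testSuites.id` has to be wrapped in a `nested` query on that path. The hypothetical helper below just builds that request body as a string to show the shape; it assumes the ID is a UUID and therefore does no JSON escaping, which a real caller should not rely on.

```java
// Hypothetical helper showing the query shape only; not the backend's builder.
final class TestCaseQueries {
  static String nestedTestSuiteFilter(String testSuiteId) {
    // nested(path=testSuites) wrapping term(testSuites.id); no escaping (UUID assumed).
    return "{\"query\":{\"nested\":{\"path\":\"testSuites\","
        + "\"query\":{\"term\":{\"testSuites.id\":\"" + testSuiteId + "\"}}}}}";
  }
}
```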
Adds a MappingDriftState enum (CURRENT/STALE/UNTRACKED) and a computeDrift() method that classifies each entity's index mapping against the stored hash, enabling /system/validate to detect stale deployed indices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
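A minimal sketch of the three-way classification, assuming the tracker can supply the stored hash per entity type (absent if the entity was never stamped) and the hash of the current code mapping. The enum constants come from the commit message; the `classify`/`computeDrift` signatures and the map-based inputs are assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Constants as named in the commit message; everything else is illustrative.
enum MappingDriftState { CURRENT, STALE, UNTRACKED }

final class DriftSketch {
  static MappingDriftState classify(String storedHash, String currentHash) {
    if (storedHash == null) {
      return MappingDriftState.UNTRACKED; // never stamped
    }
    return storedHash.equals(currentHash) ? MappingDriftState.CURRENT : MappingDriftState.STALE;
  }

  // Classifies every entity that has a current code mapping; sorted by entity type.
  static Map<String, MappingDriftState> computeDrift(
      Map<String, String> storedHashes, Map<String, String> currentHashes) {
    Map<String, MappingDriftState> out = new TreeMap<>();
    currentHashes.forEach(
        (entity, hash) -> out.put(entity, classify(storedHashes.get(entity), hash)));
    return out;
  }
}
```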
Introduce an `updateMappingVersions(Collection<String>)` overload and extract a shared `stamp()` helper so that only the explicitly provided entity types are persisted; the no-arg method continues to stamp all entities as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
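The overload-plus-helper pattern can be sketched with an in-memory stand-in. `VersionTrackerSketch` is hypothetical: a `HashMap` replaces the persisted store, and routing the no-arg method through the `Collection` overload is one way to keep both paths on the same `stamp()` helper, not necessarily how the real tracker is written.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the mapping-version tracker.
final class VersionTrackerSketch {
  private final Map<String, String> stored = new HashMap<>();
  private final Map<String, String> currentHashes;

  VersionTrackerSketch(Map<String, String> currentHashes) {
    this.currentHashes = currentHashes;
  }

  // No-arg form: stamp every known entity type.
  void updateMappingVersions() {
    updateMappingVersions(currentHashes.keySet());
  }

  // Subset form: stamp only the given entity types; unknown types are ignored.
  void updateMappingVersions(Collection<String> entityTypes) {
    for (String entityType : entityTypes) {
      String hash = currentHashes.get(entityType);
      if (hash != null) {
        stamp(entityType, hash);
      }
    }
  }

  // Shared helper: the single place a hash is persisted.
  private void stamp(String entityType, String hash) {
    stored.put(entityType, hash);
  }

  Map<String, String> stored() {
    return Map.copyOf(stored);
  }
}
```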
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tokens

Add a per-language _analyze check on the description field (kuromoji for jp, IK for zh, stemmer for en/ru) asserting that native-script text segments into the token the analyzer must produce (東京 / 销售 / продаж / revenue). Keyword and identifier fields are language-neutral and never exercise the text analyzers, so this is the only assertion that the per-language analyzers (the whole reason the per-language mappings exist) actually work. Tokens were verified against the real field analyzers (e.g. ru stems продажи -> продаж; jp drops the の particle).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… field check Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Track which entity indexes were successfully recreated in createIndexes() and stamp their mapping versions via IndexMappingVersionTracker, skipping any that threw during finalizeReindex. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
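The success-tracking loop can be sketched as below. `recreate` and `stamp` are hypothetical callbacks standing in for the real index recreation and the `IndexMappingVersionTracker` call; each recreate runs in its own `try`/`catch`, so one failure neither aborts the others nor gets stamped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "stamp only what was actually recreated".
final class CreateIndexesSketch {
  static List<String> createIndexes(
      List<String> entityTypes, Consumer<String> recreate, Consumer<List<String>> stamp) {
    List<String> succeeded = new ArrayList<>();
    for (String entityType : entityTypes) {
      try {
        recreate.accept(entityType);
        succeeded.add(entityType);
      } catch (RuntimeException e) {
        // Skipped: its previously stored hash (if any) is left untouched.
      }
    }
    stamp.accept(succeeded);
    return succeeded;
  }
}
```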
…led/CLI) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…exApp owns it) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…removal Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
```java
@VisibleForTesting
List<String> findOrphanIndexes(SearchRepository searchRepository) {
  List<String> orphans = new ArrayList<>();
  try {
    // Reuses the cleaner's rule: a _rebuild_ index older than MIN_AGE_MS (30 min)
    // that currently has no alias is treated as an orphan.
    OrphanedIndexCleaner cleaner = new OrphanedIndexCleaner();
    for (OrphanedIndexCleaner.OrphanedIndex orphan :
        cleaner.findOrphanedRebuildIndices(searchRepository.getSearchClient())) {
      orphans.add(orphan.indexName());
    }
  } catch (Exception e) {
    // Best effort: a failed lookup is logged and returns what was collected so far
    // instead of failing /system/validate.
    LOG.warn("Failed to check for orphan indexes: {}", e.getMessage());
  }
  return orphans;
}
```
💡 Edge Case: Orphan check labels in-flight rebuild indexes as "safe to clean"
findOrphanIndexes reuses OrphanedIndexCleaner.findOrphanedRebuildIndices, which flags any _rebuild_ index older than 30 minutes (MIN_AGE_MS) that currently has no alias. A long-running reindex (large datasets routinely exceed 30 minutes) produces a rebuild index that is older than the threshold but not yet alias-swapped. During that window the /system/validate message will report it as an "orphan index(es) with no alias (safe to clean)", which is misleading for an operator who may then delete an index that is actively being built. Consider cross-checking against active reindex jobs, or softening the "safe to clean" wording to acknowledge in-progress rebuilds.
Sources: SystemRepository.java:894-907,935-940; OrphanedIndexCleaner.java:39,59-72.
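One way to act on this suggestion, sketched under assumptions: if the set of rebuild index names owned by reindex jobs still in progress were available, the validate step could subtract it before labelling anything as safe to clean. How that set is obtained (for example from running `SearchIndexApp` jobs) is not shown and is an assumption, as are the class name and the index names in the example.

```java
import java.util.List;
import java.util.Set;

// Hypothetical filter: only orphan candidates not owned by an in-flight
// reindex are reported as safe to clean.
final class OrphanFilter {
  static List<String> safeToClean(List<String> orphanCandidates, Set<String> activeRebuildIndexes) {
    return orphanCandidates.stream()
        .filter(name -> !activeRebuildIndexes.contains(name))
        .toList();
  }
}
```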
```java
boolean reindexNeeded() {
  return !stalePending.isEmpty() || !missingIndexes.isEmpty();
}
```
💡 Edge Case: Degraded cluster health does not fail the Reindex Status step
reindexNeeded() (SystemRepository.java:856-858) is defined as !stalePending.isEmpty() || !missingIndexes.isEmpty() and the step result uses step.withPassed(!status.reindexNeeded()) (line 991). clusterHealthy is intentionally excluded, so when the search cluster is degraded the step still reports passed=true while the message appends "WARNING: search cluster health is degraded." (appendClusterState, lines 931-933).
This is defensible for a check named "Reindex Status" (a degraded cluster does not imply a reindex is pending), but operators/automation that key off the boolean passed flag rather than parsing the message text will treat a degraded cluster as fully healthy. If that surfacing is intended, no change is needed; otherwise consider either surfacing cluster health as its own StepValidation or having the consumer of /system/validate treat the WARNING text as non-green. Flagging as minor since the warning is still present in the message.
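A sketch of the first option, surfacing cluster health as its own step so consumers of the boolean flag see a degraded cluster. `StepResultSketch` and the step names are illustrative stand-ins, not the real `StepValidation` API.

```java
import java.util.List;

// Illustrative stand-in for a validation step result.
record StepResultSketch(String description, boolean passed, String message) {}

final class SearchStepsSketch {
  // "Reindex Status" stays keyed on pending reindex work; cluster health
  // gets its own pass/fail so a degraded cluster is not reported as green.
  static List<StepResultSketch> steps(boolean reindexNeeded, boolean clusterHealthy) {
    return List.of(
        new StepResultSketch("Search Reindex Status", !reindexNeeded,
            reindexNeeded ? "reindex required" : "all indexes current"),
        new StepResultSketch("Search Cluster Health", clusterHealthy,
            clusterHealthy ? "cluster healthy" : "search cluster health is degraded"));
  }
}
```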
Split into two focused PRs:
Closing this combined draft in favor of those two.
Code Review 👍 Approved with suggestions
5 resolved / 8 findings

Implements a robust search indexing contract and health check, ensuring schema consistency across languages while surfacing pending reindex states. Please address the minor findings regarding cluster health warnings and orphan index classification to ensure they correctly impact the system validation status.

💡 Edge Case: Degraded search cluster reported as warning but step still passes
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:835-838
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:909-919
In …
Sources: SystemRepository.java:835-838, 909-919.

💡 Edge Case: Orphan check labels in-flight rebuild indexes as "safe to clean"
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:894-907
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:935-940
Sources: SystemRepository.java:894-907,935-940; OrphanedIndexCleaner.java:39,59-72.

💡 Edge Case: Degraded cluster health does not fail the Reindex Status step
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:856-858
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:931-933
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:989-991
This is defensible for a check named "Reindex Status" (a degraded cluster does not imply a reindex is pending), but operators/automation that key off the boolean …

✅ 5 resolved
✅ Edge Case: Health check treats unreadable/empty live mappings as healthy
✅ Edge Case: jp/api_collection repointed to English analyzer, not kuromoji
✅ Quality: Plugin install adds a build-time network dependency to all IT runs
✅ Performance: validateSystem issues up to 10 uncached getMapping round-trips
✅ Edge Case: Contract test misses denormalized object retyped to a scalar
Gitar
🟡 Playwright Results: all passed (12 flaky)
✅ 4268 passed · ❌ 0 failed · 🟡 12 flaky · ⏭️ 88 skipped
🟡 12 flaky test(s) (passed on retry)
How to debug locally

```shell
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
```



Describe your changes:
fix #28849
Search denormalizes a set of fields (`tags.tagFQN`, `tier`, `certification`, `owners`, `domains`, `fqnParts`, `testCaseResult.testCaseStatus`, …) into every entity document, and RBAC, Data Quality, Incident Manager, Lineage, and Data Insights query them directly, so a mapping change that renames, retypes, or drops one breaks those features silently, with no compile- or boot-time failure. This PR hardens that area with a CI contract guard, a runtime "did we forget to reindex?" health check, and the mapping bug the guards surfaced.

Type of change:
High-level design:
- `SearchConsumerFields`: one source of truth for the consumer contract (denormalized leaf fields + required `keyword` type, and the core data-asset canary set).
- `SearchConsumerFieldContractTest` (CI): over every entity mapping × 4 languages, checks that (1) the denormalized leaf fields keep their `keyword` type wherever present, and (2) core data-asset indexes expose the full denormalized set. Failures name the affected consumer. This guards the shipped mapping files at build time.
- `StepValidation` in `SystemRepository.validateSystem()` (the `/system/validate` Search section): answers "is a reindex pending?" by comparing each entity's current code-mapping hash against the hash the deployed index was last built from (`IndexMappingVersionTracker`), and warns "reindex required" for any entity whose deployed index is stale. This replaces the earlier curated-subset live-field check, which only inspected 8 hard-coded fields on 10 hard-coded entities.
- `SearchIndexApp.execute()`: stamps the reindexed entities when the run finishes `COMPLETED` (covers UI, scheduled, and CLI-triggered reindex); never stamps on `ACTIVE_ERROR`/`FAILED`/`STOPPED`.
- `SearchRepository.createIndexes()`: stamps each entity actually recreated (fresh install / drop-create), never one whose recreate threw.
- Stamping in `OpenMetadataOperations` is removed (the same `SearchIndexApp` it triggers now owns stamping).
- Removes the `getIndexFieldNames` plumbing (added for the now-replaced live-field check) from `SearchRepository`, the `SearchClient` interface, and the Elasticsearch/OpenSearch managers.
- The `jp/topic` index mapping was missing top-level `domains` (the contract test caught it), so JP-locale topic search had lost domain-based RBAC and DQ/Incident filtering. Added to match `en`/`ru`/`zh`.

Tests:
Use cases covered
`/system/validate` reports a non-blocking warning when a deployed index was built from an older code mapping (a reindex is pending), and reports clean once every index has been (re)built from the current mappings.

Unit tests
- `IndexMappingVersionTrackerTest`: `computeDrift` classifies CURRENT/STALE/UNTRACKED; `updateMappingVersions(Collection)` stamps only the given subset (plus the existing change-detection coverage).
- `SystemRepositoryReindexStatusTest`: pure classification (stale→pending, untracked→note, missing→skip, current→none, sorted output) and the operator-facing message.
- `SearchRepositoryBehaviorTest#createIndexesStampsOnlySucceededEntities`: a partial recreate (one entity throws) stamps only the succeeded entity.
- `SearchConsumerFieldContractTest`: unchanged, still green.
- `SystemRepositoryMappingConsistencyTest` (asserted the removed canary logic) was deleted.

Backend integration tests
The `validateSystem` change is unit-covered. Recommended manual check on a live stack: run a reindex, confirm `/system/status` "Search Reindex Status" passes and `index_mapping_versions` is populated; tamper one stored hash and re-check to see that entity reported as reindex-pending.

Ingestion integration tests
Playwright (UI) tests
Manual testing performed
- `mvn -pl openmetadata-service spotless:check` → clean.
- `mvn -pl openmetadata-spec install -DskipTests` then `mvn -pl openmetadata-service test -Dtest='IndexMappingVersionTrackerTest,SystemRepositoryReindexStatusTest,SearchRepositoryBehaviorTest,SearchConsumerFieldContractTest'` → all green (13 + 7 + 110 + 2). (Build `openmetadata-spec` first so the contract test reads the current mappings, not a stale local jar.)

UI screen recording / screenshots:
Not applicable.
Checklist:
`Fixes <issue-number>: <short explanation>`
`Fixes #<issue-number>` above.