feat(search): consumer-field contract test + index mapping health check by pmbrull · Pull Request #28751 · open-metadata/OpenMetadata

pmbrull · 2026-06-05T10:30:23Z

Describe your changes:

Search denormalizes a set of fields (tags.tagFQN, tier, certification, owners, domains, fqnParts, testCaseResult.testCaseStatus, …) into every entity document, and RBAC, Data Quality, Incident Manager, Lineage, and Data Insights query them directly — so a mapping change that renames, retypes, or drops one breaks those features silently, with no compile- or boot-time failure. This PR hardens that area with a CI contract guard, a runtime "did we forget to reindex?" health check, and the mapping bug the guards surfaced.

Type of change:

Bug fix
Improvement

High-level design:

SearchConsumerFields — one source of truth for the consumer contract (denormalized leaf fields + required keyword type, and the core data-asset canary set).
SearchConsumerFieldContractTest (CI) — over every entity mapping × 4 languages: (1) the denormalized leaf fields keep their keyword type wherever present, and (2) core data-asset indexes expose the full denormalized set. Failures name the affected consumer. This guards the shipped mapping files at build time.
"Search Reindex Status" health check — a non-blocking StepValidation in SystemRepository.validateSystem() (the /system/validate Search section). It answers "is a reindex pending?" by comparing each entity's current code-mapping hash against the hash the deployed index was last built from (IndexMappingVersionTracker), and warns "reindex required" for any entity whose deployed index is stale. This replaces the earlier curated-subset live-field check, which only inspected 8 hard-coded fields on 10 hard-coded entities.
- The hash signal is only trustworthy if the stored hash is written whenever an index is genuinely (re)built. Previously only the CLI reindex stamped it, so a UI or scheduled reindex left the table stale and the signal would false-positive. This PR fixes that root cause by stamping at every successful seam:
  - SearchIndexApp.execute() — stamps the reindexed entities when the run finishes COMPLETED (covers UI, scheduled, and CLI-triggered reindex); never stamps on ACTIVE_ERROR/FAILED/STOPPED.
  - SearchRepository.createIndexes() — stamps each entity actually recreated (fresh install / drop-create), never one whose recreate threw.
  - The now-redundant CLI-only stamp in OpenMetadataOperations is removed (the same SearchIndexApp it triggers now owns stamping).
- Untracked entities (e.g. an existing deploy that upgrades in before its next reindex) are reported as a soft, non-failing note, not a failure. Missing indexes stay the responsibility of the existing Search-instance check.
- A pure-function live field-by-field comparison was considered and deliberately dropped: with reliable stamping, a matching hash already implies the deployed fields are correct, and a live diff only catches a "lying stamp" (alias-swap, partial build, manual drift) at meaningful cost and false-positive surface. It can be added later as defense-in-depth.
Cleanup: removed the orphaned getIndexFieldNames plumbing (added for the now-replaced live-field check) from SearchRepository, the SearchClient interface, and the Elasticsearch/OpenSearch managers.
Fix: the jp/topic index mapping was missing top-level domains (the contract test caught it), so JP-locale topic search had lost domain-based RBAC and DQ/Incident filtering. Added to match en/ru/zh.

Tests:

Use cases covered

A mapping change that retypes/renames/drops a shared consumer field fails CI, naming the affected consumer.
/system/validate reports a non-blocking warning when a deployed index was built from an older code mapping (a reindex is pending), and reports clean once every index has been (re)built from the current mappings.
A failed/partial reindex never stamps, so the check cannot report "clean" on a stale index.

Unit tests

Added/updated:
- IndexMappingVersionTrackerTest — computeDrift classifies CURRENT/STALE/UNTRACKED; updateMappingVersions(Collection) stamps only the given subset (plus the existing change-detection coverage).
- SystemRepositoryReindexStatusTest — pure classification (stale→pending, untracked→note, missing→skip, current→none, sorted output) and the operator-facing message.
- SearchRepositoryBehaviorTest#createIndexesStampsOnlySucceededEntities — a partial recreate (one entity throws) stamps only the succeeded entity.
- SearchConsumerFieldContractTest — unchanged, still green.
The old mock-heavy SystemRepositoryMappingConsistencyTest (asserted the removed canary logic) was deleted.

Backend integration tests

Not applicable — no new API endpoint; the validateSystem change is unit-covered. Recommended manual check on a live stack: run a reindex, confirm /system/status "Search Reindex Status" passes and index_mapping_versions is populated; tamper one stored hash and re-check to see that entity reported as reindex-pending.

Ingestion integration tests

Not applicable (no ingestion changes).

Playwright (UI) tests

Not applicable (no UI changes).

Manual testing performed

mvn -pl openmetadata-service spotless:check → clean.
mvn -pl openmetadata-spec install -DskipTests then mvn -pl openmetadata-service test -Dtest='IndexMappingVersionTrackerTest,SystemRepositoryReindexStatusTest,SearchRepositoryBehaviorTest,SearchConsumerFieldContractTest' → all green (13 + 7 + 110 + 2). (Build openmetadata-spec first so the contract test reads the current mappings, not a stale local jar.)

UI screen recording / screenshots:

Not applicable.

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
My PR is linked to a GitHub issue via Fixes #<issue-number> above.
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: not needed — the only mapping change is an additive Elasticsearch index-mapping field; existing indexes pick it up on the next reindex, which the new health check now surfaces.
For UI changes: not applicable.
I have added tests (unit) and listed them above.

Guard the denormalized search-index fields that RBAC, Data Quality, Incident Manager, Lineage, and Data Insights query (tags.tagFQN, tier, certification, owners, domains, fqnParts, testCaseStatus, ...). Renaming, retyping, or dropping one silently breaks those consumers with no compile- or boot-time failure. - SearchConsumerFields: single source of truth for the consumer contract. - SearchConsumerFieldContractTest: fails CI on a type change or a dropped field in any mapping (across all 4 languages), naming the affected consumers. - "Index Mapping Consistency" StepValidation on /system/validate: non-blocking health check that reads the live deployed mapping and warns "reindex required" when a core data-asset index is missing these fields (e.g. stale after upgrade). Adds a read-only getIndexFieldNames to the SearchClient (Elasticsearch + OpenSearch). - Fix jp/topic mapping missing top-level `domains` (found by the new test) — JP-locale topic search had lost domain-based RBAC/DQ filtering. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ract Index a document into the real en/jp/ru/zh index mapping for topic and test_case, then run the actual search/filter/aggregation each feature uses (tag/tier/certification/domain term filters, owners nested RBAC query, testCaseResult.testCaseStatus aggregation, fqnParts term) and assert it returns the document. A failure reads as a broken feature in a specific language, not "index is missing a mapping" — and would have caught the jp/topic domains drop. Runs against a real OpenSearch testcontainer. Text analyzers are stripped before index creation (the consumer fields are keyword/nested and analyzer-independent) so each language index is creatable without analysis plugins while the real field structure is still validated per language. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

30 of 55 jp index mappings referenced an analyzer the file does not define: an incomplete om_analyzer <-> om_analyzer_jp split left field-level analyzer / search_analyzer references dangling, so those jp indexes fail to create ("analyzer [...] has not been configured") whenever searchIndexMappingLanguage is set to jp. It is invisible on en-default deployments, which never load these files — the same reason the jp/topic domains drop went unnoticed. Add the missing analyzer definition to each file, preserving every field's existing reference: 29 files gain the generic om_analyzer (letter + lowercase + om_stemmer, matching the other jp mappings) used by identifier fields, and api_collection gains the kuromoji om_analyzer_jp used by its text fields. No field reference is changed, so prose stays kuromoji-analyzed while identifier fields (tagFQN, column names, custom-property text) stay on the generic analyzer, consistent with en and the already-correct jp mappings. jp still requires the analysis-kuromoji plugin. Surfaced by SearchConsumerFieldBehaviorIT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Build the integration-test OpenSearch container from an image with the language analysis plugins installed (analysis-kuromoji for Japanese) via a shared SearchTestImages helper, wired into both TestSuiteBootstrap and ContainerizedServer. The plugins are installed unconditionally — an English-only run never references them — so the search IT suite can be run for non-English mapping languages. That is what would have caught the jp analyzer drift earlier: CI only ever ran en on a vanilla image, where the jp mappings (kuromoji) could never even be created. Add OpenSearchLanguageAnalyzerIT: boots the plugin image and creates the real en/ru/jp topic index mappings with their own analyzers (no analysis stripping), asserting analysis-kuromoji is installed and that the jp mappings — which reference kuromoji and previously had dangling analyzer references — now create cleanly. zh (third-party analysis-ik) is intentionally out of scope. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…h IT Run the per-language consumer-contract checks directly in the search IT, so the whole integration suite does not need to run once per language. The container now uses the analysis-plugin image (SearchTestImages), so en/ru/jp index with their real analyzers (jp = kuromoji) — which also exercises jp analyzer resolution end to end — and only zh (third-party analysis-ik, not installed) is validated structurally with its analysis stripped. Fold the plugin-present and real-jp-index creation checks into this IT and drop the standalone OpenSearchLanguageAnalyzerIT. Aggregation assertions now compare bucket keys case-insensitively: real keyword fields carry lowercase_normalizer (testCaseResult.testCaseStatus in every language, tier.tagFQN in en only), so the bucket key is lowercased while the feature still returns the asset. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

en uses lowercase_normalizer on 45 keyword field paths (tier.tagFQN, displayName.keyword, owners.name, dataProducts.*, ...) so those fields filter, sort and aggregate case-insensitively. jp/ru/zh were missing the normalizer on those fields, so the same Explore facets, RBAC rules and Data Quality widgets behaved case-sensitively — and inconsistently with en — for non-English deployments. Add lowercase_normalizer to the matching keyword fields in jp/ru/zh wherever en has it (216 occurrences across 122 files). The normalizer is already defined in every file, so this only adds references. Surfaced by SearchConsumerFieldBehaviorIT (tier aggregation diverged en vs jp/ru). Side effects: 3 files (jp/zh test_case, jp test_suite) had a duplicate settings.index key collapsed. 57 further en-only normalizers could not be aligned because the field is missing or non-keyword in the other language — a separate structural drift left for follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add the third-party IK plugin (analysis-ik, ik_max_word/ik_smart) to the IT OpenSearch image alongside analysis-kuromoji, with the release URL derived from the base image tag so it always matches the OpenSearch version under test. Chinese mappings now index with their real analyzers in CI. Drop the analysis stripping from SearchConsumerFieldBehaviorIT entirely: all four languages now create and query with their real analyzers (en/ru built-in, jp kuromoji, zh IK), and the test asserts both plugins are installed. Every per-language search-consumer check now runs against real analyzers in one IT, with no need to run the whole integration suite once per language. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e enum Replace the hardcoded ["en","jp","ru","zh"] list with IndexMappingLanguage.values() mapped through the same toString().toLowerCase() the loader uses to pick a per-language mapping file. A newly supported language is then exercised automatically, and a declared language with no mapping files fails fast in createIndex. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rage Extend SearchConsumerFieldBehaviorIT with the real queries the DQ dashboard, test case list and Incident Manager run, validated across every language with real analyzers: - test_case index: nested testSuites.id filter, entityLink.nonNormalized per-entity execution-summary aggregation, dataQualityDimension and testPlatforms filters. - test_case_resolution_status index (new): testCaseResolutionStatusType, testCaseResolutionStatusDetails.assignee.name, and the testCase.fullyQualifiedName.keyword / testCase.entityFQN.keyword filters. Field paths and query shapes mirror the backend SearchListFilter / TestSuiteRepository and the UI DQ/incident filter builders. 16 capabilities now run per language. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds MappingDriftState enum (CURRENT/STALE/UNTRACKED) and computeDrift() method to classify each entity's index mapping against the stored hash, enabling /system/validate to detect stale deployed indices. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Introduce `updateMappingVersions(Collection<String>)` overload and extract shared `stamp()` helper so only the explicitly provided entity types are persisted; the no-arg method continues to stamp all entities as before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… tokens Add a per-language _analyze check on the description field (kuromoji for jp, IK for zh, stemmer for en/ru) asserting native-script text segments into the token the analyzer must produce (東京 / 销售 / продаж / revenue). Keyword and identifier fields are language-neutral and never exercise the text analyzers, so this is the only assertion that the per-language analyzers — the whole reason the per-language mappings exist — actually work. Tokens were verified against the real field analyzers (e.g. ru stems продажи -> продаж; jp drops the の particle). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… field check Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Track which entity indexes were successfully recreated in createIndexes() and stamp their mapping versions via IndexMappingVersionTracker, skipping any that threw during finalizeReindex. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…led/CLI) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…exApp owns it) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…removal Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…removal

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

…lidate

gitar-bot · 2026-06-09T08:41:23Z

+  @VisibleForTesting
+  List<String> findOrphanIndexes(SearchRepository searchRepository) {
+    List<String> orphans = new ArrayList<>();
+    try {
+      OrphanedIndexCleaner cleaner = new OrphanedIndexCleaner();
+      for (OrphanedIndexCleaner.OrphanedIndex orphan :
+          cleaner.findOrphanedRebuildIndices(searchRepository.getSearchClient())) {
+        orphans.add(orphan.indexName());
+      }
+    } catch (Exception e) {
+      LOG.warn("Failed to check for orphan indexes: {}", e.getMessage());
+    }
+    return orphans;
+  }


💡 Edge Case: Orphan check labels in-flight rebuild indexes as "safe to clean"

findOrphanIndexes reuses OrphanedIndexCleaner.findOrphanedRebuildIndices, which flags any _rebuild_ index older than 30 minutes (MIN_AGE_MS) that currently has no alias. A long-running reindex (large datasets routinely exceed 30 minutes) produces a rebuild index that is older than the threshold but not yet alias-swapped. During that window the /system/validate message will report it as an "orphan index(es) with no alias (safe to clean)", which is misleading for an operator who may then delete an index that is actively being built. Consider cross-checking against active reindex jobs, or softening the "safe to clean" wording to acknowledge in-progress rebuilds.

Sources: SystemRepository.java:894-907,935-940; OrphanedIndexCleaner.java:39,59-72.

_{Was this helpful? React with 👍 / 👎}

…tus check

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

gitar-bot · 2026-06-09T08:59:55Z

+    boolean reindexNeeded() {
+      return !stalePending.isEmpty() || !missingIndexes.isEmpty();
+    }


💡 Edge Case: Degraded cluster health does not fail the Reindex Status step

reindexNeeded() (SystemRepository.java:856-858) is defined as !stalePending.isEmpty() || !missingIndexes.isEmpty() and the step result uses step.withPassed(!status.reindexNeeded()) (line 991). clusterHealthy is intentionally excluded, so when the search cluster is degraded the step still reports passed=true while the message appends "WARNING: search cluster health is degraded." (appendClusterState, lines 931-933).

This is defensible for a check named "Reindex Status" (a degraded cluster does not imply a reindex is pending), but operators/automation that key off the boolean passed flag rather than parsing the message text will treat a degraded cluster as fully healthy. If that surfacing is intended, no change is needed; otherwise consider either surfacing cluster health as its own StepValidation or having the consumer of /system/validate treat the WARNING text as non-green. Flagging as minor since the warning is still present in the message.

_{Was this helpful? React with 👍 / 👎}

pmbrull · 2026-06-09T09:13:37Z

Split into two focused PRs:

feat(search): reindex-drift + index health checks in /system/validate #28856 — /system/validate reindex-drift + index health checks (repository)
test(search): per-language index mappings + multi-language search ITs #28857 — per-language index mappings + multi-language search integration tests

Closing this combined draft in favor of those two.

gitar-bot · 2026-06-09T09:21:29Z

Code Review 👍 Approved with suggestions 5 resolved / 8 findings

Implements a robust search indexing contract and health check, ensuring schema consistency across languages while surfacing pending reindex states. Please address the minor findings regarding cluster health warnings and orphan index classification to ensure they correctly impact the system validation status.

💡 Edge Case: Degraded search cluster reported as warning but step still passes

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:835-838 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:909-919

In getSearchValidation, the step's pass/fail is withPassed(missingIndexes.isEmpty()). A degraded cluster (isSearchClusterHealthy returns false, i.e. Elasticsearch/OpenSearch reports RED) and orphan indexes are only appended to the message text — they never flip passed to false. A RED search cluster is arguably as severe as (or worse than) a missing index, yet /system/validate will still mark the Search step as passed=true. Automated consumers that key off the boolean rather than parsing the message string will miss a degraded cluster. If this is intentional (non-blocking by design, as the PR states), consider documenting it; otherwise factor cluster health into the passed condition.

Sources: SystemRepository.java:835-838, 909-919.

💡 Edge Case: Orphan check labels in-flight rebuild indexes as "safe to clean"

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:894-907 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:935-940

findOrphanIndexes reuses OrphanedIndexCleaner.findOrphanedRebuildIndices, which flags any _rebuild_ index older than 30 minutes (MIN_AGE_MS) that currently has no alias. A long-running reindex (large datasets routinely exceed 30 minutes) produces a rebuild index that is older than the threshold but not yet alias-swapped. During that window the /system/validate message will report it as an "orphan index(es) with no alias (safe to clean)", which is misleading for an operator who may then delete an index that is actively being built. Consider cross-checking against active reindex jobs, or softening the "safe to clean" wording to acknowledge in-progress rebuilds.

Sources: SystemRepository.java:894-907,935-940; OrphanedIndexCleaner.java:39,59-72.

💡 Edge Case: Degraded cluster health does not fail the Reindex Status step

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:856-858 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:931-933 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:989-991

reindexNeeded() (SystemRepository.java:856-858) is defined as !stalePending.isEmpty() || !missingIndexes.isEmpty() and the step result uses step.withPassed(!status.reindexNeeded()) (line 991). clusterHealthy is intentionally excluded, so when the search cluster is degraded the step still reports passed=true while the message appends "WARNING: search cluster health is degraded." (appendClusterState, lines 931-933).

This is defensible for a check named "Reindex Status" (a degraded cluster does not imply a reindex is pending), but operators/automation that key off the boolean passed flag rather than parsing the message text will treat a degraded cluster as fully healthy. If that surfacing is intended, no change is needed; otherwise consider either surfacing cluster health as its own StepValidation or having the consumer of /system/validate treat the WARNING text as non-green. Flagging as minor since the warning is still present in the message.

✅ 5 resolved

✅ Edge Case: Health check treats unreadable/empty live mappings as healthy

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:920-934 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:67-81
getIndexFieldNames swallows any exception and returns an empty set, and collectMissingConsumerFields skips an index entirely when liveFields.isEmpty(). This intentionally avoids double-reporting a not-yet-created index, but it also means a transient failure reading a deployed index's mapping (cluster timeout, permission error, partial outage) is indistinguishable from "index absent" — both result in the index being skipped and the overall step reporting passed=true ("All core data asset indexes expose the fields..."). A genuinely stale/broken index can therefore be masked behind a green check whenever the mapping read fails. Since this is a non-blocking advisory check the impact is limited, but consider distinguishing "index missing" from "mapping read failed" (e.g. surface a separate "could not verify N indexes" note) so transient errors don't produce a false all-clear.

✅ Edge Case: jp/api_collection repointed to English analyzer, not kuromoji

📄 openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:27-35 📄 openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:93-95 📄 openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:119-121 📄 openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:142-144
This commit repoints analyzer references the right way for almost every jp index: those files define om_analyzer_jp (kuromoji) in settings.analysis.analyzer but had dangling field references to om_analyzer, so they were correctly changed to om_analyzer_jp. jp/api_collection_index_mapping.json is the lone exception — it went the opposite direction (om_analyzer_jp -> om_analyzer) on name, displayName, and description.

Root cause: unlike every other jp mapping, api_collection's settings block was never localized — it still defines om_analyzer with the English kstem stemmer (lines 27-56) rather than om_analyzer_jp with the kuromoji tokenizer/filters that topic, container, glossary, etc. define. Because the only analyzer actually declared in this file is om_analyzer, the fix made the field references consistent with the (un-localized) settings rather than fixing the settings.

Impact: the dangling-reference fix is correct (referencing an undefined om_analyzer_jp would have failed index creation), but the result is that JP full-text search over API Collections (name/displayName/description) silently uses English tokenization instead of Japanese — inconsistent with the rest of the jp locale. The denormalized consumer fields (keyword/nested) are unaffected, so RBAC/DQ still work; this is a search-quality regression for Japanese text on this one index.

Suggested fix: localize api_collection's settings to define om_analyzer_jp (kuromoji) matching the other jp indexes, and keep the three field references at om_analyzer_jp rather than downgrading them to om_analyzer.

✅ Quality: Plugin install adds a build-time network dependency to all IT runs

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/server/SearchTestImages.java:31-44 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java:353-355 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/server/ContainerizedServer.java:226-228
SearchTestImages.openSearchWithAnalysisPlugins is now wired into the main IT bootstrap (TestSuiteBootstrap.startSearch) and the full production-stack runner (ContainerizedServer.newOpenSearch), not just the new language-analyzer test. Because it builds an image whose Dockerfile runs opensearch-plugin install --batch analysis-kuromoji, every OpenSearch-based integration test run now requires the OpenSearch plugin artifact repository to be reachable at image-build time. If that registry is unavailable (offline/air-gapped CI, transient outage), the image build fails and the entire OpenSearch IT suite fails to start — not only the per-language analyzer coverage. The docstring documents installing kuromoji unconditionally as intentional, but coupling the whole suite's startup to an external download is a new failure mode. Consider gating the plugin-augmented image behind a flag/system property (defaulting to the language IT only), or pre-baking the plugin into a published base image so routine IT runs don't depend on the plugin registry being online.

✅ Performance: validateSystem issues up to 10 uncached getMapping round-trips

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:871-885
getMappingConsistencyValidation is invoked on every /system/validate call and performs one synchronous getMapping network round-trip per entry in CANARY_DATA_ASSET_ENTITIES (10 indexes), with no caching. This is acceptable for a manually triggered admin endpoint, but if /system/validate is ever polled by monitoring/health tooling the added latency and ES/OS load become noticeable. Consider a short-lived cache of the result (similar to the existing HEALTH_CHECK_CACHE_MS pattern in the search clients) or documenting that this endpoint is admin-only/low-frequency.

✅ Edge Case: Contract test misses denormalized object retyped to a scalar

📄 openmetadata-service/src/test/java/org/openmetadata/service/search/SearchConsumerFieldContractTest.java:105-119 📄 openmetadata-service/src/test/java/org/openmetadata/service/search/SearchConsumerFieldContractTest.java:126-139 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchConsumerFields.java:44-58
The two guards have a blind spot for the most likely "retype" regression on the top-level denormalized objects (tags, tier, domains, certification). coreDataAssetIndexesExposeConsumerFields only checks properties.has(required) — key presence, no type. consumerLeafFieldsKeepTheirTypeWherePresent validates leaf paths like tags.tagFQN / domains.fullyQualifiedName, but findField walks into node.path("properties") for non-terminal segments; if the parent object is retyped to a scalar (e.g. tags changed from an object to {"type":"keyword"}), it has no properties, the loop stops, found is null, and the field is silently treated as "not present" (skipped). Meanwhile the CANARY presence check still passes because the tags key exists. So a mapping change that flattens tags/tier/domains/certification from an object to a scalar — exactly the kind of silent break the contract is meant to catch for RBAC/DQ/Lineage — passes CI green. Consider asserting the parent object's structure (e.g. that the field declares properties / is not typed as a scalar) for these top-level denormalized objects, or distinguishing "path absent" from "path truncated by an unexpected scalar parent" in findField so the latter is reported as a violation.

🤖 Prompt for agents

Code Review: Implements a robust search indexing contract and health check, ensuring schema consistency across languages while surfacing pending reindex states. Please address the minor findings regarding cluster health warnings and orphan index classification to ensure they correctly impact the system validation status.

1. 💡 Edge Case: Degraded search cluster reported as warning but step still passes
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:835-838, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:909-919

   In `getSearchValidation`, the step's pass/fail is `withPassed(missingIndexes.isEmpty())`. A degraded cluster (`isSearchClusterHealthy` returns false, i.e. Elasticsearch/OpenSearch reports RED) and orphan indexes are only appended to the message text — they never flip `passed` to false. A RED search cluster is arguably as severe as (or worse than) a missing index, yet `/system/validate` will still mark the Search step as passed=true. Automated consumers that key off the boolean rather than parsing the message string will miss a degraded cluster. If this is intentional (non-blocking by design, as the PR states), consider documenting it; otherwise factor cluster health into the passed condition.
   
   Sources: SystemRepository.java:835-838, 909-919.

2. 💡 Edge Case: Orphan check labels in-flight rebuild indexes as "safe to clean"
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:894-907, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:935-940

   `findOrphanIndexes` reuses `OrphanedIndexCleaner.findOrphanedRebuildIndices`, which flags any `_rebuild_` index older than 30 minutes (`MIN_AGE_MS`) that currently has no alias. A long-running reindex (large datasets routinely exceed 30 minutes) produces a rebuild index that is older than the threshold but not yet alias-swapped. During that window the `/system/validate` message will report it as an "orphan index(es) with no alias (safe to clean)", which is misleading for an operator who may then delete an index that is actively being built. Consider cross-checking against active reindex jobs, or softening the "safe to clean" wording to acknowledge in-progress rebuilds.
   
   Sources: SystemRepository.java:894-907,935-940; OrphanedIndexCleaner.java:39,59-72.

3. 💡 Edge Case: Degraded cluster health does not fail the Reindex Status step
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:856-858, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:931-933, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:989-991

   `reindexNeeded()` (SystemRepository.java:856-858) is defined as `!stalePending.isEmpty() || !missingIndexes.isEmpty()` and the step result uses `step.withPassed(!status.reindexNeeded())` (line 991). `clusterHealthy` is intentionally excluded, so when the search cluster is degraded the step still reports `passed=true` while the message appends "WARNING: search cluster health is degraded." (appendClusterState, lines 931-933).
   
   This is defensible for a check named "Reindex Status" (a degraded cluster does not imply a reindex is pending), but operators/automation that key off the boolean `passed` flag rather than parsing the message text will treat a degraded cluster as fully healthy. If that surfacing is intended, no change is needed; otherwise consider either surfacing cluster health as its own StepValidation or having the consumer of `/system/validate` treat the WARNING text as non-green. Flagging as minor since the warning is still present in the message.

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

sonarqubecloud · 2026-06-09T10:04:26Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-06-09T11:32:36Z

🟡 Playwright Results — all passed (12 flaky)

✅ 4268 passed · ❌ 0 failed · 🟡 12 flaky · ⏭️ 88 skipped

Shard	Passed	Flaky	Skipped
✅ Shard 1	301	0	4
🟡 Shard 2	803	1	9
🟡 Shard 3	802	3	8
🟡 Shard 4	852	3	12
🟡 Shard 5	720	1	47
🟡 Shard 6	790	4	8

🟡 12 flaky test(s) (passed on retry)

Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Features/SettingsNavigationPage.spec.ts › should support drag and drop reordering of navigation items (shard 3, 1 retry)
Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
Pages/CustomProperties.spec.ts › Set & Update all CP types on pipeline (shard 4, 1 retry)
Pages/CustomProperties.spec.ts › Should verify property name is visible for apiCollection in right panel (shard 4, 1 retry)
Pages/CustomProperties.spec.ts › Entity Reference (shard 4, 1 retry)
Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 2 retries)
Pages/ExplorePageRightPanel.spec.ts › Should allow Data Consumer to edit description for searchIndex (shard 6, 1 retry)
Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for apiEndpoint -> dashboard (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify Impact Analysis service filter selection (shard 6, 1 retry)
Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Jun 5, 2026

gitar-bot Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java Outdated

gitar-bot Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java Outdated

gitar-bot Bot reviewed Jun 5, 2026

View reviewed changes

Comment thread openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json Outdated

pmbrull force-pushed the pmbrull/search-indexing-audit branch from 6016bad to 73282d0 Compare June 8, 2026 06:56

gitar-bot Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread ...a-service/src/test/java/org/openmetadata/service/search/SearchConsumerFieldContractTest.java

gitar-bot Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread openmetadata-integration-tests/src/test/java/org/openmetadata/it/server/SearchTestImages.java

pmbrull and others added 15 commits June 8, 2026 13:09

feat(search): add reindex-status classification to SystemRepository

cddcb55

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(search): report reindex-pending in /system/validate, drop canary…

93fd91e

… field check Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(search): stamp mapping versions on successful reindex (UI/schedu…

32e60fb

…led/CLI) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(search): drop redundant CLI mapping-version stamp (SearchInd…

e0c2397

…exApp owns it) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore(search): remove orphaned getIndexFieldNames after canary check …

0607357

…removal Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(search): update SearchConsumerFields javadoc after canary check …

3155865

…removal

pmbrull marked this pull request as ready for review June 9, 2026 06:42

Copilot AI review requested due to automatic review settings June 9, 2026 06:42

Copilot AI reviewed Jun 9, 2026

open-metadata deleted a comment from github-actions Bot Jun 9, 2026

pmbrull temporarily deployed to test June 9, 2026 06:54 — with GitHub Actions Inactive

pmbrull had a problem deploying to test June 9, 2026 06:54 — with GitHub Actions Error

pmbrull temporarily deployed to test June 9, 2026 06:54 — with GitHub Actions Inactive

pmbrull had a problem deploying to test June 9, 2026 06:54 — with GitHub Actions Error

feat(search): surface orphan indexes and cluster health in /system/va…

4cb9691

…lidate

gitar-bot Bot reviewed Jun 9, 2026

View reviewed changes

refactor(search): fold missing/orphan/cluster into Search Reindex Sta…

9b59a56

…tus check

pmbrull had a problem deploying to test June 9, 2026 08:50 — with GitHub Actions Error

Copilot AI review requested due to automatic review settings June 9, 2026 08:55

Copilot AI reviewed Jun 9, 2026

gitar-bot Bot reviewed Jun 9, 2026

View reviewed changes

pmbrull temporarily deployed to test June 9, 2026 09:06 — with GitHub Actions Inactive

pmbrull closed this Jun 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(search): consumer-field contract test + index mapping health check#28751

feat(search): consumer-field contract test + index mapping health check#28751
pmbrull wants to merge 21 commits into
mainfrom
pmbrull/search-indexing-audit

pmbrull commented Jun 5, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

gitar-bot Bot Jun 9, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

gitar-bot Bot Jun 9, 2026

Uh oh!

pmbrull commented Jun 9, 2026

Uh oh!

gitar-bot Bot commented Jun 9, 2026

Uh oh!

sonarqubecloud Bot commented Jun 9, 2026

Uh oh!

github-actions Bot commented Jun 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pmbrull commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes:

Type of change:

High-level design:

Tests:

Use cases covered

Unit tests

Backend integration tests

Ingestion integration tests

Playwright (UI) tests

Manual testing performed

UI screen recording / screenshots:

Checklist:

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

gitar-bot Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

gitar-bot Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

pmbrull commented Jun 9, 2026

Uh oh!

gitar-bot Bot commented Jun 9, 2026

Uh oh!

sonarqubecloud Bot commented Jun 9, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

github-actions Bot commented Jun 9, 2026

🟡 Playwright Results — all passed (12 flaky)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pmbrull commented Jun 5, 2026 •

edited

Loading