test(search): per-language index mappings + multi-language search ITs by pmbrull · Pull Request #28857 · open-metadata/OpenMetadata

pmbrull · 2026-06-09T09:13:16Z

fix #28849

Summary

Hardens search across the four supported index-mapping languages (en / jp / ru / zh).

- SearchConsumerFields: single source of truth for the denormalized search-index fields that product features beyond Explore depend on (RBAC, Data Quality, Incident Manager, Lineage, Data Insights).
- SearchConsumerFieldContractTest (CI): over every entity mapping × 4 languages, asserts that the denormalized leaf fields keep their keyword type wherever present and that core data-asset indexes expose the full set; failures name the affected consumer.
- SearchConsumerFieldBehaviorIT: per-language behavioral IT that indexes a real document into each language's real analyzers (kuromoji for jp, ik for zh) and runs the actual search / filter / aggregation each consumer relies on, reporting a failure as a broken feature in a specific language (e.g. "Domain filter is broken in [jp]").
- Mapping fixes: define missing jp analyzers, align jp/ru/zh keyword normalizers to en (case-insensitive), and restore top-level domains on jp/topic, which had lost domain-based RBAC and DQ/Incident filtering (caught by the contract test).
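
To illustrate why aligning the keyword normalizers matters, here is a minimal sketch of how a lowercasing normalizer changes exact-match behavior on a keyword field. It is a toy model, not the OpenMetadata or OpenSearch code: the class name `KeywordNormalizerSketch` and the sample tag value are hypothetical, and it assumes the normalizer only lowercases and is applied to both the indexed value and the term-query input.

```java
import java.util.Locale;

/** Toy model of a keyword field with or without a lowercasing normalizer (hypothetical). */
final class KeywordNormalizerSketch {

  /** Applies the assumed normalizer: lowercase when enabled, otherwise keep the value as-is. */
  static String normalize(String value, boolean lowercase) {
    return lowercase ? value.toLowerCase(Locale.ROOT) : value;
  }

  /**
   * A term-level match on a keyword field compares whole values.
   * Both the stored value and the query input pass through the same normalizer.
   */
  static boolean termMatches(String indexedValue, String queryValue, boolean lowercase) {
    return normalize(indexedValue, lowercase).equals(normalize(queryValue, lowercase));
  }
}
```

With the normalizer enabled, `termMatches("PII.Sensitive", "pii.sensitive", true)` is true; without it, the same filter misses the document. That is the kind of per-language divergence the mapping fixes aim to remove.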

Test Plan

- SearchConsumerFieldContractTest (2): green; openmetadata-integration-tests test-compiles; mvn -pl openmetadata-service spotless:check clean. (Build openmetadata-spec first so the contract test reads the current mappings.)
- CI runs SearchConsumerFieldBehaviorIT (needs the language-analysis OpenSearch image / Docker).

🤖 Generated with Claude Code

Summary by Gitar

Cleanup:
- Removed SearchConsumerFields.java and SearchConsumerFieldContractTest.java in favor of the new behavioral integration test suite.
Search Mapping Improvements:
- Added om_analyzer_jp with kuromoji filters to support Japanese text analysis.
- Updated tagFQN and pipelineType fields across multiple language mappings to include lowercase_normalizer.

This will update automatically on new commits.

- SearchConsumerFields: single source of truth for the denormalized search-index fields product features beyond Explore depend on (RBAC, Data Quality, Incident Manager, Lineage, Data Insights), guarded by a CI contract test (SearchConsumerFieldContractTest) that fails when a shipped mapping file renames/retypes/drops one, over every entity x 4 languages.
- SearchConsumerFieldBehaviorIT: per-language behavioral IT that indexes a real document into each language's real analyzers (kuromoji/ik) and runs the actual search/filter/aggregation each consumer relies on, reporting a failure as a broken feature in a specific language.
- Fix jp/ru/zh index mappings: define missing jp analyzers, align jp/ru/zh keyword normalizers to en (case-insensitive), and restore top-level domains on jp/topic (which had lost domain-based RBAC and DQ/Incident filtering).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

github-actions · 2026-06-09T09:13:27Z

✅ PR checks passed

The linked issue has a description and all required Shipping project fields set. Thanks!

The analysis-plugin OpenSearch image (kuromoji + third-party IK from release.infinilabs.com) was wired into ContainerizedServer and TestSuiteBootstrap, so every OpenSearch IT built it at test time. Those suites are pinned to the EN mappings and never reference the jp/zh analyzers, so the install added a suite-wide build-time dependency on infinilabs.com -- and a version-skew failure on any OpenSearch bump where no matching IK release exists -- for zero coverage gain.

Revert both call sites to the vanilla image. All non-English coverage lives in SearchConsumerFieldBehaviorIT, which builds its own plugin image, so language drift is still caught while the rest of the IT suite keeps no external dependency.

Also derive SearchConsumerFieldContractTest's language list from IndexMappingLanguage.values() instead of a hardcoded list, matching the behavioral IT so a newly added language is guarded by both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

SearchConsumerFields was a hand-maintained, test-only registry mirroring the denormalized fields, consumed solely by SearchConsumerFieldContractTest, which statically re-checked the shipped mapping JSON. The behavioral guard SearchConsumerFieldBehaviorIT already exercises these fields with real RBAC/Data Quality/Incident queries across every mapping language, so the static mirror added upkeep without enforcing completeness. Remove both and rely on the behavioral IT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-09T14:51:54Z

🟡 Playwright Results — all passed (9 flaky)

✅ 4275 passed · ❌ 0 failed · 🟡 9 flaky · ⏭️ 88 skipped

Shard	Passed	Flaky	Skipped
✅ Shard 1	301	0	4
🟡 Shard 2	805	1	9
🟡 Shard 3	806	2	8
🟡 Shard 4	842	1	12
✅ Shard 5	721	0	47
🟡 Shard 6	800	5	8

🟡 9 flaky test(s) (passed on retry)

Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 2 retries)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Features/SettingsNavigationPage.spec.ts › should support drag and drop reordering of navigation items (shard 3, 1 retry)
Pages/DataContracts.spec.ts › Contract Status badge should be visible on condition if Contract Tab is present/hidden by Persona (shard 4, 1 retry)
Features/AutoPilot.spec.ts › Agents created by AutoPilot should be deleted (shard 6, 1 retry)
Pages/Glossary.spec.ts › Column dropdown drag-and-drop functionality for Glossary Terms table (shard 6, 1 retry)
Pages/Glossary.spec.ts › Glossary Term Update in Glossary Page should persist tree (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

mohityadav766

LGTM

gitar-bot · 2026-06-10T07:42:20Z

Code Review ✅ Approved 2 resolved / 2 findings

Hardens cross-language search support by introducing behavioral integration tests and normalizing Japanese and multi-language index mappings. Obsolete contract tests and static fields were removed, resolving the hardcoded language list and build-time dependency findings.

✅ 2 resolved

✅ Bug: All ITs now depend on external infinilabs.com IK download at build time

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/server/SearchTestImages.java:25-39 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/server/ContainerizedServer.java:226-229 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java:353-355
SearchTestImages.openSearchWithAnalysisPlugins builds the OpenSearch test image by running opensearch-plugin install <ikPluginUrl>, where the URL is https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-<version>.zip derived from the image tag. This helper is now wired not only into the new SearchConsumerFieldBehaviorIT, but also into ContainerizedServer.newOpenSearch and TestSuiteBootstrap, so every integration test that starts OpenSearch now requires:

1. Network access to release.infinilabs.com at test time (the ImageFromDockerfile build runs the plugin install), and
2. A version-matched IK release existing under the stable/ path for the exact OpenSearch version being used (currently 3.4.0).

If infinilabs is unreachable, or the stable/ directory does not contain an artifact matching a future bumped OpenSearch version, the Docker image build fails and the entire IT suite cannot start its search container — a much larger blast radius than the search feature this PR targets. Previously these suites ran on the vanilla image with no external dependency.

Consider: (a) guarding the IK install so a download failure degrades gracefully (e.g. only the zh-analyzer tests skip) rather than aborting all ITs, (b) pinning/vendoring the IK artifact, or (c) only building the analysis-plugin image for the search test that needs it rather than for ContainerizedServer/TestSuiteBootstrap broadly.
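
The version coupling described in this finding can be sketched as follows. This is a hypothetical reconstruction, not the code in SearchTestImages: the class and method names are invented, and it assumes the image reference always ends in `:<version>`. Only the URL pattern comes from the finding itself.

```java
/** Hypothetical sketch of deriving the IK plugin URL from an OpenSearch image tag. */
final class IkPluginUrlSketch {

  // URL prefix as quoted in the review finding.
  private static final String STABLE_BASE = "https://release.infinilabs.com/analysis-ik/stable/";

  /** Extracts the version after the last ':' (assumes a reference shaped like name:version). */
  static String versionOf(String imageReference) {
    int colon = imageReference.lastIndexOf(':');
    if (colon < 0 || colon == imageReference.length() - 1) {
      throw new IllegalArgumentException("No version tag in image reference: " + imageReference);
    }
    return imageReference.substring(colon + 1);
  }

  /** Builds the download URL; it resolves only if a release exists for exactly this version. */
  static String ikPluginUrl(String imageReference) {
    return STABLE_BASE + "opensearch-analysis-ik-" + versionOf(imageReference) + ".zip";
  }
}
```

Because the URL is a pure function of the image tag, bumping OpenSearch changes the artifact being requested, which is why a missing IK release for the new version would break the image build.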

✅ Quality: Contract test hardcodes language list instead of using the enum

📄 openmetadata-service/src/test/java/org/openmetadata/service/search/SearchConsumerFieldContractTest.java:41 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SearchConsumerFieldBehaviorIT.java:69-72
SearchConsumerFieldContractTest hardcodes LANGUAGES = List.of("en", "jp", "ru", "zh"), while the companion SearchConsumerFieldBehaviorIT derives the same list from IndexMappingLanguage.values() (the schema-defined registry). If a new language is added to IndexMappingLanguage, the behavioral IT will exercise it automatically but the contract test silently will not, leaving the new language's mappings unguarded against the field-type/field-presence contract. Deriving the list from IndexMappingLanguage.values() here too keeps both guards in sync.
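
A minimal sketch of the suggested change, assuming the enum constants map to the lowercase language codes by name. `IndexMappingLanguageSketch` is a stand-in declared here for illustration; the real IndexMappingLanguage type is generated from the schema and may expose its values differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Stand-in for the schema-defined enum (hypothetical; constants assumed from the four languages). */
enum IndexMappingLanguageSketch { EN, JP, RU, ZH }

final class LanguageListSketch {

  /** Derives the language codes from the enum so a newly added constant is picked up automatically. */
  static List<String> languages() {
    return Arrays.stream(IndexMappingLanguageSketch.values())
        .map(language -> language.name().toLowerCase(Locale.ROOT))
        .collect(Collectors.toList());
  }
}
```

Compared with `List.of("en", "jp", "ru", "zh")`, the derived list has no second place to update when a language is added, so both guards would iterate over the same set.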

Copilot AI review requested due to automatic review settings June 9, 2026 09:13

Copilot AI reviewed Jun 9, 2026

pmbrull mentioned this pull request Jun 9, 2026

feat(search): consumer-field contract test + index mapping health check #28751

Closed

13 tasks

github-actions Bot added the Ingestion and safe to test labels Jun 9, 2026

gitar-bot Bot reviewed Jun 9, 2026

Comment thread openmetadata-integration-tests/src/test/java/org/openmetadata/it/server/SearchTestImages.java

gitar-bot Bot reviewed Jun 9, 2026

Comment thread ...a-service/src/test/java/org/openmetadata/service/search/SearchConsumerFieldContractTest.java Outdated

pmbrull had a problem deploying to test June 9, 2026 09:25 — with GitHub Actions Error

pmbrull and others added 2 commits June 9, 2026 12:20

Copilot AI review requested due to automatic review settings June 9, 2026 10:30

Copilot AI reviewed Jun 9, 2026

pmbrull had a problem deploying to test June 9, 2026 10:41 — with GitHub Actions Error

Merge branch 'main' into pmbrull/search-multilang-mappings

dec5a68

pmbrull had a problem deploying to test June 9, 2026 11:26 — with GitHub Actions Error

pmbrull temporarily deployed to test June 9, 2026 11:26 — with GitHub Actions Inactive

Merge branch 'main' into pmbrull/search-multilang-mappings

e937c5a

Copilot AI review requested due to automatic review settings June 9, 2026 12:24

Copilot AI reviewed Jun 9, 2026

pmbrull temporarily deployed to test June 9, 2026 12:36 — with GitHub Actions Inactive

mohityadav766 approved these changes Jun 10, 2026

pmbrull merged commit 3829c30 into main Jun 10, 2026
40 of 44 checks passed

pmbrull deleted the pmbrull/search-multilang-mappings branch June 10, 2026 07:41

pmbrull merged 5 commits into main from pmbrull/search-multilang-mappings

3 participants
