test(search): per-language index mappings + multi-language search ITs#28857

Merged
pmbrull merged 5 commits into main from pmbrull/search-multilang-mappings on Jun 10, 2026

pmbrull (Member) commented Jun 9, 2026

fix #28849

Summary

Hardens search across the four supported index-mapping languages (en / jp / ru / zh).

  • SearchConsumerFields — single source of truth for the denormalized search-index fields product features beyond Explore depend on (RBAC, Data Quality, Incident Manager, Lineage, Data Insights).
  • SearchConsumerFieldContractTest (CI) — over every entity mapping × 4 languages, asserts the denormalized leaf fields keep their keyword type wherever present and core data-asset indexes expose the full set; failures name the affected consumer.
  • SearchConsumerFieldBehaviorIT — per-language behavioral IT: indexes a real document into each language's real analyzers (kuromoji for jp, ik for zh) and runs the actual search / filter / aggregation each consumer relies on, reporting a failure as a broken feature in a specific language (e.g. "Domain filter is broken in [jp]").
  • Mapping fixes — define missing jp analyzers, align jp/ru/zh keyword normalizers to en (case-insensitive), and restore top-level domains on jp/topic, which had lost domain-based RBAC and DQ/Incident filtering (caught by the contract test).
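The contract check described above can be illustrated with a minimal sketch. It assumes each language's mapping has already been parsed into nested maps, and every name in it (`KeywordContractSketch`, `findLeaf`, `violations`, the `domains.fullyQualifiedName` path) is hypothetical rather than taken from the PR:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: class, method, and field-path names are hypothetical.
class KeywordContractSketch {

  // Follows a dotted path through nested "properties" maps; returns null if any segment is absent.
  @SuppressWarnings("unchecked")
  static Map<String, Object> findLeaf(Map<String, Object> properties, String dottedPath) {
    Map<String, Object> current = properties;
    Map<String, Object> field = null;
    for (String segment : dottedPath.split("\\.")) {
      if (current == null || !(current.get(segment) instanceof Map)) {
        return null;
      }
      field = (Map<String, Object>) current.get(segment);
      Object nested = field.get("properties");
      current = nested instanceof Map ? (Map<String, Object>) nested : null;
    }
    return field;
  }

  // Returns one message per (language, path) whose leaf is present but not typed "keyword".
  static List<String> violations(
      Map<String, Map<String, Object>> propertiesByLanguage, List<String> keywordPaths) {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, Map<String, Object>> entry : propertiesByLanguage.entrySet()) {
      for (String path : keywordPaths) {
        Map<String, Object> leaf = findLeaf(entry.getValue(), path);
        if (leaf != null && !"keyword".equals(leaf.get("type"))) {
          problems.add("[" + entry.getKey() + "] " + path + " is " + leaf.get("type"));
        }
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    // Toy mappings: "en" keeps the keyword type, "jp" has drifted to text.
    Map<String, Object> en = Map.of("domains",
        Map.of("properties", Map.of("fullyQualifiedName", Map.of("type", "keyword"))));
    Map<String, Object> jp = Map.of("domains",
        Map.of("properties", Map.of("fullyQualifiedName", Map.of("type", "text"))));
    Map<String, Map<String, Object>> byLanguage = new LinkedHashMap<>();
    byLanguage.put("en", en);
    byLanguage.put("jp", jp);
    // Prints: [[jp] domains.fullyQualifiedName is text]
    System.out.println(violations(byLanguage, List.of("domains.fullyQualifiedName")));
  }
}
```

A field that is absent from a mapping is skipped rather than flagged, mirroring the "wherever present" wording; the stricter "core data-asset indexes expose the full set" rule is not modelled here.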

Test Plan

  • SearchConsumerFieldContractTest (2) — green; openmetadata-integration-tests test-compiles; mvn -pl openmetadata-service spotless:check clean. (Build openmetadata-spec first so the contract test reads the current mappings.)
  • CI runs SearchConsumerFieldBehaviorIT (needs the language-analysis OpenSearch image / Docker).

🤖 Generated with Claude Code


Summary by Gitar

  • Cleanup:
    • Removed SearchConsumerFields.java and SearchConsumerFieldContractTest.java in favor of the new behavioral integration test suite.
  • Search Mapping Improvements:
    • Added om_analyzer_jp with kuromoji filters to support Japanese text analysis.
    • Updated tagFQN and pipelineType fields across multiple language mappings to include lowercase_normalizer.
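To illustrate why a lowercase normalizer matters for fields such as tagFQN, here is a toy in-memory model, not OpenSearch code. It assumes the normalizer is applied to a keyword value both when indexing and when a term-level query is evaluated; the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model of a keyword field with a lowercase normalizer (hypothetical names throughout).
class LowercaseNormalizerSketch {
  private final Map<String, Set<String>> docIdsByTerm = new HashMap<>();

  // Stand-in for the normalizer: the whole value is lowercased, never tokenized.
  static String normalize(String value) {
    return value.toLowerCase(Locale.ROOT);
  }

  // Index time: the stored term is the normalized value.
  void index(String docId, String keywordValue) {
    docIdsByTerm.computeIfAbsent(normalize(keywordValue), key -> new HashSet<>()).add(docId);
  }

  // Query time: the query input goes through the same normalizer before lookup.
  Set<String> termQuery(String value) {
    return docIdsByTerm.getOrDefault(normalize(value), Set.of());
  }

  public static void main(String[] args) {
    LowercaseNormalizerSketch tagFqn = new LowercaseNormalizerSketch();
    tagFqn.index("table-1", "PII.Sensitive");
    // Prints: [table-1]
    System.out.println(tagFqn.termQuery("pii.sensitive"));
  }
}
```

Without the normalizer on one language's mapping, the same filter would match in en but miss in that language, which is the kind of drift the mapping alignment is meant to remove.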

- SearchConsumerFields: single source of truth for the denormalized
  search-index fields product features beyond Explore depend on (RBAC, Data
  Quality, Incident Manager, Lineage, Data Insights), guarded by a CI
  contract test (SearchConsumerFieldContractTest) that fails when a shipped
  mapping file renames/retypes/drops one, over every entity x 4 languages.
- SearchConsumerFieldBehaviorIT: per-language behavioral IT that indexes a
  real document into each language's real analyzers (kuromoji/ik) and runs
  the actual search/filter/aggregation each consumer relies on, reporting a
  failure as a broken feature in a specific language.
- Fix jp/ru/zh index mappings: define missing jp analyzers, align jp/ru/zh
  keyword normalizers to en (case-insensitive), and restore top-level
  domains on jp/topic (which had lost domain-based RBAC and DQ/Incident
  filtering).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 9, 2026 09:13

Copilot AI (Contributor) left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

github-actions Bot (Contributor) commented Jun 9, 2026

✅ PR checks passed

The linked issue has a description and all required Shipping project fields set. Thanks!

github-actions Bot added the "Ingestion" and "safe to test" labels Jun 9, 2026
pmbrull and others added 2 commits June 9, 2026 12:20
The analysis-plugin OpenSearch image (kuromoji + third-party IK from
release.infinilabs.com) was wired into ContainerizedServer and
TestSuiteBootstrap, so every OpenSearch IT built it at test time. Those
suites are pinned to the EN mappings and never reference the jp/zh
analyzers, so the install added a suite-wide build-time dependency on
infinilabs.com -- and a version-skew failure on any OpenSearch bump where
no matching IK release exists -- for zero coverage gain.

Revert both call sites to the vanilla image. All non-English coverage
lives in SearchConsumerFieldBehaviorIT, which builds its own plugin image,
so language drift is still caught while the rest of the IT suite keeps no
external dependency.

Also derive SearchConsumerFieldContractTest's language list from
IndexMappingLanguage.values() instead of a hardcoded list, matching the
behavioral IT so a newly added language is guarded by both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

SearchConsumerFields was a hand-maintained, test-only registry mirroring
the denormalized fields, consumed solely by SearchConsumerFieldContractTest
which statically re-checked the shipped mapping JSON. The behavioral guard
SearchConsumerFieldBehaviorIT already exercises these fields with real
RBAC/Data Quality/Incident queries across every mapping language, so the
static mirror added upkeep without enforcing completeness. Remove both and
rely on the behavioral IT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 9, 2026 10:30

Copilot AI review requested due to automatic review settings June 9, 2026 12:24

github-actions Bot (Contributor) commented Jun 9, 2026

🟡 Playwright Results — all passed (9 flaky)

✅ 4275 passed · ❌ 0 failed · 🟡 9 flaky · ⏭️ 88 skipped

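| Shard | Passed | Failed | Flaky | Skipped |
|---|---|---|---|---|
| ✅ Shard 1 | 301 | 0 | 0 | 4 |
| 🟡 Shard 2 | 805 | 0 | 1 | 9 |
| 🟡 Shard 3 | 806 | 0 | 2 | 8 |
| 🟡 Shard 4 | 842 | 0 | 1 | 12 |
| ✅ Shard 5 | 721 | 0 | 0 | 47 |
| 🟡 Shard 6 | 800 | 0 | 5 | 8 |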
🟡 9 flaky test(s) (passed on retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 2 retries)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/SettingsNavigationPage.spec.ts › should support drag and drop reordering of navigation items (shard 3, 1 retry)
  • Pages/DataContracts.spec.ts › Contract Status badge should be visible on condition if Contract Tab is present/hidden by Persona (shard 4, 1 retry)
  • Features/AutoPilot.spec.ts › Agents created by AutoPilot should be deleted (shard 6, 1 retry)
  • Pages/Glossary.spec.ts › Column dropdown drag-and-drop functionality for Glossary Terms table (shard 6, 1 retry)
  • Pages/Glossary.spec.ts › Glossary Term Update in Glossary Page should persist tree (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

mohityadav766 (Member) left a comment

LGTM

pmbrull merged commit 3829c30 into main Jun 10, 2026
40 of 44 checks passed
pmbrull deleted the pmbrull/search-multilang-mappings branch June 10, 2026 07:41
gitar-bot Bot commented Jun 10, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Hardens cross-language search support by introducing behavioral integration tests and normalizing Japanese and multi-language index mappings. Obsolete contract tests and static fields were removed, resolving the hardcoded language list and build-time dependency findings.

✅ 2 resolved
Bug: All ITs now depend on external infinilabs.com IK download at build time

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/server/SearchTestImages.java:25-39 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/server/ContainerizedServer.java:226-229 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/bootstrap/TestSuiteBootstrap.java:353-355
SearchTestImages.openSearchWithAnalysisPlugins builds the OpenSearch test image by running opensearch-plugin install <ikPluginUrl>, where the URL is https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-<version>.zip derived from the image tag. This helper is now wired not only into the new SearchConsumerFieldBehaviorIT, but also into ContainerizedServer.newOpenSearch and TestSuiteBootstrap, so every integration test that starts OpenSearch now requires:

  1. Network access to release.infinilabs.com at test time (the ImageFromDockerfile build runs the plugin install), and
  2. A version-matched IK release existing under the stable/ path for the exact OpenSearch version being used (currently 3.4.0).

If infinilabs is unreachable, or the stable/ directory does not contain an artifact matching a future bumped OpenSearch version, the Docker image build fails and the entire IT suite cannot start its search container — a much larger blast radius than the search feature this PR targets. Previously these suites ran on the vanilla image with no external dependency.

Consider: (a) guarding the IK install so a download failure degrades gracefully (e.g. only the zh-analyzer tests skip) rather than aborting all ITs, (b) pinning/vendoring the IK artifact, or (c) only building the analysis-plugin image for the search test that needs it rather than for ContainerizedServer/TestSuiteBootstrap broadly.
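The version coupling in this finding can be made concrete with a small sketch of how a plugin URL might be derived from an image tag. The URL pattern is the one quoted in the finding; the helper names and the `opensearchproject/opensearch` image name are assumptions, not the actual SearchTestImages code:

```java
// Hypothetical helper; not the SearchTestImages implementation.
class IkPluginUrlSketch {
  // Base path quoted in the review finding.
  static final String IK_BASE = "https://release.infinilabs.com/analysis-ik/stable/";

  // Extracts the tag after the last ':' and rejects untagged images
  // (including "host:port/name", where the colon belongs to the registry).
  static String versionOf(String image) {
    int colon = image.lastIndexOf(':');
    String tag = colon < 0 ? "" : image.substring(colon + 1);
    if (tag.isEmpty() || tag.contains("/")) {
      throw new IllegalArgumentException("Image has no version tag: " + image);
    }
    return tag;
  }

  // The URL is fully determined by the OpenSearch version, so a version bump with
  // no matching IK release yields a URL that does not resolve.
  static String ikPluginUrl(String image) {
    return IK_BASE + "opensearch-analysis-ik-" + versionOf(image) + ".zip";
  }

  public static void main(String[] args) {
    // Image name is an assumption; 3.4.0 is the version named in the finding.
    System.out.println(ikPluginUrl("opensearchproject/opensearch:3.4.0"));
  }
}
```

Because the artifact name carries the exact OpenSearch version, confining the plugin image to the one test that needs it, as in option (c), keeps any such mismatch from stopping unrelated suites.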

Quality: Contract test hardcodes language list instead of using the enum

📄 openmetadata-service/src/test/java/org/openmetadata/service/search/SearchConsumerFieldContractTest.java:41 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SearchConsumerFieldBehaviorIT.java:69-72
SearchConsumerFieldContractTest hardcodes LANGUAGES = List.of("en", "jp", "ru", "zh"), while the companion SearchConsumerFieldBehaviorIT derives the same list from IndexMappingLanguage.values() (the schema-defined registry). If a new language is added to IndexMappingLanguage, the behavioral IT will exercise it automatically but the contract test silently will not, leaving the new language's mappings unguarded against the field-type/field-presence contract. Deriving the list from IndexMappingLanguage.values() here too keeps both guards in sync.
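A minimal sketch of the suggested fix follows. The enum below is a stand-in with assumed constants, since the real IndexMappingLanguage is schema-defined and may differ, and `languageCodes` is a hypothetical helper name:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class LanguageListSketch {
  // Stand-in for the schema-defined enum; the constants are assumed from the four languages named above.
  enum IndexMappingLanguage { EN, JP, RU, ZH }

  // Derives the lowercase codes from the enum, so a newly added constant is picked up automatically.
  static List<String> languageCodes() {
    return Arrays.stream(IndexMappingLanguage.values())
        .map(language -> language.name().toLowerCase(Locale.ROOT))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Prints: [en, jp, ru, zh]
    System.out.println(languageCodes());
  }
}
```

With both tests reading the same source, adding a constant to the enum extends coverage without touching either test.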

Development

Successfully merging this pull request may close these issues.

Search - Add index drift validation & multi-lang tests

3 participants