test(search): per-language index mappings + multi-language search ITs #28857
- SearchConsumerFields: single source of truth for the denormalized search-index fields that product features beyond Explore depend on (RBAC, Data Quality, Incident Manager, Lineage, Data Insights). It is guarded by a CI contract test (SearchConsumerFieldContractTest) that fails when a shipped mapping file renames, retypes, or drops one of these fields, across every entity × 4 languages.
- SearchConsumerFieldBehaviorIT: per-language behavioral IT that indexes a real document into each language's real analyzers (kuromoji/ik) and runs the actual search/filter/aggregation each consumer relies on, reporting a failure as a broken feature in a specific language.
- Fix jp/ru/zh index mappings: define the missing jp analyzers, align jp/ru/zh keyword normalizers to en (case-insensitive), and restore top-level domains on jp/topic (which had lost domain-based RBAC and DQ/Incident filtering).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
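As a rough illustration of the contract-test idea in the commit message above, the sketch below checks one entity mapping in one language, given as a flattened "field path to type" map. It is not the actual SearchConsumerFieldContractTest: the ConsumerField record, the flattened-map input, the example field paths, and the requireAll flag (treating a missing field as a rename or drop) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the real contract test read the shipped mapping JSON files.
class MappingContractSketch {

  // A denormalized field and the product feature ("consumer") that depends on it.
  record ConsumerField(String path, String consumer) {}

  // Returns one message per violation for a single entity mapping in a single language.
  // requireAll = true treats a missing field as a violation (renamed or dropped);
  // requireAll = false only checks the type of fields that are present.
  static List<String> check(String language, String entity, Map<String, String> flattenedMapping,
      List<ConsumerField> fields, boolean requireAll) {
    List<String> failures = new ArrayList<>();
    for (ConsumerField field : fields) {
      String type = flattenedMapping.get(field.path());
      if (type == null) {
        if (requireAll) {
          failures.add(String.format("[%s/%s] %s: %s is missing",
              language, entity, field.consumer(), field.path()));
        }
      } else if (!"keyword".equals(type)) {
        failures.add(String.format("[%s/%s] %s: %s is %s, expected keyword",
            language, entity, field.consumer(), field.path(), type));
      }
    }
    return failures;
  }
}
```

Collecting every violation instead of stopping at the first lets a single CI run name all affected consumers for every entity × language combination.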
✅ PR checks passed. The linked issue has a description and all required Shipping project fields set. Thanks!
The analysis-plugin OpenSearch image (kuromoji + third-party IK from release.infinilabs.com) was wired into ContainerizedServer and TestSuiteBootstrap, so every OpenSearch IT built it at test time. Those suites are pinned to the EN mappings and never reference the jp/zh analyzers, so the install added a suite-wide build-time dependency on infinilabs.com -- and a version-skew failure on any OpenSearch bump where no matching IK release exists -- for zero coverage gain. Revert both call sites to the vanilla image. All non-English coverage lives in SearchConsumerFieldBehaviorIT, which builds its own plugin image, so language drift is still caught while the rest of the IT suite keeps no external dependency. Also derive SearchConsumerFieldContractTest's language list from IndexMappingLanguage.values() instead of a hardcoded list, matching the behavioral IT so a newly added language is guarded by both. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
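A minimal sketch of why deriving the language list from the enum matters. The enum below is a hypothetical stand-in for the project's IndexMappingLanguage; its constant names and the lowercase language codes are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the project's IndexMappingLanguage enum (constant names assumed).
enum IndexMappingLanguage { EN, JP, RU, ZH }

class LanguageMatrix {
  // Derived from the enum: a newly added constant is picked up automatically.
  static List<String> languages() {
    return Arrays.stream(IndexMappingLanguage.values())
        .map(lang -> lang.name().toLowerCase(Locale.ROOT))
        .toList();
  }

  // Languages the enum defines but a hardcoded list forgot; these would go unguarded.
  static List<String> uncovered(List<String> hardcoded) {
    List<String> missing = new ArrayList<>(languages());
    missing.removeAll(hardcoded);
    return missing;
  }
}
```

With the derived list, both the contract test and the behavioral IT iterate over the same set, so adding a language to the enum cannot silently skip one of the guards.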
SearchConsumerFields was a hand-maintained, test-only registry mirroring the denormalized fields, consumed solely by SearchConsumerFieldContractTest which statically re-checked the shipped mapping JSON. The behavioral guard SearchConsumerFieldBehaviorIT already exercises these fields with real RBAC/Data Quality/Incident queries across every mapping language, so the static mirror added upkeep without enforcing completeness. Remove both and rely on the behavioral IT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
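The reporting style of the behavioral guard (one message per broken feature per language, such as "Domain filter is broken in [jp]") can be sketched without OpenSearch. Here each feature check is a hypothetical Predicate over the language code, standing in for a real index-then-query round trip; BehaviorReportSketch and run are illustrative names, not the IT's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative sketch: each Predicate stands in for indexing a document into the
// language's mapping and running the consumer's real query against it.
class BehaviorReportSketch {

  static List<String> run(List<String> languages, Map<String, Predicate<String>> featureChecks) {
    List<String> broken = new ArrayList<>();
    for (String language : languages) {
      for (Map.Entry<String, Predicate<String>> check : featureChecks.entrySet()) {
        boolean passed;
        try {
          passed = check.getValue().test(language);
        } catch (RuntimeException e) {
          // A query that throws (e.g. an undefined analyzer) also counts as a broken feature.
          passed = false;
        }
        if (!passed) {
          broken.add(check.getKey() + " is broken in [" + language + "]");
        }
      }
    }
    return broken;
  }

  public static void main(String[] args) {
    Map<String, Predicate<String>> checks = new LinkedHashMap<>();
    checks.put("Domain filter", language -> !language.equals("jp")); // simulated jp-only regression
    checks.put("Tag aggregation", language -> true);
    System.out.println(run(List.of("en", "jp", "ru", "zh"), checks));
    // prints [Domain filter is broken in [jp]]
  }
}
```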
🟡 Playwright Results: all passed (9 flaky)
✅ 4275 passed · ❌ 0 failed · 🟡 9 flaky · ⏭️ 88 skipped
🟡 9 flaky test(s) (passed on retry)
How to debug locally:
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
Code Review ✅ Approved: 2 resolved / 2 findings
Hardens cross-language search support by introducing behavioral integration tests and normalizing Japanese and multi-language index mappings. Obsolete contract tests and static fields were removed, resolving the hardcoded language list and build-time dependency findings.
✅ 2 resolved
✅ Bug: All ITs now depend on external infinilabs.com IK download at build time
✅ Quality: Contract test hardcodes language list instead of using the enum
fix #28849
Summary
Hardens search across the four supported index-mapping languages (en / jp / ru / zh).
- SearchConsumerFields: single source of truth for the denormalized search-index fields that product features beyond Explore depend on (RBAC, Data Quality, Incident Manager, Lineage, Data Insights).
- SearchConsumerFieldContractTest (CI): over every entity mapping × 4 languages, asserts that the denormalized leaf fields keep their keyword type wherever present and that core data-asset indexes expose the full set; failures name the affected consumer.
- SearchConsumerFieldBehaviorIT: per-language behavioral IT that indexes a real document into each language's real analyzers (kuromoji for jp, ik for zh) and runs the actual search / filter / aggregation each consumer relies on, reporting a failure as a broken feature in a specific language (e.g. "Domain filter is broken in [jp]").
- Fix jp/ru/zh index mappings: define the missing jp analyzers, align jp/ru/zh keyword normalizers to en (case-insensitive), and restore top-level domains on jp/topic, which had lost domain-based RBAC and DQ/Incident filtering (caught by the contract test).

Test Plan
- SearchConsumerFieldContractTest (2): green; openmetadata-integration-tests test-compiles; mvn -pl openmetadata-service spotless:check clean. (Build openmetadata-spec first so the contract test reads the current mappings.)
- SearchConsumerFieldBehaviorIT (needs the language-analysis OpenSearch image / Docker).

🤖 Generated with Claude Code
Summary by Gitar
- Removed SearchConsumerFields.java and SearchConsumerFieldContractTest.java in favor of the new behavioral integration test suite.
- Defined om_analyzer_jp with kuromoji filters to support Japanese text analysis.
- Updated the tagFQN and pipelineType fields across multiple language mappings to use lowercase_normalizer.
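Why a missing lowercase_normalizer breaks filtering in some languages but not others: in OpenSearch, a keyword field's normalizer is applied to values at index time and to the input of term-level queries at search time. The simplified model below imitates that in plain Java (it does not call OpenSearch), and the stored tag FQN value is a hypothetical example. If en applies the normalizer and jp/ru/zh do not, the same filter can match in en and miss elsewhere whenever the case differs.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Simplified model, no OpenSearch involved: a keyword normalizer runs on stored values
// at index time and on the query term of term-level queries at search time.
class KeywordNormalizerSketch {
  static final UnaryOperator<String> NO_NORMALIZER = value -> value;
  static final UnaryOperator<String> LOWERCASE_NORMALIZER = value -> value.toLowerCase(Locale.ROOT);

  // True if a term query for queryTerm would hit any of the stored keyword values.
  static boolean termMatches(List<String> storedValues, String queryTerm, UnaryOperator<String> normalizer) {
    String normalizedTerm = normalizer.apply(queryTerm);
    return storedValues.stream().map(normalizer).anyMatch(normalizedTerm::equals);
  }

  public static void main(String[] args) {
    List<String> tagFqns = List.of("PII.Sensitive"); // hypothetical stored tagFQN value
    System.out.println(termMatches(tagFqns, "pii.sensitive", NO_NORMALIZER));        // false
    System.out.println(termMatches(tagFqns, "pii.sensitive", LOWERCASE_NORMALIZER)); // true
  }
}
```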