[FLINK-39079][Web Frontend] Add Diagnosis Advisor to Flink Web UI #27772
Open
featzhang wants to merge 5 commits into apache:master
3895a67 to 46d9e73
This commit introduces a Top N Metrics Dashboard to the Flink Web UI, providing visibility into resource-intensive components:
- Top N CPU Consumers: Identify tasks with highest CPU usage
- Top N Backpressure Operators: Highlight operators experiencing backpressure
- Top N GC Intensive Tasks: Show tasks with highest GC overhead

The implementation includes:
- REST API endpoint: /jobs/:jobid/metrics/top-n
- Response body with three metric categories
- Angular components for displaying metrics
- Demo page showcasing the feature

This feature helps operators quickly identify performance bottlenecks and optimize job execution.
This commit introduces the Diagnosis Advisor feature to the Flink Web UI, providing automated diagnostic suggestions based on job metrics analysis.

The Diagnosis Advisor analyzes multiple metric categories:
- CPU usage metrics
- Memory consumption (heap usage)
- Garbage collection activity
- Backpressure ratios

It provides intelligent recommendations for common scenarios:
- High CPU + High Memory: Suggests GC-related issues
- High CPU + Normal Memory: Indicates heavy computation
- Low CPU + High Backpressure: Points to I/O bottlenecks
- Excessive GC count: Flags performance concerns

The implementation includes:
- REST API endpoint: /jobs/:jobid/diagnosis
- DiagnosisHandler with rule-based diagnostic engine
- Angular components for displaying diagnostic suggestions
- Demo page showcasing various diagnostic scenarios

This feature significantly improves the diagnostic experience for both new and experienced users by automating the analysis of complex metric correlations and providing actionable recommendations.
- Replace 'any' types with 'unknown' for type safety
- Fix all Prettier formatting issues
- Remove unused imports
… handlers
- Refactor DiagnosisHandler to use AbstractRestHandler instead of AbstractJobHandler
- Update metric names to match Flink 2.3 naming conventions
- Add missing DiagnosisMessageParameters class
- Fix imports and code formatting
- Rebase onto latest master branch
de1f2c1 to 1ab8148
Purpose
This PR adds the Diagnosis Advisor feature to the Flink Web UI, providing automated diagnostic suggestions based on job metrics analysis. Users often face difficulty diagnosing performance issues like high CPU usage, memory leaks, and backpressure. The current Flink UI provides raw metrics but lacks intelligent diagnostic suggestions.
The Diagnosis Advisor analyzes multiple metric categories and provides actionable recommendations for common performance scenarios.
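As a rough illustration of how the metric correlations named in the commit message (high CPU with high memory, low CPU with high backpressure, and so on) could be turned into suggestions, here is a minimal TypeScript sketch. It is not the PR's `DiagnosisHandler` logic: the `JobMetricsSnapshot` and `DiagnosisSuggestion` shapes, the rule identifiers, and every threshold are assumptions made up for this example.

```typescript
// Hypothetical input shape; the PR's real metric names and units are not shown here.
interface JobMetricsSnapshot {
  cpuLoad: number; // assumed normalized to 0..1
  heapUsedRatio: number; // assumed heap used / heap max, 0..1
  gcCount: number; // assumed GC invocations in the sampling window
  backpressureRatio: number; // assumed 0..1
}

// Hypothetical output shape, not the PR's diagnosis.ts interface.
interface DiagnosisSuggestion {
  rule: string;
  message: string;
}

// Assumed thresholds, chosen only for illustration.
const HIGH_CPU = 0.8;
const LOW_CPU = 0.3;
const HIGH_HEAP = 0.8;
const HIGH_BACKPRESSURE = 0.5;
const HIGH_GC_COUNT = 100;

function diagnose(m: JobMetricsSnapshot): DiagnosisSuggestion[] {
  const suggestions: DiagnosisSuggestion[] = [];

  if (m.cpuLoad >= HIGH_CPU && m.heapUsedRatio >= HIGH_HEAP) {
    // High CPU + high memory: likely GC-related.
    suggestions.push({ rule: 'high-cpu-high-memory', message: 'Possible GC-related issue' });
  } else if (m.cpuLoad >= HIGH_CPU) {
    // High CPU + normal memory: the job is probably compute-bound.
    suggestions.push({ rule: 'high-cpu-normal-memory', message: 'Heavy computation' });
  }

  if (m.cpuLoad <= LOW_CPU && m.backpressureRatio >= HIGH_BACKPRESSURE) {
    // Low CPU + high backpressure: waiting on something external.
    suggestions.push({ rule: 'low-cpu-high-backpressure', message: 'Possible I/O bottleneck' });
  }

  if (m.gcCount >= HIGH_GC_COUNT) {
    // Checked independently, so it can accompany any rule above.
    suggestions.push({ rule: 'excessive-gc', message: 'GC count is a performance concern' });
  }

  return suggestions;
}
```

The CPU rules are mutually exclusive, while the backpressure and GC checks run independently, so a single snapshot can yield several suggestions.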
Change log
Backend Changes:
- DiagnosisHandler.java - REST API handler for diagnostic analysis
- DiagnosisResponseBody.java - Response body with diagnostic suggestions
- DiagnosisHeaders.java - REST endpoint definition
Frontend Changes:
- DiagnosisService.ts - Service to fetch diagnostic suggestions
- diagnosis.ts - TypeScript interface definitions
- DiagnosisComponent - Angular component with HTML template and Less styles
- DiagnosisDemoComponent - Demo page showcasing scenarios
New REST API:
- GET /jobs/:jobid/diagnosis - Returns automated diagnostic suggestions
Diagnostic Rules Implemented:
Verifying
- Build the backend: mvn clean install -DskipTests
- Build the frontend: cd flink-runtime-web/web-dashboard && npm install && npm run build
- Open /diagnosis-demo to see the Diagnosis Advisor
- Query /jobs/{jobId}/diagnosis
Impact
Scope:
Performance:
Compatibility:
Documentation
Documentation updates needed for: