[deploy] 0.7 alpha 2#258

Merged
Chenglong-MS merged 421 commits into main from dev
May 12, 2026

Conversation

Collaborator

@Chenglong-MS Chenglong-MS commented Mar 18, 2026

PR Summary

Agents & AI Pipeline

  • Unified data agents: Consolidated agent_py_data_rec, agent_sql_data_rec, agent_py_data_transform, agent_sql_data_transform, agent_concept_derive, agent_py_concept_derive, agent_data_clean, and agent_exploration into three unified agents: data_agent.py, agent_data_rec.py, and agent_data_transform.py
  • Semantic type system: New semantic_types.py backend module and full frontend type registry (src/lib/agents-chart/core/type-registry.ts, field-semantics.ts, semantic-types.ts) with domain shape inference, tick constraints, zero-baseline classification, and snap-to-bound heuristics
  • Chart insight agent: New agent_chart_insight.py for AI-generated chart takeaways
  • Language agent: New agent_language.py for i18n-aware prompts
  • Diagnostics agent: New agent_diagnostics.py with unified diagnostic information builder for better error reporting
  • Improved agent robustness: Better handling of missing output blocks, output variable detection, multimodal fallback for text-only models

Visualization

  • Agents-chart library: Complete new chart rendering library (src/lib/agents-chart/, 120 files, ~44K lines) with multi-backend support for Vega-Lite, ECharts, Chart.js, and GoFish — includes template system, semantic-aware axis/domain/tick handling, color decisions, layout computation, faceting, and overflow filtering
  • Chart gallery: New ChartGallery.tsx with expanded chart type support including pie, US map, world map, bump, candlestick, density, lollipop, pyramid, radar, rose, streamgraph, strip plot, waterfall, and more
  • Chart render service: New ChartRenderService.tsx replacing static SVG rendering with vega-embed for interactive charts
  • Insight panel redesign: Insight takeaways now display as styled cards (matching concept explanation style) with 2-column grid layout instead of bullet lists
  • Chart recommendations: New SimpleChartRecBox.tsx and chartRecommendation.ts for improved chart suggestion workflow
  • Score tick fix: Score type with small domain spans (e.g., [0,1]) no longer forces integer-only ticks, preserving intermediate decimal ticks
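A minimal sketch of the idea behind the score tick fix, written in Python for illustration only. The function name `integer_only_ticks` and the `min_span` threshold are hypothetical; the actual rule lives in the frontend type registry and may differ.

```python
from typing import Tuple


def integer_only_ticks(semantic_type: str,
                       domain: Tuple[float, float],
                       min_span: float = 5.0) -> bool:
    """Decide whether axis ticks should be restricted to integers.

    Assumed rule: a "score" field gets integer-only ticks only when its
    domain is wide enough to hold several integer ticks. A narrow domain
    such as [0, 1] keeps decimal ticks like 0.2, 0.4, ...
    """
    if semantic_type != "score":
        return False
    lo, hi = domain
    return (hi - lo) >= min_span
```

With this rule, `integer_only_ticks("score", (0, 1))` is false, so intermediate decimal ticks survive, while a wider score domain such as `(0, 10)` still snaps to integers.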

Data Thread & Workflow

  • Hybrid thread redesign: Unified data thread with reports integrated into threads (DataThread.tsx rewrite, new DataThreadCards.tsx, InteractionEntryCard.tsx)
  • Unified formulate data hook: New useFormulateData.ts consolidating data derivation logic
  • Report editor: New Tiptap-based report editor (TiptapReportEditor.tsx) with richer editing support

Data Loading & Management

  • Unified upload dialog: New UnifiedDataUploadDialog.tsx replacing the old table selection view — supports file upload, URL, paste, database, and sample datasets in a single dialog with loading state indicators
  • Multi-table preview: New MultiTablePreview.tsx for previewing multiple tables before loading
  • Unified table loading thunk: New tableThunks.ts handling all data source types with server-side workspace storage
  • Live data & refresh: New useDataRefresh.tsx with auto-refresh, stream data sources, and RefreshDataDialog.tsx
  • Virtual table sorting: Server-side sorting now returns original row IDs (#rowId) via ROW_NUMBER() in DuckDB and pandas paths, preserving original row positions after sort
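The virtual table sorting change can be sketched as follows. This example uses the standard-library `sqlite3` module as a stand-in for DuckDB, and the table `t`, its columns, and the function name `sorted_with_row_ids` are assumptions made for illustration. The point is that `ROW_NUMBER()` is computed over the original order before the sort is applied.

```python
import sqlite3


def sorted_with_row_ids(rows, order_by):
    """Sort rows by a column while keeping each row's original 1-based position."""
    # Only known column names may be interpolated into the query text.
    if order_by not in {"name", "score"}:
        raise ValueError(f"unknown column: {order_by}")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (name TEXT, score REAL)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        # The inner query numbers rows in insertion order; the outer query sorts.
        query = f'''
            SELECT "#rowId", name, score FROM (
                SELECT ROW_NUMBER() OVER (ORDER BY rowid) AS "#rowId", name, score
                FROM t
            )
            ORDER BY {order_by}
        '''
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Each returned tuple starts with the row's position in the unsorted table, so a client can map a sorted row back to its original index.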

Data Loaders (Database Plugins)

  • New data loaders: Added Athena, BigQuery, and MongoDB data loaders
  • Enhanced existing loaders: Improved MySQL, PostgreSQL, MSSQL, S3, Azure Blob, and Kusto loaders with better error handling, connection cleanup, and password sanitization
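As an illustration of password sanitization in loader error messages, here is a minimal regex-based sketch. The function name `redact_passwords` and both patterns are assumptions; the loaders in this PR may sanitize differently.

```python
import re

# "password=..." / "pwd=..." key-value pairs (connection-string style).
_KV_PASSWORD = re.compile(r"(?i)\b(password|pwd)\s*=\s*[^;\s&]+")
# The password part of URL credentials such as "scheme://user:secret@host".
_URL_PASSWORD = re.compile(r"(://[^:/@\s]+:)[^@\s]+(@)")


def redact_passwords(message: str) -> str:
    """Replace password values in an error message with '***'."""
    message = _KV_PASSWORD.sub(lambda m: f"{m.group(1)}=***", message)
    return _URL_PASSWORD.sub(r"\1***\2", message)
```

Applying the redaction at the point where an exception is converted to a user-facing message keeps credentials out of both logs and API responses.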

Datalake / Workspace Backend

  • New workspace system: Complete datalake/ package with workspace.py, azure_blob_workspace.py, cached_azure_blob_workspace.py, file_manager.py, metadata.py, cache_manager.py, parquet_utils.py, and table_names.py
  • Workspace factory: New workspace_factory.py for configuration-driven workspace initialization
  • Session management: New session_routes.py for session-level API endpoints
  • Unicode & encoding: Support for Unicode filenames, path traversal checks, safe filename processing, UTF-8/GBK encoding detection
  • Atomic metadata updates: Prevent lost updates in concurrent scenarios
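One common way to prevent lost updates is to serialize the read-modify-write cycle and swap the file in atomically. The sketch below assumes a JSON metadata file and an in-process lock; the helper name `update_metadata` is hypothetical, and the workspace code in this PR may use a different mechanism (for example, a cross-process file lock).

```python
import json
import os
import tempfile
import threading

# In-process lock only; coordinating several processes would need a file lock.
_metadata_lock = threading.Lock()


def update_metadata(path, mutate):
    """Read-modify-write a JSON metadata file without losing concurrent updates."""
    with _metadata_lock:
        try:
            with open(path, "r", encoding="utf-8") as f:
                data = json.load(f)
        except FileNotFoundError:
            data = {}
        mutate(data)
        # Write to a temp file in the same directory, then swap it in atomically.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(data, f)
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise
        return data
```

Because the whole cycle runs under the lock, two threads that each add a key both see their change in the final file, and readers never observe a half-written file.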

Security

  • Code signing: New code_signing.py for generated code integrity verification
  • Auth module: New auth.py for authentication handling
  • URL allowlist: New url_allowlist.py for URL validation
  • Error sanitization: New sanitize.py to prevent leaking sensitive info in error messages
  • Sandbox system: New sandbox/ package with local_sandbox.py, docker_sandbox.py, not_a_sandbox.py, and Dockerfile.sandbox replacing the old py_sandbox.py
  • Identity management: New identity.ts with browser-based identity for multi-user support
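To make the code signing item concrete, here is a minimal sketch of integrity verification for generated code using an HMAC. The PR summary does not say which scheme `code_signing.py` uses, so HMAC-SHA256 and the function names `sign_code` and `verify_code` are assumptions.

```python
import hashlib
import hmac


def sign_code(code: str, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 signature for a piece of generated code."""
    return hmac.new(secret, code.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_code(code: str, signature: str, secret: bytes) -> bool:
    """Check that the code was signed with the same secret and is unmodified."""
    expected = sign_code(code, secret)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)
```

In a flow like this, the server would sign code when an agent generates it and refuse to execute any code whose signature fails verification, so a client cannot substitute arbitrary code before it reaches the sandbox.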

Internationalization (i18n)

  • Full i18n framework: Added react-i18next with English and Chinese locale files across 7 namespaces (common, chart, encoding, messages, model, navigation, upload)
  • Translation guide: Comprehensive TRANSLATION_GUIDE.md for contributors

UI & Design System

  • Design tokens: New tokens.ts with centralized color, spacing, shadow, transition, and radius tokens
  • Canvas redesign: Refactored DataFormulator.tsx and App.tsx with TopNavButton, AppShell navigation, and model management UI
  • Encoding shelf updates: Reworked EncodingShelfCard.tsx and EncodingShelfThread.tsx
  • Removed legacy components: Deleted ConceptCard.tsx, ConceptShelf.tsx, DerivedDataDialog.tsx

Model Management

  • Server-side global models: New model_registry.py for managing model configurations server-side
  • Model selection dialog: Enhanced ModelSelectionDialog.tsx with multi-model support

Infrastructure & DevOps

  • Docker support: New Dockerfile, docker-compose.yml, docker-compose.test.yml with volume permissions and sandbox user handling
  • Updated dev container: Refreshed .devcontainer/devcontainer.json
  • Dependency management: Migrated from npm to yarn, added uv.lock, updated pyproject.toml and requirements.txt

Testing

  • Comprehensive test suite: 69 new test files (~8K lines) covering backend unit, integration, contract, security, plugin, and frontend unit tests
  • Test infrastructure: New vitest.config.ts, pytest.ini, conftest.py, frontend setup, and test_plan.md
  • Database plugin tests: Docker-based test harnesses for MySQL, PostgreSQL, MongoDB, and BigQuery

Comment thread py-src/data_formulator/agent_routes.py Fixed
Comment thread py-src/data_formulator/agent_routes.py Fixed
Comment thread py-src/data_formulator/tables_routes.py Fixed
@Chenglong-MS Chenglong-MS requested a review from Copilot March 24, 2026 20:34
Contributor

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@Chenglong-MS Chenglong-MS requested a review from Copilot March 24, 2026 20:36
Contributor

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

Comment thread py-src/data_formulator/routes/agents.py Fixed
Comment thread py-src/data_formulator/tables_routes.py Fixed
Comment thread py-src/data_formulator/tables_routes.py Fixed
Comment thread py-src/data_formulator/tables_routes.py Fixed
@Chenglong-MS Chenglong-MS requested a review from zhb-ai April 8, 2026 05:46
Comment thread py-src/data_formulator/session_routes.py Fixed
Comment thread py-src/data_formulator/session_routes.py Fixed
Comment thread py-src/data_formulator/routes/agents.py Fixed
Comment thread py-src/data_formulator/agent_routes.py Fixed
Comment thread py-src/data_formulator/agent_routes.py Fixed
zhb-y-agent and others added 17 commits April 11, 2026 03:05
Add Superset data source plugin with support for browsing and loading datasets via password or SSO login. Integrate OIDC authentication system with frontend login button and callback handling. Extend authentication regex to support OIDC's sub claim format. Add GitHub OAuth as an alternative authentication option. Introduce plugin system framework supporting automatic discovery and registration of frontend and backend plugins. Optimize internationalization files with Superset-related translations. Add test cases covering authentication and plugin functionality.

New files include:

- Superset plugin frontend components and API modules
- OIDC authentication configuration and callback pages
- GitHub OAuth gateway and authentication provider
- Plugin system base classes and registration logic
- Related test cases and internationalization resources
Modified files involve:

- Enhanced authentication logic
- Extended application configuration interfaces
- Dependency updates (pyproject.toml, package.json)
- Security-related regex adjustments
…plugin interface

- Refactor Superset plugin interface with dashboard search and filtering capabilities
- Migrate translation files from public directory to plugin local directory
- Add plugin translation registration mechanism with support for dynamically loading plugin localization resources
- Optimize table name generation logic with automatic suffix functionality
- Improve filter dialog interactions with support for more operators and type checking
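Add a new Section 12 to the data source plugin architecture document, detailing how plugins can ship their own translation files and have the framework merge them automatically. This approach keeps plugin translations from being mixed with the host project's, keeps plugins self-contained, and supports automatic language switching.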
…structions

Add detailed documentation about plugin self-contained translations, including directory structure, JSON format, export mechanism, and merge rules
Update section numbering to reflect new content and provide a quick reference for plugin translations
Add oauth2 as an alias for oidc to avoid confusion, and update authentication configuration documentation in .env.template with options for OIDC, GitHub, and Azure authentication
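Add detailed documentation on integrating OIDC/OAuth2 identity providers, including environment variable configuration, the conditions the IdP must meet, the minimal JSON format of the Discovery endpoint, and the claim requirements for the JWT access token. Also update the configuration notes to make clear that oidc and oauth2 are equivalent aliases.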
… Superset plugin

- Add password show/hide toggle button in login form
- Add Chinese and English description text for Superset plugin
- Update data upload dialog to use plugin description text
… UserInfo validation fallback

Support two OIDC/OAuth2 configuration modes:

1. Auto-discovery mode (standard OIDC IdP)
2. Manual endpoint mode (OAuth2 servers without Discovery)
New token validation strategies:
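When the frontend detects that an anonymous user has logged in for the first time, it automatically triggers the migration flow and offers a data import option. The backend implements a safe data copy mechanism that keeps the migration idempotent and does not delete the source data. Necessary security constraints are also added to block illegitimate migration requests.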
Implement workspace data migration from anonymous browser identity to authenticated user

- Add migration dialog component and internationalization text
- Extend workspace manager to support copying workspaces
- Add /sessions/migrate API endpoint
- Detect identity change after login and prompt for migration
…ndpoint

Add local logout handling logic to perform local cleanup and redirect when IdP does not provide end_session_endpoint
… authenticated user migration

Add local storage flag to track migration status, ensuring migration is only performed once per user
…ge management

- Change anonymous workspace migration from copy to move operation, with merge functionality added
- Add cleanup anonymous workspace API endpoint
- Persist identity type and browser ID in local storage
- Modify identity migration dialog logic to use the new cleanup API
- Fix local storage state inconsistency after migration
…and cleanup

fix(superset): Enhance SSO login popup handling and add documentation

refactor(workspace): Change migration operation to copy-then-delete pattern

docs: Add Superset SSO bridge configuration guide documentation

test: Add test cases for workspace locking scenarios
Add new internationalization text and handling logic to support the IdP-initiated SSO login flow. When users are redirected directly from the SSO system, automatically re-initiate the standard SP-initiated flow and display a corresponding waiting message.
Contributor

@github-advanced-security github-advanced-security AI left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

…ty configuration

Add silent refresh capability for expired tokens in OIDC authentication, along with enhanced Flask session security configuration
Add related test cases to verify token refresh and session configuration
Add offline_access to OIDC scopes to support refresh tokens
…a during anonymous user migration

test(IdentityMigrationDialog): Add unit tests for migration dialog

refactor(DataFormulator): Refactor workspace list display logic
zhb-y-agent and others added 24 commits May 7, 2026 23:11
…oader

Merge verbose_name, description, and expression into a single description field for improved catalog search, while preserving individual fields for consumers that need them separately.
- Enhanced the oauth_config.py example with detailed comments and a new staging handler for SSO integration.
- Updated the superset_config.py to include a TokenExchangeView for silent token exchange and improved security checks.
- Revised the SSO configuration guide to clarify the structure and requirements for SSO user information parsing and role mapping.
- Refactor data loading to utilize Superset's Chart Data API instead of SQL Lab API, simplifying permission requirements to only `datasource access`.
- Enhance documentation to clarify permission needs and security advantages of the new approach.
- Update related tests to validate the new data retrieval method and ensure compatibility with the Chart Data API.
- Remove obsolete SQL Lab related code and helper functions.
…ized errors

- Introduce `extractErrorMessage` function to handle various error types, including RTK serialized errors, ensuring proper message extraction.
- Update error handling in `DataSourceSidebar`, `DataThread`, and `DBTableManager` components to utilize the new extraction method, preventing `[object Object]` outputs.
- Add tests for `extractErrorMessage` to validate behavior with different error shapes, including ApiRequestError and plain Error instances.
- Refactor existing error handling logic to improve clarity and maintainability.
…ersetLoader

- Introduce functionality to detect and convert epoch-ms temporal columns to appropriate Arrow date/timestamp types.
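The epoch-ms detection step might look like the sketch below. It uses only the standard library and returns `datetime` objects rather than Arrow types, so it illustrates the detection and conversion logic, not the loader's actual Arrow code. The plausibility window and the function names are assumptions.

```python
from datetime import datetime, timezone

# Assumed plausibility window for epoch milliseconds:
# roughly March 1973 up to 2100-01-01 (UTC).
_MIN_MS = 100_000_000_000
_MAX_MS = 4_102_444_800_000


def looks_like_epoch_ms(values):
    """Return True if every non-null value is an int inside the window."""
    non_null = [v for v in values if v is not None]
    return bool(non_null) and all(
        isinstance(v, int) and not isinstance(v, bool) and _MIN_MS <= v <= _MAX_MS
        for v in non_null
    )


def convert_epoch_ms(values):
    """Convert epoch milliseconds to timezone-aware UTC datetimes, keeping nulls."""
    return [
        None if v is None else datetime.fromtimestamp(v / 1000, tz=timezone.utc)
        for v in values
    ]
```

A range check like this is only a heuristic: it is safest when combined with column metadata that already marks the column as temporal, since ordinary large integers can fall inside the same window.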
The data loaders did not apply a uniform maximum number of loadable rows across implementation classes
@Chenglong-MS Chenglong-MS requested a review from Mestway May 12, 2026 07:20
@Chenglong-MS Chenglong-MS changed the title 0.7 alpha 2 [deploy] 0.7 alpha 2 May 12, 2026
Collaborator

@Mestway Mestway left a comment

should be good

@Chenglong-MS Chenglong-MS merged commit 410996a into main May 12, 2026
7 checks passed
8 participants