DeepFinance Enhancements #6

TaoShuchang · 2026-01-22T10:18:34Z

Description

This PR introduces several improvements to the multi-agent context tracking, adds service name prefix support, and optimizes the DeepFinance judge logic.

Changes

1. Multi-Agent Message Content Parsing Enhancement

File: ajet/context_tracker/multiagent_tracking.py

Added support for tool_result content block format (AgentScope format)
Improved handling of different content block types (text, tool_result, tool_use)
Enhanced error handling: now continues processing other items instead of skipping the entire message
Better robustness when parsing complex multi-agent message structures
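
The parsing behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `multiagent_tracking.py`: the function name `flatten_content_blocks` and the sample blocks are hypothetical, and each block is assumed to be a dict with an optional `type` key.

```python
import logging

logger = logging.getLogger(__name__)


def flatten_content_blocks(content):
    """Concatenate the textual parts of a list of content blocks (hypothetical helper)."""
    str_content = ""
    for item in content:
        item_type = item.get("type")
        if "text" in item:
            # Text block: keep it only when the payload is really a string.
            if isinstance(item["text"], str):
                str_content += item["text"]
        elif item_type == "tool_result" and "output" in item:
            # AgentScope-style tool result: stringify non-string outputs.
            output = item["output"]
            str_content += output if isinstance(output, str) else str(output)
        elif item_type == "tool_use":
            # Assumed to be covered by the tool_calls field already, so skip it.
            continue
        else:
            # Unknown block: warn and move on instead of dropping the whole message.
            logger.warning("Non-text content detected: %s. Ignoring this item.", item)
            continue
    return str_content


blocks = [
    {"type": "text", "text": "Price: "},
    {"type": "tool_result", "output": {"close": 101.5}},
    {"type": "image", "url": "chart.png"},  # unrecognized, skipped with a warning
    {"type": "text", "text": " done"},
]
print(flatten_content_blocks(blocks))
```

With this shape, one unrecognized block costs only that block; the text before and after it is still collected.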

2. Service Name Prefix Support

Files: ajet/launcher.py, ajet/utils/pty.py

Added --prefix CLI argument to support service name prefixing
Allows multiple instances of services to run with unique identifiers
Useful for multi-experiment environments and distributed training
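
A small sketch of how the prefix might flow from the CLI into a service name. The `--prefix` definition and the `prefix + "_" + service_name` rule mirror the diff in this PR; `build_parser` and `prefixed_service_name` are hypothetical names used only for illustration, and the `store_true` action on `--with-deepfinance` is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser containing only the two flags relevant here."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--with-deepfinance", action="store_true", default=False)
    parser.add_argument("--prefix", type=str, default="", required=False,
                        help="Prefix for service names")
    return parser


def prefixed_service_name(service_name, prefix=""):
    """Prepend the prefix and an underscore when a prefix is given."""
    if prefix != "":
        service_name = prefix + "_" + service_name
    return service_name


args = build_parser().parse_args(["--with-deepfinance", "--prefix", "exp1"])
if args.with_deepfinance:
    print(prefixed_service_name("deepfinance", prefix=args.prefix))
```

Because the default is an empty string, launches that omit `--prefix` keep their original service names.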

3. Dependency Upgrade

File: pyproject.toml

Bumped agentscope from 1.0.7 to 1.0.8

4. DeepFinance Judge Optimization

File: tutorial/example_deep_finance/deep_finance_judge.py

Fixed tool_stats extraction logic: now correctly reads from log_metrics instead of metadata
Added penalty logging for better debugging and monitoring
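
To illustrate the extraction fix, here is a hedged sketch. `WorkflowOutputStub` is a hypothetical stand-in for the framework's `WorkflowOutput`, and the helper names are invented; only the `log_metrics.get("tool_stats", {})` and `get("total_calls", 0)` lookups come from the diff. How the penalty itself is computed is not shown in this PR, so it is passed in as a plain number.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowOutputStub:
    """Hypothetical stand-in exposing only the two fields discussed here."""
    metadata: dict = field(default_factory=dict)
    log_metrics: dict = field(default_factory=dict)


def read_tool_calls(workflow_output):
    """Read tool statistics from log_metrics, defaulting to zero calls."""
    tool_stats = workflow_output.log_metrics.get("tool_stats", {})
    tool_calls = tool_stats.get("total_calls", 0)
    return tool_stats, tool_calls


def log_penalty(penalty, tool_stats):
    """Print a debug line when a negative penalty is applied; return whether it was."""
    if penalty < 0:
        print(f"Penalty applied: penalty={penalty}, tool_stats={tool_stats}")
        return True
    return False


output = WorkflowOutputStub(log_metrics={"tool_stats": {"total_calls": 3}})
tool_stats, tool_calls = read_tool_calls(output)
log_penalty(-0.5, tool_stats)
```

The defaults matter: a missing `tool_stats` key yields zero calls rather than an exception.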

5. Metric Helper Improvements

Files: ajet/utils/metric_helper/save_trajectory_as_json.py, ajet/utils/metric_helper/tool_metric_helper.py

Corrected trajectory save path from ctx_trackers to trajectory for better organization
Added tool call count metric (tool_error/{tool_name}/calls) for more comprehensive monitoring
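
The new call-count metric sits next to the existing error rate. The sketch below assumes a per-tool input of the form `{"calls": int, "errors": int}` and a hypothetical function name; only the two metric keys and the rounding follow the diff.

```python
def tool_error_metrics(per_tool_stats, prefix=""):
    """Build error-rate and call-count metrics for every tool that was called."""
    metrics = {}
    for tool_name, stats in per_tool_stats.items():
        calls = stats.get("calls", 0)
        errors = stats.get("errors", 0)
        if calls > 0:
            error_rate = errors / calls * 100  # percentage
            metrics[f"{prefix}tool_error/{tool_name}/error_rate"] = round(error_rate, 2)
            metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls
    return metrics


sample = {"search": {"calls": 8, "errors": 2}, "unused": {"calls": 0, "errors": 0}}
print(tool_error_metrics(sample, prefix="val/"))
```

Reporting `calls` alongside `error_rate` lets a reader tell a 50% error rate over two calls apart from one over two thousand.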

6. DeepFinance Configuration Updates

Files: tutorial/example_deep_finance/deep_finance.sh, tutorial/example_deep_finance/deep_finance_template.yaml

Updated default experiment suffix naming
Integrated --prefix argument in training script

Testing

Multi-agent context tracking with various message formats
Service prefix functionality in distributed environments
DeepFinance training pipeline with updated judge logic
Metric collection and trajectory saving

Impact

Backward Compatible: All changes maintain backward compatibility
No Breaking Changes: Existing configurations and workflows continue to work

@Haoran

…uation functionality to the FinWorld task.
- Added the ExampleAgentScopeLearnProtocol class to implement the AgentScope execution flow for multi-turn interactions.
- Integrated semaphore control to manage the parallelism of environment calls, improving environment stepping performance.
- Implemented a mechanism for detecting context overflows and quickly terminating during environment interactions to prevent blocking.
- Added a finworld.yaml configuration file to define project training and rollout parameters.
- Added the FinWorldJudgeByOpenJudge class, integrating multiple evaluators including RM Gallery and OpenJudge (@Haoran).
- Implemented a mechanism for converting task output, asynchronous calls, and retrying to ensure evaluation stability.
- Weight normalization manages the contributions of each evaluator, merging them to calculate the final reward and success determination.
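
The weight-normalization step mentioned in this commit message could look roughly like the sketch below. The judge's real merge rule is not shown in this PR, so everything here is an assumption: the function name, the evaluator names, and the `success_threshold` parameter are all hypothetical.

```python
def merge_evaluator_scores(scores, weights, success_threshold=0.5):
    """Combine per-evaluator scores into one reward using normalized weights.

    `scores` and `weights` map evaluator name -> float. Only evaluators that
    produced a score contribute, and their weights are rescaled to sum to 1.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("at least one evaluator needs a positive weight")
    reward = sum(scores[name] * weights[name] / total_weight for name in scores)
    return reward, reward >= success_threshold


reward, success = merge_evaluator_scores(
    scores={"rm_gallery": 1.0, "openjudge": 0.5},
    weights={"rm_gallery": 1.0, "openjudge": 1.0},
)
print(reward, success)
```

Normalizing over the evaluators that actually returned a score keeps the reward on the same scale when one evaluator fails or is disabled.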

…dates
- Renamed ExampleAgentScopeLearnProtocol to ExampleDeepResearchProtocol and modified the execute method signature.
- Unified the parameter name of the model tuner to `tuner` and its related attribute references.
- Optimized the multi-turn interaction step configuration, changing it to use `tuner.config.ajet.rollout.multi_turn.max_steps`.
- Modified the context overflow judgment logic to prevent tool call blocking.
- Updated the finworld.yaml configuration, replacing astune with ajet-related configurations, and adjusted the workflow protocol and environment parameters.
- Modified the default environment variable values and log saving paths in finworld_judge.py.
- Added and improved multi-machine and single-machine startup scripts, supporting dynamic generation of MCP configuration and environment variable loading.
- Added the finworld_single.yaml template to adapt to single-machine training configurations.
- Adjusted the key reference for multi-turn step configuration in ma_deepresearch.py, using the ajet configuration path.

…ipts and templates
- Added bash startup scripts for multi-machine, multi-GPU training, supporting dynamic configuration generation and environment variable import.
- Implemented training configuration file templates, supporting automatic injection of various weight parameters and model paths.
- Adjusted the default request timeout of EnvClient from 30 seconds to 300 seconds to accommodate long training requests.
- Added a new finworld example directory and related documentation, improving the example project structure.

…uation configuration and scripts
- Replaced model initialization in FinWorldJudgeByOpenJudge with the `_init_openjudge_model` method
- Read Judge model parameters from the configuration file first, using environment variables as a fallback
- Optimized RM Gallery initialization, using configuration-first logic, and improved exception stack trace printing
- Cleaned up and removed the old `_init_model` singleton method and related code
- Updated the example startup script `ajet_finworld.sh`, adding OPENJUDGE_LLM and RM_LLM configurations
- Modified YAML templates and configuration files to unify the structure and field naming of Judge configuration items
- Deleted the outdated `cc_rm4_res2cit2fai2_30b.sh` script
- Adjusted the `env_service` startup path to improve environment activation compatibility
- Adjusted script log output format and content to enhance the clarity of configuration parameter printing
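
A configuration-first lookup with an environment-variable fallback, as described in this commit message, might be sketched as follows. The helper name, the flat dict config, and the pairing of the `openjudge_llm` key with the `OPENJUDGE_LLM` variable are assumptions for illustration; the actual `_init_openjudge_model` code is not shown in this PR.

```python
import os


def resolve_judge_model(config, config_key, env_var, default=None):
    """Return the configured model name, else the environment variable, else default."""
    value = config.get(config_key)
    if value:  # an empty or missing config value falls through to the environment
        return value
    return os.environ.get(env_var, default)


os.environ["OPENJUDGE_LLM"] = "env-model"
print(resolve_judge_model({"openjudge_llm": "cfg-model"}, "openjudge_llm", "OPENJUDGE_LLM"))
print(resolve_judge_model({}, "openjudge_llm", "OPENJUDGE_LLM"))
```

Putting the config file first keeps an experiment reproducible from its YAML alone, while the environment variable still works for quick overrides when the key is absent.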

- Added the jsonl_with_env_service type, which allows loading data from jsonl files while calling tools via env_service.
- Extended ResourceKeeper to handle the creation and release logic of environment instances for jsonl_with_env_service.
- Maintained the env_service type logic, calling create_instance to register instances and initializing them using init_messages from the jsonl file.
- Added an example protocol, ExampleDeepResearchProtocol, to implement multi-turn interaction and environment call coordination.
- Provided training scripts and YAML configuration templates for finworld, supporting the jsonl_with_env_service mode training environment.
- Optimized scripts to support multi-node multi-GPU training, including environment variables and Ray cluster configuration.

…of the metrics update logic
- Modified the `update_metrics` function, adding a `prefix` parameter to distinguish between training and validation metrics.
- Adjusted the data source for extracting `reward_stats` and `tool_stats`, migrating from `workflow_metadata` to `log_metrics`.
- Added debug printing to output the `log_metrics` content and metric key names at key steps for easier troubleshooting.
- Used the appropriate prefix when calling `update_metrics` in `trainer_verl.py`, and added multiple debug prints.
- Modified `WorkflowOutput` to place `tool_stats` and `reward_stats` into the `log_metrics` field.
- Removed redundant and deprecated code for extracting `reward_stats` and calculation functions.
- Added debug information output to the `finworld` and `finworld_judge` modules to track log metrics and scoring data.

- Removed debug print statements before and after the `update_metrics` call in `trainer_verl.py`
- Removed debug print statements related to the `log_metrics` key in `finworld.py`
- Removed debug print statements before updating `metadata_stats` in `finworld_judge.py`
- Added logic in `general_runner.py` to synchronize `reward_stats` from `metadata` to `log_metrics` after the judge calculation
- Cleaned up debug print statements within `update_metrics` in `metric_helper`, improving code readability.

…ations
- Switched the example directory from example_finworld to example_deep_finance
- Modified startup parameters and logic to support deep_finance, replacing the finworld option
- Replaced finworld_reader with deep_finance_reader in the task reader
- Adjusted environment client configuration in resource management, using deep_finance instead of finworld-related checks
- Updated reward metric tool documentation to support deep_finance
- Deleted finworld-related configuration files, scripts, code, and evaluation modules, cleaning up leftover files and scripts
- Replaced the keyword "finworld" with "deep_finance" in comments and logs

… references
- Replace all "finworld" and "deep_finance" names with the unified "deepfinance" format.
- Modify command-line arguments to `--with-deepfinance` for consistency.
- Adjust the class name in `task_reader` from `deep_financeReader` to `DeepFinanceReader`.
- Update the documentation description and file name of the `metric_helper` module to DeepFinance.
- Modify environment variables and configuration paths in the example script `deep_finance.sh` to use the `DEEPFINANCE` prefix.
- Update `judge_protocol` to `DeepFinanceJudgeByOpenJudge` in the `deep_finance.yaml` configuration.
- Refactor the `FinWorldJudgeByOpenJudge` class in `deep_finance_judge.py` to `DeepFinanceJudgeByOpenJudge`.
- Rename the `FinworldReader` class in `deep_finance_reader.py` to `DeepFinanceReader`.
- Modify the debug log identifier and corresponding environment variable name to `DEEPFINANCE_DEBUG`.
- Update the evaluation protocol in the `deep_finance_template.yaml` template to `DeepFinanceJudgeByOpenJudge`.
- Ensure that internal references and comments in all modules are updated to use DeepFinance and deepfinance-related names.

…urning environment state
- Corrected the `env_output` return value structure in `BaseGymEnv` to ensure correct assignment of `reward` and `info` fields.
- Removed `RefJudge` and `StructureJudge` related metric calculations and statistics from `reward_metric_helper`.
- Cleaned up redundant code in `reward_metric_helper`, removing invalid comments and statistical items.
- Modified `save_trajectory_as_json` to always print trajectory saving confirmation information.
- Corrected log comments in `example_deep_finance` to avoid meaningless log output.
- Added the `save_trajectory_as_json_file` configuration item to `deep_finance_template.yaml` to support trajectory saving functionality.

… files
- Added a new ignore rule for config file paths in .gitignore
- Deleted the automatically generated mcp_finance_tool_generated.json file in example_deep_finance
- Refactored the deep_finance.yaml configuration file, adjusting project and experiment names
- Reorganized Judge configuration, clarifying openjudge_llm and rm_llm models
- Optimized model paths and training parameter configurations, adding parallel and batch processing settings
- Adjusted data reading methods and training/validation set path placeholders
- Reduced GPU memory usage ratio for rollout to 0.8
- Updated the default save directory path for the trainer to a placeholder variable
- Cleaned up unused and commented-out code to improve configuration file conciseness

- Corrected the data source field for timeline data used during trajectory saving.
- Removed redundant fields in tool execution time, cache hit rate, and error rate statistics.
- Updated .gitignore to add ignore rules for the example script directory.
- Removed unnecessary debugging information from logs to reduce log noise.
- Adjusted log printing in the multi-round interaction execution process to simplify output content.
- Streamlined log code for environment observation and termination checks to improve code readability.

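- Add `--prefix` argument support to the launcher
- Implement the prefix logic in the pty_launch function
- Update the deep_finance.sh script to use the prefix feature
- Allow multiple service instances to run in the same environment

- Support message content blocks in tool_result format
- Improve handling of non-text content: continue processing the other items instead of skipping the entire message
- Add handling for the tool_use type (skipped, because it is already handled through the tool_calls field)
- Tidy code structure and comments to improve readability

- Fix the tool_stats extraction logic so the data is read correctly from log_metrics
- Add debug output for penalty terms
- Enable tool calls (force_disable_toolcalls: False)
- Ensure reward calculation accuracy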

…tric
- Change trajectory save directory from "ctx_trackers" to "trajectory" to organize files better
- Add recording of tool call counts alongside error rates in tool metrics
- Update experiment suffix in deep finance example script for clearer naming convention

gemini-code-assist · 2026-01-22T10:18:54Z

Summary of Changes

Hello @TaoShuchang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the DeepFinance framework by improving its multi-agent context tracking capabilities, introducing a flexible service name prefixing mechanism for better experiment management, and optimizing the DeepFinance judge's logic for more accurate tool statistics and penalty calculations. Additionally, it refines metric collection and updates core dependencies, all while maintaining backward compatibility.

Highlights

Multi-Agent Context Tracking: Enhanced the multi-agent message content parsing to support tool_result content blocks (AgentScope format) and improved robustness by continuing to process other items even if some content is non-text or unrecognized, rather than skipping the entire message.
Service Name Prefix Support: Introduced a new --prefix CLI argument for service name prefixing, enabling multiple instances of services to run with unique identifiers, which is beneficial for multi-experiment and distributed training environments.
DeepFinance Judge Optimization: Corrected the tool_stats extraction logic in the DeepFinance judge to read from log_metrics instead of metadata and added logging for applied penalties to improve debugging and monitoring.
Metric Helper Improvements: Refactored trajectory saving to use a more organized trajectory path and added a new tool_error/{tool_name}/calls metric for more comprehensive monitoring of tool usage.
Dependency Upgrade: Updated the agentscope dependency from version 1.0.7 to 1.0.8.
DeepFinance Configuration Updates: Adjusted the default experiment suffix naming in the DeepFinance training script and integrated the new --prefix argument. Also, force_disable_toolcalls was set to False in the DeepFinance YAML template.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces several valuable enhancements across the DeepFinance project. Key improvements include more robust multi-agent message content parsing, the addition of service name prefixing for better instance management, and an essential fix in the DeepFinance judge logic for tool_stats extraction. The dependency upgrade and metric helper improvements further contribute to the system's stability and observability. Overall, the changes are well-implemented and align with the described objectives, improving both functionality and maintainability.

gemini-code-assist · 2026-01-22T10:20:22Z

tutorial/example_deep_finance/deep_finance_judge.py

+            tool_stats = workflow_output.log_metrics.get("tool_stats", {})
+            tool_calls = tool_stats.get("total_calls", 0)


The fix to extract tool_stats from workflow_output.log_metrics instead of metadata is crucial for the correctness of the DeepFinance judge. This ensures that the penalty calculation is based on accurate tool usage statistics.

gemini-code-assist · 2026-01-22T10:20:22Z

ajet/context_tracker/multiagent_tracking.py

+            # Handle text content block
+            if "text" in item:
+                if isinstance(item["text"], str):
+                    str_content += item["text"]
+            # Handle tool_result content block (AgentScope format)
+            elif item_type == "tool_result" and "output" in item:
+                output = item["output"]
+                if isinstance(output, str):
+                    str_content += output
+                else:
+                    str_content += str(output)


The updated logic to handle tool_result content blocks and convert their output to a string, even if it's not initially a string, is a good improvement. This makes the content extraction more robust and prevents potential data loss or type errors when processing diverse tool outputs.

gemini-code-assist · 2026-01-22T10:20:22Z

ajet/context_tracker/multiagent_tracking.py

                logger.warning(
-                    f"Non-text content in message content detected: {item}. Ignoring."
+                    f"Non-text content in message content detected: {item}. Ignoring this item."
                )
-                should_skip_message = True
-                return str_content, should_skip_message
-
-            if isinstance(item["text"], str):
-                str_content += str(item["text"])
-            else:
-                str_content = ""
+                # Continue processing other items instead of skipping the entire message
+                continue


Changing the error handling from immediately returning to logging a warning and continuing to process other items in the message content is a significant improvement. This makes the system more resilient to malformed or unexpected content blocks, ensuring that valid parts of a message are still processed instead of skipping the entire message.

gemini-code-assist · 2026-01-22T10:20:22Z

ajet/launcher.py

        default=False,
        help="Kill system processes (ray + vllm + python) that may block the current experiment",
    )
+    parser.add_argument("--prefix", type=str, default="", required=False, help="Prefix for service names")


Adding the --prefix CLI argument is a good way to support unique identifiers for service instances, which is particularly useful in multi-experiment or distributed environments. This enhances the flexibility and scalability of the launcher.

gemini-code-assist · 2026-01-22T10:20:22Z

ajet/launcher.py


    if args.with_deepfinance:
-        pty_launch("deepfinance")
+        pty_launch("deepfinance", prefix=args.prefix)


Passing the prefix argument to pty_launch for the deepfinance service correctly integrates the new prefixing functionality. This ensures that DeepFinance instances can be uniquely identified when launched.

gemini-code-assist · 2026-01-22T10:20:22Z

ajet/utils/metric_helper/save_trajectory_as_json.py

        traj_save_dir = os.path.join(
            os.environ.get("BEST_LOGGER_PATH", "launcher_record"),
-            "ctx_trackers",
+            "trajectory",


Changing the trajectory save path from ctx_trackers to trajectory improves the organization of saved data. This makes the directory structure more intuitive and easier to navigate.
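
For reference, a minimal sketch of how the new directory resolves. Only the `BEST_LOGGER_PATH` lookup, the `launcher_record` default, and the `trajectory` component come from the diff; the helper name and the optional extra sub-directories are hypothetical, since the rest of the original `os.path.join` call is not visible here.

```python
import os


def trajectory_save_dir(*subdirs):
    """Resolve the trajectory directory under the logger root (hypothetical helper)."""
    root = os.environ.get("BEST_LOGGER_PATH", "launcher_record")
    return os.path.join(root, "trajectory", *subdirs)


print(trajectory_save_dir())
```

Tooling that still reads saved trajectories from `ctx_trackers` would need to be pointed at the new directory.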

gemini-code-assist · 2026-01-22T10:20:22Z

ajet/utils/metric_helper/tool_metric_helper.py

        if calls > 0:
            error_rate = errors / calls * 100
            metrics[f"{prefix}tool_error/{tool_name}/error_rate"] = round(error_rate, 2)
+            metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls


Adding metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls provides a more comprehensive view of tool usage. Knowing the total number of calls alongside the error rate offers better insights into tool performance and potential issues.

gemini-code-assist · 2026-01-22T10:20:22Z

ajet/utils/pty.py

+    if prefix != "":
+        service_name = prefix + "_" + service_name  


The logic to prepend the prefix to the service_name when a prefix is provided is correctly implemented. This ensures that the service name used for launching is unique, supporting the multi-instance use case.

gemini-code-assist · 2026-01-22T10:20:22Z

tutorial/example_deep_finance/deep_finance_judge.py

+            if penalty < 0:
+                print(f"⚠️ Penalty applied: penalty={penalty}, tool_calls={tool_stats}")


Adding a print statement to log when a penalty is applied, along with the penalty value and tool call statistics, is excellent for debugging and monitoring. This provides immediate feedback on the reward calculation process.

gemini-code-assist · 2026-01-22T10:20:22Z

tutorial/example_deep_finance/yaml_template/deep_finance_template.yaml

    # ✨✨✨✨ Write and select the Agent
    user_workflow: tutorial.example_deep_finance.deep_finance->ExampleDeepResearchProtocol
-    force_disable_toolcalls: True
+    force_disable_toolcalls: False


Changing force_disable_toolcalls to False enables tool calls within the DeepFinance workflow. This is a necessary configuration update to leverage the enhanced tool_stats extraction and penalty logic in the judge.

TaoShuchang and others added 30 commits January 16, 2026 14:53

Merge remote-tracking branch 'origin/main' into dev/shuchang

ba41164

Precommit fix (#4)

c7ca8c7

* fix end of files * autoflake import fix * add mypy check

fix test bench import

7f2b017

refactor(utils): Remove unused extract and compute functions `extract…

079e4bd

…_tool_stats_from_cmts`

feat(core): add finworld task reader support to framework

de81c1d

feat(finworld): implement specialized data reader and openjudge-based…

248acc4

… grading logic

refactor(finworld): optimize configuration templates and prompt engin…

9d651fd

…eering

chore(finworld): update launch scripts and add variant experiment scr…

7475ecc

…ipts

Merge remote-tracking branch 'origin/main' into dev/shuchang

b95d491

feat(finworld): Added support for multi-machine, multi-GPU training s…

f20ab91

…cripts and configuration templates:

chore(git): ignore finworld/yaml/*

ea87d4b

chore: "Stop tracking existing yaml files in tutorial directory"

0889483

fix(task_runner): Synchronize reward_stats to log_metrics

db7114c

feat(tutorial): Added FinWorld multi-machine multi-GPU training startup script

refactor(script): Refactored the finworld training script, integratin…

5a25550

…g configuration and startup processes.

refactor(tutorial): Optimize dynamic generation logic for configurati…

04f4959

…on file paths

fix(deep_finance): argparse: with-deepfinance

d0ff68b

Merge remote-tracking branch 'origin/main' into dev/shuchang

1c356d7

fix(tutorial): Fixed issues with multi-machine training environment v…

37dcbcc

…ariable settings

TaoShuchang and others added 10 commits January 21, 2026 00:09

fix(metric_helper): fix tool cache metric

08ba184

fix little bug

3d55692

fix(utils): Suppress httpx AsyncClient.aclose() exception warnings

a478827

comments to english

88be3e4

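feat: support service name prefixes

fb41962

- Add `--prefix` argument support to the launcher - Implement the prefix logic in the pty_launch function - Update the deep_finance.sh script to use the prefix feature - Allow multiple service instances to run in the same environment

fix: optimize DeepFinance judge logic and configuration

8d2e5d7

- Fix the tool_stats extraction logic so the data is read correctly from log_metrics - Add debug output for penalty terms - Enable tool calls (force_disable_toolcalls: False) - Ensure reward calculation accuracy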

chore(deps): bump agentscope from 1.0.7 to 1.0.8

3c85960

Merge remote-tracking branch 'origin/main' into dev/shuchang

06fda5f

gemini-code-assist bot reviewed Jan 22, 2026

binary-husky added 2 commits January 23, 2026 14:51

Merge branch 'main' into dev/shuchang

63cc682

revise message parsing

c9b87ac

binary-husky merged commit 224560f into main Jan 23, 2026
0 of 2 checks passed


3 participants