
Conversation

@TaoShuchang
Collaborator

Add Deep Finance Tutorial

Description

This PR adds a new tutorial, example_deep_finance, demonstrating how to train financial analysis agents with AgentJet.

New Features

  • Deep Finance Tutorial (tutorial/example_deep_finance/)
    • deep_finance.py - Main agent implementation with tool-calling capabilities
    • deep_finance_reader.py - Custom task reader for financial data
    • deep_finance_judge.py - OpenJudge-based evaluation logic for grading agent responses
    • deep_finance.yaml - Training configuration template
    • deep_finance.sh - Multi-machine, multi-GPU training script
    • prompt/ - Financial analyst prompt templates and tool prompt builder
    • yaml_template/ - YAML configuration templates

Bug Fixes & Improvements

  • Fixed multi-machine training environment variable settings
  • Synchronized reward_stats to log_metrics in task runner
  • Enhanced metrics update logic compatibility
  • Minor fixes in metric_helper and resource_keeper

Files Changed

  • 22 files changed, ~2095 insertions, ~78 deletions

TaoShuchang and others added 27 commits January 16, 2026 14:53
…uation functionality to the FinWorld task.

- Added the ExampleAgentScopeLearnProtocol class to implement the AgentScope execution flow for multi-turn interactions.

- Integrated semaphore control to manage the parallelism of environment calls, improving environment stepping performance.

- Implemented a mechanism for detecting context overflows and quickly terminating during environment interactions to prevent blocking.

- Added a finworld.yaml configuration file to define project training and rollout parameters.

- Added the FinWorldJudgeByOpenJudge class, integrating multiple evaluators including RM Gallery and OpenJudge (@Haoran).

- Implemented task-output conversion, asynchronous calls, and retries to ensure evaluation stability.

- Weight normalization balances each evaluator's contribution; the weighted scores are merged to compute the final reward and the success determination.
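The weight-normalization step described above can be sketched as follows. This is a minimal illustration, not the actual AgentJet/OpenJudge API: the evaluator names, weight values, and the `merge_rewards` helper are assumptions based on the commit message.

```python
# Hedged sketch: merge per-evaluator scores into one reward via weight
# normalization. Evaluator names and weights are illustrative only.

def merge_rewards(scores, weights, success_threshold=0.5):
    """Normalize weights over the evaluators that returned a score,
    compute the weighted reward, and decide success."""
    total = sum(weights[name] for name in scores)  # only evaluators that responded
    if total == 0:
        return 0.0, False
    reward = sum(scores[name] * weights[name] / total for name in scores)
    return reward, reward >= success_threshold

# Illustrative weights mirroring the four evaluators mentioned in the PR
weights = {"report_resolution": 0.2, "trajectory_faithfulness": 0.2,
           "citation_audit": 0.2, "rm_gallery": 0.4}

# If two evaluators fail and only two return, their weights are renormalized
reward, success = merge_rewards(
    {"report_resolution": 1.0, "rm_gallery": 0.5}, weights)
```

Renormalizing over the evaluators that actually responded keeps a single evaluator failure from silently deflating the reward.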
* fix end of files

* autoflake import fix

* add mypy check
…dates

- Renamed ExampleAgentScopeLearnProtocol to ExampleDeepResearchProtocol and modified the execute method signature.
- Unified the parameter name of the model tuner to `tuner` and its related attribute references.
- Optimized the multi-turn interaction step configuration, changing it to use `tuner.config.ajet.rollout.multi_turn.max_steps`.
- Modified the context overflow judgment logic to prevent tool call blocking.
- Updated the finworld.yaml configuration, replacing astune with ajet-related configurations, and adjusted the workflow protocol and environment parameters.
- Modified the default environment variable values and log saving paths in finworld_judge.py.
- Added and improved multi-machine and single-machine startup scripts, supporting dynamic generation of MCP configuration and environment variable loading.
- Added the finworld_single.yaml template to adapt to single-machine training configurations.
- Adjusted the key reference for multi-turn step configuration in ma_deepresearch.py, using the ajet configuration path.
…ipts and templates

- Added bash startup scripts for multi-machine, multi-GPU training, supporting dynamic configuration generation and environment variable import.
- Implemented training configuration file templates, supporting automatic injection of various weight parameters and model paths.
- Adjusted the default request timeout of EnvClient from 30 seconds to 300 seconds to accommodate long training requests.
- Added a new finworld example directory and related documentation, improving the example project structure.
…uation configuration and scripts

- Replaced model initialization in FinWorldJudgeByOpenJudge with the `_init_openjudge_model` method
- Read Judge model parameters from the configuration file first, using environment variables as a fallback
- Optimized RM Gallery initialization, using configuration-first logic, and improved exception stack trace printing
- Cleaned up and removed the old `_init_model` singleton method and related code
- Updated the example startup script `ajet_finworld.sh`, adding OPENJUDGE_LLM and RM_LLM configurations
- Modified YAML templates and configuration files to unify the structure and field naming of Judge configuration items
- Deleted the outdated `cc_rm4_res2cit2fai2_30b.sh` script
- Adjusted the `env_service` startup path to improve environment activation compatibility
- Adjusted script log output format and content to enhance the clarity of configuration parameter printing
- Added the jsonl_with_env_service type, which allows loading data from jsonl files while calling tools via env_service.
- Extended ResourceKeeper to handle the creation and release logic of environment instances for jsonl_with_env_service.
- Maintained the env_service type logic, calling create_instance to register instances and initializing them using init_messages from the jsonl file.
- Added an example protocol, ExampleDeepResearchProtocol, to implement multi-turn interaction and environment call coordination.
- Provided training scripts and YAML configuration templates for finworld, supporting the jsonl_with_env_service mode training environment.
- Optimized scripts to support multi-node multi-GPU training, including environment variables and Ray cluster configuration.
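The jsonl_with_env_service flow above — tasks loaded from a jsonl file, with each record's init_messages seeding an environment instance — might look like the following sketch. The field names (`task_id`, `init_messages`) and the `read_tasks` helper are assumptions for illustration, not the actual reader implementation.

```python
# Hedged sketch of a jsonl_with_env_service-style task reader. Each jsonl
# record carries optional init_messages used to initialize the env instance.
import json
import os
import tempfile

def read_tasks(jsonl_path):
    """Load tasks from a jsonl file, defaulting init_messages to []."""
    tasks = []
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            tasks.append({
                "task_id": record.get("task_id"),
                "init_messages": record.get("init_messages", []),
            })
    return tasks

# Demo: write two sample records and read them back
sample = [
    {"task_id": "t1", "init_messages": [{"role": "user", "content": "Analyze AAPL"}]},
    {"task_id": "t2"},  # no init_messages; reader supplies an empty list
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    for rec in sample:
        f.write(json.dumps(rec) + "\n")
    path = f.name
tasks = read_tasks(path)
os.unlink(path)
```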
…of the metrics update logic

- Modified the `update_metrics` function, adding a `prefix` parameter to distinguish between training and validation metrics.
- Adjusted the data source for extracting `reward_stats` and `tool_stats`, migrating from `workflow_metadata` to `log_metrics`.
- Added debug printing to output the `log_metrics` content and metric key names at key steps for easier troubleshooting.
- Used the appropriate prefix when calling `update_metrics` in `trainer_verl.py`, and added multiple debug prints.
- Modified `WorkflowOutput` to place `tool_stats` and `reward_stats` into the `log_metrics` field.
- Removed redundant and deprecated code for extracting `reward_stats` and calculation functions.
- Added debug information output to the `finworld` and `finworld_judge` modules to track log metrics and scoring data.
- Removed debug print statements before and after the `update_metrics` call in `trainer_verl.py`
- Removed debug print statements related to the `log_metrics` key in `finworld.py`
- Removed debug print statements before updating `metadata_stats` in `finworld_judge.py`
- Added logic in `general_runner.py` to synchronize `reward_stats` from `metadata` to `log_metrics` after the judge calculation
- Cleaned up debug print statements within `update_metrics` in `metric_helper`, improving code readability.
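The runner-side synchronization described above — copying `reward_stats` from metadata into `log_metrics` after the judge runs, with a prefix separating training and validation metrics — can be sketched as below. The key layout (`prefix/reward/name`) and the helper name are illustrative assumptions, not the actual `general_runner.py` code.

```python
# Hedged sketch: sync reward_stats from workflow metadata into log_metrics,
# prefixing keys so train and val metrics do not collide.

def sync_reward_stats(metadata, log_metrics, prefix="train"):
    """Copy each reward stat into log_metrics under a prefixed key."""
    for key, value in metadata.get("reward_stats", {}).items():
        log_metrics[f"{prefix}/reward/{key}"] = value
    return log_metrics
```

Passing the prefix at the call site (as `update_metrics` now does per the commit message) keeps one code path serving both training and validation.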
feat(tutorial): Added FinWorld multi-machine multi-GPU training startup script
…ations

- Switched the example directory from example_finworld to example_deep_finance
- Modified startup parameters and logic to support deep_finance, replacing the finworld option
- Replaced finworld_reader with deep_finance_reader in the task reader
- Adjusted environment client configuration in resource management, using deep_finance instead of finworld-related checks
- Updated reward metric tool documentation to support deep_finance
- Deleted finworld-related configuration files, scripts, code, and evaluation modules, cleaning up leftover files and scripts
- Replaced the keyword "finworld" with "deep_finance" in comments and logs
… references

- Replace all "finworld" and "deep_finance" names with the unified "deepfinance" format.
- Modify command-line arguments to `--with-deepfinance` for consistency.
- Adjust the class name in `task_reader` from `deep_financeReader` to `DeepFinanceReader`.
- Update the documentation description and file name of the `metric_helper` module to DeepFinance.
- Modify environment variables and configuration paths in the example script `deep_finance.sh` to use the `DEEPFINANCE` prefix.
- Update `judge_protocol` to `DeepFinanceJudgeByOpenJudge` in the `deep_finance.yaml` configuration.
- Refactor the `FinWorldJudgeByOpenJudge` class in `deep_finance_judge.py` to `DeepFinanceJudgeByOpenJudge`.
- Rename the `FinworldReader` class in `deep_finance_reader.py` to `DeepFinanceReader`.
- Modify the debug log identifier and corresponding environment variable name to `DEEPFINANCE_DEBUG`.
- Update the evaluation protocol in the `deep_finance_template.yaml` template to `DeepFinanceJudgeByOpenJudge`.
- Ensure that internal references and comments in all modules are updated to use DeepFinance and deepfinance-related names.
@gemini-code-assist

Summary of Changes

Hello @TaoShuchang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the AgentJet framework by integrating a new 'Deep Finance' tutorial. This tutorial provides a complete example of training financial analysis agents, encompassing custom data reading, sophisticated OpenJudge-based evaluation, and robust multi-machine training infrastructure. The changes also include several core improvements to metric logging, environment interaction, and configuration management, making the system more adaptable and performant for complex agent training scenarios.

Highlights

  • New Deep Finance Tutorial: Introduced a comprehensive new tutorial, 'example_deep_finance', demonstrating how to train financial analysis agents using AgentJet. This includes a main agent implementation with tool-calling capabilities, a custom task reader for financial data, and an OpenJudge-based evaluation logic.
  • Enhanced Metric Logging and Compatibility: Improved the update_metrics function in trainer_verl.py and metric_helper to accept a prefix argument, ensuring better organization of logged metrics. Synchronized reward_stats from metadata to log_metrics in the task runner for consistent data flow.
  • Flexible Task Reading and Environment Interaction: Added a new DeepFinanceReader to handle financial task data loaded from JSON files, dynamically assembling initial messages. The resource_keeper was updated to support 'deep_finance' as an environment service type, allowing tool calls to be routed through the environment service.
  • Multi-Machine Training Script and Configuration: Included a new shell script (deep_finance.sh) for multi-machine, multi-GPU training, along with a dynamic YAML configuration template (deep_finance_template.yaml) to streamline setup and execution of financial agent training.
  • Increased Environment Service Timeout: The default timeout for the EnvClient was significantly increased from 30 seconds to 300 seconds, addressing potential issues with long-running environment interactions.
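The timeout change amounts to raising a client-level default from 30 s to 300 s. EnvClient's real fields are not shown in this PR, so the dataclass below is a stand-in illustrating the shape of the change, with assumed field names.

```python
# Hedged sketch: a stand-in for the EnvClient default-timeout change.
# Field names are assumptions; only the 30 -> 300 default is from the PR.
from dataclasses import dataclass

@dataclass
class EnvClientConfig:
    base_url: str = "http://127.0.0.1:8080"
    request_timeout_s: float = 300.0  # raised from 30.0 for long rollouts

cfg = EnvClientConfig()
```

A generous client-side timeout matters here because multi-turn financial research rollouts can hold an environment call open far longer than a typical request.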


@gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive tutorial for deep_finance, complete with an agent implementation, data readers, evaluation logic, and training scripts. The overall structure is well-conceived. However, there are several critical issues, primarily within the configuration files, that would prevent the tutorial from functioning as intended. The deep_finance.yaml file, in particular, contains numerous hardcoded, user-specific paths and incorrect references that appear to be copied from another example. Similarly, the main shell script and a JSON configuration file include hardcoded IP addresses. Addressing these configuration problems is essential to ensure the tutorial is portable and usable. Beyond these critical fixes, there are also some minor opportunities for code improvement and cleanup that would enhance maintainability.

Comment on lines 1 to 94
# ------------------ Main configuration ------------------
ajet:
  project_name: ajet
  experiment_name: "cc_rm4_res2cit2fai2_30b"
  judge_llm: qwen-flash
  judge_concurrency: 10
  # OpenJudge weight configuration
  report_resolution_weight: 0.2        # report quality evaluation
  trajectory_faithfulness_weight: 0.2  # factual accuracy evaluation
  citation_audit_weight: 0.2           # citation audit evaluation (coverage + authenticity)
  rm_weight: 0.4                       # RM Gallery weight
  task_judge:
    judge_type: customized_protocol
    judge_protocol: tutorial.example_finworld.finworld_judge->DeepFinanceJudgeByOpenJudge
  model:
    # ✨✨✨✨ Set the model to be trained
    path: /mnt/data_cpfs/taoshuchang.tsc/models/Qwen3-30B-A3B-Instruct-2507
    # path: /mnt/data_cpfs/taoshuchang.tsc/models/Qwen3-8B
  trainer_common:
    nnodes: 8
    n_gpus_per_node: 8
    val_before_train: True
    val_pass_n: 8
    save_freq: 10
    test_freq: 2
    total_epochs: 200
  rollout:
    # ✨✨✨✨ Write and select the Agent
    user_workflow: tutorial.example_finworld.finworld->ExampleDeepResearchProtocol
    force_disable_toolcalls: True
    enable_oversample: False
    tensor_model_parallel_size: 8
    num_repeat: 4
    max_env_worker: 64  # increase environment parallelism
    max_num_seqs: 64    # increase vLLM concurrent sequence count
    max_env_len: 10000
    max_response_length_in_one_turn: 8000
    max_model_len: 50000
    agent_madness_reward: 0.0
    multi_turn:
      max_steps: 6
  interchange_server:
    interchange_method: 'tcp'  # options: 'tcp' (multi-nodes) or 'ipc' (1 node)
  debug:
    debug_max_parallel: 64    # increase parallel task count to fully utilize the GPUs
    debug_first_n_tasks: 100  # increase the number of tasks processed
  data:
    train_batch_size: 32  # larger batch size to suit 8-GPU parallelism
    max_prompt_length: 8000
    max_response_length: 41000

  task_reader:
    # type: env_service  # `env_service` or `dataset_file` or `huggingface_dat_repo` or `finworld`
    # === Option A: traditional env_service mode ===
    # env_service:
    #   env_type: "finworld"
    #   env_url: "http://127.0.0.1:8080"
    #   env_action_preference: code
    #   training_split: train
    #   validation_split: val

    # === Option B: DeepFinance Reader mode (data loaded from JSON, tool calls via env_service) ===
    type: finworld
    finworld:
      training:
        file_path: /mnt/data_cpfs/taoshuchang.tsc/deepresearch/AgentJet/tutorial/example_finworld/data/finworld_tasks_11171143_cc.json
      validation:
        file_path: /mnt/data_cpfs/taoshuchang.tsc/deepresearch/AgentJet/tutorial/example_finworld/data/AgentEvolver_query_val.json
    # env_service is still required (used for tool calls)
    env_service:
      env_type: "finworld"
      env_url: "http://127.0.0.1:8080"
      env_action_preference: code
trainer:
  default_local_dir: "/mnt/data/taoshuchang.tsc/deepresearch/ajet/checkpoints/example_finworld//localths/cc_rm4_res2cit2fai2_30b"
  # resume_mode: disable  # disable auto-resume and train from scratch
actor_rollout_ref:
  rollout:
    tensor_model_parallel_size: 8
    gpu_memory_utilization: 0.95
# ------------------ No changes needed ------------------
hydra:
  searchpath:
    - file://ajet/default_config
    - file://ajet/default_config/verl           # verl only
    - file://external/verl/verl/trainer/config  # verl only
    - file://ajet/default_config/trinity        # trinity only

# ------------------ No changes needed ------------------
defaults:
  - verl_default     # verl inherit 2/2
  - trinity_default  # trinity inherit 1/1
  - ajet_default
  - _self_


critical

This configuration file appears to be an incorrectly adapted copy from the example_finworld tutorial and has several critical issues that will prevent this deep_finance tutorial from working:

  1. Hardcoded Paths: Paths like /mnt/data_cpfs/taoshuchang.tsc/... are user-specific and should be replaced with placeholders or environment variables, as is done in deep_finance_template.yaml.
  2. Incorrect References: Several keys point to finworld instead of deep_finance:
    • user_workflow (line 29) points to tutorial.example_finworld.finworld->...
    • judge_protocol (line 14) points to tutorial.example_finworld.finworld_judge->...
    • task_reader.type (line 63) is finworld.
    • file_paths (lines 66, 68) point to example_finworld data.
    • default_local_dir (line 75) points to an example_finworld checkpoint directory.

Please correct these paths and references to point to the deep_finance components and data. This file should likely be generated from yaml_template/deep_finance_template.yaml rather than being a static file.

"mcpServers": {
  "flowllm": {
    "transport": "sse",
    "url": "http://22.17.31.142:8040/sse",


high

This file contains a hardcoded IP address (22.17.31.142). Hardcoding IP addresses makes the code less portable and harder to configure for different environments. It's recommended to use a placeholder, an environment variable, or load it from a configuration file that is not checked into version control. The deep_finance.sh script already does this for another configuration file, and a similar approach should be used here for consistency.


# Other service configuration
HF_ENDPOINT="https://hf-mirror.com"
ES_HOSTS="http://11.160.132.46:8200"


high

The ES_HOSTS variable is hardcoded to an IP address. This makes the script difficult to use in different environments. Please consider moving this to an environment variable loaded from the .env file, similar to other secrets in this script.



# Create a semaphore, allowing 12 threads to run concurrently
sem = threading.Semaphore(30)


medium

The comment says the semaphore allows 12 threads to run concurrently, which is inconsistent with the code Semaphore(30). Please update the comment or the value so they match.

Suggested change
sem = threading.Semaphore(30)
sem = threading.Semaphore(12)

Comment on lines +182 to +183
content = m.get('content')
if content is None: content = ""


medium

This logic can be simplified by providing a default value to the get method, which improves readability.

Suggested change
content = m.get('content')
if content is None: content = ""
content = m.get('content', "")

export PYTHONPATH="${AJET_ROOT}:${PYTHONPATH}"
export RAY_CLUSTER_MODE="multi_node"
export DEEPFINANCE_PATH="${ENV_SERVICE_ROOT}" # AgentJet may use this path internally
export DEEPFINANCE_SCRIPT="source /mnt/data/taoshuchang.tsc/anaconda3/etc/profile.d/conda.sh && conda activate finworld_1209 && cd ${ENV_SERVICE_ROOT} && DEEPFINANCE_TOOL_RESULT_MAX_CHARS=${DEEPFINANCE_TOOL_RESULT_MAX_CHARS} DEEPFINANCE_MCP_CONFIG=${DEEPFINANCE_MCP_CONFIG} CACHE_TYPE=${CACHE_TYPE} MONGO_URI=${MONGO_URI} MONGO_DB_NAME=${MONGO_DB_NAME} MONGO_COLLECTION_NAME=${MONGO_COLLECTION_NAME} python -m env_service.env_service --env finworld --portal 0.0.0.0 --port 8080"


medium

This line is very long, which makes it difficult to read and maintain. For better readability, you could break it into multiple lines using backslashes within the string.

"""Initialize the RM Gallery Evaluator"""
try:
    # Monkey patch OpenAI client timeout (RM Gallery defaults to only 60s, which is not enough for a 30B model)
    import openai


medium

This file contains several imports inside methods (e.g., import openai here, from rm_gallery... on line 193, and from rm_gallery... on line 623). It is a best practice to place all imports at the top of the file for better readability, dependency management, and to avoid repeated import overhead. Please move these imports to the top of the module.

Comment on lines +187 to +191
_original_openai_init = openai.OpenAI.__init__
def _patched_openai_init(self, *args, **kwargs):
    kwargs.setdefault('timeout', 600.0)  # increase to 600 seconds
    return _original_openai_init(self, *args, **kwargs)
openai.OpenAI.__init__ = _patched_openai_init


medium

Monkey-patching a library's __init__ method can be brittle and have unintended side effects, making the code harder to maintain and debug. It would be more robust to check if the rm_gallery library provides a way to pass a pre-configured openai.OpenAI client or to specify the timeout directly during initialization. If not, consider opening an issue or pull request with the library to add this feature.
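A patch-free alternative, in the spirit of this review comment, is a small factory that applies the timeout default only to clients created through it. The real `openai.OpenAI` constructor does accept a `timeout` keyword; the stand-in class below replaces it purely so the sketch is runnable without the library installed.

```python
# Hedged sketch: instead of patching openai.OpenAI.__init__ globally,
# centralize client creation in a factory that applies the default.

class FakeClient:
    """Stand-in for openai.OpenAI, used only to make the sketch runnable."""
    def __init__(self, timeout=60.0):
        self.timeout = timeout

def make_client(client_cls=FakeClient, **kwargs):
    """Create a client with a long default timeout; callers may override."""
    kwargs.setdefault("timeout", 600.0)  # long judge calls on a 30B model
    return client_cls(**kwargs)

default_client = make_client()              # gets the 600 s default
explicit_client = make_client(timeout=30.0) # explicit values still win
```

Only code that opts into `make_client` is affected, so other users of the library in the same process keep their own timeouts.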

TaoShuchang and others added 7 commits January 20, 2026 20:12
…urning environment state

- Corrected the `env_output` return value structure in `BaseGymEnv` to ensure correct assignment of `reward` and `info` fields.
- Removed `RefJudge` and `StructureJudge` related metric calculations and statistics from `reward_metric_helper`.
- Cleaned up redundant code in `reward_metric_helper`, removing invalid comments and statistical items.
- Modified `save_trajectory_as_json` to always print trajectory saving confirmation information.
- Corrected log comments in `example_deep_finance` to avoid meaningless log output.
- Added the `save_trajectory_as_json_file` configuration item to `deep_finance_template.yaml` to support trajectory saving functionality.
… files

- Added a new ignore rule for config file paths in .gitignore
- Deleted the automatically generated mcp_finance_tool_generated.json file in example_deep_finance
- Refactored the deep_finance.yaml configuration file, adjusting project and experiment names
- Reorganized Judge configuration, clarifying openjudge_llm and rm_llm models
- Optimized model paths and training parameter configurations, adding parallel and batch processing settings
- Adjusted data reading methods and training/validation set path placeholders
- Reduced GPU memory usage ratio for rollout to 0.8
- Updated the default save directory path for the trainer to a placeholder variable
- Cleaned up unused and commented-out code to improve configuration file conciseness
- Corrected the data source field for timeline data used during trajectory saving.
- Removed redundant fields in tool execution time, cache hit rate, and error rate statistics.
- Updated .gitignore to add ignore rules for the example script directory.
- Removed unnecessary debugging information from logs to reduce log noise.
- Adjusted log printing in the multi-round interaction execution process to simplify output content.
- Streamlined log code for environment observation and termination checks to improve code readability.
@binary-husky binary-husky merged commit 0000b8d into main Jan 21, 2026
0 of 2 checks passed