40 commits
bac05b5
feat(finworld): Added AgentScope learning protocol and OpenJudge eval…
TaoShuchang Jan 16, 2026
ba41164
Merge remote-tracking branch 'origin/main' into dev/shuchang
TaoShuchang Jan 16, 2026
c7ca8c7
Precommit fix (#4)
binary-husky Jan 16, 2026
7f2b017
fix test bench import
binary-husky Jan 16, 2026
9dd3c42
refactor(finworld): Replace agent protocol and unify configuration up…
TaoShuchang Jan 17, 2026
757f8a1
feat(finworld): Added FinWorld training environment configuration scr…
TaoShuchang Jan 18, 2026
079e4bd
refactor(utils): Remove unused extract and compute functions `extract…
TaoShuchang Jan 18, 2026
bcce8f0
refactor(finworld): Replace the old model with OpenJudge, update eval…
TaoShuchang Jan 18, 2026
4662d63
feat(task_reader): Support data reading of type jsonl_with_env_service
TaoShuchang Jan 19, 2026
de81c1d
feat(core): add finworld task reader support to framework
TaoShuchang Jan 19, 2026
248acc4
feat(finworld): implement specialized data reader and openjudge-based…
TaoShuchang Jan 19, 2026
9d651fd
refactor(finworld): optimize configuration templates and prompt engin…
TaoShuchang Jan 19, 2026
7475ecc
chore(finworld): update launch scripts and add variant experiment scr…
TaoShuchang Jan 19, 2026
b95d491
Merge remote-tracking branch 'origin/main' into dev/shuchang
TaoShuchang Jan 19, 2026
f20ab91
feat(finworld): Added support for multi-machine, multi-GPU training s…
TaoShuchang Jan 19, 2026
ea87d4b
chore(git): ignore finworld/yaml/*
TaoShuchang Jan 20, 2026
3082bca
fix(metrics): Fix and enhance the compatibility and debugging output …
TaoShuchang Jan 20, 2026
ef44b63
fix(metrics): Remove debug prints and synchronize reward statistics
TaoShuchang Jan 20, 2026
0889483
chore: "Stop tracking existing yaml files in tutorial directory"
TaoShuchang Jan 20, 2026
db7114c
fix(task_runner): Synchronize reward_stats to log_metrics
TaoShuchang Jan 20, 2026
5a25550
refactor(script): Refactored the finworld training script, integratin…
TaoShuchang Jan 20, 2026
623b7d9
Refactor(deep_finance): Replace and remove finworld-related implement…
TaoShuchang Jan 20, 2026
0aaab86
refactor(deepfinance): Rename and unify DeepFinance module and config…
TaoShuchang Jan 20, 2026
04f4959
refactor(tutorial): Optimize dynamic generation logic for configurati…
TaoShuchang Jan 20, 2026
d0ff68b
fix(deep_finance): argparse: with-deepfinance
TaoShuchang Jan 20, 2026
1c356d7
Merge remote-tracking branch 'origin/main' into dev/shuchang
TaoShuchang Jan 20, 2026
37dcbcc
fix(tutorial): Fixed issues with multi-machine training environment v…
TaoShuchang Jan 20, 2026
529ae7e
fix(env): Corrected the assignment logic for reward and info when ret…
TaoShuchang Jan 20, 2026
f4eb231
chore(config): Update example_deep_finance configuration and clean up…
TaoShuchang Jan 20, 2026
1e07515
Refactor(metric): Optimize tool metric calculation and data saving logic
TaoShuchang Jan 20, 2026
08ba184
fix(metric_helper): fix tool cache metric
TaoShuchang Jan 20, 2026
3d55692
fix little bug
TaoShuchang Jan 21, 2026
a478827
fix(utils): Suppress httpx AsyncClient.aclose() exception warnings
TaoShuchang Jan 21, 2026
88be3e4
comments to english
binary-husky Jan 21, 2026
fb41962
feat: support service name prefixes
TaoShuchang Jan 21, 2026
a1f909b
fix: improve MultiAgent message content parsing logic
TaoShuchang Jan 21, 2026
8d2e5d7
fix: optimize DeepFinance judging logic and configuration
TaoShuchang Jan 21, 2026
3c85960
chore(deps): bump agentscope from 1.0.7 to 1.0.8
TaoShuchang Jan 22, 2026
9b541c5
fix(metric_helper): correct trajectory save path and add tool call me…
TaoShuchang Jan 22, 2026
06fda5f
Merge remote-tracking branch 'origin/main' into dev/shuchang
TaoShuchang Jan 22, 2026
47 changes: 33 additions & 14 deletions ajet/context_tracker/multiagent_tracking.py
@@ -82,27 +82,46 @@ def extract_text_content_from_content_dict(self, msg):
         #         },
         #     ],
         # }
+        # or tool_result format:
+        # msg = {
+        #     "role": "tool",
+        #     "content": [
+        #         {
+        #             "type": "tool_result",
+        #             "id": "call_xxx",
+        #             "output": "tool output content",
+        #             "name": "tool_name"
+        #         },
+        #     ],
+        # }

         str_content = ""
         for item in msg["content"]:
-            # item = {
-            #     "type": "text",
-            #     "text": "some text"
-            # },
-
             assert isinstance(item, dict), f"Unsupported non-dict item in message content: {item}. Full message: {msg}"

-            if ("text" not in item):
+            item_type = item.get("type", "")
+
+            # Handle text content block
+            if "text" in item:
+                if isinstance(item["text"], str):
+                    str_content += item["text"]
+            # Handle tool_result content block (AgentScope format)
+            elif item_type == "tool_result" and "output" in item:
+                output = item["output"]
+                if isinstance(output, str):
+                    str_content += output
+                else:
+                    str_content += str(output)
Comment on lines +104 to +114

medium

The updated logic to handle tool_result content blocks and convert their output to a string, even if it's not initially a string, is a good improvement. This makes the content extraction more robust and prevents potential data loss or type errors when processing diverse tool outputs.

+            # Handle tool_use content block (for completeness)
+            elif item_type == "tool_use":
+                # tool_use blocks are handled via tool_calls field, skip content extraction
+                continue
+            else:
                 logger.warning(
-                    f"Non-text content in message content detected: {item}. Ignoring."
+                    f"Non-text content in message content detected: {item}. Ignoring this item."
                 )
-                should_skip_message = True
-                return str_content, should_skip_message
-
-            if isinstance(item["text"], str):
-                str_content += str(item["text"])
-            else:
-                str_content = ""
+                # Continue processing other items instead of skipping the entire message
+                continue
Comment on lines 120 to +124

medium

Changing the error handling from immediately returning to logging a warning and continuing to process other items in the message content is a significant improvement. This makes the system more resilient to malformed or unexpected content blocks, ensuring that valid parts of a message are still processed instead of skipping the entire message.


         should_skip_message = False
         return str_content, should_skip_message
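To make the new control flow in this hunk easier to follow, here is a minimal standalone sketch of the same per-block dispatch. The function name `extract_text` and the module-level `logger` are illustrative assumptions, not names from the repository, and the sketch returns only the text rather than the `(str_content, should_skip_message)` tuple used in the diff.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def extract_text(msg: dict[str, Any]) -> str:
    """Concatenate the textual parts of a message's content blocks (hypothetical helper)."""
    parts: list[str] = []
    for item in msg["content"]:
        if not isinstance(item, dict):
            raise TypeError(f"Unsupported non-dict item in message content: {item!r}")
        item_type = item.get("type", "")
        if "text" in item:
            # Text block: keep string text, drop anything else (mirrors the diff).
            if isinstance(item["text"], str):
                parts.append(item["text"])
        elif item_type == "tool_result" and "output" in item:
            # Tool output may be any object, so coerce it to a string.
            parts.append(str(item["output"]))
        elif item_type == "tool_use":
            # Tool calls travel in a separate field; nothing to extract here.
            continue
        else:
            # Unknown block: warn and keep going instead of dropping the message.
            logger.warning("Ignoring non-text content block: %r", item)
    return "".join(parts)
```

With this shape, a message that mixes text, a `tool_result`, and an unsupported block still yields the text and tool output, which is the behavioral change the two review comments above describe.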
3 changes: 2 additions & 1 deletion ajet/launcher.py
@@ -99,6 +99,7 @@ def parse_args():
         default=False,
         help="Kill system processes (ray + vllm + python) that may block the current experiment",
     )
+    parser.add_argument("--prefix", type=str, default="", required=False, help="Prefix for service names")

medium

Adding the --prefix CLI argument is a good way to support unique identifiers for service instances, which is particularly useful in multi-experiment or distributed environments. This enhances the flexibility and scalability of the launcher.

     return parser.parse_args()


@@ -304,7 +305,7 @@ def main():
         pty_launch("appworld")

     if args.with_deepfinance:
-        pty_launch("deepfinance")
+        pty_launch("deepfinance", prefix=args.prefix)

medium

Passing the prefix argument to pty_launch for the deepfinance service correctly integrates the new prefixing functionality. This ensures that DeepFinance instances can be uniquely identified when launched.


     if args.with_crafters:
         pty_launch("crafters")
2 changes: 1 addition & 1 deletion ajet/utils/metric_helper/save_trajectory_as_json.py
@@ -40,7 +40,7 @@ def save_trajectory_as_json(ctx_trackers, global_steps, prefix="train"):
     # Define save directory and file path
     traj_save_dir = os.path.join(
         os.environ.get("BEST_LOGGER_PATH", "launcher_record"),
-        "ctx_trackers",
+        "trajectory",

medium

Changing the trajectory save path from ctx_trackers to trajectory improves the organization of saved data. This makes the directory structure more intuitive and easier to navigate.

         prefix,
         f"step_{global_steps}"
     )
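As a quick illustration of where trajectories land after this change, the sketch below rebuilds the directory path from the same pieces shown in the hunk. The helper name `trajectory_dir` is an assumption made for this example; the repository builds the path inline inside `save_trajectory_as_json`.

```python
import os


def trajectory_dir(global_steps: int, prefix: str = "train") -> str:
    """Return the save directory for one step (hypothetical helper)."""
    # Falls back to "launcher_record" when BEST_LOGGER_PATH is unset, as in the diff.
    base = os.environ.get("BEST_LOGGER_PATH", "launcher_record")
    return os.path.join(base, "trajectory", prefix, f"step_{global_steps}")
```

Anything that still reads from the old `ctx_trackers` subdirectory would need to be pointed at `trajectory` instead.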
1 change: 1 addition & 0 deletions ajet/utils/metric_helper/tool_metric_helper.py
@@ -125,6 +125,7 @@ def compute_tool_metrics(tool_stats_list: List[Dict[str, Any]], prefix: str = ""
         if calls > 0:
             error_rate = errors / calls * 100
             metrics[f"{prefix}tool_error/{tool_name}/error_rate"] = round(error_rate, 2)
+            metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls

medium

Adding metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls provides a more comprehensive view of tool usage. Knowing the total number of calls alongside the error rate offers better insights into tool performance and potential issues.



     return metrics
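The following sketch shows how a per-tool error rate and call count could be aggregated and emitted under the same metric keys. The input shape (a list of dicts mapping tool name to `calls` and `errors`) and the function name `per_tool_error_metrics` are assumptions for illustration; the real `compute_tool_metrics` input is not visible in this hunk. It also assumes the new `calls` metric is written only when `calls > 0`, which the flattened diff does not confirm.

```python
from collections import defaultdict
from typing import Any


def per_tool_error_metrics(
    stats_list: list[dict[str, dict[str, int]]], prefix: str = ""
) -> dict[str, Any]:
    """Aggregate calls/errors per tool and emit rate plus count (hypothetical helper)."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"calls": 0, "errors": 0})
    for stats in stats_list:
        for tool_name, s in stats.items():
            totals[tool_name]["calls"] += s.get("calls", 0)
            totals[tool_name]["errors"] += s.get("errors", 0)

    metrics: dict[str, Any] = {}
    for tool_name, t in totals.items():
        calls, errors = t["calls"], t["errors"]
        if calls > 0:
            # Error rate as a percentage, rounded to two decimals as in the diff.
            metrics[f"{prefix}tool_error/{tool_name}/error_rate"] = round(errors / calls * 100, 2)
            # The call count gives the denominator needed to interpret the rate.
            metrics[f"{prefix}tool_error/{tool_name}/calls"] = calls
    return metrics
```

Reporting the count next to the rate helps separate a tool that failed once in two calls from one that failed fifty times in a hundred.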
4 changes: 3 additions & 1 deletion ajet/utils/pty.py
@@ -96,13 +96,15 @@ def pty_wrapper_final(human_cmd, dir, env_dict):
     pty_wrapper(["/bin/bash", "-c", human_cmd], dir, env_dict)


-def pty_launch(service_name: str, success_std_string="Starting server on"):
+def pty_launch(service_name: str, success_std_string="Starting server on", prefix: str=""):
     from ajet.utils.smart_daemon import LaunchCommandWhenAbsent

     service_path = os.environ.get(f"{service_name.upper()}_PATH")
     service_script = os.environ.get(f"{service_name.upper()}_SCRIPT")
     if service_path is None or service_script is None:
         raise ValueError(f"Environment variables for {service_name} not properly set.")
+    if prefix != "":
+        service_name = prefix + "_" + service_name
Comment on lines +106 to +107

medium

The logic to prepend the prefix to the service_name when a prefix is provided is correctly implemented. This ensures that the service name used for launching is unique, supporting the multi-instance use case.

     companion = LaunchCommandWhenAbsent(
         full_argument_list=[service_script],
         dir=service_path,
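A small sketch of how the `--prefix` flag flows into the service name may help here. `prefixed_service_name` is a hypothetical helper written for this note; in the diff the same logic sits inline in `pty_launch`, and the `parse_args` below contains only the one argument relevant to this change.

```python
import argparse


def prefixed_service_name(service_name: str, prefix: str = "") -> str:
    """Prepend the prefix with an underscore, or return the name unchanged."""
    return f"{prefix}_{service_name}" if prefix != "" else service_name


def parse_args(argv=None):
    # Only the new flag is reproduced; the real launcher defines many more.
    parser = argparse.ArgumentParser()
    parser.add_argument("--prefix", type=str, default="", required=False, help="Prefix for service names")
    return parser.parse_args(argv)
```

Note that in the diff the `*_PATH` and `*_SCRIPT` environment variables are looked up before the prefix is applied, so they stay keyed on the unprefixed service name (for example `DEEPFINANCE_PATH`).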
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -17,7 +17,7 @@ classifiers = [
 ]
 requires-python = ">=3.10,<3.13"
 dependencies = [
-    "agentscope==1.0.7",
+    "agentscope==1.0.8",
     "chromadb",
     "httpx",
     "tenacity",
3 changes: 2 additions & 1 deletion tutorial/example_deep_finance/deep_finance.sh
@@ -3,7 +3,7 @@ set -e
 #===============================================================================
 # 1. Configuration area - users only need to edit this section
 #===============================================================================
-SUFFIX="ajet_deep_finance" # Experiment suffix; affects all log and experiment names
+SUFFIX="deep_finance" # Experiment suffix; affects all log and experiment names
 PREFIX="open" # Experiment prefix; affects the folder that holds logs and experiments

 # OpenJudge model configuration
@@ -208,6 +208,7 @@ if [[ $HOSTNAME == *"-master-"* ]]; then
         --with-deepfinance \
         --conf ${CONFIG_FILE} \
         --backbone="verl" \
+        --prefix=${SUFFIX} \
         2>&1 | tee ${TRAIN_LOG}


6 changes: 5 additions & 1 deletion tutorial/example_deep_finance/deep_finance_judge.py
@@ -373,8 +373,12 @@ def compute_reward(self, workflow_task: WorkflowTask, workflow_output: WorkflowO
         fused_reward, contributions = self._fuse_grader_scores(grader_scores, rm_raw)

         # 6. Compute the penalty term (keeps the original tool_calls penalty logic)
-        tool_calls = metadata.get("tool_stats", {}).get("total_calls", 0)
+        # Extract tool_stats from log_metrics (deep_finance.py stores it in log_metrics rather than metadata)
+        tool_stats = workflow_output.log_metrics.get("tool_stats", {})
+        tool_calls = tool_stats.get("total_calls", 0)
Comment on lines +377 to +378

high

The fix to extract tool_stats from workflow_output.log_metrics instead of metadata is crucial for the correctness of the DeepFinance judge. This ensures that the penalty calculation is based on accurate tool usage statistics.

         penalty = self._compute_penalty(tool_calls)
+        if penalty < 0:
+            print(f"⚠️ Penalty applied: penalty={penalty}, tool_calls={tool_stats}")
Comment on lines +380 to +381

medium

Adding a print statement to log when a penalty is applied, along with the penalty value and tool call statistics, is excellent for debugging and monitoring. This provides immediate feedback on the reward calculation process.


         # 7. Summary
         final_reward = fused_reward + step_reward + penalty
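To show why the lookup location matters, here is a hedged sketch of the failure mode this hunk fixes. `FakeWorkflowOutput` is a stand-in for the real `WorkflowOutput`, and `compute_penalty` uses an invented rule (penalize rollouts with no tool calls); the actual `_compute_penalty` body is not shown in the diff and may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeWorkflowOutput:
    """Hypothetical stand-in for WorkflowOutput with only the fields used here."""
    metadata: dict[str, Any] = field(default_factory=dict)
    log_metrics: dict[str, Any] = field(default_factory=dict)


def compute_penalty(tool_calls: int, min_calls: int = 1) -> float:
    """Invented rule for illustration: penalize fewer than min_calls tool calls."""
    return -1.0 if tool_calls < min_calls else 0.0


def tool_calls_from(output: FakeWorkflowOutput) -> int:
    """Read the total call count from log_metrics, as the fixed code does."""
    return output.log_metrics.get("tool_stats", {}).get("total_calls", 0)


out = FakeWorkflowOutput(log_metrics={"tool_stats": {"total_calls": 5}})
# Old lookup: metadata has no tool_stats, so the count silently defaults to 0.
old_count = out.metadata.get("tool_stats", {}).get("total_calls", 0)
# New lookup: the count is read from where the workflow actually stores it.
new_count = tool_calls_from(out)
```

Under the invented rule, `old_count` would trigger a penalty on a rollout that made five tool calls, while `new_count` would not. Because both lookups default to `0`, a misplaced key fails silently, which is presumably why the new `print` on penalty is useful during debugging.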
@@ -32,7 +32,7 @@ ajet:
   rollout:
     # ✨✨✨✨ Write and select the Agent
     user_workflow: tutorial.example_deep_finance.deep_finance->ExampleDeepResearchProtocol
-    force_disable_toolcalls: True
+    force_disable_toolcalls: False

medium

Changing force_disable_toolcalls to False enables tool calls within the DeepFinance workflow. This is a necessary configuration update to leverage the enhanced tool_stats extraction and penalty logic in the judge.

     enable_oversample: False
     tensor_model_parallel_size: 8
     num_repeat: {{NUM_REPEAT}}