This repository provides tools and scripts for evaluating the LoCoMo dataset using various models and APIs.
## Installation
## Configuration
1. Copy the `.env-example` file to `.env`, and fill in the required environment variables according to your environment and API keys.
2. Copy the `configs-example/` directory to a new directory named `configs/`, and modify the configuration files inside it as needed. This directory contains model and API-specific settings.
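
For reference, a filled-in `.env` might look like the sketch below; the variable names are taken from this repository's earlier example configuration, and all values are placeholders to be replaced with your own keys:

```plaintext
OPENAI_API_KEY="sk-xxx"
OPENAI_BASE_URL="your_base_url"

MEM0_API_KEY="your_mem0_api_key"
MEM0_PROJECT_ID="your_mem0_proj_id"
MEM0_ORGANIZATION_ID="your_mem0_org_id"

MODEL="gpt-4o-mini"                       # or your preferred model
EMBEDDING_MODEL="text-embedding-3-small"  # or your preferred embedding model

ZEP_API_KEY="your_zep_api_key"
```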
## Evaluation Scripts
### LoCoMo Evaluation
To evaluate the **LoCoMo** dataset using one of the supported memory frameworks — `memos`, `mem0`, or `zep` — run the following command:
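
The exact command is not preserved in this excerpt. The sketch below shows a hypothetical invocation; the `--lib` flag and the entry-point path are assumptions, so check the repository's `scripts/` directory for the real interface:

```shell
# Hypothetical invocation; the script path and --lib flag are assumptions.
LIB="mem0"   # one of: memos, mem0, zep
CMD="python scripts/locomo/locomo_eval.py --lib ${LIB}"
echo "${CMD}"
```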
When incorporating the evaluation of reflection duration, ensure to record related data in `{lib}_locomo_judged.json`. For additional NLP metrics like BLEU and ROUGE-L score, make adjustments to the `locomo_grader` function in `scripts/locomo/locomo_eval.py`.
2. **Intermediate Results**

Intermediate results such as `{lib}_locomo_search_results.json`, `{lib}_locomo_responses.json`, and `{lib}_locomo_judged.json` are provided for reproducibility, but contributors are encouraged to report final results in the PR description rather than editing these files directly. Valuable modifications will be merged into an updated version of the evaluation code containing revised intermediate results (at specified intervals).