Commit 3378fc0

feat(eval): add eval configs example (#19)
* feat(eval): add eval dependencies
* feat(eval): add configs example
* docs(eval): update README.md
* feat(eval): remove the dependency (pydantic)
1 parent da639db commit 3378fc0

10 files changed: +822 -186 lines changed

docs/modules/mem_reader.md

Lines changed: 12 additions & 12 deletions
@@ -147,20 +147,20 @@ Documents are chunked and summarized to create searchable knowledge items.

 We use [`markitdown`](https://github.com/microsoft/markitdown) to convert files to Markdown format texts.

-**MarkItDown currently supports the conversion from:**
+**MarkItDown currently supports the conversion from:**

 ```
-PDF
-PowerPoint
-Word
-Excel
-Images (EXIF metadata and OCR)
-Audio (EXIF metadata and speech transcription)
-HTML
-Text-based formats (CSV, JSON, XML)
-ZIP files (iterates over contents)
-YouTube URLs
-EPUBs
+PDF
+PowerPoint
+Word
+Excel
+Images (EXIF metadata and OCR)
+Audio (EXIF metadata and speech transcription)
+HTML
+Text-based formats (CSV, JSON, XML)
+ZIP files (iterates over contents)
+YouTube URLs
+EPUBs
 ... and more!
 ```
 *(Content sourced from [MarkItDown GitHub repository](https://github.com/microsoft/markitdown))*

evaluation/.env-example

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+MODEL="gpt-4o-mini"
+OPENAI_API_KEY="sk-***REDACTED***"
+OPENAI_BASE_URL="http://***.***.***.***:3000/v1"
+
+MEM0_API_KEY="m0-***REDACTED***"
+
+ZEP_API_KEY="z_***REDACTED***"
+
+CHAT_MODEL="gpt-4o-mini"
+CHAT_MODEL_BASE_URL="http://***.***.***.***:3000/v1"
+CHAT_MODEL_API_KEY="sk-***REDACTED***"
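
A minimal sketch of how a copied `.env` could be sanity-checked before running the scripts. The helper names `parse_env_text` and `missing_keys` are hypothetical, and treating every key in `.env-example` as required is an assumption; a given backend may only need a subset.

```python
# Hypothetical helpers; not part of the commit.
REQUIRED_KEYS = (
    "MODEL",
    "OPENAI_API_KEY",
    "OPENAI_BASE_URL",
    "MEM0_API_KEY",
    "ZEP_API_KEY",
    "CHAT_MODEL",
    "CHAT_MODEL_BASE_URL",
    "CHAT_MODEL_API_KEY",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, as used in .env-example.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(parse_env_text(open(".env").read()))` before a long evaluation run would surface an unset key early instead of midway through ingestion.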

evaluation/README.md

Lines changed: 9 additions & 58 deletions
@@ -1,6 +1,6 @@
 # Evaluation Memory Framework

-This repository provides tools and scripts for evaluating the LoCoMo and LongMemEval dataset using various models and APIs.
+This repository provides tools and scripts for evaluating the LoCoMo dataset using various models and APIs.

 ## Installation

@@ -17,67 +17,18 @@ This repository provides tools and scripts for evaluating the LoCoMo and LongMem

 ## Configuration

-Create an `.env` file in the `evaluation/` directory and include the following environment variables:
+1. Copy the `.env-example` file to `.env`, and fill in the required environment variables according to your environment and API keys.

-```plaintext
-OPENAI_API_KEY="sk-xxx"
-OPENAI_BASE_URL="your_base_url"
+2. Copy the `configs-example/` directory to a new directory named `configs/`, and modify the configuration files inside it as needed. This directory contains model and API-specific settings.

-MEM0_API_KEY="your_mem0_api_key"
-MEM0_PROJECT_ID="your_mem0_proj_id"
-MEM0_ORGANIZATION_ID="your_mem0_org_id"

-MODEL="gpt-4o-mini" # or your preferred model
-EMBEDDING_MODEL="text-embedding-3-small" # or your preferred embedding model
-ZEP_API_KEY="your_zep_api_key"
-```
+## Evaluation Scripts

-## Dataset
-The smaller dataset "LoCoMo" has already been included in the repo to facilitate reproducing.
+### LoCoMo Evaluation
+To evaluate the **LoCoMo** dataset using one of the supported memory frameworks — `memos`, `mem0`, or `zep` — run the following command:

-To download the "LongMemEval" dataset, run the following command:
 ```bash
-huggingface-cli download --repo-type dataset --resume-download xiaowu0162/longmemeval --local-dir data/longmemeval
+# Edit the configuration in ./scripts/run_locomo_eval.sh
+# Specify the model and memory backend you want to use (e.g., mem0, zep, etc.)
+./scripts/run_locomo_eval.sh
 ```
-
-After downloading, rename the files as follows:
-- `longmemeval_m.json`
-- `longmemeval_s.json`
-- `longmemeval_oracle.json`
-
-## Evaluation Scripts
-
-To evaluate the `locomo` dataset, execute the following scripts in order:
-
-1. **Ingest locomo history into MemOS:**
-```bash
-python scripts/locomo/locomo_ingestion.py --lib memos
-```
-
-2. **Search Memory for each QA pair in locomo:**
-```bash
-python scripts/locomo/locomo_search.py --lib memos
-```
-
-3. **Generate responses from OpenAI with provided context:**
-```bash
-python scripts/locomo/locomo_responses.py --lib memos
-```
-
-4. **Evaluate the generated answers:**
-```bash
-python scripts/locomo/locomo_eval.py --lib memos
-```
-
-5. **Calculate fine-grained scores for each category:**
-```bash
-python scripts/locomo/locomo_metric.py --lib memos
-```
-
-## Contributing Guidelines
-
-1. **Add New Metrics**
-When incorporating the evaluation of reflection duration, ensure to record related data in `{lib}_locomo_judged.json`. For additional NLP metrics like BLEU and ROUGE-L score, make adjustments to the `locomo_grader` function in `scripts/locomo/locomo_eval.py`.
-
-2. **Intermediate Results**
-While I have provided intermediate results like `{lib}_locomo_search_results.json`, `{lib}_locomo_responses.json`, and `{lib}_locomo_judged.json` for reproducibility, contributors are encouraged to report final results in the PR description rather than editing these files directly. Any valuable modifications will be combined into an updated version of the evaluation code containing revised intermediate results (at specified intervals).
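
The two configuration steps this commit adds to the README (copy `.env-example` to `.env`, copy `configs-example/` to `configs/`) could be scripted roughly as follows. This is a sketch only: `bootstrap_eval_config` is a hypothetical helper, and skipping targets that already exist is an assumption chosen to avoid overwriting filled-in secrets.

```python
import shutil
from pathlib import Path


def bootstrap_eval_config(root: Path) -> list[str]:
    """Copy the example env file and config directory if the targets are absent.

    Returns the names of the targets that were created.
    """
    created: list[str] = []

    env_src, env_dst = root / ".env-example", root / ".env"
    if env_src.is_file() and not env_dst.exists():
        shutil.copyfile(env_src, env_dst)
        created.append(".env")

    cfg_src, cfg_dst = root / "configs-example", root / "configs"
    if cfg_src.is_dir() and not cfg_dst.exists():
        shutil.copytree(cfg_src, cfg_dst)
        created.append("configs")

    return created
```

Running `bootstrap_eval_config(Path("evaluation"))` a second time would return an empty list, since both targets already exist by then.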

evaluation/scripts/locomo/locomo_eval.py

Lines changed: 2 additions & 2 deletions
@@ -363,8 +363,8 @@ async def limited_task(task):
     parser.add_argument(
         "--lib",
         type=str,
-        choices=["zep", "memos", "mem0", "mem0_graph", "memos_mos", "langmem", "openai"],
-        help="Specify the memory framework (zep or memos or mem0 or mem0_graph or memos_mos)",
+        choices=["zep", "memos", "mem0", "mem0_graph", "langmem", "openai"],
+        help="Specify the memory framework (zep or memos or mem0 or mem0_graph)",
     )
     parser.add_argument(
         "--version",
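
The effect of dropping `memos_mos` from `choices` is that argparse now rejects it at parse time. A minimal sketch, assuming a hypothetical `build_parser` wrapper and a `--version` default of `"default"` (the diff shows only the start of that argument, so the default and help text here are assumptions):

```python
import argparse

# Choice list as it appears in locomo_eval.py after this commit.
LIB_CHOICES = ["zep", "memos", "mem0", "mem0_graph", "langmem", "openai"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the --lib restriction shown in the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--lib",
        type=str,
        choices=LIB_CHOICES,
        help="Specify the memory framework",
    )
    # Assumed default; only the option name is visible in the diff.
    parser.add_argument("--version", type=str, default="default")
    return parser
```

With this parser, `--lib memos` is accepted, while `--lib memos_mos` makes argparse print a usage error and exit with status 2.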

evaluation/scripts/locomo/locomo_ingestion.py

Lines changed: 5 additions & 48 deletions
@@ -15,10 +15,8 @@

 from memos.configs.mem_cube import GeneralMemCubeConfig
 from memos.configs.mem_os import MOSConfig
-from memos.configs.memory import MemoryConfigFactory
 from memos.mem_cube.general import GeneralMemCube
 from memos.mem_os.main import MOS
-from memos.memories.factory import MemoryFactory


 custom_instructions = """
@@ -61,29 +59,10 @@ def get_client(frame: str, user_id: str | None = None, version: str = "default")
         return mem0

     elif frame == "memos":
-        config_path = "configs/text_memos_config.json"
-        with open(config_path) as f:
-            config_data = json.load(f)
-        config_data["config"]["extractor_llm"]["config"]["model_name_or_path"] = os.getenv("MODEL")
-        config_data["config"]["extractor_llm"]["config"]["api_key"] = os.getenv("OPENAI_API_KEY")
-        config_data["config"]["extractor_llm"]["config"]["api_base"] = os.getenv("OPENAI_BASE_URL")
-        config_data["config"]["vector_db"]["config"]["path"] = (
-            f"results/locomo/memos-{version}/storages/{user_id}/qdrant"
-        )
-        config_data["config"]["embedder"]["config"]["model_name_or_path"] = os.getenv(
-            "EMBEDDING_MODEL"
-        )
-
-        config = MemoryConfigFactory.model_validate(config_data)
-
-        m = MemoryFactory.from_config(config)
-        m.load(f"results/locomo/memos-{version}/storages/{user_id}")
-        return m
-
-    elif frame == "memos_mos":
         mos_config_path = "configs/mos_memos_config.json"
         with open(mos_config_path) as f:
             mos_config_data = json.load(f)
+        mos_config_data["top_k"] = 20
         mos_config = MOSConfig(**mos_config_data)
         mos = MOS(mos_config)
         mos.create_user(user_id=user_id)
@@ -147,20 +126,6 @@ def ingest_session(client, session, frame, metadata, revised_client=None):
             )

     elif frame == "memos":
-        for chat in tqdm(session, desc=f"{metadata['session_key']}"):
-            data = chat.get("speaker") + ": " + chat.get("text")
-            print({"context": data, "conv_id": conv_id, "created_at": iso_date})
-            msg = [{"role": "user", "content": data}]
-
-            try:
-                memories = client.extract(msg)
-            except Exception as ex:
-                print(f"Error extracting message {msg}: {ex}")
-                memories = []
-            print(memories)
-            client.add(memories)
-
-    elif frame == "memos_mos":
         messages = []
         messages_reverse = []

@@ -276,14 +241,11 @@ def process_user(conv_idx, frame, locomo_df, version, num_workers=1):
         client.delete_all(user_id=f"{conversation.get('speaker_a')}_{conv_idx}")
         client.delete_all(user_id=f"{conversation.get('speaker_b')}_{conv_idx}")
     elif frame == "memos":
-        conv_id = "locomo_exp_user_" + str(conv_idx)
-        client = get_client("memos", conv_id, version)
-    elif frame == "memos_mos":
         conv_id = "locomo_exp_user_" + str(conv_idx)
         speaker_a_user_id = conv_id + "_speaker_a"
         speaker_b_user_id = conv_id + "_speaker_b"
-        client = get_client("memos_mos", speaker_a_user_id, version)
-        revised_client = get_client("memos_mos", speaker_b_user_id, version)
+        client = get_client("memos", speaker_a_user_id, version)
+        revised_client = get_client("memos", speaker_b_user_id, version)

     sessions_to_process = []
     for session_idx in range(max_session_count):
@@ -324,11 +286,6 @@ def process_user(conv_idx, frame, locomo_df, version, num_workers=1):
             except Exception as e:
                 print(f"Error processing user {conv_idx}, session {session_key}: {e!s}")

-    if frame == "memos":
-        conv_id = "locomo_exp_user_" + str(conv_idx)
-        client.dump(f"results/locomo/memos-{version}/storages/{conv_id}")
-        del client
-
     end_time = time.time()
     elapsed_time = round(end_time - start_time, 2)
     print(f"User {conv_idx} processed successfully in {elapsed_time} seconds")
@@ -383,8 +340,8 @@ def main(frame, version="default", num_workers=4):
     parser.add_argument(
         "--lib",
         type=str,
-        choices=["zep", "memos", "mem0", "mem0_graph", "memos_mos"],
-        help="Specify the memory framework (zep or memos or mem0 or mem0_graph or memos_mos)",
+        choices=["zep", "memos", "mem0", "mem0_graph"],
+        help="Specify the memory framework (zep or memos or mem0 or mem0_graph)",
     )
     parser.add_argument(
         "--version",
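
After this change the `memos` path in `process_user` always creates one client per speaker, keyed by IDs derived from the conversation index. The derivation can be isolated as below; `speaker_user_ids` is a hypothetical helper, while the string formats are copied from the diff.

```python
def speaker_user_ids(conv_idx: int) -> tuple[str, str]:
    """Derive the per-speaker user IDs used for the two MemOS clients."""
    conv_id = "locomo_exp_user_" + str(conv_idx)
    return conv_id + "_speaker_a", conv_id + "_speaker_b"
```

Because `locomo_search.py` builds the same two IDs from `conv_id`, a shared helper like this would keep ingestion and search pointed at the same users.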

evaluation/scripts/locomo/locomo_metric.py

Lines changed: 2 additions & 2 deletions
@@ -9,8 +9,8 @@
     parser.add_argument(
         "--lib",
         type=str,
-        choices=["zep", "memos", "mem0", "mem0_graph", "memos_mos", "langmem", "openai"],
-        help="Specify the memory framework (zep or memos or mem0 or mem0_graph or memos_mos)",
+        choices=["zep", "memos", "mem0", "mem0_graph", "langmem", "openai"],
+        help="Specify the memory framework (zep or memos or mem0 or mem0_graph)",
     )
     parser.add_argument(
         "--version",

evaluation/scripts/locomo/locomo_responses.py

Lines changed: 3 additions & 3 deletions
@@ -24,7 +24,7 @@ async def locomo_response(frame, llm_client, context: str, question: str) -> str
             context=context,
             question=question,
         )
-    elif frame == "memos" or frame == "memos_mos":
+    elif frame == "memos":
         prompt = ANSWER_PROMPT_MEMOS.format(
             context=context,
             question=question,
@@ -124,8 +124,8 @@ async def main(frame, version="default"):
     parser.add_argument(
         "--lib",
         type=str,
-        choices=["zep", "memos", "mem0", "mem0_graph", "memos_mos", "openai"],
-        help="Specify the memory framework (zep or memos or mem0 or mem0_graph or memos_mos)",
+        choices=["zep", "memos", "mem0", "mem0_graph", "openai"],
+        help="Specify the memory framework (zep or memos or mem0 or mem0_graph)",
     )
     parser.add_argument(
         "--version",

evaluation/scripts/locomo/locomo_search.py

Lines changed: 10 additions & 43 deletions
@@ -15,12 +15,10 @@
 from zep_cloud.client import Zep

 from memos.configs.mem_os import MOSConfig
-from memos.configs.memory import MemoryConfigFactory
 from memos.mem_os.main import MOS
-from memos.memories.factory import MemoryFactory


-def get_client(frame: str, user_id: str | None = None, version: str = "default"):
+def get_client(frame: str, user_id: str | None = None, version: str = "default", top_k: int = 20):
     if frame == "zep":
         zep = Zep(api_key=os.getenv("ZEP_API_KEY"), base_url="https://api.getzep.com/api/v2")
         return zep
@@ -30,29 +28,10 @@ def get_client(frame: str, user_id: str | None = None, version: str = "default")
         return mem0

     elif frame == "memos":
-        config_path = "configs/text_memos_config.json"
-        with open(config_path) as f:
-            config_data = json.load(f)
-        config_data["config"]["extractor_llm"]["config"]["model_name_or_path"] = os.getenv("MODEL")
-        config_data["config"]["extractor_llm"]["config"]["api_key"] = os.getenv("OPENAI_API_KEY")
-        config_data["config"]["extractor_llm"]["config"]["api_base"] = os.getenv("OPENAI_BASE_URL")
-        config_data["config"]["vector_db"]["config"]["path"] = (
-            f"results/locomo/memos-{version}/storages/{user_id}/qdrant"
-        )
-        config_data["config"]["embedder"]["config"]["model_name_or_path"] = os.getenv(
-            "EMBEDDING_MODEL"
-        )
-
-        config = MemoryConfigFactory.model_validate(config_data)
-
-        m = MemoryFactory.from_config(config)
-        m.load(f"results/locomo/memos-{version}/storages/{user_id}")
-        return m
-
-    elif frame == "memos_mos":
         mos_config_path = "configs/mos_memos_config.json"
         with open(mos_config_path) as f:
             mos_config_data = json.load(f)
+        mos_config_data["top_k"] = top_k
         mos_config = MOSConfig(**mos_config_data)
         mos = MOS(mos_config)
         mos.create_user(user_id=user_id)
@@ -123,18 +102,6 @@ def get_client(frame: str, user_id: str | None = None, version: str = "default")
 """


-def memos_search(client, query):
-    start = time()
-    search_results = client.search(query, top_k=20)
-    context = ""
-    for item in search_results:
-        item = item.to_dict()
-        context += f"{item['memory']}\n"
-    print(query, context)
-    duration_ms = (time() - start) * 1000
-    return context, duration_ms
-
-
 def mem0_search(client, query, speaker_a_user_id, speaker_b_user_id, top_k=20):
     start = time()
     search_speaker_a_results = client.search(
@@ -192,7 +159,7 @@ def mem0_search(client, query, speaker_a_user_id, speaker_b_user_id, top_k=20):
     return context, duration_ms


-def memos_mos_search(client, query, conv_id, speaker_a, speaker_b, reversed_client=None):
+def memos_search(client, query, conv_id, speaker_a, speaker_b, reversed_client=None):
     start = time()
     search_a_results = client.search(
         query=query,
@@ -349,8 +316,8 @@ def search_query(client, query, metadata, frame, reversed_client=None, top_k=20)
         context, duration_ms = mem0_graph_search(
             client, query, speaker_a_user_id, speaker_b_user_id, top_k
         )
-    elif frame == "memos_mos":
-        context, duration_ms = memos_mos_search(
+    elif frame == "memos":
+        context, duration_ms = memos_search(
             client, query, conv_id, speaker_a, speaker_b, reversed_client
         )
     return context, duration_ms
@@ -394,11 +361,11 @@ def process_user(group_idx, locomo_df, frame, version, top_k=20, num_workers=1):
     }

     reversed_client = None
-    if frame == "memos_mos":
+    if frame == "memos":
         speaker_a_user_id = conv_id + "_speaker_a"
         speaker_b_user_id = conv_id + "_speaker_b"
-        client = get_client(frame, speaker_a_user_id, version)
-        reversed_client = get_client(frame, speaker_b_user_id, version)
+        client = get_client(frame, speaker_a_user_id, version, top_k=top_k)
+        reversed_client = get_client(frame, speaker_b_user_id, version, top_k=top_k)
     else:
         client = get_client(frame, conv_id, version)

@@ -474,8 +441,8 @@ def main(frame, version="default", num_workers=1, top_k=20):
     parser.add_argument(
         "--lib",
         type=str,
-        choices=["zep", "memos", "mem0", "mem0_graph", "memos_mos", "langmem"],
-        help="Specify the memory framework (zep or memos or mem0 or mem0_graph or memos_mos)",
+        choices=["zep", "memos", "mem0", "mem0_graph", "langmem"],
+        help="Specify the memory framework (zep or memos or mem0 or mem0_graph)",
     )
     parser.add_argument(
         "--version",
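
In `locomo_search.py`, the caller's `top_k` now overrides whatever the JSON config file contains before the dict is passed to `MOSConfig(**mos_config_data)`. A sketch of just that override step, using only the standard library; `load_mos_config_data` is a hypothetical helper, and it stops short of constructing `MOSConfig` so that it does not depend on `memos` being installed.

```python
import json


def load_mos_config_data(path: str, top_k: int = 20) -> dict:
    """Load a MOS config JSON file and force its top_k to the caller's value."""
    with open(path) as f:
        data = json.load(f)
    # The file's own top_k (if any) is replaced, mirroring the diff.
    data["top_k"] = top_k
    return data
```

The ingestion script takes the simpler route of hard-coding `top_k` to 20, so only the search script exposes the value to its caller.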
