Memory Benchmarks
OpenLoomi's memory system is evaluated against academic and industry benchmarks that measure memory recall and reasoning over long conversations and extended context.
LoCoMo (Long-term Conversation Memory)
Source: Stony Brook University (academic benchmark dataset)
LoCoMo contains real conversation records with corresponding observations, summaries, and QA pairs, specifically designed to evaluate memory system performance across different retrieval modes.
Question Categories
| Category | Description |
|---|---|
| single_hop | Fact recall from a single memory retrieval |
| temporal | Date/time reasoning questions |
| multi_hop | Cross-session multi-step reasoning |
| open_domain | Open-domain fusion Q&A |
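
To make the category breakdown concrete, the sketch below tallies accuracy per question category. The record layout (`category`, `correct`), the function name `accuracy_by_category`, and the sample data are illustrative assumptions; they are not the LoCoMo data format or OpenLoomi's evaluation harness.

```python
from collections import defaultdict

# Category labels taken from the table above.
CATEGORIES = ("single_hop", "temporal", "multi_hop", "open_domain")


def accuracy_by_category(results):
    """Return {category: fraction of correct answers} for graded QA results.

    Each result is assumed to be a dict with a "category" string and a
    boolean "correct" flag; this layout is an illustration only.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        if r["category"] not in CATEGORIES:
            raise ValueError(f"unknown category: {r['category']!r}")
        totals[r["category"]] += 1
        hits[r["category"]] += int(r["correct"])
    # Only categories that actually appear in the results are reported.
    return {c: hits[c] / totals[c] for c in totals}


# Made-up graded results, for demonstration only.
sample = [
    {"category": "single_hop", "correct": True},
    {"category": "single_hop", "correct": False},
    {"category": "temporal", "correct": True},
    {"category": "multi_hop", "correct": False},
]
scores = accuracy_by_category(sample)
```

Reporting one score per category, rather than a single aggregate, keeps a weakness in one area (for example temporal reasoning) from being hidden by strong single-hop recall.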
On LoCoMo, OpenLoomi performs on par with leading open-source memory projects such as agentmemory and mempalace.
GitHub Repository
LongMemEval-S
Scale: 500 QA Pairs, 10+ Question Types, 100+ Sessions
Extracted from real multi-turn conversations, LongMemEval contains questions covering short-term memory, cross-session reasoning, temporal reasoning, and more. It evaluates an agent's memory recall in complex scenarios.
Question Types
- single-session-assistant: Assistant interactions within a single session
- single-session-user: User interactions within a single session
- multi-session: Cross-session multi-step reasoning
- temporal-reasoning: Time-sensitive query reasoning
- knowledge-update: Knowledge iteration and fact changes
- single-session-preference: User preference queries
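
Of these types, `knowledge-update` is the least self-explanatory: a fact stated in an early session is revised in a later one, and the expected answer typically reflects the most recent value. Below is a minimal sketch of that expectation, assuming a hypothetical list of `(session_index, key, value)` facts rather than LongMemEval's actual data format; `latest_facts` is an illustrative name, not part of any library.

```python
def latest_facts(facts):
    """Keep, for each key, the value from the highest session index.

    `facts` is an iterable of (session_index, key, value) tuples; the
    tuple layout is a hypothetical simplification for illustration.
    """
    latest = {}
    for session_index, key, value in facts:
        seen = latest.get(key)
        # Replace the stored value only if this fact is at least as recent.
        if seen is None or session_index >= seen[0]:
            latest[key] = (session_index, value)
    return {key: value for key, (_, value) in latest.items()}


# Made-up example: the user's city changes between sessions 1 and 7.
facts = [
    (1, "home_city", "Boston"),
    (3, "favorite_tea", "oolong"),
    (7, "home_city", "Denver"),
]
current = latest_facts(facts)
```

A memory system that answers a question about `home_city` with the session 1 value would fail a knowledge-update question of this kind, even though that value is a correctly recalled memory.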
GitHub Repository
CL-bench (Context Learning Benchmark)
Source: Tencent (industry benchmark dataset)
Scale: 1,899 Tasks (CL-bench), 405 Tasks (CL-bench-Life)
CL-bench evaluates AI models' context learning capabilities across professional and everyday life scenarios. It tests the model's ability to understand, reason about, and apply information from extended context.
Task Categories
| Category | Description |
|---|---|
| Domain Knowledge Reasoning | Professional domain-specific reasoning |
| Language Understanding | Natural language comprehension |
| Information Extraction | Structured information extraction from context |
| Text Generation | Context-aware content generation |
GitHub Repository