Memory Benchmarks

OpenLoomi's memory system is evaluated against the academic and industry benchmarks below to measure its performance in realistic scenarios.


LoCoMo (Long-term Conversation Memory)

Source: Stony Brook University (Academic Benchmark Dataset)

LoCoMo contains real conversation records with corresponding observations, summaries, and QA pairs, specifically designed to evaluate memory system performance across different retrieval modes.

Question Categories

Category      Description
single_hop    Single memory retrieval fact recall
temporal      Date/time reasoning questions
multi_hop     Cross-session multi-step reasoning
open_domain   Open-domain fusion Q&A
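
The four categories above are usually reported separately, since a single overall score can hide a weak category. The following is a minimal sketch of per-category accuracy, not OpenLoomi's evaluation harness. The record layout (the "category" and "correct" keys) and the function name are assumptions made for illustration, and the sample rows are toy data rather than benchmark results.

```python
from collections import defaultdict

# The four LoCoMo question categories listed above.
CATEGORIES = ("single_hop", "temporal", "multi_hop", "open_domain")


def accuracy_by_category(results):
    """Return {category: fraction correct} for the categories that appear.

    `results` is an iterable of dicts with the hypothetical keys
    "category" (str) and "correct" (bool).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in results:
        category = row["category"]
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        totals[category] += 1
        hits[category] += int(row["correct"])
    # Skip categories with no questions to avoid dividing by zero.
    return {c: hits[c] / totals[c] for c in CATEGORIES if totals[c]}


# Toy records for illustration only; these are not benchmark results.
sample = [
    {"category": "single_hop", "correct": True},
    {"category": "single_hop", "correct": False},
    {"category": "temporal", "correct": True},
    {"category": "multi_hop", "correct": False},
]
print(accuracy_by_category(sample))
# {'single_hop': 0.5, 'temporal': 1.0, 'multi_hop': 0.0}
```

Categories with no questions are omitted from the result instead of being reported as zero, so a missing category is not mistaken for a failing one.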

On LoCoMo, OpenLoomi performs on par with leading open-source memory projects such as agentmemory and mempalace.

🔗 GitHub Repository


LongMemEval-S

Scale: 500 QA Pairs, 10+ Question Types, 100+ Sessions

Extracted from real multi-turn conversations, LongMemEval contains questions covering short-term memory, cross-session reasoning, temporal reasoning, and more. It evaluates how well an agent recalls stored memories in complex scenarios.

Question Types

  • single-session-assistant: Assistant interactions within a single session
  • single-session-user: User interactions within a single session
  • multi-session: Cross-session multi-step reasoning
  • temporal-reasoning: Time-sensitive query reasoning
  • knowledge-update: Knowledge iteration and fact changes
  • single-session-preference: User preference queries
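
One way to look at memory recall per question type is a retrieval hit rate: did the memory system return at least one session that contains the answer among its top k results? The sketch below illustrates that idea only. It is not the benchmark's official scorer, and the field names ("type", "evidence", "retrieved"), the session ids, and the function name are hypothetical.

```python
from collections import defaultdict


def recall_at_k(questions, k=5):
    """Per question type, the fraction of questions with evidence in the top k.

    Each question is a dict with hypothetical keys:
      "type"      - one of the question types listed above
      "evidence"  - session ids that contain the answer
      "retrieved" - ranked list of session ids returned by the memory system
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for q in questions:
        top_k = set(q["retrieved"][:k])
        totals[q["type"]] += 1
        # Count a hit when at least one evidence session is in the top k.
        hits[q["type"]] += int(bool(top_k & set(q["evidence"])))
    return {t: hits[t] / totals[t] for t in totals}


# Toy questions for illustration only; these are not benchmark data.
toy = [
    {"type": "multi-session", "evidence": {"s3", "s7"}, "retrieved": ["s1", "s7", "s2"]},
    {"type": "multi-session", "evidence": {"s4"}, "retrieved": ["s1", "s2", "s4"]},
    {"type": "knowledge-update", "evidence": {"s9"}, "retrieved": ["s9"]},
]
print(recall_at_k(toy, k=2))
# {'multi-session': 0.5, 'knowledge-update': 1.0}
```

With k=2 the second multi-session question misses because its evidence session is ranked third; raising k to 3 would turn it into a hit.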

🔗 GitHub Repository


CL-bench (Context Learning Benchmark)

Source: Tencent (Industry Benchmark Dataset)

Scale: 1,899 Tasks (CL-bench), 405 Tasks (CL-bench-Life)

CL-bench evaluates AI models' context learning capabilities across professional and everyday life scenarios. It tests the model's ability to understand, reason about, and apply information from extended context.
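
Because the two subsets differ in size (1,899 and 405 tasks), a score pooled over all tasks and a plain average of the two subset scores can differ noticeably. The sketch below shows the arithmetic under that assumption only. The function name and the input format are hypothetical, and the pass counts in the usage example are deliberately extreme placeholders, not measured results.

```python
# Task counts stated above.
SUBSET_SIZES = {"CL-bench": 1899, "CL-bench-Life": 405}


def pooled_and_macro(passed):
    """Return (pooled_rate, macro_rate) from per-subset pass counts.

    `passed` maps a subset name to the number of tasks solved
    (a hypothetical input format).
    """
    for name, count in passed.items():
        if not 0 <= count <= SUBSET_SIZES[name]:
            raise ValueError(f"pass count out of range for {name}: {count}")
    # Pooled: every task counts equally, so the larger subset dominates.
    pooled = sum(passed.values()) / sum(SUBSET_SIZES[n] for n in passed)
    # Macro: every subset counts equally, regardless of its size.
    macro = sum(passed[n] / SUBSET_SIZES[n] for n in passed) / len(passed)
    return pooled, macro


# Placeholder counts chosen to make the gap obvious; not real results.
pooled, macro = pooled_and_macro({"CL-bench": 0, "CL-bench-Life": 405})
print(f"pooled={pooled:.4f} macro={macro:.4f}")
# pooled=0.1758 macro=0.5000
```

When reporting a combined number, stating which of the two averages was used avoids ambiguity.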

Task Categories

Category                     Description
Domain Knowledge Reasoning   Professional domain-specific reasoning
Language Understanding       Natural language comprehension
Information Extraction       Structured information extraction from context
Text Generation              Context-aware content generation

🔗 GitHub Repository