Memory
OpenLoomi’s memory system is a local-first, tiered knowledge base built from messages across connected platforms. It combines structured storage, vector search, and a scheduled forgetting engine that manages lifecycle transitions automatically.
This document covers the system architecture, data model, and the relationships between components.
System Overview
The memory system spans five distinct data layers, each serving a different purpose in the information architecture:
| Layer | Storage | Purpose |
|---|---|---|
| raw_messages | Local | Verbatim message records — the ground truth |
| memory_summaries | Local | Compressed summaries — derived from raw messages |
| Insights | Local | AI-extracted structured records from platform messages |
| Knowledge Base | Local | User-uploaded document chunks for RAG |
| Vector index | Local | Semantic search across all layers |
All layers originate from platform messages (and broader context inputs such as local files, audio/video, screenshots, Browser Use / Computer Use operation traces, etc.), but diverge into different representations for different purposes. The full pipeline maps to the MelandOS architecture:
Connectors → Processor → Memory → Insights → Chat/Search → Weight Adjustment + Forgetting Engine → Knowledge Base + MCP Tools
When a message arrives from a connector, it flows through this pipeline:
- Stored as a raw_messages record with memoryStage: "short"
- AAAK-encoded and embedded; embedding stored alongside the record
- Added to the vector index for semantic retrieval
- Periodically processed by the forgetting engine, which may compress groups of records into memory_summaries
Raw messages, summaries, and the vector index are not redundant; they serve different query patterns. Raw messages answer “what was said exactly.” Summaries answer “what was the gist of this time period.” The vector index answers “what messages are semantically similar to this query.”
Data Model
Raw Messages
raw_messages is the primary local object store. Each record represents a single ingested message.
raw_messages
├── id # Auto-increment primary key
├── messageId # Platform-specific message ID (unique index)
├── platform # slack | discord | telegram | imessage | ...
├── userId # Owner of this record
├── botId # Bot/user who sent the message
├── channel # Platform channel identifier
├── person # Contact or conversation identifier
├── timestamp # Unix ms when message was sent
├── createdAt # Unix ms when stored locally
├── content # Full message text
├── attachments # [{name, url, contentType, sizeBytes}]
├── embedding # 1536-dim float array
├── embeddingModel # e.g. "text-embedding-3-small"
├── embeddingContentHash # FNV-64a of content for dream/re-embed detection
├── embeddingDimensions # Should be 1536
├── embeddingUpdatedAt # When embedding was last computed
├── metadata # Platform-specific extras
├── memoryStage # "short" | "mid" | "long"
├── accessCount # Number of times this record was retrieved
├── lastAccessAt # Unix ms of last retrieval
├── importanceScore # 0-1, provided importance signal
├── archivedAt # Set when details are archived after summarization
├── isPinned # User-marked important
└── summaryRefId # Reference to the memory_summaries record, if summarized
Indexes:
- userId_memoryStage (compound): filters records by owner and tier for forgetting engine candidate scans
- userId_timestamp (compound): enables time-bounded queries sorted by recency
- messageId (unique): fast platform-ID lookup
- archivedAt: cleanup for hard delete of old archived records
- isPinned: filter pinned records
Memory Summaries
memory_summaries stores compressed representations of groups of raw messages. Created by the forgetting engine during tier transitions.
memory_summaries
├── summaryId # "ms_<hash>" — deterministic ID from inputs
├── userId # Owner
├── summaryTier # "L1" | "L2" | "L3" (maps from short→L1, mid→L2, long→L3)
├── sourceTier # The tier before transition
├── startTimestamp # Inclusive start of the grouped window
├── endTimestamp # Inclusive end
├── messageCount # How many raw records are in this summary
├── sourceRecordIds # IDs of the compressed raw_messages records
├── keyPoints # Extracted highlights from the group
├── keywords # Extracted keyword tokens
├── keywordsText # keywords[] joined for contains() search
├── summaryText # Human-readable one-paragraph summary
├── dimensions # {platform, channel, person, botId} — preserved from source
├── qualityScore # 0-1 quality indicator from summarizer
├── createdAt # When summary was created
└── updatedAt # Last modification time
Indexes:
- userId_summaryTier (compound): filter by summary level
- userId_endTimestamp (compound): time-bounded queries by recency
The sourceRecordIds array is the link between layers. A summary references the raw records it was derived from. Raw records reference their summary via summaryRefId.
Relationships
raw_messages (N) ←─────── (1) memory_summaries
│
└── summaryRefId ──────────→ summaryId
└── sourceRecordIds ────────→ id (reverse)
One raw_messages record belongs to one summary (after summarization).
One memory_summaries record covers N raw_messages records.
When a record is archived (archivedAt is set), its content field is omitted from the in-memory representation; the details are considered “compressed.” The original raw message is preserved in metadata.__rawMessage for potential reconstruction.
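The two-way link and the archive step can be sketched as follows. This is a minimal illustration, not OpenLoomi's implementation: the types are trimmed to the fields named above, linkSummary and archiveRecord are hypothetical helpers, and storing the message text (rather than some richer object) under metadata.__rawMessage is an assumption.

```typescript
// Trimmed-down shapes: only the fields needed to show the link.
interface RawMessage {
  id: number;
  content?: string;
  archivedAt?: number;
  summaryRefId?: string;
  metadata: { [key: string]: unknown };
}

interface MemorySummary {
  summaryId: string;
  sourceRecordIds: number[];
  messageCount: number;
}

// Hypothetical helper: point every record at the summary, and the summary at every record.
function linkSummary(summaryId: string, records: RawMessage[]): MemorySummary {
  for (const record of records) {
    record.summaryRefId = summaryId;
  }
  return {
    summaryId: summaryId,
    sourceRecordIds: records.map((r) => r.id),
    messageCount: records.length,
  };
}

// Hypothetical helper: archive a record while keeping the original text recoverable.
// Assumption: the text itself is what gets stored under metadata.__rawMessage.
function archiveRecord(record: RawMessage, now: number): void {
  record.metadata["__rawMessage"] = record.content;
  record.archivedAt = now;
  delete record.content;
}
```

After linkSummary runs, either side of the relationship can be walked: summaryRefId leads from a record to its summary, and sourceRecordIds leads back.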
AAAK Symbol Language
Before a message is embedded, its text is encoded into AAAK (OpenLoomi’s compressed symbol language). This encoding normalizes the text and adds structured metadata as labeled fields, so the resulting embedding captures both semantic content and contextual signals.
Encoding Format
buildMemoryRecordEmbeddingDocument() produces:
Text: <message content, whitespace-normalized, max 8000 chars>
Time: <unix timestamp>
Tier: <short | mid | long>
Media: <media refs joined by ", " or "none">
Dimensions: platform: <val>; channel: <val>; ...
Metadata: <flattened key:value pairs, max 2 levels deep>
Key Encoding Rules
- Whitespace: Collapses /\s+/g to a single space
- Metadata flattening: Max 2 levels deep, keys sorted alphabetically, keys starting with __ excluded
- Truncation: Smart boundary detection at \n, ., ;, or space within 75% of maxLength (8000 chars)
- Content hashing: FNV-64a hash with version prefix memory-record-embedding-text-v1:, used by the dream process to detect changed content that needs re-embedding
The encoding is designed so that:
- The semantic core (message text) dominates the embedding
- Temporal and tier signals are present but secondary
- Metadata enables faceted filtering in vector search
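The encoding rules above can be sketched roughly as below. This is an illustration under stated assumptions, not the actual buildMemoryRecordEmbeddingDocument(): the EmbeddingInput shape, the helper names, the "parent.child" form of flattened keys, the "; " separator, and the reading of the 75% rule (cut at the last boundary at or beyond 75% of maxLength) are all guesses made for the example.

```typescript
type Metadata = { [key: string]: unknown };

// Hypothetical input shape for the sketch.
interface EmbeddingInput {
  content: string;
  timestamp: number;
  tier: "short" | "mid" | "long";
  mediaRefs: string[];
  dimensions: { [key: string]: string };
  metadata: Metadata;
}

const MAX_TEXT_LENGTH = 8000;

// Collapse every whitespace run to a single space.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Cut at the last boundary character found at or beyond 75% of maxLength;
// fall back to a hard cut when that range contains no boundary.
function truncateAtBoundary(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  const floor = Math.floor(maxLength * 0.75);
  for (let i = maxLength - 1; i >= floor; i--) {
    const ch = text.charAt(i);
    if (ch === "\n" || ch === "." || ch === ";" || ch === " ") {
      return text.slice(0, i + 1).trim();
    }
  }
  return text.slice(0, maxLength);
}

// Flatten two levels, skip keys starting with "__", sort keys alphabetically.
// Anything nested deeper than two levels is stringified as-is, not expanded.
function flattenMetadata(metadata: Metadata): string {
  const visible = (keys: string[]) => keys.filter((k) => k.indexOf("__") !== 0).sort();
  const pairs: string[] = [];
  for (const key of visible(Object.keys(metadata))) {
    const value = metadata[key];
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      const nested = value as Metadata;
      for (const nestedKey of visible(Object.keys(nested))) {
        pairs.push(key + "." + nestedKey + ":" + String(nested[nestedKey]));
      }
    } else {
      pairs.push(key + ":" + String(value));
    }
  }
  return pairs.join("; ");
}

// Assemble the labeled document; the text is normalized first, then truncated.
function buildEmbeddingDocument(input: EmbeddingInput): string {
  const text = truncateAtBoundary(normalizeWhitespace(input.content), MAX_TEXT_LENGTH);
  const media = input.mediaRefs.length > 0 ? input.mediaRefs.join(", ") : "none";
  const dimensions = Object.keys(input.dimensions)
    .map((k) => k + ": " + input.dimensions[k])
    .join("; ");
  return [
    "Text: " + text,
    "Time: " + input.timestamp,
    "Tier: " + input.tier,
    "Media: " + media,
    "Dimensions: " + dimensions,
    "Metadata: " + flattenMetadata(input.metadata),
  ].join("\n");
}
```

Because the Text field comes first and is by far the longest, the message content dominates the embedded document, while the remaining fields stay short.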
Vector Layer
Vector Storage
Vector storage varies by platform:
Desktop uses a dedicated vector engine. Web stores vectors directly in the raw_messages.embedding field.
Cosine similarity is computed client-side:
similarity = dot(vecA, vecB) / (norm(vecA) * norm(vecB));
Search scans up to scanLimit = limit * 10 records, computes similarity against each, filters by threshold (default 0.7), and returns the top limit results sorted by similarity.
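A minimal sketch of this brute-force scan, assuming vectors are plain number arrays held in memory. The StoredVector and SearchHit shapes and the searchVectors name are hypothetical; only the formula, the scanLimit rule, and the 0.7 default come from the description above.

```typescript
interface StoredVector {
  id: number;
  embedding: number[];
}

interface SearchHit {
  id: number;
  similarity: number;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scan at most limit * 10 records, keep those at or above the threshold,
// and return the best `limit` hits in descending similarity order.
function searchVectors(
  query: number[],
  records: StoredVector[],
  limit: number,
  threshold: number = 0.7
): SearchHit[] {
  const scanLimit = limit * 10;
  return records
    .slice(0, scanLimit)
    .map((r) => ({ id: r.id, similarity: cosineSimilarity(query, r.embedding) }))
    .filter((hit) => hit.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

Note that the scan cap means records beyond the first scanLimit are never compared, so the order in which candidates are supplied matters.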
Hybrid Search
Search uses both vector similarity and keyword matching:
- Semantic path: Embed query → vector search → similarity scores
- Keyword path: Query AAAK-encoded keyword field → exact matches
- Merge: Results combined and sorted by relevance
The keyword index catches exact matches (specific names, IDs, dates) that semantic similarity might miss due to embedding variance.
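The merge step is not specified in detail here, so the following is only one plausible shape for it. It assumes each path yields an id and a score in the 0 to 1 range, that an exact keyword match is scored 1.0 by the caller, and that an id found by both paths keeps its higher score. ScoredHit and mergeHybridResults are hypothetical names.

```typescript
interface ScoredHit {
  id: number;
  score: number;
}

// Merge both result lists by id, keep the higher score for duplicates,
// then sort by score descending and cut to `limit`.
function mergeHybridResults(
  semantic: ScoredHit[],
  keyword: ScoredHit[],
  limit: number
): ScoredHit[] {
  const merged: ScoredHit[] = [];
  for (const hit of semantic.concat(keyword)) {
    const existing = merged.find((m) => m.id === hit.id);
    if (existing === undefined) {
      merged.push({ id: hit.id, score: hit.score });
    } else if (hit.score > existing.score) {
      existing.score = hit.score;
    }
  }
  return merged.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Under these assumptions, a message that mentions an exact ticket ID but embeds poorly still surfaces near the top, which is the behavior the keyword path exists to provide.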
The Forgetting Engine
The forgetting engine is a scheduled background process that manages the memory lifecycle. It promotes records between tiers and compresses groups into summaries.
Tier Lifecycle
short (minutes–7 days) → mid (7–90 days) → long (90+ days)
Age alone does not determine promotion; a value score does. Records are evaluated when they exceed the tier’s maximum age.
Scoring Formula
Records are scored on a 0–1 scale (higher = more worth keeping):
score = clamp01(
0.35 * recencyScore +
0.30 * accessScore +
0.25 * importanceScore +
0.10 * mediaScore +
pinnedBoost
)
recencyScore = clamp01(1 - ageMs / (180 * DAY_MS))
accessScore = clamp01(log1p(accessCount) / log(10))
importanceScore = max(providedImportance, inferredImportance)
# inferredImportance = hits/4 from keyword scan
# keywords: deadline, todo, urgent, risk, decision, blocker,
# meeting, action item, milestone, bug, incident, follow up
mediaScore = hasMediaRefs ? 0.7 : 0.25
pinnedBoost = isPinned ? 0.3 : 0
Promotion thresholds:
| Transition | Threshold | Max Age |
|---|---|---|
| short → mid | 0.65 | 7 days |
| mid → long | 0.45 | 90 days |
Records scoring below the threshold for their age boundary are archived: archivedAt is set and the in-memory representation is compressed, while the verbatim content is preserved.
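The formula and thresholds translate fairly directly into code. The sketch below follows the weights and constants listed above; the ScoreInput shape, the function names, substring-based keyword matching, clamping inferredImportance to 1, and the "keep" result for records still under the age boundary are assumptions made for the example.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

const IMPORTANCE_KEYWORDS = [
  "deadline", "todo", "urgent", "risk", "decision", "blocker",
  "meeting", "action item", "milestone", "bug", "incident", "follow up",
];

// Hypothetical input shape for the sketch.
interface ScoreInput {
  ageMs: number;
  accessCount: number;
  providedImportance: number;
  content: string;
  hasMediaRefs: boolean;
  isPinned: boolean;
}

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// hits / 4, where a hit is a keyword found anywhere in the lowercased text
// (plain substring matching is an assumption).
function inferImportance(content: string): number {
  const lower = content.toLowerCase();
  const hits = IMPORTANCE_KEYWORDS.filter((k) => lower.indexOf(k) !== -1).length;
  return clamp01(hits / 4);
}

function memoryValueScore(input: ScoreInput): number {
  const recencyScore = clamp01(1 - input.ageMs / (180 * DAY_MS));
  const accessScore = clamp01(Math.log1p(input.accessCount) / Math.log(10));
  const importanceScore = Math.max(input.providedImportance, inferImportance(input.content));
  const mediaScore = input.hasMediaRefs ? 0.7 : 0.25;
  const pinnedBoost = input.isPinned ? 0.3 : 0;
  return clamp01(
    0.35 * recencyScore +
      0.30 * accessScore +
      0.25 * importanceScore +
      0.10 * mediaScore +
      pinnedBoost
  );
}

const PROMOTION_RULES = {
  short: { threshold: 0.65, maxAgeMs: 7 * DAY_MS },
  mid: { threshold: 0.45, maxAgeMs: 90 * DAY_MS },
};

// Records under the age boundary are left alone; older ones are promoted or archived.
function decideTransition(
  tier: "short" | "mid",
  input: ScoreInput
): "keep" | "promote" | "archive" {
  const rule = PROMOTION_RULES[tier];
  if (input.ageMs <= rule.maxAgeMs) return "keep";
  return memoryValueScore(input) >= rule.threshold ? "promote" : "archive";
}
```

An eight-day-old short-tier record with no accesses, no importance signal, and no media scores roughly 0.36 and would be archived, while the same record pinned and accessed nine times scores roughly 0.96 and would be promoted.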
Grouping and Summarization
The engine does not evaluate records individually. It groups them first:
- Group window: short tier uses 1-day buckets; mid tier uses 7-day buckets
- Dimension key: Groups are further segmented by platform, channel, person, botId, so a single bucket contains only records sharing the same dimension values
- Minimum group size: 3 records; smaller groups are skipped
- Maximum candidates: 500 records per tier per run to avoid long-running transactions
Within each group, RuleBasedMemorySummarizer produces a MemorySummary record with keyPoints, keywords, summaryText, and qualityScore. The raw records in that group are linked to the new summary via summaryRefId.
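The grouping step can be sketched as below; summarization itself (RuleBasedMemorySummarizer) is not shown. The GroupableRecord shape and the function name are hypothetical, and aligning buckets to the Unix epoch and capping candidates by simply taking the first 500 are assumptions.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const BUCKET_DAYS = { short: 1, mid: 7 };
const MIN_GROUP_SIZE = 3;
const MAX_CANDIDATES = 500;

// Hypothetical record shape: just the fields the grouping needs.
interface GroupableRecord {
  id: number;
  timestamp: number;
  platform: string;
  channel: string;
  person: string;
  botId: string;
}

// Bucket records by time window and dimension key, then drop groups that are too small.
function groupForSummarization(
  records: GroupableRecord[],
  tier: "short" | "mid"
): GroupableRecord[][] {
  const bucketMs = BUCKET_DAYS[tier] * DAY_MS;
  const groups: { [key: string]: GroupableRecord[] } = {};
  for (const record of records.slice(0, MAX_CANDIDATES)) {
    // Assumption: buckets are aligned to the Unix epoch.
    const bucket = Math.floor(record.timestamp / bucketMs);
    const key = [bucket, record.platform, record.channel, record.person, record.botId].join("|");
    const group = groups[key];
    if (group === undefined) {
      groups[key] = [record];
    } else {
      group.push(record);
    }
  }
  const result: GroupableRecord[][] = [];
  for (const key of Object.keys(groups)) {
    const group = groups[key];
    if (group !== undefined && group.length >= MIN_GROUP_SIZE) {
      result.push(group);
    }
  }
  return result;
}
```

Records left in groups smaller than the minimum are not summarized in that run and remain candidates for a later one.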
Lock Mechanism
The engine uses a process-local lock to prevent concurrent runs:
Lock key: memory_forgetting:<userId>
Lock TTL: 60,000ms
Token format: <key>:<timestamp>:<random>
If a new cycle starts while one is running, the second cycle returns status: "skipped_locked" and exits early.
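A process-local lock of this kind can be sketched with an in-memory table. The key format, TTL, token format, and the "skipped_locked" status come from the description above; the function names, the "completed" status, and passing the current time as a parameter are assumptions made for the example.

```typescript
const LOCK_TTL_MS = 60000;

interface LockEntry {
  token: string;
  expiresAt: number;
}

// Process-local: this table lives in one process and is not shared across processes.
const locks: { [key: string]: LockEntry } = {};

// Returns a token on success, or null when an unexpired lock is already held.
function acquireLock(userId: string, now: number): string | null {
  const key = "memory_forgetting:" + userId;
  const held = locks[key];
  if (held !== undefined && held.expiresAt > now) return null;
  const token = key + ":" + now + ":" + Math.random().toString(36).slice(2);
  locks[key] = { token: token, expiresAt: now + LOCK_TTL_MS };
  return token;
}

// Only the holder of the matching token can release the lock.
function releaseLock(userId: string, token: string): void {
  const key = "memory_forgetting:" + userId;
  const held = locks[key];
  if (held !== undefined && held.token === token) delete locks[key];
}

// Hypothetical cycle wrapper; "completed" is an assumed status name.
function runForgettingCycle(
  userId: string,
  now: number,
  work: () => void
): { status: string } {
  const token = acquireLock(userId, now);
  if (token === null) return { status: "skipped_locked" };
  try {
    work();
    return { status: "completed" };
  } finally {
    releaseLock(userId, token);
  }
}
```

The TTL matters when a cycle crashes without releasing: once 60,000 ms have passed, the stale entry no longer blocks the next run.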
Tier-to-SummaryTier Mapping
| Memory Tier | Summary Tier |
|---|---|
| short | L1 |
| mid | L2 |
| long | L3 |
This L1/L2/L3 distinction in memory_summaries.summaryTier allows the query layer to know the provenance of each summary — what lifecycle stage the source material was in when summarized.
Query Flow
When you ask OpenLoomi about your memory, the query goes through several layers:
Semantic Search
- User query is embedded via text-embedding-3-small → 1536-dim vector
- Vector index is queried
- Top-k candidates retrieved, scored by 1 - distance
- Filtered by threshold (default 0.7)
- Sorted by similarity score descending
Raw Message Fallback
If semantic results are insufficient (results < minRawResultsWithoutFallback), the system also queries memory_summaries:
- Keyword search on keywordsText field
- Time-bounded query on userId_endTimestamp
- Results merged with semantic results and re-sorted by timestamp
Access Tracking
When a raw message record is retrieved (whether via semantic search or direct lookup), the system marks it:
accessCount += 1
lastAccessAt = now
This access data feeds back into the scoring formula, so frequently accessed memories score higher and are less likely to be archived.
Insights
Insights are AI-extracted structured records derived from platform messages. Where Memory stores verbatim records and summaries for retrieval, Insights captures high-level facts, decisions, and events that the AI identifies as worth tracking separately.
Insights vs Memory
These are completely separate systems:
| | Memory | Insights |
|---|---|---|
| Location | Local-first | Local |
| Content | Messages and summaries | AI-extracted structured facts |
| Management | Forgetting engine (tier transitions) | Weight adjustment (boost/decay) |
| Source | Platform messages | AI subagent analysis of messages |
Raw messages are the shared origin: platforms fetch messages and feed both the insight extraction pipeline and the memory storage pipeline. The two systems then diverge — memory stays close to the original text, while insights are structured abstractions.
Data Model
Key insight fields:
insights
├── id # UUID, deterministic from botId + dedupeKey
├── title # Short identifying label
├── description # Natural language summary
├── importance # critical | high | medium | low
├── urgency # immediate | urgent | medium | low
├── details[] # Event-level data tracked over time
├── timeline[] # Chronological events
├── taskLabel # Category: bug_report, feature_request, etc.
├── insightWeights # Per-user tracking:
│ ├── accessCount30d # Access count in last 30 days
│ ├── accessCount7d # Access count in last 7 days
│ ├── currentEventRank # Ranking position
│ └── customWeightMultiplier # User-adjusted multiplier
Value Score
Insights are ranked using a 4-signal formula:
valueScore = 0.45 * frequencyScore + 0.25 * freshnessScore + 0.20 * relevanceScore + 0.10 * favoriteScore
- frequencyScore: Log-scaled access count relative to a configured maximum
- freshnessScore: <1 day → 1.0, <7 days → 0.8, <30 days → 0.45 …
- relevanceScore: importance * 0.7 + urgency * 0.3
- favoriteScore: 1 if favorited, else 0
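The weighted sum can be sketched as below. Because the exact frequency and freshness curves are not fully specified here, the sketch takes both as precomputed inputs in the 0 to 1 range; treating importance and urgency as numbers in that same range (rather than the critical/high/medium/low labels) is also an assumption, as are the InsightSignals shape and the function name.

```typescript
// Hypothetical input: every signal is assumed to be normalized to 0..1 already.
interface InsightSignals {
  frequencyScore: number;
  freshnessScore: number;
  importance: number;
  urgency: number;
  isFavorite: boolean;
}

function insightValueScore(signals: InsightSignals): number {
  const relevanceScore = signals.importance * 0.7 + signals.urgency * 0.3;
  const favoriteScore = signals.isFavorite ? 1 : 0;
  return (
    0.45 * signals.frequencyScore +
    0.25 * signals.freshnessScore +
    0.20 * relevanceScore +
    0.10 * favoriteScore
  );
}
```

Since the weights sum to 1.0, the result stays in the 0 to 1 range whenever the inputs do. Frequency carries the most weight, so a heavily accessed insight can outrank a fresher one.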
Weight Adjustment System
Insight weights change dynamically based on user interactions:
Favorite boost: multiplier = min(5.0, currentWeight * 1.5), 7-day duration
View boost: multiplier = min(5.0, currentWeight * 1.1), 24-hour duration, only applied after >1 day of inactivity
Decay: Applied to insights not viewed in a while:
- 7–14 days inactive → rate 0.95
- 14–30 days inactive → rate 0.85
- 30+ days inactive → rate 0.7 (floor at 0.3)
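The boost and decay rules can be sketched as pure functions on the multiplier. Several details are assumptions: that the 0.3 floor applies to the resulting multiplier, that a boundary day (exactly 14 or 30) belongs to the steeper band, and that decay is applied once per evaluation. Boost durations (7 days, 24 hours) are not modeled, and all function names are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_MULTIPLIER = 5.0;
const DECAY_FLOOR = 0.3;

// Favorite boost: multiply by 1.5, capped at 5.0.
function favoriteBoost(currentWeight: number): number {
  return Math.min(MAX_MULTIPLIER, currentWeight * 1.5);
}

// View boost: multiply by 1.1, capped at 5.0, only after more than one day of inactivity.
function viewBoost(currentWeight: number, inactiveMs: number): number {
  if (inactiveMs <= DAY_MS) return currentWeight;
  return Math.min(MAX_MULTIPLIER, currentWeight * 1.1);
}

// Decay rate by inactivity band; 1 means no decay.
function decayRate(inactiveDays: number): number {
  if (inactiveDays >= 30) return 0.7;
  if (inactiveDays >= 14) return 0.85;
  if (inactiveDays >= 7) return 0.95;
  return 1;
}

// Apply one decay step; the floor is assumed to bound the resulting multiplier.
function applyDecay(currentWeight: number, inactiveDays: number): number {
  const rate = decayRate(inactiveDays);
  if (rate === 1) return currentWeight;
  return Math.max(DECAY_FLOOR, currentWeight * rate);
}
```

For example, an insight at weight 0.4 that has been inactive for 45 days would decay to 0.28 but is held at the 0.3 floor.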
Active / Dormant Classification
- Active: accessCount30d > 0
- Dormant: accessCount30d == 0
Trend
The trend signal compares recent access against the prior period:
- Rising: recent 7d accesses ≥ previous 7d accesses + 25%
- Falling: previous 7d accesses ≥ recent 7d accesses + 25%
- Stable: otherwise
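The classification rules can be sketched as below. The "+ 25%" wording is read here as "at least 1.25 times the other period", which is an interpretation rather than something stated above; the extra strict-greater check (so that two empty periods count as stable) and the function names are also assumptions.

```typescript
type Trend = "rising" | "falling" | "stable";

// Active means at least one access in the last 30 days.
function classifyActivity(accessCount30d: number): "active" | "dormant" {
  return accessCount30d > 0 ? "active" : "dormant";
}

// Assumed reading of "+ 25%": one period is at least 1.25x the other.
function classifyTrend(recent7d: number, previous7d: number): Trend {
  if (recent7d > previous7d && recent7d >= previous7d * 1.25) return "rising";
  if (previous7d > recent7d && previous7d >= recent7d * 1.25) return "falling";
  return "stable";
}
```

Under this reading, 10 recent accesses against 8 previous ones is rising, while 9 against 8 is still stable.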
Generation Pipeline
Insights are generated server-side in a batch pipeline:
- Messages are grouped by platform + channel
- An AI subagent analyzes each group and extracts structured InsightData records
- Records are upserted with deduplication (same botId + dedupeKey)
- Embeddings are generated for each insight
Knowledge Base
The Knowledge Base is a user-uploaded document RAG system. Unlike memory (which is built from platform messages) and insights (which are AI-extracted), the Knowledge Base is explicitly populated by the user — they upload files they want the AI to be able to reason about.
Supported Formats
PDF, DOCX, PPTX, XLSX, CSV, TXT, MD, Apple formats (Pages, Numbers, Keynote)
Data Model
rag_documents
├── id # Document identifier
├── userId # Owner
├── fileName # Original filename
├── contentType # MIME type
├── sizeBytes # File size
├── totalChunks # Number of chunks extracted
├── blobPath # Storage path for original file
├── uploadedAt # Timestamp
└── metadata # Extracted metadata (title, author, etc.)
rag_chunks
├── id # Chunk identifier
├── documentId # Parent document reference
├── userId # Owner
├── chunkIndex # Position in document
├── content # Text content (1000 chars)
├── embedding # 1536-dim vector
└── metadata # Chunk-level metadata
Chunks are created using RecursiveCharacterTextSplitter with a 1000-character target size and 200-character overlap.
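To make the size and overlap parameters concrete, here is a deliberately simplified fixed-window splitter. It is not RecursiveCharacterTextSplitter, which additionally prefers to break on separators such as paragraphs and sentences; splitWithOverlap is a hypothetical name used only for illustration.

```typescript
// Fixed-window splitter: each chunk starts (chunkSize - overlap) characters after
// the previous one, so neighboring chunks share `overlap` characters.
function splitWithOverlap(
  text: string,
  chunkSize: number = 1000,
  overlap: number = 200
): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults, a 2500-character document yields three chunks of 1000, 1000, and 900 characters, and the last 200 characters of each chunk reappear at the start of the next. The overlap keeps sentences that straddle a boundary retrievable from at least one chunk.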
RAG Pipeline
- Parse: parseFile() extracts text from the uploaded format using LangChain loaders
- Split: splitDocuments() produces overlapping chunks
- Embed: embedDocuments() generates 1536-dim text-embedding-3-small vectors (via OpenAI or OpenRouter)
- Store: Chunks inserted into rag_documents + rag_chunks in batches of 1000
Query
Vector similarity search against rag_chunks, using cosine distance:
- Threshold: 0.7 (70% similarity required)
- Default limit: 5 results
Insight Settings as Knowledge Base
When a user configures personalization in Insight Settings — focus people, topics, AI soul prompt — these preferences are converted into a memory.txt document and inserted into the Knowledge Base. This ensures the AI’s personal context is always included in RAG retrieval.
MCP Tools
| Tool | Description |
|---|---|
| searchKnowledgeBase(query, limit, documentIds?) | Semantic search across document chunks |
| getFullDocumentContent(documentId) | Retrieve the complete text of a document |
| listKnowledgeBaseDocuments(limit) | List recently uploaded documents |
Key Design Decisions
Why Tiered Storage Instead of a Single Store?
Raw messages are cheap to write but expensive to scan. As time passes, older messages are accessed less frequently but carry historical value. The tiered model lets the system keep raw records for recent periods (where access is common) and compress older material (where verbatim retrieval is rare) into summaries.
Why FNV-64a for Content Hashing?
The dream process (re-embedding stale or changed content) needs to detect when content has changed without comparing the full text. FNV-64a is a fast, non-cryptographic hash suitable for content fingerprinting. The versioned prefix (memory-record-embedding-text-v1:) allows future encoding format changes to trigger re-embedding automatically.
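A sketch of this change detection, assuming "FNV-64a" refers to the standard 64-bit FNV-1a variant and that the version prefix is prepended to the document text before hashing. Both points are assumptions, as are the function names. The hash is computed over UTF-8 bytes using four 16-bit limbs, so the example does not depend on BigInt.

```typescript
const HASH_VERSION_PREFIX = "memory-record-embedding-text-v1:";

// 64-bit FNV-1a, returned as 16 lowercase hex characters.
function fnv1a64(text: string): string {
  // Offset basis 0xcbf29ce484222325, split into 16-bit limbs (h0 is the lowest).
  let h0 = 0x2325;
  let h1 = 0x8422;
  let h2 = 0x9ce4;
  let h3 = 0xcbf2;
  new TextEncoder().encode(text).forEach((byte) => {
    h0 ^= byte;
    // Multiply by the FNV prime 0x100000001b3 (2^40 + 0x1b3), modulo 2^64.
    const t0 = h0 * 0x1b3;
    const t1 = h1 * 0x1b3 + (t0 >>> 16);
    const t2 = h2 * 0x1b3 + h0 * 0x100 + (t1 >>> 16);
    const t3 = h3 * 0x1b3 + h1 * 0x100 + (t2 >>> 16);
    h0 = t0 & 0xffff;
    h1 = t1 & 0xffff;
    h2 = t2 & 0xffff;
    h3 = t3 & 0xffff;
  });
  const hex = (limb: number) => ("0000" + limb.toString(16)).slice(-4);
  return hex(h3) + hex(h2) + hex(h1) + hex(h0);
}

// Assumption: the versioned prefix is prepended to the text before hashing.
function embeddingContentHash(embeddingDocument: string): string {
  return fnv1a64(HASH_VERSION_PREFIX + embeddingDocument);
}

// A record needs re-embedding when its stored hash no longer matches.
function needsReembedding(storedHash: string, embeddingDocument: string): boolean {
  return storedHash !== embeddingContentHash(embeddingDocument);
}
```

Bumping the prefix to a new version changes every computed hash, so every stored embeddingContentHash stops matching and the affected records are picked up for re-embedding without a separate migration flag.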
Why Write-Ahead Logging Mode?
The background indexing pipeline writes new vectors while the user may be simultaneously querying. WAL (Write-Ahead Logging) allows concurrent readers without blocking the writer, and without the writer blocking readers. This is critical for maintaining <500ms ingestion latency under read load.
Why Log-Scale Access Score?
accessScore = clamp01(log1p(accessCount) / log(10)) means the access score grows rapidly at low counts (1 access → ~0.30, 2 → ~0.48, 5 → ~0.78) and saturates at 1.0 from 9 accesses onward. This reflects diminishing returns: a message accessed 100 times is not 10x more important than one accessed 10 times.
Why Dimension-Key Grouping?
Grouping by platform + channel + person + botId ensures that summaries respect natural conversation boundaries. A week’s worth of Slack messages in #engineering won’t be compressed into the same summary as a week’s Telegram messages from a different person. This preserves topical coherence in the summarization output.
Memory as a Skill
Memory is also available as a standalone Skill for integration with other Agent systems. This allows any AI agent to connect to OpenLoomi’s memory capabilities and leverage the same tiered storage, vector search, and knowledge base features.
Skill Capabilities
The Memory Skill exposes the following capabilities:
| Feature | Description |
|---|---|
| Memory Files Search | Case-insensitive full-text search across local memory files (~/.openloomi/data/memory/) |
| Knowledge Base Search | Semantic document search using RAG/embeddings on the OpenLoomi server |
| Insights | Query AI-extracted structured records from chat history including decisions, action items, preferences, and relationships |
Three Memory Types
- Memory Files: Personal markdown/JSON files stored locally at ~/.openloomi/data/memory/ with subdirectories for chats, channels, people, projects, notes, and strategy
- Knowledge Base: Uploaded documents searchable via RAG/embeddings on the OpenLoomi server
- Insights: Structured information extracted from chat history, including decisions, action items, preferences, and relationships
Agent Integrations
The Memory Skill supports 10+ communication channels including Gmail, Slack, Discord, Telegram, WhatsApp, and more. This enables agents to:
- Search across all connected platform histories
- Extract and track key decisions and action items
- Maintain context across conversations
- Access uploaded documents and knowledge bases
API Endpoints
The skill exposes REST endpoints at http://localhost:3415/api/:
| Endpoint | Description |
|---|---|
| Document search | Semantic search across knowledge base |
| Insight management | Query and manage extracted insights |
| Usage analytics | Track access frequency and relevance |
Authentication
The CLI automatically reads authentication tokens from ~/.openloomi/token (base64 encoded JWT).
For full integration details, visit the openloomi-memory Skill.