NGT Memory

Give your LLM a memory it never forgets

Drop-in REST API that adds persistent, cross-session memory to any LLM application. 2ms retrieval. Zero infrastructure.

v0.23.0 · Apache 2.0 · Python 3.10+

Memory Pipeline

User message → NGT Memory (cosine + graph retrieval, <2ms) → LLM (GPT-4o) → Response

Memories persist across sessions.

Proven results — not just promises

Benchmarked against baseline LLM with no memory layer

+83% factual consistency improvement
2ms memory retrieval latency
42K/sec store throughput

Benchmark results (Exp 44)

| Mode | Factual score | Keyword hit |
|---|---|---|
| NGT Memory | 2.44 / 3 | 57% |
| No memory | 1.33 / 3 | 27% |
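The +83% headline figure is consistent with these scores. A quick check, assuming it is the relative gain in mean factual score over the no-memory baseline:

```python
# Factual scores from the Exp 44 benchmark table above.
with_memory = 2.44
baseline = 1.33

# Relative improvement of NGT Memory over the no-memory baseline.
improvement = (with_memory - baseline) / baseline * 100
print(f"+{improvement:.0f}%")  # +83%
```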

LLMs without memory are broken by design

Every session starts fresh. Your users have to repeat themselves. Your AI gives dangerous generic advice. NGT Memory fixes this.

| Without memory | With NGT Memory |
|---|---|
| Recommends meat to a vegetarian | Remembers user dietary preferences |
| Asks users to repeat context every session | Persists facts between sessions |
| Generic advice that ignores user history | Personalized responses every time |
| Dangerous advice in medical/finance contexts | Respects allergies, medications, restrictions |

Real example — Restaurant recommendation in Kyoto

No memory

“Ippudo is great for ramen lovers” — recommends meat to a vegetarian

With NGT Memory

“Shigetsu at Tenryu-ji serves shojin ryori (Buddhist vegan cuisine)” — personalized because it remembers you’re vegetarian

How NGT Memory works

A simple pipeline that injects relevant memories into every LLM prompt

Request Pipeline

1. POST /chat: HTTP request received
2. Embed: text-embedding-3-small (~700ms)
3. NGT Retrieve: cosine + graph search (~2ms)
4. LLM Prompt: [MEMORY CONTEXT] injected into the GPT-4o-mini prompt; response generated (~1500ms)
5. Store: new facts written back to NGT Memory (~1ms)
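Summing the per-stage estimates above gives a rough end-to-end latency budget, and shows that memory retrieval is a negligible slice of it:

```python
# Per-stage latency estimates (ms) from the request pipeline above.
stages = {
    "embed (text-embedding-3-small)": 700,
    "ngt retrieve (cosine + graph)": 2,
    "llm generate (gpt-4o-mini)": 1500,
    "store (ngt memory)": 1,
}

total = sum(stages.values())
retrieval_share = stages["ngt retrieve (cosine + graph)"] / total * 100
print(f"total ≈ {total} ms, retrieval ≈ {retrieval_share:.1f}% of the budget")
```

The embedding and generation calls dominate; the memory layer itself adds well under 1% of total request time.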

Cosine Similarity

Semantically close facts retrieved via vector similarity search
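As a sketch of the idea (not NGT's internal implementation), cosine similarity ranks stored facts by the angle between their embedding vectors and the query embedding; the toy 3-d vectors below stand in for real 1536-d text-embedding-3-small outputs:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings; real ones come from an embedding model.
memories = {
    "user is vegetarian": [0.9, 0.1, 0.0],
    "user lives in Berlin": [0.0, 0.8, 0.2],
}
query = [0.8, 0.2, 0.0]  # e.g. embedding of "recommend a restaurant"

# The dietary fact scores highest, so it is retrieved for the prompt.
best = max(memories, key=lambda fact: cosine(query, memories[fact]))
```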

Hebbian Graph

Associative links between concepts, like the human brain
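A minimal sketch of the Hebbian idea (facts recalled together get wired together), not NGT's actual data structure: each co-retrieval strengthens an edge between concepts, and sufficiently strong edges pull in associated facts:

```python
from collections import defaultdict
from itertools import combinations

# Edge weights between concept pairs, strengthened on co-retrieval.
edges = defaultdict(float)

def reinforce(concepts, rate=0.1):
    """Hebbian update: concepts retrieved together get stronger links."""
    for a, b in combinations(sorted(concepts), 2):
        edges[(a, b)] += rate

def neighbors(concept, threshold=0.2):
    """Concepts linked to `concept` with weight above the threshold."""
    return {b if a == concept else a
            for (a, b), w in edges.items()
            if concept in (a, b) and w >= threshold}

# "vegetarian" and "restaurants" keep appearing in the same retrievals,
# so their link crosses the threshold; a one-off association does not.
for _ in range(3):
    reinforce(["vegetarian", "restaurants"])
reinforce(["vegetarian", "marathon"])
```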

Hierarchical Consolidation

Important facts promoted to long-term memory automatically
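One way such a promotion policy can look (an illustrative sketch, not NGT's algorithm): facts that keep being recalled prove their importance and graduate to long-term storage:

```python
# Illustrative consolidation policy: promote frequently recalled facts.
PROMOTE_AFTER = 3  # recalls needed before a fact counts as "important"

short_term = {"user is vegetarian": 0, "mentioned rain today": 0}
long_term = set()

def recall(fact):
    """Count a retrieval; promote the fact once it proves important."""
    short_term[fact] += 1
    if short_term[fact] >= PROMOTE_AFTER:
        long_term.add(fact)

# A repeatedly recalled fact is consolidated; a one-off mention is not.
for _ in range(3):
    recall("user is vegetarian")
recall("mentioned rain today")
```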

Up and running in 5 minutes

Drop-in REST API — no new infrastructure, no vector database, no vendor lock-in

```bash
# 1. Clone the repository
git clone https://github.com/ngt-memory/ngt-memory.git
cd ngt-memory

# 2. Configure environment
cp .env.example .env
# Set OPENAI_API_KEY in .env

# 3. Start the service
docker-compose up -d

# ✓ NGT Memory is running at http://localhost:8000
```
REST API · OpenAPI spec included
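Once the container is up, requests go to the POST /chat endpoint. The field names below (`user_id`, `message`) are illustrative assumptions; the real schema is in the bundled OpenAPI spec:

```python
import json
import urllib.request

# Hypothetical request body; check the bundled OpenAPI spec for the
# actual field names.
payload = {
    "user_id": "alice",  # memory is isolated per user
    "message": "Recommend a restaurant in Kyoto",
}

def chat(payload, base_url="http://localhost:8000"):
    """POST a chat message to the local NGT Memory service."""
    req = urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# response = chat(payload)  # requires the docker-compose stack to be running
```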

Everything you need

Production-ready memory layer with all the features your LLM app needs

Persistent Memory

Stores facts between sessions — users never repeat themselves

2ms Retrieval

Graph + cosine search with no external database required

Drop-in REST API

Integrates into any LLM app in under 5 minutes

Multi-session

Isolated memory per user — scales to thousands of sessions

Docker Ready

One-command deployment: docker-compose up -d

Local-first

Runs entirely on your infrastructure — no cloud dependency

Hebbian Graph

Associative links between concepts, like the human brain

Built-in Analytics

Memory metrics, session stats, retrieval performance

API Key Auth

Optional endpoint protection with configurable API keys

How we compare

NGT Memory is the only solution that requires no external vector database and delivers sub-2ms retrieval

| Feature | NGT Memory ★ | Mem0 | Zep | LangChain Memory |
|---|---|---|---|---|
| Self-hosted | ✓ | ✓ | ✓ | ✓ |
| No vector DB required | ✓ | ✗ | ✗ | ✗ |
| Hebbian graph | ✓ | ✗ | ✗ | ✗ |
| Retrieval latency | 2ms | ~50ms | ~100ms | ~30ms |
| Open source | ✓ | ✓ | ✓ | ✓ |
| REST API | ✓ | ✓ | ✓ | ✗ |

Built for real-world AI applications

From healthcare to consumer apps — NGT Memory makes every LLM application smarter with context

Healthcare

Medical AI Assistant

Remembers allergies, medications, and patient history across sessions. Never gives advice that conflicts with known conditions.

💡 Patient mentioned penicillin allergy 3 sessions ago → avoided in all subsequent recommendations

Consumer

Personal AI Companion

Remembers preferences, plans, and important life events. Grows smarter and more personal with every conversation.

💡 Knows you're vegetarian, live in Berlin, and training for a marathon

Enterprise

Customer Support Bot

Remembers support history, preferences, and past resolutions. No more asking customers to repeat themselves.

💡 Customer contacted support 3 times about billing → context injected automatically

Ready to give your LLM a memory?

Join developers building smarter AI applications with persistent memory. Open source. Self-hosted. Production-ready.

Apache 2.0: free forever
Self-hosted: your data, your control
2ms retrieval: production-ready speed