SillyTavern Extension · System Architecture · v1.7.5

Smart Memory

A multi-tier memory extraction, deduplication, and injection pipeline for long-form LLM roleplay. Hardware-adaptive, model-agnostic, and resilient to mid-flight chat switches and swipes.

status: production
tiers: 4 extraction + dedup + consolidation + injection
profiles: A · Ollama/WebLLM   B · Main/OpenAI
storage: extension_settings + chatMetadata
injection slots: 13 named
01 · Input

⚡ SillyTavern Events

Entry points wired into ST's event bus — solo + group flows + swipe interruption.
Message rendered EV1
  • CHARACTER_MESSAGE_RENDERED
  • makeLast — runs after all other extensions
Chat lifecycle EV2
  • CHAT_CHANGED
  • CHAT_LOADED
  • debounced
Group orchestration EV3
  • GROUP_WRAPPER_STARTED
  • GROUP_MEMBER_DRAFTED
  • GROUP_WRAPPER_FINISHED
Swipe abort EV4
  • MESSAGE_SWIPED
  • → abort in-flight LLM call
dispatch → orchestration scheduler
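The EV4 swipe abort can be sketched with a standard `AbortController`. Names here are hypothetical — the real handlers sit behind ST's event bus — but the cancellation pattern is the same:

```javascript
// Sketch of the EV4 swipe-abort wiring (hypothetical names; the real
// handlers are registered on ST's eventSource bus).
let inFlight = null;

function beginExtraction() {
  inFlight = new AbortController();
  return inFlight.signal; // passed into the fetch()/generation call
}

function onMessageSwiped() {
  if (inFlight) {
    inFlight.abort(); // cancels the in-flight LLM call
    inFlight = null;
  }
}
```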
02 · Control

🎛️ Orchestration — index.js

Solo vs. group flows, message-count gating, and clean abort on mid-operation chat change.
Solo chat flow OR1
  • messagesSinceLastExtraction counter
  • batches every N messages
Group chat flow OR2
  • sceneMessageBuffer · roundResponders
  • per-round batching
CHAT_SWITCHED sentinel OR3
  • clean abort on mid-operation chat change
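One way to implement the OR3 sentinel is a monotonically increasing chat token: each async operation captures the token when it starts, and a chat change bumps it so stale work can bail out before writing results. A minimal sketch, with hypothetical names:

```javascript
// Sketch of the CHAT_SWITCHED sentinel (hypothetical names).
let chatToken = 0;

function onChatChanged() {
  chatToken += 1; // invalidates every operation started before the switch
}

function isStillCurrent(tokenAtStart) {
  // Checked before any storage write; false → discard results silently.
  return tokenAtStart === chatToken;
}
```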
03 · LLM Generation

🤖 generate.js

Backend-agnostic. All reasoning tokens stripped before parsers see output.
Main API GM
  • generateRaw
Ollama GO
  • /api/chat
  • 8192-token budget
OpenAI-compat GO2
  • /v1/chat/completions
WebLLM GW
  • in-browser inference
Think-block stripper GT · ⟨think⟩…⟨/think⟩ removed from every backend response before the extraction parsers see output.
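The stripper reduces to a single non-greedy regex pass. This sketch assumes literal `<think>…</think>` delimiters; the exact tag set may vary per backend:

```javascript
// Minimal think-block stripper in the spirit of GT. Non-greedy so that
// multiple think blocks in one response are each removed.
const THINK_RE = /<think>[\s\S]*?<\/think>/gi;

function stripThinkBlocks(text) {
  return text.replace(THINK_RE, '').trim();
}
```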
04 · Config

⚙️ Hardware Profile

Auto-detected. Caps, thresholds, and policy depend on inferred model class.
Profile A Ollama · WebLLM
  • conservative per-type caps (2/pass)
  • manual continuity only
  • noun-derived triggers only
Profile B Main · OpenAI-compat
  • richer extraction (4/pass)
  • auto continuity check post-turn
  • auto canon regen on arc resolve
  • LLM-suggested context triggers
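The Profile A/B differences above condense to a small policy table. Field names here are illustrative, not the extension's actual schema:

```javascript
// Hypothetical policy table for the hardware profiles; only capPerType is
// stated numerically in the spec, the rest mirror the bullets above.
const PROFILES = {
  A: { capPerType: 2, continuity: 'manual', canonRegen: 'manual', triggers: 'noun-derived' },
  B: { capPerType: 4, continuity: 'auto', canonRegen: 'auto-on-arc-resolve', triggers: 'llm-suggested' },
};

function extractionCap(profile) {
  return PROFILES[profile].capPerType;
}
```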
generation results  ·  extraction prompts dispatched sequentially  ·  profile caps applied
05 · Extraction Pipeline

⚙️ Four Tiers · Sequential

No Promise.all — Ollama serialises requests anyway, and parallel dispatch risks OOM on 8 GB VRAM.
Tier 1 · Compaction · progressive: extends summary, never rewrites
Rolling summary PC · compaction.js
  • Token-threshold check (configurable %)
  • Progressive UPDATE_SUMMARY prompt
  • extends existing summary, does not rewrite
  • summaryEnd index tracks last included message
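The token-threshold gate is a one-line ratio check; the 0.75 default below is an assumed placeholder for the configurable percentage:

```javascript
// Sketch of the Tier 1 compaction gate. thresholdPct is configurable in the
// extension; 0.75 here is an illustrative default.
function shouldCompact(usedTokens, contextTokens, thresholdPct = 0.75) {
  return usedTokens / contextTokens >= thresholdPct;
}
```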
Tier 2 · Scene Layer · scene-break detection drives the downstream epistemic pass
Scene breaks PS · scenes.js
  • heuristic patterns (location, time, cast)
  • + optional AI yes/no check
  • 2–3 sentence mini-summary per scene
  • links source_memory_ids → scene entry
Epistemic map PE · epistemic.js
  • fires once per scene break
  • per-character knowledge map
  • [knows] [suspects] [believes] [unaware]
  • [hiding] (subject → target)
  • stored per-char · per-target
Tier 3 · Batch Extraction · every N messages · feeds dedup pipeline
Session PB1 · session.js
  • [scene] [revelation]
  • [development] [detail]
  • dedup: cosine 0.82 / Jaccard 0.65
  • stored in chatMetadata
Long-term PB2 · longterm.js
  • [fact] [relationship] [preference] [event]
  • + relationship history (trusting·high…)
  • dedup + supersession detection
  • confidence decay over passes
  • extension_settings per character
Arcs PB3 · arcs.js
  • open plot threads
  • arc narrative summaries on resolve
  • persistent arcs survive across chats
  • 100-msg sliding window
  • extension_settings + chatMetadata
State ledger PB4 · state-ledger.js
  • mutable entity state snapshots
  • character: location · mood · injuries · goal
  • object: condition · owner
  • place: occupants · hazards
  • faction: leadership · objective
  • chatMetadata (not persistent)
Tier 4 · Derived Layers · built from upstream tier output, regenerated on a cadence
Profiles PD1 · profiles.js
  • character · world · relationship matrix
  • regenerated every N messages
  • stale after 30 min (configurable)
  • chatMetadata.profiles
Canon PD2 · canon.js
  • stable prose narrative
  • built from resolved arc summaries
  • + long-term memories (confidence ≥ 2)
  • manual trigger
  • extension_settings per character
Continuity PD3 · continuity.js
  • contradiction detection vs. character card + long-term + session
  • optional one-shot repair note
  • auto on Profile B · manual on Profile A
extraction candidates → dedup & supersession
06 · Dedup

🔍 Deduplication — embeddings.js + similarity.js

Embeddings preferred, Jaccard fallback. Classifier decides keep / supersede / re-confirm / drop.
Semantic embeddings DE1
  • Ollama /api/embed
  • or OpenAI /v1/embeddings
  • API key in ST secrets store
Cosine similarity DE2
  • dup ≥ 0.82 · same-topic ≥ 0.55 (Profile A)
  • dup ≥ 0.85 · same-topic ≥ 0.52 (Profile B)
Jaccard fallback DE3
  • word-overlap
  • dup ≥ 0.65 · same-topic ≥ 0.40
  • auto when embeddings unavailable
batchVerify classifier DE4
  • passed (new)
  • superseded (state-change update)
  • uncertain (model confirmation)
  • rejected (duplicate)
Supersession chains DE5
  • valid_from / valid_to message indices
  • retired memories preserved
  • excluded from injection
Confidence decay DE6
  • unconfirmed counter per memory
  • boost + reset on re-extraction
  • drop after 10 unconfirmed passes
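The DE6 counter logic per extraction pass, as a sketch (field names are illustrative; `dropAfter` mirrors the 10-pass limit above):

```javascript
// Sketch of confidence decay: re-extraction boosts confidence and resets the
// unconfirmed counter; otherwise the counter climbs toward the drop limit.
function onExtractionPass(memory, reExtracted, dropAfter = 10) {
  if (reExtracted) {
    memory.confidence += 1; // boost
    memory.unconfirmed = 0; // reset
  } else {
    memory.unconfirmed += 1;
  }
  return memory.unconfirmed < dropAfter; // false → drop the memory
}
```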
In-session vector cache DE7
  • normalized text → float[] vector
  • cleared on chat change
Flow
  • DE1 → DE2 + DE7
  • DE3 ⇢ DE2 (fallback)
  • DE2 → DE4 → DE5 + DE6
  • classified writes → Consolidation
07 · Consolidation

🧩 Consolidation — consolidation.js

Fires after dedup classification, before storage write. Per-type thresholds; LLM merges near-identical entries into richer single ones.
Trigger CN1
  • fires when entry count for a type crosses its threshold
  • per-type counters tracked independently
  • runs after dedup · before storage write
Merge prompt CN2
  • collects candidates for a single type
  • asks model to merge near-identical / redundant entries
  • output: fewer, richer entries
  • replaced entries retired via supersession
Scope CN3
  • runs for both long-term and session memory
  • independent per-tier passes
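The CN1 trigger is a set of independent per-type counters. A minimal sketch — the threshold of 12 is an assumed placeholder for the real per-type settings:

```javascript
// Sketch of the per-type consolidation trigger. Each memory type counts
// independently; crossing the threshold fires a merge pass and resets.
const typeCounts = Object.create(null);

function noteEntry(type, threshold = 12) {
  typeCounts[type] = (typeCounts[type] ?? 0) + 1;
  if (typeCounts[type] >= threshold) {
    typeCounts[type] = 0; // counter resets once a merge pass fires
    return true;          // caller runs the merge prompt for this type
  }
  return false;
}
```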
Long-term thresholds · per type · independent counters
fact CN-L1
  • threshold-gated merge pass
  • stable knowledge synthesis
relationship CN-L2
  • threshold-gated merge pass
  • descriptors collapsed
preference CN-L3
  • threshold-gated merge pass
  • duplicates folded
event CN-L4
  • threshold-gated merge pass
  • co-occurring events combined
Session thresholds · per type · independent counters
scene CN-S1
  • threshold-gated merge pass
  • contiguous scenes folded
revelation CN-S2
  • threshold-gated merge pass
  • overlapping reveals merged
development CN-S3
  • threshold-gated merge pass
  • arc beats coalesced
detail CN-S4
  • threshold-gated merge pass
  • redundant details collapsed
08 · Storage

💾 Two-Tier Storage

Persistent identity vs. ephemeral session state. Reset semantics differ by tier.
Persistent · extension_settings · survives all sessions, all chats
Long-term memories SP1
  • fact · relationship · preference · event
Relationship history SP2
  • descriptor + magnitude pairs
  • per character pair
Entity registry SP3
  • name · type · aliases
  • state card templates
Epistemic knowledge SP4
  • knows · suspects · believes · unaware · hiding
  • per character · per target
Persistent arcs SP5
  • cross-chat open threads
Canon document SP6
  • stable narrative prose
Per-chat · chatMetadata · per conversation · reset on Fresh Start
Short-term summary SC1
  • rolling progressive compaction
Session memories SC2
  • granular within-session details
Scene history SC3
  • ordered mini-summaries
Chat-scoped arcs SC4
  • open threads this conversation
State ledger SC5
  • mutable entity snapshots
Character profiles SC6
  • character · world · relations matrix
09 · Migrations

🔄 Schema Migration

graph-migration.js — never destructive.
Versioned store MIG
  • SCHEMA_VERSION stored per-character + per-chat
  • CHARACTER_MIGRATIONS + CHAT_MIGRATIONS registries
  • applied sequentially on every chat / character load
  • never removes steps — old chats upgradeable from v0
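The sequential, append-only runner can be sketched as below. The two steps shown are hypothetical stand-ins for the real CHAT_MIGRATIONS entries; the key property is that every step bumps the stored version, so a v0 chat replays the whole chain:

```javascript
// Hypothetical migration registry + sequential runner. Steps are never
// removed, so any old chat can be upgraded from v0 to current.
const CHAT_MIGRATIONS = [
  (data) => ({ ...data, memories: data.memories ?? [], version: 1 }),
  (data) => ({ ...data, scenes: data.scenes ?? [], version: 2 }),
];

function migrateChat(data) {
  let out = { version: 0, ...data };
  while (out.version < CHAT_MIGRATIONS.length) {
    out = CHAT_MIGRATIONS[out.version](out); // each step bumps out.version
  }
  return out;
}
```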
10 · Token Mgmt

📊 Budgets

Sliders TK1
  • 10 per-tier budget sliders
  • Simple mode: shared total cap
  • Advanced mode: independent per-tier
Trim stats TK2
  • injected vs full token counts
  • one-time trim toast (post-load only)
  • short-term exempt (self-corrects)
Auto-tune TK3
  • demand-driven adjustment
  • observed trim stats × 1.15 headroom
  • snap to nearest 50 tokens
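The TK3 arithmetic in one function: observed trimmed demand scaled by the 1.15 headroom factor, snapped to the nearest 50-token step:

```javascript
// Auto-tune: demand-driven budget = observed demand × headroom, rounded to
// the nearest step (50 tokens per the spec).
function autoTuneBudget(observedTokens, headroom = 1.15, step = 50) {
  return Math.round((observedTokens * headroom) / step) * step;
}
```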
Adaptive budgets TK4
  • turn classifier: dialogue · action · transition · intimate
  • per-tier multipliers applied per turn
read on inject · budgets clamped · 13 named slots populated
11 · Injection

💉 setExtensionPrompt · 13 named slots

Anchored slots vs. depth-relative slots vs. unified single-block mode.
IN_PROMPT · anchored at character-card depth · 4 slots
smart_memory_short IJ1
  • summary
smart_memory_long IJ2
  • long-term memories
smart_memory_profiles IJ8 · depth 1
  • entity snapshots
smart_memory_canon IJ9 · depth 0
  • narrative document
IN_CHAT · depth-relative to current message · 8 slots
smart_memory_triggered IJ3 · depth 4
  • contextual relevance reinjection
  • memories overlapping current turn
smart_memory_session IJ4
  • session memories
smart_memory_scenes IJ5 · depth 6
  • scene history
smart_memory_arcs IJ6 · depth 2
  • story arcs
smart_memory_relationships IJ7 · depth 5
  • relationship state
smart_memory_epistemic IJ10 · depth 1
  • character knowledge map
  • injected per responding char only
smart_memory_state_ledger IJ11 · depth 1
  • mutable entity state
smart_memory_repair IJ12 · depth 0
  • one-shot continuity correction
Unified Mode · optional · replaces all individual slots
smart_memory_unified IJ13 · IN_PROMPT · depth 0
  • single merged context block
  • Canon → Profiles → Long-term → Short-term → Scenes → Session → Arcs
  • content cache bridges infrequent tiers
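The IJ13 merge order reduces to an ordered join over the content cache. Cache keys here are illustrative stand-ins for the extension's internal names:

```javascript
// Sketch of the unified single-block assembly: fixed tier order, empty tiers
// skipped, sections separated by blank lines.
const UNIFIED_ORDER = ['canon', 'profiles', 'longterm', 'shortterm', 'scenes', 'session', 'arcs'];

function buildUnifiedBlock(cache) {
  return UNIFIED_ORDER
    .map((tier) => cache[tier])
    .filter(Boolean) // tiers with no cached content are skipped
    .join('\n\n');
}
```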
12 · Macros

🔧 macros.js

{{smartmemory-*}} tokens that resolve at prompt-assembly time.
Token registry MA1
  • 11 {{smartmemory-*}} tokens
  • 10 per-tier + {{smartmemory-unified}}
Auto-detection MA2a
  • scans character card fields for {{smartmemory-*}} tokens
  • system_prompt · description · personality · scenario · mes_example
  • activates only the slots whose tokens are present
Force macro injection mode MA2b
  • user-toggled setting
  • bypasses auto-detection
  • activates all macros unconditionally — for tokens placed in instruct templates
Cache bridge MA3
  • content cache updated by inject functions
  • macro always returns latest output
  • individual macros inactive when unified is on
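The MA3 cache bridge can be sketched as follows. `register` stands in for ST's macro-registration hook (an assumption, not the exact API); the cache itself is written by the inject functions, so each macro resolves to the latest tier output at prompt-assembly time:

```javascript
// Sketch of the macro cache bridge: macros read lazily from a content cache
// that the inject functions keep up to date.
const contentCache = new Map();

function registerTierMacro(register, tier) {
  register(`smartmemory-${tier}`, () => contentCache.get(tier) ?? '');
}
```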
13 · Auxiliary

🌅 Recap & 🧪 Model Test

Out-of-band UX surfaces — recap on return, validation harness for model selection.
Away Recap RECAP · recap.js
  • tracks lastActive timestamp per chat
  • on return after threshold hours:
  • generates "Previously on…" summary
  • displayed as dismissible modal overlay
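The recap gate is a timestamp comparison; the 6-hour default below is an assumed placeholder for the configurable threshold:

```javascript
// Sketch of the away-recap gate: show "Previously on…" only if the chat has
// been idle longer than the threshold.
function shouldShowRecap(lastActiveMs, nowMs, thresholdHours = 6) {
  return (nowMs - lastActiveMs) / 3_600_000 >= thresholdHours;
}
```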
Extraction Model Test MTEST · model-test.js
  • 3 fixed scenarios — all 5 extractors always run
  • Main: 30-message fantasy investigation → long-term · session · arcs
  • Epistemic: village healer scene → Perspectives + Secrets
  • State: dungeon heist excerpt → State Ledger
  • shows raw model output + quality hints
  • never writes to session or memories
slots populated · macros resolved · recap modal (if active) attached
⟶  Output  ⟵
🧠 Final Prompt → LLM Model
13 named slots · resolved macros · clamped budgets · optional repair note · optional recap overlay