SillyTavern Extension · System Architecture · v1.7.5

Smart Memory

A multi-tier memory extraction, deduplication, and injection pipeline for long-form LLM roleplay. Hardware-adaptive, model-agnostic, and resilient to mid-flight chat switches and swipes.

status: production
tiers: 4 extraction + dedup + consolidation + injection
profiles: A · Ollama/WebLLM   B · Main/OpenAI
storage: extension_settings + chatMetadata
injection slots: 13 named
01 · Input

⚡ SillyTavern Events

Entry points wired into ST's event bus — solo + group flows + swipe interruption.
Message rendered EV1
  • CHARACTER_MESSAGE_RENDERED
  • makeLast — runs after all other extensions
Chat lifecycle EV2
  • CHAT_CHANGED
  • CHAT_LOADED
  • debounced
Group orchestration EV3
  • GROUP_WRAPPER_STARTED
  • GROUP_MEMBER_DRAFTED
  • GROUP_WRAPPER_FINISHED
Swipe abort EV4
  • MESSAGE_SWIPED
  • → abort in-flight LLM call
dispatch → orchestration scheduler
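The EV4 swipe abort can be sketched with a standard `AbortController`. Names here are hypothetical — the real handlers sit behind ST's event bus — but the cancellation pattern is the same:

```javascript
// Sketch of the EV4 swipe-abort wiring (hypothetical names; the real
// handlers are registered on ST's eventSource bus).
let inFlight = null;

function beginExtraction() {
  inFlight = new AbortController();
  return inFlight.signal; // passed into the fetch()/generation call
}

function onMessageSwiped() {
  if (inFlight) {
    inFlight.abort(); // cancels the in-flight LLM call
    inFlight = null;
  }
}
```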
02 · Control

🎛️ Orchestration — index.js

Solo vs. group flows, message-count gating, and clean abort on mid-operation chat change.
Solo chat flow OR1
  • messagesSinceLastExtraction counter
  • batches every N messages
Group chat flow OR2
  • sceneMessageBuffer · roundResponders
  • per-round batching
CHAT_SWITCHED sentinel OR3
  • clean abort on mid-operation chat change
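One way to implement the OR3 sentinel is a monotonically increasing chat token: each async operation captures the token when it starts, and a chat change bumps it so stale work can bail out before writing results. A minimal sketch, with hypothetical names:

```javascript
// Sketch of the CHAT_SWITCHED sentinel (hypothetical names).
let chatToken = 0;

function onChatChanged() {
  chatToken += 1; // invalidates every operation started before the switch
}

function isStillCurrent(tokenAtStart) {
  // Checked before any storage write; false → discard results silently.
  return tokenAtStart === chatToken;
}
```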
03 · LLM Generation

🤖 generate.js

Backend-agnostic. All reasoning tokens stripped before parsers see output.
Main API GM
  • generateRaw
Ollama GO
  • /api/chat
  • 8192-token budget
OpenAI-compat GO2
  • /v1/chat/completions
WebLLM GW
  • in-browser inference
Think-block stripper GT · ⟨think⟩…⟨/think⟩ removed from every backend response before the extraction parsers see output.
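The stripper reduces to a single non-greedy regex pass. This sketch assumes literal `<think>…</think>` delimiters; the exact tag set may vary per backend:

```javascript
// Minimal think-block stripper in the spirit of GT. Non-greedy so that
// multiple think blocks in one response are each removed.
const THINK_RE = /<think>[\s\S]*?<\/think>/gi;

function stripThinkBlocks(text) {
  return text.replace(THINK_RE, '').trim();
}
```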
04 · Config

⚙️ Hardware Profile

Auto-detected. Caps, thresholds, and policy depend on inferred model class.
Profile A Ollama · WebLLM
  • conservative per-type caps (2/pass)
  • manual continuity only
  • noun-derived triggers only
Profile B Main · OpenAI-compat
  • richer extraction (4/pass)
  • auto continuity check post-turn
  • auto canon regen on arc resolve
  • LLM-suggested context triggers
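The Profile A/B differences above condense to a small policy table. Field names here are illustrative, not the extension's actual schema:

```javascript
// Hypothetical policy table for the hardware profiles; only capPerType is
// stated numerically in the spec, the rest mirror the bullets above.
const PROFILES = {
  A: { capPerType: 2, continuity: 'manual', canonRegen: 'manual', triggers: 'noun-derived' },
  B: { capPerType: 4, continuity: 'auto', canonRegen: 'auto-on-arc-resolve', triggers: 'llm-suggested' },
};

function extractionCap(profile) {
  return PROFILES[profile].capPerType;
}
```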
generation results  ·  extraction prompts dispatched sequentially  ·  profile caps applied
05 · Extraction Pipeline

⚙️ Four Tiers · Sequential

No Promise.all — Ollama serialises requests anyway, and parallel dispatch risks OOM on 8 GB VRAM.
Tier 1 · Compaction · progressive: extends summary, never rewrites
Rolling summary PC · compaction.js
  • Token-threshold check (configurable %)
  • Progressive UPDATE_SUMMARY prompt
  • extends existing summary, does not rewrite
  • summaryEnd index tracks last included message
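The token-threshold gate is a one-line ratio check; the 0.75 default below is an assumed placeholder for the configurable percentage:

```javascript
// Sketch of the Tier 1 compaction gate. thresholdPct is configurable in the
// extension; 0.75 here is an illustrative default.
function shouldCompact(usedTokens, contextTokens, thresholdPct = 0.75) {
  return usedTokens / contextTokens >= thresholdPct;
}
```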
Tier 2 · Scene Layer · scene-break detection drives the downstream epistemic pass
Scene breaks PS · scenes.js
  • heuristic patterns (location, time, cast)
  • + optional AI yes/no check
  • 2–3 sentence mini-summary per scene
  • links source_memory_ids → scene entry
Epistemic map PE · epistemic.js
  • fires once per scene break
  • per-character knowledge map
  • [knows] [suspects] [believes] [unaware]
  • [hiding] (subject → target)
  • stored per-char · per-target
Tier 3 · Batch Extraction · every N messages · feeds dedup pipeline
Session PB1 · session.js
  • [scene] [revelation]
  • [development] [detail]
  • dedup: cosine 0.82 / Jaccard 0.65
  • stored in chatMetadata
Long-term PB2 · longterm.js
  • [fact] [relationship] [preference] [event]
  • + relationship history (trusting·high…)
  • dedup + supersession detection
  • confidence decay over passes
  • extension_settings per character
Arcs PB3 · arcs.js
  • open plot threads
  • arc narrative summaries on resolve
  • persistent arcs survive across chats
  • 100-msg sliding window
  • extension_settings + chatMetadata
State ledger PB4 · state-ledger.js
  • mutable entity state snapshots
  • character: location · mood · injuries · goal
  • object: condition · owner
  • place: occupants · hazards
  • faction: leadership · objective
  • chatMetadata (not persistent)
Tier 4 · Derived Layers · built from upstream tier output, regenerated on a cadence
Profiles PD1 · profiles.js
  • character · world · relationship matrix
  • regenerated every N messages
  • stale after 30 min (configurable)
  • chatMetadata.profiles
Canon PD2 · canon.js
  • stable prose narrative
  • built from resolved arc summaries
  • + long-term memories (confidence ≥ 2)
  • manual trigger
  • extension_settings per character
Continuity PD3 · continuity.js
  • contradiction detection vs. character card + long-term + session
  • optional one-shot repair note
  • auto on Profile B · manual on Profile A
extraction candidates → dedup & supersession
06 · Dedup

🔍 Deduplication — embeddings.js + similarity.js

Embeddings preferred, Jaccard fallback. Classifier decides keep / supersede / re-confirm / drop.
Semantic embeddings DE1
  • Ollama /api/embed
  • or OpenAI /v1/embeddings
  • API key in ST secrets store
Cosine similarity DE2
  • dup ≥ 0.82 · same-topic ≥ 0.55 (Profile A)
  • dup ≥ 0.85 · same-topic ≥ 0.52 (Profile B)
Jaccard fallback DE3
  • word-overlap
  • dup ≥ 0.65 · same-topic ≥ 0.40
  • auto when embeddings unavailable
batchVerify classifier DE4
  • passed (new)
  • superseded (state-change update)
  • uncertain (model confirmation)
  • rejected (duplicate)
Supersession chains DE5
  • valid_from / valid_to message indices
  • retired memories preserved
  • excluded from injection
Confidence decay DE6
  • unconfirmed counter per memory
  • boost + reset on re-extraction
  • drop after 10 unconfirmed passes
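The DE6 counter logic per extraction pass, as a sketch (field names are illustrative; `dropAfter` mirrors the 10-pass limit above):

```javascript
// Sketch of confidence decay: re-extraction boosts confidence and resets the
// unconfirmed counter; otherwise the counter climbs toward the drop limit.
function onExtractionPass(memory, reExtracted, dropAfter = 10) {
  if (reExtracted) {
    memory.confidence += 1; // boost
    memory.unconfirmed = 0; // reset
  } else {
    memory.unconfirmed += 1;
  }
  return memory.unconfirmed < dropAfter; // false → drop the memory
}
```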
In-session vector cache DE7
  • normalized text → float[] vector
  • cleared on chat change
Flow
  • DE1 → DE2 + DE7
  • DE3 ⇢ DE2 (fallback)
  • DE2 → DE4 → DE5 + DE6
  • classified writes → Consolidation
07 · Consolidation

🧩 Consolidation — consolidation.js

Fires after dedup classification, before storage write. Per-type thresholds; LLM merges near-identical entries into richer single ones.
Trigger CN1
  • fires when entry count for a type crosses its threshold
  • per-type counters tracked independently
  • runs after dedup · before storage write
Merge prompt CN2
  • collects candidates for a single type
  • asks model to merge near-identical / redundant entries
  • output: fewer, richer entries
  • replaced entries retired via supersession
Scope CN3
  • runs for both long-term and session memory
  • independent per-tier passes
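The CN1 trigger is a set of independent per-type counters. A minimal sketch — the threshold of 12 is an assumed placeholder for the real per-type settings:

```javascript
// Sketch of the per-type consolidation trigger. Each memory type counts
// independently; crossing the threshold fires a merge pass and resets.
const typeCounts = Object.create(null);

function noteEntry(type, threshold = 12) {
  typeCounts[type] = (typeCounts[type] ?? 0) + 1;
  if (typeCounts[type] >= threshold) {
    typeCounts[type] = 0; // counter resets once a merge pass fires
    return true;          // caller runs the merge prompt for this type
  }
  return false;
}
```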
Long-term thresholds · per type · independent counters
fact CN-L1
  • threshold-gated merge pass
  • stable knowledge synthesis
relationship CN-L2
  • threshold-gated merge pass
  • descriptors collapsed
preference CN-L3
  • threshold-gated merge pass
  • duplicates folded
event CN-L4
  • threshold-gated merge pass
  • co-occurring events combined
Session thresholds · per type · independent counters
scene CN-S1
  • threshold-gated merge pass
  • contiguous scenes folded
revelation CN-S2
  • threshold-gated merge pass
  • overlapping reveals merged
development CN-S3
  • threshold-gated merge pass
  • arc beats coalesced
detail CN-S4
  • threshold-gated merge pass
  • redundant details collapsed
08 · Storage

💾 Two-Tier Storage

Persistent identity vs. ephemeral session state. Reset semantics differ by tier.
Persistent · extension_settings · survives all sessions, all chats
Long-term memories SP1
  • fact · relationship · preference · event
Relationship history SP2
  • descriptor + magnitude pairs
  • per character pair
Entity registry SP3
  • name · type · aliases
  • state card templates
Epistemic knowledge SP4
  • knows · suspects · believes · unaware · hiding
  • per character · per target
Persistent arcs SP5
  • cross-chat open threads
Canon document SP6
  • stable narrative prose
Per-chat · chatMetadata · per conversation · reset on Fresh Start
Short-term summary SC1
  • rolling progressive compaction
Session memories SC2
  • granular within-session details
Scene history SC3
  • ordered mini-summaries
Chat-scoped arcs SC4
  • open threads this conversation
State ledger SC5
  • mutable entity snapshots
Character profiles SC6
  • character · world · relations matrix
09 · Migrations

🔄 Schema Migration

graph-migration.js — never destructive.
Versioned store MIG
  • SCHEMA_VERSION stored per-character + per-chat
  • CHARACTER_MIGRATIONS + CHAT_MIGRATIONS registries
  • applied sequentially on every chat / character load
  • never removes steps — old chats upgradeable from v0
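The sequential, append-only runner can be sketched as below. The two steps shown are hypothetical stand-ins for the real CHAT_MIGRATIONS entries; the key property is that every step bumps the stored version, so a v0 chat replays the whole chain:

```javascript
// Hypothetical migration registry + sequential runner. Steps are never
// removed, so any old chat can be upgraded from v0 to current.
const CHAT_MIGRATIONS = [
  (data) => ({ ...data, memories: data.memories ?? [], version: 1 }),
  (data) => ({ ...data, scenes: data.scenes ?? [], version: 2 }),
];

function migrateChat(data) {
  let out = { version: 0, ...data };
  while (out.version < CHAT_MIGRATIONS.length) {
    out = CHAT_MIGRATIONS[out.version](out); // each step bumps out.version
  }
  return out;
}
```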
10 · Token Mgmt

📊 Budgets

Sliders TK1
  • 10 per-tier budget sliders
  • Simple mode: shared total cap
  • Advanced mode: independent per-tier
Trim stats TK2
  • injected vs full token counts
  • one-time trim toast (post-load only)
  • short-term exempt (self-corrects)
Auto-tune TK3
  • demand-driven adjustment
  • observed trim stats × 1.15 headroom
  • snap to nearest 50 tokens
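The TK3 arithmetic in one function: observed trimmed demand scaled by the 1.15 headroom factor, snapped to the nearest 50-token step:

```javascript
// Auto-tune: demand-driven budget = observed demand × headroom, rounded to
// the nearest step (50 tokens per the spec).
function autoTuneBudget(observedTokens, headroom = 1.15, step = 50) {
  return Math.round((observedTokens * headroom) / step) * step;
}
```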
Adaptive budgets TK4
  • turn classifier: dialogue · action · transition · intimate
  • per-tier multipliers applied per turn
read on inject · budgets clamped · 13 named slots populated
11 · Injection

💉 setExtensionPrompt · 13 named slots

Anchored slots vs. depth-relative slots vs. unified single-block mode.
IN_PROMPT · anchored at character-card depth · 4 slots
smart_memory_short IJ1
  • summary
smart_memory_long IJ2
  • long-term memories
smart_memory_profiles IJ8 · depth 1
  • entity snapshots
smart_memory_canon IJ9 · depth 0
  • narrative document
IN_CHAT · depth-relative to current message · 8 slots
smart_memory_triggered IJ3 · depth 4
  • contextual relevance reinjection
  • memories overlapping current turn
smart_memory_session IJ4
  • session memories
smart_memory_scenes IJ5 · depth 6
  • scene history
smart_memory_arcs IJ6 · depth 2
  • story arcs
smart_memory_relationships IJ7 · depth 5
  • relationship state
smart_memory_epistemic IJ10 · depth 1
  • character knowledge map
  • injected per responding char only
smart_memory_state_ledger IJ11 · depth 1
  • mutable entity state
smart_memory_repair IJ12 · depth 0
  • one-shot continuity correction
Unified Mode · optional · replaces all individual slots
smart_memory_unified IJ13 · IN_PROMPT · depth 0
  • single merged context block
  • Canon → Profiles → Long-term → Short-term → Scenes → Session → Arcs
  • content cache bridges infrequent tiers
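The IJ13 merge order reduces to an ordered join over the content cache. Cache keys here are illustrative stand-ins for the extension's internal names:

```javascript
// Sketch of the unified single-block assembly: fixed tier order, empty tiers
// skipped, sections separated by blank lines.
const UNIFIED_ORDER = ['canon', 'profiles', 'longterm', 'shortterm', 'scenes', 'session', 'arcs'];

function buildUnifiedBlock(cache) {
  return UNIFIED_ORDER
    .map((tier) => cache[tier])
    .filter(Boolean) // tiers with no cached content are skipped
    .join('\n\n');
}
```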
12 · Macros

🔧 macros.js

{{smartmemory-*}} tokens that resolve at prompt-assembly time.
Token registry MA1
  • 11 {{smartmemory-*}} tokens
  • 10 per-tier + {{smartmemory-unified}}
Auto-detection MA2a
  • scans character card fields for {{smartmemory-*}} tokens
  • system_prompt · description · personality · scenario · mes_example
  • activates only the slots whose tokens are present
Force macro injection mode MA2b
  • user-toggled setting
  • bypasses auto-detection
  • activates all macros unconditionally — for tokens placed in instruct templates
Cache bridge MA3
  • content cache updated by inject functions
  • macro always returns latest output
  • individual macros inactive when unified is on
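The MA3 cache bridge can be sketched as follows. `register` stands in for ST's macro-registration hook (an assumption, not the exact API); the cache itself is written by the inject functions, so each macro resolves to the latest tier output at prompt-assembly time:

```javascript
// Sketch of the macro cache bridge: macros read lazily from a content cache
// that the inject functions keep up to date.
const contentCache = new Map();

function registerTierMacro(register, tier) {
  register(`smartmemory-${tier}`, () => contentCache.get(tier) ?? '');
}
```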
13 · Auxiliary

🌅 Recap & 🧪 Model Test

Out-of-band UX surfaces — recap on return, validation harness for model selection.
Away Recap RECAP · recap.js
  • tracks lastActive timestamp per chat
  • on return after threshold hours:
  • generates "Previously on…" summary
  • displayed as dismissible modal overlay
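The recap gate is a timestamp comparison; the 6-hour default below is an assumed placeholder for the configurable threshold:

```javascript
// Sketch of the away-recap gate: show "Previously on…" only if the chat has
// been idle longer than the threshold.
function shouldShowRecap(lastActiveMs, nowMs, thresholdHours = 6) {
  return (nowMs - lastActiveMs) / 3_600_000 >= thresholdHours;
}
```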
Extraction Model Test MTEST · model-test.js
  • 3 fixed scenarios — all 5 extractors always run
  • Main: 30-message fantasy investigation → long-term · session · arcs
  • Epistemic: village healer scene → Perspectives + Secrets
  • State: dungeon heist excerpt → State Ledger
  • shows raw model output + quality hints
  • never writes to session or memories
slots populated · macros resolved · recap modal (if active) attached
⟶  Output  ⟵
🧠 Final Prompt → LLM Model
13 named slots · resolved macros · clamped budgets · optional repair note · optional recap overlay