01 · Input
⚡ SillyTavern Events
Entry points wired into ST's event bus — solo + group flows + swipe interruption.
Message rendered EV1
- CHARACTER_MESSAGE_RENDERED
- makeLast — runs after all other extensions
Chat lifecycle EV2
- CHAT_CHANGED
- CHAT_LOADED
- debounced
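The chat-lifecycle handlers above are debounced so rapid CHAT_CHANGED / CHAT_LOADED bursts collapse into one run. A minimal trailing-edge debounce in that style (the 250 ms delay is an illustrative value, not the extension's actual setting):

```javascript
// Trailing-edge debounce: only the last call within the window fires.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Hypothetical wrapped handler for chat-lifecycle events.
const onChatChanged = debounce((chatId) => {
  console.log(`chat changed: ${chatId}`);
}, 250);
```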
Group orchestration EV3
- GROUP_WRAPPER_STARTED
- GROUP_MEMBER_DRAFTED
- GROUP_WRAPPER_FINISHED
Swipe abort EV4
- MESSAGE_SWIPED
- → abort in-flight LLM call
↓dispatch → orchestration scheduler
02 · Control
🎛️ Orchestration — index.js
Solo vs. group flows, message-count gating, and clean abort on mid-operation chat change.
Solo chat flow OR1
- messagesSinceLastExtraction counter
- batches every N messages
Group chat flow OR2
- sceneMessageBuffer · roundResponders
- per-round batching
CHAT_SWITCHED sentinel OR3
- clean abort on mid-operation chat change
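A sketch of the two mechanics above, assuming a simple counter gate and a token-style sentinel; the batch size and the token comparison are illustrative, not the module's actual implementation:

```javascript
const EXTRACT_EVERY_N = 4; // hypothetical batch size

let messagesSinceLastExtraction = 0;
let chatToken = 0; // bumped on chat change so stale passes can bail

// Called on the CHAT_SWITCHED sentinel path.
function onChatSwitched() {
  chatToken++;
  messagesSinceLastExtraction = 0;
}

// Message-count gate: extract only every N rendered messages.
function shouldExtract() {
  messagesSinceLastExtraction++;
  if (messagesSinceLastExtraction < EXTRACT_EVERY_N) return false;
  messagesSinceLastExtraction = 0;
  return true;
}

// An in-flight pass captures chatToken before its LLM call and
// discards results if the chat changed mid-operation.
function isStale(capturedToken) {
  return capturedToken !== chatToken;
}
```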
03 · LLM Generation
🤖 generate.js
Backend-agnostic. All reasoning tokens stripped before parsers see output.
Ollama GO
- /api/chat
- 8192-token budget
Think-block stripper · GT
⟨think⟩…⟨/think⟩ removed from every backend response before the extraction parsers see output.
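A minimal stripper in this spirit, assuming the common literal `<think>…</think>` tag form:

```javascript
// Removes reasoning blocks from raw backend output so the
// extraction parsers never see them.
function stripThinkBlocks(text) {
  return text.replace(/<think>[\s\S]*?<\/think>/gi, '').trim();
}
```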
04 · Config
⚙️ Hardware Profile
Auto-detected. Caps, thresholds, and policy depend on inferred model class.
Profile A Ollama · WebLLM
- conservative per-type caps (2/pass)
- manual continuity only
- noun-derived triggers only
Profile B Main · OpenAI-compat
- richer extraction (4/pass)
- auto continuity check post-turn
- auto canon regen on arc resolve
- LLM-suggested context triggers
↓generation results · extraction prompts dispatched sequentially · profile caps applied
05 · Extraction Pipeline
⚙️ Four Tiers · Sequential
No Promise.all. Ollama serialises; parallel risks OOM on 8 GB VRAM.
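The sequential design can be sketched as a plain await-in-loop runner: exactly one extraction prompt is in flight at a time, unlike `Promise.all`, which would dispatch all tiers at once.

```javascript
// Runs tier functions strictly one after another, in order.
async function runTiers(tiers, ctx) {
  const results = [];
  for (const tier of tiers) {
    results.push(await tier(ctx)); // next tier starts only after this resolves
  }
  return results;
}
```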
Tier 1 · Compaction
progressive · extends summary, never rewrites
Rolling summary PC · compaction.js
- Token-threshold check (configurable %)
- Progressive UPDATE_SUMMARY prompt
- extends existing summary, does not rewrite
- summaryEnd index tracks last included message
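The token-threshold gate above can be sketched as follows; the chars-per-token heuristic and the 75% default are assumptions for illustration:

```javascript
const COMPACT_AT_PCT = 0.75; // configurable threshold percentage

// Rough token estimate; real code would use the backend's tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Only messages after the summaryEnd index count toward the gate.
function needsCompaction(messages, summaryEnd, budgetTokens) {
  const pending = messages.slice(summaryEnd + 1);
  const used = pending.reduce((n, m) => n + estimateTokens(m), 0);
  return used >= budgetTokens * COMPACT_AT_PCT;
}
```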
Tier 2 · Scene Layer
scene-break detection drives downstream epistemic pass
Scene breaks PS · scenes.js
- heuristic patterns (location, time, cast)
- + optional AI yes/no check
- 2–3 sentence mini-summary per scene
- links source_memory_ids → scene entry
Epistemic map PE · epistemic.js
- fires once per scene break
- per-character knowledge map
- [knows] [suspects] [believes] [unaware]
- [hiding] (subject → target)
- stored per-char · per-target
Tier 3 · Batch Extraction
every N messages · feeds dedup pipeline
Session PB1 · session.js
- [scene] [revelation]
- [development] [detail]
- dedup: cosine 0.82 / Jaccard 0.65
- stored in chatMetadata
Long-term PB2 · longterm.js
- [fact] [relationship] [preference] [event]
- + relationship history (trusting·high…)
- dedup + supersession detection
- confidence decay over passes
- extension_settings per character
Arcs PB3 · arcs.js
- open plot threads
- arc narrative summaries on resolve
- persistent arcs survive across chats
- 100-msg sliding window
- extension_settings + chatMetadata
State ledger PB4 · state-ledger.js
- mutable entity state snapshots
- character: location · mood · injuries · goal
- object: condition · owner
- place: occupants · hazards
- faction: leadership · objective
- chatMetadata (not persistent)
Tier 4 · Derived Layers
built from upstream tier output, regenerated on cadence
Profiles PD1 · profiles.js
- character · world · relationship matrix
- regenerated every N messages
- stale after 30 min (configurable)
- chatMetadata.profiles
Canon PD2 · canon.js
- stable prose narrative
- built from resolved arc summaries
- + long-term memories (confidence ≥ 2)
- manual trigger
- extension_settings per character
Continuity PD3 · continuity.js
- contradiction detection vs.
- character card + long-term + session
- optional one-shot repair note
- auto on Profile B · manual on Profile A
↓extraction candidates → dedup & supersession
06 · Dedup
🔍 Deduplication — embeddings.js + similarity.js
Embeddings preferred, Jaccard fallback. Classifier decides keep / supersede / re-confirm / drop.
Semantic embeddings DE1
- Ollama /api/embed
- or OpenAI /v1/embeddings
- API key in ST secrets store
Cosine similarity DE2
- dup ≥ 0.82 · same-topic ≥ 0.55 (Profile A)
- dup ≥ 0.85 · same-topic ≥ 0.52 (Profile B)
Jaccard fallback DE3
- word-overlap
- dup ≥ 0.65 · same-topic ≥ 0.40
- auto when embeddings unavailable
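Minimal versions of the two similarity paths, with the Profile A duplicate thresholds quoted above:

```javascript
// Cosine similarity over embedding vectors (DE2 path).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Word-overlap Jaccard similarity (DE3 fallback path).
function jaccard(a, b) {
  const wa = new Set(a.toLowerCase().split(/\s+/));
  const wb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...wa].filter((w) => wb.has(w)).length;
  return inter / (wa.size + wb.size - inter);
}

const DUP_COSINE = 0.82;  // Profile A duplicate threshold
const DUP_JACCARD = 0.65; // fallback duplicate threshold
```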
batchVerify classifier DE4
- passed (new)
- superseded (state-change update)
- uncertain (model confirmation)
- rejected (duplicate)
Supersession chains DE5
- valid_from / valid_to message indices
- retired memories preserved
- excluded from injection
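A sketch of a supersession link in that style; the `supersededBy` chain pointer is an assumed field name, but the validity-window and retire-don't-delete semantics follow the notes above:

```javascript
// Retire the old entry and open a validity window on the new one.
function supersede(oldMem, newMem, messageIndex) {
  oldMem.valid_to = messageIndex;   // retired as of this message
  oldMem.supersededBy = newMem.id;  // chain pointer (assumed field)
  newMem.valid_from = messageIndex;
  newMem.valid_to = null;           // open-ended until superseded
  return newMem;
}

// Injection only considers currently-valid entries; retired
// memories stay in storage but are excluded here.
function activeMemories(memories) {
  return memories.filter((m) => m.valid_to == null);
}
```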
Confidence decay DE6
- unconfirmed counter per memory
- boost + reset on re-extraction
- drop after 10 unconfirmed passes
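The decay mechanics can be sketched as one pass over stored memories; the confidence-boost increment is an assumption, while the drop limit of 10 is from the notes above:

```javascript
const DROP_AFTER = 10; // drop after this many unconfirmed passes

// One extraction pass: memories re-surfaced by the model get a
// confidence boost and counter reset; the rest decay. Returns the
// surviving set.
function decayPass(memories, reExtractedIds) {
  for (const m of memories) {
    if (reExtractedIds.has(m.id)) {
      m.confidence = (m.confidence ?? 1) + 1; // boost
      m.unconfirmed = 0;                      // reset
    } else {
      m.unconfirmed = (m.unconfirmed ?? 0) + 1;
    }
  }
  return memories.filter((m) => (m.unconfirmed ?? 0) < DROP_AFTER);
}
```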
In-session vector cache DE7
- normalized text → float[] vector
- cleared on chat change
Flow
- DE1 → DE2 + DE7
- DE3 ⇢ DE2 (fallback)
- DE2 → DE4 → DE5 + DE6
- classified writes → Consolidation
07 · Consolidation
🧩 Consolidation — consolidation.js
Fires after dedup classification, before storage write. Per-type thresholds; the LLM merges near-identical entries into fewer, richer ones.
Trigger CN1
- fires when entry count for a type crosses its threshold
- per-type counters tracked independently
- runs after dedup · before storage write
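A sketch of the per-type counter gate; the threshold values here are illustrative, not the extension's real defaults:

```javascript
const THRESHOLDS = { fact: 12, relationship: 8, event: 10 }; // illustrative
const counters = {}; // independent counter per memory type

// Called on each stored entry; returns true when the caller should
// run the merge prompt for that type.
function recordEntry(type) {
  counters[type] = (counters[type] ?? 0) + 1;
  if (counters[type] >= (THRESHOLDS[type] ?? Infinity)) {
    counters[type] = 0; // reset after triggering a consolidation pass
    return true;
  }
  return false;
}
```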
Merge prompt CN2
- collects candidates for a single type
- asks model to merge near-identical / redundant entries
- output: fewer, richer entries
- replaced entries retired via supersession
Scope CN3
- runs for both long-term and session memory
- independent per-tier passes
Long-term thresholds
per type · independent counters
fact CN-L1
- threshold-gated merge pass
- stable knowledge synthesis
relationship CN-L2
- threshold-gated merge pass
- descriptors collapsed
preference CN-L3
- threshold-gated merge pass
- duplicates folded
event CN-L4
- threshold-gated merge pass
- co-occurring events combined
Session thresholds
per type · independent counters
scene CN-S1
- threshold-gated merge pass
- contiguous scenes folded
revelation CN-S2
- threshold-gated merge pass
- overlapping reveals merged
development CN-S3
- threshold-gated merge pass
- arc beats coalesced
detail CN-S4
- threshold-gated merge pass
- redundant details collapsed
08 · Storage
💾 Two-Tier Storage
Persistent identity vs. ephemeral session state. Reset semantics differ by tier.
Persistent · extension_settings
survives all sessions, all chats
Long-term memories SP1
- fact · relationship · preference · event
Relationship history SP2
- descriptor + magnitude pairs
- per character pair
Entity registry SP3
- name · type · aliases
- state card templates
Epistemic knowledge SP4
- knows · suspects · believes · unaware · hiding
- per character · per target
Per-chat · chatMetadata
per conversation · reset on Fresh Start
Short-term summary SC1
- rolling progressive compaction
Session memories SC2
- granular within-session details
Chat-scoped arcs SC4
- open threads this conversation
Character profiles SC6
- character · world · relations matrix
09 · Migrations
🔄 Schema Migration
graph-migration.js — never destructive.
Versioned store MIG
- SCHEMA_VERSION stored per-character + per-chat
- CHARACTER_MIGRATIONS + CHAT_MIGRATIONS registries
- applied sequentially on every chat / character load
- never removes steps — old chats upgradeable from v0
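A sequential, append-only migration runner in the style described above; the step bodies and field names (`memories`, `schemaExtra`) are placeholders:

```javascript
// Append-only registry: steps are never removed, so a v0 chat
// always has a complete upgrade path.
const CHAT_MIGRATIONS = [
  (data) => ({ ...data, memories: data.memories ?? [] }),       // v0 -> v1
  (data) => ({ ...data, schemaExtra: data.schemaExtra ?? {} }), // v1 -> v2
];

// Applies pending steps in order, starting from the stored version.
function migrateChat(data) {
  let version = data.SCHEMA_VERSION ?? 0;
  for (; version < CHAT_MIGRATIONS.length; version++) {
    data = CHAT_MIGRATIONS[version](data);
  }
  return { ...data, SCHEMA_VERSION: version };
}
```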
10 · Token Mgmt
📊 Budgets
Sliders TK1
- 10 per-tier budget sliders
- Simple mode: shared total cap
- Advanced mode: independent per-tier
Trim stats TK2
- injected vs full token counts
- one-time trim toast (post-load only)
- short-term exempt (self-corrects)
Auto-tune TK3
- demand-driven adjustment
- observed trim stats × 1.15 headroom
- snap to nearest 50 tokens
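The auto-tune math above reduces to one line: observed demand times 1.15 headroom, snapped to the nearest 50 tokens.

```javascript
// Demand-driven budget adjustment from observed trim stats.
function autoTuneBudget(observedTokens) {
  return Math.round((observedTokens * 1.15) / 50) * 50;
}
```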
Adaptive budgets TK4
- turn classifier: dialogue · action · transition · intimate
- per-tier multipliers applied per turn
↓read on inject · budgets clamped · 13 named slots populated
11 · Injection
💉 setExtensionPrompt · 13 named slots
Anchored slots vs. depth-relative slots vs. unified single-block mode.
IN_PROMPT · anchored at character-card depth
4 slots
smart_memory_profiles IJ8 · depth 1
smart_memory_canon IJ9 · depth 0
IN_CHAT · depth-relative to current message
8 slots
smart_memory_triggered IJ3 · depth 4
- contextual relevance reinjection
- memories overlapping current turn
smart_memory_scenes IJ5 · depth 6
smart_memory_arcs IJ6 · depth 2
smart_memory_relationships IJ7 · depth 5
smart_memory_epistemic IJ10 · depth 1
- character knowledge map
- injected per responding char only
smart_memory_state_ledger IJ11 · depth 1
smart_memory_repair IJ12 · depth 0
- one-shot continuity correction
Unified Mode · optional
replaces all individual slots
smart_memory_unified IJ13 · IN_PROMPT · depth 0
- single merged context block
- Canon → Profiles → Long-term → Short-term → Scenes → Session → Arcs
- content cache bridges infrequent tiers
12 · Macros
🔧 macros.js
{{smartmemory-*}} tokens that resolve at prompt-assembly time.
Token registry MA1
- 11 {{smartmemory-*}} tokens
- 10 per-tier + {{smartmemory-unified}}
Auto-detection MA2a
- scans character card fields for {{smartmemory-*}} tokens
- system_prompt · description · personality · scenario · mes_example
- activates only the slots whose tokens are present
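The scan can be sketched as a regex pass over the card fields listed above; the token pattern is assumed to follow the `{{smartmemory-*}}` form:

```javascript
// Card fields scanned for {{smartmemory-*}} tokens.
const SCAN_FIELDS = ['system_prompt', 'description', 'personality', 'scenario', 'mes_example'];

// Returns the set of macro names present in the card, so only the
// corresponding slots are activated.
function detectActiveMacros(card) {
  const found = new Set();
  for (const field of SCAN_FIELDS) {
    const text = card[field] ?? '';
    for (const m of text.matchAll(/\{\{(smartmemory-[a-z-]+)\}\}/g)) {
      found.add(m[1]);
    }
  }
  return [...found];
}
```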
Force macro injection mode MA2b
- user-toggled setting
- bypasses auto-detection
- activates all macros unconditionally — for instruct templates
Cache bridge MA3
- content cache updated by inject functions
- macro always returns latest output
- individual macros inactive when unified is on
13 · Auxiliary
🌅 Recap & 🧪 Model Test
Out-of-band UX surfaces — recap on return, validation harness for model selection.
Away Recap RECAP · recap.js
- tracks lastActive timestamp per chat
- on return after threshold hours:
- generates "Previously on…" summary
- displayed as dismissible modal overlay
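The return-threshold check reduces to a timestamp comparison; the 6-hour threshold here is an assumption, not the module's default:

```javascript
const RECAP_AFTER_HOURS = 6; // hypothetical away threshold

// True when the user has been away long enough to warrant a
// "Previously on…" recap for this chat.
function shouldShowRecap(lastActiveMs, nowMs) {
  const hoursAway = (nowMs - lastActiveMs) / 3_600_000;
  return hoursAway >= RECAP_AFTER_HOURS;
}
```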
Extraction Model Test MTEST · model-test.js
- 3 fixed scenarios — all 5 tiers always run
- Main: 30-message fantasy investigation → long-term · session · arcs
- Epistemic: village healer scene → Perspectives + Secrets
- State: dungeon heist excerpt → State Ledger
- shows raw model output + quality hints
- never writes to session or memories
↓slots populated · macros resolved · recap modal (if active) attached
⟶ Output ⟵
🧠 Final Prompt → LLM Model
13 named slots · resolved macros · clamped budgets · optional repair note · optional recap overlay