architecture
the daemon is composed of a small set of types with well-defined responsibilities. understanding them makes it easier to debug signal flow, extend the daemon, or read the source.
process topology
the following diagram shows how the two services (kasmosdb and kasmos), operator clients, the shared sqlite store, and agent execution hosts are connected:
core types
Daemon
defined in daemon/daemon.go. the top-level struct that owns everything:
| field | type | role |
|---|---|---|
| `cfg` | `*DaemonConfig` | poll interval, repo list, auto-advance flags, socket path |
| `repos` | `*RepoManager` | thread-safe list of registered repositories |
| `spawner` | `*TmuxSpawner` | creates and tracks tmux sessions for each agent role |
| `broadcaster` | `*api.EventBroadcaster` | fan-out sse event stream served by the control socket |
| `prMonitor` | `*PRMonitor` | optional goroutine that polls GitHub PRs for review comments |
| `pidLock` | `*PIDLock` | advisory lock (`<socket>.pid`) preventing duplicate daemon processes |
the Daemon.Run(ctx) call is the blocking entry point. it acquires the pid lock, opens the unix socket, starts the http control server, and then enters the select loop.
RepoManager
defined in daemon/repo.go. a sync.RWMutex-protected slice of RepoEntry values. repos are added with Add(path) and removed with Remove(path) or RemoveByProject(project). all mutations are safe to call from the control api goroutine while the poll loop runs.
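the locking discipline described above can be sketched as follows. the method names mirror the description; the bodies and the trimmed-down `RepoEntry` are illustrative, not the real daemon/repo.go.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sync"
)

// RepoEntry is trimmed to the two fields this sketch needs.
type RepoEntry struct {
	Path    string
	Project string
}

// RepoManager guards a slice of entries with a sync.RWMutex so the
// control-api goroutine can mutate it while the poll loop reads it.
type RepoManager struct {
	mu    sync.RWMutex
	repos []RepoEntry
}

func (m *RepoManager) Add(path string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.repos = append(m.repos, RepoEntry{Path: path, Project: filepath.Base(path)})
}

func (m *RepoManager) RemoveByProject(project string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	kept := m.repos[:0] // filter in place, keeping the backing array
	for _, r := range m.repos {
		if r.Project != project {
			kept = append(kept, r)
		}
	}
	m.repos = kept
}

// List returns a copy so the poll loop never ticks while holding the lock.
func (m *RepoManager) List() []RepoEntry {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return append([]RepoEntry(nil), m.repos...)
}

func main() {
	m := &RepoManager{}
	m.Add("/home/op/src/kasmos")
	m.Add("/home/op/src/other")
	m.RemoveByProject("other")
	fmt.Println(len(m.List()), m.List()[0].Project)
}
```

returning a copy from `List` is the key design choice: the tick loop iterates a snapshot, so a concurrent `Add` or `Remove` never invalidates the iteration.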
RepoEntry
each registered repo is represented as a RepoEntry:
| field | type | role |
|---|---|---|
| `Path` | `string` | absolute filesystem path |
| `Project` | `string` | `filepath.Base(path)` — the repo's canonical name; used as the namespace key inside the shared DB |
| `Store` | `taskstore.Store` | handle into the shared global task store (`~/.config/kasmos/taskstore.db`), scoped to this repo's project name (may be nil if the DB could not be opened) |
| `SignalGateway` | `taskstore.SignalGateway` | handle into the shared global signal gateway (same DB, same open call) — reused across all registered repos (may be nil) |
| `SignalsDir` | `string` | `<path>/.kasmos/signals/` — repo-local filesystem directory; still scanned by `BridgeFilesystemSignals` for backward compatibility |
| `Processor` | `*loop.Processor` | per-repo signal processor that persists wave orchestrator state between ticks |
the Processor is created once per Add call and reused on every tick. this keeps WaveOrchestrator state alive across poll cycles without re-reading from disk.
loop.Processor
defined in orchestration/loop/. receives scanned signals and returns typed Action values:
- `SpawnCoderAction`, `SpawnReviewerAction`, `SpawnFixerAction`, `SpawnElaboratorAction`
- `AdvanceWaveAction`, `TaskCompleteAction`
- `TransitionAction`, `ReviewApprovedAction`, `ReviewCycleLimitAction`
- `PausePlanAgentAction`, `CreatePRAction`
the processor is stateful — it holds a WaveOrchestrator per active plan so that inter-tick wave state is not lost.
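the "one `WaveOrchestrator` per active plan" shape can be sketched like this. the wave-size constant, the simplified `ScanResult`, and the orchestrator internals are assumptions for illustration; only the type names come from the doc.

```go
package main

import "fmt"

// Action is the processor's output; the concrete names come from the list above.
type Action interface{ Name() string }

type AdvanceWaveAction struct{ Plan string }

func (a AdvanceWaveAction) Name() string { return "AdvanceWave" }

// WaveOrchestrator here just counts remaining tasks per wave (illustrative).
type WaveOrchestrator struct {
	Wave      int
	Remaining int
}

// ScanResult is simplified to task-complete counts keyed by plan.
type ScanResult struct {
	CompletedTasks map[string]int
}

// Processor keeps one orchestrator per active plan so wave state
// survives between ticks, as described above.
type Processor struct {
	orchestrators map[string]*WaveOrchestrator
}

func (p *Processor) Tick(scan ScanResult) []Action {
	if p.orchestrators == nil {
		p.orchestrators = map[string]*WaveOrchestrator{}
	}
	var actions []Action
	for plan, done := range scan.CompletedTasks {
		o := p.orchestrators[plan]
		if o == nil {
			o = &WaveOrchestrator{Wave: 1, Remaining: 3} // assume 3 tasks per wave
			p.orchestrators[plan] = o
		}
		o.Remaining -= done
		if o.Remaining <= 0 {
			o.Wave++
			o.Remaining = 3
			actions = append(actions, AdvanceWaveAction{Plan: plan})
		}
	}
	return actions
}

func main() {
	p := &Processor{}
	p.Tick(ScanResult{CompletedTasks: map[string]int{"plan-a": 2}}) // wave not done yet
	acts := p.Tick(ScanResult{CompletedTasks: map[string]int{"plan-a": 1}})
	fmt.Println(len(acts), p.orchestrators["plan-a"].Wave)
}
```

the point of the map is visible in `main`: the second tick sees the remaining-count left by the first, which is exactly the inter-tick state the real processor preserves.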
TmuxSpawner
defined in daemon/spawner.go. creates named tmux sessions for each agent role using a consistent naming convention (<plan-name>-<role>). tracks running instances in memory so the daemon can monitor prompt state, detect completion, and kill agents when needed.
key methods:
- `SpawnCoder`, `SpawnReviewer`, `SpawnFixer`, `SpawnElaborator`, `SpawnPlanner` — create sessions
- `SpawnWaveTask` — creates a coder session with wave/peer-count metadata injected into the prompt
- `KillAgent(repoPath, planFile, agentType)` — sends SIGTERM to the named session
- `KillWaveAgents(repoPath, planFile, wave)` — kills all coder sessions for a completed wave
- `RunningInstances()` / `InstancesForRepo(path)` — read current tracking state
- `DiscoverOrphanSessions()` — lists `kas_*` tmux sessions not tracked by this spawner (used during recovery)
- `RestoreTrackedInstance(…)` — re-adopts an orphan session into the tracking map
- `DrainAll(ctx)` — waits for all running sessions to exit before shutdown
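the naming convention and the spawn command can be sketched as below. since `DiscoverOrphanSessions` looks for `kas_*` sessions, the sketch assumes a `kas_` prefix on session names; the agent CLI invocation at the end is entirely hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// sessionName applies the <plan-name>-<role> convention, with the kas_
// prefix assumed from the orphan-discovery pattern above.
func sessionName(planFile, role string) string {
	plan := strings.TrimSuffix(filepath.Base(planFile), filepath.Ext(planFile))
	return "kas_" + plan + "-" + role
}

// spawnCmd builds (but does not run) a detached tmux session that launches
// an agent in the repo directory. "agent" is a placeholder command.
func spawnCmd(repoPath, planFile, role string) *exec.Cmd {
	return exec.Command("tmux", "new-session",
		"-d",                              // detached: daemon keeps running
		"-s", sessionName(planFile, role), // session name for later lookup/kill
		"-c", repoPath,                    // working directory
		"agent", "--role", role, "--plan", planFile, // hypothetical agent CLI
	)
}

func main() {
	cmd := spawnCmd("/home/op/src/kasmos", "plans/auth-refactor.md", "coder")
	fmt.Println(sessionName("plans/auth-refactor.md", "coder"))
	fmt.Println(strings.Join(cmd.Args, " "))
}
```

deriving the session name purely from plan file and role is what makes recovery possible: after a daemon restart, the same function re-computes the name and matches it against the `kas_*` sessions tmux reports.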
api.EventBroadcaster
defined in daemon/api/. maintains a fan-out channel to all active sse connections on the control socket. the daemon calls broadcaster.Emit(event) after every significant action. clients streaming GET /v1/events receive these events in real time. Close() signals eof to all subscribers when the daemon shuts down.
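a fan-out broadcaster of this shape can be sketched as follows. the drop-on-slow-subscriber policy is an assumption for the sketch, not documented behavior of the real daemon/api package.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is whatever the daemon emits; a string suffices for this sketch.
type Event string

// EventBroadcaster fans each Emit out to every subscriber channel.
type EventBroadcaster struct {
	mu   sync.Mutex
	subs map[chan Event]struct{}
}

func NewEventBroadcaster() *EventBroadcaster {
	return &EventBroadcaster{subs: map[chan Event]struct{}{}}
}

func (b *EventBroadcaster) Subscribe() chan Event {
	ch := make(chan Event, 16) // small buffer per sse connection
	b.mu.Lock()
	b.subs[ch] = struct{}{}
	b.mu.Unlock()
	return ch
}

func (b *EventBroadcaster) Emit(e Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.subs {
		select {
		case ch <- e:
		default: // slow subscriber: drop rather than block the daemon (assumed policy)
		}
	}
}

// Close closes every channel, which an sse handler sees as end-of-stream.
func (b *EventBroadcaster) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.subs {
		close(ch)
	}
	b.subs = map[chan Event]struct{}{}
}

func main() {
	b := NewEventBroadcaster()
	sub := b.Subscribe()
	b.Emit("wave.advanced")
	fmt.Println(<-sub)
	b.Close()
	_, open := <-sub
	fmt.Println(open)
}
```

the closed channel is the whole shutdown contract: an sse handler ranging over its subscription exits its loop naturally when `Close` runs, which is how clients see eof.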
PRMonitor
defined in daemon/pr_monitor.go. runs as a separate goroutine started by Daemon.Run when cfg.PRMonitor.Enabled is true. polls open PRs at cfg.PRMonitor.PollInterval (default 60 s) and dispatches fixer agents for unprocessed CHANGES_REQUESTED or COMMENTED reviews from non-bot users.
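the dispatch rule (unprocessed CHANGES_REQUESTED or COMMENTED reviews from non-bot users) can be written as a small predicate. the `Review` fields and the `[bot]` suffix check are illustrative simplifications of the GitHub review payload, not the monitor's actual types.

```go
package main

import (
	"fmt"
	"strings"
)

// Review mirrors the fields the monitor needs from a GitHub PR review.
// Field names are illustrative, not the exact GitHub API schema.
type Review struct {
	ID        int64
	State     string // "APPROVED", "CHANGES_REQUESTED", "COMMENTED", ...
	UserLogin string
	UserType  string // "User" or "Bot"
}

// needsFixer reproduces the rule above: unprocessed CHANGES_REQUESTED or
// COMMENTED reviews from non-bot users get a fixer agent dispatched.
func needsFixer(r Review, processed map[int64]bool) bool {
	if processed[r.ID] {
		return false
	}
	if r.UserType == "Bot" || strings.HasSuffix(r.UserLogin, "[bot]") {
		return false
	}
	return r.State == "CHANGES_REQUESTED" || r.State == "COMMENTED"
}

func main() {
	processed := map[int64]bool{2: true}
	reviews := []Review{
		{ID: 1, State: "CHANGES_REQUESTED", UserLogin: "alice", UserType: "User"},
		{ID: 2, State: "COMMENTED", UserLogin: "bob", UserType: "User"},    // already handled
		{ID: 3, State: "COMMENTED", UserLogin: "ci[bot]", UserType: "Bot"}, // bot noise
		{ID: 4, State: "APPROVED", UserLogin: "carol", UserType: "User"},
	}
	for _, r := range reviews {
		fmt.Println(r.ID, needsFixer(r, processed))
	}
}
```

tracking processed review ids is what keeps the poll idempotent: the same review seen on consecutive 60 s polls dispatches a fixer only once.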
the event loop
after setup, `Daemon.Run` enters the poll loop:

```
ticker := time.NewTicker(cfg.PollInterval) // default 2 s
for {
    select {
    case <-ctx.Done():
        // drain → shutdown http → release lock → close broadcaster → return
    case <-ticker.C:
        for each RepoEntry:
            tickRepo(ctx, entry)
    }
}
```
tickRepo chooses between two signal processing paths depending on gateway availability:
db-backed path (primary — when `entry.SignalGateway != nil`):

1. `loop.BridgeFilesystemSignals` — scans `<repo>/.kasmos/signals/` and all shared worktrees under `<repo>/.worktrees/`; inserts any new filesystem sentinel files as `pending` rows in the shared `signals` table. this is a compatibility bridge — for agents, the canonical way to emit signals is mcp `signal_create`; `kas signal emit` is the operator and cli fallback.
2. `gateway.Claim(project, workerID)` — atomically transitions one `pending` row to `processing` and returns it.
3. `loop.ConvertSignalEntry(entry, &scan)` — decodes the DB row into a typed `ScanResult`.
4. `processor.Tick(scan)` → `[]Action` — feeds the scan to the processor.
5. `executeAction(ctx, entry, action)` for each action.
6. `gateway.MarkProcessed(id, status, result)` — marks the row `done` or `failed`.
7. repeat from step 2 until no pending signals remain for this repo's project.
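the claim/mark-processed contract can be sketched with an in-memory stand-in. the real gateway does this against the shared sqlite db; the struct and statuses below only illustrate the semantics the tick loop relies on, namely that each pending signal is handed to exactly one claimant.

```go
package main

import (
	"fmt"
	"sync"
)

// signal mirrors a row in the shared signals table (simplified).
type signal struct {
	ID      int
	Project string
	Status  string // pending → processing → done/failed
	Worker  string
}

// gateway is an in-memory stand-in for the sqlite-backed SignalGateway.
type gateway struct {
	mu   sync.Mutex
	rows []*signal
}

// Claim atomically moves one pending row for the project to processing.
// The real gateway achieves the same atomicity with a single db update.
func (g *gateway) Claim(project, workerID string) *signal {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, s := range g.rows {
		if s.Project == project && s.Status == "pending" {
			s.Status, s.Worker = "processing", workerID
			return s
		}
	}
	return nil
}

func (g *gateway) MarkProcessed(id int, status string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, s := range g.rows {
		if s.ID == id {
			s.Status = status
		}
	}
}

func main() {
	g := &gateway{rows: []*signal{
		{ID: 1, Project: "kasmos", Status: "pending"},
		{ID: 2, Project: "kasmos", Status: "pending"},
	}}
	// drain loop: claim until no pending signals remain (step 7 above)
	for s := g.Claim("kasmos", "daemon-1"); s != nil; s = g.Claim("kasmos", "daemon-1") {
		g.MarkProcessed(s.ID, "done")
	}
	fmt.Println(g.rows[0].Status, g.rows[1].Status)
}
```

the `pending → processing → done/failed` progression is also what the reaper below depends on: a crash between `Claim` and `MarkProcessed` leaves a row parked in `processing`.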
legacy filesystem path (fallback — when `SignalGateway` is nil, e.g. shared db unavailable):

- Reads signals directly from `staging/`, `processing/`, and `failed/` subdirs under `<repo>/.kasmos/signals/`.
- Each signal is moved to `processing/` before handling, then deleted (`CompleteProcessing`) or dead-lettered to `failed/` (`FailProcessing`).
- This path exists so the daemon degrades gracefully; it is not the recommended signal API.
reaper goroutine
a secondary goroutine runs every 30 seconds and calls gateway.ResetStuck(60s) across all registered repos. signals that have been in the processing status for more than 60 seconds are reset to pending so a new daemon instance (or the next tick) can reclaim them. this guards against crashes between claim and mark-processed.
session monitoring
after processing signals for a repo, the daemon calls `monitorRunningInstances`. this method:

- calls `spawner.InstancesForRepo(path)` to enumerate tracked sessions.
- calls `inst.CollectMetadata()` to capture the current tmux pane content.
- detects completion (prompt visible and no pending work, or tmux session gone).
- calls `autoAdvanceCompletedImplementer` — pushes the branch and transitions the fsm to `reviewing` automatically.
- kills the completed implementer session and spawns a reviewer.
shutdown sequence
when ctx is cancelled (SIGTERM/SIGINT):
1. `spawner.DrainAll(ctx)` — waits up to 35 seconds for all tracked sessions to exit naturally.
2. `srv.Shutdown(context.Background())` — closes the control socket; in-flight http requests finish.
3. `wg.Wait()` — waits for the reaper and pr monitor goroutines.
4. `pidLock.Release()` — removes the pid lock file.
5. `broadcaster.Close()` — sends eof to all sse subscribers.