Research Amp Toolkit

/runlog — Longitudinal toolkit observability

When you need this

You've used the toolkit's commands across many sessions. Which ones did you actually run, how did they perform, and where are improvement signals piling up?

What it does — and what it won't

/runlog reads the instrumentation CSVs that other commands (/pace, /coa, /audit, /improve, /dailysummary, /commit, /startup) append to during normal use, and renders a longitudinal table of recent runs showing convergence rate, token cost, outcome, and improvement signals per command. It is pure aggregation: no new data collection, no external services, and no writes to the source CSVs. It is PLN-verifiable by construction: re-running it produces the same table, apart from any rows appended since the last run.
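The aggregation described above can be pictured with a minimal Python sketch. This is not /runlog's actual implementation: the column names (date, command, outcome, convergence_rate), the sample rows, and the helper names load_rows and summarize are all assumptions made for illustration, and the real CSV schema may differ.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in an ASSUMED column layout; the real run_log.csv may differ.
SAMPLE_LOG = """date,command,outcome,convergence_rate
2026-04-02,/pace,complete,0.80
2026-04-03,/pace,partial,0.55
2026-04-03,/audit,complete,
2026-04-05,/coa,failed,0.40
"""


def load_rows(text):
    """Parse CSV text into a list of dicts; nothing is ever written back."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows):
    """Aggregate run count, completed count, and mean convergence per command."""
    stats = defaultdict(lambda: {"runs": 0, "complete": 0, "rates": []})
    for row in rows:
        entry = stats[row["command"]]
        entry["runs"] += 1
        if row["outcome"] == "complete":
            entry["complete"] += 1
        if row["convergence_rate"]:  # blank is treated as "not applicable"
            entry["rates"].append(float(row["convergence_rate"]))

    summary = {}
    for command, entry in stats.items():
        rates = entry["rates"]
        summary[command] = {
            "runs": entry["runs"],
            "complete": entry["complete"],
            "mean_convergence": sum(rates) / len(rates) if rates else None,
        }
    return summary


summary = summarize(load_rows(SAMPLE_LOG))
for command in sorted(summary):
    print(command, summary[command])
```

Because the sketch only reads and aggregates, running it twice on the same input gives the same summary, which is the property the paragraph above relies on.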

Unlike /dailysummary, which captures a single day's narrative, /runlog aggregates across days and across commands to surface patterns — which commands you actually use, where convergence has been weak, where improvement signals are repeating.

Prerequisite: The commands you want to see in the table must already have been run and have appended rows to .toolkit/evidence/run_log.csv. If the table is sparse, that's your signal to run the instrumented commands more, not to invoke /runlog differently.

Worked example

It is the end of a research sprint. You have run /pace three times, /coa twice, and /audit on five different sections.

/runlog

/runlog reads .toolkit/evidence/run_log.csv and renders a table with one row per run: date, command, outcome (complete / partial / failed), convergence rate where applicable, and a one-line task summary. You spot that two of the three /pace runs converged at only 0.55 on the same methodology question, an improvement signal you missed at the time. That question gets a follow-up /coa session before the next submission, instead of being discovered by a reviewer.
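The pattern spotted in this example, repeated weak convergence on the same task, could also be checked mechanically. The sketch below is hypothetical: the field names, the 0.6 threshold, the sample rows (including the 0.85 run), and the function repeated_weak_runs are illustrative assumptions, not part of the toolkit.

```python
from collections import defaultdict

# Made-up rows loosely mirroring the worked example; field names are assumptions.
RUNS = [
    {"date": "2026-04-02", "command": "/pace",
     "task": "methodology question", "convergence_rate": 0.55},
    {"date": "2026-04-04", "command": "/pace",
     "task": "literature scope", "convergence_rate": 0.85},
    {"date": "2026-04-06", "command": "/pace",
     "task": "methodology question", "convergence_rate": 0.55},
]


def repeated_weak_runs(runs, threshold=0.6, min_repeats=2):
    """Return (command, task) pairs with at least min_repeats runs below threshold.

    The threshold is an arbitrary illustrative cutoff, not a toolkit default.
    """
    weak = defaultdict(list)
    for run in runs:
        rate = run["convergence_rate"]
        if rate is not None and rate < threshold:
            weak[(run["command"], run["task"])].append(run["date"])
    return {key: dates for key, dates in weak.items() if len(dates) >= min_repeats}


flagged = repeated_weak_runs(RUNS)
for (command, task), dates in flagged.items():
    print(f"{command} on '{task}': weak convergence on {', '.join(dates)}")
```

Grouping by command and task rather than by command alone keeps a single weak run on an unrelated task from being reported as a repeating signal.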

Try it

/runlog
/runlog --since 2026-04-01
/runlog --drill-down <run-id>
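As a rough picture of what a date filter such as --since might do, the sketch below keeps only rows on or after a cutoff. Whether the real flag includes the cutoff date itself is an assumption here, as are the date column name, the sample rows, and the helper rows_since.

```python
from datetime import date

# Made-up rows; the "date" column name and ISO date format are assumptions.
ROWS = [
    {"date": "2026-03-28", "command": "/audit"},
    {"date": "2026-04-01", "command": "/pace"},
    {"date": "2026-04-09", "command": "/coa"},
]


def rows_since(rows, since):
    """Keep rows dated on or after `since` (inclusive cutoff is an assumption)."""
    cutoff = date.fromisoformat(since)
    return [row for row in rows if date.fromisoformat(row["date"]) >= cutoff]


recent = rows_since(ROWS, "2026-04-01")
for row in recent:
    print(row["date"], row["command"])
```

Comparing parsed dates rather than raw strings means a malformed date raises an error instead of being silently sorted into the wrong place.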

Read-only: /runlog never writes to run_log.csv or command_performance_log.md — those are produced by the commands being observed. If a row looks wrong, fix it at the source.