# What Is the AI Research Toolkit?
A collection of 13 Claude Code commands that catch errors, prevent bad plans, and keep complex projects on track. Organized into three layers: verification, workflow, and content. Validated across 40+ production sessions.
## What Problem It Solves
AI coding assistants are powerful but unstructured. Three recurring gaps:
**Verification gap.** How do you know the AI’s output is correct — especially for citations, numerical results, and analysis?

**Planning gap.** Complex multi-component projects need structured planning, not ad-hoc prompting.

**Documentation gap.** Research progress disappears when the terminal closes. Session context is ephemeral.
This toolkit addresses all three gaps with reusable commands that add structure without adding friction. Each command was built for a real research task, validated in production, and generalized for others.
## Commands
### Verification Layer

| Command | What it does |
|---|---|
| /pcv | Plan-Construct-Verify — structured planning with sequential clarification, adversarial review, and human approval gates before any code or document is written. Based on PCV v3.9 (Dr. Kay, NC State). |
| /pcv-research | Parallel planning experiments — runs depth-first and breadth-first planning strategies in parallel with instrumentation for cross-instance convergence analysis. |
| /coa | Council of Agents — spawns specialists with distinct professional perspectives (Skeptic, Economist, Practitioner, etc.) to independently analyze a question, then synthesizes convergence and divergence. |
| /pace | Parallel Agent Consensus Engine — routes a task through two independent players with coaching review, then cross-compares for verification through redundancy. |
| /audit | Citation & numerical audit — verifies every cited metric exists on disk and every quoted number matches the source paper. Catches misquoted figures and fabricated citations before publication. |
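The core check behind /audit, that every number quoted in a draft must literally appear in the cited source, can be sketched in a few lines of Python. This is a simplified illustration of the idea, not the command's implementation; the regex and exact-substring matching are assumptions.

```python
import re

def audit_numbers(claim_text: str, source_text: str) -> list[str]:
    """Return numbers quoted in claim_text that never appear in source_text."""
    # Pull signed integers, decimals, and percentages out of the claim.
    quoted = re.findall(r"[-\u2212]?\d+(?:\.\d+)?%?", claim_text)

    # Normalize the Unicode minus sign so a quoted "−2.5%" matches "-2.5%".
    def norm(s: str) -> str:
        return s.replace("\u2212", "-")

    source = norm(source_text)
    return [n for n in quoted if norm(n) not in source]

# Hypothetical claim and source text, mirroring the misquote described
# in the Evidence section.
claim = "The paper reports a consumer surplus change of -35%."
source = "We estimate a consumer surplus change of -2.5% under the policy."
print(audit_numbers(claim, source))  # -> ['-35%']
```

A real audit would also resolve which source each figure is cited to; this sketch only shows the number-matching step.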
### Workflow Layer
| Command | What it does |
|---|---|
/startup | Reads recent work summaries and orients you on where you left off — across all active workstreams. |
/dailysummary | Creates a dated summary of the day’s work with cross-references to decisions and open TODOs. |
/weeklysummary | Aggregates daily summaries into weekly workstream reports; surfaces dormant threads and stale TODOs. |
/commit | Analyzes staged changes and creates logically grouped commits — prevents mixed-concern commits. |
/simplify | Reviews code or documents for redundancy, complexity, and performance issues via 5-lens analysis. |
/improve | Self-reflective meta-agent that audits your own Claude Code infrastructure and proposes improvements. |
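The stale-TODO scan behind /weeklysummary can be illustrated with a minimal sketch: walk the dated daily-summary files and collect unchecked checkbox items. The filename pattern and the `- [ ]` checkbox convention here are assumptions, not the toolkit's actual storage format.

```python
import re
import tempfile
from pathlib import Path

def stale_todos(summary_dir: str) -> dict[str, list[str]]:
    """Collect unchecked '- [ ]' items from dated daily-summary files."""
    open_items: dict[str, list[str]] = {}
    for path in sorted(Path(summary_dir).glob("*.md")):
        todos = re.findall(r"^- \[ \] (.+)$", path.read_text(), flags=re.M)
        if todos:
            open_items[path.stem] = todos
    return open_items

# Demo against a throwaway directory; real summaries would live wherever
# the toolkit writes them (location assumed here for illustration).
demo = tempfile.mkdtemp()
Path(demo, "2026-01-05.md").write_text("- [x] print poster\n- [ ] revise draft\n")
Path(demo, "2026-01-06.md").write_text("- [ ] email reviewer\n")
print(stale_todos(demo))
# -> {'2026-01-05': ['revise draft'], '2026-01-06': ['email reviewer']}
```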
### Content Layer
| Command | What it does |
|---|---|
/quarto | Generates Quarto RevealJS slide decks from background documents. |
/pdftotxt | Extracts text from PDF, Word, and HTML files — supports single files or entire directories. |
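A tool like /pdftotxt needs to dispatch on file extension before extracting. The sketch below shows that dispatch with only the HTML branch implemented (via the standard library); real PDF and Word extraction would require third-party libraries such as pypdf or python-docx, and nothing here reflects the command's actual implementation.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path

class _TextExtractor(HTMLParser):
    """Accumulates visible text, skipping <script>/<style> bodies."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def extract_text(path: str) -> str:
    """Route a file to a text extractor based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix in (".html", ".htm"):
        parser = _TextExtractor()
        parser.feed(Path(path).read_text())
        return "\n".join(parser.chunks)
    if suffix == ".pdf":
        raise NotImplementedError("PDF extraction needs a library such as pypdf")
    if suffix == ".docx":
        raise NotImplementedError("Word extraction needs a library such as python-docx")
    raise ValueError(f"unsupported file type: {suffix}")

# Demo on a throwaway HTML file.
demo = Path(tempfile.mkdtemp(), "page.html")
demo.write_text("<html><body><h1>Results</h1><script>x=1</script>"
                "<p>See Table 2.</p></body></html>")
print(extract_text(str(demo)))  # -> Results / See Table 2. (one per line)
```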
## Evidence
All evidence is from actual production usage (2025–2026). The same tools work across simulation, documentation, code review, and presentation workflows.
| Tool | What it caught or produced |
|---|---|
| /audit | Caught a consumer surplus figure misquoted as −35% when the source paper actually states −2.5%. Also caught benchmark numbers cited to the wrong paper and a job displacement figure with inverted framing — all on a single poster before printing. |
| /pace | Two independent players identified circular logic in a simulation’s acceptance rate calculation that passed single-agent review. The fix changed results by several percentage points. |
| /coa | A multi-seat council evaluated external review concerns about a simulation’s learning mechanism. The Skeptic identified the one genuine threat (signal vs. noise) that subsequent diagnostics confirmed and cleared. |
| /pcv-research | Parallel planning for a conference poster produced convergent decisions on layout and color, but diverged on figure placement — surfacing a genuine design ambiguity the researcher had not considered. |
| /pcv | Structured the design of a conference poster, mathematical proof verification in Lean 4, TA grading workflows, and dissertation chapter outlines — each using the same clarify-before-building discipline. |
| /startup | Across sessions spanning multiple workstreams (simulation, poster, toolkit, coursework), session startup consistently recovered full context in under 60 seconds. |
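The verification-through-redundancy pattern behind /pace can be sketched as two independent implementations of the same quantity plus a cross-comparison: a disagreement beyond tolerance flags the definition for human review. The function definitions and numbers below are invented for illustration, not taken from the simulation described in the table.

```python
def acceptance_rate_v1(accepted: int, offered: int) -> float:
    """Player 1's definition: rate over all offers made."""
    return accepted / offered

def acceptance_rate_v2(accepted: int, offered: int, withdrawn: int) -> float:
    """Player 2's definition: rate over offers that were not withdrawn."""
    return accepted / (offered - withdrawn)

# Cross-compare the two independent implementations. A gap larger than
# the tolerance means the players disagree on the definition itself,
# which is how a circular or inconsistent calculation gets surfaced.
r1 = acceptance_rate_v1(accepted=42, offered=100)
r2 = acceptance_rate_v2(accepted=42, offered=100, withdrawn=8)
if abs(r1 - r2) > 0.01:
    print(f"divergence: {r1:.3f} vs {r2:.3f} -- definitions disagree")
```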
**Methodology note.** All agents in PCV-Research, CoA, and PACE use the same underlying Claude model. Convergence between agents indicates consistency within the model’s reasoning space, not independent validation. Cross-model validation via Gemini partially addresses this limitation.
## Installation
**Prerequisite:** Claude Code must be installed. The toolkit runs inside Claude Code, not separately.
1. **Clone the toolkit.**

   ```bash
   git clone https://github.com/jbenhart44/Research-Toolkit.git
   cd Research-Toolkit
   ```

2. **Run the installer.** This installs all 13 commands and the verification hooks.

   ```bash
   bash install.sh
   ```

3. **Configure for your project.** Edit `~/.claude/toolkit-config.md` to set your project name and workstreams.

4. **Verify it works.** Type `/pcv` in any Claude Code session.
## Cite This Toolkit
If you use or build on this toolkit, please cite:
```bibtex
@software{benhart_kay2026toolkit,
  author = {Benhart, Jake and Kay, Michael G.},
  title  = {AI-Assisted Research Toolkit for Claude Code},
  year   = {2026},
  url    = {https://github.com/jbenhart44/Research-Toolkit}
}
```
The toolkit bundles PCV v3.9 (Dr. Kay, NC State) as its planning foundation.
## Contact
Jake Benhart & Dr. Michael Kay
NC State University — Operations Research
jbenhart@ncsu.edu · github.com/jbenhart44