Transcript Donation Lab
Build a sanitized, reviewable donation bundle without ever uploading raw logs. Everything runs in your browser until you click submit.
Why review your transcripts?
For yourself
- Learn from patterns — See which prompts led to better outcomes
- Spot inefficiencies — Find where you or the agent went in circles
- Improve your workflow — Understand how you actually use coding agents
- Audit for leaks — Catch secrets or sensitive data you didn't notice
For everyone
- Improve future agents — Real usage data helps train better models
- Document edge cases — Your tricky bugs help others avoid them
- Build shared knowledge — Contribute to open research on AI tools
- Shape the ecosystem — Your data influences how agents evolve
Step 0 — Export locally
Run the export script on your machine to create raw_export.zip. The export stays on your computer; nothing is uploaded until you have reviewed and sanitized it.
Download exporter
python export_transcripts.py --source claude
python export_transcripts.py --source codex
python export_transcripts.py --source opencode
What gets exported?
Claude Code
- Session transcripts from ~/.claude/projects/
- Excludes: settings, todos, commands, MCP config
- How Claude Code stores sessions →
Codex CLI
- History from ~/.codex/history.jsonl
- Excludes: config.toml, auth.json (credentials), AGENTS.md
- Codex CLI config docs →
OpenCode
- Sessions from SQLite database in ~/.opencode/
- Automatically converted to JSON format
- OpenCode documentation →
Use --dry-run to preview what will be exported, --verbose to see excluded files, or --show-rules to view the filtering rules.
Step 1 — Import your exports
Upload one or more export files. Everything is processed locally in your browser.
python export_transcripts.py --source claude
python export_transcripts.py --source codex
python export_transcripts.py --source opencode
Loaded sources
View file tree
Understanding your data
How Claude Code stores data
~/.claude/ contains your session history:
- history.jsonl — Index of all sessions (prompts, timestamps, project paths)
- projects/ — Full conversation transcripts as JSONL files
Entry types in session files:
| Entry type | Description |
|---|---|
| user | Your messages to Claude |
| assistant | Claude's responses (text, thinking, tool calls) |
| system | System prompts and context |
| summary | Compressed context from long conversations |
| file-history-snapshot | Snapshots of file state during edits |
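As a rough illustration, here is a minimal Python sketch that tallies the entry types in a single session transcript. It assumes each JSONL line is an object with a top-level type field, as described above; the exact schema may vary between Claude Code versions.

```python
# Minimal sketch: tally entry types in one Claude Code session transcript.
import json
from collections import Counter
from pathlib import Path

def entry_type_counts(session_path: Path) -> Counter:
    counts = Counter()
    with session_path.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            counts[entry.get("type", "unknown")] += 1
    return counts

# Example: inspect the first transcript found under ~/.claude/projects/
projects = Path.home() / ".claude" / "projects"
for session_file in sorted(projects.rglob("*.jsonl"))[:1]:
    print(session_file, dict(entry_type_counts(session_file)))
```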
How Codex CLI stores data
~/.codex/ contains:
- history.jsonl — Prompt history (session_id, timestamp, text)
- sessions/YYYY/MM/DD/ — Full rollout files per session
Entry types in rollout files:
| Entry type | Description |
|---|---|
| session_meta | Session metadata (cwd, git info, model) |
| response_item | Messages with role (user/assistant) and content |
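A similar sketch for Codex CLI lists the most recent prompts from history.jsonl. The session_id, timestamp, and text keys follow the description above; treat the exact field names as assumptions that may differ by Codex version.

```python
# Minimal sketch: list recent prompts from the Codex CLI history file.
import json
from pathlib import Path

history = Path.home() / ".codex" / "history.jsonl"
entries = [json.loads(line) for line in history.read_text().splitlines() if line.strip()]

for entry in entries[-5:]:  # last five prompts
    print(entry.get("session_id"), entry.get("timestamp"), entry.get("text", "")[:80])
```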
How OpenCode stores data
~/.opencode/ uses SQLite for persistence:
- *.db / *.sqlite — SQLite database with sessions table
- Sessions contain messages, metadata, and file changes
Our exporter converts SQLite to JSON for processing.
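A minimal sketch of that conversion is below. It assumes only that a sessions table exists, as noted above; column names are read from the cursor description rather than hard-coded.

```python
# Minimal sketch: dump the OpenCode sessions table to JSON, similar in spirit
# to what the exporter does. Columns are discovered at runtime.
import json
import sqlite3
from pathlib import Path

def sessions_to_json(db_path: Path) -> list[dict]:
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute("SELECT * FROM sessions")
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        conn.close()

# Example usage against the first database found under ~/.opencode/
opencode_dir = Path.home() / ".opencode"
db_files = sorted(opencode_dir.glob("*.db")) + sorted(opencode_dir.glob("*.sqlite"))
for db in db_files[:1]:
    print(json.dumps(sessions_to_json(db)[:1], indent=2, default=str))
```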
Step 2 — Sessions
Pick the sessions worth donating. Click any session to view its full content. We only score locally.
How scoring works
Sessions are scored locally to help identify potentially valuable transcripts for donation. Higher scores suggest more useful content.
| Signal | Points | Why it matters |
|---|---|---|
| Keywords: error, traceback, stack | +1.5 each | Debugging conversations |
| Keywords: diff, patch, git, commit | +1.5 each | Code changes |
| Keywords: test, pytest, npm, yarn, pip | +1.5 each | Build/test tooling |
| Keywords: tool call, function, stderr, stdout | +1.5 each | Tool use patterns |
| Length: 400–8000 chars | +2 | Substantive but focused |
| Length: >8000 chars | -1 | May be too verbose |
| Length: <120 chars | -1 | Too short to be useful |
Scores are heuristic only. Review content before donating — a low-scored session may still be valuable, and high-scored ones may contain sensitive data.
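A minimal sketch of this heuristic is shown below. It assumes each keyword counts at most once per session (the table does not say whether repeated occurrences stack), so treat it as an approximation of the in-browser scorer rather than its exact logic.

```python
# Minimal sketch of the local scoring heuristic described in the table above.
KEYWORD_GROUPS = [
    ["error", "traceback", "stack"],                 # debugging conversations
    ["diff", "patch", "git", "commit"],              # code changes
    ["test", "pytest", "npm", "yarn", "pip"],        # build/test tooling
    ["tool call", "function", "stderr", "stdout"],   # tool use patterns
]

def score_session(text: str) -> float:
    lowered = text.lower()
    score = 0.0
    for group in KEYWORD_GROUPS:
        score += 1.5 * sum(1 for kw in group if kw in lowered)
    n = len(text)
    if n < 120:
        score -= 1        # too short to be useful
    elif n > 8000:
        score -= 1        # may be too verbose
    elif n >= 400:
        score += 2        # substantive but focused
    return score

print(score_session("pytest failed with a traceback after the last git commit"))
```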
Step 3 — Configure redaction
Contributor metadata
Redaction rules
Select what to automatically redact. Matched content will be replaced with placeholders like [REDACTED_EMAIL].
- sk-abc123... → [REDACTED_API_KEY]
- john@email.com → [REDACTED_EMAIL]
- /Users/john/code/ → [REDACTED_PATH]/code/
Use if code may contain proprietary content.
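For illustration, here is a minimal regex sketch of substitutions like those above. The patterns are simplified stand-ins, not the lab's actual redaction rules, which cover more cases.

```python
# Minimal regex-based sketch of the redaction pass illustrated above.
import re

RULES = [
    (re.compile(r"sk-[A-Za-z0-9]{8,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"/(?:Users|home)/[^/\s]+"), "[REDACTED_PATH]"),
]

def redact(text: str) -> str:
    for pattern, placeholder in RULES:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Key sk-abc123def456, mail john@email.com, repo /Users/john/code/"))
```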
Field selection — Choose which data fields to include
Select which fields to keep in your donation. Essential fields cannot be removed. Path-related fields are excluded by default to protect privacy.
Essential (always included)
Recommended (included by default)
Optional (you choose)
Path/Privacy fields (excluded by default)
Always removed (cannot include)
Live preview of field stripping
See how your field selection affects actual session data.
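A minimal sketch of the stripping step, using an illustrative allow-list of field names rather than the lab's exact categories:

```python
# Minimal sketch of field stripping: keep only allow-listed keys on each entry.
# The field names below are illustrative, not the lab's actual selection.
KEEP_FIELDS = {"type", "role", "content", "timestamp"}

def strip_fields(entry: dict) -> dict:
    return {k: v for k, v in entry.items() if k in KEEP_FIELDS}

entry = {
    "type": "assistant",
    "content": "Here is the fix...",
    "cwd": "/Users/john/project",      # path field, excluded by default
    "timestamp": "2024-05-01T12:00:00Z",
}
print(strip_fields(entry))  # cwd is dropped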
Step 4 — Review & confirm
Redaction summary
Preview snapshots
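For orientation, here is a rough sketch of how a bundle like donation_bundle.zip could be assembled from sanitized sessions and contributor metadata. The internal layout shown (metadata.json plus one JSON file per session) is an assumption for illustration, not the lab's actual format.

```python
# Rough sketch: package sanitized sessions and metadata into a zip bundle.
# Layout is assumed for illustration only.
import json
import zipfile

def build_bundle(metadata: dict, sessions: list[dict],
                 out_path: str = "donation_bundle.zip") -> None:
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("metadata.json", json.dumps(metadata, indent=2))
        for i, session in enumerate(sessions):
            bundle.writestr(f"sessions/session_{i:03d}.json",
                            json.dumps(session, indent=2))

build_bundle({"contributor": "[REDACTED_EMAIL]", "tool": "claude-code"},
             [{"type": "user", "content": "..."}])
```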
Step 5 — Submit
Submit directly
Sign in with HuggingFace to upload your bundle directly to the dataset.
Or download for manual upload
Only needed if you want to keep a local copy or prefer to upload via git/web UI.
Bundle not built yet.
Download donation_bundle.zip
Manual upload instructions
Option 1: Git workflow
Option 2: HuggingFace Web UI
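If you prefer to script the manual upload instead of using git or the web UI, the huggingface_hub client can push the bundle. The repo_id and path_in_repo below are placeholders; substitute the dataset repository named in the upload instructions.

```python
# Minimal sketch of a scripted manual upload with huggingface_hub.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already logged in, e.g. via `huggingface-cli login`
api.upload_file(
    path_or_fileobj="donation_bundle.zip",
    path_in_repo="donations/donation_bundle.zip",   # placeholder path
    repo_id="your-org/transcript-donations",        # placeholder dataset repo
    repo_type="dataset",
)
```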