Transcript Donation Lab

Build a sanitized, reviewable donation bundle without ever uploading raw logs. Everything runs in your browser until you click submit.

Note: This is a live demo. To submit donations, you need write access to the target repository. Request access or use the manual download option. Anything merged into the dataset may be copied or mirrored. Removing it later may not fully retract it from downstream use.

Your progress and preferences are saved locally in this browser. Nothing is sent to any server until you submit.

Why review your transcripts?

How this can help you

  • See which prompts led to better outcomes
  • Find where you or the agent went in circles
  • Understand how you actually use coding agents
  • Catch secrets or sensitive data you didn't notice

How this can help others

  • Real usage data helps train better models
  • Your tricky bugs help others avoid them
  • Contribute to open research on AI tools
  • Your data influences how agents evolve

Start small: We recommend uploading just 2-3 transcripts first. Check how they look on Hugging Face, confirm that the redaction caught everything, and look for anything you did not anticipate. Then come back for more.

Work in batches: Your review progress is saved in this browser's localStorage. Come back anytime to continue where you left off.

Progress is stored locally in your browser only. If you clear browser data or switch devices, you'll start fresh.

Step 0 — Export locally

Run the export script on your machine to create raw_export.zip. The export stays on your computer; nothing is uploaded until you have reviewed and sanitized it.

Download exporter

python export_transcripts.py --source claude
python export_transcripts.py --source codex
python export_transcripts.py --source opencode

What gets exported?

  • Claude Code
  • Codex CLI
  • OpenCode

New options: Use --dry-run to preview what will be exported, --verbose to see excluded files, or --show-rules to view filtering rules.
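
If you want to preview every source before exporting anything, the commands above can be assembled in a small helper. This is a minimal sketch, not part of the exporter: the function name build_export_command is hypothetical, and it assumes that --dry-run and --verbose can be combined with --source, which the page does not state explicitly.

```python
import sys

# Source names are taken from the export commands listed above.
SOURCES = ("claude", "codex", "opencode")


def build_export_command(source, dry_run=True, verbose=False):
    """Return the argv list for one exporter run (hypothetical helper)."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    cmd = [sys.executable, "export_transcripts.py", "--source", source]
    # Assumption: these flags may be combined with --source.
    if dry_run:
        cmd.append("--dry-run")
    if verbose:
        cmd.append("--verbose")
    return cmd
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains export_transcripts.py. Defaulting to dry_run=True keeps the helper from writing an export unless you ask for one.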

Step 1 — Import your exports

Upload one or more export files. Everything is processed locally in your browser.

  • Claude Code: from export_transcripts.py --source claude
  • Codex CLI: from export_transcripts.py --source codex
  • OpenCode: from export_transcripts.py --source opencode

Loaded sources

No files loaded yet.

Understanding your data

How Claude Code stores data

~/.claude/ contains your session history:

  • history.jsonl — Index of all sessions (prompts, timestamps, project paths)
  • projects/ — Full conversation transcripts as JSONL files

Entry types in session files:

  • user: Your messages to Claude
  • assistant: Claude's responses (text, thinking, tool calls)
  • system: System prompts and context
  • summary: Compressed context from long conversations
  • file-history-snapshot: Snapshots of file state during edits

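
To get a quick feel for what a session file contains, you can tally its entries by type. The sketch below is illustrative only: it assumes each line is a JSON object whose entry type sits under a top-level "type" key, which this page does not confirm, and count_entry_types is a hypothetical name.

```python
import json
from collections import Counter
from pathlib import Path


def count_entry_types(session_path):
    """Count entries per type in one JSONL session file."""
    counts = Counter()
    with Path(session_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                counts["<unparseable>"] += 1
                continue
            if not isinstance(entry, dict):
                counts["<not an object>"] += 1
                continue
            # Assumption: the entry type is stored under a top-level "type" key.
            counts[entry.get("type", "<missing>")] += 1
    return counts
```

A file dominated by file-history-snapshot entries, for example, may be worth a closer look before donating, since those entries record file state.
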
How Codex CLI stores data

~/.codex/ contains:

  • history.jsonl — Prompt history (session_id, timestamp, text)
  • sessions/YYYY/MM/DD/ — Full rollout files per session

Entry types in rollout files:

  • session_meta: Session metadata (cwd, git info, model)
  • response_item: Messages with role (user/assistant) and content

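
The prompt history can be grouped by session to see which sessions are worth opening. This is a rough sketch that assumes the keys in history.jsonl are literally "session_id", "timestamp", and "text", as listed above; the real key names may differ, and prompts_by_session is a hypothetical helper.

```python
import json
from collections import defaultdict


def prompts_by_session(history_lines):
    """Group prompt text by session from an iterable of JSONL lines."""
    sessions = defaultdict(list)
    for line in history_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: keys are named exactly as in the description above.
        sessions[record["session_id"]].append((record["timestamp"], record["text"]))
    # Order each session's prompts by timestamp and keep only the text.
    return {
        sid: [text for _, text in sorted(items, key=lambda item: item[0])]
        for sid, items in sessions.items()
    }
```

Passing an open file object works, because iterating over a file yields its lines.
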
How OpenCode stores data

~/.opencode/ uses SQLite for persistence:

  • *.db / *.sqlite — SQLite database with sessions table
  • Sessions contain messages, metadata, and file changes

Our exporter converts SQLite to JSON for processing.
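
The conversion step can be pictured roughly as follows. This is a sketch under stated assumptions, not the exporter's actual code: only the table name "sessions" comes from the description above, the column layout is unknown and therefore dumped generically, and sessions_to_json is a hypothetical name.

```python
import json
import sqlite3


def sessions_to_json(db_path, table="sessions"):
    """Dump every row of one table as a JSON array of objects."""
    # Table names cannot be bound as SQL parameters, so validate first.
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # rows become mappable by column name
    try:
        rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
        # default=str is a fallback for values json cannot encode (e.g. BLOBs).
        return json.dumps([dict(row) for row in rows], indent=2, default=str)
    finally:
        conn.close()
```

Because the query selects all columns, the output reflects whatever schema the database actually has, so review the result before assuming any particular field is present.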

Step 2 — Sessions

Pick the sessions worth donating. Click any session to view its full content. Scoring runs locally in your browser; no session content is uploaded to compute it.

Tip: For your first donation, try selecting just 2-3 sessions. You can always come back for more after you've seen how they look on Hugging Face.

How scoring works

Sessions are scored locally to help identify potentially valuable transcripts for donation. Higher scores suggest more useful content.

Signal                                         Points      Why it matters
Keywords: error, traceback, stack              +1.5 each   Debugging conversations
Keywords: diff, patch, git, commit             +1.5 each   Code changes
Keywords: test, pytest, npm, yarn, pip         +1.5 each   Build/test tooling
Keywords: tool call, function, stderr, stdout  +1.5 each   Tool use patterns
Length: 400–8000 chars                         +2          Substantive but focused
Length: >8000 chars                            -1          May be too verbose
Length: <120 chars                             -1          Too short to be useful

Scores are heuristic only. Review content before donating — a low-scored session may still be valuable, and high-scored ones may contain sensitive data.
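
The scoring rules above can be approximated in a few lines. This sketch is one reading of the table, not the tool's actual implementation: it assumes each keyword counts once per session, is matched as a case-insensitive substring, and that the 400 and 8000 boundaries are inclusive. The name score_session is hypothetical.

```python
# Keywords from the scoring table, each worth +1.5.
KEYWORDS = (
    "error", "traceback", "stack",
    "diff", "patch", "git", "commit",
    "test", "pytest", "npm", "yarn", "pip",
    "tool call", "function", "stderr", "stdout",
)


def score_session(text):
    """Return a heuristic score for one session's text."""
    lowered = text.lower()
    # Assumption: a keyword counts once, however often it appears.
    score = 1.5 * sum(1 for keyword in KEYWORDS if keyword in lowered)
    length = len(text)
    if 400 <= length <= 8000:
        score += 2  # substantive but focused
    elif length > 8000:
        score -= 1  # may be too verbose
    elif length < 120:
        score -= 1  # too short to be useful
    return score
```

Substring matching is deliberately crude: "pytest" also matches "test", and "git" matches inside "digit". That fits the note above that scores are only a heuristic and do not replace reviewing the content.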

Step 3 — Redact & Filter

Click through each session to review how it will look after redaction. Adjust settings on the left — changes apply instantly.

No sessions selected. Go back to Step 2 and select sessions to review.

Step 4 — Confirm & Build

Summary

Sessions to donate: 0
Total redactions: 0
Fields included: 0

Step 5 — Submit

Contributor Info

This info is included in your donation bundle and will be publicly associated with your contribution.

Required. Sign in with Hugging Face below to auto-fill.
How others may use your donated transcripts.
Whether AI models may train on your data.

Sign in & Submit

Opens Hugging Face login in a new window

Or download for manual upload

Only needed if you want to keep a local copy or prefer to upload via git/web UI.

Bundle not built yet. Go to Step 4 to build your bundle first.

Manual upload instructions

Option 1: Git workflow

Option 2: Hugging Face Web UI