Transcript Donation Lab
Build a sanitized, reviewable donation bundle without ever uploading raw logs. Everything runs in your browser until you click submit.
Why review your transcripts?
For yourself
- Learn from patterns — See which prompts led to better outcomes
- Spot inefficiencies — Find where you or the agent went in circles
- Improve your workflow — Understand how you actually use coding agents
- Audit for leaks — Catch secrets or sensitive data you didn't notice
For everyone
- Improve future agents — Real usage data helps train better models
- Document edge cases — Your tricky bugs help others avoid them
- Build shared knowledge — Contribute to open research on AI tools
- Shape the ecosystem — Your data influences how agents evolve
Step 0 — Export locally
Run the export script on your machine to create raw_export.zip. The export stays on your computer; nothing is uploaded until you have reviewed and sanitized it.
Download exporter
python export_transcripts.py --source claude
python export_transcripts.py --source codex
python export_transcripts.py --source opencode
What gets exported?
Claude Code
- Session transcripts from ~/.claude/projects/
- Excludes: settings, todos, commands, MCP config
- How Claude Code stores sessions →
Codex CLI
- History from ~/.codex/history.jsonl
- Excludes: config.toml, auth.json (credentials), AGENTS.md
- Codex CLI config docs →
OpenCode
- Sessions from SQLite database in ~/.opencode/
- Automatically converted to JSON format
- OpenCode documentation →
Use --dry-run to preview what will be exported, --verbose to see excluded files, or --show-rules to view the filtering rules.
Step 1 — Import your exports
Upload one or more export files. Everything is processed locally in your browser.
python export_transcripts.py --source claude
python export_transcripts.py --source codex
python export_transcripts.py --source opencode
Loaded sources
View file tree
Understanding your data
How Claude Code stores data
~/.claude/ contains your session history:
- history.jsonl — Index of all sessions (prompts, timestamps, project paths)
- projects/ — Full conversation transcripts as JSONL files
Entry types in session files:
| Entry type | Description |
|---|---|
| user | Your messages to Claude |
| assistant | Claude's responses (text, thinking, tool calls) |
| system | System prompts and context |
| summary | Compressed context from long conversations |
| file-history-snapshot | Snapshots of file state during edits |
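As a rough illustration, here is a minimal Python sketch that tallies the entry types in a single session transcript. It assumes each JSONL line is an object with a top-level type field, as described above; the exact schema may vary between Claude Code versions.

```python
# Minimal sketch: tally entry types in one Claude Code session transcript.
import json
from collections import Counter
from pathlib import Path

def entry_type_counts(session_path: Path) -> Counter:
    counts = Counter()
    with session_path.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            counts[entry.get("type", "unknown")] += 1
    return counts

# Example: inspect the first transcript found under ~/.claude/projects/
projects = Path.home() / ".claude" / "projects"
for session_file in sorted(projects.rglob("*.jsonl"))[:1]:
    print(session_file, dict(entry_type_counts(session_file)))
```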
How Codex CLI stores data
~/.codex/ contains:
- history.jsonl — Prompt history (session_id, timestamp, text)
- sessions/YYYY/MM/DD/ — Full rollout files per session
Entry types in rollout files:
| Entry type | Description |
|---|---|
| session_meta | Session metadata (cwd, git info, model) |
| response_item | Messages with role (user/assistant) and content |
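A similar sketch for Codex CLI lists the most recent prompts from history.jsonl. The session_id, timestamp, and text keys follow the description above; treat the exact field names as assumptions that may differ by Codex version.

```python
# Minimal sketch: list recent prompts from the Codex CLI history file.
import json
from pathlib import Path

history = Path.home() / ".codex" / "history.jsonl"
entries = [json.loads(line) for line in history.read_text().splitlines() if line.strip()]

for entry in entries[-5:]:  # last five prompts
    print(entry.get("session_id"), entry.get("timestamp"), entry.get("text", "")[:80])
```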
How OpenCode stores data
~/.opencode/ uses SQLite for persistence:
- *.db / *.sqlite — SQLite database with sessions table
- Sessions contain messages, metadata, and file changes
Our exporter converts SQLite to JSON for processing.
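A minimal sketch of that conversion is below. It assumes only that a sessions table exists, as noted above; column names are read from the cursor description rather than hard-coded.

```python
# Minimal sketch: dump the OpenCode sessions table to JSON, similar in spirit
# to what the exporter does. Columns are discovered at runtime.
import json
import sqlite3
from pathlib import Path

def sessions_to_json(db_path: Path) -> list[dict]:
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute("SELECT * FROM sessions")
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        conn.close()

# Example usage against the first database found under ~/.opencode/
opencode_dir = Path.home() / ".opencode"
db_files = sorted(opencode_dir.glob("*.db")) + sorted(opencode_dir.glob("*.sqlite"))
for db in db_files[:1]:
    print(json.dumps(sessions_to_json(db)[:1], indent=2, default=str))
```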
Step 2 — Sessions
Pick the sessions worth donating. Click any session to view its full content. We only score locally.
How scoring works
Sessions are scored locally to help identify potentially valuable transcripts for donation. Higher scores suggest more useful content.
| Signal | Points | Why it matters |
|---|---|---|
| Keywords: error, traceback, stack | +1.5 each | Debugging conversations |
| Keywords: diff, patch, git, commit | +1.5 each | Code changes |
| Keywords: test, pytest, npm, yarn, pip | +1.5 each | Build/test tooling |
| Keywords: tool call, function, stderr, stdout | +1.5 each | Tool use patterns |
| Length: 400–8000 chars | +2 | Substantive but focused |
| Length: >8000 chars | -1 | May be too verbose |
| Length: <120 chars | -1 | Too short to be useful |
Scores are heuristic only. Review content before donating — a low-scored session may still be valuable, and high-scored ones may contain sensitive data.
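A minimal sketch of this heuristic is shown below. It assumes each keyword counts at most once per session (the table does not say whether repeated occurrences stack), so treat it as an approximation of the in-browser scorer rather than its exact logic.

```python
# Minimal sketch of the local scoring heuristic described in the table above.
KEYWORD_GROUPS = [
    ["error", "traceback", "stack"],                 # debugging conversations
    ["diff", "patch", "git", "commit"],              # code changes
    ["test", "pytest", "npm", "yarn", "pip"],        # build/test tooling
    ["tool call", "function", "stderr", "stdout"],   # tool use patterns
]

def score_session(text: str) -> float:
    lowered = text.lower()
    score = 0.0
    for group in KEYWORD_GROUPS:
        score += 1.5 * sum(1 for kw in group if kw in lowered)
    n = len(text)
    if n < 120:
        score -= 1        # too short to be useful
    elif n > 8000:
        score -= 1        # may be too verbose
    elif n >= 400:
        score += 2        # substantive but focused
    return score

print(score_session("pytest failed with a traceback after the last git commit"))
```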
Step 3 — Configure redaction
Contributor metadata
Redaction rules
Select what to automatically redact. Matched content will be replaced with placeholders like [REDACTED_EMAIL].
- sk-abc123... → [REDACTED_API_KEY]
- john@email.com → [REDACTED_EMAIL]
- /Users/john/code/ → [REDACTED_PATH]/code/
Use if code may contain proprietary content.
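For illustration, here is a minimal regex sketch of substitutions like those above. The patterns are simplified stand-ins, not the lab's actual redaction rules, which cover more cases.

```python
# Minimal regex-based sketch of the redaction pass illustrated above.
import re

RULES = [
    (re.compile(r"sk-[A-Za-z0-9]{8,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"/(?:Users|home)/[^/\s]+"), "[REDACTED_PATH]"),
]

def redact(text: str) -> str:
    for pattern, placeholder in RULES:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Key sk-abc123def456, mail john@email.com, repo /Users/john/code/"))
```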
Field selection — Choose which data fields to include
Select which fields to keep in your donation. Essential fields cannot be removed. Path-related fields are excluded by default to protect privacy.
Essential (always included)
Recommended (included by default)
Optional (you choose)
Path/Privacy fields (excluded by default)
Always removed (cannot include)
Live preview of field stripping
See how your field selection affects actual session data.
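A minimal sketch of the stripping step, using an illustrative allow-list of field names rather than the lab's exact categories:

```python
# Minimal sketch of field stripping: keep only allow-listed keys on each entry.
# The field names below are illustrative, not the lab's actual selection.
KEEP_FIELDS = {"type", "role", "content", "timestamp"}

def strip_fields(entry: dict) -> dict:
    return {k: v for k, v in entry.items() if k in KEEP_FIELDS}

entry = {
    "type": "assistant",
    "content": "Here is the fix...",
    "cwd": "/Users/john/project",      # path field, excluded by default
    "timestamp": "2024-05-01T12:00:00Z",
}
print(strip_fields(entry))  # cwd is dropped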
Step 4 — Review & confirm
Redaction summary
Preview snapshots
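For orientation, here is a rough sketch of how a bundle like donation_bundle.zip could be assembled from sanitized sessions and contributor metadata. The internal layout shown (metadata.json plus one JSON file per session) is an assumption for illustration, not the lab's actual format.

```python
# Rough sketch: package sanitized sessions and metadata into a zip bundle.
# Layout is assumed for illustration only.
import json
import zipfile

def build_bundle(metadata: dict, sessions: list[dict],
                 out_path: str = "donation_bundle.zip") -> None:
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("metadata.json", json.dumps(metadata, indent=2))
        for i, session in enumerate(sessions):
            bundle.writestr(f"sessions/session_{i:03d}.json",
                            json.dumps(session, indent=2))

build_bundle({"contributor": "[REDACTED_EMAIL]", "tool": "claude-code"},
             [{"type": "user", "content": "..."}])
```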
Step 5 — Submit
Submit directly
Sign in with HuggingFace to upload your bundle directly to the dataset.
Or download for manual upload
Only needed if you want to keep a local copy or prefer to upload via git/web UI.
Bundle not built yet.
Download donation_bundle.zip
Manual upload instructions
Option 1: Git workflow
Option 2: HuggingFace Web UI
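If you prefer to script the manual upload instead of using git or the web UI, the huggingface_hub client can push the bundle. The repo_id and path_in_repo below are placeholders; substitute the dataset repository named in the upload instructions.

```python
# Minimal sketch of a scripted manual upload with huggingface_hub.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already logged in, e.g. via `huggingface-cli login`
api.upload_file(
    path_or_fileobj="donation_bundle.zip",
    path_in_repo="donations/donation_bundle.zip",   # placeholder path
    repo_id="your-org/transcript-donations",        # placeholder dataset repo
    repo_type="dataset",
)
```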