How to Use Claude’s Code Execution Sandbox to Build and Test Python Scripts Without Leaving the Chat
Claude runs Python in a sandboxed environment directly in the chat — here’s how to go from raw CSV to polished visualization without leaving the conversation.
Copying code from a chat window, pasting it into your terminal, running it, hitting an error, pasting the traceback back into the chat, and repeating — that’s the workflow nobody talks about when they hype AI coding assistants. Claude’s built-in code execution sandbox kills that loop. You upload a CSV, ask Claude to analyze it, and the code runs right there in the conversation. No local environment setup, no copy-paste roulette.
Anthropic has added code execution to Claude.ai, and the feature has matured into something genuinely useful for data exploration workflows. You get a sandboxed Python runtime supporting the usual data science stack (Pandas, NumPy, Matplotlib, Seaborn) with results and charts appearing directly in the chat. It’s not replacing your Jupyter setup for production work, but for exploratory analysis and rapid prototyping, it removes a lot of friction.
This tutorial walks you through the full loop: uploading a CSV, prompting Claude to explore and visualize it, iterating on the output, and understanding where the sandbox’s limits actually matter.
What You’ll Actually Achieve
By the end of this, you’ll be able to upload any structured dataset and go from raw CSV to a polished visualization with statistical annotations — all inside a single Claude conversation. You’ll know how to structure prompts to get reliable, executable code on the first try, how to iterate when the output isn’t quite right, and how to work within the sandbox’s constraints without fighting them.
Requirements
You need a Claude.ai account. The code execution feature is available on free and paid tiers, though Pro users get more execution headroom. Alternatively, you can reach similar functionality through the Claude API’s code execution tool. Have a CSV file ready to upload. If you don’t have one handy, download any dataset from Kaggle or export one from Excel. The sandbox handles file sizes typical of exploratory work; just don’t try to process a 2GB log file and expect miracles.
Step 1: Upload Your CSV and Start the Conversation Right
Click the paperclip icon in Claude.ai’s chat input and attach your CSV. Don’t just drop the file and say “analyze this” — that prompt is too open-ended and you’ll get a generic summary that could apply to any dataset. Instead, give Claude the context it needs to write immediately executable code.
I've uploaded a CSV called sales_data.csv. Please execute Python code to:
1. Load the file with pandas and print the first 5 rows
2. Show dtypes and check for null values
3. Print basic descriptive statistics for all numeric columns
Use pandas and display actual output, don't just show me the code.
That last line matters. Without it, Claude will sometimes write out the code block without triggering execution. The phrase “display actual output” signals that you want the sandbox to run, not just generate.
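For reference, the code Claude runs for a prompt like this tends to look something like the sketch below. The inline sample data and its column names (order_id, region, revenue, units_sold, order_date) are made up so the snippet runs anywhere; in the sandbox, Claude would read the uploaded sales_data.csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the uploaded sales_data.csv (columns are assumptions).
SAMPLE_CSV = """order_id,region,revenue,units_sold,order_date
1001,North,1200.50,10,2024-01-05
1002,South,,7,2024-01-06
1003,East,860.00,5,2024-01-06
1004,North,430.25,3,2024-01-08
1005,West,990.00,8,2024-01-09
1006,South,1510.75,12,2024-01-10
"""


def inspect(df: pd.DataFrame) -> dict:
    """Collect the inspection results the prompt asks for."""
    return {
        "head": df.head(5),          # first 5 rows
        "dtypes": df.dtypes,         # column types as pandas inferred them
        "nulls": df.isna().sum(),    # missing values per column
        "stats": df.describe(),      # numeric columns only by default
    }


# In the sandbox this would be pd.read_csv("sales_data.csv").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
report = inspect(df)
for name, result in report.items():
    print(f"--- {name} ---")
    print(result)
```

If the printed dtypes or null counts surprise you, that is the moment to say so, before any charts get built on top of them.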
Pro tip ✅
Start every new analysis session with an explicit data inspection step. If Claude misreads column types — treating a numeric ID as a float, or a date as a string — every downstream prompt will generate subtly broken code. Catching type issues in step one saves you three iterations later.
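A minimal sketch of the kind of type fix this tip has in mind, assuming a hypothetical file where one order_id is missing (which makes pandas read the whole column as float) and dates arrive as plain strings. The column names are illustrative, not taken from any real dataset.

```python
import io

import pandas as pd

# Hypothetical data: the blank order_id forces the column to float64.
csv_text = """order_id,order_date,revenue
1001,2024-01-05,1200.50
1002,2024-01-06,860.00
,2024-01-08,430.25
"""

df = pd.read_csv(io.StringIO(csv_text))
dtypes_before = df.dtypes.copy()

# Nullable integer type keeps IDs as whole numbers and still allows missing values.
df["order_id"] = df["order_id"].astype("Int64")
# Parse date strings into real datetimes so sorting and resampling work.
df["order_date"] = pd.to_datetime(df["order_date"])

print("before:")
print(dtypes_before)
print("after:")
print(df.dtypes)
```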
Step 2: Generate Your First Visualization
Once Claude has loaded the data and confirmed the structure, you’re ready to plot. Be specific about what you want to see, or you’ll get a bar chart of the first column it notices.
Now create a visualization using matplotlib and seaborn. I want:
- A correlation heatmap of all numeric columns
- Figure size 10x8
- Use a diverging colormap (coolwarm)
- Annotate each cell with the correlation value rounded to 2 decimal places
- Title: "Feature Correlations — Sales Data"
Execute and show the output.
Claude will generate the code, execute it in the sandbox, and render the chart directly in the conversation. If something looks off — axis labels overlap, the colormap washes out — you refine in the next message, not by going back to your editor.
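Here is a rough sketch of what the executed code for that heatmap prompt might look like. The DataFrame is synthetic (seeded random numbers with invented column names) so the example is self-contained, and the title uses a plain hyphen; Claude's actual output will differ in the details.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the snippet runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the sales data (column names are assumptions).
rng = np.random.default_rng(0)
units = rng.integers(1, 20, size=50)
df = pd.DataFrame({
    "units_sold": units,
    "revenue": units * 95.0 + rng.normal(0, 40, size=50),
    "discount": rng.uniform(0, 0.3, size=50),
    "region": rng.choice(["North", "South"], size=50),
})

# Correlations only make sense for numeric columns, so drop the rest first.
corr = df.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(
    corr,
    annot=True,        # write the value in each cell
    fmt=".2f",         # rounded to 2 decimal places
    cmap="coolwarm",   # diverging colormap
    vmin=-1, vmax=1, center=0,
    ax=ax,
)
ax.set_title("Feature Correlations - Sales Data")
```

Pinning vmin, vmax, and center is a choice made here, not something the prompt requires; it keeps zero correlation at the neutral midpoint of the colormap.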
The heatmap looks good but the axis labels are rotated awkwardly.
Please:
- Rotate x-axis labels to 45 degrees
- Add tight_layout() to prevent clipping
- Increase font size of the cell annotations to 9pt
Re-execute and show updated output.
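The three tweaks in that follow-up map onto small, specific changes. A sketch under the assumption that the heatmap is drawn with seaborn, using a tiny hand-written DataFrame with hypothetical columns:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small illustrative dataset (values and column names are made up).
df = pd.DataFrame({
    "units_sold": [10, 7, 5, 3, 8, 12],
    "revenue": [1200.50, 700.00, 860.00, 430.25, 990.00, 1510.75],
    "discount_pct": [5, 10, 0, 15, 5, 0],
})
corr = df.corr()

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(
    corr,
    annot=True,
    fmt=".2f",
    cmap="coolwarm",
    annot_kws={"size": 9},  # tweak 3: 9pt cell annotations
    ax=ax,
)
# Tweak 1: rotate x-axis labels 45 degrees, anchored at their right edge.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
# Tweak 2: recompute margins so rotated labels are not clipped.
fig.tight_layout()
```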
Pro tip ✅
When asking for visual tweaks, list every change in one message rather than one per message. The sandbox has execution time limits — typically around 60 seconds per run — and each regeneration of a complex plot eats into your conversation flow. Bundle your refinements.
Step 3: Run Actual Analysis, Not Just Pretty Charts
Visualization is the fun part, but the sandbox earns its keep on the analytical work — aggregations, groupby operations, statistical tests. Here’s a prompt that generates a proper grouped analysis:
Using the sales_data.csv we already loaded, please execute code to:
1. Group by the 'region' column and calculate: sum of revenue, mean of units_sold, count of transactions
2. Sort by revenue descending
3. Create a horizontal bar chart showing revenue by region
4. Add value labels at the end of each bar
5. Use a professional color palette (not default matplotlib blue)
Print the grouped dataframe AND show the chart.
The key here is asking for both the dataframe print AND the chart. You want the raw numbers visible alongside the visualization — if a bar looks unexpectedly short, you can cross-reference immediately rather than wondering whether it’s a data issue or a scaling issue.
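As an illustration, the grouped analysis could be written roughly like this. The six rows of data are invented, and the viridis colors are just one reasonable reading of "professional color palette".

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sales rows; the column names follow the prompt.
df = pd.DataFrame({
    "region": ["North", "South", "East", "North", "West", "South"],
    "revenue": [1200.50, 1510.75, 860.00, 430.25, 990.00, 700.00],
    "units_sold": [10, 12, 5, 3, 8, 7],
})

summary = (
    df.groupby("region")
    .agg(
        total_revenue=("revenue", "sum"),
        mean_units_sold=("units_sold", "mean"),
        transactions=("revenue", "count"),  # counts non-null revenue values
    )
    .sort_values("total_revenue", ascending=False)
)
print(summary)

fig, ax = plt.subplots(figsize=(8, 5))
colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(summary)))
bars = ax.barh(summary.index, summary["total_revenue"], color=colors)
ax.invert_yaxis()  # largest region at the top
# Value labels at the end of each bar, with thousands separators.
labels = ax.bar_label(
    bars,
    labels=[f"{v:,.0f}" for v in summary["total_revenue"]],
    padding=4,
)
ax.set_xlabel("Revenue")
ax.set_title("Revenue by Region")
fig.tight_layout()
```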
Now run a simple linear regression of units_sold on revenue using scipy.stats.linregress.
Print: slope, intercept, r-squared, and p-value.
Then create a scatter plot with the regression line overlaid.
Annotate the chart with the r-squared value in the top-left corner.
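A sketch of the regression step, assuming synthetic data with a known linear relationship so the result is easy to sanity-check. Note that linregress returns the correlation coefficient as rvalue, so r-squared has to be computed by squaring it.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data: units_sold is roughly 0.01 * revenue + 2, plus noise.
rng = np.random.default_rng(42)
revenue = rng.uniform(100, 2000, size=80)
df = pd.DataFrame({
    "revenue": revenue,
    "units_sold": 0.01 * revenue + 2 + rng.normal(0, 1, size=80),
})

# Regress units_sold (y) on revenue (x).
result = stats.linregress(df["revenue"], df["units_sold"])
r_squared = result.rvalue ** 2
print(f"slope={result.slope:.4f}  intercept={result.intercept:.4f}")
print(f"r-squared={r_squared:.4f}  p-value={result.pvalue:.3g}")

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(df["revenue"], df["units_sold"], alpha=0.6)
x_line = np.linspace(df["revenue"].min(), df["revenue"].max(), 100)
ax.plot(x_line, result.intercept + result.slope * x_line, color="crimson")
# Axes-relative coordinates: (0.05, 0.95) is the top-left corner.
note = ax.text(0.05, 0.95, f"R² = {r_squared:.3f}",
               transform=ax.transAxes, va="top")
ax.set_xlabel("revenue")
ax.set_ylabel("units_sold")
```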
Note 💡
The sandbox supports SciPy, scikit-learn (imported as sklearn), and statsmodels for statistical work, not just pandas and matplotlib. If you’re doing anything beyond basic aggregation, call out the specific library you want in the prompt. Claude will default to whatever feels most straightforward, which isn’t always the library that fits your existing codebase.
Step 4: Iterate When the Code Breaks
Code execution in a sandbox means you’ll occasionally hit errors — a column name that doesn’t match, a library function that behaves differently than Claude expected, a timeout on a slow operation. Here’s how to handle each case efficiently.
When you get a KeyError or column-not-found error, paste the exact error message back and ask Claude to fix it rather than re-explaining the entire task:
I got this error when it ran:
KeyError: 'units_sold'
Please check the actual column names in the dataframe (print df.columns) and re-run the analysis with the correct column name.
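Column-name errors usually come down to stray whitespace, capitalization, or a near-miss spelling. One way to make the fix stick is sketched below; the messy header and the resolve_column helper are hypothetical, written for this example rather than taken from any library.

```python
import difflib
import io

import pandas as pd

# Hypothetical export with inconsistent header formatting.
csv_text = "Region, Units Sold ,Revenue\nNorth,10,1200\nSouth,7,700\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # check what the columns are really called


def normalize(name: str) -> str:
    """Trim whitespace, lowercase, and use underscores instead of spaces."""
    return name.strip().lower().replace(" ", "_")


df = df.rename(columns=normalize)


def resolve_column(frame: pd.DataFrame, wanted: str) -> str:
    """Return the exact or closest matching column name, or raise KeyError."""
    if wanted in frame.columns:
        return wanted
    matches = difflib.get_close_matches(wanted, list(frame.columns), n=1, cutoff=0.6)
    if not matches:
        raise KeyError(f"{wanted!r} not found; columns are {list(frame.columns)}")
    return matches[0]


print(df[resolve_column(df, "unit_sold")].sum())
```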
When a chart renders but looks wrong — wrong scale, missing labels, truncated title:
The chart rendered but there are two issues:
1. The y-axis is showing scientific notation (e.g. 1e6) — please format as regular integers with comma separators
2. The chart title is being cut off — add more top padding
Fix both and re-execute.
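Both of those fixes are one-liners in matplotlib. A small sketch with invented revenue figures, large enough that the default formatter would switch to scientific notation:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

# Made-up totals in the millions.
regions = ["North", "South", "East", "West"]
revenue = [1_630_750, 2_210_750, 860_000, 990_000]

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(regions, revenue)

# Fix 1: plain integers with comma separators instead of 1e6-style ticks.
ax.yaxis.set_major_formatter(StrMethodFormatter("{x:,.0f}"))
# Fix 2: extra space between the title and the top of the axes.
ax.set_title("Revenue by Region", pad=20)
fig.tight_layout()

formatter = ax.yaxis.get_major_formatter()
```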
When code runs but produces a result you don’t trust:
The mean revenue of $847 looks suspiciously low. Before proceeding, please:
1. Print the raw revenue column values for the first 20 rows
2. Check if there are any values that look like they might be in thousands vs dollars
3. Print the actual min, max, and median to help me sanity-check
Don't generate any new charts yet — just run the diagnostic checks.
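The diagnostic pass might look like the following sketch. The data is fabricated to include two values that look like they were entered in thousands, and the "100 times smaller than the median" rule is a crude heuristic chosen for illustration, not a standard test.

```python
import pandas as pd

# Hypothetical revenue column with two suspicious entries.
df = pd.DataFrame({"revenue": [1200.0, 950.0, 1.2, 1430.0, 0.9, 1100.0]})
revenue = df["revenue"]

# 1. Eyeball the raw values.
print(revenue.head(20).to_string())

# 2. Summary numbers for a sanity check.
summary = {
    "min": revenue.min(),
    "max": revenue.max(),
    "median": revenue.median(),
    "mean": revenue.mean(),
}
print(summary)

# 3. Flag rows far below the typical value (possible unit mismatch).
threshold = summary["median"] / 100
suspect = df[revenue < threshold]
print(f"{len(suspect)} rows below {threshold:.2f}:")
print(suspect)
```

A mean well below the median, as in this example, is itself a hint that a few tiny values are dragging the average down.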
Warning ⚠️
The sandbox doesn’t persist data between separate conversations. If you start a new chat, Claude loses access to the uploaded file and any variables it computed. Keep your analysis in a single conversation thread, or re-upload the file at the start of each session. This is the single most common source of “it just stopped working” frustration.
Step 5: Build a Reusable Analysis Template
Once you’ve gone through a few iterations, ask Claude to package the working code into a clean, reusable script you can run locally or drop into a notebook:
Take all the analysis we've run in this conversation and produce a clean, standalone Python script that:
1. Accepts a CSV file path as a command-line argument (use argparse)
2. Runs all the analysis steps we did: data inspection, correlation heatmap, grouped revenue chart, regression scatter plot
3. Saves all charts as PNG files in an output/ directory
4. Prints a summary of key stats to the console
5. Includes comments explaining each section
Make the code production-quality — proper error handling, no hardcoded column names where possible.
This is where the sandbox workflow pays off in a way that’s hard to replicate in a standard code-only chat. Every function in that final script has already been tested against your actual data. The column names are correct. The chart parameters work. Claude isn’t writing hypothetical code — it’s cleaning up code it already ran.
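To give a sense of the shape such a script takes, here is a trimmed-down skeleton covering only the inspection and heatmap steps. The --output-dir option, function names, and file name are assumptions made for this sketch; the grouped chart and regression plot would slot in as additional functions.

```python
import argparse
import sys
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render to files without a display
import matplotlib.pyplot as plt
import pandas as pd


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Exploratory analysis of a CSV file.")
    parser.add_argument("csv_path", type=Path, help="path to the input CSV")
    parser.add_argument("--output-dir", type=Path, default=Path("output"),
                        help="directory for PNG charts (default: ./output)")
    return parser


def save_correlation_heatmap(df: pd.DataFrame, out_path: Path) -> None:
    """Plot correlations between numeric columns and save the figure as a PNG."""
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(10, 8))
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax)
    ax.set_title("Feature Correlations")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)


def main(argv=None) -> int:
    """Run the analysis; return 0 on success and 1 on a handled error."""
    args = build_parser().parse_args(argv)
    if not args.csv_path.is_file():
        print(f"error: {args.csv_path} is not a file", file=sys.stderr)
        return 1
    try:
        df = pd.read_csv(args.csv_path)
    except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        print(f"error: could not parse {args.csv_path}: {exc}", file=sys.stderr)
        return 1

    # Work from whatever numeric columns exist instead of hardcoding names.
    numeric = df.select_dtypes(include="number")
    if numeric.shape[1] < 2:
        print("error: need at least two numeric columns", file=sys.stderr)
        return 1

    args.output_dir.mkdir(parents=True, exist_ok=True)
    save_correlation_heatmap(df, args.output_dir / "correlation_heatmap.png")
    print(numeric.describe())
    return 0


# In a real script, finish with:
# if __name__ == "__main__":
#     sys.exit(main())
```

Returning an exit code from main rather than calling sys.exit inside it keeps the function easy to call from a notebook or a test.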
Pro tip ✅
Ask Claude to add a --dry-run flag to the generated script that prints what it would do without actually writing files. It takes 10 seconds to add in the prompt and saves the “wait, it overwrote my charts” moment later.
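One possible way to wire that flag up, as a sketch: route every file write through a single helper that checks the flag. The write_output helper and its signature are hypothetical, invented for this example.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Analysis script with a dry-run mode.")
    parser.add_argument("csv_path", type=Path, help="path to the input CSV")
    parser.add_argument("--dry-run", action="store_true",
                        help="print what would be written without touching the disk")
    return parser


def write_output(path: Path, data: bytes, dry_run: bool) -> bool:
    """Write data to path unless dry_run is set; return True if a file was written."""
    if dry_run:
        print(f"[dry-run] would write {len(data)} bytes to {path}")
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True


args = build_parser().parse_args(["sales_data.csv", "--dry-run"])
write_output(Path("output") / "heatmap.png", b"placeholder", dry_run=args.dry_run)
```

Funnelling writes through one function means the flag cannot be forgotten in one code path and honored in another.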
Step 6: Know Your Sandbox Limits
The execution environment is sandboxed and isolated — it can’t access the internet, write to your local filesystem, or access system-level resources. Scripts have execution time limits, so anything computationally heavy (training a neural network, processing millions of rows) will hit a timeout. This is a feature for exploratory work, not a replacement for your local machine or a cloud compute instance.
A few practical boundaries to keep in mind: avoid loading multiple large files in the same session as memory is constrained; matplotlib plots render at screen resolution so don’t expect print-quality output; and if you’re working with sensitive data, consider whether uploading a full production dataset to a cloud environment is the right call versus using anonymized or sampled data.
Avoid 🚫
Don’t upload files containing real PII, credentials, or proprietary business data to Claude.ai for execution. Use anonymized or synthetic datasets for exploratory analysis. If your workflow requires sensitive data, use the API with appropriate data handling agreements in place, or keep execution local.
I need to analyze a large dataset but I'm worried about timeouts.
Please write code that:
1. Loads only the first 10,000 rows using nrows=10000 in pd.read_csv()
2. Runs the full analysis pipeline on that sample
3. Includes a comment noting where to remove the nrows limit for full-dataset runs
This is for testing the pipeline before I run it locally on the full file.
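The sampling pattern itself is short. In this sketch the 25,000-row "file" is generated in memory and the load_sales helper is a made-up name; the point is the single SAMPLE_ROWS constant that controls how much gets read.

```python
import io

import pandas as pd

# Set to None (or drop the nrows argument) for the full-dataset run.
SAMPLE_ROWS = 10_000


def load_sales(source, nrows=SAMPLE_ROWS) -> pd.DataFrame:
    """Read at most nrows data rows; nrows=None reads the whole file."""
    return pd.read_csv(source, nrows=nrows)


# Hypothetical 25,000-row CSV built in memory for the demonstration.
csv_text = "order_id,revenue\n" + "\n".join(
    f"{i},{i * 1.5}" for i in range(25_000)
)

sample = load_sales(io.StringIO(csv_text))
full = load_sales(io.StringIO(csv_text), nrows=None)
print(len(sample), len(full))  # prints: 10000 25000
```

Keep in mind that nrows takes the first rows of the file, so if the data is sorted by date or region, the sample will not be representative of the whole.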
Pro tip ✅
Sample first, always. Load 1,000–10,000 rows in the sandbox to validate your analysis logic, then run the full dataset locally with the exported script. You get fast iteration in Claude and full data fidelity on your own machine. Best of both.
Sandbox vs. Jupyter: When to Use Which
Claude’s sandbox wins on speed of setup, iteration on simple exploratory tasks, and getting from zero to “I understand this dataset” in under 10 minutes. Jupyter wins on everything else: persistent state across sessions, version control integration, custom library installations, offline work, and anything that runs longer than the sandbox’s execution time limit.
The honest take: these tools aren’t competing. Use Claude’s sandbox for the first 20 minutes of any new dataset — understand the shape, spot anomalies, figure out what questions are worth asking. Then move to a proper notebook for the analysis you actually ship.
What This Means for Your Workflow
The code execution sandbox changes Claude from a code-writing assistant into something closer to a pair programmer who can actually run what they write. That’s not a small distinction. The iteration loop — generate, execute, check output, refine — now happens inside a single tool, which is where the real time savings accumulate.
The workflow is: upload CSV, run inspection prompts, generate visualizations, iterate on specifics, export clean script. Each step builds on real execution results, not theoretical code. If you do any kind of regular data analysis and you’re still copying code back and forth between Claude and your terminal, you’re doing it the hard way.


