Workshop three of five. You’ve done a bar chart and a multi-series line by clicking points. Here’s where clicking stops scaling — 250 points across three clusters — and the color-based auto-extraction that handles it in 90 seconds.
The practice chart

Open this chart in DataFromChart →
Three distinct clusters: red top-left (~80 points), blue center-bottom (~90), green right (~80). Both axes run 0-100 on a linear scale (no log), and the colors are distinct with light alpha blending.
Target: all 250 points, separated by cluster.
Why clicking won’t work here
Manual clicking takes 2-3 seconds per point. 250 × 2.5 s = 625 s, a little over 10 minutes, and you’ll lose track of which dots you’ve already hit.
It’s also where AI extraction reliably fails. ChatGPT and Claude either refuse on dense scatters or return only 10-15 “representative” points; Gemini caps at ~30% coverage.
Color-based auto-extraction bridges the gap: sub-1% error, full coverage, 90 seconds.
How color-based extraction works
Pick a color. The tool scans every pixel, snaps a point at every pixel matching within a tolerance you set, and returns a cluster. Repeat per series.
Near-deterministic — no model in the loop. The only variable is tolerance, with a slider and visual preview.
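A minimal sketch of the idea in Python, not DataFromChart's actual implementation. It assumes that matching means Euclidean distance in RGB, and that matched pixels are grouped into connected blobs with one point per blob at the centroid. The function names and the synthetic test image are made up for illustration.

```python
from collections import deque

import numpy as np


def color_mask(image, target_rgb, tolerance):
    """Boolean mask of pixels whose RGB distance to target_rgb is <= tolerance."""
    diff = image.astype(float) - np.asarray(target_rgb, dtype=float)
    return np.sqrt((diff ** 2).sum(axis=-1)) <= tolerance


def blob_centroids(mask):
    """Group matched pixels into 4-connected blobs; return one (x, y) pixel centroid per blob."""
    height, width = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    centroids = []
    for row, col in zip(*np.nonzero(mask)):
        if seen[row, col]:
            continue
        # Breadth-first flood fill over the blob containing (row, col).
        seen[row, col] = True
        queue = deque([(row, col)])
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < height and 0 <= nc < width and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        rows, cols = zip(*pixels)
        centroids.append((float(np.mean(cols)), float(np.mean(rows))))
    return centroids


# Synthetic 20x20 image (assumption): white background, two red dots, one blue dot.
image = np.full((20, 20, 3), 255, dtype=np.uint8)
image[2:5, 2:5] = (220, 40, 40)      # red dot centred at pixel x=3, y=3
image[10:13, 14:17] = (220, 40, 40)  # red dot centred at pixel x=15, y=11
image[15:18, 5:8] = (40, 40, 220)    # blue dot, should not match red

red_points = blob_centroids(color_mask(image, (220, 40, 40), tolerance=30))
print(red_points)
```

One limitation of this sketch: dots that touch or overlap merge into a single blob and come back as one point, so a real tool likely needs extra logic for dense regions.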
Step 1: open the chart and switch to color extraction
Open the chart. In the POINTS layer, find the “Color extract” / “Auto-extract” mode toggle near the manual-click tool. Switch to it.
Step 2: pick the first cluster color
Click one of the red dots. The tool reads the pixel color and selects every pixel within tolerance; a preview overlay shows which matched.
Bleeds into other clusters or the background? Tighten. Misses obvious red dots? Loosen.
The three clusters are well-separated in color space, so the default usually works. Adjust until red covers the red cluster cleanly with no bleed.
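To see why well-separated colors leave room for tolerance, you can compare the picked colors pairwise. The sketch below assumes tolerance is a radius in RGB space (the tool may define it differently), and the RGB values are hypothetical placeholders rather than readings from the practice chart. Half the smallest pairwise distance is an upper bound: below it, two colors' match regions cannot overlap.

```python
import itertools
import math

# Hypothetical RGB values; read the real ones from your own chart.
picked = {
    "red": (214, 39, 40),
    "blue": (31, 119, 180),
    "green": (44, 160, 44),
    "background": (255, 255, 255),
}


def max_safe_tolerance(colors):
    """Half the smallest pairwise RGB distance between any two colors."""
    closest = min(
        math.dist(a, b) for a, b in itertools.combinations(colors.values(), 2)
    )
    return closest / 2


print(round(max_safe_tolerance(picked), 1))
```

Treat the result as a ceiling, not a recommendation. Alpha blending pulls edge pixels toward the background or a neighboring cluster, so the usable tolerance is typically lower, and the preview is the final check.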
Step 3: snap and name
Click “snap points” (DataFromChart calls it “extract series”). The pixels become real points — typically ~80 in the top-left.
Name this group “Cluster 1 - red” so the export keeps it separated.
Step 4: repeat for the other two clusters
Same process for blue and green. Three rounds of color-pick → preview → snap → name, 20-30 seconds each. End: three named series, ~250 points.
Step 5: calibrate the axes
Both axes are 0-100. Drag the y-calibration lines to the 0 and 100 gridlines, enter the values, repeat for x. Two pairs of clicks.
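Under the hood, calibration on a linear axis is a two-point linear map from pixel position to data value. A small sketch follows, with hypothetical pixel positions for the 0 and 100 gridlines; `make_axis_mapper` is an illustrative name, not a DataFromChart function.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value on a linear axis."""
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


# Hypothetical pixel positions of the 0 and 100 gridlines.
to_x = make_axis_mapper(60, 0.0, 560, 100.0)
to_y = make_axis_mapper(420, 0.0, 20, 100.0)  # image y grows downward, so the scale is negative

print(to_x(310), to_y(220))
```

Because the map is applied to stored pixel positions, it works the same whether the points were snapped before or after the calibration lines were placed.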
Step 6: export
XLSX or CSV with cluster names preserved. Long format with cluster as a column:
cluster,x,y
red,11.27,70.94
red,12.61,62.60
...
blue,42.13,30.45
...
green,72.51,55.67
Answer key
Each cluster came from a 2D normal distribution with known parameters:
| Cluster | Mean (x, y) | Std dev (x, y) | N points |
|---|---|---|---|
| Red | (25, 70) | (6, 6) | 80 |
| Blue | (55, 35) | (8, 6) | 90 |
| Green | (80, 60) | (5, 8) | 80 |
Compute the mean (x, y) of each extracted cluster and compare. Centers should match within ±1 unit on each axis; standard deviations within ±1.
If your cluster centers are off by more than 2 units, tolerance was too loose and you grabbed pixels from neighbors. Re-run with tighter tolerance.
The full 250-point ground truth lives in our benchmark repo if you want to compute true MAE.
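The comparison can be scripted. The sketch below assumes the export has the `cluster,x,y` header shown earlier and that the cluster labels are `red`, `blue`, and `green`; if you named your series differently, adjust the keys. The path passed to `check_file` is whatever you saved the export as.

```python
import csv
import statistics
from collections import defaultdict

# Answer key from the table above: (mean_x, mean_y, std_x, std_y).
ANSWER_KEY = {
    "red": (25, 70, 6, 6),
    "blue": (55, 35, 8, 6),
    "green": (80, 60, 5, 8),
}


def summarize(rows):
    """Per-cluster (mean_x, mean_y, std_x, std_y) from rows with cluster, x, y keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append((float(row["x"]), float(row["y"])))
    summary = {}
    for name, points in groups.items():
        xs, ys = zip(*points)
        # Sample standard deviation; needs at least two points per cluster.
        summary[name] = (
            statistics.mean(xs), statistics.mean(ys),
            statistics.stdev(xs), statistics.stdev(ys),
        )
    return summary


def check(summary, key=ANSWER_KEY, tolerance=1.0):
    """Map each cluster to True if all four statistics are within tolerance of the key."""
    return {
        name: all(abs(got - want) <= tolerance
                  for got, want in zip(summary[name], key[name]))
        for name in key if name in summary
    }


def check_file(path):
    """Run the check on an exported CSV file."""
    with open(path, newline="") as handle:
        return check(summarize(csv.DictReader(handle)))
```

A cluster that comes back `False` is the one to re-extract with a tighter tolerance.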
Common mistakes
- Tolerance too loose, picking up background. A loose tolerance matches gridlines and the off-white background, adding thousands of phantom points. A regular grid pattern in the preview means tighten.
- Bleeding into adjacent clusters. Alpha blending shades color edges. Red tolerance should not pick up blue, even partially. Check the preview before snapping.
- Forgetting to re-pick color per cluster. The tool keeps the last color. Clicking “snap” three times without re-picking gives the same cluster three times.
- Worrying about calibration order. Calibrating before or after snapping makes no difference, because the axes apply to whatever points exist. Just don’t calibrate twice.
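Two of the mistakes above leave a signature in the exported points, so a quick script can flag them. This is a rough sketch with hypothetical thresholds: many points sharing one x or y value suggests gridline pickup, and two series with identical point sets suggest the color was not re-picked.

```python
from collections import Counter


def gridline_suspects(points, min_run=10, precision=0):
    """Return x and y values shared by at least min_run points (possible gridline pickup)."""
    xs = Counter(round(x, precision) for x, _ in points)
    ys = Counter(round(y, precision) for _, y in points)
    return (
        sorted(value for value, count in xs.items() if count >= min_run),
        sorted(value for value, count in ys.items() if count >= min_run),
    )


def duplicate_series(series):
    """Return pairs of series names whose point sets are identical."""
    names = sorted(series)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if set(series[a]) == set(series[b])
    ]


# Made-up points: twelve sit on the y = 50 line, as a matched gridline would.
suspect = [(float(i), 50.0) for i in range(12)] + [(3.2, 7.9)]
print(gridline_suspects(suspect))
print(duplicate_series({"red": [(1, 2)], "blue": [(1, 2)], "green": [(3, 4)]}))
```

Tune `min_run` and `precision` to your chart; real clusters can legitimately put several points on nearly the same value.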
How this compares to AI
We sent this exact chart to the major vision LLMs in our benchmark:
- ChatGPT (GPT-4o): refused, recommended a specialized tool.
- Claude Sonnet 4.6: refused (“too many points for reliable extraction”).
- Gemini 2.5 Pro: ~80 points at 33% coverage. Centers roughly right; std devs wildly off from sparse sampling.
The calibrated workflow gives all 250 points at sub-1% MAE in 90 seconds. Different category of result.
Dense scatter is where the comparison stops being “AI is faster but less accurate” and becomes “AI doesn’t do this.” For gene expression, particle physics, financial tick data, or weather time series, color-based auto-extraction is the workflow.
Next
- Workshop 4: Log-Scale Chart Without Arithmetic Mistakes — the chart type AI gets most catastrophically wrong.
- Workshop 5: Reconstruct a Kaplan-Meier Curve — survival analysis from a published figure.
- All five workshops + practice datasets — full hub.
Try it on your own chart
Upload an image, click your data points, calibrate the axes, and export CSV. Under three minutes, no login required for a single export.
Open the extractor