How to Extract Data from a Graph Image

A practical four-step method for digitizing data points from chart images, plus fixes for log axes, multi-series plots, and low-resolution scans.

To extract data from a graph image: upload the picture into a digitizer, click each data point, calibrate two known values on each axis, and export as CSV or XLSX. The whole loop takes under five minutes once the chart is clean.

This guide is the long version. Use it when the chart is awkward — log scales, dense scatter, multi-series legends, a phone JPEG — and the result needs to survive peer review.

The four-step method

Every digitizer worth using has four screens: image, points, axes, data. Skip none of them.

  1. Upload the chart image.
  2. Place points on every value you want.
  3. Calibrate the axes with two known coordinates per axis.
  4. Export as CSV or XLSX.

DataFromChart implements exactly these four steps as workflow layers. Order matters: placing points before calibrating axes means a misplaced axis only requires fixing the axis, not re-clicking 40 points.

Step 1: upload a clean chart image

Aim for a PNG at native resolution — gridlines crisp at 100% zoom.

If the source is a PDF, upload it directly — the app lets you pick the page and renders it in your browser. (Optionally pre-render at ~300 DPI for very dense charts.) See our PDF chart guide.

Crop tightly around the plot area. Less caption, title, and legend in the frame makes the axes easier to calibrate. Don’t crop the tick labels.

JPEG compression blurs thin lines; on Kaplan-Meier curves this can shift values by 1–2%. Request the original PNG/SVG where you can, or sharpen the JPEG as a fallback.

Step 2: place points

Click each point you care about — what you’ll actually analyze, not everything.

Line: visible markers, or each gridline intersection if smooth. Scatter: every dot. Bar: top edge.

Zoom in. A point 3 pixels off on a 600-pixel chart is 0.5% error baked in. Use pan and zoom to place each point precisely.
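
The arithmetic behind that figure, as a small sketch. The function name is illustrative and not part of any digitizer.

```python
def placement_error_pct(offset_px: float, axis_length_px: float) -> float:
    """Return a click offset as a percentage of the axis length in pixels."""
    if axis_length_px <= 0:
        raise ValueError("axis_length_px must be positive")
    return 100.0 * offset_px / axis_length_px


# A 3-pixel miss on a plot area that is 600 pixels wide.
print(placement_error_pct(3, 600))   # 0.5
# The same 3-pixel miss when the plot area spans 2400 screen pixels after zooming.
print(placement_error_pct(3, 2400))  # 0.125
```

The same hand tremor costs a quarter as much at 4x zoom, which is the whole case for zooming before you click.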

For multiple series, work one at a time and group points before moving on. Without grouping, a 4-series chart becomes one undifferentiated cloud of (x, y) pairs.
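
One way to keep series apart after export is a long-form table with the series label in its own column. The sketch below uses only the Python standard library; the labels and values are made up for illustration and do not reflect any tool's export format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical clicked points as (series label, x, y) in data units.
points = [
    ("control", 0.0, 1.00),
    ("control", 6.0, 0.82),
    ("treatment", 0.0, 1.00),
    ("treatment", 6.0, 0.91),
]

# Group by series so each curve can be analyzed on its own.
by_series = defaultdict(list)
for label, x, y in points:
    by_series[label].append((x, y))

# Write long-form CSV: one row per point, with the series as a column.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["series", "x", "y"])
for label, xy_pairs in by_series.items():
    for x, y in sorted(xy_pairs):
        writer.writerow([label, x, y])

csv_text = buffer.getvalue()
print(csv_text)
```

With the label stored per row, a 4-series chart can be filtered or pivoted later instead of being re-digitized.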

When you have hundreds of points

Manual clicking doesn’t scale past ~50 points. Use color-based automatic extraction.

In DataFromChart’s WebPlotDigitizer-style picker, select the series color, set a tolerance, and the tool places points along the matching pixels. On a 200-point IPCC temperature anomaly chart, color extraction matched a careful manual trace to within the line thickness.
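
To make the idea concrete, here is a minimal sketch of color-based extraction on an image held as nested lists of RGB tuples. It is an assumption about how such a picker could work (Euclidean distance in RGB, one averaged point per pixel column), not DataFromChart's or WebPlotDigitizer's actual algorithm.

```python
from collections import defaultdict

RGB = tuple[int, int, int]


def extract_by_color(pixels: list[list[RGB]], target: RGB,
                     tolerance: float) -> list[tuple[int, float]]:
    """Return one (column, mean row) pixel coordinate for every column that
    contains pixels within `tolerance` (Euclidean RGB distance) of `target`."""
    rows_by_col = defaultdict(list)
    for row_idx, row in enumerate(pixels):
        for col_idx, (r, g, b) in enumerate(row):
            dist = ((r - target[0]) ** 2 + (g - target[1]) ** 2
                    + (b - target[2]) ** 2) ** 0.5
            if dist <= tolerance:
                rows_by_col[col_idx].append(row_idx)
    # Averaging the matching rows lands the point mid-way through the line thickness.
    return [(col, sum(rows) / len(rows)) for col, rows in sorted(rows_by_col.items())]


WHITE, RED, PINK = (255, 255, 255), (220, 30, 30), (235, 120, 120)
image = [
    [WHITE, WHITE, RED],
    [WHITE, RED,   PINK],   # PINK stands in for an anti-aliased edge pixel
    [RED,   RED,   WHITE],
    [WHITE, WHITE, WHITE],
]
points_px = extract_by_color(image, target=RED, tolerance=40)
print(points_px)  # [(0, 2.0), (1, 1.5), (2, 0.0)]
```

The output is still in pixel coordinates, so it needs the same axis calibration as hand-clicked points. Raising the tolerance far enough to include the pink pixel would move the last point to row 0.5.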

Try it on a chart you actually need. Open the extractor, upload your image, and you’ll have a CSV in under three minutes. No login required for a single export.

Step 3: calibrate the axes

Two known points per axis. More is overkill; fewer is impossible.

X: drop a start line on a known tick, an end line on another. Type the values. Repeat for Y.

The longer the calibration interval, the smaller the percentage error. On a 1990–2020 X axis, calibrating at 1995 and 2000 means a one-pixel error at either calibration point grows to as much as five pixels by 2020, because every value outside the short interval is extrapolated. Calibrate at the leftmost and rightmost visible ticks.
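
A quick numerical check of how interval length affects error. The axis layout (20 pixels per year, 1990 at pixel 100) is an assumption chosen for illustration; the absolute numbers change with the layout, the ratio does not.

```python
def pixel_to_value(px, px_a, val_a, px_b, val_b):
    """Linear map from a pixel position to an axis value via two calibration points."""
    return val_a + (px - px_a) / (px_b - px_a) * (val_b - val_a)


def true_px(year):
    """Assumed layout: 1990 sits at pixel 100 and each year is 20 pixels wide."""
    return 100 + 20 * (year - 1990)


def error_at_2020(year_a, year_b, click_error_px=1.0):
    """Error in years at 2020 when the second calibration click is off by click_error_px."""
    estimate = pixel_to_value(
        true_px(2020),
        true_px(year_a), year_a,
        true_px(year_b) + click_error_px, year_b,
    )
    return abs(estimate - 2020)


narrow = error_at_2020(1995, 2000)
wide = error_at_2020(1990, 2020)
print(f"1995/2000 calibration: {narrow:.3f} years off at 2020")
print(f"1990/2020 calibration: {wide:.3f} years off at 2020")
```

Under these assumptions the short interval is off by about a quarter of a year at 2020, roughly five times the error of the full-width calibration.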

Log axes

Calibrate at two visible powers of ten. The tool needs two real values and their pixel positions, then interpolates between them in log space instead of linear space.

No “log axis” toggle? Calibrate as linear using the exponents (enter 0 and 3 for ticks at 1 and 1000), export, then take 10^value in a spreadsheet. DataFromChart supports both linear and log axes natively.
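
The workaround only gives correct numbers when the linear calibration is done on the exponents, not on the tick values themselves. A sketch with assumed pixel positions (tick 1 at pixel 500, tick 1000 at pixel 200 on a vertical axis):

```python
def linear_map(px, px_a, val_a, px_b, val_b):
    """Linear pixel-to-value mapping through two calibration points."""
    return val_a + (px - px_a) / (px_b - px_a) * (val_b - val_a)


point_px = 350  # a clicked point halfway between the two ticks

# Workaround: calibrate with log10 of the tick values (0 and 3), then exponentiate.
exponent = linear_map(point_px, 500, 0.0, 200, 3.0)
value = 10 ** exponent

# For contrast: calibrating linearly with the raw tick values misreads the axis.
naive = linear_map(point_px, 500, 1.0, 200, 1000.0)

print(exponent)         # 1.5
print(round(value, 2))  # 31.62
print(naive)            # 500.5
```

Halfway up a 1-to-1000 log axis is about 31.6, not 500.5, which is why the exponent step matters.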

Non-orthogonal axes

If the chart is rotated, skewed, or axes aren’t perpendicular (rare in publishing, common in old scans), use four-point calibration — X start, X end, Y start, Y end as four independent lines.
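
One way four-point calibration can be implemented is to treat the two calibration lines as a skewed coordinate basis and decompose each clicked pixel along them. This is a sketch under that assumption, not a description of any particular tool's internals; all names are illustrative.

```python
Point = tuple[float, float]


def make_four_point_mapper(x1_px: Point, x1_val: float, x2_px: Point, x2_val: float,
                           y1_px: Point, y1_val: float, y2_px: Point, y2_val: float):
    """Build a pixel -> (x, y) mapper from two points on the X axis and two on
    the Y axis. The axes may be rotated or skewed, but must not be parallel."""
    ux, uy = x2_px[0] - x1_px[0], x2_px[1] - x1_px[1]  # X-axis direction in pixels
    vx, vy = y2_px[0] - y1_px[0], y2_px[1] - y1_px[1]  # Y-axis direction in pixels
    det = ux * vy - uy * vx
    if det == 0:
        raise ValueError("calibration axes are parallel")

    def to_data(p: Point) -> Point:
        # Solve offset = a*u + c*v by Cramer's rule; `a` is the share along X.
        dx, dy = p[0] - x1_px[0], p[1] - x1_px[1]
        a = (dx * vy - dy * vx) / det
        # Same decomposition from the Y anchor; `b` is the share along Y.
        ex, ey = p[0] - y1_px[0], p[1] - y1_px[1]
        b = (ux * ey - uy * ex) / det
        return (x1_val + a * (x2_val - x1_val), y1_val + b * (y2_val - y1_val))

    return to_data


# Hypothetical scan whose Y axis leans to the right.
to_data = make_four_point_mapper((100, 500), 0.0, (500, 500), 10.0,
                                 (100, 500), 0.0, (200, 100), 100.0)
print(to_data((275, 200)))  # (2.5, 75.0)
```

Because x is read along the direction of the Y axis rather than straight up, a leaning axis no longer biases the result. This handles rotation and skew but not perspective distortion.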

Step 4: export

Every point converts from pixel to real value via linear interpolation:

value = ((point_px - axis_start_px) / (axis_end_px - axis_start_px))
        * (end_value - start_value) + start_value

For log axes, the same formula applies in log space — the tool takes log10, interpolates, then raises 10 to the result.
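
The formula translates directly into a few lines of Python. This sketch reuses the variable names from the formula above; the `log` flag and the example pixel positions are assumptions for illustration.

```python
import math


def pixel_to_value(point_px, axis_start_px, axis_end_px,
                   start_value, end_value, log=False):
    """Convert one pixel coordinate to a data value along a single axis."""
    fraction = (point_px - axis_start_px) / (axis_end_px - axis_start_px)
    if log:
        if start_value <= 0 or end_value <= 0:
            raise ValueError("log axes need positive calibration values")
        # Interpolate between the exponents, then raise 10 to the result.
        lo, hi = math.log10(start_value), math.log10(end_value)
        return 10 ** (lo + fraction * (hi - lo))
    return start_value + fraction * (end_value - start_value)


# Linear X axis: 1990 at pixel 100, 2020 at pixel 600.
print(pixel_to_value(350, 100, 600, 1990, 2020))                 # 2005.0
# Log Y axis: 1 at pixel 500, 1000 at pixel 200 (pixel rows grow downward).
print(round(pixel_to_value(350, 500, 200, 1, 1000, log=True), 2))  # 31.62
```

Note that the Y example works with a start pixel larger than the end pixel; the sign of the denominator takes care of the inverted image coordinates.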

CSV works for most analyses. XLSX is better when shipping the chart with the data — DataFromChart embeds the chart image and axis labels (with units), so the next person can verify visually. See chart screenshot to Excel.

Tricky cases

Most charts go through the four steps without incident. These don’t.

Log axes

Already covered. The key trap is calibrating at non-power-of-ten ticks: don’t. If visible ticks are 1, 10, 100, 1000, calibrate at 1 and 1000. For semi-log vs log-log and ln-vs-log10 confusions, see our log chart extraction guide.

Multi-series charts

One series at a time, label, move on. Common failure: a 5-series chart exported as one column of 200 points with no identifier. Recovery requires re-running the digitization.

Color-based extraction shines here — each color is implicitly a series. Run it once per color and you have labeled groups without manual grouping. For dense scatter, see our scatter plot guide.

Low-resolution scans

Below ~600 pixels wide, points become ambiguous. Gridlines and data lines merge; JPEG blocking introduces step-artifacts the eye reads as data.

Mitigations: zoom the chart in your browser before screenshotting (browsers resample cleanly), or request the original. Otherwise expect 3–5% noise and report it.

Color-based extraction failures

Color picking fails when a series shares its color with axes, gridlines, or text. Lower the tolerance, or crop out the offending areas. It also struggles on anti-aliased lines whose edges blend toward the background: keep the tolerance high enough to catch the blended pixels but low enough to exclude the gridlines.

Stacked bars and area charts

Each “value” is the difference between two visible edges. Click top and bottom of each band, then subtract. Don’t eyeball absolute heights.
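
The subtraction is easy to script once the edges are calibrated. A small sketch, with made-up edge readings:

```python
def band_values(edges: list[float]) -> list[float]:
    """Given calibrated edge values from the baseline upward, return each band's height."""
    return [upper - lower for lower, upper in zip(edges, edges[1:])]


# Hypothetical readings for one stacked bar: baseline, then the top of each band.
edges = [0.0, 12.0, 30.0, 35.0]
print(band_values(edges))  # [12.0, 18.0, 5.0]
```

Each band is read from its own two edges, so the top band is 5 rather than the 35 its absolute height would suggest.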

Accuracy tips

Three things drive accuracy. None are clever.

Source resolution. 300+ DPI behaves; 72 DPI doesn’t. From a PDF, the app’s page render already gives you a crisp image — for an exceptionally dense figure you can pre-render at higher DPI.

Endpoint placement. Calibration error scales with the inverse of the interval. Calibrate at the longest visible interval — leftmost to rightmost on X, top to bottom on Y.

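Value precision. Calibration values typed as “1.0e6” or “1000000” should both be read as the number one million. If “1e6” is stored as text instead, the calibration breaks. Verify that your digitizer parses scientific notation.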
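
If you post-process calibration values or exports yourself, Python's built-in float() already accepts scientific notation. The helper below is a sketch; stripping commas as thousands separators is an assumption that does not hold for locales using a decimal comma.

```python
def parse_axis_value(text: str) -> float:
    """Parse an axis label such as '1e6', '1.0e6', or '1,000,000' into a float.

    Raises ValueError for text that is not a number."""
    cleaned = text.strip().replace(",", "")  # assumes commas are thousands separators
    return float(cleaned)


for label in ["1e6", "1.0e6", "1000000", " 1,000,000 "]:
    print(repr(label), "->", parse_axis_value(label))
```

All four labels come back as the same number, and a non-numeric label raises an error instead of passing through silently as text.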

After exporting, replot the extracted data and overlay it on the original. If the curves diverge by a steady offset or a drift that grows along an axis, it’s a calibration problem, not a clicking problem.

What tool to use

We’re biased — we built DataFromChart. Best fit if you want XLSX with the chart embedded, color extraction without installing anything, and a UI that works on a 13-inch laptop. WebPlotDigitizer is the long-standing reference, still excellent for complex axis types. See our WebPlotDigitizer alternatives roundup.

For academic work — systematic reviews, meta-analyses, dose-response — the tool matters less than methods reporting. See our meta-analysis guide.

Open the extractor, drop in your chart, and you’ll have clean CSV or XLSX in the time it took to read this paragraph. No installation, no account required.

FAQ

Can I extract data from a graph image for free?

Yes. DataFromChart is free for the core workflow. WebPlotDigitizer is free and open source. Most digitizers have a free tier covering single-chart use.

How accurate is data extracted from a chart image?

On a clean source with careful calibration, expect 0.5–2% mean absolute error. Below 600px source width or with sloppy calibration, 5%+ is common. Overlay your data on the original to sanity-check.
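
If the source also prints a few exact values (in a table or in data labels), you can measure the error rather than estimate it. The sketch below normalizes mean absolute error by the axis range; that normalization is an assumption, since the figures above do not say what the percentage is relative to, and the sample numbers are invented.

```python
def mean_absolute_error_pct(extracted, reference, axis_min, axis_max):
    """Mean absolute error between paired values, as a percentage of the axis range."""
    if not extracted or len(extracted) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    span = axis_max - axis_min
    if span == 0:
        raise ValueError("axis range must be non-zero")
    mae = sum(abs(e - r) for e, r in zip(extracted, reference)) / len(extracted)
    return 100.0 * mae / span


# Hypothetical check: three digitized values against values printed in the paper.
extracted = [10.2, 19.7, 30.4]
reference = [10.0, 20.0, 30.0]
print(round(mean_absolute_error_pct(extracted, reference, 0.0, 40.0), 2))  # 0.75
```

Report the number together with how it was normalized, so readers can compare it with other digitization studies.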

What if the chart uses a log scale?

Calibrate at two visible powers of ten and toggle to logarithmic. No log toggle? Calibrate as linear using the exponents (0 and 3 for ticks at 1 and 1000) and raise 10 to each exported value. The result is identical.

Can I extract data from a hand-drawn or sketched chart?

Yes, with caveats. Calibration still works if you can place two known values per axis. Accuracy will be poor (5–10% error) because hand-drawn charts have inconsistent line widths and imprecise ticks.

Can I extract data from a 3D chart?

Not reliably. 3D bar and pie charts project values through perspective, destroying the linear pixel-to-value mapping digitizers rely on. Recreate as 2D if you can.

How do I extract data from a chart in a PDF?

Upload the PDF directly: the app shows a page picker, you choose the page, and it renders to an image in your browser. (For a very dense figure you can optionally pre-render the page at ~300 DPI first, but it isn’t required.) Full walkthrough in our PDF chart guide.

What’s the difference between CSV and XLSX export?

CSV is plain text: pure (x, y) values. XLSX preserves the chart image alongside the data and includes axis labels with units, which makes it easier to verify and share. See chart screenshot to Excel.

How do I cite digitized data in a publication?

Cite the original figure (paper and figure number) as the data source. Cite the digitization tool in methods. For systematic reviews, follow PRISMA reporting — detail in our meta-analysis guide.

Try it on your own chart

Upload an image, click your data points, calibrate the axes, and export CSV. Under three minutes, no login required for a single export.
