PDF to Excel

Convert PDF to Excel in seconds and extract tables into an editable spreadsheet. Upload your PDF and download an Excel-friendly file to analyze data, reuse numbers, and edit rows/columns—perfect for invoices, reports, and statements. Fast, secure, and free to use with no registration required.

PDF to Excel Converter

The PDF to Excel Converter extracts tables and structured data from PDF files and converts them into an editable Excel-compatible spreadsheet (XLSX). Upload your PDF, click Convert to Excel, and download a file you can open in Microsoft Excel, Google Sheets, or LibreOffice Calc — with the data in rows and columns, ready to sort, filter, calculate, and reuse.

Converting PDF tables to Excel is one of the most practically valuable PDF operations: it turns static data locked in a document into structured, calculable data in a spreadsheet. The accuracy of the result depends on how the PDF was created and how its tables are structured — well-formatted, bordered tables from text-based PDFs extract reliably; complex, borderless, or scanned tables require cleanup or OCR.

How to use the PDF to Excel Converter

  1. Click Select a File or drag and drop your PDF. Guest users can upload up to 5 files (10 MB each); registered users up to 20 files (40 MB each).
  2. Click Convert to Excel. The tool analyzes the PDF, detects table structures, and extracts the data into rows and columns.
  3. Download the XLSX file and open it in Excel, Google Sheets, or LibreOffice Calc.
  4. Review the output. Check that column values are in the correct columns, numbers are recognized as numeric (not text), and multi-page tables have been correctly joined. See the cleanup section below for specific fixes.
  5. Correct any extraction errors. PDF to Excel conversion almost always produces output that needs at least minor cleanup — particularly for headers, merged cells, and number formatting. The cleanup table below covers the most common issues and their Excel solutions.

Always verify extracted numbers against the original PDF before using them in calculations, reports, or financial analysis. Even a single misread digit from a complex table can propagate into every formula that references it. For any data that will inform decisions or documents, spot-check totals, subtotals, and key figures by comparing the Excel output to the PDF source.
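
The spot-check described above can also be scripted once the values are out of the spreadsheet. The following is a minimal sketch, not part of the converter: the function name `reconcile` and the sample figures are hypothetical, and it assumes you have copied the line items and the total printed in the PDF as plain strings.

```python
from decimal import Decimal


def reconcile(line_items, stated_total, tolerance=Decimal("0.00")):
    """Compare the sum of extracted line items with the total printed in the PDF.

    Returns (matches, difference), where difference = computed - stated.
    """
    computed = sum((Decimal(value) for value in line_items), Decimal("0"))
    difference = computed - Decimal(stated_total)
    return abs(difference) <= tolerance, difference


# Hypothetical values typed in from an extracted invoice column.
items = ["120.50", "75.25", "19.99"]
matches, difference = reconcile(items, "215.74")
print(matches, difference)
```

A non-zero difference does not say which cell is wrong, only that at least one value in the column (or the total itself) should be compared against the PDF.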

How PDF table extraction works — and why it is not perfect

PDF does not store tables as structured data. When you look at a table in a PDF, you are seeing text characters positioned at precise X/Y coordinates on a page — the visual appearance of rows and columns comes from how those characters are spaced and aligned, not from an underlying table structure like the kind Excel uses. There are no cell objects, no row or column identifiers, and no metadata telling the extractor 'this group of text belongs to column 3 of table 2.'

The PDF to Excel converter has to reconstruct table structure from visual cues: it analyzes character positions, looks for whitespace that might indicate column gaps, detects horizontal and vertical lines that might be table borders, and groups text that appears to be in the same row or column. This is an inference process, not a read-and-translate operation. The more visual cues available (clear borders, consistent spacing, regular alignment), the more accurately the extractor can reconstruct the intended structure.
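
To make the idea of inferring columns from whitespace concrete, here is a simplified sketch. It is not the converter's actual algorithm: the function `infer_columns`, the `(x0, x1, text)` word format, and the `min_gap` threshold are all assumptions chosen for illustration.

```python
def infer_columns(words, min_gap=15.0):
    """Group the words of one row into cells wherever the horizontal gap is wide.

    `words` is a list of (x0, x1, text) tuples: left edge, right edge, content.
    """
    cells = []
    current = []
    last_x1 = None
    for x0, x1, text in sorted(words):
        # A gap wider than min_gap is treated as a column boundary.
        if last_x1 is not None and x0 - last_x1 > min_gap:
            cells.append(" ".join(current))
            current = []
        current.append(text)
        last_x1 = x1
    if current:
        cells.append(" ".join(current))
    return cells


# Hypothetical coordinates for one row of a borderless table.
clean_row = [(72.0, 110.0, "Office"), (113.0, 150.0, "chairs"),
             (260.0, 270.0, "4"), (340.0, 380.0, "129.00")]
print(infer_columns(clean_row))  # ['Office chairs', '4', '129.00']

# A long description leaves only a narrow gap before the quantity column.
tight_row = [(72.0, 252.0, "Ergonomic-office-chair-with-armrests"),
             (260.0, 270.0, "4"), (340.0, 380.0, "129.00")]
print(infer_columns(tight_row))  # ['Ergonomic-office-chair-with-armrests 4', '129.00']
```

The second row shows how a value can end up merged into the neighboring column when spacing is inconsistent, which is the kind of error borders help an extractor avoid.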

What affects extraction quality

Factors that improve extraction quality

Factor | Effect on quality | Details
PDF was originally created from Excel | Best case | Spreadsheets exported to PDF from Excel preserve precise column and row alignment. The extraction engine can reliably identify cell boundaries. Results are usually very accurate for straightforward tables.
Visible table borders (ruled grid lines) | Very good | Tables with clear borders around every cell give the extractor clear signals about where each cell begins and ends. Bordered tables from financial reports, data exports, and bank statements typically extract cleanly.
Regular column structure (fixed widths) | Good | Tables where columns are consistently aligned with the same spacing throughout extract more reliably than tables with irregular or variable-width columns.
Single table per page | Good | A PDF page containing one main table is simpler to parse than a page with multiple tables, text columns alongside tables, or footnotes that share the same horizontal space.

Factors that reduce extraction quality

Factor | Effect on quality | Details
Borderless tables (whitespace alignment only) | Variable | Some tables use whitespace rather than grid lines to indicate cell boundaries. Without visible borders, the extractor must infer column positions from text alignment, an unreliable process that often produces shifted columns or merged values.
Merged cells (spanning multiple columns or rows) | Poor | Cells that span multiple columns or rows (common in headers, subtotals, and report summaries) often extract with incorrect cell placement, duplicated content, or merged values that should be in separate cells.
Multi-line column headers | Variable | Column headers that wrap to two or more lines within the header row are frequently split across cells or merged into adjacent columns during extraction.
Tables spanning multiple PDF pages | Variable | Tables that continue across page boundaries are often extracted as separate, disconnected tables in the spreadsheet, with each page's portion appearing as an independent table. Manual concatenation in Excel is needed.
Mixed content on the page (text + tables) | Variable | When a PDF page contains both body text and tables, the extractor may include text paragraphs as rows within the table, or split tables at points where surrounding text is present.
Scanned PDF (image-based) | Requires OCR | Scanned PDFs contain no text data: each page is a photograph. Without OCR, extraction produces no data. With OCR applied, extraction quality depends entirely on scan resolution and image clarity.

For scanned PDFs, OCR (Optical Character Recognition) must be applied before meaningful data extraction is possible. Without OCR, each PDF page is treated as an image — the converter sees pixels, not text, and cannot extract any data. With OCR applied, the accuracy of the resulting data depends on scan quality: 300 DPI or higher, good contrast, flat on scanner glass, printed (not handwritten) text. For financial data extracted via OCR, manually verify all numeric values — digit misreads (1/7, 0/O, 6/8) are common and can be difficult to detect without cross-checking against the original document.
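
Some OCR misreads can be flagged automatically before the manual check. The sketch below is an illustration under stated assumptions, not a feature of the converter: the function `ocr_suspect` and the set of look-alike letters are hypothetical choices. It only catches letters standing in for digits (such as O for 0); digit-for-digit misreads such as 1/7 or 6/8 still look like valid numbers and need to be checked against the original document.

```python
# Letters that OCR commonly confuses with digits (an illustrative, incomplete set).
LOOKALIKES = set("OolIS")


def ocr_suspect(cell):
    """Return True if a cell looks like a number but contains look-alike letters."""
    text = cell.strip()
    has_digit = any(ch.isdigit() for ch in text)
    has_lookalike = any(ch in LOOKALIKES for ch in text)
    # Every character must be a digit, a look-alike letter, or numeric punctuation.
    numeric_shape = all(ch.isdigit() or ch in LOOKALIKES or ch in ".,-" for ch in text)
    return has_digit and has_lookalike and numeric_shape


# Hypothetical cells from an OCR-extracted amount column.
for value in ["104.50", "1O4.50", "2,l50.00", "Office"]:
    print(value, ocr_suspect(value))
```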

Common output problems and how to fix them in Excel

PDF to Excel conversion almost always requires some cleanup. The following table covers the most common issues and their specific Excel solutions:

Problem in the Excel output | How to fix it in Excel
Values in the wrong columns (data shifted left or right) | This is the most common extraction error, caused by whitespace-aligned or borderless tables. Select the affected rows/columns. Use Data → Text to Columns to re-split values, or manually cut and paste cells into the correct columns. For repetitive data, a macro or formula approach may be faster than cell-by-cell correction.
Numbers extracted as text (formulas return errors) | PDF to Excel converters sometimes extract numbers as text strings rather than numeric values. Select the column, use Data → Text to Columns → Finish (with no changes) to force Excel to re-evaluate the cell types. Alternatively, use the VALUE() function to convert: =VALUE(A1). Check for invisible leading spaces that prevent number recognition and use TRIM() to remove them.
Column headers in the wrong rows or split across cells | Multi-line headers frequently extract with the header text split into separate rows or merged into adjacent cells. Manually correct the header row: merge or combine split cells, move header content to a single row, and delete redundant rows. Freeze the header row (View → Freeze Panes → Freeze Top Row) once it is corrected.
Table from multiple pages split into disconnected sections | A table that spans multiple PDF pages often extracts as several separate tables on different rows of the spreadsheet. Identify each table section by its header row. Delete duplicate header rows (the header repeated at the top of each section), then use Ctrl+Down Arrow to jump to the last row of each section and concatenate the sections into one continuous table.
Merged cells breaking pivot tables or sorting | Ranges that contain merged cells cannot be sorted reliably or used as pivot table source data. Select the affected range, go to Home → Merge & Center → Unmerge Cells. Then use Ctrl+G → Special → Blanks → OK → Type the fill value → Ctrl+Enter to fill empty cells left by unmerging. This converts the merged structure into a flat, filterable table.
Extra blank rows or unwanted text between data rows | Text paragraphs from the PDF that were captured alongside the table appear as extra rows. Use Data → Filter → show only non-blank rows, then manually identify and delete rows that contain descriptive text rather than data values. Alternatively, add a helper column with a formula such as =ISNUMBER(B2), where column B should hold numeric values, to flag numeric vs text rows and filter from there.
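
If you prefer to clean the data outside Excel, the blank cells left behind by unmerging can be filled with a few lines of code. This is a sketch under assumptions: the function `fill_down` is hypothetical, rows are assumed to be lists of strings (for example, read from a CSV export of the sheet), and each blank is assumed to belong to the nearest non-empty value above it, which is the usual meaning of a merged label cell but should be confirmed against the PDF.

```python
def fill_down(rows, column):
    """Copy the last non-empty value in `column` into the blank cells below it."""
    filled = []
    last_value = ""
    for row in rows:
        row = list(row)  # copy so the caller's data is not modified
        if row[column].strip():
            last_value = row[column]
        else:
            row[column] = last_value
        filled.append(row)
    return filled


# Hypothetical rows after unmerging a "Region" column.
rows = [
    ["North", "Jan", "100"],
    ["", "Feb", "120"],
    ["South", "Jan", "90"],
    ["", "Feb", "95"],
]
for row in fill_down(rows, column=0):
    print(row)
```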

Common use cases and expected accuracy

Scenario | Expected conversion quality | After conversion
Bank or credit card statement | Usually good to very good. Bank statements are typically clean, bordered tables from consistent templates. Transaction dates, descriptions, and amounts usually extract cleanly into separate columns. | Verify that debit and credit columns are correctly separated. Check that dates are recognized as dates, not text. Totals and balance columns should be verified against the PDF source before using in calculations.
Invoice or purchase order with line items | Usually good. Simple invoice tables with item, quantity, unit price, and total extract reliably from text-based PDFs. Headers and totals may require cleanup. | Verify product codes, descriptions, and amounts are in the correct columns. Running totals and subtotals that appeared as merged header rows may need manual placement. Check that unit prices are numeric.
Annual report or financial statement table | Variable. Financial report tables often have complex headers (multi-line, merged year columns), footnote references, and nested row groups that reduce extraction accuracy. | Treat the conversion as a starting point. Expect to spend time restructuring headers, removing footnote rows, and verifying that all values match the original. Compare totals against the PDF source.
Data export from a database or application | Best case. PDFs created by exporting from a database, accounting system, or business application usually contain clean, bordered, regular tables with consistent formatting. | Minimal cleanup expected. Verify column headers and check that data types (numbers, dates, currency) are correctly interpreted. Remove any page-header repetitions if the data spans multiple PDF pages.
Scanned historical document or printed report | Requires OCR. Without OCR, the output will contain no usable data. With good-quality OCR (300 DPI+, printed text, clean scan), extraction quality ranges from fair to good depending on table structure. | Proofread all extracted values, especially numbers: digit misreads (1 vs 7, 0 vs O, 6 vs 8) are common in OCR output. Do not use OCR-extracted financial figures without manual verification.

When to use PDF to Excel — and when to consider alternatives

PDF to Excel is the right tool when you need to extract and work with tabular data from a PDF and the original spreadsheet source is not available. It is most effective for:

  • Bank statements, card statements, and financial export PDFs with clean, bordered tables.
  • Invoice and purchase order line items from text-based PDFs.
  • Data exports from business applications saved as PDF.
  • Any PDF that was originally created from a spreadsheet or database export.

Consider alternatives when:

  • The original Excel file is available — open the XLSX directly rather than converting the PDF version, which is always less accurate than the source file.
  • The PDF is a scanned image with no text layer — OCR must be applied first, and accuracy for financial figures is not guaranteed.
  • The data volume is large or data accuracy is critical — for important financial, legal, or scientific data, manual data entry with verification may be more reliable than automated extraction from a complex PDF.
  • The PDF contains complex, multi-level header tables — these typically require extensive restructuring and may take as long to clean up as retyping the data would.

Usage limits

Account type | Daily conversions | Max file size | Files per session
Guest | 25 per day | 10 MB per file | Up to 5 files
Registered | 100 per day | 40 MB per file | Up to 20 files
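
If you prepare batches of PDFs in advance, a local pre-check against these limits can save a failed upload. The sketch below is purely illustrative and does not call any ToolsPiNG service: the names `LIMITS` and `check_batch` are hypothetical, and it assumes 1 MB means 1,048,576 bytes, which the table does not specify.

```python
MB = 1024 * 1024  # assumption: the page does not say whether 1 MB is 10^6 or 2^20 bytes

# Per-session limits copied from the usage table above.
LIMITS = {
    "guest": {"max_files": 5, "max_mb": 10},
    "registered": {"max_files": 20, "max_mb": 40},
}


def check_batch(sizes_bytes, account="guest"):
    """Return a list of problems; an empty list means the batch fits the limits."""
    limit = LIMITS[account]
    problems = []
    if len(sizes_bytes) > limit["max_files"]:
        problems.append(f"{len(sizes_bytes)} files exceeds the limit of {limit['max_files']}")
    for index, size in enumerate(sizes_bytes, start=1):
        if size > limit["max_mb"] * MB:
            problems.append(f"file {index} is larger than {limit['max_mb']} MB")
    return problems


# Hypothetical batch: three files, one of them 12 MB.
print(check_batch([2 * MB, 12 * MB, 5 * MB], account="guest"))
```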

Related tools

  • Excel to PDF — convert Excel spreadsheets to PDF for sharing and distribution. The reverse of this tool.
  • PDF to Word — convert PDFs to editable Word documents. Better suited for text-heavy documents with minimal tabular content.
  • PDF to PowerPoint — convert PDFs to editable presentations.
  • Unlock PDF — remove password protection from a PDF before converting.
  • Merge PDF — combine multiple PDFs into one before extracting all tables in a single conversion.

Frequently asked questions

Will all tables in the PDF be extracted accurately?

Accuracy varies significantly by PDF type and table structure. PDFs with clearly bordered tables created from Excel or database exports typically extract with high accuracy. PDFs with borderless, whitespace-aligned tables; complex headers with merged cells; or tables spanning multiple pages typically extract with lower accuracy and require manual cleanup. The extraction quality factors table above lists the specific characteristics that affect accuracy. Always review the output against the original PDF before using the data.

Why are some column values shifted or in the wrong columns?

This is the most common extraction error and occurs because PDF does not store table structure — only character positions. When a table has no visible borders, the extractor determines column boundaries from whitespace gaps between values. If those gaps are inconsistent (varying column widths, values with different lengths that cause the spacing to shift), the extractor may assign a value to the wrong column. The fix is to select the affected rows in Excel and manually cut and paste values into the correct columns. For data that is consistently misaligned, Data → Text to Columns can help re-split values using a specific delimiter.

Numbers extracted from the PDF are showing as text in Excel. How do I fix this?

This happens when the PDF extractor outputs numbers as text strings rather than numeric values. In Excel, select the column containing the text-formatted numbers. Go to Data → Text to Columns → click Finish without changing any settings. This forces Excel to re-evaluate the cells and convert text-format numbers to numeric values. If numbers still do not convert, check for invisible leading spaces: use Find & Replace (Ctrl+H) to replace ' ' (space) with nothing, or apply TRIM() in a helper column. Currency symbols and comma thousand-separators may also prevent numeric recognition — remove them using Find & Replace.

Can I convert a scanned PDF to Excel?

Yes, but OCR is required first. Scanned PDFs store each page as a photograph — there is no text data for the extractor to analyze. When OCR is applied, the converter identifies letter shapes in the image and reconstructs text that can then be extracted. Quality depends on scan resolution (300 DPI or higher recommended), contrast, and font clarity. Printed text from clean, well-lit scans produces the best OCR results. Handwritten text, poor contrast, skewed pages, and low resolution all reduce OCR accuracy. For financial data from scanned documents, manually verify all extracted values — digit misreads are common and may be difficult to spot without careful comparison.

Why is the table split into multiple disconnected sections in Excel?

A table that spans multiple PDF pages is often extracted as separate tables — one section per page — because the page boundary is treated as a natural break. Each page produces its own table section, usually preceded by a repeated column header row. To fix this: identify each section, delete the duplicate header rows (keep only the first), and then concatenate the sections into a single continuous table. Select the rows of the second section, cut them (Ctrl+X), click the first empty row below the first section, and paste (Ctrl+V). Repeat for any additional sections.
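
For long tables, the same cleanup can be scripted. The following is a minimal sketch, assuming the sheet has been read into a list of rows (lists of strings) and that each repeated header row is exactly identical to the first one; the function name `join_sections` is hypothetical. Headers that differ slightly between pages would need a looser comparison.

```python
def join_sections(rows):
    """Keep the first row as the header; drop repeated headers and blank rows."""
    if not rows:
        return []
    header = rows[0]
    joined = [header]
    for row in rows[1:]:
        if row == header:
            continue  # header repeated at the top of a later page
        if not any(cell.strip() for cell in row):
            continue  # empty separator row between page sections
        joined.append(row)
    return joined


# Hypothetical output for a statement that spanned two PDF pages.
extracted = [
    ["Date", "Description", "Amount"],
    ["2024-01-02", "Card payment", "-12.40"],
    ["", "", ""],
    ["Date", "Description", "Amount"],
    ["2024-01-03", "Salary", "2500.00"],
]
for row in join_sections(extracted):
    print(row)
```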

What is the difference between PDF to Excel and PDF to Word for tabular data?

PDF to Excel is specifically optimized for extracting structured tabular data — it identifies rows and columns and places values in corresponding cells, producing a spreadsheet you can sort, filter, and calculate with. PDF to Word extracts all content (text, tables, images) into a Word document where tables are Word table objects that can be edited but are not natively suitable for data analysis. For tabular data you intend to use for calculations, filtering, or data processing, PDF to Excel is the right choice. For tabular data that is part of a larger document you want to edit as a document, PDF to Word may be more appropriate.

The extracted data looks correct but formulas using the values return errors. Why?

This typically means the values were extracted as text, not as numbers. Excel cannot perform arithmetic on text strings, even if they look like numbers. Common causes: leading spaces before the number, numbers stored as text by the extractor, currency symbols or unit labels attached to numeric values, and decimal separators using the wrong format (comma vs period depending on regional settings). Apply TRIM() to remove spaces, VALUE() to convert text to numbers, and SUBSTITUTE() to remove unwanted characters before applying mathematical formulas.
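
The same TRIM / SUBSTITUTE / VALUE sequence can be expressed in code when you process the extracted data outside Excel. This is a sketch under assumptions: the function `to_number` is hypothetical, it is meant only for columns that should be numeric (it would also turn a code such as "A-12" into -12), and it does not handle negatives written in parentheses.

```python
from decimal import Decimal, InvalidOperation


def to_number(text, decimal_sep="."):
    """Convert an extracted cell such as ' $1,234.50 ' to a Decimal, or None."""
    thousands_sep = "," if decimal_sep == "." else "."
    # Keep digits, sign, and separators; this drops spaces, currency symbols, and unit labels.
    kept = "".join(ch for ch in text.strip() if ch.isdigit() or ch in "-.,")
    # Remove the thousands separator, then normalize the decimal separator to a period.
    kept = kept.replace(thousands_sep, "").replace(decimal_sep, ".")
    try:
        return Decimal(kept)
    except InvalidOperation:
        return None  # not a number after cleanup, e.g. 'n/a'


print(to_number(" $1,234.50 "))                   # 1234.50
print(to_number("1.234,50 €", decimal_sep=","))   # 1234.50
print(to_number("n/a"))                           # None
```

Passing the decimal separator explicitly, rather than guessing it per cell, avoids misreading a value like "1,234" as either one thousand two hundred thirty-four or one point two three four.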

Is the PDF to Excel Converter free?

Yes. The converter is free within the daily usage limits shown above. Guest users can run 25 conversion sessions per day and upload up to 5 files per session (10 MB each) without creating an account. Registering a free ToolsPiNG account increases the daily limit to 100 sessions, the file size limit to 40 MB per file, and the per-session file count to 20.