Document Parsing

Firecrawl provides document parsing capabilities that convert supported document formats into clean, structured Markdown.

Supported document formats

Firecrawl currently supports:

Excel spreadsheets (.xlsx, .xls)
- Each worksheet is converted to an HTML table
- Worksheets are separated by H2 headings with the sheet name
- Preserves cell formatting and data types
Word documents (.docx, .doc, .odt, .rtf)
- Extracts text while preserving document structure
- Maintains headings, paragraphs, lists, and tables
- Preserves basic formatting and styling
PDF documents (.pdf)
- Extracts text content with layout information
- Preserves document structure including sections and paragraphs
- Handles both text-based and scanned PDFs (with OCR support)
- Supports a mode option to control parsing strategy: fast (text-only), auto (text with OCR fallback, default), or ocr (force OCR)
- Priced at 1 credit per page (PDF → Markdown)
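
The per-page pricing can be turned into a quick estimate. The sketch below is not part of the Firecrawl SDK: `PdfJob` and `estimatePdfCredits` are hypothetical names, and it models only the two rules stated in this document, 1 credit per page for a parsed PDF and the flat 1 credit per PDF that applies when parsing is skipped (covered under PDF parsing modes). It assumes no other charges apply.

```typescript
// Hypothetical helper, not part of the Firecrawl SDK. It models only the
// two pricing rules stated in this document.
interface PdfJob {
  pageCount: number; // number of pages in the PDF
  parsed: boolean;   // false when parsing is skipped with parsers: []
}

function estimatePdfCredits(jobs: PdfJob[]): number {
  let total = 0;
  for (const job of jobs) {
    // Parsed PDFs cost 1 credit per page; skipped PDFs cost a flat 1 credit.
    total += job.parsed ? job.pageCount : 1;
  }
  return total;
}

const estimate = estimatePdfCredits([
  { pageCount: 12, parsed: true },   // 12 credits
  { pageCount: 300, parsed: false }, // 1 credit
]);
// estimate === 13
```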

PDF parsing modes

Use the mode field of a pdf entry in the parsers option to control how PDFs are processed:

| Mode | Description |
|------|-------------|
| `auto` | Attempts fast text-based extraction first, falls back to OCR if needed. Default. |
| `fast` | Text-based parsing only (embedded text). Fastest, but won’t extract from scanned/image-heavy pages. |
| `ocr` | Forces OCR parsing on every page. Use for scanned documents or when `auto` misclassifies a page. |

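// Object form with an explicit mode and a page limit
parsers: [{ type: "pdf", mode: "ocr", maxPages: 20 }]

// Object form; mode defaults to auto
parsers: [{ type: "pdf" }]

// String shorthand
parsers: ["pdf"]

// Empty array: skip PDF parsing
parsers: []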

Passing an empty array (parsers: []) skips PDF parsing and returns the PDF as base64, billed at a flat 1 credit per PDF.
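
The option shapes above can be assembled programmatically. The following is a minimal sketch under stated assumptions: the `PdfMode`, `PdfParser`, and `Parsers` types and the `pdfParsers` helper are hypothetical and defined locally to mirror the snippets shown here; they are not imported from the Firecrawl SDK.

```typescript
// Local types mirroring the option shapes shown above. They are
// illustrative and not imported from the Firecrawl SDK.
type PdfMode = "fast" | "auto" | "ocr";
interface PdfParser {
  type: "pdf";
  mode?: PdfMode;
  maxPages?: number;
}
type Parsers = Array<"pdf" | PdfParser>;

// Hypothetical helper: builds a `parsers` value for a PDF scrape.
function pdfParsers(
  opts: { skip?: boolean; mode?: PdfMode; maxPages?: number } = {}
): Parsers {
  // An empty array skips PDF parsing (the PDF comes back as base64).
  if (opts.skip) return [];
  const parser: PdfParser = { type: "pdf" };
  // Only set fields the caller provided, so omitted ones keep their defaults.
  if (opts.mode !== undefined) parser.mode = opts.mode;
  if (opts.maxPages !== undefined) parser.maxPages = opts.maxPages;
  return [parser];
}

const forcedOcr = pdfParsers({ mode: "ocr", maxPages: 20 });
// [{ type: "pdf", mode: "ocr", maxPages: 20 }]
const defaults = pdfParsers();
// [{ type: "pdf" }]
const skipped = pdfParsers({ skip: true });
// []
```

The result would then be passed along with the scrape request, for example as `firecrawl.scrape(url, { parsers: forcedOcr })`, assuming the SDK version in use accepts parsers as a scrape option.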

How to use document parsing

Document parsing works automatically when you provide a URL pointing to a supported document type. Firecrawl detects the file type from the URL extension or the response Content-Type header and processes the file accordingly.
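
To make the detection step concrete, here is an illustrative sketch of how such a check could work on the client side. It is not Firecrawl's implementation: `guessDocumentKind` is a hypothetical function, the MIME type mapping is a small assumed subset, and checking the extension before the header is an assumption about precedence that the text above does not specify.

```typescript
type DocumentKind = "excel" | "word" | "pdf" | "unknown";

// Extensions listed under "Supported document formats".
const EXTENSION_KINDS: Record<string, DocumentKind> = {
  xlsx: "excel", xls: "excel",
  docx: "word", doc: "word", odt: "word", rtf: "word",
  pdf: "pdf",
};

// A few common MIME types. This mapping is an assumption, not Firecrawl's list.
const MIME_KINDS: Record<string, DocumentKind> = {
  "application/pdf": "pdf",
  "application/msword": "word",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "word",
  "application/vnd.ms-excel": "excel",
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "excel",
};

function guessDocumentKind(url: string, contentType?: string): DocumentKind {
  // Drop the query string and fragment, then take the last path segment.
  const path = url.split(/[?#]/)[0] ?? "";
  const fileName = path.substring(path.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  if (dot !== -1) {
    const kind = EXTENSION_KINDS[fileName.substring(dot + 1).toLowerCase()];
    if (kind) return kind;
  }
  // Fall back to the Content-Type header, ignoring parameters such as charset.
  if (contentType) {
    const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
    return MIME_KINDS[mime] ?? "unknown";
  }
  return "unknown";
}

const byExtension = guessDocumentKind("https://example.com/data.xlsx");
// "excel"
const byHeader = guessDocumentKind(
  "https://example.com/download?id=3",
  "application/pdf; charset=binary"
);
// "pdf"
```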

Example: scraping an Excel file

import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });

const doc = await firecrawl.scrape('https://example.com/data.xlsx');

console.log(doc.markdown);

Example: scraping a Word document

import Firecrawl from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });

const doc = await firecrawl.scrape('https://example.com/data.docx');

console.log(doc.markdown);

Output format

All supported document types are converted to clean, structured Markdown. For example, an Excel file with multiple sheets might be converted to:

## Sheet1

| Name | Value |
|-------|-------|
| Item 1 | 100 |
| Item 2 | 200 |

## Sheet2

| Date | Description |
|------------|--------------|
| 2023-01-01 | First quarter |
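
Because each worksheet arrives under its own H2 heading, the Markdown can be split back into per-sheet rows. The sketch below is a hypothetical post-processing helper (`splitSheets` is not part of the Firecrawl SDK) and assumes output shaped like the example above. As a simplification, it does not handle escaped pipe characters inside cells.

```typescript
// Hypothetical post-processing helper, not part of the Firecrawl SDK.
// Returns an object mapping each sheet name to its table rows.
function splitSheets(markdown: string): Record<string, string[][]> {
  const sheets: Record<string, string[][]> = {};
  let rows: string[][] | undefined;
  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith("## ")) {
      // An H2 heading starts a new sheet.
      rows = [];
      sheets[trimmed.slice(3).trim()] = rows;
    } else if (rows && trimmed.startsWith("|")) {
      // Drop the outer pipes, then split the row into trimmed cells.
      const cells = trimmed
        .replace(/^\||\|$/g, "")
        .split("|")
        .map((cell) => cell.trim());
      // Skip the header separator row (cells made only of dashes and colons).
      if (!cells.every((cell) => /^:?-+:?$/.test(cell))) rows.push(cells);
    }
  }
  return sheets;
}

const parsed = splitSheets(
  "## Sheet1\n\n| Name | Value |\n|---|---|\n| Item 1 | 100 |"
);
// parsed.Sheet1 is [["Name", "Value"], ["Item 1", "100"]]
```

The first row of each sheet is the header row; cell values are returned as strings, so numeric columns would need converting by the caller.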