mcp-pdf-extraction-server

MCP.Pizza Chef: xraywu

The mcp-pdf-extraction-server is an MCP server designed to extract contents from local PDF files efficiently. It supports both direct PDF text reading and OCR for scanned documents, allowing extraction from specified pages or the entire document. This server enables seamless integration of PDF content extraction into AI workflows, enhancing document analysis, data retrieval, and automation tasks. It accepts a file path and optional page numbers, including negative indices for flexible page targeting.

Use This MCP server To

  • Extract text from specific pages of local PDF files
  • Perform OCR on scanned PDF documents
  • Integrate PDF content extraction into AI workflows
  • Automate data retrieval from PDF reports
  • Enable document analysis in AI-powered applications

README

PDF Extraction MCP server

MCP server to extract contents from a PDF file

Components

Tools

The server implements one tool:

  • extract-pdf-contents: Extract contents from a local PDF file
    • Takes "pdf_path" as a required string argument, representing the local file path of the PDF file
    • Takes "pages" as an optional string argument, representing the page numbers to extract from the PDF file. Page numbers are separated by commas, and negative page numbers are supported (e.g. '-1' means the last page)
    • Supports both direct PDF text reading and OCR
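
As an illustration of the two arguments described above, here is a minimal sketch of the JSON-RPC `tools/call` request an MCP client might send to this tool. The helper name `build_extract_request` and the example file path are hypothetical; only the tool name and the "pdf_path" and "pages" arguments come from this README.

```python
import json


def build_extract_request(pdf_path, pages=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for extract-pdf-contents.

    `build_extract_request` is a hypothetical helper for illustration;
    it is not part of the server.
    """
    arguments = {"pdf_path": pdf_path}  # required: local path to the PDF
    if pages is not None:
        # optional: comma-separated page numbers, negatives count from the end
        arguments["pages"] = pages
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "extract-pdf-contents", "arguments": arguments},
    }


if __name__ == "__main__":
    # Example path is a placeholder, not a real file.
    request = build_extract_request("/path/to/report.pdf", pages="1,2,-1")
    print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude Desktop builds this message itself; the sketch only shows where each argument ends up.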

Quickstart

Install

Claude Desktop

On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json

Development/Unpublished Servers Configuration

```
"mcpServers": {
  "pdf_extraction": {
    "command": "uv",
    "args": [
      "--directory",
      "/Users/xraywu/Workspace/pdf_extraction",
      "run",
      "pdf_extraction"
    ]
  }
}
```

Published Servers Configuration

```
"mcpServers": {
  "pdf_extraction": {
    "command": "uvx",
    "args": [
      "pdf_extraction"
    ]
  }
}
```
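
If the config file already lists other servers, the entry has to be merged into the existing "mcpServers" object rather than pasted over it. The following is a rough sketch of one way to do that in Python, not an official installer: `claude_config_path` and `add_server` are hypothetical helpers, and the paths are the ones listed above.

```python
import json
import os
import sys
from pathlib import Path


def claude_config_path():
    """Return the Claude Desktop config path for macOS or Windows."""
    if sys.platform == "darwin":
        return (Path.home() / "Library" / "Application Support"
                / "Claude" / "claude_desktop_config.json")
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError("This README only lists macOS and Windows paths")


def add_server(config, name, entry):
    """Return a copy of `config` with `entry` stored under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any existing servers
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Published-server entry from this README, merged into an empty config.
    entry = {"command": "uvx", "args": ["pdf_extraction"]}
    print(json.dumps(add_server({}, "pdf_extraction", entry), indent=2))
```

The sketch prints the merged result instead of writing it, so the existing file can be reviewed and backed up before any change is made.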

mcp-pdf-extraction-server FAQ

How do I specify which pages to extract from a PDF?
Use the 'pages' argument with comma-separated page numbers; negative numbers count from the end (e.g., '-1' for last page).
Can this server extract text from scanned PDFs?
Yes, it supports OCR to extract text from scanned or image-based PDFs.
What argument is required to extract PDF contents?
The 'pdf_path' argument specifying the local file path of the PDF is required.
How do I install the mcp-pdf-extraction-server for Claude Desktop?
Add the server configuration to claude_desktop_config.json, located at ~/Library/Application\ Support/Claude/ on macOS or %APPDATA%/Claude/ on Windows.
Does the server support extraction from remote PDF files?
No, it currently supports extraction only from local PDF files specified by path.
Is the server compatible with multiple LLM providers?
Yes, it can be integrated with models from OpenAI, Anthropic Claude, and Google Gemini via MCP clients.
Can I extract content from multiple pages at once?
Yes, specify multiple pages separated by commas in the 'pages' argument.
What happens if I omit the 'pages' argument?
The server extracts content from all pages of the PDF by default.
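
To make the 'pages' semantics from the answers above concrete, here is a small sketch of how such a string could be resolved into zero-based page indices. It is an illustration under stated assumptions, not the server's actual code: `resolve_pages` is a hypothetical name, and positive page numbers are assumed to be 1-based, which this README does not state.

```python
def resolve_pages(pages, page_count):
    """Turn a 'pages' string into zero-based indices (hypothetical helper).

    Assumption: positive numbers are 1-based; negatives count from the end,
    so '-1' is the last page. Omitting `pages` selects every page.
    """
    if pages is None or not pages.strip():
        return list(range(page_count))

    indices = []
    for token in pages.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas such as "1,,2"
        number = int(token)
        if number == 0 or abs(number) > page_count:
            raise ValueError(f"page {number} is out of range for {page_count} pages")
        # 1-based positive -> zero-based; negative -> offset from the end
        indices.append(number - 1 if number > 0 else page_count + number)
    return indices


if __name__ == "__main__":
    print(resolve_pages("1,3,-1", 5))  # [0, 2, 4]
    print(resolve_pages(None, 3))      # [0, 1, 2]
```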