🌊 AnyParser

AnyParser provides an API to accurately extract unstructured data (e.g., PDFs, images, charts) into a structured format.

🌱 Set up your AnyParser API key

To get started, generate your API key from the Sandbox Account Page. Each account comes with 100 free pages.

⚠️ Note: The free API is limited to 10 pages/call.
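
With a 10-page limit per call, a longer document presumably has to be sent in pieces. The sketch below only works out the page ranges. The helper name page_batches is hypothetical and not part of AnyParser, and splitting the file itself is left to whatever PDF tool you prefer.

```python
def page_batches(total_pages: int, pages_per_call: int = 10) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges, one range per API call."""
    if total_pages < 0 or pages_per_call < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_call must be >= 1")
    return [
        (first, min(first + pages_per_call - 1, total_pages))
        for first in range(1, total_pages + 1, pages_per_call)
    ]


# A 25-page document needs three calls under a 10-page limit.
print(page_batches(25))  # [(1, 10), (11, 20), (21, 25)]
```

By the same arithmetic, the 100 free pages cover at most ten full 10-page calls.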

For more information or to inquire about larger usage plans, feel free to contact us at [email protected].

To set up your API key (CAMBIO_API_KEY), follow these steps:

1. Create a .env file in the root directory of your project.
2. Add the following line to the .env file:

CAMBIO_API_KEY=0cam************************

💻 Installation

1. Set Up a New Conda Environment and Install AnyParser

First, create and activate a new Conda environment, then install AnyParser:

conda create -n any-parse python=3.10 -y
conda activate any-parse
pip3 install any-parser

2. Create an AnyParser Instance Using Your API Key

Use your API key to create an instance of AnyParser. Make sure you’ve set up your .env file to store your API key securely:

import os
from dotenv import load_dotenv
from any_parser import AnyParser

# Load environment variables
load_dotenv(override=True)

# Get the API key from the environment
example_apikey = os.getenv("CAMBIO_API_KEY")

# Create an AnyParser instance
ap = AnyParser(api_key=example_apikey)
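
os.getenv returns None when the variable is missing, so a typo in the .env file may only surface later as an authentication error. A minimal sketch of a fail-fast check follows. The helper require_api_key is hypothetical and not part of AnyParser; it uses only the standard library.

```python
import os


def require_api_key(env=os.environ, name="CAMBIO_API_KEY"):
    """Return the key stored under `name`, or raise if it is unset or blank."""
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return key
```

After load_dotenv has run, you could pass the result straight to the constructor, for example AnyParser(api_key=require_api_key()).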

3. Run Synchronous Extraction

To extract data synchronously and receive immediate results:

# Extract content from the file and get the markdown output along with processing time
markdown, total_time = ap.parse(file_path="./data/test.pdf")
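
To keep the result, you can write the returned markdown to disk. This sketch assumes markdown is a plain string (the return type is not stated here). The helper save_markdown and the ./output folder are hypothetical names chosen for illustration.

```python
from pathlib import Path


def save_markdown(markdown: str, source_path: str, out_dir: str = "./output") -> Path:
    """Write `markdown` to <out_dir>/<source file stem>.md and return that path."""
    out_path = Path(out_dir) / (Path(source_path).stem + ".md")
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    out_path.write_text(markdown, encoding="utf-8")
    return out_path
```

Calling save_markdown(markdown, "./data/test.pdf") would then produce ./output/test.md.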

4. Run Asynchronous Extraction

For asynchronous extraction, send the file for processing and fetch results later:

# Send the file to begin asynchronous extraction
file_id = ap.async_parse(file_path="./data/test.pdf")

# Fetch the extracted content using the file ID
markdown = ap.async_fetch(file_id=file_id)
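
Whether async_fetch waits for processing to finish or can return before the result is ready is not stated here. If you need to retry yourself, a generic polling loop is one option. The sketch below is hypothetical and not part of AnyParser: the caller supplies both the fetch function and the is_ready test.

```python
import time


def poll_until_ready(fetch, is_ready, attempts=10, delay_seconds=5.0):
    """Call fetch() until is_ready(result) is true, then return that result.

    Raises TimeoutError if none of the attempts yields a ready result.
    """
    for attempt in range(attempts):
        result = fetch()
        if is_ready(result):
            return result
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # wait before the next attempt
    raise TimeoutError(f"result not ready after {attempts} attempts")
```

For example, poll_until_ready(lambda: ap.async_fetch(file_id=file_id), is_ready=bool) treats any non-empty result as ready, which is itself an assumption about what the client returns while processing is still running.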

5. Run Batch Extraction (Beta)

For batch extraction, send the file to begin processing and fetch results later:

# Send the file to begin batch extraction
response = ap.batches.create(file_path="./data/test.pdf")
request_id = response.requestId

# Fetch the extracted content using the request ID
markdown = ap.batches.retrieve(request_id)

Batch API for folder input:

# Send the folder to begin batch extraction
WORKING_FOLDER = "./sample_data"
# This generates a JSONL file that records each filename and its request ID
response = ap.batches.create(WORKING_FOLDER)

Each response in the JSONL file contains:

The filename
A unique request ID
Additional processing metadata

You can later use these request IDs to retrieve the extracted content for each file:

# Fetch the extracted content using the request ID from the jsonl file
markdown = ap.batches.retrieve(request_id)
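
To collect the request IDs from the JSONL file, you can read it line by line with the standard library. This is a sketch under assumptions: the key names requestId and file_name are guesses (the exact field names are not given here), so they are exposed as parameters, and read_batch_records is a hypothetical helper, not part of AnyParser.

```python
import json


def read_batch_records(jsonl_path, id_key="requestId", name_key="file_name"):
    """Return {filename: request_id} from a JSON Lines file, skipping blank lines."""
    mapping = {}
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)  # one JSON object per line
            mapping[record[name_key]] = record[id_key]
    return mapping
```

Each request ID in the returned mapping can then be passed to ap.batches.retrieve as shown above.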

For a complete implementation of the batch workflow, see examples/parse_batch_upload.py and examples/parse_batch_fetch.py.

⚠️ Note: Batch extraction is currently in beta. Processing may take up to 12 hours to complete.

⚠️ Important: API keys generated from cambioml.com do not automatically have batch processing permissions. Please contact [email protected] to request batch processing access for your API key.

📜 Examples

Check out these examples to see how you can use AnyParser to extract text, numbers, and symbols in fewer than 10 lines of code!

Extract all text and layout from PDF into Markdown Format

Are you an AI engineer looking to accurately extract both the text and layout (e.g., table of contents or Markdown headers hierarchy) from a PDF? Check out this 3-minute notebook demo.

Extract a Table from an Image into Markdown Format

Are you a financial analyst needing to accurately extract numbers from a table within an image? Explore this 3-minute notebook example.
