DeepSeek OCR on Simplismart: Lightning-Fast Document Processing at 800 Tokens/Second
Process documents faster than ever - powered by DeepSeek OCR on Simplismart.
Last Updated: November 6, 2025

Introduction

Document processing has always been a critical challenge for businesses dealing with invoices, receipts, contracts, and other paperwork. DeepSeek OCR, a powerful, context-aware vision-language model, pushes past the limitations of traditional OCR solutions. Where older systems often struggle with complex layouts, handwritten text, or maintaining document context, DeepSeek OCR delivers remarkable accuracy and flexibility, reducing errors, manual corrections, and workflow bottlenecks.

We're excited to announce that DeepSeek OCR is now officially supported on the Simplismart platform! Our inference API delivers industry-leading performance at 800 tokens per second, making it one of the fastest OCR solutions available today.

In this comprehensive tutorial, you'll learn how to leverage DeepSeek OCR's advanced capabilities through Simplismart's optimized infrastructure. Whether you're building an expense management system, automating invoice processing, or digitizing historical documents, this guide will show you how to integrate powerful OCR into your applications using the OpenAI Python SDK.

What is DeepSeek OCR?

DeepSeek OCR is a state-of-the-art vision-language model specifically designed for optical character recognition tasks. Unlike traditional OCR engines that rely on pattern matching, DeepSeek OCR leverages neural networks to understand document context, enabling it to handle:

  • Complex document layouts (invoices, forms, tables)
  • Handwritten notes and signatures
  • Multi-column text and embedded images
  • Low-quality scans and photographs
  • Multiple languages and special characters

Architecture Breakdown

DeepSeek OCR's architecture consists of two innovative components that work together seamlessly:

1. DeepEncoder: The Vision Processing Powerhouse

The DeepEncoder is a high-performance vision encoder that processes document images at high resolutions. What makes it special?

  • Optical Context Compression: Instead of treating every pixel equally, DeepEncoder intelligently compresses high-resolution images into a manageable set of visual tokens
  • Semantic Preservation: While compressing, it maintains the semantic meaning of text, layout, and visual elements
  • Adaptive Resolution: Automatically adjusts to different document sizes and qualities without losing critical information

DeepSeek OCR uses an optical "vision token" encoding that compresses text contexts by roughly 10× while still achieving about 97% OCR precision. When pushed to a compression ratio of 20×, accuracy drops to around 60%, highlighting the trade-off between token reduction and fidelity.

In practical terms, the model can process an entire document page with as few as ~100 vision tokens, compared with the 256+ tokens per page used in prior OCR benchmarks.
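
As a quick back-of-envelope illustration of that trade-off (the token counts below are hypothetical, not benchmark figures):

# Hypothetical page: how many vision tokens stand in for the text tokens
# at the compression ratios reported above.
text_tokens = 1000
for ratio, precision in [(10, 0.97), (20, 0.60)]:
    vision_tokens = text_tokens // ratio
    print(f"{ratio}x compression: ~{vision_tokens} vision tokens, ~{precision:.0%} precision")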

2. DeepSeek-3B-MoE Decoder: Efficient Text Reconstruction

The decoder utilizes a Mixture-of-Experts (MoE) architecture, which is key to its efficiency:

  • Sparse Activation: Only activates approximately 570 million parameters per token (out of 3 billion total)
  • Specialized Experts: Employs 64 routed experts and 2 shared experts
  • Context-Aware: Understands document structure to maintain proper reading order and formatting

This MoE approach enables you to harness the power of a large model with the speed of a much smaller one.
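
To make the sparse-activation idea concrete, here is a minimal, illustrative top-k routing sketch in Python. The expert counts mirror the figures above, but the top-k value of 6 and the rest of the code are assumptions for illustration only, not DeepSeek's actual implementation:

import numpy as np

NUM_ROUTED, NUM_SHARED, TOP_K = 64, 2, 6  # TOP_K is an assumed value

def route(router_logits):
    # Pick the highest-scoring routed experts for one token.
    return sorted(np.argsort(router_logits)[-TOP_K:].tolist())

rng = np.random.default_rng(0)
logits = rng.normal(size=NUM_ROUTED)  # stand-in for a real router's output
# Each token passes through the 2 shared experts plus its top-k routed
# experts, so only a small fraction of the 3B parameters is active.
print("active routed experts:", route(logits))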

What Makes DeepSeek OCR Special?

  1. Context Understanding: Unlike traditional OCR, which reads character by character, DeepSeek OCR understands the relationship between text elements, maintaining proper structure in tables and forms.
  2. Layout Preservation: The model not only extracts text but also understands the spatial relationships, making it ideal for structured documents such as invoices or forms.
  3. Multilingual Support: Trained on diverse datasets, it handles roughly 100 languages without requiring language-specific models.
  4. Handwriting Recognition: Reliable recognition of handwritten text, where traditional OCR engines often fail.

Getting Started with DeepSeek OCR Inference API

Simplismart makes it incredibly easy to access DeepSeek OCR through our optimized inference API. You'll use the OpenAI Python SDK, which means if you're already familiar with OpenAI's API, you'll feel right at home.

Prerequisites

Before diving into the code, make sure you have:

  1. Simplismart Account: Sign up on the Simplismart platform (free tier available)
  2. API Credentials: Get your API key from the API Keys settings
  3. Python 3.7+: Ensure you have Python 3.7 or higher installed
  4. OpenAI SDK: We'll use the OpenAI Python SDK for API calls

OCR Implementation Guide

1. Basic Setup and Configuration

Let's start by setting up your development environment with the Simplismart cookbook repository, which contains ready-to-use examples.

Step 1: Clone the Simplismart Cookbook

git clone https://github.com/simpli-smart/cookbook.git
cd cookbook/deepseek-ocr


Step 2: Configure Your API Credentials

Copy the .env-template file to .env:

cp .env-template .env

Step 3: Get Your API Details

Navigate to the Simplismart Playground and select DeepSeek OCR:

  1. Click on the "Get API details" button
  2. Go to the "API Usage" tab
  3. Select "Python" as your language
  4. Copy the BASE_URL and DEFAULT_HEADERS_ID values

Step 4: Update Your .env File

Add your credentials to the .env file:

SIMPLISMART_API_KEY=your_api_key_here
SIMPLISMART_BASE_URL=your_base_url_here
DEFAULT_HEADERS_ID=your_default_headers_id_here


Step 5: Install Dependencies

pip install -r requirements.txt

Now you're all set to start processing documents! Let's explore different OCR scenarios.
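
If you'd like a quick sanity check before running the examples, here's a minimal sketch that only constructs the client from your .env values (no request is sent; the OpenAI client raises immediately if the API key is missing):

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

# Build the client the same way the cookbook scripts do.
client = OpenAI(
    api_key=os.getenv("SIMPLISMART_API_KEY"),
    base_url=os.getenv("SIMPLISMART_BASE_URL"),
    default_headers={"id": os.getenv("DEFAULT_HEADERS_ID")},
)
print("Client configured for:", client.base_url)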


2. OCR on Local Image Files using DeepSeek OCR

The most common use case is extracting text from images stored locally on your machine. The code for this is in ocr_local_image.py.

How it works: When working with local files, you'll need to:

  1. Read the image file and encode it to base64 format
  2. Send it as a data URL to the API

Here's the complete code snippet:

# deepseek-ocr/ocr_local_image.py

import base64
import os
from pathlib import Path

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

API_KEY = os.getenv("SIMPLISMART_API_KEY")
BASE_URL = os.getenv("SIMPLISMART_BASE_URL")
DEFAULT_HEADERS_ID = os.getenv("DEFAULT_HEADERS_ID")


def ocr_local_image(image_path: str) -> str:
    """
    Extract text from a local image file using DeepSeek OCR.

    Args:
        image_path: Path to the image file (jpg, png, etc.)

    Returns:
        Extracted text from the image
    """
    # Read and encode the image
    image_base64 = base64.b64encode(Path(image_path).read_bytes()).decode()

    # Initialize the Simplismart client
    client = OpenAI(
        api_key=API_KEY,
        base_url=BASE_URL,
        default_headers={"id": DEFAULT_HEADERS_ID},
    )

    # Prepare messages with system prompt and image
    messages = [
        {
            "role": "system",
            "content": "You are an expert OCR assistant. Extract all text from images accurately, maintaining the original structure and formatting."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please extract all text from this image, preserving the layout and structure."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_base64}"
                    }
                }
            ]
        }
    ]

    # Make the API call
    response = client.chat.completions.create(
        model="deepseek-ocr",
        messages=messages,
        max_tokens=2048,
        temperature=0
    )

    return response.choices[0].message.content


# Usage example
if __name__ == "__main__":
    # Process a receipt
    receipt_text = ocr_local_image("samples/receipt.jpg")
    print("Extracted Text:")
    print(receipt_text)



Running the Code

Execute the script to extract text from the sample receipt:

python ocr_local_image.py

Output: The script will extract and print all text from the receipt in your terminal, preserving the original format and structure.

Key Technical Points:

  • Base64 Encoding: The image is encoded as base64 and embedded directly in the message
  • Data URI Format: The format data:image/jpeg;base64,{image_base64} tells the API how to decode the image
  • Temperature Setting: Set to 0 for deterministic, consistent OCR results
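
The sample hardcodes a JPEG data URI. If your images come in mixed formats, a small helper built on Python's standard mimetypes module (a sketch, not part of the cookbook) can pick the right MIME type:

import base64
import mimetypes
from pathlib import Path

def to_data_url(image_path: str) -> str:
    # Guess the MIME type from the file extension; fall back to JPEG.
    mime, _ = mimetypes.guess_type(image_path)
    mime = mime or "image/jpeg"
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode()
    return f"data:{mime};base64,{encoded}"

print(to_data_url("samples/receipt.jpg")[:40])  # preview the prefix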

3. OCR on Image URL using DeepSeek OCR

For images hosted online (on cloud storage, CDNs, or public URLs), you can process them directly without downloading or encoding. This approach is faster and saves bandwidth.

Use Cases:

  • Processing images from cloud storage like AWS S3, Google Cloud Storage, or Azure Blob Storage
  • OCR on images served via CDN
  • Batch processing images from a web scraper
  • Integration with document management systems

Here's how to implement URL-based OCR:

# deepseek-ocr/ocr_image_url.py

import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

API_KEY = os.getenv("SIMPLISMART_API_KEY")
BASE_URL = os.getenv("SIMPLISMART_BASE_URL")
DEFAULT_HEADERS_ID = os.getenv("DEFAULT_HEADERS_ID")


def ocr_image_url(image_url: str) -> str:
    """
    Extract text from an image URL using DeepSeek OCR.

    Args:
        image_url: Public URL of the image

    Returns:
        Extracted text from the image
    """
    # Initialize the Simplismart client
    client = OpenAI(
        api_key=API_KEY,
        base_url=BASE_URL,
        default_headers={"id": DEFAULT_HEADERS_ID},
    )

    messages = [
        {
            "role": "system",
            "content": "You are an expert OCR assistant. Extract all text from images accurately."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract all text from this image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url
                    }
                }
            ]
        }
    ]

    response = client.chat.completions.create(
        model="deepseek-ocr",
        messages=messages,
        max_tokens=2048,
        temperature=0
    )

    return response.choices[0].message.content


# Usage example
if __name__ == "__main__":
    # Process an online document
    url = "https://simplismart-public-assets.s3.ap-south-1.amazonaws.com/logos/ocr.png"
    extracted_text = ocr_image_url(url)
    print(extracted_text)

Running the Code

python ocr_image_url.py

The Key Difference:

Instead of encoding an image file to base64, you simply pass the URL directly in the message content:

{
    "type": "image_url",
    "image_url": {
        "url": image_url  # Direct URL - no encoding needed
    }
}

Important Notes:

  • The image URL must be publicly accessible (no authentication required)
  • For private cloud storage, consider using pre-signed URLs (see the sketch below)
  • Ensure the image server allows requests from Simplismart's infrastructure
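
For example, here's a minimal sketch of generating a pre-signed URL for a private S3 object with boto3 and feeding it to the ocr_image_url() function above. The bucket and key are placeholders, and AWS credentials must already be configured:

import boto3

s3 = boto3.client("s3")
presigned_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "your-bucket", "Key": "invoices/scan-001.png"},  # placeholders
    ExpiresIn=3600,  # URL stays valid for one hour
)
print(ocr_image_url(presigned_url))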
4. PDF Text Extraction using DeepSeek OCR

PDFs are one of the most common document formats in business environments. Processing multi-page PDFs requires a three-step approach:

  1. Convert PDF Pages to Images: Using PyMuPDF (fitz), each page is rendered as a high-quality image
  2. OCR Each Page: Process each image through DeepSeek OCR
  3. Combine Results: Merge text from all pages into a single output

This code snippet handles everything automatically:

# deepseek-ocr/pdf_to_txt.py

import base64
import io
import os

import fitz  # PyMuPDF
from dotenv import load_dotenv
from openai import OpenAI
from PIL import Image
from tqdm import tqdm

load_dotenv()

API_KEY = os.getenv("SIMPLISMART_API_KEY")
BASE_URL = os.getenv("SIMPLISMART_BASE_URL")
DEFAULT_HEADERS_ID = os.getenv("DEFAULT_HEADERS_ID")

# Your config
INPUT_PATH = "samples/deepseek-ocr-paper.pdf"
OUTPUT_PATH = "output"
PROMPT = "Convert the document to markdown."
NUM_WORKERS = 4  # reserved for parallel processing; pages run sequentially in main()
DPI = 144

os.makedirs(OUTPUT_PATH, exist_ok=True)
os.makedirs(f"{OUTPUT_PATH}/images", exist_ok=True)


def pdf_to_images_high_quality(pdf_path, dpi=DPI):
    """Render each PDF page as a PIL image at the requested DPI."""
    images = []
    pdf_document = fitz.open(pdf_path)
    zoom = dpi / 72.0
    matrix = fitz.Matrix(zoom, zoom)
    for page_num in range(pdf_document.page_count):
        page = pdf_document[page_num]
        pixmap = page.get_pixmap(matrix=matrix, alpha=False)
        img = Image.open(io.BytesIO(pixmap.tobytes("png")))
        images.append(img)
    pdf_document.close()
    return images


def image_to_base64(image):
    """Encode a PIL image as a base64 PNG string."""
    buffered = io.BytesIO()
    image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")


def ocr_page(image, page_index):
    """Run DeepSeek OCR on a single rendered page and save the result."""
    img_b64 = image_to_base64(image)
    client = OpenAI(
        api_key=API_KEY,
        base_url=BASE_URL,
        default_headers={"id": DEFAULT_HEADERS_ID},
    )

    response = client.chat.completions.create(
        model="deepseek-ocr",
        messages=[
            {"role": "user", "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}}
            ]}
        ],
        temperature=0.0,
        max_tokens=2048,
    )

    text = response.choices[0].message.content
    print(text)

    if text:
        with open(f"{OUTPUT_PATH}/page_{page_index}.txt", "w", encoding="utf-8") as f:
            f.write(text)
    else:
        print(f"No text found for page {page_index}")
    return text if text else ""


def main():
    print("Loading PDF...")
    images = pdf_to_images_high_quality(INPUT_PATH)

    print("Running OCR via OpenAI API...")
    results = []
    for idx, img in tqdm(list(enumerate(images)), total=len(images)):
        results.append(ocr_page(img, idx))

    combined = "\n\n<--- Page Split --->\n\n".join(results)
    with open(f"{OUTPUT_PATH}/combined_ocr.txt", "w", encoding="utf-8") as f:
        f.write(combined)

    print(f"OCR completed. Output saved to {OUTPUT_PATH}")


if __name__ == "__main__":
    main()



Running the PDF Extraction

python pdf_to_txt.py

What Happens During Execution:

  1. PDF Loading: The script opens the PDF and counts the total pages
  2. Page Rendering: Each page is rendered at 144 DPI (good balance of quality and file size)
  3. Base64 Encoding: Each rendered image is converted to base64
  4. OCR Processing: DeepSeek OCR extracts text from each page with a progress bar
  5. Output Generation:
    • Individual page text files: output/page_0.txt, output/page_1.txt, etc.
    • Combined text file: output/combined_ocr.txt with page separators

Configuration Options:

# Adjust these settings based on your needs
INPUT_PATH = "samples/deepseek-ocr-paper.pdf"  # Your PDF path
OUTPUT_PATH = "output"                         # Output directory
PROMPT = "Convert the document to markdown."   # Customize output format
DPI = 144                                      # Image quality

DPI Recommendations:

  • 72-100 DPI: Fast processing, good for clean, large-text documents
  • 144 DPI (default): Balanced quality and speed, suitable for most documents
  • 200-300 DPI: High quality, necessary for documents with small text or fine details
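
As a quick sanity check on these settings, here's a back-of-envelope sketch of how DPI translates into rendered pixel size for a US Letter page (8.5 x 11 inches); PyMuPDF's zoom factor is dpi / 72:

for dpi in (72, 144, 300):
    zoom = dpi / 72.0
    print(f"{dpi} DPI -> zoom {zoom:.1f}, ~{round(8.5 * dpi)}x{round(11 * dpi)} px")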

Pro Tips:

  • For documents with tables, use the prompt: "Extract text preserving table structure"
  • For technical papers, try: "Convert to markdown with proper heading hierarchy"
  • Adjust max_tokens based on page density (2048 for typical pages, 4096 for dense content)
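
Putting those tips together, a table-heavy document might use config tweaks like this sketch. Note that max_tokens is hardcoded to 2048 inside ocr_page(), so raising the budget means editing that chat.completions.create call directly:

PROMPT = "Extract text preserving table structure"
DPI = 200  # smaller text benefits from a higher render DPI
# In ocr_page(): max_tokens=4096 for dense pages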

Conclusion

DeepSeek OCR on the Simplismart platform represents a significant leap forward in document processing technology. With its impressive 800 tokens/second processing speed, advanced context understanding, and easy integration via the OpenAI SDK, it's an ideal solution for businesses looking to automate document workflows.

Key Takeaways

  • Speed: 800 tokens/second processing ensures rapid document handling
  • Accuracy: Advanced neural architecture delivers superior text extraction
  • Ease of Use: OpenAI SDK compatibility means a minimal learning curve
  • Versatility: Handles everything from invoices to handwritten notes
  • Scalability: Optimized infrastructure supports high-volume processing

Next Steps

Ready to supercharge your document processing workflows? Here's how to get started:

  1. Create Your Account: Sign up at app.simplismart.ai (free tier available)
  2. Get API Credentials: Navigate to Settings → API Keys to generate your credentials
  3. Try the Examples: Clone our cookbook repository and run the examples


Ready to transform your document processing workflows? Start building with DeepSeek OCR on Simplismart today and experience the power of 800 tokens/second OCR processing! 🚀

Find out what tailor-made inference can do for you.