DeepSeek OCR on Simplismart: Lightning-Fast Document Processing at 800 Tokens/Second
Process documents faster than ever - powered by DeepSeek OCR on Simplismart.
Last Updated: November 6, 2025

Introduction

Document processing has always been a critical challenge for businesses dealing with invoices, receipts, contracts, and other paperwork. DeepSeek OCR, a powerful, context-aware vision-language model, pushes past the limitations of traditional OCR solutions. Where older systems often struggle with complex layouts, handwritten text, or maintaining document context, DeepSeek OCR delivers remarkable accuracy and flexibility, reducing errors, manual corrections, and workflow bottlenecks.

We're excited to announce that DeepSeek OCR is now officially supported on the Simplismart platform! Our inference API delivers industry-leading performance at 800 tokens per second, making it one of the fastest OCR solutions available today.

In this comprehensive tutorial, you'll learn how to leverage DeepSeek OCR's advanced capabilities through Simplismart's optimized infrastructure. Whether you're building an expense management system, automating invoice processing, or digitizing historical documents, this guide will show you how to integrate powerful OCR into your applications using the OpenAI Python SDK.

What is DeepSeek OCR?

DeepSeek OCR is a state-of-the-art vision-language model specifically designed for optical character recognition tasks. Unlike traditional OCR engines that rely on pattern matching, DeepSeek OCR leverages neural networks to understand document context, enabling it to handle:

  • Complex document layouts (invoices, forms, tables)
  • Handwritten notes and signatures
  • Multi-column text and embedded images
  • Low-quality scans and photographs
  • Multiple languages and special characters

Architecture Breakdown

DeepSeek OCR's architecture consists of two innovative components that work together seamlessly:

1. DeepEncoder: The Vision Processing Powerhouse

The DeepEncoder is a high-performance vision encoder that processes document images at high resolutions. What makes it special?

  • Optical Context Compression: Instead of treating every pixel equally, DeepEncoder intelligently compresses high-resolution images into a manageable set of visual tokens
  • Semantic Preservation: While compressing, it maintains the semantic meaning of text, layout, and visual elements
  • Adaptive Resolution: Automatically adjusts to different document sizes and qualities without losing critical information

DeepSeek OCR uses an optical "vision token" encoding that compresses text contexts by roughly 10× while still achieving about 97% OCR precision. When pushed to a compression ratio of 20×, accuracy drops to around 60%, highlighting the trade-off between token reduction and fidelity.

In practical terms, the model can process an entire document page with as few as ~100 vision tokens, compared with the 256+ tokens per page used in prior OCR benchmarks.
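
As a quick back-of-envelope illustration of that trade-off (the token counts below are hypothetical, not benchmark figures):

# Hypothetical page: how many vision tokens stand in for the text tokens
# at the compression ratios reported above.
text_tokens = 1000
for ratio, precision in [(10, 0.97), (20, 0.60)]:
    vision_tokens = text_tokens // ratio
    print(f"{ratio}x compression: ~{vision_tokens} vision tokens, ~{precision:.0%} precision")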

2. DeepSeek-3B-MoE Decoder: Efficient Text Reconstruction

The decoder utilizes a Mixture-of-Experts (MoE) architecture, which is key to its efficiency:

  • Sparse Activation: Only activates approximately 570 million parameters per token (out of 3 billion total)
  • Specialized Experts: Employs 64 routed experts and 2 shared experts
  • Context-Aware: Understands document structure to maintain proper reading order and formatting

This MoE approach enables you to harness the power of a large model with the speed of a much smaller one.
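
To make the sparse-activation idea concrete, here is a minimal, illustrative top-k routing sketch in Python. The expert counts mirror the figures above, but the top-k value of 6 and the rest of the code are assumptions for illustration only, not DeepSeek's actual implementation:

import numpy as np

NUM_ROUTED, NUM_SHARED, TOP_K = 64, 2, 6  # TOP_K is an assumed value

def route(router_logits):
    # Pick the highest-scoring routed experts for one token.
    return sorted(np.argsort(router_logits)[-TOP_K:].tolist())

rng = np.random.default_rng(0)
logits = rng.normal(size=NUM_ROUTED)  # stand-in for a real router's output
# Each token passes through the 2 shared experts plus its top-k routed
# experts, so only a small fraction of the 3B parameters is active.
print("active routed experts:", route(logits))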

What Makes DeepSeek OCR Special?

  1. Context Understanding: Unlike traditional OCR, which reads character by character, DeepSeek OCR understands the relationship between text elements, maintaining proper structure in tables and forms.
  2. Layout Preservation: The model not only extracts text but also understands the spatial relationships, making it ideal for structured documents such as invoices or forms.
  3. Multilingual Support: Trained on diverse datasets, it handles roughly 100 languages without requiring language-specific models.
  4. Handwriting Recognition: Reliable recognition of handwritten text, where traditional OCR engines often fail.

Getting Started with DeepSeek OCR Inference API

Simplismart makes it incredibly easy to access DeepSeek OCR through our optimized inference API. You'll use the OpenAI Python SDK, which means if you're already familiar with OpenAI's API, you'll feel right at home.

Prerequisites

Before diving into the code, make sure you have:

  1. Simplismart Account: Sign up on the Simplismart platform (free tier available)
  2. API Credentials: Get your API key from the API Keys settings
  3. Python 3.7+: Ensure you have Python 3.7 or higher installed
  4. OpenAI SDK: We'll use the OpenAI Python SDK for API calls

OCR Implementation Guide

1. Basic Setup and Configuration

Let's start by setting up your development environment with the Simplismart cookbook repository, which contains ready-to-use examples.

Step 1: Clone the Simplismart Cookbook

git clone https://github.com/simpli-smart/cookbook.git
cd cookbook/deepseek-ocr


Step 2: Configure Your API Credentials

Copy the .env-template file to .env:

cp .env-template .env

Step 3: Get Your API Details

Navigate to the Simplismart Playground and select DeepSeek OCR:

  1. Click on the "Get API details" button
  2. Go to the "API Usage" tab
  3. Select "Python" as your language
  4. Copy the BASE_URL and DEFAULT_HEADERS_ID values

Step 4: Update Your .env File

Add your credentials to the .env file:

SIMPLISMART_API_KEY=your_api_key_here
SIMPLISMART_BASE_URL=your_base_url_here
DEFAULT_HEADERS_ID=your_default_headers_id_here


Step 5: Install Dependencies

pip install -r requirements.txt

Now you're all set to start processing documents! Let's explore different OCR scenarios.
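
If you'd like a quick sanity check before running the examples, here's a minimal sketch that only constructs the client from your .env values (no request is sent; the OpenAI client raises immediately if the API key is missing):

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

# Build the client the same way the cookbook scripts do.
client = OpenAI(
    api_key=os.getenv("SIMPLISMART_API_KEY"),
    base_url=os.getenv("SIMPLISMART_BASE_URL"),
    default_headers={"id": os.getenv("DEFAULT_HEADERS_ID")},
)
print("Client configured for:", client.base_url)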


2. OCR on Local Image Files using DeepSeek OCR

The most common use case is extracting text from images stored locally on your machine. The code for this is in ocr_local_image.py.

How it works: When working with local files, you'll need to:

  1. Read the image file and encode it to base64 format
  2. Send it as a data URL to the API

Here's the complete code snippet:

# deepseek-ocr/ocr_local_image.py

import base64
import os
from pathlib import Path

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

API_KEY = os.getenv("SIMPLISMART_API_KEY")
BASE_URL = os.getenv("SIMPLISMART_BASE_URL")
DEFAULT_HEADERS_ID = os.getenv("DEFAULT_HEADERS_ID")


def ocr_local_image(image_path: str) -> str:
    """
    Extract text from a local image file using DeepSeek OCR.

    Args:
        image_path: Path to the image file (jpg, png, etc.)

    Returns:
        Extracted text from the image
    """
    # Read and encode the image
    image_base64 = base64.b64encode(Path(image_path).read_bytes()).decode()

    # Initialize the Simplismart client
    client = OpenAI(
        api_key=API_KEY,
        base_url=BASE_URL,
        default_headers={"id": DEFAULT_HEADERS_ID},
    )

    # Prepare messages with system prompt and image
    messages = [
        {
            "role": "system",
            "content": "You are an expert OCR assistant. Extract all text from images accurately, maintaining the original structure and formatting."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please extract all text from this image, preserving the layout and structure."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_base64}"
                    }
                }
            ]
        }
    ]

    # Make the API call
    response = client.chat.completions.create(
        model="deepseek-ocr",
        messages=messages,
        max_tokens=2048,
        temperature=0
    )

    return response.choices[0].message.content


# Usage example
if __name__ == "__main__":
    # Process a receipt
    receipt_text = ocr_local_image("samples/receipt.jpg")
    print("Extracted Text:")
    print(receipt_text)



Running the Code

Execute the script to extract text from the sample receipt:

python ocr_local_image.py

Output: The script will extract and print all text from the receipt in your terminal, preserving the original format and structure.

Key Technical Points:

  • Base64 Encoding: The image is encoded as base64 and embedded directly in the message
  • Data URI Format: The format data:image/jpeg;base64,{image_base64} tells the API how to decode the image
  • Temperature Setting: Set to 0 for deterministic, consistent OCR results
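
The sample hardcodes a JPEG data URI. If your images come in mixed formats, a small helper built on Python's standard mimetypes module (a sketch, not part of the cookbook) can pick the right MIME type:

import base64
import mimetypes
from pathlib import Path

def to_data_url(image_path: str) -> str:
    # Guess the MIME type from the file extension; fall back to JPEG.
    mime, _ = mimetypes.guess_type(image_path)
    mime = mime or "image/jpeg"
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode()
    return f"data:{mime};base64,{encoded}"

print(to_data_url("samples/receipt.jpg")[:40])  # preview the prefix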

3. OCR on Image URL using DeepSeek OCR

For images hosted online (on cloud storage, CDNs, or public URLs), you can process them directly without downloading or encoding. This approach is faster and saves bandwidth.

Use Cases:

  • Processing images from cloud storage like AWS S3, Google Cloud Storage, or Azure Blob Storage
  • OCR on images served via CDN
  • Batch processing images from a web scraper
  • Integration with document management systems

Here's how to implement URL-based OCR:

# deepseek-ocr/ocr_image_url.py

import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

API_KEY = os.getenv("SIMPLISMART_API_KEY")
BASE_URL = os.getenv("SIMPLISMART_BASE_URL")
DEFAULT_HEADERS_ID = os.getenv("DEFAULT_HEADERS_ID")


def ocr_image_url(image_url: str) -> str:
    """
    Extract text from an image URL using DeepSeek OCR.

    Args:
        image_url: Public URL of the image

    Returns:
        Extracted text from the image
    """
    # Initialize the Simplismart client
    client = OpenAI(
        api_key=API_KEY,
        base_url=BASE_URL,
        default_headers={"id": DEFAULT_HEADERS_ID},
    )

    messages = [
        {
            "role": "system",
            "content": "You are an expert OCR assistant. Extract all text from images accurately."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract all text from this image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url
                    }
                }
            ]
        }
    ]

    response = client.chat.completions.create(
        model="deepseek-ocr",
        messages=messages,
        max_tokens=2048,
        temperature=0
    )

    return response.choices[0].message.content


# Usage example
if __name__ == "__main__":
    # Process an online document
    url = "https://simplismart-public-assets.s3.ap-south-1.amazonaws.com/logos/ocr.png"
    extracted_text = ocr_image_url(url)
    print(extracted_text)

Running the Code

python ocr_image_url.py

The Key Difference:

Instead of encoding an image file to base64, you simply pass the URL directly in the message content:

{
    "type": "image_url",
    "image_url": {
        "url": image_url  # Direct URL - no encoding needed
    }
}

Important Notes:

  • The image URL must be publicly accessible (no authentication required)
  • For private cloud storage, consider using pre-signed URLs (see the sketch below)
  • Ensure the image server allows requests from Simplismart's infrastructure
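
For example, here's a minimal sketch of generating a pre-signed URL for a private S3 object with boto3 and feeding it to the ocr_image_url() function above. The bucket and key are placeholders, and AWS credentials must already be configured:

import boto3

s3 = boto3.client("s3")
presigned_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "your-bucket", "Key": "invoices/scan-001.png"},  # placeholders
    ExpiresIn=3600,  # URL stays valid for one hour
)
print(ocr_image_url(presigned_url))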
4. PDF Text Extraction using DeepSeek OCR

PDFs are one of the most common document formats in business environments. Processing multi-page PDFs requires a three-step approach:

  1. Convert PDF Pages to Images: Using PyMuPDF (fitz), each page is rendered as a high-quality image
  2. OCR Each Page: Process each image through DeepSeek OCR
  3. Combine Results: Merge text from all pages into a single output

This code snippet handles everything automatically:

# deepseek-ocr/pdf_to_txt.py

import base64
import io
import os

import fitz  # PyMuPDF
from dotenv import load_dotenv
from openai import OpenAI
from PIL import Image
from tqdm import tqdm

load_dotenv()

API_KEY = os.getenv("SIMPLISMART_API_KEY")
BASE_URL = os.getenv("SIMPLISMART_BASE_URL")
DEFAULT_HEADERS_ID = os.getenv("DEFAULT_HEADERS_ID")

# Your config
INPUT_PATH = "samples/deepseek-ocr-paper.pdf"
OUTPUT_PATH = "output"
PROMPT = "Convert the document to markdown."
NUM_WORKERS = 4  # reserved for parallel processing; pages run sequentially in main()
DPI = 144

os.makedirs(OUTPUT_PATH, exist_ok=True)
os.makedirs(f"{OUTPUT_PATH}/images", exist_ok=True)


def pdf_to_images_high_quality(pdf_path, dpi=DPI):
    """Render each PDF page as a PIL image at the requested DPI."""
    images = []
    pdf_document = fitz.open(pdf_path)
    zoom = dpi / 72.0
    matrix = fitz.Matrix(zoom, zoom)
    for page_num in range(pdf_document.page_count):
        page = pdf_document[page_num]
        pixmap = page.get_pixmap(matrix=matrix, alpha=False)
        img = Image.open(io.BytesIO(pixmap.tobytes("png")))
        images.append(img)
    pdf_document.close()
    return images


def image_to_base64(image):
    """Encode a PIL image as a base64 PNG string."""
    buffered = io.BytesIO()
    image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")


def ocr_page(image, page_index):
    """Run DeepSeek OCR on a single rendered page and save the result."""
    img_b64 = image_to_base64(image)
    client = OpenAI(
        api_key=API_KEY,
        base_url=BASE_URL,
        default_headers={"id": DEFAULT_HEADERS_ID},
    )

    response = client.chat.completions.create(
        model="deepseek-ocr",
        messages=[
            {"role": "user", "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}}
            ]}
        ],
        temperature=0.0,
        max_tokens=2048,
    )

    text = response.choices[0].message.content
    print(text)

    if text:
        with open(f"{OUTPUT_PATH}/page_{page_index}.txt", "w", encoding="utf-8") as f:
            f.write(text)
    else:
        print(f"No text found for page {page_index}")
    return text if text else ""


def main():
    print("Loading PDF...")
    images = pdf_to_images_high_quality(INPUT_PATH)

    print("Running OCR via OpenAI API...")
    results = []
    for idx, img in tqdm(list(enumerate(images)), total=len(images)):
        results.append(ocr_page(img, idx))

    combined = "\n\n<--- Page Split --->\n\n".join(results)
    with open(f"{OUTPUT_PATH}/combined_ocr.txt", "w", encoding="utf-8") as f:
        f.write(combined)

    print(f"OCR completed. Output saved to {OUTPUT_PATH}")


if __name__ == "__main__":
    main()



Running the PDF Extraction

python pdf_to_txt.py

What Happens During Execution:

  1. PDF Loading: The script opens the PDF and counts the total pages
  2. Page Rendering: Each page is rendered at 144 DPI (good balance of quality and file size)
  3. Base64 Encoding: Each rendered image is converted to base64
  4. OCR Processing: DeepSeek OCR extracts text from each page with a progress bar
  5. Output Generation:
    • Individual page text files: output/page_0.txt, output/page_1.txt, etc.
    • Combined text file: output/combined_ocr.txt with page separators

Configuration Options:

# Adjust these settings based on your needs
INPUT_PATH = "samples/deepseek-ocr-paper.pdf"  # Your PDF path
OUTPUT_PATH = "output"                         # Output directory
PROMPT = "Convert the document to markdown."   # Customize output format
DPI = 144                                      # Image quality

DPI Recommendations:

  • 72-100 DPI: Fast processing, good for clean, large-text documents
  • 144 DPI (default): Balanced quality and speed, suitable for most documents
  • 200-300 DPI: High quality, necessary for documents with small text or fine details
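
As a quick sanity check on these settings, here's a back-of-envelope sketch of how DPI translates into rendered pixel size for a US Letter page (8.5 x 11 inches); PyMuPDF's zoom factor is dpi / 72:

for dpi in (72, 144, 300):
    zoom = dpi / 72.0
    print(f"{dpi} DPI -> zoom {zoom:.1f}, ~{round(8.5 * dpi)}x{round(11 * dpi)} px")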

Pro Tips:

  • For documents with tables, use the prompt: "Extract text preserving table structure"
  • For technical papers, try: "Convert to markdown with proper heading hierarchy"
  • Adjust max_tokens based on page density (2048 for typical pages, 4096 for dense content)
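
Putting those tips together, a table-heavy document might use config tweaks like this sketch. Note that max_tokens is hardcoded to 2048 inside ocr_page(), so raising the budget means editing that chat.completions.create call directly:

PROMPT = "Extract text preserving table structure"
DPI = 200  # smaller text benefits from a higher render DPI
# In ocr_page(): max_tokens=4096 for dense pages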

Conclusion

DeepSeek OCR on the Simplismart platform represents a significant leap forward in document processing technology. With its impressive 800 tokens/second processing speed, advanced context understanding, and easy integration via the OpenAI SDK, it's an ideal solution for businesses looking to automate document workflows.

Key Takeaways

  • Speed: 800 tokens/second processing ensures rapid document handling
  • Accuracy: Advanced neural architecture delivers superior text extraction
  • Ease of Use: OpenAI SDK compatibility means a minimal learning curve
  • Versatility: Handles everything from invoices to handwritten notes
  • Scalability: Optimized infrastructure supports high-volume processing

Next Steps

Ready to supercharge your document processing workflows? Here's how to get started:

  1. Create Your Account: Sign up at app.simplismart.ai (free tier available)
  2. Get API Credentials: Navigate to Settings → API Keys to generate your credentials
  3. Try the Examples: Clone our cookbook repository and run the examples


Ready to transform your document processing workflows? Start building with DeepSeek OCR on Simplismart today and experience the power of 800 tokens/second OCR processing! 🚀

Find out what tailor-made inference can do for you.