microsoft · hankl · May 10, 2026 · May 12, 2026
diff --git a/packages/markitdown-ocr/README.md b/packages/markitdown-ocr/README.md
@@ -1,8 +1,11 @@
 # MarkItDown OCR Plugin
 
-LLM Vision plugin for MarkItDown that extracts text from images embedded in PDF, DOCX, PPTX, and XLSX files.
+OCR plugin for MarkItDown that extracts text from images embedded in PDF, DOCX, PPTX, and XLSX files.
 
-Uses the same `llm_client` / `llm_model` pattern that MarkItDown already supports for image descriptions — no new ML libraries or binary dependencies required.
+Supports **two OCR providers**:
+
+- **glm-ocr** — ZhiPu AI's specialized layout parsing model (better table recognition, lower cost)
+- **LLM Vision** — Any OpenAI-compatible vision model (GPT-4o, Gemini, etc.)
 
 ## Features
 
@@ -11,28 +14,65 @@ Uses the same `llm_client` / `llm_model` pattern that MarkItDown already support
 - **Enhanced PPTX Converter**: OCR for images in PowerPoint presentations
 - **Enhanced XLSX Converter**: OCR for images in Excel spreadsheets
 - **Context Preservation**: Maintains document structure and flow when inserting extracted text
+- **Multiple Providers**: Choose glm-ocr for best table/Chinese text recognition, or LLM Vision for general use
 
 ## Installation
 
 ```bash
 pip install markitdown-ocr
 ```
 
-The plugin uses whatever OpenAI-compatible client you already have. Install one if you don't have it yet:
+Then install at least one OCR provider:
 
 ```bash
-pip install openai
+# Option 1: glm-ocr (recommended for Chinese documents and tables)
+pip install "markitdown-ocr[glmocr]"
+
+# Option 2: LLM Vision (general purpose, any OpenAI-compatible model)
+pip install "markitdown-ocr[llm]"
 ```
 
 ## Usage
 
-### Command Line
+### Using glm-ocr Provider (Recommended)
+
+glm-ocr uses ZhiPu AI's specialized layout parsing model — better table recognition, structured output, and lower cost.
+
+**Via environment variable:**
 
 ```bash
-markitdown document.pdf --use-plugins --llm-client openai --llm-model gpt-4o
+export GLMOCR_API_KEY="your-zhipu-api-key"
+markitdown document.pdf --use-plugins
+```
+
+**Via Python API:**
+
+```python
+from markitdown import MarkItDown
+
+# Option 1: Pass API key directly
+md = MarkItDown(
+    enable_plugins=True,
+    glmocr_api_key="your-zhipu-api-key",
+)
+
+# Option 2: Use environment variable (GLMOCR_API_KEY)
+md = MarkItDown(enable_plugins=True)
+
+result = md.convert("document_with_tables.pdf")
+print(result.text_content)
+```
+
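+**Via config file** (`pyproject.toml` in the current working directory, or `~/.config/markitdown-ocr/config.toml`; environment variables take precedence over file values):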
+
+```toml
+[tool.markitdown-ocr.glmocr]
+# api_key = ""  # Recommended: use env var GLMOCR_API_KEY instead
+model = "glm-ocr"
+timeout = 120
 ```
 
-### Python API
+### Using LLM Vision Provider
 
 Pass `llm_client` and `llm_model` to `MarkItDown()` exactly as you would for image descriptions:
 
@@ -50,9 +90,22 @@ result = md.convert("document_with_images.pdf")
 print(result.text_content)
 ```
 
-If no `llm_client` is provided the plugin still loads, but OCR is silently skipped — falling back to the standard built-in converter.
+### Provider Priority
 
-### Custom Prompt
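+When both providers are configured, **glm-ocr takes priority**. To force LLM Vision instead, don't pass `glmocr_api_key`, and make sure no key is set through `GLMOCR_API_KEY` or the `[tool.markitdown-ocr.glmocr]` config section:
+
+```python
+from markitdown import MarkItDown
+from openai import OpenAI
+
+md = MarkItDown(
+    enable_plugins=True,
+    llm_client=OpenAI(),
+    llm_model="gpt-4o",
+    # no glmocr_api_key and no GLMOCR_API_KEY → uses LLM Vision
+)
+```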
+
+If no provider is configured, the plugin still loads but OCR is silently skipped — falling back to the standard built-in converter.
+
+### Custom Prompt (LLM Vision only)
 
 Override the default extraction prompt for specialized documents:
 
@@ -85,27 +138,34 @@ md = MarkItDown(
 
 ## How It Works
 
-When `MarkItDown(enable_plugins=True, llm_client=..., llm_model=...)` is called:
+### Provider Selection
+
+When `MarkItDown(enable_plugins=True, ...)` is called:
 
 1. MarkItDown discovers the plugin via the `markitdown.plugin` entry point group
-2. It calls `register_converters()`, forwarding all kwargs including `llm_client` and `llm_model`
-3. The plugin creates an `LLMVisionOCRService` from those kwargs
+2. It calls `register_converters()`, forwarding all kwargs
+3. The plugin selects an OCR provider:
+   - If a glm-ocr API key is set (`glmocr_api_key`, `GLMOCR_API_KEY`, or `api_key` in the config file) → **GlmOcrService** (zai-sdk + glm-ocr)
+   - Else if `llm_client` + `llm_model` are set → **LLMVisionOCRService** (OpenAI-compatible)
+   - Else → no OCR (standard text extraction)
 4. Four OCR-enhanced converters are registered at **priority -1.0** — before the built-in converters at priority 0.0
 
+### Conversion Flow
+
 When a file is converted:
 
 1. The OCR converter accepts the file
 2. It extracts embedded images from the document
-3. Each image is sent to the LLM with an extraction prompt
+3. Each image is sent to the selected OCR provider
 4. The returned text is inserted inline, preserving document structure
-5. If the LLM call fails, conversion continues without that image's text
+5. If the OCR call fails, conversion continues without that image's text
 
 ## Supported File Formats
 
 ### PDF
 
 - Embedded images are extracted by position (via `page.images` / page XObjects) and OCR'd inline, interleaved with the surrounding text in vertical reading order.
-- **Scanned PDFs** (pages with no extractable text) are detected automatically: each page is rendered at 300 DPI and sent to the LLM as a full-page image.
+- **Scanned PDFs** (pages with no extractable text) are detected automatically: each page is rendered at 300 DPI and sent to the OCR provider as a full-page image.
 - **Malformed PDFs** that pdfplumber/pdfminer cannot open (e.g. truncated EOF) are retried with PyMuPDF page rendering, so content is still recovered.
 
 ### DOCX
@@ -136,21 +196,45 @@ Every extracted OCR block is wrapped as:
 [End OCR]*
 ```
 
+## Configuration Reference
+
+### glm-ocr Provider
+
+| Parameter | Env Variable | Default | Description |
+|-----------|-------------|---------|-------------|
+| `glmocr_api_key` | `GLMOCR_API_KEY` | — | ZhiPu AI API key (required) |
+| `glmocr_model` | `GLMOCR_MODEL` | `"glm-ocr"` | Model name |
+| `glmocr_timeout` | `GLMOCR_TIMEOUT` | `120` | Request timeout (seconds) |
+
+### LLM Vision Provider
+
+| Parameter | Description |
+|-----------|-------------|
+| `llm_client` | OpenAI-compatible client instance |
+| `llm_model` | Model name (e.g., `"gpt-4o"`) |
+| `llm_prompt` | Custom extraction prompt |
+
 ## Troubleshooting
 
 ### OCR text missing from output
 
-The most likely cause is a missing `llm_client` or `llm_model`. Verify:
+The most likely cause is a missing provider configuration. Verify:
 
 ```python
+from markitdown import MarkItDown
+
+# For glm-ocr
+md = MarkItDown(enable_plugins=True, glmocr_api_key="your-key")
+
+# For LLM Vision
 from openai import OpenAI
-from markitdown import MarkItDown
+md = MarkItDown(enable_plugins=True, llm_client=OpenAI(), llm_model="gpt-4o")
+```
 
-md = MarkItDown(
-    enable_plugins=True,
-    llm_client=OpenAI(),   # required
-    llm_model="gpt-4o",    # required
-)
+### glm-ocr import error
+
+`GlmOcrService` depends on `zai-sdk`, which is installed by the `glmocr` extra:
+
+```bash
+pip install "markitdown-ocr[glmocr]"
 ```
 
 ### Plugin not loading
@@ -163,7 +247,7 @@ markitdown --list-plugins   # should show: ocr
 
 ### API errors
 
-The plugin propagates LLM API errors as warnings and continues conversion. Check your API key, quota, and that the chosen model supports vision inputs.
+The plugin reports OCR API errors as warnings and continues conversion. Check your API key and quota, and, for LLM Vision, that the chosen model supports vision inputs.
 
 ## Development
 
@@ -192,6 +276,15 @@ MIT — see [LICENSE](LICENSE).
 
 ## Changelog
 
+### 0.2.0
+
+- **Added glm-ocr provider**: ZhiPu AI layout parsing via zai-sdk
+- Provider selection: glm-ocr (priority) → LLM Vision (fallback)
+- New `GlmOcrService` class with `extract_text()` interface
+- New `GlmOcrConfig` for configuration management (env vars + TOML + kwargs)
+- HTML → Markdown conversion for glm-ocr structured output
+- Optional dependency: `markitdown-ocr[glmocr]`
+
 ### 0.1.0 (Initial Release)
 
 - LLM Vision OCR for PDF, DOCX, PPTX, XLSX

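The provider-selection rules in the README's How It Works section can be sketched as a small function. This is a minimal illustration, not the plugin's code: `select_ocr_provider` is a hypothetical name, it returns the service class name as a plain string instead of constructing `GlmOcrService` or `LLMVisionOCRService`, and it only checks the kwarg and the `GLMOCR_API_KEY` environment variable (config-file lookup is left out).

```python
import os
from typing import Any, Optional


def select_ocr_provider(**kwargs: Any) -> Optional[str]:
    """Name the OCR service the README's rules would pick, or None (hypothetical)."""
    # 1. glm-ocr wins whenever an API key arrives via kwarg or environment
    if kwargs.get("glmocr_api_key") or os.environ.get("GLMOCR_API_KEY"):
        return "GlmOcrService"
    # 2. LLM Vision needs both a client instance and a model name
    if kwargs.get("llm_client") is not None and kwargs.get("llm_model"):
        return "LLMVisionOCRService"
    # 3. Nothing configured: the plugin loads, but OCR is skipped
    return None


os.environ.pop("GLMOCR_API_KEY", None)  # demo only: start from a clean environment
both = {"glmocr_api_key": "key", "llm_client": object(), "llm_model": "gpt-4o"}
print(select_ocr_provider(**both))  # GlmOcrService
print(select_ocr_provider(llm_client=object(), llm_model="gpt-4o"))  # LLMVisionOCRService
print(select_ocr_provider())  # None
```

Because the glm-ocr check runs first, an exported `GLMOCR_API_KEY` wins even when `llm_client` and `llm_model` are passed.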
diff --git a/packages/markitdown-ocr/pyproject.toml b/packages/markitdown-ocr/pyproject.toml
@@ -5,11 +5,11 @@ build-backend = "hatchling.build"
 [project]
 name = "markitdown-ocr"
 dynamic = ["version"]
-description = 'OCR plugin for MarkItDown - Extracts text from images in PDF, DOCX, PPTX, and XLSX via LLM Vision'
+description = 'OCR plugin for MarkItDown - Extracts text from images in PDF, DOCX, PPTX, and XLSX via LLM Vision or glm-ocr'
 readme = "README.md"
 requires-python = ">=3.10"
 license = "MIT"
-keywords = ["markitdown", "ocr", "pdf", "docx", "xlsx", "pptx", "llm", "vision"]
+keywords = ["markitdown", "ocr", "pdf", "docx", "xlsx", "pptx", "llm", "vision", "glm-ocr", "zhipu"]
 authors = [
   { name = "Contributors", email = "noreply@github.com" },
 ]
@@ -43,6 +43,9 @@ dependencies = [
 llm = [
   "openai>=1.0.0",
 ]
+glmocr = [
+  "zai-sdk>=0.2.2",
+]
 
 [project.urls]
 Documentation = "https://github.com/microsoft/markitdown#readme"
@@ -55,3 +58,9 @@ path = "src/markitdown_ocr/__about__.py"
 # CRITICAL: Plugin entry point - MarkItDown will discover this plugin through this entry point
 [project.entry-points."markitdown.plugin"]
 ocr = "markitdown_ocr"
+
+# glm-ocr provider configuration (also supports environment variables)
+[tool.markitdown-ocr.glmocr]
+# api_key = ""  # Recommended: set via environment variable GLMOCR_API_KEY
+model = "glm-ocr"
+timeout = 600
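
The `[tool.markitdown-ocr.glmocr]` table added above parses with the standard-library `tomllib` (or `tomli` on Python 3.10, the same fallback `_glmocr_config.py` uses). The sketch below uses a hypothetical `read_glmocr_section` helper to show the nested dict the section becomes, and that the commented-out `api_key` line contributes nothing. Since `_load_from_file` looks for `pyproject.toml` relative to the current working directory, this package-level section (and its `timeout = 600`) only takes effect when MarkItDown runs from this directory.

```python
try:
    import tomllib  # Python 3.11+
except ImportError:  # Python 3.10: same fallback as _glmocr_config.py
    import tomli as tomllib

# The table as added to pyproject.toml above
PYPROJECT_SNIPPET = """
[tool.markitdown-ocr.glmocr]
# api_key = ""  # Recommended: set via environment variable GLMOCR_API_KEY
model = "glm-ocr"
timeout = 600
"""


def read_glmocr_section(toml_text: str) -> dict:
    """Return the [tool.markitdown-ocr.glmocr] table, or {} if it is absent."""
    data = tomllib.loads(toml_text)
    # Hyphens are allowed in bare TOML keys, so "markitdown-ocr" is one key
    return data.get("tool", {}).get("markitdown-ocr", {}).get("glmocr", {})


print(read_glmocr_section(PYPROJECT_SNIPPET))  # {'model': 'glm-ocr', 'timeout': 600}
```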
diff --git a/packages/markitdown-ocr/src/markitdown_ocr/__init__.py b/packages/markitdown-ocr/src/markitdown_ocr/__init__.py
@@ -4,15 +4,20 @@
 """
 markitdown-ocr: OCR plugin for MarkItDown
 
-Adds LLM Vision-based text extraction from images embedded in PDF, DOCX, PPTX, and XLSX files.
+Adds text extraction from images embedded in PDF, DOCX, PPTX, and XLSX files.
+Supports multiple OCR providers:
+- LLM Vision (OpenAI-compatible: GPT-4o, Gemini, etc.)
+- glm-ocr (ZhiPu AI layout parsing: better table recognition, lower cost)
 """
 
 from ._plugin import __plugin_interface_version__, register_converters
 from .__about__ import __version__
 from ._ocr_service import (
     OCRResult,
     LLMVisionOCRService,
+    GlmOcrService,
 )
+from ._glmocr_config import GlmOcrConfig
 from ._pdf_converter_with_ocr import PdfConverterWithOCR
 from ._docx_converter_with_ocr import DocxConverterWithOCR
 from ._pptx_converter_with_ocr import PptxConverterWithOCR
@@ -24,6 +29,8 @@
     "register_converters",
     "OCRResult",
     "LLMVisionOCRService",
+    "GlmOcrService",
+    "GlmOcrConfig",
     "PdfConverterWithOCR",
     "DocxConverterWithOCR",
     "PptxConverterWithOCR",

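The OCR-enhanced converters exported here follow the README's conversion flow, in which a failed OCR call drops only that image's text. A minimal sketch of that per-image loop follows. `ocr_embedded_images` and the `extract_text` callable (image bytes in, text out) are assumptions for illustration, loosely modeled on the `extract_text()` interface the changelog mentions for `GlmOcrService`; the real converters also handle image extraction, positioning, and the OCR block markers.

```python
import warnings
from typing import Callable, Iterable, List


def ocr_embedded_images(
    images: Iterable[bytes],
    extract_text: Callable[[bytes], str],
) -> List[str]:
    """OCR each image in order; a failing call drops only that image's text."""
    texts: List[str] = []
    for index, image in enumerate(images):
        try:
            text = extract_text(image)
        except Exception as exc:  # provider/API error for this image only
            warnings.warn(f"OCR failed for image {index}: {exc}")
            continue  # keep converting the rest of the document
        if text.strip():  # skip images that yielded no text
            texts.append(text.strip())
    return texts


def fake_extract_text(image: bytes) -> str:
    """Hypothetical stand-in provider that fails on one image."""
    if image == b"bad":
        raise RuntimeError("quota exceeded")
    return image.decode()


print(ocr_embedded_images([b"first", b"bad", b"third"], fake_extract_text))
# ['first', 'third'], plus one UserWarning on stderr
```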
diff --git a/packages/markitdown-ocr/src/markitdown_ocr/_glmocr_config.py b/packages/markitdown-ocr/src/markitdown_ocr/_glmocr_config.py
@@ -0,0 +1,93 @@
+"""Configuration management for glm-ocr provider."""
+
+import os
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Optional
+
+try:
+    import tomllib  # Python 3.11+
+except ImportError:
+    try:
+        import tomli as tomllib  # type: ignore[no-redef]
+    except ImportError:
+        tomllib = None  # type: ignore[assignment]
+
+
+@dataclass
+class GlmOcrConfig:
+    """glm-ocr provider configuration for markitdown-ocr.
+
+    Config sources (priority high to low):
+    1. kwargs parameters (passed at registration time; not applied by load())
+    2. Environment variables
+    3. Config file: the [tool.markitdown-ocr.glmocr] section of config_path,
+       ./pyproject.toml, or ~/.config/markitdown-ocr/config.toml
+    4. Default values
+    """
+
+    api_key: str = ""
+    model: str = "glm-ocr"
+    timeout: int = 120
+
+    @classmethod
+    def load(cls, config_path: Optional[str] = None) -> "GlmOcrConfig":
+        """Start from defaults, apply the config file, then environment variables."""
+        config = cls()
+        config._load_from_file(config_path)
+        config._load_from_env()
+        return config
+
+    def _load_from_file(self, config_path: Optional[str] = None) -> None:
+        """Load from config file (pyproject.toml)."""
+        if tomllib is None:
+            return
+
+        search_paths: list[Path] = []
+
+        if config_path:
+            search_paths.append(Path(config_path))
+
+        # Current directory pyproject.toml
+        search_paths.append(Path("pyproject.toml"))
+
+        # User config directory
+        search_paths.append(
+            Path.home() / ".config" / "markitdown-ocr" / "config.toml"
+        )
+
+        for path in search_paths:
+            if not path.is_file():
+                continue
+            try:
+                with open(path, "rb") as f:
+                    data = tomllib.load(f)
+
+                # Read [tool.markitdown-ocr.glmocr] section
+                section = data.get("tool", {}).get("markitdown-ocr", {})
+                if "glmocr" in section:
+                    self._apply_config(section["glmocr"])
+                    break  # Use the first config file that defines the section
+            except Exception:
+                # Unreadable file, invalid TOML, or bad value: try the next path
+                continue
+
+    def _apply_config(self, data: dict) -> None:
+        """Apply config values from a dict."""
+        if "api_key" in data:
+            self.api_key = data["api_key"]
+        if "model" in data:
+            self.model = data["model"]
+        if "timeout" in data:
+            self.timeout = int(data["timeout"])
+
+    def _load_from_env(self) -> None:
+        """Load from environment variables (overrides config-file values)."""
+        if os.environ.get("GLMOCR_API_KEY"):
+            self.api_key = os.environ["GLMOCR_API_KEY"]
+        if os.environ.get("GLMOCR_MODEL"):
+            self.model = os.environ["GLMOCR_MODEL"]
+        if os.environ.get("GLMOCR_TIMEOUT"):
+            try:
+                self.timeout = int(os.environ["GLMOCR_TIMEOUT"])
+            except ValueError:
+                pass
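
`GlmOcrConfig.load()` applies the config file and then environment variables on top of the dataclass defaults; the docstring ranks kwargs above both, so they would have to be applied after `load()` returns. The self-contained sketch below models that layering. `GlmOcrSettings` and `resolve_settings` are hypothetical stand-ins, and applying the `glmocr_*` kwargs last is an assumption about how the plugin uses the config, not something this file shows.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class GlmOcrSettings:
    """Hypothetical stand-in for GlmOcrConfig's three fields."""

    api_key: str = ""
    model: str = "glm-ocr"
    timeout: int = 120


def resolve_settings(
    file_section: Mapping[str, Any],
    env: Mapping[str, str],
    kwargs: Mapping[str, Any],
) -> GlmOcrSettings:
    """Layer defaults < config file < environment < kwargs (kwargs assumed)."""
    settings = GlmOcrSettings()  # lowest priority: dataclass defaults

    # Config file: keys from the [tool.markitdown-ocr.glmocr] table
    if "api_key" in file_section:
        settings.api_key = file_section["api_key"]
    if "model" in file_section:
        settings.model = file_section["model"]
    if "timeout" in file_section:
        settings.timeout = int(file_section["timeout"])

    # Environment: empty values are ignored; a non-integer timeout is skipped
    if env.get("GLMOCR_API_KEY"):
        settings.api_key = env["GLMOCR_API_KEY"]
    if env.get("GLMOCR_MODEL"):
        settings.model = env["GLMOCR_MODEL"]
    if env.get("GLMOCR_TIMEOUT"):
        try:
            settings.timeout = int(env["GLMOCR_TIMEOUT"])
        except ValueError:
            pass

    # Highest priority (assumption): glmocr_* kwargs applied after load()
    if kwargs.get("glmocr_api_key"):
        settings.api_key = kwargs["glmocr_api_key"]
    if kwargs.get("glmocr_model"):
        settings.model = kwargs["glmocr_model"]
    if kwargs.get("glmocr_timeout") is not None:
        settings.timeout = int(kwargs["glmocr_timeout"])
    return settings


settings = resolve_settings(
    file_section={"model": "glm-ocr", "timeout": 600},
    env={"GLMOCR_API_KEY": "env-key", "GLMOCR_TIMEOUT": "300"},
    kwargs={"glmocr_api_key": "kwarg-key"},
)
print(settings)  # GlmOcrSettings(api_key='kwarg-key', model='glm-ocr', timeout=300)
```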