fix: the markitdown converter processes rss/xml docu... in...#1875

Open
orbisai0security wants to merge 1 commit into microsoft:main from orbisai0security:fix-xxe-defusedxml-pre-process

@orbisai0security

Summary

Fix a high-severity security issue in packages/markitdown/src/markitdown/converters/_markdownify.py.

Vulnerability

| Field | Value |
| --- | --- |
| ID | V-001 |
| Severity | HIGH |
| Scanner | multi_agent_ai |
| Rule | V-001 |
| File | packages/markitdown/src/markitdown/converters/_markdownify.py:112 |
| CWE | CWE-611 |

Description: The markitdown converter processes RSS/XML documents at the convert_input entry point without disabling XML external entity (XXE) resolution. Depending on the parser, its version, and its settings, Python XML parsers such as xml.etree.ElementTree and lxml may process entity declarations embedded in a document. An attacker can craft a malicious RSS/XML document whose external entity declaration instructs the parser to read a local file (e.g., /etc/passwd, application secrets, private keys) and embed its contents in the parsed output. Recursive entity expansion (the Billion Laughs attack) can also exhaust server memory and CPU, causing a denial of service.
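
For illustration only, here is a minimal sketch of one common mitigation for this class of issue: refuse any document that carries a DOCTYPE declaration before handing it to the real parser. It is not the change made in this PR (the diff is not shown here), and it uses only the standard library rather than defusedxml, which the branch name suggests the fix relies on. The names `DTDForbidden` and `safe_fromstring` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when an XML document declares a DOCTYPE (hypothetical name)."""


def _forbid_doctype(name, system_id, public_id, has_internal_subset):
    # Expat calls this handler as soon as it sees "<!DOCTYPE ...",
    # before any entity declared in the DTD can be used.
    raise DTDForbidden(f"DOCTYPE declaration not allowed: {name!r}")


def safe_fromstring(xml_bytes: bytes) -> ET.Element:
    """Parse XML, rejecting documents that contain a DTD.

    Entities (internal or external) can only be declared inside a DTD,
    so rejecting the DOCTYPE blocks both XXE and entity-expansion payloads.
    """
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _forbid_doctype
    checker.Parse(xml_bytes, True)  # raises DTDForbidden or expat.ExpatError
    return ET.fromstring(xml_bytes)


if __name__ == "__main__":
    feed = b"<rss><channel><title>ok</title></channel></rss>"
    print(safe_fromstring(feed).find("channel/title").text)

    payload = (
        b'<?xml version="1.0"?>'
        b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]>'
        b"<r>&x;</r>"
    )
    try:
        safe_fromstring(payload)
    except DTDForbidden as exc:
        print("rejected:", exc)
```

The pre-check parses the input twice, which trades some speed for keeping the downstream parsing code unchanged. Legitimate feeds that include a DOCTYPE would also be rejected, so a real fix may need a more permissive policy.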

Changes

  • packages/markitdown/src/markitdown/converter_utils/docx/pre_process.py

Verification

  • Build passes
  • Scanner re-scan confirms fix
  • LLM code review passed

Automated security fix by OrbisAI Security

