fix: parse charset parameters case-insensitively #1844

Open
guyua9 wants to merge 1 commit into microsoft:main from guyua9:fix/charset-parameter-case-guyua9

guyua9 commented Apr 28, 2026

Summary

  • Treat charset parameters case-insensitively when parsing Content-Type response headers.
  • Normalize data URI parameter names so Charset=utf-8 is handled the same as charset=utf-8.
  • Add regression coverage for both paths.
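A minimal sketch of the data URI normalization described above, for reviewers who want to see the idea in isolation. The helper name `parse_data_uri_header` and its return shape are hypothetical and are not taken from MarkItDown's code; only parameter names are lowercased, values are left as written.

```python
from typing import Dict, Optional, Tuple


def parse_data_uri_header(uri: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Split the metadata part of a data URI into (mime_type, params).

    Hypothetical helper for illustration; not MarkItDown's actual function.
    """
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")

    # Everything between "data:" and the first comma is metadata.
    header, sep, _payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI is missing the comma separator")

    mime_type: Optional[str] = None
    params: Dict[str, str] = {}
    for index, part in enumerate(header.split(";")):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            name, _, value = part.partition("=")
            # Lowercase only the parameter name; the value is kept as written.
            params[name.strip().lower()] = value.strip()
        elif index == 0:
            mime_type = part.lower()
        # Bare tokens such as "base64" are ignored in this sketch.
    return mime_type, params
```

With this shape, `data:text/plain;Charset=utf-8;base64,...` and `data:text/plain;charset=utf-8;base64,...` both yield a `charset` key, so downstream code only needs to look up one spelling.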

Why

MIME parameter names are case-insensitive (RFC 2045; RFC 9110 says the same for HTTP media types), and real HTTP responses may spell charset as Charset=UTF-8. MarkItDown already uses the parsed charset as a conversion hint, so recognizing the parameter regardless of case avoids falling back to charset guessing for inputs that declare their encoding explicitly.
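To illustrate the Content-Type case, here is a rough sketch of a case-insensitive charset lookup. The function name `charset_from_content_type` is an assumption made for this example and does not reflect the actual change in the commit; the parsing is deliberately naive and does not handle semicolons inside quoted values.

```python
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper for illustration only.
    """
    # Skip the media type (first segment), then scan the parameters.
    for segment in header_value.split(";")[1:]:
        name, sep, value = segment.partition("=")
        # Compare the parameter name case-insensitively.
        if sep and name.strip().lower() == "charset":
            # Drop surrounding whitespace and optional double quotes.
            value = value.strip().strip('"')
            return value or None
    # No charset declared: the caller may fall back to guessing.
    return None
```

Returning `None` only when no charset is declared keeps the fallback to charset guessing limited to responses that really omit the parameter.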

Tests

  • uv run --with '.[all]' --with pytest --directory packages/markitdown python -m pytest tests/test_module_misc.py::test_data_uris tests/test_module_misc.py::test_response_content_type_charset_is_case_insensitive -q

guyua9 commented Apr 28, 2026

@microsoft-github-policy-service agree
