feat: add markdown_json output format for unstructured document parsing by rwask · Pull Request #948 · airbytehq/airbyte-python-cdk

Ryan Waskewich (rwask) · 2026-03-10T21:04:01Z

feat: add markdown_json output format for unstructured document parsing

Summary

Adds a new output_format config option to UnstructuredFormat with two choices:

markdown_text (default): existing behavior, renders documents as flat Markdown text
markdown_json: outputs a JSON-serialized array of structured elements preserving type, text, and metadata fields from the unstructured parsing engine

This is an additive, non-breaking change. All existing connections continue to use markdown_text by default. The feature applies to all file-based connectors (S3, Azure Blob, GCS, SharePoint, SFTP, etc.) since they share the CDK's unstructured parser.

Files changed:

unstructured_format.py — new output_format field on the config model
unstructured_parser.py — new element-returning methods (_read_file_locally_elements, _read_file_remotely_elements, _read_file_remotely_elements_with_retries); _read_file now routes through these and conditionally calls json.dumps or _render_markdown
test_unstructured_parser.py — three new test cases covering schema inference, local parsing, and remote API parsing with markdown_json

Review & Testing Checklist for Human

Code path change for default markdown_text mode: _read_file now calls _read_file_remotely_elements_with_retries (new) instead of _read_file_remotely_with_retries (old) for ALL API calls, not just markdown_json. The old _read_file_remotely_with_retries → _read_file_remotely path is now only used by check_config. Verify this doesn't introduce regressions for the default text output.
MD/TXT file handling: When filetype in {FileType.MD, FileType.TXT}, the raw file content is returned as-is, ignoring the output_format setting. Users selecting markdown_json for .md or .txt files will still get plain text, not JSON. Is this the intended behavior or should these files be parsed into JSON elements too?
Code duplication: _read_file_remotely_elements duplicates the HTTP call logic from _read_file_remotely. Consider whether this should be refactored to avoid drift (e.g., have _read_file_remotely call _read_file_remotely_elements internally).
Integration testing: Changes were only unit-tested with mocked unstructured library. Consider testing with a real connector (e.g., source-s3 with a PDF in a test bucket) to verify the JSON output is valid and contains expected element types/metadata.

Notes

This feature was requested for a POC with unstructured documents where downstream processing of structured JSON elements (with types like Title, NarrativeText, ListItem, and metadata like page numbers) is significantly easier than re-parsing Markdown text.
The implementation preserves backward compatibility by defaulting to markdown_text.
All existing unit tests pass with the default format, and new tests cover the markdown_json path.

Devin session | Requested by Ryan Waskewich (@rwask)

Summary by CodeRabbit

New Features
- Added a configurable output format for parsed document content: choose rendered Markdown (default) or a JSON-structured format of individual document elements. MD/TXT inputs can emit the JSON-element format when selected.
Tests
- Added tests covering the new JSON-structured output path, including schema inference and both local and remote parsing behaviors.

Add a new output_format config option to UnstructuredFormat with two choices: - markdown_text (default, backward-compatible): renders documents as flat Markdown text - markdown_json: outputs a JSON array of structured elements with type, text, and metadata fields This preserves document structure (element types, page numbers, coordinates, section depth) for easier downstream processing. The change is additive and non-breaking. Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

devin-ai-integration · 2026-03-10T21:04:05Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

github-actions · 2026-03-10T21:04:14Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1773176340-unstructured-markdown-json-output#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1773176340-unstructured-markdown-json-output

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test - Runs connector tests with the updated CDK
/prerelease - Triggers a prerelease publish with default arguments
/poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
/poe <command> - Runs any poe command in the CDK environment

📚 Show Repo Guidance

Helpful Resources

CDK API Reference

📝 Edit this welcome message.

…wn_json When output_format='markdown_json', MD and TXT files now return a JSON array with a single NarrativeText element instead of raw text, ensuring consistent output format across all file types. Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

coderabbitai · 2026-03-10T21:12:05Z

Warning

Rate limit exceeded

@devin-ai-integration[bot] has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 31 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c4373bea-9e9b-4a69-a5c7-1492cb307096

📥 Commits

Reviewing files that changed from the base of the PR and between e484f05 and 25bb2d1.

📒 Files selected for processing (2)

airbyte_cdk/sources/file_based/file_types/unstructured_parser.py
unit_tests/sources/file_based/file_types/test_unstructured_parser.py

📝 Walkthrough

Walkthrough

Adds an output_format option to UnstructuredFormat and extends the unstructured parser to produce either rendered Markdown or JSON-encoded element lists. Introduces element-oriented local/remote read methods, remote retries/422-as-parse-error handling, and tests for the new JSON output path.

Changes

Cohort / File(s)	Summary
Configuration Model `airbyte_cdk/sources/file_based/config/unstructured_format.py`	Adds public `output_format: str` Field (enum `["markdown_text","markdown_json"]`, default `"markdown_text"`, always shown) to expose desired parsed-document output format.
Parser Implementation `airbyte_cdk/sources/file_based/file_types/unstructured_parser.py`	Imports `json`. Adds element-based methods `_read_file_locally_elements`, `_read_file_remotely_elements`, `_read_file_remotely_elements_with_retries` (backoff-wrapped). Updates `infer_schema` and `_read_file` to use `format.output_format`; supports returning JSON-encoded element lists when `markdown_json`. Treats HTTP 422 as parse error, propagates `RecordParseError` vs. other errors.
Tests / Scenarios `unit_tests/sources/file_based/file_types/test_unstructured_parser.py`, `unit_tests/sources/file_based/scenarios/csv_scenarios.py`	Adds tests for `markdown_json` (infer_schema, local/remote parsing, MD/TXT behavior) and adds `output_format` option in CSV scenario schema. Note: test file contains duplicated test blocks for `markdown_json` paths.

Sequence Diagram(s)

sequenceDiagram
    participant Source as File Source
    participant Parser as Unstructured Parser
    participant Local as Local Processor
    participant Remote as Remote API
    participant Output as Formatter

    Source->>Parser: parse_records(file, format.output_format)
    alt local
        Parser->>Local: _read_file_locally_elements(file)
        Local-->>Parser: List[Dict] elements
    else remote
        Parser->>Remote: _read_file_remotely_elements_with_retries(file)
        Remote-->>Parser: List[Dict] elements
    end
    Parser->>Parser: if output_format == "markdown_json"
    alt markdown_json
        Parser->>Output: json.dumps(elements)
    else markdown_text
        Parser->>Output: render_markdown_from(elements)
    end
    Output-->>Source: formatted content

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Would you like me to suggest deduplicating the duplicated markdown_json tests and add a short doc note about the new output_format behavior, wdyt?

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately and concisely describes the main change: adding a new markdown_json output format for unstructured document parsing.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch devin/1773176340-unstructured-markdown-json-output

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@airbyte_cdk/sources/file_based/file_types/unstructured_parser.py`:
- Around line 151-156: The content_description assignment advertises JSON when
format.output_format == "markdown_json" even though _read_file() returns raw
text for FileType.MD and FileType.TXT; update the logic in
unstructured_parser.py to align the schema description with actual return types
by checking the file type (e.g., FileType.MD or FileType.TXT) as well as
format.output_format before setting content_description, ensure
content_description describes plain markdown/text for those file types
regardless of output_format, and keep the markdown_json description only for
file types that actually yield JSON-structured records.
- Around line 339-355: The retry wrapper
_read_file_remotely_elements_with_retries currently reuses the same file_handle
across attempts causing requests.post to resend an EOF/partial stream; before
each retry attempt rewind the stream (call file_handle.seek(0)) or, if the
handle is not seekable, buffer the contents (e.g., into a BytesIO) and use that
buffer for each attempt so the uploaded file is complete on retries; apply the
same change to the other retry wrapper that delegates to the remote read (the
similar retry function for the chunked/alternative remote-read path) so both
retry paths reset or rebuffer the file_handle before each request attempt.

In `@unit_tests/sources/file_based/file_types/test_unstructured_parser.py`:
- Around line 782-786: The test fixture is passing the wrong keyword arg to
UnstructuredFormat; update the config.format assignment to use
skip_unprocessable_files (not skip_unprocessable_file_types) so the flag is
actually set—i.e., change the UnstructuredFormat(...) call where config.format
is assigned to pass skip_unprocessable_files=False while leaving the rest
(output_format and processing / APIProcessingConfigModel) unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c5eaa411-e18d-490f-a2a4-a69101caf172

📥 Commits

Reviewing files that changed from the base of the PR and between f550424 and 1afcc98.

📒 Files selected for processing (3)

airbyte_cdk/sources/file_based/config/unstructured_format.py
airbyte_cdk/sources/file_based/file_types/unstructured_parser.py
unit_tests/sources/file_based/file_types/test_unstructured_parser.py

airbyte_cdk/sources/file_based/file_types/unstructured_parser.py

coderabbitai · 2026-03-10T21:12:07Z

airbyte_cdk/sources/file_based/file_types/unstructured_parser.py

+    @backoff.on_exception(
+        backoff.expo, requests.exceptions.RequestException, max_tries=5, giveup=user_error
+    )
+    def _read_file_remotely_elements_with_retries(
+        self,
+        file_handle: IOBase,
+        format: APIProcessingConfigModel,
+        filetype: FileType,
+        strategy: str,
+        remote_file: RemoteFile,
+    ) -> List[Dict[str, Any]]:
+        """
+        Read a file remotely and return the raw JSON elements, retrying up to 5 times if the error is not caused by user error.
+        """
+        return self._read_file_remotely_elements(
+            file_handle, format, filetype, strategy, remote_file
+        )


⚠️ Potential issue | 🟠 Major

Could we rewind file_handle before each remote attempt, wdyt?

This retry path reuses the same stream across attempts, but requests.post(..., files=...) consumes it. After the first transient failure, later retries can resend EOF/partial data and make recovery impossible.

Suggested fix

def _read_file_remotely_elements( self, file_handle: IOBase, format: APIProcessingConfigModel, filetype: FileType, strategy: str, remote_file: RemoteFile, ) -> List[Dict[str, Any]]: + file_handle.seek(0) headers = {"accept": "application/json", "unstructured-api-key": format.api_key} data = self._params_to_dict(format.parameters, strategy) file_data = {"files": ("filename", file_handle, FILETYPE_TO_MIMETYPE[filetype])}

Also applies to: 386-410

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@airbyte_cdk/sources/file_based/file_types/unstructured_parser.py` around lines 339 - 355, The retry wrapper _read_file_remotely_elements_with_retries currently reuses the same file_handle across attempts causing requests.post to resend an EOF/partial stream; before each retry attempt rewind the stream (call file_handle.seek(0)) or, if the handle is not seekable, buffer the contents (e.g., into a BytesIO) and use that buffer for each attempt so the uploaded file is complete on retries; apply the same change to the other retry wrapper that delegates to the remote read (the similar retry function for the chunked/alternative remote-read path) so both retry paths reset or rebuffer the file_handle before each request attempt.

This is a pre-existing pattern — the existing _read_file_remotely method (line 361) has the same behavior and doesn't add seek(0) before the request either. The new _read_file_remotely_elements method follows the same pattern for consistency. Addressing the retry seek behavior for both methods would be a separate improvement outside the scope of this PR.

unit_tests/sources/file_based/file_types/test_unstructured_parser.py

…_files in test Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

airbyte_cdk/sources/file_based/file_types/unstructured_parser.py (1)
346-359: ⚠️ Potential issue | 🟠 Major

Could we rewind or rebuffer the upload stream before each retry, wdyt?

This wrapper retries with the same file_handle. requests.post(..., files=...) consumes that stream, so after the first transient failure the next attempt can resend EOF/partial content and make recovery impossible on the new remote path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@airbyte_cdk/sources/file_based/file_types/unstructured_parser.py` around
lines 346 - 359, The wrapper _read_file_remotely_elements_with_retries reuses
the same file_handle across retries but does not rewind or rebuffer it, so
subsequent requests may send EOF/partial data after the first attempt; before
each retry call to _read_file_remotely_elements you should reset the upload
stream: if file_handle is seekable call file_handle.seek(0), otherwise replace
it with a fresh in-memory or temporary buffered stream (e.g., read into BytesIO
or tempfile and use that) so each requests.post(..., files=...) sees the full
content; ensure this rewind/buffering happens before every retry attempt in
_read_file_remotely_elements_with_retries.

🧹 Nitpick comments (1)

unit_tests/sources/file_based/file_types/test_unstructured_parser.py (1)
809-839: Could we parametrize this over FileType.MD and FileType.TXT, wdyt?

The production branch handles both extensions, but this regression test only locks in .md. Covering .txt too would better protect the compatibility fix called out in the PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unit_tests/sources/file_based/file_types/test_unstructured_parser.py` around
lines 809 - 839, Parametrize the test_parse_records_markdown_json_md_file test
to run for both FileType.MD and FileType.TXT: replace the single-case setup of
mock_detect_filetype.return_value = FileType.MD with a pytest.mark.parametrize
over a variable like file_type (values FileType.MD and FileType.TXT), set
mock_detect_filetype.return_value = file_type, and adjust the fake_file URI
extension accordingly (e.g., use ".md" when file_type is FileType.MD and ".txt"
when FileType.TXT) so UnstructuredParser.parse_records is exercised for both
extensions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@airbyte_cdk/sources/file_based/file_types/unstructured_parser.py`:
- Around line 413-414: The code currently returns response.json() (assigned to
json_response) without verifying its type; update the logic after calling
response.json() in unstructured_parser.py to assert the result is a list (use
isinstance(json_response, list)), and if it's not a list raise a descriptive
exception (e.g., ValueError or RuntimeError) that includes the unexpected
response content (or response.text) so callers see the mismatch instead of
silently receiving a dict; ensure the function no longer returns non-list values
and references the json_response variable and the response.json() call in the
error path.

---

Duplicate comments:
In `@airbyte_cdk/sources/file_based/file_types/unstructured_parser.py`:
- Around line 346-359: The wrapper _read_file_remotely_elements_with_retries
reuses the same file_handle across retries but does not rewind or rebuffer it,
so subsequent requests may send EOF/partial data after the first attempt; before
each retry call to _read_file_remotely_elements you should reset the upload
stream: if file_handle is seekable call file_handle.seek(0), otherwise replace
it with a fresh in-memory or temporary buffered stream (e.g., read into BytesIO
or tempfile and use that) so each requests.post(..., files=...) sees the full
content; ensure this rewind/buffering happens before every retry attempt in
_read_file_remotely_elements_with_retries.

---

Nitpick comments:
In `@unit_tests/sources/file_based/file_types/test_unstructured_parser.py`:
- Around line 809-839: Parametrize the test_parse_records_markdown_json_md_file
test to run for both FileType.MD and FileType.TXT: replace the single-case setup
of mock_detect_filetype.return_value = FileType.MD with a
pytest.mark.parametrize over a variable like file_type (values FileType.MD and
FileType.TXT), set mock_detect_filetype.return_value = file_type, and adjust the
fake_file URI extension accordingly (e.g., use ".md" when file_type is
FileType.MD and ".txt" when FileType.TXT) so UnstructuredParser.parse_records is
exercised for both extensions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 06a83371-f490-4862-a339-141b4f6b827f

📥 Commits

Reviewing files that changed from the base of the PR and between 1afcc98 and dc9bb1d.

📒 Files selected for processing (2)

airbyte_cdk/sources/file_based/file_types/unstructured_parser.py
unit_tests/sources/file_based/file_types/test_unstructured_parser.py

coderabbitai · 2026-03-10T21:20:02Z

airbyte_cdk/sources/file_based/file_types/unstructured_parser.py

+        json_response: List[Dict[str, Any]] = response.json()
+        return json_response


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "unstructured_parser.py" 2>/dev/null | head -5

Repository: airbytehq/airbyte-python-cdk

Length of output: 54

🏁 Script executed:

# Let's search for the file in the repository find . -name "unstructured_parser.py" 2>/dev/null

Repository: airbytehq/airbyte-python-cdk

Length of output: 138

🏁 Script executed:

# Get context around lines 413-414 cat -n airbyte_cdk/sources/file_based/file_types/unstructured_parser.py | sed -n '400,430p'

Repository: airbytehq/airbyte-python-cdk

Length of output: 1487

🏁 Script executed:

# Let's also check if there's validation elsewhere and understand the full method cat -n airbyte_cdk/sources/file_based/file_types/unstructured_parser.py | sed -n '380,440p'

Repository: airbytehq/airbyte-python-cdk

Length of output: 2964

🏁 Script executed:

# Search for markdown_json function and its usage rg "markdown_json" --type py -B 5 -A 10

Repository: airbytehq/airbyte-python-cdk

Length of output: 21328

🏁 Script executed:

# Check what happens to the returned value from _read_file_remotely_elements cat -n airbyte_cdk/sources/file_based/file_types/unstructured_parser.py | sed -n '290,330p'

Repository: airbytehq/airbyte-python-cdk

Length of output: 2039

🏁 Script executed:

# Check the _render_markdown method to see how it handles the data rg "_render_markdown" -B 2 -A 10 airbyte_cdk/sources/file_based/file_types/unstructured_parser.py

Repository: airbytehq/airbyte-python-cdk

Length of output: 2755

🏁 Script executed:

# Let's see the complete parse_records flow cat -n airbyte_cdk/sources/file_based/file_types/unstructured_parser.py | sed -n '290,350p'

Repository: airbytehq/airbyte-python-cdk

Length of output: 2956

🏁 Script executed:

# Check if there are other places where response.json() is called without validation rg "response\.json\(\)" airbyte_cdk/sources/file_based/file_types/unstructured_parser.py -B 3 -A 3

Repository: airbytehq/airbyte-python-cdk

Length of output: 899

🏁 Script executed:

# Let's also check what happens if json.dumps is called on a dict vs a list python3 << 'EOF' import json # Test what happens with dict dict_response = {"detail": "error"} list_response = [{"type": "Title", "text": "heading"}] print("json.dumps on dict:") print(json.dumps(dict_response)) print("\njson.dumps on list:") print(json.dumps(list_response)) # What happens when we iterate? print("\n\nIterating over dict:") for item in dict_response: print(f" {item}") print("\nIterating over list:") for item in list_response: print(f" {item}") EOF

Repository: airbytehq/airbyte-python-cdk

Length of output: 264

Validate that the API response is actually a list before returning it.

Right now if the API returns a 200 with {"detail": "..."} instead of an array, the code silently emits a JSON object even though the schema promises a JSON array of elements. Since json.dumps() works on both dicts and lists, the error goes undetected downstream. Could we add a check that response.json() returns a list, wdyt?

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@airbyte_cdk/sources/file_based/file_types/unstructured_parser.py` around lines 413 - 414, The code currently returns response.json() (assigned to json_response) without verifying its type; update the logic after calling response.json() in unstructured_parser.py to assert the result is a list (use isinstance(json_response, list)), and if it's not a list raise a descriptive exception (e.g., ValueError or RuntimeError) that includes the unexpected response content (or response.text) so callers see the mismatch instead of silently receiving a dict; ensure the function no longer returns non-list values and references the json_response variable and the response.json() call in the error path.

github-actions · 2026-03-10T21:20:32Z

PyTest Results (Fast)

3 909 tests +17 3 897 ✅ +17 6m 10s ⏱️ -36s
1 suites ± 0 12 💤 ± 0
1 files ± 0 0 ❌ ± 0

Results for commit 25bb2d1. ± Comparison against base commit f550424.

♻️ This comment has been updated with latest results.

…red format Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

…r wrapping Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

…edText in test Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

github-actions · 2026-03-10T21:49:39Z

PyTest Results (Full)

3 912 tests +17 3 900 ✅ +17 11m 2s ⏱️ -12s
1 suites ± 0 12 💤 ± 0
1 files ± 0 0 ❌ ± 0

Results for commit 25bb2d1. ± Comparison against base commit f550424.

devin-ai-integration bot assigned Ryan Waskewich (rwask) Mar 10, 2026

This comment was marked as resolved.

Sign in to view

coderabbitai bot requested changes Mar 10, 2026

View reviewed changes

fix: correct skip_unprocessable_file_types typo to skip_unprocessable…

dc9bb1d

…_files in test Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

coderabbitai bot requested changes Mar 10, 2026

View reviewed changes

fix: update expected spec to include output_format field in unstructu…

e484f05

…red format Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

This comment was marked as resolved.

Sign in to view

devin-ai-integration bot and others added 2 commits March 10, 2026 21:27

fix: move rendering/serialization inside try/except for API path erro…

eecb6c2

…r wrapping Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

fix: correct element type assertion from NarrativeText to Uncategoriz…

25bb2d1

…edText in test Co-Authored-By: Ryan Waskewich <ryan.waskewich@airbyte.io>

		json_response: List[Dict[str, Any]] = response.json()
		return json_response

Conversation

Ryan Waskewich (rwask) commented Mar 10, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

feat: add markdown_json output format for unstructured document parsing

Summary

Review & Testing Checklist for Human

Notes

Summary by CodeRabbit

Uh oh!

devin-ai-integration bot commented Mar 10, 2026

🤖 Devin AI Engineer

Uh oh!

github-actions bot commented Mar 10, 2026

👋 Greetings, Airbyte Team Member!

Testing This CDK Version

PR Slash Commands

Helpful Resources

Uh oh!

This comment was marked as resolved.

Uh oh!

coderabbitai bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PyTest Results (Fast)

Uh oh!

This comment was marked as resolved.

Uh oh!

github-actions bot commented Mar 10, 2026

PyTest Results (Full)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Ryan Waskewich (rwask) commented Mar 10, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 10, 2026 •

edited

Loading

github-actions bot commented Mar 10, 2026 •

edited

Loading