fix: handle non-JSON-serializable types in serialization fallback (AI-Triage PR)#954
Draft
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
Draft
Conversation
The json.dumps() fallback in airbyte_message_to_string() could also fail for types that neither orjson nor stdlib json can serialize (e.g. complex numbers), causing an unhandled exception that leads to deadlocks in the concurrent source pipeline. Add a second fallback using json.dumps(default=str) to ensure serialization never raises an unhandled exception. Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1773523110-fix-serialization-fallback-complex-types#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1773523110-fix-serialization-fallback-complex-typesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a second fallback in
airbyte_message_to_string()to prevent unhandled exceptions (and consequent deadlocks in the concurrent source pipeline) when record data contains types that neitherorjsonnor stdlibjsoncan serialize.Problem: The existing fallback chain is
orjson.dumps()→json.dumps(). Butjson.dumps()can also raiseTypeErrorfor types like Pythoncomplexnumbers. When this unhandled exception occurs in a worker thread of the concurrent source pipeline, the main thread deadlocks waiting onqueue.get()because the worker silently dies.Fix: Wrap the
json.dumps()fallback in its own try/except, falling back tojson.dumps(serialized_message, default=str)which converts any remaining non-serializable values to their string representation.This is a last-resort path — it only activates when both
orjsonand stdlibjsonfail on the same message.Resolves https://github.com/airbytehq/oncall/issues/11654:
Related: airbytehq/airbyte#74883
Review & Testing Checklist for Human
default=strfallback silently converts non-serializable values to string representations (e.g.,complex(1,2)→"(1+2j)"). No additional warning is logged when this second fallback fires — only the original orjson warning is emitted. Decide if this is acceptable or if an extra log line should be added.except Exceptionis appropriate here: The inner catch is intentionally broad since this is a last-resort to prevent deadlocks. Confirm this doesn't mask errors that should propagate.source-google-search-consolewithcomplextype values inctr/positionfields. The test covers this path but an integration-level verification with the actual connector would provide higher confidence.Suggested test plan: Run
source-google-search-consolewith thesearch_analytics_by_querystream against a real account. Confirm the sync completes without deadlock and thatctr/positionvalues are present (even if stringified as a fallback).Notes
complexvalues entering the data flow was separately fixed in CDK PR airbytehq/airbyte-python-cdk#579 (Jinja interpolation rejecting complex types, merged in v6.54.5). This PR hardens the serialization layer as defense-in-depth against any future non-serializable type reaching this code path.orjson→jsonfallback was added in airbytehq/airbyte-python-cdk#210 (v6.16.1, by Artem Inzhyyants (@artem1205) / Maxime Carbonneau-Leclerc (@maxi297)).Link to Devin session: https://app.devin.ai/sessions/d8317c1f4ce64f70b5e807425b72aca2