feat: add ddtrace, in-memory cache, and cache TTL to source-declarative-manifest image #932

Open
devin-ai-integration[bot] wants to merge 15 commits into main from devin/1772724202-add-ddtrace-profiling-support

@devin-ai-integration
Contributor

@devin-ai-integration devin-ai-integration bot commented Mar 5, 2026

Summary

Adds ddtrace (Datadog's tracing/profiling library) to the source-declarative-manifest Docker image so that manifest-only connectors (like Twilio) can be built with Datadog memory profiling support. The library is installed but inactive at runtime unless profiling is explicitly enabled via environment variables (e.g. DD_PROFILING_ENABLED=true).

Also adds an AIRBYTE_USE_IN_MEMORY_CACHE env var to HttpClient that forces requests_cache to use in-memory SQLite instead of file-based SQLite. This prevents OS page cache growth from file I/O, which Kubernetes counts as container memory (container_memory_working_set_bytes).

Additionally sets a 1-hour TTL (expire_after=3600) on all CachedLimiterSession instances to prevent unbounded cache growth during long-running syncs.

This is the CDK-side prerequisite for enabling Datadog memory profiling on manifest-only connectors. A companion change in the airbyte repo (#74308) modifies the connector Dockerfile to use ddtrace-run as the entrypoint wrapper and inject the required env vars.
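
As a rough illustration of the toggle described above, the following sketch shows one way an environment variable could select between a file-backed and an in-memory SQLite location. It uses only the standard library. The helper names `use_in_memory_cache` and `cache_location`, the case-insensitive comparison, and the example file name are assumptions made for illustration; they are not taken from the CDK's `HttpClient`.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AIRBYTE_USE_IN_MEMORY_CACHE"
# Shared in-memory SQLite URI quoted in the review checklist of this PR.
IN_MEMORY_URI = "file::memory:?cache=shared"


def use_in_memory_cache(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the env var is set to 'true'.

    Assumption: surrounding whitespace and letter case are ignored.
    """
    env = os.environ if env is None else env
    return env.get(ENV_VAR, "").strip().lower() == "true"


def cache_location(file_path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the SQLite location: in-memory URI if enabled, else the file path."""
    if use_in_memory_cache(env):
        return IN_MEMORY_URI
    return file_path
```

With the variable unset, `cache_location("cache.sqlite")` returns the file path unchanged, so the file-based behavior stays the default in this sketch.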

Updates since last revision

  • Added expire_after=3600 (1-hour TTL) to the HTTP response cache: All cached responses now expire after 1 hour, which bounds cache growth during long syncs. Previously, cached entries had no expiration and could accumulate indefinitely.

Previous updates

  • Reverted jemalloc LD_PRELOAD: jemalloc did not reduce container memory growth and broke ddtrace profiling (profiling data disappeared entirely). Removed libjemalloc2 installation and LD_PRELOAD from the Dockerfile.
  • Added AIRBYTE_USE_IN_MEMORY_CACHE env var: Investigation using kubernetes.memory.rss vs kubernetes.memory.usage metrics revealed that the container's high memory usage (~2 GB) was not a Python memory leak. RSS stayed flat at ~300 MB while usage (which includes OS page cache) grew to 2 GB. The ~1.7 GB gap is kernel page cache from SQLite file I/O in requests_cache. The new env var forces in-memory SQLite to verify this theory and potentially eliminate the page cache growth.
  • Reverted ddtrace from >=3,<4 back to >=2.16,<3: Testing with ddtrace v3.19.6 showed that heap profiling worked but reported only ~70 MB for a connector using ~1 GB of container memory.
  • Pinned ddtrace from >=3,<5 to >=3,<4: ddtrace v4.5.1 heap profiler reported <3 MB due to API changes in _memalloc.heap().
  • Upgraded ddtrace from >=2.16,<3 to >=3,<5: v2.x profiling stack collector references _PyThread_CurrentExceptions, removed in CPython 3.13.

Review & Testing Checklist for Human

  • expire_after=3600 impact on parent-child stream syncs: This TTL applies globally to ALL connectors using the cache, not just Twilio. If a parent stream's cached responses are consumed by child streams and processing takes >1 hour, the parent data will be evicted mid-sync and need to be re-fetched. Verify that no critical connectors have parent streams whose data is accessed by child streams over a span exceeding 1 hour.
  • AIRBYTE_USE_IN_MEMORY_CACHE impact on process memory: Forcing in-memory SQLite means cached HTTP responses are held in process heap instead of on disk. For connectors with many cached parent stream requests, this could increase Python process RSS. Verify that the trade-off (lower working_set_bytes but potentially higher RSS) is acceptable.
  • Shared in-memory SQLite concurrency: The in-memory path uses file::memory:?cache=shared, which shares the cache across connections in the same process. Verify that this does not cause "database table is locked" errors under concurrent access (the file-based path uses fast_save=True and WAL mode to mitigate this).
  • Python 3.13 compatibility of _memalloc in ddtrace v2: ddtrace v2 does not officially support Python 3.13. The stack collector is disabled via DD_PROFILING_STACK_ENABLED=false in the companion PR, but the _memalloc heap profiling C extension has not been verified on Python 3.13.
  • End-to-end verification: Deploy a connector built with these changes and the companion airbyte PR. Confirm: (1) Profiling data appears in Datadog (or fails gracefully if v2 doesn't work on Python 3.13), (2) kubernetes.memory.usage stays significantly lower with AIRBYTE_USE_IN_MEMORY_CACHE=true compared to file-based cache, (3) kubernetes.memory.rss does not increase substantially (i.e., cached data in-memory doesn't push RSS beyond the previous ~300 MB baseline), (4) Syncs that rely on parent stream caching complete successfully even with the 1-hour TTL.
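
To make the shared in-memory SQLite item in the checklist above more concrete, here is a minimal standard-library sketch of what `file::memory:?cache=shared` does: two connections opened in the same process see the same in-memory database. The table name and row contents are made up for illustration, and the sketch does not exercise concurrent writers, which is the case the checklist asks reviewers to verify.

```python
import sqlite3

# URI taken from the checklist; uri=True is required so sqlite3 parses it as a URI.
URI = "file::memory:?cache=shared"

writer = sqlite3.connect(URI, uri=True)
reader = sqlite3.connect(URI, uri=True)

# Hypothetical table standing in for cached HTTP responses.
writer.execute("CREATE TABLE responses (key TEXT PRIMARY KEY, body TEXT)")
writer.execute("INSERT INTO responses VALUES (?, ?)", ("GET /accounts", "{}"))
writer.commit()  # release the write lock before the other connection reads

# The second connection reads the row written by the first one.
rows = reader.execute("SELECT key, body FROM responses").fetchall()

reader.close()
writer.close()  # the in-memory database is discarded once the last connection closes
```

Because the data lives in the process rather than in a file, it does not pass through the OS page cache, but it does count toward the process's own memory.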

Notes

  • Root cause confirmed via sandbox testing: With AIRBYTE_USE_IN_MEMORY_CACHE=true, kubernetes.memory.usage now matches kubernetes.memory.rss (~300 MB), confirming that the previous ~1.7 GB gap was OS page cache from SQLite file I/O, not a Python memory leak.
  • 1-hour TTL rationale: Prevents unbounded cache growth during long syncs. Most parent-child stream patterns complete within 1 hour per parent stream slice, so this should not impact normal operation.
  • The pyproject.toml manifest-server extra still declares ddtrace = { version = "^3", optional = true }; this Dockerfile explicitly overrides that to v2 for testing.
  • Requested by: gl_anatolii.yatsuk
  • Devin Session

…g profiling support

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@github-actions

github-actions bot commented Mar 5, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1772724202-add-ddtrace-profiling-support#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1772724202-add-ddtrace-profiling-support

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

@github-actions

github-actions bot commented Mar 5, 2026

PyTest Results (Fast)

3,918 tests (+49): 3,900 ✅ passed (+43), 18 💤 skipped (+6), 0 ❌ failed (±0)
1 suite (±0), 1 file (±0), duration 6m 30s ⏱️ (-21s)

Results for commit 6eee546. ± Comparison against base commit 7f41401.

This pull request skips 6 tests.
unit_tests.sources.streams.http.test_http ‑ test_that_response_was_cached
unit_tests.sources.streams.http.test_http ‑ test_using_cache
unit_tests.sources.streams.http.test_http_client ‑ test_request_session_returns_valid_session[False-LimiterSession]
unit_tests.sources.streams.http.test_http_client ‑ test_request_session_returns_valid_session[True-CachedLimiterSession]
unit_tests.sources.streams.http.test_http_client ‑ test_that_response_was_cached
unit_tests.sources.streams.test_call_rate.TestHttpStreamIntegration ‑ test_with_cache

♻️ This comment has been updated with latest results.

@tolik0
Contributor

Anatolii Yatsuk (tolik0) commented Mar 5, 2026

/prerelease

Prerelease Job Info

This job triggers the publish workflow with default arguments to create a prerelease.

Prerelease job started... Check job output.

✅ Prerelease workflow triggered successfully.

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/22725667788

ddtrace v2.x profiling stack collector references _PyThread_CurrentExceptions
which was removed in CPython 3.13. This causes profiling to silently fail
(tracing works but profiles are never sent to Datadog).

Upgrading to ddtrace>=3,<5 fixes Python 3.13 profiling support.

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@tolik0
Contributor

Anatolii Yatsuk (tolik0) commented Mar 5, 2026

/prerelease

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/22731250562

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@tolik0
Contributor

Anatolii Yatsuk (tolik0) commented Mar 6, 2026

/prerelease

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/22772097503

devin-ai-integration bot and others added 2 commits March 10, 2026 15:42
…results

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
…gmentation

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@tolik0
Contributor

Anatolii Yatsuk (tolik0) commented Mar 10, 2026

/prerelease

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/22926637020

devin-ai-integration bot and others added 2 commits March 11, 2026 14:20
…emory

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
…te cache

When set to 'true', forces requests_cache to use in-memory SQLite instead of
file-based SQLite. This avoids OS page cache growth from file I/O, which
Kubernetes counts as container memory (container_memory_working_set_bytes).

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@devin-ai-integration devin-ai-integration bot changed the title from "feat: install ddtrace in source-declarative-manifest image for Datadog profiling support" to "feat: add ddtrace and in-memory cache option to source-declarative-manifest image" on Mar 11, 2026
@tolik0
Contributor

Anatolii Yatsuk (tolik0) commented Mar 11, 2026

/prerelease

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/22958287696

@github-actions

github-actions bot commented Mar 11, 2026

PyTest Results (Full)

3,921 tests: 3,903 ✅ passed, 18 💤 skipped, 0 ❌ failed
1 suite, 1 file, duration 10m 31s ⏱️

Results for commit 6eee546.

♻️ This comment has been updated with latest results.

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@devin-ai-integration devin-ai-integration bot changed the title from "feat: add ddtrace and in-memory cache option to source-declarative-manifest image" to "feat: add ddtrace, in-memory cache, and cache TTL to source-declarative-manifest image" on Mar 11, 2026
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@tolik0
Contributor

Anatolii Yatsuk (tolik0) commented Mar 11, 2026

/prerelease

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/22961776456

requests_cache uses lazy expiration: expired entries are only removed when
re-accessed, not automatically deleted. For connectors making thousands of
unique API calls (paginated endpoints), expired entries accumulate in the
SQLite database indefinitely, causing unbounded memory growth.

This adds a _purge_expired_cache_entries() method that is called every 100
requests to actively delete expired entries and reclaim memory. Combined
with the existing expire_after=3600 TTL, this ensures the cache stays
bounded to approximately 1 hour of data instead of growing indefinitely.

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
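
The difference between lazy expiration and the periodic purge described in the commit message above can be sketched with a toy cache. This is a standard-library illustration only: the class `PurgingTTLCache`, its parameters, and the choice to count requests on writes are assumptions, and it is not the CDK's `_purge_expired_cache_entries()` or the requests_cache implementation.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PurgingTTLCache:
    """Toy TTL cache: lazy expiration on read plus a periodic active purge."""

    def __init__(
        self,
        ttl_seconds: float = 3600,
        purge_every: int = 100,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._purge_every = purge_every
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self._requests_seen = 0

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazy expiration: a stale entry is dropped only when it is read again.
            del self._entries[key]
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._requests_seen += 1
        # Active purge: every N requests, delete all expired entries.
        if self._requests_seen % self._purge_every == 0:
            self.purge_expired()

    def purge_expired(self) -> int:
        now = self._clock()
        stale = [k for k, (expires_at, _) in self._entries.items() if now >= expires_at]
        for k in stale:
            del self._entries[k]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)


# Demo with a fake clock so the behavior is deterministic.
now = [0.0]
cache = PurgingTTLCache(ttl_seconds=10, purge_every=3, clock=lambda: now[0])
cache.put("page-1", "a")
cache.put("page-2", "b")
now[0] = 11.0              # both entries are past their TTL but never re-read
size_before = len(cache)   # stale entries still occupy the cache
cache.put("page-3", "c")   # third request triggers the purge
size_after = len(cache)
```

In the demo, the two unique keys are never requested again, so lazy expiration alone would keep them forever; the purge on the third request is what removes them.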
@tolik0
Contributor

Anatolii Yatsuk (tolik0) commented Mar 11, 2026

/prerelease

View the publish workflow run: https://github.com/airbytehq/airbyte-python-cdk/actions/runs/22973317414

Reduces expire_after from 3600 to 600 seconds. The 1-hour TTL was still
allowing too much data to accumulate in the in-memory SQLite cache,
causing container OOM at 2 GB. With 10-minute TTL + periodic purging,
the cache should stay much smaller.

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@devin-ai-integration
Contributor Author

/prerelease

…sponse caching

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
devin-ai-integration bot and others added 3 commits March 12, 2026 15:08
…ching for testing

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
…bugging

Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
@devin-ai-integration
Contributor Author

/prerelease
