
Add SLO checks with a SQLAlchemy read/write workload #116

Open
vgvoleg wants to merge 6 commits into main from add-slo-checks



vgvoleg (Member) commented Jun 17, 2026

What

Adds SLO (Service Level Objective) testing on top of ydb-platform/ydb-slo-action, following the ydb-python-sdk SLO example but expressed entirely in terms of SQLAlchemy.

Workload (tests/slo/)

A parallel read/write load generator driving the ydb_sqlalchemy dialect:

  • read: SELECT ... WHERE object_id = :id for a random id;
  • write: UPSERT INTO ... VALUES (...) for a fresh id;
  • dedicated reader/writer thread pools plus a metrics thread;
  • every operation is idempotent and wrapped in a retry loop, so transient errors injected by the action's chaos layer show up as added latency rather than as availability drops.
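A minimal sketch of such a retry loop, assuming a hypothetical `TransientError` marks retryable failures and that the operation is safe to repeat (a keyed read, or an UPSERT of a full row). Latency is measured across all attempts, so a retried call is recorded as a slower operation rather than a failed one. The backoff parameters are illustrative assumptions, not the workload's settings.

```python
import random
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures, e.g. ones injected by chaos."""


def run_with_retries(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[T, int, float]:
    """Run an idempotent op, retrying TransientError with jittered exponential backoff.

    Returns (result, attempts, elapsed_seconds); elapsed covers every attempt.
    Any other exception propagates immediately without a retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    start = time.perf_counter()
    for attempt in range(1, max_attempts + 1):
        try:
            return op(), attempt, time.perf_counter() - start
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: now it is an availability failure
            # Full jitter: sleep a random time up to base_delay * 2^(attempt-1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the loop testable without real delays; in a load generator the default `time.sleep` would be used.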

Two modes, selected by WORKLOAD_NAME / --mode:

| mode | read | write |
|------|------|-------|
| core | Connection.execute(select()) | Connection.execute(upsert()) |
| orm | Session.get(KeyValueRow, id) | Session.execute(upsert()) + commit |
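A minimal sketch of how the mode might be resolved, assuming an explicit `--mode` takes precedence over the `WORKLOAD_NAME` environment variable and that `core` is the fallback when neither is set. The precedence, the default and the `resolve_mode` helper are all assumptions for illustration.

```python
import os
from typing import Mapping, Optional

VALID_MODES = ("core", "orm")


def resolve_mode(cli_mode: Optional[str], env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: --mode first, then WORKLOAD_NAME, then 'core'."""
    mode = (cli_mode or env.get("WORKLOAD_NAME") or "core").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    return mode
```

Failing fast on an unknown name avoids silently running the wrong workload when the environment variable is misspelled.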

Metrics are emitted via OTLP with names matching the action's default metrics.yaml (sdk_operations_total, sdk_operation_latency_p{50,95,99}_seconds, sdk_retry_attempts_total, ...).
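The latency gauges listed above summarize per-operation samples. A minimal sketch of one way to compute them, using the nearest-rank percentile method; the estimator is an assumption (the workload may instead rely on an OTLP histogram), and only the metric names come from the description.

```python
import math
from typing import Dict, Sequence


def nearest_rank(samples: Sequence[float], q: int) -> float:
    """Return the q-th percentile (0 < q <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer numerator keeps the rank exact for integer q.
    rank = math.ceil(q * len(ordered) / 100)  # 1-based
    return ordered[max(rank, 1) - 1]


def latency_gauges(samples: Sequence[float]) -> Dict[str, float]:
    """Map latency samples (seconds) to sdk_operation_latency_p{50,95,99}_seconds."""
    return {
        f"sdk_operation_latency_p{q}_seconds": nearest_rank(samples, q)
        for q in (50, 95, 99)
    }
```

Nearest-rank always returns an observed sample, which keeps the reported p99 from being an interpolated value that no request actually experienced.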

Workflow (.github/workflows/slo.yml)

Runs on PRs labelled SLO:

  1. builds current (PR) and baseline (merge-base) workload images;
  2. runs ydb-slo-action/init@v2 for the core and orm workloads in parallel;
  3. publishes a current-vs-baseline comparison with ydb-slo-action/report@v2 and gates the PR on regressions.
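The gating step in 3 compares the PR run against the merge-base run. The real thresholds come from the action's own configuration; the sketch below only illustrates the shape of such a check, with a hypothetical `RunSummary` and assumed limits (p99 may grow by at most 20%, success ratio may drop by at most 0.1 percentage points).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunSummary:
    """Hypothetical per-workload summary extracted from one run's metrics."""
    p99_latency_s: float
    success_ratio: float  # successful operations / total operations


def find_regressions(
    current: RunSummary,
    baseline: RunSummary,
    max_latency_growth: float = 0.20,  # assumed limit, not the action's
    max_success_drop: float = 0.001,   # assumed limit, not the action's
) -> List[str]:
    """Return a list of regressions; an empty list means the gate passes."""
    problems: List[str] = []
    if current.p99_latency_s > baseline.p99_latency_s * (1 + max_latency_growth):
        problems.append(
            f"p99 {current.p99_latency_s:.4f}s exceeds baseline "
            f"{baseline.p99_latency_s:.4f}s by more than {max_latency_growth:.0%}"
        )
    if baseline.success_ratio - current.success_ratio > max_success_drop:
        problems.append(
            f"success ratio fell from {baseline.success_ratio:.4f} "
            f"to {current.success_ratio:.4f}"
        )
    return problems
```

Using relative growth for latency and an absolute drop for availability keeps both checks meaningful whether the baseline is fast or slow.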

The cluster is trimmed to fit a GitHub-hosted runner via disable_compose_profiles: extra-nodes (chaos and telemetry stay enabled).

How to run

Label this PR with SLO to trigger the checks. Locally:

python ./tests/slo/src create grpc://localhost:2136 /local --mode core
python ./tests/slo/src run    grpc://localhost:2136 /local --mode core --time 60
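The two commands suggest a `create`/`run` subcommand CLI with a positional endpoint and database. A minimal argparse sketch that accepts exactly those command lines; the parser layout, the `--mode` choices restriction and the 60 s default for `--time` are assumptions rather than the workload's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for: <create|run> ENDPOINT DATABASE [--mode] [--time]."""
    parser = argparse.ArgumentParser(prog="slo")
    sub = parser.add_subparsers(dest="command", required=True)
    create = sub.add_parser("create")
    run = sub.add_parser("run")
    for cmd in (create, run):
        cmd.add_argument("endpoint", help="YDB endpoint, e.g. grpc://localhost:2136")
        cmd.add_argument("db", help="database path, e.g. /local")
        cmd.add_argument("--mode", choices=("core", "orm"), default=None)
    # Only `run` takes a duration; the default value is an assumption.
    run.add_argument("--time", type=int, default=60, help="workload duration, seconds")
    return parser
```

Leaving `--mode` unset by default lets the caller fall back to another source, such as an environment variable, when the flag is omitted.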

Notes

  • The in-run report job needs pull-requests: write, which PRs from branches in this repository get. PRs from forks receive a read-only token, so for them the report can be moved to a separate workflow_run-triggered workflow.
  • The workload source under tests/slo/ is outside the existing test/ lint scope, so it doesn't affect the style/tests workflows.

Introduce a parallel read/write SLO workload built on the ydb_sqlalchemy dialect (SQLAlchemy Core and ORM modes) and wire it into ydb-slo-action via a label-gated GitHub workflow.

- tests/slo: workload runner, Dockerfile, entrypoint, requirements, README
- .github/workflows/slo.yml: build current+baseline images, run init@v2 and publish report@v2 on PRs labelled "SLO"
vgvoleg added the SLO (Run SLO checks) label Jun 17, 2026
vgvoleg added 2 commits June 17, 2026 12:48
The dialect integration tests now live in tests/integration/ alongside tests/slo/, so the repo no longer has both a test/ and a tests/ directory. tox.ini (lint + dialect pytest paths) and setup.cfg (profile_file) are updated accordingly.
github-actions commented

🌋 SLO Test Results

🟢 2 workload(s) tested — All thresholds passed

Commit: 86a7458 · View run

| Workload | Thresholds | Duration | Report |
|----------|------------|----------|--------|
| orm | 🟢 OK | 2m 5s | 📄 Report |
| core | 🟢 OK | 2m 5s | 📄 Report |

Generated by ydb-slo-action

vgvoleg added 3 commits June 17, 2026 14:44
The dialect runs in AUTOCOMMIT, so each single-statement read/write already goes through the YDB SDK's retry_operation_sync inside ydb-dbapi. The workload now performs one attempt per operation and records any surfaced exception as a real SLO failure, instead of a broad app-level retry loop that masked non-retryable errors. Removes the now-unused timeout/max-retries flags.
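A minimal sketch of the single-attempt pattern this commit describes: the operation runs once (any retries happen inside the driver, as noted above), its latency is always recorded, and any exception that surfaces counts as a failed operation. `Stats` and `run_once` are hypothetical names; `Stats` stands in for the OTLP exporter.

```python
import threading
import time
from collections import Counter
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class Stats:
    """Hypothetical thread-safe collector standing in for the OTLP exporter."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.outcomes: Counter = Counter()  # keys like ("read", "success")
        self.latencies: List[float] = []

    def record(self, op_type: str, ok: bool, latency: float) -> None:
        with self._lock:
            self.outcomes[(op_type, "success" if ok else "error")] += 1
            self.latencies.append(latency)


def run_once(op_type: str, op: Callable[[], T], stats: Stats) -> Optional[T]:
    """Execute op exactly once and record its outcome and latency."""
    start = time.perf_counter()
    try:
        result = op()
    except Exception:
        # No app-level retry: an error that escapes the driver is a real SLO failure.
        stats.record(op_type, ok=False, latency=time.perf_counter() - start)
        return None
    stats.record(op_type, ok=True, latency=time.perf_counter() - start)
    return result
```

Catching broadly here is deliberate: the point is to count every surfaced error, not to decide which ones are retryable.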
Align workload_duration and read/write RPS with ydb-python-sdk's tests/slo workflow. extra-nodes stays disabled to fit a GitHub-hosted runner.
…cluster, 600s, 1000/100 rps)

Run the workload job on the large-runner-sqlalchemy self-hosted runner with the full YDB cluster (all compose profiles), and align workload_duration and read/write RPS with ydb-python-sdk's tests/slo workflow. The report job stays on ubuntu-latest.
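Holding a fixed read rate and a separate write rate requires pacing the worker pools. A minimal sketch of one approach, assuming a hypothetical shared `Pacer` that hands out evenly spaced time slots; with one pacer per pool, e.g. `Pacer(1000)` and `Pacer(100)` if the 1000/100 figure in the commit title is read/write, the threads together issue roughly that many operations per second.

```python
import threading
import time
from typing import Callable


class Pacer:
    """Hypothetical rate limiter shared by worker threads: one slot per 1/rps seconds."""

    def __init__(
        self,
        rps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self._interval = 1.0 / rps
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next = clock()

    def wait(self) -> float:
        """Block until the caller's slot and return the scheduled slot time."""
        with self._lock:
            now = self._clock()
            # If we fell behind, restart from now instead of bursting to catch up.
            slot = max(self._next, now)
            self._next = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so other threads can reserve slots
        return slot
```

Not banking idle time is a design choice: it avoids a burst after a stall, at the cost of slightly undershooting the target rate when the cluster is slow.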