feat: Introduce logging context to Tangle #66

morgan-wowk · 2026-01-12T21:20:03Z

Issue
#67

Changes:

Adds trace context helpers
Adds trace middleware to apply trace context to API requests
- This helps logging formatters pull trace ids out of context to set them in logs
- Sets the trace id on the response for client consumption
Wraps API requests with trace context using middleware
Passes trace id to pipeline executions using extras parameter
Wraps pipeline execution with trace context (using trace id from extras parameter, or creating a new one if None)

Test pipeline notes:

Requires pytest, starlette, httpx (e.g. pip3 install ...)


Ark-kun · 2026-01-13T00:44:51Z

Thank you for this PR.

I'm reviewing the changes and I think we might want to design this slightly differently.
I think that trace IDs are mostly relevant to API Server and API requests.
Orchestrator works autonomously, it's not triggered by API requests, so it does not have them or need them. Instead, the ExecutionNode.id and ContainerExecution.id already serve the same role.

On the other hand, API requests might benefit from trace IDs so that all logging messages that are generated when processing a single API call can be filtered and grouped together.

morgan-wowk · 2026-01-13T02:43:31Z

> Thank you for this PR.
>
> I'm reviewing the changes and I think we might want to design this slightly differently. I think that trace IDs are mostly relevant to API Server and API requests. Orchestrator works autonomously, it's not triggered by API requests, so it does not have them or need them. Instead, the ExecutionNode.id and ContainerExecution.id already serve the same role.
>
> On the other hand, API requests might benefit from trace IDs so that all logging messages that are generated when processing a single API call can be filtered and grouped together.

This sounds good to me.

I will make some adjustments. Thanks!

morgan-wowk · 2026-01-19T23:12:40Z

> Thank you for this PR.
>
> I'm reviewing the changes and I think we might want to design this slightly differently. I think that trace IDs are mostly relevant to API Server and API requests. Orchestrator works autonomously, it's not triggered by API requests, so it does not have them or need them. Instead, the ExecutionNode.id and ContainerExecution.id already serve the same role.
>
> On the other hand, API requests might benefit from trace IDs so that all logging messages that are generated when processing a single API call can be filtered and grouped together.

The work has been re-designed. Here is a summary:

Logging Context Improvements

Summary

Enhanced the Tangle orchestrator logging to automatically include execution context (execution_id, container_execution_id) in all log messages, making it easier to trace and filter logs for specific executions.

Changes Made

1. Added Logging Context to Orchestrator (`orchestrator_sql.py`)

Wrapped execution processing with logging_context.logging_context() to automatically attach execution IDs to all logs within that scope.

For Queued Executions:

# Set execution context for logging
with logging_context.logging_context(
    execution_id=queued_execution.id
):
    _logger.info("Before processing queued execution")
    # ... process execution ...
    _logger.info("After processing queued execution")

For Running Container Executions:

with logging_context.logging_context(
    execution_id=execution_id,
    container_execution_id=running_container_execution.id
):
    _logger.info("Before processing running container execution")
    # ... process container execution ...
    _logger.info("After processing running container execution")
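The `logging_context.logging_context()` helper used in the two snippets above is not shown in this summary. A minimal sketch of how such a helper could work, assuming it is built on `contextvars` and that nested blocks merge their fields (both are assumptions, and `_context_metadata` is a hypothetical name):

```python
import contextlib
import contextvars
from typing import Any, Dict, Iterator

# Hypothetical storage: a single ContextVar holding a dict that is never mutated in place.
_context_metadata = contextvars.ContextVar("logging_context_metadata", default={})


def get_all_context_metadata() -> Dict[str, Any]:
    """Return a copy of the metadata visible in the current context."""
    return dict(_context_metadata.get())


@contextlib.contextmanager
def logging_context(**metadata: Any) -> Iterator[None]:
    """Attach metadata for the duration of the block, then restore the previous state."""
    # Assumption: inner blocks extend (and may override) the fields of outer blocks.
    merged = {**_context_metadata.get(), **metadata}
    token = _context_metadata.set(merged)
    try:
        yield
    finally:
        # reset() restores whatever was set before, so nested blocks unwind cleanly.
        _context_metadata.reset(token)
```

Because the state lives in a `ContextVar`, each thread and each asyncio task sees its own values, so concurrent executions would not leak IDs into each other's log lines.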

2. Added Context-Aware Logging Configuration (`start_local.py`)

Created a custom formatter and filter that dynamically includes context fields in logs only when they're set.

class LoggingContextFilter(logging.Filter):
    """Adds contextual metadata to log records."""
    
    def filter(self, record: logging.LogRecord) -> bool:
        for key, value in get_all_context_metadata().items():
            if value is not None:
                setattr(record, key, value)
        return True


class ContextAwareFormatter(logging.Formatter):
    """Formatter that dynamically includes context fields only when they're set."""
    
    def format(self, record: logging.LogRecord) -> str:
        base_format = "%(asctime)s [%(levelname)s] %(name)s"
        
        # Collect context fields that are present
        context_parts = []
        context_metadata = get_all_context_metadata()
        for key, value in context_metadata.items():
            if value is not None and hasattr(record, key):
                context_parts.append(f"{key}={value}")
        
        # Add context to format if any exists
        if context_parts:
            base_format += " [" + " ".join(context_parts) + "]"
        
        base_format += ": %(message)s"
        
        formatter = logging.Formatter(base_format)
        return formatter.format(record)
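A short sketch of how a filter and formatter like these could be wired to a handler. Everything here is illustrative: `_metadata` and `get_all_context_metadata` are stand-ins for the PR's context storage, the timestamp is omitted, and the formatter is a variant (not the PR's code) that uses one static format string with a per-record `context` attribute instead of building a new `logging.Formatter` for every record.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for the PR's context storage.
_metadata = contextvars.ContextVar("metadata", default={})


def get_all_context_metadata() -> dict:
    return dict(_metadata.get())


class LoggingContextFilter(logging.Filter):
    """Copies the current context fields onto the log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        for key, value in get_all_context_metadata().items():
            if value is not None:
                setattr(record, key, value)
        return True


class ContextAwareFormatter(logging.Formatter):
    """Variant: static format string plus a computed `context` attribute."""

    def __init__(self) -> None:
        super().__init__("[%(levelname)s] %(name)s%(context)s: %(message)s")

    def format(self, record: logging.LogRecord) -> str:
        parts = [
            f"{key}={getattr(record, key)}"
            for key in get_all_context_metadata()
            if hasattr(record, key)
        ]
        record.context = f" [{' '.join(parts)}]" if parts else ""
        return super().format(record)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(LoggingContextFilter())  # attach to the handler, not to a logger
handler.setFormatter(ContextAwareFormatter())

logger = logging.getLogger("demo.orchestrator")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("no context")
token = _metadata.set({"execution_id": "abc"})
logger.info("with context")
_metadata.reset(token)
```

Attaching the filter to the handler matters: a filter added to a logger only applies to records logged directly on that logger, not to records propagated from child loggers.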

3. Cleaned Up Redundant IDs in Log Messages

Removed execution IDs from log messages where they're already present in the logging context, eliminating duplication.

Examples of cleaned up logs:

# Before: ID appears twice (in context and message)
_logger.info(f"Before processing {queued_execution.id=}")

# After: ID only in context
_logger.info("Before processing queued execution")

# Before
_logger.info(f"Container execution {container_execution.id} is now in state {new_status}")

# After
_logger.info(f"Container execution is now in state {new_status}")

# Before
_logger.info(f"Terminating container execution {container_execution.id}.")

# After
_logger.info("Terminating container execution.")

Log Output Examples

Without Context (non-orchestrator logs)

2026-01-19 14:40:52,611 [INFO] start_local: Starting the orchestrator
2026-01-19 14:40:52,620 [INFO] uvicorn.error: Application startup complete.

With Execution Context Only

2026-01-19 14:30:13,650 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb80ea96aba]: Before processing queued execution
2026-01-19 14:30:13,674 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb80ea96aba]: Execution will reuse the old_execution.id='019b3428a0f1e29c82d4'
2026-01-19 14:30:13,681 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb80ea96aba]: After processing queued execution

With Both Execution and Container Execution Context

2026-01-19 14:30:14,687 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb888cc61d4 container_execution_id=019b342ab120726fdee2]: Before processing running container execution
2026-01-19 14:30:14,701 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb888cc61d4 container_execution_id=019b342ab120726fdee2]: Container execution is now in state RUNNING (was PENDING)
2026-01-19 14:30:14,713 [INFO] cloud_pipelines_backend.orchestrator_sql [execution_id=019bd8612bb888cc61d4 container_execution_id=019b342ab120726fdee2]: After processing running container execution

Benefits

Easy Log Filtering: Filter logs by execution_id or container_execution_id in log aggregation tools
Better Traceability: Follow an execution through its entire lifecycle
No Duplication: IDs appear once in context, not repeated in every message
Dynamic Context: Formatter automatically includes any context fields without code changes
Cleaner Messages: Log messages are more readable without ID clutter
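To illustrate the filtering benefit, here is a rough sketch of pulling the context fields back out of lines in the format shown under "Log Output Examples". The helper names are hypothetical, and the regular expression assumes that context values contain no spaces or closing brackets.

```python
import re
from typing import Dict, Iterable, List

# Matches the optional "[key=value ...]" group between the logger name and ": message".
_LINE_RE = re.compile(
    r"^(?P<time>\S+ \S+) \[(?P<level>\w+)\] (?P<logger>\S+)"
    r"(?: \[(?P<context>[^\]]*)\])?: (?P<message>.*)$"
)


def parse_context(line: str) -> Dict[str, str]:
    """Return the key=value context fields of a log line, or {} if there are none."""
    match = _LINE_RE.match(line)
    if match is None or match.group("context") is None:
        return {}
    pairs = match.group("context").split()
    return dict(pair.split("=", 1) for pair in pairs if "=" in pair)


def filter_by_execution(lines: Iterable[str], execution_id: str) -> List[str]:
    """Keep only the lines whose context carries the given execution_id."""
    return [
        line for line in lines
        if parse_context(line).get("execution_id") == execution_id
    ]
```

In a log aggregation tool the same idea would normally be expressed as a query on the extracted fields rather than as Python code.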

Extensibility

The context system is fully extensible. You can add any arbitrary metadata:

with logging_context.logging_context(
    execution_id=execution.id,
    container_execution_id=container.id,
    pipeline_run_id=pipeline.id,  # Would work if added
    user_id=user.id,               # Would work if added
    custom_field="value"           # Any custom field works
):
    _logger.info("Processing with custom context")

All fields automatically appear in logs:

[INFO] module [execution_id=123 container_execution_id=456 pipeline_run_id=789 user_id=admin custom_field=value]: Processing with custom context

Files Modified

tangle/cloud_pipelines_backend/orchestrator_sql.py - Added logging context to execution processing
tangle/start_local.py - Added LoggingContextFilter and ContextAwareFormatter

Backward Compatibility

✅ All changes are backward compatible
✅ Logs without context still work (no context fields shown)
✅ Existing log parsing tools continue to work
✅ All tests pass


Ark-kun

Thank you. This looks pretty good.
I've left some comments and approved it.

Ark-kun · 2026-01-27T01:19:40Z

cloud_pipelines_backend/orchestrator_sql.py

                    )
-                session.commit()
-            finally:
-                duration_ms = int((time.monotonic_ns() - start_timestamp) / 1_000_000)


Hmm. Are you sure we no longer need to log processing times?

I restored duration. It should not have been removed yet. Maybe in the future, when we have a replacement solution.

Thanks for pointing this out
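For reference, the duration pattern from the removed lines can be sketched as a small context manager. `log_duration` is a hypothetical helper, not something in the PR; it only shows why the measurement sits in a `finally` block.

```python
import contextlib
import logging
import time
from typing import Iterator


@contextlib.contextmanager
def log_duration(logger: logging.Logger, label: str) -> Iterator[None]:
    """Log how long the wrapped block took, even if it raised."""
    start_timestamp = time.monotonic_ns()
    try:
        yield
    finally:
        # monotonic_ns() is unaffected by wall-clock adjustments.
        duration_ms = (time.monotonic_ns() - start_timestamp) // 1_000_000
        logger.info("%s took %d ms", label, duration_ms)
```

Used as `with log_duration(_logger, "queued execution"): ...`, the duration is logged on both the success path and the error path.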

Ark-kun · 2026-01-27T01:23:26Z

cloud_pipelines_backend/orchestrator_sql.py

+
+            # Set execution context for logging (includes container_execution_id)
+            # Get first execution_node_id for context (there may be multiple nodes using same container)
+            execution_nodes = running_container_execution.execution_nodes


Note that you're slightly changing the logic here by making SQL queries outside the try/except context. If an exception occurs, it will go up the stack.

You might want to swap try and with contexts.

Also, AFAIK, the code now always queries the ExecutionNodes DB table, whereas before it would only do that when needed. This is likely not affecting perf too much, but I'd like you to understand the changes you're making (basically, you add an extra DB query to extract more information from the DB for the purpose of logging it).
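The ordering concern can be shown with a small sketch. All names below are hypothetical stand-ins (the real code queries the database and uses the PR's context helper); the point is only where a failing query surfaces.

```python
import contextlib
import logging

_logger = logging.getLogger("demo")


@contextlib.contextmanager
def logging_context(**metadata):
    # Hypothetical stand-in for the PR's helper; its body is irrelevant here.
    yield


def load_execution_node_ids():
    # Stand-in for the extra DB query; it fails to show where the error ends up.
    raise RuntimeError("database unavailable")


def process_query_outside_try() -> str:
    node_ids = load_execution_node_ids()  # raises before the try block is entered
    with logging_context(execution_node_ids=node_ids):
        try:
            return "processed"
        except Exception:
            _logger.exception("Error processing container execution")
            return "handled"


def process_query_inside_try() -> str:
    try:
        node_ids = load_execution_node_ids()  # now covered by the handler below
        with logging_context(execution_node_ids=node_ids):
            return "processed"
    except Exception:
        _logger.exception("Error processing container execution")
        return "handled"
```

In the first variant the exception goes up the stack to the caller; in the second it is logged and handled locally. Which behaviour is right depends on whether the caller should see the failure.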

I will review this area again. My editor was not behaving very well while I developed this portion: it kept telling me the code was wrong (showing warnings), and I remember it was difficult for me to satisfy. Maybe this led to the difference in behaviour.

> Also, AFAIK, the code now always queries the ExecutionNodes DB table, whereas before it would only do that when needed. This is likely not affecting perf too much, but I'd like you to understand the changes you're making (basically, you add an extra DB query to extract more information from the DB for the purpose of logging it).

Thank you. I do understand this.

I'm curious, would you prefer to take any other approach such as passing metadata around differently to avoid a DB call? Open to any better ideas.

Thanks for pairing. I have implemented the changes we discussed:

Singular with block
with contextual_logging.logging_context(
    container_execution_id=running_container_execution.id,
    execution_node_ids=execution_node_ids,
):

Moving the execution_node_ids select we wrote above / outside the try catch

Also:

Restored duration tracking

**Changes:**

* Adds logging context helpers
* Adds request middleware to generate a unique request id and set it in the logging context around API requests
* Sets the x-tangle-request-id on the response for client consumption
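A minimal sketch of what such request middleware could look like, written as plain ASGI so it needs only the standard library. This is not the PR's implementation: the class name, the `request_id_var` variable, and the use of `uuid4` are assumptions; only the `x-tangle-request-id` header name comes from the description above.

```python
import asyncio
import contextvars
import uuid

# Hypothetical context variable that a logging filter could read.
request_id_var = contextvars.ContextVar("request_id", default=None)
REQUEST_ID_HEADER = b"x-tangle-request-id"


class RequestIdMiddleware:
    """Generates a request id, exposes it via a ContextVar, and echoes it in a header."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex  # assumption: any unique id scheme would do
        token = request_id_var.set(request_id)

        async def send_with_request_id(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((REQUEST_ID_HEADER, request_id.encode("ascii")))
                message = {**message, "headers": headers}
            await send(message)

        try:
            await self.app(scope, receive, send_with_request_id)
        finally:
            request_id_var.reset(token)


async def _demo():
    seen, sent = [], []

    async def app(scope, receive, send):
        seen.append(request_id_var.get())  # what a log filter would see
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await RequestIdMiddleware(app)({"type": "http"}, receive, send)
    return seen, sent


seen, sent = asyncio.run(_demo())
```

The id seen inside the handler and the id in the response header are the same value, which is what lets a client quote the header when asking for the matching server logs.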

maxy-shpfy requested a review from Ark-kun January 12, 2026 22:55

morgan-wowk force-pushed the trace-id-support branch from 45fca00 to c08c8f2 Compare January 12, 2026 23:11

morgan-wowk commented Jan 12, 2026

cloud_pipelines_backend/instrumentation/trace_middleware.py (outdated, resolved)

Ark-kun reviewed Jan 13, 2026

cloud_pipelines_backend/orchestrator_sql.py (outdated, resolved)

Ark-kun self-assigned this Jan 13, 2026

morgan-wowk force-pushed the trace-id-support branch from c08c8f2 to e824f64 Compare January 19, 2026 20:33

morgan-wowk changed the title ~~feat: Introduce trace ids to Tangle~~ feat: Introduce logging context to Tangle Jan 19, 2026

morgan-wowk force-pushed the trace-id-support branch 8 times, most recently from 2bfcc23 to 4e7b1a3 Compare January 19, 2026 23:01

Ark-kun reviewed Jan 20, 2026

cloud_pipelines_backend/orchestrator_sql.py (resolved)

morgan-wowk commented Jan 20, 2026

start_local.py (outdated, resolved)