
Compute worker issues #1205

@Didayolo

Description


Hopefully this resolves several points: #2223

Containers not removed

  • 11/02/2026: submission containers staying up forever
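A sketch of one way to guarantee cleanup, assuming the worker drives Docker through the docker-py SDK (the real compute_worker may call Docker differently): wrap each submission run in a context manager that removes the container in a `finally` block, so a crash in the worker path can no longer leave containers up forever.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_container(client, image, **run_kwargs):
    """Run a detached container and guarantee its removal on exit.

    `client` is assumed to be a docker-py DockerClient; the pattern
    works with any client exposing containers.run() / container.remove().
    """
    container = client.containers.run(image, detach=True, **run_kwargs)
    try:
        yield container
    finally:
        # force=True stops a still-running container before removing it,
        # so nothing stays up forever even if the block above raised.
        container.remove(force=True)
```

Alternatively, `containers.run(..., auto_remove=True)` delegates the cleanup to the Docker daemon, but the explicit `finally` keeps the removal visible in worker logs.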

Wrong log when storage is full

When `docker pull` fails because storage is full, there are no clear logs, and the submission then gets stuck in the "Running" state.
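A minimal sketch of clearer failure logging, assuming the worker can inspect the pull's stderr (the function name and messages below are illustrative, not an existing API): recognize the ENOSPC case explicitly instead of logging nothing, so the submission can be marked Failed with an actionable reason instead of staying in "Running".

```python
def classify_pull_failure(stderr: str, exit_code: int) -> str:
    """Turn a failed `docker pull` into an actionable log line (sketch)."""
    text = stderr.lower()
    if "no space left on device" in text:
        # The case reported above: full storage, currently silent.
        return "Docker pull failed: no space left on device -- free disk space on the worker."
    if "manifest unknown" in text or "not found" in text:
        return "Docker pull failed: image not found on the registry."
    return f"Docker pull failed (exit code {exit_code}): {stderr.strip()}"
```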

Progress bar

Related: `show_progress()` and the progress bar add to the mess:

  • Make show_progress() more robust (not treating missing keys as errors)

It also sometimes emits many errors like these:

2026-02-28 02:38:37.854 | ERROR    | compute_worker:show_progress:137 - There was an error showing the progress bar
2026-02-28 02:38:37.854 | ERROR    | compute_worker:show_progress:138 - 6
2026-02-28 02:38:37.955 | ERROR    | compute_worker:show_progress:137 - There was an error showing the progress bar
2026-02-28 02:38:37.955 | ERROR    | compute_worker:show_progress:138 - 1
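A defensive sketch of what a more robust progress renderer could look like; the field names below ("current", "total") are assumptions about the payload, not the actual keys used by the worker. Missing keys fall back to a neutral display instead of being treated as errors.

```python
def render_progress(status: dict) -> str:
    """Render a progress line; missing keys are normal, not errors (sketch)."""
    current = status.get("current")
    total = status.get("total")
    if current is None or not total:
        # An incomplete payload is expected early in a run: show a
        # placeholder rather than raising and spamming the error log.
        return "progress: n/a"
    return f"progress: {100 * current / total:.0f}%"
```

Where an exception still has to be caught, `logger.exception(...)` would record the full traceback instead of the bare values ("6", "1") seen in the log above.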

Logs

  • Sometimes no submission logs
  • Add logs at the start of submission container with metadata of the competition and submission
  • Add a clear log in the compute worker container with the competition title when receiving a submission
  • Similarly to other problems reported, sometimes we only have "Time limit exceeded" and no other logs (e.g. Stuck at "Preparing submission... this may take a few moments.." #1994)
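The first two logging items above could start from something like this sketch: a banner emitted as soon as the worker receives a submission. The key names in `run_args` are assumptions about the task payload, not the actual schema.

```python
import logging

logger = logging.getLogger("compute_worker")


def log_submission_banner(run_args: dict) -> str:
    """Log competition/submission metadata when a task arrives (sketch)."""
    line = (
        f"Received submission {run_args.get('submission_id', '?')} "
        f"for competition '{run_args.get('competition_title', '?')}'"
    )
    logger.info(line)
    return line
```

Even when a run later dies with only "Time limit exceeded", this banner would at least tie the worker's logs to a specific competition and submission.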

No space left

How should we manage the disks? Should we limit Docker image sizes?
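One possible starting point, sketched with the standard library only (the path and the 90% threshold are assumptions): check disk usage before pulling, and trigger a prune (`docker image prune`, or `client.images.prune()` with docker-py) when usage crosses the threshold, instead of discovering ENOSPC mid-submission.

```python
import shutil


def disk_usage_fraction(path: str = "/") -> float:
    """Fraction of the disk used at `path`, between 0.0 and 1.0."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_prune(path: str = "/", threshold: float = 0.9) -> bool:
    """True when the worker should prune images before the next pull (sketch).

    The 0.9 threshold is an assumption; it would likely become a setting.
    """
    return disk_usage_fraction(path) >= threshold
```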

Submissions not marked as Failed

Submissions get stuck in "Running" or "Scoring" status.

Related issues:

Example failure during "Preparing":

[2025-09-18 11:25:05,234: ERROR/ForkPoolWorker-2] Task compute_worker_run[fd956bf5-3e2d-4168-ab48-f0896dc80993] raised unexpected: OSError(28, 'No space left on device')
Traceback (most recent call last):
[...]
OSError: [Errno 28] No space left on device
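A sketch of how to guarantee a terminal state, assuming a `set_status` callback to the platform (the name is hypothetical): catch everything around the task, report "Failed" with the reason, then re-raise so Celery still records the original error.

```python
def run_submission(task, set_status):
    """Run a submission task; never leave it in "Running"/"Scoring" (sketch)."""
    try:
        task()
    except Exception as exc:
        # e.g. OSError(28, 'No space left on device') during "Preparing":
        # push a terminal status before letting the error propagate.
        set_status("Failed", reason=str(exc))
        raise
```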

Duplication of submission files

To check

The log level is set as follows in compute_worker.py:

configure_logging(
    os.environ.get("LOG_LEVEL", "INFO"), os.environ.get("SERIALIZED", "false")
)

Generally we want as many logs as possible, so we may want to default to the "DEBUG" log level.



Directory structure problem

Docker pull failing

  • Pull for image: codalab/codalab-legacy:py39 returned a non-zero exit code! Check if the docker image exists on docker hub.

Related issues:

Solution:

Logs at the wrong place

No hostname in server status when status is "Preparing"

https://www.codabench.org/server_status
