
feat(orb-jobs-agent): allow local jobs execution#1022

Open
pophilpo wants to merge 73 commits into main from pophilpo/local-jobs-agent-tests


@pophilpo pophilpo commented Feb 25, 2026

This PR lets us test new orb-jobs-agent jobs easily.

Before this PR, a true e2e test required a separate PR in the backend repo so that fleet-cmdr could send the new job.

This introduces some branching in the code, but it does not affect the logic of the production orb-jobs-agent when it runs as a service.

With these changes, we can build the orb-jobs-agent binary, move it to the orb, and run a single job:

./orb-jobs-agent --run-job "read_file /usr/persistent/versions.json" # refer to specific handler info on the command format

It mimics the whole job execution flow: adding a job to a queue, requesting that job for execution, sending JobExecutionUpdates, and so on.

Here's an example output for the command above:

2026-02-25T13:24:00.370186Z  INFO orb_jobs_agent: Starting jobs agent: Args { config: None, orb_id: None, orb_platform: None, orb_token: None, relay_host: None, relay_namespace: Some("jobs"), target_service_id: Some("job-server"), dbus_addr: "unix:path=/tmp/worldcoin_bus_socket", run_job: Some("read_file /usr/persistent/versions.json") }
2026-02-25T13:24:00.372800Z  INFO zenoh::net::runtime: Using ZID: fcd0c45a5aeb96f71df9921d28ff777
2026-02-25T13:24:00.374799Z  INFO orb_jobs_agent::job_system::client: Successfully requested additional job for parallel execution
2026-02-25T13:24:00.374822Z  INFO orb_jobs_agent::job_system::handler: Successfully requested initial job
2026-02-25T13:24:00.374832Z  INFO orb_jobs_agent::job_system::client: received local JobExecution job_id=local-job job_execution_id=local-job-execution job_document=read_file /usr/persistent/versions.json should_cancel=false
2026-02-25T13:24:00.374845Z  INFO orb_jobs_agent::job_system::handler: Processing job job_execution_id=local-job-execution job_id=local-job
2026-02-25T13:24:00.374904Z  INFO orb_jobs_agent::job_system::handler: executing job job_execution_id="local-job-execution" command="read_file" args="/usr/persistent/versions.json"
2026-02-25T13:24:00.374970Z  INFO handler: orb_jobs_agent::handlers::read_file: Reading file: /usr/persistent/versions.json for job local-job-execution
{"job_id":"local-job","job_execution_id":"local-job-execution","status":3,"std_out":"{\"releases\":{\"slot_a\":\"to-0.0.0-21e98c2-diamond-dev\",\"slot_b\":\"0.0.0-21e98c2-diamond-dev\"},\"slot_a\":{\"jetson\":{\"boot\":\"idk2\",\"sec_mcu\":\"idk2\",\"main_mcu\":\"idk2\",\"system\":\"idk2\"}},\"slot_b\":{\"jetson\":{\"main_mcu\":\"idk2\",\"boot\":\"idk2\",\"sec_mcu\":\"idk2\",\"system\":\"idk2\"}},\"singles\":{\"jetson\":{\"capsule\":\"idk2\"}}}","std_err":""}
2026-02-25T13:24:00.375142Z  INFO orb_jobs_agent::job_system::handler: Job completed job_execution_id=local-job-execution job_id=local-job status=Succeeded
2026-02-25T13:24:00.375163Z  INFO orb_jobs_agent::job_system::client: Successfully requested additional job for parallel execution
2026-02-25T13:24:00.375173Z  INFO orb_jobs_agent::job_system::handler: Requested additional job after job completion job_execution_id=local-job-execution
2026-02-25T13:24:00.375150Z  INFO orb_jobs_agent::job_system::handler: Relay service shutdown detected
2026-02-25T13:24:00.375263Z  INFO zenoh::api::session: close session zid=fcd0c45a5aeb96f71df9921d28ff777
2026-02-25T13:24:00.376877Z  INFO orb_jobs_agent: Shutting down jobs agent completed
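
The log above shows the job document `read_file /usr/persistent/versions.json` being handled as `command="read_file"` with `args="/usr/persistent/versions.json"`. A minimal sketch of such a split, assuming the document is divided at the first whitespace. `ParsedJob` and `parse_job_document` are hypothetical names for illustration, not taken from the orb-jobs-agent source:

```rust
/// A parsed job document: the handler name plus the raw argument string.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub struct ParsedJob<'a> {
    pub command: &'a str,
    pub args: &'a str,
}

/// Splits a job document such as `read_file /usr/persistent/versions.json`
/// at the first whitespace character. Returns `None` for an empty document.
/// Assumption: the real agent may parse job documents differently.
pub fn parse_job_document(doc: &str) -> Option<ParsedJob<'_>> {
    let doc = doc.trim();
    if doc.is_empty() {
        return None;
    }
    match doc.split_once(char::is_whitespace) {
        // Everything after the command is kept verbatim as the argument string.
        Some((command, args)) => Some(ParsedJob {
            command,
            args: args.trim_start(),
        }),
        // A bare command with no arguments.
        None => Some(ParsedJob { command: doc, args: "" }),
    }
}
```

Keeping the arguments as one unparsed string leaves the argument format up to each handler, which matches the "refer to specific handler info on the command format" note on the command above.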

Tested on the orb (both local and remote execution)

  • yes (diamond only)

@pophilpo pophilpo requested a review from vmenge February 25, 2026 13:56
@pophilpo pophilpo requested a review from a team as a code owner February 25, 2026 13:56
@AlexKaravaev

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3eb7fae74d


pophilpo and others added 23 commits February 25, 2026 15:43
The option --progress-frequency is required
## context
i previously implemented serde of Wifi profiles using the same types as
the ones used for the `NetworkManager` abstraction. this smelled bad and
after weeks of feeling the stink of my own code i could take it no
longer
<img
src="https://media0.giphy.com/media/v1.Y2lkPTc5MGI3NjExbXpxcXlic20yZWE4ZjhuaGRodXgxbnJxN29rNXBudzBmZTA3cnZ3NiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/yuVBAJGT3rm5Mn7VZg/giphy.gif"
width="480" height="360">

## changes
- uses `StoredWifiProfile` and `StoredWifiSec` for serialization so we
don't accidentally break deserialization due to changes in
`NetworkManager` abstraction
- adds a backcompat to make sure ciborium can deser `StoredWifiProfile`
from serialized `WifiProfile`s

## todo
- [x] test on an orb
Update the config throttle to publish every 90 seconds

Previously, the orb_config didn't use the file values

adds `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null` so
that scripts do not fail if the host is unknown
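
A hedged sketch of how those two options could be passed when spawning `ssh` from Rust. The function name `ssh_command` and its `host` and `remote_cmd` parameters are placeholders for illustration; the scripts touched by this commit are not shown here and may build the command differently:

```rust
use std::process::Command;

/// Builds (but does not run) an `ssh` invocation that neither checks nor
/// records the remote host key. `host` and `remote_cmd` are placeholders.
pub fn ssh_command(host: &str, remote_cmd: &str) -> Command {
    let mut cmd = Command::new("ssh");
    cmd.args([
        "-o",
        "StrictHostKeyChecking=no", // do not prompt or fail on unknown hosts
        "-o",
        "UserKnownHostsFile=/dev/null", // do not persist host keys
    ])
    .arg(host)
    .arg(remote_cmd);
    cmd
}
```

Skipping host-key verification suits test rigs that are reflashed often, at the cost of the protection host keys normally give against connecting to the wrong machine.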
This pull request introduces a new Bash script,
`scripts/upload-keys.sh`, which automates the upload of Orb key material
and certificates to the backend. The script provides robust error
handling, flexible options for environment and authentication, and
supports both normal and dry-run modes. The most important changes are
grouped below:

**Script functionality and robustness:**

* Added a comprehensive Bash script (`scripts/upload-keys.sh`) for
uploading Orb keys and certificates, including error handling, input
validation, and usage instructions.
* Implemented support for both production and staging environments, with
environment selection via command-line options or environment variables.
* Provided dry-run mode to generate and write JSON payloads without
making network requests, aiding testing and debugging.

**Payload generation and upload:**

* Added functions for generating payloads for attestation, signup,
chipid keys, and certificates, including base64 encoding and JSON
formatting for backend compatibility.
* Integrated Cloudflared authentication for secure backend access and
automated curl POST requests to relevant API endpoints.
This pull request introduces several improvements and new features to
the hardware-in-the-loop (HIL) test suite, focusing on relay board
support, OTA result reporting, and reliability enhancements. The most
significant change is the addition of support for Numato USB relay
boards alongside the existing USB HID relay boards, unified under a new
`Relay` abstraction. OTA result reporting is now more detailed and
user-friendly, and recovery pin handling during device reboot is more
robust. Several minor improvements and refactors are included to support
these changes.

Relay board support and abstraction:

* Added support for Numato USB relay boards (serial protocol), alongside
USB HID relay boards, using a unified `Relay` type and driver
abstraction in `hil/src/relay.rs` and updated configuration in
`hil/src/orb.rs`.
[[1]](diffhunk://#diff-ec28b07638c10f082354a7d11760c38b89fd7ed2155c2b4eb9c8affb58292054L1-R16)
[[2]](diffhunk://#diff-8906451cd7ad39c7be0706afd4805c58409a5b7155fa5676fd0418a84bba38bfL27-R33)
[[3]](diffhunk://#diff-8906451cd7ad39c7be0706afd4805c58409a5b7155fa5676fd0418a84bba38bfL71-R81)
[[4]](diffhunk://#diff-8906451cd7ad39c7be0706afd4805c58409a5b7155fa5676fd0418a84bba38bfL158-R190)
* Updated relay channel handling and validation to accommodate both
1-indexed (USB HID) and 0-indexed (Numato) protocols, with improved
documentation and error messages.
[[1]](diffhunk://#diff-ec28b07638c10f082354a7d11760c38b89fd7ed2155c2b4eb9c8affb58292054L1-R16)
[[2]](diffhunk://#diff-8906451cd7ad39c7be0706afd4805c58409a5b7155fa5676fd0418a84bba38bfL71-R81)
[[3]](diffhunk://#diff-8906451cd7ad39c7be0706afd4805c58409a5b7155fa5676fd0418a84bba38bfL158-R190)
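
The 1-indexed versus 0-indexed channel handling described above could be sketched as follows. `RelayKind` and `to_zero_based` are hypothetical names, and the actual `Relay` type in `hil/src/relay.rs` may be structured differently:

```rust
/// Hypothetical relay kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RelayKind {
    /// USB HID boards number their channels starting at 1.
    UsbHid,
    /// Numato boards number their channels starting at 0.
    Numato,
}

/// Validates a channel number in the board's own numbering and returns
/// the equivalent zero-based index.
pub fn to_zero_based(kind: RelayKind, channel: u8, channel_count: u8) -> Result<u8, String> {
    // First and last valid channel numbers in the board's own convention.
    let (first, last) = match kind {
        RelayKind::UsbHid => (1, channel_count),
        RelayKind::Numato => (0, channel_count.saturating_sub(1)),
    };
    if channel_count == 0 || channel < first || channel > last {
        return Err(format!(
            "channel {channel} out of range {first}..={last} for {kind:?}"
        ));
    }
    Ok(channel - first)
}
```

Normalizing to one internal index at the boundary keeps the rest of the code free of per-board offsets, and the error message names the valid range in the board's own numbering.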

OTA result reporting improvements:

* Refactored OTA result reporting to print detailed summaries, including
boot logs, hardware states, and statuses for each test step, replacing
the previous file listing with a more readable summary.

Recovery pin handling and reboot reliability:

* Improved recovery pin handling during device reboot by holding the pin
in normal boot mode for the entire boot process, releasing it only after
the device is confirmed online, and using a background thread for pin
control.
[[1]](diffhunk://#diff-ea3806106fed5cd38c7fb89471973b5dd24656ee7c4c3dc99b3ba9043eb7cfe7L29-R53)
[[2]](diffhunk://#diff-ea3806106fed5cd38c7fb89471973b5dd24656ee7c4c3dc99b3ba9043eb7cfe7L71-R69)
[[3]](diffhunk://#diff-ea3806106fed5cd38c7fb89471973b5dd24656ee7c4c3dc99b3ba9043eb7cfe7R89-R98)
* Added a brief delay after setting the recovery pin to normal boot mode
to prevent power down when using FTDI.
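
One way to picture the "hold the pin on a background thread until the device is online" approach is the sketch below. The `RecoveryPin` trait and `hold_pin_until_online` are assumptions made for illustration; the real pin controller API in the HIL crate is not shown in this description:

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical pin abstraction, for illustration only.
pub trait RecoveryPin: Send + 'static {
    fn hold_normal_boot(&mut self);
    fn release(&mut self);
}

/// Spawns a thread that holds the pin in normal boot mode until signalled.
/// Returns the sender used to signal "device is online" and a join handle
/// that yields the pin back once it has been released.
pub fn hold_pin_until_online<P: RecoveryPin>(mut pin: P) -> (Sender<()>, JoinHandle<P>) {
    let (online_tx, online_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        pin.hold_normal_boot();
        // Blocks until the caller confirms the device is online, or until
        // the sender is dropped (for example on an early error return).
        // The pin is released in both cases.
        let _ = online_rx.recv();
        pin.release();
        pin
    });
    (online_tx, handle)
}
```

The caller would send `()` on the returned sender once the login prompt has been seen and then join the handle, so the pin is never released in the middle of the boot.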

Minor improvements and refactors:

* Increased login prompt timeout and updated the matching pattern for
improved robustness.
* Updated imports and cleaned up unused code in relay and reboot
modules.
[[1]](diffhunk://#diff-ea3806106fed5cd38c7fb89471973b5dd24656ee7c4c3dc99b3ba9043eb7cfe7L1-R1)
[[2]](diffhunk://#diff-8906451cd7ad39c7be0706afd4805c58409a5b7155fa5676fd0418a84bba38bfR8)

These changes collectively enhance hardware compatibility, reliability,
and user experience in the HIL test suite.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds support for the binary parsing of the ObjectAttributes as well as
rudimentary support for the .extra.raw files.

A sample CLI is provided, try running it with:

`RUST_BACKTRACE=1 cargo run -p orb-se050 -- --data 60000001.extra.raw`
add 2 more HILs to run GH jobs
This pull request updates the `orb-hil` package to a new version and
ensures the correct checksum is used for the new release.

Version and checksum update:

* Updated the `version` field in `orb-hil.nix` from `"0.0.2-beta.16"` to
`"0.0.2-beta.17"`, and replaced the `sha256` checksum to match the new
release.
it will replace a part of testing code from orb-os workflows
This pull request updates the pin controller configuration for the
`orb-hil` device in the `worldcoin-hil-munich-8` machine setup. The main
change is updating the `serial_path` to reference a different USB
device.

Pin controller configuration update:

* Changed the `serial_path` value in
`nix/machines/worldcoin-hil-munich-8/configuration.nix` to use
`/dev/serial/by-id/usb-FTDI_FT232R_USB_UART_BG010290-if00-port0` instead
of the previous device, ensuring the configuration points to the correct
hardware.
Removes the stale clientkey, that is not used. It will be removed from
orb-os as well
sky-mart and others added 29 commits April 2, 2026 13:20
safer work with OrbConfig - no risk that someone forgets to use the file
when adding a new command
Implements serde::{Serialize, Deserialize} on OrbId.
will eventually save some time during tests
…1108)

- use correct timezones
- install packages system-wide
## context
needed by `orb-connd` for modem self-healing when the modem is
blacklisted by `ModemManager`

## todo
- [x] test on an orb
#1109)

this PR refactors task usage in `connd` to rely on `speare` for easier
task management, free restart / backoff logic and a free broker for
named channels between tasks.

it also publishes new data on the OES, and introduces modem self-healing

## new
- modem self-healing (powercycle whenever it is blacklisted by modem
manager)
- `fw_revision` field on `CellularStatus`
- `CellularStatus` now published on OES
- `ConndReport` and `ActiveConnections` both simplified to publish based
on `net-state` event published internally on `speare` broker
- `NetStats` now on OES
- Datadog reporter now reporting usage for eth / wwan / wlan instead of
only wlan
- logging of number of wifi profiles on startup to help debug potential
issues

## tested on an orb
yes

## do not merge
[until this PR is
merged](#1094) (soon, i
just need to unclankerfy it)
useful for nfsboot

Co-authored-by: Ryan Butler <thebutlah@gmail.com>
## problem
when using the `connect_to_wifi` functionality in `orb-connd`, we update
the profile's priority to be the highest (so on further restarts the orb
always connects to it first). we do so by deleting, then re-creating the
profile. when re-creating the profile i forgot to set `.persist()` to
make sure we save it to disk.

## fix
the actual fix is a single line fix in `orb-connd/src/service/dbus.rs`.
everything else consists of changes required to have a proper regression test
for this. a lot of plumbing for restarting docker and keeping the same
temporary directory. this could probably be cleaned up in the future but
that will be a task for another day
Ports over my implementation of devcontainers in orb-rustzone to
orb-software. Replaces the previous nonfunctional and unmaintained
devcontainer.

---------

Co-authored-by: AlexKaravaev <alexkaravev@gmail.com>
## changes
- this PR adds extra information to the active connections report: link
status and routes for every interface
- also removed `net_changed` zenoh event emitted by `orb-connd`, instead
reusing `oes::ActiveConnections` in its place to consolidate things

## bugfixes
- this also fixes a bug where magic qr was not resetting the connection
in `jobs-agent`

## todo
- [ ] test on an orb
## changes
- shared client for backend status endpoint, handling lack of token or
internet
- oes caching (hacky version) for `connd/active_connections`
- no silent handling of uptime errors
- use `connd/oes/active_connections` for connectivity state instead of
`connd/net_changed`

## fixes
- bug where backend status client would be stuck with old connection
pool when switching primary connections

- [x] tested on an orb
## changes
better handling of `oes::ActiveConnections` to avoid unnecessary
reconnections through orb relay client

## todo
- [x] test on orb
## changes
power cycles the modem if we can't retrieve sim information

## todo
- [ ] test on an orb
the serial logs will be recorded differently
Add two new commands to orb-hil for copying files to/from an Orb over
SSH or Teleport:
```
orb-hil copy-to  --local ./file.bin --orb /tmp/file.bin --transport ssh --password "..."                                   
orb-hil copy-from --orb /tmp/log.txt --local ./log.txt  --transport teleport
```
Also, refactor the duplicates in working with remote commands

Tested:
- ssh copying
- teleport copying
- remote cmd
- ota
…1126)

## changes

### connd
- propagates the active connections message at least every 5 minutes if
there is a change on general connectivity state, primary connection, or
active connections

### jobs-agent
- account for extra publishes on active connections when forcing relay
reconnection

## todo
- [x] test on an orb
Just adding status parsing for ETH0!
@semgrep-code-worldcoin

Semgrep found 6 tainted-path findings:

The application builds a file path from potentially untrusted data, which can lead to a path traversal vulnerability. An attacker can manipulate the path which the application uses to access files. If the application does not validate user input and sanitize file paths, sensitive files such as configuration or user data can be accessed, potentially creating or overwriting files. To prevent this vulnerability, validate and sanitize any input that is used to create references to file paths. Also, enforce strict file access controls. For example, choose privileges allowing public-facing applications to access only the required files.
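
As an illustration of the validation advice above, here is a minimal sketch that accepts a name only if it is a single plain path component before joining it onto a base directory. `join_single_component` is a hypothetical helper, not code from `net_stats.rs`, and whether the flagged `sysfs` and `ifaces_dir` values are actually attacker-controlled is not established by the finding itself:

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `name` onto `base` only if `name` is exactly one normal path
/// component: no separators, no `.` or `..`, and not absolute.
pub fn join_single_component(base: &Path, name: &str) -> Option<PathBuf> {
    let mut components = Path::new(name).components();
    match (components.next(), components.next()) {
        // Exactly one component, and it is a plain file or directory name.
        (Some(Component::Normal(part)), None) => Some(base.join(part)),
        // Empty input, traversal, absolute paths, or nested paths.
        _ => None,
    }
}
```

Rejecting anything but a single normal component is stricter than canonicalizing and comparing prefixes, but it needs no filesystem access and cannot be bypassed through symlinked parent directories in the input.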

View Dataflow Graph
flowchart LR
    classDef invis fill:white, stroke: none
    classDef default fill:#e7f5ff, color:#1c7fd6, stroke: none

    subgraph File0["<b>orb-connd/src/reporters/net_stats.rs</b>"]
        direction LR
        %% Source

        subgraph Source
            direction LR

            v0["<a href=https://github.com/worldcoin/orb-software/blob/2a5730dcea964e300b22bdc81955961945d8c371/orb-connd/src/reporters/net_stats.rs#L62 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 62] sysfs</a>"]
        end
        %% Intermediate

        subgraph Traces0[Traces]
            direction TB

            v2["<a href=https://github.com/worldcoin/orb-software/blob/2a5730dcea964e300b22bdc81955961945d8c371/orb-connd/src/reporters/net_stats.rs#L62 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 62] ifaces_dir</a>"]
        end
        %% Sink

        subgraph Sink
            direction LR

            v1["<a href=https://github.com/worldcoin/orb-software/blob/2a5730dcea964e300b22bdc81955961945d8c371/orb-connd/src/reporters/net_stats.rs#L63 target=_blank style='text-decoration:none; color:#1c7fd6'>[Line: 63] ifaces_dir</a>"]
        end
    end
    %% Class Assignment
    Source:::invis
    Sink:::invis

    Traces0:::invis
    File0:::invis

    %% Connections

    Source --> Traces0
    Traces0 --> Sink


9 participants