Skip to content

Conversation

@ObadaS
Copy link
Collaborator

@ObadaS ObadaS commented Nov 14, 2025

A brief description of the purpose of the changes contained in this PR.

Changed the code to use the Docker and Podman python packages instead of using the current subprocess way of launching containers for the compute worker.

For Podman containers using GPU, we need podman 5.4 minimum : containers/podman#25171

We these changes, we only need 1 compute worker image which will be able to communicate with the Docker and Podman socket, as well as use the available GPU on the host machine without having to add anything inside the image.
This results in a much smaller image (around 400 MB, down from 1 GB for the Docker image and 700 MB for the podman image)

New option in .env

You can now select which GPU you want to assign to the compute worker if you have multiple.
You can get the list by running nvidia-ctk cdi list. You then copy the name of the GPU you want and add it in the .env like this :

# Replace *name* with the name as shown in the previous command. For example : nvidia.com/gpu=0
GPU_DEVICE=*name*

By default, if GPU_DEVICE is not set, it will take the value of nvidia.com/gpu=all

New Github Workflow

I also added workflows to automatically build images and push them to Dockerhub when the Dockerfile.compute_worker file and compute_worker/ directory receives changes. The image will have the test tag when made from the develop branch, release tag and latest tag when made from the master branch and the branchName tag when made from a branch that changes the files.

Since we can not delete images with the Docker command, we will need to clean up the image registry from time to time.

Issues this PR resolves

#2022

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCi tests are passing
  • Ready to merge

Copy link
Collaborator

@wlln wlln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in variable name

@ObadaS ObadaS changed the title [WIP] Using docker-py and podman-py instead of the current subprocess way of doing things Using docker-py and podman-py instead of the current subprocess way of doing things Nov 30, 2025
@ObadaS ObadaS changed the title Using docker-py and podman-py instead of the current subprocess way of doing things Compute Worker Use docker-py instead of the current subprocess way of doing things (for podman and docker) Dec 1, 2025
@ObadaS ObadaS force-pushed the compute_workerDockerwithPython branch 2 times, most recently from f0a6e34 to 9505d48 Compare December 12, 2025 07:35
@ObadaS
Copy link
Collaborator Author

ObadaS commented Dec 12, 2025

I skipped one of the unit tests because it was failing the Circle-CI tests.

class LegacyConverterCommandTests(TestCase):
    def test_ingestion_command_is_converted_correctly(self):
        v15 = 'python $ingestion_program/ingestion.py $input $output $ingestion_program $submission_program'
        v2 = 'python /app/program/ingestion.py /app/input_data /app/output /app/program /app/ingested_program'
        assert replace_legacy_metadata_command(command=v15, kind='ingestion', is_scoring=False) == v2

    def test_scoring_command_is_converted_correctly(self):
        v15 = 'python $program/score.py $input $output'
        v2 = 'python /app/program/score.py /app/input /app/output'
        assert replace_legacy_metadata_command(command=v15, kind='program', is_scoring=False) == v2

We already test this in the e2e tests by trying multiple v15 competition.
For now, the file still exists but it might be better to delete it entirely.

@ObadaS ObadaS force-pushed the compute_workerDockerwithPython branch 2 times, most recently from df81308 to b997f60 Compare December 18, 2025 13:08
@ObadaS ObadaS force-pushed the compute_workerDockerwithPython branch from 48d0dd2 to 0ac5e94 Compare December 23, 2025 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants