Image pull issue at high invocation rates #652

@aditya2803

Describe the bug
I am running a 3-node cluster (1 master and 2 worker nodes). At high invocation rates, one of the worker nodes reports the following error for every request sent to it, and as a result more than 95% of requests fail on that node:

time="2022-12-24T07:43:53.953774832-06:00" level=error msg="coordinator failed to start VM" error="Failed to get/pull image: context deadline exceeded" image="ghcr.io/aditya2803/json:v23" vmID=715
time="2022-12-24T07:43:53.953851537-06:00" level=error msg="failed to start VM" error="Failed to get/pull image: context deadline exceeded"

At lower invocation rates, and even with single-node clusters, I do not see this error. I am using images from the GitHub container registry. How can I solve this issue?
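
The "context deadline exceeded" text indicates that the image pull did not finish within some time limit. As a rough illustration only, the sketch below models how a pull that easily meets a deadline on its own could miss it when many pulls run at once. It assumes, without confirmation from vHive's code, that concurrent pulls are not deduplicated and share the node's link equally. The function names and all numbers are hypothetical.

```python
def pull_seconds(image_mb: float, link_mb_per_s: float, concurrent_pulls: int) -> float:
    """Estimated time for one pull when `concurrent_pulls` share the link equally.

    Hypothetical model: each pull gets link_mb_per_s / concurrent_pulls of bandwidth.
    """
    if concurrent_pulls < 1:
        raise ValueError("concurrent_pulls must be at least 1")
    return image_mb * concurrent_pulls / link_mb_per_s


def misses_deadline(image_mb: float, link_mb_per_s: float,
                    concurrent_pulls: int, deadline_s: float) -> bool:
    """True if the estimated pull time exceeds the deadline."""
    return pull_seconds(image_mb, link_mb_per_s, concurrent_pulls) > deadline_s


# Made-up numbers: a 100 MB image, a 100 MB/s link, a 10 s deadline.
print(misses_deadline(100, 100, 1, 10))   # a single pull
print(misses_deadline(100, 100, 50, 10))  # fifty simultaneous pulls
```

Under this model the pull time grows linearly with the number of simultaneous pulls, which would match failures appearing only at high invocation rates. Whether this is what actually happens on the affected node is not established here.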

To Reproduce
Set up a 3-node cluster and deploy a function, then invoke it at a very high rate so that many image pulls are requested simultaneously. One of the nodes then starts reporting errors like the ones above.

Expected behavior
There should be no errors when pulling the image from the registry. Also, I believe that once an image has been pulled for the first time, it is stored locally, so I am not sure why this is happening.
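
To make this expectation concrete, here is a minimal sketch of the behaviour described: the first request for an image pulls it, later requests reuse the local copy, and concurrent requests for the same image wait for one shared pull. This is not vHive's or containerd's implementation. `ImageCache`, `pull_fn`, and `fake_pull` are hypothetical names used only for illustration.

```python
import threading


class ImageCache:
    """Pull each image reference at most once, even under concurrent requests."""

    def __init__(self, pull_fn):
        self._pull_fn = pull_fn        # hypothetical callable: image ref -> image
        self._lock = threading.Lock()  # guards creation of per-image locks
        self._ref_locks = {}           # image ref -> lock serialising its pull
        self._images = {}              # image ref -> locally stored image

    def get(self, ref):
        # Get (or create) the lock dedicated to this image reference.
        with self._lock:
            ref_lock = self._ref_locks.setdefault(ref, threading.Lock())
        # Only one thread per image pulls; the others block here and then
        # find the image already stored.
        with ref_lock:
            if ref not in self._images:
                self._images[ref] = self._pull_fn(ref)
            return self._images[ref]


# Usage: 50 simultaneous requests for the same image trigger a single pull.
pulls = []

def fake_pull(ref):
    pulls.append(ref)  # record every real pull
    return f"image-for-{ref}"

cache = ImageCache(fake_pull)
ref = "ghcr.io/aditya2803/json:v23"
threads = [threading.Thread(target=cache.get, args=(ref,)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(pulls))
```

If pulls were handled this way, a burst of invocations for an already pulled image would cause no registry traffic at all.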

Logs

The vhive.stdout logs fill up with the following messages:

time="2022-12-24T07:43:53.953774832-06:00" level=error msg="coordinator failed to start VM" error="Failed to get/pull image: context deadline exceeded" image="ghcr.io/aditya2803/json:v23" vmID=715
time="2022-12-24T07:43:53.953851537-06:00" level=error msg="failed to start VM" error="Failed to get/pull image: context deadline exceeded"

Notes
I have tested this on bare-metal setups and virtualised setups.

Labels: bug