Description
The bosh-agent already runs with a higher priority than BOSH/monit jobs, so that CPU-intensive workloads do not block agent <-> director communication; see cloudfoundry/bosh-linux-stemcell-builder@00054bd.
However, lifecycle hooks such as pre-start scripts appear to have the same negative effect on communication with the director, because the bosh-agent starts them itself and they therefore run with the agent's priority. This is an assumption on my part: I could not find any code that lowers the priority, and inspecting a VM during a pre-start shows the pre-start script and all of its sub-processes running with the same priority as the agent.
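One way to reproduce that observation on a Linux VM is to read the nice value of the agent and of the pre-start process from `/proc/<pid>/stat`. The following is only a minimal sketch, not code from the bosh-agent: the helper name `niceFromStat` is made up for illustration, and the field position (19) follows the proc(5) manual page.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// niceFromStat (hypothetical helper) extracts the nice value, field 19 in
// proc(5), from the contents of /proc/<pid>/stat.
func niceFromStat(stat string) (int, error) {
	// The command name (field 2) is wrapped in parentheses and may itself
	// contain spaces or parentheses, so split after the last ')'.
	end := strings.LastIndex(stat, ")")
	if end < 0 {
		return 0, fmt.Errorf("malformed stat line: no ')' found")
	}
	fields := strings.Fields(stat[end+1:])
	// fields[0] is field 3 (state), so field 19 (nice) is fields[16].
	if len(fields) < 17 {
		return 0, fmt.Errorf("malformed stat line: only %d fields after command name", len(fields))
	}
	return strconv.Atoi(fields[16])
}

func main() {
	// Inspect the PID given as the first argument, or this process itself.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	data, err := os.ReadFile("/proc/" + pid + "/stat")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	nice, err := niceFromStat(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("pid %s: nice %d\n", pid, nice)
}
```

Running it once with the agent's PID and once with the PID of the pre-start script should print the same nice value if the script really inherits the agent's priority.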
In our case, cloning a large amount of data from the remaining part of a BOSH-managed PostgreSQL cluster triggers this issue intermittently. In extreme situations this extends downtime unnecessarily, because the bosh task fails with an agent timeout and the pre-start has to run again from scratch.
As a quick mitigation we could, for example, renice our pre-start script ourselves. Still, I would see a benefit in consistency, and hence predictability, if the bosh-agent started external scripts and binaries with a lower priority than its own.