Skip to content

Conversation

@bastelfreak
Copy link
Contributor

@bastelfreak bastelfreak commented Aug 4, 2025

This adds a bit of a startup delay to ensure that systemd not just starts the process and thinks the service is already up and running.

Previously this was handled by the wrapper script and Type=forking. This commit fixes a regression from b2de7c7

This also requires changes to the projects to provide the actual port.

This is just a workaround and not a perfect solution. For systemd-based systemd we should implement sd_notify and switch to Type=notify-reload.

https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Type=

@bastelfreak bastelfreak added the bug Something isn't working label Aug 4, 2025
@bastelfreak bastelfreak force-pushed the ezbake2 branch 7 times, most recently from 9b89b1a to 6dd907d Compare August 4, 2025 19:59
This adds a bit of a startup delay to ensure that systemd not just
starts the process and thinks the service is already up and running.

Previously this was handled by the wrapper script and Type=forking. This
commit fixes a regression from OpenVoxProject@b2de7c7

This also requires changes to the projects to provide the actual port.

* OpenVoxProject/openvox-server#80
* OpenVoxProject/openvoxdb#69

This is just a workaround and not a perfect solution. For systemd-based
systemd we should implement sd_notify and switch to Type=notify-reload.

https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Type=
@bastelfreak
Copy link
Contributor Author

bastelfreak commented Aug 4, 2025

resulting unit file:

# systemctl cat puppetserver --no-pager
# /usr/lib/systemd/system/puppetserver.service
#
# Local settings can be configured without being overwritten by package upgrades, for example
# if you want to increase puppetserver open-files-limit to 10000,
# you need to increase systemd's LimitNOFILE setting, so create a file named
# "/etc/systemd/system/puppetserver.service.d/limits.conf" containing:
#	[Service]
#	LimitNOFILE=10000
# You can confirm it worked by running systemctl daemon-reload
# then running systemctl show puppetserver | grep LimitNOFILE
#
[Unit]
Description=puppetserver Service
After=syslog.target network.target nss-lookup.target

[Service]
Type=exec
LogsDirectory=puppetlabs/puppetserver
RuntimeDirectory=puppetlabs/puppetserver
EnvironmentFile=/etc/sysconfig/puppetserver
User=puppet
TimeoutStartSec=300
TimeoutStopSec=60
Restart=on-failure
StartLimitBurst=5
PrivateTmp=true

# https://tickets.puppetlabs.com/browse/EZ-129
# Prior to systemd v228, TasksMax was unset by default, and unlimited. Starting in 228 a default of '512'
# was implemented. This is low enough to cause problems for certain applications. In systemd 231, the
# default was changed to be 15% of the default kernel limit. This explicitly sets TasksMax to 4915,
# which should match the default in systemd 231 and later.
# See https://github.com/systemd/systemd/issues/3211#issuecomment-233676333
TasksMax=4915

#set default privileges to -rw-r-----
UMask=027


ExecStart=/usr/lib/jvm/jre-21/bin/java $JAVA_ARGS -Dlogappender=F1 \
  '-XX:OnOutOfMemoryError=kill -9 %p' -XX:+CrashOnOutOfMemoryError \
  -XX:ErrorFile="${LOGS_DIRECTORY}/puppetserver_err_pid%p.log" \
  -cp "${INSTALL_DIR}/puppet-server-release.jar" \
  clojure.main \
  -m puppetlabs.trapperkeeper.main \
  --config "${CONFIG}" \
  --bootstrap-config "${BOOTSTRAP_CONFIG}" \
  $TK_ARGS

KillMode=process

# Wait until the service is started and opened the correct TCP port
ExecStartPost=timeout 30s sh -c 'while ! ss -H -t -l -n sport = :8140 | grep -q "^LISTEN.*:8140"; do sleep 1s; done'
ExecReload=kill -HUP $MAINPID
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target

@bastelfreak
Copy link
Contributor Author

bastelfreak commented Aug 4, 2025

A little bit background: Some time ago the unit file used a wrapper:

[Service]
Type=forking
ExecStart=/opt/puppetlabs/server/apps/puppetserver/bin/puppetserver start

This triggered https://github.com/OpenVoxProject/ezbake/blob/adbe61f034bd179e5b4c76bddbd059cb6005794a/resources/puppetlabs/lein-ezbake/template/global/ext/cli/start.erb.

Because it's a sysv leftover, we removed it and the unit now calls java directly. Since openvoxdb/openvox-server don't implement sd_notify, systemd doesn't know when the service is ready. It assumes it started correctly directly after the process is started. But in fact openvoxdb/openvox-server need a few seconds to boot. As a workaround I implemented this ExecStartPost= command. I'm open for better ideas. Also the unit file nor sd_notify will help on FreeBSD, but they ship their own rc style config and can maybe copy our hack?

@m0dular
Copy link

m0dular commented Aug 4, 2025

I don't remember what platform(s) this happened on, but I remember hitting an error where the version of ss did not have the -H flag for skipping the header. That's why we went with sed 1d instead here. Feel free to ignore if this only applies to old versions we won't see anymore.

@jay7x
Copy link
Contributor

jay7x commented Aug 4, 2025

@m0dular I guess it's ss from busybox maybe.

@jay7x
Copy link
Contributor

jay7x commented Aug 4, 2025

My concern here is.. what if a person changes the listening port of an openvox-server/openvoxdb instance? Shall we parse the real config instead maybe? Or at least make it explicit in the config file to update the systemd unit file if the port is changed by some reason.

@austb
Copy link
Contributor

austb commented Aug 5, 2025

This is not a major issue because long running migrations are so infrequently run, but just wanted to call it out.

openVoxDB will bind its port during startup, but before migrations are run, but it will reject all queries until migrations are finished. So I think this will report the service as ready before it can respond to queries. This may be a problem for catalog applications that apply a long running migration where subsequent resources Require the db service and expect to be able to query it.

@jcharaoui
Copy link

jcharaoui commented Aug 5, 2025

Right, this is going to fail hard when the port is changed to something else than 8140, so I wouldn't recommend it.

In Debian we ship this service unit which is basically the same in principle, but instead it uses a quirk from trapperkeeper where it writes 1 in a file called restart in the runtime directory when it's done starting up.

A while ago @rlbdv and I crafted a very simple patch for trapperkeeper to add sd_notify support to it, which would potentially solve this issue for both openvox-server and openvoxdb. It was submitted a long time ago to Puppetlabs where it died a slow death but if ya'll are interested I can dig it out so you can have a look at it. It's certainly the more elegant of any other solutions.

@jay7x
Copy link
Contributor

jay7x commented Aug 5, 2025

@jcharaoui I guess it'd be nice to have sd_notify implemented, especially if you have a patch already 😆

@bastelfreak
Copy link
Contributor Author

@rbrw @jcharaoui are you interested in sending the patch to https://github.com/OpenVoxProject/trapperkeeper ? :)

@jcharaoui
Copy link

@rbrw @jcharaoui are you interested in sending the patch to https://github.com/OpenVoxProject/trapperkeeper ? :)

👉 OpenVoxProject/trapperkeeper#20

Copy link
Contributor

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a hack because the user can change the port in the config file. 👎 IMHO. Let's solve it properly via OpenVoxProject/trapperkeeper#20.

@rwaffen
Copy link
Member

rwaffen commented Aug 19, 2025

Okay. Then let’s fix this other repo. And then update this and then server and db? It’s already quite a queue 😅

@bastelfreak
Copy link
Contributor Author

@ekohl I was hoping we can use the unit file from the debian maintainers (#37 (comment)) as an intermediate solution because that's quicker to implement and test. As a long term solution I would also prefer the trapperkeeper update.

@ekohl
Copy link
Contributor

ekohl commented Aug 20, 2025

@bastelfreak that also looks like a good solution that doesn't require parsing the configuration and I don't mind it as an intermediate solution.

@bastelfreak
Copy link
Contributor Author

#42 I applied the patches to the unit file to rely on the restart counter. Would be great if some people could take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants