[CLI] Move native file locking into workers #2997
base: php-wasm/node/testable-syscall-overrides
…Windows FileLockManager tests
|
We can do this with a FileLockManagerForPosix and a FileLockManagerForWindows and fall back to the FileLockManagerForNode if the native locking API is not available. I'm working on the implementation for both. The main things that require care are:
For Posix, we can keep it fairly simple for fcntl() by keeping track of which files a process has locked via fcntl() and then unlocking the entire range via fcntl() when locks need to be released. For Windows, implementing fcntl() semantics is more complicated. We'll have to maintain a collection of which ranges are locked per file in order to be able to unlock those ranges. If a caller wants to unlock part of a range, we'll have to unlock the entire range and then obtain locks for the remaining portions of the original locked range. For shared locks, we can obtain the new ranges before releasing the original range, but for exclusive ranges, we'll have to release the original range before attempting to obtain locks on the remaining ranges. (This is according to a Google answer about whether Windows allows overlapping exclusive locks by the same process.) The good news is that we are already tracking locked ranges in the FileLockManagerForNode. The work for the Windows locks shouldn't be that different. cc @adamziel |
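The partial-unlock ordering described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the `LockedRange` and `NativeOp` shapes and both function names are hypothetical, ranges are assumed to be half-open `[start, end)`, and the sketch only plans the order of native calls without issuing them.

```typescript
// Hypothetical shape for one tracked byte-range lock; `end` is exclusive.
interface LockedRange {
  start: number;
  end: number;
  exclusive: boolean;
}

// Hypothetical description of a native call to issue, in order.
type NativeOp = { op: 'lock' | 'unlock'; start: number; end: number };

// Returns the parts of `held` that must stay locked after unlocking [start, end).
function remainingAfterUnlock(held: LockedRange, start: number, end: number): LockedRange[] {
  if (end <= held.start || start >= held.end) {
    return [held]; // No overlap: the held lock is untouched.
  }
  const remaining: LockedRange[] = [];
  if (start > held.start) remaining.push({ ...held, end: start });
  if (end < held.end) remaining.push({ ...held, start: end });
  return remaining;
}

// Plans the native calls for a partial unlock of a single held range.
// Shared: lock the remaining pieces first, then release the original range.
// Exclusive: release the original range first, then lock the remaining pieces.
function planPartialUnlock(held: LockedRange, start: number, end: number): NativeOp[] {
  const rest = remainingAfterUnlock(held, start, end);
  if (rest.length === 1 && rest[0] === held) {
    return []; // Nothing to do when the unlock range misses the held range.
  }
  const relock = rest.map((r): NativeOp => ({ op: 'lock', start: r.start, end: r.end }));
  const release: NativeOp = { op: 'unlock', start: held.start, end: held.end };
  return held.exclusive ? [release, ...relock] : [...relock, release];
}
```

For an exclusive lock there is a window between the release and the re-lock in which another process could take the range, so a real implementation would need to handle a failed re-lock.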
60a6f0b to
6023b7d
Compare
…ire remaining range
|
I roughed out native FileLockManagers for POSIX and Windows, but they are as yet untested. Tomorrow, I plan to start by adapting native locking tests and testing these new classes. |
|
The pre-requisites for this one seem to be mostly in place. The CLI spawn handler now creates a new OS process for any spawned PHP subprocess. The request handler still uses multiple PHP instances, but can be tuned down by adding In #3014, I'm exploring a CI stress test to confirm multiple workers are indeed used for handling concurrent requests. |
|
One note: It's a bit more awkward because, with a cluster of workers, there is no one place that can judge which request is the first (so we know the autologin-has-happened cookie cannot be from the current Playground CLI session). |
Sounds good and makes sense. Should we have a lower bound on the number of worker processes?
Maybe doing everything in PHP is not useful here? What do you think about moving some of the logic to |
Yes! In single-worker mode, we have a default maximum of 5 php-wasm instances at a time. Let's start our lower bound at 5 instances and see how it goes. |
The previous logic for clearing the autologin-has-happened cookie was in Every php-wasm worker process in the cluster calls Maybe we can do something with a lock file. Will see :) |
|
To fix auto-login even in the presence of a previous auto-login-has-happened cookie, I did the following:
This means that any of the php-wasm workers can handle auto-login, and we don't need to know whether a request is the first one or not so we can remove the |
Since each worker process has a single php-wasm instance, we need other workers to complete WP boot and install now. To make this happen, I've tried to:
1. Initialize all workers with the same bootRequestHandler process.
2. Pick one worker for booting WordPress once all workers are running and listening for HTTP requests.
3. Wire up a callback from the WP boot process so all workers mount post-WP-install mounts immediately after WP is installed.
|
@adamziel I am trying to build an instrumented Blueprints v2 phar to help debug Playground CLI errors with the Blueprints v2. But I am encountering errors when trying I've focused on Blueprints v1 first, but it's important to prove that the latest changes will be workable with Blueprints v2 as well. The first issue with the build is that I tried with PHP 8.4, 8.2, and 8.1 in that order. Maybe there is something I am missing from the docs. How would you recommend building |
|
@brandonpayton You may need PHP 8.0, that's what the GitHub action uses. |
Thanks, @adamziel! PHP 8.0 from the |
|
I spent some time today wrestling with the failing tests. The next thing to do is to update the description for this PR to make the remaining work clearer to myself and others. There will probably be ways folks can help if they have time. |
|
I updated the description with some notes and plan to add more. I am participating in a meetup over the next two weeks but plan to keep pushing on this one. @adamziel, if you have any time, I would love some help with getting Blueprints v2 working here. IIRC, it is failing during the WP install check, but I haven't found why. This PR is still a bit rough... I haven't had a chance to clean up after a bunch of broad sketching, but the main server command is working: |
|
From the updated PR description:
And:
@adamziel As I've come back to this work after the holiday and some illness, something has been nagging at the back of my mind. This PR is taking quite a while to wrangle, especially the move to multiple processes. I think maybe we should explore option 1 before continuing to wrestle here. Based on what I've seen so far, option 2 (process separation) is doable, but it is a larger change that will require more effort to make sure we clean up all processes, including additional php-wasm processes spawned for proc_open(). To try option 1, what I would do is:
I don't love the central file lock manager. I would rather rely totally on the OS, but I think we might be able to ship this faster with fewer changes with option 1. I could create a PoC next to establish some confidence. And shipping option 1 wouldn't prevent us from moving to option 2 (process separation) in the future. What do you think? |
|
Thank you for bringing this up Brandon! Just to be sure - we'd still use multiple worker processes, right? So we'd get nearly all the speed benefits. If yes, then let's start with the central lock manager, good idea. |
|
Actually.. can we really pass the file descriptor to the central lock manager? We'd have to do that across the process boundary, right? |
@adamziel We would have multiple worker threads, not separate processes. At some point in this exploration, I mistakenly thought that worker threads translated to actual separate processes, but this was wrong. Node.js worker threads are part of the original process. This would give us multiple worker threads but under a single OS process.
While we have a single process, we can pass file descriptors across thread boundaries within that single process. I think this is worth a try and will go ahead and make a PoC unless I hear otherwise from you. Thanks for your feedback! |
|
Sounds good! I understand we can still account for the nuances of locking a path vs fd and allowing the same PHP runtime to acquire multiple overlapping locks while not allowing other runtimes to do that, regardless of the worker they run in |
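One way to picture the per-runtime rule mentioned above (the same PHP runtime may hold overlapping locks, other runtimes may not) is the conflict check below. It is a minimal sketch under assumptions: the `HeldLock` shape, the string `owner` identifier, and the `FileLockTable` class are all hypothetical, ranges are half-open `[start, end)`, and the path-versus-fd nuances are not modeled.

```typescript
// Hypothetical record of one lock held on a single file.
interface HeldLock {
  owner: string; // Assumed identifier for the PHP runtime holding the lock.
  start: number;
  end: number; // Exclusive end offset.
  exclusive: boolean;
}

// A request conflicts only with overlapping locks held by a different owner,
// and only when at least one side of the pair is exclusive.
function conflicts(held: HeldLock[], request: HeldLock): boolean {
  return held.some(
    (h) =>
      h.owner !== request.owner &&
      h.start < request.end &&
      request.start < h.end &&
      (h.exclusive || request.exclusive)
  );
}

// Hypothetical per-file table that a central lock manager might keep.
class FileLockTable {
  private held: HeldLock[] = [];

  // Non-blocking attempt: records the lock and returns true when no conflict exists.
  tryLock(request: HeldLock): boolean {
    if (conflicts(this.held, request)) {
      return false;
    }
    this.held.push(request);
    return true;
  }

  // Drops every lock held by one runtime, for example when it exits.
  releaseAll(owner: string): void {
    this.held = this.held.filter((h) => h.owner !== owner);
  }
}
```

Because the check compares owners rather than workers, it behaves the same regardless of which worker thread a runtime happens to run in.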
Motivation for the change, related issues
In order to safely run multiple workers in Windows, we need real, native file locking to prevent database corruption. File locking in Windows is "mandatory", enforced by the OS. In contrast, traditional file locking APIs for POSIX-like systems are "advisory", not enforced by the OS.
Windows will prevent another process from writing to a locked file and can even prevent the process owning an exclusive lock from writing to the locked file using a different file handle.
This breaks how we are currently handling native locks via the main thread. In today's model:
In Windows, this does not work because the main thread cannot obtain an exclusive lock for a file handle when the process already has another file handle open for the same file.
We can solve this a couple of ways:
a. Stop tracking locks in the main thread
b. Rely completely upon native OS file locking
Implementation details
This PR implements option 2 - Adopt true process separation for php-wasm workers.
If we can get this to work on all supported native platforms, it is a simpler option because it does not involve an intermediate layer where we try to accurately reimplement fcntl() and flock() semantics.
Instead, we make a best effort to map fcntl() and flock() calls to native OS locking APIs.
We cannot perfectly implement fcntl() semantics with the Windows LockFileEx() API, but it appears to work fine for locking the SQLite DB via fcntl() calls.
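As a rough illustration of the best-effort mapping, the sketch below translates an fcntl() lock request into the kind of Windows call it might become. The two flag values are the documented `LockFileEx()` constants; everything else (the string-typed command and lock-type names, the `NativeCall` shape, and `toWindowsCall`) is hypothetical and not taken from this PR. It only computes which API and flags to use and does not call into Windows.

```typescript
// LockFileEx() flag values as documented for the Win32 API.
const LOCKFILE_FAIL_IMMEDIATELY = 0x1;
const LOCKFILE_EXCLUSIVE_LOCK = 0x2;

// Hypothetical string stand-ins for the fcntl() command and lock type.
type FcntlCommand = 'F_SETLK' | 'F_SETLKW';
type FcntlLockType = 'F_RDLCK' | 'F_WRLCK' | 'F_UNLCK';

// Hypothetical description of the native call to make.
type NativeCall = { api: 'LockFileEx'; flags: number } | { api: 'UnlockFileEx' };

function toWindowsCall(cmd: FcntlCommand, type: FcntlLockType): NativeCall {
  if (type === 'F_UNLCK') {
    return { api: 'UnlockFileEx' };
  }
  let flags = 0;
  // A write lock maps to an exclusive lock; a read lock stays shared (no flag).
  if (type === 'F_WRLCK') flags |= LOCKFILE_EXCLUSIVE_LOCK;
  // F_SETLK must not block, so ask Windows to fail immediately instead of waiting.
  if (cmd === 'F_SETLK') flags |= LOCKFILE_FAIL_IMMEDIATELY;
  return { api: 'LockFileEx', flags };
}
```

The mapping is lossy in at least one direction: `UnlockFileEx()` expects a range that matches an existing lock, whereas fcntl() allows unlocking any sub-range, which is one reason the semantics cannot be reproduced exactly.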
Some remaining items:
The --experimental-blueprints-v2-runner option does not work.
afterAll cleanup stage.
More notes coming...
Testing Instructions (or ideally a Blueprint)
CI