@brandonpayton
Member

@brandonpayton brandonpayton commented Dec 8, 2025

Motivation for the change, related issues

In order to safely run multiple workers in Windows, we need real, native file locking to prevent database corruption. File locking in Windows is "mandatory", enforced by the OS. In contrast, traditional file locking APIs for POSIX-like systems are "advisory", not enforced by the OS.

Windows will prevent another process from writing to a locked file, and can even prevent the process that owns an exclusive lock from writing to the locked file through a different file handle.

This breaks how we currently handle native locks via the main thread. In today's model:

  • All php-wasm workers request locks from the main thread
  • The main thread obtains a native lock by opening its own native file handle and locking that handle.

In Windows, this does not work because the main thread cannot obtain an exclusive lock for a file handle when the process already has another file handle open for the same file.

We can solve this a couple of ways:

  1. Update the centralized file locking to take the native file handles from each worker thread
  2. Adopt true process separation for php-wasm workers
    a. Stop tracking locks in the main thread
    b. Rely completely upon native OS file locking

Implementation details

This PR implements option 2 - Adopt true process separation for php-wasm workers.

If we can get this to work on all supported native platforms, it is a simpler option because it does not involve an intermediate layer where we try to accurately reimplement fcntl() and flock() semantics.

Instead, we make a best effort to map fcntl() and flock() calls to native OS locking APIs.

We cannot perfectly implement fcntl() semantics with the Windows LockFileEx() API, but it appears to work fine for locking the SQLite DB via fcntl() calls.

Some remaining items:

  • The --experimental-blueprints-v2-runner option does not work.
  • Additional PHP processes spawned by the web server worker processes are not tracked or cleaned up during shutdown.
  • We need to be 100% sure we can kill all associated processes when the main Playground CLI process is killed.
    • As an additional measure, maybe we can have the child processes proactively exit if the parent process no longer exists.
  • Sometimes file-locking.spec.ts tests hang in the afterAll cleanup stage.
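
The "proactively exit if the parent no longer exists" idea could be sketched roughly as follows. This is only an illustration: `isProcessAlive` and `SignalSender` are hypothetical names, not code from this PR. It relies on the documented Node.js behavior that `process.kill(pid, 0)` delivers no signal and only tests whether the process exists. The signal sender is injected so the check can be exercised without real processes.

```typescript
// Hypothetical helper type: a function that sends `signal` to `pid` and
// throws an error with a `code` property on failure (like process.kill).
type SignalSender = (pid: number, signal: number) => void;

// Returns true if the process identified by `pid` still appears to exist.
function isProcessAlive(pid: number, sendSignal: SignalSender): boolean {
  try {
    // Signal 0 performs the existence/permission check without signaling.
    sendSignal(pid, 0);
    return true;
  } catch (error) {
    // EPERM means the process exists but we may not signal it.
    // Anything else (typically ESRCH) is treated as "gone".
    return (error as { code?: string }).code === 'EPERM';
  }
}

// Possible use inside a worker process (assumption, shown as a comment so
// the sketch stays free of Node-specific globals):
//
//   const parentPid = process.ppid;
//   setInterval(() => {
//     const alive = isProcessAlive(parentPid, (pid, sig) => process.kill(pid, sig));
//     if (!alive) process.exit(1);
//   }, 1000);
```

Polling is a fallback only; it would complement, not replace, explicit cleanup by the main Playground CLI process.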

More notes coming...

Testing Instructions (or ideally a Blueprint)

CI

@brandonpayton
Member Author

We can do this with a FileLockManagerForPosix and a FileLockManagerForWindows and fall back to the FileLockManagerForNode if the native locking API is not available.

I'm working on the implementation for both. The main things that require care are:

  • Treat zero-length ranges as extending to the end of the file
  • Release locks when a related file descriptor is closed
  • Release locks when the process exits (or in this case, when a PHP request is completed)
  • Implement fcntl() semantics
    • Release fcntl() locks when any file descriptor for the locked file is closed by the locking process.
    • Locked ranges can be unlocked or merged piece by piece. In contrast, Windows locking requires that an unlock range corresponds exactly to the locked range.
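
As a small illustration of the first item above, here is one way the zero-length rule might be normalized before talking to a native API. `LockRequest`, `ByteRange`, and `normalizeRange` are hypothetical names chosen for this sketch, and it assumes non-negative offsets and lengths for simplicity.

```typescript
// Hypothetical request shape, loosely modeled on the l_start / l_len
// fields that fcntl() callers provide.
interface LockRequest {
  start: number;
  length: number; // 0 means "from start to the end of the file"
}

// Half-open byte range [start, end). `end` is Infinity for "to end of file".
interface ByteRange {
  start: number;
  end: number;
}

function normalizeRange(request: LockRequest): ByteRange {
  // Simplification for this sketch: negative values are rejected outright.
  if (request.start < 0 || request.length < 0) {
    throw new RangeError('start and length must be non-negative');
  }
  // A zero-length request covers everything from `start` onward.
  const end = request.length === 0 ? Infinity : request.start + request.length;
  return { start: request.start, end };
}
```

A native implementation would still have to translate the open-ended `end` into whatever the platform expects, for example the largest representable length.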

For Posix, we can keep it fairly simple for fcntl() by keeping track of which files a process has locked via fcntl() and then unlocking the entire range via fcntl() when locks need to be released.

For Windows, implementing fcntl() semantics is more complicated. We'll have to maintain a collection of which ranges are locked per file in order to be able to unlock those ranges. If a caller wants to unlock part of a range, we'll have to unlock the entire range and then obtain locks for the remaining portions of the original locked range. For shared locks, we can obtain the new ranges before releasing the original range, but for exclusive ranges, we'll have to release the original range before attempting to obtain locks on the remaining ranges. (This is according to a Google answer about whether Windows allows overlapping exclusive locks by the same process.)
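
To make the partial-unlock bookkeeping concrete, here is a rough sketch of the range arithmetic and the ordering described above. All names (`ByteRange`, `remainderAfterUnlock`, `planPartialUnlock`) are hypothetical, ranges are half-open, and the sketch only plans the steps rather than calling any native API.

```typescript
interface ByteRange {
  start: number;
  end: number; // exclusive
}

type LockKind = 'shared' | 'exclusive';
type Step = { op: 'lock' | 'unlock'; range: ByteRange };

function overlaps(a: ByteRange, b: ByteRange): boolean {
  return a.start < b.end && b.start < a.end;
}

// The pieces of `locked` that must stay locked once `unlock` is removed.
function remainderAfterUnlock(locked: ByteRange, unlock: ByteRange): ByteRange[] {
  if (!overlaps(locked, unlock)) return [locked];
  const remaining: ByteRange[] = [];
  if (unlock.start > locked.start) {
    remaining.push({ start: locked.start, end: unlock.start });
  }
  if (unlock.end < locked.end) {
    remaining.push({ start: unlock.end, end: locked.end });
  }
  return remaining;
}

// Orders the native calls: shared locks are re-acquired before the original
// range is released; exclusive locks are released first.
function planPartialUnlock(kind: LockKind, locked: ByteRange, unlock: ByteRange): Step[] {
  if (!overlaps(locked, unlock)) return [];
  const relock = remainderAfterUnlock(locked, unlock).map(
    (range): Step => ({ op: 'lock', range })
  );
  const release: Step = { op: 'unlock', range: locked };
  return kind === 'shared' ? [...relock, release] : [release, ...relock];
}
```

For the exclusive case there is a window between the release and the re-lock in which another process could take the range, so a real implementation would need to decide how to handle a failed re-lock.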

The good news is that we are already tracking locked ranges in the FileLockManagerForNode. The work for the Windows locks shouldn't be that different.

cc @adamziel

@brandonpayton brandonpayton force-pushed the playground-cli/move-native-locking-into-workers branch from 60a6f0b to 6023b7d Compare December 9, 2025 18:10
@brandonpayton
Member Author

I roughed out native FileLockManagers for POSIX and Windows, but they are as yet untested. Tomorrow, I plan to start by adapting native locking tests and testing these new classes.

@adamziel
Collaborator

The prerequisites for this one seem to be mostly in place. The CLI spawn handler now creates a new OS process for any spawned PHP subprocess. The request handler still uses multiple PHP instances, but that can be tuned down by adding maxPhpInstances: 1 to every bootWordPress* call in worker-thread-v*.ts files, which we could do in this PR.

In #3014, I'm exploring a CI stress test to confirm multiple workers are indeed used for handling concurrent requests.

@brandonpayton
Member Author

One note:
The autologin cookie cleanup test is failing because that cookie cleanup isn't happening at the moment. That needs to be fixed.

It's a bit more awkward because, with a cluster of workers, there is no one place that can judge which request is the first (so we know the autologin-has-happened cookie cannot be from the current Playground CLI session).

@adamziel
Collaborator

adamziel commented Dec 19, 2025

Ultimately, to land this PR, we'll need to always run multiple php-wasm worker processes.

Sounds good and makes sense. Should we have a lower bound on the number of worker processes?

> It's a bit more awkward because, with a cluster of workers, there is no one place that can judge which request is the first (so we know the autologin-has-happened cookie cannot be from the current Playground CLI session).

Maybe doing everything in PHP is not useful here? What do you think about moving some of the logic to start-server.ts where we know which request comes in first? We'd still need to keep it functional in the browser version but that's okay.

@brandonpayton
Member Author

> Should we have a lower bound on the number of worker processes?

Yes! In single-worker mode, we have a default maximum of 5 php-wasm instances at a time.

Let's start our lower bound at 5 instances and see how it goes.

@brandonpayton
Member Author

> Maybe doing everything in PHP is not useful here? What do you think about moving some of the logic to start-server.ts where we know which request comes in first? We'd still need to keep it functional in the browser version but that's okay.

The previous logic for clearing the autologin-has-happened cookie was in start-server.ts, but start-server.ts no longer runs in just one place:

Every php-wasm worker process in the cluster calls startServer() to listen on the same port. It's how the cluster works, allowing each member to listen on the same port and coordinating which member gets the next request.

Maybe we can do something with a lock file. Will see :)

@brandonpayton
Member Author

To fix auto-login even in the presence of a previous auto-login-has-happened cookie, I did the following:

  • Initialized the Playground with random UUID as the auto-login session ID.
  • Set the playground_auto_login_already_happened cookie with that session ID.
  • If auto-login is enabled and the playground_auto_login_already_happened cookie value does not match the current auto-login session ID, login again.

This means that any of the php-wasm workers can handle auto-login, and we no longer need to know whether a request is the first one in order to remove a stale playground_auto_login_already_happened cookie.
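
The check described above might look something like the sketch below. The cookie name comes from this comment; `shouldAutoLogin`, its parameters, and the plain-object cookie map are assumptions made for illustration, and the session ID is passed in rather than generated so the decision stays a pure function.

```typescript
const AUTO_LOGIN_COOKIE = 'playground_auto_login_already_happened';

// Hypothetical helper: decides whether a request should trigger auto-login.
// `sessionId` is the random ID chosen when this Playground session started.
function shouldAutoLogin(
  autoLoginEnabled: boolean,
  cookies: { [name: string]: string | undefined },
  sessionId: string
): boolean {
  if (!autoLoginEnabled) return false;
  // A missing cookie, or one left over from an earlier session, both mean
  // auto-login has not happened yet for the current session.
  return cookies[AUTO_LOGIN_COOKIE] !== sessionId;
}
```

Because the decision depends only on the request's cookies and the shared session ID, every worker reaches the same answer without coordinating with the others.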

Since each worker process has a single php-wasm instance, we need other workers to complete WP boot and install now.

To make this happen, I've tried to:

  1. Initialize all workers with the same bootRequestHandler process
  2. Pick one worker for booting WordPress once all workers are running and listening for HTTP requests.
  3. Wire up a callback from the WP boot process so all workers mount post-WP-install mounts immediately after WP is installed.

@brandonpayton
Member Author

@adamziel I am trying to build an instrumented Blueprints v2 phar to help debug Playground CLI errors with Blueprints v2, but I am encountering errors when running composer run build-php-toolkit-phar and composer run build-blueprints-phar.

I've focused on Blueprints v1 first, but it's important to prove that the latest changes will be workable with Blueprints v2 as well.

The first issue with the build is that box is not found. If I try to resolve that by installing humbug/box either within the project or globally, I encounter an error like:

PHP Fatal error:  Declaration of KevinGH\Box\Composer\CompilerPsrLogger::log($level, Stringable|string $message, array $context = []): void must be compatible with Psr\Log\LoggerInterface::log($level, $message, array $context = []) in /Users/brandon/src/php-toolkit/vendor/humbug/box/src/Composer/CompilerPsrLogger.php on line 31

Fatal error: Declaration of KevinGH\Box\Composer\CompilerPsrLogger::log($level, Stringable|string $message, array $context = []): void must be compatible with Psr\Log\LoggerInterface::log($level, $message, array $context = []) in /Users/brandon/src/php-toolkit/vendor/humbug/box/src/Composer/CompilerPsrLogger.php on line 31

I tried with PHP 8.4, 8.2, and 8.1 in that order. Maybe there is something I am missing from the docs.

How would you recommend building blueprints.phar?

@adamziel
Collaborator

@brandonpayton You may need PHP 8.0, that's what the GitHub action uses.

@brandonpayton
Member Author

> @brandonpayton You may need PHP 8.0, that's what the GitHub action uses.

Thanks, @adamziel! PHP 8.0 from the shivammathur/php homebrew tap worked. Strangely, the release workflow in the php-toolkit appears to use PHP 8.2, but it definitely did not work for me for building.

@brandonpayton
Member Author

I spent some time today wrestling with the failing tests. The next thing to do is to update the description for this PR to make the remaining work clearer to myself and others. There will probably be ways folks can help if they have time.

@brandonpayton
Member Author

brandonpayton commented Jan 6, 2026

I updated the description with some notes and plan to add more.

I am participating in a meetup over the next two weeks but plan to keep pushing on this one. @adamziel, if you have any time, I would love some help with getting Blueprints v2 working here. IIRC, it is failing during the WP install check, but I haven't found why.

This PR is still a bit rough... I haven't had a chance to clean up after a bunch of broad sketching, but the main server command is working:
npx nx unbuilt-asyncify playground-cli -- server

@brandonpayton
Member Author

From the updated PR description:

We can solve this a couple of ways:

  1. Update the centralized file locking to use the native file handles from each worker thread
  2. Adopt true process separation for php-wasm workers
    a. Stop tracking locks in the main thread
    b. Rely completely upon native OS file locking

And:

This PR implements option 2 - Adopt true process separation for php-wasm workers.

If we can get this to work on all supported native platforms, it is a simpler option because it does not involve an intermediate layer where we try to accurately reimplement fcntl() and flock() semantics.

Instead, we make a best effort to map fcntl() and flock() calls to native OS locking APIs.

We cannot perfectly implement fcntl() semantics with the Windows LockFileEx() API, but it appears to work fine for locking the SQLite DB via fcntl() calls.

@adamziel As I've come back to this work after the holiday and some illness, something has been nagging at the back of my mind.

This PR is taking quite a while to wrangle, especially the move to multiple processes. I think maybe we should explore option 1 before continuing to wrestle here.

Based on what I've seen so far, option 2 (process separation) is doable, but it is a larger change that will require more effort to make sure we clean up all processes, including additional php-wasm processes spawned for proc_open().

To try option 1, what I would do is:

  • Continue using the central file lock manager for now.
  • Update php-wasm calls to pass the native file descriptor/handle to the central file lock manager.
  • Update the central file lock manager to use the native file descriptor for native locking rather than opening its own native file descriptor based on the provided path.
  • The central file lock manager would continue managing file locking across php-wasm instances.
  • The central file lock manager would leverage the native file lock manager for the current platform, only granting a lock when able to obtain a corresponding native lock.
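
A very rough sketch of how option 1 could fit together is shown below. None of these names come from the codebase: `NativeLocker`, `CentralLockManager`, `owner`, and `fileId` are hypothetical, and the native layer is reduced to a single `tryLock` call on a caller-supplied descriptor. The point is only the ordering: check for conflicts between php-wasm instances first, then grant the lock only if the native lock on the worker's own descriptor succeeds.

```typescript
type LockKind = 'shared' | 'exclusive';

interface ByteRange {
  start: number;
  end: number; // exclusive
}

// Hypothetical wrapper around the platform's native locking API.
interface NativeLocker {
  tryLock(fd: number, kind: LockKind, range: ByteRange): boolean;
}

interface HeldLock {
  owner: string; // identifies the php-wasm instance holding the lock
  fileId: string; // identifies the locked file
  kind: LockKind;
  range: ByteRange;
}

class CentralLockManager {
  private native: NativeLocker;
  private held: HeldLock[] = [];

  constructor(native: NativeLocker) {
    this.native = native;
  }

  tryLock(owner: string, fileId: string, fd: number, kind: LockKind, range: ByteRange): boolean {
    // Locks conflict only across different owners, on overlapping ranges,
    // when at least one side is exclusive.
    const conflict = this.held.some(
      (lock) =>
        lock.fileId === fileId &&
        lock.owner !== owner &&
        lock.range.start < range.end &&
        range.start < lock.range.end &&
        (lock.kind === 'exclusive' || kind === 'exclusive')
    );
    if (conflict) return false;

    // Use the descriptor passed in by the worker instead of opening a new one.
    if (!this.native.tryLock(fd, kind, range)) return false;

    this.held.push({ owner, fileId, kind, range });
    return true;
  }
}
```

Unlocking, release on close, and release at the end of a PHP request are left out here; they would need the same range tracking discussed earlier in this thread.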

I don't love the central file lock manager. I would rather rely totally on the OS, but I think we might be able to ship this faster with fewer changes with option 1. I could create a PoC next to establish some confidence. And shipping option 1 wouldn't prevent us from moving to option 2 (process separation) in the future.

What do you think?

@adamziel
Collaborator

adamziel commented Jan 6, 2026

Thank you for bringing this up Brandon! Just to be sure - we'd still use multiple worker processes, right? So we'd get nearly all the speed benefits. If yes, then let's start with the central lock manager, good idea.

@adamziel
Collaborator

adamziel commented Jan 6, 2026

Actually.. can we really pass the file descriptor to the central lock manager? We'd have to do that across the process boundary, right?

@brandonpayton
Member Author

> Thank you for bringing this up Brandon! Just to be sure - we'd still use multiple worker processes, right? So we'd get nearly all the speed benefits. If yes, then let's start with the central lock manager, good idea.

@adamziel We would have multiple worker threads, not separate processes. At some point in this exploration, I mistakenly thought that worker threads translated to actual separate processes, but this was wrong. Node.js worker threads are part of the original process.

This would give us multiple worker threads but under a single OS process.

> Actually.. can we really pass the file descriptor to the central lock manager? We'd have to do that across the process boundary, right?

While we have a single process, we can pass file descriptors across thread boundaries within that single process.

I think this is worth a try and will go ahead and make a PoC unless I hear otherwise from you.

Thanks for your feedback!

@adamziel
Collaborator

adamziel commented Jan 7, 2026

Sounds good! I understand we can still account for the nuances of locking a path vs. an fd, and allow the same PHP runtime to acquire multiple overlapping locks while not allowing other runtimes to do that, regardless of the worker they run in.
