[CLI] Move native file locking into workers #2997

brandonpayton · 2025-12-08T22:05:26Z

Motivation for the change, related issues

To safely run multiple workers on Windows, we need real, native file locking to prevent database corruption. File locking on Windows is "mandatory": it is enforced by the OS. In contrast, traditional file locking APIs on POSIX-like systems are "advisory" and are not enforced by the OS.

Windows will prevent another process from writing to a locked file and can even prevent the process that owns an exclusive lock from writing to the locked file through a different file handle.

This breaks how we currently handle native locking via the main thread. In today's model:

All php-wasm workers request locks from the main thread.
The main thread obtains a native lock by opening its own native file handle for the file.

In Windows, this does not work because the main thread cannot obtain an exclusive lock for a file handle when the process already has another file handle open for the same file.

We can solve this a couple of ways:

1. Update the centralized file locking to take the native file handles from each worker thread
2. Adopt true process separation for php-wasm workers
a. Stop tracking locks in the main thread
b. Rely completely upon native OS file locking

Implementation details

This PR implements option 2 - Adopt true process separation for php-wasm workers.

If we can get this to work on all supported native platforms, it is a simpler option because it does not involve an intermediate layer where we try to accurately reimplement fcntl() and flock() semantics.

Instead, we make a best effort to map fcntl() and flock() calls to native OS locking APIs.

We cannot perfectly implement fcntl() semantics with the Windows LockFileEx() API, but it appears to work fine for locking the SQLite DB via fcntl() calls.
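
As a rough illustration of the best-effort mapping, a flock() operation could be translated into LockFileEx() flags along these lines. The flock() constants use the conventional Linux and macOS values, and the LOCKFILE_* values are the ones defined for the Win32 API. The helper name flockToWindowsRequest and its return shape are hypothetical and are not taken from this PR.

```typescript
// flock() operation bits (conventional values on Linux and macOS).
const LOCK_SH = 1;
const LOCK_EX = 2;
const LOCK_NB = 4;
const LOCK_UN = 8;

// dwFlags bits accepted by the Win32 LockFileEx() function.
const LOCKFILE_FAIL_IMMEDIATELY = 0x1;
const LOCKFILE_EXCLUSIVE_LOCK = 0x2;

// Hypothetical description of the native call we would make on Windows.
type WindowsLockRequest =
	| { kind: 'lock'; flags: number }
	| { kind: 'unlock' };

// Hypothetical helper: translate a flock() operation into a Windows request.
function flockToWindowsRequest(operation: number): WindowsLockRequest {
	if (operation & LOCK_UN) {
		// Unlocking maps to UnlockFileEx(), which takes no lock flags.
		return { kind: 'unlock' };
	}
	if (!(operation & (LOCK_SH | LOCK_EX))) {
		throw new Error('flock() operation must include LOCK_SH, LOCK_EX, or LOCK_UN');
	}
	let flags = 0;
	if (operation & LOCK_EX) {
		// A shared lock is the LockFileEx() default, so only EX needs a flag.
		flags |= LOCKFILE_EXCLUSIVE_LOCK;
	}
	if (operation & LOCK_NB) {
		// Non-blocking requests should fail right away instead of waiting.
		flags |= LOCKFILE_FAIL_IMMEDIATELY;
	}
	return { kind: 'lock', flags };
}
```

This covers only the whole-file flock() case; fcntl() range locks need extra bookkeeping because the two APIs differ in how ranges are unlocked.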

Some remaining items:

The --experimental-blueprints-v2-runner option does not work.
Additional PHP processes spawned by the web server worker processes are not tracked or cleaned up during shutdown.
We need to be 100% sure we can kill all associated processes when the main Playground CLI process is killed.
- As an additional measure, maybe we can have the child processes proactively exit if the parent process no longer exists.
Sometimes file-locking.spec.ts tests hang in the afterAll cleanup stage.
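
One possible shape for the "exit if the parent no longer exists" idea from the list above is sketched here. It is only an assumption about how such a check could look: createOrphanCheck is a hypothetical helper, and the liveness probe is injected so the sketch stays platform-neutral. In Node.js the probe could be built on process.kill(parentPid, 0), which throws with code ESRCH when the target process does not exist.

```typescript
// Hypothetical helper: returns a function that a child process can call
// periodically (for example from a timer). The first time the parent is
// found to be gone, `onOrphaned` runs exactly once.
function createOrphanCheck(
	isParentAlive: () => boolean,
	onOrphaned: () => void
): () => boolean {
	let orphaned = false;
	return () => {
		if (!orphaned && !isParentAlive()) {
			orphaned = true;
			// In a real worker this callback would release locks and exit.
			onOrphaned();
		}
		return orphaned;
	};
}
```

A child would call the returned function on an interval and let onOrphaned trigger its own shutdown.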

More notes coming...

Testing Instructions (or ideally a Blueprint)

CI

brandonpayton · 2025-12-09T05:33:28Z

We can do this with a FileLockManagerForPosix and a FileLockManagerForWindows and fall back to the FileLockManagerForNode if the native locking API is not available.

I'm working on the implementation for both. The main things that require care are:

Treat zero-length ranges as extending to the end of the file
Release locks when a related file descriptor is closed
Release locks when the process exits (or in this case, when a PHP request is completed)
Implement fcntl() semantics
- Release fcntl() locks when any file descriptor for the locked file is closed by the locking process.
- Locked ranges can be unlocked or merged piece by piece. In contrast, Windows locking requires that an unlock range corresponds exactly to the locked range.
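
The zero-length rule in the list above can be sketched as a small normalization step. This is a minimal illustration under stated assumptions: ByteRange, END_OF_FILE, and toByteRange are hypothetical names, ranges are treated as half-open [start, end), and negative lengths (which POSIX fcntl() permits) are rejected to keep the sketch short.

```typescript
// Hypothetical sentinel meaning "through the end of the file".
const END_OF_FILE = Number.MAX_SAFE_INTEGER;

// Half-open byte range: `start` is included, `end` is not.
interface ByteRange {
	start: number;
	end: number;
}

// Hypothetical helper: convert an fcntl()-style (start, length) pair into an
// explicit range, treating length 0 as "everything from start onward".
function toByteRange(start: number, length: number): ByteRange {
	if (start < 0 || length < 0) {
		throw new RangeError('This sketch only handles non-negative offsets and lengths');
	}
	return { start, end: length === 0 ? END_OF_FILE : start + length };
}
```

Normalizing early means later range comparisons do not need a special case for zero-length requests.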

For POSIX, we can keep it fairly simple for fcntl() by keeping track of which files a process has locked via fcntl() and then unlocking the entire range via fcntl() when locks need to be released.
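
A minimal sketch of that POSIX bookkeeping might look like the following. The class name FcntlLockTrackerSketch and the injected unlockWholeFile callback are hypothetical; in a real manager the callback would issue an fcntl() unlock covering the whole file.

```typescript
type Pid = number;
type Path = string;
// Hypothetical callback that unlocks the entire file for the given process.
type UnlockWholeFile = (pid: Pid, path: Path) => void;

class FcntlLockTrackerSketch {
	// Which paths each process has locked via fcntl().
	private readonly lockedPaths = new Map<Pid, Set<Path>>();
	private readonly unlockWholeFile: UnlockWholeFile;

	constructor(unlockWholeFile: UnlockWholeFile) {
		this.unlockWholeFile = unlockWholeFile;
	}

	// Remember that `pid` holds at least one fcntl() lock on `path`.
	recordLock(pid: Pid, path: Path): void {
		let paths = this.lockedPaths.get(pid);
		if (!paths) {
			paths = new Set<Path>();
			this.lockedPaths.set(pid, paths);
		}
		paths.add(path);
	}

	// Release everything `pid` holds by unlocking each file's full range.
	releaseAll(pid: Pid): void {
		const paths = this.lockedPaths.get(pid);
		if (!paths) {
			return;
		}
		paths.forEach((path) => this.unlockWholeFile(pid, path));
		this.lockedPaths.delete(pid);
	}
}
```

Because the whole range is unlocked at once, the tracker does not need to remember individual ranges on POSIX.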

For Windows, implementing fcntl() semantics is more complicated. We'll have to maintain a collection of which ranges are locked per file in order to be able to unlock those ranges. If a caller wants to unlock part of a range, we'll have to unlock the entire range and then obtain locks for the remaining portions of the original locked range. For shared locks, we can obtain the new ranges before releasing the original range, but for exclusive ranges, we'll have to release the original range before attempting to obtain locks on the remaining ranges. (This is according to a Google answer about whether Windows allows overlapping exclusive locks by the same process.)
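
The partial-unlock ordering described above can be sketched as a planning function. Everything here is illustrative: ByteRange, LockStep, remainingAfterUnlock, and planPartialUnlock are hypothetical names, ranges are half-open [start, end), and the sketch only computes the order of native calls rather than making them.

```typescript
// Half-open byte range: `start` is included, `end` is not.
interface ByteRange {
	start: number;
	end: number;
}

// One native operation the Windows manager would perform.
interface LockStep {
	op: 'lock' | 'unlock';
	range: ByteRange;
}

// Pieces of `locked` that must stay locked after `unlock` is removed.
function remainingAfterUnlock(locked: ByteRange, unlock: ByteRange): ByteRange[] {
	const remaining: ByteRange[] = [];
	if (unlock.start > locked.start) {
		remaining.push({ start: locked.start, end: Math.min(unlock.start, locked.end) });
	}
	if (unlock.end < locked.end) {
		remaining.push({ start: Math.max(unlock.end, locked.start), end: locked.end });
	}
	return remaining;
}

// Shared locks: acquire the remaining pieces first, then release the original.
// Exclusive locks: release the original first, then acquire the pieces.
function planPartialUnlock(
	locked: ByteRange,
	unlock: ByteRange,
	mode: 'shared' | 'exclusive'
): LockStep[] {
	const release: LockStep = { op: 'unlock', range: locked };
	const reacquire = remainingAfterUnlock(locked, unlock).map(
		(range): LockStep => ({ op: 'lock', range })
	);
	return mode === 'shared' ? [...reacquire, release] : [release, ...reacquire];
}
```

For an exclusive lock, this plan briefly leaves the whole range unlocked, so another process could take it before the remaining pieces are reacquired. A real implementation would need to decide how to handle that failure.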

The good news is that we are already tracking locked ranges in the FileLockManagerForNode. The work for the Windows locks shouldn't be that different.

cc @adamziel

brandonpayton · 2025-12-10T06:03:14Z

I roughed out native FileLockManagers for POSIX and Windows, but they are as yet untested. Tomorrow, I plan to start by adapting native locking tests and testing these new classes.

adamziel · 2025-12-10T17:01:57Z

The pre-requisites for this one seem to be mostly in place. The CLI spawn handler now creates a new OS process for any spawned PHP subprocess. The request handler still uses multiple PHP instances, but can be tuned down by adding maxPhpInstances: 1, to every bootWordPress* call in worker-thread-v*.ts files – which we could do in this PR.

In #3014, I'm exploring a CI stress test to confirm multiple workers are indeed used for handling concurrent requests.

brandonpayton · 2025-12-19T07:11:41Z

One note:
The autologin cookie cleanup testing is failing because that cookie cleanup isn't happening at the moment. That needs to be fixed.

It's a bit more awkward because, with a cluster of workers, there is no one place that can judge which request is the first (so we know the autologin-has-happened cookie cannot be from the current Playground CLI session).

adamziel · 2025-12-19T11:59:04Z

> Ultimately, to land this PR, we'll need to always run multiple php-wasm worker processes.

Sounds good and makes sense. Should we have a lower bound on the number of worker processes?

> It's a bit more awkward because, with a cluster of workers, there is no one place that can judge which request is the first (so we know the autologin-has-happened cookie cannot be from the current Playground CLI session).

Maybe doing everything in PHP is not useful here? What do you think about moving some of the logic to start-server.ts where we know which request comes in first? We'd still need to keep it functional in the browser version but that's okay.

brandonpayton · 2025-12-19T15:55:34Z

> Should we have a lower bound on the number of worker processes?

Yes! In single-worker mode, we have a default maximum of 5 php-wasm instances at a time.

Let's start our lower bound at 5 instances and see how it goes.

brandonpayton · 2025-12-19T16:25:02Z

> Maybe doing everything in PHP is not useful here? What do you think about moving some of the logic to start-server.ts where we know which request comes in first? We'd still need to keep it functional in the browser version but that's okay.

The previous logic for clearing the autologin-has-happened cookie was in start-server.ts, but now there is no single place where it runs:

Every php-wasm worker process in the cluster calls startServer() to listen on the same port. It's how the cluster works, allowing each member to listen on the same port and coordinating which member gets the next request.

Maybe we can do something with a lock file. Will see :)

brandonpayton · 2025-12-22T07:07:00Z

To fix auto-login even in the presence of a previous auto-login-has-happened cookie, I did the following:

Initialized the Playground with a random UUID as the auto-login session ID.
Set the playground_auto_login_already_happened cookie with that session ID.
If auto-login is enabled and the playground_auto_login_already_happened cookie value does not match the current auto-login session ID, log in again.

This means that any of the php-wasm workers can handle auto-login, and we don't need to know whether a request is the first one or not so we can remove the playground_auto_login_already_happened cookie.
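
The session ID comparison can be sketched as a pure check that any worker could run. The cookie name comes from the comment above; the function name shouldAutoLogin and the shape of the cookies argument are hypothetical, and the real check may live in PHP rather than TypeScript.

```typescript
// Cookie that records which auto-login session already logged the user in.
const AUTO_LOGIN_COOKIE = 'playground_auto_login_already_happened';

// Hypothetical helper: decide whether this request should trigger auto-login.
// `sessionId` is the random UUID generated when the Playground session starts.
function shouldAutoLogin(
	autoLoginEnabled: boolean,
	cookies: Record<string, string | undefined>,
	sessionId: string
): boolean {
	// A missing cookie or a cookie from an earlier session both mean
	// "not logged in for this session yet".
	return autoLoginEnabled && cookies[AUTO_LOGIN_COOKIE] !== sessionId;
}
```

Because the decision depends only on the request's cookie and the shared session ID, no worker needs to know whether it is handling the first request.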

Since each worker process has a single php-wasm instance, we need other workers to complete WP boot and install now. To make this happen, I've tried to:

1. Initialize all workers with the same bootRequestHandler process.
2. Pick one worker for booting WordPress once all workers are running and listening for HTTP requests.
3. Wire up a callback from the WP boot process so all workers mount post-WP-install mounts immediately after WP is installed.

brandonpayton · 2025-12-23T06:22:09Z

@adamziel I am trying to build an instrumented Blueprints v2 phar to help debug Playground CLI errors with Blueprints v2. But I am encountering errors when running composer run build-php-toolkit-phar and composer run build-blueprints-phar.

I've focused on Blueprints v1 first, but it's important to prove that the latest changes will be workable with Blueprints v2 as well.

The first issue with the build is that box is not found. If I try to resolve that by installing humbug/box either within the project or globally, I encounter an error like:

PHP Fatal error:  Declaration of KevinGH\Box\Composer\CompilerPsrLogger::log($level, Stringable|string $message, array $context = []): void must be compatible with Psr\Log\LoggerInterface::log($level, $message, array $context = []) in /Users/brandon/src/php-toolkit/vendor/humbug/box/src/Composer/CompilerPsrLogger.php on line 31

Fatal error: Declaration of KevinGH\Box\Composer\CompilerPsrLogger::log($level, Stringable|string $message, array $context = []): void must be compatible with Psr\Log\LoggerInterface::log($level, $message, array $context = []) in /Users/brandon/src/php-toolkit/vendor/humbug/box/src/Composer/CompilerPsrLogger.php on line 31

I tried with PHP 8.4, 8.2, and 8.1 in that order. Maybe there is something I am missing from the docs.

How would you recommend building blueprints.phar?

adamziel · 2025-12-23T13:07:22Z

@brandonpayton You may need PHP 8.0, that's what the GitHub action uses.

brandonpayton · 2025-12-23T18:34:40Z

> @brandonpayton You may need PHP 8.0, that's what the GitHub action uses.

Thanks, @adamziel! PHP 8.0 from the shivammathur/php homebrew tap worked. Strangely, the release workflow in the php-toolkit appears to use PHP 8.2, but it definitely did not work for me for building.

brandonpayton · 2026-01-03T05:36:46Z

I spent some time today wrestling with the failing tests. The next thing to do is to update the description for this PR to make the remaining work clearer to myself and others. There will probably be ways folks can help if they have time.

brandonpayton · 2026-01-06T03:50:37Z

I updated the description with some notes and plan to add more.

I am participating in a meetup over the next two weeks but plan to keep pushing on this one. @adamziel, if you have any time, I would love some help with getting Blueprints v2 working here. IIRC, it is failing during the WP install check, but I haven't found why.

This PR is still a bit rough... I haven't had a chance to clean up after a bunch of broad sketching, but the main server command is working:
npx nx unbuilt-asyncify playground-cli -- server

brandonpayton · 2026-01-06T04:43:19Z

From the updated PR description:

> We can solve this a couple of ways:
>
> 1. Update the centralized file locking to use the native file handles from each worker thread
>
> 2. Adopt true process separation for php-wasm workers
> a. Stop tracking locks in the main thread
> b. Rely completely upon native OS file locking

And:

> This PR implements option 2 - Adopt true process separation for php-wasm workers.
>
> If we can get this to work on all supported native platforms, it is a simpler option because it does not involve an intermediate layer where we try to accurately reimplement fcntl() and flock() semantics.
>
> Instead, we make a best effort to map fcntl() and flock() calls to native OS locking APIs.
>
> We cannot perfectly implement fcntl() semantics with the Windows LockFileEx() API, but it appears to work fine for locking the SQLite DB via fcntl() calls.

@adamziel As I've come back to this work after the holiday and some illness, something has been nagging at the back of my mind.

This PR is taking quite a while to wrangle, especially the move to multiple processes. I think maybe we should explore option 1 before continuing to wrestle here.

Based on what I've seen so far, option 2 (process separation) is doable, but it is a larger change that will require more effort to make sure we clean up all processes, including additional php-wasm processes spawned for proc_open().

To try option 1, what I would do is:

Continue using the central file lock manager for now.
Update php-wasm calls to pass the native file descriptor/handle to the central file lock manager.
Update the central file lock manager to use the native file descriptor for native locking rather than opening its own native file descriptor based on the provided path.
The central file lock manager would continue managing file locking across php-wasm instances.
The central file lock manager would leverage the native file lock manager for the current platform, only granting a lock when able to obtain a corresponding native lock.
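
The option 1 steps above can be sketched roughly as follows, limited to whole-file locks. This is only an illustration of the "grant a lock only when the native lock succeeds" rule: CentralLockManagerSketch, TryNativeLock, and the instance ID parameter are hypothetical names, and the native locking call is injected rather than tied to any real library.

```typescript
type Fd = number;
type InstanceId = number;
// Hypothetical native call: try to lock the file behind `fd` without blocking.
type TryNativeLock = (fd: Fd, exclusive: boolean) => boolean;

interface WholeFileLock {
	exclusive: boolean;
	owners: Set<InstanceId>;
}

class CentralLockManagerSketch {
	private readonly locks = new Map<string, WholeFileLock>();
	private readonly tryNativeLock: TryNativeLock;

	constructor(tryNativeLock: TryNativeLock) {
		this.tryNativeLock = tryNativeLock;
	}

	// `fd` is the native descriptor passed in by the php-wasm instance, so the
	// manager never opens its own handle for `path`.
	lockWholeFile(path: string, fd: Fd, owner: InstanceId, exclusive: boolean): boolean {
		const current = this.locks.get(path);
		if (current) {
			const heldByOthers = current.owners.size > 1 || !current.owners.has(owner);
			// Conflicts between php-wasm instances are decided centrally.
			if (heldByOthers && (exclusive || current.exclusive)) {
				return false;
			}
		}
		// Only grant the lock if the OS grants the native lock too.
		if (!this.tryNativeLock(fd, exclusive)) {
			return false;
		}
		if (current) {
			current.owners.add(owner);
			current.exclusive = exclusive;
		} else {
			this.locks.set(path, { exclusive, owners: new Set([owner]) });
		}
		return true;
	}
}
```

Range locks, unlocking, and release-on-close are left out to keep the sketch short.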

I don't love the central file lock manager. I would rather rely totally on the OS, but I think we might be able to ship this faster with fewer changes with option 1. I could create a PoC next to establish some confidence. And shipping option 1 wouldn't prevent us from moving to option 2 (process separation) in the future.

What do you think?

adamziel · 2026-01-06T16:06:32Z

Thank you for bringing this up Brandon! Just to be sure - we'd still use multiple worker processes, right? So we'd get nearly all the speed benefits. If yes, then let's start with the central lock manager, good idea.

adamziel · 2026-01-06T16:07:31Z

Actually.. can we really pass the file descriptor to the central lock manager? We'd have to do that across the process boundary, right?

brandonpayton · 2026-01-06T18:44:13Z

> Thank you for bringing this up Brandon! Just to be sure - we'd still use multiple worker processes, right? So we'd get nearly all the speed benefits. If yes, then let's start with the central lock manager, good idea.

@adamziel We would have multiple worker threads, not separate processes. At some point in this exploration, I mistakenly thought that worker threads translated to actual separate processes, but this was wrong. Node.js worker threads are part of the original process.

This would give us multiple worker threads but under a single OS process.

> Actually.. can we really pass the file descriptor to the central lock manager? We'd have to do that across the process boundary, right?

While we have a single process, we can pass file descriptors across thread boundaries within that single process.

I think this is worth a try and will go ahead and make a PoC unless I hear otherwise from you.

Thanks for your feedback!

adamziel · 2026-01-07T16:25:35Z

Sounds good! I understand we can still account for the nuances of locking a path vs fd and allowing the same PHP runtime to acquire multiple overlapping locks while not allowing other runtimes to do that, regardless of the worker they run in
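
The rule mentioned here, where one PHP runtime may hold overlapping locks but other runtimes may not, can be expressed as a small conflict check. The RangeLock shape and the conflicts function are hypothetical and only illustrate the rule; ranges are half-open [start, end).

```typescript
// A lock held or requested by one PHP runtime over a half-open byte range.
interface RangeLock {
	owner: number; // hypothetical ID of the PHP runtime
	start: number;
	end: number;
	exclusive: boolean;
}

// Two locks conflict only when they belong to different runtimes, overlap,
// and at least one of them is exclusive.
function conflicts(held: RangeLock, requested: RangeLock): boolean {
	if (held.owner === requested.owner) {
		// The same runtime may stack overlapping locks on a file.
		return false;
	}
	const overlaps = held.start < requested.end && requested.start < held.end;
	return overlaps && (held.exclusive || requested.exclusive);
}
```

The check is keyed on the runtime rather than on the worker, so it gives the same answer regardless of which worker a runtime runs in.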

WIP: Make separate FileLockManagers for native locking

f32a8c6

brandonpayton self-assigned this Dec 8, 2025

brandonpayton added the "[Type] Bug" (An existing feature does not function as intended), "[Focus] Windows Support", "[Package][@php-wasm] Node", and "[Package][@wp-playground] CLI" labels Dec 8, 2025

brandonpayton added 6 commits December 8, 2025 20:33

Rename releaseLocksForProcessFd to releaseLocksOnFdClose

f4b7bd1

Restore tests that were accidentally left commented out in project.json

aac607e

Fix type errors in file-lock-manager-for-node tests

a6a2d25

Skip native file locking tests because they'll be moved to POSIX and Windows FileLockManager tests

872e8a3

Fix posix manager class name

69302cf

Add tracking and cleanup for POSIX-native whole file locks

f4963d6

Implement release-on-close and release-on-exit for FileLockManagerForPosix

6023b7d

brandonpayton force-pushed the playground-cli/move-native-locking-into-workers branch from 60a6f0b to 6023b7d December 9, 2025 18:10

brandonpayton added 8 commits December 9, 2025 23:40

Explain Path, Pid, and Fd types

d65feca

Use Path type instead of string in POSXI lock manager

8a02e76

Implement whole-file locking for Windows

2319613

Add POSIX lock manager TODO

d713beb

Cleanup relevant lock records on FD close for POSIX

47a4c1c

Implement whole-file lock cleanup on FD close and process exit

33d8d8f

Update POSIX lock manager to treat zero length ranges as covering entire remaining range

222273d

Implement initial fcntl() for Windows along with cleanup on FD close and process exit

00668c4

Address typechecking errors

9ab68d9

brandonpayton added 4 commits December 10, 2025 12:27

Update fs-ext-extra-prebuilt

7c22502

Use fcntlSync() with start/end params

677bca0

Declare tests for FileLockManager for Windows and POSIX

0d08953

Rename FileLockManagerForNode to FileLockManagerInMemory

f29e544

brandonpayton added 5 commits December 20, 2025 17:25

Always create multiple, single-instance php-wasm workers

4a3ea2c

Update parseOptionsAndRunCLI() to return RunCLIServer object

6943423

Start on Blueprint v2 worker server

50e408e

Fix type errors

66a5a87

Make sure auto-login works with each new Playground session

6c77a74

brandonpayton added 4 commits December 22, 2025 02:13

Fix auto-login tests

c3ee20d

Require fs-ext-extra-prebuilt

2cc11f7

Adjust how post-install mounts are handled after blueprint.target_resolved

036dbdb

brandonpayton added 2 commits January 2, 2026 19:16

Update some TODOs

3dacde6

Minor test fix and some TODO updates

c4d8a9b

brandonpayton added 4 commits January 5, 2026 13:34

Fix worker events test

4381601

Preserve worker-thread-vX.ts naming

9b7b5c7

Use alternate port to avoid conflicts with parallel Playground CLI tests

e57c4fd

Fix some incorrect worker paths

6e63b03


4 participants
