@seddonym
Collaborator

@seddonym seddonym commented Aug 1, 2025

Prior to this PR, we used Python's joblib library to scan imports in parallel.

This PR, enabled by the refactor of #237, moves the parallelism to Rust-based multithreading instead of doing multiprocessing with Python.

The benchmarks say it'll make it much slower, but I'm not seeing that in practice. Running this locally on a very large graph, uncached, seems to speed the build up from ~11s to ~8s (compared with Grimp's latest unyanked release, 3.9). With a fully populated cache it's about the same, which is what we'd expect since this change is limited to the uncached path.

This is good news because I had to yank the Grimp 3.10 release due to it performing unexpectedly poorly on an uncached graph - this should allow us to do another release.
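
For readers unfamiliar with the approach, here is a minimal sketch of scanning modules for imports on Rust threads instead of in separate Python processes. It is not Grimp's implementation: `scan_source` and `scan_all` are hypothetical names, the "parser" only recognises lines starting with `import `, and it uses `std::thread::scope` with one thread per module to stay dependency-free, where a real implementation would more likely use a bounded thread pool.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for a real import parser: collects the name that
/// follows `import ` on each line that starts with it.
fn scan_source(source: &str) -> Vec<String> {
    source
        .lines()
        .filter_map(|line| line.trim().strip_prefix("import "))
        .map(|rest| rest.trim().to_string())
        .collect()
}

/// Scans each module's source on its own scoped thread and gathers the
/// results into a map keyed by module name.
fn scan_all(modules: &[(&str, &str)]) -> HashMap<String, Vec<String>> {
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = modules
            .iter()
            .map(|&(name, source)| (name, s.spawn(move || scan_source(source))))
            .collect();
        // Then join them and collect each module's imports.
        handles
            .into_iter()
            .map(|(name, handle)| (name.to_string(), handle.join().unwrap()))
            .collect()
    })
}
```

Calling `scan_all(&[("a", "import b\nimport c"), ("b", "")])` returns a map in which `a` maps to `b` and `c`, and `b` maps to an empty list. Because the threads share one address space, nothing has to be pickled or copied between processes, which is one plausible reason a threaded scan can beat multiprocessing on large graphs.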

@codspeed-hq

codspeed-hq bot commented Aug 1, 2025

CodSpeed Instrumentation Performance Report

Merging #236 will degrade performances by 80.31%

Comparing concurrent-scanning (f764d6c) with main (67d1ec5)

Summary

❌ 2 (👁 2) regressions
✅ 21 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| 👁 test_build_django_from_cache_a_few_misses[350] | 297.9 ms | 740.3 ms | -59.76% |
| 👁 test_build_django_uncached | 143.4 ms | 728.1 ms | -80.31% |

@seddonym seddonym force-pushed the concurrent-scanning branch from 1d90a2a to 4fc6090 Compare August 1, 2025 13:12
@seddonym seddonym changed the title Concurrent import scanning Multithreaded import scanning Aug 1, 2025
@seddonym seddonym force-pushed the concurrent-scanning branch from 50f575b to 89359d4 Compare August 4, 2025 17:18
@seddonym seddonym force-pushed the concurrent-scanning branch from 89359d4 to f4a669a Compare August 18, 2025 16:39
@seddonym seddonym force-pushed the concurrent-scanning branch from f4a669a to f764d6c Compare August 18, 2025 16:40
@seddonym seddonym marked this pull request as ready for review August 18, 2025 17:01
@seddonym seddonym requested a review from Peter554 August 22, 2025 08:00
exclude_type_checking_imports,
);
let imports_by_module_result = py.allow_threads(|| {
scan_for_imports_no_py(
Collaborator

@Peter554 Peter554 Aug 22, 2025

> Temporarily releases the GIL, thus allowing other Python threads to run

Which other Python threads are running?

Are you sure allow_threads is needed? In the context of this commit it makes sense, but after the next commit, once joblib is removed, I'm unsure if it's still needed.

🐼 I think allow_threads is being deprecated in favor of detach.

Collaborator Author

Noted re. detach - I'll update when I update pyo3.

This was the crucial line I had to add to get things working - without it, everything was much slower - it didn't seem to be operating in parallel.

From https://pyo3.rs/main/parallelism.html

> You should always call detach in situations that spawn worker threads, but especially so in cases where worker threads need to acquire the GIL, to prevent deadlocks.
(emphasis mine)

FWIW the explanation I got from Gemini of why this is was this:

> Rayon's parallel iterators release the Global Interpreter Lock (GIL) internally, but pyo3 needs to be aware of this to ensure the Rust code can safely run in a separate thread without holding the GIL. The allow_threads block handles this by temporarily releasing the GIL, allowing Rayon to create its thread pool and execute the parallel workload. Without allow_threads, Rayon's thread spawn attempts would be blocked by the GIL, leading to deadlocks or other unexpected behavior.
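
The deadlock case that the PyO3 guide warns about can be modelled without PyO3 at all. The sketch below is only an analogy under stated assumptions: `FAKE_GIL` is a plain `Mutex` standing in for the interpreter lock, and `detached` and `total_length` are hypothetical helpers, not PyO3 or Rayon APIs. The caller starts out holding the lock, as a function invoked from Python would, and releases it before spawning workers that each need to take the lock briefly.

```rust
use std::sync::{Mutex, MutexGuard};
use std::thread;

// Stand-in for the interpreter lock. This is NOT PyO3's real GIL.
static FAKE_GIL: Mutex<()> = Mutex::new(());

/// Hypothetical analogue of a "detach" call: gives up the lock, runs `work`,
/// then takes the lock again before handing control back to the caller.
fn detached<T>(
    guard: MutexGuard<'static, ()>,
    work: impl FnOnce() -> T,
) -> (MutexGuard<'static, ()>, T) {
    drop(guard); // release the lock so worker threads can acquire it
    let result = work();
    (FAKE_GIL.lock().unwrap(), result)
}

/// Sums the lengths of `items` using one worker thread per item.
fn total_length(items: &[&str]) -> usize {
    // The caller holds the lock on entry.
    let guard = FAKE_GIL.lock().unwrap();
    let (_guard, total) = detached(guard, || {
        thread::scope(|s| {
            let handles: Vec<_> = items
                .iter()
                .map(|item| {
                    s.spawn(move || {
                        // Each worker needs the lock for a moment.
                        let _lock = FAKE_GIL.lock().unwrap();
                        item.len()
                    })
                })
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
        })
    });
    total
}
```

If the `drop(guard)` line were removed, the caller would wait on workers that are themselves waiting for the lock the caller still holds, so `total_length` would never return. This only illustrates the case where workers need the lock; it does not by itself explain the slowdown observed here when the workers run pure Rust code.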

]
requires-python = ">=3.9"
dependencies = [
"joblib>=1.3.0",
Collaborator

🎉

Collaborator

@Peter554 Peter554 left a comment

Looks great!

I'm a bit unsure if allow_threads is needed #236 (comment), but that's only minor

@seddonym seddonym merged commit 39c5c0a into main Aug 22, 2025
18 checks passed
@seddonym seddonym deleted the concurrent-scanning branch August 22, 2025 15:35