Skip to content

Conversation

@Peter554
Copy link
Collaborator

@Peter554 Peter554 commented Jan 17, 2025

This PR contains a rewrite of the import graph in rust. A lot of the code in this PR is ported from https://github.com/Peter554/pyimports.

Performance tricks in this PR:

  • I've restructured the graph. There are no strings anymore - the graph is just ints (and tokens, which are pairs of ints).
  • I've tried to reduce cloning as much as possible, especially of the entire graph.
  • I've found some methods for more efficient pathfinding.
    • Do one BFS with multiple start nodes rather than a for loop of many BFS. This is useful in e.g. find_downstream_modules, chain_exists.
    • The pathfinding algorithm allows us to dynamically exclude imports, without having to clone and mutate the entire graph. This is useful in e.g. find_shortest_chains and find_illegal_dependencies_for_layers.
  • I noticed that we were doing if module in self.modules in the python code, sometimes in a for loop. This is actually rather inefficient, since the entire graph has to be copied over from rust to a python set (every time). I added graph.contains_module as a more performant alternative. We should ideally find a way to avoid others making this same mistake. At the very least, document this. Perhaps also make it a method rather than a property e.g. graph.modules(), to show more clearly that work is being done (breaking change, but perhaps reasonable).
  • I've short circuited the implementation of find_shortest_chains and find_illegal_dependencies_for_layers in the case that the contract is kept (via an initial chain_exists check). This should give significant benefits for users running import linter in CI, since most of the time most contracts should be kept - it makes sense to optimise for this case.
  • Bidirectional shortest path algorithm.

@codspeed-hq
Copy link

codspeed-hq bot commented Jan 17, 2025

CodSpeed Performance Report

Merging #175 will degrade performances by 96.65%

Comparing Peter554:peter.rust-graph (fbeb020) with master (0523253)

Summary

⚡ 11 improvements
❌ 1 regressions
✅ 2 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
test_deep_layers_large_graph_kept 1,470.3 ms 19.2 ms ×77
test_deep_layers_large_graph_violated 1,529.7 ms 11.6 ms ×130
test_top_level_large_graph_kept 765.9 ms 48.2 ms ×16
test_top_level_large_graph_violated 785.2 ms 236.2 ms ×3.3
test_no_chain 39.7 µs 1,185.9 µs -96.65%
test_chains_found 127,986.7 µs 84.6 µs ×1,500
test_no_chains 128.7 ms 1.2 ms ×110
test_build_django_from_cache 225.3 ms 134.5 ms +67.48%
test_copy_graph 80.4 ms 43.1 ms +86.56%
test_find_descendants 72.6 ms 41.4 ms +75.21%
test_find_downstream_modules 2,488,785 µs 187.2 µs ×13,000
test_find_upstream_modules 1,828.9 ms 3.9 ms ×460

@Peter554 Peter554 changed the base branch from master to rust-graph January 17, 2025 23:52
@Peter554 Peter554 force-pushed the peter.rust-graph branch 8 times, most recently from 947e083 to 4f71bc0 Compare January 18, 2025 08:02
@Peter554 Peter554 mentioned this pull request Jan 18, 2025
@Peter554 Peter554 changed the base branch from rust-graph to master January 18, 2025 16:13
@Peter554 Peter554 force-pushed the peter.rust-graph branch 16 times, most recently from 8a78c75 to 835dded Compare January 20, 2025 15:57
@Peter554 Peter554 force-pushed the peter.rust-graph branch 8 times, most recently from ba5d947 to 2f7ce49 Compare January 22, 2025 19:45
@Peter554 Peter554 changed the base branch from master to rust-graph January 22, 2025 19:58
@Peter554 Peter554 changed the base branch from rust-graph to master January 22, 2025 19:58
@Peter554 Peter554 marked this pull request as ready for review January 22, 2025 20:03
@Peter554 Peter554 force-pushed the peter.rust-graph branch 3 times, most recently from b278283 to ebe4fdb Compare January 25, 2025 08:03
@Peter554 Peter554 force-pushed the peter.rust-graph branch 3 times, most recently from 8e62e75 to fbeb020 Compare February 2, 2025 19:46
@Peter554 Peter554 changed the base branch from master to rust-graph February 6, 2025 11:30
@seddonym seddonym changed the base branch from rust-graph to master February 6, 2025 11:31
@seddonym seddonym changed the base branch from master to rust-graph February 6, 2025 11:31
@Peter554 Peter554 force-pushed the peter.rust-graph branch 2 times, most recently from 2fe004a to 216dee2 Compare February 6, 2025 11:39
pytest-benchmark will actually determine a suitable number
of rounds+iterations automatically (typically much more than 3!).
With only 3 rounds the results are quite noisy.
Copy link
Collaborator

@seddonym seddonym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some initial comments but I've still got lots to read through and digest.


pub type GrimpResult<T> = Result<T, GrimpError>;

impl From<GrimpError> for PyErr {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking, but feels like we have a mixture of Python concerns and pure Rust concerns in this module. I think there's benefit in keeping them separate - e.g. to keep lib.rs for the py03 stuff and then everything else is pure Rust.

What do you think about moving this implementation either to exceptions.rs or to lib.rs? (Will that even compile?) We could leave the definition of GrimpError and GrimpResult here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong feeling about this one. Happy to change it.

}

impl GraphWrapper {
fn get_visible_module_by_name(&self, name: &str) -> Result<&Module, GrimpError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we have this on the GraphWrapper, rather than on the inner Graph?

I envisage GraphWrapper as purely being a translation layer from py03 concerns to pure Rust.

Copy link
Collaborator Author

@Peter554 Peter554 Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was the "visible" part that had me putting this here. Currently the methods within graph/ return modules whether they are visible or not. The filtering to visible/invisible is kept as a concern for lib.rs.

That said, we have the visible() method on ModuleIterator within graph/, so maybe the division isn't as clear as I'd imagined.

Tldr; I'm happy for it to be moved, but no strong feeling.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I think we should keep GraphWrapper focused on py03 concerns and try to move everything else into the Graph object.

use string_interner::backend::StringBackend;
use string_interner::{DefaultSymbol, StringInterner};

pub mod direct_import_queries;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for making these pub mod but the pathfinding module pub(crate) mod? Would it be better to be consistent here?

Copy link
Collaborator Author

@Peter554 Peter554 Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot. I think I was imaging that if the rust crate was published on its own then all the methods of Graph would be public (pub), but the pathfinding module would probably be internal to the crate (pub(crate)). That said, I don't think we ever envisage the rust code to be published independently, so probably it could all just be pub for consistency.


pub(crate) mod pathfinding;

lazy_static! {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO - let's talk through this so I understand the reasoning behind the approach.

token: ModuleToken,

#[getset(get_copy = "pub")]
name: DefaultSymbol,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, and the token field, are both effectively tokens right? I wonder if it is possible to combine them.

impl<'a, T: Iterator<Item = &'a ModuleToken>> ModuleTokenIterator<'a> for T {}

#[derive(Debug, Clone, PartialEq, Eq, Hash, new, Getters, CopyGetters)]
pub struct ImportDetails {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe using slightly different terminology to https://grimp.readthedocs.io/en/stable/usage.html#ImportGraph.get_import_details. I'm not massively happy with the name I chose for that method originally. But I think it might be worth coming up with a non-plural name for the struct here as it's more ergonomic. ImportMetadata?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes true your suggestion is better 👍 Let's change it.

}

pub fn get_modules(&self) -> FxHashSet<String> {
pub fn get_modules(&self) -> HashSet<String> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting I forgot about this one. I thought it was necessary since FxHashSet doesn't implement IntoPy, but presumably your code was working before so I'm now unsure 🤔

.get_modules()
.iter()
.map(|module| module.name.clone())
.all_modules()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we're renaming this method? Maybe easier to have the same name as on the wrapper?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change this - I think it's a copy over from pyimports.

._graph
.module_name_to_self_and_ancestors(module)
.into_iter()
.skip(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The skip(1) removes the self, leaving just the ancestors. I'll comment that in the code?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@seddonym seddonym merged commit cb3e1fa into python-grimp:rust-graph Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants