Skip to content

Conversation

@rainwoodman
Copy link

The old loop was O(mn). This is O(m+n).

The old loop was O(mn). This is O(m+n).
Copy link
Owner

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard for me to make decisions about this function since I don't use it myself. I except that for small address sets naive iteration can be faster (and should use less memory), but I don't know what the expected average size of the address set is.

Also the KeyError thing -- perhaps it makes sense, but old behavior was to suppress errors, and changing it this way would probably require a major version bump according to SemVer. Again, I don't know what makes more sense to users, as I'm not one.

id_to_obj = dict((id(o), o) for o in gc.get_objects())

for i in address_set:
o = id_to_obj[i]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can raise KeyError.

@rainwoodman
Copy link
Author

As there is no hard copy, the memory used by the dict should be close to 16 * number of GC objects. GC itself probably consumes as much memory as this.

I did not benchmark this in typical use cases (I only ran objgraph on small cases), though I can imagine it is difficult to find cases where O(m+n) is slower than O(mn).

For a stress test, one can probably collect new IDs with debug level of SAVEALL, and run this on the full id list.

Copy link
Owner

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, LGTM.

Would you care to add a small changelog note in CHANGES.rst?

for o in gc.get_objects():
if id(o) in address_set:
res.append(o)
id_to_obj = dict((id(o), o) for o in gc.get_objects())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest

id_to_obj = {id(o): o for o in gc.get_objects()}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

res.append(o)
id_to_obj = dict((id(o), o) for o in gc.get_objects())

for i in address_set:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest change these lines to

return [
    id_to_obj[i] 
    for i in address_set
    if i in id_to_obj
]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 except I'd put it all on a single line

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants