Skip to content

Conversation

@roccomoretti
Copy link
Member

At the recent Bootcamp, several people (namely @LouisaMe09 and @zyajahuggan) expressed interest in the ability to create scorefile info from PyRosetta. Add a convenience function which allows easy creation of JD2-like scorefiles through the PyRosetta interface. (pyrosetta.poses_to_scorefile())

I've also added a function (pyrosetta.io.get_scorefile_info()) which gets what would be reported to the scorefile as a Python dictionary.

Additionally, this PR also cleans up some of the scorefile writing interface at the C++ level.

At the recent Bootcamp, several people expressed interest in the ability to create scorefile info from PyRosetta.
Add a convience function which allows easy creation of JD2-like scorefiles through the PyRosetta interface.
@roccomoretti roccomoretti requested a review from lyskov November 11, 2025 21:17
@roccomoretti roccomoretti added ready_for_review This PR is ready to be reviewed and merged. 90 standard tests labels Nov 11, 2025
@weitzner
Copy link
Member

IIRC, there are similar functions in the pyrosetta.distributed module. May be nice to make sure they do the same thing/call the same code under the hood.

@roccomoretti
Copy link
Member Author

IIRC, there are similar functions in the pyrosetta.distributed module. May be nice to make sure they do the same thing/call the same code under the hood.

From what I can tell from a quick grep through the code, the use case for pyrosetta.distributed scorefile handling is quite a bit more complicated, and doesn't seem to be aimed at a simple scorefile output. I'm not sure how much overlap there is. (Thoughts, @klimaj ?)

Comment on lines +100 to +101
lenghts = lengths[-n_score:]
self.assertEqual( min(lenghts), max(lengths) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a typo for lengths

Comment on lines +140 to +141
pyrosetta.rosetta.core.pose.setPoseExraScore(pose_clone, "extra_real", 3.14159 )
pyrosetta.rosetta.core.pose.setPoseExraScore(pose_clone, "extra_string", "TAG_VALUE" )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be setPoseExtraScore, just missing the t

@lyskov
Copy link
Member

lyskov commented Nov 12, 2025

Probably a bit outside the exact scope of this PR, but it feels like a good moment to bring up:

One thing I’ve repeatedly found myself doing over the years is parsing Rosetta score files into a more structured format—usually JSON. Maybe when we write a score file, we could also write a companion .json file containing the same data in a dictionary-style structure (e.g., top-level keys as structure names, each mapping to a dict of score terms → values).

Thoughts?

@klimaj
Copy link
Member

klimaj commented Nov 12, 2025

There are quite a few mechanisms to get a scores dictionary in the pyrosetta.distributed module, but I don't think they actually dump them to disk (which can easily be accomplished with the json module, but this PR intends to also support JD2-style scorefiles, so I think it's a bit different). But also note that the pyrosetta.distributed methods are obtaining scores from the (recently added) Pose.cache dictionary, whereas this PR is obtaining scores from:

  1. rosetta.protocols.jd2.get_string_real_pairs_from_current_job()
  2. rosetta.protocols.jd2.get_string_string_pairs_from_current_job()
  • These are really good to see added, because existing scorefile machinery does not include this data, to which some protocols [if I'm not mistaken, like InterfaceAnalyzerMover and ShapeComplementarityFilter] write data
  1. rosetta.core.io.raw_data.ScoreMap.add_arbitrary_score_data_from_pose()
  2. rosetta.core.io.raw_data.ScoreMap.add_arbitrary_string_data_from_pose()
  • I do not recommend using these methods, since the data is prone to being clobbered silently -- not only can data get clobbered within each method itself, but it looks like data can get clobbered in the scorefile_info() subfunction in this PR. For this reason, the Pose.cache dictionary has been developed to warn the user about any clobbered data. I've outlined the data override precedences here, here, and here.

I think this PR is similar to the idea of the PyJobDistributor's output_scorefile function, however that is using another method that is prone to clobbering.

Finally, PyRosettaCluster has it's own job distributor code, which dumps a scorefile after distributing tasks, and is somewhat dissimilar to the motive of this PR (and is more complex than the implementation in this PR).

@klimaj
Copy link
Member

klimaj commented Nov 12, 2025

As another quick comment, arbitrary python types can now be serialized into strings using the Pose.cache dictionary (for example, pose.cache["foo"] = complex(1, 2)), but this PR (in its current form) does not deserialize them using the Pose.cache machinery or by implementing the PoseScoreSerializer.maybe_decode(value) method separately.

@roccomoretti instead of retrieving data from rosetta.core.io.raw_data.ScoreMap.add_arbitrary_score_data_from_pose() and rosetta.core.io.raw_data.ScoreMap.add_arbitrary_string_data_from_pose(), you may want to consider retrieving all data from Pose.cache in a single call, which retrieves that data with appropriate clobber warnings and does data deserialization automatically. Also note that your implementation of rosetta.core.io.raw_data.ScoreMap.add_energies_data_from_scored_pose() would also not be necessary since the Pose.cache dictionary retrieves pose energy data from pose.energies().active_total_energies().items(), which I think is the same data (but you'd have to double check).

@roccomoretti
Copy link
Member Author

One thing I’ve repeatedly found myself doing over the years is parsing Rosetta score files into a more structured format—usually JSON.

Jared added the ability to make JSON-formatted scorefiles from command-line Rosetta, via the -out:file:scorefile_format json option.

This PR uses that framework to allow you to output either the conventional (default) or JSON formats (with use_json=True). There's also the convenience function to get the same data as a Python dict.

bool use_json
);

/// @brief write the given data to a scorefile, autodetecting the format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"autodetecting the format" sounds a bit misleading to me, - maybe "use command line option to determine output format"?

}

return write_scorefile(tag, score_map, string_map, use_json);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to specify options to dump both "classic" and JSON score files? Rationale is this: JSON could be used down the line as input for tools but plain text is more readable for humans.

As I mentioned in previous comments: maybe dumping JSON version unconditionally is preferable from practical point of view?

Comment on lines 61 to 71
poses_to_scorefile,
dump_file,
dump_scored_pdb,
dump_pdb,
dump_multimodel_pdb,
dump_cif,
dump_mmtf,
create_score_function,
get_fa_scorefxn,
get_score_function,
get_scorefile_info,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to import these by default? I’m open to it if we genuinely expect everyone to need them, but otherwise it’s probably better not to expand the default imports.

My hesitation comes from past experience — we’ve been burned by this approach before: over time, people start relying on these defaults, and it makes refactoring much harder.

(Also: https://xkcd.com/1172/)

except ImportError:
_skip_xz = True


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the standard lines we include in all PyRosetta tests?
They ensure that no unexpected output accumulates and that runs are deterministic.
(I know the test code already calls init(), but in this case uniformity is probably preferable to purity.)

init(extra_options = "-constant_seed")  # WARNING: option '-constant_seed' is for testing only! MAKE SURE TO REMOVE IT IN PRODUCTION RUNS!!!!!
import os; os.chdir('.test.output')

Thanks,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

90 standard tests ready_for_review This PR is ready to be reviewed and merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants