Add scorefile output convieniece function for PyRosetta #563

roccomoretti · 2025-11-11T21:17:03Z

At the recent Bootcamp, several people (namely @LouisaMe09 and @zyajahuggan) expressed interest in the ability to create scorefile info from PyRosetta. Add a convenience function which allows easy creation of JD2-like scorefiles through the PyRosetta interface. (pyrosetta.poses_to_scorefile())

I've also added a function (pyrosetta.io.get_scorefile_info()) which gets what would be reported to the scorefile as a Python dictionary.

Additionally, this PR also cleans up some of the scorefile writing interface at the C++ level.

At the recent Bootcamp, several people expressed interest in the ability to create scorefile info from PyRosetta. Add a convience function which allows easy creation of JD2-like scorefiles through the PyRosetta interface.

weitzner · 2025-11-12T15:57:38Z

IIRC, there are similar functions in the pyrosetta.distributed module. May be nice to make sure they do the same thing/call the same code under the hood.

roccomoretti · 2025-11-12T16:38:00Z

IIRC, there are similar functions in the pyrosetta.distributed module. May be nice to make sure they do the same thing/call the same code under the hood.

From what I can tell from a quick grep through the code, the use case for pyrosetta.distributed scorefile handling is quite a bit more complicated, and doesn't seem to be aimed at a simple scorefile output. I'm not sure how much overlap there is. (Thoughts, @klimaj ?)

woodsh17 · 2025-11-12T15:54:35Z

source/src/python/PyRosetta/src/test/T011_SaveIO.py

+        lenghts = lengths[-n_score:]
+        self.assertEqual( min(lenghts), max(lengths) )


Looks like a typo for lengths

woodsh17 · 2025-11-12T15:58:54Z

source/src/python/PyRosetta/src/test/T011_SaveIO.py

+        pyrosetta.rosetta.core.pose.setPoseExraScore(pose_clone, "extra_real", 3.14159 )
+        pyrosetta.rosetta.core.pose.setPoseExraScore(pose_clone, "extra_string", "TAG_VALUE" )


Should be setPoseExtraScore, just missing the t

lyskov · 2025-11-12T20:26:44Z

Probably a bit outside the exact scope of this PR, but it feels like a good moment to bring up:

One thing I’ve repeatedly found myself doing over the years is parsing Rosetta score files into a more structured format—usually JSON. Maybe when we write a score file, we could also write a companion .json file containing the same data in a dictionary-style structure (e.g., top-level keys as structure names, each mapping to a dict of score terms → values).

Thoughts?

klimaj · 2025-11-12T20:38:49Z

There are quite a few mechanisms to get a scores dictionary in the pyrosetta.distributed module, but I don't think they actually dump them to disk (which can easily be accomplished with the json module, but this PR intends to also support JD2-style scorefiles, so I think it's a bit different). But also note that the pyrosetta.distributed methods are obtaining scores from the (recently added) Pose.cache dictionary, whereas this PR is obtaining scores from:

rosetta.protocols.jd2.get_string_real_pairs_from_current_job()
rosetta.protocols.jd2.get_string_string_pairs_from_current_job()

These are really good to see added, because existing scorefile machinery does not include this data, to which some protocols [if I'm not mistaken, like InterfaceAnalyzerMover and ShapeComplementarityFilter] write data

rosetta.core.io.raw_data.ScoreMap.add_arbitrary_score_data_from_pose()
rosetta.core.io.raw_data.ScoreMap.add_arbitrary_string_data_from_pose()

I do not recommend using these methods, since the data is prone to being clobbered silently -- not only can data get clobbered within each method itself, but it looks like data can get clobbered in the scorefile_info() subfunction in this PR. For this reason, the Pose.cache dictionary has been developed to warn the user about any clobbered data. I've outlined the data override precedences here, here, and here.

I think this PR is similar to the idea of the PyJobDistributor's output_scorefile function, however that is using another method that is prone to clobbering.

Finally, PyRosettaCluster has it's own job distributor code, which dumps a scorefile after distributing tasks, and is somewhat dissimilar to the motive of this PR (and is more complex than the implementation in this PR).

klimaj · 2025-11-12T20:40:40Z

As another quick comment, arbitrary python types can now be serialized into strings using the Pose.cache dictionary (for example, pose.cache["foo"] = complex(1, 2)), but this PR (in its current form) does not deserialize them using the Pose.cache machinery or by implementing the PoseScoreSerializer.maybe_decode(value) method separately.

@roccomoretti instead of retrieving data from rosetta.core.io.raw_data.ScoreMap.add_arbitrary_score_data_from_pose() and rosetta.core.io.raw_data.ScoreMap.add_arbitrary_string_data_from_pose(), you may want to consider retrieving all data from Pose.cache in a single call, which retrieves that data with appropriate clobber warnings and does data deserialization automatically. Also note that your implementation of rosetta.core.io.raw_data.ScoreMap.add_energies_data_from_scored_pose() would also not be necessary since the Pose.cache dictionary retrieves pose energy data from pose.energies().active_total_energies().items(), which I think is the same data (but you'd have to double check).

roccomoretti · 2025-11-12T20:41:46Z

One thing I’ve repeatedly found myself doing over the years is parsing Rosetta score files into a more structured format—usually JSON.

Jared added the ability to make JSON-formatted scorefiles from command-line Rosetta, via the -out:file:scorefile_format json option.

This PR uses that framework to allow you to output either the conventional (default) or JSON formats (with use_json=True). There's also the convenience function to get the same data as a Python dict.

lyskov · 2025-11-12T20:36:47Z

source/src/core/io/raw_data/ScoreFileData.hh

+		bool use_json
+	);
+
+	/// @brief write the given data to a scorefile, autodetecting the format.


"autodetecting the format" sounds a bit misleading to me, - maybe "use command line option to determine output format"?

lyskov · 2025-11-12T20:37:32Z

source/src/core/io/raw_data/ScoreFileData.cc

+	}
+
+	return write_scorefile(tag, score_map, string_map, use_json);
+}


Would it be possible to specify options to dump both "classic" and JSON score files? Rationale is this: JSON could be used down the line as input for tools but plain text is more readable for humans.

As I mentioned in previous comments: maybe dumping JSON version unconditionally is preferable from practical point of view?

lyskov · 2025-11-12T20:44:50Z

source/src/python/PyRosetta/src/pyrosetta/__init__.py

+    poses_to_scorefile,
    dump_file,
    dump_scored_pdb,
    dump_pdb,
    dump_multimodel_pdb,
    dump_cif,
    dump_mmtf,
    create_score_function,
    get_fa_scorefxn,
    get_score_function,
+    get_scorefile_info,


Do we really want to import these by default? I’m open to it if we genuinely expect everyone to need them, but otherwise it’s probably better not to expand the default imports.

My hesitation comes from past experience — we’ve been burned by this approach before: over time, people start relying on these defaults, and it makes refactoring much harder.

(Also: https://xkcd.com/1172/)

lyskov · 2025-11-12T20:51:23Z

source/src/python/PyRosetta/src/test/T011_SaveIO.py

+except ImportError:
+    _skip_xz = True
+
+


Could you please add the standard lines we include in all PyRosetta tests?
They ensure that no unexpected output accumulates and that runs are deterministic.
(I know the test code already calls init(), but in this case uniformity is probably preferable to purity.)

init(extra_options = "-constant_seed") # WARNING: option '-constant_seed' is for testing only! MAKE SURE TO REMOVE IT IN PRODUCTION RUNS!!!!! import os; os.chdir('.test.output')

Thanks,

roccomoretti added 2 commits November 11, 2025 15:01

Add scorefile output convieniece function.

e620da6

At the recent Bootcamp, several people expressed interest in the ability to create scorefile info from PyRosetta. Add a convience function which allows easy creation of JD2-like scorefiles through the PyRosetta interface.

Add a within-Python getter of scorefunction info.

301523d

roccomoretti requested a review from lyskov November 11, 2025 21:17

roccomoretti added ready_for_review This PR is ready to be reviewed and merged. 90 standard tests labels Nov 11, 2025

woodsh17 reviewed Nov 12, 2025

View reviewed changes

lyskov reviewed Nov 12, 2025

View reviewed changes

		lenghts = lengths[-n_score:]
		self.assertEqual( min(lenghts), max(lengths) )

		pyrosetta.rosetta.core.pose.setPoseExraScore(pose_clone, "extra_real", 3.14159 )
		pyrosetta.rosetta.core.pose.setPoseExraScore(pose_clone, "extra_string", "TAG_VALUE" )

Add scorefile output convieniece function for PyRosetta #563

Are you sure you want to change the base?

Add scorefile output convieniece function for PyRosetta #563

Uh oh!

Conversation

roccomoretti commented Nov 11, 2025

Uh oh!

weitzner commented Nov 12, 2025

Uh oh!

roccomoretti commented Nov 12, 2025

Uh oh!

woodsh17 Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

woodsh17 Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

lyskov commented Nov 12, 2025

Uh oh!

klimaj commented Nov 12, 2025

Uh oh!

klimaj commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roccomoretti commented Nov 12, 2025

Uh oh!

lyskov Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

lyskov Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

lyskov Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

lyskov Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

klimaj commented Nov 12, 2025 •

edited

Loading