Split out utils #272
Conversation
Reviewer's Guide

This PR restructures the codebase by extracting and centralizing helper utilities: it enriches the train_and_evaluate and evaluate_only workflows with non-blocking cache queries and orchestration functions, and introduces a new mozoo.experiments.utils package that provides configuration management, display formatting, results handling, and path utilities for experiments.

Sequence diagram for non-blocking model cache lookup in train_and_evaluate workflow

sequenceDiagram
participant "Caller"
participant "TrainAndEvaluateConfig"
participant "StageCache"
participant "TrainingJobAtom"
participant "ModelAtom"
"Caller"->>"TrainAndEvaluateConfig": from_dict(config)
"Caller"->>"StageCache": get("prepare_dataset", config, {})
"StageCache"-->>"Caller": cached_dataset_state
"Caller"->>"StageCache": get("submit_training", config, {prepared_dataset})
"StageCache"-->>"Caller": cached_submit_state
"Caller"->>"TrainingJobAtom": load(job_atom_id)
"TrainingJobAtom"-->>"Caller": job_atom
"Caller"->>"TrainingJobAtom": get_status()
"TrainingJobAtom"-->>"Caller": job_status
"Caller"->>"StageCache": get("wait_for_training", config, {training_job})
"StageCache"-->>"Caller": cached_wait_state
"Caller"->>"ModelAtom": load(model_atom_id)
"ModelAtom"-->>"Caller": model_atom
"Caller"-->>"Caller": return model_atom_id/status_message
Sequence diagram for evaluating a model on a task using evaluate_only workflow

sequenceDiagram
participant "Caller"
participant "EvaluateOnlyConfig"
participant "evaluate_only_workflow"
"Caller"->>"EvaluateOnlyConfig": from_dict(config)
"Caller"->>"evaluate_only_workflow": run_workflow(config, user)
"evaluate_only_workflow"-->>"Caller": result
"Caller"->>"Caller": get_step_state("evaluate_model")
"Caller"-->>"Caller": return eval_results_atom_id
Class diagram for ExperimentPaths utility class

classDiagram
class ExperimentPaths {
- experiment_dir: Path
+ __init__(experiment_dir: Path)
+ config_file: Path
+ results_file: Path
+ training_results_file: Path
+ plots_dir: Path
}
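A minimal sketch of such a class, written only from the diagram above, is shown here for orientation. The concrete file names (config.yaml, results.json, training_results.json) are assumptions, and per the review notes further down, the PR's plots_dir actually creates the directory on access.

```python
from pathlib import Path


class ExperimentPaths:
    """Sketch of the path manager in the diagram (file names are assumed)."""

    def __init__(self, experiment_dir: Path) -> None:
        self.experiment_dir = experiment_dir

    @property
    def config_file(self) -> Path:
        return self.experiment_dir / "config.yaml"

    @property
    def results_file(self) -> Path:
        return self.experiment_dir / "results.json"

    @property
    def training_results_file(self) -> Path:
        return self.experiment_dir / "training_results.json"

    @property
    def plots_dir(self) -> Path:
        # The PR's version creates this directory on first access (see review comments below).
        plots_dir = self.experiment_dir / "plots"
        plots_dir.mkdir(exist_ok=True)
        return plots_dir
```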
Class diagram for results management utilities

classDiagram
class results {
+ save_results(results: list[dict], path: Path)
+ load_results(path: Path) list[dict]
+ results_to_dataframe(results: list[dict]) pd.DataFrame
}
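A hypothetical usage sketch combining the three helpers follows; the record shape mirrors the fields consumed by results_to_dataframe quoted later on this page, while the file path and values are placeholders.

```python
from pathlib import Path

from mozoo.experiments.utils import load_results, results_to_dataframe, save_results

results = [
    {
        "variant_name": "baseline",
        "trait": "politeness",
        "strength": 0.5,
        "model_id": "model-123",
        "evaluations": {
            "my-task": {"metrics": {"accuracy": {"mean": 0.91, "stderr": 0.02}}}
        },
    }
]

results_path = Path("results.json")
save_results(results, results_path)    # JSON with indent=2; non-serializable values fall back to str
reloaded = load_results(results_path)  # raises a helpful error if the file is missing
df = results_to_dataframe(reloaded)    # one row per (variant, task, metric)
print(df.groupby("trait")["value"].mean())
```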
Class diagram for configuration management utilities

classDiagram
class config {
+ get_experiment_dir(script_path: Path) Path
+ setup_experiment_env(experiment_dir: Path)
+ load_experiment_config(experiment_dir: Path, config_file: str) dict
}
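A plausible experiment-script preamble built from these signatures is sketched below; the call sites are illustrative, and setup_experiment_env's .env lookup behavior is discussed in the review comments further down.

```python
from pathlib import Path

from mozoo.experiments.utils import (
    get_experiment_dir,
    load_experiment_config,
    setup_experiment_env,
)

# Typical top of an experiment script (illustrative).
EXPERIMENT_DIR = get_experiment_dir(Path(__file__))  # parent directory of the running script
setup_experiment_env(EXPERIMENT_DIR)                 # loads .env from the project root
config = load_experiment_config(EXPERIMENT_DIR, "config.yaml")
print(config)
```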
Class diagram for display formatting utilities

classDiagram
class display {
+ print_section(title: str, width: int)
+ print_subsection(title: str, width: int)
+ print_config_summary(config: dict)
+ print_progress(current: int, total: int, item_name: str)
}
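An illustrative call sequence for the display helpers is shown below; the width values, config contents, and exact progress wording are assumptions based on the signatures above and the docstring examples quoted in the review comments.

```python
from mozoo.experiments.utils import (
    print_config_summary,
    print_progress,
    print_section,
    print_subsection,
)

config = {"traits": ["politeness"], "strengths": [0.5, 1.0]}  # placeholder config

print_section("Training", width=80)    # banner-style section header
print_config_summary(config)           # prints key settings, including the config file name
print_subsection("Variants", width=80)
for i in range(1, 4):
    print_progress(i, 3, "model")      # e.g. a "1/3 models" style progress line
```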
Class diagram for new workflow utility functions

classDiagram
class train_and_evaluate {
+ find_model_from_cache(variant_config: dict, training_config: dict, ...)
+ train_variant(variant: dict, training_config: dict, ...)
+ check_training_status(variant_config: dict, training_config: dict, ...)
}
class evaluate_only {
+ evaluate_model_on_task(model_atom_id: str, model_id: str, eval_task: str, eval_config: dict, ...)
}
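Taken together, these helpers suggest a cache-first orchestration pattern for experiment scripts, sketched below. This is an assumption-laden outline: the awaited call style, the keys of train_variant's return dict, and the idea that previously completed stages are cheap cache hits are inferred from this page rather than confirmed.

```python
from motools.workflows import (
    evaluate_model_on_task,
    find_model_from_cache,
    train_variant,
)


async def train_then_evaluate(
    variant: dict, training_config: dict, eval_task: str, eval_config: dict
):
    # Non-blocking cache probe: returns (model_id, status), see the diagrams below.
    _cached_model_id, status = await find_model_from_cache(variant, training_config)
    if status == "training_in_progress":
        return None  # don't block on a running job; a later call can pick it up from the cache

    # Run the training stages; stages already in the cache are assumed to be cheap hits.
    trained = await train_variant(variant, training_config)

    # Evaluate on one task via the evaluate_only workflow and return its result.
    return await evaluate_model_on_task(
        trained["model_atom"],  # assumed to carry the model atom id
        trained["model_id"],
        eval_task,
        eval_config,
    )
```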
File-Level Changes
Note: Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The changes expand the motools.workflows public API with four new helper functions (find_model_from_cache, train_variant, check_training_status, and evaluate_model_on_task) and introduce a new mozoo/experiments/utils package providing configuration, display, path, and results helpers.

Changes
Sequence Diagrams

sequenceDiagram
participant Caller
participant evaluate_model_on_task
participant run_workflow
participant Workflow
Caller->>evaluate_model_on_task: model_atom_id, model_id, eval_task, eval_config
activate evaluate_model_on_task
Note over evaluate_model_on_task: Build EvaluateOnlyConfig<br/>with prepared_model, prepared_task,<br/>evaluate_model steps
evaluate_model_on_task->>run_workflow: Execute workflow
activate run_workflow
run_workflow->>Workflow: Execute steps
Workflow-->>run_workflow: workflow_results
run_workflow-->>evaluate_model_on_task: results
deactivate run_workflow
Note over evaluate_model_on_task: Extract eval_results<br/>from evaluate_model step
evaluate_model_on_task->>Caller: return eval_results
deactivate evaluate_model_on_task
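To see how a script might invoke this helper, here is a hypothetical, minimal example; the awaited call style and the config contents are assumptions based on the signature shown earlier, and the return value is treated as the evaluation results reference described in the diagram.

```python
# Illustrative driver for the evaluate_model_on_task helper (assumptions noted above).
import asyncio

from motools.workflows import evaluate_model_on_task


async def main() -> None:
    eval_config = {"metrics": ["accuracy"]}  # placeholder evaluation settings
    result_ref = await evaluate_model_on_task(
        model_atom_id="model-atom-0001",     # placeholder atom id
        model_id="ft:example-model",         # placeholder model id
        eval_task="my-eval-task",            # placeholder task name
        eval_config=eval_config,
    )
    print("evaluation results:", result_ref)


if __name__ == "__main__":
    asyncio.run(main())
```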
sequenceDiagram
participant Caller
participant train_variant
participant run_workflow
participant Workflow
Caller->>train_variant: variant, training_config
activate train_variant
Note over train_variant: Build TrainAndEvaluateConfig<br/>with training stages only<br/>(prepare_dataset, submit_training, wait)
train_variant->>run_workflow: Execute partial workflow
activate run_workflow
run_workflow->>Workflow: Execute training steps
Workflow-->>run_workflow: workflow_results
run_workflow-->>train_variant: results
deactivate run_workflow
Note over train_variant: Extract model_atom<br/>and model_id from results
train_variant->>Caller: return {model_id, model_atom, ...}
deactivate train_variant
sequenceDiagram
participant Caller
participant find_model_from_cache
participant StageCache
Caller->>find_model_from_cache: variant_config, training_config
activate find_model_from_cache
Note over find_model_from_cache: Check prepare_dataset<br/>in cache
find_model_from_cache->>StageCache: Query step result
StageCache-->>find_model_from_cache: result or None
alt Dataset Missing
find_model_from_cache->>Caller: return (None, "dataset_missing")
end
Note over find_model_from_cache: Check submit_training<br/>in cache
find_model_from_cache->>StageCache: Query step result
StageCache-->>find_model_from_cache: result or None
alt Training Not Submitted
find_model_from_cache->>Caller: return (None, "training_not_submitted")
end
Note over find_model_from_cache: Check wait_for_training<br/>status in cache
find_model_from_cache->>StageCache: Query training status
StageCache-->>find_model_from_cache: status
alt Training In Progress
find_model_from_cache->>Caller: return (None, "training_in_progress")
else Training Complete
Note over find_model_from_cache: Extract model_id and<br/>training_job_atom
find_model_from_cache->>Caller: return (model_id, "success")
end
deactivate find_model_from_cache
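The status strings in this diagram invite a simple caller-side dispatch. A hypothetical example, assuming find_model_from_cache is awaitable and returns the (model_id, status) tuple exactly as drawn:

```python
from motools.workflows import find_model_from_cache


async def report_variant_status(variant_config: dict, training_config: dict) -> str:
    model_id, status = await find_model_from_cache(variant_config, training_config)

    # Branch on the status values shown in the diagram above.
    if status == "success":
        return f"model ready: {model_id}"
    if status == "training_in_progress":
        return "training still running; check again later"
    if status == "training_not_submitted":
        return "dataset prepared but training has not been submitted"
    if status == "dataset_missing":
        return "dataset has not been prepared yet"
    return f"unexpected status: {status}"  # defensive fallback for unlisted states
```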
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Hey there - I've reviewed your changes - here's some feedback:
- There’s a lot of repeated TrainAndEvaluateConfig.from_dict logic across the new workflow utilities—consider extracting a shared helper to DRY up the config construction and reduce drift.
- The hardcoded dummy_task_loader default appears in multiple function signatures; extracting it into a single module‐level constant or configuration entry would improve consistency and make updates easier.
- Many utility functions catch broad Exception; narrowing these to specific error types or at least logging the full traceback could make debugging unexpected failures much easier.
## Individual Comments
### Comment 1

Location: `mozoo/experiments/utils/paths.py:39`

Code context:

```diff
+    def plots_dir(self) -> Path:
+        """Path to plots directory (creates if doesn't exist)."""
+        plots_dir = self.experiment_dir / "plots"
+        plots_dir.mkdir(exist_ok=True)
+        return plots_dir
```

**suggestion:** Consider using exist_ok=True and parents=True for mkdir to handle nested experiment directories.

Using parents=True ensures all parent directories are created, improving robustness if experiment_dir or its parents are missing.

```suggestion
        plots_dir.mkdir(exist_ok=True, parents=True)
```
### Comment 2

Location: `mozoo/experiments/utils/results.py:75`

Code context:

```python
def results_to_dataframe(results: list[dict[str, Any]]) -> pd.DataFrame:
    """Convert results to a pandas DataFrame for analysis.

    Args:
        results: List of evaluation results

    Returns:
        DataFrame with columns: variant_name, trait, strength, task, metric_name, metric_value

    Example:
        >>> df = results_to_dataframe(results)
        >>> df.groupby("trait")["value"].mean()
    """
    rows = []
    for result in results:
        variant_name = result["variant_name"]
        trait = result.get("trait")
        strength = result.get("strength")
        model_id = result.get("model_id")
        for task_name, task_result in result.get("evaluations", {}).items():
            if "error" in task_result:
                continue
            metrics = task_result.get("metrics", {})
            # Handle case where metrics dict directly contains "mean" and "stderr"
            if set(metrics.keys()) == {"mean", "stderr"} or set(metrics.keys()) == {"mean"}:
                metric_name = task_name
                value = metrics.get("mean")
                stderr = metrics.get("stderr", 0)
                rows.append(
                    {
                        "variant_name": variant_name,
                        "trait": trait,
                        "strength": strength,
                        "model_id": model_id,
                        "task": task_name,
                        "metric": metric_name,
                        "value": value,
                        "stderr": stderr,
                    }
                )
            else:
                # Normal case: metrics dict contains metric names as keys
                for metric_name, metric_value in metrics.items():
                    if isinstance(metric_value, dict):
                        if "mean" in metric_value:
                            value = metric_value["mean"]
                            stderr = metric_value.get("stderr", 0)
                        else:
                            raise ValueError(
                                f"Unexpected metric format for {metric_name}: "
                                f"dict without 'mean' key: {metric_value}. "
                                f"Expected dict with 'mean' (and optionally 'stderr') or a numeric value."
                            )
                    else:
                        value = metric_value
                        stderr = 0
                    rows.append(
                        {
                            "variant_name": variant_name,
                            "trait": trait,
                            "strength": strength,
                            "model_id": model_id,
                            "task": task_name,
                            "metric": metric_name,
                            "value": value,
                            "stderr": stderr,
                        }
                    )
    return pd.DataFrame(rows)
```

**issue (code-quality):** We've found these issues:

- Replace multiple comparisons of same variable with `in` operator ([`merge-comparisons`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/merge-comparisons/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
Actionable comments posted: 2
🧹 Nitpick comments (9)
mozoo/experiments/utils/results.py (3)
10-21: Consider more explicit serialization handling.

The `default=str` fallback may silently convert non-serializable objects (e.g., datetime, Path, custom classes) in potentially unexpected ways. While convenient, this can make debugging serialization issues harder.

Consider one of these approaches:

- Use a custom serializer that explicitly handles expected types
- Add a docstring note documenting the str-conversion behavior
- Log a warning when the fallback is used (though this requires hooking into json.dump)

Example custom serializer:

```python
def _serialize_default(obj: Any) -> str:
    """Convert non-serializable objects to strings."""
    if isinstance(obj, (Path, datetime)):
        return str(obj)
    # Add other expected types here
    return str(obj)
```

Then use `json.dump(results, f, indent=2, default=_serialize_default)`.
75-91: Consider relaxing the exact set equality check.

The condition `set(metrics.keys()) == {"mean", "stderr"}` uses exact equality, meaning a metrics dict with additional keys (e.g., `{"mean": 0.5, "stderr": 0.1, "n_samples": 100}`) would fall through to the else branch and be processed differently. This could be surprising.

Consider either:

- Documenting this exact-match behavior in a comment
- Changing to a subset check if the intent is to detect the "compact" format: `{"mean", "stderr"}.issuperset(metrics.keys())`

Example:

```diff
-            # Handle case where metrics dict directly contains "mean" and "stderr"
-            if set(metrics.keys()) == {"mean", "stderr"} or set(metrics.keys()) == {"mean"}:
+            # Handle compact format: metrics dict contains only "mean" (and optionally "stderr")
+            # Note: This check requires exact key match - additional keys will use normal format
+            if set(metrics.keys()) == {"mean", "stderr"} or set(metrics.keys()) == {"mean"}:
```
78-78: Consider using None instead of 0 for missing stderr.

When stderr is not present in the data, defaulting to `0` might be misleading, as it suggests a measured stderr of zero rather than missing data. Using `None` or `pd.NA` would be more semantically accurate.

Apply this diff:

```diff
-                stderr = metrics.get("stderr", 0)
+                stderr = metrics.get("stderr", None)
```

And:

```diff
-                            stderr = metric_value.get("stderr", 0)
+                            stderr = metric_value.get("stderr", None)
```

And:

```diff
-                        stderr = 0
+                        stderr = None
```

Also applies to: 98-98
mozoo/experiments/utils/display.py (2)
37-61: Hardcoded filename in output.

Line 50 prints `Config file: config.yaml`, but this string is hardcoded. If a caller uses a different config file name (via the `load_experiment_config` parameter), this output would be misleading.

Consider either:

- Removing the hardcoded filename line
- Adding a parameter to accept the actual config file path/name
- Adding a comment noting this assumes the default config.yaml

Example with parameter:

```diff
-def print_config_summary(config: dict[str, Any]) -> None:
+def print_config_summary(config: dict[str, Any], config_file: str = "config.yaml") -> None:
     """Print a summary of experiment configuration.

     Args:
         config: Configuration dictionary
+        config_file: Name of the config file (default: "config.yaml")
```

Then:

```diff
-    print(" Config file: config.yaml")
+    print(f" Config file: {config_file}")
```
64-76: Naive pluralization in progress output.

Line 76 always appends 's' to `item_name`, which works for most English nouns, but the output doesn't match the example in the docstring. The current code produces:

```
Model: 1/5 models
```

But the docstring example shows:

```
Training: 1/5 models
```

Consider clarifying the docstring or adjusting the implementation to match intended usage.
mozoo/experiments/utils/paths.py (1)
35-40: Property with side effect may be surprising.

The `plots_dir` property creates the directory on access (line 39), which is a side effect that may be surprising for a property getter. Users might not expect a simple property access to perform filesystem operations.

Consider one of these approaches:

- Document the side effect clearly (minimal change):

```diff
     @property
     def plots_dir(self) -> Path:
-        """Path to plots directory (creates if doesn't exist)."""
+        """Path to plots directory.
+
+        Note: Automatically creates the directory on first access if it doesn't exist.
+        """
```

- Move creation to a method (more explicit):

```python
    @property
    def plots_dir(self) -> Path:
        """Path to plots directory."""
        return self.experiment_dir / "plots"

    def ensure_plots_dir(self) -> Path:
        """Ensure plots directory exists and return path."""
        plots_dir = self.plots_dir
        plots_dir.mkdir(exist_ok=True)
        return plots_dir
```

- Use cached_property to avoid repeated mkdir calls:

```python
    from functools import cached_property

    @cached_property
    def plots_dir(self) -> Path:
        """Path to plots directory (creates if doesn't exist)."""
        plots_dir = self.experiment_dir / "plots"
        plots_dir.mkdir(exist_ok=True)
        return plots_dir
```

motools/workflows/train_and_evaluate.py (2)
317-322: Skip the dummy `prepare_task` stage in training-only runs.

The training path never consumes `prepared_task`, so running that stage just invokes the dummy loader, creates throwaway atoms, and risks import failures if the placeholder task isn't present. Dropping it keeps the helper lean and avoids an unnecessary dependency.

```diff
-    training_stages = [
-        "prepare_dataset",
-        "prepare_task",
-        "submit_training",
-        "wait_for_training",
-    ]
+    training_stages = [
+        "prepare_dataset",
+        "submit_training",
+        "wait_for_training",
+    ]
```

482-487: Broaden the refresh fallback to cover `NotImplementedError`.

Some backends define `refresh()` but raise `NotImplementedError` (rather than omitting the method). At the moment that bubbles into the outer `except` block and misreports the status as an error. Treat it like the missing-method case so we still surface a valid status.

```diff
-    try:
-        await job_atom.refresh()
-    except AttributeError:
+    try:
+        await job_atom.refresh()
+    except (AttributeError, NotImplementedError):
```
75-124: Validate `model_atom_id` before running the workflow.

We accept both the atom ID and the raw `model_id`, but never check that they match. A typo slips through silently and we end up evaluating an unexpected model. Load the atom up front and assert the IDs line up so callers get a fast, actionable failure.

```diff
-from typing import Any
+from typing import Any
+
+from motools.atom import ModelAtom
@@
 async def evaluate_model_on_task(
     model_atom_id: str,
     model_id: str,
@@
 ) -> str:
     """Evaluate a model on a specific task using the evaluate_only workflow."""
+    model_atom = ModelAtom.load(model_atom_id)
+    expected_model_id = model_atom.get_model_id()
+    if expected_model_id != model_id:
+        raise ValueError(
+            f"model_id '{model_id}' does not match stored model_id '{expected_model_id}' for atom {model_atom_id}"
+        )
     config = EvaluateOnlyConfig.from_dict(
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- motools/workflows/__init__.py (1 hunks)
- motools/workflows/evaluate_only.py (3 hunks)
- motools/workflows/train_and_evaluate.py (3 hunks)
- mozoo/experiments/utils/__init__.py (1 hunks)
- mozoo/experiments/utils/config.py (1 hunks)
- mozoo/experiments/utils/display.py (1 hunks)
- mozoo/experiments/utils/paths.py (1 hunks)
- mozoo/experiments/utils/results.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
motools/workflows/__init__.py (2)
- motools/workflows/evaluate_only.py (2)
  - EvaluateOnlyConfig (25-36)
  - evaluate_model_on_task (75-137)
- motools/workflows/train_and_evaluate.py (4)
  - TrainAndEvaluateConfig (32-47)
  - check_training_status (350-555)
  - find_model_from_cache (99-238)
  - train_variant (241-347)

mozoo/experiments/utils/config.py (2)
- motools/system.py (1)
  - load_dotenv (12-38)
- mozoo/experiments/utils/paths.py (1)
  - config_file (21-23)

motools/workflows/train_and_evaluate.py (4)
- motools/atom/base.py (4)
  - ModelAtom (563-632)
  - TrainingJobAtom (636-773)
  - get_model_id (620-632)
  - _load_training_run (700-739)
- motools/cache/stage_cache.py (1)
  - StageCache (39-319)
- motools/workflow/execution.py (1)
  - run_workflow (18-56)
- motools/workflow/state.py (1)
  - get_step_state (77-89)

motools/workflows/evaluate_only.py (3)
- motools/workflow/execution.py (1)
  - run_workflow (18-56)
- motools/workflow/config.py (1)
  - from_dict (62-111)
- motools/workflow/state.py (1)
  - get_step_state (77-89)

mozoo/experiments/utils/__init__.py (4)
- mozoo/experiments/utils/config.py (3)
  - get_experiment_dir (10-24)
  - load_experiment_config (46-70)
  - setup_experiment_env (27-43)
- mozoo/experiments/utils/display.py (4)
  - print_config_summary (37-61)
  - print_progress (64-76)
  - print_section (6-19)
  - print_subsection (22-34)
- mozoo/experiments/utils/paths.py (1)
  - ExperimentPaths (6-40)
- mozoo/experiments/utils/results.py (3)
  - load_results (24-45)
  - results_to_dataframe (48-122)
  - save_results (10-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (9)
mozoo/experiments/utils/results.py (1)
24-45: LGTM! The function correctly checks for file existence and provides a helpful error message to guide users.
mozoo/experiments/utils/display.py (2)
6-19: LGTM! Simple and effective section header formatting.

22-34: LGTM! Simple and effective subsection header formatting.
mozoo/experiments/utils/config.py (2)
10-24: LGTM! Simple utility to extract the parent directory from a script path.

46-70: LGTM! The function correctly loads YAML configuration with appropriate error handling.
mozoo/experiments/utils/paths.py (2)
6-18: LGTM! Clean initialization of the path manager.

20-33: LGTM! Simple and clear path properties for configuration and results files.
mozoo/experiments/utils/__init__.py (1)
1-41: LGTM! The public API module correctly re-exports all utilities from submodules and provides a clear, well-documented interface for the utils package.
motools/workflows/__init__.py (1)
19-30: Public surface looks tidy. Re-exporting the new helpers alongside the configs keeps the package API predictable.
```python
def setup_experiment_env(experiment_dir: Path) -> None:
    """Set up environment variables for an experiment.

    Loads .env file from the project root (3 levels up from experiment_dir).

    Args:
        experiment_dir: Path to the experiment directory

    Example:
        >>> setup_experiment_env(EXPERIMENT_DIR)
    """
    # .env is typically in project root (3 levels up from experiment)
    # For mozoo/experiments/trait_expression -> motools (repo root)
    project_root = experiment_dir.parent.parent.parent
    env_file = project_root / ".env"
    if env_file.exists():
        load_dotenv(env_file)
```
Duplicates functionality and has fragile path traversal.
This function duplicates motools.system.load_dotenv() (which is already called on motools import) but with several issues:
- Hardcoded 3-level traversal (line 40) assumes a fixed directory structure. If the experiment directory depth changes, this breaks.
- Missing error handling: Unlike `motools.system.load_dotenv()`, which has try/except for ImportError, this assumes python-dotenv is installed.
- Potential double-loading: Since `motools.system.load_dotenv()` is called automatically on import, this may load the .env file twice.
Consider one of these approaches:
- Reuse existing functionality (recommended):
```diff
 def setup_experiment_env(experiment_dir: Path) -> None:
-    """Set up environment variables for an experiment.
-
-    Loads .env file from the project root (3 levels up from experiment_dir).
-
-    Args:
-        experiment_dir: Path to the experiment directory
-
-    Example:
-        >>> setup_experiment_env(EXPERIMENT_DIR)
-    """
-    # .env is typically in project root (3 levels up from experiment)
-    # For mozoo/experiments/trait_expression -> motools (repo root)
-    project_root = experiment_dir.parent.parent.parent
-    env_file = project_root / ".env"
-    if env_file.exists():
-        load_dotenv(env_file)
+    """Set up environment variables for an experiment.
+
+    Note: motools.system.load_dotenv() is called automatically on import,
+    so this is typically unnecessary. Kept for explicit calls if needed.
+
+    Args:
+        experiment_dir: Path to the experiment directory
+
+    Example:
+        >>> setup_experiment_env(EXPERIMENT_DIR)
+    """
+    from motools.system import load_dotenv as motools_load_dotenv
+    motools_load_dotenv()
```

- Or use robust traversal like the motools version:
```python
def setup_experiment_env(experiment_dir: Path) -> None:
    """Set up environment variables for an experiment."""
    try:
        # Search upward from experiment_dir for .env file
        for directory in [experiment_dir, *experiment_dir.parents]:
            env_file = directory / ".env"
            if env_file.exists():
                load_dotenv(env_file)
                return
    except ImportError:
        pass  # python-dotenv not installed
```

```python
def results_to_dataframe(results: list[dict[str, Any]]) -> pd.DataFrame:
    """Convert results to a pandas DataFrame for analysis.

    Args:
        results: List of evaluation results

    Returns:
        DataFrame with columns: variant_name, trait, strength, task, metric_name, metric_value

    Example:
        >>> df = results_to_dataframe(results)
        >>> df.groupby("trait")["value"].mean()
```
Fix docstring to match actual column names.
The docstring states the DataFrame will have columns metric_name and metric_value, but the actual implementation uses metric and value.
Apply this diff:
```diff
-    Returns:
-        DataFrame with columns: variant_name, trait, strength, task, metric_name, metric_value
+    Returns:
+        DataFrame with columns: variant_name, trait, strength, model_id, task, metric, value, stderr
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def results_to_dataframe(results: list[dict[str, Any]]) -> pd.DataFrame:
    """Convert results to a pandas DataFrame for analysis.

    Args:
        results: List of evaluation results

    Returns:
        DataFrame with columns: variant_name, trait, strength, model_id, task, metric, value, stderr

    Example:
        >>> df = results_to_dataframe(results)
        >>> df.groupby("trait")["value"].mean()
```
🤖 Prompt for AI Agents
In mozoo/experiments/utils/results.py around lines 48 to 59, the docstring
claims the DataFrame columns are `metric_name` and `metric_value` but the
function actually produces `metric` and `value`; update the docstring (Returns
and any Examples) to list the correct column names: variant_name, trait,
strength, task, metric, value, and adjust the example usage text if it
references the old names.
This PR splits out the utils for general experiments into two sets: one in motools/ (for general use) and one in mozoo/experiments (useful for the scripts in there, but at a lower level of recommendation - end users should be able to do everything they want with only the motools/ directory and its installation).
Summary by Sourcery
Extract and centralize common workflow and experiment helper functions into dedicated utility modules to streamline training, evaluation, caching, configuration, result management, and display formatting.
New Features:
Enhancements:
Chores:
Summary by CodeRabbit