Split out refactor of trait expression experiments #273
Conversation
Reviewer's Guide

This PR refactors the Trait Expression experiment scripts by extracting shared setup and I/O logic into a utilities module, standardizing console output, extending result visualizations with PNG exports, and updating documentation and configuration for the new experiment framework.

Class diagram for refactored experiment utilities and usage

```mermaid
classDiagram
class ExperimentPaths {
+plots_dir
+results_file
+training_results_file
}
class TraitExpressionTrain {
+main()
}
class TraitExpressionEvaluate {
+main()
}
class TraitExpressionResults {
+main()
}
class TraitExpressionStatus {
+main()
}
class Utils {
+get_experiment_dir()
+setup_experiment_env()
+load_experiment_config()
+print_section()
+print_subsection()
+print_config_summary()
+save_results()
+load_results()
+results_to_dataframe()
}
TraitExpressionTrain --> Utils
TraitExpressionEvaluate --> Utils
TraitExpressionResults --> Utils
TraitExpressionStatus --> Utils
TraitExpressionTrain --> ExperimentPaths
TraitExpressionEvaluate --> ExperimentPaths
TraitExpressionResults --> ExperimentPaths
    TraitExpressionStatus --> ExperimentPaths
```
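For orientation, here is a minimal sketch of how one of the experiment scripts might consume these utilities. It only uses names that appear in the diagram and in the review excerpts below; the exact signatures (what `ExperimentPaths` and `get_experiment_dir` accept, and whether the package re-exports these helpers at `mozoo.experiments.utils`) are assumptions, not the real module API.

```python
# Minimal sketch of a script built on the shared utilities above.
# Assumed: get_experiment_dir(__file__) and ExperimentPaths(EXPERIMENT_DIR) signatures
# are guesses; load_experiment_config(EXPERIMENT_DIR) and the FileNotFoundError
# handling mirror the evaluate.py excerpt quoted later in this review.
from mozoo.experiments.utils import (
    ExperimentPaths,
    get_experiment_dir,
    load_experiment_config,
    print_section,
    save_results,
    setup_experiment_env,
)

EXPERIMENT_DIR = get_experiment_dir(__file__)
paths = ExperimentPaths(EXPERIMENT_DIR)


def main() -> None:
    print_section("Trait Expression Experiment - Example")
    setup_experiment_env()  # assumed to prepare env vars / backend settings
    try:
        config = load_experiment_config(EXPERIMENT_DIR)
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return
    # Stand-in "results": one entry per configured model variant.
    results = [{"variant_name": m["name"]} for m in config.get("models", [])]
    save_results(results, paths.results_file)


if __name__ == "__main__":
    main()
```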
File-Level Changes
Note: Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The Persona Vectors Experiment is renamed to Trait Expression Experiment and refactored to delegate training, evaluation, status checks, and results handling to shared utilities in motools.workflows and mozoo.experiments.utils; README and config updated (model and backend), CLI paths adjusted, and plotting/export improved.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User
participant CLI as CLI script
participant Utils as mozoo.experiments.utils
participant Workflows as motools.workflows
participant Cache as ModelCache
participant Results as ResultsStore
Note over CLI,Utils: Train flow (train.py)
User->>CLI: run train.py
CLI->>Utils: load_experiment_config(), setup_experiment_env(), ExperimentPaths
CLI->>Workflows: train_variant(variant, training_config, user)
Workflows->>Cache: store trained atom/model
Workflows->>Results: save training results
Results->>CLI: training_results saved
Note over CLI,Workflows: Eval flow (evaluate.py)
User->>CLI: run evaluate.py
CLI->>Utils: load_experiment_config(), setup_experiment_env(), ExperimentPaths
CLI->>Workflows: find_model_from_cache(...)
Workflows->>Cache: retrieve model atoms
CLI->>Workflows: evaluate_model_on_task(model_atom, model_id, task, config)
Workflows->>Results: append evaluation results
Results->>CLI: results saved
Note over CLI,Utils: Results & Plots (results.py)
User->>CLI: run results.py
CLI->>Utils: get_experiment_dir(), load_results(paths.results_file), results_to_dataframe
CLI->>Utils: create_tabbed_html + export PNGs (kaleido)
Utils->>User: open plots/results path
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `mozoo/experiments/trait_expression/results.py:246-247` </location>
<code_context>
-
- cache = StageCache()
-
- try:
- # Step 1: Check cache for prepare_dataset (no input atoms needed)
- cached_dataset_state = cache.get(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** PNG export error handling is broad; may mask unrelated issues.
Consider handling unexpected exceptions separately or logging them to improve debugging and avoid masking unrelated errors.
</issue_to_address>
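One way to act on this suggestion is sketched below, under the assumption that the export loop calls `fig.write_image` as shown in the diff quoted later in this review: treat the known Kaleido-missing case as an expected skip, and log anything else with a traceback so unrelated failures stay visible. The helper name `export_png` is illustrative only, not part of the PR.

```python
import logging
from pathlib import Path

import plotly.graph_objects as go


def export_png(fig: go.Figure, png_path: Path, filename: str) -> None:
    """Write one PNG, skipping gracefully when the Kaleido engine is unavailable."""
    try:
        fig.write_image(str(png_path), width=900, height=600, scale=2)
        print(f"  ✓ Saved {filename}")
    except (ValueError, RuntimeError) as e:
        if "kaleido" in str(e).lower():
            # Expected when Kaleido is not installed: skip quietly with a hint.
            print(f"  ⚠ Skipped {filename} (install kaleido for PNG export)")
        else:
            # Unrelated error of the same type: keep the traceback visible.
            logging.warning("Unexpected error exporting %s: %s", filename, e, exc_info=True)
            print(f"  ⚠ Skipped {filename} (unexpected error: {e})")
    except OSError as e:
        # Disk or permission problems are reported separately from missing dependencies.
        logging.warning("File system error exporting %s: %s", filename, e, exc_info=True)
        print(f"  ⚠ Skipped {filename} (file system error: {e})")
```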
### Comment 2
<location> `mozoo/experiments/trait_expression/results.py:260` </location>
<code_context>
+ print(f" ⚠ Skipped {filename} ({e})")
+
def create_tabbed_html(plot_htmls: list[str], tab_titles: list[str]) -> str:
"""Create HTML with tabs containing multiple plots.
</code_context>
<issue_to_address>
**issue (complexity):** Consider moving the static HTML/CSS for tabbed plots into a separate template file and using a template engine to simplify the Python code.
You can dramatically shrink `create_tabbed_html` by moving the big static HTML/CSS into its own file and using a tiny template engine (e.g. Python’s built-in `string.Template` or Jinja2). Here’s one approach:
1. **Create a file** `templates/results_template.html`:
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>${page_title}</title>
<style>
body { … }
.tabs { … }
/* rest of your CSS */
</style>
${plotly_js}
</head>
<body>
<div class="container">
<h1>${page_title}</h1>
<div class="tabs">${tab_buttons}</div>
<div class="tab-contents">${tab_contents}</div>
</div>
<script>
/* your JS */
</script>
</body>
</html>
```
2. **Simplify** `create_tabbed_html` in your `.py`:
```python
from pathlib import Path
from string import Template

TEMPLATE_PATH = Path(__file__).parent / "templates" / "results_template.html"

def create_tabbed_html(plot_htmls, tab_titles):
    tpl = Template(TEMPLATE_PATH.read_text(encoding="utf-8"))
    plotly_js = get_plotlyjs()  # however you load it today
    buttons = "\n".join(
        f'<button class="tab-button" onclick="showTab({i})">{t}</button>'
        for i, t in enumerate(tab_titles)
    )
    contents = "\n".join(
        f'<div id="tab-content-{i}" class="tab-content{" show active" if i==0 else ""}">'
        f' <div class="plot-wrapper">{html}</div></div>'
        for i, html in enumerate(plot_htmls)
    )
    return tpl.substitute(
        page_title="Trait Expression Experiment Results",
        plotly_js=plotly_js,
        tab_buttons=buttons,
        tab_contents=contents,
    )
```
3. **Keep the rest of your code** exactly as-is.
By extracting static HTML/CSS into `results_template.html`, you:
- Remove the 300+ lines of inline template from Python.
- Keep all formatting intact via `${…}` placeholders.
- Make it trivial to tweak styles or scripts without touching Python logic.
</issue_to_address>
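For completeness, a hypothetical call site for the templated `create_tabbed_html` sketched above. The `figs` list here is dummy data, and passing `include_plotlyjs=False` assumes the template injects Plotly's JS once in the `<head>` via `get_plotlyjs()`.

```python
from pathlib import Path

import plotly.graph_objects as go

# Dummy figures standing in for the experiment's real plots.
figs = [go.Figure(data=go.Bar(x=["baseline", "severe"], y=[0.2, 0.8]))]
tab_titles = ["Example metric"]

# Embed each figure without its own copy of the Plotly bundle.
plot_htmls = [fig.to_html(full_html=False, include_plotlyjs=False) for fig in figs]
html = create_tabbed_html(plot_htmls, tab_titles)
Path("results.html").write_text(html, encoding="utf-8")
```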
### Comment 3
<location> `mozoo/experiments/trait_expression/evaluate.py:221-224` </location>
<code_context>
async def main() -> None:
    """Evaluate all trained models."""
    print_section("Trait Expression Experiment - Evaluation")

    # Load configuration
    try:
        config_data = load_experiment_config(EXPERIMENT_DIR)
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return

    models = config_data.get("models", [])
    training_config = config_data.get("training", {})
    eval_config = config_data.get("evaluation", {})
    eval_tasks = eval_config.get("tasks", [])

    if not models:
        print(
            """
Error: No models defined in config.yaml
Please add at least one model to the 'models' section.
"""
        )
        return

    if not eval_tasks:
        print(
            """
Error: No evaluation tasks defined in config.yaml
Please add at least one task to the 'evaluation.tasks' section.
"""
        )
        return

    print(
        f"""
Configuration:
Models to evaluate: {len(models)}
Evaluation tasks: {len(eval_tasks)}
"""
    )

    # Find all models from cache
    print_subsection("Looking for trained models in cache...")
    models_to_evaluate = []
    models_not_ready = []
    for variant in models:
        model_atom_id, status_message = await find_model_from_cache(
            variant_config=variant,
            training_config=training_config,
        )
        if model_atom_id is None:
            print(f"⚠️ {variant['name']}: {status_message}")
            models_not_ready.append((variant["name"], status_message))
            continue
        model_atom = cast(ModelAtom, ModelAtom.load(model_atom_id))
        model_id = model_atom.get_model_id()
        models_to_evaluate.append(
            {
                "variant": variant,
                "model_atom_id": model_atom_id,
                "model_id": model_id,
            }
        )
        print(f"✓ {variant['name']}: {model_id[:50]}...")
    print()

    if not models_to_evaluate:
        train_script = EXPERIMENT_DIR / "train.py"
        print(f"No trained models found. Please run train.py first:\n python {train_script}")
        return

    # Summary of what will be evaluated
    print(f"Found {len(models_to_evaluate)}/{len(models)} trained models")
    if models_not_ready:
        print()
        print("⚠️ Models not ready (will be skipped):")
        for name, reason in models_not_ready:
            print(f" - {name}: {reason}")
        print(
            """
Note: You can run evaluate.py again later to evaluate these models
once their training completes.
"""
        )
        print("Proceeding with evaluation of available models...")

    # Evaluate all models on all tasks
    print_subsection("Evaluating models...")
    all_results = []
    # Keep track of not-ready models for summary (models_not_ready already defined above)
    for model_info in models_to_evaluate:
        variant = model_info["variant"]
        model_atom_id = model_info["model_atom_id"]
        model_id = model_info["model_id"]
        print(
            f"""
Evaluating: {variant["name"]}
Model: {model_id[:50]}...
"""
        )
        variant_results = {
            "variant_name": variant["name"],
            "trait": variant.get("trait"),
            "strength": variant.get("strength"),
            "model_atom_id": model_atom_id,
            "model_id": model_id,
            "evaluations": {},
        }
        for task_config in eval_tasks:
            task_name = task_config["name"]
            eval_task = task_config["eval_task"]
            print(f" Task: {task_name}")
            try:
                eval_atom_id = await evaluate_model_on_task(
                    model_atom_id=model_atom_id,
                    model_id=model_id,
                    eval_task=eval_task,
                    eval_config=eval_config,
                    user="trait-expression-experiment",
                )
                # Load and extract metrics
                eval_atom = cast(EvalAtom, EvalAtom.load(eval_atom_id))
                eval_results_obj = await eval_atom.to_eval_results()
                metrics = {}
                for task_name_inner, task_metrics in eval_results_obj.metrics.items():
                    for metric_name, value in task_metrics.items():
                        if metric_name != "stats":
                            metrics[metric_name] = value
                variant_results["evaluations"][task_name] = {
                    "eval_atom_id": eval_atom_id,
                    "metrics": metrics,
                }
                # Display metrics
                for metric_name, value in metrics.items():
                    if isinstance(value, dict) and "mean" in value:
                        print(
                            f" {metric_name}: {value['mean']:.3f} ± {value.get('stderr', 0):.3f}"
                        )
                    else:
                        print(f" {metric_name}: {value}")
            except Exception as e:
                print(f" ✗ Failed: {e}")
                variant_results["evaluations"][task_name] = {"error": str(e)}
        all_results.append(variant_results)

    print_subsection("✓ Evaluation complete")
    print()

    # Save results
    save_results(all_results, paths.results_file)

    # Display summary
    print_section("Evaluation Summary")
    print()
    if all_results:
        print(f"Evaluated {len(all_results)} model(s):")
        print()
        for result in all_results:
            trait = result.get("trait")
            strength = result.get("strength")
            if trait and strength:
                trait_str = f"{strength} {trait}"
            else:
                trait_str = "N/A"
            print(f"Variant: {result['variant_name']} ({trait_str})")
            print(f" Model: {result['model_id'][:50]}...")
            for task_name, task_result in result["evaluations"].items():
                if "error" in task_result:
                    print(f" {task_name}: Error - {task_result['error']}")
                else:
                    metrics = task_result.get("metrics", {})
                    for metric_name, value in metrics.items():
                        if isinstance(value, dict) and "mean" in value:
                            print(
                                f" {task_name}/{metric_name}: {value['mean']:.3f} ± {value.get('stderr', 0):.3f}"
                            )
                        else:
                            print(f" {task_name}/{metric_name}: {value}")
            print()
    if models_not_ready:
        print("⚠️ Skipped models (training not complete):")
        for name, reason in models_not_ready:
            print(f" - {name}: {reason}")
    results_script = EXPERIMENT_DIR / "results.py"
    print(
        f"""
Results saved to: {paths.results_file}
Next step:
Run: python {results_script}
This will display results and generate visualization plots.
"""
    )
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
- Low code quality found in main - 8% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
```suggestion
trait_str = f"{strength} {trait}" if trait and strength else "N/A"
```
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
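One possible decomposition along these lines is sketched below; it is an illustration, not the PR's actual code. The helper name `evaluate_variant_on_tasks` is hypothetical, while the calls it wraps (`evaluate_model_on_task`, `EvalAtom.load`, `to_eval_results`) and the import paths follow the code and code-graph excerpts elsewhere in this review.

```python
from typing import Any, cast

from motools.atom.base import EvalAtom
from motools.workflows.evaluate_only import evaluate_model_on_task


async def evaluate_variant_on_tasks(
    model_atom_id: str,
    model_id: str,
    eval_tasks: list[dict[str, Any]],
    eval_config: dict[str, Any],
) -> dict[str, Any]:
    """Run every configured eval task for one model and collect its metrics."""
    evaluations: dict[str, Any] = {}
    for task_config in eval_tasks:
        task_name = task_config["name"]
        try:
            eval_atom_id = await evaluate_model_on_task(
                model_atom_id=model_atom_id,
                model_id=model_id,
                eval_task=task_config["eval_task"],
                eval_config=eval_config,
                user="trait-expression-experiment",
            )
            # Pull flat metrics out of the eval atom, skipping the "stats" entries.
            eval_atom = cast(EvalAtom, EvalAtom.load(eval_atom_id))
            eval_results_obj = await eval_atom.to_eval_results()
            metrics: dict[str, Any] = {}
            for task_metrics in eval_results_obj.metrics.values():
                for metric_name, value in task_metrics.items():
                    if metric_name != "stats":
                        metrics[metric_name] = value
            evaluations[task_name] = {"eval_atom_id": eval_atom_id, "metrics": metrics}
        except Exception as e:
            evaluations[task_name] = {"error": str(e)}
    return evaluations
```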
### Comment 4
<location> `mozoo/experiments/trait_expression/status.py:26` </location>
<code_context>
async def main() -> None:
    """Check status of all training jobs."""
    print_section("Trait Expression Experiment - Training Status")

    # Load configuration
    try:
        config_data = load_experiment_config(EXPERIMENT_DIR)
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return

    models = config_data.get("models", [])
    training_config = config_data.get("training", {})
    if not models:
        print("No models configured in config.yaml")
        return

    print(f"Checking status of {len(models)} model(s)...")
    print()

    # Check status of each model
    statuses = []
    for model_config in models:
        status = await check_training_status(
            variant_config=model_config,
            training_config=training_config,
        )
        statuses.append(status)

    # Display results
    print_section("Training Status")

    # Group by status
    by_status: dict[str, list[dict[str, Any]]] = {}
    for status_info in statuses:
        status = status_info["status"]
        if status not in by_status:
            by_status[status] = []
        by_status[status].append(status_info)

    # Show in-progress first
    in_progress_statuses = ["queued", "running", "validating_files"]
    completed_statuses = ["succeeded", "completed"]
    failed_statuses = ["failed", "cancelled"]
    for status_group in [
        in_progress_statuses,
        completed_statuses,
        failed_statuses,
        ["not_submitted", "error"],
    ]:
        for status_key in status_group:
            if status_key in by_status:
                models_with_status = by_status[status_key]
                print(f"\n{status_key.upper()}:")
                print("-" * 80)
                for info in models_with_status:
                    print(f" {info['name']}")
                    if info.get("trait") and info.get("strength"):
                        print(f" Trait: {info['strength']} {info['trait']}")
                    if info.get("model_id"):
                        print(f" Model: {info['model_id'][:60]}...")
                    elif info.get("job_atom_id"):
                        print(f" Job ID: {info['job_atom_id'][:60]}...")
                    if info.get("message"):
                        print(f" Note: {info['message']}")

    # Summary
    print_section("Summary")
    total = len(statuses)
    completed = sum(s["status"] in completed_statuses for s in statuses)
    in_progress = sum(s["status"] in in_progress_statuses for s in statuses)
    failed = sum(s["status"] in failed_statuses for s in statuses)
    other = total - completed - in_progress - failed
    print(f" Total models: {total}")
    print(f" ✓ Completed: {completed}")
    print(f" ⏳ In progress: {in_progress}")
    print(f" ✗ Failed/Cancelled: {failed}")
    if other > 0:
        print(f" ⚠️ Other: {other}")
    print()

    if completed < total:
        print("Note: Run evaluate.py once training completes to evaluate models.")
        print(" You can run this script again to check updated status.")
    else:
        print("All models are complete! Run evaluate.py to evaluate them.")
</code_context>
<issue_to_address>
**issue (code-quality):** Low code quality found in main - 21% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
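The same advice applies to the `main()` quoted above; a sketch of two extracted helpers is shown below. The helper names are hypothetical, but the fields they read (`status`, `name`, `trait`, `strength`, `model_id`, `job_atom_id`, `message`) and the print formatting are taken from the quoted code.

```python
from typing import Any


def group_by_status(statuses: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket per-model status dicts by their 'status' field (hypothetical helper)."""
    by_status: dict[str, list[dict[str, Any]]] = {}
    for status_info in statuses:
        by_status.setdefault(status_info["status"], []).append(status_info)
    return by_status


def print_status_group(status_key: str, models_with_status: list[dict[str, Any]]) -> None:
    """Print one status bucket using the same fields as the quoted main()."""
    print(f"\n{status_key.upper()}:")
    print("-" * 80)
    for info in models_with_status:
        print(f" {info['name']}")
        if info.get("trait") and info.get("strength"):
            print(f" Trait: {info['strength']} {info['trait']}")
        if info.get("model_id"):
            print(f" Model: {info['model_id'][:60]}...")
        elif info.get("job_atom_id"):
            print(f" Job ID: {info['job_atom_id'][:60]}...")
        if info.get("message"):
            print(f" Note: {info['message']}")
```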
Actionable comments posted: 0
🧹 Nitpick comments (3)
mozoo/experiments/trait_expression/results.py (2)
154-154: Consider consistent use of the by parameter.
Line 154 uses by=["strength_sort", "task"] (list), while line 177 uses by="strength_sort" (scalar). While both are valid, using a consistent style (always using lists for the by parameter) improves readability.
Apply this diff to make line 177 consistent:

```diff
  task_data = metric_trait_df[metric_trait_df["task"] == task].sort_values(
-     by="strength_sort"
+     by=["strength_sort"]
  )
```

Also applies to: 177-177
247-277: Consider consolidating similar exception handlers.
The three exception blocks (ValueError/RuntimeError, OSError, Exception) have similar logging and print patterns. Consider consolidating them for better maintainability.
Here's a consolidated approach:

```diff
  try:
      fig.write_image(str(png_path), width=900, height=600, scale=2)
      print(f" ✓ Saved {filename}")
- except (ValueError, RuntimeError) as e:
-     # Handle Kaleido-related errors (expected when Kaleido isn't installed)
-     error_str = str(e).lower()
-     if "kaleido" in error_str or "browserdeps" in error_str:
-         print(
-             f" ⚠ Skipped {filename} (install kaleido for PNG export: pip install kaleido)"
-         )
-     else:
-         # Unexpected ValueError/RuntimeError - log for debugging
-         logging.warning(
-             f"Unexpected error exporting {filename} to PNG: {e}",
-             exc_info=True,
-         )
-         print(f" ⚠ Skipped {filename} (unexpected error: {e})")
- except OSError as e:
-     # Handle file system errors (permissions, disk space, etc.)
-     logging.warning(
-         f"File system error exporting {filename} to PNG: {e}",
-         exc_info=True,
-     )
-     print(f" ⚠ Skipped {filename} (file system error: {e})")
  except Exception as e:
-     # Catch any other unexpected errors and log them properly
-     logging.error(
-         f"Unexpected error exporting {filename} to PNG: {type(e).__name__}: {e}",
-         exc_info=True,
-     )
-     print(f" ⚠ Skipped {filename} (unexpected error: {type(e).__name__}: {e})")
+     error_str = str(e).lower()
+     # Check if it's an expected Kaleido-related error
+     if isinstance(e, (ValueError, RuntimeError)) and (
+         "kaleido" in error_str or "browserdeps" in error_str
+     ):
+         print(
+             f" ⚠ Skipped {filename} (install kaleido for PNG export: pip install kaleido)"
+         )
+     else:
+         # Log unexpected errors for debugging
+         log_level = logging.ERROR if not isinstance(e, (ValueError, RuntimeError, OSError)) else logging.WARNING
+         logging.log(
+             log_level,
+             f"Error exporting {filename} to PNG: {type(e).__name__}: {e}",
+             exc_info=True,
+         )
+         error_type = "file system error" if isinstance(e, OSError) else "unexpected error"
+         print(f" ⚠ Skipped {filename} ({error_type}: {e})")
```

mozoo/experiments/trait_expression/evaluate.py (1)
219-222: Consider simplifying the trait_str formatting.
The current approach with a conditional expression is fine, but the logic could be slightly cleaner by handling None values explicitly.
Alternative (optional):

```diff
  trait = result.get("trait")
  strength = result.get("strength")
- trait_str = f"{strength} {trait}" if trait and strength else "N/A"
+ trait_str = f"{strength} {trait}" if (trait and strength) else "N/A"
```

Or more explicitly handle None:

```python
trait = result.get("trait")
strength = result.get("strength")
if trait and strength:
    trait_str = f"{strength} {trait}"
else:
    trait_str = "N/A"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- mozoo/experiments/trait_expression/evaluate.py (9 hunks)
- mozoo/experiments/trait_expression/results.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
mozoo/experiments/trait_expression/results.py (4)
mozoo/experiments/utils/paths.py (3)
ExperimentPaths (6-40), plots_dir (36-40), results_file (26-28)
mozoo/experiments/utils/config.py (2)
get_experiment_dir (10-24), setup_experiment_env (27-43)
mozoo/experiments/utils/results.py (2)
load_results (24-45), results_to_dataframe (48-122)
mozoo/experiments/utils/display.py (1)
print_section(6-19)
mozoo/experiments/trait_expression/evaluate.py (7)
motools/atom/base.py (2)
EvalAtom (777-854), ModelAtom (563-632)
motools/workflows/evaluate_only.py (1)
evaluate_model_on_task (75-137)
motools/workflows/train_and_evaluate.py (1)
find_model_from_cache (99-238)
mozoo/experiments/utils/paths.py (2)
ExperimentPaths (6-40), results_file (26-28)
mozoo/experiments/utils/config.py (3)
get_experiment_dir (10-24), load_experiment_config (46-70), setup_experiment_env (27-43)
mozoo/experiments/utils/display.py (2)
print_section (6-19), print_subsection (22-34)
mozoo/experiments/utils/results.py (1)
save_results(10-21)
🪛 GitHub Actions: CI
mozoo/experiments/trait_expression/results.py
[error] 154-154: no-matching-overload: No overload of bound method sort_values matches arguments
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (7)
mozoo/experiments/trait_expression/results.py (3)
16-33: Well-structured refactoring to use centralized utilities.
The migration to use ExperimentPaths, get_experiment_dir, and setup_experiment_env improves consistency across experiments and reduces code duplication. Based on learnings.
238-246: Good addition of PNG export functionality.
The PNG export feature with graceful fallback when Kaleido is unavailable is a valuable enhancement for documentation and sharing results.
154-154: The code at line 154 is correct.
At this point, metric_trait_df is definitively a pandas DataFrame (the result of groupby(["strength", "task"], as_index=False)["mean_value"].mean()), and both columns "strength_sort" and "task" exist in the DataFrame. The sort_values(by=["strength_sort", "task"]) call is valid pandas syntax and will execute without error. This appears to be a false positive from a type checker with incomplete pandas type stubs; no code changes are needed.
Likely an incorrect or invalid review comment.
mozoo/experiments/trait_expression/evaluate.py (4)
16-36: Excellent refactoring to use external workflow utilities.
Delegating model discovery and evaluation to motools.workflows functions eliminates code duplication and centralizes workflow logic. The integration with mozoo.experiments.utils for path management and configuration loading is clean and consistent. Based on learnings.
44-48: Good error handling for missing configuration.
The try-except block properly handles FileNotFoundError when the config file is missing, providing a clear error message to the user.
172-172: Appropriate user identifier for workflow execution.
Using "trait-expression-experiment" as the user parameter is a good practice for tracking and organizing workflow executions by experiment type.
153-154: No action needed: the code already handles the concern.
The review comment requests verification that trait and strength fields are present in all model variants or that missing cases are handled. Both conditions are already met:
- All 9 models in config.yaml include both fields: each model (baseline/mild/severe for hallucinating, evil, and sycophantic) defines trait and strength.
- Missing fields are gracefully handled: lines 219-222 extract the fields safely using .get() and apply a conditional fallback (if trait and strength else "N/A"), so the code won't crash if these fields are absent in future configurations.
Summary by Sourcery
Refactor and rename the Persona Vectors experiment suite to Trait Expression, centralize file and environment handling via shared utilities, update scripts accordingly, and introduce static PNG export and updated documentation.
New Features:
Enhancements:
Documentation:
Summary by CodeRabbit
Documentation
New Features
Refactor
Chores