
Conversation

@Cloakless Cloakless commented Nov 13, 2025

Summary by Sourcery

Refactor and rename the Persona Vectors experiment suite to Trait Expression, centralize file and environment handling via shared utilities, update the scripts accordingly, and add static PNG export along with updated documentation.

New Features:

  • Add support for exporting static PNG images of comparison plots in results.py with graceful fallback when kaleido is unavailable

Enhancements:

  • Refactor results.py, evaluate.py, status.py, and train.py to use shared utilities (ExperimentPaths, get_experiment_dir, setup_experiment_env, load_experiment_config, save_results, print_section/subsection) instead of custom I/O and print logic
  • Rename experiment directories, script references, and documentation from persona_vectors to trait_expression
  • Simplify HTML tabbed interface generation and unify console output formatting

Documentation:

  • Update README and config.yaml to reflect the Trait Expression experiment, include new plot previews, usage instructions, and note the kaleido dependency for PNG export

Summary by CodeRabbit

  • Documentation

    • Rebranded the experiment from "Persona Vectors" to "Trait Expression" across README, examples, and CLI usage.
  • New Features

    • PNG export support for visualizations with graceful handling when Kaleido is unavailable.
  • Refactor

    • Streamlined training, status, evaluation, and results flows to use shared experiment utilities and clearer, sectioned user output.
  • Chores

    • Updated default model and training backend; improved error messages and user guidance.

sourcery-ai bot commented Nov 13, 2025

Reviewer's Guide

This PR refactors the Trait Expression experiment scripts by extracting shared setup and I/O logic into a utilities module, standardizing console output, extending result visualizations with PNG exports, and updating documentation and configuration for the new experiment framework.

Class diagram for refactored experiment utilities and usage

```mermaid
classDiagram
    class ExperimentPaths {
        +plots_dir
        +results_file
        +training_results_file
    }
    class TraitExpressionTrain {
        +main()
    }
    class TraitExpressionEvaluate {
        +main()
    }
    class TraitExpressionResults {
        +main()
    }
    class TraitExpressionStatus {
        +main()
    }
    class Utils {
        +get_experiment_dir()
        +setup_experiment_env()
        +load_experiment_config()
        +print_section()
        +print_subsection()
        +print_config_summary()
        +save_results()
        +load_results()
        +results_to_dataframe()
    }
    TraitExpressionTrain --> Utils
    TraitExpressionEvaluate --> Utils
    TraitExpressionResults --> Utils
    TraitExpressionStatus --> Utils
    TraitExpressionTrain --> ExperimentPaths
    TraitExpressionEvaluate --> ExperimentPaths
    TraitExpressionResults --> ExperimentPaths
    TraitExpressionStatus --> ExperimentPaths
```
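For orientation, the sketch below shows how one of these scripts might wire the shared utilities together. The function and attribute names come from the diagram above, but the exact signatures, arguments, and config keys are assumptions for illustration, not the project's actual API.

```python
# Hypothetical usage sketch of the shared utilities; signatures are assumed.
from mozoo.experiments.utils import (
    ExperimentPaths,
    get_experiment_dir,
    load_experiment_config,
    print_section,
    save_results,
    setup_experiment_env,
)


def main() -> None:
    print_section("Trait Expression Experiment")

    experiment_dir = get_experiment_dir(__file__)    # resolve the experiment root (assumed argument)
    setup_experiment_env(experiment_dir)             # load environment/.env settings
    config = load_experiment_config(experiment_dir)  # parse config.yaml into a dict
    paths = ExperimentPaths(experiment_dir)          # plots_dir, results_file, training_results_file

    # Placeholder "results": one entry per configured model variant.
    results = [{"variant_name": m["name"]} for m in config.get("models", [])]
    save_results(results, paths.results_file)


if __name__ == "__main__":
    main()
```

The point is simply that each script now calls the same setup and I/O helpers rather than re-implementing path handling and dotenv loading.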

File-Level Changes

Introduce common utilities for experiment setup and I/O (results.py, evaluate.py, status.py, train.py)
  • Imported get_experiment_dir, setup_experiment_env, ExperimentPaths, load_experiment_config, load_results, save_results
  • Replaced manual JSON/YAML loads and dotenv handling with utility calls
  • Centralized path resolution via ExperimentPaths

Standardize console output using section headers (results.py, evaluate.py, status.py, train.py)
  • Replaced hard-coded separator prints with print_section
  • Introduced print_subsection for sub-headers
  • Removed multi-line string banners

Enhance results visualizations with PNG export (results.py)
  • Collected Plotly figures and exported PNGs via fig.write_image
  • Added graceful fallback messaging if kaleido is unavailable
  • Minor DataFrame sorting tweaks using the explicit by parameter

Update HTML template to reflect Trait Expression experiment (results.py)
  • Changed HTML title, headings, and class names from persona vectors to trait expression
  • Simplified template indentation and layout under utilities

Refresh documentation and configuration files (README.md, config.yaml)
  • Renamed README references and commands to trait_expression paths
  • Added links to HTML and static PNG plots, noting the kaleido requirement
  • Updated config.yaml defaults for base_model and backend_name


coderabbitai bot commented Nov 13, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The Persona Vectors Experiment is renamed to the Trait Expression Experiment and refactored to delegate training, evaluation, status checks, and results handling to shared utilities in motools.workflows and mozoo.experiments.utils. The README and config are updated (model and backend), CLI paths are adjusted, and plotting/export is improved.

Changes

Documentation & Configuration (mozoo/experiments/trait_expression/README.md, mozoo/experiments/trait_expression/config.yaml)
  • README renamed to "Trait Expression Experiment"; CLI examples and visualization notes updated. In config.yaml, the description was renamed, base_model changed to Qwen/Qwen3-30B-A3B-Instruct-2507, and training.backend changed to tinker.

Training Script Refactor (mozoo/experiments/trait_expression/train.py)
  • Replaced local training logic with motools.workflows.train_variant; switched to load_experiment_config, ExperimentPaths, get_experiment_dir, setup_experiment_env; uses save_results and structured summary outputs.

Evaluation Script Refactor (mozoo/experiments/trait_expression/evaluate.py)
  • Removed local find_model_from_cache/evaluate_model_on_task and imports them from motools.workflows; uses load_experiment_config, ExperimentPaths, setup_experiment_env, and save_results; reworked model discovery, the per-task evaluation loop, result collection, and printing.

Results Script Refactor (mozoo/experiments/trait_expression/results.py)
  • Replaced in-file JSON/dataframe logic with load_results(paths.results_file) and results_to_dataframe; centralized paths usage, tabbed HTML creation, PNG export via Kaleido with error handling, and improved sectioned output helpers.

Status Script Refactor (mozoo/experiments/trait_expression/status.py)
  • Replaced in-file status computation with motools.workflows.check_training_status; uses load_experiment_config, get_experiment_dir, setup_experiment_env, and print_section; removed direct MOTools cache/atoms usage.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant User
  participant CLI as CLI script
  participant Utils as mozoo.experiments.utils
  participant Workflows as motools.workflows
  participant Cache as ModelCache
  participant Results as ResultsStore

  Note over CLI,Utils: Train flow (train.py)
  User->>CLI: run train.py
  CLI->>Utils: load_experiment_config(), setup_experiment_env(), ExperimentPaths
  CLI->>Workflows: train_variant(variant, training_config, user)
  Workflows->>Cache: store trained atom/model
  Workflows->>Results: save training results
  Results->>CLI: training_results saved

  Note over CLI,Workflows: Eval flow (evaluate.py)
  User->>CLI: run evaluate.py
  CLI->>Utils: load_experiment_config(), setup_experiment_env(), ExperimentPaths
  CLI->>Workflows: find_model_from_cache(...)
  Workflows->>Cache: retrieve model atoms
  CLI->>Workflows: evaluate_model_on_task(model_atom, model_id, task, config)
  Workflows->>Results: append evaluation results
  Results->>CLI: results saved

  Note over CLI,Utils: Results & Plots (results.py)
  User->>CLI: run results.py
  CLI->>Utils: get_experiment_dir(), load_results(paths.results_file), results_to_dataframe
  CLI->>Utils: create_tabbed_html + export PNGs (kaleido)
  Utils->>User: open plots/results path
```
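The results flow in particular condenses to a few utility calls plus a guarded PNG export. The sketch below is illustrative only: the utility signatures, dataframe columns, and file names are assumptions, and the Kaleido fallback mirrors the behavior described above rather than the repository's exact code.

```python
# Illustrative results.py-style flow: load results, build a dataframe,
# write interactive HTML, and attempt a static PNG export via Kaleido.
# Signatures, columns, and paths are assumptions for this sketch.
import plotly.express as px

from mozoo.experiments.utils import (
    ExperimentPaths,
    get_experiment_dir,
    load_results,
    results_to_dataframe,
)


def export_plots() -> None:
    paths = ExperimentPaths(get_experiment_dir(__file__))
    df = results_to_dataframe(load_results(paths.results_file))

    fig = px.bar(df, x="variant_name", y="mean_value", title="Trait Expression Results")
    fig.write_html(str(paths.plots_dir / "trait_expression.html"))

    png_path = paths.plots_dir / "trait_expression.png"
    try:
        # Static export needs the optional kaleido package.
        fig.write_image(str(png_path), width=900, height=600, scale=2)
    except (ValueError, RuntimeError) as exc:
        print(f"Skipped PNG export (pip install kaleido?): {exc}")
```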

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas needing extra attention:
    • Verify motools.workflows function signatures (train_variant, evaluate_model_on_task, find_model_from_cache, check_training_status) match new call sites.
    • Ensure ExperimentPaths provides expected attributes (results_file, training_results_file, plots_dir) and is used consistently across scripts.
    • Confirm error handling for missing config/models/results and Kaleido PNG export fallbacks.

Possibly related PRs

  • Split out utils #272 — Adds the motools.workflows helpers (evaluate_model_on_task, find_model_from_cache, train_variant, check_training_status) and mozoo.experiments.utils utilities that these scripts now import.
  • Persona vectors example experiment #263 — Refactors/renames the Persona Vectors experiment into Trait Expression and updates related experiment modules; overlaps with file-level rename and behavior changes.
  • Rename experiment from persona_vectors to trait_expression #269 — Performs the experiment rename from persona_vectors to trait_expression, updating paths and documentation consistent with this change.

Poem

🐰 I hopped from Persona to Trait with glee,
Helpers gathered so workflows run free.
Models trained and results in neat rows,
Plots that sparkle where the tabbed HTML shows.
A tiny carrot for shared utilities! 🥕

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly describes the main refactoring effort: moving the Persona Vectors experiment suite to Trait Expression while extracting shared utilities and consolidating logic across multiple script files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `mozoo/experiments/trait_expression/results.py:246-247` </location>
<code_context>
-
-    cache = StageCache()
-
-    try:
-        # Step 1: Check cache for prepare_dataset (no input atoms needed)
-        cached_dataset_state = cache.get(
</code_context>

<issue_to_address>
**suggestion (bug_risk):** PNG export error handling is broad; may mask unrelated issues.

Consider handling unexpected exceptions separately or logging them to improve debugging and avoid masking unrelated errors.
</issue_to_address>
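One way to narrow the handling, sketched below with an assumed Plotly figure and a pathlib.Path argument: expected Kaleido errors get the friendly install hint, while anything else is logged with a traceback instead of being silently swallowed. This is a sketch of the suggestion, not the experiment's actual code.

```python
import logging

logger = logging.getLogger(__name__)


def export_png(fig, png_path) -> None:
    """Export a Plotly figure to PNG, degrading gracefully when Kaleido is missing."""
    try:
        fig.write_image(str(png_path), width=900, height=600, scale=2)
    except (ValueError, RuntimeError) as exc:
        if "kaleido" in str(exc).lower():
            # Expected: the optional kaleido dependency is not installed.
            print(f"Skipped {png_path.name} (pip install kaleido for PNG export)")
        else:
            # Unexpected: surface it in the logs rather than masking it.
            logger.warning("PNG export failed for %s", png_path, exc_info=True)
            print(f"Skipped {png_path.name} ({exc})")
```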

### Comment 2
<location> `mozoo/experiments/trait_expression/results.py:260` </location>
<code_context>
+                print(f"    ⚠ Skipped {filename} ({e})")
+

 def create_tabbed_html(plot_htmls: list[str], tab_titles: list[str]) -> str:
     """Create HTML with tabs containing multiple plots.
</code_context>

<issue_to_address>
**issue (complexity):** Consider moving the static HTML/CSS for tabbed plots into a separate template file and using a template engine to simplify the Python code.

You can dramatically shrink `create_tabbed_html` by moving the big static HTML/CSS into its own file and using a tiny template engine (e.g. Python’s built-in `string.Template` or Jinja2). Here’s one approach:

1. **Create a file** `templates/results_template.html`:
   ```html
   <!DOCTYPE html>
   <html>
   <head>
     <meta charset="UTF-8">
     <title>${page_title}</title>
     <style>
       body { … }
       .tabs { … }
       /* rest of your CSS */
     </style>
     ${plotly_js}
   </head>
   <body>
     <div class="container">
       <h1>${page_title}</h1>
       <div class="tabs">${tab_buttons}</div>
       <div class="tab-contents">${tab_contents}</div>
     </div>
     <script>
       /* your JS */
     </script>
   </body>
   </html>
   ```

2. **Simplify** `create_tabbed_html` in your `.py`:
   ```python
   from pathlib import Path
   from string import Template

   TEMPLATE_PATH = Path(__file__).parent / "templates" / "results_template.html"

   def create_tabbed_html(plot_htmls, tab_titles):
       tpl = Template(TEMPLATE_PATH.read_text(encoding="utf-8"))
       plotly_js = get_plotlyjs()  # however you load it today

       buttons = "\n".join(
           f'<button class="tab-button" onclick="showTab({i})">{t}</button>'
           for i, t in enumerate(tab_titles)
       )
       contents = "\n".join(
           f'<div id="tab-content-{i}" class="tab-content{" show active" if i==0 else ""}">'
           f'  <div class="plot-wrapper">{html}</div></div>'
           for i, html in enumerate(plot_htmls)
       )

       return tpl.substitute(
           page_title="Trait Expression Experiment Results",
           plotly_js=plotly_js,
           tab_buttons=buttons,
           tab_contents=contents,
       )
   ```

3. **Keep the rest of your code** exactly as-is.  

By extracting static HTML/CSS into `results_template.html`, you:

- Remove the 300+ lines of inline template from Python.
- Keep all formatting intact via `${…}` placeholders.
- Make it trivial to tweak styles or scripts without touching Python logic.
</issue_to_address>

### Comment 3
<location> `mozoo/experiments/trait_expression/evaluate.py:221-224` </location>
<code_context>
async def main() -> None:
    """Evaluate all trained models."""
    print_section("Trait Expression Experiment - Evaluation")

    # Load configuration
    try:
        config_data = load_experiment_config(EXPERIMENT_DIR)
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return

    models = config_data.get("models", [])
    training_config = config_data.get("training", {})
    eval_config = config_data.get("evaluation", {})
    eval_tasks = eval_config.get("tasks", [])

    if not models:
        print(
            """
Error: No models defined in config.yaml
Please add at least one model to the 'models' section.
"""
        )
        return

    if not eval_tasks:
        print(
            """
Error: No evaluation tasks defined in config.yaml
Please add at least one task to the 'evaluation.tasks' section.
"""
        )
        return

    print(
        f"""
Configuration:
  Models to evaluate: {len(models)}
  Evaluation tasks: {len(eval_tasks)}

"""
    )

    # Find all models from cache
    print_subsection("Looking for trained models in cache...")

    models_to_evaluate = []
    models_not_ready = []

    for variant in models:
        model_atom_id, status_message = await find_model_from_cache(
            variant_config=variant,
            training_config=training_config,
        )
        if model_atom_id is None:
            print(f"⚠️  {variant['name']}: {status_message}")
            models_not_ready.append((variant["name"], status_message))
            continue

        model_atom = cast(ModelAtom, ModelAtom.load(model_atom_id))
        model_id = model_atom.get_model_id()

        models_to_evaluate.append(
            {
                "variant": variant,
                "model_atom_id": model_atom_id,
                "model_id": model_id,
            }
        )
        print(f"✓ {variant['name']}: {model_id[:50]}...")

    print()

    if not models_to_evaluate:
        train_script = EXPERIMENT_DIR / "train.py"
        print(f"No trained models found. Please run train.py first:\n  python {train_script}")
        return

    # Summary of what will be evaluated
    print(f"Found {len(models_to_evaluate)}/{len(models)} trained models")

    if models_not_ready:
        print()
        print("⚠️  Models not ready (will be skipped):")
        for name, reason in models_not_ready:
            print(f"  - {name}: {reason}")
        print(
            """
Note: You can run evaluate.py again later to evaluate these models
      once their training completes.
"""
        )

    print("Proceeding with evaluation of available models...")

    # Evaluate all models on all tasks
    print_subsection("Evaluating models...")

    all_results = []
    # Keep track of not-ready models for summary (models_not_ready already defined above)
    for model_info in models_to_evaluate:
        variant = model_info["variant"]
        model_atom_id = model_info["model_atom_id"]
        model_id = model_info["model_id"]

        print(
            f"""
Evaluating: {variant["name"]}
  Model: {model_id[:50]}...
"""
        )

        variant_results = {
            "variant_name": variant["name"],
            "trait": variant.get("trait"),
            "strength": variant.get("strength"),
            "model_atom_id": model_atom_id,
            "model_id": model_id,
            "evaluations": {},
        }

        for task_config in eval_tasks:
            task_name = task_config["name"]
            eval_task = task_config["eval_task"]

            print(f"  Task: {task_name}")

            try:
                eval_atom_id = await evaluate_model_on_task(
                    model_atom_id=model_atom_id,
                    model_id=model_id,
                    eval_task=eval_task,
                    eval_config=eval_config,
                    user="trait-expression-experiment",
                )

                # Load and extract metrics
                eval_atom = cast(EvalAtom, EvalAtom.load(eval_atom_id))
                eval_results_obj = await eval_atom.to_eval_results()

                metrics = {}
                for task_name_inner, task_metrics in eval_results_obj.metrics.items():
                    for metric_name, value in task_metrics.items():
                        if metric_name != "stats":
                            metrics[metric_name] = value

                variant_results["evaluations"][task_name] = {
                    "eval_atom_id": eval_atom_id,
                    "metrics": metrics,
                }

                # Display metrics
                for metric_name, value in metrics.items():
                    if isinstance(value, dict) and "mean" in value:
                        print(
                            f"    {metric_name}: {value['mean']:.3f} ± {value.get('stderr', 0):.3f}"
                        )
                    else:
                        print(f"    {metric_name}: {value}")

            except Exception as e:
                print(f"    ✗ Failed: {e}")
                variant_results["evaluations"][task_name] = {"error": str(e)}

        all_results.append(variant_results)

    print_subsection("✓ Evaluation complete")
    print()

    # Save results
    save_results(all_results, paths.results_file)

    # Display summary
    print_section("Evaluation Summary")
    print()

    if all_results:
        print(f"Evaluated {len(all_results)} model(s):")
        print()
        for result in all_results:
            trait = result.get("trait")
            strength = result.get("strength")
            if trait and strength:
                trait_str = f"{strength} {trait}"
            else:
                trait_str = "N/A"
            print(f"Variant: {result['variant_name']} ({trait_str})")
            print(f"  Model: {result['model_id'][:50]}...")
            for task_name, task_result in result["evaluations"].items():
                if "error" in task_result:
                    print(f"  {task_name}: Error - {task_result['error']}")
                else:
                    metrics = task_result.get("metrics", {})
                    for metric_name, value in metrics.items():
                        if isinstance(value, dict) and "mean" in value:
                            print(
                                f"  {task_name}/{metric_name}: {value['mean']:.3f} ± {value.get('stderr', 0):.3f}"
                            )
                        else:
                            print(f"  {task_name}/{metric_name}: {value}")
            print()

    if models_not_ready:
        print("⚠️  Skipped models (training not complete):")
        for name, reason in models_not_ready:
            print(f"  - {name}: {reason}")

    results_script = EXPERIMENT_DIR / "results.py"
    print(
        f"""
Results saved to: {paths.results_file}

Next step:
  Run: python {results_script}
  This will display results and generate visualization plots.
"""
    )

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
- Low code quality found in main - 8% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

```suggestion
            trait_str = f"{strength} {trait}" if trait and strength else "N/A"
```

<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>
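As a rough illustration of the extraction this comment suggests, the per-variant summary printing could move into small helpers. The sketch below is standalone and only assumes the result-dictionary shape visible in the code context above; it is not the repository's code.

```python
from typing import Any


def format_trait(result: dict[str, Any]) -> str:
    """Return a 'strength trait' label, or 'N/A' when either field is missing."""
    trait, strength = result.get("trait"), result.get("strength")
    return f"{strength} {trait}" if trait and strength else "N/A"


def print_variant_summary(result: dict[str, Any]) -> None:
    """Print one variant's evaluation summary (candidate extraction from main)."""
    print(f"Variant: {result['variant_name']} ({format_trait(result)})")
    print(f"  Model: {result['model_id'][:50]}...")
    for task_name, task_result in result["evaluations"].items():
        if "error" in task_result:
            print(f"  {task_name}: Error - {task_result['error']}")
            continue
        for metric_name, value in task_result.get("metrics", {}).items():
            if isinstance(value, dict) and "mean" in value:
                print(
                    f"  {task_name}/{metric_name}: "
                    f"{value['mean']:.3f} ± {value.get('stderr', 0):.3f}"
                )
            else:
                print(f"  {task_name}/{metric_name}: {value}")
```

main() would then reduce its summary section to a single loop over all_results, which addresses both the length and the nesting the checker flags.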

### Comment 4
<location> `mozoo/experiments/trait_expression/status.py:26` </location>
<code_context>
async def main() -> None:
    """Check status of all training jobs."""
    print_section("Trait Expression Experiment - Training Status")

    # Load configuration
    try:
        config_data = load_experiment_config(EXPERIMENT_DIR)
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return

    models = config_data.get("models", [])
    training_config = config_data.get("training", {})

    if not models:
        print("No models configured in config.yaml")
        return

    print(f"Checking status of {len(models)} model(s)...")
    print()

    # Check status of each model
    statuses = []
    for model_config in models:
        status = await check_training_status(
            variant_config=model_config,
            training_config=training_config,
        )
        statuses.append(status)

    # Display results
    print_section("Training Status")

    # Group by status
    by_status: dict[str, list[dict[str, Any]]] = {}
    for status_info in statuses:
        status = status_info["status"]
        if status not in by_status:
            by_status[status] = []
        by_status[status].append(status_info)

    # Show in-progress first
    in_progress_statuses = ["queued", "running", "validating_files"]
    completed_statuses = ["succeeded", "completed"]
    failed_statuses = ["failed", "cancelled"]

    for status_group in [
        in_progress_statuses,
        completed_statuses,
        failed_statuses,
        ["not_submitted", "error"],
    ]:
        for status_key in status_group:
            if status_key in by_status:
                models_with_status = by_status[status_key]
                print(f"\n{status_key.upper()}:")
                print("-" * 80)
                for info in models_with_status:
                    print(f"  {info['name']}")
                    if info.get("trait") and info.get("strength"):
                        print(f"    Trait: {info['strength']} {info['trait']}")
                    if info.get("model_id"):
                        print(f"    Model: {info['model_id'][:60]}...")
                    elif info.get("job_atom_id"):
                        print(f"    Job ID: {info['job_atom_id'][:60]}...")
                    if info.get("message"):
                        print(f"    Note: {info['message']}")

    # Summary
    print_section("Summary")

    total = len(statuses)
    completed = sum(s["status"] in completed_statuses for s in statuses)
    in_progress = sum(s["status"] in in_progress_statuses for s in statuses)
    failed = sum(s["status"] in failed_statuses for s in statuses)
    other = total - completed - in_progress - failed

    print(f"  Total models: {total}")
    print(f"  ✓ Completed: {completed}")
    print(f"  ⏳ In progress: {in_progress}")
    print(f"  ✗ Failed/Cancelled: {failed}")
    if other > 0:
        print(f"  ⚠️  Other: {other}")

    print()
    if completed < total:
        print("Note: Run evaluate.py once training completes to evaluate models.")
        print("      You can run this script again to check updated status.")
    else:
        print("All models are complete! Run evaluate.py to evaluate them.")

</code_context>

<issue_to_address>
**issue (code-quality):** Low code quality found in main - 21% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>
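In the same spirit, here is a minimal sketch of how status.py's grouping and printing could be split into helpers; it assumes each status entry is a plain dict with the keys shown in the code context above and is not the repository's code.

```python
from collections import defaultdict
from typing import Any


def group_by_status(statuses: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket status entries by their 'status' field."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for info in statuses:
        grouped[info["status"]].append(info)
    return dict(grouped)


def print_status_group(status_key: str, entries: list[dict[str, Any]]) -> None:
    """Print one status bucket (candidate extraction from main)."""
    print(f"\n{status_key.upper()}:")
    print("-" * 80)
    for info in entries:
        print(f"  {info['name']}")
        if info.get("trait") and info.get("strength"):
            print(f"    Trait: {info['strength']} {info['trait']}")
        if info.get("model_id"):
            print(f"    Model: {info['model_id'][:60]}...")
        elif info.get("job_atom_id"):
            print(f"    Job ID: {info['job_atom_id'][:60]}...")
        if info.get("message"):
            print(f"    Note: {info['message']}")
```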


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
mozoo/experiments/trait_expression/results.py (2)

154-154: Consider consistent use of the by parameter.

Line 154 uses by=["strength_sort", "task"] (list), while line 177 uses by="strength_sort" (scalar). While both are valid, using a consistent style (always using lists for the by parameter) improves readability.

Apply this diff to make line 177 consistent:

```diff
         task_data = metric_trait_df[metric_trait_df["task"] == task].sort_values(
-            by="strength_sort"
+            by=["strength_sort"]
         )
```

Also applies to: 177-177


247-277: Consider consolidating similar exception handlers.

The three exception blocks (ValueError/RuntimeError, OSError, Exception) have similar logging and print patterns. Consider consolidating them for better maintainability.

Here's a consolidated approach:

```diff
         try:
             fig.write_image(str(png_path), width=900, height=600, scale=2)
             print(f"    ✓ Saved {filename}")
-        except (ValueError, RuntimeError) as e:
-            # Handle Kaleido-related errors (expected when Kaleido isn't installed)
-            error_str = str(e).lower()
-            if "kaleido" in error_str or "browserdeps" in error_str:
-                print(
-                    f"    ⚠ Skipped {filename} (install kaleido for PNG export: pip install kaleido)"
-                )
-            else:
-                # Unexpected ValueError/RuntimeError - log for debugging
-                logging.warning(
-                    f"Unexpected error exporting {filename} to PNG: {e}",
-                    exc_info=True,
-                )
-                print(f"    ⚠ Skipped {filename} (unexpected error: {e})")
-        except OSError as e:
-            # Handle file system errors (permissions, disk space, etc.)
-            logging.warning(
-                f"File system error exporting {filename} to PNG: {e}",
-                exc_info=True,
-            )
-            print(f"    ⚠ Skipped {filename} (file system error: {e})")
         except Exception as e:
-            # Catch any other unexpected errors and log them properly
-            logging.error(
-                f"Unexpected error exporting {filename} to PNG: {type(e).__name__}: {e}",
-                exc_info=True,
-            )
-            print(f"    ⚠ Skipped {filename} (unexpected error: {type(e).__name__}: {e})")
+            error_str = str(e).lower()
+            # Check if it's an expected Kaleido-related error
+            if isinstance(e, (ValueError, RuntimeError)) and (
+                "kaleido" in error_str or "browserdeps" in error_str
+            ):
+                print(
+                    f"    ⚠ Skipped {filename} (install kaleido for PNG export: pip install kaleido)"
+                )
+            else:
+                # Log unexpected errors for debugging
+                log_level = logging.ERROR if not isinstance(e, (ValueError, RuntimeError, OSError)) else logging.WARNING
+                logging.log(
+                    log_level,
+                    f"Error exporting {filename} to PNG: {type(e).__name__}: {e}",
+                    exc_info=True,
+                )
+                error_type = "file system error" if isinstance(e, OSError) else "unexpected error"
+                print(f"    ⚠ Skipped {filename} ({error_type}: {e})")
```
mozoo/experiments/trait_expression/evaluate.py (1)

219-222: Consider simplifying the trait_str formatting.

The current approach with a conditional expression is fine, but the logic could be slightly cleaner by handling None values explicitly.

Alternative (optional):

```diff
             trait = result.get("trait")
             strength = result.get("strength")
-            trait_str = f"{strength} {trait}" if trait and strength else "N/A"
+            trait_str = f"{strength} {trait}" if (trait and strength) else "N/A"
```

Or more explicitly handle None:

```python
trait = result.get("trait")
strength = result.get("strength")
if trait and strength:
    trait_str = f"{strength} {trait}"
else:
    trait_str = "N/A"
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4381c83 and 8347360.

📒 Files selected for processing (2)
  • mozoo/experiments/trait_expression/evaluate.py (9 hunks)
  • mozoo/experiments/trait_expression/results.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
mozoo/experiments/trait_expression/results.py (4)
mozoo/experiments/utils/paths.py (3)
  • ExperimentPaths (6-40)
  • plots_dir (36-40)
  • results_file (26-28)
mozoo/experiments/utils/config.py (2)
  • get_experiment_dir (10-24)
  • setup_experiment_env (27-43)
mozoo/experiments/utils/results.py (2)
  • load_results (24-45)
  • results_to_dataframe (48-122)
mozoo/experiments/utils/display.py (1)
  • print_section (6-19)
mozoo/experiments/trait_expression/evaluate.py (7)
motools/atom/base.py (2)
  • EvalAtom (777-854)
  • ModelAtom (563-632)
motools/workflows/evaluate_only.py (1)
  • evaluate_model_on_task (75-137)
motools/workflows/train_and_evaluate.py (1)
  • find_model_from_cache (99-238)
mozoo/experiments/utils/paths.py (2)
  • ExperimentPaths (6-40)
  • results_file (26-28)
mozoo/experiments/utils/config.py (3)
  • get_experiment_dir (10-24)
  • load_experiment_config (46-70)
  • setup_experiment_env (27-43)
mozoo/experiments/utils/display.py (2)
  • print_section (6-19)
  • print_subsection (22-34)
mozoo/experiments/utils/results.py (1)
  • save_results (10-21)
🪛 GitHub Actions: CI
mozoo/experiments/trait_expression/results.py

[error] 154-154: no-matching-overload: No overload of bound method sort_values matches arguments

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (7)
mozoo/experiments/trait_expression/results.py (3)

16-33: Well-structured refactoring to use centralized utilities.

The migration to use ExperimentPaths, get_experiment_dir, and setup_experiment_env improves consistency across experiments and reduces code duplication.

Based on learnings


238-246: Good addition of PNG export functionality.

The PNG export feature with graceful fallback when Kaleido is unavailable is a valuable enhancement for documentation and sharing results.


154-154: The CI error flagged on this line appears to be a false positive.

The code at line 154 is correct. At this point, metric_trait_df is definitively a pandas DataFrame (result of groupby(["strength", "task"], as_index=False)["mean_value"].mean()), and both columns "strength_sort" and "task" exist in the DataFrame. The sort_values(by=["strength_sort", "task"]) call is valid pandas syntax and will execute without error. This appears to be a false positive from a type checker with incomplete pandas type stubs; no code changes are needed.

Likely an incorrect or invalid review comment.
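For reference, a tiny standalone check with made-up data showing that pandas accepts both the list and scalar forms of by at runtime:

```python
import pandas as pd

df = pd.DataFrame(
    {"strength_sort": [2, 1, 1], "task": ["b", "a", "b"], "mean_value": [0.3, 0.1, 0.2]}
)

# Both forms sort without error; the static checker's complaint concerns type stubs.
print(df.sort_values(by=["strength_sort", "task"]))
print(df.sort_values(by="strength_sort"))
```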

mozoo/experiments/trait_expression/evaluate.py (4)

16-36: Excellent refactoring to use external workflow utilities.

Delegating model discovery and evaluation to motools.workflows functions eliminates code duplication and centralizes workflow logic. The integration with mozoo.experiments.utils for path management and configuration loading is clean and consistent.

Based on learnings


44-48: Good error handling for missing configuration.

The try-except block properly handles FileNotFoundError when the config file is missing, providing a clear error message to the user.


172-172: Appropriate user identifier for workflow execution.

Using "trait-expression-experiment" as the user parameter is a good practice for tracking and organizing workflow executions by experiment type.


153-154: No action needed—code already handles the concern.

The review comment requests verification that trait and strength fields are present in all model variants or that missing cases are handled. Both conditions are already met:

  1. All 9 models in config.yaml include both fields: Each model (baseline/mild/severe for hallucinating, evil, and sycophantic) defines trait and strength.

  2. Missing fields are gracefully handled: Lines 219–222 extract the fields safely using .get() and apply a conditional fallback (if trait and strength else "N/A"), so the code won't crash if these fields are absent in future configurations.

@dtch1997 merged commit db5a105 into main Nov 13, 2025
6 checks passed