
denniepatton/MiniLab


MiniLab

MiniLab is a multi-agent scientific research assistant that combines autonomous analysis capabilities with collaborative agent workflows. Inspired by CellVoyager for autonomous biological data analysis, VirtualLab for multi-agent scientific collaboration, and modern agentic coding paradigms, MiniLab provides an integrated environment for conducting state-of-the-art research workflows.

Overview

MiniLab creates a team of specialized AI agents that work together to assist researchers with:

  • Literature Synthesis: Comprehensive searches across PubMed and arXiv with critical assessment
  • Data Exploration: Autonomous exploration and characterization of datasets
  • Hypothesis Generation: Multi-agent deliberation to develop and refine research questions
  • Analysis Execution: End-to-end implementation from planning through statistical validation
  • Documentation: Automated report generation with proper citations and figure legends

The system employs a ReAct-style execution loop where agents autonomously use tools, consult colleagues, and iterate toward solutions—while maintaining human oversight at key decision points.
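The reason/act/observe cycle can be sketched as follows. This is a minimal illustration, not MiniLab's `agents/base.py` implementation. It assumes a hypothetical `llm_step` callable that reads the transcript so far and returns either a tool call or a final answer. The `Action` type and `react_loop` function are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """One model decision: call a tool, or finish with an answer (hypothetical type)."""
    thought: str
    tool: Optional[str] = None
    tool_input: str = ""
    final_answer: Optional[str] = None


def react_loop(
    llm_step: Callable[[list[str]], Action],
    tools: dict[str, Callable[[str], str]],
    task: str,
    max_iterations: int = 10,
) -> Optional[str]:
    """Reason -> act -> observe until the model returns a final answer."""
    transcript = [f"Task: {task}"]
    for _ in range(max_iterations):
        action = llm_step(transcript)                 # Reason: model proposes the next step
        transcript.append(f"Thought: {action.thought}")
        if action.final_answer is not None:           # Model decided it is done
            return action.final_answer
        tool = tools.get(action.tool or "")
        if tool is None:
            observation = f"Unknown tool: {action.tool}"
        else:
            observation = tool(action.tool_input)     # Act: run the chosen tool
        transcript.append(f"Observation: {observation}")  # Observe: feed the result back
    return None  # Iteration limit reached without a final answer


# Usage with a scripted stand-in for the LLM
def scripted_llm(transcript: list[str]) -> Action:
    if not any(line.startswith("Observation") for line in transcript):
        return Action(thought="Need the row count", tool="count_rows", tool_input="data.csv")
    return Action(thought="Done", final_answer=transcript[-1].removeprefix("Observation: "))


print(react_loop(scripted_llm, {"count_rows": lambda path: "42 rows"}, "Describe data.csv"))
```

Feeding every observation back into the transcript is what lets the agent correct course between tool calls; the iteration cap is the hook that budget-scaled limits would adjust.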

Key Features

Multi-Agent Architecture

  • Nine specialized agents organized into three guilds (Synthesis, Theory, Implementation)
  • Cross-agent consultation with visible dialogue for transparency
  • Dynamic delegation based on task requirements and agent expertise

Intelligent Budget Management

  • Bayesian learning from historical token usage to improve allocation accuracy
  • Continuous complexity estimation (0.0-1.0 scale) replacing coarse tiers
  • 5% graceful shutdown reserve ensuring clean completion within budget
  • Budget-scaled iterations adjusting agent loop limits based on remaining budget
  • Self-critique checkpoints verifying output quality before committing

Autonomous Execution

  • ReAct-style loops enabling agents to reason, act, and observe iteratively
  • LLM response caching (SQLite-backed) reducing redundant API calls
  • Checkpoint/resume capability for long-running analyses
  • Session orchestration managing full lifecycle from startup to teardown

Flexible Workflow System

  • Six composable mini-workflows that can be combined into larger pipelines
  • Dynamic token allocation based on workflow type and estimated complexity
  • Adaptive modes responding to project requirements

Security and Safety

  • PathGuard access control enforcing read-only data directories and sandboxed outputs
  • Agent-specific permissions limiting tool access by role (defined in team.yaml)
  • Audit logging for all file operations

User Experience

  • Narrative-style communication from the orchestrator (Bohr)
  • Visible agent consultations showing inter-agent dialogue
  • Graceful interruption with progress saving via Ctrl+C
  • Comprehensive transcripts capturing all session activity

Architecture

MiniLab/
├── agents/                    # Agent system
│   ├── base.py               # Agent with ReAct loop, self-critique, budget-scaled iterations
│   ├── prompts.py            # Structured 5-part prompting schema
│   └── registry.py           # Agent creation and colleague relationships
├── config/                    # Configuration (separated by concern)
│   ├── agents.yaml           # Agent personas, communication style, operating principles
│   ├── team.yaml             # Security: tools, write permissions, shell access
│   ├── budgets.yaml          # Token budgets per workflow type
│   ├── loader.py             # YAML configuration loader for agents.yaml
│   ├── team_loader.py        # YAML configuration loader for team.yaml
│   ├── budget_manager.py     # Dynamic budget allocation with continuous complexity
│   └── budget_history.py     # Bayesian learning from historical token usage
├── context/                   # RAG-based context management
│   ├── context_manager.py    # Context orchestration with token budgets
│   ├── embeddings.py         # Sentence-transformers integration
│   ├── vector_store.py       # FAISS vector store for retrieval
│   └── state_objects.py      # ProjectState, TaskState definitions
├── core/                      # Core infrastructure
│   ├── token_account.py      # Centralized token tracking with threshold warnings
│   ├── token_context.py      # Token context for budget awareness
│   └── project_writer.py     # Centralized output management
├── llm_backends/             # LLM integrations
│   ├── anthropic_backend.py  # Claude API with prompt caching and cache integration
│   ├── openai_backend.py     # OpenAI API support
│   ├── cache.py              # SQLite-backed LLM response caching
│   └── base.py               # Abstract backend interface
├── orchestrators/
│   ├── bohr_orchestrator.py  # Workflow coordination, session management
│   └── session_orchestrator.py # Session lifecycle management
├── security/
│   └── path_guard.py         # File access control and audit logging
├── storage/
│   ├── state_store.py        # Persistent state management
│   └── transcript.py         # Session transcript logging
├── tools/                    # Typed tool system
│   ├── base.py               # Tool, ToolInput, ToolOutput base classes
│   ├── filesystem.py         # File read/write/list operations
│   ├── code_editor.py        # Code creation and editing
│   ├── terminal.py           # Shell command execution
│   ├── environment.py        # Package management
│   ├── web_search.py         # Tavily web search integration
│   ├── pubmed.py             # NCBI E-utilities for literature
│   ├── arxiv.py              # arXiv paper search
│   ├── citation.py           # Bibliography management
│   ├── user_input.py         # User interaction tool
│   └── tool_factory.py       # Agent-specific tool instantiation
├── utils/
│   ├── __init__.py           # Console formatting, spinners
│   └── timing.py             # Performance timing utilities
└── workflows/                # Modular workflow components
    ├── base.py               # WorkflowModule abstract base class
    ├── consultation.py       # User goal clarification (Bohr)
    ├── literature_review.py  # Background research (Gould)
    ├── planning_committee.py # Multi-agent deliberation
    ├── execute_analysis.py   # Implementation loop (Dayhoff→Hinton→Bayes)
    ├── writeup_results.py    # Documentation (Gould)
    └── critical_review.py    # Quality assessment (Farber)

Agent Team

All agents use Claude Sonnet 4 via the Anthropic API with structured role-specific prompting:

| Agent | Guild | Role | Specialty |
|---|---|---|---|
| Bohr | Synthesis | Project Manager | Orchestration, user interaction, workflow selection |
| Gould | Synthesis | Librarian Writer | Literature review, citations, scientific writing |
| Farber | Synthesis | Clinician Critic | Critical review, clinical relevance, quality control |
| Feynman | Theory | Curious Physicist | Creative problem-solving, analogies, naive questions |
| Shannon | Theory | Information Theorist | Experimental design, methodology, analytical rigor |
| Greider | Theory | Molecular Biologist | Biological mechanisms, pathway interpretation |
| Dayhoff | Implementation | Bioinformatician | Workflow design, data pipelines, execution planning |
| Hinton | Implementation | CS Engineer | Code development, debugging, script execution |
| Bayes | Implementation | Statistician | Statistical validation, uncertainty quantification |

Installation

Prerequisites

  • macOS or Linux
  • Python 3.11 or higher
  • micromamba, conda, or mamba for environment management
  • Anthropic API key (required)
  • Tavily API key (optional, for web search)

Setup

# Clone repository
git clone https://github.com/denniepatton/MiniLab.git
cd MiniLab

# Create environment
micromamba env create -f environment.yml
micromamba activate minilab

# Install in development mode
pip install -e .

# Configure environment variables
cp example.env .env
# Edit .env with your API keys

# Verify installation
python -c "from MiniLab import run_minilab; print('MiniLab ready')"

Environment Variables

# Required
ANTHROPIC_API_KEY=sk-ant-...

# Optional - Web Search
TAVILY_API_KEY=tvly-...

# Optional - PubMed (higher rate limits)
NCBI_EMAIL=your@email.com
NCBI_API_KEY=...

# Optional - Timing/Debug
MINILAB_TIMING=1  # Enable timing reports

Usage

Command Line Interface

# Start a new analysis project
python scripts/minilab.py "Analyze the Pluvicto genomic data for treatment response predictors"

# Quick literature review
python scripts/minilab.py "What is the state of the art in cfDNA methylation analysis?"

# Resume an existing project
python scripts/minilab.py --resume Sandbox/pluvicto_analysis

# List existing projects
python scripts/minilab.py --list-projects

# Enable performance timing
python scripts/minilab.py --timing

Python API

import asyncio
from MiniLab import run_minilab

async def main():
    results = await run_minilab(
        request="Analyze genomic features predictive of Pluvicto response",
        project_name="pluvicto_analysis",
    )
    print(results["final_summary"])

asyncio.run(main())

Interactive Session

During execution, you can interrupt with Ctrl+C to access options:

  1. Provide guidance - Give direction to the current workflow
  2. Skip to next phase - Move past the current workflow step
  3. Save and exit - Preserve progress for later resumption
  4. Continue - Cancel the interrupt and proceed

Budget Management System

MiniLab v0.4.0 introduces an intelligent budget management system that learns from historical usage:

Bayesian Budget Learning

The system maintains a history of token usage across workflows and uses Bayesian estimation to improve future allocations:

# Budget history tracks actual vs. allocated tokens per workflow
# Over time, allocations converge to realistic requirements
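One way such learning could work is a conjugate Normal-Normal update per workflow type. The sketch below is an assumption for illustration; `budget_history.py` may use a different model. It treats each run's token usage as Normal with a known spread `obs_sd`, starts from a prior (for example the default in `budgets.yaml`), and allocates the posterior predictive mean plus `z` predictive standard deviations. `WorkflowBudgetBelief` is a hypothetical name.

```python
import math
from dataclasses import dataclass, field


@dataclass
class WorkflowBudgetBelief:
    """Normal-Normal conjugate belief about mean token usage for one workflow.

    Assumes each run's usage ~ Normal(mu, obs_sd**2) with obs_sd known, and
    mu ~ Normal(prior_mean, prior_sd**2). Hypothetical, for illustration.
    """
    prior_mean: float
    prior_sd: float
    obs_sd: float
    observations: list[float] = field(default_factory=list)

    def record(self, actual_tokens: float) -> None:
        """Store the tokens a finished run actually used."""
        self.observations.append(actual_tokens)

    def posterior(self) -> tuple[float, float]:
        """Return (posterior mean, posterior sd) of the mean usage mu."""
        prior_precision = 1.0 / self.prior_sd**2
        data_precision = len(self.observations) / self.obs_sd**2
        precision = prior_precision + data_precision
        mean = (self.prior_mean * prior_precision
                + sum(self.observations) / self.obs_sd**2) / precision
        return mean, math.sqrt(1.0 / precision)

    def allocation(self, z: float = 1.0) -> int:
        """Allocate the posterior predictive mean plus z predictive standard deviations."""
        mean, sd = self.posterior()
        predictive_sd = math.sqrt(sd**2 + self.obs_sd**2)
        return round(mean + z * predictive_sd)


belief = WorkflowBudgetBelief(prior_mean=100_000, prior_sd=50_000, obs_sd=20_000)
for used in (60_000, 65_000, 58_000):
    belief.record(used)
# The posterior mean moves from the 100k prior toward the ~61k observed average
print(belief.posterior(), belief.allocation())
```

With few observations the prior dominates; as runs accumulate, the data precision term grows and allocations converge toward observed usage, which matches the behavior described above.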

Continuous Complexity Estimation

Instead of coarse "Quick/Thorough/Comprehensive" tiers, complexity is now estimated as a continuous value (0.0-1.0):

| Complexity | Description | Typical Use Case |
|---|---|---|
| 0.0-0.3 | Simple | Quick questions, brainstorming |
| 0.3-0.6 | Moderate | Standard analyses, literature reviews |
| 0.6-0.8 | Complex | Multi-modal data, extensive pipelines |
| 0.8-1.0 | Very Complex | Deep research, comprehensive analyses |
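A continuous score can be mapped back onto these bands and used to scale a token budget. The sketch below assumes half-open intervals (so 0.3 counts as Moderate) and a simple linear interpolation between a workflow's minimum and maximum budget. Both choices, and the function names, are illustrative assumptions rather than what `budget_manager.py` does.

```python
def _validate(complexity: float) -> None:
    if not 0.0 <= complexity <= 1.0:
        raise ValueError(f"complexity must be in [0.0, 1.0], got {complexity}")


def complexity_label(complexity: float) -> str:
    """Map a 0.0-1.0 score to the bands in the table (half-open intervals assumed)."""
    _validate(complexity)
    if complexity < 0.3:
        return "Simple"
    if complexity < 0.6:
        return "Moderate"
    if complexity < 0.8:
        return "Complex"
    return "Very Complex"


def scaled_budget(complexity: float, min_tokens: int, max_tokens: int) -> int:
    """Linearly interpolate between a workflow's minimum and maximum token budget."""
    _validate(complexity)
    return round(min_tokens + complexity * (max_tokens - min_tokens))


print(complexity_label(0.45), scaled_budget(0.5, 100_000, 300_000))
```

A linear ramp is the simplest monotone choice; a convex curve would reserve more tokens for the Very Complex band if that proved necessary.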

Budget Safeguards

  • 5% graceful shutdown reserve: Always retains budget for clean completion
  • Budget-scaled iterations: Agent loop limits adjust based on remaining budget
  • Self-critique checkpoints: Agents verify output quality before committing
  • Hard enforcement: BudgetExceededError prevents runaway token usage
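These safeguards can be combined in a single account object. In the sketch below, `BudgetExceededError` mirrors the name used above, and the 60/80/95% warning thresholds come from the v0.3.1 changelog. The `TokenAccount` shape, the `shutdown` flag, and `scaled_max_iterations` are hypothetical and do not reproduce `core/token_account.py`. Normal work may not touch the 5% reserve; only shutdown steps may spend it.

```python
class BudgetExceededError(RuntimeError):
    """Raised when a charge would exceed the available token budget."""


class TokenAccount:
    WARN_PERCENTS = (60, 80, 95)  # warning thresholds, in percent of the total budget

    def __init__(self, total_tokens: int, reserve_fraction: float = 0.05):
        self.total = total_tokens
        self.reserve = int(total_tokens * reserve_fraction)  # graceful-shutdown reserve
        self.used = 0
        self.warned: set[int] = set()

    @property
    def remaining(self) -> int:
        return self.total - self.used

    @property
    def working_remaining(self) -> int:
        """Tokens available to normal work, excluding the shutdown reserve."""
        return max(0, self.remaining - self.reserve)

    def charge(self, tokens: int, *, shutdown: bool = False) -> list[int]:
        """Record usage and return any warning thresholds crossed for the first time."""
        limit = self.remaining if shutdown else self.working_remaining
        if tokens > limit:
            raise BudgetExceededError(f"{tokens} tokens requested, {limit} available")
        self.used += tokens
        crossed = [p for p in self.WARN_PERCENTS
                   if self.used * 100 >= p * self.total and p not in self.warned]
        self.warned.update(crossed)
        return crossed


def scaled_max_iterations(account: TokenAccount, base_iterations: int = 20,
                          min_iterations: int = 2) -> int:
    """Shrink the agent loop limit in proportion to the remaining working budget."""
    working_total = account.total - account.reserve
    fraction = account.working_remaining / working_total if working_total else 0.0
    return max(min_iterations, round(base_iterations * fraction))


account = TokenAccount(100_000)
print(account.charge(61_000), scaled_max_iterations(account))
```

Integer percent comparisons avoid floating-point edge cases right at a threshold.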

LLM Response Caching

SQLite-backed caching reduces redundant API calls:

  • 24-hour TTL for cached responses
  • Automatic cache invalidation
  • Transparent integration with LLM backends
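A minimal SQLite response cache with a 24-hour TTL could look like the following. It uses only the standard `sqlite3`, `hashlib`, and `json` modules. The table schema, the SHA-256 key over model name plus messages, and the `ResponseCache` name are assumptions, not the schema of `llm_backends/cache.py`. The optional `now` argument exists only to make expiry testable.

```python
import hashlib
import json
import sqlite3
import time
from typing import Optional


class ResponseCache:
    """SQLite-backed LLM response cache with time-based expiry (illustrative sketch)."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 24 * 60 * 60):
        self.conn = sqlite3.connect(path)
        self.ttl = ttl_seconds
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "key TEXT PRIMARY KEY, response TEXT NOT NULL, created_at REAL NOT NULL)"
        )

    @staticmethod
    def make_key(model: str, messages: list[dict]) -> str:
        """Hash the exact request so identical prompts map to the same row."""
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, messages: list[dict], now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        key = self.make_key(model, messages)
        row = self.conn.execute(
            "SELECT response, created_at FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        response, created_at = row
        if now - created_at > self.ttl:  # Expired: delete the stale entry and report a miss
            with self.conn:
                self.conn.execute("DELETE FROM responses WHERE key = ?", (key,))
            return None
        return response

    def put(self, model: str, messages: list[dict], response: str,
            now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        with self.conn:  # Commits the insert as one transaction
            self.conn.execute(
                "INSERT OR REPLACE INTO responses (key, response, created_at) VALUES (?, ?, ?)",
                (self.make_key(model, messages), response, now),
            )
```

A backend would call `get` before each API request and `put` after a successful response, which keeps the cache transparent to agents.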

Workflows

Major Workflows

| Workflow | Description | Complexity Guidance |
|---|---|---|
| brainstorming | Explore ideas and hypotheses | Low (0.2-0.4) |
| literature_review | Background research and synthesis | Moderate (0.4-0.6) |
| start_project | Full analysis pipeline | High (0.6-0.8) |
| explore_dataset | Data characterization and EDA | Moderate (0.4-0.6) |

Mini-Workflow Modules

  1. Consultation - User discussion, goal clarification, complexity estimation
  2. Literature Review - PubMed/arXiv search with critical assessment
  3. Planning Committee - Multi-agent deliberation on methodology
  4. Execute Analysis - Dayhoff→Hinton→Bayes implementation loop
  5. Write-up Results - Documentation and report generation
  6. Critical Review - Quality assessment and recommendations
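Composable modules like these can be chained by passing a shared state through each step in order. The sketch below is an assumption about the pattern only. `WorkflowStep`, the two toy steps, and `run_pipeline` are hypothetical, and they do not reproduce the interface of MiniLab's `WorkflowModule` in `workflows/base.py`. It is async to match the `await run_minilab(...)` API shown earlier.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any

State = dict[str, Any]


class WorkflowStep(ABC):
    """Hypothetical mini-workflow: takes the project state, returns an updated copy."""
    name: str = "step"

    @abstractmethod
    async def run(self, state: State) -> State: ...


class Consultation(WorkflowStep):
    name = "consultation"

    async def run(self, state: State) -> State:
        # Toy stand-in for goal clarification and complexity estimation
        return {**state, "goal": state["request"].strip(), "complexity": 0.5}


class LiteratureReview(WorkflowStep):
    name = "literature_review"

    async def run(self, state: State) -> State:
        # Toy stand-in for PubMed/arXiv searching
        return {**state, "references": [f"search: {state['goal']}"]}


async def run_pipeline(steps: list[WorkflowStep], state: State) -> State:
    """Run steps in order, recording which ones completed (useful for resumption)."""
    for step in steps:
        state = await step.run(state)
        state = {**state, "completed": [*state.get("completed", []), step.name]}
    return state


result = asyncio.run(run_pipeline([Consultation(), LiteratureReview()],
                                  {"request": "  cfDNA methylation  "}))
print(result["completed"])
```

Returning a new state dictionary from each step, rather than mutating shared objects, keeps steps independent, so a larger pipeline such as start_project can be assembled from the same parts as a single literature_review run.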

Configuration Files

MiniLab uses three main configuration files, each with a distinct purpose:

| File | Purpose | Loaded By |
|---|---|---|
| config/agents.yaml | Agent personas, communication style, operating principles | loader.py |
| config/team.yaml | Security: tools, write permissions, shell access | team_loader.py |
| config/budgets.yaml | Token budgets per workflow type | budget_manager.py |

This separation of concerns allows:

  • Personas to be tuned independently of security
  • Security policies to be enforced uniformly
  • Budgets to be adjusted without touching agent definitions

Security Model

MiniLab enforces strict file access control via PathGuard and team.yaml:

| Directory | Access | Purpose |
|---|---|---|
| ReadData/ | Read-only | Protected input data |
| Sandbox/ | Read-write | Project outputs and intermediate files |
| Other paths | Blocked | No access outside workspace |
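The directory rules in this table can be enforced by resolving each requested path before comparing it with the allowed roots. Resolving first collapses `..` segments and follows symlinks, which is what defeats path traversal. The sketch below is a hypothetical simplification of `security/path_guard.py`: the `SimplePathGuard` and `AccessDenied` names and the in-memory audit log are assumptions. It relies on `Path.is_relative_to`, which requires Python 3.9 or later.

```python
from pathlib import Path


class AccessDenied(PermissionError):
    """Raised when an agent requests a path or mode it may not use."""


class SimplePathGuard:
    def __init__(self, workspace: str):
        self.root = Path(workspace).resolve()
        self.read_only = self.root / "ReadData"
        self.read_write = self.root / "Sandbox"
        self.audit_log: list[tuple[str, str, bool]] = []  # (mode, resolved path, allowed)

    def _resolve(self, path: str) -> Path:
        candidate = Path(path)
        if not candidate.is_absolute():
            candidate = self.root / candidate  # Relative paths are anchored at the workspace
        return candidate.resolve()  # Collapses ".." and follows symlinks

    def allows(self, path: str, mode: str) -> bool:
        """Return whether `mode` ('r' or 'w') is permitted on `path`, and audit the request."""
        if mode not in ("r", "w"):
            raise ValueError(f"mode must be 'r' or 'w', got {mode!r}")
        resolved = self._resolve(path)
        if resolved.is_relative_to(self.read_write):
            allowed = True
        elif resolved.is_relative_to(self.read_only):
            allowed = mode == "r"
        else:
            allowed = False  # Everything outside ReadData/ and Sandbox/ is blocked
        self.audit_log.append((mode, str(resolved), allowed))
        return allowed

    def check(self, path: str, mode: str) -> Path:
        """Like `allows`, but raise AccessDenied instead of returning False."""
        if not self.allows(path, mode):
            raise AccessDenied(f"{mode!r} access denied: {path}")
        return self._resolve(path)
```

For example, `Sandbox/../../etc/passwd` resolves to a location outside the workspace and is refused even though it begins with `Sandbox/`.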

Agent-Specific Permissions (from team.yaml)

| Agent | Shell Access | Writable Extensions |
|---|---|---|
| Hinton | | All |
| Bayes | | .py, .r, .R, .md, .txt, .json, .csv |
| Gould | | .md, .txt, .bib, .json, .yaml, .yml |
| Others | | .md, .txt, .json |
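The writable-extension column can be checked with a per-agent allowlist. The dictionary below mirrors this table, but its structure is a hypothetical stand-in for whatever schema `team.yaml` actually uses. Matching is case-sensitive, which is consistent with the table listing `.r` and `.R` separately.

```python
from pathlib import Path
from typing import Optional

# Mirrors the table above; None means any extension. Hypothetical structure, not team.yaml's schema.
WRITABLE_EXTENSIONS: dict[str, Optional[frozenset[str]]] = {
    "hinton": None,
    "bayes": frozenset({".py", ".r", ".R", ".md", ".txt", ".json", ".csv"}),
    "gould": frozenset({".md", ".txt", ".bib", ".json", ".yaml", ".yml"}),
}
DEFAULT_EXTENSIONS = frozenset({".md", ".txt", ".json"})  # "Others" row


def can_write(agent: str, path: str) -> bool:
    """Return whether `agent` may write a file with this path's extension."""
    allowed = WRITABLE_EXTENSIONS.get(agent.lower(), DEFAULT_EXTENSIONS)
    if allowed is None:
        return True
    return Path(path).suffix in allowed  # Case-sensitive suffix match


print(can_write("Bayes", "model.R"), can_write("Feynman", "script.py"))
```

In practice this check would run after the directory check, so an allowed extension never grants access outside `Sandbox/`.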

Additional protections:

  • Path traversal attacks are blocked
  • Comprehensive audit logging

Project Output Structure

All outputs are organized within Sandbox/{project_name}/:

{project_name}/
├── project_specification.md    # Goals and scope from consultation
├── data_manifest.md           # Summary of input data
├── session.json               # Session state for resumption
├── literature/
│   ├── references.md          # Bibliography
│   └── literature_summary.md  # Narrative synthesis
├── planning/
│   ├── analysis_plan.md       # Detailed analysis plan
│   └── decision_rationale.md  # Planning decisions
├── analysis/
│   ├── exploratory/          # EDA scripts and outputs
│   └── modeling/             # Statistical models
├── figures/                   # Generated visualizations
├── outputs/
│   ├── summary_report.md     # Final findings
│   └── tables/               # Result tables
├── checkpoints/              # Workflow state for resumption
└── logs/                     # Execution logs
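Resumption depends on checkpoint files that are never left half-written. A common approach writes to a temporary file in the same directory and then atomically swaps it in with `os.replace`. The sketch below assumes one JSON file per workflow under `checkpoints/`. The file naming and the `save_checkpoint`/`load_checkpoint` helpers are hypothetical and are not MiniLab's actual checkpoint format.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional


def save_checkpoint(project_dir: str, workflow: str, state: dict[str, Any]) -> Path:
    """Atomically write checkpoints/<workflow>.json (hypothetical layout)."""
    ckpt_dir = Path(project_dir) / "checkpoints"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    target = ckpt_dir / f"{workflow}.json"
    fd, tmp = tempfile.mkstemp(dir=ckpt_dir, suffix=".tmp")  # Same filesystem as the target
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
        os.replace(tmp, target)  # Atomic rename: readers see the old or the new file, never a partial one
    except BaseException:
        os.unlink(tmp)  # Clean up the temporary file if writing or renaming failed
        raise
    return target


def load_checkpoint(project_dir: str, workflow: str) -> Optional[dict[str, Any]]:
    """Return the saved state for `workflow`, or None if no checkpoint exists."""
    target = Path(project_dir) / "checkpoints" / f"{workflow}.json"
    if not target.exists():
        return None
    return json.loads(target.read_text(encoding="utf-8"))
```

Because an interrupted write leaves the previous checkpoint untouched, a Ctrl+C during saving cannot corrupt the state that `--resume` later reads.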

Development

Running Tests

# Run all tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=MiniLab --cov-report=html

Import Verification

from MiniLab import (
    run_minilab,
    BohrOrchestrator,
    PathGuard,
    Agent,
    WorkflowModule,
    console,
)
print("All imports successful")

Best Practices

  1. Trust the agents - Allow the ReAct loop to iterate; avoid micromanaging
  2. Prepare your data - Ensure data files exist in ReadData/ before starting
  3. Use descriptive project names - Facilitates organization and resumption
  4. Start with exploration - Use brainstorming or literature_review to understand scope
  5. Review transcripts - Stored in Transcripts/ for debugging and auditing
  6. Let budgets adapt - The Bayesian system improves with use; trust its estimates

Limitations

  • Agents may hallucinate facts or results when their claims are not grounded in tool outputs
  • Long-running computations may require timeout adjustments
  • API costs accumulate with complex, multi-phase analyses
  • Requires active API keys for full functionality
  • Currently optimized for biomedical and computational biology research

Data Security Notice

MiniLab sends data to external APIs (Anthropic, Tavily, NCBI). Users should not process protected health information (PHI) without:

  • Institutional Review Board (IRB) approval
  • Business Associate Agreement (BAA) with API providers
  • Appropriate de-identification procedures

License

MIT License - see LICENSE for details.

Acknowledgments

MiniLab is inspired by and builds upon ideas from:

  • CellVoyager - Autonomous biological data analysis
  • VirtualLab - Multi-agent scientific collaboration
  • Modern agentic coding assistants and ReAct-style agent architectures

Changelog

Version 0.4.0 (December 2025)

  • Bayesian Budget Learning: Historical token usage now informs future allocations via BudgetHistory
  • Continuous Complexity Estimation: Replaced coarse Quick/Thorough/Comprehensive tiers with 0.0-1.0 scale
  • LLM Response Caching: SQLite-backed cache with 24h TTL reduces redundant API calls
  • Session Orchestrator: New SessionOrchestrator manages full session lifecycle
  • Self-Critique Checkpoints: Agents verify output quality before committing (CellVoyager pattern)
  • Budget-Scaled Iterations: Agent loop limits adapt based on remaining budget (VS Code pattern)
  • 5% Graceful Shutdown Reserve: Ensures clean completion within budget
  • Codebase Cleanup: Removed unused runtime/ and evaluation/ modules

Version 0.3.3 (December 2025)

  • Minor bug fixes and stability improvements

Version 0.3.2 (December 2025)

  • Intelligent Budget Allocation: Bohr reserves a 10% buffer for graceful completion and never exceeds the budget
  • Contextual Autonomy: Natural language user preferences flow through to agent tools (no hardcoded levels)
  • Budget Typo Handling: Fixes common typos like "200l" → "200k", warns on ambiguous input
  • Hard Budget Enforcement: BudgetExceededError exception and agent ReAct loop budget checks
  • User Preference Propagation: Consultation captures "best judgment"/"without consulting" preferences
  • Auto-proceed in Autonomous Mode: user_input tool respects user's autonomy preferences
  • Graceful Completion: Always finishes cleanly, skips to writeup when budget is low

Version 0.3.1 (December 2025)

  • TokenAccount: Real-time token budget tracking with warnings at 60/80/95% thresholds
  • ProjectWriter: Centralized output management preventing duplicate files
  • Complete Transcript System: Full lab notebook capturing all agent conversations, reasoning, and tool use
  • Date Injection: Current session date injected into all agent prompts (fixes date hallucination)
  • Conditional data_manifest.md: Only created when data files are present
  • Single session_summary.md: Prevented duplicate file creation by agents
  • Output Guidelines: Agents instructed not to create redundant files (executive_summary.md, etc.)
  • Budget warnings displayed to agents as they approach token limits

Version 0.3.0 (December 2025)

  • Redesigned token budget system with Quick/Thorough/Comprehensive tiers and custom input
  • Narrative-style orchestrator communication
  • Visible cross-agent consultations
  • Tiered literature review (Quick 3-step vs. Comprehensive 7-step)
  • Immediate graceful exit with agent interruption propagation
  • Consolidated output file structure (single living documents)
  • Enhanced transcript system as single source of truth
  • Agent signature guidelines ("MiniLab Agent [Name]")
  • Timestamp utilities to prevent date hallucination
  • Post-consultation summary showing confirmed scope and budget

Version 0.2.0 (December 2025)

  • Complete architecture refactor
  • PathGuard security system
  • Structured 5-part agent prompting
  • RAG context management with FAISS
  • Modular workflow system
  • Tavily web search integration
  • PubMed and arXiv literature tools
  • Bohr orchestrator for workflow coordination
  • Console utilities for styled output
  • Prompt caching for cost reduction

About

My own team of scientific agents
