MiniLab is a multi-agent scientific research assistant that combines autonomous analysis capabilities with collaborative agent workflows. Inspired by CellVoyager for autonomous biological data analysis, VirtualLab for multi-agent scientific collaboration, and modern agentic coding paradigms, MiniLab provides an integrated environment for conducting state-of-the-art research workflows.
MiniLab creates a team of specialized AI agents that work together to assist researchers with:
- Literature Synthesis: Comprehensive searches across PubMed and arXiv with critical assessment
- Data Exploration: Autonomous exploration and characterization of datasets
- Hypothesis Generation: Multi-agent deliberation to develop and refine research questions
- Analysis Execution: End-to-end implementation from planning through statistical validation
- Documentation: Automated report generation with proper citations and figure legends
The system employs a ReAct-style execution loop where agents autonomously use tools, consult colleagues, and iterate toward solutions, while maintaining human oversight at key decision points.
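The reason, act, observe cycle described above can be sketched roughly as follows. This is a minimal illustration, not MiniLab's implementation: `react_loop`, `toy_policy`, and the `count_rows` tool are hypothetical names, and the policy function stands in for an LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical step shape: (thought, tool_name, tool_argument).
# A tool_name of "finish" ends the loop. None of these names come from MiniLab.
Step = Tuple[str, str, str]


def react_loop(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> List[str]:
    """Run reason -> act -> observe until the policy finishes or the limit is hit."""
    history: List[str] = []
    for _ in range(max_iterations):
        thought, tool_name, tool_arg = policy(history)  # reason
        history.append(f"thought: {thought}")
        if tool_name == "finish":
            history.append(f"answer: {tool_arg}")
            break
        observation = tools[tool_name](tool_arg)  # act
        history.append(f"observation: {observation}")  # observe
    return history


def toy_policy(history: List[str]) -> Step:
    """Stand-in for an LLM: look something up once, then finish with the result."""
    if not any(entry.startswith("observation:") for entry in history):
        return ("I need the row count", "count_rows", "data.csv")
    return ("I have what I need", "finish", history[-1].split(": ", 1)[1])


trace = react_loop(toy_policy, {"count_rows": lambda path: f"{path} has 3 rows"})
for entry in trace:
    print(entry)
```

The `max_iterations` cap is the hook where a budget-aware system could shorten or lengthen the loop.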
- Nine specialized agents organized into three guilds (Synthesis, Theory, Implementation)
- Cross-agent consultation with visible dialogue for transparency
- Dynamic delegation based on task requirements and agent expertise
- Bayesian learning from historical token usage to improve allocation accuracy
- Continuous complexity estimation (0.0-1.0 scale) replacing coarse tiers
- 5% graceful shutdown reserve ensuring clean completion within budget
- Budget-scaled iterations adjusting agent loop limits based on remaining budget
- Self-critique checkpoints verifying output quality before committing
- ReAct-style loops enabling agents to reason, act, and observe iteratively
- LLM response caching (SQLite-backed) reducing redundant API calls
- Checkpoint/resume capability for long-running analyses
- Session orchestration managing full lifecycle from startup to teardown
- Six composable mini-workflows that can be combined into larger pipelines
- Dynamic token allocation based on workflow type and estimated complexity
- Adaptive modes responding to project requirements
- PathGuard access control enforcing read-only data directories and sandboxed outputs
- Agent-specific permissions limiting tool access by role (defined in `team.yaml`)
- Audit logging for all file operations
- Narrative-style communication from the orchestrator (Bohr)
- Visible agent consultations showing inter-agent dialogue
- Graceful interruption with progress saving via Ctrl+C
- Comprehensive transcripts capturing all session activity
MiniLab/
├── agents/ # Agent system
│ ├── base.py # Agent with ReAct loop, self-critique, budget-scaled iterations
│ ├── prompts.py # Structured 5-part prompting schema
│ └── registry.py # Agent creation and colleague relationships
├── config/ # Configuration (separated by concern)
│ ├── agents.yaml # Agent personas, communication style, operating principles
│ ├── team.yaml # Security: tools, write permissions, shell access
│ ├── budgets.yaml # Token budgets per workflow type
│ ├── loader.py # YAML configuration loader for agents.yaml
│ ├── team_loader.py # YAML configuration loader for team.yaml
│ ├── budget_manager.py # Dynamic budget allocation with continuous complexity
│ └── budget_history.py # Bayesian learning from historical token usage
├── context/ # RAG-based context management
│ ├── context_manager.py # Context orchestration with token budgets
│ ├── embeddings.py # Sentence-transformers integration
│ ├── vector_store.py # FAISS vector store for retrieval
│ └── state_objects.py # ProjectState, TaskState definitions
├── core/ # Core infrastructure
│ ├── token_account.py # Centralized token tracking with threshold warnings
│ ├── token_context.py # Token context for budget awareness
│ └── project_writer.py # Centralized output management
├── llm_backends/ # LLM integrations
│ ├── anthropic_backend.py # Claude API with prompt caching and cache integration
│ ├── openai_backend.py # OpenAI API support
│ ├── cache.py # SQLite-backed LLM response caching
│ └── base.py # Abstract backend interface
├── orchestrators/
│ ├── bohr_orchestrator.py # Workflow coordination, session management
│ └── session_orchestrator.py # Session lifecycle management
├── security/
│ └── path_guard.py # File access control and audit logging
├── storage/
│ ├── state_store.py # Persistent state management
│ └── transcript.py # Session transcript logging
├── tools/ # Typed tool system
│ ├── base.py # Tool, ToolInput, ToolOutput base classes
│ ├── filesystem.py # File read/write/list operations
│ ├── code_editor.py # Code creation and editing
│ ├── terminal.py # Shell command execution
│ ├── environment.py # Package management
│ ├── web_search.py # Tavily web search integration
│ ├── pubmed.py # NCBI E-utilities for literature
│ ├── arxiv.py # arXiv paper search
│ ├── citation.py # Bibliography management
│ ├── user_input.py # User interaction tool
│ └── tool_factory.py # Agent-specific tool instantiation
├── utils/
│ ├── __init__.py # Console formatting, spinners
│ └── timing.py # Performance timing utilities
└── workflows/ # Modular workflow components
    ├── base.py # WorkflowModule abstract base class
    ├── consultation.py # User goal clarification (Bohr)
    ├── literature_review.py # Background research (Gould)
    ├── planning_committee.py # Multi-agent deliberation
    ├── execute_analysis.py # Implementation loop (Dayhoff→Hinton→Bayes)
    ├── writeup_results.py # Documentation (Gould)
    └── critical_review.py # Quality assessment (Farber)
All agents use Claude Sonnet 4 via the Anthropic API with structured role-specific prompting:
| Agent | Guild | Role | Specialty |
|---|---|---|---|
| Bohr | Synthesis | Project Manager | Orchestration, user interaction, workflow selection |
| Gould | Synthesis | Librarian Writer | Literature review, citations, scientific writing |
| Farber | Synthesis | Clinician Critic | Critical review, clinical relevance, quality control |
| Feynman | Theory | Curious Physicist | Creative problem-solving, analogies, naive questions |
| Shannon | Theory | Information Theorist | Experimental design, methodology, analytical rigor |
| Greider | Theory | Molecular Biologist | Biological mechanisms, pathway interpretation |
| Dayhoff | Implementation | Bioinformatician | Workflow design, data pipelines, execution planning |
| Hinton | Implementation | CS Engineer | Code development, debugging, script execution |
| Bayes | Implementation | Statistician | Statistical validation, uncertainty quantification |
- macOS or Linux
- Python 3.11 or higher
- micromamba, conda, or mamba for environment management
- Anthropic API key (required)
- Tavily API key (optional, for web search)
# Clone repository
git clone https://github.com/denniepatton/MiniLab.git
cd MiniLab
# Create environment
micromamba env create -f environment.yml
micromamba activate minilab
# Install in development mode
pip install -e .
# Configure environment variables
cp example.env .env
# Edit .env with your API keys
# Verify installation
python -c "from MiniLab import run_minilab; print('MiniLab ready')"

# Required
ANTHROPIC_API_KEY=sk-ant-...
# Optional - Web Search
TAVILY_API_KEY=tvly-...
# Optional - PubMed (higher rate limits)
NCBI_EMAIL=your@email.com
NCBI_API_KEY=...
# Optional - Timing/Debug
MINILAB_TIMING=1 # Enable timing reports

# Start a new analysis project
python scripts/minilab.py "Analyze the Pluvicto genomic data for treatment response predictors"
# Quick literature review
python scripts/minilab.py "What is the state of the art in cfDNA methylation analysis?"
# Resume an existing project
python scripts/minilab.py --resume Sandbox/pluvicto_analysis
# List existing projects
python scripts/minilab.py --list-projects
# Enable performance timing
python scripts/minilab.py --timing

import asyncio
from MiniLab import run_minilab

async def main():
    results = await run_minilab(
        request="Analyze genomic features predictive of Pluvicto response",
        project_name="pluvicto_analysis",
    )
    print(results["final_summary"])

asyncio.run(main())

During execution, you can interrupt with Ctrl+C to access options:
- Provide guidance - Give direction to the current workflow
- Skip to next phase - Move past the current workflow step
- Save and exit - Preserve progress for later resumption
- Continue - Cancel the interrupt and proceed
MiniLab v0.4.0 introduces an intelligent budget management system that learns from historical usage:
The system maintains a history of token usage across workflows and uses Bayesian estimation to improve future allocations:
# Budget history tracks actual vs. allocated tokens per workflow
# Over time, allocations converge to realistic requirements

Instead of coarse "Quick/Thorough/Comprehensive" tiers, complexity is now estimated as a continuous value (0.0-1.0):
| Complexity | Description | Typical Use Case |
|---|---|---|
| 0.0-0.3 | Simple | Quick questions, brainstorming |
| 0.3-0.6 | Moderate | Standard analyses, literature reviews |
| 0.6-0.8 | Complex | Multi-modal data, extensive pipelines |
| 0.8-1.0 | Very Complex | Deep research, comprehensive analyses |
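To make the two ideas above more concrete, here is a rough sketch of how history-based estimation and the complexity bands might fit together. It assumes a conjugate normal update with known observation noise, which is one common form of Bayesian estimation and is not confirmed to be what MiniLab uses. `TokenEstimate` and `complexity_band` are hypothetical names. The table's ranges share endpoints, so treating each upper bound as exclusive is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenEstimate:
    """Belief about how many tokens a workflow needs (hypothetical structure)."""

    mean: float  # expected tokens
    variance: float  # uncertainty in that expectation

    def update(self, observed_tokens: float, noise_variance: float) -> "TokenEstimate":
        # Conjugate normal update: the gain weights the new observation
        # by how uncertain the current belief is relative to the noise.
        gain = self.variance / (self.variance + noise_variance)
        return TokenEstimate(
            mean=self.mean + gain * (observed_tokens - self.mean),
            variance=(1.0 - gain) * self.variance,
        )


def complexity_band(score: float) -> str:
    """Map a continuous complexity score onto the table's descriptions."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("complexity must be within [0.0, 1.0]")
    if score < 0.3:
        return "Simple"
    if score < 0.6:
        return "Moderate"
    if score < 0.8:
        return "Complex"
    return "Very Complex"


prior = TokenEstimate(mean=100_000.0, variance=4.0e8)
posterior = prior.update(observed_tokens=140_000.0, noise_variance=4.0e8)
print(posterior.mean, posterior.variance)
print(complexity_band(0.45))
```

With equal prior and noise variance, the updated mean lands halfway between the prior and the observation, and the variance halves. Repeated observations would keep shrinking it.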
- 5% graceful shutdown reserve: Always retains budget for clean completion
- Budget-scaled iterations: Agent loop limits adjust based on remaining budget
- Self-critique checkpoints: Agents verify output quality before committing
- Hard enforcement: `BudgetExceededError` prevents runaway token usage
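A small sketch may help show how the reserve, the scaled loop limit, and hard enforcement could interact. Everything here is illustrative: `TokenBudget`, its method names, and the linear scaling rule are assumptions, and the `BudgetExceededError` defined below is a local stand-in rather than MiniLab's own class.

```python
class BudgetExceededError(RuntimeError):
    """Local stand-in for the error named above; MiniLab's class may differ."""


class TokenBudget:
    """Hypothetical budget tracker with a shutdown reserve."""

    def __init__(self, total: int, reserve_percent: int = 5) -> None:
        self.total = total
        self.reserve = total * reserve_percent // 100  # held back for clean completion
        self.used = 0

    @property
    def working_limit(self) -> int:
        # Tokens available to normal work, excluding the reserve.
        return self.total - self.reserve

    def spend(self, tokens: int, shutting_down: bool = False) -> None:
        # Assumption: only shutdown work may dip into the reserve.
        limit = self.total if shutting_down else self.working_limit
        if self.used + tokens > limit:
            raise BudgetExceededError(
                f"{self.used + tokens} tokens would exceed the limit of {limit}"
            )
        self.used += tokens

    def max_iterations(self, base_limit: int = 10) -> int:
        # Assumption: the loop limit shrinks linearly with the remaining working budget.
        remaining = max(self.working_limit - self.used, 0)
        return max(1, round(base_limit * remaining / self.working_limit))


budget = TokenBudget(total=100_000)
budget.spend(47_500)
print(budget.max_iterations(base_limit=10))  # half the working budget is left
```

Keeping the reserve out of `working_limit` means ordinary calls fail before the reserve is touched, which leaves room for a final write-up step.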
SQLite-backed caching reduces redundant API calls:
- 24-hour TTL for cached responses
- Automatic cache invalidation
- Transparent integration with LLM backends
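As an illustration of the caching behavior listed above, the following sketch stores responses in SQLite and treats entries older than 24 hours as expired. The table layout, the `ResponseCache` class, and hashing the model name together with the prompt to form the key are all assumptions made for this example. The actual schema in `cache.py` is not shown in this document.

```python
import hashlib
import sqlite3
import time
from typing import Optional

TTL_SECONDS = 24 * 60 * 60  # 24-hour time to live


class ResponseCache:
    """Hypothetical SQLite-backed cache keyed by a hash of model and prompt."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "key TEXT PRIMARY KEY, response TEXT NOT NULL, created_at REAL NOT NULL)"
        )

    @staticmethod
    def make_key(model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def put(self, model: str, prompt: str, response: str, now: Optional[float] = None) -> None:
        created_at = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO responses VALUES (?, ?, ?)",
            (self.make_key(model, prompt), response, created_at),
        )
        self.db.commit()

    def get(self, model: str, prompt: str, now: Optional[float] = None) -> Optional[str]:
        current = time.time() if now is None else now
        key = self.make_key(model, prompt)
        row = self.db.execute(
            "SELECT response, created_at FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        response, created_at = row
        if current - created_at > TTL_SECONDS:
            # Expired entry: invalidate it so the caller falls through to the API.
            self.db.execute("DELETE FROM responses WHERE key = ?", (key,))
            self.db.commit()
            return None
        return response


cache = ResponseCache()
cache.put("example-model", "What is cfDNA?", "cached answer")
print(cache.get("example-model", "What is cfDNA?"))
```

The optional `now` argument exists only so expiry can be exercised without waiting. A backend would typically call `get` before issuing a request and `put` after receiving one.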
| Workflow | Description | Complexity Guidance |
|---|---|---|
| `brainstorming` | Explore ideas and hypotheses | Low (0.2-0.4) |
| `literature_review` | Background research and synthesis | Moderate (0.4-0.6) |
| `start_project` | Full analysis pipeline | High (0.6-0.8) |
| `explore_dataset` | Data characterization and EDA | Moderate (0.4-0.6) |
- Consultation - User discussion, goal clarification, complexity estimation
- Literature Review - PubMed/arXiv search with critical assessment
- Planning Committee - Multi-agent deliberation on methodology
- Execute Analysis - Dayhoff→Hinton→Bayes implementation loop
- Write-up Results - Documentation and report generation
- Critical Review - Quality assessment and recommendations
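The idea of composing the modules above into a larger pipeline can be sketched as follows. This is only an illustration of the pattern. The `WorkflowModule` class here is a local stand-in for the abstract base class in `workflows/base.py`, whose real interface may differ, and `run_pipeline` and the state dictionary are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class WorkflowModule(ABC):
    """Local stand-in for the base class; the real interface may differ."""

    name = "module"

    @abstractmethod
    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        """Take the shared state and return an updated copy."""


class Consultation(WorkflowModule):
    name = "consultation"

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder: a real module would clarify the goal with the user.
        return {**state, "goal": state["request"].strip()}


class LiteratureReview(WorkflowModule):
    name = "literature_review"

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder: a real module would query PubMed and arXiv.
        return {**state, "references": [f"placeholder result for: {state['goal']}"]}


def run_pipeline(modules: List[WorkflowModule], state: Dict[str, Any]) -> Dict[str, Any]:
    """Run modules in order, threading the state through each one."""
    completed: List[str] = []
    for module in modules:
        state = module.run(state)
        completed.append(module.name)
    return {**state, "completed": completed}


result = run_pipeline(
    [Consultation(), LiteratureReview()],
    {"request": "  cfDNA methylation  "},
)
print(result["completed"])
```

Because each module only reads and returns the shared state, the same modules can be reordered or reused in shorter pipelines.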
MiniLab uses three main configuration files, each with a distinct purpose:
| File | Purpose | Loaded By |
|---|---|---|
| `config/agents.yaml` | Agent personas, communication style, operating principles | `loader.py` |
| `config/team.yaml` | Security: tools, write permissions, shell access | `team_loader.py` |
| `config/budgets.yaml` | Token budgets per workflow type | `budget_manager.py` |
This separation of concerns allows:
- Personas to be tuned independently of security
- Security policies to be enforced uniformly
- Budgets to be adjusted without touching agent definitions
MiniLab enforces strict file access control via PathGuard and team.yaml:
| Directory | Access | Purpose |
|---|---|---|
| `ReadData/` | Read-only | Protected input data |
| `Sandbox/` | Read-write | Project outputs and intermediate files |
| Other paths | Blocked | No access outside workspace |
| Agent | Shell Access | Writable Extensions |
|---|---|---|
| Hinton | ✓ | All |
| Bayes | ✓ | .py, .r, .R, .md, .txt, .json, .csv |
| Gould | ✗ | .md, .txt, .bib, .json, .yaml, .yml |
| Others | ✗ | .md, .txt, .json |
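The permission table above could be checked with something like the sketch below. The data structure and the `can_write` and `has_shell_access` helpers are hypothetical. In MiniLab these rules are defined in `team.yaml`, whose exact schema is not shown here.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Set

# Writable extensions per agent, copied from the table; None means "all".
WRITABLE_EXTENSIONS: Dict[str, Optional[Set[str]]] = {
    "Hinton": None,
    "Bayes": {".py", ".r", ".R", ".md", ".txt", ".json", ".csv"},
    "Gould": {".md", ".txt", ".bib", ".json", ".yaml", ".yml"},
}
DEFAULT_EXTENSIONS: Set[str] = {".md", ".txt", ".json"}  # the "Others" row
SHELL_AGENTS: Set[str] = {"Hinton", "Bayes"}


def can_write(agent: str, filename: str) -> bool:
    """Return True if the agent may write a file with this extension."""
    allowed = WRITABLE_EXTENSIONS.get(agent, DEFAULT_EXTENSIONS)
    return allowed is None or PurePosixPath(filename).suffix in allowed


def has_shell_access(agent: str) -> bool:
    return agent in SHELL_AGENTS


print(can_write("Gould", "references.bib"))
print(can_write("Gould", "analysis.py"))
```

Extension checks like this are case-sensitive, which is why the table lists both `.r` and `.R` for Bayes.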
Additional protections:
- Path traversal attacks are blocked
- Comprehensive audit logging
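For readers who want a feel for how directory rules, traversal blocking, and audit logging can work together, here is a minimal sketch. It is not the PathGuard implementation. `PathGuardSketch` and its `check` method are hypothetical, and the workspace path is a made-up example. It relies on `Path.resolve()` to normalize `..` segments before comparing against the allowed roots.

```python
from pathlib import Path
from typing import List, Tuple


class PathGuardSketch:
    """Hypothetical access checker: ReadData is read-only, Sandbox is read-write."""

    def __init__(self, workspace: str) -> None:
        self.workspace = Path(workspace).resolve()
        self.read_only = self.workspace / "ReadData"
        self.read_write = self.workspace / "Sandbox"
        self.audit_log: List[Tuple[str, str, bool]] = []

    def check(self, path: str, mode: str) -> bool:
        """Return True if `mode` ("read" or "write") is allowed on `path`."""
        # Resolving first collapses ".." segments, so traversal attempts are
        # judged by where they actually point, not by how they are spelled.
        resolved = (self.workspace / path).resolve()
        if resolved.is_relative_to(self.read_write):
            allowed = mode in ("read", "write")
        elif resolved.is_relative_to(self.read_only):
            allowed = mode == "read"
        else:
            allowed = False  # everything outside the two directories is blocked
        self.audit_log.append((mode, str(resolved), allowed))
        return allowed


guard = PathGuardSketch("/tmp/minilab_demo")  # example workspace path
print(guard.check("ReadData/cohort.csv", "read"))
print(guard.check("ReadData/cohort.csv", "write"))
print(guard.check("Sandbox/../../etc/passwd", "read"))
```

Every decision is appended to `audit_log`, including refusals, so blocked attempts remain visible afterwards.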
All outputs are organized within Sandbox/{project_name}/:
{project_name}/
├── project_specification.md # Goals and scope from consultation
├── data_manifest.md # Summary of input data
├── session.json # Session state for resumption
├── literature/
│ ├── references.md # Bibliography
│ └── literature_summary.md # Narrative synthesis
├── planning/
│ ├── analysis_plan.md # Detailed analysis plan
│ └── decision_rationale.md # Planning decisions
├── analysis/
│ ├── exploratory/ # EDA scripts and outputs
│ └── modeling/ # Statistical models
├── figures/ # Generated visualizations
├── outputs/
│ ├── summary_report.md # Final findings
│ └── tables/ # Result tables
├── checkpoints/ # Workflow state for resumption
└── logs/ # Execution logs
# Run all tests
python -m pytest tests/ -v
# Run with coverage
python -m pytest tests/ --cov=MiniLab --cov-report=html

from MiniLab import (
    run_minilab,
    BohrOrchestrator,
    PathGuard,
    Agent,
    WorkflowModule,
    console,
)
print("All imports successful")

- Trust the agents - Allow the ReAct loop to iterate; avoid micromanaging
- Prepare your data - Ensure data files exist in `ReadData/` before starting
- Use descriptive project names - Facilitates organization and resumption
- Start with exploration - Use `brainstorming` or `literature_review` to understand scope
- Review transcripts - Stored in `Transcripts/` for debugging and auditing
- Let budgets adapt - The Bayesian system improves with use; trust its estimates
- Agents may produce hallucinations if not properly grounded with tool use
- Long-running computations may require timeout adjustments
- API costs accumulate with complex, multi-phase analyses
- Requires active API keys for full functionality
- Currently optimized for biomedical and computational biology research
MiniLab sends data to external APIs (Anthropic, Tavily, NCBI). Users should not process protected health information (PHI) without:
- Institutional Review Board (IRB) approval
- Business Associate Agreement (BAA) with API providers
- Appropriate de-identification procedures
MIT License - see LICENSE for details.
MiniLab is inspired by and builds upon ideas from:
- CellVoyager - Autonomous biological data analysis
- VirtualLab - Multi-agent scientific collaboration
- Modern agentic coding assistants and ReAct-style agent architectures
- Bayesian Budget Learning: Historical token usage now informs future allocations via `BudgetHistory`
- Continuous Complexity Estimation: Replaced coarse Quick/Thorough/Comprehensive tiers with 0.0-1.0 scale
- LLM Response Caching: SQLite-backed cache with 24h TTL reduces redundant API calls
- Session Orchestrator: New `SessionOrchestrator` manages full session lifecycle
- Self-Critique Checkpoints: Agents verify output quality before committing (CellVoyager pattern)
- Budget-Scaled Iterations: Agent loop limits adapt based on remaining budget (VS Code pattern)
- 5% Graceful Shutdown Reserve: Ensures clean completion within budget
- Codebase Cleanup: Removed unused `runtime/` and `evaluation/` modules
- Minor bug fixes and stability improvements
- Intelligent Budget Allocation: Bohr reserves 10% buffer for graceful completion, never exceeds budget
- Contextual Autonomy: Natural language user preferences flow through to agent tools (no hardcoded levels)
- Budget Typo Handling: Fixes common typos like "200l" → "200k", warns on ambiguous input
- Hard Budget Enforcement: `BudgetExceededError` exception and agent ReAct loop budget checks
- User Preference Propagation: Consultation captures "best judgment"/"without consulting" preferences
- Auto-proceed in Autonomous Mode: `user_input` tool respects user's autonomy preferences
- Graceful Completion: Always finishes cleanly, skips to writeup when budget is low
- TokenAccount: Real-time token budget tracking with warnings at 60/80/95% thresholds
- ProjectWriter: Centralized output management preventing duplicate files
- Complete Transcript System: Full lab notebook capturing all agent conversations, reasoning, and tool use
- Date Injection: Current session date injected into all agent prompts (fixes date hallucination)
- Conditional data_manifest.md: Only created when data files are present
- Single session_summary.md: Prevented duplicate file creation by agents
- Output Guidelines: Agents instructed not to create redundant files (executive_summary.md, etc.)
- Budget warnings displayed to agents as they approach token limits
- Redesigned token budget system with Quick/Thorough/Comprehensive tiers and custom input
- Narrative-style orchestrator communication
- Visible cross-agent consultations
- Tiered literature review (Quick 3-step vs. Comprehensive 7-step)
- Immediate graceful exit with agent interruption propagation
- Consolidated output file structure (single living documents)
- Enhanced transcript system as single source of truth
- Agent signature guidelines ("MiniLab Agent [Name]")
- Timestamp utilities to prevent date hallucination
- Post-consultation summary showing confirmed scope and budget
- Complete architecture refactor
- PathGuard security system
- Structured 5-part agent prompting
- RAG context management with FAISS
- Modular workflow system
- Tavily web search integration
- PubMed and arXiv literature tools
- Bohr orchestrator for workflow coordination
- Console utilities for styled output
- Prompt caching for cost reduction