
[RFE] Simplify startup: Make "library mode" for llama-stack middleware the default mode #778

@anik120

Description


Is your feature request related to a problem? Please describe.

Currently, starting Lightspeed Stack requires users to manually manage two separate processes:

  1. Start Llama Stack server: export OPENAI_API_KEY=<key> && uv run llama stack run run.yaml
  2. Start Lightspeed Stack: make run

This creates several issues:

  • Excessive documentation focus on middleware: Getting-started guides, architecture diagrams, and tutorials spend significant time explaining Llama Stack configuration and startup, even though it is middleware that should be abstracted away
  • High cognitive load: Users must understand and manage the middleware layer explicitly
  • Tight coupling: The current architecture tightly couples the user experience to a specific middleware implementation (Llama Stack)
  • Limited future flexibility: Switching middleware in the future would require significant user-facing changes

Describe the solution you'd like

Single-command startup that abstracts middleware management:

OPENAI_API_KEY=<key> make run

This command should:

  1. Automatically start Llama Stack in the background (if needed)
  2. Start Lightspeed Stack main service
  3. Handle graceful shutdown of both processes
  4. Provide unified logging/status output
  5. Validate configuration before starting

Expected Benefits

  • Simplified documentation: Focus on Lightspeed Stack features, not middleware setup
  • Better abstraction: Middleware becomes an implementation detail
  • Future-proof architecture: Switching middleware doesn't change user-facing startup process

Describe alternatives you've considered

Option 1: Python Subprocess Management

Create a launcher script (scripts/start.py) that:

  • Validates configuration
  • Starts Llama Stack as subprocess
  • Monitors health endpoints
  • Starts Lightspeed Stack
  • Handles SIGTERM/SIGINT for graceful shutdown
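The steps above could be sketched roughly as follows. The command lines, health-endpoint URL, and port are assumptions for illustration, not confirmed project details:

```python
# Hypothetical sketch of scripts/start.py; command names, ports, and the
# health endpoint are assumptions, not confirmed project details.
import os
import signal
import subprocess
import sys
import time
import urllib.request

LLAMA_CMD = ["uv", "run", "llama", "stack", "run", "run.yaml"]  # assumed
LIGHTSPEED_CMD = ["uv", "run", "src/lightspeed_stack.py"]       # assumed
HEALTH_URL = "http://localhost:8321/v1/health"                  # assumed port/path


def validate_config() -> None:
    """Fail fast before launching anything."""
    if not os.environ.get("OPENAI_API_KEY"):
        sys.exit("error: OPENAI_API_KEY is not set")


def wait_for_health(url: str, timeout: float = 30.0) -> bool:
    """Poll a health endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:  # connection refused, timeout, etc.
            time.sleep(0.5)
    return False


def main() -> None:
    validate_config()
    llama = subprocess.Popen(LLAMA_CMD)  # middleware in the background
    try:
        if not wait_for_health(HEALTH_URL):
            raise RuntimeError("Llama Stack never became healthy")
        lightspeed = subprocess.Popen(LIGHTSPEED_CMD)  # main service

        def shutdown(signum, frame):  # SIGTERM/SIGINT -> stop the main service
            lightspeed.terminate()

        signal.signal(signal.SIGTERM, shutdown)
        signal.signal(signal.SIGINT, shutdown)
        lightspeed.wait()
    finally:
        llama.terminate()  # always tear the middleware down on exit
        llama.wait(timeout=10)

# Entry point would be: if __name__ == "__main__": main()
```

The Makefile's `run` target would then just invoke `uv run scripts/start.py` (wiring assumed), keeping the middleware lifecycle entirely out of the user's hands.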

Option 2: Process Management in Makefile

run:
	@echo "Starting Lightspeed Stack..."
	@# Start Llama Stack in the background and record its PID
	@uv run llama stack run run.yaml > .llama_stack.log 2>&1 & echo $$! > .llama_stack.pid
	@sleep 2 # Crude wait for Llama Stack to be ready
	@# Start Lightspeed Stack in the foreground
	@uv run src/lightspeed_stack.py
	@# Clean up the background Llama Stack process on exit
	@kill $$(cat .llama_stack.pid) 2>/dev/null || true

(Note: recipe lines must be tab-indented, and the cleanup line only runs if the foreground process exits normally; a launcher script handles interrupts more robustly.)

Option 3: Use Existing Container Orchestration

Leverage the existing docker-compose.yaml for local development:
make run # calls: podman compose up
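For reference, a minimal compose sketch in that spirit might look like the following. Service names, image, and ports are illustrative assumptions, not taken from the project's actual docker-compose.yaml:

```yaml
# Hypothetical sketch; images, ports, and service names are assumed.
services:
  llama-stack:
    image: llamastack/distribution-starter:latest  # assumed image
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    ports:
      - "8321:8321"
  lightspeed-stack:
    build: .
    depends_on:
      - llama-stack
    ports:
      - "8080:8080"
```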

Acceptance Criteria

  • Users can start both services with a single command: OPENAI_API_KEY=<key> make run
  • Llama Stack automatically starts in the background (when using service mode)
  • Both services shut down gracefully with Ctrl+C
  • Error messages are clear if configuration is invalid
  • Documentation updated to reflect simplified startup
  • Library mode (embedded Llama Stack) continues to work as-is
  • Existing make run behavior smoothly migrated

Additional context

Architecture Philosophy

Middleware should be invisible: Users should focus on Lightspeed Stack capabilities (queries, RAG, agents, safety), not on how the middleware layer is implemented. The abstraction should be clean enough that switching from Llama Stack to another middleware in the future requires minimal user-facing changes.

This aligns with the following software architecture principles:

  • Separation of concerns: Application layer vs middleware layer
  • Single Responsibility: Users manage one service (Lightspeed Stack), not two
  • Open/Closed Principle: Open for middleware extension, closed for modification of user experience

Labels: enhancement (New feature or request)