browser-use · sunnymodi21 · Aug 21, 2025 · Aug 21, 2025 · Aug 21, 2025 · Aug 26, 2025
diff --git a/CURL_USAGE.md b/CURL_USAGE.md
@@ -0,0 +1,172 @@
+# Using the Browser Agent with curl
+
+The simple browser agent endpoint exposes an HTTP API (served by Gradio) that you can call with curl: you POST a task, then read the result from a Server-Sent Events stream.
+
+## Quick Examples
+
+### Method 1: Inline commands (simple, but relies on a fixed wait)
+
+```bash
+# Submit the task and capture the event ID
+EVENT_ID=$(curl -s -X POST 'http://127.0.0.1:7789/gradio_api/call/predict' \
+  -H 'Content-Type: application/json' \
+  -d '{"data":["Go to example.com and get the page title"]}' | \
+  python3 -c "import sys, json; print(json.load(sys.stdin)['event_id'])")
+
+echo "Event ID: $EVENT_ID"
+
+# Wait a bit for the agent to complete (adjust time based on task complexity)
+sleep 30
+
+# Get the result
+curl -s -N "http://127.0.0.1:7789/gradio_api/call/predict/${EVENT_ID}" | \
+  grep 'process_completed' | \
+  sed 's/^data: //' | \
+  python3 -c "import sys, json; print(json.load(sys.stdin)['output']['data'][0])"
+```
+
+### Method 2: Using the provided script
+
+```bash
+# Use the ready-made script
+./curl_simple.sh "Your task here"
+
+# Examples:
+./curl_simple.sh "Go to wikipedia.org and search for 'artificial intelligence'"
+./curl_simple.sh "Navigate to google.com and tell me what you see"
+```
+
+## API Details
+
+### Endpoint: `/gradio_api/call/predict`
+
+**Base URL:** `http://127.0.0.1:7789`
+
+### Step 1: Submit a Task
+
+**Request:**
+```http
+POST /gradio_api/call/predict
+Content-Type: application/json
+
+{
+  "data": ["Your task description here"]
+}
+```
+
+**Example:**
+```bash
+curl -X POST 'http://127.0.0.1:7789/gradio_api/call/predict' \
+  -H 'Content-Type: application/json' \
+  -d '{"data":["Go to example.com and tell me the page title"]}'
+```
+
+**Response:**
+```json
+{
+  "event_id": "e29f3556d3754156ad7ed239780db6e8"
+}
+```
+
+### Step 2: Get the Result
+
+**Request:**
+```http
+GET /gradio_api/call/predict/{event_id}
+```
+
+**Example:**
+```bash
+curl -N 'http://127.0.0.1:7789/gradio_api/call/predict/e29f3556d3754156ad7ed239780db6e8'
+```
+
+**Response Stream:**
+The endpoint returns Server-Sent Events (SSE). Look for the final `complete` event, whose `data:` line carries `"msg":"process_completed"` and the result in `output.data[0]`:
+
+```
+event: generating
+data: {"msg":"process_generating",...}
+
+event: complete
+data: {"msg":"process_completed","output":{"data":["The page title is: Example Domain"]},...}
+```
+
+## Advanced: Complete curl Example
+
+```bash
+#!/bin/bash
+
+# Configuration
+API_URL="http://127.0.0.1:7789/gradio_api"
+TASK="Go to example.com and get the page title"
+
+# Submit task
+echo "Submitting task: $TASK"
+RESPONSE=$(curl -s -X POST "${API_URL}/call/predict" \
+  -H "Content-Type: application/json" \
+  -d "$(python3 -c 'import json, sys; print(json.dumps({"data": [sys.argv[1]]}))' "$TASK")")
+
+EVENT_ID=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['event_id'])")
+echo "Event ID: $EVENT_ID"
+
+# Poll for result (adjust sleep time based on task complexity)
+echo "Waiting for completion..."
+for i in {1..60}; do
+  RESULT=$(curl -s -N "${API_URL}/call/predict/${EVENT_ID}" 2>/dev/null | \
+    grep -m 1 'process_completed' | \
+    sed 's/^data: //')
+
+  if [ -n "$RESULT" ]; then
+    echo "$RESULT" | python3 -c "import sys, json; print(json.load(sys.stdin)['output']['data'][0])"
+    exit 0
+  fi
+
+  echo -n "."
+  sleep 2
+done
+
+printf '\nTimeout\n' >&2; exit 1
+```
+
+## Task Examples
+
+```text
+# Web navigation
+"Go to github.com and find the trending repositories"
+
+# Information extraction
+"Visit weather.com and tell me the temperature in San Francisco"
+
+# Web search
+"Go to google.com and search for 'machine learning tutorials'"
+
+# Form interaction
+"Navigate to example.com/contact and fill out the contact form"
+```
+
+## Model Configuration
+
+The simple agent uses these defaults:
+- **Model:** claude-sonnet-4-5-20250929 (Anthropic)
+- **Temperature:** 0.6
+- **Max Steps:** 100
+- **Max Actions per Step:** 10
+- **Vision:** Enabled
+- **Browser:** Headless mode
+
+The API key is read from the `ANTHROPIC_API_KEY` environment variable.
+
+## Tips
+
+1. **Task Complexity:** Simple tasks (viewing a page) complete in ~10-30 seconds. Complex tasks (form filling, searches) may take 1-3 minutes.
+
+2. **Timeout:** Set appropriate timeouts based on your task. The script defaults to 5 minutes.
+
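+3. **Error Handling:** On failure the agent returns a string that starts with `Error:`, so check the extracted result (here `$RESULT`) rather than the submit response:
+```bash
+if echo "$RESULT" | grep -q "Error:"; then
+  echo "Task failed"
+fi
+```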
+
+4. **Multiple Requests:** Each task creates a new browser session, so requests are independent.
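
Review note on `CURL_USAGE.md`: the `grep | sed | python3` pipeline used to pull the result out of the event stream could also be written as one small Python function. The following is only a minimal sketch. It assumes the stream has exactly the shape shown in the "Response Stream" sample (a `data:` line whose JSON has `"msg":"process_completed"` and the result under `output.data[0]`); that shape is taken from the guide and has not been checked against a running server. The name `extract_result` is hypothetical.

```python
import json
from typing import Optional


def extract_result(sse_text: str) -> Optional[str]:
    """Return output.data[0] from the first process_completed message, else None."""
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data:"):].strip()
        try:
            message = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore partial or non-JSON data lines
        # Assumption: the completed message is a JSON object shaped like the sample.
        if isinstance(message, dict) and message.get("msg") == "process_completed":
            data = message.get("output", {}).get("data") or []
            return data[0] if data else None
    return None


# Sample stream copied from the guide's "Response Stream" section (elisions removed).
sample = (
    "event: generating\n"
    'data: {"msg":"process_generating"}\n'
    "\n"
    "event: complete\n"
    'data: {"msg":"process_completed","output":{"data":["The page title is: Example Domain"]}}\n'
)
print(extract_result(sample))
```

Returning `None` when no completed message is present lets a caller tell "not finished yet" apart from an empty result, which the shell pipeline cannot do.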
diff --git a/Dockerfile b/Dockerfile
@@ -45,6 +45,7 @@ RUN apt-get update && apt-get install -y \
     fonts-dejavu-core \
     fonts-dejavu-extra \
     vim \
+    pipx \
     && rm -rf /var/lib/apt/lists/*
 
 # Install noVNC
@@ -70,20 +71,28 @@ WORKDIR /app
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 
-# Playwright setup
-ENV PLAYWRIGHT_BROWSERS_PATH=/ms-browsers
-RUN mkdir -p $PLAYWRIGHT_BROWSERS_PATH
+# Install uv and uvx for browser-use
+RUN pip install --no-cache-dir uv
 
-# Install Chromium via Playwright without --with-deps
-RUN PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=0 playwright install chromium
+# Install Chromium browser for browser-use
+RUN apt-get update \
+    && apt-get install -y chromium chromium-driver \
+    && rm -rf /var/lib/apt/lists/*
+
+# Set Chrome path for browser-use
+ENV CHROME_BIN=/usr/bin/chromium
+ENV DISPLAY=:99
+
+# Also create a symlink for uvx
+RUN ln -s /usr/local/bin/uv /usr/local/bin/uvx || true
 
 # Copy application code
 COPY . .
 
-# Set up supervisor configuration
-RUN mkdir -p /var/log/supervisor
+# Set up supervisor configuration and DBus
+RUN mkdir -p /var/log/supervisor /run/dbus
 COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
 
-EXPOSE 7788 6080 5901 9222
+EXPOSE 7788 6080 5901 9222 3000
 
 CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
diff --git a/README.md b/README.md
@@ -55,21 +55,12 @@ Activate the virtual environment:
 source .venv/bin/activate
 ```
 
-#### Step 3: Install Dependencies
-Install Python packages:
+#### Step 3: Install Python Packages
+Install the required Python packages using uv:
 ```bash
 uv pip install -r requirements.txt
 ```
 
-Install Browsers in playwright. 
-```bash
-playwright install --with-deps
-```
-Or you can install specific browsers by running:
-```bash
-playwright install chromium --with-deps
-```
-
 #### Step 4: Configure Environment
 1. Create a copy of the example environment file:
 - Windows (Command Prompt):

diff --git a/docker-compose.yml b/docker-compose.yml
@@ -42,6 +42,8 @@ services:
       # Application Settings
       - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-false}
       - BROWSER_USE_LOGGING_LEVEL=${BROWSER_USE_LOGGING_LEVEL:-info}
+      - BROWSER_USE_API_KEY=${BROWSER_USE_API_KEY:-}
+      - DEFAULT_LLM=${DEFAULT_LLM:-anthropic}
 
       # Browser Settings
       - BROWSER_PATH=
@@ -54,9 +56,6 @@ services:
 
       # Display Settings
       - DISPLAY=:99
-      # This ENV is used by the Dockerfile during build time if playwright respects it.
-      # It's not strictly needed at runtime by docker-compose unless your app or scripts also read it.
-      - PLAYWRIGHT_BROWSERS_PATH=/ms-browsers # Matches Dockerfile ENV
       - RESOLUTION=${RESOLUTION:-1920x1080x24}
       - RESOLUTION_WIDTH=${RESOLUTION_WIDTH:-1920}
       - RESOLUTION_HEIGHT=${RESOLUTION_HEIGHT:-1080}

diff --git a/requirements.txt b/requirements.txt
@@ -1,10 +1,5 @@
-browser-use==0.1.48
+browser-use==0.10.1
 pyperclip==1.9.0
-gradio==5.27.0
-json-repair
-langchain-mistralai==0.2.4
+gradio==5.49.1
+json-repair==0.49.0
 MainContentExtractor==0.0.4
-langchain-ibm==0.3.10
-langchain_mcp_adapters==0.0.9
-langgraph==0.3.34
-langchain-community
diff --git a/simple_agent.py b/simple_agent.py
@@ -0,0 +1,109 @@
+"""
+Simple Gradio endpoint for browser-use agent.
+Uses Anthropic claude-sonnet-4-5 with default settings.
+Only accepts user input - everything else is default.
+"""
+from dotenv import load_dotenv
+load_dotenv()
+
+import asyncio
+import os
+import logging
+import gradio as gr
+from browser_use import Agent, Controller
+from browser_use.browser import BrowserSession
+from browser_use.browser.profile import BrowserProfile
+from browser_use.llm.anthropic.chat import ChatAnthropic
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+# Default configuration
+DEFAULT_MODEL = "claude-sonnet-4-5-20250929"
+DEFAULT_TEMPERATURE = 0.6
+DEFAULT_MAX_STEPS = 100
+DEFAULT_MAX_ACTIONS_PER_STEP = 10
+
+
+async def run_agent(task: str) -> str:
+    """Run the browser agent with the given task."""
+    if not task.strip():
+        return "Please enter a task."
+
+    # Get API key from environment
+    api_key = os.getenv("ANTHROPIC_API_KEY")
+    if not api_key:
+        return "Error: ANTHROPIC_API_KEY not found in environment variables."
+
+    # Initialize LLM
+    llm = ChatAnthropic(
+        model=DEFAULT_MODEL,
+        api_key=api_key,
+        temperature=DEFAULT_TEMPERATURE,
+    )
+
+    # Initialize browser
+    browser_profile = BrowserProfile(
+        headless=True,
+        viewport={'width': 1280, 'height': 1100},
+        chrome_binary_path=os.getenv("CHROME_BIN", "/usr/bin/google-chrome"),
+    )
+
+    browser = BrowserSession(browser_profile=browser_profile, is_local=True)
+    controller = Controller()
+
+    # Create agent
+    agent = Agent(
+        task=task,
+        llm=llm,
+        browser_session=browser,
+        controller=controller,
+        use_vision=True,
+        max_actions_per_step=DEFAULT_MAX_ACTIONS_PER_STEP,
+    )
+
+    try:
+        # Run agent
+        logger.info(f"Starting agent with task: {task}")
+        history = await agent.run(max_steps=DEFAULT_MAX_STEPS)
+
+        # Get result
+        result = history.final_result() if history else "No result"
+        steps = len(history.history) if history and history.history else 0
+
+        return f"Task completed in {steps} steps.\n\nResult:\n{result}"
+    except Exception as e:
+        logger.error(f"Agent error: {e}", exc_info=True)
+        return f"Error: {str(e)}"
+    finally:
+        await browser.stop()
+
+
+def run_agent_sync(task: str) -> str:
+    """Synchronous wrapper for the async agent."""
+    return asyncio.run(run_agent(task))
+
+
+# Create Gradio interface
+demo = gr.Interface(
+    fn=run_agent_sync,
+    inputs=gr.Textbox(
+        label="Task",
+        placeholder="Enter your task for the browser agent...",
+        lines=3,
+    ),
+    outputs=gr.Textbox(label="Result", lines=10),
+    title="Browser Agent",
+    description=f"Simple browser agent using Anthropic {DEFAULT_MODEL}. Enter a task and the agent will execute it.",
+    flagging_mode="never",
+)
+
+
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser(description="Simple Browser Agent Endpoint")
+    parser.add_argument("--ip", type=str, default="127.0.0.1", help="IP address to bind to")
+    parser.add_argument("--port", type=int, default=7789, help="Port to listen on")
+    args = parser.parse_args()
+
+    demo.launch(server_name=args.ip, server_port=args.port)
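
Review note on `simple_agent.py`: the Dockerfile in this PR installs Chromium and sets `CHROME_BIN=/usr/bin/chromium`, while the script points at `/usr/bin/google-chrome`. One possible way to reconcile the two is to resolve the binary at startup, as in the sketch below. This is a sketch under assumptions, not the project's method: `resolve_chrome_path` and `DEFAULT_CANDIDATES` are hypothetical names, and the candidate paths are just the two that appear in this diff.

```python
import os
import shutil
from typing import Mapping, Optional, Sequence

# Assumption: these are the only two locations mentioned in this diff.
DEFAULT_CANDIDATES = ("/usr/bin/chromium", "/usr/bin/google-chrome")


def resolve_chrome_path(
    env: Mapping[str, str] = os.environ,
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Pick a browser binary: CHROME_BIN first, then known paths, then PATH lookup."""
    explicit = env.get("CHROME_BIN")
    if explicit:
        return explicit  # trust an explicit setting, e.g. the Dockerfile's ENV
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # Last resort: search PATH for either executable name.
    return shutil.which("chromium") or shutil.which("google-chrome")


print(resolve_chrome_path())
```

The returned value could then be passed as `chrome_binary_path` to `BrowserProfile`. A `None` result means no browser was found and is worth reporting before the agent starts.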