20 commits
c760552
Update dependencies and refactor browser_use integration for compatib…
sunnymodi21 Aug 21, 2025
35419f5
Remove deprecated dependency `libgconf-2-4` from Dockerfile to stream…
sunnymodi21 Aug 21, 2025
3b653c5
Enhance Dockerfile and requirements for improved browser support and …
sunnymodi21 Aug 21, 2025
372f638
Refactor update dependencies, and enhance deep research agent functio…
sunnymodi21 Aug 26, 2025
844e2f2
Update `requirements.txt` to use `browser-use==0.7.10` and refactor i…
sunnymodi21 Oct 7, 2025
f1cdc11
Update dependencies in `requirements.txt` to `browser-use==0.9.4` and…
sunnymodi21 Oct 30, 2025
f09ee93
Merge upstream changes: sync with browser-use/web-ui
sunnymodi21 Oct 30, 2025
692808e
remove playwright
sunnymodi21 Oct 30, 2025
fa09c4c
clean up redundant files
sunnymodi21 Oct 30, 2025
e020caf
Update src/webui/components/browser_use_agent_tab.py
sunnymodi21 Nov 2, 2025
3721276
Update src/agent/deep_research/deep_research_agent.py
sunnymodi21 Nov 3, 2025
425f0f3
cubic-dev-ai comments
sunnymodi21 Nov 3, 2025
60102ad
Update src/browser/browser_compat.py
sunnymodi21 Nov 5, 2025
f458429
agent only option
sunnymodi21 Dec 5, 2025
1427bc0
Update `requirements.txt` to upgrade `browser-use` from version `0.9.…
sunnymodi21 Dec 6, 2025
39faaa6
fix local browser use
sunnymodi21 Dec 9, 2025
79e92e3
Delete curl_simple.sh
sunnymodi21 Dec 10, 2025
6d9dd69
Delete curl_example.sh
sunnymodi21 Dec 10, 2025
a386f57
Delete test_agent.py
sunnymodi21 Dec 10, 2025
64d2c25
Merge pull request #1 from sunnymodi21/agent-only
sunnymodi21 Dec 10, 2025
172 changes: 172 additions & 0 deletions CURL_USAGE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
# Using the Browser Agent with curl

The simple browser agent endpoint provides a REST API that you can interact with using curl.

## Quick Examples

### Method 1: Inline commands (simple, but blocks until completion)

```bash
# Submit task and wait for result
EVENT_ID=$(curl -s -X POST 'http://127.0.0.1:7789/gradio_api/call/predict' \
-H 'Content-Type: application/json' \
-d '{"data":["Go to example.com and get the page title"]}' | \
python3 -c "import sys, json; print(json.load(sys.stdin)['event_id'])")

echo "Event ID: $EVENT_ID"

# Wait a bit for the agent to complete (adjust time based on task complexity)
sleep 30

# Get the result
curl -s -N "http://127.0.0.1:7789/gradio_api/call/predict/${EVENT_ID}" | \
grep 'process_completed' | \
sed 's/^data: //' | \
python3 -c "import sys, json; print(json.load(sys.stdin)['output']['data'][0])"
```

### Method 2: Using the provided script

```bash
# Use the ready-made script
./curl_simple.sh "Your task here"

# Examples:
./curl_simple.sh "Go to wikipedia.org and search for 'artificial intelligence'"
./curl_simple.sh "Navigate to google.com and tell me what you see"
```

## API Details

### Endpoint: `/gradio_api/call/predict`

**Base URL:** `http://127.0.0.1:7789`

### Step 1: Submit a Task

**Request:**
```http
POST /gradio_api/call/predict
Content-Type: application/json

{
"data": ["Your task description here"]
}
```

**Example:**
```bash
curl -X POST 'http://127.0.0.1:7789/gradio_api/call/predict' \
-H 'Content-Type: application/json' \
-d '{"data":["Go to example.com and tell me the page title"]}'
```

**Response:**
```json
{
"event_id": "e29f3556d3754156ad7ed239780db6e8"
}
```
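
The same request can be built from Python when shell quoting becomes awkward. The following is a minimal sketch using only the standard library. The helper names `build_submit_request` and `submit_task` are hypothetical and not part of this repository, and the URL assumes the default host and port used in this guide.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7789"  # base URL from this guide; adjust if you changed --ip/--port


def build_submit_request(task: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a task."""
    # json.dumps escapes quotes and backslashes inside the task text.
    body = json.dumps({"data": [task]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/gradio_api/call/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(task: str, base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send the request and return the event_id from the JSON response."""
    request = build_submit_request(task, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["event_id"]
```

Calling `submit_task("Go to example.com and tell me the page title")` against a running server should return the `event_id` string used in Step 2.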

### Step 2: Get the Result

**Request:**
```http
GET /gradio_api/call/predict/{event_id}
```

**Example:**
```bash
curl -N 'http://127.0.0.1:7789/gradio_api/call/predict/e29f3556d3754156ad7ed239780db6e8'
```

**Response Stream:**
The endpoint returns Server-Sent Events (SSE). Look for the `complete` event, whose `data` line carries the `process_completed` message:

```
event: generating
data: {"msg":"process_generating",...}

event: complete
data: {"msg":"process_completed","output":{"data":["The page title is: Example Domain"]},...}
```
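
Parsing the stream with `grep` and `sed` works for quick checks, but a small parser is easier to extend. The sketch below is one possible approach, not part of this repository: `extract_result` is a hypothetical helper that walks the SSE lines and returns the first output value from the `complete` event. It accepts the payload shape shown above and, as an assumption, also a bare JSON list in case the installed Gradio version sends the output array directly.

```python
import json
from typing import Any, Iterable, Optional


def extract_result(sse_lines: Iterable[str]) -> Optional[Any]:
    """Return the first output value from the `complete` event, or None."""
    event = None
    for raw in sse_lines:
        line = raw.strip()
        if line.startswith("event:"):
            # Remember which event the next data line belongs to.
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "complete":
            payload = json.loads(line[len("data:"):].strip())
            # Shape shown in this guide: {"msg": ..., "output": {"data": [...]}}
            if isinstance(payload, dict):
                payload = payload.get("output", {}).get("data", [])
            # Assumption: otherwise the payload is already the output list.
            return payload[0] if payload else None
    return None
```

Feeding it the lines of a `curl -N` response (for example via `sys.stdin`) yields the agent's result text, or `None` if the stream ended without a `complete` event.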

## Advanced: Complete curl Example

```bash
#!/bin/bash

# Configuration
API_URL="http://127.0.0.1:7789/gradio_api"
TASK="Go to example.com and get the page title"

# Submit task (JSON-encode the task so quotes and backslashes are escaped)
echo "Submitting task: $TASK"
PAYLOAD=$(python3 -c 'import json, sys; print(json.dumps({"data": [sys.argv[1]]}))' "$TASK")
RESPONSE=$(curl -s -X POST "${API_URL}/call/predict" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")

EVENT_ID=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['event_id'])")
echo "Event ID: $EVENT_ID"

# Poll for result (adjust sleep time based on task complexity)
echo "Waiting for completion..."
for i in {1..60}; do
  RESULT=$(curl -s -N "${API_URL}/call/predict/${EVENT_ID}" 2>/dev/null | \
    grep -m 1 'process_completed' | \
    sed 's/^data: //')

  if [ -n "$RESULT" ]; then
    echo "$RESULT" | python3 -c "import sys, json; print(json.load(sys.stdin)['output']['data'][0])"
    exit 0
  fi

  echo -n "."
  sleep 2
done

# printf interprets \n; plain echo in bash would print it literally
printf '\nTimeout\n' >&2
exit 1
```

## Task Examples

```bash
# Web navigation
"Go to github.com and find the trending repositories"

# Information extraction
"Visit weather.com and tell me the temperature in San Francisco"

# Web search
"Go to google.com and search for 'machine learning tutorials'"

# Form interaction
"Navigate to example.com/contact and fill out the contact form"
```

## Model Configuration

The simple agent uses these defaults:
- **Model:** claude-sonnet-4-5-20250929 (Anthropic)
- **Temperature:** 0.6
- **Max Steps:** 100
- **Max Actions per Step:** 10
- **Vision:** Enabled
- **Browser:** Headless mode

The API key is read from the `ANTHROPIC_API_KEY` environment variable.
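
In `simple_agent.py`, a missing key only surfaces as an `Error:` string in the task result, so it may help to check the environment before starting the server. A minimal sketch, assuming `ANTHROPIC_API_KEY` is the only required variable; `missing_settings` is a hypothetical helper, not part of this repository:

```python
import os
from typing import List, Mapping

# Assumption: this is the only variable the simple agent requires.
REQUIRED_VARIABLES = ["ANTHROPIC_API_KEY"]


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]
```

A launcher script could call `missing_settings()` and exit early with a clear message when the returned list is not empty.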

## Tips

1. **Task Complexity:** Simple tasks (viewing a page) complete in ~10-30 seconds. Complex tasks (form filling, searches) may take 1-3 minutes.

2. **Timeout:** Set appropriate timeouts based on your task. The script defaults to 5 minutes.

3. **Error Handling:** Check for errors in the response:
```bash
if echo "$RESPONSE" | grep -q "Error:"; then
echo "Task failed"
fi
```

4. **Multiple Requests:** Each task creates a new browser session, so requests are independent.
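
The error check in tip 3 matches `Error:` anywhere in the response, including inside a successful result that happens to quote an error message. A stricter alternative is to match the success format that `simple_agent.py` returns (`Task completed in N steps.` followed by `Result:`). The sketch below assumes that format stays unchanged; `AgentOutcome` and `parse_outcome` are hypothetical names, not part of this repository.

```python
import re
from typing import NamedTuple, Optional


class AgentOutcome(NamedTuple):
    ok: bool              # True only when the success format matched
    steps: Optional[int]  # number of steps reported by the agent, if any
    text: str             # result text on success, raw output otherwise


# Mirrors the f-string returned by run_agent() in simple_agent.py.
_COMPLETED = re.compile(r"^Task completed in (\d+) steps\.\s*Result:\n(.*)$", re.DOTALL)


def parse_outcome(output: str) -> AgentOutcome:
    """Split the endpoint's output string into status, step count, and text."""
    match = _COMPLETED.match(output)
    if match:
        return AgentOutcome(True, int(match.group(1)), match.group(2))
    # Covers "Error: ...", "Please enter a task.", and anything unexpected.
    return AgentOutcome(False, None, output)
```

Treating everything that does not match the success format as a failure keeps the check conservative: empty-task and exception messages are both reported with `ok` set to `False`.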
25 changes: 17 additions & 8 deletions Dockerfile
@@ -45,6 +45,7 @@ RUN apt-get update && apt-get install -y \
     fonts-dejavu-core \
     fonts-dejavu-extra \
     vim \
+    pipx \
     && rm -rf /var/lib/apt/lists/*

 # Install noVNC
@@ -70,20 +71,28 @@ WORKDIR /app
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt

-# Playwright setup
-ENV PLAYWRIGHT_BROWSERS_PATH=/ms-browsers
-RUN mkdir -p $PLAYWRIGHT_BROWSERS_PATH
+# Install uv and uvx for browser-use
+RUN pip install --no-cache-dir uv

-# Install Chromium via Playwright without --with-deps
-RUN PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=0 playwright install chromium
+# Install Chromium browser for browser-use
+RUN apt-get update \
+    && apt-get install -y chromium chromium-driver \
+    && rm -rf /var/lib/apt/lists/*
+
+# Set Chrome path for browser-use
+ENV CHROME_BIN=/usr/bin/chromium
+ENV DISPLAY=:99
+
+# Also create a symlink for uvx
+RUN ln -s /usr/local/bin/uv /usr/local/bin/uvx || true

 # Copy application code
 COPY . .

-# Set up supervisor configuration
-RUN mkdir -p /var/log/supervisor
+# Set up supervisor configuration and DBus
+RUN mkdir -p /var/log/supervisor /run/dbus
 COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf

-EXPOSE 7788 6080 5901 9222
+EXPOSE 7788 6080 5901 9222 3000

 CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
13 changes: 2 additions & 11 deletions README.md
@@ -55,21 +55,12 @@ Activate the virtual environment:
 source .venv/bin/activate
 ```

-#### Step 3: Install Dependencies
-Install Python packages:
+#### Step 3: Install Python Packages
+Install the required Python packages using uv:
 ```bash
 uv pip install -r requirements.txt
 ```

-Install Browsers in playwright.
-```bash
-playwright install --with-deps
-```
-Or you can install specific browsers by running:
-```bash
-playwright install chromium --with-deps
-```
-
 #### Step 4: Configure Environment
 1. Create a copy of the example environment file:
 - Windows (Command Prompt):
5 changes: 2 additions & 3 deletions docker-compose.yml
@@ -42,6 +42,8 @@ services:
       # Application Settings
       - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-false}
       - BROWSER_USE_LOGGING_LEVEL=${BROWSER_USE_LOGGING_LEVEL:-info}
+      - BROWSER_USE_API_KEY=${BROWSER_USE_API_KEY:-}
+      - DEFAULT_LLM=${DEFAULT_LLM:-anthropic}

       # Browser Settings
       - BROWSER_PATH=
@@ -54,9 +56,6 @@ services:

       # Display Settings
       - DISPLAY=:99
-      # This ENV is used by the Dockerfile during build time if playwright respects it.
-      # It's not strictly needed at runtime by docker-compose unless your app or scripts also read it.
-      - PLAYWRIGHT_BROWSERS_PATH=/ms-browsers # Matches Dockerfile ENV
       - RESOLUTION=${RESOLUTION:-1920x1080x24}
       - RESOLUTION_WIDTH=${RESOLUTION_WIDTH:-1920}
       - RESOLUTION_HEIGHT=${RESOLUTION_HEIGHT:-1080}
11 changes: 3 additions & 8 deletions requirements.txt
@@ -1,10 +1,5 @@
-browser-use==0.1.48
+browser-use==0.10.1
 pyperclip==1.9.0
-gradio==5.27.0
-json-repair
-langchain-mistralai==0.2.4
+gradio==5.49.1
+json-repair==0.49.0
 MainContentExtractor==0.0.4
-langchain-ibm==0.3.10
-langchain_mcp_adapters==0.0.9
-langgraph==0.3.34
-langchain-community
109 changes: 109 additions & 0 deletions simple_agent.py
@@ -0,0 +1,109 @@
"""
Simple Gradio endpoint for browser-use agent.
Uses Anthropic claude-sonnet-4-5 with default settings.
Only accepts user input - everything else is default.
"""
from dotenv import load_dotenv
load_dotenv()

import asyncio
import os
import logging
import gradio as gr
from browser_use import Agent, Controller
from browser_use.browser import BrowserSession
from browser_use.browser.profile import BrowserProfile
from browser_use.llm.anthropic.chat import ChatAnthropic

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Default configuration
DEFAULT_MODEL = "claude-sonnet-4-5-20250929"
DEFAULT_TEMPERATURE = 0.6
DEFAULT_MAX_STEPS = 100
DEFAULT_MAX_ACTIONS_PER_STEP = 10


async def run_agent(task: str) -> str:
    """Run the browser agent with the given task."""
    if not task.strip():
        return "Please enter a task."

    # Get API key from environment
    api_key = os.getenv("ANTHROPIC_API_KEY")
    if not api_key:
        return "Error: ANTHROPIC_API_KEY not found in environment variables."

    # Initialize LLM
    llm = ChatAnthropic(
        model=DEFAULT_MODEL,
        api_key=api_key,
        temperature=DEFAULT_TEMPERATURE,
    )

    # Initialize browser
    browser_profile = BrowserProfile(
        headless=True,
        viewport={'width': 1280, 'height': 1100},
        chrome_binary_path="/usr/bin/google-chrome"
    )

    browser = BrowserSession(browser_profile=browser_profile, is_local=True)
    controller = Controller()

    # Create agent
    agent = Agent(
        task=task,
        llm=llm,
        browser_session=browser,
        controller=controller,
        use_vision=True,
        max_actions_per_step=DEFAULT_MAX_ACTIONS_PER_STEP,
    )

    try:
        # Run agent
        logger.info(f"Starting agent with task: {task}")
        history = await agent.run(max_steps=DEFAULT_MAX_STEPS)

        # Get result
        result = history.final_result() if history else "No result"
        steps = len(history.history) if history and history.history else 0

        return f"Task completed in {steps} steps.\n\nResult:\n{result}"
    except Exception as e:
        logger.error(f"Agent error: {e}", exc_info=True)
        return f"Error: {str(e)}"
    finally:
        await browser.stop()


def run_agent_sync(task: str) -> str:
    """Synchronous wrapper for the async agent."""
    return asyncio.run(run_agent(task))


# Create Gradio interface
demo = gr.Interface(
    fn=run_agent_sync,
    inputs=gr.Textbox(
        label="Task",
        placeholder="Enter your task for the browser agent...",
        lines=3,
    ),
    outputs=gr.Textbox(label="Result", lines=10),
    title="Browser Agent",
    description=f"Simple browser agent using Anthropic {DEFAULT_MODEL}. Enter a task and the agent will execute it.",
    allow_flagging="never",
)


if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Simple Browser Agent Endpoint")
    parser.add_argument("--ip", type=str, default="127.0.0.1", help="IP address to bind to")
    parser.add_argument("--port", type=int, default=7789, help="Port to listen on")
    args = parser.parse_args()

    demo.launch(server_name=args.ip, server_port=args.port)