GitHub - loadmill/tiny-cua

A minimal implementation of a Computer-Using Agent on top of OpenAI's computer use model, using Node.js and Playwright. It's only four files and has fewer than 350 lines of code.

Goal

Automate web interactions in a browser with Node.js, Playwright, and OpenAI's computer use API. tiny-CUA sends screenshots to the model and carries out the actions it returns, so it can click, type, scroll, and navigate.

Note

Your OpenAI platform account must be Tier 3 to access the computer-use model.
More info: https://platform.openai.com/docs/models/computer-use-preview

How It Works

The agent launches a browser using Playwright.
It navigates to a provided URL.
The user enters commands in the terminal (or the agent reads them from a text file if specified).
The user input, along with a screenshot, is sent to the OpenAI computer-use model.
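OpenAI stores the conversation context on its servers; each request links to the previous one through previousResponseId, so the agent does not resend the history.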
If the API returns actions (for example, click or type), the agent performs them.
After each action, the agent takes another screenshot and sends it to OpenAI for further steps.
The loop continues until the user types exit.
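
The steps above can be sketched as a single function that handles one user command. This is a minimal illustration, not the repository's code: `runTurn`, `sendToModel`, `performAction`, `takeScreenshot`, and the `maxSteps` guard are hypothetical names, and the three callbacks stand in for the real OpenAI and Playwright calls so the sketch runs offline. It assumes each response has an `id` and an `output` array whose action items have `type: "computer_call"`.

```javascript
// One user command: send text + screenshot, then act until the model stops
// asking for computer actions. All callbacks are hypothetical stand-ins.
async function runTurn({
  userText,
  previousResponseId = null,
  sendToModel,
  performAction,
  takeScreenshot,
  maxSteps = 50, // assumption: a guard against endless action loops
}) {
  // First request of the turn: the user's command plus a current screenshot.
  let response = await sendToModel({
    userText,
    screenshot: await takeScreenshot(),
    previousResponseId,
  });

  for (let step = 0; step < maxSteps; step += 1) {
    const call = response.output.find((item) => item.type === "computer_call");
    if (!call) return response; // no actions left, the turn is finished

    await performAction(call.action);

    // Report the result of the action as a fresh screenshot, linked to the
    // previous response so the server keeps the conversation context.
    response = await sendToModel({
      callId: call.call_id,
      screenshot: await takeScreenshot(),
      previousResponseId: response.id,
    });
  }
  throw new Error(`Gave up after ${maxSteps} actions`);
}
```

The caller would keep the `id` of the returned response and pass it as `previousResponseId` on the next command, which is how consecutive turns stay linked.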

Run the Agent

Install dependencies:
```
npm install
```
Install the Playwright browser:
```
npx playwright install chromium
```
Create a .env file with your OpenAI API key:
```
echo "OPENAI_API_KEY=your-key" > .env
```
Run tiny-CUA:
```
node index.js
```

Flags

--url=https://example.com/
Sets the initial URL to open. If not provided, a default test URL is used.
--save-har
Captures a HAR file of the session. When the session ends, the file cua-session-<timestamp>.har is created in the current directory.
--instructions=FILENAME or -i FILENAME
Reads commands from a text file, one line at a time. When the file is fully read, the agent continues in interactive mode unless one of the instructions was exit.
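
For illustration, the three flags could be parsed with Node's built-in `util.parseArgs` (available in recent Node.js versions). This is a sketch under assumptions, not necessarily how index.js does it: `parseFlags` and `DEFAULT_URL` are hypothetical names, and the default URL shown is a placeholder because the actual default test URL is not stated here.

```javascript
const { parseArgs } = require("node:util");

// Placeholder only; the real default test URL is not given in this README.
const DEFAULT_URL = "https://example.com/";

// Turns an argv-style array into the three settings described above.
function parseFlags(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: "string" },
      "save-har": { type: "boolean" },
      instructions: { type: "string", short: "i" },
    },
  });
  return {
    url: values.url ?? DEFAULT_URL,
    saveHar: values["save-har"] ?? false,
    instructionsFile: values.instructions ?? null,
  };
}
```

A typical call would be `parseFlags(process.argv.slice(2))`. By default `parseArgs` throws on unknown flags, which surfaces typos early.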

Example

node index.js --url=https://loadmill-center-12baa23ad9e4.herokuapp.com --save-har --instructions=example-instructions.txt

Sample example-instructions.txt:

Start a new chat
Send a hello world message in the chat
Go back to the main page
Go to the agent login and login using a@b.com and the pass 123456
reply "ok" to the first message
exit

When run, each line is passed to the agent as if you typed it in. If you include exit in the file, the session ends after that instruction. If you do not include exit, the agent switches to interactive mode once all file lines are consumed.
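
The file-then-interactive behavior can be sketched as an async generator. The names `commandSource` and `askUser` are hypothetical, and skipping blank lines is an assumption rather than documented behavior.

```javascript
// Yields commands from the file lines first, then from an interactive prompt.
// `askUser` is a hypothetical async function that resolves to one input line.
async function* commandSource(fileLines, askUser) {
  for (const raw of fileLines) {
    const line = raw.trim();
    if (line === "") continue; // assumption: blank lines are ignored
    if (line === "exit") return; // exit in the file ends the session
    yield line;
  }
  // File exhausted without exit: switch to interactive mode.
  for (;;) {
    const line = (await askUser()).trim();
    if (line === "exit") return;
    if (line !== "") yield line;
  }
}
```

In a real run, `fileLines` could come from `fs.readFileSync(path, "utf8").split(/\r?\n/)` and `askUser` could wrap `rl.question("> ")` from `node:readline/promises`; the main loop would then consume commands with `for await (const command of commandSource(...))`.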

Code Structure

index.js
- Manages user input and the main loop
- Feeds either file-based instructions or interactive terminal input to the agent
- Sends the user's commands and a screenshot to the API with previousResponseId to maintain context on the OpenAI server
- Processes any returned computer actions until there are none left
actions.js
- Contains functions to execute actions on the browser page, including clicking, dragging, scrolling, typing, and more
openai.js
- Builds requests to the CUA API with messages, screenshots, and safety checks
browser.js
- Uses Playwright to launch Chromium with a fixed window size
- Optionally records a HAR file when launched with the --save-har flag

Handling Safety Checks

The CUA API may return pending_safety_checks for sensitive or potentially harmful requests. To proceed, you must pass them back as acknowledged_safety_checks in your next request. The current code acknowledges them automatically for demonstration purposes; a production system would likely pause for human confirmation, or at least log each check.
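
A confirmation step could look roughly like the sketch below, which builds the follow-up input item for one computer call. `buildCallOutput` and `confirm` are hypothetical names. The item shape (`computer_call_output`, `call_id`, and an `output` of type `computer_screenshot` carrying a data URL) and the `code` field on each check are assumptions about the API format and should be verified against the OpenAI reference.

```javascript
// Builds the follow-up input item for one computer_call.
// `confirm` is a hypothetical async callback that returns true to proceed.
async function buildCallOutput(call, screenshotBase64, confirm) {
  const pending = call.pending_safety_checks ?? [];
  for (const check of pending) {
    if (!(await confirm(check))) {
      throw new Error(`Safety check declined: ${check.code}`);
    }
  }
  return {
    type: "computer_call_output",
    call_id: call.call_id,
    // Echo back every check the operator accepted.
    acknowledged_safety_checks: pending,
    output: {
      type: "computer_screenshot", // assumption: verify against the API docs
      image_url: `data:image/png;base64,${screenshotBase64}`,
    },
  };
}
```

Passing `async () => true` as `confirm` reproduces the automatic acknowledgement used for demonstration, while a production `confirm` could prompt an operator or write to an audit log first.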

Features

Performs actions in the browser (click, double-click, scroll, drag-and-drop, typing, and more).
Uses OpenAI to plan actions and maintain conversation context on the server side.
Sends a fresh screenshot after each action so the model can decide the next step.
Acknowledges safety checks automatically for demonstration purposes.
Uses previousResponseId to keep messages minimal while linking conversation turns.
Captures a HAR file of the network activity when run with --save-har.
Reads commands line-by-line from a file when run with --instructions=FILENAME or -i FILENAME.
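
To illustrate the fixed window size and the HAR capture, here is a sketch built on Playwright's `recordHar` context option. The names `buildContextOptions` and `openPage`, the 1024x768 viewport, and the exact timestamp format in the file name are assumptions; browser.js may differ on all of them.

```javascript
// Builds Playwright context options: a fixed viewport, plus HAR recording
// when requested. Viewport size and timestamp format are assumptions.
function buildContextOptions({ saveHar, now = new Date() }) {
  const options = { viewport: { width: 1024, height: 768 } };
  if (saveHar) {
    // ":" and "." are replaced so the name is valid on every file system.
    const stamp = now.toISOString().replace(/[:.]/g, "-");
    options.recordHar = { path: `cua-session-${stamp}.har` };
  }
  return options;
}

// Launches Chromium, opens the URL, and returns the page with a close helper.
async function openPage(url, flags) {
  const { chromium } = require("playwright"); // loaded lazily; needs npm install
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext(buildContextOptions(flags));
  const page = await context.newPage();
  await page.goto(url);
  // Playwright writes the HAR file when the context is closed.
  const close = async () => {
    await context.close();
    await browser.close();
  };
  return { page, close };
}
```

Because the HAR is only flushed on `context.close()`, the session should call `close()` on exit rather than killing the process, otherwise the file may not be written.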
