A minimal implementation of a Computer-Using Agent on top of OpenAI's computer use model, using Node.js and Playwright. It consists of only four files and fewer than 350 lines of code.
Automate web interactions in a browser with Node.js, Playwright, and OpenAI's computer use API. tiny-CUA sends screenshots to the model and carries out the actions it returns: clicking, typing, scrolling, and navigating.
Note
Your OpenAI platform account must be Tier 3 to access the computer-use model.
More info: https://platform.openai.com/docs/models/computer-use-preview
- The agent launches a browser using Playwright.
- It navigates to a provided URL.
- The user enters commands in the terminal (or the agent reads them from a text file if specified).
- The user input, along with a screenshot, is sent to the OpenAI computer-use model.
- OpenAI keeps the conversation context on its servers, so each request only needs to reference the previous response.
- If the API returns actions (for example, click or type), the agent performs them.
- After each action, the agent takes another screenshot and sends it to OpenAI for further steps.
- The loop continues until the user types exit.
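The loop above can be sketched as follows. This is a minimal illustration, not the repository's code: the client, page, executeAction and takeScreenshot dependencies are injected (hypothetical names) so the control flow runs without Playwright or network access. The model name and tool type come from OpenAI's computer-use documentation; the 1024x768 display size is an assumption, and the exact request field names should be checked against the current API reference.

```javascript
// Minimal sketch of the tiny-CUA turn loop (not the repository's code).
// client, page, executeAction and takeScreenshot are injected (hypothetical)
// so the flow can run without Playwright or network access.

const TOOL = {
  type: "computer_use_preview",
  display_width: 1024, // assumed viewport size
  display_height: 768,
  environment: "browser",
};

const toDataUrl = (b64) => `data:image/png;base64,${b64}`;

// One user command: send text + screenshot, then execute computer calls
// until the model answers without one.
async function runTurn({ client, page, executeAction, takeScreenshot, userText, previousResponseId }) {
  let response = await client.responses.create({
    model: "computer-use-preview",
    tools: [TOOL],
    truncation: "auto",
    previous_response_id: previousResponseId, // undefined on the first turn
    input: [{
      role: "user",
      content: [
        { type: "input_text", text: userText },
        { type: "input_image", image_url: toDataUrl(await takeScreenshot(page)) },
      ],
    }],
  });

  for (;;) {
    const call = response.output.find((item) => item.type === "computer_call");
    if (!call) return response; // no more actions: the turn is done

    await executeAction(page, call.action);
    response = await client.responses.create({
      model: "computer-use-preview",
      tools: [TOOL],
      truncation: "auto",
      previous_response_id: response.id, // the server keeps the history
      input: [{
        type: "computer_call_output",
        call_id: call.call_id,
        acknowledged_safety_checks: call.pending_safety_checks ?? [], // auto-ack (demo only)
        output: { type: "computer_screenshot", image_url: toDataUrl(await takeScreenshot(page)) },
      }],
    });
  }
}

// Outer loop: one turn per command until "exit" (or no more input).
async function runSession(deps, nextCommand) {
  let previousResponseId;
  for (;;) {
    const text = await nextCommand();
    if (text == null || text.trim() === "exit") break;
    const response = await runTurn({ ...deps, userText: text, previousResponseId });
    previousResponseId = response.id;
  }
  return previousResponseId;
}
```

Only the latest response ID is carried between turns; the growing conversation history stays on OpenAI's side, which is what keeps each request small.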
- Install dependencies:
npm install
- Install the Playwright browser:
npx playwright install chromium
- Create a .env file with your OpenAI API key:
echo "OPENAI_API_KEY=your-key" > .env
- Run tiny-CUA:
node index.js
Optional flags for node index.js:
- --url=https://example.com/
  Sets the initial URL to open. If not provided, a default test URL is used.
- --save-har
  Captures a HAR file of the session. When the session ends, the file cua-session-<timestamp>.har is created in the current directory.
- --instructions=FILENAME or -i FILENAME
  Reads commands from a text file, one line at a time. When the file is fully read, the agent continues in interactive mode unless one of the instructions was exit.
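A sketch of how these flags could be parsed with Node's built-in util.parseArgs (available in recent Node.js releases). The default URL below is a placeholder, since the README only says "a default test URL", and the millisecond timestamp in the HAR file name is an assumed format; the repository may do both differently.

```javascript
const { parseArgs } = require("node:util");

// Placeholder: the real default test URL is not given in the README.
const DEFAULT_URL = "https://example.com/";

// Parse --url, --save-har and --instructions / -i (sketch, not the repo's code).
function parseCliArgs(argv = process.argv.slice(2)) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: "string" },
      "save-har": { type: "boolean" },
      instructions: { type: "string", short: "i" },
    },
  });
  return {
    url: values.url ?? DEFAULT_URL,
    saveHar: values["save-har"] ?? false,
    instructionsFile: values.instructions ?? null,
  };
}

// Assumed timestamp format: milliseconds since the Unix epoch.
function harFileName(date = new Date()) {
  return `cua-session-${date.getTime()}.har`;
}
```

parseArgs runs in strict mode by default, so a mistyped flag fails fast instead of being silently ignored.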
Example:
node index.js --url=https://loadmill-center-12baa23ad9e4.herokuapp.com --save-har --instructions=example-instructions.txt
Sample example-instructions.txt:
Start a new chat
Send a hello world message in the chat
Go back to the main page
Go to the agent login and login using a@b.com and the pass 123456
reply "ok" to the first message
exit
When run, each line is passed to the agent as if you typed it in. If you include exit in the file, the session ends after that instruction. If you do not include exit, the agent switches to interactive mode once all file lines are consumed.
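The file-then-interactive behavior can be sketched as a single command source. The names createCommandSource and askUser are hypothetical, and skipping blank lines is an assumption. The file text is passed in directly so the logic can be tested without touching the file system.

```javascript
// Sketch: serve instructions from a file first, then fall back to the terminal.
// fileText: contents of the instructions file, or null when -i was not given.
// askUser: async function that prompts the user and resolves to their input.
function createCommandSource(fileText, askUser) {
  const queue = fileText === null
    ? []
    : fileText
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0); // assumption: blank lines are skipped

  return async function nextCommand() {
    if (queue.length > 0) return queue.shift(); // file-based instruction
    return askUser(); // interactive mode once the file is used up
  };
}
```

In a terminal, askUser could wrap the question method of an interface from node:readline/promises, and fileText could come from fs.readFileSync(path, "utf8"). Handling exit is left to the main loop, so an exit line in the file and one typed by the user behave the same way.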
- index.js
  - Manages user input and the main loop
  - Feeds either file-based instructions or interactive terminal input to the agent
  - Sends the user's commands and a screenshot to the API with previousResponseId to maintain context on the OpenAI server
  - Processes any returned computer actions until there are none left
- actions.js
  - Contains functions to execute actions on the browser page, including clicking, dragging, scrolling, typing, and more
- openai.js
  - Builds requests to the CUA API with messages, screenshots, and safety checks
- browser.js
  - Uses Playwright to launch Chromium with a fixed window size
  - (Optional) Records a HAR file if launched with the --save-har flag
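The kind of dispatcher actions.js contains might look like the sketch below. The action shapes (click, double_click, move, scroll, type, keypress, drag, wait, screenshot) follow OpenAI's computer-use documentation, and the page methods are standard Playwright Page mouse and keyboard calls. The key-name table, the one-second wait, and mapping the model's "wheel" button to Playwright's "middle" are assumptions, not taken from the repository.

```javascript
// Sketch of an action dispatcher; assumed key-name mapping for keypress actions.
const KEY_MAP = {
  ENTER: "Enter", CTRL: "Control", ALT: "Alt", SHIFT: "Shift",
  ESC: "Escape", SPACE: " ", BACKSPACE: "Backspace", TAB: "Tab",
};

// Playwright accepts "left" | "right" | "middle".
function toPlaywrightButton(button) {
  if (button === "right") return "right";
  if (button === "wheel") return "middle";
  return "left";
}

async function executeAction(page, action) {
  switch (action.type) {
    case "click":
      await page.mouse.click(action.x, action.y, { button: toPlaywrightButton(action.button) });
      break;
    case "double_click":
      await page.mouse.dblclick(action.x, action.y);
      break;
    case "move":
      await page.mouse.move(action.x, action.y);
      break;
    case "scroll":
      // Move the pointer first so the wheel event hits the right element.
      await page.mouse.move(action.x, action.y);
      await page.mouse.wheel(action.scroll_x, action.scroll_y);
      break;
    case "type":
      await page.keyboard.type(action.text);
      break;
    case "keypress": {
      // Press the keys as one chord, e.g. ["CTRL", "a"] -> "Control+a".
      const combo = action.keys.map((k) => KEY_MAP[k.toUpperCase()] ?? k).join("+");
      await page.keyboard.press(combo);
      break;
    }
    case "drag": {
      // Press at the first point, move through the rest, release at the last.
      const [start, ...rest] = action.path;
      await page.mouse.move(start.x, start.y);
      await page.mouse.down();
      for (const p of rest) await page.mouse.move(p.x, p.y);
      await page.mouse.up();
      break;
    }
    case "wait":
      await page.waitForTimeout(1000); // assumed pause length
      break;
    case "screenshot":
      break; // the main loop screenshots after every action anyway
    default:
      throw new Error(`Unsupported action type: ${action.type}`);
  }
}
```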
The CUA API may return pending_safety_checks for sensitive or potentially harmful requests. To proceed, you must include them as acknowledged_safety_checks in your next request. The current code acknowledges them automatically; a production system should instead log them and pause for human confirmation.
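A sketch of building the next request's input item so that acknowledgment is a deliberate step. Each pending check is assumed to carry id, code, and message fields, as described in OpenAI's computer-use guide. The confirm callback is a hypothetical hook: its default accepts everything, mirroring the demo, while a production version could prompt a human.

```javascript
// Sketch: turn a computer_call into a computer_call_output input item.
// confirm(check) is a hypothetical hook; the default auto-acknowledges (demo behavior).
async function buildCallOutput(call, screenshotBase64, confirm = async () => true) {
  const pending = call.pending_safety_checks ?? [];
  for (const check of pending) {
    console.warn(`Safety check ${check.code}: ${check.message}`); // keep an audit trail
    if (!(await confirm(check))) {
      throw new Error(`Safety check ${check.id} was not acknowledged`);
    }
  }
  return {
    type: "computer_call_output",
    call_id: call.call_id,
    acknowledged_safety_checks: pending, // echo back the checks we accepted
    output: { type: "computer_screenshot", image_url: `data:image/png;base64,${screenshotBase64}` },
  };
}
```

Throwing on a rejected check stops the agent before it sends anything back, which is the simplest safe default. A real tool might instead end the turn and return control to the user.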
- Performs actions in the browser (click, double-click, scroll, drag-and-drop, typing, and more).
- Uses OpenAI to plan actions and maintain conversation context on the server side.
- Sends iterative screenshots for real-time guidance from the model.
- Acknowledges safety checks automatically for demonstration purposes.
- Uses previousResponseId to keep messages minimal while linking conversation turns.
- Captures a HAR file of the network activity when run with --save-har.
- Reads commands line by line from a file when run with --instructions=FILENAME or -i FILENAME.
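A sketch of how browser.js might assemble its Playwright options, covering both the fixed window size and the optional HAR capture. It builds only plain objects, so it runs without Playwright installed. The 1024x768 size and headed mode are assumptions, not values from the repository. The option names (headless, args, viewport, recordHar.path) are standard Playwright launch and context options.

```javascript
// Sketch: options for chromium.launch() and browser.newContext().
// Size and headed mode are assumptions; harPath enables HAR recording.
function browserOptions({ width = 1024, height = 768, harPath = null } = {}) {
  const launch = {
    headless: false, // show the browser so the user can watch the agent
    args: [`--window-size=${width},${height}`],
  };
  const context = { viewport: { width, height } }; // keep screenshots at a fixed size
  if (harPath) {
    // Playwright writes the HAR file when the context is closed.
    context.recordHar = { path: harPath };
  }
  return { launch, context };
}
```

With Playwright installed, these objects would be passed to chromium.launch(opts.launch) and browser.newContext(opts.context), followed by context.newPage(). Awaiting context.close() before exiting is what flushes the HAR file to disk.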
