Getting Started
Train, evaluate, and improve your browser agents with Trayn.
Installation
Quick Start
Record a workflow in the Trayn web app. Trayn generates a task definition (goal, verifiers) from your recording and stores it alongside the sandbox. Then point the SDK at the sandbox URL:
The SDK fetches the task definition from the backend, runs your agent against the sandbox, grades the result, and stores memories for learning.
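The four steps named above (fetch, run, grade, store) can be pictured with a small sketch. Every name here (`TaskDefinition`, `Agent`, `runOnce`, the string-based final state) is a hypothetical stand-in chosen for illustration and is not part of the Trayn SDK; real fetching and browser control would be asynchronous, which is omitted to keep the flow readable.

```typescript
// Hypothetical shapes for illustration only; not the Trayn SDK's types.
interface TaskDefinition {
  goal: string;
  // Each verifier inspects the agent's final state and reports pass/fail.
  verifiers: Array<(finalState: string) => boolean>;
}

interface RunResult {
  passed: boolean;
  memory: string;
}

// An agent receives the goal and the sandbox URL and returns a final state.
type Agent = (goal: string, sandboxUrl: string) => string;

function runOnce(
  sandboxUrl: string,
  fetchTask: (url: string) => TaskDefinition,
  agent: Agent,
  memories: string[],
): RunResult {
  // 1. Fetch the task definition for this sandbox.
  const task = fetchTask(sandboxUrl);
  // 2. Run the agent against the sandbox.
  const finalState = agent(task.goal, sandboxUrl);
  // 3. Grade: the run passes only if every verifier accepts the final state.
  const passed = task.verifiers.every((verify) => verify(finalState));
  // 4. Store a memory of the outcome for later runs.
  const memory = `${passed ? "PASS" : "FAIL"}: ${task.goal}`;
  memories.push(memory);
  return { passed, memory };
}
```

Passing the fetcher and agent in as parameters keeps the sketch independent of any real backend, so the flow can be exercised with stubs.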
Programmatic Usage
How Tasks Work
- Record: Capture a workflow in the browser extension on a real website.
- Generate: Trayn AI analyzes the recording and creates a task definition with goal text, verifiers, and expected actions.
- Edit: Adjust the goal and verifiers in the web app; edits are persisted.
- Run: When you provide a sandbox URL, the SDK fetches the latest task definition from the backend.
Tasks are stored in S3 at sandbox-tasks/{sessionId}/task.json. The SDK reads this via fetchTaskFromUrl(). If you edit a task locally with --confirm, changes are pushed back to the backend via pushTaskToRemote().
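As a small illustration of the storage layout, the key pattern `sandbox-tasks/{sessionId}/task.json` can be built and parsed as follows. The helper names `taskKey` and `sessionIdFromKey` are hypothetical and are not SDK functions; the validation rule (no empty IDs, no slashes) is also an assumption.

```typescript
// Hypothetical helper: builds the storage key for a session's task file.
function taskKey(sessionId: string): string {
  // Assumed rule: a session ID is non-empty and contains no path separator.
  if (sessionId.length === 0 || sessionId.includes("/")) {
    throw new Error(`invalid sessionId: "${sessionId}"`);
  }
  return `sandbox-tasks/${sessionId}/task.json`;
}

// Hypothetical helper: recovers the session ID from a key, or null if the
// key does not follow the sandbox-tasks/{sessionId}/task.json pattern.
function sessionIdFromKey(key: string): string | null {
  const match = /^sandbox-tasks\/([^/]+)\/task\.json$/.exec(key);
  return match?.[1] ?? null;
}
```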
Curated Benchmarks
Pre-built benchmark suites are available for testing without recordings:
These suites use local task files bundled with the SDK. For your own recordings, the primary workflow remains URL-based fetching from the backend.
Benchmark Suites
| Type | Domain | Tasks |
|---|---|---|
| omnizon | E-commerce | 10 |
| dashdish | Food delivery | 11 |
| staynb | Travel booking | 9 |
| fly-unified | Flight booking | 14 |
| gocalendar | Calendar | 10 |
| gomail | Email | 8 |
| networkin | Social network | 10 |
| opendining | Reservations | 10 |
| topwork | Freelance | 9 |
| udriver | Ride sharing | 11 |
| zilloft | Real estate | 10 |
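
For scripting over the suites, the task counts from the table can be held in a small data structure. This is only a sketch: the `SuiteInfo` type and the `totalTasks` helper are hypothetical, and the SDK may expose this information differently.

```typescript
// Hypothetical record type; the counts are copied from the table above.
interface SuiteInfo {
  type: string;
  tasks: number;
}

const suites: SuiteInfo[] = [
  { type: "omnizon", tasks: 10 },
  { type: "dashdish", tasks: 11 },
  { type: "staynb", tasks: 9 },
  { type: "fly-unified", tasks: 14 },
  { type: "gocalendar", tasks: 10 },
  { type: "gomail", tasks: 8 },
  { type: "networkin", tasks: 10 },
  { type: "opendining", tasks: 10 },
  { type: "topwork", tasks: 9 },
  { type: "udriver", tasks: 11 },
  { type: "zilloft", tasks: 10 },
];

// Sums the task counts across all listed suites.
function totalTasks(list: SuiteInfo[]): number {
  return list.reduce((sum, suite) => sum + suite.tasks, 0);
}

console.log(`${suites.length} suites, ${totalTasks(suites)} tasks`);
// prints "11 suites, 112 tasks"
```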