Observations
The observation data your agent receives each step.
Observation Fields
Each step, the harness passes an Observation to your agent's get_action method:
| Field | Type | Description |
|---|---|---|
| goal | string | Task objective |
| url | string | Current page URL |
| axtree_txt | string | Accessibility tree as text with element BIDs |
| axtree_object | AXTreeObject | Structured accessibility tree |
| extra_element_properties | Record<string, ExtraElementProperties> | Visibility, bounding box, clickable flag per element |
| screenshot | Uint8Array | Page screenshot (when task has screenshot verifier) |
| last_action | string | Previous action executed |
| last_action_error | string | Error from previous action (includes verifier feedback) |
| elapsed_time | number | Seconds since task start |
| chat_messages | ChatMessage[] | All chat messages with timestamps |
| open_pages_urls | string[] | URLs of all open pages |
| active_page_index | number | Index of current active page |
| focused_element_bid | string | BID of currently focused element |
| action_history | string[] | Semantic summaries of prior actions |
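
The table above can be read as a type declaration. The following is a minimal sketch, not the SDK's actual definitions: the field names and types are copied from the table, but the shapes of `AXTreeObject`, `ExtraElementProperties`, and `ChatMessage` are placeholders, and `summarizeObservation` is a hypothetical helper. It also assumes that an empty string means "no previous action" or "no error", which the table does not state.

```typescript
// Placeholder shapes: the table names these types but does not define them.
type AXTreeObject = unknown;
type ExtraElementProperties = Record<string, unknown>;
interface ChatMessage {
  message: string;   // assumed field name
  timestamp: number; // assumed field name
}

// Field names and types as listed in the table above.
interface Observation {
  goal: string;
  url: string;
  axtree_txt: string;
  axtree_object: AXTreeObject;
  extra_element_properties: Record<string, ExtraElementProperties>;
  screenshot?: Uint8Array; // only when the task has a screenshot verifier
  last_action: string;
  last_action_error: string;
  elapsed_time: number;
  chat_messages: ChatMessage[];
  open_pages_urls: string[];
  active_page_index: number;
  focused_element_bid: string;
  action_history: string[];
}

// Hypothetical helper: the subset of fields it needs to build a short summary.
type StepContext = Pick<
  Observation,
  "goal" | "url" | "last_action" | "last_action_error" | "elapsed_time"
>;

// Condenses one step's observation into a few lines of text, e.g. for a prompt.
function summarizeObservation(obs: StepContext): string {
  const lines = [
    `Goal: ${obs.goal}`,
    `URL: ${obs.url}`,
    `Elapsed: ${obs.elapsed_time.toFixed(1)}s`,
  ];
  // Assumption: an empty string means there is nothing to report.
  if (obs.last_action !== "") lines.push(`Last action: ${obs.last_action}`);
  if (obs.last_action_error !== "") lines.push(`Last error: ${obs.last_action_error}`);
  return lines.join("\n");
}
```

Surfacing `last_action_error` in the summary is deliberate: per the table, it carries verifier feedback, so an agent that ignores it may repeat a failing action.
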
Element Identification (BID System)
Elements are identified by a Browser ID (BID) that encodes their position across frames:
| BID | Meaning |
|---|---|
| 42 | Element 42 in main frame |
| a5 | Element 5 inside iframe a |
| m12 | Element 12 inside iframe m |
| ab3 | Element 3 inside iframe b, nested inside iframe a |
The SDK automatically marks elements in all frames (including nested iframes) with unique BIDs via CDP, extracts accessibility trees from all frames, and resolves the correct frame context when executing actions.
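
To make the encoding concrete, here is a hedged sketch of how a BID could be split into a frame path and an element number. It is inferred only from the four examples in the table: it assumes each lowercase letter names one level of iframe nesting and that the trailing digits are the element number. `parseBid` and `ParsedBid` are hypothetical names, not part of the SDK, and the SDK resolves frames itself, so agents normally do not need this.

```typescript
// Hypothetical result shape for illustration only.
interface ParsedBid {
  framePath: string[];   // outermost iframe first; empty for the main frame
  elementIndex: number;  // numeric part of the BID
}

// Splits a BID such as "ab3" into its frame letters and element number.
// Returns null when the string does not match the assumed letters+digits shape.
function parseBid(bid: string): ParsedBid | null {
  const match = /^([a-z]*)(\d+)$/.exec(bid);
  if (match === null) return null;
  const frames = match[1] ?? "";
  const digits = match[2] ?? "";
  return {
    framePath: frames.split(""), // "" yields [], i.e. the main frame
    elementIndex: Number(digits),
  };
}
```

Under these assumptions, `parseBid("ab3")` yields a frame path of `["a", "b"]` with element index 3, matching the "iframe b, nested inside iframe a" row.
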
Accessibility Tree Format
| Flag | Meaning |
|---|---|
| visible | Element is at least 50% visible in viewport |
| clickable | Element has CDP isClickable flag |
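
One common use of these flags is narrowing the candidate elements before choosing an action. The sketch below assumes the flags are available per BID as booleans named `visible` and `clickable`; those property names, the `ElementFlags` type, and the `actionableBids` helper are illustrative assumptions rather than the SDK's documented shape.

```typescript
// Assumed per-element flag shape; the real property names may differ.
interface ElementFlags {
  visible: boolean;   // at least 50% visible in the viewport
  clickable: boolean; // CDP isClickable flag
}

// Returns the BIDs of elements that are both visible and clickable.
function actionableBids(props: Record<string, ElementFlags>): string[] {
  return Object.entries(props)
    .filter(([, flags]) => flags.visible && flags.clickable)
    .map(([bid]) => bid);
}
```

Filtering this way keeps an agent from targeting elements that are scrolled out of view or not interactive, at the cost of hiding elements that would become actionable after scrolling.
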