Physical AI Agent Capabilities

Physical AI Agents are designed to translate natural language instructions into executable robotics and computer vision code using the Telekinesis Skill Library. They operate at the code generation layer, composing Skills into structured programs that can be run in standard Python environments.

This section describes what Physical AI Agents can and cannot do in practice.

Core Capabilities

End-to-End Orchestration

Physical AI Agents generate complete perception-to-action pipelines from a single instruction. Rather than producing isolated code snippets, they reason about the full task — selecting the right Skills, chaining them in the correct order, and handling the data flow between steps. This includes everything from camera capture and object detection through to 3D pose estimation and robot motion commands.

Skill Discovery

Physical AI Agents understand the Telekinesis Skill Library and can identify the right Skills for your problem. Ask a question in natural language and the agent will surface the relevant Skills, explain when to use each one, and show how to combine them — without you needing to browse documentation first.

Code Generation & Refinement

Physical AI Agents can:

Generate executable Python code from natural language instructions
Select and compose Telekinesis Skills into functional pipelines
Build multi-step programs (e.g., detect → filter → track → output)
Adapt generated code through conversational follow-up
Produce structured outputs suitable for robotics and perception systems
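
The detect → filter → track → output pattern can be sketched as a chain of plain Python functions. This is an illustration of the composition idea only, not the Telekinesis API: `detect`, `filter_by_score`, `track`, and `run_pipeline` are hypothetical stand-ins for Skills, the detector returns hard-coded data, and the toy tracker assumes at most one object per label.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    label: str
    score: float
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def detect(frame_index: int) -> List[Detection]:
    """Hypothetical detector stand-in: returns canned detections for a frame."""
    return [
        Detection("cup", 0.92, (10 + frame_index, 20, 60 + frame_index, 80)),
        Detection("cup", 0.30, (200, 40, 240, 90)),
        Detection("box", 0.81, (120, 15, 180, 70)),
    ]


def filter_by_score(detections: List[Detection], min_score: float) -> List[Detection]:
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.score >= min_score]


def track(detections: List[Detection], known_ids: Dict[str, int]) -> List[Tuple[int, Detection]]:
    """Toy tracker: one ID per label (assumes a single object per label)."""
    tracked = []
    for det in detections:
        if det.label not in known_ids:
            known_ids[det.label] = len(known_ids)
        tracked.append((known_ids[det.label], det))
    return tracked


def run_pipeline(num_frames: int, min_score: float = 0.5) -> List[dict]:
    """Chain the steps and emit one structured record per tracked object."""
    known_ids: Dict[str, int] = {}
    records = []
    for frame_index in range(num_frames):
        kept = filter_by_score(detect(frame_index), min_score)
        for track_id, det in track(kept, known_ids):
            records.append(
                {"frame": frame_index, "track_id": track_id,
                 "label": det.label, "box": det.box}
            )
    return records


if __name__ == "__main__":
    for record in run_pipeline(num_frames=2):
        print(record)
```

Because each step is an ordinary function with explicit inputs and outputs, a follow-up request such as "raise the confidence threshold" maps to a one-line change in the generated program.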

Supported Task Types

Physical AI Agents are optimized for tasks involving perception, vision, and structured automation.

Vision Tasks

Object detection in images and video streams
Real-time webcam-based detection pipelines
Classification of visual inputs
Bounding box generation and annotation

3D & Point Cloud Tasks

Point cloud filtering, segmentation, and transformation
3D pose estimation from RGB-D data
Pick pose computation from depth-registered detections
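
One common way to turn a depth-registered detection into a 3D pick position is to back-project the bounding-box center through the pinhole camera model. The sketch below assumes a depth image in meters that is aligned to the color image, plus known intrinsics `fx`, `fy`, `cx`, `cy`. The function names are hypothetical and are not taken from the Telekinesis Skill Library, and a full pick pose would also need an orientation, which is omitted here.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (u, v) pixel coordinates of the box center."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Pinhole back-projection of pixel (u, v) at depth_m into the camera frame."""
    if depth_m <= 0:
        raise ValueError("depth must be positive (0 usually means a missing reading)")
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def pick_position_from_detection(box: Box, depth_image: Sequence[Sequence[float]],
                                 fx: float, fy: float, cx: float, cy: float):
    """Look up the depth at the box center and back-project it to 3D."""
    u, v = box_center(box)
    depth_m = depth_image[int(round(v))][int(round(u))]  # row index = v, column = u
    return backproject(u, v, depth_m, fx, fy, cx, cy)


if __name__ == "__main__":
    depth = [[0.5] * 4 for _ in range(4)]  # synthetic 4x4 depth image, 0.5 m everywhere
    print(pick_position_from_detection((0, 0, 4, 2), depth, fx=2.0, fy=2.0, cx=1.0, cy=1.0))
```

In practice a single center pixel is noisy, so a median over a small window of valid depth values is often a more robust choice.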

Robotics Tasks

Vision-guided pick-and-place pipelines
Eye-in-hand camera calibration and projection workflows
Multi-step perception-to-motion code generation
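
In an eye-in-hand setup the camera rides on the end effector, so a point seen by the camera reaches the robot base frame through two transforms: the calibrated camera-to-end-effector transform and the current end-effector pose. The sketch below shows that chain with plain 4x4 homogeneous matrices. All names (`T_base_ee`, `T_ee_cam`, `camera_point_to_base`) and the numeric values are illustrative assumptions, not Telekinesis identifiers or real calibration results.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 homogeneous transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x: float, y: float, z: float) -> Matrix:
    """Pure translation transform."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_z_90() -> Matrix:
    """Rotation of +90 degrees about the z axis."""
    return [[0.0, -1.0, 0.0, 0.0],
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def transform_point(T: Matrix, p: Tuple[float, float, float]) -> Tuple[float, ...]:
    """Apply a homogeneous transform to a 3D point."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[i][k] * v[k] for k in range(4)) for i in range(3))


def camera_point_to_base(T_base_ee: Matrix, T_ee_cam: Matrix,
                         p_cam: Tuple[float, float, float]) -> Tuple[float, ...]:
    """p_base = T_base_ee * T_ee_cam * p_cam."""
    return transform_point(matmul4(T_base_ee, T_ee_cam), p_cam)


if __name__ == "__main__":
    # Assumed values: end-effector pose from the robot, camera offset from calibration.
    T_base_ee = translation(0.5, 0.0, 0.25)
    T_ee_cam = matmul4(translation(0.0, 0.0, 0.125), rotation_z_90())
    print(camera_point_to_base(T_base_ee, T_ee_cam, (0.25, 0.0, 0.5)))
```

`T_ee_cam` stays fixed after calibration, while `T_base_ee` must be read at the moment the image is captured. Using a stale end-effector pose is a frequent source of pick offsets.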

Video & Streaming Tasks

Frame-by-frame processing pipelines
Multi-object tracking across time
Real-time inference workflows
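
To make multi-object tracking across frames concrete, here is a minimal greedy tracker that matches each new box to the previous-frame box with the highest intersection-over-union (IoU). It is a deliberately simple stand-in, written under the assumption that objects move little between frames. `GreedyIoUTracker` is a hypothetical name, and the tracking Skills in the library may use a different algorithm.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Assigns persistent IDs by greedy IoU matching against the previous frame."""

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Box] = {}  # track_id -> last seen box
        self._next_id = 0

    def update(self, boxes: List[Box]) -> List[Tuple[int, Box]]:
        """Process one frame; returns (track_id, box) in the input order."""
        unmatched = dict(self.tracks)
        assigned: Dict[int, Box] = {}
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for track_id, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score >= best_iou:
                    best_id, best_iou = track_id, score
            if best_id is None:
                best_id = self._next_id  # no match: start a new track
                self._next_id += 1
            else:
                del unmatched[best_id]  # each track matches at most one box
            assigned[best_id] = box
        self.tracks = assigned  # tracks without a match this frame are dropped
        return list(assigned.items())


if __name__ == "__main__":
    tracker = GreedyIoUTracker()
    print(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)]))
    print(tracker.update([(51, 50, 61, 60), (1, 0, 11, 10)]))
```

Greedy matching is order-dependent and drops a track as soon as it misses one frame. Production trackers typically add optimal assignment and a grace period for occlusions.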

System Boundaries

Physical AI Agents operate within defined constraints:

No direct control of physical robots or hardware execution
No access to file systems or external project environments
No execution outside the provided Skill-based environment
No implicit system or workspace awareness

All outputs are generated as code that must be explicitly executed in a supported environment.

What Makes This System Different

Unlike end-to-end robotics policies, Physical AI Agents operate at the code generation layer rather than the control layer.

This enables:

Composability — complex behaviors built from reusable Skills
Transparency — generated logic is explicit and inspectable
Flexibility — output can be modified before execution
Modularity — Skills (including perception models) can be swapped or extended

Key Principle

Physical AI Agents do not directly perform actions. They generate structured programs that describe how actions should be executed using the Telekinesis Skill Library.

Their capability is defined by the Skills available to them and the structure of the generated code.
