Physical AI Agent Capabilities
Physical AI Agents are designed to translate natural language instructions into executable robotics and computer vision code using the Telekinesis Skill Library. They operate at the code generation layer, composing Skills into structured programs that can be run in standard Python environments.
This section describes what Physical AI Agents can and cannot do in practice.
Core Capabilities
Physical AI Agents can:
- Generate executable Python code from natural language instructions
- Select and compose Telekinesis Skills into functional pipelines
- Translate vision-based tasks into structured processing workflows
- Build multi-step programs (e.g., detect → filter → track → output)
- Adapt generated code based on conversational refinement
- Produce structured outputs suitable for robotics and perception systems
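A multi-step program like the detect → filter → track → output example above can be sketched in plain Python. The stage functions below are hypothetical stand-ins for Telekinesis Skills, not the real API; each stage is an ordinary callable, so stages compose into a single pipeline function:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for Telekinesis Skills; real Skill names and
# signatures will differ. Each stage is an ordinary Python callable.
@dataclass
class Detection:
    label: str
    confidence: float

def detect(frame):
    # Stub detector: a real Skill would run a perception model here.
    return [Detection("person", 0.92), Detection("chair", 0.41)]

def filter_confident(detections, threshold=0.5):
    # Drop low-confidence detections before tracking.
    return [d for d in detections if d.confidence >= threshold]

def track(detections, state):
    # Stub tracker: assign a stable integer ID per label across calls.
    for d in detections:
        state.setdefault(d.label, len(state))
    return [(state[d.label], d) for d in detections]

def run_pipeline(frame, state):
    # detect -> filter -> track -> output, as one composed program.
    return track(filter_confident(detect(frame)), state)

state = {}
results = run_pipeline(frame=None, state=state)
print(results)  # [(0, Detection(label='person', confidence=0.92))]
```

Because the pipeline is explicit code rather than an opaque policy, each stage can be inspected, tested, or replaced independently.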
Supported Task Types
Physical AI Agents are optimized for tasks involving perception, vision, and structured automation.
Vision Tasks
- Object detection in images and video streams
- Real-time webcam-based detection pipelines
- Classification of visual inputs
- Bounding box generation and annotation
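Bounding box work typically reduces to a small amount of geometry. The sketch below (plain Python, no Telekinesis Skills assumed) shows the standard intersection-over-union overlap measure commonly used to merge or filter boxes before annotation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BBox:
    # Axis-aligned box in pixel coordinates:
    # (x1, y1) top-left, (x2, y2) bottom-right.
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

def iou(a: BBox, b: BBox) -> float:
    # Intersection-over-union: overlap area divided by combined area.
    inter = BBox(max(a.x1, b.x1), max(a.y1, b.y1),
                 min(a.x2, b.x2), min(a.y2, b.y2))
    union = a.area() + b.area() - inter.area()
    return inter.area() / union if union else 0.0

print(iou(BBox(0, 0, 10, 10), BBox(5, 5, 15, 15)))  # 0.14285714285714285
```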
Video & Streaming Tasks
- Frame-by-frame processing pipelines
- Multi-object tracking across time
- Real-time inference workflows
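Frame-by-frame pipelines are naturally expressed as generators: frames are pulled lazily, so the same code serves recorded video and live streams. The source and inference functions below are stubs standing in for Telekinesis capture and model Skills:

```python
from typing import Iterable, Iterator

def frames(source: Iterable) -> Iterator:
    # Stand-in for a video/webcam source; a generated program would
    # read frames via a capture Skill instead (hypothetical).
    yield from source

def infer(frame):
    # Stub per-frame inference; returns a (frame, result) pair.
    return frame, f"result-for-{frame}"

def stream_pipeline(source):
    # Lazy, frame-by-frame processing: nothing is buffered, so memory
    # use is constant regardless of stream length.
    for frame in frames(source):
        yield infer(frame)

for frame, result in stream_pipeline(range(3)):
    print(frame, result)
```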
Pipeline Construction
- Multi-step vision processing workflows
- Composition of detection, filtering, and transformation steps
- Structured data extraction from visual inputs
Code Generation Tasks
- Python code generation using Telekinesis Skills
- Integration of vision libraries and robotics tooling
- Modular pipeline construction for reuse and extension
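Modular, reusable pipelines can be built by chaining small single-purpose steps. A minimal composition helper, with hypothetical preprocessing steps in place of real Skills, might look like:

```python
from functools import reduce

def compose(*steps):
    # Chain single-argument steps left to right into one pipeline callable.
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)

# Hypothetical steps; a generated program would wrap Telekinesis Skills.
to_gray = lambda img: {**img, "mode": "gray"}
resize = lambda img: {**img, "size": (640, 480)}
normalize = lambda img: {**img, "normalized": True}

preprocess = compose(to_gray, resize, normalize)
print(preprocess({"mode": "rgb"}))
# {'mode': 'gray', 'size': (640, 480), 'normalized': True}
```

Because `preprocess` is itself a single-argument callable, composed pipelines nest: a pipeline can be reused as one step inside a larger one.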
System Boundaries
Physical AI Agents operate within defined constraints:
- No direct control of physical robots or hardware execution
- No access to file systems or external project environments
- No persistent memory across sessions
- No execution outside the provided Skill-based environment
- No implicit system or workspace awareness
All outputs are generated as code that must be explicitly executed in a supported environment.
What Makes This System Different
Unlike end-to-end robotics policies, Physical AI Agents operate at the code generation layer rather than the control layer.
This enables:
- Composability — complex behaviors built from reusable Skills
- Transparency — generated logic is explicit and inspectable
- Flexibility — output can be modified before execution
- Modularity — Skills (including perception models) can be swapped or extended
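The modularity point above can be made concrete with a structural interface: pipeline logic depends only on what a perception component does, not which model implements it. The `Detector` protocol and both stub models below are illustrative assumptions, not the actual Telekinesis interface:

```python
from typing import Protocol

class Detector(Protocol):
    # Minimal interface a perception Skill is assumed to expose
    # (hypothetical; the real interface may differ).
    def detect(self, frame) -> list[str]: ...

class StubModelA:
    def detect(self, frame) -> list[str]:
        return ["person"]

class StubModelB:
    def detect(self, frame) -> list[str]:
        return ["cat", "dog"]

def count_objects(detector: Detector, frame) -> int:
    # Depends only on the interface, so the underlying model can be
    # swapped without changing the pipeline code.
    return len(detector.detect(frame))

print(count_objects(StubModelA(), None))  # 1
print(count_objects(StubModelB(), None))  # 2
```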
Key Principle
Physical AI Agents do not directly perform actions. They generate structured programs that describe how actions should be executed using the Telekinesis Skill Library.
Their capability is defined by the Skills available to them and the structure of the generated code.