Physical AI Agent Capabilities

Physical AI Agents translate natural language instructions into executable robotics and computer vision code using the Telekinesis Skill Library. They operate at the code generation layer: they compose Skills into structured programs that run in standard Python environments.

This section describes what Physical AI Agents can and cannot do in practice.

Core Capabilities

End-to-End Orchestration

Physical AI Agents generate complete perception-to-action pipelines from a single instruction. Rather than producing isolated code snippets, they reason about the full task: selecting the right Skills, chaining them in the correct order, and handling the data flow between steps. A generated pipeline can span camera capture, object detection, 3D pose estimation, and robot motion commands.
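
To make the idea of a chained pipeline concrete, here is a minimal sketch of the capture, detect, estimate, and plan stages wired together. Every function and class name in it (`capture_frame`, `detect_objects`, `estimate_position`, `plan_pick`, `Detection`, `MotionCommand`) is a hypothetical stand-in that returns canned data. None of them are Telekinesis Skills, and the code an agent actually generates may be structured differently.

```python
from dataclasses import dataclass

# Every function below is a hypothetical stand-in with canned data,
# not a Telekinesis Skill. Only the chaining pattern is the point.


@dataclass
class Detection:
    label: str
    pixel: tuple      # (u, v) centre of the detection in the image
    depth_m: float    # depth at that pixel, in metres


@dataclass
class MotionCommand:
    action: str
    target_xyz: tuple  # target position, in metres


def capture_frame():
    """Stand-in for camera capture: returns a canned frame description."""
    return {"objects": [("bolt", (320, 240), 0.50), ("nut", (100, 80), 0.75)]}


def detect_objects(frame, wanted_label):
    """Stand-in for object detection: keeps objects with the requested label."""
    return [
        Detection(label, pixel, depth)
        for label, pixel, depth in frame["objects"]
        if label == wanted_label
    ]


def estimate_position(detection):
    """Stand-in for 3D pose estimation: places the object on the optical axis."""
    return (0.0, 0.0, detection.depth_m)


def plan_pick(position):
    """Stand-in for motion planning: describes a pick, but moves nothing."""
    return MotionCommand(action="pick", target_xyz=position)


def run_pipeline(wanted_label):
    """Chain the stages so each one consumes the previous stage's output."""
    frame = capture_frame()
    detections = detect_objects(frame, wanted_label)
    return [plan_pick(estimate_position(d)) for d in detections]


commands = run_pipeline("bolt")
print(commands)
```

Each stage only depends on the output type of the stage before it, which is what lets a single instruction expand into a full program rather than a set of disconnected snippets.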

Skill Discovery

Physical AI Agents are familiar with the Telekinesis Skill Library and can identify the Skills that fit your problem. Ask a question in natural language and the agent will surface the relevant Skills, explain when to use each one, and show how to combine them, so you do not need to browse the documentation first.

Code Generation & Refinement

Physical AI Agents can:

  • Generate executable Python code from natural language instructions
  • Select and compose Telekinesis Skills into functional pipelines
  • Build multi-step programs (e.g., detect → filter → track → output)
  • Adapt generated code through conversational follow-up
  • Produce structured outputs suitable for robotics and perception systems
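
As an illustration of the filter and output steps, and of how a conversational follow-up can translate into a small code change, consider the sketch below. The detection records, `filter_detections`, and `to_structured_output` are assumptions made for this example. They are not part of the Telekinesis Skill Library, and the JSON layout is just one plausible structured format.

```python
import json

# Hypothetical detector output. In generated code this would come from a
# detection Skill; here it is hard-coded so the example is self-contained.
RAW_DETECTIONS = [
    {"label": "cup", "confidence": 0.91, "bbox": [40, 60, 120, 180]},
    {"label": "cup", "confidence": 0.32, "bbox": [300, 80, 360, 150]},
    {"label": "spoon", "confidence": 0.77, "bbox": [200, 210, 260, 230]},
]


def filter_detections(detections, min_confidence=0.5, labels=None):
    """Keep detections above a threshold, optionally restricted to some labels."""
    kept = []
    for det in detections:
        if det["confidence"] < min_confidence:
            continue
        if labels is not None and det["label"] not in labels:
            continue
        kept.append(det)
    return kept


def to_structured_output(detections):
    """Serialise detections as JSON so downstream systems can parse them."""
    return json.dumps({"count": len(detections), "detections": detections}, indent=2)


# First version of the program: drop low-confidence detections.
first_pass = filter_detections(RAW_DETECTIONS)

# After a follow-up such as "only keep cups", one argument changes.
refined = filter_detections(RAW_DETECTIONS, labels={"cup"})

print(to_structured_output(refined))
```

Keeping thresholds and label filters as explicit arguments means a refinement request usually touches one call site instead of the whole program.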

Supported Task Types

Physical AI Agents are optimized for tasks involving perception, vision, and structured automation.

Vision Tasks

  • Object detection in images and video streams
  • Real-time webcam-based detection pipelines
  • Classification of visual inputs
  • Bounding box generation and annotation

3D & Point Cloud Tasks

  • Point cloud filtering, segmentation, and transformation
  • 3D pose estimation from RGB-D data
  • Pick pose computation from depth-registered detections
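
The geometric core of computing a pick position from a depth-registered detection is back-projecting a pixel into 3D. The sketch below uses the standard pinhole camera model without lens distortion. The helper names and the intrinsics values are illustrative assumptions, not Telekinesis Skills or parameters of any particular camera, and a full pick pose would also need an orientation, which is omitted here.

```python
def bbox_center(bbox):
    """Return the (u, v) pixel centre of an (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = bbox
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a known depth into the camera frame.

    Pinhole model, no distortion:
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        z = depth
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Illustrative intrinsics (focal lengths and principal point, in pixels).
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

# A detection box and the depth measured at its centre (assumed values).
u, v = bbox_center((350, 210, 410, 270))
pick_point_camera = deproject_pixel(u, v, 0.5, FX, FY, CX, CY)

print(pick_point_camera)  # approximately (0.05, 0.0, 0.5), in metres
```

The result is expressed in the camera frame. Using it for motion requires a further transform into the robot's base frame.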

Robotics Tasks

  • Vision-guided pick-and-place pipelines
  • Eye-in-hand camera calibration and projection workflows
  • Multi-step perception-to-motion code generation
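
In an eye-in-hand setup the camera is mounted on the end effector, so a point seen by the camera reaches the robot base frame through two transforms: the current end-effector pose and the fixed camera-to-end-effector calibration. The sketch below shows that chain with plain 4x4 homogeneous matrices. All helper names and numeric values are assumptions for illustration. In practice the calibration matrix would come from a hand-eye calibration procedure and the end-effector pose from the robot.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(t, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(t[i][k] * vec[k] for k in range(4)) for i in range(3))


# Assumed end-effector pose in the base frame: tool pointing straight down
# (180 degree rotation about x), 0.4 m forward and 0.6 m above the base.
T_BASE_EE = [
    [1.0,  0.0,  0.0, 0.4],
    [0.0, -1.0,  0.0, 0.0],
    [0.0,  0.0, -1.0, 0.6],
    [0.0,  0.0,  0.0, 1.0],
]

# Assumed hand-eye calibration result: camera offset from the end effector.
T_EE_CAM = [
    [1.0, 0.0, 0.0, 0.00],
    [0.0, 1.0, 0.0, 0.05],
    [0.0, 0.0, 1.0, 0.02],
    [0.0, 0.0, 0.0, 1.0],
]

# Chain the transforms: base <- end effector <- camera.
T_BASE_CAM = matmul4(T_BASE_EE, T_EE_CAM)

point_camera = (0.05, 0.0, 0.5)  # e.g. a back-projected detection, in metres
point_base = transform_point(T_BASE_CAM, point_camera)

print(point_base)  # approximately (0.45, -0.05, 0.08)
```

Because the end-effector pose changes as the arm moves, `T_BASE_EE` has to be read at the moment the image is captured, while `T_EE_CAM` stays constant between calibrations.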

Video & Streaming Tasks

  • Frame-by-frame processing pipelines
  • Multi-object tracking across time
  • Real-time inference workflows
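
Tracking objects across time comes down to associating detections in the current frame with those from the previous one. The sketch below uses greedy intersection-over-union (IoU) matching, a deliberately simple technique chosen for illustration. It is not a description of how any Telekinesis tracking Skill works, and `GreedyIouTracker` is a hypothetical class written for this example.

```python
from itertools import count


def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Assigns persistent IDs by matching each box to its best previous box."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}       # track ID -> box from the previous frame
        self._ids = count(1)   # source of new track IDs

    def update(self, boxes):
        """Process one frame of boxes and return a dict of track ID -> box."""
        unmatched = dict(self.tracks)
        assigned = {}
        for box in boxes:
            best_id, best_score = None, self.iou_threshold
            for track_id, previous in unmatched.items():
                score = iou(box, previous)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id = next(self._ids)   # no overlap: start a new track
            else:
                del unmatched[best_id]      # each track is matched at most once
            assigned[best_id] = box
        self.tracks = assigned              # tracks without a match are dropped
        return assigned


tracker = GreedyIouTracker()
frame_1 = tracker.update([(10, 10, 50, 50), (200, 200, 240, 240)])
frame_2 = tracker.update([(12, 12, 52, 52), (300, 300, 340, 340)])

print(sorted(frame_1))  # [1, 2]
print(sorted(frame_2))  # [1, 3]
```

Greedy matching depends on the order of the boxes and drops a track as soon as it misses one frame, so a production pipeline would typically add optimal assignment and a grace period for occluded objects.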

System Boundaries

Physical AI Agents operate within defined constraints:

  • No direct control of physical robots or hardware execution
  • No access to file systems or external project environments
  • No execution outside the provided Skill-based environment
  • No implicit system or workspace awareness

All outputs are generated as code that must be explicitly executed in a supported environment.

What Makes This System Different

Unlike end-to-end robotics policies, Physical AI Agents operate at the code generation layer rather than the control layer.

This enables:

  • Composability — complex behaviors built from reusable Skills
  • Transparency — generated logic is explicit and inspectable
  • Flexibility — output can be modified before execution
  • Modularity — Skills (including perception models) can be swapped or extended
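
The modularity point can be illustrated with a short sketch in which the detector is passed in as an argument, so swapping it does not touch the rest of the program. Both detectors and `count_objects` are hypothetical placeholders with canned results. They only stand in for whatever perception Skills a generated program would actually call.

```python
from typing import Callable, List

# A detector is modelled as any callable from an image name to a list of labels.
Detector = Callable[[str], List[str]]


def coarse_detector(image_name: str) -> List[str]:
    """Hypothetical detector that finds only large objects."""
    return ["cup"]


def fine_detector(image_name: str) -> List[str]:
    """Hypothetical detector that also finds small objects."""
    return ["cup", "spoon"]


def count_objects(image_name: str, detect: Detector) -> int:
    """Downstream logic that is unaware of which detector it was given."""
    return len(detect(image_name))


print(count_objects("table.png", coarse_detector))  # 1
print(count_objects("table.png", fine_detector))    # 2
```

Because the generated logic is ordinary, inspectable code, this kind of substitution can be made by editing a single line before the program is run.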

Key Principle

Physical AI Agents do not directly perform actions. They generate structured programs that describe how actions should be executed using the Telekinesis Skill Library.

Their capability is defined by the Skills available to them and the structure of the generated code.