Prompting Guide for Physical AI Agents

Physical AI Agents generate code from natural language instructions. The quality of the output depends heavily on how clearly the task is specified. This guide explains how to write effective prompts that produce reliable, production-ready code.

Basic Prompt Structure

A good prompt clearly defines three things:

What to do (the task)
What to use (the input source, such as a webcam or an image)
What to return (the expected output format)

Example

Detect people in a webcam feed and return bounding boxes.

This is enough for the agent to select the appropriate Skills and generate code.
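The three-part structure can be applied mechanically with a small template. The sketch below uses a hypothetical helper, `build_prompt`, which is not part of any Telekinesis or Physical AI Agent API. It refuses to build a prompt when the task, input source, or expected output is missing, and it joins the parts (plus any optional constraints) into one sentence. It assumes the source can be phrased as "in <source>".

```python
from typing import Sequence


def build_prompt(task: str, source: str, output: str,
                 constraints: Sequence[str] = ()) -> str:
    """Compose a prompt from the three required parts plus optional constraints.

    Hypothetical helper for illustration only; it does not call any agent.
    """
    # Every prompt must state what to do, what to use, and what to return.
    for name, value in (("task", task), ("source", source), ("output", output)):
        if not value.strip():
            raise ValueError(f"missing required part: {name}")
    clauses = [f"{task} in {source}", output, *constraints]
    # Join clauses as an English list: "a and b" or "a, b, and c".
    if len(clauses) == 2:
        body = " and ".join(clauses)
    else:
        body = ", ".join(clauses[:-1]) + ", and " + clauses[-1]
    return body[0].upper() + body[1:] + "."
```

For example, `build_prompt("detect people", "a webcam feed", "return bounding boxes")` returns "Detect people in a webcam feed and return bounding boxes.", while a call with empty source and output (the "Make it smart" case) raises `ValueError`.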

Good vs Bad Prompts

Good Prompts

Detect people in a webcam feed and draw bounding boxes on the output frame

Detect cell phones in a video stream and return confidence scores

Capture an image from the webcam and save it to disk

These work well because they are:

specific
action-oriented
clearly scoped

Bad Prompts

Make it smart

Build a vision system

Do object detection

These fail because they:

do not define inputs
do not define outputs
leave too much open to interpretation

Adding Constraints

You can improve reliability by explicitly adding constraints to your prompt:

Performance constraints

run in real time

process at 10 FPS

Output format

return bounding boxes in JSON

save annotated image to disk

Behavior constraints

only detect people, ignore other objects

filter out low-confidence detections

Example:

Detect people in a webcam feed, return bounding boxes in JSON format, and run in real time.
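To make "return bounding boxes in JSON" and "filter out low-confidence detections" concrete, here is one way the output step of generated code might look. The detection fields (`label`, `confidence`, and `box` as `[x_min, y_min, x_max, y_max]` in pixels) and the 0.5 threshold are assumptions for illustration, not the schema that Telekinesis Skills actually emit. Stating your own schema and threshold in the prompt is exactly the kind of constraint this section recommends.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Detection:
    # Assumed schema: class label, confidence in [0, 1], and a pixel box
    # given as [x_min, y_min, x_max, y_max].
    label: str
    confidence: float
    box: List[int]


def to_json(detections: List[Detection], keep_label: str = "person",
            min_confidence: float = 0.5) -> str:
    """Keep one class above a confidence threshold and serialize to JSON."""
    kept = [
        asdict(d) for d in detections
        # Behavior constraints: only one class, no low-confidence hits.
        if d.label == keep_label and d.confidence >= min_confidence
    ]
    return json.dumps({"detections": kept})
```

Given a person at confidence 0.91, a person at 0.30, and a cell phone at 0.88, only the first detection survives both constraints.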

Multi-Step Tasks

Physical AI Agents can handle multi-step workflows when each step is explicitly described.

Instead of:

Analyze webcam feed

Use structured steps:

Detect people in a webcam feed, track their movement across frames, and return their trajectories over time.

This allows the agent to compose multiple Telekinesis Skills into a pipeline.
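Conceptually, each clause in the structured prompt maps to one stage, and the output of one stage feeds the next. The sketch below shows that composition with plain Python functions. The stages `detect`, `track`, and `trajectories` are hypothetical stand-ins, not Telekinesis Skills, and the `track` stage is deliberately naive (it assumes people keep the same order in every frame).

```python
from functools import reduce
from typing import Any, Callable

Stage = Callable[[Any], Any]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right: the output of one is the input of the next."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


# Hypothetical stand-ins for "detect", "track" and "return trajectories".
def detect(frames):
    # Pretend each frame already lists the (x, y) centers of detected people.
    return [frame["people"] for frame in frames]


def track(detections_per_frame):
    # Placeholder tracking: the i-th detection in every frame keeps ID i.
    tracks = {}
    for centers in detections_per_frame:
        for track_id, center in enumerate(centers):
            tracks.setdefault(track_id, []).append(center)
    return tracks


def trajectories(tracks):
    # Final output: one trajectory per ID, ordered by ID.
    return [{"id": tid, "path": path} for tid, path in sorted(tracks.items())]


pipeline = compose(detect, track, trajectories)
```

Because every stage has a single input and a single output, a vague stage in the prompt ("analyze") has no clear place in the chain, while "detect, track, return trajectories" maps onto it one-to-one.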

Referencing Inputs Clearly

Always specify the input source explicitly:

webcam feed

image from file system

video stream

single frame capture

Avoid ambiguous references like:

"this"

"the input"

"it"
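A quick pre-flight check can catch both problems before a prompt is sent. The following is a hypothetical lint, not a feature of Physical AI Agents: it warns when a prompt mentions none of the input-source keywords drawn from the list above, and when it uses one of the ambiguous references. Plain word matching will produce false positives (for example, "it" used legitimately), so treat the warnings as hints.

```python
import re
from typing import List

# Keywords derived from the explicit input sources listed above.
SOURCE_KEYWORDS = ("webcam", "image", "video", "frame")
# Ambiguous references that leave the agent guessing.
AMBIGUOUS = ("this", "the input", "it")


def lint_prompt(prompt: str) -> List[str]:
    """Return a list of warnings; an empty list means no issues were found."""
    text = prompt.lower()
    warnings = []
    if not any(keyword in text for keyword in SOURCE_KEYWORDS):
        warnings.append("no explicit input source")
    for phrase in AMBIGUOUS:
        # Word boundaries so that "it" does not match inside "with".
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            warnings.append(f"ambiguous reference: '{phrase}'")
    return warnings
```

With this sketch, "Detect people in a webcam feed and return bounding boxes" produces no warnings, while "Detect people in it" is flagged both for the missing source and for the reference "it".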

Iterative Prompting

You can refine results by iterating on your prompt:

Start simple:

Detect objects in webcam feed

Then add constraints:

Detect people in webcam feed and return bounding boxes

Then refine output format:

Detect people in webcam feed and return bounding boxes in JSON format

Each iteration removes ambiguity, so the agent's behavior is pinned down more precisely.

Debugging Poor Outputs

If results are not as expected:

Make the task more specific
Add explicit input/output format
Reduce ambiguity in object definitions
Break complex tasks into steps

Example fix:

Instead of:

Track objects

Use:

Detect and track people in a webcam feed, assign unique IDs, and return trajectories over time
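The refined prompt names three concrete behaviors: detect, assign unique IDs, and return trajectories. One common way generated code could implement the ID step is greedy nearest-centroid matching between consecutive frames. The sketch below shows that technique under simple assumptions: detections are already reduced to (x, y) centers, matches farther than an assumed 50 pixels start a new track, and occlusion or re-identification is not handled. It illustrates the technique and is not necessarily what the agent produces.

```python
import math
from itertools import count
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class CentroidTracker:
    """Greedy nearest-centroid tracker that assigns persistent IDs."""

    def __init__(self, max_distance: float = 50.0):
        self.max_distance = max_distance  # assumed matching radius in pixels
        self.trajectories: Dict[int, List[Point]] = {}
        self._next_id = count()
        self._active: Dict[int, Point] = {}  # last known center per active ID

    def update(self, centers: List[Point]) -> Dict[int, Point]:
        """Match this frame's centers to existing IDs; new IDs for the rest."""
        # All candidate (distance, id, index) pairs, closest first.
        pairs = sorted(
            (math.dist(prev, c), tid, i)
            for tid, prev in self._active.items()
            for i, c in enumerate(centers)
        )
        assigned: Dict[int, Point] = {}
        used = set()
        for dist, tid, i in pairs:
            if dist > self.max_distance:
                break  # remaining pairs are even farther apart
            if tid in assigned or i in used:
                continue  # this ID or this detection is already matched
            assigned[tid] = centers[i]
            used.add(i)
        # Unmatched detections start new tracks with fresh unique IDs.
        for i, c in enumerate(centers):
            if i not in used:
                assigned[next(self._next_id)] = c
        for tid, c in assigned.items():
            self.trajectories.setdefault(tid, []).append(c)
        # IDs not seen in this frame are dropped (no re-identification).
        self._active = assigned
        return assigned
```

Calling `update` once per frame and reading `tracker.trajectories` at the end yields the "trajectories over time" the prompt asks for; each requirement in the prompt corresponds to a visible piece of the code.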

Key Principle

Physical AI Agents do not infer missing requirements. They generate code strictly from what is explicitly stated in the prompt and from the Skills available to them.

Clear prompts produce predictable, production-grade outputs.
