Prompting Guide for Physical AI Agents
Physical AI Agents generate code from natural language instructions. The quality of the output depends heavily on how clearly the task is specified. This guide explains how to write effective prompts that produce reliable, production-ready code.
Basic Prompt Structure
A good prompt clearly defines three things:
- What to do (the task)
- What to use (input source like webcam or image)
- What to return (expected output format)
Example
    Detect people in a webcam feed and return bounding boxes.
This is sufficient for the agent to select appropriate Skills and generate code.
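The three-part structure can be made explicit with a small helper. This is a minimal sketch, not part of any agent API: `PromptSpec` and its field names are hypothetical and only illustrate how the task, the input source, and the expected output combine into one instruction.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """Hypothetical container for the three parts of a prompt."""

    task: str          # what to do
    input_source: str  # what to use
    output: str        # what to return

    def render(self) -> str:
        # Join the three parts into a single instruction sentence.
        return f"{self.task} in {self.input_source} and {self.output}."


spec = PromptSpec(
    task="Detect people",
    input_source="a webcam feed",
    output="return bounding boxes",
)
prompt = spec.render()
print(prompt)  # Detect people in a webcam feed and return bounding boxes.
```

Keeping the parts in separate fields makes it easy to notice when one of them is missing before the prompt is sent.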
Good vs Bad Prompts
Good Prompts
    Detect people in a webcam feed and draw bounding boxes on the output frame
    Detect cell phones in a video stream and return confidence scores
    Capture an image from the webcam and save it to disk
These work well because they are:
- specific
- action-oriented
- clearly scoped
Bad Prompts
    Make it smart
    Build a vision system
    Do object detection
These fail because they:
- do not define inputs
- do not define outputs
- leave interpretation too open-ended
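The difference between the two groups can be approximated with a rough pre-flight check. The sketch below is an assumption-laden heuristic: the keyword lists `INPUT_HINTS` and `OUTPUT_HINTS` and the function `missing_parts` are hypothetical, and simple substring matching will not catch every phrasing.

```python
# Hypothetical keyword lists; extend them to match your own vocabulary.
INPUT_HINTS = ("webcam", "video stream", "image", "frame")
OUTPUT_HINTS = ("return", "draw", "save")


def missing_parts(prompt: str) -> list[str]:
    """Return which of 'input' and 'output' the prompt never mentions."""
    text = prompt.lower()
    missing = []
    if not any(hint in text for hint in INPUT_HINTS):
        missing.append("input")
    if not any(hint in text for hint in OUTPUT_HINTS):
        missing.append("output")
    return missing


good = "Detect people in a webcam feed and draw bounding boxes on the output frame"
bad = "Do object detection"
print(missing_parts(good))  # []
print(missing_parts(bad))   # ['input', 'output']
```

A non-empty result suggests the prompt should be rewritten before it is submitted.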
Adding Constraints
You can improve reliability by explicitly adding constraints to your prompt:
Performance constraints
    run in real-time
    process at 10 FPS
Output format
    return bounding boxes in JSON
    save annotated image to disk
Behavior constraints
    only detect people, ignore other objects
    filter out low-confidence detections
Example:
    Detect people in a webcam feed, return bounding boxes in JSON format, and run in real time.
Multi-Step Tasks
Physical AI Agents can handle multi-step workflows when explicitly described.
Instead of:
    Analyze webcam feed
Use structured steps:
    Detect people in a webcam feed, track their movement across frames, and return their trajectories over time.
This allows the agent to compose multiple Telekinesis Skills into a pipeline.
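When the steps already exist as a list, they can be joined into one explicit prompt. This is a minimal sketch: `compose_steps` is a hypothetical helper that only formats text, and it makes no assumption about how the agent maps each step to a Skill.

```python
def compose_steps(steps: list[str]) -> str:
    """Join ordered step descriptions into one explicit multi-step prompt."""
    if not steps:
        raise ValueError("at least one step is required")
    if len(steps) == 1:
        body = steps[0]
    elif len(steps) == 2:
        body = f"{steps[0]} and {steps[1]}"
    else:
        body = ", ".join(steps[:-1]) + f", and {steps[-1]}"
    # Capitalize only the first character so the rest of the wording is untouched.
    return body[0].upper() + body[1:] + "."


prompt = compose_steps([
    "detect people in a webcam feed",
    "track their movement across frames",
    "return their trajectories over time",
])
print(prompt)
```

Listing the steps in execution order keeps the resulting sentence aligned with the pipeline you expect.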
Referencing Inputs Clearly
Always specify the input source explicitly:
    webcam feed
    image from file system
    video stream
    single frame capture
Avoid ambiguous references like:
    this
    the input
    it
Iterative Prompting
You can refine results by iterating on your prompt:
Start simple:
    Detect objects in webcam feed
Then add constraints:
    Detect people in webcam feed and return bounding boxes
Then refine output format:
    Detect people in webcam feed and return bounding boxes in JSON format
Each iteration helps the agent narrow down behavior more precisely.
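The same refinement loop can be sketched in code by adding one constraint category per iteration. `build_prompt` and its parameter names are hypothetical; they mirror the behavior, output format, and performance constraints described in this guide and are not part of any agent API.

```python
from typing import Optional


def build_prompt(
    task: str,
    output_format: Optional[str] = None,
    performance: Optional[str] = None,
    behavior: Optional[str] = None,
) -> str:
    """Append whichever constraint clauses are provided to the base task."""
    clauses = [task]
    for clause in (behavior, output_format, performance):
        if clause:
            clauses.append(clause)
    return ", ".join(clauses) + "."


base = "Detect people in a webcam feed"
iterations = [
    build_prompt(base),
    build_prompt(base, output_format="return bounding boxes in JSON format"),
    build_prompt(
        base,
        output_format="return bounding boxes in JSON format",
        performance="run in real time",
    ),
]
for version in iterations:
    print(version)
```

Changing one argument at a time makes it easier to see which constraint caused a change in the generated code.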
Debugging Poor Outputs
If results are not as expected:
- Make the task more specific
- Add an explicit input source and output format
- Reduce ambiguity in object definitions
- Break complex tasks into steps
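Part of this checklist can be automated. The sketch below flags the vague references this guide warns against ("this", "the input", "it"). The function name `vague_references` is hypothetical, and whole-word matching is only a heuristic.

```python
import re

# Vague references called out earlier in this guide.
VAGUE_REFERENCES = ("this", "the input", "it")


def vague_references(prompt: str) -> list[str]:
    """Return the vague references that appear as whole words in the prompt."""
    found = []
    for phrase in VAGUE_REFERENCES:
        # \b keeps "it" from matching inside longer words such as "with".
        if re.search(rf"\b{re.escape(phrase)}\b", prompt, flags=re.IGNORECASE):
            found.append(phrase)
    return found


vague = vague_references("Analyze this and track it")
clear = vague_references("Detect and track people in a webcam feed")
print(vague)  # ['this', 'it']
print(clear)  # []
```

The check will also flag harmless uses, such as "it" in "save it to disk", so treat a hit as a cue to reread the prompt rather than as an error.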
Example fix:
Instead of:
    Track objects
Use:
    Detect and track people in a webcam feed, assign unique IDs, and return trajectories over time
Key Principle
Physical AI Agents do not infer missing requirements. They generate code strictly based on what is explicitly stated in the prompt and available Skills.
Clear prompts produce predictable, production-grade outputs.

