
Prompting Guide for Physical AI Agents

Physical AI Agents generate code from natural language instructions. The quality of the output depends heavily on how clearly the task is specified. This guide explains how to write effective prompts that produce reliable, production-ready code.

Basic Prompt Structure

A good prompt clearly defines three things:

  • What to do (the task)
  • What to use (input source like webcam or image)
  • What to return (expected output format)

Example

Detect people in a webcam feed and return bounding boxes.

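This prompt states the task (detect people), the input (a webcam feed), and the output (bounding boxes), which is enough for the agent to select the appropriate Skills and generate code.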
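To make the three parts concrete, here is a minimal Python sketch of a hypothetical PromptSpec helper. It is not part of any Physical AI Agents or Telekinesis API. It refuses to render a prompt unless the task, input source, and output are all filled in. The field names and the comma-joining rule are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical helper holding the three required parts of a prompt."""
    task: str          # what to do, e.g. "Detect people"
    input_source: str  # what to use, including its preposition, e.g. "in a webcam feed"
    output: str        # what to return, e.g. "return bounding boxes"
    constraints: list[str] = field(default_factory=list)  # optional extras

    def render(self) -> str:
        # Refuse to build a prompt if any required part is empty.
        for name in ("task", "input_source", "output"):
            if not getattr(self, name).strip():
                raise ValueError(f"prompt is missing its {name}")
        clauses = [f"{self.task} {self.input_source}", self.output, *self.constraints]
        # Two clauses: "A and B". Three or more: "A, B, and C".
        if len(clauses) == 2:
            return f"{clauses[0]} and {clauses[1]}."
        return ", ".join(clauses[:-1]) + ", and " + clauses[-1] + "."
```

For example, PromptSpec("Detect people", "in a webcam feed", "return bounding boxes").render() produces the example prompt above, while PromptSpec("Do object detection", "", "").render() raises an error because no input or output was given.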

Good vs Bad Prompts

Good Prompts

Detect people in a webcam feed and draw bounding boxes on the output frame
Detect cell phones in a video stream and return confidence scores
Capture an image from the webcam and save it to disk

These work well because they are:

  • specific
  • action-oriented
  • clearly scoped

Bad Prompts

Make it smart
Build a vision system
Do object detection

These fail because they:

  • do not define inputs
  • do not define outputs
  • leave interpretation too open-ended

Adding Constraints

You can improve reliability by explicitly adding constraints to your prompt:

Performance constraints

run in real time
process at 10 FPS

Output format

return bounding boxes in JSON
save annotated image to disk

Behavior constraints

only detect people, ignore other objects
filter out low-confidence detections
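Behavior and output-format constraints map fairly directly onto post-processing logic. The sketch below assumes a hypothetical Detection record produced by some upstream detector and shows what "only detect people", "filter out low-confidence detections", and "return bounding boxes in JSON" might translate to. The 0.5 threshold and the JSON layout are assumptions; this is not necessarily the code an agent would generate.

```python
import json
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical detector output: class label, confidence, box (x1, y1, x2, y2)."""
    label: str
    confidence: float
    box: tuple[int, int, int, int]


def to_json(detections, keep_label="person", min_confidence=0.5):
    # Behavior constraints: keep one class only and drop low-confidence hits.
    kept = [
        {"label": d.label, "confidence": round(d.confidence, 3), "box": list(d.box)}
        for d in detections
        if d.label == keep_label and d.confidence >= min_confidence
    ]
    # Output-format constraint: serialize the result as JSON.
    return json.dumps({"detections": kept})
```

Because "low-confidence" has no fixed meaning, stating a number in the prompt (for example, "ignore detections below 0.5 confidence") removes one more thing the agent would otherwise have to guess.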

Example:

Detect people in a webcam feed, return bounding boxes in JSON format, and run in real time.
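Rate constraints such as "process at 10 FPS" are commonly met by skipping frames that arrive faster than the target rate. The following sketch assumes each frame carries a capture timestamp in seconds; select_frames is a hypothetical helper, and frame skipping is only one of several possible strategies.

```python
def select_frames(timestamps, target_fps):
    """Return the indices of frames to process so throughput stays at or below target_fps.

    timestamps: capture times in seconds, in increasing order.
    """
    min_interval = 1.0 / target_fps
    selected = []
    last_processed = None
    for i, t in enumerate(timestamps):
        # A small tolerance absorbs floating-point jitter in the timestamps.
        if last_processed is None or t - last_processed >= min_interval - 1e-9:
            selected.append(i)
            last_processed = t
    return selected
```

With a 30 FPS camera and target_fps=10, this keeps every third frame.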

Multi-Step Tasks

Physical AI Agents can handle multi-step workflows when the steps are described explicitly.

Instead of:

Analyze webcam feed

Use structured steps:

Detect people in a webcam feed, track their movement across frames, and return their trajectories over time.

This allows the agent to compose multiple Telekinesis Skills into a pipeline.
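Conceptually, each clause of the structured prompt becomes one stage, and the output of each stage feeds the next. The sketch below illustrates that composition in plain Python. The functions detect_people, track, and trajectories are hypothetical stand-ins, not actual Telekinesis Skills, and the tracker is deliberately naive.

```python
from functools import reduce
from typing import Any, Callable

Stage = Callable[[Any], Any]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right: the output of one is the input of the next."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


def detect_people(frames):
    # Stage 1: keep only "person" detections, reduced to their (x, y) centers.
    return [[d["center"] for d in frame if d["label"] == "person"] for frame in frames]


def track(centers_per_frame):
    # Stage 2 (toy): assume people keep the same order in every frame,
    # so list position doubles as the track ID.
    tracks = {}
    for centers in centers_per_frame:
        for track_id, center in enumerate(centers):
            tracks.setdefault(track_id, []).append(center)
    return tracks


def trajectories(tracks):
    # Stage 3: summarize each track by start point, end point, and length in frames.
    return {tid: {"start": pts[0], "end": pts[-1], "frames": len(pts)}
            for tid, pts in tracks.items()}


pipeline = compose(detect_people, track, trajectories)
```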

Referencing Inputs Clearly

Always specify the input source explicitly:

webcam feed
image from file system
video stream
single frame capture

Avoid ambiguous references like:

this
the input
it
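A crude pre-flight check can catch prompts that never name an input. This hypothetical check_prompt function uses simple keyword matching loosely based on the lists above. It is an illustrative heuristic, not a feature of Physical AI Agents, and it will miss phrasings it does not know.

```python
import re

# Keywords loosely based on the input sources listed above (an assumption; extend as needed).
EXPLICIT_INPUTS = ("webcam", "image from file", "video stream", "single frame")
# Vague references that often stand in for an unnamed input.
AMBIGUOUS_REFS = ("this", "the input", "it")


def check_prompt(prompt):
    """Return a list of warnings; an empty list means no problems were detected."""
    text = prompt.lower()
    if any(keyword in text for keyword in EXPLICIT_INPUTS):
        return []  # an input is named, so pronouns like "it" are likely harmless
    warnings = ["no explicit input source"]
    for ref in AMBIGUOUS_REFS:
        # Word boundaries keep "it" from matching inside words such as "with".
        if re.search(rf"\b{re.escape(ref)}\b", text):
            warnings.append(f"ambiguous reference: {ref!r}")
    return warnings
```

Ambiguous references are reported only when no explicit input is found, because a word like "it" is harmless once the input is named, as in "Capture an image from the webcam and save it to disk".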

Iterative Prompting

You can refine results by iterating on your prompt:

Start simple:

Detect objects in webcam feed

Then add constraints:

Detect people in webcam feed and return bounding boxes

Then refine output format:

Detect people in webcam feed and return bounding boxes in JSON format

Each iteration helps the agent narrow down behavior more precisely.

Debugging Poor Outputs

If results are not as expected:

  • Make the task more specific
  • Add explicit input/output format
  • Reduce ambiguity in object definitions
  • Break complex tasks into steps

Example fix:

Instead of:

Track objects

Use:

Detect and track people in a webcam feed, assign unique IDs, and return trajectories over time
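To see why "assign unique IDs" and "return trajectories" make the request precise, here is a minimal sketch of what such a tracker could look like: a greedy nearest-centroid matcher in plain Python. The class name, the 50-pixel default gating distance, and the greedy matching are assumptions for illustration. Production trackers typically use optimal assignment (for example, the Hungarian algorithm) and expire stale tracks, which this sketch does not.

```python
import math
from itertools import count


class CentroidTracker:
    """Toy tracker: links each detection to the nearest track from the previous update."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # assumed gating distance in pixels
        self.trajectories = {}            # track ID -> list of (x, y) centers over time
        self._next_id = count()           # yields 0, 1, 2, ...; IDs are never reused

    def update(self, centers):
        """Process one frame of (x, y) centers; return {track_id: center}."""
        assigned = {}
        # Tracks still available for matching in this frame, keyed by last position.
        available = {tid: pts[-1] for tid, pts in self.trajectories.items()}
        for center in centers:
            nearest = min(
                available,
                key=lambda tid: math.dist(available[tid], center),
                default=None,
            )
            if nearest is not None and math.dist(available[nearest], center) <= self.max_distance:
                track_id = nearest
                del available[nearest]  # one detection per track per frame
            else:
                track_id = next(self._next_id)  # no close match: start a new track
                self.trajectories[track_id] = []
            self.trajectories[track_id].append(center)
            assigned[track_id] = center
        return assigned
```

Each call to update handles one frame; after the last frame, tracker.trajectories holds the per-ID trajectories the prompt asks for.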

Key Principle

Physical AI Agents do not infer missing requirements. They generate code strictly from what the prompt explicitly states and the Skills available to them.

Clear prompts produce predictable, production-grade outputs.