Detect People from Webcam Image

This example shows how to use Tzara, the Telekinesis Physical AI Agent, to build a computer vision pipeline that detects people in a webcam image. Tzara generates code that captures an image from the webcam using OpenCV, runs a pre-trained YOLOX model for person detection, and outputs bounding boxes and visualizations with Rerun.

The Natural Language Instruction

The natural language instruction for this task is:

Capture an image with the webcam and detect all the people in the image

The Generated Code

The generated code captures an image from the webcam, detects people using YOLOX, and prints the bounding box and confidence score of each detected person.

```python
# Pipeline: Capture image from webcam and detect people using YOLOX object detection

# Step 1: Import required modules
from telekinesis.medulla.cameras.webcam import Webcam
from telekinesis import retina

# Step 2: Initialize and connect the webcam
cam = Webcam(name="main_webcam", camera_id=0)
cam.connect()

# Step 3: Capture a single color frame from the webcam
frame = cam.capture_single_color_frame()

# Step 4: Disconnect the webcam after capturing
cam.disconnect()

# Step 5: Detect objects in the captured frame using YOLOX
# YOLOX returns COCO-style annotations and category labels
annotations, categories = retina.detect_objects_using_yolox(
    image=frame,
    score_threshold=0.25,
    nms_threshold=0.45,
)

# Step 6: Filter annotations to keep only "person" detections
# COCO class for person has category name "person"
# Build a set of category IDs that correspond to "person"
person_category_ids = set()
for cat in categories.categories:
    if cat["name"].lower() == "person":
        person_category_ids.add(cat["id"])

# Step 7: Filter the annotations to only include people
people_annotations = [
    ann for ann in annotations.annotations
    if ann["category_id"] in person_category_ids
]

# Step 8: Print results
print(f"Total objects detected: {len(annotations.annotations)}")
print(f"People detected: {len(people_annotations)}")
for i, person in enumerate(people_annotations):
    print(f"  Person {i+1}: bbox={person['bbox']}, score={person.get('score', 'N/A')}")
```
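
The detection call above passes `score_threshold=0.25` and `nms_threshold=0.45`. As a rough illustration of what two such thresholds usually control in an object detector, here is a minimal sketch in plain Python. It is not the Telekinesis or YOLOX implementation: the helper names `iou_xywh` and `filter_detections` are hypothetical, the sample detections are made up, boxes are assumed to be COCO-style `[x, y, width, height]`, and suppression is class-agnostic for simplicity (real detectors often apply it per class).

```python
def iou_xywh(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.25, nms_threshold=0.45):
    """Drop low-confidence boxes, then greedily suppress overlapping ones."""
    # Keep only detections at or above the confidence threshold, best first.
    candidates = sorted(
        (d for d in detections if d["score"] >= score_threshold),
        key=lambda d: d["score"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # A box survives only if its IoU with every kept box is within the limit.
        if all(iou_xywh(det["bbox"], k["bbox"]) <= nms_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical sample data, for illustration only.
detections = [
    {"bbox": [100, 50, 80, 200], "score": 0.91},
    {"bbox": [105, 55, 80, 200], "score": 0.60},  # near-duplicate of the first box
    {"bbox": [400, 60, 70, 190], "score": 0.30},
    {"bbox": [10, 10, 20, 20], "score": 0.10},    # below the score threshold
]
kept = filter_detections(detections)
print([d["score"] for d in kept])
```

In this sketch the 0.10 detection is discarded by the score threshold, and the 0.60 detection is suppressed because it overlaps the 0.91 box with an IoU of about 0.84, well above 0.45. Raising `score_threshold` trades missed people for fewer false positives; lowering `nms_threshold` merges nearby boxes more aggressively, which can drop people standing close together.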

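The person filter in Steps 6 and 7 can also be tried without a camera. The sketch below runs the same idea on plain dictionaries and converts each box to corner coordinates, which is the form most drawing functions expect. It assumes the annotations follow the COCO convention of `bbox = [x, y, width, height]` with the origin at the top-left corner; the helper names `select_category` and `xywh_to_corners` and the sample records are hypothetical and not part of the Telekinesis API.

```python
def select_category(annotations, categories, name):
    """Return the annotations whose category name matches `name` (case-insensitive)."""
    wanted_ids = {c["id"] for c in categories if c["name"].lower() == name.lower()}
    return [a for a in annotations if a["category_id"] in wanted_ids]


def xywh_to_corners(bbox):
    """Convert [x, y, width, height] to integer (x1, y1, x2, y2) pixel corners."""
    x, y, w, h = bbox
    return int(round(x)), int(round(y)), int(round(x + w)), int(round(y + h))


# Hypothetical COCO-style records, for illustration only.
categories = [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}]
annotations = [
    {"category_id": 1, "bbox": [100.0, 50.0, 80.0, 200.0], "score": 0.91},
    {"category_id": 18, "bbox": [300.0, 220.0, 120.0, 90.0], "score": 0.77},
    {"category_id": 1, "bbox": [400.4, 60.2, 70.0, 190.0], "score": 0.30},
]

people = select_category(annotations, categories, "person")
for ann in people:
    # Each line shows the corner coordinates and the confidence score.
    print(xywh_to_corners(ann["bbox"]), ann["score"])
```

Matching on the category name rather than a hard-coded ID keeps the filter valid if the detector's label map numbers its classes differently.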