Pedestrian Segmentation From RGB Image

SUMMARY

Segment pedestrians from one RGB image for safety zones, people counting, or collision-free navigation. Uses SAM with a bounding box around each person; outputs masks and boxes, with Rerun visualization.

Overview

Safety zones, people counting, and mobile robot navigation in warehouses, factories, or public spaces often require segmenting pedestrians from a single RGB frame. This example shows how to segment one or more pedestrians in an RGB image using a bounding box prompt: you provide the image and an ROI (or multiple ROIs) around each person, and the pipeline returns per-person instance masks and bounding boxes for safety logic, counting, or obstacle avoidance.

Inputs

  • Single RGB image with one or more pedestrians in view
  • Bounding box around each pedestrian as [x_min, y_min, x_max, y_max] (one per person per run, or batch multiple boxes)
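As a minimal sketch of the expected prompt format (the coordinates below are illustrative values, not taken from a real image):

```python
# Each ROI is [x_min, y_min, x_max, y_max] in pixel coordinates.
# These boxes are made-up examples for illustration only.
person_a = [40, 70, 330, 414]
person_b = [350, 60, 520, 400]
bboxes = [person_a, person_b]  # batch several prompts in a single call

for x_min, y_min, x_max, y_max in bboxes:
    # A valid box has positive width and height.
    assert x_min < x_max and y_min < y_max
```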

Required Telekinesis Skills

  • Segment Image Using SAM — instance segmentation from a bounding box prompt.

Optional: Rerun for visualization.

Use Cases

This pipeline segments pedestrians in RGB images using SAM with a bounding box prompt.

Typical applications include:

  • Safety zones — Isolate each person to enforce keep-out zones or alert when too close to machinery.
  • People counting — Segment and count pedestrians for occupancy or flow analytics.
  • Collision-free navigation — Provide masks and boxes for mobile robots or AGVs to avoid people.
  • Monitoring — Visualize pedestrian regions for dashboards or incident review.
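For instance, the people-counting case reduces to counting non-empty masks in the result. A hedged sketch, assuming each annotation is a dict carrying a binary `mask` array (the field name follows the extraction code later on this page):

```python
import numpy as np

# Toy annotations standing in for SAM output; real masks are full-resolution.
annotations = [
    {"mask": np.ones((4, 4), dtype=np.uint8)},   # a detected person
    {"mask": np.zeros((4, 4), dtype=np.uint8)},  # an empty (spurious) mask
]

# Count only annotations whose mask covers at least one pixel.
pedestrian_count = sum(1 for ann in annotations if ann["mask"].sum() > 0)
print(pedestrian_count)  # 1
```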

Input-Output

  • Raw Sensor Input (Pedestrian Segmentation Input): raw image showing pedestrians in a space.
  • Segmentation and Boxes (Pedestrian Segmentation Output): segmented image with a mask and bounding box for each pedestrian.

The Pipeline

The pipeline loads an RGB image, defines a bounding box around the pedestrian, runs SAM for instance segmentation, then extracts the mask and bounding box and visualizes with Rerun.

text
Load RGB Image
  ↓
Define ROI (Bounding Box Prompt)
  ↓
Segment Image Using SAM
  ↓
Postprocess Masks
  ↓
Extract Bounding Boxes
  ↓
Visualize with Rerun

  • Segment Image Using SAM — Instance segmentation from a bounding box prompt; outputs mask and bbox per pedestrian.
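The "Postprocess Masks" step mainly means normalizing whatever mask representation comes back to a binary array. A sketch of that normalization, mirroring the thresholds used in the full script below:

```python
import numpy as np

def binarize_mask(m: np.ndarray) -> np.ndarray:
    """Normalize a SAM-style mask to a uint8 {0, 1} array.

    Float and boolean masks are thresholded at 0.5; integer masks at nonzero.
    """
    if m.dtype.kind in ("f", "b"):
        return (m > 0.5).astype(np.uint8)
    return (m > 0).astype(np.uint8)

print(binarize_mask(np.array([0.2, 0.7])).tolist())  # [0, 1]
```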

The Code

The script loads an image, defines a bounding box for the pedestrian, runs SAM, extracts the mask and bounding box from the annotations, and visualizes with Rerun. Image path and ROI are set at the top; the pipeline runs in the main block with no function arguments.

python
import cv2
import numpy as np
import rerun as rr
import rerun.blueprint as rrb
from pycocotools import mask as mask_utils

# cornea (the SAM skill), io, logger, and DATA_DIR are provided by the
# Telekinesis environment.
# Load image
image_path = DATA_DIR / "images/pedestrians.jpg"
image = io.load_image(image_path)
logger.info(f"Loaded image shape: {image.to_numpy().shape}")

# Define a bounding box: (x_min, y_min, x_max, y_max)
bounding_box = [40, 70, 330, 414]

# Segment using SAM
result = cornea.segment_image_using_sam(
    image=image,
    bboxes=[bounding_box],
)
annotations = result.to_list()

# Rerun visualization
rr.init("pedestrian_segmentation_using_sam", spawn=False)
try:
    rr.connect()  # attach to a running viewer if one exists
except Exception:
    rr.spawn()  # otherwise launch a new viewer

rr.send_blueprint(
    rrb.Blueprint(
        rrb.Horizontal(
            rrb.Spatial2DView(name="Input", origin="input"),
            rrb.Spatial2DView(name="Bboxes & Segments", origin="segmented"),
        ),
        rrb.SelectionPanel(),
        rrb.TimePanel(),
    ),
    make_active=True,
)

image = image.to_numpy()
rr.log("input/image", rr.Image(image=image))
rr.log("segmented/image", rr.Image(image=image))

h, w = image.shape[:2]
segmentation_img = np.zeros((h, w), dtype=np.uint16)
ann_bboxes = []
class_ids = []

# Build a label image: each pedestrian gets a unique integer id.
for idx, ann in enumerate(annotations):
    label = idx + 1  # 0 is reserved for background
    mask_i = np.zeros((h, w), dtype=np.uint8)
    if "mask" in ann and isinstance(ann["mask"], np.ndarray):
        m = ann["mask"]
        if m.dtype.kind in ("f", "b"):
            mask_i = (m > 0.5).astype(np.uint8)
        else:
            mask_i = (m > 0).astype(np.uint8)
    elif "segmentation" in ann and ann["segmentation"]:
        seg = ann["segmentation"]
        if isinstance(seg, dict):
            mask_dec = mask_utils.decode(seg)
            if mask_dec.ndim == 3:
                mask_dec = mask_dec[:, :, 0]
            mask_i = (mask_dec > 0).astype(np.uint8)
        elif isinstance(seg, list) and len(seg) > 0:
            temp = np.zeros((h, w), dtype=np.uint8)
            polys = seg if isinstance(seg[0], list) else [seg]
            for poly in polys:
                pts = np.array(poly).reshape(-1, 2).astype(np.int32)
                cv2.fillPoly(temp, [pts], 1)
            mask_i = (temp > 0).astype(np.uint8)
    if mask_i.sum() == 0:
        continue
    segmentation_img[mask_i > 0] = label
    bbox = ann.get("bbox", None)  # assumed COCO-style [x, y, w, h]
    if bbox is None:
        continue
    ann_bboxes.append(list(bbox))
    class_ids.append(label)

rr.log("segmented/masks", rr.SegmentationImage(segmentation_img))
if ann_bboxes:
    rr.log(
        "segmented/boxes",
        rr.Boxes2D(
            array=np.asarray(ann_bboxes, dtype=np.float32),
            array_format=rr.Box2DFormat.XYWH,
            class_ids=np.asarray(class_ids, dtype=np.int32),
        ),
    )
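With masks in hand, the safety-zone use case is a small amount of extra logic. A minimal sketch, assuming an axis-aligned keep-out rectangle (the zone coordinates and helper name here are hypothetical, not part of the pipeline API):

```python
import numpy as np

def person_in_zone(mask: np.ndarray, zone: tuple) -> bool:
    """Return True if any mask pixel falls inside the keep-out rectangle.

    mask: HxW binary array; zone: (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = zone
    return bool(mask[y_min:y_max, x_min:x_max].any())

# Toy 6x6 mask with a "person" in the lower-right corner.
mask = np.zeros((6, 6), dtype=np.uint8)
mask[4:6, 4:6] = 1

print(person_in_zone(mask, (3, 3, 6, 6)))  # True
print(person_in_zone(mask, (0, 0, 3, 3)))  # False
```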