Pedestrian Segmentation From RGB Image

SUMMARY

Segment pedestrians from one RGB image for safety zones, people counting, or collision-free navigation. Uses SAM with a bounding box around each person; outputs masks and boxes, with Rerun visualization.

Overview

Safety zones, people counting, and mobile robot navigation in warehouses, factories, or public spaces often require segmenting pedestrians from a single RGB frame. This example shows how to segment one or more pedestrians in an RGB image using a bounding box prompt: you provide the image and an ROI (or multiple ROIs) around each person, and the pipeline returns per-person instance masks and bounding boxes for safety logic, counting, or obstacle avoidance.

Inputs

  • Single RGB image with one or more pedestrians in view
  • Bounding box around each pedestrian as [x_min, y_min, x_max, y_max] (one per person per run, or batch multiple boxes)
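As a minimal sketch of the expected prompt format (the coordinates below are illustrative values, not taken from a real image):

```python
# Each ROI is [x_min, y_min, x_max, y_max] in pixel coordinates.
# These boxes are made-up examples for illustration only.
person_a = [40, 70, 330, 414]
person_b = [350, 60, 520, 400]
bboxes = [person_a, person_b]  # batch several prompts in a single call

for x_min, y_min, x_max, y_max in bboxes:
    # A valid box has positive width and height.
    assert x_min < x_max and y_min < y_max
```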

Required Telekinesis Skills

  • Segment Image Using SAM — instance segmentation from a bounding box prompt.

Optional: Rerun for visualization.

Use Cases

This pipeline segments pedestrians in RGB images using SAM with a bounding box prompt.

Typical applications include:

  • Safety zones — Isolate each person to enforce keep-out zones or alert when too close to machinery.
  • People counting — Segment and count pedestrians for occupancy or flow analytics.
  • Collision-free navigation — Provide masks and boxes for mobile robots or AGVs to avoid people.
  • Monitoring — Visualize pedestrian regions for dashboards or incident review.
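For instance, the people-counting case reduces to counting non-empty masks in the result. A hedged sketch, assuming each annotation is a dict carrying a binary `mask` array (the field name follows the extraction code later on this page):

```python
import numpy as np

# Toy annotations standing in for SAM output; real masks are full-resolution.
annotations = [
    {"mask": np.ones((4, 4), dtype=np.uint8)},   # a detected person
    {"mask": np.zeros((4, 4), dtype=np.uint8)},  # an empty (spurious) mask
]

# Count only annotations whose mask covers at least one pixel.
pedestrian_count = sum(1 for ann in annotations if ann["mask"].sum() > 0)
print(pedestrian_count)  # 1
```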

Input-Output

  • Raw Sensor Input (Pedestrian Segmentation Input): raw image showing pedestrians in a space.
  • Segmentation and Boxes (Pedestrian Segmentation Output): segmented image with a mask and bounding box for each pedestrian.

The Pipeline

The pipeline loads an RGB image, defines a bounding box around the pedestrian, runs SAM for instance segmentation, then extracts the mask and bounding box and visualizes with Rerun.

text
Load RGB Image
  ↓
Define ROI (Bounding Box Prompt)
  ↓
Segment Image Using SAM
  ↓
Postprocess Masks
  ↓
Extract Bounding Boxes
  ↓
Visualize with Rerun

  • Segment Image Using SAM — Instance segmentation from a bounding box prompt; outputs mask and bbox per pedestrian.
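The "Postprocess Masks" step mainly means normalizing whatever mask representation comes back to a binary array. A sketch of that normalization, mirroring the thresholds used in the full script below:

```python
import numpy as np

def binarize_mask(m: np.ndarray) -> np.ndarray:
    """Normalize a SAM-style mask to a uint8 {0, 1} array.

    Float and boolean masks are thresholded at 0.5; integer masks at nonzero.
    """
    if m.dtype.kind in ("f", "b"):
        return (m > 0.5).astype(np.uint8)
    return (m > 0).astype(np.uint8)

print(binarize_mask(np.array([0.2, 0.7])).tolist())  # [0, 1]
```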

The Code

The script loads an image, defines a bounding box for the pedestrian, runs SAM, extracts the mask and bounding box from the annotations, and visualizes with Rerun. Image path and ROI are set at the top; the pipeline runs in the main block with no function arguments.

python
import cv2
import numpy as np
import rerun as rr
import rerun.blueprint as rrb
from pycocotools import mask as mask_utils

# cornea (the SAM skill), io, logger, and DATA_DIR are provided by the
# Telekinesis environment.
# Load image
image_path = DATA_DIR / "images/pedestrians.jpg"
image = io.load_image(image_path)
logger.info(f"Loaded image shape: {image.to_numpy().shape}")

# Define a bounding box: (x_min, y_min, x_max, y_max)
bounding_box = [40, 70, 330, 414]

# Segment using SAM
result = cornea.segment_image_using_sam(
    image=image,
    bboxes=[bounding_box],
)
annotations = result.to_list()

# Rerun visualization
rr.init("pedestrian_segmentation_using_sam", spawn=False)
try:
    rr.connect()  # attach to a running viewer if one exists
except Exception:
    rr.spawn()  # otherwise launch a new viewer

rr.send_blueprint(
    rrb.Blueprint(
        rrb.Horizontal(
            rrb.Spatial2DView(name="Input", origin="input"),
            rrb.Spatial2DView(name="Bboxes & Segments", origin="segmented"),
        ),
        rrb.SelectionPanel(),
        rrb.TimePanel(),
    ),
    make_active=True,
)

image = image.to_numpy()
rr.log("input/image", rr.Image(image=image))
rr.log("segmented/image", rr.Image(image=image))

h, w = image.shape[:2]
segmentation_img = np.zeros((h, w), dtype=np.uint16)
ann_bboxes = []
class_ids = []

# Build a label image: each pedestrian gets a unique integer id.
for idx, ann in enumerate(annotations):
    label = idx + 1  # 0 is reserved for background
    mask_i = np.zeros((h, w), dtype=np.uint8)
    if "mask" in ann and isinstance(ann["mask"], np.ndarray):
        m = ann["mask"]
        if m.dtype.kind in ("f", "b"):
            mask_i = (m > 0.5).astype(np.uint8)
        else:
            mask_i = (m > 0).astype(np.uint8)
    elif "segmentation" in ann and ann["segmentation"]:
        seg = ann["segmentation"]
        if isinstance(seg, dict):
            mask_dec = mask_utils.decode(seg)
            if mask_dec.ndim == 3:
                mask_dec = mask_dec[:, :, 0]
            mask_i = (mask_dec > 0).astype(np.uint8)
        elif isinstance(seg, list) and len(seg) > 0:
            temp = np.zeros((h, w), dtype=np.uint8)
            polys = seg if isinstance(seg[0], list) else [seg]
            for poly in polys:
                pts = np.array(poly).reshape(-1, 2).astype(np.int32)
                cv2.fillPoly(temp, [pts], 1)
            mask_i = (temp > 0).astype(np.uint8)
    if mask_i.sum() == 0:
        continue
    segmentation_img[mask_i > 0] = label
    bbox = ann.get("bbox", None)  # assumed COCO-style [x, y, w, h]
    if bbox is None:
        continue
    ann_bboxes.append(list(bbox))
    class_ids.append(label)

rr.log("segmented/masks", rr.SegmentationImage(segmentation_img))
if ann_bboxes:
    rr.log(
        "segmented/boxes",
        rr.Boxes2D(
            array=np.asarray(ann_bboxes, dtype=np.float32),
            array_format=rr.Box2DFormat.XYWH,
            class_ids=np.asarray(class_ids, dtype=np.int32),
        ),
    )
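With masks in hand, the safety-zone use case is a small amount of extra logic. A minimal sketch, assuming an axis-aligned keep-out rectangle (the zone coordinates and helper name here are hypothetical, not part of the pipeline API):

```python
import numpy as np

def person_in_zone(mask: np.ndarray, zone: tuple) -> bool:
    """Return True if any mask pixel falls inside the keep-out rectangle.

    mask: HxW binary array; zone: (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = zone
    return bool(mask[y_min:y_max, x_min:x_max].any())

# Toy 6x6 mask with a "person" in the lower-right corner.
mask = np.zeros((6, 6), dtype=np.uint8)
mask[4:6, 4:6] = 1

print(person_in_zone(mask, (3, 3, 6, 6)))  # True
print(person_in_zone(mask, (0, 0, 3, 3)))  # False
```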