Forklift Segmentation From RGB Image

SUMMARY

Segment forklifts or industrial vehicles from one RGB image for warehouse safety, fleet tracking, or collision avoidance. Uses SAM with a bounding box prompt; outputs masks and bounding boxes, with Rerun visualization.

Overview

Warehouse safety, fleet tracking, and AGV or robot collision avoidance often require segmenting vehicles (e.g. forklifts) from a single RGB frame.

This example shows how to segment a forklift or similar vehicle in an RGB image using a bounding box prompt: you provide the image and an ROI around the vehicle, and the pipeline returns the instance mask and bounding box for monitoring, path planning, or collision checks.

Inputs

  • Single RGB image of the scene (e.g. warehouse) with the forklift or vehicle in view
  • Bounding box around the forklift or vehicle as [x_min, y_min, x_max, y_max]

Optional: Rerun for visualization.
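
Because the bounding box prompt is given in pixel coordinates, it can help to check it against the image size before segmenting. The sketch below is a minimal illustration and not part of the pipeline: `clamp_box_xyxy` is a hypothetical helper that orders the corners, clips them to the image, and rejects boxes that end up empty. The 640 x 480 image size is an assumption made for the example.

```python
def clamp_box_xyxy(box, width, height):
    """Return [x_min, y_min, x_max, y_max] clipped to a width x height image.

    Hypothetical helper for illustration; it is not part of the pipeline's API.
    """
    x0, y0, x1, y1 = box
    # Order the corners so that min <= max on each axis.
    x_min, x_max = sorted((x0, x1))
    y_min, y_max = sorted((y0, y1))
    # Clip to the valid pixel range of the image.
    x_min = max(0, min(x_min, width))
    x_max = max(0, min(x_max, width))
    y_min = max(0, min(y_min, height))
    y_max = max(0, min(y_max, height))
    if x_max <= x_min or y_max <= y_min:
        raise ValueError(f"Bounding box {box} is empty after clipping")
    return [x_min, y_min, x_max, y_max]


# The box used in this example, against an assumed 640 x 480 image.
roi = clamp_box_xyxy([18, 216, 303, 389], width=640, height=480)
```

A box that already lies inside the image comes back unchanged, so the check can be placed in front of the segmentation call without altering valid prompts.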

Use Cases

This pipeline segments forklifts or industrial vehicles in RGB images using SAM with a bounding box prompt.

Typical applications include:

  • Warehouse safety — Isolate vehicles to enforce safety zones or alert when humans are too close.
  • Fleet tracking — Segment and localize each vehicle for occupancy or flow analytics.
  • Collision avoidance — Provide masks and boxes for AGV or robot path planning around vehicles.
  • Monitoring — Visualize vehicle regions for dashboards or logging.
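
As a rough illustration of the warehouse safety case, the sketch below measures the pixel gap between a vehicle mask and a person's bounding box and raises a flag when it falls under a threshold. The person box, the `min_gap_px` threshold, and the helper name are all hypothetical; a real deployment would need a person detector and a calibrated pixel-to-metre mapping, neither of which this example covers.

```python
import numpy as np


def gap_to_box_px(mask, box_xyxy):
    """Smallest pixel distance from any mask pixel to an axis-aligned box.

    Hypothetical helper. Returns 0.0 when the mask touches or overlaps the
    box, and infinity when the mask is empty.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return float("inf")
    x_min, y_min, x_max, y_max = box_xyxy
    # Per-axis distance from each mask pixel to the box (0 inside its span).
    dx = np.maximum(np.maximum(x_min - xs, xs - x_max), 0)
    dy = np.maximum(np.maximum(y_min - ys, ys - y_max), 0)
    return float(np.sqrt(dx**2 + dy**2).min())


# Toy data: a 20 x 20 "vehicle" blob and an assumed person box to its right.
vehicle_mask = np.zeros((100, 100), dtype=np.uint8)
vehicle_mask[40:60, 10:30] = 1
person_box = [40, 45, 50, 70]  # [x_min, y_min, x_max, y_max], made up

min_gap_px = 15  # hypothetical alert threshold in pixels
gap = gap_to_box_px(vehicle_mask, person_box)
too_close = gap < min_gap_px
```

Working from the mask rather than the vehicle's bounding box keeps the distance tight around irregular shapes such as raised forks, which a rectangular box would overestimate.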

Input-Output

Raw Sensor Input
[Figure: Forklift Segmentation Input. Raw image of a warehouse scene with a forklift.]

Segmentation and Boxes
[Figure: Forklift Segmentation Output. Segmented image with mask and bounding box for the forklift.]

The Pipeline

The pipeline loads an RGB image, defines a bounding box around the vehicle, runs SAM for instance segmentation, then extracts the mask and bounding box and visualizes with Rerun.

text
Load RGB Image

Define ROI (Bounding Box Prompt)

Segment Image Using SAM

Postprocess Masks

Extract Bounding Boxes

Visualize with Rerun
  • Segment Image Using SAM — Instance segmentation from a bounding box prompt; outputs mask and bbox for the vehicle.
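
The "Extract Bounding Boxes" step can be pictured as taking the tight box around a mask's foreground pixels. The sketch below shows one way to do that with NumPy. It illustrates the idea only and is not the pipeline's own implementation; `mask_to_xywh` is a hypothetical name.

```python
import numpy as np


def mask_to_xywh(mask):
    """Tight [x, y, width, height] box around the nonzero pixels of a mask.

    Hypothetical helper; returns None for an empty mask.
    """
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing foreground
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing foreground
    if rows.size == 0:
        return None
    x, y = int(cols[0]), int(rows[0])
    # +1 because the last foreground pixel is inside the box.
    w = int(cols[-1]) - x + 1
    h = int(rows[-1]) - y + 1
    return [x, y, w, h]


# Toy mask: foreground covering rows 2..4 and columns 3..7.
toy = np.zeros((8, 10), dtype=np.uint8)
toy[2:5, 3:8] = 1
box = mask_to_xywh(toy)
```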

The Code

The script loads an image, defines a bounding box for the forklift, runs SAM, extracts the mask and bounding box from the annotations, and visualizes with Rerun. Image path and ROI are set at the top; the pipeline runs in the main block with no function arguments.

python
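# Imports for the third-party packages used below. DATA_DIR, io, logger, and
# cornea come from the project's own SDK and setup code, which this excerpt
# does not show. mask_utils is assumed to be pycocotools' mask module.
import cv2
import numpy as np
import rerun as rr
import rerun.blueprint as rrb
from pycocotools import mask as mask_utils

# Load image
image_path = DATA_DIR / "images/forklift.jpg"
image = io.load_image(image_path)
logger.info(f"Loaded image shape: {image.to_numpy().shape}")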

# Define a bounding box: (x_min, y_min, x_max, y_max)
bounding_box = [18, 216, 303, 389]

# Segment using SAM
result = cornea.segment_image_using_sam(
    image=image,
    bboxes=[bounding_box],
)
annotations = result.to_list()

# Rerun visualization
rr.init("forklift_segmentation_using_sam", spawn=False)
try:
    rr.connect()
except Exception:
    # Fall back to spawning a local viewer if connecting fails.
    rr.spawn()

rr.send_blueprint(
    rrb.Blueprint(
        rrb.Horizontal(
            rrb.Spatial2DView(name="Input", origin="input"),
            rrb.Spatial2DView(name="Bboxes & Segments", origin="segmented"),
        ),
        rrb.SelectionPanel(),
        rrb.TimePanel(),
    ),
    make_active=True,
)

image = image.to_numpy()
rr.log("input/image", rr.Image(image=image))
rr.log("segmented/image", rr.Image(image=image))

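h, w = image.shape[:2]
# Label image: 0 is background; annotation idx is painted as idx + 1.
segmentation_img = np.zeros((h, w), dtype=np.uint16)
ann_bboxes = []
class_ids = []

# Each annotation may carry its mask as a dense array ("mask"), or under
# "segmentation" as COCO RLE (dict) or polygons (list); handle all three.
for idx, ann in enumerate(annotations):
    label = idx + 1
    mask_i = np.zeros((h, w), dtype=np.uint8)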
    if "mask" in ann and isinstance(ann["mask"], np.ndarray):
        m = ann["mask"]
        if m.dtype.kind in ("f", "b"):
            mask_i = (m > 0.5).astype(np.uint8)
        else:
            mask_i = (m > 0).astype(np.uint8)
    elif "segmentation" in ann and ann["segmentation"]:
        seg = ann["segmentation"]
        if isinstance(seg, dict):
            mask_dec = mask_utils.decode(seg)
            if mask_dec.ndim == 3:
                mask_dec = mask_dec[:, :, 0]
            mask_i = (mask_dec > 0).astype(np.uint8)
        elif isinstance(seg, list) and len(seg) > 0:
            temp = np.zeros((h, w), dtype=np.uint8)
            polys = seg if isinstance(seg[0], list) else [seg]
            for poly in polys:
                pts = np.array(poly).reshape(-1, 2).astype(np.int32)
                cv2.fillPoly(temp, [pts], 1)
            mask_i = (temp > 0).astype(np.uint8)
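    # Skip annotations whose mask is empty.
    if mask_i.sum() == 0:
        continue
    segmentation_img[mask_i > 0] = label
    # A mask without a "bbox" entry is still drawn, but gets no box.
    bbox = ann.get("bbox", None)
    if bbox is None:
        continue
    ann_bboxes.append(list(bbox))
    class_ids.append(label)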

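rr.log("segmented/masks", rr.SegmentationImage(segmentation_img))
# Boxes are logged as XYWH, i.e. each annotation "bbox" is read as
# [x, y, width, height], unlike the [x_min, y_min, x_max, y_max] prompt.
if ann_bboxes: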
    rr.log(
        "segmented/boxes",
        rr.Boxes2D(
            array=np.asarray(ann_bboxes, dtype=np.float32),
            array_format=rr.Box2DFormat.XYWH,
            class_ids=np.asarray(class_ids, dtype=np.int32),
        ),
    )
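
One quick sanity check on the result is to compare the returned box with the prompt: a segment that covers only a small part of the ROI may point to a poorly placed prompt. The sketch below computes intersection over union between the two, assuming (as the visualization code does) that annotation boxes are XYWH while the prompt is XYXY. The helper names and the sample returned box are hypothetical.

```python
def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def iou_xyxy(a, b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


prompt_box = [18, 216, 303, 389]      # XYXY prompt from the example
returned_xywh = [20, 220, 280, 165]   # made-up annotation box, XYWH
overlap = iou_xyxy(prompt_box, xywh_to_xyxy(returned_xywh))
```

What counts as an acceptable overlap depends on how tightly the ROI was drawn, so any cutoff would need to be chosen per application.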