Depalletizing Box Segmentation From RGB Image

SUMMARY

Segment a box or case on a pallet from a single RGB image to support depalletizing tasks such as gripper planning, collision-free path planning, and inventory checks. The example prompts SAM (Segment Anything Model) with a bounding box and returns an instance mask and bounding box, visualized in Rerun.

Overview

Depalletizing workflows often require instance segmentation of individual boxes or cases on a pallet from a single RGB view. This example shows how to segment one box (or region) on a pallet using a bounding box prompt: you provide the image and an ROI around the target box, and the pipeline returns the instance mask and bounding box for gripper planning, path planning, or inventory checks.

Inputs

  • Single RGB image of the pallet
  • Bounding box around the target box or case as [x, y, width, height]
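An ROI drawn by hand or produced by an upstream detector can extend past the image border. The sketch below shows one way to clip an [x, y, width, height] box to the image and convert it to corner form. The helper names `clamp_xywh` and `xywh_to_xyxy` and the 640 x 800 image size are illustrative assumptions, not part of the Telekinesis API.

```python
def clamp_xywh(bbox, image_width, image_height):
    """Clip an [x, y, width, height] box to the image bounds (hypothetical helper)."""
    x, y, w, h = bbox
    # Convert to corners, clip each corner to the image, then convert back.
    x0 = max(0, min(int(x), image_width))
    y0 = max(0, min(int(y), image_height))
    x1 = max(0, min(int(x + w), image_width))
    y1 = max(0, min(int(y + h), image_height))
    if x1 <= x0 or y1 <= y0:
        raise ValueError(f"ROI {bbox} does not overlap the image")
    return [x0, y0, x1 - x0, y1 - y0]


def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Assumed image size for illustration only; use the loaded image's real shape.
roi = clamp_xywh([170, 370, 360, 500], image_width=640, image_height=800)
corners = xywh_to_xyxy(roi)
print(roi, corners)
```

Here the bottom edge of the ROI falls outside the assumed 800-pixel-high image, so the height is shortened while the top-left corner stays put.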

Required Telekinesis Skills

Optional: Rerun for visualization.

Use Cases

This pipeline segments boxes or cases on a pallet in RGB images using SAM with a bounding box prompt.

Typical applications include:

  • Gripper planning — Get a precise mask and box for each case to compute grasp poses.
  • Collision-free paths — Use masks to plan robot paths that avoid other boxes.
  • Inventory — Segment and count boxes or validate load configuration.
  • Unloading — Isolate one box per run for robotic or manual depalletizing.
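As a rough illustration of the gripper-planning and inventory cases, the sketch below derives a 2D pick point and pixel area from a binary mask and counts instances in a label image. It assumes masks are NumPy arrays with non-zero pixels on the box and that label `0` means background. The function names are hypothetical, and a real grasp pose would also need depth and camera calibration, which this example does not cover.

```python
import numpy as np


def mask_centroid_and_area(mask):
    """Return ((cx, cy), area_px) for a binary mask, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    # The pixel centroid is a simple candidate for a suction-cup pick point.
    return (float(xs.mean()), float(ys.mean())), int(xs.size)


def count_instances(label_image):
    """Count distinct non-zero labels in a label image (0 is background)."""
    labels = np.unique(label_image)
    return int((labels != 0).sum())


# Tiny synthetic mask: a 4 x 4 block of foreground pixels.
mask = np.zeros((6, 8), dtype=np.uint8)
mask[1:5, 2:6] = 1
center, area = mask_centroid_and_area(mask)

# Tiny synthetic label image with two instances.
label_image = np.zeros((4, 4), dtype=np.uint16)
label_image[0, 0] = 1
label_image[2:, 2:] = 2
n_boxes = count_instances(label_image)
```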

Input-Output

Raw Sensor Input
[Figure: Depalletizing Input. Raw image of a pallet with boxes.]

Segmentation and Boxes
[Figure: Depalletizing Output. Segmented image with mask and bounding box for the selected box.]

The Pipeline

The pipeline loads an RGB image, defines a bounding box around the target box, runs SAM for instance segmentation, then extracts the mask and bounding box and visualizes with Rerun.

Load RGB Image → Define ROI (Bounding Box Prompt) → Segment Image Using SAM → Postprocess Masks → Extract Bounding Boxes → Visualize with Rerun
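The source does not spell out what the "Postprocess Masks" step does. One plausible step, sketched here purely as an assumption, is to discard mask pixels that leak outside the prompted ROI so the instance stays confined to the target box. The helper `clip_mask_to_roi` and its `margin` parameter are hypothetical.

```python
import numpy as np


def clip_mask_to_roi(mask, bbox_xywh, margin=0):
    """Zero out mask pixels outside an [x, y, width, height] ROI (plus a margin)."""
    h, w = mask.shape[:2]
    x, y, bw, bh = bbox_xywh
    # Expand the ROI by the margin and clip it to the image bounds.
    x0 = max(0, int(x) - margin)
    y0 = max(0, int(y) - margin)
    x1 = min(w, int(x + bw) + margin)
    y1 = min(h, int(y + bh) + margin)
    clipped = np.zeros_like(mask)
    clipped[y0:y1, x0:x1] = mask[y0:y1, x0:x1]
    return clipped


# Synthetic example: a mask that covers the whole image, clipped to a small ROI.
full_mask = np.ones((10, 10), dtype=np.uint8)
tight = clip_mask_to_roi(full_mask, [2, 3, 4, 5])
loose = clip_mask_to_roi(full_mask, [2, 3, 4, 5], margin=1)
```

A small margin tolerates ROIs that were drawn slightly too tight, at the cost of letting in a few pixels from neighbouring boxes.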

The Code

The script loads an image, defines a bounding box for the target box on the pallet, runs SAM, extracts the mask and bounding box from the annotations, and visualizes with Rerun. Image path and ROI are set at the top; the pipeline runs in the main block with no function arguments.

```python
import cv2
import numpy as np
import rerun as rr
import rerun.blueprint as rrb
from pycocotools import mask as mask_utils  # assumed source of `mask_utils` (RLE decoding)

# `io`, `cornea`, `DATA_DIR`, and `logger` come from the Telekinesis SDK and the
# example's own setup; their import lines are not shown in this excerpt.

# Load image
image_path = DATA_DIR / "images/depalletizing.png"
image = io.load_image(image_path)
logger.info(f"Loaded image shape: {image.to_numpy().shape}")

# Define a bounding box: (x, y, width, height)
bounding_box = [170, 370, 360, 500]

# Segment using SAM, prompted with the bounding box
result = cornea.segment_image_using_sam(
    image=image,
    bboxes=[bounding_box],
)
annotations = result.to_list()

# Rerun visualization: connect to a running viewer, or spawn one if none is found
rr.init("depalletizing_using_sam", spawn=False)
try:
    rr.connect()
except Exception:
    rr.spawn()

# Two side-by-side 2D views: the raw input and the segmentation overlay
rr.send_blueprint(
    rrb.Blueprint(
        rrb.Horizontal(
            rrb.Spatial2DView(name="Input", origin="input"),
            rrb.Spatial2DView(name="Bboxes & Segments", origin="segmented"),
        ),
        rrb.SelectionPanel(),
        rrb.TimePanel(),
    ),
    make_active=True,
)

image = image.to_numpy()
rr.log("input/image", rr.Image(image=image))
rr.log("segmented/image", rr.Image(image=image))

h, w = image.shape[:2]
segmentation_img = np.zeros((h, w), dtype=np.uint16)  # label image, 0 = background
ann_bboxes = []
class_ids = []

for idx, ann in enumerate(annotations):
    label = idx + 1
    mask_i = np.zeros((h, w), dtype=np.uint8)

    # Case 1: the annotation carries a dense mask array
    if "mask" in ann and isinstance(ann["mask"], np.ndarray):
        m = ann["mask"]
        if m.dtype.kind in ("f", "b"):
            mask_i = (m > 0.5).astype(np.uint8)
        else:
            mask_i = (m > 0).astype(np.uint8)
    # Case 2: COCO-style segmentation, either RLE (dict) or polygons (list)
    elif "segmentation" in ann and ann["segmentation"]:
        seg = ann["segmentation"]
        if isinstance(seg, dict):
            mask_dec = mask_utils.decode(seg)
            if mask_dec.ndim == 3:
                mask_dec = mask_dec[:, :, 0]
            mask_i = (mask_dec > 0).astype(np.uint8)
        elif isinstance(seg, list) and len(seg) > 0:
            temp = np.zeros((h, w), dtype=np.uint8)
            polys = seg if isinstance(seg[0], list) else [seg]
            for poly in polys:
                pts = np.array(poly).reshape(-1, 2).astype(np.int32)
                cv2.fillPoly(temp, [pts], 1)
            mask_i = (temp > 0).astype(np.uint8)

    # Skip annotations that produced no mask pixels
    if mask_i.sum() == 0:
        continue
    segmentation_img[mask_i > 0] = label

    # The mask is kept even if the box is missing; only the box overlay is skipped
    bbox = ann.get("bbox", None)
    if bbox is None:
        continue
    ann_bboxes.append(list(bbox))
    class_ids.append(label)

rr.log("segmented/masks", rr.SegmentationImage(segmentation_img))
if ann_bboxes:
    rr.log(
        "segmented/boxes",
        rr.Boxes2D(
            array=np.asarray(ann_bboxes, dtype=np.float32),
            array_format=rr.Box2DFormat.XYWH,
            class_ids=np.asarray(class_ids, dtype=np.int32),
        ),
    )
```
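The loop above draws no box for an annotation that lacks a "bbox" entry. If that case matters, a box can be derived from the mask itself. The sketch below is a minimal fallback assuming a binary NumPy mask; `bbox_from_mask` is a hypothetical helper, and it returns XYWH to match the `rr.Box2DFormat.XYWH` format used when logging.

```python
import numpy as np


def bbox_from_mask(mask):
    """Return the tight [x, y, width, height] box around non-zero pixels, or None."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    x0, x1 = int(xs.min()), int(xs.max())
    y0, y1 = int(ys.min()), int(ys.max())
    # +1 because min and max are inclusive pixel indices.
    return [x0, y0, x1 - x0 + 1, y1 - y0 + 1]


# Synthetic mask: rows 3..6 and columns 2..8 are foreground.
demo_mask = np.zeros((10, 12), dtype=np.uint8)
demo_mask[3:7, 2:9] = 1
demo_bbox = bbox_from_mask(demo_mask)
```

Inside the loop this could replace the early `continue`, for example `bbox = ann.get("bbox") or bbox_from_mask(mask_i)`, provided the annotation's "bbox" value is a plain list rather than a NumPy array.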