Depalletizing Box Segmentation From RGB Image
SUMMARY
Segment boxes or cases on a pallet from one RGB image for depalletizing: gripper planning, collision-free paths, and inventory. Uses SAM with a bounding-box prompt; outputs masks and boxes, with Rerun visualization.
Overview
Depalletizing workflows often require instance segmentation of individual boxes or cases on a pallet from a single RGB view. This example shows how to segment one box (or region) on a pallet using a bounding box prompt: you provide the image and an ROI around the target box, and the pipeline returns the instance mask and bounding box for gripper planning, path planning, or inventory checks.
Inputs
- Single RGB image of the pallet
- Bounding box around the target box or case as [x, y, width, height]
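A hand-drawn ROI can easily extend past the image border, so it may help to clip it before passing it as a prompt. The sketch below assumes that (x, y) is the top-left corner in pixel coordinates. The helper names `clamp_xywh` and `xywh_to_xyxy` and the 640 x 800 image size are illustrative only and are not part of the Telekinesis API.

```python
def clamp_xywh(bbox, image_width, image_height):
    """Clip an [x, y, width, height] box so it lies inside the image."""
    x, y, w, h = bbox
    x0 = max(0, min(x, image_width))
    y0 = max(0, min(y, image_height))
    x1 = max(0, min(x + w, image_width))
    y1 = max(0, min(y + h, image_height))
    return [x0, y0, x1 - x0, y1 - y0]


def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to corner form [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Hypothetical image size; the ROI is the one used in the script on this page.
roi = clamp_xywh([170, 370, 360, 500], image_width=640, image_height=800)
print(roi)                # [170, 370, 360, 430]
print(xywh_to_xyxy(roi))  # [170, 370, 530, 800]
```

The corner form is handy when another tool expects `[x_min, y_min, x_max, y_max]` instead of a width and height.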
Required Telekinesis Skills
- Cornea — Segment Image Using SAM for instance segmentation from a bounding box prompt
Optional: Rerun for visualization.
Use Cases
This pipeline segments boxes or cases on a pallet in RGB images using SAM with a bounding box prompt.
Typical applications include:
- Gripper planning — Get a precise mask and box for each case to compute grasp poses.
- Collision-free paths — Use masks to plan robot paths that avoid other boxes.
- Inventory — Segment and count boxes or validate load configuration.
- Unloading — Isolate one box per run for robotic or manual depalletizing.
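For gripper planning, one simple quantity to derive from a mask is its pixel centroid, which can seed a grasp point on the top face of a case. The sketch below uses NumPy only. The function name `mask_centroid_and_area` is hypothetical, and a real grasp planner would also need depth and camera calibration, which this example does not cover.

```python
import numpy as np


def mask_centroid_and_area(mask):
    """Return ((cx, cy), area_px) for a binary mask, or (None, 0) if it is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None, 0
    return (float(xs.mean()), float(ys.mean())), int(xs.size)


# Synthetic 4 x 4 pixel "box" inside a 6 x 8 image.
mask = np.zeros((6, 8), dtype=np.uint8)
mask[1:5, 2:6] = 1

center, area = mask_centroid_and_area(mask)
print(center, area)  # (3.5, 2.5) 16
```

The pixel area can also act as a rough plausibility check, for example to reject a mask that is far smaller than the prompt ROI.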
Input-Output


The Pipeline
The pipeline loads an RGB image, defines a bounding box around the target box, runs SAM for instance segmentation, then extracts the mask and bounding box and visualizes with Rerun.
Load RGB Image
↓
Define ROI (Bounding Box Prompt)
↓
Segment Image Using SAM
↓
Postprocess Masks
↓
Extract Bounding Boxes
↓
Visualize with Rerun
- Segment Image Using SAM: instance segmentation from a bounding box prompt; outputs the mask and bbox for the target box.
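The "Postprocess Masks" and "Extract Bounding Boxes" steps can be sketched with NumPy alone. This is a minimal illustration, not the skill's internal implementation: it assumes the mask arrives as a dense array (float scores, booleans, or integers), and the names `binarize`, `mask_to_xywh`, and `min_area` are hypothetical.

```python
import numpy as np


def binarize(mask, threshold=0.5):
    """Turn a float, bool, or integer mask into a uint8 mask of 0s and 1s."""
    m = np.asarray(mask)
    if m.dtype.kind in ("f", "b"):
        return (m > threshold).astype(np.uint8)
    return (m > 0).astype(np.uint8)


def mask_to_xywh(binary_mask, min_area=1):
    """Return the tight [x, y, width, height] box, or None if the mask is too small."""
    ys, xs = np.nonzero(binary_mask)
    if xs.size < min_area:
        return None
    x0, y0 = int(xs.min()), int(ys.min())
    x1, y1 = int(xs.max()), int(ys.max())
    return [x0, y0, x1 - x0 + 1, y1 - y0 + 1]


# Synthetic score map: one confident region plus one weak pixel that is thresholded away.
scores = np.zeros((10, 12), dtype=np.float32)
scores[3:7, 4:9] = 0.9
scores[0, 0] = 0.2

binary = binarize(scores)
print(mask_to_xywh(binary))  # [4, 3, 5, 4]
```

Raising `min_area` discards tiny fragments before they are turned into boxes.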
The Code
The script loads an image, defines a bounding box for the target box on the pallet, runs SAM, extracts the mask and bounding box from the annotations, and visualizes with Rerun. Image path and ROI are set at the top; the pipeline runs in the main block with no function arguments.
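import cv2
import numpy as np
import rerun as rr
import rerun.blueprint as rrb
from pycocotools import mask as mask_utils  # assumption: mask_utils is the COCO RLE helper

# DATA_DIR, io, cornea, and logger come from the Telekinesis SDK setup, which is not shown here.

# Load image
image_path = DATA_DIR / "images/depalletizing.png"
image = io.load_image(image_path)
logger.info(f"Loaded image shape: {image.to_numpy().shape}")

# Define a bounding box: [x, y, width, height]
bounding_box = [170, 370, 360, 500]

# Segment using SAM
result = cornea.segment_image_using_sam(
    image=image,
    bboxes=[bounding_box],
)
annotations = result.to_list()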
# Rerun visualization: connect to a running viewer, or spawn one if none is reachable
rr.init("depalletizing_using_sam", spawn=False)
try:
    rr.connect()
except Exception:
    rr.spawn()
rr.send_blueprint(
    rrb.Blueprint(
        rrb.Horizontal(
            rrb.Spatial2DView(name="Input", origin="input"),
            rrb.Spatial2DView(name="Bboxes & Segments", origin="segmented"),
        ),
        rrb.SelectionPanel(),
        rrb.TimePanel(),
    ),
    make_active=True,
)
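# Log the same image under both views so masks and boxes overlay it
image = image.to_numpy()
rr.log("input/image", rr.Image(image=image))
rr.log("segmented/image", rr.Image(image=image))

# Label image: 0 is background, annotation i is drawn with label i + 1
h, w = image.shape[:2]
segmentation_img = np.zeros((h, w), dtype=np.uint16)
ann_bboxes = []
class_ids = []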
for idx, ann in enumerate(annotations):
    label = idx + 1
    mask_i = np.zeros((h, w), dtype=np.uint8)
    if "mask" in ann and isinstance(ann["mask"], np.ndarray):
        # Dense mask: threshold float/bool arrays at 0.5, integer arrays at 0
        m = ann["mask"]
        if m.dtype.kind in ("f", "b"):
            mask_i = (m > 0.5).astype(np.uint8)
        else:
            mask_i = (m > 0).astype(np.uint8)
    elif "segmentation" in ann and ann["segmentation"]:
        seg = ann["segmentation"]
        if isinstance(seg, dict):
            # Run-length encoded mask
            mask_dec = mask_utils.decode(seg)
            if mask_dec.ndim == 3:
                mask_dec = mask_dec[:, :, 0]
            mask_i = (mask_dec > 0).astype(np.uint8)
        elif isinstance(seg, list) and len(seg) > 0:
            # One polygon or a list of polygons, each a flat [x0, y0, x1, y1, ...] list
            temp = np.zeros((h, w), dtype=np.uint8)
            polys = seg if isinstance(seg[0], list) else [seg]
            for poly in polys:
                pts = np.array(poly).reshape(-1, 2).astype(np.int32)
                cv2.fillPoly(temp, [pts], 1)
            mask_i = (temp > 0).astype(np.uint8)
    # Skip annotations that produced no mask pixels
    if mask_i.sum() == 0:
        continue
    segmentation_img[mask_i > 0] = label
    bbox = ann.get("bbox", None)
    if bbox is None:
        continue
    ann_bboxes.append(list(bbox))
    class_ids.append(label)
rr.log("segmented/masks", rr.SegmentationImage(segmentation_img))
if ann_bboxes:
    rr.log(
        "segmented/boxes",
        rr.Boxes2D(
            array=np.asarray(ann_bboxes, dtype=np.float32),
            array_format=rr.Box2DFormat.XYWH,
            class_ids=np.asarray(class_ids, dtype=np.int32),
        ),
    )
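The label image built above stores 0 for background and a positive integer per instance, so it can double as a simple inventory summary. The sketch below assumes that layout and runs on a synthetic array instead of the script's `segmentation_img`. The function name `instance_areas` is hypothetical.

```python
import numpy as np


def instance_areas(label_image):
    """Map each non-zero label to its pixel count; label 0 is treated as background."""
    labels, counts = np.unique(label_image, return_counts=True)
    return {int(lbl): int(cnt) for lbl, cnt in zip(labels, counts) if lbl != 0}


# Synthetic label image with two instances of 6 pixels each.
label_image = np.zeros((4, 6), dtype=np.uint16)
label_image[0:2, 0:3] = 1
label_image[2:4, 3:6] = 2

areas = instance_areas(label_image)
print(len(areas), areas)  # 2 {1: 6, 2: 6}
```

With a single bounding box prompt the result would normally contain one instance. Several prompts passed in `bboxes` would be expected to yield one label each, but that behavior depends on the skill and is not confirmed here.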
