
Bin Picking Object Segmentation From RGB Image

SUMMARY

Segment individual objects in a bin from one RGB image for grasp planning and collision-free picking. Uses SAM with a bounding box (or point) prompt to get instance masks; outputs labeled masks and boxes, with Rerun visualization.

Overview

Bin picking and pick-and-place often require instance segmentation of items in a bin from a single RGB view. This example shows how to segment one or more objects in a bin using a bounding box prompt: you provide the image and an ROI around the target object, and the pipeline returns the instance mask and bounding box for grasp planning, collision avoidance, or downstream 3D pose estimation.

Inputs

  • Single RGB image of the bin
  • Bounding box (or point) prompt around the target object; a box is the typical choice because it is less ambiguous
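A box prompt only makes sense if it lies inside the image. A small helper like the following (a hypothetical utility, not part of the pipeline API) clamps a hand-specified `(x_min, y_min, x_max, y_max)` box to the image bounds before passing it on:

```python
def clamp_bbox(bbox, width, height):
    """Clamp an (x_min, y_min, x_max, y_max) box to the image bounds."""
    x0, y0, x1, y1 = bbox
    x0 = max(0, min(x0, width - 1))
    y0 = max(0, min(y0, height - 1))
    x1 = max(x0 + 1, min(x1, width))   # keep at least 1 px of width
    y1 = max(y0 + 1, min(y1, height))  # keep at least 1 px of height
    return [x0, y0, x1, y1]

# A box that overruns a 640x480 image is trimmed to fit:
print(clamp_bbox([550, 260, 680, 350], 640, 480))  # [550, 260, 640, 350]
```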

Required Telekinesis Skills

  • Segment Image Using SAM

Optional: Rerun for visualization.

Use Cases

This pipeline segments objects in a bin in RGB images using SAM with a bounding box prompt.

Typical applications include:

  • Grasp planning — Get a precise mask and box for each item to compute grasp poses or collision-free paths.
  • Pick-and-place — Isolate one object per run for robotic picking.
  • Collision avoidance — Use masks to avoid other items when moving the end-effector.
  • Multi-object handling — Segment several items and process each (e.g., one centroid per mask for 3D pose estimation).
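For the multi-object case, a per-mask centroid falls straight out of the binary mask with NumPy. This is a minimal sketch (the mask array is assumed to be the 0/1 instance mask produced by the pipeline below; the helper itself is not part of any pipeline API):

```python
import numpy as np

def mask_centroid(mask: np.ndarray) -> tuple[float, float]:
    """Return the (x, y) pixel centroid of a binary instance mask."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("empty mask")
    return float(xs.mean()), float(ys.mean())

# Example: a 4x4 block of ones inside an 8x8 image
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
cx, cy = mask_centroid(mask)  # (3.5, 3.5)
```

The centroid can then be back-projected with depth or a known bin plane to get a 3D pick point.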

Input-Output

  • Raw Sensor Input — raw image of a bin with metal parts.
  • Segmentation and Boxes — segmented image with mask and bounding box for the selected object.

The Pipeline

The pipeline loads an RGB image, defines a bounding box around the target object, runs SAM for instance segmentation, then extracts the mask and bounding box and visualizes with Rerun.

text
Load RGB Image
  ↓
Define ROI (Bounding Box Prompt)
  ↓
Segment Image Using SAM
  ↓
Postprocess Masks
  ↓
Extract Bounding Boxes
  ↓
Visualize with Rerun

  • Segment Image Using SAM — Instance segmentation from a bounding box or point prompt; outputs mask and bbox per instance.

The Code

The script loads an image, defines a bounding box for the target object, runs SAM, extracts the mask and bounding box from the annotations, and visualizes with Rerun. Image path and ROI are set at the top; the pipeline runs in the main block with no function arguments.

python
# Imports for the visualization and mask handling below; `cornea`, `io`,
# `DATA_DIR`, and `logger` come from the project's own setup (not shown here).
import cv2
import numpy as np
import rerun as rr
import rerun.blueprint as rrb
from pycocotools import mask as mask_utils

# Load image
image_path = DATA_DIR / "images/bin_picking_metal_2.jpg"
image = io.load_image(image_path)
logger.info(f"Loaded image shape: {image.to_numpy().shape}")

# Define a bounding box: (x_min, y_min, x_max, y_max)
bounding_box = [550, 260, 680, 350]

# Segment using SAM
result = cornea.segment_image_using_sam(
    image=image,
    bboxes=[bounding_box],
)
annotations = result.to_list()

# Rerun visualization
rr.init("bin_picking_using_sam", spawn=False)
try:
    rr.connect()  # attach to an already-running viewer if one exists
except Exception:
    rr.spawn()  # otherwise start a new viewer

rr.send_blueprint(
    rrb.Blueprint(
        rrb.Horizontal(
            rrb.Spatial2DView(name="Input", origin="input"),
            rrb.Spatial2DView(name="Bboxes & Segments", origin="segmented"),
        ),
        rrb.SelectionPanel(),
        rrb.TimePanel(),
    ),
    make_active=True,
)

image = image.to_numpy()
rr.log("input/image", rr.Image(image=image))
rr.log("segmented/image", rr.Image(image=image))

h, w = image.shape[:2]
segmentation_img = np.zeros((h, w), dtype=np.uint16)
ann_bboxes = []
class_ids = []

for idx, ann in enumerate(annotations):
    label = idx + 1
    mask_i = np.zeros((h, w), dtype=np.uint8)
    # The mask may be a dense array ("mask"), or a COCO-style "segmentation"
    # given as an RLE dict or a polygon list; handle all three cases.
    if "mask" in ann and isinstance(ann["mask"], np.ndarray):
        m = ann["mask"]
        if m.dtype.kind in ("f", "b"):
            mask_i = (m > 0.5).astype(np.uint8)
        else:
            mask_i = (m > 0).astype(np.uint8)
    elif "segmentation" in ann and ann["segmentation"]:
        seg = ann["segmentation"]
        if isinstance(seg, dict):
            mask_dec = mask_utils.decode(seg)
            if mask_dec.ndim == 3:
                mask_dec = mask_dec[:, :, 0]
            mask_i = (mask_dec > 0).astype(np.uint8)
        elif isinstance(seg, list) and len(seg) > 0:
            temp = np.zeros((h, w), dtype=np.uint8)
            polys = seg if isinstance(seg[0], list) else [seg]
            for poly in polys:
                pts = np.array(poly).reshape(-1, 2).astype(np.int32)
                cv2.fillPoly(temp, [pts], 1)
            mask_i = (temp > 0).astype(np.uint8)
    if mask_i.sum() == 0:
        continue
    segmentation_img[mask_i > 0] = label
    bbox = ann.get("bbox", None)
    if bbox is None:
        continue
    ann_bboxes.append(list(bbox))
    class_ids.append(label)

rr.log("segmented/masks", rr.SegmentationImage(segmentation_img))
if ann_bboxes:
    rr.log(
        "segmented/boxes",
        rr.Boxes2D(
            array=np.asarray(ann_bboxes, dtype=np.float32),
            array_format=rr.Box2DFormat.XYWH,
            class_ids=np.asarray(class_ids, dtype=np.int32),
        ),
    )