Object Pose Estimation from RGBD Using Mask

SUMMARY

Estimate a single 3D object pose (centroid) from a point cloud by filtering points with a 2D mask (e.g. from an RGB segmentation), then computing the centroid of the remaining points. Uses Vitreous for mask filtering and centroid computation; Rerun for 3D visualization.

Overview

This example shows how to get a 3D pose (centroid) for one object when you have an RGBD point cloud and a binary mask that isolates that object in image space—for example from SAM, a detector, or a hand-drawn ROI.

The pipeline loads a PLY point cloud and a mask image, keeps only points that project into the mask region, and computes the 3D centroid of those points as the object pose. Results are visualized in Rerun (input cloud, filtered cloud, centroid, and coordinate frame).

Inputs

  • Path to a PLY point cloud file (aligned with the camera that produced the mask)
  • Path to a binary mask image (same resolution as the depth/image used to build the point cloud)
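The mask can come from any source: SAM, a detector, or a hand-drawn ROI. As a stand-in for a real segmenter, here is a minimal NumPy sketch that builds a binary mask from a bounding box; `mask_from_bbox` is a hypothetical helper, and saving the array to PNG with your preferred image library is left out.

```python
import numpy as np

def mask_from_bbox(height: int, width: int, bbox: tuple[int, int, int, int]) -> np.ndarray:
    """Build a binary mask (uint8, 0 or 255) from an (x0, y0, x1, y1) box.

    A hand-drawn-ROI stand-in for a real segmenter such as SAM.
    """
    x0, y0, x1, y1 = bbox
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 255  # foreground pixels inside the box
    return mask

# 640x480 image, box covering a 200x240 pixel region
mask = mask_from_bbox(480, 640, (100, 120, 300, 360))
print(mask.shape, int(mask.max()))  # -> (480, 640) 255
```

Whatever produces the mask, it must have the same resolution as the image used to build the point cloud, so each mask pixel corresponds to one point.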

Required Telekinesis Skills

  • Vitreous — filter point cloud using mask, calculate point cloud centroid

Optional: Rerun for visualization.

Use Cases

  • Single-object 3D pose — Segment an object in the RGB image (e.g. with SAM), then get its 3D centroid from the point cloud.
  • RGB + depth fusion — Combine 2D segmentation with depth to get 3D position and extent.
  • Grasping / picking — Use the centroid as a 3D target for the end-effector.
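For the grasping case, the centroid is expressed in the camera frame, so it must be mapped into the robot base frame before it can serve as an end-effector target. A minimal sketch, assuming a known hand-eye calibration; `T_BASE_CAM` and `centroid_to_base` are hypothetical names, and the identity rotation is purely illustrative:

```python
import numpy as np

# Hypothetical hand-eye calibration: the camera frame expressed in the robot
# base frame as a 4x4 homogeneous transform (here: no rotation, pure offset).
T_BASE_CAM = np.eye(4)
T_BASE_CAM[:3, 3] = [0.5, 0.0, 0.4]  # camera 0.5 m forward, 0.4 m up

def centroid_to_base(centroid_cam: np.ndarray, T_base_cam: np.ndarray) -> np.ndarray:
    """Map a 3D centroid from the camera frame into the robot base frame."""
    p = np.append(centroid_cam, 1.0)   # homogeneous coordinates
    return (T_base_cam @ p)[:3]

target = centroid_to_base(np.array([0.1, -0.05, 0.6]), T_BASE_CAM)
# target is the grasp position in base coordinates
```

With a real calibration, `T_BASE_CAM[:3, :3]` would hold the camera's rotation relative to the base as well.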

Input-Output

  • Input: PLY point cloud; binary mask image (both given as paths).
  • Output: filtered point cloud (points under the mask); a single 3D centroid; Rerun view with input cloud, filtered points, centroid, and coordinate frame.

The Pipeline

This pipeline focuses on 3D pose estimation by filtering the point cloud with a 2D mask and computing the centroid of the remaining points.

text
Load Point Cloud (PLY) & Mask Image
        ↓
Filter Point Cloud Using Mask
        ↓
Calculate Point Cloud Centroid
        ↓
Estimated 3D Pose (centroid)
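For intuition, the two Vitreous steps can be approximated in plain NumPy, assuming an organized point cloud (one point per pixel, stored row-major, same resolution as the mask). `filter_and_centroid` is a hypothetical helper, not the Vitreous implementation:

```python
import numpy as np

def filter_and_centroid(positions: np.ndarray, mask: np.ndarray):
    """Keep points whose pixel falls inside the mask, then average them.

    Assumes an organized cloud: positions has shape (H*W, 3), row-major,
    so positions[v * W + u] is the point behind pixel (u, v).
    """
    keep = mask.reshape(-1).astype(bool)                  # per-point flags
    filtered = positions[keep]
    finite = filtered[np.isfinite(filtered).all(axis=1)]  # drop NaN/inf depth
    centroid = finite.mean(axis=0)
    return filtered, centroid

# Toy 2x2 cloud: only the left column of pixels is masked in
positions = np.array([[0., 0., 1.], [1., 0., 1.], [0., 1., 2.], [1., 1., 2.]])
mask = np.array([[1, 0], [1, 0]], dtype=np.uint8)
filtered, centroid = filter_and_centroid(positions, mask)
print(centroid)  # -> [0.  0.5 1.5]
```

The real `vitreous.filter_point_cloud_using_mask` may handle projection and invalid points differently; this only illustrates the shape of the computation.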

The Code

Visualization is defined at the top. The pipeline is in get_3d_pose_using_mask(); each Vitreous step is followed by the relevant visualization. main() has no arguments; paths are constants.

python
import numpy as np
from loguru import logger
import rerun as rr
import rerun.blueprint as rrb
import pathlib

from datatypes import datatypes
from datatypes import io
from telekinesis import vitreous

ROOT_DIR = pathlib.Path(__file__).parent
DATA_DIR = ROOT_DIR / "data"
PLY_PATH = DATA_DIR / "point_clouds/test_point_cloud_001.ply"
MASK_PATH = DATA_DIR / "masks/filtered_mask_001.png"
FRAME_SIZE = 0.02  # axis length for coordinate frame

# -----------------------------------------------------------------------------
# Visualization
# -----------------------------------------------------------------------------

def visualize_mask_filtered(point_cloud, filtered_point_cloud, centroid: np.ndarray) -> None:
    """Log input cloud, filtered cloud, and centroid with coordinate frame to Rerun."""
    rr.init("object_pose_estimation_mask", spawn=True)
    blueprint = rrb.Blueprint(
        rrb.Vertical(
            contents=[
                rrb.Spatial3DView(
                    name="input point cloud",
                    origin="/input",
                    contents=["/input/points"],
                ),
                rrb.Spatial3DView(
                    name="filtered point cloud",
                    origin="/filtered",
                    contents=["/filtered/points", "/filtered/centroids/**"],
                ),
            ],
        )
    )
    rr.send_blueprint(blueprint)

    rr.log("/input/points", rr.Points3D(positions=point_cloud.positions, colors=point_cloud.colors))
    rr.log("/filtered/points", rr.Points3D(
        positions=filtered_point_cloud.positions,
        colors=filtered_point_cloud.colors,
    ))
    rr.log("/filtered/centroids", rr.Points3D(
        positions=[centroid],
        colors=[255, 0, 0],
        radii=0.001,
    ))
    origins = np.array([centroid, centroid, centroid])
    vectors = np.array([
        [FRAME_SIZE, 0, 0],
        [0, FRAME_SIZE, 0],
        [0, 0, FRAME_SIZE],
    ])
    colors = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]])
    rr.log("/filtered/centroids/frame", rr.Arrows3D(
        origins=origins, vectors=vectors, colors=colors,
    ))


# -----------------------------------------------------------------------------
# Pipeline
# -----------------------------------------------------------------------------

def get_3d_pose_using_mask():
    """Load PLY and mask, filter point cloud by mask, compute centroid; then visualize."""
    # 1. Load mask and point cloud
    filtered_mask = io.load_image(str(MASK_PATH), as_binary=True)
    logger.info(f"Loaded mask from {MASK_PATH}")

    point_cloud = io.load_point_cloud(
        filepath=str(PLY_PATH),
        remove_nan_points=False,
        remove_duplicated_points=False,
        remove_infinite_points=False,
    )
    logger.info(f"Loaded point cloud with {len(point_cloud.positions)} points")

    # 2. Filter point cloud using mask (Vitreous)
    filtered_point_cloud = vitreous.filter_point_cloud_using_mask(
        point_cloud=point_cloud,
        mask=filtered_mask,
    )
    logger.info(f"Filtered point cloud has {len(filtered_point_cloud.positions)} points")

    # 3. Centroid (Vitreous)
    centroid = vitreous.calculate_point_cloud_centroid(point_cloud=filtered_point_cloud)
    centroid = np.asarray(centroid, dtype=np.float32)
    logger.success("Estimated 3D pose (centroid) from mask-filtered point cloud")

    # Visualization
    visualize_mask_filtered(point_cloud, filtered_point_cloud, centroid)

    return point_cloud, filtered_point_cloud, centroid


def main():
    get_3d_pose_using_mask()


if __name__ == "__main__":
    main()