Object Pose Estimation from RGBD Using Mask
SUMMARY
Estimate a single 3D object pose (centroid) from a point cloud by filtering points with a 2D mask (e.g. from an RGB segmentation), then computing the centroid of the remaining points. Uses Vitreous for mask filtering and centroid computation; Rerun for 3D visualization.
Overview
This example shows how to get a 3D pose (centroid) for one object when you have an RGBD point cloud and a binary mask that isolates that object in image space—for example from SAM, a detector, or a hand-drawn ROI.
The pipeline loads a PLY point cloud and a mask image, keeps only points that project into the mask region, and computes the 3D centroid of those points as the object pose. Results are visualized in Rerun (input cloud, filtered cloud, centroid, and coordinate frame).
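When the point cloud is organized (one 3D point per image pixel), the mask filter reduces to boolean indexing and the pose to a mean. A minimal NumPy sketch with synthetic data standing in for a real depth camera (the array shapes and values are illustrative assumptions, not part of this example's data):

```python
import numpy as np

H, W = 4, 6                                   # tiny "image" for illustration
points = np.zeros((H, W, 3), dtype=np.float32)
points[..., 2] = 1.0                          # every point at depth z = 1
points[1:3, 2:4, 0] = 0.5                     # object region shifted in x

mask = np.zeros((H, W), dtype=bool)
mask[1:3, 2:4] = True                         # "segmented" object pixels

# Keep only the points whose pixel falls inside the mask, then average them.
object_points = points[mask]                  # shape (4, 3)
centroid = object_points.mean(axis=0)         # [0.5, 0.0, 1.0]
```

Vitreous wraps this logic (plus handling of unorganized clouds) behind `filter_point_cloud_using_mask` and `calculate_point_cloud_centroid`.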
Inputs
- Path to a PLY point cloud file (aligned with the camera that produced the mask)
- Path to a binary mask image (same resolution as the depth/image used to build the point cloud)
Required Telekinesis Skills
- Vitreous — filter point cloud using mask, calculate point cloud centroid
Optional: Rerun for visualization.
Use Cases
- Single-object 3D pose — Segment an object in the RGB image (e.g. with SAM), then get its 3D centroid from the point cloud.
- RGB + depth fusion — Combine 2D segmentation with depth to get 3D position and extent.
- Grasping / picking — Use the centroid as a 3D target for the end-effector.
The Pipeline
This pipeline focuses on 3D pose estimation by filtering the point cloud with a 2D mask and computing the centroid of the remaining points.
Load Point Cloud (PLY) & Mask Image
↓
Filter Point Cloud Using Mask
↓
Calculate Point Cloud Centroid
↓
Estimated 3D Pose (centroid)

- Filter Point Cloud Using Mask — Keeps only points that project into the mask region (image-aligned).
- Calculate Point Cloud Centroid — Computes the 3D centroid of the filtered points as the pose.
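For an unorganized cloud, "projects into the mask region" means reprojecting each 3D point through the camera intrinsics and testing the resulting pixel against the mask. Vitreous does this internally; the sketch below is only an approximation assuming a pinhole model with no distortion (`fx`, `fy`, `cx`, `cy` are hypothetical intrinsics, not part of this example's API):

```python
import numpy as np

def filter_by_mask(points, mask, fx, fy, cx, cy):
    """Keep (N, 3) camera-frame points whose pinhole projection lands on a True mask pixel."""
    z = points[:, 2]
    keep = np.zeros(len(points), dtype=bool)
    front = z > 0                                            # only points in front of the camera
    u = np.floor(points[front, 0] * fx / z[front] + cx).astype(int)
    v = np.floor(points[front, 1] * fy / z[front] + cy).astype(int)
    h, w = mask.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)             # projection lies inside the image
    idx = np.flatnonzero(front)[ok]
    keep[idx] = mask[v[ok], u[ok]]                           # pixel must be inside the mask
    return points[keep]
```

The centroid of the returned points is then just `points.mean(axis=0)`, which is what the pipeline reports as the pose.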
The Code
The visualization helper is defined first. The pipeline lives in get_3d_pose_using_mask(); each Vitreous step is followed by the corresponding visualization call. main() takes no arguments; the input paths are module-level constants.
import numpy as np
from loguru import logger
import rerun as rr
import rerun.blueprint as rrb
import pathlib
from datatypes import datatypes
from datatypes import io
from telekinesis import vitreous
ROOT_DIR = pathlib.Path(__file__).parent
DATA_DIR = ROOT_DIR / "data"
PLY_PATH = DATA_DIR / "point_clouds/test_point_cloud_001.ply"
MASK_PATH = DATA_DIR / "masks/filtered_mask_001.png"
FRAME_SIZE = 0.02 # axis length for coordinate frame
# -----------------------------------------------------------------------------
# Visualization
# -----------------------------------------------------------------------------
def visualize_mask_filtered(point_cloud, filtered_point_cloud, centroid: np.ndarray) -> None:
    """Log input cloud, filtered cloud, and centroid with coordinate frame to Rerun."""
    rr.init("object_pose_estimation_mask", spawn=True)
    blueprint = rrb.Blueprint(
        rrb.Vertical(
            contents=[
                rrb.Spatial3DView(
                    name="input point cloud",
                    origin="/input",
                    contents=["/input/points"],
                ),
                rrb.Spatial3DView(
                    name="filtered point cloud",
                    origin="/filtered",
                    contents=["/filtered/points", "/filtered/centroids/**"],
                ),
            ],
        )
    )
    rr.send_blueprint(blueprint)

    # Raw input cloud and the mask-filtered subset.
    rr.log("/input/points", rr.Points3D(positions=point_cloud.positions, colors=point_cloud.colors))
    rr.log("/filtered/points", rr.Points3D(
        positions=filtered_point_cloud.positions,
        colors=filtered_point_cloud.colors,
    ))

    # Centroid as a single red point.
    rr.log("/filtered/centroids", rr.Points3D(
        positions=[centroid],
        colors=[255, 0, 0],
        radii=0.001,
    ))

    # RGB coordinate frame (x/y/z axes) anchored at the centroid.
    origins = np.array([centroid, centroid, centroid])
    vectors = np.array([
        [FRAME_SIZE, 0, 0],
        [0, FRAME_SIZE, 0],
        [0, 0, FRAME_SIZE],
    ])
    colors = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]])
    rr.log("/filtered/centroids/frame", rr.Arrows3D(
        origins=origins, vectors=vectors, colors=colors,
    ))
# -----------------------------------------------------------------------------
# Pipeline
# -----------------------------------------------------------------------------
def get_3d_pose_using_mask():
    """Load PLY and mask, filter point cloud by mask, compute centroid; then visualize."""
    # 1. Load mask and point cloud
    filtered_mask = io.load_image(str(MASK_PATH), as_binary=True)
    logger.info(f"Loaded mask from {MASK_PATH}")
    point_cloud = io.load_point_cloud(
        filepath=str(PLY_PATH),
        remove_nan_points=False,
        remove_duplicated_points=False,
        remove_infinite_points=False,
    )
    logger.info(f"Loaded point cloud with {len(point_cloud.positions)} points")

    # 2. Filter point cloud using mask (Vitreous)
    filtered_point_cloud = vitreous.filter_point_cloud_using_mask(
        point_cloud=point_cloud,
        mask=filtered_mask,
    )
    logger.info(f"Filtered point cloud has {len(filtered_point_cloud.positions)} points")

    # 3. Centroid (Vitreous)
    centroid = vitreous.calculate_point_cloud_centroid(point_cloud=filtered_point_cloud)
    centroid = np.asarray(centroid, dtype=np.float32)
    logger.success("Estimated 3D pose (centroid) from mask-filtered point cloud")

    # Visualization
    visualize_mask_filtered(point_cloud, filtered_point_cloud, centroid)
    return point_cloud, filtered_point_cloud, centroid
def main():
    get_3d_pose_using_mask()


if __name__ == "__main__":
    main()
