
Object Pose Estimation from RGBD Using Bounding Box

SUMMARY

Estimate 3D object poses (centroids) from a point cloud by defining a 3D bounding box ROI, filtering points inside it, clustering with DBSCAN, and computing the centroid of each cluster. Uses Vitreous for filtering, clustering, and centroid computation; Rerun for 3D visualization.

Overview

This example shows how to get 3D pose estimates (centroids) from an RGBD point cloud when you have a known region of interest (ROI) as a 3D bounding box—for example from a CAD model or a rough workspace volume.

The pipeline loads a PLY point cloud, keeps only points inside the box, clusters the remaining points with DBSCAN to separate individual objects, and computes the centroid of each cluster as a 3D pose. Results are visualized in Rerun (input cloud, bbox, filtered cloud, centroids, and coordinate frames).
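The bounding-box filter at the heart of the pipeline is just an axis-aligned containment test. As a minimal NumPy sketch (the values below reuse the example's bbox constants; the actual `vitreous.filter_point_cloud_using_bounding_box` implementation may differ), a point is inside the box when its offset from the center is within the half-size along every axis:

```python
import numpy as np

# Bbox constants from this example (center + half-sizes of the ROI).
center = np.array([0.132, -0.0909, 0.741])
half_size = np.array([0.11, 0.1, 0.005])

# Synthetic stand-in for a loaded point cloud: random points near the ROI.
points = center + np.random.default_rng(0).uniform(-0.2, 0.2, size=(1000, 3))

# A point lies inside the axis-aligned box iff |p - center| <= half_size
# holds on all three axes.
inside = np.all(np.abs(points - center) <= half_size, axis=1)
filtered = points[inside]
```

Clustering and centroid computation then operate only on `filtered`, which is what makes the ROI an effective way to cut data volume before DBSCAN.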

Inputs

  • Path to a PLY point cloud file
  • Bounding box center and half-sizes (3D ROI)

Required Telekinesis Skills

  • Vitreous — filter by bounding box, DBSCAN clustering, point cloud centroid

Optional: Rerun for visualization.

Use Cases

  • Bin picking / tabletop — Restrict to a table or bin volume, then cluster and get object centroids.
  • Multi-object 3D pose — When multiple objects lie in a known 3D region, get one centroid per object.
  • ROI-based processing — Reduce data and compute cost by only processing points inside a box.

Input-Output

Input
PLY point cloud; 3D bounding box (center + half-sizes).
Output
Filtered point cloud; list of 3D centroids (one per cluster); Rerun view with input cloud, bbox, filtered points, centroids, and coordinate frames.

The Pipeline

This pipeline focuses on 3D pose estimation by filtering the point cloud with a bounding box, clustering, and computing one centroid per cluster.

text
Load Point Cloud (PLY)
        ↓
Filter Point Cloud Using Bounding Box
        ↓
Cluster Point Cloud Using DBSCAN
        ↓
Calculate Point Cloud Centroid (per cluster)
        ↓
Estimated 3D Poses (centroids)
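The last two steps reduce to taking one mean position per cluster. A minimal NumPy sketch, using hypothetical labels in the shape DBSCAN would assign them (`-1` marks noise, `0..k-1` mark clusters; the Vitreous skills wrap this in their own datatypes):

```python
import numpy as np

# Toy point cloud: two tight clusters plus one noise point.
points = np.array([
    [0.0, 0.0, 0.0], [0.01, 0.0, 0.0],    # cluster 0
    [1.0, 1.0, 1.0], [1.01, 1.0, 1.0],    # cluster 1
    [5.0, 5.0, 5.0],                       # noise
])
# Hypothetical DBSCAN output: -1 = noise, otherwise the cluster index.
labels = np.array([0, 0, 1, 1, -1])

# One centroid per cluster: mean of that cluster's points, noise excluded.
centroids = np.array([
    points[labels == k].mean(axis=0)
    for k in sorted(set(labels) - {-1})
])
```

Here `centroids` has shape `(2, 3)`, one 3D pose estimate per cluster, which mirrors what the pipeline returns for the filtered cloud.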

The Code

The visualization helper is defined first. The pipeline lives in get_3d_pose_using_bounding_box(); each Vitreous step logs its intermediate result for visualization at the end. main() takes no arguments; the input path and bounding-box parameters are module-level constants.

python
import pathlib

import numpy as np
import rerun as rr
import rerun.blueprint as rrb
from loguru import logger

from datatypes import datatypes
from datatypes import io
from telekinesis import vitreous

ROOT_DIR = pathlib.Path(__file__).parent
DATA_DIR = ROOT_DIR / "data"
PLY_PATH = DATA_DIR / "point_clouds/test_point_cloud_001.ply"
BBOX_CENTER = [0.132, -0.0909, 0.741]
BBOX_HALF_SIZE = [0.11, 0.1, 0.005]
FRAME_SIZE = 0.02  # axis length for coordinate frames

# -----------------------------------------------------------------------------
# Visualization
# -----------------------------------------------------------------------------

def visualize_bbox_filtered(
    point_cloud,
    bbox: datatypes.Boxes3D,
    filtered_point_cloud,
    centroids: np.ndarray,
) -> None:
    """Log input cloud, bbox, filtered cloud, and centroids with coordinate frames to Rerun."""
    rr.init("object_pose_estimation_bbox", spawn=True)
    blueprint = rrb.Blueprint(
        rrb.Vertical(
            contents=[
                rrb.Spatial3DView(
                    name="input point cloud",
                    origin="/input",
                    contents=["/input/points", "/input/bbox"],
                ),
                rrb.Spatial3DView(
                    name="filtered point cloud",
                    origin="/filtered",
                    contents=["/filtered/points", "/filtered/centroids/**"],
                ),
            ],
        )
    )
    rr.send_blueprint(blueprint)

    rr.log("/input/points", rr.Points3D(positions=point_cloud.positions, colors=point_cloud.colors))
    rr.log("/input/bbox", rr.Boxes3D(centers=bbox.centers, half_sizes=bbox.half_sizes))
    rr.log("/filtered/points", rr.Points3D(
        positions=filtered_point_cloud.positions,
        colors=filtered_point_cloud.colors,
    ))

    if len(centroids) > 0:
        rr.log("/filtered/centroids", rr.Points3D(
            positions=centroids,
            colors=[255, 0, 0],
            radii=0.001,
        ))
        for idx, centroid in enumerate(centroids):
            origins = np.array([centroid, centroid, centroid])
            vectors = np.array([
                [FRAME_SIZE, 0, 0],
                [0, FRAME_SIZE, 0],
                [0, 0, FRAME_SIZE],
            ])
            colors = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]])
            rr.log(f"/filtered/centroids/frame_{idx}", rr.Arrows3D(
                origins=origins, vectors=vectors, colors=colors,
            ))


# -----------------------------------------------------------------------------
# Pipeline
# -----------------------------------------------------------------------------

def get_3d_pose_using_bounding_box():
    """Load PLY, filter by bbox, cluster with DBSCAN, compute centroid per cluster; then visualize."""
    # 1. Load point cloud
    point_cloud = io.load_point_cloud(str(PLY_PATH))
    logger.info(f"Loaded point cloud with {len(point_cloud.positions)} points")

    # 2. Filter using bounding box (Vitreous)
    bbox = datatypes.Boxes3D(
        centers=np.array([BBOX_CENTER]),
        half_sizes=np.array([BBOX_HALF_SIZE]),
    )
    filtered_point_cloud = vitreous.filter_point_cloud_using_bounding_box(
        point_cloud=point_cloud,
        bbox=bbox,
    )
    logger.info(f"Filtered point cloud has {len(filtered_point_cloud.positions)} points")

    # 3. Cluster with DBSCAN (Vitreous)
    clusters = vitreous.cluster_point_cloud_using_dbscan(
        point_cloud=filtered_point_cloud,
        max_distance=0.00065,
        min_points=8,
    )
    logger.info(f"DBSCAN resulted in {len(clusters)} clusters")

    # 4. Centroid per cluster (Vitreous)
    centroids = []
    for cluster in clusters.to_list():
        c = vitreous.calculate_point_cloud_centroid(cluster)
        centroids.append(c)
    centroids = np.asarray(centroids, dtype=np.float32)
    logger.success(f"Estimated {len(centroids)} 3D poses (centroids)")

    # Visualization
    visualize_bbox_filtered(point_cloud, bbox, filtered_point_cloud, centroids)

    return point_cloud, filtered_point_cloud, centroids


def main():
    get_3d_pose_using_bounding_box()


if __name__ == "__main__":
    main()