Planar Pose Estimation from RGB Images
SUMMARY
Estimate planar 6-DOF object poses from a single RGB image. The pipeline segments foreground objects, detects contours, computes centroid and in-plane orientation using PCA, and backprojects the result to the robot base frame using camera calibration.
Overview
This example demonstrates how to estimate planar 6-DOF object poses in the robot base frame from a single RGB image when parts lie on a planar surface.
The pipeline segments the foreground, detects individual objects within the mask, and computes each object's centroid and in-plane orientation using PCA. The resulting pixel coordinates and orientation are then backprojected using camera intrinsics and the tool–camera transform to obtain a pose in the robot base frame.
This approach provides a lightweight vision pipeline for applications such as bin picking, part placement, and robotic manipulation, where objects lie on a known plane.
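Because the parts lie on a known plane, a single pixel plus the capture height is enough to recover a 3D point. A minimal sketch of the pinhole backprojection in plain Python (the fx/fy/cx/cy intrinsics here are placeholder values; the actual Medulla projector also handles lens distortion):

```python
def backproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Pinhole model: pixel (u, v) at a known depth -> camera-frame 3D point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# A pixel at the principal point maps straight down the optical axis:
backproject_pixel(320.0, 240.0, 0.5, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
# -> (0.0, 0.0, 0.5)
```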
Inputs
- RGB image (e.g. from a camera above the bin or part)
- Camera intrinsics, distortion, and tool–camera transform; capture height (depth) for backprojection
- Reference axis for in-plane angle (e.g. ZERO_DEGREE_AXIS in config)
- Optional: base–tool0 transform (e.g. from robot) for poses in robot base frame
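A hypothetical config_params layout matching the keys used in the code below; every value is a placeholder for illustration, not real calibration data:

```python
import numpy as np

config_params = {
    # 3x3 pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (placeholders)
    "CAMERA_INTRINSICS": np.array([[600.0, 0.0, 320.0],
                                   [0.0, 600.0, 240.0],
                                   [0.0, 0.0, 1.0]]),
    "CAMERA_DISTORTION": np.zeros(5),   # k1, k2, p1, p2, k3
    "TOOL0_T_CAMERA": np.eye(4),        # hand-eye calibration result (placeholder)
    "CAPTURE_HEIGHT": 0.5,              # camera-to-plane distance in metres
    "ZERO_DEGREE_AXIS": (1.0, 0.0),     # image-plane reference axis for 0 rad
    "ROT_TYPE": "rad",
}
```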
Required Telekinesis Skills / Packages
- Cornea — Segment Image Using Foreground BiRefNet for foreground segmentation
- Retina — Detect Contours for object instances in the mask
- Pupil — PCA (e.g. morphology.PCAComputer) for centroid and principal axis
- Medulla — pixel-to-camera projection (e.g. PixelToCameraPointProjector)
Optional: Rerun for visualization (segmentation, bboxes, PCA).
Use Cases
Planar pose estimation from a single camera
Segment foreground objects, compute centroid and orientation with PCA, and backproject to obtain a 6-DOF pose.
Single-object pose estimation
One part in view → segmentation + PCA + backprojection → single 6-DOF pose.
Multi-object pose estimation
Multiple parts → contour detection + PCA per object → list of 6-DOF poses.
Input-Output
Input: a single RGB image plus camera calibration and config. Output: [x, y, z, rx, ry, rz] in the robot base frame, or a list of such poses (one per detected object). Optional Rerun logs.
The Pipeline
The pipeline estimates planar object poses by combining foreground segmentation, contour-based object detection, PCA-based orientation estimation, and camera backprojection.
Capture Image (or load RGB)
↓
Segment Image Using Foreground BiRefNet
↓
Detect Contours (objects in mask)
↓
Optional: Multi-Object Selection (e.g. closest to center)
↓
PCA (centroid, principal axis, in-plane angle vs reference axis)
↓
Pixel → Camera projection
↓
6-DOF pose in camera frame
- Segment Image Using Foreground BiRefNet — Foreground segmentation to obtain a binary mask of the part(s).
- Detect Contours — Detect individual object regions in the segmentation mask (bbox + isolated mask per instance).
- Orientation — PCA (Pupil) for centroid and principal axis; in-plane angle from the PCA eigenvector vs a reference axis (e.g. ZERO_DEGREE_AXIS).
- Backproject — Pixel + depth + angle → 3D point in camera frame (e.g. Medulla projection); transform through tool–camera and base–tool0 to get the 6-DOF pose in the robot base frame.
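To make the orientation step concrete, here is a plain-Python sketch of PCA over mask pixel coordinates and the signed-angle formula against a reference axis. This is an illustration of what PCAComputer presumably computes, not its actual implementation:

```python
import math

def principal_axis(points):
    """Centroid and dominant eigenvector of 2D points via the covariance matrix."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / n
    syy = sum((p[1] - cy) ** 2 for p in points) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]] and a matching eigenvector.
    lam = 0.5 * (sxx + syy + math.hypot(sxx - syy, 2 * sxy))
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy) or 1.0
    return (cx, cy), (vx / norm, vy / norm)

def signed_angle(ref, vec):
    """Signed in-plane angle (rad) from ref to vec via atan2(cross, dot)."""
    dot = ref[0] * vec[0] + ref[1] * vec[1]
    cross = ref[0] * vec[1] - ref[1] * vec[0]
    return math.atan2(cross, dot)

# Pixels of an elongated blob lying along the image x-axis:
pts = [(x, y) for x in range(10) for y in range(3)]
centroid, axis = principal_axis(pts)
angle = signed_angle((1.0, 0.0), axis)  # ~0 rad: blob aligned with reference axis
```

Note the sign ambiguity of PCA eigenvectors: the principal axis is only defined up to 180°, so downstream grasp logic may need to normalise the angle.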
The Code
The pose estimation pipeline: object detection in mask, PCA for centroid and angle, and backprojection. Entry points are estimate_single_object_planar_pose (one pose from one image) and estimate_object_poses (list of poses from one image). Config (camera intrinsics, TOOL0_T_CAMERA, ZERO_DEGREE_AXIS, etc.) is in config_params; robot_base_T_tool0 can be obtained from the robot or passed in.
import numpy as np
from typing import Optional
from telekinesis import cornea, retina
from pupil import morphology
from medulla import projection
from tf import tfutils
# config_params: CAMERA_INTRINSICS, CAMERA_DISTORTION, TOOL0_T_CAMERA,
# CAMERA_T_TOOL0_ROTATION, CAPTURE_HEIGHT, ZERO_DEGREE_AXIS, ROT_TYPE, etc.
# robot_base_T_tool0: 4x4 from robot or lookup_transform("base", "tool0")
def detect_objects_in_mask(
mask: np.ndarray,
config_params: dict,
min_area: int = 20_000,
max_area: int = 10_000_000,
) -> list[dict]:
"""Find individual objects in a binary mask; return list of {bbox, isolated_mask}."""
annotations = retina.detect_contours(
image=mask,
retrieval_mode="retrieve_external",
approx_method="chain_approximate_simple",
min_area=min_area,
max_area=max_area,
)
annotations = annotations.to_list()
object_infos = []
for ann in annotations:
if not ann["geometry"]["points"]:
continue
x, y, w, h = ann["bbox"]
x, y, w, h = int(x), int(y), int(w), int(h)
isolated_mask = np.zeros_like(mask)
isolated_mask[y : y + h, x : x + w] = mask[y : y + h, x : x + w]
object_infos.append({"bbox": [x, y, w, h], "isolated_mask": isolated_mask})
return object_infos
def compute_object_angle_pca(
config_params: dict,
eigenvectors: np.ndarray,
) -> float:
"""In-plane angle (rad) from PCA principal axis vs ZERO_DEGREE_AXIS."""
ax, ay = config_params["ZERO_DEGREE_AXIS"]
bx, by = eigenvectors[:, 0]
dot = ax * bx + ay * by
cross = ax * by - ay * bx
return float(np.arctan2(cross, dot))
def backproject_to_robot_base(
    x_pixel: float,
    y_pixel: float,
    angle: float,
    config_params: dict,
    robot_base_T_tool0: np.ndarray,
    hardcoded_height: Optional[float] = None,
    return_rot_type: str = "rad",
) -> np.ndarray:
    """Pixel + depth + angle → 6-DOF pose in the robot base frame."""
    projector = projection.PixelToCameraPointProjector(
        config_params["CAMERA_INTRINSICS"],
        config_params["CAMERA_DISTORTION"],
    )
    # hardcoded_height overrides the configured capture height when provided.
    depth = hardcoded_height if hardcoded_height is not None else config_params["CAPTURE_HEIGHT"]
    camera_T_point = projector.execute(
        np.array([x_pixel, y_pixel]),
        depth=depth,
        theta=angle,
    )
    # Chain the camera-frame point through hand-eye and robot transforms.
    robot_base_T_point = (
        robot_base_T_tool0 @ config_params["TOOL0_T_CAMERA"] @ camera_T_point
    )
    pose = tfutils.transformation_matrix_to_pose(robot_base_T_point)
    if return_rot_type == "deg":
        pose = np.asarray(pose, dtype=float)
        pose[3:] = np.degrees(pose[3:])
    return pose
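The frame chain above is just a product of homogeneous transforms: base ← tool0 ← camera ← point. A minimal sketch with 4x4 matrices in plain Python (the translations are placeholder values, not real calibration):

```python
def matmul4(a, b):
    """4x4 homogeneous-matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """Pure-translation homogeneous transform."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

robot_base_T_tool0 = translation(0.2, 0.0, 0.5)   # placeholder robot pose
tool0_T_camera = translation(0.0, 0.05, 0.1)      # placeholder hand-eye offset
camera_T_point = translation(0.01, -0.02, 0.4)    # backprojected point

robot_base_T_point = matmul4(matmul4(robot_base_T_tool0, tool0_T_camera),
                             camera_T_point)
# Translation column of the result: (0.21, 0.03, 1.0)
```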
def estimate_single_object_planar_pose(
image: np.ndarray,
config_params: dict,
robot_base_T_tool0: np.ndarray,
return_rot_type: str = "rad",
hardcoded_height: Optional[float] = None,
include_angle: bool = True,
) -> Optional[np.ndarray]:
"""
One image → one 6-DOF pose (or None). If multiple objects, selects closest to image center.
"""
seg = cornea.segment_using_foreground_birefnet(image=image)
mask = seg.to_dict()["labeled_mask"]
object_infos = detect_objects_in_mask(mask, config_params)
if not object_infos:
return None
if len(object_infos) > 1:
h, w = mask.shape[:2]
center = np.array([w / 2, h / 2])
distances = [
np.linalg.norm(
np.array([obj["bbox"][0] + obj["bbox"][2] / 2, obj["bbox"][1] + obj["bbox"][3] / 2])
- center
)
for obj in object_infos
]
idx = int(np.argmin(distances))
mask = object_infos[idx]["isolated_mask"]
else:
mask = object_infos[0]["isolated_mask"]
pca_comp = morphology.PCAComputer()
centroid, eigenvectors, eigenvalues, _ = pca_comp.execute(mask)
angle = (
compute_object_angle_pca(config_params, eigenvectors)
if include_angle
else 0.0
)
return backproject_to_robot_base(
centroid[0],
centroid[1],
angle,
config_params,
robot_base_T_tool0=robot_base_T_tool0,
hardcoded_height=hardcoded_height,
return_rot_type=return_rot_type,
)
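When several objects are detected, the tie-break above keeps the one whose bounding-box centre is nearest the image centre. As a standalone sketch (bboxes are [x, y, w, h] in pixels, hypothetical values):

```python
import math

def closest_to_center(bboxes, image_w, image_h):
    """Index of the bbox whose centre is nearest the image centre."""
    cx, cy = image_w / 2, image_h / 2
    def dist(b):
        x, y, w, h = b
        return math.hypot(x + w / 2 - cx, y + h / 2 - cy)
    return min(range(len(bboxes)), key=lambda i: dist(bboxes[i]))

bboxes = [[10, 10, 50, 50], [300, 220, 40, 40], [500, 400, 60, 60]]
closest_to_center(bboxes, 640, 480)  # -> 1 (its centre is exactly (320, 240))
```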
def estimate_object_poses(
image: np.ndarray,
config_params: dict,
robot_base_T_tool0: np.ndarray,
return_rot_type: str = "rad",
hardcoded_height: Optional[float] = None,
include_angle: bool = False,
) -> list[np.ndarray]:
"""
One image → list of 6-DOF poses (one per detected object in the mask).
"""
seg = cornea.segment_using_foreground_birefnet(image=image)
mask = seg.to_dict()["labeled_mask"]
object_infos = detect_objects_in_mask(mask, config_params)
poses = []
for obj in object_infos:
iso = obj["isolated_mask"]
pca_comp = morphology.PCAComputer()
centroid, eigenvectors, eigenvalues, _ = pca_comp.execute(iso)
angle = (
compute_object_angle_pca(config_params, eigenvectors)
if include_angle
else 0.0
)
pose = backproject_to_robot_base(
centroid[0],
centroid[1],
angle,
config_params,
robot_base_T_tool0=robot_base_T_tool0,
hardcoded_height=hardcoded_height,
return_rot_type=return_rot_type,
)
poses.append(pose)
    return poses
For complete applications that combine this pipeline with robot motion, multiple scan poses, or downstream tasks, see the telekinesis-examples repository.

