Planar Pose Estimation from RGB Images
SUMMARY
Estimate planar 6-DOF object poses from a single RGB image. The pipeline segments foreground objects, detects contours, computes centroid and in-plane orientation using PCA, and backprojects the result to the robot base frame using camera calibration.
Overview
This example demonstrates how to estimate planar 6-DOF object poses in the robot base frame from a single RGB image when parts lie on a planar surface.
The pipeline segments the foreground, detects individual objects within the mask, and computes each object's centroid and in-plane orientation using PCA. The resulting pixel coordinates and orientation are then backprojected using camera intrinsics and the tool–camera transform to obtain a pose in the robot base frame.
This approach provides a lightweight vision pipeline for applications such as bin picking, part placement, and robotic manipulation, where objects lie on a known plane.
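Because the parts lie on a known plane, a single pixel plus the capture height is enough to recover a 3D point. A minimal sketch of the pinhole backprojection in plain Python (the fx/fy/cx/cy intrinsics here are placeholder values; the actual Medulla projector also handles lens distortion):

```python
def backproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Pinhole model: pixel (u, v) at a known depth -> camera-frame 3D point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# A pixel at the principal point maps straight down the optical axis:
backproject_pixel(320.0, 240.0, 0.5, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
# -> (0.0, 0.0, 0.5)
```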
Inputs
- RGB image (e.g. from a camera above the bin or part)
- Camera intrinsics, distortion, and tool–camera transform; capture height (depth) for backprojection
- Reference axis for in-plane angle (e.g. ZERO_DEGREE_AXIS in config)
- Optional: base–tool0 transform (e.g. from robot) for poses in robot base frame
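A hypothetical config_params layout matching the keys used in the code below; every value is a placeholder for illustration, not real calibration data:

```python
import numpy as np

config_params = {
    # 3x3 pinhole intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (placeholders)
    "CAMERA_INTRINSICS": np.array([[600.0, 0.0, 320.0],
                                   [0.0, 600.0, 240.0],
                                   [0.0, 0.0, 1.0]]),
    "CAMERA_DISTORTION": np.zeros(5),   # k1, k2, p1, p2, k3
    "TOOL0_T_CAMERA": np.eye(4),        # hand-eye calibration result (placeholder)
    "CAPTURE_HEIGHT": 0.5,              # camera-to-plane distance in metres
    "ZERO_DEGREE_AXIS": (1.0, 0.0),     # image-plane reference axis for 0 rad
    "ROT_TYPE": "rad",
}
```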
Required Telekinesis Skills / Packages
- Cornea — Segment Image Using Foreground BiRefNet for foreground segmentation
- Retina — Detect Contours for object instances in the mask
- Pupil — PCA (e.g. morphology.PCAComputer) for centroid and principal axis
- Medulla — pixel-to-camera projection (e.g. PixelToCameraPointProjector)
Optional: Rerun for visualization (segmentation, bboxes, PCA).
Use Cases
Planar pose estimation from a single camera
Segment foreground objects, compute centroid and orientation with PCA, and backproject to obtain a 6-DOF pose.
Single-object pose estimation
One part in view → segmentation + PCA + backprojection → single 6-DOF pose.
Multi-object pose estimation
Multiple parts → contour detection + PCA per object → list of 6-DOF poses.
Input-Output
Input: a single RGB image plus camera calibration and config. Output: [x, y, z, rx, ry, rz] in the robot base frame, or a list of such poses (one per detected object). Optional Rerun logs.
The Pipeline
The pipeline estimates planar object poses by combining foreground segmentation, contour-based object detection, PCA-based orientation estimation, and camera backprojection.
Capture Image (or load RGB)
↓
Segment Image Using Foreground BiRefNet
↓
Detect Contours (objects in mask)
↓
Optional: Multi-Object Selection (e.g. closest to center)
↓
PCA (centroid, principal axis, in-plane angle vs reference axis)
↓
Pixel → Camera projection
↓
6-DOF pose in camera frame
- Segment Image Using Foreground BiRefNet — Foreground segmentation to obtain a binary mask of the part(s).
- Detect Contours — Detect individual object regions in the segmentation mask (bbox + isolated mask per instance).
- Orientation — PCA (Pupil) for centroid and principal axis; in-plane angle from the PCA eigenvector vs a reference axis (e.g. ZERO_DEGREE_AXIS).
- Backproject — Pixel + depth + angle → 3D point in camera frame (e.g. Medulla projection); transform through tool–camera and base–tool0 to get the 6-DOF pose in the robot base frame.
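To make the orientation step concrete, here is a plain-Python sketch of PCA over mask pixel coordinates and the signed-angle formula against a reference axis. This is an illustration of what PCAComputer presumably computes, not its actual implementation:

```python
import math

def principal_axis(points):
    """Centroid and dominant eigenvector of 2D points via the covariance matrix."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / n
    syy = sum((p[1] - cy) ** 2 for p in points) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]] and a matching eigenvector.
    lam = 0.5 * (sxx + syy + math.hypot(sxx - syy, 2 * sxy))
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy) or 1.0
    return (cx, cy), (vx / norm, vy / norm)

def signed_angle(ref, vec):
    """Signed in-plane angle (rad) from ref to vec via atan2(cross, dot)."""
    dot = ref[0] * vec[0] + ref[1] * vec[1]
    cross = ref[0] * vec[1] - ref[1] * vec[0]
    return math.atan2(cross, dot)

# Pixels of an elongated blob lying along the image x-axis:
pts = [(x, y) for x in range(10) for y in range(3)]
centroid, axis = principal_axis(pts)
angle = signed_angle((1.0, 0.0), axis)  # ~0 rad: blob aligned with reference axis
```

Note the sign ambiguity of PCA eigenvectors: the principal axis is only defined up to 180°, so downstream grasp logic may need to normalise the angle.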
The Code
The pose estimation pipeline: object detection in mask, PCA for centroid and angle, and backprojection. Entry points are estimate_single_object_planar_pose (one pose from one image) and estimate_object_poses (list of poses from one image). Config (camera intrinsics, TOOL0_T_CAMERA, ZERO_DEGREE_AXIS, etc.) is in config_params; robot_base_T_tool0 can be obtained from the robot or passed in.
import numpy as np
from typing import Optional
from telekinesis import cornea, retina
from pupil import morphology
from medulla import projection
from tf import tfutils
# config_params: CAMERA_INTRINSICS, CAMERA_DISTORTION, TOOL0_T_CAMERA,
# CAMERA_T_TOOL0_ROTATION, CAPTURE_HEIGHT, ZERO_DEGREE_AXIS, ROT_TYPE, etc.
# robot_base_T_tool0: 4x4 from robot or lookup_transform("base", "tool0")
def detect_objects_in_mask(
mask: np.ndarray,
config_params: dict,
min_area: int = 20_000,
max_area: int = 10_000_000,
) -> list[dict]:
"""Find individual objects in a binary mask; return list of {bbox, isolated_mask}."""
annotations = retina.detect_contours(
image=mask,
retrieval_mode="retrieve_external",
approx_method="chain_approximate_simple",
min_area=min_area,
max_area=max_area,
)
annotations = annotations.to_list()
object_infos = []
for ann in annotations:
if not ann["geometry"]["points"]:
continue
x, y, w, h = ann["bbox"]
x, y, w, h = int(x), int(y), int(w), int(h)
isolated_mask = np.zeros_like(mask)
isolated_mask[y : y + h, x : x + w] = mask[y : y + h, x : x + w]
object_infos.append({"bbox": [x, y, w, h], "isolated_mask": isolated_mask})
return object_infos
def compute_object_angle_pca(
config_params: dict,
eigenvectors: np.ndarray,
) -> float:
"""In-plane angle (rad) from PCA principal axis vs ZERO_DEGREE_AXIS."""
ax, ay = config_params["ZERO_DEGREE_AXIS"]
bx, by = eigenvectors[:, 0]
dot = ax * bx + ay * by
cross = ax * by - ay * bx
return float(np.arctan2(cross, dot))
def backproject_to_robot_base(
    x_pixel: float,
    y_pixel: float,
    angle: float,
    config_params: dict,
    robot_base_T_tool0: np.ndarray,
    hardcoded_height: Optional[float] = None,
    return_rot_type: str = "rad",
) -> np.ndarray:
    """Pixel + depth + angle → 6-DOF pose in the robot base frame."""
    projector = projection.PixelToCameraPointProjector(
        config_params["CAMERA_INTRINSICS"],
        config_params["CAMERA_DISTORTION"],
    )
    # hardcoded_height overrides the configured capture height when provided.
    depth = hardcoded_height if hardcoded_height is not None else config_params["CAPTURE_HEIGHT"]
    camera_T_point = projector.execute(
        np.array([x_pixel, y_pixel]),
        depth=depth,
        theta=angle,
    )
    # Chain the camera-frame point through hand-eye and robot transforms.
    robot_base_T_point = (
        robot_base_T_tool0 @ config_params["TOOL0_T_CAMERA"] @ camera_T_point
    )
    pose = tfutils.transformation_matrix_to_pose(robot_base_T_point)
    if return_rot_type == "deg":
        pose = np.asarray(pose, dtype=float)
        pose[3:] = np.degrees(pose[3:])
    return pose
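The frame chain above is just a product of homogeneous transforms: base ← tool0 ← camera ← point. A minimal sketch with 4x4 matrices in plain Python (the translations are placeholder values, not real calibration):

```python
def matmul4(a, b):
    """4x4 homogeneous-matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """Pure-translation homogeneous transform."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

robot_base_T_tool0 = translation(0.2, 0.0, 0.5)   # placeholder robot pose
tool0_T_camera = translation(0.0, 0.05, 0.1)      # placeholder hand-eye offset
camera_T_point = translation(0.01, -0.02, 0.4)    # backprojected point

robot_base_T_point = matmul4(matmul4(robot_base_T_tool0, tool0_T_camera),
                             camera_T_point)
# Translation column of the result: (0.21, 0.03, 1.0)
```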
def estimate_single_object_planar_pose(
image: np.ndarray,
config_params: dict,
robot_base_T_tool0: np.ndarray,
return_rot_type: str = "rad",
hardcoded_height: Optional[float] = None,
include_angle: bool = True,
) -> Optional[np.ndarray]:
"""
One image → one 6-DOF pose (or None). If multiple objects, selects closest to image center.
"""
seg = cornea.segment_using_foreground_birefnet(image=image)
mask = seg.to_dict()["labeled_mask"]
object_infos = detect_objects_in_mask(mask, config_params)
if not object_infos:
return None
if len(object_infos) > 1:
h, w = mask.shape[:2]
center = np.array([w / 2, h / 2])
distances = [
np.linalg.norm(
np.array([obj["bbox"][0] + obj["bbox"][2] / 2, obj["bbox"][1] + obj["bbox"][3] / 2])
- center
)
for obj in object_infos
]
idx = int(np.argmin(distances))
mask = object_infos[idx]["isolated_mask"]
else:
mask = object_infos[0]["isolated_mask"]
pca_comp = morphology.PCAComputer()
centroid, eigenvectors, eigenvalues, _ = pca_comp.execute(mask)
angle = (
compute_object_angle_pca(config_params, eigenvectors)
if include_angle
else 0.0
)
return backproject_to_robot_base(
centroid[0],
centroid[1],
angle,
config_params,
robot_base_T_tool0=robot_base_T_tool0,
hardcoded_height=hardcoded_height,
return_rot_type=return_rot_type,
)
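When several objects are detected, the tie-break above keeps the one whose bounding-box centre is nearest the image centre. As a standalone sketch (bboxes are [x, y, w, h] in pixels, hypothetical values):

```python
import math

def closest_to_center(bboxes, image_w, image_h):
    """Index of the bbox whose centre is nearest the image centre."""
    cx, cy = image_w / 2, image_h / 2
    def dist(b):
        x, y, w, h = b
        return math.hypot(x + w / 2 - cx, y + h / 2 - cy)
    return min(range(len(bboxes)), key=lambda i: dist(bboxes[i]))

bboxes = [[10, 10, 50, 50], [300, 220, 40, 40], [500, 400, 60, 60]]
closest_to_center(bboxes, 640, 480)  # -> 1 (its centre is exactly (320, 240))
```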
def estimate_object_poses(
image: np.ndarray,
config_params: dict,
robot_base_T_tool0: np.ndarray,
return_rot_type: str = "rad",
hardcoded_height: Optional[float] = None,
include_angle: bool = False,
) -> list[np.ndarray]:
"""
One image → list of 6-DOF poses (one per detected object in the mask).
"""
seg = cornea.segment_using_foreground_birefnet(image=image)
mask = seg.to_dict()["labeled_mask"]
object_infos = detect_objects_in_mask(mask, config_params)
poses = []
for obj in object_infos:
iso = obj["isolated_mask"]
pca_comp = morphology.PCAComputer()
centroid, eigenvectors, eigenvalues, _ = pca_comp.execute(iso)
angle = (
compute_object_angle_pca(config_params, eigenvectors)
if include_angle
else 0.0
)
pose = backproject_to_robot_base(
centroid[0],
centroid[1],
angle,
config_params,
robot_base_T_tool0=robot_base_T_tool0,
hardcoded_height=hardcoded_height,
return_rot_type=return_rot_type,
)
poses.append(pose)
    return poses
For complete applications that combine this pipeline with robot motion, multiple scan poses, or downstream tasks, see the telekinesis-examples repository.

