
Project Pixel to World Point

SUMMARY

Project Pixel to World Point projects a pixel coordinate and a depth value to a 3D point in world coordinates.

It unprojects a 2D pixel and depth into 3D camera space, then transforms the result into world coordinates using the camera pose (world_T_camera). This is essential in robotics, where 3D points are typically needed in a global or robot frame.

Use this Skill when you want to convert a pixel and depth measurement into 3D world coordinates.

The Skill

python
from telekinesis import pupil
import numpy as np

world_T_point = pupil.project_pixel_to_world_point(
    camera_intrinsics=camera_intrinsics,
    distortion_coefficients=distortion_coefficients,
    pixel=pixel,
    depth=depth,
    world_T_camera=world_T_camera,
)

Example

Projects pixel (320, 240) with depth 1.0 to a 3D point in world coordinates. The camera sits at z = 1 in the world frame (world_T_camera), and the pixel lies at the principal point, so the point lands at (0, 0, 2) in world coordinates. The result is returned as a transform (world_T_point) whose translation is the point.
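
The expected geometry can be checked by hand with plain NumPy. This sketch reimplements the standard pinhole unprojection math for the example's numbers; it is not the Skill's internals:

```python
import numpy as np

# Example calibration: fx = fy = 500, principal point at (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pixel = np.array([320.0, 240.0])
depth = 1.0

# Unproject: pixel -> normalized ray -> camera-frame point at the given depth.
ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
camera_point = ray * depth  # (0, 0, 1): straight down the optical axis

# Camera sits at z = 1 in the world frame.
world_T_camera = np.eye(4)
world_T_camera[2, 3] = 1.0

# Transform the camera-frame point into world coordinates.
world_point = (world_T_camera @ np.append(camera_point, 1.0))[:3]
print(world_point)  # [0. 0. 2.]
```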

The Code

python
from telekinesis import pupil
import numpy as np
from loguru import logger

camera_intrinsics = np.array(
    [[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]],
    dtype=np.float64,
)
distortion_coefficients = np.array([0.0, 0.0, 0.0, 0.0, 0.0], dtype=np.float64)
pixel = np.array([320.0, 240.0], dtype=np.float64)
depth = 1.0
world_T_camera = np.eye(4, dtype=np.float64)
world_T_camera[2, 3] = 1.0

world_T_point = pupil.project_pixel_to_world_point(
    camera_intrinsics=camera_intrinsics,
    distortion_coefficients=distortion_coefficients,
    pixel=pixel,
    depth=depth,
    world_T_camera=world_T_camera,
)

logger.success(
    "Projected pixel to world point. world_T_point shape: {}",
    np.asarray(world_T_point.matrix).shape if hasattr(world_T_point, "matrix") else "N/A",
)

The Explanation of the Code

The code begins by importing the necessary modules: pupil for camera projection operations, numpy for numerical operations, and loguru for logging.

python
from telekinesis import pupil
import numpy as np
from loguru import logger

Next, camera intrinsics, distortion coefficients, pixel, depth, and the camera pose are configured. The world_T_camera is the 4x4 transform from camera to world frame.

python
camera_intrinsics = np.array(
    [[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]],
    dtype=np.float64,
)
distortion_coefficients = np.array([0.0, 0.0, 0.0, 0.0, 0.0], dtype=np.float64)
pixel = np.array([320.0, 240.0], dtype=np.float64)
depth = 1.0
world_T_camera = np.eye(4, dtype=np.float64)
world_T_camera[2, 3] = 1.0
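
In a real pipeline, world_T_camera usually comes from pose estimation or calibration rather than np.eye. As a sketch, a pose given as a rotation matrix R and translation vector t can be packed into the 4x4 form like this (the specific R and t values are made up for illustration):

```python
import numpy as np

# Hypothetical pose: camera rotated 90 degrees about x, positioned at (0.5, 0, 1).
theta = np.pi / 2
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])
t = np.array([0.5, 0.0, 1.0])

# Pack rotation and translation into the homogeneous 4x4 transform.
world_T_camera = np.eye(4)
world_T_camera[:3, :3] = R  # rotation block
world_T_camera[:3, 3] = t   # translation block
```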

The main operation uses the project_pixel_to_world_point Skill from the pupil module. This Skill unprojects a 2D pixel and depth into 3D camera space, then transforms to world coordinates using the camera pose.

python
world_T_point = pupil.project_pixel_to_world_point(
    camera_intrinsics=camera_intrinsics,
    distortion_coefficients=distortion_coefficients,
    pixel=pixel,
    depth=depth,
    world_T_camera=world_T_camera,
)

Finally, the 3D world point can be extracted from the output matrix for further processing, visualization, or downstream tasks.

python
# Extract world point from matrix if needed
logger.success("Projected pixel to world point.")
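
If the returned world_T_point exposes a 4x4 matrix (as the hasattr check earlier suggests), the 3D point itself is the translation column of that transform. A sketch, assuming the matrix is a plain 4x4 NumPy array with the values from this example:

```python
import numpy as np

# Assume this is the 4x4 homogeneous transform returned by the Skill
# (identity rotation, point at (0, 0, 2) in this example).
world_T_point_matrix = np.eye(4)
world_T_point_matrix[:3, 3] = [0.0, 0.0, 2.0]

# The world-frame point is the translation part of the transform.
point_xyz = world_T_point_matrix[:3, 3]
print(point_xyz)  # [0. 0. 2.]
```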

This operation is particularly useful in robotics and vision pipelines for 3D reconstruction, robot picking, and depth-to-world conversion, where converting pixel and depth to 3D world coordinates is required.

Running the Example

Runnable examples are available in the Telekinesis examples repository. Follow the README in that repository to set up the environment. Once set up, you can run this specific example with:

bash
cd telekinesis-examples
python examples/pupil_examples.py --example project_pixel_to_world_point

How to Tune the Parameters

The project_pixel_to_world_point Skill has no tunable parameters as such; its inputs are camera calibration data, pixel coordinates, a depth value, and the camera pose:

camera_intrinsics (no default—required): 3x3 matrix with fx, fy, cx, cy. Obtain from camera calibration.

distortion_coefficients (default: np.array([0.0, 0.0, 0.0, 0.0, 0.0])): Lens distortion coefficients. Use zeros for undistorted models.

pixel (no default—required): 2D pixel (u, v) in image coordinates.

depth (no default—required): Distance along the optical axis. Must be valid and positive.

world_T_camera (no default—required): 4x4 transform from camera to world frame. Obtain from camera pose estimation or calibration.
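
The intrinsics parameters above map to the 3x3 matrix in the standard pinhole layout. A plain NumPy sketch, using this example's calibration values:

```python
import numpy as np

fx, fy = 500.0, 500.0   # focal lengths in pixels
cx, cy = 320.0, 240.0   # principal point in pixels

# fx and fy sit on the diagonal, cx and cy in the last column.
camera_intrinsics = np.array([
    [fx, 0.0, cx],
    [0.0, fy, cy],
    [0.0, 0.0, 1.0],
])
```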

Where to Use the Skill in a Pipeline

Project Pixel to World Point is commonly used in the following pipelines:

  • Robot picking - Convert 2D pick point + depth to 3D grasp pose
  • 3D mapping - Build world point clouds from RGB-D
  • Object localization - Get object position in world frame
  • Sensor fusion - Align with robot/global coordinates
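
For the 3D mapping case, the same per-pixel math extends to an entire depth image. A minimal NumPy sketch of the underlying geometry (no telekinesis calls; the depth map here is synthetic, with every pixel at 1 m):

```python
import numpy as np

fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
height, width = 480, 640

# Synthetic depth map: every pixel 1 m from the camera along the optical axis.
depth = np.ones((height, width))

# Per-pixel row (v) and column (u) coordinates.
v, u = np.indices((height, width), dtype=np.float64)

# Unproject every pixel into a camera-frame point.
x = (u - cx) / fx * depth
y = (v - cy) / fy * depth
points_camera = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Camera at z = 1 in the world frame: transform the whole cloud at once.
world_T_camera = np.eye(4)
world_T_camera[2, 3] = 1.0
homogeneous = np.c_[points_camera, np.ones(len(points_camera))]
points_world = (world_T_camera @ homogeneous.T).T[:, :3]
print(points_world.shape)  # (307200, 3)
```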

Related skills to build such a pipeline:

  • project_pixel_to_camera_point: Camera-frame only
  • project_world_point_to_pixel: Inverse operation
  • apply_transform_to_point_cloud: Transform point clouds (Vitreous)

Alternative Skills

  • project_pixel_to_camera_point: Use when camera-frame coordinates suffice.
  • project_world_point_to_pixel: Inverse operation, mapping a world point to a pixel.

When Not to Use the Skill

Do not use Project Pixel to World Point when:

  • Camera-frame is sufficient (Use project_pixel_to_camera_point)
  • You need pixel from world point (Use project_world_point_to_pixel)
  • world_T_camera is unknown (Calibrate or estimate camera pose first)