Project Pixel to World Point
SUMMARY
Project Pixel to World Point projects a pixel and depth to a 3D point in world coordinates.
Unprojects a 2D pixel and its depth into 3D camera space, then transforms the result to world coordinates using the camera pose (world_T_camera). This is essential in robotics, where 3D points must be expressed in a global or robot frame.
Use this Skill when you want to convert pixel + depth to 3D world coordinates.
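Conceptually, the Skill performs a pinhole back-projection followed by a rigid transform. Below is a minimal pure-NumPy sketch of that math, assuming zero lens distortion; the function names are illustrative, not the library API.

```python
import numpy as np

def unproject_pixel(pixel, depth, K):
    """Back-project pixel (u, v) at the given depth into camera coordinates (pinhole, no distortion)."""
    u, v = pixel
    x = (u - K[0, 2]) * depth / K[0, 0]  # (u - cx) * z / fx
    y = (v - K[1, 2]) * depth / K[1, 1]  # (v - cy) * z / fy
    return np.array([x, y, depth])

def camera_to_world(point_cam, world_T_camera):
    """Apply a 4x4 camera-to-world transform to a 3D camera-frame point."""
    p_h = np.append(point_cam, 1.0)      # homogeneous coordinates
    return (world_T_camera @ p_h)[:3]
```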
The Skill
from telekinesis import pupil
import numpy as np
world_T_point = pupil.project_pixel_to_world_point(
camera_intrinsics=camera_intrinsics,
distortion_coefficients=distortion_coefficients,
pixel=pixel,
depth=depth,
world_T_camera=world_T_camera,
)
Example
Projects pixel (320, 240) with depth 1.0 to a 3D point in world coordinates. The camera sits at z=1 in the world frame (world_T_camera). The result encodes the point in the world frame, as a transform whose translation is the point.
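For this particular example the expected answer can be checked by hand with plain NumPy, assuming a zero-distortion pinhole model: the pixel equals the principal point, so the camera-frame point is (0, 0, 1), and the camera pose shifts it to (0, 0, 2) in the world frame.

```python
import numpy as np

K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
u, v, depth = 320.0, 240.0, 1.0

# Pixel coincides with the principal point (cx, cy), so x = y = 0
x = (u - K[0, 2]) * depth / K[0, 0]
y = (v - K[1, 2]) * depth / K[1, 1]
point_cam = np.array([x, y, depth, 1.0])  # homogeneous camera-frame point

world_T_camera = np.eye(4)
world_T_camera[2, 3] = 1.0                # camera at z = 1 in the world frame

point_world = (world_T_camera @ point_cam)[:3]
print(point_world)  # [0. 0. 2.]
```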
The Code
from telekinesis import pupil
import numpy as np
from loguru import logger
camera_intrinsics = np.array(
[[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]],
dtype=np.float64,
)
distortion_coefficients = np.array([0.0, 0.0, 0.0, 0.0, 0.0], dtype=np.float64)
pixel = np.array([320.0, 240.0], dtype=np.float64)
depth = 1.0
world_T_camera = np.eye(4, dtype=np.float64)
world_T_camera[2, 3] = 1.0
world_T_point = pupil.project_pixel_to_world_point(
camera_intrinsics=camera_intrinsics,
distortion_coefficients=distortion_coefficients,
pixel=pixel,
depth=depth,
world_T_camera=world_T_camera,
)
logger.success(
"Projected pixel to world point. world_T_point shape: {}",
np.asarray(world_T_point.matrix).shape if hasattr(world_T_point, "matrix") else "N/A",
)
The Explanation of the Code
The code begins by importing the necessary modules: pupil for camera projection operations, numpy for numerical operations, and loguru for logging.
from telekinesis import pupil
import numpy as np
from loguru import logger
Next, camera intrinsics, distortion coefficients, pixel, depth, and the camera pose are configured. The world_T_camera is the 4x4 transform from camera to world frame.
camera_intrinsics = np.array(
[[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]],
dtype=np.float64,
)
distortion_coefficients = np.array([0.0, 0.0, 0.0, 0.0, 0.0], dtype=np.float64)
pixel = np.array([320.0, 240.0], dtype=np.float64)
depth = 1.0
world_T_camera = np.eye(4, dtype=np.float64)
world_T_camera[2, 3] = 1.0
The main operation uses the project_pixel_to_world_point Skill from the pupil module. This Skill unprojects a 2D pixel and depth into 3D camera space, then transforms to world coordinates using the camera pose.
world_T_point = pupil.project_pixel_to_world_point(
camera_intrinsics=camera_intrinsics,
distortion_coefficients=distortion_coefficients,
pixel=pixel,
depth=depth,
world_T_camera=world_T_camera,
)
Finally, the 3D world point can be extracted from the output matrix for further processing, visualization, or downstream tasks.
# Extract world point from matrix if needed
logger.success("Projected pixel to world point.")
This operation is particularly useful in robotics and vision pipelines such as 3D reconstruction and robot picking, where pixel-and-depth measurements must be expressed as 3D world coordinates.
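If the returned object exposes its 4x4 matrix (as the earlier logging call suggests via world_T_point.matrix), the world point itself is the translation column. A sketch with a stand-in matrix (the values are illustrative):

```python
import numpy as np

# Stand-in for np.asarray(world_T_point.matrix); values illustrative
world_T_point_matrix = np.eye(4)
world_T_point_matrix[:3, 3] = [0.0, 0.0, 2.0]

point_xyz = world_T_point_matrix[:3, 3]  # translation column holds the 3D point
print(point_xyz)  # [0. 0. 2.]
```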
Running the Example
Runnable examples are available in the Telekinesis examples repository. Follow the README in that repository to set up the environment. Once set up, you can run this specific example with:
cd telekinesis-examples
python examples/pupil_examples.py --example project_pixel_to_world_point
How to Tune the Parameters
The project_pixel_to_world_point Skill has no tunable parameters in the traditional sense; it requires camera calibration data, pixel coordinates, a depth value, and the camera pose:
camera_intrinsics (no default—required): 3x3 matrix with fx, fy, cx, cy. Obtain from camera calibration.
distortion_coefficients (default: np.array([0.0, 0.0, 0.0, 0.0, 0.0])): Lens distortion coefficients. Use zeros for undistorted models.
pixel (no default—required): 2D pixel (u, v) in image coordinates.
depth (no default—required): Distance along the optical axis. Must be finite and positive.
world_T_camera (no default—required): 4x4 transform from camera to world frame. Obtain from camera pose estimation or calibration.
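When pose estimation yields a rotation matrix R and translation t rather than a 4x4 matrix, world_T_camera can be assembled directly; the values below are illustrative placeholders.

```python
import numpy as np

R = np.eye(3)                    # camera axes aligned with world axes (illustrative)
t = np.array([0.0, 0.0, 1.0])    # camera 1 unit above the world origin (illustrative)

world_T_camera = np.eye(4)
world_T_camera[:3, :3] = R       # rotation block
world_T_camera[:3, 3] = t        # translation column
```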
Where to Use the Skill in a Pipeline
Project Pixel to World Point is commonly used in the following pipelines:
- Robot picking - Convert 2D pick point + depth to 3D grasp pose
- 3D mapping - Build world point clouds from RGB-D
- Object localization - Get object position in world frame
- Sensor fusion - Align with robot/global coordinates
Related skills to build such a pipeline:
- project_pixel_to_camera_point: Camera-frame only
- project_world_point_to_pixel: Inverse operation
- apply_transform_to_point_cloud: Transform point clouds (Vitreous)
Alternative Skills
| Skill | vs. Project Pixel to World Point |
|---|---|
| project_pixel_to_camera_point | Use when camera-frame coordinates suffice. |
| project_world_point_to_pixel | Inverse: world point → pixel. |
When Not to Use the Skill
Do not use Project Pixel to World Point when:
- Camera-frame is sufficient (Use project_pixel_to_camera_point)
- You need pixel from world point (Use project_world_point_to_pixel)
- world_T_camera is unknown (Calibrate or estimate camera pose first)

