Datasets

Example data loaders and synthetic data generation.

Example Data Loader

Example dataset loading and management.

class aquacal.datasets.loader.ExampleDataset(name, type, reference_calibration=None, metadata=<factory>, cache_path=None)

Bases: object

Example calibration dataset downloaded from Zenodo.

Parameters:

name (str)
type (str)
reference_calibration (CalibrationResult | None)
metadata (dict)
cache_path (Path | None)

name

Dataset name (e.g., 'real-rig')

Type: str

type

Dataset type ('real')

Type: str

reference_calibration

Optional reference calibration result

Type: aquacal.config.schema.CalibrationResult | None

metadata

Additional metadata about the dataset

Type: dict

cache_path

Path to cached dataset files

Type: pathlib.Path | None

aquacal.datasets.loader.load_example(name)

Load an example calibration dataset.

Downloads datasets from Zenodo on first use and caches them locally.

Parameters:

name (str) – Dataset name. Available options:

'real-rig': Real hardware calibration (Zenodo download)

Returns:

ExampleDataset with reference calibration and cache path

Raises:

ValueError – If dataset name is not recognized

Return type:

ExampleDataset

Examples

>>> from aquacal.datasets import load_example
>>> ds = load_example('real-rig')
>>> print(ds.cache_path)
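
The download-once-then-cache behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not AquaCal's implementation: `fetch_cached`, `KNOWN_DATASETS`, and `fake_download` are hypothetical names, and the network fetch is replaced by a stand-in so the sketch runs offline.

```python
import tempfile
from pathlib import Path

# Hypothetical registry of valid names; the real loader's internals are not shown in this page.
KNOWN_DATASETS = {"real-rig"}


def fetch_cached(name, cache_root, download):
    """Return the cache directory for `name`, calling `download` only if it is missing."""
    if name not in KNOWN_DATASETS:
        raise ValueError(f"Unknown dataset: {name!r}")
    target = Path(cache_root) / name
    if not target.exists():
        target.mkdir(parents=True)
        download(name, target)  # runs on first use only
    return target


calls = []


def fake_download(name, target):
    # Stand-in for the real network fetch (assumption for illustration).
    calls.append(name)
    (target / "data.txt").write_text("placeholder")


with tempfile.TemporaryDirectory() as root:
    first = fetch_cached("real-rig", root, fake_download)
    second = fetch_cached("real-rig", root, fake_download)  # served from cache
    cached_ok = first == second and (first / "data.txt").exists()
```

In this sketch the second call returns the same path without invoking the downloader again, and an unrecognized name raises ValueError before any file access.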

Synthetic Data Generation

Synthetic data generation for testing and validation.

This module provides functions to generate synthetic calibration data with known ground truth. The main entry point is create_scenario(), which returns predefined test scenarios with complete ground truth.

class aquacal.datasets.synthetic.SyntheticScenario(name, board_config, intrinsics, extrinsics, water_zs, board_poses, noise_std, description, images=None)

Bases: object

Complete synthetic test scenario with ground truth.

Parameters:

name (str)
board_config (BoardConfig)
intrinsics (dict[str, CameraIntrinsics])
extrinsics (dict[str, CameraExtrinsics])
water_zs (dict[str, float])
board_poses (list[BoardPose])
noise_std (float)
description (str)
images (dict[str, dict[int, ndarray[tuple[int, ...], dtype[_ScalarType_co]]]] | None)

name

Scenario name

Type: str

board_config

ChArUco board specification

Type: aquacal.config.schema.BoardConfig

intrinsics

Per-camera intrinsics

Type: dict[str, aquacal.config.schema.CameraIntrinsics]

extrinsics

Per-camera extrinsics

Type: dict[str, aquacal.config.schema.CameraExtrinsics]

water_zs

Per-camera interface distances (Z-coordinate of water surface)

Type: dict[str, float]

board_poses

List of board poses for all frames

Type: list[aquacal.config.schema.BoardPose]

noise_std

Gaussian noise standard deviation applied to detections (pixels)

Type: float

description

Human-readable description

Type: str

images

Optional dict of rendered images (camera_name -> frame_idx -> image)

Type: dict[str, dict[int, numpy.ndarray[tuple[int, …], numpy.dtype[numpy._typing._array_like._ScalarType_co]]]] | None

aquacal.datasets.synthetic.generate_camera_intrinsics(image_size=(1920, 1080), fov_horizontal_deg=60.0, principal_point_offset=(0.0, 0.0), distortion_k1=0.0, distortion_k2=0.0)

Generate camera intrinsics with specified parameters.

Parameters:

image_size (tuple[int, int]) – (width, height) in pixels
fov_horizontal_deg (float) – Horizontal field of view in degrees
principal_point_offset (tuple[float, float]) – Offset from image center (pixels)
distortion_k1 (float) – First radial distortion coefficient
distortion_k2 (float) – Second radial distortion coefficient

Returns:

CameraIntrinsics with computed K matrix and distortion

Return type:

CameraIntrinsics
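
As a rough illustration of how a K matrix can be derived from these parameters, the sketch below uses the standard pinhole relation fx = (width / 2) / tan(fov / 2). It assumes square pixels (fy = fx) and a principal point at the image center plus the offset; `intrinsics_from_fov` is a hypothetical helper, and the library's actual computation may differ.

```python
import math


def intrinsics_from_fov(image_size=(1920, 1080), fov_horizontal_deg=60.0,
                        principal_point_offset=(0.0, 0.0)):
    """Build a 3x3 pinhole K matrix (nested lists) from a horizontal field of view."""
    width, height = image_size
    # Half the image width subtends half the horizontal field of view.
    fx = (width / 2.0) / math.tan(math.radians(fov_horizontal_deg) / 2.0)
    fy = fx  # assumption: square pixels
    cx = width / 2.0 + principal_point_offset[0]
    cy = height / 2.0 + principal_point_offset[1]
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


K = intrinsics_from_fov()
```

With the default 1920 x 1080 image and 60 degree field of view, this relation gives fx = 960 / tan(30°), roughly 1662.8 pixels.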

aquacal.datasets.synthetic.generate_camera_array(n_cameras, layout='grid', spacing=0.1, height_above_water=0.15, height_variation=0.005, image_size=(1920, 1080), fov_deg=60.0, seed=42)

Generate a realistic camera array with known ground truth.

Parameters:

n_cameras (int) – Number of cameras (2-14)
layout (str) – Camera arrangement - “grid”, “line”, or “ring”
spacing (float) – Distance between adjacent cameras (meters)
height_above_water (float) – Mean interface distance (meters)
height_variation (float) – Std dev of per-camera height variation (meters)
image_size (tuple[int, int]) – Image dimensions (width, height)
fov_deg (float) – Horizontal field of view
seed (int) – Random seed for reproducibility

Returns:

Tuple of (intrinsics, extrinsics, water_zs) dicts keyed by camera name. Camera “cam0” is always the reference camera at origin with identity rotation.

Return type:

tuple[dict[str, CameraIntrinsics], dict[str, CameraExtrinsics], dict[str, float]]
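
One plausible way to place cameras for the three layouts is sketched below. The exact placement rules are not documented here, so everything in `layout_positions` is an assumption: a near-square grid, a line along X, and a ring whose neighbouring cameras are `spacing` apart along the chord. Positions are shifted so that "cam0" sits at the origin, matching the documented reference-camera convention.

```python
import math


def layout_positions(n_cameras, layout="grid", spacing=0.1):
    """Return {camera_name: (x, y)} with cam0 at the origin (hypothetical placement rules)."""
    if layout == "line":
        pts = [(i * spacing, 0.0) for i in range(n_cameras)]
    elif layout == "grid":
        cols = math.ceil(math.sqrt(n_cameras))  # near-square grid, filled row by row
        pts = [((i % cols) * spacing, (i // cols) * spacing) for i in range(n_cameras)]
    elif layout == "ring":
        # Choose the radius so the chord between neighbouring cameras equals `spacing`.
        radius = spacing / (2.0 * math.sin(math.pi / n_cameras))
        pts = [(radius * math.cos(2.0 * math.pi * i / n_cameras),
                radius * math.sin(2.0 * math.pi * i / n_cameras))
               for i in range(n_cameras)]
    else:
        raise ValueError(f"Unknown layout: {layout!r}")
    x0, y0 = pts[0]
    # Shift everything so the reference camera is at the origin.
    return {f"cam{i}": (x - x0, y - y0) for i, (x, y) in enumerate(pts)}


positions = layout_positions(4, layout="grid", spacing=0.1)
```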

aquacal.datasets.synthetic.generate_real_rig_array()

Generate camera array matching the real-world 12-camera rig.

Geometry is derived from an actual calibration of the AquaCal hardware rig (12 cameras, e3v8250 excluded) with the following idealizations applied:

Common intrinsics: focal length, principal point, and distortion are averaged across all 12 cameras.
All cameras placed at Z = 0 (average real Z ≈ 0).
All optical axes aligned to world +Z (looking straight down); real cameras deviate < 5 deg.
XY positions preserved from the real calibration.
Common water_z = 1.031 m (the calibrated value).

Returns:

Tuple of (intrinsics, extrinsics, water_zs) dicts keyed by camera name (cam0 … cam11).

Return type:

tuple[dict[str, CameraIntrinsics], dict[str, CameraExtrinsics], dict[str, float]]

aquacal.datasets.synthetic.generate_board_trajectory(n_frames, camera_positions, water_zs, depth_range=(0.3, 0.6), xy_extent=0.15, rotation_range_deg=15.0, min_cameras_per_frame=2, seed=42)

Generate board poses ensuring pose graph connectivity.

Creates a trajectory that ensures:

Each frame is visible by at least min_cameras_per_frame cameras
The pose graph is connected (can chain from reference to all cameras)
Board stays within reasonable depth range underwater

Parameters:

n_frames (int) – Number of frames to generate
camera_positions (dict[str, ndarray[tuple[int, ...], dtype[float64]]]) – Dict of camera center positions (from extrinsics)
water_zs (dict[str, float]) – Per-camera interface distances
depth_range (tuple[float, float]) – (min_z, max_z) for board center in world coords
xy_extent (float) – Maximum XY offset from origin
rotation_range_deg (float) – Maximum board tilt from horizontal
min_cameras_per_frame (int) – Minimum cameras that must see board
seed (int) – Random seed

Returns:

List of BoardPose objects with frame indices 0 to n_frames-1

Return type:

list[BoardPose]
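
The connectivity requirement can be checked with a breadth-first search over cameras, where two cameras are linked whenever they see the board in the same frame. The sketch below is an illustration under that assumption; `reachable_cameras` and the per-frame visibility sets are hypothetical and not part of the AquaCal API.

```python
from collections import deque


def reachable_cameras(frame_visibility, reference="cam0"):
    """Return the set of cameras that can be chained to `reference` via shared frames."""
    # Two cameras are neighbours if at least one frame is seen by both.
    neighbours = {}
    for cams in frame_visibility:
        for cam in cams:
            neighbours.setdefault(cam, set()).update(cams - {cam})

    seen = {reference}
    queue = deque([reference])
    while queue:
        cam = queue.popleft()
        for other in neighbours.get(cam, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


# Each entry is the set of cameras that see the board in one frame.
frames = [{"cam0", "cam1"}, {"cam1", "cam2"}, {"cam3"}]
connected = reachable_cameras(frames)
# Per-frame requirement with min_cameras_per_frame = 2.
enough_views = all(len(cams) >= 2 for cams in frames)
```

Here cam2 is reachable through cam1, while cam3 never shares a frame with another camera, so this trajectory would fail both the connectivity and the per-frame visibility requirements.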

aquacal.datasets.synthetic.generate_real_rig_trajectory(n_frames=100, depth_range=(1.1, 2.0), seed=42)

Generate board trajectory appropriate for the real rig geometry.

The real rig has cameras at Z ≈ 0 with water surface at Z ≈ 1.03 m, so the board should be below the water surface (default 1.1–2.0 m, i.e. ~70–970 mm below the surface).

Trajectory covers the full field of view:

Positions sweep across the ~1.3 × 1.2 m footprint of the camera array
Ensures connectivity by visiting regions seen by multiple cameras

Parameters:

n_frames (int) – Number of frames to generate
depth_range (tuple[float, float]) – (min_z, max_z) for board center in world coords
seed (int) – Random seed

Returns:

List of BoardPose objects

Return type:

list[BoardPose]

aquacal.datasets.synthetic.generate_dense_xy_grid(depth, n_grid=7, xy_extent=0.5, xy_center=(0.0, 0.0), tilt_deg=3.0, frame_offset=0, seed=42)

Generate board poses at a regular XY grid at a fixed depth.

Used for dense spatial coverage in reconstruction evaluation and heatmaps. Each grid position has a small random tilt and random in-plane rotation.

Parameters:

depth (float) – Z coordinate for all board poses (meters)
n_grid (int) – Number of grid positions per axis (total poses = n_grid^2)
xy_extent (float) – Grid spans from -xy_extent to +xy_extent around xy_center in X and Y (meters)
xy_center (tuple[float, float]) – (x, y) center of the grid (meters). Should match the centroid of the camera array for best coverage.
tilt_deg (float) – Maximum random tilt from horizontal (degrees)
frame_offset (int) – Starting frame index (default 0)
seed (int) – Random seed for reproducible tilts and rotations

Returns:

List of n_grid^2 BoardPose objects with frame indices starting from frame_offset.

Return type:

list[BoardPose]
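
The grid layout described by these parameters can be sketched as follows. Plain dicts stand in for BoardPose objects, and the row-major frame ordering and uniform sampling of tilt and in-plane rotation are assumptions made for illustration; `dense_xy_grid` is a hypothetical helper, not the library function.

```python
import random


def dense_xy_grid(depth, n_grid=7, xy_extent=0.5, xy_center=(0.0, 0.0),
                  tilt_deg=3.0, frame_offset=0, seed=42):
    """Return n_grid**2 pose dicts on a regular XY grid at a fixed depth."""
    rng = random.Random(seed)
    if n_grid == 1:
        offsets = [0.0]
    else:
        # n_grid evenly spaced offsets from -xy_extent to +xy_extent.
        step = 2.0 * xy_extent / (n_grid - 1)
        offsets = [-xy_extent + i * step for i in range(n_grid)]

    poses = []
    for iy, dy in enumerate(offsets):
        for ix, dx in enumerate(offsets):
            poses.append({
                "frame_idx": frame_offset + iy * n_grid + ix,
                "position": (xy_center[0] + dx, xy_center[1] + dy, depth),
                "tilt_deg": rng.uniform(0.0, tilt_deg),      # small random tilt
                "in_plane_deg": rng.uniform(0.0, 360.0),     # random in-plane rotation
            })
    return poses


poses = dense_xy_grid(depth=1.5, n_grid=3, frame_offset=100)
```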

aquacal.datasets.synthetic.generate_synthetic_detections(intrinsics, extrinsics, water_zs, board, board_poses, noise_std=0.0, min_corners=8, seed=42)

Generate synthetic detections by projecting through refractive interface.

For each board pose and camera:

1. Transform board corners to world coordinates
2. Project each corner through refractive interface
3. Add Gaussian noise to pixel coordinates
4. Filter corners outside image bounds
5. Only include camera if >= min_corners visible

Parameters:

intrinsics (dict[str, CameraIntrinsics]) – Per-camera intrinsics
extrinsics (dict[str, CameraExtrinsics]) – Per-camera extrinsics
water_zs (dict[str, float]) – Per-camera interface distances
board (BoardGeometry) – Board geometry
board_poses (list[BoardPose]) – List of board poses
noise_std (float) – Gaussian noise standard deviation (pixels)
min_corners (int) – Minimum corners for valid detection
seed (int) – Random seed for noise

Returns:

DetectionResult matching format from real detection pipeline

Return type:

DetectionResult
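
To give a sense of what projecting through a refractive interface involves, the sketch below solves a simplified 2D case: a camera at the origin looking along +Z, a flat air-water interface at z = water_z, and a point below it. It finds the interface crossing that satisfies Snell's law by bisection, then applies a pinhole projection. The function name, the refractive index of 1.333, and the bisection approach are assumptions for illustration; the library's actual projection method is not described on this page.

```python
import math


def refract_project(point, water_z, fx, cx, n_air=1.0, n_water=1.333, iters=60):
    """Project point (x, z), with z > water_z, to a horizontal pixel coordinate u."""
    x_p, z_p = point
    h = water_z        # camera-to-interface distance
    d = z_p - water_z  # depth of the point below the interface
    r = abs(x_p)       # lateral distance of the point from the optical axis

    def snell_residual(x):
        # x is the lateral position where the ray crosses the interface.
        sin_air = x / math.hypot(x, h)
        sin_water = (r - x) / math.hypot(r - x, d)
        return n_air * sin_air - n_water * sin_water

    # The residual is negative at x = 0 and positive at x = r, so bisect between them.
    lo, hi = 0.0, r
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if snell_residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x_i = math.copysign(0.5 * (lo + hi), x_p)
    # In air the ray is straight, so a pinhole projection of the crossing point applies.
    return cx + fx * x_i / h


u_refracted = refract_project((0.5, 2.0), water_z=1.0, fx=1000.0, cx=960.0)
u_straight = 960.0 + 1000.0 * 0.5 / 2.0  # same point with refraction ignored
```

Because the ray bends away from the normal when leaving the water, the refracted pixel lands farther from the image center than the straight-line projection of the same point.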

aquacal.datasets.synthetic.compute_calibration_errors(result, ground_truth)

Compare calibration result to ground truth.

Computes:

focal_length_error_percent: Max relative error in fx, fy
principal_point_error_px: Max error in cx, cy
rotation_error_deg: Max rotation error across cameras
translation_error_mm: Max translation error across cameras
water_z_error_mm: Max interface distance error

Parameters:

result (CalibrationResult) – Calibration result from pipeline
ground_truth (SyntheticScenario) – Synthetic scenario with known truth

Returns:

Dict of error metrics

Return type:

dict[str, float]
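
Two of these metrics can be illustrated with common definitions: the rotation error as the angle of the relative rotation between estimated and true matrices, and the focal length error as a relative percentage. These formulas are a plausible reading of the metric names, not a confirmed description of the library's code, and the helper names below are hypothetical.

```python
import math


def rotation_error_deg(R_est, R_gt):
    """Angle (degrees) of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_est^T @ R_gt) equals the sum of elementwise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(cos_angle))


def focal_length_error_percent(f_est, f_gt):
    """Relative focal length error in percent."""
    return 100.0 * abs(f_est - f_gt) / f_gt


def rot_z(deg):
    """Rotation about the Z axis, used here only to build test inputs."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Made-up per-camera estimates compared against an identity ground truth.
estimated = {"cam0": rot_z(0.0), "cam1": rot_z(0.5), "cam2": rot_z(2.0)}
truth = {name: rot_z(0.0) for name in estimated}
# "Max across cameras" aggregation, as in the metric descriptions above.
max_rotation_error = max(rotation_error_deg(estimated[n], truth[n]) for n in estimated)
focal_error = focal_length_error_percent(1010.0, 1000.0)
```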

aquacal.datasets.synthetic.create_scenario(name, seed=42)

Create a predefined test scenario with complete ground truth.

Available scenarios:

'ideal': 4 cameras, 20 frames, 0 noise — verify math correctness
'minimal': 2 cameras, 10 frames, 0.3 px noise — edge case
'realistic': 12 cameras matching actual hardware, 30 frames, 0.5 px noise

All presets use the same ChArUco board (12x9 squares, 60 mm square size, 45 mm marker size, DICT_5X5_100).

Parameters:

name (str) – Scenario name ('ideal', 'minimal', or 'realistic')
seed (int) – Random seed for reproducibility

Returns:

SyntheticScenario with complete ground truth (intrinsics, extrinsics, interface distances, board poses).

Raises:

ValueError – If scenario name is not recognized.

Return type:

SyntheticScenario

Examples

>>> from aquacal.datasets import create_scenario
>>> scenario = create_scenario('ideal')
>>> print(f"{len(scenario.intrinsics)} cameras, {len(scenario.board_poses)} frames")
4 cameras, 20 frames
>>>
>>> scenario = create_scenario('realistic')
>>> print(f"{len(scenario.intrinsics)} cameras")
12 cameras
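
For orientation, the shared board specification implies the following quantities. The sketch assumes the usual ChArUco conventions (corners are the interior chessboard intersections, and markers occupy every other square) and that the 12 squares run along the board's width; variable names are illustrative only.

```python
squares_x, squares_y = 12, 9
square_size_m = 0.060

# Physical extent of the board (assuming 12 squares along the width).
board_width_m = squares_x * square_size_m
board_height_m = squares_y * square_size_m

# ChArUco corners are the interior chessboard intersections.
n_corners = (squares_x - 1) * (squares_y - 1)

# Markers occupy every other square of the checkerboard pattern.
n_markers = (squares_x * squares_y) // 2

# DICT_5X5_100 provides 100 distinct markers, enough for this board.
dictionary_capacity = 100
fits_dictionary = n_markers <= dictionary_capacity
```

Under these assumptions the board has 88 corners, so the default min_corners=8 in generate_synthetic_detections corresponds to seeing roughly a tenth of the board.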