Datasets

Example data loaders and synthetic data generation.

Example Data Loader

Example dataset loading and management.

class aquacal.datasets.loader.ExampleDataset(name, type, reference_calibration=None, metadata=<factory>, cache_path=None)[source]

Bases: object

Example calibration dataset downloaded from Zenodo.

Parameters:
name

Dataset name (e.g., ‘real-rig’)

Type:

str

type

Dataset type (‘real’)

Type:

str

reference_calibration

Optional reference calibration result

Type:

aquacal.config.schema.CalibrationResult | None

metadata

Additional metadata about the dataset

Type:

dict

cache_path

Path to cached dataset files

Type:

pathlib.Path | None

aquacal.datasets.loader.load_example(name)[source]

Load an example calibration dataset.

Downloads datasets from Zenodo on first use and caches them locally.

Parameters:

name (str) – Dataset name. Available options:

  • ‘real-rig’: Real hardware calibration (Zenodo download)

Returns:

ExampleDataset with reference calibration and cache path

Raises:

ValueError – If dataset name is not recognized

Return type:

ExampleDataset

Examples

>>> from aquacal.datasets import load_example
>>> ds = load_example('real-rig')
>>> print(ds.cache_path)

Synthetic Data Generation

Synthetic data generation for testing and validation.

This module provides functions to generate synthetic calibration data with known ground truth. The main entry point is create_scenario(), which returns predefined test scenarios with complete ground truth.

class aquacal.datasets.synthetic.SyntheticScenario(name, board_config, intrinsics, extrinsics, water_zs, board_poses, noise_std, description, images=None)[source]

Bases: object

Complete synthetic test scenario with ground truth.

Parameters:
name

Scenario name

Type:

str

board_config

ChArUco board specification

Type:

aquacal.config.schema.BoardConfig

intrinsics

Per-camera intrinsics

Type:

dict[str, aquacal.config.schema.CameraIntrinsics]

extrinsics

Per-camera extrinsics

Type:

dict[str, aquacal.config.schema.CameraExtrinsics]

water_zs

Per-camera interface distances (Z-coordinate of water surface)

Type:

dict[str, float]

board_poses

List of board poses for all frames

Type:

list[aquacal.config.schema.BoardPose]

noise_std

Gaussian noise standard deviation applied to detections (pixels)

Type:

float

description

Human-readable description

Type:

str

images

Optional dict of rendered images (camera_name -> frame_idx -> image)

Type:

dict[str, dict[int, numpy.ndarray]] | None

aquacal.datasets.synthetic.generate_camera_intrinsics(image_size=(1920, 1080), fov_horizontal_deg=60.0, principal_point_offset=(0.0, 0.0), distortion_k1=0.0, distortion_k2=0.0)[source]

Generate camera intrinsics with specified parameters.

Parameters:
  • image_size (tuple[int, int]) – (width, height) in pixels

  • fov_horizontal_deg (float) – Horizontal field of view in degrees

  • principal_point_offset (tuple[float, float]) – Offset from image center (pixels)

  • distortion_k1 (float) – First radial distortion coefficient

  • distortion_k2 (float) – Second radial distortion coefficient

Returns:

CameraIntrinsics with computed K matrix and distortion

Return type:

CameraIntrinsics
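
The relationship between horizontal field of view and focal length can be sketched with the standard pinhole relation fx = (width / 2) / tan(fov / 2). The helper below is a hypothetical stand-in, not the library's implementation; it assumes square pixels (fy = fx) and a principal point at the image center plus the given offset, and it ignores distortion.

```python
import math


def intrinsics_from_fov(image_size=(1920, 1080), fov_horizontal_deg=60.0,
                        principal_point_offset=(0.0, 0.0)):
    """Return a 3x3 pinhole K matrix (nested lists) from a horizontal FOV.

    Hypothetical helper for illustration; not part of aquacal.
    """
    width, height = image_size
    # Half the image width subtends half the horizontal field of view.
    fx = (width / 2.0) / math.tan(math.radians(fov_horizontal_deg) / 2.0)
    fy = fx  # assumption: square pixels
    cx = width / 2.0 + principal_point_offset[0]
    cy = height / 2.0 + principal_point_offset[1]
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


K = intrinsics_from_fov()
print(round(K[0][0], 1), K[0][2], K[1][2])  # 1662.8 960.0 540.0
```

With the default 1920-pixel width and 60 degree field of view, the focal length comes out at roughly 1663 pixels under these assumptions.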

aquacal.datasets.synthetic.generate_camera_array(n_cameras, layout='grid', spacing=0.1, height_above_water=0.15, height_variation=0.005, image_size=(1920, 1080), fov_deg=60.0, seed=42)[source]

Generate a realistic camera array with known ground truth.

Parameters:
  • n_cameras (int) – Number of cameras (2-14)

  • layout (str) – Camera arrangement - “grid”, “line”, or “ring”

  • spacing (float) – Distance between adjacent cameras (meters)

  • height_above_water (float) – Mean interface distance (meters)

  • height_variation (float) – Std dev of per-camera height variation (meters)

  • image_size (tuple[int, int]) – Image dimensions (width, height)

  • fov_deg (float) – Horizontal field of view

  • seed (int) – Random seed for reproducibility

Returns:

Tuple of (intrinsics, extrinsics, water_zs) dicts keyed by camera name. Camera “cam0” is always the reference camera at origin with identity rotation.

Return type:

tuple[dict[str, CameraIntrinsics], dict[str, CameraExtrinsics], dict[str, float]]

aquacal.datasets.synthetic.generate_real_rig_array()[source]

Generate camera array matching the real-world 12-camera rig.

Geometry is derived from an actual calibration of the AquaCal hardware rig (12 cameras, e3v8250 excluded) with the following idealizations applied:

  • Common intrinsics: focal length, principal point, and distortion are averaged across all 12 cameras.

  • All cameras placed at Z = 0 (average real Z ≈ 0).

  • All optical axes aligned to world +Z (looking straight down); real cameras deviate < 5 deg.

  • XY positions preserved from the real calibration.

  • Common water_z = 1.031 m (the calibrated value).

Returns:

Tuple of (intrinsics, extrinsics, water_zs) dicts keyed by camera name (cam0 … cam11).

Return type:

tuple[dict[str, CameraIntrinsics], dict[str, CameraExtrinsics], dict[str, float]]

aquacal.datasets.synthetic.generate_board_trajectory(n_frames, camera_positions, water_zs, depth_range=(0.3, 0.6), xy_extent=0.15, rotation_range_deg=15.0, min_cameras_per_frame=2, seed=42)[source]

Generate board poses ensuring pose graph connectivity.

Creates a trajectory that ensures:

  • Each frame is visible to at least min_cameras_per_frame cameras

  • The pose graph is connected (every camera can be chained back to the reference camera)

  • The board stays within a reasonable depth range underwater

Parameters:
  • n_frames (int) – Number of frames to generate

  • camera_positions (dict[str, ndarray[tuple[int, ...], dtype[float64]]]) – Dict of camera center positions (from extrinsics)

  • water_zs (dict[str, float]) – Per-camera interface distances

  • depth_range (tuple[float, float]) – (min_z, max_z) for board center in world coords

  • xy_extent (float) – Maximum XY offset from origin

  • rotation_range_deg (float) – Maximum board tilt from horizontal

  • min_cameras_per_frame (int) – Minimum number of cameras that must see the board in each frame

  • seed (int) – Random seed

Returns:

List of BoardPose objects with frame indices 0 to n_frames-1

Return type:

list[BoardPose]
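
The connectivity requirement can be illustrated with a small graph check. In this sketch, two cameras are linked when they see the board in the same frame, and the graph is connected when every camera is reachable from the reference camera. The function name and the visibility mapping are hypothetical and only illustrate the idea; the library's own check may be structured differently.

```python
from collections import deque


def cameras_connected(visibility, all_cameras, reference="cam0"):
    """Return True if every camera can be chained back to the reference.

    visibility: dict mapping frame index -> set of camera names seeing the board.
    Hypothetical helper for illustration; not part of aquacal.
    """
    # Two cameras are neighbors if they observe the board in the same frame.
    neighbors = {cam: set() for cam in all_cameras}
    for cams in visibility.values():
        cams = set(cams)
        for cam in cams:
            neighbors[cam].update(cams - {cam})

    # Breadth-first search outward from the reference camera.
    seen = {reference}
    queue = deque([reference])
    while queue:
        cam = queue.popleft()
        for other in neighbors[cam] - seen:
            seen.add(other)
            queue.append(other)
    return seen == set(all_cameras)


cams = ["cam0", "cam1", "cam2"]
chained = {0: {"cam0", "cam1"}, 1: {"cam1", "cam2"}}
split = {0: {"cam0", "cam1"}, 1: {"cam2"}}
print(cameras_connected(chained, cams), cameras_connected(split, cams))  # True False
```

In the first case cam2 is reached through cam1; in the second, cam2 never shares a frame with another camera, so it cannot be tied to the reference.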

aquacal.datasets.synthetic.generate_real_rig_trajectory(n_frames=100, depth_range=(1.1, 2.0), seed=42)[source]

Generate board trajectory appropriate for the real rig geometry.

The real rig has cameras at Z ≈ 0 with water surface at Z ≈ 1.03 m, so the board should be below the water surface (default 1.1–2.0 m, i.e. ~70–970 mm below the surface).

Trajectory covers the full field of view:

  • Positions sweep across the ~1.3 × 1.2 m footprint of the camera array

  • Ensures connectivity by visiting regions seen by multiple cameras

Parameters:
  • n_frames (int) – Number of frames to generate

  • depth_range (tuple[float, float]) – (min_z, max_z) for board center in world coords

  • seed (int) – Random seed

Returns:

List of BoardPose objects

Return type:

list[BoardPose]
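
As a quick check of the default depth range, the board depth below the water surface follows from subtracting the surface height from the board Z. The snippet assumes the common water_z of 1.031 m quoted for the real rig.

```python
water_z = 1.031            # common water surface Z for the real rig (meters)
depth_range = (1.1, 2.0)   # default board-center Z range (meters)

# Depth below the water surface, converted to whole millimeters.
below_surface_mm = tuple(round((z - water_z) * 1000) for z in depth_range)
print(below_surface_mm)  # (69, 969)
```

This matches the approximately 70–970 mm figure stated above.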

aquacal.datasets.synthetic.generate_dense_xy_grid(depth, n_grid=7, xy_extent=0.5, xy_center=(0.0, 0.0), tilt_deg=3.0, frame_offset=0, seed=42)[source]

Generate board poses at a regular XY grid at a fixed depth.

Used for dense spatial coverage in reconstruction evaluation and heatmaps. Each grid position has a small random tilt and random in-plane rotation.

Parameters:
  • depth (float) – Z coordinate for all board poses (meters)

  • n_grid (int) – Number of grid positions per axis (total poses = n_grid^2)

  • xy_extent (float) – Grid spans from -xy_extent to +xy_extent around xy_center in X and Y (meters)

  • xy_center (tuple[float, float]) – (x, y) center of the grid (meters). Should match the centroid of the camera array for best coverage.

  • tilt_deg (float) – Maximum random tilt from horizontal (degrees)

  • frame_offset (int) – Starting frame index (default 0)

  • seed (int) – Random seed for reproducible tilts and rotations

Returns:

List of n_grid^2 BoardPose objects with frame indices starting from frame_offset.

Return type:

list[BoardPose]
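
The grid layout described by n_grid, xy_extent, and xy_center can be sketched as follows. This hypothetical helper only computes the XY positions; it leaves out the random tilt and in-plane rotation, and the ordering of positions is an assumption.

```python
def grid_xy(n_grid=7, xy_extent=0.5, xy_center=(0.0, 0.0)):
    """Return n_grid**2 (x, y) positions on a regular grid.

    Hypothetical helper for illustration; not part of aquacal.
    """
    if n_grid == 1:
        return [tuple(xy_center)]
    # Offsets evenly spaced from -xy_extent to +xy_extent, inclusive.
    offsets = [-xy_extent + 2.0 * xy_extent * i / (n_grid - 1)
               for i in range(n_grid)]
    # Row-major order: x varies fastest (an arbitrary choice for this sketch).
    return [(xy_center[0] + dx, xy_center[1] + dy)
            for dy in offsets for dx in offsets]


positions = grid_xy()
print(len(positions))  # 49
```

Passing the camera array centroid as xy_center shifts the whole grid without changing its spacing.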

aquacal.datasets.synthetic.generate_synthetic_detections(intrinsics, extrinsics, water_zs, board, board_poses, noise_std=0.0, min_corners=8, seed=42)[source]

Generate synthetic detections by projecting through refractive interface.

For each board pose and camera:

  1. Transform board corners to world coordinates

  2. Project each corner through refractive interface

  3. Add Gaussian noise to pixel coordinates

  4. Filter corners outside image bounds

  5. Only include camera if >= min_corners visible

Parameters:
  • intrinsics (dict[str, CameraIntrinsics]) – Per-camera intrinsics

  • extrinsics (dict[str, CameraExtrinsics]) – Per-camera extrinsics

  • water_zs (dict[str, float]) – Per-camera interface distances

  • board (BoardGeometry) – Board geometry

  • board_poses (list[BoardPose]) – List of board poses

  • noise_std (float) – Gaussian noise standard deviation (pixels)

  • min_corners (int) – Minimum number of visible corners for a detection to be kept

  • seed (int) – Random seed for noise

Returns:

DetectionResult in the same format as produced by the real detection pipeline

Return type:

DetectionResult
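
The last three steps (noise, bounds filtering, and the min_corners threshold) can be sketched in isolation. The helper below is hypothetical and takes already-projected pixel coordinates as input; the refractive projection itself is not shown, and the input and output structures are assumptions rather than the library's DetectionResult format.

```python
import random


def noisy_visible_corners(pixels, image_size=(1920, 1080), noise_std=0.0,
                          min_corners=8, seed=42):
    """Apply noise, drop out-of-image corners, and enforce min_corners.

    pixels: dict mapping corner id -> (u, v) ideal projection for one
    camera and frame. Returns a dict of kept corners, or None when fewer
    than min_corners remain. Hypothetical helper; not part of aquacal.
    """
    rng = random.Random(seed)
    width, height = image_size
    kept = {}
    for corner_id, (u, v) in sorted(pixels.items()):
        # Step 3: add zero-mean Gaussian noise to the pixel coordinates.
        u += rng.gauss(0.0, noise_std)
        v += rng.gauss(0.0, noise_std)
        # Step 4: keep only corners that land inside the image.
        if 0.0 <= u < width and 0.0 <= v < height:
            kept[corner_id] = (u, v)
    # Step 5: the camera counts as a detection only with enough corners.
    return kept if len(kept) >= min_corners else None


inside = {i: (100.0 + 10 * i, 200.0) for i in range(10)}
outside = {100: (-5.0, 50.0), 101: (2000.0, 50.0)}
result = noisy_visible_corners({**inside, **outside})
print(len(result))  # 10
```

Using a seeded random.Random instance keeps the noise reproducible, in the same spirit as the seed parameter above.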

aquacal.datasets.synthetic.compute_calibration_errors(result, ground_truth)[source]

Compare calibration result to ground truth.

Computes:

  • focal_length_error_percent: Max relative error in fx, fy

  • principal_point_error_px: Max error in cx, cy

  • rotation_error_deg: Max rotation error across cameras

  • translation_error_mm: Max translation error across cameras

  • water_z_error_mm: Max interface distance error

Parameters:
  • result – Calibration result to evaluate

  • ground_truth – Ground truth to compare against

Returns:

Dict of error metrics

Return type:

dict[str, float]
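
Two of these metrics can be sketched for a single camera using common definitions: relative focal length error as a percentage, and rotation error as the angle of the relative rotation between the estimated and true rotation matrices. These helpers are hypothetical, and the exact definitions used by compute_calibration_errors are an assumption here.

```python
import math


def focal_length_error_percent(fx, fy, fx_true, fy_true):
    """Max relative error of fx and fy, in percent (assumed definition)."""
    return 100.0 * max(abs(fx - fx_true) / fx_true,
                       abs(fy - fy_true) / fy_true)


def rotation_error_deg(R_est, R_true):
    """Angle of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_est^T @ R_true) equals the sum of elementwise products.
    trace = sum(R_est[i][j] * R_true[i][j]
                for i in range(3) for j in range(3))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = math.radians(10.0)
rot_z_10 = [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]
focal_err = focal_length_error_percent(1010.0, 1000.0, 1000.0, 1000.0)
rot_err = rotation_error_deg(rot_z_10, identity)
print(round(focal_err, 3), round(rot_err, 3))  # 1.0 10.0
```

Taking the maximum of each per-camera value across all cameras would then give the "max ... across cameras" figures listed above.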

aquacal.datasets.synthetic.create_scenario(name, seed=42)[source]

Create a predefined test scenario with complete ground truth.

Available scenarios:

  • 'ideal': 4 cameras, 20 frames, 0 noise — verify math correctness

  • 'minimal': 2 cameras, 10 frames, 0.3 px noise — edge case

  • 'realistic': 12 cameras matching actual hardware, 30 frames, 0.5 px noise

All presets use the same ChArUco board (12x9 squares, 60 mm square size, 45 mm marker size, DICT_5X5_100).

Parameters:
  • name (str) – Scenario name ('ideal', 'minimal', or 'realistic')

  • seed (int) – Random seed for reproducibility

Returns:

SyntheticScenario with complete ground truth (intrinsics, extrinsics, interface distances, board poses).

Raises:

ValueError – If scenario name is not recognized.

Return type:

SyntheticScenario

Examples

>>> from aquacal.datasets import create_scenario
>>> scenario = create_scenario('ideal')
>>> print(f"{len(scenario.intrinsics)} cameras, {len(scenario.board_poses)} frames")
4 cameras, 20 frames
>>>
>>> scenario = create_scenario('realistic')
>>> print(f"{len(scenario.intrinsics)} cameras")
12 cameras