Datasets
Example data loaders and synthetic data generation.
Example Data Loader
Example dataset loading and management.
- class aquacal.datasets.loader.ExampleDataset(name, type, reference_calibration=None, metadata=<factory>, cache_path=None)
Bases: object
Example calibration dataset downloaded from Zenodo.
- Parameters:
name (str)
type (str)
reference_calibration (CalibrationResult | None)
metadata (dict)
cache_path (Path | None)
- reference_calibration
Optional reference calibration result
- Type:
CalibrationResult | None
- cache_path
Path to cached dataset files
- Type:
pathlib.Path | None
- aquacal.datasets.loader.load_example(name)
Load an example calibration dataset.
Downloads datasets from Zenodo on first use and caches them locally.
- Parameters:
name (str) – Dataset name. Available options:
- 'real-rig': Real hardware calibration (Zenodo download)
- Returns:
ExampleDataset with reference calibration and cache path
- Raises:
ValueError – If dataset name is not recognized
- Return type:
ExampleDataset
Examples
>>> from aquacal.datasets import load_example
>>> ds = load_example('real-rig')
>>> print(ds.cache_path)
Synthetic Data Generation
Synthetic data generation for testing and validation.
This module provides functions to generate synthetic calibration data with known
ground truth. The main entry point is create_scenario() which returns
predefined test scenarios with complete ground truth.
- class aquacal.datasets.synthetic.SyntheticScenario(name, board_config, intrinsics, extrinsics, water_zs, board_poses, noise_std, description, images=None)
Bases: object
Complete synthetic test scenario with ground truth.
- Parameters:
- board_config
ChArUco board specification
- intrinsics
Per-camera intrinsics
- extrinsics
Per-camera extrinsics
- board_poses
List of board poses for all frames
- images
Optional dict of rendered images (camera_name -> frame_idx -> image)
- Type:
dict[str, dict[int, numpy.ndarray[tuple[int, …], numpy.dtype[numpy._typing._array_like._ScalarType_co]]]] | None
- aquacal.datasets.synthetic.generate_camera_intrinsics(image_size=(1920, 1080), fov_horizontal_deg=60.0, principal_point_offset=(0.0, 0.0), distortion_k1=0.0, distortion_k2=0.0)
Generate camera intrinsics with specified parameters.
- Parameters:
- Returns:
CameraIntrinsics with computed K matrix and distortion
- Return type:
CameraIntrinsics
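As a rough illustration of how a K matrix can be derived from a horizontal field of view, the sketch below uses the standard pinhole relation fx = (width / 2) / tan(fov / 2). The helper name `intrinsics_from_fov` is hypothetical, and the square-pixel assumption (fy = fx) and the omission of distortion are simplifications; the library's own computation may differ.

```python
import numpy as np


def intrinsics_from_fov(image_size=(1920, 1080), fov_horizontal_deg=60.0,
                        principal_point_offset=(0.0, 0.0)):
    """Hypothetical helper: 3x3 pinhole K matrix from a horizontal FOV."""
    width, height = image_size
    # Focal length in pixels from the horizontal half-angle.
    fx = (width / 2.0) / np.tan(np.radians(fov_horizontal_deg) / 2.0)
    fy = fx  # assumption: square pixels
    # Principal point at the image center, shifted by the requested offset.
    cx = width / 2.0 + principal_point_offset[0]
    cy = height / 2.0 + principal_point_offset[1]
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])


K = intrinsics_from_fov()
print(K)
```

With the default 1920-pixel width and 60 degree field of view, the focal length comes out at roughly 1663 pixels.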
- aquacal.datasets.synthetic.generate_camera_array(n_cameras, layout='grid', spacing=0.1, height_above_water=0.15, height_variation=0.005, image_size=(1920, 1080), fov_deg=60.0, seed=42)
Generate a realistic camera array with known ground truth.
- Parameters:
n_cameras (int) – Number of cameras (2-14)
layout (str) – Camera arrangement - “grid”, “line”, or “ring”
spacing (float) – Distance between adjacent cameras (meters)
height_above_water (float) – Mean interface distance (meters)
height_variation (float) – Std dev of per-camera height variation (meters)
image_size (tuple[int, int]) – Image dimensions (width, height)
fov_deg (float) – Horizontal field of view
seed (int) – Random seed for reproducibility
- Returns:
Tuple of (intrinsics, extrinsics, water_zs) dicts keyed by camera name. Camera “cam0” is always the reference camera at origin with identity rotation.
- Return type:
tuple[dict[str, CameraIntrinsics], dict[str, CameraExtrinsics], dict[str, float]]
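The three layouts and the per-camera height variation can be pictured with a small standalone sketch. Everything here is an assumption made for illustration: the helper name `sketch_camera_layout`, the row-major grid ordering, the ring radius chosen so that adjacent cameras are `spacing` apart, and the shift that puts cam0 at the origin. It does not reproduce the library's actual placement logic.

```python
import numpy as np


def sketch_camera_layout(n_cameras, layout="grid", spacing=0.1,
                         height_above_water=0.15, height_variation=0.005,
                         seed=42):
    """Hypothetical helper: XY camera centers and interface distances (meters)."""
    if layout == "line":
        xy = np.array([[i * spacing, 0.0] for i in range(n_cameras)])
    elif layout == "grid":
        # Row-major fill of the smallest square grid that holds all cameras.
        cols = int(np.ceil(np.sqrt(n_cameras)))
        xy = np.array([[(i % cols) * spacing, (i // cols) * spacing]
                       for i in range(n_cameras)])
    elif layout == "ring":
        # Radius such that the chord between adjacent cameras equals spacing.
        radius = spacing / (2.0 * np.sin(np.pi / n_cameras))
        angles = 2.0 * np.pi * np.arange(n_cameras) / n_cameras
        xy = radius * np.column_stack([np.cos(angles), np.sin(angles)])
    else:
        raise ValueError(f"Unknown layout: {layout!r}")

    # Seeded generator so the height jitter is reproducible.
    rng = np.random.default_rng(seed)
    water_zs = height_above_water + rng.normal(0.0, height_variation, size=n_cameras)

    # Shift so cam0 sits at the XY origin (assumed reference convention).
    return xy - xy[0], water_zs


positions, water_zs = sketch_camera_layout(4, layout="ring")
print(positions)
print(water_zs)
```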
- aquacal.datasets.synthetic.generate_real_rig_array()
Generate camera array matching the real-world 12-camera rig.
Geometry is derived from an actual calibration of the AquaCal hardware rig (12 cameras, e3v8250 excluded) with the following idealizations applied:
- Common intrinsics: focal length, principal point, and distortion are averaged across all 12 cameras.
- All cameras placed at Z = 0 (average real Z ≈ 0).
- All optical axes aligned to world +Z (looking straight down); real cameras deviate < 5 deg.
- XY positions preserved from the real calibration.
- Common water_z = 1.031 m (the calibrated value).
- Returns:
Tuple of (intrinsics, extrinsics, water_zs) dicts keyed by camera name (cam0 … cam11).
- Return type:
tuple[dict[str, CameraIntrinsics], dict[str, CameraExtrinsics], dict[str, float]]
- aquacal.datasets.synthetic.generate_board_trajectory(n_frames, camera_positions, water_zs, depth_range=(0.3, 0.6), xy_extent=0.15, rotation_range_deg=15.0, min_cameras_per_frame=2, seed=42)
Generate board poses ensuring pose graph connectivity.
Creates a trajectory that ensures:
- Each frame is visible by at least min_cameras_per_frame cameras
- The pose graph is connected (can chain from reference to all cameras)
- Board stays within reasonable depth range underwater
- Parameters:
n_frames (int) – Number of frames to generate
camera_positions (dict[str, ndarray[tuple[int, ...], dtype[float64]]]) – Dict of camera center positions (from extrinsics)
water_zs (dict[str, float]) – Per-camera interface distances
depth_range (tuple[float, float]) – (min_z, max_z) for board center in world coords
xy_extent (float) – Maximum XY offset from origin
rotation_range_deg (float) – Maximum board tilt from horizontal
min_cameras_per_frame (int) – Minimum cameras that must see board
seed (int) – Random seed
- Returns:
List of BoardPose objects with frame indices 0 to n_frames-1
- Return type:
list[BoardPose]
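The connectivity requirement can be checked with a breadth-first search over a camera graph in which two cameras are linked whenever they see the board in the same frame. The sketch below is a generic illustration, not the library's implementation; `pose_graph_connected` and the `visibility` mapping (frame index to set of camera names) are hypothetical names.

```python
from collections import deque


def pose_graph_connected(visibility, cameras, reference="cam0"):
    """Hypothetical check: can every camera be chained back to the reference?

    visibility maps frame index -> set of camera names that see the board.
    """
    # Two cameras are neighbors when they share at least one frame.
    neighbors = {cam: set() for cam in cameras}
    for cams in visibility.values():
        cams = set(cams)
        for cam in cams:
            neighbors.setdefault(cam, set()).update(cams - {cam})

    if reference not in neighbors:
        return False

    # Breadth-first search outward from the reference camera.
    seen = {reference}
    queue = deque([reference])
    while queue:
        cam = queue.popleft()
        for other in neighbors[cam] - seen:
            seen.add(other)
            queue.append(other)

    return set(cameras) <= seen


cameras = ["cam0", "cam1", "cam2"]
chained = {0: {"cam0", "cam1"}, 1: {"cam1", "cam2"}}
isolated = {0: {"cam0", "cam1"}, 1: {"cam2"}}
print(pose_graph_connected(chained, cameras))
print(pose_graph_connected(isolated, cameras))
```

In the first mapping cam2 reaches cam0 through cam1; in the second, cam2 never shares a frame with another camera, so the graph is disconnected.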
- aquacal.datasets.synthetic.generate_real_rig_trajectory(n_frames=100, depth_range=(1.1, 2.0), seed=42)
Generate board trajectory appropriate for the real rig geometry.
The real rig has cameras at Z ≈ 0 with water surface at Z ≈ 1.03 m, so the board should be below the water surface (default 1.1–2.0 m, i.e. ~70–970 mm below the surface).
Trajectory covers the full field of view:
- Positions sweep across the ~1.3 × 1.2 m footprint of the camera array
- Ensures connectivity by visiting regions seen by multiple cameras
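The relation between the default depth range and the water surface is simple arithmetic on the values quoted above (water_z = 1.031 m, depth range 1.1 to 2.0 m). The variable names below are illustrative only.

```python
WATER_Z_M = 1.031            # calibrated interface distance quoted for the real rig
DEPTH_RANGE_M = (1.1, 2.0)   # default depth_range

# Distance of each depth limit below the water surface, in millimeters.
below_surface_mm = tuple(round((z - WATER_Z_M) * 1000.0) for z in DEPTH_RANGE_M)
print(below_surface_mm)  # (69, 969)
```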
- aquacal.datasets.synthetic.generate_dense_xy_grid(depth, n_grid=7, xy_extent=0.5, xy_center=(0.0, 0.0), tilt_deg=3.0, frame_offset=0, seed=42)
Generate board poses at a regular XY grid at a fixed depth.
Used for dense spatial coverage in reconstruction evaluation and heatmaps. Each grid position has a small random tilt and random in-plane rotation.
- Parameters:
depth (float) – Z coordinate for all board poses (meters)
n_grid (int) – Number of grid positions per axis (total poses = n_grid^2)
xy_extent (float) – Grid spans from -xy_extent to +xy_extent around xy_center in X and Y (meters)
xy_center (tuple[float, float]) – (x, y) center of the grid (meters). Should match the centroid of the camera array for best coverage.
tilt_deg (float) – Maximum random tilt from horizontal (degrees)
frame_offset (int) – Starting frame index (default 0)
seed (int) – Random seed for reproducible tilts and rotations
- Returns:
List of n_grid^2 BoardPose objects with frame indices starting from frame_offset.
- Return type:
list[BoardPose]
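The grid geometry itself can be sketched with `numpy.linspace`. This simplified version returns plain `(frame_idx, position)` tuples instead of BoardPose objects and leaves out the random tilt and in-plane rotation; the helper name `sketch_dense_xy_grid` and the row-major ordering are assumptions.

```python
import numpy as np


def sketch_dense_xy_grid(depth, n_grid=7, xy_extent=0.5,
                         xy_center=(0.0, 0.0), frame_offset=0):
    """Hypothetical helper: board center positions on a regular XY grid."""
    # n_grid evenly spaced coordinates spanning +/- xy_extent around the center.
    xs = np.linspace(-xy_extent, xy_extent, n_grid) + xy_center[0]
    ys = np.linspace(-xy_extent, xy_extent, n_grid) + xy_center[1]

    poses = []
    frame_idx = frame_offset
    for y in ys:          # row-major order (an arbitrary choice for this sketch)
        for x in xs:
            poses.append((frame_idx, np.array([x, y, depth])))
            frame_idx += 1
    return poses


grid = sketch_dense_xy_grid(depth=1.5, n_grid=7, frame_offset=100)
print(len(grid), grid[0][0], grid[-1][0])
```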
- aquacal.datasets.synthetic.generate_synthetic_detections(intrinsics, extrinsics, water_zs, board, board_poses, noise_std=0.0, min_corners=8, seed=42)
Generate synthetic detections by projecting through refractive interface.
For each board pose and camera:
1. Transform board corners to world coordinates
2. Project each corner through refractive interface
3. Add Gaussian noise to pixel coordinates
4. Filter corners outside image bounds
5. Only include camera if >= min_corners visible
- Parameters:
intrinsics (dict[str, CameraIntrinsics]) – Per-camera intrinsics
extrinsics (dict[str, CameraExtrinsics]) – Per-camera extrinsics
water_zs (dict[str, float]) – Per-camera interface distances
board (BoardGeometry) – Board geometry
noise_std (float) – Gaussian noise standard deviation (pixels)
min_corners (int) – Minimum corners for valid detection
seed (int) – Random seed for noise
- Returns:
DetectionResult matching format from real detection pipeline
- Return type:
DetectionResult
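Steps 2 to 5 above can be sketched for a single downward-looking camera at the world origin with a flat air-water interface at Z = water_z. The sketch finds the point where the ray crosses the interface by bisection on Snell's law and then applies a pinhole projection. All names (`refract_interface_radius`, `project_refractive`, `simulate_detection`), the refractive indices, and the identity camera pose are assumptions for illustration; the library's projection model may be implemented differently.

```python
import numpy as np

N_AIR, N_WATER = 1.0, 1.333  # assumed refractive indices


def refract_interface_radius(r, h, d, iters=60):
    """Radial distance from the optical axis where the ray crosses the interface.

    r: horizontal camera-to-point distance, h: camera height above the
    interface, d: point depth below the interface (meters, h > 0, d > 0).
    """
    def snell_residual(x):
        sin_air = x / np.hypot(x, h)
        sin_water = (r - x) / np.hypot(r - x, d)
        return N_AIR * sin_air - N_WATER * sin_water

    lo, hi = 0.0, r  # residual is <= 0 at 0, >= 0 at r, and increasing in x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if snell_residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def project_refractive(point, K, water_z):
    """Pixel of an underwater point (z > water_z) seen by a camera at the origin."""
    r = np.hypot(point[0], point[1])
    if r == 0.0:
        interface = np.array([0.0, 0.0, water_z])
    else:
        x = refract_interface_radius(r, water_z, point[2] - water_z)
        interface = np.array([point[0] * x / r, point[1] * x / r, water_z])
    uvw = K @ interface  # the in-air ray segment is a straight pinhole ray
    return uvw[:2] / uvw[2]


def simulate_detection(points, K, water_z, image_size,
                       noise_std=0.0, min_corners=8, seed=42):
    """Project, add Gaussian pixel noise, drop out-of-image corners."""
    rng = np.random.default_rng(seed)
    pixels = np.array([project_refractive(p, K, water_z) for p in points])
    pixels = pixels + rng.normal(0.0, noise_std, size=pixels.shape)
    width, height = image_size
    inside = ((pixels[:, 0] >= 0) & (pixels[:, 0] < width) &
              (pixels[:, 1] >= 0) & (pixels[:, 1] < height))
    corner_ids = np.flatnonzero(inside)
    if corner_ids.size < min_corners:
        return None  # too few visible corners: camera excluded for this frame
    return corner_ids, pixels[inside]
```

Because the ray bends toward the surface normal on entering water, the interface crossing lies farther from the optical axis than a straight line to the point would, which is why a plain pinhole projection of the underwater point would place the corner too close to the image center.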
- aquacal.datasets.synthetic.compute_calibration_errors(result, ground_truth)
Compare calibration result to ground truth.
Computes:
- focal_length_error_percent: Max relative error in fx, fy
- principal_point_error_px: Max error in cx, cy
- rotation_error_deg: Max rotation error across cameras
- translation_error_mm: Max translation error across cameras
- water_z_error_mm: Max interface distance error
- Parameters:
result (CalibrationResult) – Calibration result from pipeline
ground_truth (SyntheticScenario) – Synthetic scenario with known truth
- Returns:
Dict of error metrics
- Return type:
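For orientation, metrics of this kind are commonly computed as in the sketch below: the rotation error is the angle of the relative rotation, the focal length error is the larger relative deviation of fx and fy, and the translation error is a Euclidean distance converted to millimeters. The function names and the exact formulas are assumptions; the library's definitions may differ in detail.

```python
import numpy as np


def rotation_error_deg(R_est, R_true):
    """Angle (degrees) of the relative rotation between two rotation matrices."""
    R_delta = R_est @ R_true.T
    # Clip guards against values slightly outside [-1, 1] from rounding.
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


def focal_length_error_percent(K_est, K_true):
    """Largest relative error of fx and fy, in percent."""
    rel_fx = abs(K_est[0, 0] - K_true[0, 0]) / K_true[0, 0]
    rel_fy = abs(K_est[1, 1] - K_true[1, 1]) / K_true[1, 1]
    return 100.0 * max(rel_fx, rel_fy)


def translation_error_mm(t_est, t_true):
    """Euclidean distance between two translations given in meters."""
    return 1000.0 * float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_true)))


# Example: an estimate rotated 2 degrees about Z relative to ground truth.
angle = np.radians(2.0)
R_est = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
print(rotation_error_deg(R_est, np.eye(3)))
```

Taking the maximum of these per-camera values across all cameras gives a single worst-case number per metric, which matches the "Max ... across cameras" wording above.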
- aquacal.datasets.synthetic.create_scenario(name, seed=42)
Create a predefined test scenario with complete ground truth.
Available scenarios:
- 'ideal': 4 cameras, 20 frames, 0 noise (verifies math correctness)
- 'minimal': 2 cameras, 10 frames, 0.3 px noise (edge case)
- 'realistic': 12 cameras matching actual hardware, 30 frames, 0.5 px noise
All presets use the same ChArUco board (12x9 squares, 60 mm square size, 45 mm marker size, DICT_5X5_100).
- Parameters:
- Returns:
SyntheticScenario with complete ground truth (intrinsics, extrinsics, interface distances, board poses).
- Raises:
ValueError – If scenario name is not recognized.
- Return type:
SyntheticScenario
Examples
>>> from aquacal.datasets import create_scenario
>>> scenario = create_scenario('ideal')
>>> print(f"{len(scenario.intrinsics)} cameras, {len(scenario.board_poses)} frames")
4 cameras, 20 frames
>>>
>>> scenario = create_scenario('realistic')
>>> print(f"{len(scenario.intrinsics)} cameras")
12 cameras