torch_em.data.datasets.electron_microscopy.hydra_vulgaris
Hydra vulgaris endodermal nerve net dataset for neuron instance segmentation in FIB-SEM.
The dataset contains a single FIB-SEM volume of the endodermal nerve net of Hydra vulgaris, with 20 completely reconstructed neurons. The EM image is at 4 x 4 x 30 nm native resolution; the neuron segmentation at 8 x 8 x 30 nm.
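The resolutions above correspond to the MIP levels used throughout this module. The sketch below checks that arithmetic under one stated assumption: each MIP level halves the x and y resolution and leaves z unchanged. This assumption is consistent with the documented defaults, where image_mip=3 on the 4 x 4 x 30 nm image and seg_mip=2 on the 8 x 8 x 30 nm segmentation both yield 32 x 32 x 30 nm. The helper names mip_resolution and bbox_to_voxel_shape are hypothetical. The floor/ceil rounding follows _hydra_bbox_voxels in the module source.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def mip_resolution(native_nm: Vec3, mip: int) -> Vec3:
    """Voxel size (x, y, z) in nm at a MIP level, assuming 2x downsampling in x and y per level."""
    x, y, z = native_nm
    factor = 2 ** mip
    return (x * factor, y * factor, z)


def bbox_to_voxel_shape(bbox_nm, resolution_nm: Vec3) -> Tuple[int, int, int]:
    """(z, y, x) voxel shape of an (x_min, x_max, y_min, y_max, z_min, z_max) box in nm."""
    x_min, x_max, y_min, y_max, z_min, z_max = bbox_nm
    rx, ry, rz = resolution_nm
    # Floor the lower bound and ceil the upper bound, as _hydra_bbox_voxels does.
    nx = math.ceil(x_max / rx) - math.floor(x_min / rx)
    ny = math.ceil(y_max / ry) - math.floor(y_min / ry)
    nz = math.ceil(z_max / rz) - math.floor(z_min / rz)
    return (nz, ny, nx)


em_res = mip_resolution((4, 4, 30), 3)   # EM image at image_mip=3
seg_res = mip_resolution((8, 8, 30), 2)  # segmentation at seg_mip=2
box = (131072, 163840, 360448, 393216, 18000, 36000)  # first entry of HYDRA_BOUNDING_BOXES
print(em_res, seg_res, bbox_to_voxel_shape(box, em_res))
```

With these values, each pre-defined 32768 x 32768 x 18000 nm box comes out at 600 x 1024 x 1024 voxels in (z, y, x) order. This matches the comment on HYDRA_BOUNDING_BOXES in the module source.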
Data is streamed from the BossDB public S3 bucket via cloud-volume and cached locally as zarr v3 stores in (z, y, x) axis order, with chunk shape (64, 128, 128), shard shape (128, 512, 512), and zstd compression.
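The sketch below shows what the chunk and shard shapes mean for one cached box by counting shards per array and chunks per shard. It takes HYDRA_CHUNK_SHAPE = (64, 128, 128) and HYDRA_SHARD_SHAPE = (128, 512, 512) from the module source and assumes the default 600 x 1024 x 1024 (z, y, x) box. grid_counts is a hypothetical helper.

```python
import math
from typing import Tuple

Shape = Tuple[int, int, int]


def grid_counts(array_shape: Shape, block_shape: Shape) -> Shape:
    """Blocks needed along each (z, y, x) axis, counting partial blocks at the far edge."""
    return tuple(math.ceil(a / b) for a, b in zip(array_shape, block_shape))


box_shape = (600, 1024, 1024)   # one default box at 32 x 32 x 30 nm, (z, y, x)
chunk_shape = (64, 128, 128)    # HYDRA_CHUNK_SHAPE
shard_shape = (128, 512, 512)   # HYDRA_SHARD_SHAPE

shards = grid_counts(box_shape, shard_shape)
chunks_per_shard = grid_counts(shard_shape, chunk_shape)
print(shards, math.prod(shards), chunks_per_shard, math.prod(chunks_per_shard))
```

This gives 5 x 2 x 2 = 20 shards per array, each holding up to 2 x 4 x 4 = 32 chunks. Because 600 is not a multiple of 128, the last shard along z covers only 88 slices. Zarr does not require the array shape to be a multiple of the shard shape. The sharding codec does require the chunk shape to divide the shard shape evenly, which (64, 128, 128) does for (128, 512, 512).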
This dataset is from the publication https://doi.org/10.1016/j.cub.2025.10.001. Please cite it if you use this dataset in your research.
The dataset is publicly available at https://bossdb.org/project/zhang2025 (DOI 10.60533/BOSS-2025-08G4). Requires cloud-volume: pip install cloud-volume.
"""Hydra vulgaris endodermal nerve net dataset for neuron instance segmentation in FIB-SEM.

The dataset contains a single FIB-SEM volume of the endodermal nerve net of Hydra vulgaris,
with 20 completely reconstructed neurons. The EM image is at 4 x 4 x 30 nm native resolution;
the neuron segmentation at 8 x 8 x 30 nm.

Data is streamed from the BossDB public S3 bucket via cloud-volume and cached locally as
zarr v3 stores (chunk (64, 128, 128), shard (128, 512, 512), zstd compression) in (z, y, x) axis order.

This dataset is from the publication https://doi.org/10.1016/j.cub.2025.10.001.
Please cite it if you use this dataset in your research.

The dataset is publicly available at https://bossdb.org/project/zhang2025 (DOI 10.60533/BOSS-2025-08G4).
Requires cloud-volume: pip install cloud-volume.
"""

import hashlib
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Optional, Sequence, Tuple, Union

import numpy as np
from tqdm import tqdm
from torch.utils.data import Dataset, DataLoader

import torch_em

from .. import util


HYDRA_EM_URL = "precomputed://https://bossdb-open-data.s3.amazonaws.com/zhang2025/image"
HYDRA_SEG_URL = "precomputed://https://bossdb-open-data.s3.amazonaws.com/zhang2025/neurons"

# Pre-defined bounding boxes (nm): (x_min, x_max, y_min, y_max, z_min, z_max).
# Each box is 32768 x 32768 x 18000 nm, placed in the regions with the densest
# neuron annotations (verified by scanning the full volume).
# At default resolution (image_mip=3, seg_mip=2, both 32 x 32 x 30 nm) each box
# is 1024 x 1024 x 600 voxels (uncompressed: ~630 MB image, ~2.5 GB neurons).
HYDRA_BOUNDING_BOXES = [
    (131072, 163840, 360448, 393216, 18000, 36000),
    (327680, 360448, 163840, 196608, 18000, 36000),
    (163840, 196608, 294912, 327680, 18000, 36000),
    (196608, 229376, 262144, 294912, 18000, 36000),
]

HYDRA_CHUNK_SHAPE = (64, 128, 128)
HYDRA_SHARD_SHAPE = (128, 512, 512)


def _hydra_bbox_to_str(bbox):
    return hashlib.md5("_".join(str(v) for v in bbox).encode()).hexdigest()[:12]


def _hydra_create_array(root, name, shape, dtype, is_label):
    from zarr.codecs import BloscCodec
    shuffle = "bitshuffle" if (np.issubdtype(dtype, np.integer) and is_label) else "shuffle"
    return root.create_array(
        name,
        shape=shape,
        chunks=HYDRA_CHUNK_SHAPE,
        shards=HYDRA_SHARD_SHAPE,
        dtype=dtype,
        compressors=BloscCodec(cname="zstd", clevel=6, shuffle=shuffle),
    )


def _hydra_bbox_voxels(cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm):
    scale = np.array(cv.resolution)
    x0 = int(np.floor(x_min_nm / scale[0]))
    x1 = int(np.ceil(x_max_nm / scale[0]))
    y0 = int(np.floor(y_min_nm / scale[1]))
    y1 = int(np.ceil(y_max_nm / scale[1]))
    z0 = int(np.floor(z_min_nm / scale[2]))
    z1 = int(np.ceil(z_max_nm / scale[2]))
    return x0, x1, y0, y1, z0, z1, (z1 - z0, y1 - y0, x1 - x0)


def _hydra_download_to_zarr(cv, ds, x0g, y0g, z0g, name):
    shape = ds.shape  # (z, y, x)
    sz, sy, sx = HYDRA_SHARD_SHAPE

    tasks = []
    for z0_ in range(0, shape[0], sz):
        for y0_ in range(0, shape[1], sy):
            for x0_ in range(0, shape[2], sx):
                z1_ = min(z0_ + sz, shape[0])
                y1_ = min(y0_ + sy, shape[1])
                x1_ = min(x0_ + sx, shape[2])
                tasks.append((
                    (z0_, z1_), (y0_, y1_), (x0_, x1_),
                    (x0g + x0_, x0g + x1_, y0g + y0_, y0g + y1_, z0g + z0_, z0g + z1_),
                ))

    target_dtype = np.dtype(ds.dtype)

    def worker(item):
        (z0_, z1_), (y0_, y1_), (x0_, x1_), (gx0, gx1, gy0, gy1, gz0, gz1) = item
        block = np.asarray(cv[gx0:gx1, gy0:gy1, gz0:gz1])
        if block.ndim == 4:
            block = block[..., 0]
        ds[z0_:z1_, y0_:y1_, x0_:x1_] = block.transpose(2, 1, 0).astype(target_dtype)

    with ThreadPoolExecutor(max_workers=8) as ex:
        futures = [ex.submit(worker, t) for t in tasks]
        for fut in tqdm(as_completed(futures), total=len(futures), desc=f"Downloading '{name}'", smoothing=0.05):
            fut.result()


def get_hydra_data(
    path: Union[os.PathLike, str],
    bounding_box: Tuple[float, ...],
    image_mip: int = 3,
    seg_mip: int = 2,
    download: bool = False,
) -> str:
    """Stream and cache one Hydra bounding box as a zarr v3 store.

    The zarr store contains:
    - raw: EM grayscale (uint8, z/y/x)
    - labels: neuron instance segmentation (uint32, z/y/x)

    Args:
        path: Filepath to a folder where the cached zarr store will be saved.
        bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max).
        image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm resolution.
        seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm resolution.
        download: Whether to stream and cache the data if not present.

    Returns:
        Filepath to the cached zarr store.
    """
    import zarr

    os.makedirs(str(path), exist_ok=True)
    stem = _hydra_bbox_to_str(bounding_box)
    zarr_path = os.path.join(str(path), f"{stem}.zarr")

    def _complete(zp):
        return os.path.isdir(os.path.join(zp, "raw")) and os.path.isdir(os.path.join(zp, "labels"))

    if _complete(zarr_path):
        return zarr_path
    if not download:
        raise RuntimeError(
            f"No cached data at '{zarr_path}'. Set download=True to stream from BossDB."
        )

    try:
        from cloudvolume import CloudVolume
    except ImportError:
        raise ImportError(
            "The 'cloud-volume' package is required: pip install cloud-volume"
        )

    x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm = bounding_box
    print(f"Streaming Hydra bbox {bounding_box} at image_mip={image_mip}, seg_mip={seg_mip} ...")

    em_cv = CloudVolume(HYDRA_EM_URL, use_https=True, mip=image_mip, progress=False, fill_missing=True)
    seg_cv = CloudVolume(HYDRA_SEG_URL, use_https=True, mip=seg_mip, progress=False, fill_missing=True)

    ex0, ex1, ey0, ey1, ez0, ez1, em_shape = _hydra_bbox_voxels(
        em_cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm
    )
    sx0, sx1, sy0, sy1, sz0, sz1, seg_shape = _hydra_bbox_voxels(
        seg_cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm
    )

    shape = tuple(min(e, s) for e, s in zip(em_shape, seg_shape))

    root = zarr.open_group(zarr_path, mode="a")
    root.attrs["bounding_box_nm"] = list(bounding_box)
    root.attrs["image_mip"] = image_mip
    root.attrs["seg_mip"] = seg_mip

    if "raw" not in root:
        ds_raw = _hydra_create_array(root, "raw", shape, np.dtype("uint8"), is_label=False)
        _hydra_download_to_zarr(em_cv, ds_raw, ex0, ey0, ez0, name="raw")

    if "labels" not in root:
        ds_lbl = _hydra_create_array(root, "labels", shape, np.dtype("uint32"), is_label=True)
        _hydra_download_to_zarr(seg_cv, ds_lbl, sx0, sy0, sz0, name="labels")

    print(f"Cached to {zarr_path} (shape {shape})")
    return zarr_path


def get_hydra_paths(
    path: Union[os.PathLike, str],
    bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None,
    image_mip: int = 3,
    seg_mip: int = 2,
    download: bool = False,
) -> List[str]:
    """Get paths to cached Hydra zarr stores.

    Args:
        path: Filepath to a folder where the cached zarr stores will be saved.
        bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max).
            Defaults to the four pre-defined boxes covering the densest annotated region.
        image_mip: MIP level for the EM image.
        seg_mip: MIP level for the neuron segmentation.
        download: Whether to stream and cache the data if not present.

    Returns:
        Filepaths to the cached zarr stores.
    """
    boxes = list(bounding_boxes) if bounding_boxes is not None else HYDRA_BOUNDING_BOXES
    return [get_hydra_data(path, bb, image_mip, seg_mip, download) for bb in boxes]


def get_hydra_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int, int],
    bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None,
    image_mip: int = 3,
    seg_mip: int = 2,
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    **kwargs,
) -> Dataset:
    """Get the Hydra dataset for neuron instance segmentation in FIB-SEM.

    Args:
        path: Filepath to a folder where the cached zarr stores will be saved.
        patch_shape: The patch shape (z, y, x) to use for training.
        bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max).
            Defaults to the four pre-defined boxes covering the densest annotated region.
        image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm.
        seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm.
        download: Whether to stream and cache data if not already present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

    Returns:
        The segmentation dataset.
    """
    assert len(patch_shape) == 3
    paths = get_hydra_paths(path, bounding_boxes, image_mip, seg_mip, download)

    kwargs = util.update_kwargs(kwargs, "is_seg_dataset", True)
    kwargs, _ = util.add_instance_label_transform(
        kwargs, add_binary_target=False, boundaries=boundaries, offsets=offsets
    )

    return torch_em.default_segmentation_dataset(
        raw_paths=paths,
        raw_key="raw",
        label_paths=paths,
        label_key="labels",
        patch_shape=patch_shape,
        **kwargs,
    )


def get_hydra_loader(
    path: Union[os.PathLike, str],
    batch_size: int,
    patch_shape: Tuple[int, int, int],
    bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None,
    image_mip: int = 3,
    seg_mip: int = 2,
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    **kwargs,
) -> DataLoader:
    """Get the DataLoader for neuron instance segmentation in the Hydra vulgaris FIB-SEM dataset.

    Args:
        path: Filepath to a folder where the cached zarr stores will be saved.
        batch_size: The batch size for training.
        patch_shape: The patch shape (z, y, x) to use for training.
        bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max).
            Defaults to the four pre-defined boxes covering the densest annotated region.
        image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm.
        seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm.
        download: Whether to stream and cache data if not already present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset` or for the PyTorch DataLoader.

    Returns:
        The DataLoader.
    """
    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
    ds = get_hydra_dataset(
        path=path,
        patch_shape=patch_shape,
        bounding_boxes=bounding_boxes,
        image_mip=image_mip,
        seg_mip=seg_mip,
        download=download,
        offsets=offsets,
        boundaries=boundaries,
        **ds_kwargs,
    )
    return torch_em.get_data_loader(ds, batch_size=batch_size, **loader_kwargs)
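_hydra_download_to_zarr walks the destination array in shard-sized tiles. For each tile it reads the matching region from the CloudVolume in (x, y, z) order at a global voxel offset, then writes it transposed to (z, y, x). The sketch below reproduces that tiling and reordering with plain NumPy so it can run offline. A small in-memory (x, y, z) array stands in for the CloudVolume, and the loop is sequential instead of using a ThreadPoolExecutor. iter_tiles and copy_xyz_source_to_zyx are hypothetical names.

```python
from typing import Iterator, Tuple

import numpy as np

Shape = Tuple[int, int, int]


def iter_tiles(shape: Shape, tile: Shape) -> Iterator[Tuple[slice, slice, slice]]:
    """Yield (z, y, x) slices that tile `shape` in steps of `tile`, clipped at the edges."""
    for z0 in range(0, shape[0], tile[0]):
        for y0 in range(0, shape[1], tile[1]):
            for x0 in range(0, shape[2], tile[2]):
                yield (
                    slice(z0, min(z0 + tile[0], shape[0])),
                    slice(y0, min(y0 + tile[1], shape[1])),
                    slice(x0, min(x0 + tile[2], shape[2])),
                )


def copy_xyz_source_to_zyx(source_xyz: np.ndarray, offset_xyz: Tuple[int, int, int],
                           dest_zyx: np.ndarray, tile: Shape) -> None:
    """Fill dest_zyx tile by tile from an (x, y, z)-indexed source starting at offset_xyz."""
    ox, oy, oz = offset_xyz
    for sz, sy, sx in iter_tiles(dest_zyx.shape, tile):
        # Read the matching region in (x, y, z) order, shifted by the global offset.
        block = source_xyz[ox + sx.start:ox + sx.stop,
                           oy + sy.start:oy + sy.stop,
                           oz + sz.start:oz + sz.stop]
        # Reorder (x, y, z) -> (z, y, x) before writing, as the module's worker does.
        dest_zyx[sz, sy, sx] = block.transpose(2, 1, 0)


rng = np.random.default_rng(0)
source = rng.integers(0, 255, size=(40, 30, 20), dtype=np.uint8)  # (x, y, z) stand-in volume
dest = np.zeros((12, 20, 25), dtype=np.uint8)                      # (z, y, x) destination
copy_xyz_source_to_zyx(source, (5, 3, 2), dest, tile=(5, 8, 8))
```

Tiling by the shard shape means each task writes a disjoint set of shards. This is presumably why the module can run eight threads against one zarr array without coordinating writes: a shard is stored as a single object, so two threads writing different chunks of the same shard could overwrite each other's data.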
def get_hydra_data(path: Union[os.PathLike, str], bounding_box: Tuple[float, ...], image_mip: int = 3, seg_mip: int = 2, download: bool = False) -> str:
Stream and cache one Hydra bounding box as a zarr v3 store.
The zarr store contains:
- raw: EM grayscale (uint8, z/y/x)
- labels: neuron instance segmentation (uint32, z/y/x)
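The dtypes listed above determine the footprint of each cached box. The sketch below computes uncompressed sizes for the default 600 x 1024 x 1024 (z, y, x) box. uncompressed_bytes is a hypothetical helper. On-disk sizes will be smaller after zstd compression, by an amount that depends on the data.

```python
import numpy as np


def uncompressed_bytes(shape, dtype) -> int:
    """Size in bytes of an uncompressed array with this shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


box_shape = (600, 1024, 1024)                            # (z, y, x) voxels of one default box
raw_bytes = uncompressed_bytes(box_shape, np.uint8)      # "raw" array
label_bytes = uncompressed_bytes(box_shape, np.uint32)   # "labels" array
print(f"raw: {raw_bytes / 1e6:.0f} MB, labels: {label_bytes / 1e9:.2f} GB")
```

The result is about 629 MB for raw and four times that (about 2.52 GB) for labels. This matches the ~630 MB and ~2.5 GB figures in the comment on HYDRA_BOUNDING_BOXES and shows that the uint32 labels dominate the cache size.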
Arguments:
- path: Filepath to a folder where the cached zarr store will be saved.
- bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max).
- image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm resolution.
- seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm resolution.
- download: Whether to stream and cache the data if not present.
Returns:
Filepath to the cached zarr store.
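The returned filepath depends only on the bounding box. get_hydra_data names the store after the first 12 hexadecimal digits of an MD5 hash of the box values joined with "_". It treats the store as cached once both the raw and labels directories exist. The sketch below mirrors those two steps with the standard library. bbox_cache_path and looks_complete are hypothetical names.

```python
import hashlib
import os
import tempfile


def bbox_cache_path(folder: str, bbox) -> str:
    """Zarr path for a box: first 12 hex digits of the MD5 of the '_'-joined values."""
    stem = hashlib.md5("_".join(str(v) for v in bbox).encode()).hexdigest()[:12]
    return os.path.join(folder, f"{stem}.zarr")


def looks_complete(zarr_path: str) -> bool:
    """Same test as the module's _complete: both array directories exist."""
    return all(os.path.isdir(os.path.join(zarr_path, key)) for key in ("raw", "labels"))


with tempfile.TemporaryDirectory() as tmp:
    zarr_path = bbox_cache_path(tmp, (131072, 163840, 360448, 393216, 18000, 36000))
    before = looks_complete(zarr_path)
    for key in ("raw", "labels"):
        os.makedirs(os.path.join(zarr_path, key))
    after = looks_complete(zarr_path)
print(before, after)
```

Two consequences follow from the source as written. First, the cache name ignores image_mip and seg_mip, so requesting the same box at other MIP levels returns the store cached earlier. The store's attrs record the levels that were actually used. Second, because the hash is taken over str(v), an integer box and the same box written with floats (131072 versus 131072.0) map to different stores. The directory check also does not by itself distinguish a fully downloaded array from one whose download was interrupted.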
def get_hydra_paths(path: Union[os.PathLike, str], bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None, image_mip: int = 3, seg_mip: int = 2, download: bool = False) -> List[str]:
Get paths to cached Hydra zarr stores.
Arguments:
- path: Filepath to a folder where the cached zarr stores will be saved.
- bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max). Defaults to the four pre-defined boxes covering the densest annotated region.
- image_mip: MIP level for the EM image.
- seg_mip: MIP level for the neuron segmentation.
- download: Whether to stream and cache the data if not present.
Returns:
Filepaths to the cached zarr stores.
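Custom regions are passed as bounding_boxes in (x_min, x_max, y_min, y_max, z_min, z_max) nm order. The sketch below builds such a box around an (x, y, z) centre point. bbox_around is a hypothetical helper that assumes even sizes and integer inputs. It keeps the values as ints, so the string form hashed by get_hydra_data matches a hand-written integer box.

```python
from typing import Tuple


def bbox_around(center_nm: Tuple[int, int, int], size_nm: Tuple[int, int, int]) -> Tuple[int, ...]:
    """(x_min, x_max, y_min, y_max, z_min, z_max) box in nm centred on an (x, y, z) point."""
    bounds = []
    for c, s in zip(center_nm, size_nm):
        lo = c - s // 2  # integer halving keeps int inputs as ints (assumes even sizes)
        bounds.extend((lo, lo + s))
    return tuple(bounds)


box = bbox_around((147456, 376832, 27000), (32768, 32768, 18000))
print(box)
```

Centred on (147456, 376832, 27000) with the default 32768 x 32768 x 18000 nm size, it reproduces the first entry of HYDRA_BOUNDING_BOXES. A list of such boxes can be passed as get_hydra_paths(path, bounding_boxes=[box], download=True).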
def get_hydra_dataset(path: Union[os.PathLike, str], patch_shape: Tuple[int, int, int], bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None, image_mip: int = 3, seg_mip: int = 2, download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> Dataset:
Get the Hydra dataset for neuron instance segmentation in FIB-SEM.
Arguments:
- path: Filepath to a folder where the cached zarr stores will be saved.
- patch_shape: The patch shape (z, y, x) to use for training.
- bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max). Defaults to the four pre-defined boxes covering the densest annotated region.
- image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm.
- seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm.
- download: Whether to stream and cache data if not already present.
- offsets: Offset values for affinity computation used as target.
- boundaries: Whether to compute boundaries as the target.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:
The segmentation dataset.
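When offsets are given, the labels are turned into affinity targets with one channel per offset. The sketch below is a plain NumPy illustration of that general idea, not torch_em's own transform. It assumes each offset is a [dz, dy, dx] triple matching the (z, y, x) patch layout. It marks a voxel 1 when the voxel and its neighbour at the offset carry the same non-zero label, and it leaves out-of-bounds neighbours at 0. torch_em's implementation may differ in conventions such as background handling, masking, or inverting the values. affinities is a hypothetical name.

```python
from typing import Sequence

import numpy as np


def affinities(labels: np.ndarray, offsets: Sequence[Sequence[int]]) -> np.ndarray:
    """One channel per offset: 1 where a voxel and its offset neighbour share a non-zero label."""
    out = np.zeros((len(offsets),) + labels.shape, dtype=np.float32)
    for c, offset in enumerate(offsets):
        # Voxels p whose neighbour p + offset lies inside the volume ...
        src = tuple(slice(max(0, -o), n - max(0, o)) for o, n in zip(offset, labels.shape))
        # ... and the matching neighbour positions p + offset.
        nbr = tuple(slice(max(0, o), n + min(0, o)) for o, n in zip(offset, labels.shape))
        a, b = labels[src], labels[nbr]
        out[c][src] = (a == b) & (a != 0)
    return out


labels = np.array([[[1, 1, 2],
                    [1, 0, 2]]])  # (z, y, x) = (1, 2, 3)
affs = affinities(labels, [[0, 0, -1], [0, -1, 0]])
print(affs[0].astype(int).tolist(), affs[1].astype(int).tolist())
```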
def get_hydra_loader(path: Union[os.PathLike, str], batch_size: int, patch_shape: Tuple[int, int, int], bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None, image_mip: int = 3, seg_mip: int = 2, download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> DataLoader:
Get the DataLoader for neuron instance segmentation in the Hydra vulgaris FIB-SEM dataset.
Arguments:
- path: Filepath to a folder where the cached zarr stores will be saved.
- batch_size: The batch size for training.
- patch_shape: The patch shape (z, y, x) to use for training.
- bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max). Defaults to the four pre-defined boxes covering the densest annotated region.
- image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm.
- seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm.
- download: Whether to stream and cache data if not already present.
- offsets: Offset values for affinity computation used as target.
- boundaries: Whether to compute boundaries as the target.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:
The DataLoader.
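get_hydra_loader routes its kwargs with util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs). Arguments meant for the dataset go one way and the rest go to the DataLoader. The sketch below shows one way such a split can work, by matching names against inspect.signature. It is an assumption about the general mechanism, not torch_em's actual code. Both split_kwargs_by_signature and the stand-in make_dataset are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_kwargs_by_signature(func: Callable, **kwargs: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into those `func` declares by name and everything else."""
    params = inspect.signature(func).parameters
    matched = {k: v for k, v in kwargs.items() if k in params}
    rest = {k: v for k, v in kwargs.items() if k not in params}
    return matched, rest


def make_dataset(raw_paths, patch_shape, n_samples=None, sampler=None):
    """Hypothetical stand-in for a dataset factory; only its signature matters here."""
    return None


ds_kwargs, loader_kwargs = split_kwargs_by_signature(make_dataset, n_samples=100, num_workers=4, shuffle=True)
print(ds_kwargs, loader_kwargs)
```

Under this scheme, any name the dataset function does not declare (here num_workers and shuffle) ends up with the DataLoader. A misspelled dataset argument would therefore be forwarded to the loader and only fail there.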