torch_em.data.datasets.electron_microscopy.hydra_vulgaris

Hydra vulgaris endodermal nerve net dataset for neuron instance segmentation in FIB-SEM.

The dataset contains a single FIB-SEM volume of the endodermal nerve net of Hydra vulgaris, with 20 completely reconstructed neurons. The EM image has a native resolution of 4 x 4 x 30 nm; the neuron segmentation is at 8 x 8 x 30 nm.

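Data is streamed from the BossDB public S3 bucket via cloud-volume and cached locally as zarr v3 stores in (z, y, x) axis order. As defined by HYDRA_CHUNK_SHAPE and HYDRA_SHARD_SHAPE, the stores use chunks of 64 x 128 x 128 voxels and shards of 128 x 512 x 512 voxels, with Blosc compression using zstd.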
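The shard shape also sets the download granularity: in the module source, `_hydra_download_to_zarr` fetches one shard-sized block per task. The following is a minimal sketch of that tiling in plain Python, assuming the shard shape (128, 512, 512) and the default box shape (600, 1024, 1024); the helper name `shard_tiles` is hypothetical and not part of the module.

```python
from itertools import product

SHARD_SHAPE = (128, 512, 512)  # (z, y, x), mirrors HYDRA_SHARD_SHAPE


def shard_tiles(shape, shard_shape=SHARD_SHAPE):
    """Yield ((z0, z1), (y0, y1), (x0, x1)) tiles that cover `shape` shard by shard."""
    starts = [range(0, dim, step) for dim, step in zip(shape, shard_shape)]
    for origin in product(*starts):
        # Clip each tile at the array border so partial shards are handled.
        yield tuple(
            (start, min(start + step, dim))
            for start, step, dim in zip(origin, shard_shape, shape)
        )


tiles = list(shard_tiles((600, 1024, 1024)))
print(len(tiles))  # 5 * 2 * 2 = 20 tiles
print(tiles[-1])   # ((512, 600), (512, 1024), (512, 1024)), clipped along z
```

Writing whole shards per task means that no two worker threads write to the same shard.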

This dataset is from the publication https://doi.org/10.1016/j.cub.2025.10.001. Please cite it if you use this dataset in your research.

The dataset is publicly available at https://bossdb.org/project/zhang2025 (DOI 10.60533/BOSS-2025-08G4). Streaming the data requires the cloud-volume package: pip install cloud-volume.

"""Hydra vulgaris endodermal nerve net dataset for neuron instance segmentation in FIB-SEM.

The dataset contains a single FIB-SEM volume of the endodermal nerve net of Hydra vulgaris,
with 20 completely reconstructed neurons. The EM image is at
4 x 4 x 30 nm native resolution; the neuron segmentation is at 8 x 8 x 30 nm.

Data is streamed from the BossDB public S3 bucket via cloud-volume and cached locally as
zarr v3 stores (chunks of 64 x 128 x 128 voxels, shards of 128 x 512 x 512 voxels,
Blosc compression with zstd) in (z, y, x) axis order.

This dataset is from the publication https://doi.org/10.1016/j.cub.2025.10.001.
Please cite it if you use this dataset in your research.

The dataset is publicly available at https://bossdb.org/project/zhang2025 (DOI 10.60533/BOSS-2025-08G4).
Requires cloud-volume: pip install cloud-volume.
"""

import hashlib
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Optional, Sequence, Tuple, Union

import numpy as np
from tqdm import tqdm
from torch.utils.data import Dataset, DataLoader

import torch_em

from .. import util


HYDRA_EM_URL = "precomputed://https://bossdb-open-data.s3.amazonaws.com/zhang2025/image"
HYDRA_SEG_URL = "precomputed://https://bossdb-open-data.s3.amazonaws.com/zhang2025/neurons"

# Pre-defined bounding boxes (nm): (x_min, x_max, y_min, y_max, z_min, z_max).
# Each box is 32768 x 32768 x 18000 nm, placed in the regions with the densest
# neuron annotations (verified by scanning the full volume).
# At default resolution (image_mip=3, seg_mip=2, both 32 x 32 x 30 nm) each box
# is 1024 x 1024 x 600 voxels (~630 MB image, ~2.5 GB neurons).
HYDRA_BOUNDING_BOXES = [
    (131072, 163840, 360448, 393216, 18000, 36000),
    (327680, 360448, 163840, 196608, 18000, 36000),
    (163840, 196608, 294912, 327680, 18000, 36000),
    (196608, 229376, 262144, 294912, 18000, 36000),
]

HYDRA_CHUNK_SHAPE = (64, 128, 128)
HYDRA_SHARD_SHAPE = (128, 512, 512)


def _hydra_bbox_to_str(bbox):
    return hashlib.md5("_".join(str(v) for v in bbox).encode()).hexdigest()[:12]


def _hydra_create_array(root, name, shape, dtype, is_label):
    from zarr.codecs import BloscCodec
    shuffle = "bitshuffle" if (np.issubdtype(dtype, np.integer) and is_label) else "shuffle"
    return root.create_array(
        name,
        shape=shape,
        chunks=HYDRA_CHUNK_SHAPE,
        shards=HYDRA_SHARD_SHAPE,
        dtype=dtype,
        compressors=BloscCodec(cname="zstd", clevel=6, shuffle=shuffle),
    )


def _hydra_bbox_voxels(cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm):
    scale = np.array(cv.resolution)
    x0 = int(np.floor(x_min_nm / scale[0]))
    x1 = int(np.ceil(x_max_nm / scale[0]))
    y0 = int(np.floor(y_min_nm / scale[1]))
    y1 = int(np.ceil(y_max_nm / scale[1]))
    z0 = int(np.floor(z_min_nm / scale[2]))
    z1 = int(np.ceil(z_max_nm / scale[2]))
    return x0, x1, y0, y1, z0, z1, (z1 - z0, y1 - y0, x1 - x0)


def _hydra_download_to_zarr(cv, ds, x0g, y0g, z0g, name):
    shape = ds.shape  # (z, y, x)
    sz, sy, sx = HYDRA_SHARD_SHAPE

    tasks = []
    for z0_ in range(0, shape[0], sz):
        for y0_ in range(0, shape[1], sy):
            for x0_ in range(0, shape[2], sx):
                z1_ = min(z0_ + sz, shape[0])
                y1_ = min(y0_ + sy, shape[1])
                x1_ = min(x0_ + sx, shape[2])
                tasks.append((
                    (z0_, z1_), (y0_, y1_), (x0_, x1_),
                    (x0g + x0_, x0g + x1_, y0g + y0_, y0g + y1_, z0g + z0_, z0g + z1_),
                ))

    target_dtype = np.dtype(ds.dtype)

    def worker(item):
        (z0_, z1_), (y0_, y1_), (x0_, x1_), (gx0, gx1, gy0, gy1, gz0, gz1) = item
        block = np.asarray(cv[gx0:gx1, gy0:gy1, gz0:gz1])
        if block.ndim == 4:
            block = block[..., 0]
        ds[z0_:z1_, y0_:y1_, x0_:x1_] = block.transpose(2, 1, 0).astype(target_dtype)

    with ThreadPoolExecutor(max_workers=8) as ex:
        futures = [ex.submit(worker, t) for t in tasks]
        for fut in tqdm(as_completed(futures), total=len(futures), desc=f"Downloading '{name}'", smoothing=0.05):
            fut.result()


def get_hydra_data(
    path: Union[os.PathLike, str],
    bounding_box: Tuple[float, ...],
    image_mip: int = 3,
    seg_mip: int = 2,
    download: bool = False,
) -> str:
    """Stream and cache one Hydra bounding box as a zarr v3 store.

    The zarr store contains:
      - raw: EM grayscale (uint8, z/y/x)
      - labels: neuron instance segmentation (uint32, z/y/x)

    Args:
        path: Filepath to a folder where the cached zarr store will be saved.
        bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max).
        image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm resolution.
        seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm resolution.
        download: Whether to stream and cache the data if not present.

    Returns:
        Filepath to the cached zarr store.
    """
    import zarr

    os.makedirs(str(path), exist_ok=True)
    # The store name depends only on the bounding box, not on the MIP levels.
    stem = _hydra_bbox_to_str(bounding_box)
    zarr_path = os.path.join(str(path), f"{stem}.zarr")

    def _complete(zp):
        # Only checks that both array directories exist, not that they are fully written.
        return os.path.isdir(os.path.join(zp, "raw")) and os.path.isdir(os.path.join(zp, "labels"))

    if _complete(zarr_path):
        return zarr_path
    if not download:
        raise RuntimeError(
            f"No cached data at '{zarr_path}'. Set download=True to stream from BossDB."
        )

    try:
        from cloudvolume import CloudVolume
    except ImportError:
        raise ImportError(
            "The 'cloud-volume' package is required: pip install cloud-volume"
        )

    x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm = bounding_box
    print(f"Streaming Hydra bbox {bounding_box} at image_mip={image_mip}, seg_mip={seg_mip} ...")

    em_cv = CloudVolume(HYDRA_EM_URL, use_https=True, mip=image_mip, progress=False, fill_missing=True)
    seg_cv = CloudVolume(HYDRA_SEG_URL, use_https=True, mip=seg_mip, progress=False, fill_missing=True)

    ex0, ex1, ey0, ey1, ez0, ez1, em_shape = _hydra_bbox_voxels(
        em_cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm
    )
    sx0, sx1, sy0, sy1, sz0, sz1, seg_shape = _hydra_bbox_voxels(
        seg_cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm
    )

    # Use the common extent so that raw and labels share one array shape.
    shape = tuple(min(e, s) for e, s in zip(em_shape, seg_shape))

    root = zarr.open_group(zarr_path, mode="a")
    root.attrs["bounding_box_nm"] = list(bounding_box)
    root.attrs["image_mip"] = image_mip
    root.attrs["seg_mip"] = seg_mip

    # An array that already exists in the store is skipped, including one
    # left partially filled by an interrupted download.
    if "raw" not in root:
        ds_raw = _hydra_create_array(root, "raw", shape, np.dtype("uint8"), is_label=False)
        _hydra_download_to_zarr(em_cv, ds_raw, ex0, ey0, ez0, name="raw")

    if "labels" not in root:
        ds_lbl = _hydra_create_array(root, "labels", shape, np.dtype("uint32"), is_label=True)
        _hydra_download_to_zarr(seg_cv, ds_lbl, sx0, sy0, sz0, name="labels")

    print(f"Cached to {zarr_path} (shape {shape})")
    return zarr_path


def get_hydra_paths(
    path: Union[os.PathLike, str],
    bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None,
    image_mip: int = 3,
    seg_mip: int = 2,
    download: bool = False,
) -> List[str]:
    """Get paths to cached Hydra zarr stores.

    Args:
        path: Filepath to a folder where the cached zarr stores will be saved.
        bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max).
            Defaults to the four pre-defined boxes covering the densest annotated region.
        image_mip: MIP level for the EM image.
        seg_mip: MIP level for the neuron segmentation.
        download: Whether to stream and cache the data if not present.

    Returns:
        Filepaths to the cached zarr stores.
    """
    boxes = list(bounding_boxes) if bounding_boxes is not None else HYDRA_BOUNDING_BOXES
    return [get_hydra_data(path, bb, image_mip, seg_mip, download) for bb in boxes]


def get_hydra_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int, int],
    bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None,
    image_mip: int = 3,
    seg_mip: int = 2,
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    **kwargs,
) -> Dataset:
    """Get the Hydra dataset for neuron instance segmentation in FIB-SEM.

    Args:
        path: Filepath to a folder where the cached zarr stores will be saved.
        patch_shape: The patch shape (z, y, x) to use for training.
        bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max).
            Defaults to the four pre-defined boxes covering the densest annotated region.
        image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm.
        seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm.
        download: Whether to stream and cache data if not already present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

    Returns:
        The segmentation dataset.
    """
    assert len(patch_shape) == 3
    paths = get_hydra_paths(path, bounding_boxes, image_mip, seg_mip, download)

    kwargs = util.update_kwargs(kwargs, "is_seg_dataset", True)
    kwargs, _ = util.add_instance_label_transform(
        kwargs, add_binary_target=False, boundaries=boundaries, offsets=offsets
    )

    return torch_em.default_segmentation_dataset(
        raw_paths=paths,
        raw_key="raw",
        label_paths=paths,
        label_key="labels",
        patch_shape=patch_shape,
        **kwargs,
    )


def get_hydra_loader(
    path: Union[os.PathLike, str],
    batch_size: int,
    patch_shape: Tuple[int, int, int],
    bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None,
    image_mip: int = 3,
    seg_mip: int = 2,
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    **kwargs,
) -> DataLoader:
    """Get the DataLoader for neuron instance segmentation in the Hydra vulgaris FIB-SEM dataset.

    Args:
        path: Filepath to a folder where the cached zarr stores will be saved.
        batch_size: The batch size for training.
        patch_shape: The patch shape (z, y, x) to use for training.
        bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max).
            Defaults to the four pre-defined boxes covering the densest annotated region.
        image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm.
        seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm.
        download: Whether to stream and cache data if not already present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset` or for the PyTorch DataLoader.

    Returns:
        The DataLoader.
    """
    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
    ds = get_hydra_dataset(
        path=path,
        patch_shape=patch_shape,
        bounding_boxes=bounding_boxes,
        image_mip=image_mip,
        seg_mip=seg_mip,
        download=download,
        offsets=offsets,
        boundaries=boundaries,
        **ds_kwargs,
    )
    return torch_em.get_data_loader(ds, batch_size=batch_size, **loader_kwargs)
HYDRA_EM_URL = 'precomputed://https://bossdb-open-data.s3.amazonaws.com/zhang2025/image'
HYDRA_SEG_URL = 'precomputed://https://bossdb-open-data.s3.amazonaws.com/zhang2025/neurons'
HYDRA_BOUNDING_BOXES = [(131072, 163840, 360448, 393216, 18000, 36000), (327680, 360448, 163840, 196608, 18000, 36000), (163840, 196608, 294912, 327680, 18000, 36000), (196608, 229376, 262144, 294912, 18000, 36000)]
HYDRA_CHUNK_SHAPE = (64, 128, 128)
HYDRA_SHARD_SHAPE = (128, 512, 512)
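To make the box sizes concrete, the sketch below converts the first pre-defined box from nanometers to voxel indices with the same floor/ceil rounding as `_hydra_bbox_voxels` in the source. It assumes a voxel size of 32 x 32 x 30 nm (the default MIP levels) instead of reading it from a CloudVolume object; `bbox_to_voxels` is a hypothetical helper, not part of the module.

```python
import math

# First pre-defined box in nm: (x_min, x_max, y_min, y_max, z_min, z_max).
BOX_NM = (131072, 163840, 360448, 393216, 18000, 36000)
RESOLUTION_NM = (32, 32, 30)  # assumed (x, y, z) voxel size at the default MIP levels


def bbox_to_voxels(box_nm, resolution_nm):
    """Return voxel (start, stop) per axis in x, y, z order and the (z, y, x) shape."""
    bounds = []
    for axis in range(3):
        lo, hi = box_nm[2 * axis], box_nm[2 * axis + 1]
        res = resolution_nm[axis]
        bounds.append((math.floor(lo / res), math.ceil(hi / res)))
    shape_zyx = tuple(stop - start for start, stop in reversed(bounds))
    return bounds, shape_zyx


bounds, shape = bbox_to_voxels(BOX_NM, RESOLUTION_NM)
n_voxels = shape[0] * shape[1] * shape[2]
print(bounds)  # [(4096, 5120), (11264, 12288), (600, 1200)]
print(shape)   # (600, 1024, 1024)
print(n_voxels, 4 * n_voxels)  # uncompressed bytes for uint8 raw and uint32 labels
```

The byte counts (about 630 MB and 2.5 GB) are uncompressed sizes; the stores on disk are compressed and can be smaller.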
def get_hydra_data(path: Union[os.PathLike, str], bounding_box: Tuple[float, ...], image_mip: int = 3, seg_mip: int = 2, download: bool = False) -> str:

Stream and cache one Hydra bounding box as a zarr v3 store.

The zarr store contains:
  • raw: EM grayscale (uint8, z/y/x)
  • labels: neuron instance segmentation (uint32, z/y/x)
Arguments:
  • path: Filepath to a folder where the cached zarr store will be saved.
  • bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max).
  • image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm resolution.
  • seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm resolution.
  • download: Whether to stream and cache the data if not present.
Returns:

Filepath to the cached zarr store.
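The default MIP levels differ between image and segmentation because the two volumes have different native resolutions. The arithmetic can be sketched as follows, assuming that each MIP level doubles the in-plane (x, y) voxel size and leaves z unchanged. This assumption is consistent with the defaults documented above, but the authoritative scales are those stored with the dataset; `resolution_at_mip` is a hypothetical helper.

```python
def resolution_at_mip(native_xy_nm, z_nm, mip):
    """Voxel size (x, y, z) in nm, assuming 2x in-plane downsampling per MIP level."""
    factor = 2 ** mip
    return (native_xy_nm * factor, native_xy_nm * factor, z_nm)


em_res = resolution_at_mip(4, 30, mip=3)   # EM image, native 4 x 4 x 30 nm
seg_res = resolution_at_mip(8, 30, mip=2)  # segmentation, native 8 x 8 x 30 nm
print(em_res)   # (32, 32, 30)
print(seg_res)  # (32, 32, 30)
```

When overriding `image_mip` or `seg_mip`, choose a pair that yields the same voxel size for both volumes so that raw and labels stay aligned voxel by voxel.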

def get_hydra_paths(path: Union[os.PathLike, str], bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None, image_mip: int = 3, seg_mip: int = 2, download: bool = False) -> List[str]:

Get paths to cached Hydra zarr stores.

Arguments:
  • path: Filepath to a folder where the cached zarr stores will be saved.
  • bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max). Defaults to the four pre-defined boxes covering the densest annotated region.
  • image_mip: MIP level for the EM image.
  • seg_mip: MIP level for the neuron segmentation.
  • download: Whether to stream and cache the data if not present.
Returns:

Filepaths to the cached zarr stores.
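Each returned path is one store per bounding box, named after a truncated MD5 hash of the box values. The sketch below reproduces that naming scheme from `_hydra_bbox_to_str` using only the standard library; `bbox_stem` and `expected_store_path` are hypothetical names, and `./hydra_cache` is a placeholder directory.

```python
import hashlib
import os


def bbox_stem(bbox):
    """12-character name derived from the MD5 hash of the box values joined by '_'."""
    key = "_".join(str(v) for v in bbox)
    return hashlib.md5(key.encode()).hexdigest()[:12]


def expected_store_path(cache_dir, bbox):
    """Path at which the store for `bbox` would be cached."""
    return os.path.join(str(cache_dir), f"{bbox_stem(bbox)}.zarr")


box = (131072, 163840, 360448, 393216, 18000, 36000)
float_box = tuple(float(v) for v in box)

print(expected_store_path("./hydra_cache", box))
# str(131072) and str(131072.0) differ, so the two boxes hash to different names.
print(bbox_stem(box) == bbox_stem(float_box))
```

Two consequences follow from this scheme: the same region passed once as integers and once as floats is cached twice, and the MIP levels are not part of the name, so a box cached at one resolution is reused when other MIP levels are requested later.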

def get_hydra_dataset(path: Union[os.PathLike, str], patch_shape: Tuple[int, int, int], bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None, image_mip: int = 3, seg_mip: int = 2, download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> torch.utils.data.dataset.Dataset:

Get the Hydra dataset for neuron instance segmentation in FIB-SEM.

Arguments:
  • path: Filepath to a folder where the cached zarr stores will be saved.
  • patch_shape: The patch shape (z, y, x) to use for training.
  • bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max). Defaults to the four pre-defined boxes covering the densest annotated region.
  • image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm.
  • seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm.
  • download: Whether to stream and cache data if not already present.
  • offsets: Offset values for affinity computation used as target.
  • boundaries: Whether to compute boundaries as the target.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:

The segmentation dataset.
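A possible way to call this function with a custom region is sketched below. The box is a hypothetical sub-region of the first pre-defined box (16384 x 16384 x 9000 nm, which is 512 x 512 x 300 voxels at the default resolution); whether it contains annotated neurons is not verified here. The import is kept inside the function so that the sketch can be loaded without torch_em installed, and nothing is downloaded until the function is called.

```python
# Hypothetical sub-region of the first pre-defined box, in nm:
# (x_min, x_max, y_min, y_max, z_min, z_max).
CUSTOM_BOX = (131072, 147456, 360448, 376832, 18000, 27000)
PATCH_SHAPE = (32, 256, 256)  # (z, y, x)


def build_hydra_dataset(cache_dir="./hydra_cache"):
    """Create a dataset with boundary targets for one custom box (streams data on first use)."""
    from torch_em.data.datasets.electron_microscopy.hydra_vulgaris import get_hydra_dataset

    return get_hydra_dataset(
        path=cache_dir,
        patch_shape=PATCH_SHAPE,
        bounding_boxes=[CUSTOM_BOX],
        download=True,    # stream from BossDB if the box is not cached yet
        boundaries=True,  # use boundary maps as the training target
    )
```

Passing `offsets` instead of `boundaries=True` would request affinity targets, as described in the argument list above.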

def get_hydra_loader(path: Union[os.PathLike, str], batch_size: int, patch_shape: Tuple[int, int, int], bounding_boxes: Optional[Sequence[Tuple[float, ...]]] = None, image_mip: int = 3, seg_mip: int = 2, download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> torch.utils.data.dataloader.DataLoader:

Get the DataLoader for neuron instance segmentation in the Hydra vulgaris FIB-SEM dataset.

Arguments:
  • path: Filepath to a folder where the cached zarr stores will be saved.
  • batch_size: The batch size for training.
  • patch_shape: The patch shape (z, y, x) to use for training.
  • bounding_boxes: Bounding boxes in nm (x_min, x_max, y_min, y_max, z_min, z_max). Defaults to the four pre-defined boxes covering the densest annotated region.
  • image_mip: MIP level for the EM image. Default mip=3 gives 32 x 32 x 30 nm.
  • seg_mip: MIP level for the neuron segmentation. Default mip=2 gives 32 x 32 x 30 nm.
  • download: Whether to stream and cache data if not already present.
  • offsets: Offset values for affinity computation used as target.
  • boundaries: Whether to compute boundaries as the target.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:

The DataLoader.
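A minimal usage sketch for the loader, assuming torch_em and cloud-volume are installed and that the cache directory `./hydra_cache` (a placeholder) is writable. The wrapper name `build_hydra_loader` is hypothetical. With the default boxes, the first call streams four regions, each about 630 MB (image) plus 2.5 GB (labels) before compression according to the source comments, so it can take a while.

```python
def build_hydra_loader(cache_dir="./hydra_cache", batch_size=2):
    """Create a training loader over the four pre-defined boxes (streams data on first use)."""
    # Imported lazily so that the sketch can be loaded without torch_em installed.
    from torch_em.data.datasets.electron_microscopy.hydra_vulgaris import get_hydra_loader

    return get_hydra_loader(
        path=cache_dir,
        batch_size=batch_size,
        patch_shape=(32, 256, 256),  # (z, y, x)
        download=True,
        boundaries=True,
        # Keyword arguments that the dataset does not accept are forwarded
        # to the PyTorch DataLoader.
        num_workers=4,
        shuffle=True,
    )
```

Calling `build_hydra_loader()` returns a regular PyTorch DataLoader that can be iterated over in a training loop.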