torch_em.data.datasets.electron_microscopy.hemibrain

The Hemibrain dataset contains a FIB-SEM volume of the Drosophila central brain with dense neuron instance segmentation and additional volumetric annotations.

It covers approximately half of the central brain at 8 nm isotropic resolution, with ~25,000 neurons reconstructed and proofread across regions including the mushroom body, central complex, lateral horn, and portions of the optic lobe.

Three label types are available via the label_choice parameter:

  • "neurons": dense neuron instance segmentation (uint64, 8 nm). Nearly fully dense (~99% of voxels labeled), each neuron/glial cell gets a unique ID. Background (0) is only extracellular space.

  • "mito": mitochondria detection (uint64, 16 nm, upsampled to 8 nm for storage). Four classes - in practice ID=1 ("special") is the mitochondria class (~10% of voxels), ID=4 ("unlabeled") is background (~88%), ID=3 ("background") ~2%, ID=2 ("interior") ~0.1%. For binary training use ID=1 as foreground.

  • "tissue": coarse tissue type semantic map (uint64, 16 nm, upsampled to 8 nm). Seven classes covering the full volume: 1=broken white tissue (<1%), 2=trachea (<1%), 3=cell bodies (~2%), 4=dark matter (~1%), 5=large dendrites (~14%), 6=neuropil (~83%), 7=out of bounds. Useful as a region prior or multi-class semantic target.

Bounding boxes are always specified in 8 nm voxel coordinates. For mito and tissue (native 16 nm), coordinates are halved during download and labels are upsampled back to 8 nm using nearest-neighbour rescaling (skimage.transform.rescale, order=0).

This dataset is from the publication https://doi.org/10.7554/eLife.57443. Please cite it if you use this dataset in your research.

The dataset is publicly available at https://www.janelia.org/project-team/flyem/hemibrain. Requires cloud-volume: pip install cloud-volume.

NOTE (on data size): the full EM volume is (34427, 39725, 41394) voxels at 8 nm isotropic resolution. Downloading the entire volume is not feasible. Data is instead accessed by specifying bounding boxes (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates, streamed from GCS and cached locally as HDF5 files.

  1"""The Hemibrain dataset contains a FIB-SEM volume of the Drosophila central brain
  2with dense neuron instance segmentation and additional volumetric annotations.
  3
  4It covers approximately half of the central brain at 8 nm isotropic resolution,
  5with ~25,000 neurons reconstructed and proofread across regions including the
  6mushroom body, central complex, lateral horn, and portions of the optic lobe.
  7
  8Three label types are available via the `label_choice` parameter:
  9
 10- "neurons": dense neuron instance segmentation (uint64, 8 nm). Nearly fully dense
 11  (~99% of voxels labeled), each neuron/glial cell gets a unique ID. Background (0)
 12  is only extracellular space.
 13
 14- "mito": mitochondria detection (uint64, 16 nm, upsampled to 8 nm for storage).
 15  Four classes - in practice ID=1 ("special") is the mitochondria class (~10% of
 16  voxels), ID=4 ("unlabeled") is background (~88%), ID=3 ("background") ~2%,
 17  ID=2 ("interior") ~0.1%. For binary training use ID=1 as foreground.
 18
 19- "tissue": coarse tissue type semantic map (uint64, 16 nm, upsampled to 8 nm).
 20  Seven classes covering the full volume:
 21  1=broken white tissue (<1%), 2=trachea (<1%), 3=cell bodies (~2%),
 22  4=dark matter (~1%), 5=large dendrites (~14%), 6=neuropil (~83%),
 23  7=out of bounds. Useful as a region prior or multi-class semantic target.
 24
 25Bounding boxes are always specified in 8 nm voxel coordinates. For mito and tissue
 26(native 16 nm), coordinates are halved during download and labels are upsampled back
 27to 8 nm using nearest-neighbour rescaling (skimage.transform.rescale, order=0).
 28
 29This dataset is from the publication https://doi.org/10.7554/eLife.57443.
 30Please cite it if you use this dataset in your research.
 31
 32The dataset is publicly available at https://www.janelia.org/project-team/flyem/hemibrain.
 33Requires cloud-volume: pip install cloud-volume.
 34
 35NOTE (on data size): the full EM volume is (34427, 39725, 41394) voxels at 8 nm isotropic
 36resolution. Downloading the entire volume is not feasible. Data is instead accessed by
 37specifying bounding boxes (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel
 38coordinates, streamed from GCS and cached locally as HDF5 files.
 39"""
 40
 41import hashlib
 42import os
 43from typing import List, Literal, Optional, Tuple, Union
 44
 45import numpy as np
 46from torch.utils.data import DataLoader, Dataset
 47
 48import torch_em
 49from .. import util
 50
 51
 52EM_URL = "gs://neuroglancer-janelia-flyem-hemibrain/emdata/clahe_yz/jpeg"
 53SEG_URL = "gs://neuroglancer-janelia-flyem-hemibrain/v1.2/segmentation"
 54MITO_URL = "gs://neuroglancer-janelia-flyem-hemibrain/mito_20190717.27250582"
 55TISSUE_URL = "gs://neuroglancer-janelia-flyem-hemibrain/mask_normalized_round6"
 56
 57LABEL_URLS = {
 58    "neurons": SEG_URL,
 59    "mito": MITO_URL,
 60    "tissue": TISSUE_URL,
 61}
 62# Mito and tissue are at 16 nm (factor 2 coarser than the 8 nm EM).
 63LABEL_RESOLUTION_FACTOR = {"neurons": 1, "mito": 2, "tissue": 2}
 64
 65# A representative 1024³-voxel subvolume near the centre of the reconstructed region.
 66# Units are 8 nm voxels in (x, y, z) order, matching the CloudVolume coordinate space.
 67DEFAULT_BOUNDING_BOX = (15000, 16024, 18000, 19024, 18000, 19024)
 68
 69
 70def _bbox_to_str(bbox):
 71    return hashlib.md5("_".join(str(v) for v in bbox).encode()).hexdigest()[:12]
 72
 73
 74def get_hemibrain_data(
 75    path: Union[os.PathLike, str],
 76    bounding_box: Tuple[int, int, int, int, int, int] = DEFAULT_BOUNDING_BOX,
 77    label_choice: Literal["neurons", "mito", "tissue"] = "neurons",
 78    download: bool = False,
 79) -> str:
 80    """Stream a subvolume from the Hemibrain dataset and cache it as an HDF5 file.
 81
 82    Args:
 83        path: Filepath to a folder where the cached HDF5 file will be saved.
 84        bounding_box: The region to fetch as (x_min, x_max, y_min, y_max, z_min, z_max)
 85            in 8 nm voxel coordinates. Defaults to a central 1024³ training region.
 86        label_choice: Which labels to cache alongside the EM. One of "neurons", "mito",
 87            or "tissue". Mito and tissue are at 16 nm and are resampled to match the EM.
 88        download: Whether to stream and cache the data if it is not present.
 89
 90    Returns:
 91        The filepath to the cached HDF5 file.
 92    """
 93    import h5py
 94
 95    os.makedirs(str(path), exist_ok=True)
 96    h5_path = os.path.join(str(path), f"{label_choice}_{_bbox_to_str(bounding_box)}.h5")
 97    if os.path.exists(h5_path):
 98        return h5_path
 99
100    if not download:
101        raise RuntimeError(
102            f"No cached data found at '{h5_path}'. Set download=True to stream it from GCS."
103        )
104
105    try:
106        import cloudvolume
107    except ImportError:
108        raise ImportError("The 'cloud-volume' package is required: pip install cloud-volume")
109
110    x_min, x_max, y_min, y_max, z_min, z_max = bounding_box
111    print(f"Streaming Hemibrain EM + {label_choice} for bbox {bounding_box} ...")
112
113    em_vol = cloudvolume.CloudVolume(EM_URL, use_https=True, mip=0, progress=True)
114    raw = np.array(em_vol[x_min:x_max, y_min:y_max, z_min:z_max])[..., 0].transpose(2, 1, 0)
115
116    factor = LABEL_RESOLUTION_FACTOR[label_choice]
117    lx0, lx1 = x_min // factor, x_max // factor
118    ly0, ly1 = y_min // factor, y_max // factor
119    lz0, lz1 = z_min // factor, z_max // factor
120    lbl_vol = cloudvolume.CloudVolume(LABEL_URLS[label_choice], use_https=True, mip=0, progress=True)
121    labels = np.array(lbl_vol[lx0:lx1, ly0:ly1, lz0:lz1])[..., 0].transpose(2, 1, 0)
122    if factor > 1:
123        from skimage.transform import rescale
124        labels = rescale(labels, factor, order=0, anti_aliasing=False, preserve_range=True).astype(labels.dtype)
125        labels = labels[:raw.shape[0], :raw.shape[1], :raw.shape[2]]
126
127    with h5py.File(h5_path, "w", locking=False) as f:
128        f.attrs["bounding_box"] = bounding_box
129        f.attrs["label_choice"] = label_choice
130        f.attrs["resolution_nm"] = em_vol.resolution.tolist()
131        f.create_dataset("raw", data=raw.astype("uint8"), compression="gzip", chunks=True)
132        f.create_dataset("labels", data=labels.astype("uint64"), compression="gzip", chunks=True)
133
134    print(f"Cached to {h5_path} (raw {raw.shape}, labels {labels.shape})")
135    return h5_path
136
137
138def get_hemibrain_paths(
139    path: Union[os.PathLike, str],
140    bounding_boxes: Optional[List[Tuple[int, int, int, int, int, int]]] = None,
141    label_choice: Literal["neurons", "mito", "tissue"] = "neurons",
142    download: bool = False,
143) -> List[str]:
144    """Get paths to Hemibrain HDF5 cache files.
145
146    Args:
147        path: Filepath to a folder where the cached HDF5 files will be saved.
148        bounding_boxes: List of regions to fetch, each as
149            (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
150            Defaults to [DEFAULT_BOUNDING_BOX].
151        label_choice: Which labels to use. One of "neurons", "mito", or "tissue".
152        download: Whether to stream and cache the data if it is not present.
153
154    Returns:
155        List of filepaths to the cached HDF5 files.
156    """
157    if bounding_boxes is None:
158        bounding_boxes = [DEFAULT_BOUNDING_BOX]
159    return [get_hemibrain_data(path, bbox, label_choice, download) for bbox in bounding_boxes]
160
161
162def get_hemibrain_dataset(
163    path: Union[os.PathLike, str],
164    patch_shape: Tuple[int, int, int],
165    bounding_boxes: Optional[List[Tuple[int, int, int, int, int, int]]] = None,
166    label_choice: Literal["neurons", "mito", "tissue"] = "neurons",
167    download: bool = False,
168    offsets: Optional[List[List[int]]] = None,
169    boundaries: bool = False,
170    **kwargs,
171) -> Dataset:
172    """Get the Hemibrain dataset for neuron or organelle segmentation.
173
174    Args:
175        path: Filepath to a folder where the cached HDF5 files will be saved.
176        patch_shape: The patch shape (z, y, x) to use for training.
177        bounding_boxes: List of subvolumes to use, each as
178            (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
179            Defaults to [DEFAULT_BOUNDING_BOX] - a central 1024³ region.
180        label_choice: Which labels to use. One of "neurons", "mito", or "tissue".
181        download: Whether to stream and cache data if not already present.
182        offsets: Offset values for affinity computation used as target.
183        boundaries: Whether to compute boundaries as the target.
184        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.
185
186    Returns:
187        The segmentation dataset.
188    """
189    assert len(patch_shape) == 3
190
191    paths = get_hemibrain_paths(path, bounding_boxes, label_choice, download)
192
193    kwargs = util.update_kwargs(kwargs, "is_seg_dataset", True)
194    kwargs, _ = util.add_instance_label_transform(
195        kwargs, add_binary_target=False, boundaries=boundaries, offsets=offsets
196    )
197
198    return torch_em.default_segmentation_dataset(
199        raw_paths=paths,
200        raw_key="raw",
201        label_paths=paths,
202        label_key="labels",
203        patch_shape=patch_shape,
204        **kwargs,
205    )
206
207
208def get_hemibrain_loader(
209    path: Union[os.PathLike, str],
210    patch_shape: Tuple[int, int, int],
211    batch_size: int,
212    bounding_boxes: Optional[List[Tuple[int, int, int, int, int, int]]] = None,
213    label_choice: Literal["neurons", "mito", "tissue"] = "neurons",
214    download: bool = False,
215    offsets: Optional[List[List[int]]] = None,
216    boundaries: bool = False,
217    **kwargs,
218) -> DataLoader:
219    """Get the DataLoader for neuron or organelle segmentation in the Hemibrain dataset.
220
221    Args:
222        path: Filepath to a folder where the cached HDF5 files will be saved.
223        patch_shape: The patch shape (z, y, x) to use for training.
224        batch_size: The batch size for training.
225        bounding_boxes: List of subvolumes to use, each as
226            (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
227            Defaults to [DEFAULT_BOUNDING_BOX] - a central 1024³ region.
228        label_choice: Which labels to use. One of "neurons", "mito", or "tissue".
229        download: Whether to stream and cache data if not already present.
230        offsets: Offset values for affinity computation used as target.
231        boundaries: Whether to compute boundaries as the target.
232        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`
233            or for the PyTorch DataLoader.
234
235    Returns:
236        The DataLoader.
237    """
238    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
239    dataset = get_hemibrain_dataset(
240        path, patch_shape, bounding_boxes=bounding_boxes, label_choice=label_choice,
241        download=download, offsets=offsets, boundaries=boundaries, **ds_kwargs
242    )
243    return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)
EM_URL = 'gs://neuroglancer-janelia-flyem-hemibrain/emdata/clahe_yz/jpeg'
SEG_URL = 'gs://neuroglancer-janelia-flyem-hemibrain/v1.2/segmentation'
MITO_URL = 'gs://neuroglancer-janelia-flyem-hemibrain/mito_20190717.27250582'
TISSUE_URL = 'gs://neuroglancer-janelia-flyem-hemibrain/mask_normalized_round6'
LABEL_URLS = {'neurons': 'gs://neuroglancer-janelia-flyem-hemibrain/v1.2/segmentation', 'mito': 'gs://neuroglancer-janelia-flyem-hemibrain/mito_20190717.27250582', 'tissue': 'gs://neuroglancer-janelia-flyem-hemibrain/mask_normalized_round6'}
LABEL_RESOLUTION_FACTOR = {'neurons': 1, 'mito': 2, 'tissue': 2}
DEFAULT_BOUNDING_BOX = (15000, 16024, 18000, 19024, 18000, 19024)
def get_hemibrain_data( path: Union[os.PathLike, str], bounding_box: Tuple[int, int, int, int, int, int] = (15000, 16024, 18000, 19024, 18000, 19024), label_choice: Literal['neurons', 'mito', 'tissue'] = 'neurons', download: bool = False) -> str:
 75def get_hemibrain_data(
 76    path: Union[os.PathLike, str],
 77    bounding_box: Tuple[int, int, int, int, int, int] = DEFAULT_BOUNDING_BOX,
 78    label_choice: Literal["neurons", "mito", "tissue"] = "neurons",
 79    download: bool = False,
 80) -> str:
 81    """Stream a subvolume from the Hemibrain dataset and cache it as an HDF5 file.
 82
 83    Args:
 84        path: Filepath to a folder where the cached HDF5 file will be saved.
 85        bounding_box: The region to fetch as (x_min, x_max, y_min, y_max, z_min, z_max)
 86            in 8 nm voxel coordinates. Defaults to a central 1024³ training region.
 87        label_choice: Which labels to cache alongside the EM. One of "neurons", "mito",
 88            or "tissue". Mito and tissue are at 16 nm and are resampled to match the EM.
 89        download: Whether to stream and cache the data if it is not present.
 90
 91    Returns:
 92        The filepath to the cached HDF5 file.
 93    """
 94    import h5py
 95
 96    os.makedirs(str(path), exist_ok=True)
 97    h5_path = os.path.join(str(path), f"{label_choice}_{_bbox_to_str(bounding_box)}.h5")
 98    if os.path.exists(h5_path):
 99        return h5_path
100
101    if not download:
102        raise RuntimeError(
103            f"No cached data found at '{h5_path}'. Set download=True to stream it from GCS."
104        )
105
106    try:
107        import cloudvolume
108    except ImportError:
109        raise ImportError("The 'cloud-volume' package is required: pip install cloud-volume")
110
111    x_min, x_max, y_min, y_max, z_min, z_max = bounding_box
112    print(f"Streaming Hemibrain EM + {label_choice} for bbox {bounding_box} ...")
113
114    em_vol = cloudvolume.CloudVolume(EM_URL, use_https=True, mip=0, progress=True)
115    raw = np.array(em_vol[x_min:x_max, y_min:y_max, z_min:z_max])[..., 0].transpose(2, 1, 0)
116
117    factor = LABEL_RESOLUTION_FACTOR[label_choice]
118    lx0, lx1 = x_min // factor, x_max // factor
119    ly0, ly1 = y_min // factor, y_max // factor
120    lz0, lz1 = z_min // factor, z_max // factor
121    lbl_vol = cloudvolume.CloudVolume(LABEL_URLS[label_choice], use_https=True, mip=0, progress=True)
122    labels = np.array(lbl_vol[lx0:lx1, ly0:ly1, lz0:lz1])[..., 0].transpose(2, 1, 0)
123    if factor > 1:
124        from skimage.transform import rescale
125        labels = rescale(labels, factor, order=0, anti_aliasing=False, preserve_range=True).astype(labels.dtype)
126        labels = labels[:raw.shape[0], :raw.shape[1], :raw.shape[2]]
127
128    with h5py.File(h5_path, "w", locking=False) as f:
129        f.attrs["bounding_box"] = bounding_box
130        f.attrs["label_choice"] = label_choice
131        f.attrs["resolution_nm"] = em_vol.resolution.tolist()
132        f.create_dataset("raw", data=raw.astype("uint8"), compression="gzip", chunks=True)
133        f.create_dataset("labels", data=labels.astype("uint64"), compression="gzip", chunks=True)
134
135    print(f"Cached to {h5_path} (raw {raw.shape}, labels {labels.shape})")
136    return h5_path

Stream a subvolume from the Hemibrain dataset and cache it as an HDF5 file.

Arguments:
  • path: Filepath to a folder where the cached HDF5 file will be saved.
  • bounding_box: The region to fetch as (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates. Defaults to a central 1024³ training region.
  • label_choice: Which labels to cache alongside the EM. One of "neurons", "mito", or "tissue". Mito and tissue are at 16 nm and are resampled to match the EM.
  • download: Whether to stream and cache the data if it is not present.
Returns:

The filepath to the cached HDF5 file.

def get_hemibrain_paths( path: Union[os.PathLike, str], bounding_boxes: Optional[List[Tuple[int, int, int, int, int, int]]] = None, label_choice: Literal['neurons', 'mito', 'tissue'] = 'neurons', download: bool = False) -> List[str]:
139def get_hemibrain_paths(
140    path: Union[os.PathLike, str],
141    bounding_boxes: Optional[List[Tuple[int, int, int, int, int, int]]] = None,
142    label_choice: Literal["neurons", "mito", "tissue"] = "neurons",
143    download: bool = False,
144) -> List[str]:
145    """Get paths to Hemibrain HDF5 cache files.
146
147    Args:
148        path: Filepath to a folder where the cached HDF5 files will be saved.
149        bounding_boxes: List of regions to fetch, each as
150            (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
151            Defaults to [DEFAULT_BOUNDING_BOX].
152        label_choice: Which labels to use. One of "neurons", "mito", or "tissue".
153        download: Whether to stream and cache the data if it is not present.
154
155    Returns:
156        List of filepaths to the cached HDF5 files.
157    """
158    if bounding_boxes is None:
159        bounding_boxes = [DEFAULT_BOUNDING_BOX]
160    return [get_hemibrain_data(path, bbox, label_choice, download) for bbox in bounding_boxes]

Get paths to Hemibrain HDF5 cache files.

Arguments:
  • path: Filepath to a folder where the cached HDF5 files will be saved.
  • bounding_boxes: List of regions to fetch, each as (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates. Defaults to [DEFAULT_BOUNDING_BOX].
  • label_choice: Which labels to use. One of "neurons", "mito", or "tissue".
  • download: Whether to stream and cache the data if it is not present.
Returns:

List of filepaths to the cached HDF5 files.

def get_hemibrain_dataset( path: Union[os.PathLike, str], patch_shape: Tuple[int, int, int], bounding_boxes: Optional[List[Tuple[int, int, int, int, int, int]]] = None, label_choice: Literal['neurons', 'mito', 'tissue'] = 'neurons', download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> torch.utils.data.dataset.Dataset:
163def get_hemibrain_dataset(
164    path: Union[os.PathLike, str],
165    patch_shape: Tuple[int, int, int],
166    bounding_boxes: Optional[List[Tuple[int, int, int, int, int, int]]] = None,
167    label_choice: Literal["neurons", "mito", "tissue"] = "neurons",
168    download: bool = False,
169    offsets: Optional[List[List[int]]] = None,
170    boundaries: bool = False,
171    **kwargs,
172) -> Dataset:
173    """Get the Hemibrain dataset for neuron or organelle segmentation.
174
175    Args:
176        path: Filepath to a folder where the cached HDF5 files will be saved.
177        patch_shape: The patch shape (z, y, x) to use for training.
178        bounding_boxes: List of subvolumes to use, each as
179            (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
180            Defaults to [DEFAULT_BOUNDING_BOX] - a central 1024³ region.
181        label_choice: Which labels to use. One of "neurons", "mito", or "tissue".
182        download: Whether to stream and cache data if not already present.
183        offsets: Offset values for affinity computation used as target.
184        boundaries: Whether to compute boundaries as the target.
185        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.
186
187    Returns:
188        The segmentation dataset.
189    """
190    assert len(patch_shape) == 3
191
192    paths = get_hemibrain_paths(path, bounding_boxes, label_choice, download)
193
194    kwargs = util.update_kwargs(kwargs, "is_seg_dataset", True)
195    kwargs, _ = util.add_instance_label_transform(
196        kwargs, add_binary_target=False, boundaries=boundaries, offsets=offsets
197    )
198
199    return torch_em.default_segmentation_dataset(
200        raw_paths=paths,
201        raw_key="raw",
202        label_paths=paths,
203        label_key="labels",
204        patch_shape=patch_shape,
205        **kwargs,
206    )

Get the Hemibrain dataset for neuron or organelle segmentation.

Arguments:
  • path: Filepath to a folder where the cached HDF5 files will be saved.
  • patch_shape: The patch shape (z, y, x) to use for training.
  • bounding_boxes: List of subvolumes to use, each as (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates. Defaults to [DEFAULT_BOUNDING_BOX] - a central 1024³ region.
  • label_choice: Which labels to use. One of "neurons", "mito", or "tissue".
  • download: Whether to stream and cache data if not already present.
  • offsets: Offset values for affinity computation used as target.
  • boundaries: Whether to compute boundaries as the target.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:

The segmentation dataset.

def get_hemibrain_loader( path: Union[os.PathLike, str], patch_shape: Tuple[int, int, int], batch_size: int, bounding_boxes: Optional[List[Tuple[int, int, int, int, int, int]]] = None, label_choice: Literal['neurons', 'mito', 'tissue'] = 'neurons', download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> torch.utils.data.dataloader.DataLoader:
209def get_hemibrain_loader(
210    path: Union[os.PathLike, str],
211    patch_shape: Tuple[int, int, int],
212    batch_size: int,
213    bounding_boxes: Optional[List[Tuple[int, int, int, int, int, int]]] = None,
214    label_choice: Literal["neurons", "mito", "tissue"] = "neurons",
215    download: bool = False,
216    offsets: Optional[List[List[int]]] = None,
217    boundaries: bool = False,
218    **kwargs,
219) -> DataLoader:
220    """Get the DataLoader for neuron or organelle segmentation in the Hemibrain dataset.
221
222    Args:
223        path: Filepath to a folder where the cached HDF5 files will be saved.
224        patch_shape: The patch shape (z, y, x) to use for training.
225        batch_size: The batch size for training.
226        bounding_boxes: List of subvolumes to use, each as
227            (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
228            Defaults to [DEFAULT_BOUNDING_BOX] - a central 1024³ region.
229        label_choice: Which labels to use. One of "neurons", "mito", or "tissue".
230        download: Whether to stream and cache data if not already present.
231        offsets: Offset values for affinity computation used as target.
232        boundaries: Whether to compute boundaries as the target.
233        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`
234            or for the PyTorch DataLoader.
235
236    Returns:
237        The DataLoader.
238    """
239    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
240    dataset = get_hemibrain_dataset(
241        path, patch_shape, bounding_boxes=bounding_boxes, label_choice=label_choice,
242        download=download, offsets=offsets, boundaries=boundaries, **ds_kwargs
243    )
244    return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)

Get the DataLoader for neuron or organelle segmentation in the Hemibrain dataset.

Arguments:
  • path: Filepath to a folder where the cached HDF5 files will be saved.
  • patch_shape: The patch shape (z, y, x) to use for training.
  • batch_size: The batch size for training.
  • bounding_boxes: List of subvolumes to use, each as (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates. Defaults to [DEFAULT_BOUNDING_BOX] - a central 1024³ region.
  • label_choice: Which labels to use. One of "neurons", "mito", or "tissue".
  • download: Whether to stream and cache data if not already present.
  • offsets: Offset values for affinity computation used as target.
  • boundaries: Whether to compute boundaries as the target.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:

The DataLoader.