torch_em.data.datasets.electron_microscopy.zebrafinch

Zebrafinch Area X datasets for neuron and organelle segmentation in 3DEM.

Two FIB-SEM volumes of adult male zebra finch (Taeniopygia guttata) area X are available, both from the Kornfeld lab:

  • j0251: 10 x 10 x 25 nm native resolution, full extent ~256 x 256 x 384 µm. Labels: neuron instance segmentation (~4.26 M neurons) and endoplasmic reticulum. Cell-type labels (17 types: MSN, GPe, GPi, HVC axons, interneurons, etc.) and synapse coordinates are available via the REST API at https://syconn.esc.mpcdf.mpg.de.
  • j0126: 10 x 10 x 20 nm native resolution, full extent ~107 x 109 x 114 µm. Labels: neuron instance segmentation only.

Data is streamed from the Kornfeld lab public server via cloud-volume and cached locally as zarr v3 stores in (z, y, x) axis order.

This dataset is from the publication https://doi.org/10.1101/2025.10.25.684569. Please cite it if you use this dataset in your research.

The dataset is publicly available at https://syconn.esc.mpcdf.mpg.de. Requires cloud-volume: pip install cloud-volume.
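
Because the default bounding box covers the full volume, it can help to estimate how large a region is before streaming it. The sketch below converts a bounding box in nm to a (z, y, x) voxel shape with the same floor/ceil rounding the module uses. The voxel sizes are taken from the description above and assumed to hold at mip=0; `bbox_to_shape` is a hypothetical helper, not part of the module.

```python
import math


def bbox_to_shape(bbox_nm, resolution_nm):
    """Return the (z, y, x) voxel shape covered by a bounding box given in nm.

    bbox_nm is (x_min, x_max, y_min, y_max, z_min, z_max);
    resolution_nm is the voxel size as (x, y, z).
    """
    x_min, x_max, y_min, y_max, z_min, z_max = bbox_nm
    rx, ry, rz = resolution_nm
    nx = math.ceil(x_max / rx) - math.floor(x_min / rx)
    ny = math.ceil(y_max / ry) - math.floor(y_min / ry)
    nz = math.ceil(z_max / rz) - math.floor(z_min / rz)
    return nz, ny, nx


# A 20 x 20 x 10 µm region of j0251, assuming 10 x 10 x 25 nm voxels at mip=0.
shape = bbox_to_shape((0, 20000, 0, 20000, 0, 10000), (10, 10, 25))
n_voxels = shape[0] * shape[1] * shape[2]
# raw is stored as uint8 (1 byte per voxel), labels as uint64 (8 bytes per voxel).
uncompressed_gb = n_voxels * (1 + 8) / 1e9
print(shape, f"{uncompressed_gb:.1f} GB before compression")
```

Starting with a small bounding box like this one is a cheap way to check the setup before committing to a larger download.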

"""Zebrafinch Area X datasets for neuron and organelle segmentation in 3DEM.

Two FIB-SEM volumes of adult male zebra finch (Taeniopygia guttata) area X are
available, both from the Kornfeld lab:

- j0251: 10 x 10 x 25 nm native resolution, full extent ~256 x 256 x 384 µm.
  Labels: neuron instance segmentation (~4.26 M neurons) and endoplasmic reticulum.
  Cell-type labels (17 types: MSN, GPe, GPi, HVC axons, interneurons, etc.) and
  synapse coordinates are available via the REST API at https://syconn.esc.mpcdf.mpg.de.
- j0126: 10 x 10 x 20 nm native resolution, full extent ~107 x 109 x 114 µm.
  Labels: neuron instance segmentation only.

Data is streamed from the Kornfeld lab public server via cloud-volume and cached
locally as zarr v3 stores in (z, y, x) axis order.

This dataset is from the publication https://doi.org/10.1101/2025.10.25.684569.
Please cite it if you use this dataset in your research.

The dataset is publicly available at https://syconn.esc.mpcdf.mpg.de.
Requires cloud-volume: pip install cloud-volume.
"""

import hashlib
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Literal, Optional, Tuple, Union

import numpy as np
from tqdm import tqdm
from torch.utils.data import Dataset, DataLoader

import torch_em
from .. import util


J0251_BASE_URL = (
    "precomputed://https://syconn.esc.mpcdf.mpg.de"
    "/j0251_72_seg_20210127_agglo2_syn_20220811_celltypes_20230822"
)
J0126_BASE_URL = "precomputed://https://syconn.esc.mpcdf.mpg.de"

ZEBRAFINCH_DATASETS = {
    "j0251": {
        "em_url": f"{J0251_BASE_URL}/image",
        "seg_url": f"{J0251_BASE_URL}/segmentation",
        "er_url": f"{J0251_BASE_URL}/er",
        # Full extent ~256 x 256 x 384 µm at 10 x 10 x 25 nm native resolution.
        "bbox_nm": (0, 271190, 0, 273500, 0, 387350),
    },
    "j0126": {
        "em_url": f"{J0126_BASE_URL}/j0126/volume/image",
        "seg_url": f"{J0126_BASE_URL}/volume/segmentation",
        "er_url": None,
        # Full extent ~107 x 109 x 114 µm at 10 x 10 x 20 nm native resolution.
        "bbox_nm": (0, 106640, 0, 109130, 0, 114000),
    },
}

ZEBRAFINCH_CHUNK_SHAPE = (64, 128, 128)
ZEBRAFINCH_SHARD_SHAPE = (128, 512, 512)


def _zebrafinch_bbox_to_str(bbox):
    return hashlib.md5("_".join(str(v) for v in bbox).encode()).hexdigest()[:12]


def _zebrafinch_create_array(root, name, shape, dtype, is_label):
    from zarr.codecs import BloscCodec
    shuffle = "bitshuffle" if (np.issubdtype(dtype, np.integer) and is_label) else "shuffle"
    return root.create_array(
        name,
        shape=shape,
        chunks=ZEBRAFINCH_CHUNK_SHAPE,
        shards=ZEBRAFINCH_SHARD_SHAPE,
        dtype=dtype,
        compressors=BloscCodec(cname="zstd", clevel=6, shuffle=shuffle),
    )


def _zebrafinch_bbox_voxels(cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm):
    scale = np.array(cv.resolution)
    x0 = int(np.floor(x_min_nm / scale[0]))
    x1 = int(np.ceil(x_max_nm / scale[0]))
    y0 = int(np.floor(y_min_nm / scale[1]))
    y1 = int(np.ceil(y_max_nm / scale[1]))
    z0 = int(np.floor(z_min_nm / scale[2]))
    z1 = int(np.ceil(z_max_nm / scale[2]))
    return x0, x1, y0, y1, z0, z1, (z1 - z0, y1 - y0, x1 - x0)


def _zebrafinch_download_to_zarr(cv, ds, x0g, y0g, z0g, name):
    shape = ds.shape  # (z, y, x)
    sz, sy, sx = ZEBRAFINCH_SHARD_SHAPE

    tasks = []
    for z0_ in range(0, shape[0], sz):
        for y0_ in range(0, shape[1], sy):
            for x0_ in range(0, shape[2], sx):
                z1_ = min(z0_ + sz, shape[0])
                y1_ = min(y0_ + sy, shape[1])
                x1_ = min(x0_ + sx, shape[2])
                tasks.append((
                    (z0_, z1_), (y0_, y1_), (x0_, x1_),
                    (x0g + x0_, x0g + x1_, y0g + y0_, y0g + y1_, z0g + z0_, z0g + z1_),
                ))

    target_dtype = np.dtype(ds.dtype)

    def worker(item):
        (z0_, z1_), (y0_, y1_), (x0_, x1_), (gx0, gx1, gy0, gy1, gz0, gz1) = item
        block = np.asarray(cv[gx0:gx1, gy0:gy1, gz0:gz1])
        if block.ndim == 4:
            block = block[..., 0]
        ds[z0_:z1_, y0_:y1_, x0_:x1_] = block.transpose(2, 1, 0).astype(target_dtype)

    with ThreadPoolExecutor(max_workers=8) as ex:
        futures = [ex.submit(worker, t) for t in tasks]
        for fut in tqdm(as_completed(futures), total=len(futures), desc=f"Downloading '{name}'", smoothing=0.05):
            fut.result()


def get_zebrafinch_data(
    path: Union[os.PathLike, str],
    bounding_box: Optional[Tuple[float, ...]] = None,
    mip: int = 0,
    dataset: Literal["j0251", "j0126"] = "j0251",
    download: bool = False,
) -> str:
    """Stream and cache a region of a zebrafinch dataset as a zarr v3 store.

    The zarr store contains:
      - raw: EM grayscale (uint8, z/y/x)
      - labels: neuron instance segmentation (uint64, z/y/x)
      - er: endoplasmic reticulum instance segmentation (uint64, z/y/x) - j0251 only.

    Args:
        path: Filepath to a folder where the cached zarr store will be saved.
        bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max).
            Defaults to the full volume extent for the chosen dataset.
        mip: MIP level for both EM and segmentation. Default mip=0 gives native resolution
            (10 x 10 x 25 nm for j0251, 10 x 10 x 20 nm for j0126).
        dataset: Which specimen to use, either "j0251" or "j0126".
        download: Whether to stream and cache the data if not present.

    Returns:
        Filepath to the cached zarr store.
    """
    import zarr

    ds_info = ZEBRAFINCH_DATASETS[dataset]
    os.makedirs(str(path), exist_ok=True)
    bbox = bounding_box if bounding_box is not None else ds_info["bbox_nm"]
    bbox_hash = _zebrafinch_bbox_to_str(bbox)
    zarr_path = os.path.join(str(path), f"{dataset}_mip{mip}_{bbox_hash}.zarr")

    arrays_needed = ["raw", "labels"] + (["er"] if ds_info["er_url"] is not None else [])
    root = zarr.open_group(zarr_path, mode="a")
    missing = [k for k in arrays_needed if k not in root]
    if not missing:
        return zarr_path
    if not download:
        raise RuntimeError(
            f"No cached data at '{zarr_path}'. Set download=True to stream from the Kornfeld lab server."
        )

    try:
        from cloudvolume import CloudVolume
    except ImportError:
        raise ImportError("The 'cloud-volume' package is required: pip install cloud-volume")

    x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm = bbox
    print(f"Streaming zebrafinch {dataset} at mip={mip} ...")

    cv_kwargs = dict(use_https=True, mip=mip, progress=False, fill_missing=True, provenance={})
    em_cv = CloudVolume(ds_info["em_url"], **cv_kwargs)
    seg_cv = CloudVolume(ds_info["seg_url"], **cv_kwargs)

    ex0, ex1, ey0, ey1, ez0, ez1, em_shape = _zebrafinch_bbox_voxels(
        em_cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm
    )
    sx0, sx1, sy0, sy1, sz0, sz1, seg_shape = _zebrafinch_bbox_voxels(
        seg_cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm
    )
    shape = tuple(min(e, s) for e, s in zip(em_shape, seg_shape))

    root.attrs["bounding_box_nm"] = list(bbox)
    root.attrs["mip"] = mip

    if "raw" not in root:
        ds_raw = _zebrafinch_create_array(root, "raw", shape, np.dtype("uint8"), is_label=False)
        _zebrafinch_download_to_zarr(em_cv, ds_raw, ex0, ey0, ez0, name="raw")

    if "labels" not in root:
        ds_lbl = _zebrafinch_create_array(root, "labels", shape, np.dtype("uint64"), is_label=True)
        _zebrafinch_download_to_zarr(seg_cv, ds_lbl, sx0, sy0, sz0, name="labels")

    if "er" not in root and ds_info["er_url"] is not None:
        er_cv = CloudVolume(ds_info["er_url"], **cv_kwargs)
        rx0, rx1, ry0, ry1, rz0, rz1, er_shape = _zebrafinch_bbox_voxels(
            er_cv, x_min_nm, x_max_nm, y_min_nm, y_max_nm, z_min_nm, z_max_nm
        )
        shape_er = tuple(min(e, r) for e, r in zip(shape, er_shape))
        ds_er = _zebrafinch_create_array(root, "er", shape_er, np.dtype("uint64"), is_label=True)
        _zebrafinch_download_to_zarr(er_cv, ds_er, rx0, ry0, rz0, name="er")

    print(f"Cached to {zarr_path} (shape {shape})")
    return zarr_path


def get_zebrafinch_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int, int],
    bounding_box: Optional[Tuple[float, ...]] = None,
    mip: int = 0,
    dataset: Literal["j0251", "j0126"] = "j0251",
    label_choice: Literal["neurons", "er"] = "neurons",
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    **kwargs,
) -> Dataset:
    """Get a zebrafinch dataset for neuron or organelle segmentation.

    Args:
        path: Filepath to a folder where the cached zarr store will be saved.
        patch_shape: The patch shape (z, y, x) to use for training.
        bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max).
            Defaults to the full volume extent for the chosen dataset.
        mip: MIP level for both EM and segmentation. Default mip=0 gives native resolution
            (10 x 10 x 25 nm for j0251, 10 x 10 x 20 nm for j0126).
        dataset: Which specimen to use, either "j0251" or "j0126".
        label_choice: Which segmentation to use as target. Either "neurons" or "er".
            "er" is only available for j0251.
        download: Whether to stream and cache data if not already present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

    Returns:
        The segmentation dataset.
    """
    assert len(patch_shape) == 3
    if label_choice == "er" and ZEBRAFINCH_DATASETS[dataset]["er_url"] is None:
        raise ValueError(f"label_choice='er' is not available for dataset='{dataset}'")
    zarr_path = get_zebrafinch_data(path, bounding_box, mip, dataset, download)

    label_key = "labels" if label_choice == "neurons" else "er"

    kwargs = util.update_kwargs(kwargs, "is_seg_dataset", True)
    kwargs, _ = util.add_instance_label_transform(
        kwargs, add_binary_target=False, boundaries=boundaries, offsets=offsets
    )

    return torch_em.default_segmentation_dataset(
        raw_paths=zarr_path,
        raw_key="raw",
        label_paths=zarr_path,
        label_key=label_key,
        patch_shape=patch_shape,
        **kwargs,
    )


def get_zebrafinch_loader(
    path: Union[os.PathLike, str],
    batch_size: int,
    patch_shape: Tuple[int, int, int],
    bounding_box: Optional[Tuple[float, ...]] = None,
    mip: int = 0,
    dataset: Literal["j0251", "j0126"] = "j0251",
    label_choice: Literal["neurons", "er"] = "neurons",
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    **kwargs,
) -> DataLoader:
    """Get the DataLoader for neuron or organelle segmentation in a zebrafinch dataset.

    Args:
        path: Filepath to a folder where the cached zarr store will be saved.
        batch_size: The batch size for training.
        patch_shape: The patch shape (z, y, x) to use for training.
        bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max).
            Defaults to the full volume extent for the chosen dataset.
        mip: MIP level for both EM and segmentation. Default mip=0 gives native resolution
            (10 x 10 x 25 nm for j0251, 10 x 10 x 20 nm for j0126).
        dataset: Which specimen to use, either "j0251" or "j0126".
        label_choice: Which segmentation to use as target. Either "neurons" or "er".
            "er" is only available for j0251.
        download: Whether to stream and cache data if not already present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`
            or for the PyTorch DataLoader.

    Returns:
        The DataLoader.
    """
    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
    ds = get_zebrafinch_dataset(
        path=path,
        patch_shape=patch_shape,
        bounding_box=bounding_box,
        mip=mip,
        dataset=dataset,
        label_choice=label_choice,
        download=download,
        offsets=offsets,
        boundaries=boundaries,
        **ds_kwargs,
    )
    return torch_em.get_data_loader(ds, batch_size=batch_size, **loader_kwargs)

J0251_BASE_URL = 'precomputed://https://syconn.esc.mpcdf.mpg.de/j0251_72_seg_20210127_agglo2_syn_20220811_celltypes_20230822'
J0126_BASE_URL = 'precomputed://https://syconn.esc.mpcdf.mpg.de'
ZEBRAFINCH_DATASETS = {
    'j0251': {
        'em_url': 'precomputed://https://syconn.esc.mpcdf.mpg.de/j0251_72_seg_20210127_agglo2_syn_20220811_celltypes_20230822/image',
        'seg_url': 'precomputed://https://syconn.esc.mpcdf.mpg.de/j0251_72_seg_20210127_agglo2_syn_20220811_celltypes_20230822/segmentation',
        'er_url': 'precomputed://https://syconn.esc.mpcdf.mpg.de/j0251_72_seg_20210127_agglo2_syn_20220811_celltypes_20230822/er',
        'bbox_nm': (0, 271190, 0, 273500, 0, 387350),
    },
    'j0126': {
        'em_url': 'precomputed://https://syconn.esc.mpcdf.mpg.de/j0126/volume/image',
        'seg_url': 'precomputed://https://syconn.esc.mpcdf.mpg.de/volume/segmentation',
        'er_url': None,
        'bbox_nm': (0, 106640, 0, 109130, 0, 114000),
    },
}
ZEBRAFINCH_CHUNK_SHAPE = (64, 128, 128)
ZEBRAFINCH_SHARD_SHAPE = (128, 512, 512)
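
The shard shape is also the unit of work when streaming: the volume is cut into shard-sized blocks, and each block is fetched and written on its own, so each write covers exactly one shard. A minimal sketch of that tiling, mirroring the nested loop in `_zebrafinch_download_to_zarr` (the name `iter_shard_blocks` is hypothetical and the example volume shape is made up):

```python
from itertools import product

SHARD_SHAPE = (128, 512, 512)  # (z, y, x), same values as ZEBRAFINCH_SHARD_SHAPE


def iter_shard_blocks(shape, shard_shape=SHARD_SHAPE):
    """Yield ((z0, z1), (y0, y1), (x0, x1)) blocks that tile `shape` shard by shard."""
    starts = [range(0, dim, step) for dim, step in zip(shape, shard_shape)]
    for origin in product(*starts):
        # Clip each block at the volume border so edge shards may be smaller.
        yield tuple(
            (start, min(start + step, dim))
            for start, step, dim in zip(origin, shard_shape, shape)
        )


blocks = list(iter_shard_blocks((200, 1000, 600)))
print(len(blocks))  # 2 * 2 * 2 = 8 blocks
print(blocks[-1])   # the last block is clipped at the volume border
```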
def get_zebrafinch_data(path: Union[os.PathLike, str], bounding_box: Optional[Tuple[float, ...]] = None, mip: int = 0, dataset: Literal['j0251', 'j0126'] = 'j0251', download: bool = False) -> str:

Stream and cache a region of a zebrafinch dataset as a zarr v3 store.

The zarr store contains:
  • raw: EM grayscale (uint8, z/y/x)
  • labels: neuron instance segmentation (uint64, z/y/x)
  • er: endoplasmic reticulum instance segmentation (uint64, z/y/x) - j0251 only.
Arguments:
  • path: Filepath to a folder where the cached zarr store will be saved.
  • bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max). Defaults to the full volume extent for the chosen dataset.
  • mip: MIP level for both EM and segmentation. Default mip=0 gives native resolution (10 x 10 x 25 nm for j0251, 10 x 10 x 20 nm for j0126).
  • dataset: Which specimen to use, either "j0251" or "j0126".
  • download: Whether to stream and cache the data if not present.
Returns:
  • Filepath to the cached zarr store.
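
The cache file name combines the dataset, the MIP level, and a hash of the bounding box, so each distinct region gets its own store. The sketch below reproduces that naming scheme with the standard library only; `cache_name` is a hypothetical helper, not part of the module. Because the hash is taken over the string form of each value, integer and float bounding boxes that describe the same region map to different cache files.

```python
import hashlib


def cache_name(dataset, mip, bbox_nm):
    """Build the zarr file name for a cached region (sketch of the module's scheme)."""
    key = "_".join(str(v) for v in bbox_nm)
    bbox_hash = hashlib.md5(key.encode()).hexdigest()[:12]
    return f"{dataset}_mip{mip}_{bbox_hash}.zarr"


name_int = cache_name("j0251", 0, (0, 20000, 0, 20000, 0, 10000))
name_float = cache_name("j0251", 0, (0.0, 20000.0, 0.0, 20000.0, 0.0, 10000.0))
print(name_int)
print(name_int == name_float)  # False: "0" and "0.0" hash differently
```

Passing the bounding box with consistent numeric types avoids streaming the same region twice.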

def get_zebrafinch_dataset(path: Union[os.PathLike, str], patch_shape: Tuple[int, int, int], bounding_box: Optional[Tuple[float, ...]] = None, mip: int = 0, dataset: Literal['j0251', 'j0126'] = 'j0251', label_choice: Literal['neurons', 'er'] = 'neurons', download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> torch.utils.data.dataset.Dataset:

Get a zebrafinch dataset for neuron or organelle segmentation.

Arguments:
  • path: Filepath to a folder where the cached zarr store will be saved.
  • patch_shape: The patch shape (z, y, x) to use for training.
  • bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max). Defaults to the full volume extent for the chosen dataset.
  • mip: MIP level for both EM and segmentation. Default mip=0 gives native resolution (10 x 10 x 25 nm for j0251, 10 x 10 x 20 nm for j0126).
  • dataset: Which specimen to use, either "j0251" or "j0126".
  • label_choice: Which segmentation to use as target. Either "neurons" or "er". "er" is only available for j0251.
  • download: Whether to stream and cache data if not already present.
  • offsets: Offset values for affinity computation used as target.
  • boundaries: Whether to compute boundaries as the target.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:
  • The segmentation dataset.
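
The `offsets` argument is a list of integer displacements. Assuming the usual affinity convention of one (z, y, x) displacement per target channel, the hypothetical helper below builds nearest-neighbour offsets plus optional longer-range ones. The ranges are chosen for illustration only and are not values prescribed by the module.

```python
def make_offsets(ranges=(1,)):
    """Return affinity offsets as [dz, dy, dx] lists, one per axis for each range."""
    offsets = []
    for r in ranges:
        for axis in range(3):
            offset = [0, 0, 0]
            offset[axis] = -r
            offsets.append(offset)
    return offsets


print(make_offsets())        # [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]
print(make_offsets((1, 3)))  # adds [-3, 0, 0], [0, -3, 0], [0, 0, -3]
```

A list built this way could be passed as `offsets=make_offsets((1, 3))`.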

def get_zebrafinch_loader(path: Union[os.PathLike, str], batch_size: int, patch_shape: Tuple[int, int, int], bounding_box: Optional[Tuple[float, ...]] = None, mip: int = 0, dataset: Literal['j0251', 'j0126'] = 'j0251', label_choice: Literal['neurons', 'er'] = 'neurons', download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> torch.utils.data.dataloader.DataLoader:

Get the DataLoader for neuron or organelle segmentation in a zebrafinch dataset.

Arguments:
  • path: Filepath to a folder where the cached zarr store will be saved.
  • batch_size: The batch size for training.
  • patch_shape: The patch shape (z, y, x) to use for training.
  • bounding_box: Region in nm as (x_min, x_max, y_min, y_max, z_min, z_max). Defaults to the full volume extent for the chosen dataset.
  • mip: MIP level for both EM and segmentation. Default mip=0 gives native resolution (10 x 10 x 25 nm for j0251, 10 x 10 x 20 nm for j0126).
  • dataset: Which specimen to use, either "j0251" or "j0126".
  • label_choice: Which segmentation to use as target. Either "neurons" or "er". "er" is only available for j0251.
  • download: Whether to stream and cache data if not already present.
  • offsets: Offset values for affinity computation used as target.
  • boundaries: Whether to compute boundaries as the target.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:
  • The DataLoader.
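
The loader takes dataset options and DataLoader options through the same `**kwargs` and separates them internally. The sketch below shows one way such a split can work, by checking each key against a function's signature. It is an illustration under that assumption: `split_by_signature` and `make_dataset` are hypothetical stand-ins, not the torch_em implementation.

```python
import inspect


def split_by_signature(func, **kwargs):
    """Split kwargs into those `func` accepts by name and the remainder."""
    accepted = set(inspect.signature(func).parameters)
    matched = {k: v for k, v in kwargs.items() if k in accepted}
    rest = {k: v for k, v in kwargs.items() if k not in accepted}
    return matched, rest


def make_dataset(raw_key, label_key, n_samples=None, sampler=None):
    """Stand-in for a dataset factory with a fixed set of keyword arguments."""
    return raw_key, label_key, n_samples, sampler


ds_kwargs, loader_kwargs = split_by_signature(
    make_dataset, n_samples=100, num_workers=4, shuffle=True
)
print(ds_kwargs)      # {'n_samples': 100}
print(loader_kwargs)  # {'num_workers': 4, 'shuffle': True}
```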