torch_em.data.datasets.electron_microscopy.tumor_spheroid_em

The tumor spheroid EM dataset contains SBF-SEM imaging of tumor spheroids with gold nanoparticles.

Two data sources are available, selected via the source parameter:

"2d_manual" - Manually annotated 2D TIFF slices at two isotropic resolutions (50 x 50 x 50 nm and 100 x 100 x 100 nm). Each slice has paired instance segmentation labels for cells and nuclei. Slices span all three orthogonal planes (XY, XZ, YZ). Available targets: "cells", "nuclei".

"3d_automatic" - Full 3D volume with automated instance segmentation for cells, nuclei, and gold nanoparticles (nps). Raw data is available at four resolutions: the native 50 x 10 x 10 nm ("50-10-10") and the downsampled "50-25-25", "50-50-50", and "100-100-100". Labels are available at "50-50-50" and "100-100-100" for cells and nuclei, and at "50-50-50" only for nps. Requires downloading the full 67 GB zarr archive. Available targets: "cells", "nuclei", "nps".

The volume covers approximately 102.4 x 102.4 x 35 um at native voxel size.
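As a rough sanity check, the physical extent can be converted into an approximate voxel grid for each resolution. The sketch below assumes that the 35 um extent lies along Z and the two 102.4 um extents along Y and X (the description does not state the axis order), and that the resolution strings give nm in ZYX order. `approx_shape` is a hypothetical helper, not part of torch_em.

```python
# Approximate voxel grid per resolution string.
# Assumptions: strings are "Z-Y-X" in nm; 35 um lies along Z, 102.4 um along Y and X.
EXTENT_NM = (35_000, 102_400, 102_400)  # (Z, Y, X) physical extent in nm


def approx_shape(resolution: str) -> tuple:
    """Return approximate (Z, Y, X) voxel counts for a string such as '50-10-10'."""
    voxel_nm = tuple(int(v) for v in resolution.split("-"))
    return tuple(extent // size for extent, size in zip(EXTENT_NM, voxel_nm))


if __name__ == "__main__":
    for res in ("50-10-10", "50-25-25", "50-50-50", "100-100-100"):
        print(res, approx_shape(res))
```

Under these assumptions the native grid is far larger than the downsampled ones, which is one reason to prefer "50-50-50" or "100-100-100" for training.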

This dataset is from the publication https://doi.org/10.64898/2026.04.17.719153. Please cite it if you use this dataset for a publication.

The data is available at https://doi.org/10.6019/S-BIAD3263.
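A minimal usage sketch for the "2d_manual" source, based on the `get_tumor_spheroid_loader` signature documented on this page. The patch size, batch size, and the wrapper name `build_loader` are illustrative assumptions; the call needs torch_em installed and downloads data on first use.

```python
def build_loader(data_root: str):
    """Create a 2D training loader (hypothetical wrapper for illustration)."""
    # Imported lazily so that defining this function does not require torch_em.
    from torch_em.data.datasets.electron_microscopy.tumor_spheroid_em import (
        get_tumor_spheroid_loader,
    )

    return get_tumor_spheroid_loader(
        path=data_root,
        patch_shape=(512, 512),  # (H, W) for "2d_manual"; the size is an assumption
        batch_size=2,
        source="2d_manual",
        resolution="50-50-50",
        target="nuclei",
        orientations=["z"],      # restrict to the slices listed under "z"
        boundaries=True,         # boundary targets instead of instance ids
        download=True,           # fetch the slices if they are not cached
    )
```

Calling `build_loader("./data")` would return a PyTorch DataLoader. Switching to `source="3d_automatic"` requires a 3D `patch_shape` and the 67 GB archive.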

"""The tumor spheroid EM dataset contains SBF-SEM imaging of tumor spheroids with gold nanoparticles.

Two data sources are available, selected via the `source` parameter:

**"2d_manual"** - Manually annotated 2D TIFF slices at two isotropic resolutions
(50 x 50 x 50 nm and 100 x 100 x 100 nm). Each slice has paired instance
segmentation labels for cells and nuclei. Slices span all three orthogonal
planes (XY, XZ, YZ). Available targets: "cells", "nuclei".

**"3d_automatic"** - Full 3D volume with automated instance segmentation for cells,
nuclei, and gold nanoparticles (nps). Raw data available at four resolutions:
native 50 x 10 x 10 nm ("50-10-10"), and downsampled "50-25-25", "50-50-50",
"100-100-100". Labels at "50-50-50" and "100-100-100" for cells/nuclei, and
"50-50-50" only for nps. Requires downloading the full 67 GB zarr archive.
Available targets: "cells", "nuclei", "nps".

The volume covers approximately 102.4 x 102.4 x 35 um at native voxel size.

This dataset is from the publication https://doi.org/10.64898/2026.04.17.719153.
Please cite it if you use this dataset for a publication.

The data is available at https://doi.org/10.6019/S-BIAD3263.
"""

import os
from glob import glob
from typing import List, Literal, Optional, Tuple, Union

import imageio.v3 as imageio

from torch.utils.data import DataLoader, Dataset

import torch_em
from .. import util


FTP_BASE = "https://ftp.ebi.ac.uk/pub/databases/biostudies/S-BIAD/263/S-BIAD3263/Files"
ZARR_URL = f"{FTP_BASE}/Au_01-vol_01.zarr.zip"
ZARR_ROOT = "Au_01-vol_01.zarr"

SLICE_IDS = {
    "50-50-50": {
        "x": ["0277", "0336", "0390", "0653", "1300"],
        "y": ["0288", "0488", "0889", "1272", "1606"],
        "z": ["0016", "0034", "0073", "0075", "0169", "0173", "0180", "0192", "0212", "0274"],
    },
    "100-100-100": {
        "x": ["0138", "0168", "0195", "0326", "0650"],
        "y": ["0144", "0244", "0444", "0636", "0803"],
        "z": ["0008", "0017", "0036", "0038", "0084", "0086", "0090", "0096", "0106", "0137"],
    },
}

LABEL_RESOLUTIONS_3D = {
    "cells": ("50-50-50", "100-100-100"),
    "nuclei": ("50-50-50", "100-100-100"),
    "nps": ("50-50-50",),
}

SourceChoice = Literal["2d_manual", "3d_automatic"]
Resolution2DChoice = Literal["50-50-50", "100-100-100"]
Resolution3DChoice = Literal["50-10-10", "50-25-25", "50-50-50", "100-100-100"]
TargetChoice = Literal["cells", "nuclei", "nps"]
OrientationChoice = Literal["x", "y", "z"]


def _download_2d_slice(axis, coord, resolution, out_dir):
    """Download one annotated slice and store raw data and labels in a single HDF5 file."""
    import h5py

    stem = f"Au_01-vol_01-{axis}_{coord}"
    h5_path = os.path.join(out_dir, f"{stem}.h5")
    # Skip slices that were already downloaded and converted.
    if os.path.exists(h5_path):
        return

    base_url = f"{FTP_BASE}/ground_truths/{resolution}"
    raw_tmp = os.path.join(out_dir, f"{stem}_raw.tif")
    cells_tmp = os.path.join(out_dir, f"{stem}_cells.tif")
    nuclei_tmp = os.path.join(out_dir, f"{stem}_nuclei.tif")

    # Download the raw image and both label images as temporary TIFF files.
    util.download_source(raw_tmp, f"{base_url}/{stem}.tif", download=True)
    util.download_source(cells_tmp, f"{base_url}/labels/{stem}-cells.tif", download=True)
    util.download_source(nuclei_tmp, f"{base_url}/labels/{stem}-nuclei.tif", download=True)

    raw = imageio.imread(raw_tmp)
    cells = imageio.imread(cells_tmp)
    nuclei = imageio.imread(nuclei_tmp)

    # Store the raw data and the uint32 instance labels side by side.
    with h5py.File(h5_path, "w") as f:
        f.create_dataset("raw", data=raw, compression="gzip")
        f.create_dataset("labels/cells", data=cells.astype("uint32"), compression="gzip")
        f.create_dataset("labels/nuclei", data=nuclei.astype("uint32"), compression="gzip")

    # Remove the temporary TIFF files once the HDF5 file is written.
    os.remove(raw_tmp)
    os.remove(cells_tmp)
    os.remove(nuclei_tmp)


def get_tumor_spheroid_data(
    path: Union[os.PathLike, str],
    source: SourceChoice = "2d_manual",
    resolution: str = "50-50-50",
    download: bool = False,
) -> str:
    """Download the tumor spheroid EM data.

    Args:
        path: Filepath to a folder where the downloaded data will be saved.
        source: Data source. "2d_manual" downloads sparse 2D annotated TIFF slices
            (cells + nuclei, ~50 MB). "3d_automatic" downloads the full 3D zarr
            archive with automated segmentation for cells, nuclei, and nanoparticles
            (~67 GB).
        resolution: The voxel resolution to use. For "2d_manual": "50-50-50" or
            "100-100-100". For "3d_automatic": "50-10-10", "50-25-25", "50-50-50",
            or "100-100-100" (all in nm, ZYX order).
        download: Whether to download the data if it is not present.

    Returns:
        Path to the downloaded data (folder for "2d_manual", zip file for "3d_automatic").
    """
    if source == "2d_manual":
        assert resolution in SLICE_IDS, \
            f"Invalid resolution '{resolution}' for 2d_manual, expected one of {list(SLICE_IDS)}."
        out_dir = os.path.join(str(path), "2d_manual", resolution)
        expected = sum(len(v) for v in SLICE_IDS[resolution].values())
        if len(glob(os.path.join(out_dir, "*.h5"))) >= expected:
            return out_dir
        if not download:
            raise RuntimeError(
                f"No cached data found at '{out_dir}'. Set download=True to download from BioImage Archive."
            )
        os.makedirs(out_dir, exist_ok=True)
        for axis, ids in SLICE_IDS[resolution].items():
            for coord in ids:
                _download_2d_slice(axis, coord, resolution, out_dir)
        return out_dir

    elif source == "3d_automatic":
        zarr_path = os.path.join(str(path), "3d_automatic", "Au_01-vol_01.zarr.zip")
        if os.path.exists(zarr_path):
            return zarr_path
        if not download:
            raise RuntimeError(
                f"Zarr archive not found at '{zarr_path}'. Set download=True to download (~67 GB)."
            )
        os.makedirs(os.path.dirname(zarr_path), exist_ok=True)
        util.download_source(zarr_path, ZARR_URL, download=True)
        return zarr_path

    else:
        raise ValueError(f"Invalid source '{source}', expected '2d_manual' or '3d_automatic'.")


def get_tumor_spheroid_paths(
    path: Union[os.PathLike, str],
    source: SourceChoice = "2d_manual",
    resolution: str = "50-50-50",
    target: TargetChoice = "cells",
    orientations: Optional[List[OrientationChoice]] = None,
    download: bool = False,
) -> Tuple[List[str], str, str]:
    """Get paths and array keys for the tumor spheroid EM data.

    Args:
        path: Filepath to a folder where the downloaded data will be saved.
        source: Data source, either "2d_manual" or "3d_automatic".
        resolution: The voxel resolution to use. For "2d_manual": "50-50-50" or
            "100-100-100". For "3d_automatic": a resolution at which labels exist
            for the chosen target, i.e. "50-50-50" or "100-100-100" for "cells"
            and "nuclei", and "50-50-50" for "nps".
        target: The segmentation target. "cells" and "nuclei" are available for
            both sources. "nps" (gold nanoparticles) is only available for
            "3d_automatic" at "50-50-50" resolution.
        orientations: Slice orientations to include ("x", "y", "z"). Defaults to
            all three. Only relevant for "2d_manual".
        download: Whether to download the data if it is not present.

    Returns:
        Tuple of (file paths, raw key, label key).
    """
    if source == "2d_manual":
        assert target in ("cells", "nuclei"), \
            f"Target '{target}' is not available for '2d_manual'. Choose 'cells' or 'nuclei'."
        if orientations is None:
            orientations = ["x", "y", "z"]
        out_dir = get_tumor_spheroid_data(path, source, resolution, download)
        file_paths = []
        for axis in orientations:
            for coord in SLICE_IDS[resolution][axis]:
                file_paths.append(os.path.join(out_dir, f"Au_01-vol_01-{axis}_{coord}.h5"))
        file_paths.sort()
        return file_paths, "raw", f"labels/{target}"

    elif source == "3d_automatic":
        assert target in LABEL_RESOLUTIONS_3D, \
            f"Invalid target '{target}', expected one of {list(LABEL_RESOLUTIONS_3D)}."
        # Only resolutions that provide labels for this target are accepted.
        valid_resolutions = LABEL_RESOLUTIONS_3D[target]
        assert resolution in valid_resolutions, (
            f"Resolution '{resolution}' is not available for target '{target}'. "
            f"Valid options: {valid_resolutions}."
        )
        if orientations is not None:
            raise ValueError("The 'orientations' parameter is only valid for source='2d_manual'.")
        zarr_path = get_tumor_spheroid_data(path, source, resolution, download)
        raw_key = f"{ZARR_ROOT}/images/{resolution}"
        label_key = f"{ZARR_ROOT}/labels/{target}/masks/{resolution}"
        return [zarr_path], raw_key, label_key

    else:
        raise ValueError(f"Invalid source '{source}', expected '2d_manual' or '3d_automatic'.")


def get_tumor_spheroid_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, ...],
    source: SourceChoice = "2d_manual",
    resolution: str = "50-50-50",
    target: TargetChoice = "cells",
    orientations: Optional[List[OrientationChoice]] = None,
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    binary: bool = False,
    **kwargs,
) -> Dataset:
    """Get the tumor spheroid EM dataset for cell/nucleus/nanoparticle segmentation.

    Args:
        path: Filepath to a folder where the downloaded data will be saved.
        patch_shape: The patch shape to use for training. Use (H, W) for "2d_manual"
            and (D, H, W) for "3d_automatic".
        source: Data source. "2d_manual" uses sparse manually annotated 2D slices
            (cells + nuclei). "3d_automatic" uses the full 3D volume with automated
            segmentation (cells, nuclei, nps). Requires ~67 GB download.
        resolution: The voxel resolution. For "2d_manual": "50-50-50" or
            "100-100-100". For "3d_automatic": a resolution at which labels exist
            for the chosen target, i.e. "50-50-50" or "100-100-100" for "cells"
            and "nuclei", and "50-50-50" for "nps".
        target: The segmentation target ("cells", "nuclei", or "nps").
            "nps" is only available for "3d_automatic" at "50-50-50".
        orientations: Slice orientations to include. Only for "2d_manual".
        download: Whether to download the data if it is not present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        binary: Whether to return a binary segmentation target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

    Returns:
        The segmentation dataset.
    """
    # At most one of the three target options may be set.
    assert sum((offsets is not None, boundaries, binary)) <= 1, f"{offsets}, {boundaries}, {binary}"

    file_paths, raw_key, label_key = get_tumor_spheroid_paths(
        path, source, resolution, target, orientations, download
    )

    if offsets is not None:
        label_transform = torch_em.transform.label.AffinityTransform(
            offsets=offsets, ignore_label=None, add_binary_target=True, add_mask=True
        )
        msg = "Offsets are passed, but 'label_transform2' is in the kwargs. It will be overridden."
        kwargs = util.update_kwargs(kwargs, "label_transform2", label_transform, msg=msg)
    elif boundaries:
        label_transform = torch_em.transform.label.BoundaryTransform(add_binary_target=True)
        msg = "Boundaries is set to True, but 'label_transform' is in the kwargs. It will be overridden."
        kwargs = util.update_kwargs(kwargs, "label_transform", label_transform, msg=msg)
    elif binary:
        label_transform = torch_em.transform.label.labels_to_binary
        msg = "Binary is set to True, but 'label_transform' is in the kwargs. It will be overridden."
        kwargs = util.update_kwargs(kwargs, "label_transform", label_transform, msg=msg)

    # Raw data and labels live in the same files, addressed by different keys.
    return torch_em.default_segmentation_dataset(
        raw_paths=file_paths,
        raw_key=raw_key,
        label_paths=file_paths,
        label_key=label_key,
        patch_shape=patch_shape,
        **kwargs,
    )


def get_tumor_spheroid_loader(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, ...],
    batch_size: int,
    source: SourceChoice = "2d_manual",
    resolution: str = "50-50-50",
    target: TargetChoice = "cells",
    orientations: Optional[List[OrientationChoice]] = None,
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    binary: bool = False,
    **kwargs,
) -> DataLoader:
    """Get the DataLoader for segmentation in the tumor spheroid EM dataset.

    Args:
        path: Filepath to a folder where the downloaded data will be saved.
        patch_shape: The patch shape to use for training. Use (H, W) for "2d_manual"
            and (D, H, W) for "3d_automatic".
        batch_size: The batch size for training.
        source: Data source. "2d_manual" uses sparse manually annotated 2D slices
            (cells + nuclei). "3d_automatic" uses the full 3D volume with automated
            segmentation (cells, nuclei, nps). Requires ~67 GB download.
        resolution: The voxel resolution. For "2d_manual": "50-50-50" or
            "100-100-100". For "3d_automatic": a resolution at which labels exist
            for the chosen target, i.e. "50-50-50" or "100-100-100" for "cells"
            and "nuclei", and "50-50-50" for "nps".
        target: The segmentation target ("cells", "nuclei", or "nps").
            "nps" is only available for "3d_automatic" at "50-50-50".
        orientations: Slice orientations to include. Only for "2d_manual".
        download: Whether to download the data if it is not present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        binary: Whether to return a binary segmentation target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`
            or for the PyTorch DataLoader.

    Returns:
        The DataLoader.
    """
    # Separate dataset keyword arguments from those meant for the DataLoader.
    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
    dataset = get_tumor_spheroid_dataset(
        path, patch_shape, source=source, resolution=resolution, target=target,
        orientations=orientations, download=download, offsets=offsets, boundaries=boundaries,
        binary=binary, **ds_kwargs,
    )
    return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)

FTP_BASE = 'https://ftp.ebi.ac.uk/pub/databases/biostudies/S-BIAD/263/S-BIAD3263/Files'
ZARR_URL = 'https://ftp.ebi.ac.uk/pub/databases/biostudies/S-BIAD/263/S-BIAD3263/Files/Au_01-vol_01.zarr.zip'
ZARR_ROOT = 'Au_01-vol_01.zarr'
SLICE_IDS = {'50-50-50': {'x': ['0277', '0336', '0390', '0653', '1300'], 'y': ['0288', '0488', '0889', '1272', '1606'], 'z': ['0016', '0034', '0073', '0075', '0169', '0173', '0180', '0192', '0212', '0274']}, '100-100-100': {'x': ['0138', '0168', '0195', '0326', '0650'], 'y': ['0144', '0244', '0444', '0636', '0803'], 'z': ['0008', '0017', '0036', '0038', '0084', '0086', '0090', '0096', '0106', '0137']}}
LABEL_RESOLUTIONS_3D = {'cells': ('50-50-50', '100-100-100'), 'nuclei': ('50-50-50', '100-100-100'), 'nps': ('50-50-50',)}
SourceChoice = typing.Literal['2d_manual', '3d_automatic']
Resolution2DChoice = typing.Literal['50-50-50', '100-100-100']
Resolution3DChoice = typing.Literal['50-10-10', '50-25-25', '50-50-50', '100-100-100']
TargetChoice = typing.Literal['cells', 'nuclei', 'nps']
OrientationChoice = typing.Literal['x', 'y', 'z']
def get_tumor_spheroid_data(path: Union[os.PathLike, str], source: Literal['2d_manual', '3d_automatic'] = '2d_manual', resolution: str = '50-50-50', download: bool = False) -> str:

Download the tumor spheroid EM data.

Arguments:
  • path: Filepath to a folder where the downloaded data will be saved.
  • source: Data source. "2d_manual" downloads sparse 2D annotated TIFF slices (cells + nuclei, ~50 MB). "3d_automatic" downloads the full 3D zarr archive with automated segmentation for cells, nuclei, and nanoparticles (~67 GB).
  • resolution: The voxel resolution to use. For "2d_manual": "50-50-50" or "100-100-100". For "3d_automatic": "50-10-10", "50-25-25", "50-50-50", or "100-100-100" (all in nm, ZYX order).
  • download: Whether to download the data if it is not present.
Returns:

Path to the downloaded data (folder for "2d_manual", zip file for "3d_automatic").
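For "2d_manual", the source shown earlier treats the cache as complete once the output folder holds at least as many `.h5` files as there are slice ids. The sketch below reproduces that count and the expected file names for the 50-50-50 ids. `expected_files` is a hypothetical helper for illustration, and the `data` folder is an assumed location.

```python
import os

# Slice ids copied from SLICE_IDS["50-50-50"] in the module source.
SLICE_IDS_50 = {
    "x": ["0277", "0336", "0390", "0653", "1300"],
    "y": ["0288", "0488", "0889", "1272", "1606"],
    "z": ["0016", "0034", "0073", "0075", "0169", "0173", "0180", "0192", "0212", "0274"],
}


def expected_files(out_dir: str, slice_ids: dict) -> list:
    """List the HDF5 files a complete download would contain (hypothetical helper)."""
    return sorted(
        os.path.join(out_dir, f"Au_01-vol_01-{axis}_{coord}.h5")
        for axis, ids in slice_ids.items()
        for coord in ids
    )


files = expected_files(os.path.join("data", "2d_manual", "50-50-50"), SLICE_IDS_50)
print(len(files))  # 5 slices along x, 5 along y, 10 along z
```

Because the check only counts files, unrelated `.h5` files in the same folder could satisfy it; comparing against the expected names would be a stricter alternative.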

def get_tumor_spheroid_paths(path: Union[os.PathLike, str], source: Literal['2d_manual', '3d_automatic'] = '2d_manual', resolution: str = '50-50-50', target: Literal['cells', 'nuclei', 'nps'] = 'cells', orientations: Optional[List[Literal['x', 'y', 'z']]] = None, download: bool = False) -> Tuple[List[str], str, str]:

Get paths and array keys for the tumor spheroid EM data.

Arguments:
  • path: Filepath to a folder where the downloaded data will be saved.
  • source: Data source, either "2d_manual" or "3d_automatic".
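  • resolution: The voxel resolution to use. For "2d_manual": "50-50-50" or "100-100-100". For "3d_automatic": a resolution at which labels exist for the chosen target, i.e. "50-50-50" or "100-100-100" for "cells" and "nuclei", and "50-50-50" for "nps". The raw-only resolutions "50-10-10" and "50-25-25" are rejected.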
  • target: The segmentation target. "cells" and "nuclei" are available for both sources. "nps" (gold nanoparticles) is only available for "3d_automatic" at "50-50-50" resolution.
  • orientations: Slice orientations to include ("x", "y", "z"). Defaults to all three. Only relevant for "2d_manual".
  • download: Whether to download the data if it is not present.
Returns:

Tuple of (file paths, raw key, label key).
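For "3d_automatic", the raw and label keys are built from the zarr root, the target, and the resolution, after checking that labels exist for that combination. The sketch below mirrors that logic with the constants from the module source. `zarr_keys` is a hypothetical helper, and it raises `ValueError` where the source uses `assert`.

```python
# Label availability and zarr root copied from the module source.
LABEL_RESOLUTIONS_3D = {
    "cells": ("50-50-50", "100-100-100"),
    "nuclei": ("50-50-50", "100-100-100"),
    "nps": ("50-50-50",),
}
ZARR_ROOT = "Au_01-vol_01.zarr"


def zarr_keys(target: str, resolution: str) -> tuple:
    """Return (raw_key, label_key), or raise ValueError for an unsupported combination."""
    if target not in LABEL_RESOLUTIONS_3D:
        raise ValueError(f"Invalid target '{target}'.")
    if resolution not in LABEL_RESOLUTIONS_3D[target]:
        raise ValueError(f"No '{target}' labels at resolution '{resolution}'.")
    raw_key = f"{ZARR_ROOT}/images/{resolution}"
    label_key = f"{ZARR_ROOT}/labels/{target}/masks/{resolution}"
    return raw_key, label_key


print(zarr_keys("nps", "50-50-50"))
```

Requesting `zarr_keys("nps", "100-100-100")` fails in this sketch, matching the restriction that nanoparticle labels exist only at "50-50-50".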

def get_tumor_spheroid_dataset(path: Union[os.PathLike, str], patch_shape: Tuple[int, ...], source: Literal['2d_manual', '3d_automatic'] = '2d_manual', resolution: str = '50-50-50', target: Literal['cells', 'nuclei', 'nps'] = 'cells', orientations: Optional[List[Literal['x', 'y', 'z']]] = None, download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, binary: bool = False, **kwargs) -> torch.utils.data.dataset.Dataset:

Get the tumor spheroid EM dataset for cell/nucleus/nanoparticle segmentation.

Arguments:
  • path: Filepath to a folder where the downloaded data will be saved.
  • patch_shape: The patch shape to use for training. Use (H, W) for "2d_manual" and (D, H, W) for "3d_automatic".
  • source: Data source. "2d_manual" uses sparse manually annotated 2D slices (cells + nuclei). "3d_automatic" uses the full 3D volume with automated segmentation (cells, nuclei, nps). Requires ~67 GB download.
  • resolution: The voxel resolution. For "2d_manual": "50-50-50" or "100-100-100". For "3d_automatic": a resolution at which labels exist for the chosen target, i.e. "50-50-50" or "100-100-100" for "cells" and "nuclei", and "50-50-50" for "nps".
  • target: The segmentation target ("cells", "nuclei", or "nps"). "nps" is only available for "3d_automatic" at "50-50-50".
  • orientations: Slice orientations to include. Only for "2d_manual".
  • download: Whether to download the data if it is not present.
  • offsets: Offset values for affinity computation used as target.
  • boundaries: Whether to compute boundaries as the target.
  • binary: Whether to return a binary segmentation target.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:

The segmentation dataset.
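The `offsets`, `boundaries`, and `binary` arguments are mutually exclusive: the source asserts that at most one is set, and otherwise the instance labels are used unchanged. A minimal sketch of that selection follows. `select_label_mode` and its return strings are hypothetical, and it raises `ValueError` where the source uses `assert`.

```python
from typing import List, Optional


def select_label_mode(
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    binary: bool = False,
) -> str:
    """Name the training target implied by the three options (hypothetical helper)."""
    # Booleans count as 0 or 1, so the sum is the number of options that are set.
    if sum((offsets is not None, boundaries, binary)) > 1:
        raise ValueError("Pass at most one of offsets, boundaries, binary.")
    if offsets is not None:
        return "affinities"
    if boundaries:
        return "boundaries"
    if binary:
        return "binary"
    return "instances"


print(select_label_mode(offsets=[[-1, 0], [0, -1]]))
```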

def get_tumor_spheroid_loader(path: Union[os.PathLike, str], patch_shape: Tuple[int, ...], batch_size: int, source: Literal['2d_manual', '3d_automatic'] = '2d_manual', resolution: str = '50-50-50', target: Literal['cells', 'nuclei', 'nps'] = 'cells', orientations: Optional[List[Literal['x', 'y', 'z']]] = None, download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, binary: bool = False, **kwargs) -> torch.utils.data.dataloader.DataLoader:

Get the DataLoader for segmentation in the tumor spheroid EM dataset.

Arguments:
  • path: Filepath to a folder where the downloaded data will be saved.
  • patch_shape: The patch shape to use for training. Use (H, W) for "2d_manual" and (D, H, W) for "3d_automatic".
  • batch_size: The batch size for training.
  • source: Data source. "2d_manual" uses sparse manually annotated 2D slices (cells + nuclei). "3d_automatic" uses the full 3D volume with automated segmentation (cells, nuclei, nps). Requires ~67 GB download.
  • resolution: The voxel resolution. For "2d_manual": "50-50-50" or "100-100-100". For "3d_automatic": a resolution at which labels exist for the chosen target, i.e. "50-50-50" or "100-100-100" for "cells" and "nuclei", and "50-50-50" for "nps".
  • target: The segmentation target ("cells", "nuclei", or "nps"). "nps" is only available for "3d_automatic" at "50-50-50".
  • orientations: Slice orientations to include. Only for "2d_manual".
  • download: Whether to download the data if it is not present.
  • offsets: Offset values for affinity computation used as target.
  • boundaries: Whether to compute boundaries as the target.
  • binary: Whether to return a binary segmentation target.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:

The DataLoader.
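The loader accepts one pool of extra keyword arguments and routes each either to the dataset constructor or to the DataLoader. The sketch below shows one way such a split can work, by matching names against a function signature. It is an assumption about the behavior of `util.split_kwargs`, not its actual implementation, and `split_by_signature` and `make_dataset` are hypothetical stand-ins.

```python
import inspect


def split_by_signature(function, **kwargs):
    """Split kwargs into those accepted by `function` and the remainder."""
    params = inspect.signature(function).parameters
    matched = {k: v for k, v in kwargs.items() if k in params}
    remaining = {k: v for k, v in kwargs.items() if k not in params}
    return matched, remaining


def make_dataset(raw_paths, patch_shape, n_samples=None):
    """Stand-in for a dataset factory; only its signature matters here."""
    return raw_paths, patch_shape, n_samples


ds_kwargs, loader_kwargs = split_by_signature(
    make_dataset, n_samples=100, num_workers=4, shuffle=True
)
print(ds_kwargs, loader_kwargs)
```

With this kind of split, a caller can pass dataset options and DataLoader options such as `num_workers` in a single call.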