torch_em.data.datasets.electron_microscopy.tumor_spheroid_em
The tumor spheroid EM dataset contains SBF-SEM imaging of tumor spheroids with gold nanoparticles.
Two data sources are available, selected via the source parameter:
"2d_manual" - Manually annotated 2D TIFF slices at two isotropic resolutions (50 x 50 x 50 nm and 100 x 100 x 100 nm). Each slice has paired instance segmentation labels for cells and nuclei. Slices span all three orthogonal planes (XY, XZ, YZ). Available targets: "cells", "nuclei".
"3d_automatic" - Full 3D volume with automated instance segmentation for cells, nuclei, and gold nanoparticles (nps). Raw data available at four resolutions: native 50 x 10 x 10 nm ("50-10-10"), and downsampled "50-25-25", "50-50-50", "100-100-100". Labels at "50-50-50" and "100-100-100" for cells/nuclei, and "50-50-50" only for nps. Requires downloading the full 67 GB zarr archive. Available targets: "cells", "nuclei", "nps".
The volume covers approximately 102.4 x 102.4 x 35 um at native voxel size.
This dataset is from the publication https://doi.org/10.64898/2026.04.17.719153. Please cite it if you use this dataset for a publication.
The data is available at https://doi.org/10.6019/S-BIAD3263.
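The availability rules described above can be summarised as a small lookup table. This is a minimal sketch and not part of torch_em: the names `LABELS` and `has_labels` are hypothetical, and the table is transcribed from the description above.

```python
# Label availability per source and target, transcribed from the module description.
# `LABELS` and `has_labels` are illustrative names, not part of torch_em.
LABELS = {
    "2d_manual": {
        "cells": ("50-50-50", "100-100-100"),
        "nuclei": ("50-50-50", "100-100-100"),
    },
    "3d_automatic": {
        "cells": ("50-50-50", "100-100-100"),
        "nuclei": ("50-50-50", "100-100-100"),
        "nps": ("50-50-50",),
    },
}


def has_labels(source: str, target: str, resolution: str) -> bool:
    """Return True if labels for `target` exist at `resolution` in `source`."""
    return resolution in LABELS.get(source, {}).get(target, ())
```

For example, `has_labels("3d_automatic", "cells", "50-10-10")` is False: raw data exists at the native resolution, but no labels do.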

    """The tumor spheroid EM dataset contains SBF-SEM imaging of tumor spheroids with gold nanoparticles.

    Two data sources are available, selected via the `source` parameter:

    **"2d_manual"** - Manually annotated 2D TIFF slices at two isotropic resolutions
    (50 x 50 x 50 nm and 100 x 100 x 100 nm). Each slice has paired instance
    segmentation labels for cells and nuclei. Slices span all three orthogonal
    planes (XY, XZ, YZ). Available targets: "cells", "nuclei".

    **"3d_automatic"** - Full 3D volume with automated instance segmentation for cells,
    nuclei, and gold nanoparticles (nps). Raw data available at four resolutions:
    native 50 x 10 x 10 nm ("50-10-10"), and downsampled "50-25-25", "50-50-50",
    "100-100-100". Labels at "50-50-50" and "100-100-100" for cells/nuclei, and
    "50-50-50" only for nps. Requires downloading the full 67 GB zarr archive.
    Available targets: "cells", "nuclei", "nps".

    The volume covers approximately 102.4 x 102.4 x 35 um at native voxel size.

    This dataset is from the publication https://doi.org/10.64898/2026.04.17.719153.
    Please cite it if you use this dataset for a publication.

    The data is available at https://doi.org/10.6019/S-BIAD3263.
    """

    import os
    from glob import glob
    from typing import List, Literal, Optional, Tuple, Union

    import imageio.v3 as imageio

    from torch.utils.data import DataLoader, Dataset

    import torch_em
    from .. import util


    FTP_BASE = "https://ftp.ebi.ac.uk/pub/databases/biostudies/S-BIAD/263/S-BIAD3263/Files"
    ZARR_URL = f"{FTP_BASE}/Au_01-vol_01.zarr.zip"
    ZARR_ROOT = "Au_01-vol_01.zarr"

    SLICE_IDS = {
        "50-50-50": {
            "x": ["0277", "0336", "0390", "0653", "1300"],
            "y": ["0288", "0488", "0889", "1272", "1606"],
            "z": ["0016", "0034", "0073", "0075", "0169", "0173", "0180", "0192", "0212", "0274"],
        },
        "100-100-100": {
            "x": ["0138", "0168", "0195", "0326", "0650"],
            "y": ["0144", "0244", "0444", "0636", "0803"],
            "z": ["0008", "0017", "0036", "0038", "0084", "0086", "0090", "0096", "0106", "0137"],
        },
    }

    LABEL_RESOLUTIONS_3D = {
        "cells": ("50-50-50", "100-100-100"),
        "nuclei": ("50-50-50", "100-100-100"),
        "nps": ("50-50-50",),
    }

    SourceChoice = Literal["2d_manual", "3d_automatic"]
    Resolution2DChoice = Literal["50-50-50", "100-100-100"]
    Resolution3DChoice = Literal["50-10-10", "50-25-25", "50-50-50", "100-100-100"]
    TargetChoice = Literal["cells", "nuclei", "nps"]
    OrientationChoice = Literal["x", "y", "z"]


    def _download_2d_slice(axis, coord, resolution, out_dir):
        import h5py

        stem = f"Au_01-vol_01-{axis}_{coord}"
        h5_path = os.path.join(out_dir, f"{stem}.h5")
        if os.path.exists(h5_path):
            return

        base_url = f"{FTP_BASE}/ground_truths/{resolution}"
        raw_tmp = os.path.join(out_dir, f"{stem}_raw.tif")
        cells_tmp = os.path.join(out_dir, f"{stem}_cells.tif")
        nuclei_tmp = os.path.join(out_dir, f"{stem}_nuclei.tif")

        util.download_source(raw_tmp, f"{base_url}/{stem}.tif", download=True)
        util.download_source(cells_tmp, f"{base_url}/labels/{stem}-cells.tif", download=True)
        util.download_source(nuclei_tmp, f"{base_url}/labels/{stem}-nuclei.tif", download=True)

        raw = imageio.imread(raw_tmp)
        cells = imageio.imread(cells_tmp)
        nuclei = imageio.imread(nuclei_tmp)

        with h5py.File(h5_path, "w") as f:
            f.create_dataset("raw", data=raw, compression="gzip")
            f.create_dataset("labels/cells", data=cells.astype("uint32"), compression="gzip")
            f.create_dataset("labels/nuclei", data=nuclei.astype("uint32"), compression="gzip")

        os.remove(raw_tmp)
        os.remove(cells_tmp)
        os.remove(nuclei_tmp)


    def get_tumor_spheroid_data(
        path: Union[os.PathLike, str],
        source: SourceChoice = "2d_manual",
        resolution: str = "50-50-50",
        download: bool = False,
    ) -> str:
        """Download the tumor spheroid EM data.

        Args:
            path: Filepath to a folder where the downloaded data will be saved.
            source: Data source. "2d_manual" downloads sparse 2D annotated TIFF slices
                (cells + nuclei, ~50 MB). "3d_automatic" downloads the full 3D zarr
                archive with automated segmentation for cells, nuclei, and nanoparticles
                (~67 GB).
            resolution: The voxel resolution to use. For "2d_manual": "50-50-50" or
                "100-100-100". For "3d_automatic": "50-10-10", "50-25-25", "50-50-50",
                or "100-100-100" (all in nm, ZYX order).
            download: Whether to download the data if it is not present.

        Returns:
            Path to the downloaded data (folder for "2d_manual", zip file for "3d_automatic").
        """
        if source == "2d_manual":
            assert resolution in SLICE_IDS, \
                f"Invalid resolution '{resolution}' for 2d_manual, expected one of {list(SLICE_IDS)}."
            out_dir = os.path.join(str(path), "2d_manual", resolution)
            expected = sum(len(v) for v in SLICE_IDS[resolution].values())
            if len(glob(os.path.join(out_dir, "*.h5"))) >= expected:
                return out_dir
            if not download:
                raise RuntimeError(
                    f"No cached data found at '{out_dir}'. Set download=True to download from BioImage Archive."
                )
            os.makedirs(out_dir, exist_ok=True)
            for axis, ids in SLICE_IDS[resolution].items():
                for coord in ids:
                    _download_2d_slice(axis, coord, resolution, out_dir)
            return out_dir

        elif source == "3d_automatic":
            zarr_path = os.path.join(str(path), "3d_automatic", "Au_01-vol_01.zarr.zip")
            if os.path.exists(zarr_path):
                return zarr_path
            if not download:
                raise RuntimeError(
                    f"Zarr archive not found at '{zarr_path}'. Set download=True to download (~67 GB)."
                )
            os.makedirs(os.path.dirname(zarr_path), exist_ok=True)
            util.download_source(zarr_path, ZARR_URL, download=True)
            return zarr_path

        else:
            raise ValueError(f"Invalid source '{source}', expected '2d_manual' or '3d_automatic'.")


    def get_tumor_spheroid_paths(
        path: Union[os.PathLike, str],
        source: SourceChoice = "2d_manual",
        resolution: str = "50-50-50",
        target: TargetChoice = "cells",
        orientations: Optional[List[OrientationChoice]] = None,
        download: bool = False,
    ) -> Tuple[List[str], str, str]:
        """Get paths and array keys for the tumor spheroid EM data.

        Args:
            path: Filepath to a folder where the downloaded data will be saved.
            source: Data source, either "2d_manual" or "3d_automatic".
            resolution: The voxel resolution to use. For "2d_manual": "50-50-50" or
                "100-100-100". For "3d_automatic": "50-10-10", "50-25-25", "50-50-50",
                or "100-100-100".
            target: The segmentation target. "cells" and "nuclei" are available for
                both sources. "nps" (gold nanoparticles) is only available for
                "3d_automatic" at "50-50-50" resolution.
            orientations: Slice orientations to include ("x", "y", "z"). Defaults to
                all three. Only relevant for "2d_manual".
            download: Whether to download the data if it is not present.

        Returns:
            Tuple of (file paths, raw key, label key).
        """
        if source == "2d_manual":
            assert target in ("cells", "nuclei"), \
                f"Target '{target}' is not available for '2d_manual'. Choose 'cells' or 'nuclei'."
            if orientations is None:
                orientations = ["x", "y", "z"]
            out_dir = get_tumor_spheroid_data(path, source, resolution, download)
            file_paths = []
            for axis in orientations:
                for coord in SLICE_IDS[resolution][axis]:
                    file_paths.append(os.path.join(out_dir, f"Au_01-vol_01-{axis}_{coord}.h5"))
            file_paths.sort()
            return file_paths, "raw", f"labels/{target}"

        elif source == "3d_automatic":
            assert target in LABEL_RESOLUTIONS_3D, \
                f"Invalid target '{target}', expected one of {list(LABEL_RESOLUTIONS_3D)}."
            valid_resolutions = LABEL_RESOLUTIONS_3D[target]
            assert resolution in valid_resolutions, (
                f"Resolution '{resolution}' is not available for target '{target}'. "
                f"Valid options: {valid_resolutions}."
            )
            if orientations is not None:
                raise ValueError("The 'orientations' parameter is only valid for source='2d_manual'.")
            zarr_path = get_tumor_spheroid_data(path, source, resolution, download)
            raw_key = f"{ZARR_ROOT}/images/{resolution}"
            label_key = f"{ZARR_ROOT}/labels/{target}/masks/{resolution}"
            return [zarr_path], raw_key, label_key

        else:
            raise ValueError(f"Invalid source '{source}', expected '2d_manual' or '3d_automatic'.")


    def get_tumor_spheroid_dataset(
        path: Union[os.PathLike, str],
        patch_shape: Tuple[int, ...],
        source: SourceChoice = "2d_manual",
        resolution: str = "50-50-50",
        target: TargetChoice = "cells",
        orientations: Optional[List[OrientationChoice]] = None,
        download: bool = False,
        offsets: Optional[List[List[int]]] = None,
        boundaries: bool = False,
        binary: bool = False,
        **kwargs,
    ) -> Dataset:
        """Get the tumor spheroid EM dataset for cell/nucleus/nanoparticle segmentation.

        Args:
            path: Filepath to a folder where the downloaded data will be saved.
            patch_shape: The patch shape to use for training. Use (H, W) for "2d_manual"
                and (D, H, W) for "3d_automatic".
            source: Data source. "2d_manual" uses sparse manually annotated 2D slices
                (cells + nuclei). "3d_automatic" uses the full 3D volume with automated
                segmentation (cells, nuclei, nps). Requires ~67 GB download.
            resolution: The voxel resolution. For "2d_manual": "50-50-50" or
                "100-100-100". For "3d_automatic": "50-10-10", "50-25-25", "50-50-50",
                or "100-100-100".
            target: The segmentation target ("cells", "nuclei", or "nps").
                "nps" is only available for "3d_automatic" at "50-50-50".
            orientations: Slice orientations to include. Only for "2d_manual".
            download: Whether to download the data if it is not present.
            offsets: Offset values for affinity computation used as target.
            boundaries: Whether to compute boundaries as the target.
            binary: Whether to return a binary segmentation target.
            kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

        Returns:
            The segmentation dataset.
        """
        assert sum((offsets is not None, boundaries, binary)) <= 1, f"{offsets}, {boundaries}, {binary}"

        file_paths, raw_key, label_key = get_tumor_spheroid_paths(
            path, source, resolution, target, orientations, download
        )

        if offsets is not None:
            label_transform = torch_em.transform.label.AffinityTransform(
                offsets=offsets, ignore_label=None, add_binary_target=True, add_mask=True
            )
            msg = "Offsets are passed, but 'label_transform2' is in the kwargs. It will be over-ridden."
            kwargs = util.update_kwargs(kwargs, "label_transform2", label_transform, msg=msg)
        elif boundaries:
            label_transform = torch_em.transform.label.BoundaryTransform(add_binary_target=True)
            msg = "Boundaries is set to True, but 'label_transform' is in the kwargs. It will be over-ridden."
            kwargs = util.update_kwargs(kwargs, "label_transform", label_transform, msg=msg)
        elif binary:
            label_transform = torch_em.transform.label.labels_to_binary
            msg = "Binary is set to True, but 'label_transform' is in the kwargs. It will be over-ridden."
            kwargs = util.update_kwargs(kwargs, "label_transform", label_transform, msg=msg)

        return torch_em.default_segmentation_dataset(
            raw_paths=file_paths,
            raw_key=raw_key,
            label_paths=file_paths,
            label_key=label_key,
            patch_shape=patch_shape,
            **kwargs,
        )


    def get_tumor_spheroid_loader(
        path: Union[os.PathLike, str],
        patch_shape: Tuple[int, ...],
        batch_size: int,
        source: SourceChoice = "2d_manual",
        resolution: str = "50-50-50",
        target: TargetChoice = "cells",
        orientations: Optional[List[OrientationChoice]] = None,
        download: bool = False,
        offsets: Optional[List[List[int]]] = None,
        boundaries: bool = False,
        binary: bool = False,
        **kwargs,
    ) -> DataLoader:
        """Get the DataLoader for segmentation in the tumor spheroid EM dataset.

        Args:
            path: Filepath to a folder where the downloaded data will be saved.
            patch_shape: The patch shape to use for training. Use (H, W) for "2d_manual"
                and (D, H, W) for "3d_automatic".
            batch_size: The batch size for training.
            source: Data source. "2d_manual" uses sparse manually annotated 2D slices
                (cells + nuclei). "3d_automatic" uses the full 3D volume with automated
                segmentation (cells, nuclei, nps). Requires ~67 GB download.
            resolution: The voxel resolution. For "2d_manual": "50-50-50" or
                "100-100-100". For "3d_automatic": "50-10-10", "50-25-25", "50-50-50",
                or "100-100-100".
            target: The segmentation target ("cells", "nuclei", or "nps").
                "nps" is only available for "3d_automatic" at "50-50-50".
            orientations: Slice orientations to include. Only for "2d_manual".
            download: Whether to download the data if it is not present.
            offsets: Offset values for affinity computation used as target.
            boundaries: Whether to compute boundaries as the target.
            binary: Whether to return a binary segmentation target.
            kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`
                or for the PyTorch DataLoader.

        Returns:
            The DataLoader.
        """
        ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
        dataset = get_tumor_spheroid_dataset(
            path, patch_shape, source=source, resolution=resolution, target=target,
            orientations=orientations, download=download, offsets=offsets, boundaries=boundaries,
            binary=binary, **ds_kwargs,
        )
        return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)


    def get_tumor_spheroid_data(
        path: Union[os.PathLike, str],
        source: SourceChoice = "2d_manual",
        resolution: str = "50-50-50",
        download: bool = False,
    ) -> str:

Download the tumor spheroid EM data.
Arguments:
- path: Filepath to a folder where the downloaded data will be saved.
- source: Data source. "2d_manual" downloads sparse 2D annotated TIFF slices (cells + nuclei, ~50 MB). "3d_automatic" downloads the full 3D zarr archive with automated segmentation for cells, nuclei, and nanoparticles (~67 GB).
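- resolution: The voxel resolution in nm (ZYX order). For "2d_manual": "50-50-50" or "100-100-100"; this selects which set of slices is downloaded. For "3d_automatic" the accepted values are "50-10-10", "50-25-25", "50-50-50", or "100-100-100", but the value does not change what is downloaded, because this source always fetches the single full zarr archive.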
- download: Whether to download the data if it is not present.
Returns:
Path to the downloaded data (folder for "2d_manual", zip file for "3d_automatic").
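The "2d_manual" branch decides whether to download by counting cached HDF5 files. The check can be sketched as follows, assuming the per-axis slice counts from `SLICE_IDS` in the module source (5 + 5 + 10 = 20 slices per resolution). The names `SLICES_PER_AXIS` and `is_2d_cache_complete` are hypothetical and not part of torch_em.

```python
import os
from glob import glob

# Annotated slices per axis, counted from SLICE_IDS in the module source.
# The counts are the same for "50-50-50" and "100-100-100".
SLICES_PER_AXIS = {"x": 5, "y": 5, "z": 10}


def is_2d_cache_complete(root: str, resolution: str = "50-50-50") -> bool:
    """Return True if the folder for `resolution` already holds all converted slices."""
    out_dir = os.path.join(root, "2d_manual", resolution)
    expected = sum(SLICES_PER_AXIS.values())
    # Each downloaded slice is stored as one .h5 file, so counting files is enough.
    return len(glob(os.path.join(out_dir, "*.h5"))) >= expected
```

Because the check only counts files, a partially downloaded folder is reported as incomplete; the per-slice download in the module source skips files that already exist, so a repeated call with download=True only fetches the missing slices.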

    def get_tumor_spheroid_paths(
        path: Union[os.PathLike, str],
        source: SourceChoice = "2d_manual",
        resolution: str = "50-50-50",
        target: TargetChoice = "cells",
        orientations: Optional[List[OrientationChoice]] = None,
        download: bool = False,
    ) -> Tuple[List[str], str, str]:

Get paths and array keys for the tumor spheroid EM data.
Arguments:
- path: Filepath to a folder where the downloaded data will be saved.
- source: Data source, either "2d_manual" or "3d_automatic".
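- resolution: The voxel resolution to use. For "2d_manual": "50-50-50" or "100-100-100". For "3d_automatic", raw data exists at "50-10-10", "50-25-25", "50-50-50", and "100-100-100", but the function only accepts resolutions that have labels for the chosen target: "50-50-50" or "100-100-100" for "cells" and "nuclei", and "50-50-50" only for "nps".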
- target: The segmentation target. "cells" and "nuclei" are available for both sources. "nps" (gold nanoparticles) is only available for "3d_automatic" at "50-50-50" resolution.
- orientations: Slice orientations to include ("x", "y", "z"). Defaults to all three. Only relevant for "2d_manual".
- download: Whether to download the data if it is not present.
Returns:
Tuple of (file paths, raw key, label key).
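To make the returned keys concrete, here is a minimal sketch of how the file names and array keys are assembled, following the string patterns in the module source. The helper names `zarr_keys` and `slice_filename` are hypothetical and not part of torch_em.

```python
# Name of the root group inside the zarr archive, as defined in the module source.
ZARR_ROOT = "Au_01-vol_01.zarr"


def zarr_keys(target: str, resolution: str) -> tuple:
    """Build the (raw key, label key) pair used for source="3d_automatic"."""
    raw_key = f"{ZARR_ROOT}/images/{resolution}"
    label_key = f"{ZARR_ROOT}/labels/{target}/masks/{resolution}"
    return raw_key, label_key


def slice_filename(axis: str, coord: str) -> str:
    """Build the HDF5 file name of one annotated slice for source="2d_manual"."""
    return f"Au_01-vol_01-{axis}_{coord}.h5"
```

For "2d_manual" the keys are fixed instead: "raw" for the image and "labels/cells" or "labels/nuclei" for the target.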

    def get_tumor_spheroid_dataset(
        path: Union[os.PathLike, str],
        patch_shape: Tuple[int, ...],
        source: SourceChoice = "2d_manual",
        resolution: str = "50-50-50",
        target: TargetChoice = "cells",
        orientations: Optional[List[OrientationChoice]] = None,
        download: bool = False,
        offsets: Optional[List[List[int]]] = None,
        boundaries: bool = False,
        binary: bool = False,
        **kwargs,
    ) -> Dataset:

Get the tumor spheroid EM dataset for cell/nucleus/nanoparticle segmentation.
Arguments:
- path: Filepath to a folder where the downloaded data will be saved.
- patch_shape: The patch shape to use for training. Use (H, W) for "2d_manual" and (D, H, W) for "3d_automatic".
- source: Data source. "2d_manual" uses sparse manually annotated 2D slices (cells + nuclei). "3d_automatic" uses the full 3D volume with automated segmentation (cells, nuclei, nps) and requires a ~67 GB download.
- resolution: The voxel resolution. For "2d_manual": "50-50-50" or "100-100-100". For "3d_automatic", only resolutions that have labels for the chosen target are accepted: "50-50-50" or "100-100-100" for "cells" and "nuclei", and "50-50-50" only for "nps". The raw-only resolutions "50-10-10" and "50-25-25" are rejected.
- target: The segmentation target ("cells", "nuclei", or "nps"). "nps" is only available for "3d_automatic" at "50-50-50".
- orientations: Slice orientations to include. Only for "2d_manual".
- download: Whether to download the data if it is not present.
- offsets: Offset values for affinity computation used as target.
- boundaries: Whether to compute boundaries as the target.
- binary: Whether to return a binary segmentation target.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:
The segmentation dataset.
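The three target options `offsets`, `boundaries`, and `binary` are mutually exclusive. The selection logic can be sketched as below. This is an illustration only: `select_target_mode` is a hypothetical helper, and it raises a ValueError where the module source uses an assert.

```python
from typing import List, Optional


def select_target_mode(
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    binary: bool = False,
) -> str:
    """Return which kind of training target the given options would select."""
    # At most one of the three options may be active at a time.
    if sum((offsets is not None, boundaries, binary)) > 1:
        raise ValueError("Pass at most one of 'offsets', 'boundaries', 'binary'.")
    if offsets is not None:
        return "affinities"
    if boundaries:
        return "boundaries"
    if binary:
        return "binary"
    # None of the options is set, so this function adds no label transform.
    return "instances"
```

With none of the options set, the dataset function adds no label transform of its own, and any `label_transform` passed through kwargs is left untouched.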

    def get_tumor_spheroid_loader(
        path: Union[os.PathLike, str],
        patch_shape: Tuple[int, ...],
        batch_size: int,
        source: SourceChoice = "2d_manual",
        resolution: str = "50-50-50",
        target: TargetChoice = "cells",
        orientations: Optional[List[OrientationChoice]] = None,
        download: bool = False,
        offsets: Optional[List[List[int]]] = None,
        boundaries: bool = False,
        binary: bool = False,
        **kwargs,
    ) -> DataLoader:

Get the DataLoader for segmentation in the tumor spheroid EM dataset.
Arguments:
- path: Filepath to a folder where the downloaded data will be saved.
- patch_shape: The patch shape to use for training. Use (H, W) for "2d_manual" and (D, H, W) for "3d_automatic".
- batch_size: The batch size for training.
- source: Data source. "2d_manual" uses sparse manually annotated 2D slices (cells + nuclei). "3d_automatic" uses the full 3D volume with automated segmentation (cells, nuclei, nps) and requires a ~67 GB download.
- resolution: The voxel resolution. For "2d_manual": "50-50-50" or "100-100-100". For "3d_automatic", only resolutions that have labels for the chosen target are accepted: "50-50-50" or "100-100-100" for "cells" and "nuclei", and "50-50-50" only for "nps". The raw-only resolutions "50-10-10" and "50-25-25" are rejected.
- target: The segmentation target ("cells", "nuclei", or "nps"). "nps" is only available for "3d_automatic" at "50-50-50".
- orientations: Slice orientations to include. Only for "2d_manual".
- download: Whether to download the data if it is not present.
- offsets: Offset values for affinity computation used as target.
- boundaries: Whether to compute boundaries as the target.
- binary: Whether to return a binary segmentation target.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:
The DataLoader.
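A typical call might look like the following sketch, which builds a loader for binary nucleus segmentation on the manually annotated XY slices. The patch shape, batch size, and the `shuffle` flag are illustrative assumptions rather than recommended settings, and `LOADER_SETTINGS` and `make_nuclei_loader` are hypothetical names. The import is deferred into the function so that the snippet can be read and loaded without torch_em installed.

```python
# Illustrative settings; the values are assumptions, not recommendations.
LOADER_SETTINGS = dict(
    patch_shape=(512, 512),   # (H, W), because "2d_manual" provides 2D slices
    batch_size=4,
    source="2d_manual",
    resolution="50-50-50",
    target="nuclei",
    orientations=["z"],       # restrict to slices along the z axis
    binary=True,              # foreground/background target instead of instance ids
    shuffle=True,             # not a dataset argument, so it is passed on to the DataLoader
)


def make_nuclei_loader(data_root: str, download: bool = False):
    """Create the loader; needs torch_em and, on first use, download=True."""
    from torch_em.data.datasets.electron_microscopy.tumor_spheroid_em import (
        get_tumor_spheroid_loader,
    )
    return get_tumor_spheroid_loader(data_root, download=download, **LOADER_SETTINGS)
```

Calling `make_nuclei_loader("./data", download=True)` would fetch the 2D slices (about 50 MB according to the description above) on the first run and reuse the cached files afterwards.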