torch_em.data.datasets.electron_microscopy.parlakgul_liver
The Parlakgul liver dataset contains FIB-SEM volumes of mouse liver with dense semantic segmentation of 7 organelle classes. All labels are binary semantic masks (0=background, 1=foreground) - not instance segmentation.
Four FIB-SEM volumes are available across lean and obese conditions (sizes listed as x × y × z):
- 6461 (lean): 12000 x 8000 x 5638 voxels, 8 nm isotropic
- 6464 (obese 1): 9112 x 10200 x 7896 voxels, 8 nm isotropic
- 9430 (obese 2): 8000 x 8050 x 8501 voxels, 8 nm isotropic
- 1857 (obese Climp63): 9700 x 9650 x 3629 voxels, 8 nm isotropic
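To give a feel for why the data is streamed rather than downloaded whole, the following sketch computes the voxel count, approximate uncompressed size, and physical extent of each volume from the figures listed above. It assumes one byte per voxel (the source allocates `uint8` buffers for the raw data); `VOLUMES` and `volume_summary` are illustrative names, not part of torch_em.

```python
# Volume extents as listed above, in (x, y, z) voxels.
VOLUMES = {
    "6461": (12000, 8000, 5638),
    "6464": (9112, 10200, 7896),
    "9430": (8000, 8050, 8501),
    "1857": (9700, 9650, 3629),
}
VOXEL_NM = 8         # isotropic voxel size in nanometres
BYTES_PER_VOXEL = 1  # assumption: 8-bit raw data


def volume_summary(extent):
    """Return (voxel count, uncompressed size in GB, physical extent in micrometres)."""
    x, y, z = extent
    n_voxels = x * y * z
    size_gb = n_voxels * BYTES_PER_VOXEL / 1e9
    physical_um = tuple(v * VOXEL_NM / 1000 for v in extent)
    return n_voxels, size_gb, physical_um


for name, extent in VOLUMES.items():
    n_voxels, size_gb, physical_um = volume_summary(extent)
    print(f"{name}: {n_voxels:,} voxels, ~{size_gb:.0f} GB, {physical_um} um")
```

Under that assumption each raw volume is on the order of several hundred gigabytes, so fetching only a bounding box is usually the practical choice.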
Seven semantic segmentation classes are available via the label_choice parameter:
- "er": endoplasmic reticulum
- "er_sheets": ER sheets
- "er_tubules": ER tubules
- "mito": mitochondria
- "lipid_droplet": lipid droplets
- "nuclear_membrane": nuclear membrane
- "plasma_membrane": plasma membrane (not available for 1857)
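Because one class is missing for sample 1857, it can help to check availability before starting a long download. The sketch below encodes the list above in a small lookup; `ALL_LABELS`, `UNAVAILABLE`, and `available_labels` are hypothetical helper names for illustration and are not provided by torch_em.

```python
# The seven label choices listed above.
ALL_LABELS = (
    "er", "er_sheets", "er_tubules", "mito",
    "lipid_droplet", "nuclear_membrane", "plasma_membrane",
)

# Per-sample exceptions, taken from the list above.
UNAVAILABLE = {"1857": {"plasma_membrane"}}


def available_labels(sample):
    """Return the label choices that exist for the given sample ID."""
    missing = UNAVAILABLE.get(sample, set())
    return [label for label in ALL_LABELS if label not in missing]


print(available_labels("1857"))
```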
Data is streamed lazily from EMPIAR-10791 via HTTP: raw TIFFs are fetched per z-slice, and segmentation TIFFs are extracted per z-slice from ZIP archives using HTTP range requests. Each slice is cropped to the requested bounding box after it is fetched, and only that region is cached as zarr v3.
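The reason range requests are enough for the segmentation archives is that `zipfile` only needs random access: it reads the central directory at the end of the file, then seeks to the one member it was asked for. The sketch below demonstrates this offline with an in-memory archive and a byte counter standing in for the network. `CountingFile` is a hypothetical stand-in, and the member names and sizes are made up for the demonstration.

```python
import io
import zipfile


class CountingFile(io.BytesIO):
    """In-memory stand-in for a remote file that records how many bytes are read."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, n=-1):
        chunk = super().read(n)
        self.bytes_read += len(chunk)
        return chunk


# Build an uncompressed ZIP holding 20 fake "slices" of 100 kB each.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
    for z in range(20):
        zf.writestr(f"slice_{z:04d}.tif", bytes([z]) * 100_000)
archive = buffer.getvalue()

# Open the archive through the counting wrapper and extract a single member.
remote = CountingFile(archive)
zf = zipfile.ZipFile(remote)
names = sorted(zf.namelist())
payload = zf.read(names[7])

print(f"archive size: {len(archive)} bytes, bytes actually read: {remote.bytes_read}")
```

Only the directory and the requested member are read, which is why a seekable object backed by HTTP `Range` headers can serve single slices from a very large ZIP.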
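Bounding boxes are specified as (x_min, x_max, y_min, y_max, z_min, z_max) in voxels. Upper bounds are exclusive, so the cached arrays have shape (z_max - z_min, y_max - y_min, x_max - x_min).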
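Note that bounding boxes are given in x, y, z order while arrays and patch shapes use z, y, x. A minimal sketch of the conversion, treating the upper bounds as exclusive in the way Python ranges and slices do; `bbox_to_shape` and `bbox_to_slices` are illustrative helpers, not torch_em functions.

```python
def bbox_to_shape(bbox):
    """Convert (x_min, x_max, y_min, y_max, z_min, z_max) to an array shape (z, y, x)."""
    x_min, x_max, y_min, y_max, z_min, z_max = bbox
    return (z_max - z_min, y_max - y_min, x_max - x_min)


def bbox_to_slices(bbox):
    """Convert a bounding box to a (z, y, x) tuple of slices for indexing a full volume."""
    x_min, x_max, y_min, y_max, z_min, z_max = bbox
    return (slice(z_min, z_max), slice(y_min, y_max), slice(x_min, x_max))


bbox = (1000, 1512, 2000, 2256, 300, 364)
print(bbox_to_shape(bbox))   # (64, 256, 512)
print(bbox_to_slices(bbox))
```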
This dataset is from the publication https://doi.org/10.1038/s41586-022-04518-2. Please cite it if you use this dataset in your research.
The data is publicly available at https://www.ebi.ac.uk/empiar/EMPIAR-10791/.
"""The Parlakgul liver dataset contains FIB-SEM volumes of mouse liver with dense
semantic segmentation of 7 organelle classes. All labels are binary semantic masks
(0=background, 1=foreground) - not instance segmentation.

Four FIB-SEM volumes are available across lean and obese conditions:
- 6461 (lean): 12000 x 8000 x 5638 voxels, 8 nm isotropic
- 6464 (obese 1): 9112 x 10200 x 7896 voxels, 8 nm isotropic
- 9430 (obese 2): 8000 x 8050 x 8501 voxels, 8 nm isotropic
- 1857 (obese Climp63): 9700 x 9650 x 3629 voxels, 8 nm isotropic

Seven semantic segmentation classes are available via the `label_choice` parameter:
- "er": endoplasmic reticulum
- "er_sheets": ER sheets
- "er_tubules": ER tubules
- "mito": mitochondria
- "lipid_droplet": lipid droplets
- "nuclear_membrane": nuclear membrane
- "plasma_membrane": plasma membrane (not available for 1857)

Data is streamed lazily from EMPIAR-10791 via HTTP: raw TIFFs are fetched per z-slice,
segmentation is extracted per z-slice from ZIP archives using HTTP range requests.
Only the requested bounding box region is downloaded and cached as zarr v3.

Bounding boxes are specified as (x_min, x_max, y_min, y_max, z_min, z_max) in voxels.

This dataset is from the publication https://doi.org/10.1038/s41586-022-04518-2.
Please cite it if you use this dataset in your research.

The data is publicly available at https://www.ebi.ac.uk/empiar/EMPIAR-10791/.
"""

import hashlib
import io
import os
import zipfile
from typing import Dict, List, Literal, Tuple, Union

import numpy as np
from torch.utils.data import DataLoader, Dataset

import torch_em
from .. import util


EMPIAR_BASE = "https://ftp.ebi.ac.uk/empiar/world_availability/10791/data"
PARLAKGUL_PAPER_BASE = (
    "Parlakgul%20et%20al%20-%20Regulation%20of%20liver%20subcellular%20architecture%20"
    "controls%20metabolic%20homeostasis/FIB-SEM%20Raw%20and%20Segmentation%20Data"
)

PARLAKGUL_SAMPLES: Dict[str, dict] = {
    "6461": {
        "condition": "lean",
        "raw_dir": "6461%20-%20Lean%20Liver/6461%20Lean%20Liver%20-%20Raw",
        "seg_dir": "6461%20-%20Lean%20Liver/6461%20Lean%20Liver%20-%20Segmentation",
        "raw_pattern": "Gunes_WT1_8x8x8nm_3MHz.{z:04d}.tif",
        "shape": (5638, 8000, 12000),
        "seg_zips": {
            "er": "6461%20Lean%20ER.zip",
            "er_sheets": "6461%20Lean%20ER%20Sheets.zip",
            "er_tubules": "6461%20Lean%20ER%20Tubules.zip",
            "mito": "6461%20Lean%20Mitochondria.zip",
            "lipid_droplet": "6461%20Lean%20Lipid%20Droplet.zip",
            "nuclear_membrane": "6461%20Lean%20Nuclear%20membrane.zip",
            "plasma_membrane": "6461%20Lean%20Plasma%20Membrane.zip",
        },
    },
    "6464": {
        "condition": "obese1",
        "raw_dir": "6464%20-%20Obese1%20Liver/6464%20Obese1%20Liver%20-%20Raw",
        "seg_dir": "6464%20-%20Obese1%20Liver/6464%20Obese1%20Liver%20-%20Segmentation",
        "raw_pattern": "Gunes_HFD1_8x8x8nm_3MHz.{z:04d}.tif",
        "shape": (7896, 10200, 9112),
        "seg_zips": {
            "er": "6464%20Obese1%20ER.zip",
            "er_sheets": "6464%20Obese1%20ER%20Sheets.zip",
            "er_tubules": "6464%20Obese1%20ER%20Tubules.zip",
            "mito": "6464%20Obese1%20Mitochondria.zip",
            "lipid_droplet": "6464%20Obese1%20Lipid%20Droplet.zip",
            "nuclear_membrane": "6464%20Obese1%20Nuclear%20membrane.zip",
            "plasma_membrane": "6464%20Obese1%20Plasma%20Membrane.zip",
        },
    },
    "9430": {
        "condition": "obese2",
        "raw_dir": "9430%20-%20Obese2%20Liver/9430%20Obese2%20Liver%20-%20Raw",
        "seg_dir": "9430%20-%20Obese2%20Liver/9430%20Obese2%20Liver%20-%20Segmentation",
        "raw_pattern": "Gunes_HFD2_8x8x8nm_3MHz.{z:04d}.tif",
        "shape": (8501, 8050, 8000),
        "seg_zips": {
            "er": "9430%20Obese2%20ER.zip",
            "er_sheets": "9430%20Obese2%20ER%20Sheets.zip",
            "er_tubules": "9430%20Obese2%20ER%20Tubules.zip",
            "mito": "9430%20Obese2%20Mitochondria.zip",
            "lipid_droplet": "9430%20Obese2%20Lipid%20Droplet.zip",
            "nuclear_membrane": "9430%20Obese2%20Nuclear%20membrane.zip",
            "plasma_membrane": "9430%20Obese2%20Plasma%20Membrane.zip",
        },
    },
    "1857": {
        "condition": "obese_climp63",
        "raw_dir": "1857%20-%20Obese%20Climp-63%20Liver/1857%20Obese%20Climp63%20Liver%20-%20Raw",
        "seg_dir": "1857%20-%20Obese%20Climp-63%20Liver/1857%20Obese%20Climp63%20Liver%20-%20Segmentation",
        "raw_pattern": "Gunes_CLIMP63_8x8x8nm_3MHz.{z:04d}.tif",
        "shape": (3629, 9650, 9700),
        "seg_zips": {
            "er": "1857%20Obese%20Climp63%20ER.zip",
            "er_sheets": "1857%20Obese%20Climp63%20ER%20Sheets.zip",
            "er_tubules": "1857%20Obese%20Climp63%20ER%20Tubules.zip",
            "mito": "1857%20Obese%20Climp63%20Mitochondria.zip",
            "lipid_droplet": "1857%20Obese%20Climp63%20Lipid%20Droplet.zip",
            "nuclear_membrane": "1857%20Obese%20Climp63%20Nuclear%20membrane.zip",
        },
    },
}

PARLAKGUL_CHUNK_SHAPE = (64, 256, 256)
LabelChoice = Literal[
    "er", "er_sheets", "er_tubules", "mito", "lipid_droplet", "nuclear_membrane", "plasma_membrane"
]


def _bbox_to_str(bbox):
    return hashlib.md5("_".join(str(v) for v in bbox).encode()).hexdigest()[:12]


class _HttpFile:
    """Seekable file-like object backed by HTTP range requests."""

    def __init__(self, url):
        import requests
        self.url = url
        self._pos = 0
        r = requests.head(url, timeout=30)
        r.raise_for_status()
        self._size = int(r.headers["Content-Length"])

    def read(self, n=-1):
        import time
        import requests
        end = (self._size - 1) if n == -1 else min(self._pos + n - 1, self._size - 1)
        if self._pos > end:
            return b""
        for attempt in range(5):
            try:
                r = requests.get(self.url, headers={"Range": f"bytes={self._pos}-{end}"}, timeout=120)
                data = r.content
                self._pos += len(data)
                return data
            except Exception:
                if attempt == 4:
                    raise
                time.sleep(2 ** attempt)

    def seek(self, pos, whence=0):
        if whence == 0:
            self._pos = pos
        elif whence == 1:
            self._pos += pos
        elif whence == 2:
            self._pos = self._size + pos
        self._pos = max(0, min(self._pos, self._size))
        return self._pos

    def tell(self):
        return self._pos

    def seekable(self):
        return True

    def readable(self):
        return True

    def __enter__(self):
        return self

    def __exit__(self, *args):
        pass


def _read_zip_slice(zip_url, slice_idx, x_min, x_max, y_min, y_max):
    """Extract one segmentation TIFF from a remote ZIP using HTTP range requests."""
    import tifffile

    zf = zipfile.ZipFile(_HttpFile(zip_url))
    names = sorted(n for n in zf.namelist() if n.endswith(".tiff") or n.endswith(".tif"))
    if slice_idx >= len(names):
        raise IndexError(f"Slice {slice_idx} out of range (zip has {len(names)} TIFFs)")
    data = zf.read(names[slice_idx])
    img = tifffile.imread(io.BytesIO(data))
    return img[y_min:y_max, x_min:x_max]


def _read_raw_slice(raw_url, x_min, x_max, y_min, y_max):
    """Download one raw TIFF slice and crop to the requested region."""
    import time
    import requests
    import tifffile

    for attempt in range(5):
        try:
            r = requests.get(raw_url, timeout=180)
            r.raise_for_status()
            img = tifffile.imread(io.BytesIO(r.content))
            return img[y_min:y_max, x_min:x_max]
        except Exception:
            if attempt == 4:
                raise
            time.sleep(2 ** attempt)


def get_parlakgul_liver_data(
    path: Union[os.PathLike, str],
    bounding_box: Tuple[int, int, int, int, int, int],
    sample: Literal["6461", "6464", "9430", "1857"] = "6461",
    label_choice: LabelChoice = "mito",
    download: bool = False,
) -> str:
    """Stream a subvolume from the Parlakgul liver dataset and cache it as a zarr v3 store.

    Args:
        path: Filepath to a folder where the cached zarr store will be saved.
        bounding_box: The region to fetch as (x_min, x_max, y_min, y_max, z_min, z_max)
            in voxel coordinates at 8 nm isotropic resolution.
        sample: Which liver sample to use. One of "6461" (lean), "6464" (obese 1),
            "9430" (obese 2), "1857" (obese Climp63).
        label_choice: Which organelle segmentation to use as labels.
        download: Whether to stream and cache the data if it is not present.

    Returns:
        The filepath to the cached zarr store.
    """
    import zarr
    from zarr.codecs import BloscCodec

    os.makedirs(str(path), exist_ok=True)
    zarr_path = os.path.join(str(path), f"{sample}_{label_choice}_{_bbox_to_str(bounding_box)}.zarr")

    root = zarr.open_group(zarr_path, mode="a")
    if "raw" in root and "labels" in root:
        return zarr_path

    if not download:
        raise RuntimeError(
            f"No cached data found at '{zarr_path}'. Set download=True to stream it from EMPIAR."
        )

    x_min, x_max, y_min, y_max, z_min, z_max = bounding_box
    sample_info = PARLAKGUL_SAMPLES[sample]

    if label_choice not in sample_info["seg_zips"]:
        raise ValueError(f"label_choice='{label_choice}' not available for sample='{sample}'")

    shape = (z_max - z_min, y_max - y_min, x_max - x_min)
    raw_arr = np.zeros(shape, dtype=np.uint8)
    lbl_arr = np.zeros(shape, dtype=np.uint8)

    raw_base = f"{EMPIAR_BASE}/{PARLAKGUL_PAPER_BASE}/{sample_info['raw_dir']}"
    zip_name = sample_info["seg_zips"][label_choice]
    seg_zip_url = f"{EMPIAR_BASE}/{PARLAKGUL_PAPER_BASE}/{sample_info['seg_dir']}/{zip_name}"

    print(f"Streaming Parlakgul {sample} ({sample_info['condition']}) EM + {label_choice} ...")
    for i, z in enumerate(range(z_min, z_max)):
        fname = sample_info["raw_pattern"].format(z=z)
        raw_url = f"{raw_base}/{fname}"
        raw_arr[i] = _read_raw_slice(raw_url, x_min, x_max, y_min, y_max)
        lbl_arr[i] = _read_zip_slice(seg_zip_url, z, x_min, x_max, y_min, y_max)
        if (i + 1) % 10 == 0:
            print(f"  {i + 1}/{z_max - z_min} slices done")

    def _make_array(name, data, is_label):
        shuffle = "bitshuffle" if is_label else "shuffle"
        arr = root.create_array(
            name, shape=data.shape, chunks=PARLAKGUL_CHUNK_SHAPE, dtype=data.dtype,
            compressors=BloscCodec(cname="zstd", clevel=6, shuffle=shuffle),
        )
        arr[:] = data

    root.attrs["bounding_box"] = list(bounding_box)
    root.attrs["sample"] = sample
    root.attrs["label_choice"] = label_choice
    root.attrs["resolution_nm"] = [8, 8, 8]

    if "raw" not in root:
        _make_array("raw", raw_arr, is_label=False)
    if "labels" not in root:
        _make_array("labels", lbl_arr, is_label=True)

    print(f"Cached to {zarr_path} (shape {shape})")
    return zarr_path


def get_parlakgul_liver_paths(
    path: Union[os.PathLike, str],
    bounding_boxes: List[Tuple[int, int, int, int, int, int]],
    sample: Literal["6461", "6464", "9430", "1857"] = "6461",
    label_choice: LabelChoice = "mito",
    download: bool = False,
) -> List[str]:
    """Get paths to Parlakgul liver zarr stores.

    Args:
        path: Filepath to a folder where the cached zarr stores will be saved.
        bounding_boxes: List of regions to fetch, each as
            (x_min, x_max, y_min, y_max, z_min, z_max) in voxel coordinates.
        sample: Which liver sample to use.
        label_choice: Which organelle segmentation to use as labels.
        download: Whether to stream and cache the data if it is not present.

    Returns:
        List of filepaths to the cached zarr stores.
    """
    return [get_parlakgul_liver_data(path, bbox, sample, label_choice, download) for bbox in bounding_boxes]


def get_parlakgul_liver_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int, int],
    bounding_boxes: List[Tuple[int, int, int, int, int, int]],
    sample: Literal["6461", "6464", "9430", "1857"] = "6461",
    label_choice: LabelChoice = "mito",
    download: bool = False,
    **kwargs,
) -> Dataset:
    """Get the Parlakgul liver dataset for organelle segmentation.

    Args:
        path: Filepath to a folder where the cached zarr stores will be saved.
        patch_shape: The patch shape (z, y, x) to use for training.
        bounding_boxes: List of subvolumes to use, each as
            (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
        sample: Which liver sample to use. One of "6461", "6464", "9430", "1857".
        label_choice: Which organelle to segment.
        download: Whether to stream and cache data if not already present.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

    Returns:
        The segmentation dataset.
    """
    assert len(patch_shape) == 3

    paths = get_parlakgul_liver_paths(path, bounding_boxes, sample, label_choice, download)

    kwargs = util.update_kwargs(kwargs, "is_seg_dataset", True)

    return torch_em.default_segmentation_dataset(
        raw_paths=paths,
        raw_key="raw",
        label_paths=paths,
        label_key="labels",
        patch_shape=patch_shape,
        **kwargs,
    )


def get_parlakgul_liver_loader(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int, int],
    batch_size: int,
    bounding_boxes: List[Tuple[int, int, int, int, int, int]],
    sample: Literal["6461", "6464", "9430", "1857"] = "6461",
    label_choice: LabelChoice = "mito",
    download: bool = False,
    **kwargs,
) -> DataLoader:
    """Get the DataLoader for organelle segmentation in the Parlakgul liver dataset.

    Args:
        path: Filepath to a folder where the cached zarr stores will be saved.
        patch_shape: The patch shape (z, y, x) to use for training.
        batch_size: The batch size for training.
        bounding_boxes: List of subvolumes to use, each as
            (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
        sample: Which liver sample to use. One of "6461", "6464", "9430", "1857".
        label_choice: Which organelle to segment.
        download: Whether to stream and cache data if not already present.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`
            or for the PyTorch DataLoader.

    Returns:
        The DataLoader.
    """
    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
    dataset = get_parlakgul_liver_dataset(
        path, patch_shape, bounding_boxes, sample=sample, label_choice=label_choice,
        download=download, **ds_kwargs
    )
    return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)
def get_parlakgul_liver_data(
    path: Union[os.PathLike, str],
    bounding_box: Tuple[int, int, int, int, int, int],
    sample: Literal["6461", "6464", "9430", "1857"] = "6461",
    label_choice: LabelChoice = "mito",
    download: bool = False,
) -> str:
Stream a subvolume from the Parlakgul liver dataset and cache it as a zarr v3 store.
Arguments:
- path: Filepath to a folder where the cached zarr store will be saved.
- bounding_box: The region to fetch as (x_min, x_max, y_min, y_max, z_min, z_max) in voxel coordinates at 8 nm isotropic resolution.
- sample: Which liver sample to use. One of "6461" (lean), "6464" (obese 1), "9430" (obese 2), "1857" (obese Climp63).
- label_choice: Which organelle segmentation to use as labels.
- download: Whether to stream and cache the data if it is not present.
Returns:
The filepath to the cached zarr store.
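The store name is derived from the sample, the label choice, and a short hash of the bounding box, so any change to one of these yields a separate cache entry. The sketch below mirrors that naming scheme as it appears in the module source (an MD5 digest of the underscore-joined coordinates, truncated to 12 hex characters); `bbox_hash` and `cache_path` are illustrative names, and the real helper is private to the module.

```python
import hashlib
import os


def bbox_hash(bbox):
    """Short, deterministic identifier for a bounding box (first 12 hex chars of MD5)."""
    key = "_".join(str(v) for v in bbox)
    return hashlib.md5(key.encode()).hexdigest()[:12]


def cache_path(root, sample, label_choice, bbox):
    """Predict where the zarr store for this request would be cached."""
    return os.path.join(str(root), f"{sample}_{label_choice}_{bbox_hash(bbox)}.zarr")


bbox = (0, 512, 0, 512, 0, 64)
print(cache_path("data/parlakgul", "6461", "mito", bbox))
print(cache_path("data/parlakgul", "6461", "er", bbox))
```

Since each store holds both `raw` and `labels`, requesting a second label class for the same bounding box creates a new store and fetches the raw slices again.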
def get_parlakgul_liver_paths(
    path: Union[os.PathLike, str],
    bounding_boxes: List[Tuple[int, int, int, int, int, int]],
    sample: Literal["6461", "6464", "9430", "1857"] = "6461",
    label_choice: LabelChoice = "mito",
    download: bool = False,
) -> List[str]:
Get paths to Parlakgul liver zarr stores.
Arguments:
- path: Filepath to a folder where the cached zarr stores will be saved.
- bounding_boxes: List of regions to fetch, each as (x_min, x_max, y_min, y_max, z_min, z_max) in voxel coordinates.
- sample: Which liver sample to use.
- label_choice: Which organelle segmentation to use as labels.
- download: Whether to stream and cache the data if it is not present.
Returns:
List of filepaths to the cached zarr stores.
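Because each bounding box is cached as its own store, one way to cover a larger region is to split it into a grid of boxes and pass the list as `bounding_boxes`. The following is a minimal sketch of such a split; `tile_bounding_boxes` is a hypothetical helper, not part of torch_em, and the tile size is an arbitrary choice.

```python
def tile_bounding_boxes(region, tile_shape):
    """Split a region into non-overlapping bounding boxes.

    region: (x_min, x_max, y_min, y_max, z_min, z_max), upper bounds exclusive.
    tile_shape: (tile_x, tile_y, tile_z) in voxels; edge tiles are clipped to the region.
    """
    x0, x1, y0, y1, z0, z1 = region
    tx, ty, tz = tile_shape
    boxes = []
    for z in range(z0, z1, tz):
        for y in range(y0, y1, ty):
            for x in range(x0, x1, tx):
                boxes.append((x, min(x + tx, x1), y, min(y + ty, y1), z, min(z + tz, z1)))
    return boxes


boxes = tile_bounding_boxes((0, 1024, 0, 512, 0, 128), (512, 512, 64))
for box in boxes:
    print(box)
```

Smaller tiles make an interrupted download cheaper to resume, since stores that already exist are reused rather than fetched again.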
def get_parlakgul_liver_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int, int],
    bounding_boxes: List[Tuple[int, int, int, int, int, int]],
    sample: Literal["6461", "6464", "9430", "1857"] = "6461",
    label_choice: LabelChoice = "mito",
    download: bool = False,
    **kwargs,
) -> Dataset:
Get the Parlakgul liver dataset for organelle segmentation.
Arguments:
- path: Filepath to a folder where the cached zarr stores will be saved.
- patch_shape: The patch shape (z, y, x) to use for training.
- bounding_boxes: List of subvolumes to use, each as (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
- sample: Which liver sample to use. One of "6461", "6464", "9430", "1857".
- label_choice: Which organelle to segment.
- download: Whether to stream and cache data if not already present.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:
The segmentation dataset.
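A common pitfall is that `patch_shape` is ordered (z, y, x) while bounding boxes are ordered x, y, z. The sketch below filters out boxes that are too small for a given patch before building the dataset. `patch_fits` and `build_dataset` are hypothetical helpers, the cache directory and box coordinates are placeholders, and the torch_em import is deferred so the check can be used without the library installed.

```python
def patch_fits(patch_shape, bbox):
    """Return True if a (z, y, x) patch fits inside an (x, y, z)-ordered bounding box."""
    x_min, x_max, y_min, y_max, z_min, z_max = bbox
    volume_shape = (z_max - z_min, y_max - y_min, x_max - x_min)
    return all(p <= v for p, v in zip(patch_shape, volume_shape))


def build_dataset(cache_dir, patch_shape, bounding_boxes):
    """Create the dataset from the boxes that can hold at least one full patch."""
    # Imported lazily: requires torch_em, and download=True needs network access.
    from torch_em.data.datasets.electron_microscopy.parlakgul_liver import (
        get_parlakgul_liver_dataset,
    )

    usable = [bbox for bbox in bounding_boxes if patch_fits(patch_shape, bbox)]
    return get_parlakgul_liver_dataset(
        cache_dir, patch_shape, usable, sample="6461", label_choice="mito", download=True
    )


candidate_boxes = [(0, 512, 0, 512, 0, 64), (0, 128, 0, 128, 0, 64)]
print([patch_fits((32, 256, 256), bbox) for bbox in candidate_boxes])  # [True, False]
```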
def get_parlakgul_liver_loader(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int, int],
    batch_size: int,
    bounding_boxes: List[Tuple[int, int, int, int, int, int]],
    sample: Literal["6461", "6464", "9430", "1857"] = "6461",
    label_choice: LabelChoice = "mito",
    download: bool = False,
    **kwargs,
) -> DataLoader:
Get the DataLoader for organelle segmentation in the Parlakgul liver dataset.
Arguments:
- path: Filepath to a folder where the cached zarr stores will be saved.
- patch_shape: The patch shape (z, y, x) to use for training.
- batch_size: The batch size for training.
- bounding_boxes: List of subvolumes to use, each as (x_min, x_max, y_min, y_max, z_min, z_max) in 8 nm voxel coordinates.
- sample: Which liver sample to use. One of "6461", "6464", "9430", "1857".
- label_choice: Which organelle to segment.
- download: Whether to stream and cache data if not already present.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:
The DataLoader.
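A usage sketch for the loader, under stated assumptions: the cache directory, bounding box, patch shape, and batch size in `LOADER_CONFIG` are placeholder values chosen for illustration, and `shuffle` is passed on the assumption that it is forwarded to the PyTorch DataLoader as the kwargs description indicates. `build_loader` is a hypothetical wrapper; it imports torch_em lazily and is not called here because the first call streams data from EMPIAR.

```python
# Placeholder settings for illustration; adjust to the region of interest.
LOADER_CONFIG = {
    "patch_shape": (32, 256, 256),                  # (z, y, x)
    "batch_size": 2,
    "bounding_boxes": [(0, 512, 0, 512, 0, 64)],    # (x_min, x_max, y_min, y_max, z_min, z_max)
    "sample": "6461",
    "label_choice": "mito",
    "download": True,
}


def build_loader(cache_dir, config=LOADER_CONFIG):
    """Build a DataLoader over the configured subvolumes (requires torch_em and network access)."""
    from torch_em.data.datasets.electron_microscopy.parlakgul_liver import (
        get_parlakgul_liver_loader,
    )

    return get_parlakgul_liver_loader(
        cache_dir,
        config["patch_shape"],
        config["batch_size"],
        config["bounding_boxes"],
        sample=config["sample"],
        label_choice=config["label_choice"],
        download=config["download"],
        shuffle=True,  # assumption: forwarded to the PyTorch DataLoader
    )


# Typical use (not executed here):
# loader = build_loader("./data/parlakgul_liver")
# raw, labels = next(iter(loader))
```

The first call for a given bounding box is slow because every z-slice is fetched over HTTP; later calls reuse the cached zarr store.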