torch_em.data.datasets.histopathology.phenocell
The PhenoCell dataset contains annotations for cell phenotyping in H&E stained histopathology images, with instance segmentation and 14 granular cell types derived from co-registered multiplexed (CODEX) imaging.
The dataset is part of the PhenoBench (PathoCellBench) benchmark and is hosted on HuggingFace at https://huggingface.co/datasets/Kainmueller-Lab/phenobench. This dataset is from the publication https://doi.org/10.48550/arXiv.2507.03532. Please cite it if you use this dataset in your research.
The data consists of 109 fields of view of 1440x1920 pixels. On first use, each
field of view is converted into a single chunked and compressed HDF5 file with the
following layout:
- 'raw/histopathology/h&e': the (3, H, W) H&E image.
- 'raw/codex/all': the (58, H, W) stack of co-registered CODEX channels.
- 'raw/codex/<marker>_<target>': each individual CODEX channel (H, W), e.g. 'raw/codex/CD20_B_cells' (see CODEX_CHANNELS for the full list of 58 channels).
- 'labels/instances': the instance segmentation.
- 'labels/semantic_coarse': the coarse 15-class cell type map (the benchmark labels).
- 'labels/semantic_fine': the fine-grained 30-class cell type map.
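The layout above can be explored with a short sketch like the following. It assumes that `h5py` is installed and that a converted file already exists on disk. The helper names `codex_channel_key` and `read_field_of_view` are illustrative and not part of torch_em.

```python
from typing import Dict

# Keys taken from the layout described above.
HE_KEY = "raw/histopathology/h&e"
LABEL_KEYS = {
    "instances": "labels/instances",
    "semantic_coarse": "labels/semantic_coarse",
    "semantic_fine": "labels/semantic_fine",
}


def codex_channel_key(channel: str) -> str:
    """Build the HDF5 key of a single CODEX channel, e.g. 'CD20_B_cells'."""
    return f"raw/codex/{channel}"


def read_field_of_view(path: str, channel: str = "CD20_B_cells") -> Dict[str, object]:
    """Read the H&E image, one CODEX channel and all label maps from a converted file."""
    import h5py  # imported lazily so that the key helper works without h5py

    with h5py.File(path, "r") as f:
        data = {
            "h&e": f[HE_KEY][:],                         # (3, H, W)
            "codex": f[codex_channel_key(channel)][:],   # (H, W)
        }
        for name, key in LABEL_KEYS.items():
            data[name] = f[key][:]                       # (H, W)
    return data
```

Reading a single channel by its key avoids loading the full 58-channel stack stored under 'raw/codex/all'.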
The coarse semantic classes ('semantic_coarse' label choice) are:
- 0: Background
- 1: B cells
- 2: Macrophages/Monocytes
- 3: Adipocytes
- 4: Dendritic cells
- 5: T cells
- 6: Granulocytes
- 7: NK cells
- 8: Nerves
- 9: Plasma cells
- 10: Smooth muscle
- 11: Stroma
- 12: Tumor cells
- 13: Vasculature/Lymphatics
- 14: Other cells
The 'semantic_fine' label choice has 30 granular classes that the coarse ones are collapsed from:
- 0: background
- 1: B cells
- 2: CD11b+ monocytes
- 3: CD11b+CD68+ macrophages
- 4: CD11c+ DCs
- 5: CD163+ macrophages
- 6: CD3+ T cells
- 7: CD4+ T cells
- 8: CD4+ T cells CD45RO+
- 9: CD4+ T cells GATA3+
- 10: CD68+ macrophages
- 11: CD68+ macrophages GzmB+
- 12: CD68+CD163+ macrophages
- 13: CD8+ T cells
- 14: NK cells
- 15: Tregs
- 16: adipocytes
- 17: dirt
- 18: granulocytes
- 19: immune cells
- 20: immune cells / vasculature
- 21: lymphatics
- 22: nerves
- 23: plasma cells
- 24: smooth muscle
- 25: stroma
- 26: tumor cells
- 27: tumor cells / immune cells
- 28: undefined
- 29: vasculature
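As a small illustration of how the coarse class ids can be turned back into names, the following sketch counts labeled pixels per class. The dictionary simply restates the coarse class list above; `count_classes` is a hypothetical helper and not part of torch_em.

```python
from collections import Counter
from typing import Dict, Iterable

# Coarse class ids and names, as listed above.
COARSE_CLASSES = {
    0: "Background", 1: "B cells", 2: "Macrophages/Monocytes", 3: "Adipocytes",
    4: "Dendritic cells", 5: "T cells", 6: "Granulocytes", 7: "NK cells",
    8: "Nerves", 9: "Plasma cells", 10: "Smooth muscle", 11: "Stroma",
    12: "Tumor cells", 13: "Vasculature/Lymphatics", 14: "Other cells",
}


def count_classes(label_ids: Iterable[int]) -> Dict[str, int]:
    """Count how often each coarse class id occurs and report the counts by name."""
    counts = Counter(int(i) for i in label_ids)
    return {COARSE_CLASSES[i]: n for i, n in sorted(counts.items())}
```

For a label map loaded as a NumPy array, the flattened array can be passed, e.g. `count_classes(labels.ravel())`.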
NOTE: Downloading requires 'huggingface_hub'. The dataset is large (each field of view is around 350 MB), so by default only the requested split is downloaded.
"""The PhenoCell dataset contains annotations for cell phenotyping in
H&E stained histopathology images, with instance segmentation and 14 granular
cell types derived from co-registered multiplexed (CODEX) imaging.

The dataset is part of the PhenoBench (PathoCellBench) benchmark and is hosted on
HuggingFace at https://huggingface.co/datasets/Kainmueller-Lab/phenobench.
This dataset is from the publication https://doi.org/10.48550/arXiv.2507.03532.
Please cite it if you use this dataset in your research.

The data consists of 109 fields of view of 1440x1920 pixels. On the first use each
field of view is converted into a single chunked and compressed HDF5 file with the
following layout:
    - 'raw/histopathology/h&e': the (3, H, W) H&E image.
    - 'raw/codex/all': the (58, H, W) stack of co-registered CODEX channels.
    - 'raw/codex/<marker>_<target>': each individual CODEX channel (H, W), e.g.
      'raw/codex/CD20_B_cells' (see `CODEX_CHANNELS` for the full list of 58 channels).
    - 'labels/instances': the instance segmentation.
    - 'labels/semantic_coarse': the coarse 15-class cell type map (the benchmark labels).
    - 'labels/semantic_fine': the fine-grained 30-class cell type map.

The coarse semantic classes ('semantic_coarse' label choice) are:
    0: Background
    1: B cells
    2: Macrophages/Monocytes
    3: Adipocytes
    4: Dendritic cells
    5: T cells
    6: Granulocytes
    7: NK cells
    8: Nerves
    9: Plasma cells
    10: Smooth muscle
    11: Stroma
    12: Tumor cells
    13: Vasculature/Lymphatics
    14: Other cells

The 'semantic_fine' label choice has 30 granular classes that the coarse ones are
collapsed from:
    0: background
    1: B cells
    2: CD11b+ monocytes
    3: CD11b+CD68+ macrophages
    4: CD11c+ DCs
    5: CD163+ macrophages
    6: CD3+ T cells
    7: CD4+ T cells
    8: CD4+ T cells CD45RO+
    9: CD4+ T cells GATA3+
    10: CD68+ macrophages
    11: CD68+ macrophages GzmB+
    12: CD68+CD163+ macrophages
    13: CD8+ T cells
    14: NK cells
    15: Tregs
    16: adipocytes
    17: dirt
    18: granulocytes
    19: immune cells
    20: immune cells / vasculature
    21: lymphatics
    22: nerves
    23: plasma cells
    24: smooth muscle
    25: stroma
    26: tumor cells
    27: tumor cells / immune cells
    28: undefined
    29: vasculature

NOTE: Downloading requires 'huggingface_hub'. The dataset is large (each field of
view is around 350 MB), so by default only the requested split is downloaded.
"""

import os
from pathlib import Path
from typing import List, Literal, Optional, Tuple, Union

from tqdm import tqdm

import torch

from torch.utils.data import Dataset, DataLoader

import torch_em

from .. import util


HF_REPO = "Kainmueller-Lab/phenobench"
SRC_HDF_DIR = "pathocell_hdf"
SPLIT_FILE = "data/phenocell/splits/phenocell_dataset_split.csv"

# Source label key in the downloaded HDF5 -> destination key in the converted HDF5.
SOURCE_LABELS = {
    "gt_inst": "labels/instances",
    "gt_ct_coarse": "labels/semantic_coarse",
    "gt_ct": "labels/semantic_fine",
}

LABEL_KEYS = {
    "instances": "labels/instances",
    "semantic_coarse": "labels/semantic_coarse",
    "semantic_fine": "labels/semantic_fine",
}

# The multi-channel raw inputs. Individual CODEX channels (see CODEX_CHANNELS) can also be chosen.
MODALITY_KEYS = {
    "histopathology": "raw/histopathology/h&e",
    "codex": "raw/codex/all",
}

# The 58 CODEX channels in their stored order, named '<marker>_<target>' (the keys under 'raw/codex/').
CODEX_CHANNELS = [
    "CD44_stroma", "FOXP3_regulatory_T_cells", "CDX2_intestinal_epithelia", "CD8_cytotoxic_T_cells",
    "p53_tumor_suppressor", "GATA3_Th2_helper_T_cells", "CD45_hematopoietic_cells", "T-bet_Th1_cells",
    "beta-catenin_Wnt_signaling", "HLA-DR_MHC-II", "PD-L1_checkpoint", "Ki67_proliferation",
    "CD45RA_naive_T_cells", "CD4_T_helper_cells", "CD21_DCs", "MUC-1_epithelia", "CD30_costimulator",
    "CD2_T_cells", "Vimentin_cytoplasm", "CD20_B_cells", "LAG-3_checkpoint", "Na-K-ATPase_membranes",
    "CD5_T_cells", "IDO-1_metabolism", "Cytokeratin_epithelia", "CD11b_macrophages", "CD56_NK_cells",
    "aSMA_smooth_muscle", "BCL-2_apoptosis", "CD25_IL-2_Ra", "Collagen_IV_bas._memb.", "CD11c_DCs",
    "PD-1_checkpoint", "HOCHST13", "Granzyme_B_cytotoxicity", "EGFR_signaling", "VISTA_costimulator",
    "CD15_granulocytes", "CD194_CCR4_chemokine_R", "ICOS_costimulator", "MMP9_matrix_metalloproteinase",
    "Synaptophysin_neuroendocrine", "CD71_transferrin_R", "GFAP_nerves", "CD7_T_cells", "CD3_T_cells",
    "Chromogranin_A_neuroendocrine", "CD163_macrophages", "CD57_NK_cells", "CD45RO_memory_cells",
    "CD68_macrophages", "CD31_vasculature", "Podoplanin_lymphatics", "CD34_vasculature", "CD38_multifunctional",
    "CD138_plasma_cells", "MMP12_matrix_metalloproteinase", "DRAQ5",
]


def _samples_for_split(split_csv, split):
    import pandas as pd

    df = pd.read_csv(split_csv)
    if split is not None:
        if split not in ("train", "valid", "test"):
            raise ValueError(f"'{split}' is not a valid split choice. Use 'train', 'valid' or 'test'.")
        df = df[df["train_test_val_split"] == split]

    return sorted(df["sample_name"].tolist())


def _convert_sample(src_path, output_path):
    import h5py

    with h5py.File(src_path, "r") as f:
        image = f["img"][:]
        codex = f["ifl"][:]
        labels = {dst: f[src][0] for src, dst in SOURCE_LABELS.items()}

    if codex.shape[0] != len(CODEX_CHANNELS):
        raise RuntimeError(f"Expected {len(CODEX_CHANNELS)} CODEX channels, but found {codex.shape[0]}.")

    tmp_path = output_path + ".tmp"
    with h5py.File(tmp_path, "w") as f:
        f.create_dataset("raw/histopathology/h&e", data=image, compression="gzip", chunks=(1, 512, 512))
        f.create_dataset("raw/codex/all", data=codex, compression="gzip", chunks=(1, 512, 512))
        for i, name in enumerate(CODEX_CHANNELS):
            f.create_dataset(f"raw/codex/{name}", data=codex[i], compression="gzip", chunks=(512, 512))
        for dst, label in labels.items():
            f.create_dataset(dst, data=label, compression="gzip", chunks=(512, 512))

    os.replace(tmp_path, output_path)


def get_phenocell_data(
    path: Union[os.PathLike, str],
    split: Optional[Literal["train", "valid", "test"]] = None,
    download: bool = False,
) -> str:
    """Download and preprocess the PhenoCell data.

    Args:
        path: Filepath to a folder where the downloaded data will be saved.
        split: The split to use. Either 'train', 'valid', 'test' or None for all fields of view.
        download: Whether to download the data if it is not present.

    Returns:
        Filepath to the folder where the preprocessed data is stored.
    """
    try:
        from huggingface_hub import hf_hub_download, snapshot_download
    except ImportError:
        raise ImportError("'huggingface_hub' is required to download PhenoCell. Install it via conda/pip.")

    preprocessed_dir = os.path.join(path, "preprocessed")
    os.makedirs(preprocessed_dir, exist_ok=True)

    if not os.path.exists(os.path.join(path, SPLIT_FILE)):
        if not download:
            raise RuntimeError(f"Cannot find the data at {path}, but download was set to False.")
        hf_hub_download(repo_id=HF_REPO, repo_type="dataset", filename=SPLIT_FILE, local_dir=path)

    samples = _samples_for_split(os.path.join(path, SPLIT_FILE), split)
    to_convert = [s for s in samples if not os.path.exists(os.path.join(preprocessed_dir, f"{Path(s).stem}.h5"))]

    if to_convert:
        if not download:
            raise RuntimeError(f"Cannot find the data at {path}, but download was set to False.")
        patterns = [f"{SRC_HDF_DIR}/{s}" for s in to_convert]
        snapshot_download(repo_id=HF_REPO, repo_type="dataset", local_dir=path, allow_patterns=patterns)

        for sample in tqdm(to_convert, desc="Converting PhenoCell fields of view"):
            _convert_sample(
                os.path.join(path, SRC_HDF_DIR, sample),
                os.path.join(preprocessed_dir, f"{Path(sample).stem}.h5"),
            )

    return preprocessed_dir


def get_phenocell_paths(
    path: Union[os.PathLike, str],
    split: Optional[Literal["train", "valid", "test"]] = None,
    download: bool = False,
) -> List[str]:
    """Get paths to the PhenoCell data.

    Args:
        path: Filepath to a folder where the downloaded data will be saved.
        split: The split to use. Either 'train', 'valid', 'test' or None for all fields of view.
        download: Whether to download the data if it is not present.

    Returns:
        List of filepaths to the preprocessed HDF5 files.
    """
    preprocessed_dir = get_phenocell_data(path, split, download)
    samples = _samples_for_split(os.path.join(path, SPLIT_FILE), split)
    volume_paths = [os.path.join(preprocessed_dir, f"{Path(s).stem}.h5") for s in samples]

    missing = [p for p in volume_paths if not os.path.exists(p)]
    if missing:
        raise RuntimeError(f"Could not find the data at {missing}.")

    return volume_paths


def get_phenocell_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int],
    split: Optional[Literal["train", "valid", "test"]] = None,
    label_choice: Literal["instances", "semantic_coarse", "semantic_fine"] = "instances",
    modality: str = "histopathology",
    download: bool = False,
    label_dtype: torch.dtype = torch.int64,
    resize_inputs: bool = False,
    **kwargs
) -> Dataset:
    """Get the PhenoCell dataset for cell phenotyping in H&E stained histopathology images.

    Args:
        path: Filepath to a folder where the downloaded data will be saved.
        patch_shape: The patch shape to use for training.
        split: The split to use. Either 'train', 'valid', 'test' or None for all fields of view.
        label_choice: The label type. Either 'instances', 'semantic_coarse' (15-class) or 'semantic_fine' (30-class).
        modality: The raw input. Either 'histopathology' (3-channel H&E), 'codex' (58-channel multiplexed stack)
            or the name of a single CODEX channel (see `CODEX_CHANNELS`), e.g. 'CD20_B_cells'.
        download: Whether to download the data if it is not present.
        label_dtype: The datatype of the labels.
        resize_inputs: Whether to resize the input images.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

    Returns:
        The segmentation dataset.
    """
    if label_choice not in LABEL_KEYS:
        raise ValueError(f"'{label_choice}' is not a valid label choice. Choose from {list(LABEL_KEYS.keys())}.")

    if modality in MODALITY_KEYS:
        raw_key, with_channels = MODALITY_KEYS[modality], True
    elif modality in CODEX_CHANNELS:
        raw_key, with_channels = f"raw/codex/{modality}", False
    else:
        raise ValueError(f"'{modality}' is not a valid modality. Use 'histopathology', 'codex' or a CODEX channel.")

    volume_paths = get_phenocell_paths(path, split, download)

    if resize_inputs:
        resize_kwargs = {"patch_shape": patch_shape, "is_rgb": modality == "histopathology"}
        kwargs, patch_shape = util.update_kwargs_for_resize_trafo(
            kwargs=kwargs, patch_shape=patch_shape, resize_inputs=resize_inputs, resize_kwargs=resize_kwargs
        )

    return torch_em.default_segmentation_dataset(
        raw_paths=volume_paths,
        raw_key=raw_key,
        label_paths=volume_paths,
        label_key=LABEL_KEYS[label_choice],
        patch_shape=patch_shape,
        label_dtype=label_dtype,
        is_seg_dataset=True,
        with_channels=with_channels,
        ndim=2,
        **kwargs
    )


def get_phenocell_loader(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int],
    batch_size: int,
    split: Optional[Literal["train", "valid", "test"]] = None,
    label_choice: Literal["instances", "semantic_coarse", "semantic_fine"] = "instances",
    modality: str = "histopathology",
    download: bool = False,
    label_dtype: torch.dtype = torch.int64,
    resize_inputs: bool = False,
    **kwargs
) -> DataLoader:
    """Get the PhenoCell dataloader for cell phenotyping in H&E stained histopathology images.

    Args:
        path: Filepath to a folder where the downloaded data will be saved.
        patch_shape: The patch shape to use for training.
        batch_size: The batch size for training.
        split: The split to use. Either 'train', 'valid', 'test' or None for all fields of view.
        label_choice: The label type. Either 'instances', 'semantic_coarse' (15-class) or 'semantic_fine' (30-class).
        modality: The raw input. Either 'histopathology' (3-channel H&E), 'codex' (58-channel multiplexed stack)
            or the name of a single CODEX channel (see `CODEX_CHANNELS`), e.g. 'CD20_B_cells'.
        download: Whether to download the data if it is not present.
        label_dtype: The datatype of the labels.
        resize_inputs: Whether to resize the input images.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset` or for the PyTorch DataLoader.

    Returns:
        The DataLoader.
    """
    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
    dataset = get_phenocell_dataset(
        path=path, patch_shape=patch_shape, split=split, label_choice=label_choice, modality=modality,
        download=download, label_dtype=label_dtype, resize_inputs=resize_inputs, **ds_kwargs
    )
    return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)
def get_phenocell_data(
    path: Union[os.PathLike, str],
    split: Optional[Literal["train", "valid", "test"]] = None,
    download: bool = False,
) -> str:
Download and preprocess the PhenoCell data.
Arguments:
- path: Filepath to a folder where the downloaded data will be saved.
- split: The split to use. Either 'train', 'valid', 'test' or None for all fields of view.
- download: Whether to download the data if it is not present.
Returns:
Filepath to the folder where the preprocessed data is stored.
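A minimal sketch of how this function might be called, and of where the converted files end up, is given below. The output path rule mirrors the source code (the sample's file stem plus '.h5' inside the 'preprocessed' folder). The helper names and the sample name 'fov_01.hdf' used in the usage note are hypothetical.

```python
import os
from pathlib import Path


def expected_output_path(root: str, sample_name: str) -> str:
    """Path of the converted HDF5 file for one sample, following the module's naming rule."""
    return os.path.join(root, "preprocessed", f"{Path(sample_name).stem}.h5")


def fetch_split(root: str, split: str = "valid") -> str:
    """Download and convert one split; returns the folder with the converted files."""
    # Imported lazily: requires torch_em and huggingface_hub to be installed.
    from torch_em.data.datasets.histopathology.phenocell import get_phenocell_data

    return get_phenocell_data(root, split=split, download=True)
```

For example, `expected_output_path("data", "fov_01.hdf")` points to 'fov_01.h5' inside 'data/preprocessed'. Because existing converted files are skipped, calling `fetch_split` again for the same split should not trigger a new download of those fields of view.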
def get_phenocell_paths(
    path: Union[os.PathLike, str],
    split: Optional[Literal["train", "valid", "test"]] = None,
    download: bool = False,
) -> List[str]:
Get paths to the PhenoCell data.
Arguments:
- path: Filepath to a folder where the downloaded data will be saved.
- split: The split to use. Either 'train', 'valid', 'test' or None for all fields of view.
- download: Whether to download the data if it is not present.
Returns:
List of filepaths to the preprocessed HDF5 files.
def get_phenocell_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int],
    split: Optional[Literal["train", "valid", "test"]] = None,
    label_choice: Literal["instances", "semantic_coarse", "semantic_fine"] = "instances",
    modality: str = "histopathology",
    download: bool = False,
    label_dtype: torch.dtype = torch.int64,
    resize_inputs: bool = False,
    **kwargs
) -> Dataset:
Get the PhenoCell dataset for cell phenotyping in H&E stained histopathology images.
Arguments:
- path: Filepath to a folder where the downloaded data will be saved.
- patch_shape: The patch shape to use for training.
- split: The split to use. Either 'train', 'valid', 'test' or None for all fields of view.
- label_choice: The label type. Either 'instances', 'semantic_coarse' (15-class) or 'semantic_fine' (30-class).
- modality: The raw input. Either 'histopathology' (3-channel H&E), 'codex' (58-channel multiplexed stack) or the name of a single CODEX channel (see CODEX_CHANNELS), e.g. 'CD20_B_cells'.
- download: Whether to download the data if it is not present.
- label_dtype: The datatype of the labels.
- resize_inputs: Whether to resize the input images.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:
The segmentation dataset.
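The way the modality argument selects the raw input can be sketched as follows. `resolve_raw_key` restates the selection logic from the source code; it uses only a three-channel subset of CODEX_CHANNELS for brevity, and both helper names are illustrative rather than part of torch_em. The patch shape (512, 512) is an arbitrary example value.

```python
from typing import Tuple

# Illustrative subset; the module's CODEX_CHANNELS list has 58 entries.
EXAMPLE_CHANNELS = ["CD20_B_cells", "CD3_T_cells", "CD68_macrophages"]


def resolve_raw_key(modality: str) -> Tuple[str, bool]:
    """Return the HDF5 raw key and whether the input has a channel axis."""
    multi_channel = {"histopathology": "raw/histopathology/h&e", "codex": "raw/codex/all"}
    if modality in multi_channel:
        return multi_channel[modality], True
    if modality in EXAMPLE_CHANNELS:
        return f"raw/codex/{modality}", False
    raise ValueError(f"'{modality}' is not a valid modality.")


def build_single_channel_dataset(root: str):
    """Create a dataset of B cell marker patches with coarse cell type labels."""
    # Imported lazily: requires torch_em and, for downloading, huggingface_hub.
    from torch_em.data.datasets.histopathology.phenocell import get_phenocell_dataset

    return get_phenocell_dataset(
        path=root, patch_shape=(512, 512), split="train",
        label_choice="semantic_coarse", modality="CD20_B_cells", download=True,
    )
```

Single CODEX channels are stored as (H, W) arrays, which is why they are loaded without a channel axis, whereas the H&E image and the full CODEX stack are loaded with one.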
def get_phenocell_loader(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int],
    batch_size: int,
    split: Optional[Literal["train", "valid", "test"]] = None,
    label_choice: Literal["instances", "semantic_coarse", "semantic_fine"] = "instances",
    modality: str = "histopathology",
    download: bool = False,
    label_dtype: torch.dtype = torch.int64,
    resize_inputs: bool = False,
    **kwargs
) -> DataLoader:
Get the PhenoCell dataloader for cell phenotyping in H&E stained histopathology images.
Arguments:
- path: Filepath to a folder where the downloaded data will be saved.
- patch_shape: The patch shape to use for training.
- batch_size: The batch size for training.
- split: The split to use. Either 'train', 'valid', 'test' or None for all fields of view.
- label_choice: The label type. Either 'instances', 'semantic_coarse' (15-class) or 'semantic_fine' (30-class).
- modality: The raw input. Either 'histopathology' (3-channel H&E), 'codex' (58-channel multiplexed stack) or the name of a single CODEX channel (see CODEX_CHANNELS), e.g. 'CD20_B_cells'.
- download: Whether to download the data if it is not present.
- label_dtype: The datatype of the labels.
- resize_inputs: Whether to resize the input images.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:
The DataLoader.
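A possible way to create training and validation loaders is sketched below. The patch shape, batch size and shuffle setting are example values, and `loader_settings` and `make_loader` are hypothetical helpers. The sketch relies on the behavior described above, namely that keyword arguments not consumed by the dataset (here `shuffle`) are forwarded to the PyTorch DataLoader.

```python
from typing import Any, Dict


def loader_settings(split: str) -> Dict[str, Any]:
    """Collect example arguments for one split; only the training split is shuffled."""
    return dict(
        patch_shape=(512, 512),
        batch_size=2,
        split=split,
        label_choice="semantic_coarse",
        modality="histopathology",
        download=True,
        shuffle=(split == "train"),  # forwarded to the PyTorch DataLoader
    )


def make_loader(root: str, split: str):
    """Create the PhenoCell loader for one split under the example settings."""
    # Imported lazily: requires torch_em and, for downloading, huggingface_hub.
    from torch_em.data.datasets.histopathology.phenocell import get_phenocell_loader

    return get_phenocell_loader(path=root, **loader_settings(split))
```

Calling `make_loader(root, "train")` and `make_loader(root, "valid")` with the same root folder downloads and converts only the fields of view of the respective split.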