torch_em.data.datasets.histopathology.catch

The CATCH dataset contains annotations for tissue segmentation in H&E stained histopathology images of seven canine cutaneous tumor types.

The dataset consists of 350 whole-slide images (50 per tumor type) with 12,424 polygon annotations across 13 tissue classes. The original Aperio SVS images are distributed via IBM Aspera (often firewalled), so this loader instead obtains the images from the Imaging Data Commons (IDC) over HTTPS as DICOM whole-slide images.

This dataset is from the publication https://doi.org/10.1038/s41597-022-01692-w. Please cite it if you use this dataset in your research. It is hosted on TCIA at https://doi.org/10.7937/TCIA.2M93-FX66 (CC BY 4.0) and mirrored on IDC.

NOTE: Downloading requires 'idc-index'. Reading the DICOM images requires 'wsidicom', and rasterizing the polygons requires 'scikit-image'.

The data is large (each slide is around 0.2-2 GB as DICOM), so the slides are downloaded and converted one at a time, and the DICOM source of each slide is removed after conversion. By default the full-resolution (base) level is used. This level can be several gigapixels per slide, so it is read and written to the HDF5 file in tiles. Pass a higher `level` to use a downsampled level instead.
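The tiled read and write described in the note can be pictured with a small, dependency-free sketch. `iter_tiles` is a hypothetical helper (it is not part of torch_em) and the slide size is made up for illustration. It only enumerates the windows that a tile loop with a 4096-pixel tile would visit. In the actual loader each window is read from the DICOM level and written into the HDF5 dataset.

```python
from typing import Iterator, Tuple


def iter_tiles(height: int, width: int, tile: int = 4096) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (y, x, tile_height, tile_width) windows that cover a height x width image."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            # Tiles at the bottom and right borders are clipped to the image extent.
            yield y, x, min(tile, height - y), min(tile, width - x)


# Hypothetical slide size, chosen only for illustration.
tiles = list(iter_tiles(height=10_000, width=9_000))
covered = sum(th * tw for _, _, th, tw in tiles)
print(len(tiles), covered == 10_000 * 9_000)
```

Because every window is clipped to the image border, the windows cover each pixel exactly once, so memory use is bounded by one tile rather than by the full slide.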

The annotations are coarse region-level polygons (around 35 per slide), not cell or nucleus annotations. They are sparse: regions outside any polygon are left as 0, so 'labels/semantic' is a sparse region-level map and class 0 should typically be treated as background / ignored during training. Each whole-slide image shows a single tumor type, so within one slide you see that tumor class plus the surrounding normal tissue classes.
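Since unannotated pixels carry the value 0, metrics and losses are usually restricted to annotated pixels. The following is a minimal sketch of that idea with NumPy; `annotated_pixel_accuracy` is a hypothetical helper and the tiny arrays are invented for illustration.

```python
import numpy as np


def annotated_pixel_accuracy(prediction: np.ndarray, labels: np.ndarray) -> float:
    """Accuracy computed only over annotated pixels (label > 0)."""
    annotated = labels > 0
    if not annotated.any():
        # A patch without any polygon carries no supervision.
        return float("nan")
    return float((prediction[annotated] == labels[annotated]).mean())


# Toy example: 0 marks unannotated pixels, 3 is Dermis, 9 is Mast Cell Tumor.
labels = np.array([[0, 0, 3], [9, 9, 3]])
prediction = np.array([[5, 3, 3], [9, 3, 3]])
accuracy = annotated_pixel_accuracy(prediction, labels)
print(accuracy)
```

Predictions on the two unannotated pixels do not influence the result. With PyTorch, a comparable effect for training is typically obtained by passing `ignore_index=0` to `torch.nn.CrossEntropyLoss`.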

The 13 classes ('labels/semantic') are grouped into a 'Tissue' supercategory (1-6) and a 'Tumor' supercategory (7-13):
    0: background (unannotated)
    1: Bone
    2: Cartilage
    3: Dermis
    4: Epidermis
    5: Subcutis
    6: Inflamm/Necrosis
    7: Melanoma
    8: Plasmacytoma
    9: Mast Cell Tumor
    10: PNST
    11: SCC
    12: Trichoblastoma
    13: Histiocytoma
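For downstream code it can be convenient to have the class table as a lookup. The sketch below copies the ids and names listed above; the names `CATCH_CLASSES` and `supercategory` are hypothetical and are not exported by the module.

```python
# Class ids as stored in 'labels/semantic', copied from the class list in this documentation.
CATCH_CLASSES = {
    0: "background (unannotated)",
    1: "Bone",
    2: "Cartilage",
    3: "Dermis",
    4: "Epidermis",
    5: "Subcutis",
    6: "Inflamm/Necrosis",
    7: "Melanoma",
    8: "Plasmacytoma",
    9: "Mast Cell Tumor",
    10: "PNST",
    11: "SCC",
    12: "Trichoblastoma",
    13: "Histiocytoma",
}


def supercategory(class_id: int) -> str:
    """Map a class id to its supercategory: 'Tissue' (1-6) or 'Tumor' (7-13)."""
    if class_id == 0:
        return "background"
    if 1 <= class_id <= 6:
        return "Tissue"
    if 7 <= class_id <= 13:
        return "Tumor"
    raise ValueError(f"Unknown CATCH class id: {class_id}")


print(CATCH_CLASSES[9], "->", supercategory(9))
```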

import os
import shutil
import zipfile
from glob import glob
from typing import List, Optional, Tuple, Union

import numpy as np
from tqdm import tqdm

import torch

from torch.utils.data import Dataset, DataLoader

import torch_em

from .. import util


COCO_URL = "https://www.cancerimagingarchive.net/wp-content/uploads/CATCH-json.zip"
IDC_COLLECTION = "catch"
TUMOR_TYPES = ["Histiocytoma", "MCT", "Melanoma", "PNST", "Plasmacytoma", "SCC", "Trichoblastoma"]


def _load_coco(path):
    import json

    coco_path = os.path.join(path, "CATCH.json")
    # Use a context manager so that the file handle is closed after parsing.
    with open(coco_path) as f:
        coco = json.load(f)
    # Group the annotations by the id of the image they belong to.
    annotations = {}
    for ann in coco["annotations"]:
        annotations.setdefault(ann["image_id"], []).append(ann)
    images = {im["file_name"]: (im["id"], im["width"], im["height"]) for im in coco["images"]}
    return images, annotations


def _rasterize_into(label_dataset, annotations, downsample):
    from skimage.draw import polygon as draw_polygon

    height, width = label_dataset.shape
    # Larger regions are drawn first so that smaller annotations stay on top. Each polygon is
    # rasterized within its own bounding box to avoid allocating a full-resolution label in memory.
    for ann in sorted(annotations, key=lambda a: a.get("area", 0), reverse=True):
        segments = ann["segmentation"]
        if segments and isinstance(segments[0], (int, float)):
            segments = [segments]
        for segment in segments:
            xs = np.asarray(segment[0::2], dtype="float64") / downsample
            ys = np.asarray(segment[1::2], dtype="float64") / downsample
            x0, x1 = max(int(np.floor(xs.min())), 0), min(int(np.ceil(xs.max())) + 1, width)
            y0, y1 = max(int(np.floor(ys.min())), 0), min(int(np.ceil(ys.max())) + 1, height)
            if x1 <= x0 or y1 <= y0:
                continue
            rr, cc = draw_polygon(ys - y0, xs - x0, shape=(y1 - y0, x1 - x0))
            block = label_dataset[y0:y1, x0:x1]
            block[rr, cc] = ann["category_id"]
            label_dataset[y0:y1, x0:x1] = block


def _convert_slide(series_uid, file_name, images, annotations, level, output_path, tmp_dir, tile=4096):
    import h5py
    from idc_index import IDCClient
    from wsidicom import WsiDicom

    # Download into a per-series folder, since several slides of the same patient share a PatientID.
    slide_dir = os.path.join(tmp_dir, series_uid)
    if not os.path.exists(slide_dir):
        IDCClient().download_dicom_series(
            seriesInstanceUID=series_uid, downloadDir=tmp_dir, dirTemplate="%SeriesInstanceUID"
        )

    slide = WsiDicom.open(slide_dir)
    try:
        base_width = slide.size.width
        # By default the highest resolution (base) level is used.
        wsi_level = max(slide.levels, key=lambda lv: lv.size.width) if level is None \
            else next(lv for lv in slide.levels if lv.level == level)
        width, height = wsi_level.size.width, wsi_level.size.height
        downsample = base_width / width

        image_id = images[file_name][0]
        tmp_path = output_path + ".tmp"
        with h5py.File(tmp_path, "w") as f:
            raw = f.create_dataset(
                "raw", shape=(3, height, width), dtype="uint8", compression="gzip", chunks=(1, 512, 512)
            )
            label = f.create_dataset(
                "labels/semantic", shape=(height, width), dtype="uint8", compression="gzip", chunks=(512, 512)
            )
            # The base level can be several gigapixels, so the image is read and written in tiles.
            for y in range(0, height, tile):
                for x in range(0, width, tile):
                    th, tw = min(tile, height - y), min(tile, width - x)
                    region = np.array(slide.read_region((x, y), wsi_level.level, (tw, th)))[..., :3]
                    raw[:, y:y + th, x:x + tw] = region.transpose(2, 0, 1)
            _rasterize_into(label, annotations.get(image_id, []), downsample)
    finally:
        slide.close()

    os.replace(tmp_path, output_path)
    shutil.rmtree(slide_dir, ignore_errors=True)


def get_catch_data(
    path: Union[os.PathLike, str],
    tumor_types: Optional[Union[str, List[str]]] = None,
    level: Optional[int] = None,
    download: bool = False,
) -> str:
    """Download and preprocess the CATCH data.

    Args:
        path: Filepath to a folder where the data will be saved.
        tumor_types: The tumor types to use. By default all seven tumor types are used.
        level: The DICOM pyramid level to read. By default the highest resolution (base) level is used.
        download: Whether to download the data if it is not present.

    Returns:
        Filepath to the folder where the preprocessed data is stored.
    """
    if tumor_types is None:
        tumor_types = TUMOR_TYPES
    if isinstance(tumor_types, str):
        tumor_types = [tumor_types]
    for tumor_type in tumor_types:
        if tumor_type not in TUMOR_TYPES:
            raise ValueError(f"'{tumor_type}' is not a valid tumor type. Choose from {TUMOR_TYPES}.")

    preprocessed_dir = os.path.join(path, "preprocessed")
    tmp_dir = os.path.join(path, "dicom")
    os.makedirs(preprocessed_dir, exist_ok=True)

    coco_path = os.path.join(path, "CATCH.json")
    if not os.path.exists(coco_path):
        zip_path = os.path.join(path, "CATCH-json.zip")
        util.download_source(path=zip_path, url=COCO_URL, download=download, checksum=None)
        with zipfile.ZipFile(zip_path, "r") as f:
            f.extractall(path)

    images, annotations = _load_coco(path)

    try:
        from idc_index import IDCClient
    except ImportError:
        raise ImportError("'idc-index' is required to download CATCH. Install it via conda/pip.")

    # The slide microscopy index provides the 'ContainerIdentifier', which matches the COCO file name
    # ('<ContainerIdentifier>.svs') and is unique per slide (unlike PatientID, where one patient may have
    # several slides).
    client = IDCClient()
    client.fetch_index("sm_index")
    catch = client.index[client.index["collection_id"] == IDC_COLLECTION]
    catch = catch.merge(client.sm_index[["SeriesInstanceUID", "ContainerIdentifier"]], on="SeriesInstanceUID")

    to_convert = catch[catch["ContainerIdentifier"].str.startswith(tuple(tumor_types))]
    for _, row in tqdm(list(to_convert.iterrows()), desc="Converting CATCH slides"):
        container_id = row["ContainerIdentifier"]
        output_path = os.path.join(preprocessed_dir, f"{container_id}.h5")
        if os.path.exists(output_path):
            continue
        if not download:
            raise RuntimeError(f"Cannot find the data at {path}, but download was set to False.")
        _convert_slide(
            row["SeriesInstanceUID"], f"{container_id}.svs", images, annotations, level, output_path, tmp_dir
        )

    return preprocessed_dir


def get_catch_paths(
    path: Union[os.PathLike, str],
    tumor_types: Optional[Union[str, List[str]]] = None,
    level: Optional[int] = None,
    download: bool = False,
) -> List[str]:
    """Get paths to the CATCH data.

    Args:
        path: Filepath to a folder where the data will be saved.
        tumor_types: The tumor types to use. By default all seven tumor types are used.
        level: The DICOM pyramid level to read. By default the highest resolution (base) level is used.
        download: Whether to download the data if it is not present.

    Returns:
        List of filepaths to the preprocessed HDF5 files.
    """
    preprocessed_dir = get_catch_data(path, tumor_types, level, download)
    volume_paths = sorted(glob(os.path.join(preprocessed_dir, "*.h5")))
    if not volume_paths:
        raise RuntimeError("Could not find any preprocessed CATCH slides for the requested settings.")

    return volume_paths


def get_catch_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int],
    tumor_types: Optional[Union[str, List[str]]] = None,
    level: Optional[int] = None,
    download: bool = False,
    label_dtype: torch.dtype = torch.int64,
    resize_inputs: bool = False,
    **kwargs
) -> Dataset:
    """Get the CATCH dataset for tissue segmentation in canine cutaneous tumor histopathology images.

    Args:
        path: Filepath to a folder where the data will be saved.
        patch_shape: The patch shape to use for training.
        tumor_types: The tumor types to use. By default all seven tumor types are used.
        level: The DICOM pyramid level to read. By default the highest resolution (base) level is used.
        download: Whether to download the data if it is not present.
        label_dtype: The datatype of the labels.
        resize_inputs: Whether to resize the input images.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

    Returns:
        The segmentation dataset.
    """
    volume_paths = get_catch_paths(path, tumor_types, level, download)

    if resize_inputs:
        resize_kwargs = {"patch_shape": patch_shape, "is_rgb": True}
        kwargs, patch_shape = util.update_kwargs_for_resize_trafo(
            kwargs=kwargs, patch_shape=patch_shape, resize_inputs=resize_inputs, resize_kwargs=resize_kwargs
        )

    return torch_em.default_segmentation_dataset(
        raw_paths=volume_paths,
        raw_key="raw",
        label_paths=volume_paths,
        label_key="labels/semantic",
        patch_shape=patch_shape,
        label_dtype=label_dtype,
        is_seg_dataset=True,
        with_channels=True,
        ndim=2,
        **kwargs
    )


def get_catch_loader(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int],
    batch_size: int,
    tumor_types: Optional[Union[str, List[str]]] = None,
    level: Optional[int] = None,
    download: bool = False,
    label_dtype: torch.dtype = torch.int64,
    resize_inputs: bool = False,
    **kwargs
) -> DataLoader:
    """Get the CATCH dataloader for tissue segmentation in canine cutaneous tumor histopathology images.

    Args:
        path: Filepath to a folder where the data will be saved.
        patch_shape: The patch shape to use for training.
        batch_size: The batch size for training.
        tumor_types: The tumor types to use. By default all seven tumor types are used.
        level: The DICOM pyramid level to read. By default the highest resolution (base) level is used.
        download: Whether to download the data if it is not present.
        label_dtype: The datatype of the labels.
        resize_inputs: Whether to resize the input images.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset` or for the PyTorch DataLoader.

    Returns:
        The DataLoader.
    """
    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
    dataset = get_catch_dataset(
        path=path, patch_shape=patch_shape, tumor_types=tumor_types, level=level, download=download,
        label_dtype=label_dtype, resize_inputs=resize_inputs, **ds_kwargs
    )
    return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)
COCO_URL = 'https://www.cancerimagingarchive.net/wp-content/uploads/CATCH-json.zip'
IDC_COLLECTION = 'catch'
TUMOR_TYPES = ['Histiocytoma', 'MCT', 'Melanoma', 'PNST', 'Plasmacytoma', 'SCC', 'Trichoblastoma']
def get_catch_data(path: Union[os.PathLike, str], tumor_types: Optional[Union[str, List[str]]] = None, level: Optional[int] = None, download: bool = False) -> str:

Download and preprocess the CATCH data.

Arguments:
  • path: Filepath to a folder where the data will be saved.
  • tumor_types: The tumor types to use. By default all seven tumor types are used.
  • level: The DICOM pyramid level to read. By default the highest resolution (base) level is used.
  • download: Whether to download the data if it is not present.
Returns:
  • Filepath to the folder where the preprocessed data is stored.
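The handling of `tumor_types` can be illustrated in isolation. The sketch below mirrors the validation and the prefix matching on the slide identifier shown in the module source, but it is a standalone approximation: `normalize_tumor_types` and `select_containers` are hypothetical names, and the container identifiers are invented for illustration.

```python
from typing import List, Optional, Sequence, Union

TUMOR_TYPES = ["Histiocytoma", "MCT", "Melanoma", "PNST", "Plasmacytoma", "SCC", "Trichoblastoma"]


def normalize_tumor_types(tumor_types: Optional[Union[str, Sequence[str]]] = None) -> List[str]:
    """Return a validated list of tumor types; None selects all seven types."""
    if tumor_types is None:
        return list(TUMOR_TYPES)
    if isinstance(tumor_types, str):
        tumor_types = [tumor_types]
    for tumor_type in tumor_types:
        # The comparison is case-sensitive, so "mct" is rejected.
        if tumor_type not in TUMOR_TYPES:
            raise ValueError(f"'{tumor_type}' is not a valid tumor type. Choose from {TUMOR_TYPES}.")
    return list(tumor_types)


def select_containers(container_ids: Sequence[str], tumor_types=None) -> List[str]:
    """Keep the identifiers that start with one of the requested tumor types."""
    prefixes = tuple(normalize_tumor_types(tumor_types))
    return [cid for cid in container_ids if cid.startswith(prefixes)]


# Hypothetical identifiers; the real values come from the IDC slide microscopy index.
ids = ["MCT_01_1", "Melanoma_02_1", "SCC_03_1"]
selected = select_containers(ids, "MCT")
print(selected)
```

A single string is accepted as shorthand for a one-element list, and an unknown name fails early with a `ValueError` before anything is downloaded.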

def get_catch_paths(path: Union[os.PathLike, str], tumor_types: Optional[Union[str, List[str]]] = None, level: Optional[int] = None, download: bool = False) -> List[str]:

Get paths to the CATCH data.

Arguments:
  • path: Filepath to a folder where the data will be saved.
  • tumor_types: The tumor types to use. By default all seven tumor types are used.
  • level: The DICOM pyramid level to read. By default the highest resolution (base) level is used.
  • download: Whether to download the data if it is not present.
Returns:
  • List of filepaths to the preprocessed HDF5 files.
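Judging from the module source, the returned list is built by globbing every `*.h5` file in the 'preprocessed' folder, so slides converted in earlier calls (for other tumor types) may be included as well. Where that matters, the paths can be grouped by file-name prefix. The sketch below assumes that each file name starts with its tumor type, as the prefix matching in `get_catch_data` suggests; `group_by_tumor_type` and the example paths are hypothetical.

```python
import os
from typing import Dict, List, Sequence


def group_by_tumor_type(paths: Sequence[str], tumor_types: Sequence[str]) -> Dict[str, List[str]]:
    """Group HDF5 paths by the tumor type that prefixes their file name."""
    groups: Dict[str, List[str]] = {tumor_type: [] for tumor_type in tumor_types}
    for path in paths:
        name = os.path.basename(path)
        for tumor_type in tumor_types:
            if name.startswith(tumor_type):
                groups[tumor_type].append(path)
                break  # No tumor type name is a prefix of another, so one match suffices.
    return groups


# Hypothetical file names, for illustration only.
paths = [
    "data/preprocessed/MCT_01_1.h5",
    "data/preprocessed/MCT_02_1.h5",
    "data/preprocessed/SCC_03_1.h5",
]
groups = group_by_tumor_type(paths, ["MCT", "PNST"])
print({tumor_type: len(files) for tumor_type, files in groups.items()})
```

Paths whose prefix was not requested (the SCC slide here) are simply left out of the result.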

def get_catch_dataset(path: Union[os.PathLike, str], patch_shape: Tuple[int, int], tumor_types: Optional[Union[str, List[str]]] = None, level: Optional[int] = None, download: bool = False, label_dtype: torch.dtype = torch.int64, resize_inputs: bool = False, **kwargs) -> torch.utils.data.dataset.Dataset:

Get the CATCH dataset for tissue segmentation in canine cutaneous tumor histopathology images.

Arguments:
  • path: Filepath to a folder where the data will be saved.
  • patch_shape: The patch shape to use for training.
  • tumor_types: The tumor types to use. By default all seven tumor types are used.
  • level: The DICOM pyramid level to read. By default the highest resolution (base) level is used.
  • download: Whether to download the data if it is not present.
  • label_dtype: The datatype of the labels.
  • resize_inputs: Whether to resize the input images.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:
  • The segmentation dataset.

def get_catch_loader(path: Union[os.PathLike, str], patch_shape: Tuple[int, int], batch_size: int, tumor_types: Optional[Union[str, List[str]]] = None, level: Optional[int] = None, download: bool = False, label_dtype: torch.dtype = torch.int64, resize_inputs: bool = False, **kwargs) -> torch.utils.data.dataloader.DataLoader:

Get the CATCH dataloader for tissue segmentation in canine cutaneous tumor histopathology images.

Arguments:
  • path: Filepath to a folder where the data will be saved.
  • patch_shape: The patch shape to use for training.
  • batch_size: The batch size for training.
  • tumor_types: The tumor types to use. By default all seven tumor types are used.
  • level: The DICOM pyramid level to read. By default the highest resolution (base) level is used.
  • download: Whether to download the data if it is not present.
  • label_dtype: The datatype of the labels.
  • resize_inputs: Whether to resize the input images.
  • kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:
  • The DataLoader.
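A possible way to call the loader is sketched below. The settings are assumptions chosen for illustration: the folder name is hypothetical, `level=2` presumes that this pyramid level exists for the selected slides, and `num_workers` and `shuffle` are standard PyTorch DataLoader arguments that are passed on through `kwargs`. The import sits inside the function so that the settings can be inspected without torch_em and the optional dependencies being installed.

```python
# Hypothetical settings; adjust the folder, tumor types and level to your setup.
LOADER_SETTINGS = dict(
    path="./data/catch",
    patch_shape=(512, 512),
    batch_size=4,
    tumor_types=["MCT", "SCC"],
    level=2,          # Assumes this downsampled pyramid level is available.
    download=True,    # Needs 'idc-index', 'wsidicom' and 'scikit-image'.
    num_workers=2,    # Passed on to the PyTorch DataLoader.
    shuffle=True,     # Passed on to the PyTorch DataLoader.
)


def build_catch_loader(settings=LOADER_SETTINGS):
    """Create the CATCH dataloader from a settings dictionary."""
    from torch_em.data.datasets.histopathology.catch import get_catch_loader

    return get_catch_loader(**settings)


if __name__ == "__main__":
    loader = build_catch_loader()
    # Each batch holds the RGB patches and the matching semantic label patches.
    raw, labels = next(iter(loader))
    print(raw.shape, labels.shape)
```

The first call downloads and converts the selected slides, which can take a long time and a lot of disk space. Later calls reuse the HDF5 files that already exist in the 'preprocessed' folder.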