torch_em.data.datasets.medical.msd

The MSD dataset contains annotations for 10 different datasets, composed of multiple structures / organs across various medical imaging modalities.

Here's an example of how to pass different tasks:

```python
# We want to get the dataset for one task, e.g. "heart".
task_names = ["heart"]

# We can also get datasets for multiple tasks.
# NOTE: Only datasets with the same number of modalities (channels) can be paired together.
# To combine different datasets, use "raw_transform" to update the inputs per dataset
# so that they pair to the desired patch shapes per batch.

# Example 1: "heart", "lung" and "liver" all have single-modality inputs.
task_names = ["heart", "lung", "liver"]

# Example 2: "braintumour" and "prostate" both have multi-modal inputs, but their numbers
# of modalities differ (4 vs. 2 channels), so only one of them can be used at a time.
task_names = ["prostate"]
```

This dataset is from the Medical Segmentation Decathlon Challenge:
- Antonelli et al. - https://doi.org/10.1038/s41467-022-30695-9
- Link - http://medicaldecathlon.com/

Please cite them if you use this dataset for your research.
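The pairing rule above (only tasks with the same number of modalities can be combined directly) can be sketched with a small helper. `MODALITY_COUNTS` and `compatible_tasks` are illustrative names for this sketch, not part of `torch_em`; the channel counts are the ones listed in this module's documentation:

```python
# Modality (channel) counts per MSD task, as listed in this module's documentation.
MODALITY_COUNTS = {
    "braintumour": 4, "prostate": 2,
    "heart": 1, "liver": 1, "hippocampus": 1, "lung": 1,
    "pancreas": 1, "hepaticvessel": 1, "spleen": 1, "colon": 1,
}


def compatible_tasks(task_names):
    """Return True if all requested tasks share the same number of modalities,
    so their patches can be batched together without a raw_transform."""
    return len({MODALITY_COUNTS[name] for name in task_names}) == 1


print(compatible_tasks(["heart", "lung", "liver"]))   # True: all single-modality
print(compatible_tasks(["braintumour", "prostate"]))  # False: 4 vs. 2 channels
```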

"""The MSD dataset contains annotations for 10 different datasets,
composed of multiple structures / organs across various medical imaging modalities.

Here's an example of how to pass different tasks:
```python
# We want to get the dataset for one task, e.g. "heart".
task_names = ["heart"]

# We can also get datasets for multiple tasks.
# NOTE: Only datasets with the same number of modalities (channels) can be paired together.
# To combine different datasets, use "raw_transform" to update the inputs per dataset
# so that they pair to the desired patch shapes per batch.

# Example 1: "heart", "lung" and "liver" all have single-modality inputs.
task_names = ["heart", "lung", "liver"]

# Example 2: "braintumour" and "prostate" both have multi-modal inputs, but their numbers
# of modalities differ (4 vs. 2 channels), so only one of them can be used at a time.
task_names = ["prostate"]
```

This dataset is from the Medical Segmentation Decathlon Challenge:
- Antonelli et al. - https://doi.org/10.1038/s41467-022-30695-9
- Link - http://medicaldecathlon.com/

Please cite them if you use this dataset for your research.
"""

import os
from glob import glob
from pathlib import Path
from typing import Tuple, List, Union

from torch.utils.data import Dataset, DataLoader

import torch_em

from .. import util
from ....data import ConcatDataset


URL = {
    "braintumour": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task01_BrainTumour.tar",
    "heart": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task02_Heart.tar",
    "liver": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task03_Liver.tar",
    "hippocampus": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task04_Hippocampus.tar",
    "prostate": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task05_Prostate.tar",
    "lung": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task06_Lung.tar",
    "pancreas": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task07_Pancreas.tar",
    "hepaticvessel": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task08_HepaticVessel.tar",
    "spleen": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar",
    "colon": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task10_Colon.tar",
}

CHECKSUM = {
    "braintumour": "d423911308d2ae5396d9c6bf4fad2b68cfde2dd09044269da9c0d639c22753c4",
    "heart": "4277dc6dfe100142aa8060e895f6ff0f81c5b733703ea250bd294df8f820bcba",
    "liver": "4007d9db1acda850d57a6ceb2b3998b7a0d43f8ad5a3f740dc38bc0cb8b7a2c5",
    "hippocampus": "282d808a3e84e5a52f090d9dd4c0b0057b94a6bd51ad41569aef5ff303287771",
    "prostate": "8cbbd7147691109b880ff8774eb6ab26704b1be0935482e7996a36a4ed31ec79",
    "lung": "f782cd09da9cf7a3128475d4a53650d371db10f0427aa76e166fccfcb2654161",
    "pancreas": "e40181a0229ca85c2588d6ebb90fa6674f84eb1e66f0f968cda088d011769732",
    "hepaticvessel": "ee880799f12e3b6e1ef2f8645f6626c5b39de77a4f1eae6f496c25fbf306ba04",
    "spleen": "dfeba347daae4fb08c38f4d243ab606b28b91b206ffc445ec55c35489fa65e60",
    "colon": "a26bfd23faf2de703f5a51a262cd4e2b9774c47e7fb86f0e0a854f8446ec2325",
}

FILENAMES = {
    "braintumour": "Task01_BrainTumour.tar",
    "heart": "Task02_Heart.tar",
    "liver": "Task03_Liver.tar",
    "hippocampus": "Task04_Hippocampus.tar",
    "prostate": "Task05_Prostate.tar",
    "lung": "Task06_Lung.tar",
    "pancreas": "Task07_Pancreas.tar",
    "hepaticvessel": "Task08_HepaticVessel.tar",
    "spleen": "Task09_Spleen.tar",
    "colon": "Task10_Colon.tar",
}

def get_msd_data(path: Union[os.PathLike, str], task_name: str, download: bool = False) -> str:
    """Download the MSD dataset.

    Args:
        path: Filepath to a folder where the data is downloaded for further processing.
        task_name: The choice of specific task.
        download: Whether to download the data if it is not present.

    Returns:
        Filepath where the data is downloaded.
    """
    data_dir = os.path.join(path, "data", task_name)
    if os.path.exists(data_dir):
        return data_dir

    os.makedirs(path, exist_ok=True)

    fpath = os.path.join(path, FILENAMES[task_name])
    # Verify the download against the known SHA256 checksum for this task.
    util.download_source(path=fpath, url=URL[task_name], download=download, checksum=CHECKSUM[task_name])
    util.unzip_tarfile(tar_path=fpath, dst=data_dir, remove=False)

    return data_dir

def get_msd_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, ...],
    task_names: Union[str, List[str]],
    download: bool = False,
    **kwargs
) -> Dataset:
    """Get the MSD dataset for semantic segmentation in medical imaging.

    Args:
        path: Filepath to a folder where the data is downloaded for further processing.
        patch_shape: The patch shape to use for training.
        task_names: The names of the 10 different segmentation tasks (see the challenge website for further details):
            1. tasks with single-modality inputs: heart, liver, hippocampus, lung, pancreas, hepaticvessel, spleen, colon
            2. tasks with multi-modality inputs:
                - braintumour: with 4 modality (channel) inputs
                - prostate: with 2 modality (channel) inputs
        download: Whether to download the data if it is not present.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

    Returns:
        The segmentation dataset.
    """
    if isinstance(task_names, str):
        task_names = [task_names]

    _datasets = []
    for task_name in task_names:
        data_dir = get_msd_data(path, task_name, download)
        image_paths = glob(os.path.join(data_dir, Path(FILENAMES[task_name]).stem, "imagesTr", "*.nii.gz"))
        label_paths = glob(os.path.join(data_dir, Path(FILENAMES[task_name]).stem, "labelsTr", "*.nii.gz"))

        # The multi-modal tasks store their modalities as channels, which the dataset needs to know about.
        if task_name in ["braintumour", "prostate"]:
            kwargs["with_channels"] = True

        this_dataset = torch_em.default_segmentation_dataset(
            raw_paths=image_paths,
            raw_key="data",
            label_paths=label_paths,
            label_key="data",
            patch_shape=patch_shape,
            **kwargs
        )
        _datasets.append(this_dataset)

    return ConcatDataset(*_datasets)

def get_msd_loader(
    path: Union[os.PathLike, str],
    batch_size: int,
    patch_shape: Tuple[int, ...],
    task_names: Union[str, List[str]],
    download: bool = False,
    **kwargs
) -> DataLoader:
    """Get the MSD dataloader for semantic segmentation in medical imaging.

    Args:
        path: Filepath to a folder where the data is downloaded for further processing.
        batch_size: The batch size for training.
        patch_shape: The patch shape to use for training.
        task_names: The names of the 10 different segmentation tasks (see the challenge website for further details):
            1. tasks with single-modality inputs: heart, liver, hippocampus, lung, pancreas, hepaticvessel, spleen, colon
            2. tasks with multi-modality inputs:
                - braintumour: with 4 modality (channel) inputs
                - prostate: with 2 modality (channel) inputs
        download: Whether to download the data if it is not present.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset` or for the PyTorch DataLoader.

    Returns:
        The DataLoader.
    """
    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
    dataset = get_msd_dataset(path, patch_shape, task_names, download, **ds_kwargs)
    return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)
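The kwargs splitting in `get_msd_loader` routes dataset arguments to `default_segmentation_dataset` and everything else to the PyTorch DataLoader. Below is a minimal sketch of the idea, assuming `util.split_kwargs` inspects the target function's signature (the real implementation may differ); `fake_dataset` is a stand-in for illustration:

```python
import inspect

def split_kwargs(function, **kwargs):
    # Keywords accepted by `function` go to the dataset; the rest
    # (e.g. num_workers, shuffle) are passed on to the DataLoader.
    params = inspect.signature(function).parameters
    ds_kwargs = {k: v for k, v in kwargs.items() if k in params}
    loader_kwargs = {k: v for k, v in kwargs.items() if k not in params}
    return ds_kwargs, loader_kwargs

def fake_dataset(patch_shape=None, with_channels=False):  # stand-in signature
    pass

ds_kwargs, loader_kwargs = split_kwargs(fake_dataset, with_channels=True, num_workers=4, shuffle=True)
print(ds_kwargs)      # {'with_channels': True}
print(loader_kwargs)  # {'num_workers': 4, 'shuffle': True}
```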