torch_em.data.datasets.medical.msd
The MSD dataset contains annotations for 10 different segmentation tasks, covering multiple structures / organs across various medical imaging modalities.
Here's an example for how to pass different tasks:
```python
# We want to get the dataset for one task, e.g. "heart".
task_names = ["heart"]

# We want to get datasets for multiple tasks.
# NOTE: Only datasets with the same number of modalities (channels) can be combined.
# To combine different datasets, pass a "raw_transform" that adapts the inputs of each dataset
# so that their patches can be stacked into one batch.
# Example 1: "heart", "lung" and "liver" all have single-modality (one channel) inputs.
task_names = ["heart", "lung", "liver"]

# Example 2: "braintumour" and "prostate" have multi-modal inputs, but their numbers of modalities differ,
# so only one of them can be used at a time.
task_names = ["prostate"]
```
This dataset is from the Medical Segmentation Decathlon Challenge:
- Antonelli et al. - https://doi.org/10.1038/s41467-022-30695-9
- Link - http://medicaldecathlon.com/
Please cite them if you use this dataset for your research.
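As a quick orientation before the full module source below, here is a hedged usage sketch; the target path and the patch shape are placeholders and should be adapted to your setup.

```python
from torch_em.data.datasets.medical.msd import get_msd_loader

# Placeholder path and patch shape; "heart" is one of the single-modality tasks.
loader = get_msd_loader(
    path="./data/msd",
    batch_size=2,
    patch_shape=(32, 128, 128),
    task_names="heart",
    download=True,     # fetches the (large) task archive on first use
    num_workers=2,     # forwarded to the PyTorch DataLoader
)

for x, y in loader:
    print(x.shape, y.shape)  # raw patches and corresponding label patches
    break
```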
1"""The MSD dataset contains annotations for 10 different datasets, 2composed of multiple structures / organs across various medical imaging modalities. 3 4Here's an example for how to pass different tasks: 5```python 6# we want to get datasets for one task, eg. "heart" 7task_names = ["heart"] 8 9# Example: We want to get datasets for multiple tasks 10# NOTE 1: it's important to note that datasets with similar number of modality (channels) can be paired together. 11# to use different datasets together, you need to use "raw_transform" to update inputs per dataset 12# to pair as desired patch shapes per batch. 13# Example 1: "heart", "liver", "lung" all have one modality inputs 14task_names = ["heart", "lung", "liver"] 15 16# Example 2: "braintumour" and "prostate" have multi-modal inputs, however the no. of modalities are not equal. 17# hence, you can use only one at a time. 18task_names = ["prostate"] 19``` 20 21This dataset is from the Medical Segmentation Decathlon Challenge: 22- Antonelli et al. - https://doi.org/10.1038/s41467-022-30695-9 23- Link - http://medicaldecathlon.com/ 24 25Please cite them if you use this dataset for your research. 26""" 27 28import os 29from glob import glob 30from pathlib import Path 31from typing import Tuple, List, Union 32 33from torch.utils.data import Dataset, DataLoader 34 35import torch_em 36 37from .. import util 38from ....data import ConcatDataset 39 40 41URL = { 42 "braintumour": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task01_BrainTumour.tar", 43 "heart": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task02_Heart.tar", 44 "liver": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task03_Liver.tar", 45 "hippocampus": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task04_Hippocampus.tar", 46 "prostate": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task05_Prostate.tar", 47 "lung": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task06_Lung.tar", 48 "pancreas": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task07_Pancreas.tar", 49 "hepaticvessel": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task08_HepaticVessel.tar", 50 "spleen": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar", 51 "colon": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task10_Colon.tar", 52} 53 54CHECKSUM = { 55 "braintumour": "d423911308d2ae5396d9c6bf4fad2b68cfde2dd09044269da9c0d639c22753c4", 56 "heart": "4277dc6dfe100142aa8060e895f6ff0f81c5b733703ea250bd294df8f820bcba", 57 "liver": "4007d9db1acda850d57a6ceb2b3998b7a0d43f8ad5a3f740dc38bc0cb8b7a2c5", 58 "hippocampus": "282d808a3e84e5a52f090d9dd4c0b0057b94a6bd51ad41569aef5ff303287771", 59 "prostate": "8cbbd7147691109b880ff8774eb6ab26704b1be0935482e7996a36a4ed31ec79", 60 "lung": "f782cd09da9cf7a3128475d4a53650d371db10f0427aa76e166fccfcb2654161", 61 "pancreas": "e40181a0229ca85c2588d6ebb90fa6674f84eb1e66f0f968cda088d011769732", 62 "hepaticvessel": "ee880799f12e3b6e1ef2f8645f6626c5b39de77a4f1eae6f496c25fbf306ba04", 63 "spleen": "dfeba347daae4fb08c38f4d243ab606b28b91b206ffc445ec55c35489fa65e60", 64 "colon": "a26bfd23faf2de703f5a51a262cd4e2b9774c47e7fb86f0e0a854f8446ec2325", 65} 66 67FILENAMES = { 68 "braintumour": "Task01_BrainTumour.tar", 69 "heart": "Task02_Heart.tar", 70 "liver": "Task03_Liver.tar", 71 "hippocampus": "Task04_Hippocampus.tar", 72 "prostate": "Task05_Prostate.tar", 73 "lung": "Task06_Lung.tar", 74 "pancreas": "Task07_Pancreas.tar", 75 "hepaticvessel": "Task08_HepaticVessel.tar", 76 "spleen": "Task09_Spleen.tar", 77 "colon": "Task10_Colon.tar", 78} 79 80 81def 
get_msd_data(path: Union[os.PathLike, str], task_name: str, download: bool = False) -> str: 82 """Download the MSD dataset. 83 84 Args: 85 path: Filepath to a folder where the data is downloaded for further processing. 86 task_name: The choice of specific task. 87 download: Whether to download the data if it is not present. 88 89 Returns: 90 Filepath where the data is downloaded. 91 """ 92 data_dir = os.path.join(path, "data", task_name) 93 if os.path.exists(data_dir): 94 return data_dir 95 96 os.makedirs(path, exist_ok=True) 97 98 fpath = os.path.join(path, FILENAMES[task_name]) 99 util.download_source(path=fpath, url=URL[task_name], download=download, checksum=None) 100 util.unzip_tarfile(tar_path=fpath, dst=data_dir, remove=False) 101 102 return data_dir 103 104 105def get_msd_dataset( 106 path: Union[os.PathLike, str], 107 patch_shape: Tuple[int, ...], 108 task_names: Union[str, List[str]], 109 download: bool = False, 110 **kwargs 111) -> Dataset: 112 """Get the MSD dataset for semantic segmentation in medical imaging datasets. 113 114 Args: 115 path: Filepath to a folder where the data is downloaded for further processing. 116 patch_shape: The patch shape to use for training. 117 task_names: The names for the 10 different segmentation tasks (see the challenge website for further details): 118 1. tasks with 1 modality inputs are: heart, liver, hippocampus, lung, pancreas, hepaticvessel, spleen, colon 119 2. tasks with multi-modality inputs are: 120 - braintumour: with 4 modality (channel) inputs 121 - prostate: with 2 modality (channel) inputs 122 download: Whether to download the data if it is not present. 123 kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`. 124 125 Returns: 126 The segmentation dataset. 127 """ 128 if isinstance(task_names, str): 129 task_names = [task_names] 130 131 _datasets = [] 132 for task_name in task_names: 133 data_dir = get_msd_data(path, task_name, download) 134 image_paths = glob(os.path.join(data_dir, Path(FILENAMES[task_name]).stem, "imagesTr", "*.nii.gz")) 135 label_paths = glob(os.path.join(data_dir, Path(FILENAMES[task_name]).stem, "labelsTr", "*.nii.gz")) 136 137 if task_name in ["braintumour", "prostate"]: 138 kwargs["with_channels"] = True 139 140 this_dataset = torch_em.default_segmentation_dataset( 141 raw_paths=image_paths, 142 raw_key="data", 143 label_paths=label_paths, 144 label_key="data", 145 patch_shape=patch_shape, 146 **kwargs 147 ) 148 _datasets.append(this_dataset) 149 150 return ConcatDataset(*_datasets) 151 152 153def get_msd_loader( 154 path: Union[os.PathLike, str], 155 batch_size: int, 156 patch_shape: Tuple[int, ...], 157 task_names: Union[str, List[str]], 158 download: bool = False, 159 **kwargs 160) -> DataLoader: 161 """Get the MSD dataloader for semantic segmentation in medical imaging datasets. 162 163 Args: 164 path: Filepath to a folder where the data is downloaded for further processing. 165 batch_size: The batch size for training. 166 patch_shape: The patch shape to use for training. 167 task_names: The names for the 10 different segmentation tasks (see the challenge website for further details): 168 1. tasks with 1 modality inputs are: heart, liver, hippocampus, lung, pancreas, hepaticvessel, spleen, colon 169 2. tasks with multi-modality inputs are: 170 - braintumour: with 4 modality (channel) inputs 171 - prostate: with 2 modality (channel) inputs 172 download: Whether to download the data if it is not present. 
173 kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset` or for the PyTorch DataLoader. 174 175 Returns: 176 The DataLoader. 177 """ 178 ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs) 179 dataset = get_msd_dataset(path, patch_shape, task_names, download, **ds_kwargs) 180 return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)
```python
URL = {
    "braintumour": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task01_BrainTumour.tar",
    "heart": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task02_Heart.tar",
    "liver": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task03_Liver.tar",
    "hippocampus": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task04_Hippocampus.tar",
    "prostate": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task05_Prostate.tar",
    "lung": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task06_Lung.tar",
    "pancreas": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task07_Pancreas.tar",
    "hepaticvessel": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task08_HepaticVessel.tar",
    "spleen": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar",
    "colon": "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task10_Colon.tar",
}
```
```python
CHECKSUM = {
    "braintumour": "d423911308d2ae5396d9c6bf4fad2b68cfde2dd09044269da9c0d639c22753c4",
    "heart": "4277dc6dfe100142aa8060e895f6ff0f81c5b733703ea250bd294df8f820bcba",
    "liver": "4007d9db1acda850d57a6ceb2b3998b7a0d43f8ad5a3f740dc38bc0cb8b7a2c5",
    "hippocampus": "282d808a3e84e5a52f090d9dd4c0b0057b94a6bd51ad41569aef5ff303287771",
    "prostate": "8cbbd7147691109b880ff8774eb6ab26704b1be0935482e7996a36a4ed31ec79",
    "lung": "f782cd09da9cf7a3128475d4a53650d371db10f0427aa76e166fccfcb2654161",
    "pancreas": "e40181a0229ca85c2588d6ebb90fa6674f84eb1e66f0f968cda088d011769732",
    "hepaticvessel": "ee880799f12e3b6e1ef2f8645f6626c5b39de77a4f1eae6f496c25fbf306ba04",
    "spleen": "dfeba347daae4fb08c38f4d243ab606b28b91b206ffc445ec55c35489fa65e60",
    "colon": "a26bfd23faf2de703f5a51a262cd4e2b9774c47e7fb86f0e0a854f8446ec2325",
}
```
```python
FILENAMES = {
    "braintumour": "Task01_BrainTumour.tar",
    "heart": "Task02_Heart.tar",
    "liver": "Task03_Liver.tar",
    "hippocampus": "Task04_Hippocampus.tar",
    "prostate": "Task05_Prostate.tar",
    "lung": "Task06_Lung.tar",
    "pancreas": "Task07_Pancreas.tar",
    "hepaticvessel": "Task08_HepaticVessel.tar",
    "spleen": "Task09_Spleen.tar",
    "colon": "Task10_Colon.tar",
}
```
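The archive names above also determine the on-disk layout after download; the structure sketched below is inferred from `get_msd_data` (which extracts into `<path>/data/<task_name>`) and from the glob patterns in `get_msd_dataset`, so treat it as an assumption rather than a guarantee.

```python
from pathlib import Path

from torch_em.data.datasets.medical.msd import FILENAMES

# For the "heart" task downloaded into "./data/msd", the expected layout is roughly:
#   ./data/msd/Task02_Heart.tar                               <- downloaded archive
#   ./data/msd/data/heart/Task02_Heart/imagesTr/*.nii.gz      <- training images
#   ./data/msd/data/heart/Task02_Heart/labelsTr/*.nii.gz      <- training labels
print(Path(FILENAMES["heart"]).stem)  # "Task02_Heart" - the folder name used by get_msd_dataset
```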
def get_msd_data(path: Union[os.PathLike, str], task_name: str, download: bool = False) -> str:
Download the MSD dataset.
Arguments:
- path: Filepath to a folder where the data is downloaded for further processing.
- task_name: The choice of specific task.
- download: Whether to download the data if it is not present.
Returns:
Filepath where the data is downloaded.
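A minimal usage sketch; the path is a placeholder, and with `download=True` the task archive is fetched on first use:

```python
from torch_em.data.datasets.medical.msd import get_msd_data

data_dir = get_msd_data(path="./data/msd", task_name="spleen", download=True)
print(data_dir)  # ./data/msd/data/spleen
```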
def get_msd_dataset(path: Union[os.PathLike, str], patch_shape: Tuple[int, ...], task_names: Union[str, List[str]], download: bool = False, **kwargs) -> torch.utils.data.dataset.Dataset:
Get the MSD dataset for semantic segmentation in medical imaging datasets.
Arguments:
- path: Filepath to a folder where the data is downloaded for further processing.
- patch_shape: The patch shape to use for training.
- task_names: The names for the 10 different segmentation tasks (see the challenge website for further details):
- tasks with 1 modality inputs are: heart, liver, hippocampus, lung, pancreas, hepaticvessel, spleen, colon
- tasks with multi-modality inputs are:
- braintumour: with 4 modality (channel) inputs
- prostate: with 2 modality (channel) inputs
- download: Whether to download the data if it is not present.
- kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.
Returns:
The segmentation dataset.
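For instance, several single-modality tasks can be requested together, while a multi-modal task such as "prostate" is typically used on its own (the function enables `with_channels` for it internally). The path and patch shape below are placeholders, not recommendations.

```python
from torch_em.data.datasets.medical.msd import get_msd_dataset

# Several single-modality tasks, concatenated into one dataset.
dataset = get_msd_dataset(
    path="./data/msd",
    patch_shape=(32, 128, 128),
    task_names=["heart", "lung", "liver"],
    download=True,
)

# A multi-modal task on its own (loaded with a channel axis).
prostate_ds = get_msd_dataset("./data/msd", (32, 128, 128), "prostate", download=True)
```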
def get_msd_loader(path: Union[os.PathLike, str], batch_size: int, patch_shape: Tuple[int, ...], task_names: Union[str, List[str]], download: bool = False, **kwargs) -> torch.utils.data.dataloader.DataLoader:
Get the MSD dataloader for semantic segmentation in medical imaging datasets.
Arguments:
- path: Filepath to a folder where the data is downloaded for further processing.
- batch_size: The batch size for training.
- patch_shape: The patch shape to use for training.
- task_names: The names for the 10 different segmentation tasks (see the challenge website for further details):
- tasks with 1 modality inputs are: heart, liver, hippocampus, lung, pancreas, hepaticvessel, spleen, colon
- tasks with multi-modality inputs are:
- braintumour: with 4 modality (channel) inputs
- prostate: with 2 modality (channel) inputs
- download: Whether to download the data if it is not present.
- kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset` or for the PyTorch DataLoader.
Returns:
The DataLoader.
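Because the keyword arguments are split between the dataset and the DataLoader, dataset-side options (e.g. `ndim` or `raw_transform` of `torch_em.default_segmentation_dataset`) and loader-side options (e.g. `num_workers`, `shuffle`) can be mixed in a single call. A hedged sketch with placeholder path and patch shape:

```python
from torch_em.data.datasets.medical.msd import get_msd_loader

loader = get_msd_loader(
    path="./data/msd",
    batch_size=4,
    patch_shape=(32, 128, 128),
    task_names="liver",
    download=True,
    ndim=3,           # forwarded to torch_em.default_segmentation_dataset
    num_workers=4,    # forwarded to torch.utils.data.DataLoader
    shuffle=True,     # forwarded to torch.utils.data.DataLoader
)
```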