torch_em.data.datasets.light_microscopy.liconn
The LICONN dataset contains a dense connectomic reconstruction of mouse hippocampal CA1 neuropil acquired by spinning-disk confocal microscopy of expansion-microscopy-processed tissue (~16x expansion), yielding a native voxel resolution of 9x9x12 nm (XYZ) at mip=0. All neuronal structures are densely annotated as instance segmentations: 18,268 axons (342 mm total length), 1,643 dendrites (119 mm total), and 71,269 spines.
Two segmentation variants are provided:
- 'proofread': manually proofread segmentation (higher accuracy).
- 'agglomerated': automatically agglomerated segmentation.
The data is served as Neuroglancer precomputed volumes from Google Cloud Storage (gs://liconn-public) and requires the cloudvolume package to download.
All volumes are stored in a single zarr v3 store (liconn.zarr) with sharding. The store contains arrays 'raw', 'seg_proofread', and 'seg_agglomerated'.
This dataset is from the following publication:
- Velicky et al. (2025): https://doi.org/10.1038/s41586-025-08985-1
Please cite it if you use this dataset in your research.
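To get a feel for what the stated voxel sizes mean for training patches, the following sketch converts a patch shape in voxels to its physical extent in nm. It assumes that each mip level doubles the voxel size on every axis, which matches the 9x9x12 nm (mip=0) and 18x18x24 nm (mip=1) figures quoted in this module but is not otherwise confirmed here. The helper names are hypothetical and not part of torch_em.

```python
# Native voxel size in nm, ordered (X, Y, Z) as in the module description (mip=0).
NATIVE_VOXEL_NM_XYZ = (9, 9, 12)


def voxel_size_zyx(mip):
    """Voxel size in nm reordered to (Z, Y, X).

    Assumption: every mip level downsamples by 2x on all three axes.
    """
    x, y, z = (v * 2 ** mip for v in NATIVE_VOXEL_NM_XYZ)
    return (z, y, x)


def patch_extent_nm(patch_shape_zyx, mip=1):
    """Physical extent in nm of a (Z, Y, X) patch at the given mip level."""
    return tuple(n * v for n, v in zip(patch_shape_zyx, voxel_size_zyx(mip)))


extent = patch_extent_nm((32, 256, 256), mip=1)
print(extent)  # (768, 4608, 4608)
```

A (32, 256, 256) patch therefore covers roughly 0.8 x 4.6 x 4.6 micrometers under this assumption.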
"""The LICONN dataset contains a dense connectomic reconstruction of mouse hippocampal
CA1 neuropil acquired by spinning-disk confocal microscopy of expansion-microscopy-processed
tissue (~16x expansion), yielding a native voxel resolution of 9x9x12 nm (XYZ) at mip=0.
All neuronal structures are densely annotated as instance segmentations: 18,268 axons
(342 mm total length), 1,643 dendrites (119 mm total), and 71,269 spines.

Two segmentation variants are provided:
- 'proofread': manually proofread segmentation (higher accuracy).
- 'agglomerated': automatically agglomerated segmentation.

The data is served as Neuroglancer precomputed volumes from Google Cloud Storage
(gs://liconn-public) and requires the cloudvolume package to download.

All volumes are stored in a single zarr v3 store (liconn.zarr) with sharding.
The store contains arrays 'raw', 'seg_proofread', and 'seg_agglomerated'.

This dataset is from the following publication:
- Velicky et al. (2025): https://doi.org/10.1038/s41586-025-08985-1
Please cite it if you use this dataset in your research.
"""

import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Optional, Tuple, Union

import numpy as np
from tqdm import tqdm

from torch.utils.data import Dataset, DataLoader

import torch_em

from .. import util


IMG_URL = "precomputed://https://storage.googleapis.com/liconn-public/ExPID82_1/image_230130b"
SEG_PR_URL = "precomputed://https://storage.googleapis.com/liconn-public/ExPID82_1/segmentation/231030_agg_240123"
SEG_AGG_URL = "precomputed://https://storage.googleapis.com/liconn-public/ExPID82_1/segmentation/231030_agg_230921_cmpl"

SEGMENTATIONS = ("proofread", "agglomerated")

ZARR_FNAME = "liconn.zarr"
SHARD_SHAPE = (64, 256, 256)
CHUNK_SHAPE = (32, 64, 64)


def _to_zyx(a: np.ndarray) -> np.ndarray:
    # CloudVolume returns (X, Y, Z[, C]); squeeze trailing channel dim if present.
    if a.ndim == 4:
        a = a.squeeze(axis=-1)
    if a.ndim != 3:
        raise ValueError(f"Expected 3D block, got shape {a.shape}")
    return a.transpose(2, 1, 0)


def _create_array(root, name: str, shape, dtype, is_label: bool):
    from zarr.codecs import BloscCodec

    shuffle = "bitshuffle" if (np.issubdtype(dtype, np.integer) and is_label) else "shuffle"
    return root.create_array(
        name,
        shape=shape,
        chunks=CHUNK_SHAPE,
        shards=SHARD_SHAPE,
        dtype=dtype,
        compressors=BloscCodec(cname="zstd", clevel=6, shuffle=shuffle),
    )


def _download_ng_volume(vol, ds, name: str) -> None:
    x0, y0, z0 = map(int, vol.bounds.minpt)
    x1, y1, z1 = map(int, vol.bounds.maxpt)
    shape = (z1 - z0, y1 - y0, x1 - x0)

    tasks = []
    for z0_ in range(0, shape[0], SHARD_SHAPE[0]):
        for y0_ in range(0, shape[1], SHARD_SHAPE[1]):
            for x0_ in range(0, shape[2], SHARD_SHAPE[2]):
                z1_ = min(z0_ + SHARD_SHAPE[0], shape[0])
                y1_ = min(y0_ + SHARD_SHAPE[1], shape[1])
                x1_ = min(x0_ + SHARD_SHAPE[2], shape[2])
                tasks.append((
                    (z0_, z1_), (y0_, y1_), (x0_, x1_),
                    (x0 + x0_, x0 + x1_, y0 + y0_, y0 + y1_, z0 + z0_, z0 + z1_)
                ))

    max_workers = max(8, (os.cpu_count() or 4) * 4)

    def worker(item):
        (z0_, z1_), (y0_, y1_), (x0_, x1_), (gx0, gx1, gy0, gy1, gz0, gz1) = item
        block = np.asarray(vol[gx0:gx1, gy0:gy1, gz0:gz1])
        ds[z0_:z1_, y0_:y1_, x0_:x1_] = _to_zyx(block)

    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = [ex.submit(worker, t) for t in tasks]
        for fut in tqdm(as_completed(futures), total=len(futures), desc=f"Downloading '{name}'", smoothing=0.05):
            fut.result()


def get_liconn_data(
    path: Union[os.PathLike, str],
    segmentation: str = "proofread",
    download: bool = False,
) -> None:
    """Download the LICONN image and segmentation into a single zarr v3 store with sharding.

    The entire volume is always downloaded (image at mip=1, 18x18x24 nm resolution;
    segmentation at mip=0, same voxel grid). ROI-based sub-region selection is not supported
    at download time - use the roi parameter in get_liconn_dataset to restrict patch sampling
    to a sub-region after the full volume is on disk.

    All arrays are stored in liconn.zarr with names 'raw', 'seg_proofread', and
    'seg_agglomerated'. Each array uses zarr v3 sharding (shard shape SHARD_SHAPE,
    inner chunk shape CHUNK_SHAPE) with zstd+blosc compression.

    Args:
        path: Filepath to a folder where the data will be saved.
        segmentation: Which segmentation variant to download. Either 'proofread' or 'agglomerated'.
        download: Whether to download the data if it is not present.
    """
    if segmentation not in SEGMENTATIONS:
        raise ValueError(f"'{segmentation}' is not a valid segmentation. Choose from {SEGMENTATIONS}.")

    try:
        from cloudvolume import CloudVolume
    except ImportError:
        raise ImportError(
            "cloudvolume is required to download the LICONN data. Install it with: pip install cloud-volume"
        )

    import zarr

    os.makedirs(path, exist_ok=True)
    zarr_path = os.path.join(str(path), ZARR_FNAME)
    label_key = f"seg_{segmentation}"

    def _array_complete(arr_name):
        d = os.path.join(zarr_path, arr_name)
        return os.path.isdir(d) and len(os.listdir(d)) > 1

    raw_missing = not _array_complete("raw")
    label_missing = not _array_complete(label_key)

    if not raw_missing and not label_missing:
        return

    if not download:
        missing = [k for k, m in [("raw", raw_missing), (label_key, label_missing)] if m]
        raise RuntimeError(f"LICONN arrays {missing} not found in {zarr_path}. Pass download=True to download them.")

    root = zarr.open_group(zarr_path, mode="a")

    if raw_missing:
        img_cv = CloudVolume(IMG_URL, mip=1, progress=False, cache=False, fill_missing=True)
        x0, y0, z0 = map(int, img_cv.bounds.minpt)
        x1, y1, z1 = map(int, img_cv.bounds.maxpt)
        shape = (z1 - z0, y1 - y0, x1 - x0)
        ds = _create_array(root, "raw", shape, np.dtype(img_cv.dtype), is_label=False)
        _download_ng_volume(img_cv, ds, name="raw")

    if label_missing:
        seg_url = SEG_PR_URL if segmentation == "proofread" else SEG_AGG_URL
        seg_cv = CloudVolume(seg_url, mip=0, progress=False, cache=False, fill_missing=True)
        x0, y0, z0 = map(int, seg_cv.bounds.minpt)
        x1, y1, z1 = map(int, seg_cv.bounds.maxpt)
        shape = (z1 - z0, y1 - y0, x1 - x0)
        ds = _create_array(root, label_key, shape, np.dtype(seg_cv.dtype), is_label=True)
        _download_ng_volume(seg_cv, ds, name=label_key)


def get_liconn_paths(
    path: Union[os.PathLike, str],
    segmentation: str = "proofread",
    download: bool = False,
) -> str:
    """Get the filepath to the LICONN zarr store.

    The store contains arrays 'raw', 'seg_proofread', and 'seg_agglomerated'.

    Args:
        path: Filepath to a folder where the data will be saved.
        segmentation: Which segmentation variant to ensure is present. Either 'proofread' or 'agglomerated'.
        download: Whether to download the data if it is not present.

    Returns:
        Filepath to the liconn.zarr store.
    """
    get_liconn_data(path, segmentation, download)
    return os.path.join(str(path), ZARR_FNAME)


def get_liconn_dataset(
    path: Union[os.PathLike, str],
    patch_shape: Tuple[int, int, int],
    segmentation: str = "proofread",
    roi: Optional[Tuple[slice, ...]] = None,
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    **kwargs,
) -> Dataset:
    """Get the LICONN dataset for neuron instance segmentation in expansion microscopy.

    Args:
        path: Filepath to a folder where the data will be saved.
        patch_shape: The patch shape to use for training.
        segmentation: Which segmentation variant to use. Either 'proofread' or 'agglomerated'.
        roi: Optional region-of-interest as a tuple of slices (Z, Y, X) restricting which part
            of the already-downloaded volume is used for patch sampling. The full volume is
            always downloaded regardless of this parameter.
        download: Whether to download the data if it is not present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset`.

    Returns:
        The segmentation dataset.
    """
    assert len(patch_shape) == 3

    zarr_path = get_liconn_paths(path, segmentation, download)
    label_key = f"seg_{segmentation}"

    kwargs, _ = util.add_instance_label_transform(
        kwargs, add_binary_target=False, boundaries=boundaries, offsets=offsets
    )
    kwargs = util.update_kwargs(kwargs, "is_seg_dataset", True)

    return torch_em.default_segmentation_dataset(
        raw_paths=zarr_path,
        raw_key="raw",
        label_paths=zarr_path,
        label_key=label_key,
        patch_shape=patch_shape,
        rois=roi,
        **kwargs,
    )


def get_liconn_loader(
    path: Union[os.PathLike, str],
    batch_size: int,
    patch_shape: Tuple[int, int, int],
    segmentation: str = "proofread",
    roi: Optional[Tuple[slice, ...]] = None,
    download: bool = False,
    offsets: Optional[List[List[int]]] = None,
    boundaries: bool = False,
    **kwargs,
) -> DataLoader:
    """Get the DataLoader for the LICONN dataset for neuron instance segmentation.

    Args:
        path: Filepath to a folder where the data will be saved.
        batch_size: The batch size for training.
        patch_shape: The patch shape to use for training.
        segmentation: Which segmentation variant to use. Either 'proofread' or 'agglomerated'.
        roi: Optional region-of-interest as a tuple of slices (Z, Y, X) restricting which part
            of the already-downloaded volume is used for patch sampling. The full volume is
            always downloaded regardless of this parameter.
        download: Whether to download the data if it is not present.
        offsets: Offset values for affinity computation used as target.
        boundaries: Whether to compute boundaries as the target.
        kwargs: Additional keyword arguments for `torch_em.default_segmentation_dataset` or for the PyTorch DataLoader.

    Returns:
        The DataLoader.
    """
    ds_kwargs, loader_kwargs = util.split_kwargs(torch_em.default_segmentation_dataset, **kwargs)
    dataset = get_liconn_dataset(path, patch_shape, segmentation, roi, download, offsets, boundaries, **ds_kwargs)
    return torch_em.get_data_loader(dataset, batch_size, **loader_kwargs)
def get_liconn_data(path: Union[os.PathLike, str], segmentation: str = 'proofread', download: bool = False) -> None:
Download the LICONN image and segmentation into a single zarr v3 store with sharding.
The entire volume is always downloaded (image at mip=1, 18x18x24 nm resolution; segmentation at mip=0, same voxel grid). ROI-based sub-region selection is not supported at download time - use the roi parameter in get_liconn_dataset to restrict patch sampling to a sub-region after the full volume is on disk.
All arrays are stored in liconn.zarr with names 'raw', 'seg_proofread', and 'seg_agglomerated'. Each array uses zarr v3 sharding (shard shape SHARD_SHAPE, inner chunk shape CHUNK_SHAPE) with zstd+blosc compression.
Arguments:
- path: Filepath to a folder where the data will be saved.
- segmentation: Which segmentation variant to download. Either 'proofread' or 'agglomerated'.
- download: Whether to download the data if it is not present.
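The download helper in the module source walks the volume in blocks of SHARD_SHAPE, clipping the last block on each axis. Because the tiling starts at zero and steps by the shard shape, each download task should map to exactly one shard of the target array. The sketch below reproduces only that tiling arithmetic in plain Python. The function name `shard_aligned_blocks` and the example volume shape are illustrative assumptions, not part of torch_em or the real dataset.

```python
from itertools import product

# Values copied from the module source, ordered (Z, Y, X).
SHARD_SHAPE = (64, 256, 256)
CHUNK_SHAPE = (32, 64, 64)


def shard_aligned_blocks(shape, shard_shape=SHARD_SHAPE):
    """Yield (Z, Y, X) tuples of slices that tile `shape` in shard-sized blocks.

    Blocks at the upper border are clipped to the volume shape.
    """
    ranges = [range(0, dim, step) for dim, step in zip(shape, shard_shape)]
    for starts in product(*ranges):
        yield tuple(
            slice(start, min(start + step, dim))
            for start, step, dim in zip(starts, shard_shape, shape)
        )


# Hypothetical volume shape, chosen only to show the clipping at the border.
blocks = list(shard_aligned_blocks((100, 300, 512)))

# Number of inner chunks that make up one full shard.
chunks_per_shard = 1
for shard_dim, chunk_dim in zip(SHARD_SHAPE, CHUNK_SHAPE):
    chunks_per_shard *= shard_dim // chunk_dim

print(len(blocks), chunks_per_shard)  # 8 32
```

With the shapes above, every full shard holds 2 x 4 x 4 = 32 inner chunks.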
def get_liconn_paths(path: Union[os.PathLike, str], segmentation: str = 'proofread', download: bool = False) -> str:
Get the filepath to the LICONN zarr store.
The store contains arrays 'raw', 'seg_proofread', and 'seg_agglomerated'.
Arguments:
- path: Filepath to a folder where the data will be saved.
- segmentation: Which segmentation variant to ensure is present. Either 'proofread' or 'agglomerated'.
- download: Whether to download the data if it is not present.
Returns:
Filepath to the liconn.zarr store.
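Once the store is on disk, it can be inspected directly with zarr. The following is a minimal sketch, assuming zarr v3 is installed and that the requested arrays have already been downloaded. According to the module source, only 'raw' and the requested segmentation variant are written per call, so the other segmentation array may be absent. The helpers `liconn_array_keys` and `read_crop` are hypothetical names introduced for illustration.

```python
SEGMENTATIONS = ("proofread", "agglomerated")


def liconn_array_keys(segmentation="proofread"):
    """Return the (raw, label) array names used inside liconn.zarr."""
    if segmentation not in SEGMENTATIONS:
        raise ValueError(f"'{segmentation}' is not a valid segmentation. Choose from {SEGMENTATIONS}.")
    return "raw", f"seg_{segmentation}"


def read_crop(zarr_path, segmentation="proofread",
              crop=(slice(0, 32), slice(0, 256), slice(0, 256))):
    """Read a small (Z, Y, X) crop of image and labels from the store.

    zarr is imported lazily so that the key helper above works without it.
    """
    import zarr

    raw_key, label_key = liconn_array_keys(segmentation)
    root = zarr.open_group(zarr_path, mode="r")
    return root[raw_key][crop], root[label_key][crop]
```

A call such as `read_crop(get_liconn_paths("./data", download=False))` would then return two NumPy arrays of shape (32, 256, 256), provided both arrays exist in the store.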
def get_liconn_dataset(path: Union[os.PathLike, str], patch_shape: Tuple[int, int, int], segmentation: str = 'proofread', roi: Optional[Tuple[slice, ...]] = None, download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> Dataset:
Get the LICONN dataset for neuron instance segmentation in expansion microscopy.
Arguments:
- path: Filepath to a folder where the data will be saved.
- patch_shape: The patch shape to use for training.
- segmentation: Which segmentation variant to use. Either 'proofread' or 'agglomerated'.
- roi: Optional region-of-interest as a tuple of slices (Z, Y, X) restricting which part of the already-downloaded volume is used for patch sampling. The full volume is always downloaded regardless of this parameter.
- download: Whether to download the data if it is not present.
- offsets: Offset values for affinity computation used as target.
- boundaries: Whether to compute boundaries as the target.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset.
Returns:
The segmentation dataset.
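Since the roi argument is a tuple of slices in (Z, Y, X) order, a region can be built with ordinary Python slices. The sketch below constructs a centered sub-volume and passes it to get_liconn_dataset. The helper `center_roi`, the wrapper `build_dataset`, and the example shapes are illustrative assumptions. The real volume shape would have to be read from the downloaded store.

```python
def center_roi(volume_shape, roi_shape):
    """Return a (Z, Y, X) tuple of slices selecting a centered sub-volume."""
    if any(r > v for r, v in zip(roi_shape, volume_shape)):
        raise ValueError(f"ROI shape {roi_shape} exceeds volume shape {volume_shape}")
    starts = [(v - r) // 2 for v, r in zip(volume_shape, roi_shape)]
    return tuple(slice(s, s + r) for s, r in zip(starts, roi_shape))


def build_dataset(path, volume_shape):
    """Create a dataset that samples patches only from the centered ROI.

    torch_em is imported lazily; the data must already be on disk
    because download=False is passed.
    """
    from torch_em.data.datasets.light_microscopy.liconn import get_liconn_dataset

    roi = center_roi(volume_shape, (128, 1024, 1024))
    return get_liconn_dataset(path, patch_shape=(32, 256, 256), roi=roi, download=False)


# Hypothetical volume shape, used only to show the resulting slices.
roi = center_roi((200, 2000, 2000), (128, 1024, 1024))
print(roi)
```

For the example shapes this yields slices 36:164 along Z and 488:1512 along Y and X.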
def get_liconn_loader(path: Union[os.PathLike, str], batch_size: int, patch_shape: Tuple[int, int, int], segmentation: str = 'proofread', roi: Optional[Tuple[slice, ...]] = None, download: bool = False, offsets: Optional[List[List[int]]] = None, boundaries: bool = False, **kwargs) -> DataLoader:
Get the DataLoader for the LICONN dataset for neuron instance segmentation.
Arguments:
- path: Filepath to a folder where the data will be saved.
- batch_size: The batch size for training.
- patch_shape: The patch shape to use for training.
- segmentation: Which segmentation variant to use. Either 'proofread' or 'agglomerated'.
- roi: Optional region-of-interest as a tuple of slices (Z, Y, X) restricting which part of the already-downloaded volume is used for patch sampling. The full volume is always downloaded regardless of this parameter.
- download: Whether to download the data if it is not present.
- offsets: Offset values for affinity computation used as target.
- boundaries: Whether to compute boundaries as the target.
- kwargs: Additional keyword arguments for torch_em.default_segmentation_dataset or for the PyTorch DataLoader.
Returns:
The DataLoader.
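A typical call might look like the sketch below, which trains on affinity targets by passing nearest-neighbor offsets. It assumes that keyword arguments not accepted by torch_em.default_segmentation_dataset (here num_workers and shuffle) are forwarded to the PyTorch DataLoader, as the kwargs description above suggests. The helpers `nearest_neighbor_offsets` and `build_loader`, as well as the chosen batch size and patch shape, are illustrative and not part of torch_em.

```python
def nearest_neighbor_offsets(ndim=3):
    """Offsets to the direct neighbor along each axis, e.g. [-1, 0, 0] for Z."""
    return [[-1 if i == j else 0 for j in range(ndim)] for i in range(ndim)]


def build_loader(path):
    """Create a LICONN DataLoader with affinity targets.

    torch_em is imported lazily; the data must already be on disk
    because download=False is passed.
    """
    from torch_em.data.datasets.light_microscopy.liconn import get_liconn_loader

    return get_liconn_loader(
        path,
        batch_size=2,
        patch_shape=(32, 256, 256),
        segmentation="proofread",
        offsets=nearest_neighbor_offsets(),
        download=False,
        num_workers=4,  # assumed to be forwarded to the PyTorch DataLoader
        shuffle=True,   # assumed to be forwarded to the PyTorch DataLoader
    )


print(nearest_neighbor_offsets())  # [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]
```

Passing boundaries=True instead of offsets would request boundary targets; the two options are separate parameters in the signature above.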