Source: kerchunk
Section: python
Priority: optional
Maintainer: Debian GIS Project
Uploaders: Antonio Valentino
Build-Depends: debhelper-compat (= 13),
               dh-sequence-python3,
               dh-sequence-sphinxdoc,
               pybuild-plugin-pyproject,
               python3-aiohttp,
               python3-all,
               python3-astropy,
               python3-dask,
               python3-cfgrib,
               python3-cftime,
               python3-eccodes,
               python3-fsspec,
               python3-h5netcdf,
               python3-h5py,
               python3-netcdf4,
               python3-numcodecs,
               python3-numpy,
               python3-numpydoc,
               python3-pytest,
               python3-scipy,
               python3-setuptools,
               python3-setuptools-scm,
               python3-sphinx,
               python3-sphinx-rtd-theme,
               python3-tifffile,
               python3-ujson,
               python3-xarray,
               python3-zarr
Standards-Version: 4.7.2
Testsuite: autopkgtest-pkg-pybuild
Homepage: https://github.com/fsspec/kerchunk
Vcs-Browser: https://salsa.debian.org/debian-gis-team/kerchunk
Vcs-Git: https://salsa.debian.org/debian-gis-team/kerchunk.git
Description: Cloud-friendly access to archival data
 Kerchunk is a library that provides a unified way to represent a
 variety of chunked, compressed data formats (e.g. NetCDF, HDF5, GRIB),
 allowing efficient access to the data from traditional file systems
 or cloud object storage. It also provides a flexible way to create
 virtual datasets from multiple files. It does this by extracting the
 byte ranges, compression information and other information about the
 data and storing this metadata in a new, separate object. This means
 that you can create a virtual aggregate dataset over potentially many
 source files, for efficient, parallel and cloud-friendly *in-situ*
 access without having to copy or translate the originals. It is a
 gateway to in-the-cloud massive data processing while the data
 providers still insist on using legacy formats for archival storage.
 .
 Features:
 .
  * completely serverless architecture
  * metadata consolidation, so you can understand a many-file dataset
    (metadata plus physical storage) in a single read
  * read from all of the storage backends supported by fsspec,
    including object storage (s3, gcs, abfs, alibaba), http, cloud
    user storage (dropbox, gdrive) and network protocols (ftp, ssh,
    hdfs, smb...)
  * loading of various file types (currently netcdf4/HDF, grib2, tiff,
    fits, zarr), potentially heterogeneous within a single dataset,
    without a need to go via the specific driver (e.g., no need for
    h5py)
  * asynchronous concurrent fetch of many data chunks in one go,
    amortizing the cost of latency
  * parallel access with a library like zarr without any locks
  * logical datasets viewing many (>~millions) data files, and direct
    access/subselection to them via coordinate indexing across an
    arbitrary number of dimensions

Package: python3-kerchunk
Architecture: all
Depends: ${python3:Depends},
         ${misc:Depends}
Recommends: python3-cfgrib,
            python3-cftime,
            python3-h5py,
            python3-scipy,
            python3-xarray
Suggests: python3-aiohttp,
          python3-dask,
          python3-netcdf4
Description: ${source:Synopsis}
 ${source:Extended-Description}

Package: python-kerchunk-doc
Section: doc
Architecture: all
Depends: ${sphinxdoc:Depends},
         ${misc:Depends}
Suggests: www-browser
Description: ${source:Synopsis} (documentation)
 ${source:Extended-Description}
 .
 This package provides the HTML documentation for kerchunk.