Source: llama.cpp
Section: science
Priority: optional
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Uploaders: Christian Kastner <ckk@debian.org>
Standards-Version: 4.7.2
Vcs-Browser: https://salsa.debian.org/deeplearning-team/llama.cpp
Vcs-Git: https://salsa.debian.org/deeplearning-team/llama.cpp.git
Homepage: https://github.com/ggml-org/llama.cpp/
# We could B-D on libc6 (>= 2.33) to ensure support for Hardware Capabilities,
# but with our install layout, a lack of support means that the baseline
# version will be used in such a case.
Build-Depends: cmake,
               debhelper-compat (= 13),
               libcurl4-openssl-dev,
               libggml-cpu,
               pkgconf,
Rules-Requires-Root: no

Package: llama.cpp
Architecture: any
Multi-Arch: foreign
Depends: libggml-cpu | libggml-backend,
         python3,
         ${misc:Depends},
         ${shlibs:Depends},
Description: LLM inference in C/C++
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in
 the cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON,
    Accelerate and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
    quantization for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD
    GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU
 backend is installed, but there are many other backends for CPUs and GPUs.