nltk - Debian Package Tracker

general

source: nltk (main)
version: 3.9.3-1
maintainer: Debian Science Maintainers (archive) (DMD)
uploaders: Mo Zhou [DMD]
arch: all
std-ver: 4.6.0.1
VCS: Git (Browse, QA)

versions [more versions can be listed by madison] [old versions available from snapshot.debian.org]

[pool directory]

o-o-stable: 3.5-1
oldstable: 3.8-1
stable: 3.9.1-2
testing: 3.9.3-1
unstable: 3.9.3-1

versioned links

3.5-1: [.dsc, use dget on this link to retrieve source package] [changelog] [copyright] [rules] [control]
3.8-1: [.dsc, use dget on this link to retrieve source package] [changelog] [copyright] [rules] [control]
3.9.1-2: [.dsc, use dget on this link to retrieve source package] [changelog] [copyright] [rules] [control]
3.9.3-1: [.dsc, use dget on this link to retrieve source package] [changelog] [copyright] [rules] [control]

binaries

python3-nltk (2 bugs: 0, 1, 1, 0)

action needed

A new upstream version is available: 3.9.4 [high]

A new upstream version, 3.9.4, is available; you should consider packaging it.

Created: 2026-03-27 Last update: 2026-05-22 00:31

3 security issues in sid [high]

There are 3 open security issues in sid.

3 important issues:

CVE-2026-33230: NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, `nltk.app.wordnet_app` contains a reflected cross-site scripting issue in the `lookup_...` route. A crafted `lookup_<payload>` URL can inject arbitrary HTML/JavaScript into the response page because attacker-controlled `word` data is reflected into HTML without escaping. This impacts users running the local WordNet Browser server and can lead to script execution in the browser origin of that application. Commit 1c3f799607eeb088cab2491dcf806ae83c29ad8f fixes the issue.
CVE-2026-33231: NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, `nltk.app.wordnet_app` allows unauthenticated remote shutdown of the local WordNet Browser HTTP server when it is started in its default mode. A simple `GET /SHUTDOWN%20THE%20SERVER` request causes the process to terminate immediately via `os._exit(0)`, resulting in a denial of service. Commit bbaae83db86a0f49e00f5b0db44a7254c268de9b patches the issue.
CVE-2026-33236: NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, the NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to arbitrary directory creation, arbitrary file creation, and arbitrary file overwrite. Commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a patches the issue.

Created: 2026-02-19 Last update: 2026-04-28 19:02

3 security issues in forky [high]

There are 3 open security issues in forky.

3 important issues:

CVE-2026-33230: NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, `nltk.app.wordnet_app` contains a reflected cross-site scripting issue in the `lookup_...` route. A crafted `lookup_<payload>` URL can inject arbitrary HTML/JavaScript into the response page because attacker-controlled `word` data is reflected into HTML without escaping. This impacts users running the local WordNet Browser server and can lead to script execution in the browser origin of that application. Commit 1c3f799607eeb088cab2491dcf806ae83c29ad8f fixes the issue.
CVE-2026-33231: NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, `nltk.app.wordnet_app` allows unauthenticated remote shutdown of the local WordNet Browser HTTP server when it is started in its default mode. A simple `GET /SHUTDOWN%20THE%20SERVER` request causes the process to terminate immediately via `os._exit(0)`, resulting in a denial of service. Commit bbaae83db86a0f49e00f5b0db44a7254c268de9b patches the issue.
CVE-2026-33236: NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, the NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to arbitrary directory creation, arbitrary file creation, and arbitrary file overwrite. Commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a patches the issue.
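
CVE-2026-33236 above concerns `subdir` and `id` values from a remote index being used in file paths without validation. The sketch below shows one common way to reject path traversal before touching the file system. It is an illustration under stated assumptions, not the patch from the referenced commit: the helper name `safe_join` and its parameters are hypothetical.

```python
import os


def safe_join(base_dir: str, subdir: str, package_id: str) -> str:
    """Hypothetical helper: join index-supplied names under base_dir.

    Raises ValueError if the resulting path would fall outside base_dir.
    """
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and resolves symlinks, so the
    # comparison below is made on the path that would actually be used.
    target = os.path.realpath(os.path.join(base, subdir, package_id))
    # commonpath equals `base` only when `target` is base itself or lies beneath it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes base directory: %r" % target)
    return target
```

A value such as `../..` for `subdir`, or an absolute path, resolves outside `base_dir` and is rejected before any directory or file is created.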

Created: 2026-02-19 Last update: 2026-04-28 19:02

5 security issues in buster [high]

There are 5 open security issues in buster.

1 important issue:

CVE-2024-39705: NLTK through 3.8.1 allows remote code execution if untrusted packages have pickled Python code, and the integrated data package download functionality is used. This affects, for example, averaged_perceptron_tagger and punkt.

4 issues postponed or untriaged:

CVE-2021-3828: (needs triaging) nltk is vulnerable to Inefficient Regular Expression Complexity
CVE-2021-3842: (needs triaging) nltk is vulnerable to Inefficient Regular Expression Complexity
CVE-2019-14751: (needs triaging) NLTK Downloader before 3.4.5 is vulnerable to a directory traversal, allowing attackers to write arbitrary files via a ../ (dot dot slash) in an NLTK package (ZIP archive) that is mishandled during extraction.
CVE-2021-43854: (needs triaging) NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. Versions prior to 3.6.5 are vulnerable to regular expression denial of service (ReDoS) attacks. The vulnerability is present in PunktSentenceTokenizer, sent_tokenize and word_tokenize. Any users of this class, or these two functions, are vulnerable to the ReDoS attack. In short, a specifically crafted long input to any of these vulnerable functions will cause them to take a significant amount of execution time. If your program relies on any of the vulnerable functions for tokenizing unpredictable user input, then we would strongly recommend upgrading to a version of NLTK without the vulnerability. For users unable to upgrade the execution time can be bounded by limiting the maximum length of an input to any of the vulnerable functions. Our recommendation is to implement such a limit.
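
The CVE-2021-43854 advisory above recommends bounding execution time by limiting the maximum input length when upgrading is not possible. A minimal sketch of such a limit follows. The wrapper name `bounded_tokenize` and the value of `MAX_CHARS` are assumptions chosen for illustration; the advisory does not specify a limit.

```python
from typing import Callable, List

# Assumed limit for illustration only; choose a value that fits the application.
MAX_CHARS = 10_000


def bounded_tokenize(
    text: str,
    tokenize: Callable[[str], List[str]],
    max_chars: int = MAX_CHARS,
) -> List[str]:
    """Hypothetical wrapper: refuse over-long input before tokenizing it."""
    if len(text) > max_chars:
        raise ValueError(
            "input of %d characters exceeds limit of %d" % (len(text), max_chars)
        )
    return tokenize(text)
```

With NLTK installed, a call would look like `bounded_tokenize(user_text, nltk.word_tokenize)`. Passing the tokenizer as an argument keeps the length check independent of which of the affected functions is used.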

Created: 2024-06-28 Last update: 2024-06-28 15:00

lintian reports 1 warning [normal]

Lintian reports 1 warning about this package. You should make the package lintian-clean by resolving it.

Created: 2023-06-18 Last update: 2023-06-18 17:34

7 low-priority security issues in trixie [low]

There are 7 open security issues in trixie.

7 issues left for the package maintainer to handle:

CVE-2026-0846: (needs triaging) A vulnerability in the `filestring()` function of the `nltk.util` module in nltk version 3.9.2 allows arbitrary file read due to improper validation of input paths. The function directly opens files specified by user input without sanitization, enabling attackers to access sensitive system files by providing absolute paths or traversal paths. This vulnerability can be exploited locally or remotely, particularly in scenarios where the function is used in web APIs or other interfaces that accept user-supplied input.
CVE-2026-0847: (needs triaging) A vulnerability in NLTK versions up to and including 3.9.2 allows arbitrary file read via path traversal in multiple CorpusReader classes, including WordListCorpusReader, TaggedCorpusReader, and BracketParseCorpusReader. These classes fail to properly sanitize or validate file paths, enabling attackers to traverse directories and access sensitive files on the server. This issue is particularly critical in scenarios where user-controlled file inputs are processed, such as in machine learning APIs, chatbots, or NLP pipelines. Exploitation of this vulnerability can lead to unauthorized access to sensitive files, including system files, SSH private keys, and API tokens, and may potentially escalate to remote code execution when combined with other vulnerabilities.
CVE-2026-0848: (needs triaging) NLTK versions <=3.9.2 are vulnerable to arbitrary code execution due to improper input validation in the StanfordSegmenter module. The module dynamically loads external Java .jar files without verification or sandboxing. An attacker can supply or replace the JAR file, enabling the execution of arbitrary Java bytecode at import time. This vulnerability can be exploited through methods such as model poisoning, MITM attacks, or dependency poisoning, leading to remote code execution. The issue arises from the direct execution of the JAR file via subprocess with unvalidated classpath input, allowing malicious classes to execute when loaded by the JVM.
CVE-2025-14009: (needs triaging) A critical vulnerability exists in the NLTK downloader component of nltk/nltk, affecting all versions. The _unzip_iter function in nltk/downloader.py uses zipfile.extractall() without performing path validation or security checks. This allows attackers to craft malicious zip packages that, when downloaded and extracted by NLTK, can execute arbitrary code. The vulnerability arises because NLTK assumes all downloaded packages are trusted and extracts them without validation. If a malicious package contains Python files, such as __init__.py, these files are executed automatically upon import, leading to remote code execution. This issue can result in full system compromise, including file system access, network access, and potential persistence mechanisms.
CVE-2026-33230: (needs triaging) NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, `nltk.app.wordnet_app` contains a reflected cross-site scripting issue in the `lookup_...` route. A crafted `lookup_<payload>` URL can inject arbitrary HTML/JavaScript into the response page because attacker-controlled `word` data is reflected into HTML without escaping. This impacts users running the local WordNet Browser server and can lead to script execution in the browser origin of that application. Commit 1c3f799607eeb088cab2491dcf806ae83c29ad8f fixes the issue.
CVE-2026-33231: (needs triaging) NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, `nltk.app.wordnet_app` allows unauthenticated remote shutdown of the local WordNet Browser HTTP server when it is started in its default mode. A simple `GET /SHUTDOWN%20THE%20SERVER` request causes the process to terminate immediately via `os._exit(0)`, resulting in a denial of service. Commit bbaae83db86a0f49e00f5b0db44a7254c268de9b patches the issue.
CVE-2026-33236: (needs triaging) NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, the NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to arbitrary directory creation, arbitrary file creation, and arbitrary file overwrite. Commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a patches the issue.
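
CVE-2025-14009 above describes archives being extracted without any inspection of their contents. The sketch below shows one kind of pre-extraction check: listing members whose names are absolute or contain `..` components. It is an illustration only, not NLTK's code; the function name `unsafe_members` is hypothetical. It does not address the other concern raised in that entry (Python files inside a package being imported later), which would need a separate policy.

```python
import zipfile
from pathlib import PurePosixPath
from typing import List


def unsafe_members(zf: zipfile.ZipFile) -> List[str]:
    """Hypothetical check: names of members with absolute paths or '..' parts."""
    bad = []
    for name in zf.namelist():
        # Treat backslashes as separators too, so both styles are checked.
        path = PurePosixPath(name.replace("\\", "/"))
        if path.is_absolute() or ".." in path.parts:
            bad.append(name)
    return bad
```

A caller could refuse to extract any archive for which this returns a non-empty list, and only then call `extractall()` on the remaining, inspected archives.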

You can find information about how to handle these issues in the security team's documentation.

Created: 2026-02-19 Last update: 2026-04-28 19:02

8 low-priority security issues in bookworm [low]

There are 8 open security issues in bookworm.

8 issues left for the package maintainer to handle:

CVE-2026-0846: (needs triaging) A vulnerability in the `filestring()` function of the `nltk.util` module in nltk version 3.9.2 allows arbitrary file read due to improper validation of input paths. The function directly opens files specified by user input without sanitization, enabling attackers to access sensitive system files by providing absolute paths or traversal paths. This vulnerability can be exploited locally or remotely, particularly in scenarios where the function is used in web APIs or other interfaces that accept user-supplied input.
CVE-2026-0847: (needs triaging) A vulnerability in NLTK versions up to and including 3.9.2 allows arbitrary file read via path traversal in multiple CorpusReader classes, including WordListCorpusReader, TaggedCorpusReader, and BracketParseCorpusReader. These classes fail to properly sanitize or validate file paths, enabling attackers to traverse directories and access sensitive files on the server. This issue is particularly critical in scenarios where user-controlled file inputs are processed, such as in machine learning APIs, chatbots, or NLP pipelines. Exploitation of this vulnerability can lead to unauthorized access to sensitive files, including system files, SSH private keys, and API tokens, and may potentially escalate to remote code execution when combined with other vulnerabilities.
CVE-2026-0848: (needs triaging) NLTK versions <=3.9.2 are vulnerable to arbitrary code execution due to improper input validation in the StanfordSegmenter module. The module dynamically loads external Java .jar files without verification or sandboxing. An attacker can supply or replace the JAR file, enabling the execution of arbitrary Java bytecode at import time. This vulnerability can be exploited through methods such as model poisoning, MITM attacks, or dependency poisoning, leading to remote code execution. The issue arises from the direct execution of the JAR file via subprocess with unvalidated classpath input, allowing malicious classes to execute when loaded by the JVM.
CVE-2024-39705: (postponed; to be fixed through a stable update) NLTK through 3.8.1 allows remote code execution if untrusted packages have pickled Python code, and the integrated data package download functionality is used. This affects, for example, averaged_perceptron_tagger and punkt.
CVE-2025-14009: (needs triaging) A critical vulnerability exists in the NLTK downloader component of nltk/nltk, affecting all versions. The _unzip_iter function in nltk/downloader.py uses zipfile.extractall() without performing path validation or security checks. This allows attackers to craft malicious zip packages that, when downloaded and extracted by NLTK, can execute arbitrary code. The vulnerability arises because NLTK assumes all downloaded packages are trusted and extracts them without validation. If a malicious package contains Python files, such as __init__.py, these files are executed automatically upon import, leading to remote code execution. This issue can result in full system compromise, including file system access, network access, and potential persistence mechanisms.
CVE-2026-33230: (needs triaging) NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, `nltk.app.wordnet_app` contains a reflected cross-site scripting issue in the `lookup_...` route. A crafted `lookup_<payload>` URL can inject arbitrary HTML/JavaScript into the response page because attacker-controlled `word` data is reflected into HTML without escaping. This impacts users running the local WordNet Browser server and can lead to script execution in the browser origin of that application. Commit 1c3f799607eeb088cab2491dcf806ae83c29ad8f fixes the issue.
CVE-2026-33231: (needs triaging) NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, `nltk.app.wordnet_app` allows unauthenticated remote shutdown of the local WordNet Browser HTTP server when it is started in its default mode. A simple `GET /SHUTDOWN%20THE%20SERVER` request causes the process to terminate immediately via `os._exit(0)`, resulting in a denial of service. Commit bbaae83db86a0f49e00f5b0db44a7254c268de9b patches the issue.
CVE-2026-33236: (needs triaging) NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. In versions 3.9.3 and prior, the NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to arbitrary directory creation, arbitrary file creation, and arbitrary file overwrite. Commit 89fe2ec2c6bae6e2e7a46dad65cc34231976ed8a patches the issue.

You can find information about how to handle these issues in the security team's documentation.

Created: 2024-06-28 Last update: 2026-04-28 19:02

Standards version of the package is outdated. [wishlist]

The package should be updated to follow the latest version of Debian Policy (Standards-Version 4.7.4 instead of 4.6.0.1).

Created: 2022-05-11 Last update: 2026-03-31 15:01

news

[rss feed]

[2026-03-13] nltk 3.9.3-1 MIGRATED to testing (Debian testing watch)
[2026-03-07] Accepted nltk 3.9.3-1 (source) into unstable (Mo Zhou)
[2026-02-20] nltk 3.9.2-1 MIGRATED to testing (Debian testing watch)
[2026-02-14] Accepted nltk 3.9.2-1 (source) into unstable (Mo Zhou)
[2024-10-18] nltk 3.9.1-2 MIGRATED to testing (Debian testing watch)
[2024-10-12] Accepted nltk 3.9.1-2 (source) into unstable (Santiago Vila)
[2024-10-07] nltk 3.9.1-1 MIGRATED to testing (Debian testing watch)
[2024-10-02] Accepted nltk 3.9.1-1 (source) into unstable (Mo Zhou)
[2023-06-23] nltk 3.8.1-1 MIGRATED to testing (Debian testing watch)
[2023-06-18] Accepted nltk 3.8.1-1 (source) into unstable (Mo Zhou)
[2022-12-20] nltk 3.8-1 MIGRATED to testing (Debian testing watch)
[2022-12-14] Accepted nltk 3.8-1 (source) into unstable (Mo Zhou)
[2022-02-22] nltk 3.7-1 MIGRATED to testing (Debian testing watch)
[2022-02-16] Accepted nltk 3.7-1 (source) into unstable (Mo Zhou)
[2022-01-12] nltk 3.6.7-1 MIGRATED to testing (Debian testing watch)
[2022-01-06] Accepted nltk 3.6.7-1 (source) into unstable (Mo Zhou)
[2021-11-19] nltk 3.6.5-1 MIGRATED to testing (Debian testing watch)
[2021-11-13] Accepted nltk 3.6.5-1 (source) into unstable (Mo Zhou)
[2020-04-28] nltk 3.5-1 MIGRATED to testing (Debian testing watch)
[2020-04-23] Accepted nltk 3.5-1 (source) into unstable (Mo Zhou)
[2019-12-27] nltk 3.4.5-2 MIGRATED to testing (Debian testing watch)
[2019-12-22] Accepted nltk 3.4.5-2 (source) into unstable (Mo Zhou) (signed by: Zhou Mo)
[2019-08-30] nltk 3.4.5-1 MIGRATED to testing (Debian testing watch)
[2019-08-24] Accepted nltk 3.4.5-1 (source) into unstable (Mo Zhou) (signed by: Zhou Mo)
[2019-07-27] nltk 3.4.3-1 MIGRATED to testing (Debian testing watch)
[2019-07-22] Accepted nltk 3.4.3-1 (source) into unstable (Mo Zhou) (signed by: Zhou Mo)
[2019-05-01] Accepted nltk 3.4.1-1 (source) into experimental (Mo Zhou) (signed by: Zhou Mo)
[2018-11-28] nltk 3.4-1 MIGRATED to testing (Debian testing watch)
[2018-11-23] Accepted nltk 3.4-1 (source) into unstable (Mo Zhou) (signed by: Zhou Mo)
[2018-06-02] nltk 3.3.0-1 MIGRATED to testing (Debian testing watch)

bugs [bug history graph]

all: 5
RC: 0
I&N: 4
M&W: 1
F&P: 0
patch: 0

links

homepage
lintian (0, 1)
buildd: logs, reproducibility
popcon
browse source code
other distros
security tracker

ubuntu

[Information about Ubuntu for Debian Developers]

version: 3.9.2-1