distro_tracker.core.utils

Various utilities for the distro-tracker project.

distro_tracker.core.utils.get_or_none(model, **kwargs)[source]

Gets a Django Model object from the database or returns None if it does not exist.

distro_tracker.core.utils.distro_tracker_render_to_string(template_name, context=None)[source]

A custom function to render a template to a string which injects extra distro-tracker specific information to the context, such as the name of the derivative.

This function is necessary because Django’s TEMPLATE_CONTEXT_PROCESSORS only take effect when a template is rendered with a RequestContext, whereas this function can be called independently of any HTTP request.

distro_tracker.core.utils.render_to_json_response(response)[source]

Helper function creating an HttpResponse by serializing the given response object to a JSON string.

The resulting HTTP response has Content-Type set to application/json.

Parameters

response – The object to be serialized in the response. It must be serializable by the json module.

Return type

HttpResponse

class distro_tracker.core.utils.PrettyPrintList(the_list=None, delimiter=' ')[source]

Bases: object

A class which wraps the built-in list object so that when it is converted to a string, its contents are printed using the given delimiter.

The default delimiter is a space.

>>> a = PrettyPrintList([1, 2, 3])
>>> print(a)
1 2 3
>>> print(PrettyPrintList([u'one', u'2', u'3']))
one 2 3
>>> print(PrettyPrintList([1, 2, 3], delimiter=', '))
1, 2, 3
>>> # Still acts as a list
>>> a == [1, 2, 3]
True
>>> a == ['1', '2', '3']
False
class distro_tracker.core.utils.SpaceDelimitedTextField(*args, db_collation=None, **kwargs)[source]

Bases: django.db.models.fields.TextField

A custom Django model field which stores a list of strings.

It stores the list in a TextField as a space delimited list. It is marshalled back to a PrettyPrintList in the Python domain.

description = 'Stores a space delimited list of strings'
from_db_value(value, expression, connection)[source]
to_python(value)[source]

Convert the input value into the expected Python data type, raising django.core.exceptions.ValidationError if the data can’t be converted. Return the converted value. Subclasses should override this.

get_prep_value(value, **kwargs)[source]

Perform preliminary non-db specific value checks and conversions.

get_db_prep_value(value, **kwargs)[source]

Return field’s value prepared for interacting with the database backend.

Used by the default implementations of get_db_prep_save().

value_to_string(obj)[source]

Return a string value of this field from the passed obj. This is used by the serialization framework.
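The conversion between the stored text and the Python list can be pictured with two plain functions. This is a minimal sketch of the idea only, not the field's actual code; `to_db_text` and `from_db_text` are hypothetical names introduced for illustration.

```python
def to_db_text(values):
    """Join a list of strings into the space-delimited form stored in the column."""
    return ' '.join(str(value) for value in values)


def from_db_text(text):
    """Split the stored text back into a list; empty or missing text gives []."""
    if not text:
        return []
    return text.split()


stored = to_db_text(['amd64', 'i386', 'armhf'])
restored = from_db_text(stored)
```

One consequence of this representation is that an item which itself contains a space cannot survive the round trip, since it would be split into two items when read back.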

distro_tracker.core.utils.VCS_SHORTHAND_TO_NAME = {'bzr': 'Bazaar', 'cvs': 'CVS', 'darcs': 'Darcs', 'git': 'Git', 'hg': 'Mercurial', 'mtn': 'Monotone', 'svn': 'Subversion'}

A map of currently available VCS systems’ shorthands to their names.

distro_tracker.core.utils.get_vcs_name(shorthand)[source]

Returns a full name for the VCS given its shorthand.

If the given shorthand is unknown an empty string is returned.

Parameters

shorthand – The shorthand of a VCS for which a name is required.

Return type

string
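The lookup described above amounts to a dictionary access with an empty-string default. The sketch below reproduces the documented mapping under hypothetical names (`VCS_NAMES`, `vcs_name`); it is an illustration, not the module's source.

```python
# Copy of the documented shorthand-to-name mapping.
VCS_NAMES = {
    'bzr': 'Bazaar', 'cvs': 'CVS', 'darcs': 'Darcs', 'git': 'Git',
    'hg': 'Mercurial', 'mtn': 'Monotone', 'svn': 'Subversion',
}


def vcs_name(shorthand):
    """Return the full VCS name, or an empty string for an unknown shorthand."""
    return VCS_NAMES.get(shorthand, '')


known = vcs_name('hg')
unknown = vcs_name('fossil')
```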

distro_tracker.core.utils.verify_signature(content)[source]

The function extracts any possible signature information found in the given content.

Uses the DISTRO_TRACKER_KEYRING_DIRECTORY setting to access the keyring. If this setting does not exist, no signatures can be validated.

Returns

Information about the signers of the content as a list or None if there is no (valid) signature.

Return type

list of (name, email) pairs or None

distro_tracker.core.utils.now(tz=datetime.timezone.utc)[source]

Returns the current timestamp in the requested timezone (UTC by default) and can be easily mocked out for tests.

distro_tracker.core.utils.get_developer_information_url(email)[source]

Returns the URL of the developer’s information page, looked up from the developer’s email address through a vendor-specific function.

distro_tracker.core.utils.add_developer_extras(general, url_only=False)[source]

Receives a general dict with package data and adds to it more data regarding that package’s developers.

distro_tracker.core.utils.compression

Utilities for handling compression

distro_tracker.core.utils.compression.guess_compression_method(filepath)[source]

Given filepath, tries to determine the compression of the file.

distro_tracker.core.utils.compression.get_uncompressed_stream(input_stream, compression='auto', text=False, encoding='utf-8')[source]

Returns a file-like object (aka stream) providing an uncompressed version of the content read on the input stream provided.

Parameters
  • input_stream – The file-like object providing compressed data.

  • compression (str) – The compression type. Specify “auto” to let the function guess it out of the associated filename (the input_stream needs to have a name attribute, otherwise a ValueError is raised).

  • text (boolean) – If True, open the stream as a text stream.

  • encoding (str) – Encoding to use to decode the text.
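A rough picture of how the "auto" mode could work using only the standard library is sketched below. The extension table, the fallback for unknown extensions, and the names `OPENERS`, `open_uncompressed` and `NamedBytesIO` are assumptions made for this example; the real function may recognise other formats and behave differently.

```python
import bz2
import gzip
import io
import lzma
import os

# Assumed mapping from file extension to the stdlib opener for that format.
OPENERS = {'.gz': gzip.open, '.bz2': bz2.open, '.xz': lzma.open}


def open_uncompressed(input_stream, text=False, encoding='utf-8'):
    """Return a stream of decompressed data, guessing from input_stream.name."""
    name = getattr(input_stream, 'name', None)
    if name is None:
        raise ValueError('cannot guess compression: stream has no name')
    opener = OPENERS.get(os.path.splitext(name)[1])
    if opener is None:
        # Unknown extension: treat the data as uncompressed (an assumption).
        if text:
            return io.TextIOWrapper(input_stream, encoding=encoding)
        return input_stream
    if text:
        return opener(input_stream, mode='rt', encoding=encoding)
    return opener(input_stream, mode='rb')


class NamedBytesIO(io.BytesIO):
    """In-memory stream with a name attribute, standing in for a real file."""
    name = 'Sources.gz'


compressed = NamedBytesIO(gzip.compress(b'Package: dpkg\n'))
content = open_uncompressed(compressed, text=True).read()
```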

distro_tracker.core.utils.compression.get_compressor_factory(compression)[source]

Returns a function that creates a file-like object used to compress data. The returned function has the same API as gzip.open, lzma.open and bz2.open. Pass mode='wb' or mode='wt' to the returned function to use it in write mode.

compressor_factory = get_compressor_factory("xz")
compressor = compressor_factory(path, mode="wb")
compressor.write(b"Test")
compressor.close()
Parameters

compression (str) – The compression method to use.

distro_tracker.core.utils.email_messages

Module including some utility functions and classes for manipulating email.

distro_tracker.core.utils.email_messages.extract_email_address_from_header(header)[source]

Extracts the email address from the From email header.

>>> str(extract_email_address_from_header('Real Name <foo@domain.com>'))
'foo@domain.com'
>>> str(extract_email_address_from_header('foo@domain.com'))
'foo@domain.com'
distro_tracker.core.utils.email_messages.name_and_address_from_string(content)[source]

Takes an address in almost-RFC822 format and turns it into a dict {‘name’: real_name, ‘email’: email_address}

The difference with email.utils.parseaddr and rfc822.parseaddr is that this routine allows unquoted commas to appear in the real name (in violation of RFC822).
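A strict RFC 822 parser reads an unquoted comma as the boundary between two addresses, so a name such as "Doe, John" gets cut apart. The sketch below shows one lenient approach: everything before the final angle-bracketed part is taken as the name. The regular expression and the name `lenient_name_and_address` are assumptions for illustration, not the routine's real implementation.

```python
import re

# Name (possibly containing commas) followed by <address>.
_ANGLE_ADDR = re.compile(r'^\s*(?P<name>.*?)\s*<(?P<email>[^<>\s]+)>\s*$')


def lenient_name_and_address(content):
    """Return {'name': ..., 'email': ...}, tolerating commas in the name."""
    match = _ANGLE_ADDR.match(content)
    if match:
        name = match.group('name').strip('"')
        return {'name': name, 'email': match.group('email')}
    # No angle brackets: treat the whole string as a bare address.
    return {'name': '', 'email': content.strip()}


parsed = lenient_name_and_address('Doe, John <john@example.com>')
bare = lenient_name_and_address('foo@domain.com')
```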

distro_tracker.core.utils.email_messages.names_and_addresses_from_string(content)[source]

Takes a string with addresses in RFC822 format and returns a list of dicts {‘name’: real_name, ‘email’: email_address}. It tries to be forgiving about unquoted commas in addresses.

distro_tracker.core.utils.email_messages.get_decoded_message_payload(message, default_charset='utf-8')[source]

Extracts the payload of the given email.message.Message and returns it decoded based on the Content-Transfer-Encoding and charset.
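The two decoding steps (transfer encoding first, then charset) can be sketched with the standard email package. This is an illustrative approximation for non-multipart messages; the fallback behaviour and the name `decoded_payload` are assumptions.

```python
import base64
import email


def decoded_payload(message, default_charset='utf-8'):
    """Decode a non-multipart message body to text."""
    # decode=True undoes the Content-Transfer-Encoding (base64, quoted-printable).
    raw = message.get_payload(decode=True)
    if raw is None:
        # Multipart containers have no payload of their own.
        return None
    charset = message.get_content_charset(default_charset)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: fall back to the default, replacing bad bytes.
        return raw.decode(default_charset, errors='replace')


body = base64.b64encode('café'.encode('iso-8859-1'))
raw_message = (
    b'Content-Type: text/plain; charset=iso-8859-1\n'
    b'Content-Transfer-Encoding: base64\n'
    b'\n' + body + b'\n'
)
text = decoded_payload(email.message_from_bytes(raw_message))
```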

distro_tracker.core.utils.email_messages.patch_message_for_django_compat(message)[source]

Live patch the email.message.Message object passed as parameter so that:

  • the as_string() method returns the same set of bytes it has been parsed from (to preserve as much as possible of the original message)

  • an as_bytes() method is added too (this method is expected by Django’s SMTP backend)

distro_tracker.core.utils.email_messages.message_from_bytes(message_bytes)[source]

Returns a live-patched email.Message object from the given bytes.

The changes ensure that parsing the message’s bytes with this method and then returning them by using the returned object’s as_string method is an idempotent operation.

An as_bytes method is also created since Django’s SMTP backend relies on this method (which is usually brought by its own django.core.mail.SafeMIMEText object but that we don’t use in our CustomEmailMessage).

distro_tracker.core.utils.email_messages.get_message_body(msg)[source]

Returns the message body, joining together all parts into one string.

Parameters

msg (email.message.Message) – The original received package message

class distro_tracker.core.utils.email_messages.CustomEmailMessage(msg=None, *args, **kwargs)[source]

Bases: django.core.mail.message.EmailMessage

A subclass of django.core.mail.EmailMessage which can be fed an email.message.Message instance to define the body of the message.

If msg is set, the body attribute is ignored.

If the user wants to attach additional parts to the message, the attach() method can be used but the user must ensure that the given msg instance is a multipart message before doing so.

Effectively, this is also a wrapper which allows sending instances of email.message.Message via Django email backends.

message()[source]

Returns the underlying email.message.Message object. In case the user did not set a msg attribute for this instance the parent EmailMessage.message method is used.

distro_tracker.core.utils.email_messages.decode_header(header, default_encoding='utf-8')[source]

Decodes an email message header and returns it coded as a unicode string.

This is necessary since it is possible that a header is made of multiple differently encoded parts which makes email.header.decode_header() insufficient.
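To see why an extra step is needed: email.header.decode_header() returns a list of (value, charset) pairs rather than a single string, so the parts still have to be decoded individually and joined. The sketch below does that; `header_to_text` is a hypothetical name and the error handling is an assumption.

```python
from email.header import decode_header


def header_to_text(header, default_encoding='utf-8'):
    """Join all parts of a possibly multi-charset header into one string."""
    parts = []
    for value, charset in decode_header(header):
        if isinstance(value, bytes):
            # Each part carries its own charset; None means "not encoded".
            value = value.decode(charset or default_encoding, errors='replace')
        parts.append(value)
    return ''.join(parts)


subject = header_to_text('=?utf-8?q?Caf=C3=A9?= =?iso-8859-1?q?_au_lait?=')
plain = header_to_text('Hello')
```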

distro_tracker.core.utils.email_messages.unfold_header(header)[source]

Unfolding is the process of removing the line wrapping added by mail agents. A header is a single logical line; its value is not allowed to span multiple lines.

We need to unfold their values in particular when we want to reuse the values to compose a reply message as Python’s email API chokes on those newline characters.

If header is None, the return value is None as well.

Parameters

header – the header value to unfold

Returns

the unfolded version of the header.

Return type

str
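A folded header continues on the next line with leading whitespace, so unfolding can be approximated by deleting each line break that is followed by a space or tab. The following is a sketch under that assumption, with `unfold` as a hypothetical stand-in for the documented function.

```python
import re


def unfold(header):
    """Remove folding line breaks from a header value; None stays None."""
    if header is None:
        return None
    # Drop the line break but keep the whitespace that followed it.
    return re.sub(r'\r?\n(?=[ \t])', '', header)


folded = 'Subject: a rather long subject\r\n that was wrapped by a mail agent'
unfolded = unfold(folded)
```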

distro_tracker.core.utils.http

Utilities for handling HTTP resource access.

distro_tracker.core.utils.http.parse_cache_control_header(header)[source]

Parses the given Cache-Control header’s values.

Returns

The key-value pairs found in the header. If some key did not have an associated value in the header, None is used instead.

Return type

dict
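The shape of the returned dict can be illustrated with a small parser. This sketch splits on commas and does not handle commas inside quoted values; lowercasing the keys and stripping quotes are assumptions, and `parse_cache_control` is a hypothetical name.

```python
def parse_cache_control(header):
    """Map each Cache-Control directive to its value, or None if it has none."""
    directives = {}
    for item in header.split(','):
        item = item.strip()
        if not item:
            continue
        key, separator, value = item.partition('=')
        if separator:
            directives[key.strip().lower()] = value.strip().strip('"')
        else:
            directives[key.strip().lower()] = None
    return directives


parsed = parse_cache_control('max-age=3600, must-revalidate')
```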

class distro_tracker.core.utils.http.HttpCache(cache_directory_path, url_to_cache_path=None)[source]

Bases: object

A class providing an interface to a cache of HTTP responses.

is_expired(url)[source]

If the cached response for the given URL is expired based on Cache-Control or Expires headers, returns True.

get_content_stream(url, compression='auto', text=False)[source]

Returns a file-like object that reads the cached copy of the given URL.

If the file is compressed, the file-like object will read the decompressed stream.

get_content(url, compression='auto')[source]

Returns the content of the cached response for the given URL.

If the file is compressed, it is decompressed; otherwise it is treated as a plain file.

Parameters

compression (str) – Specifies the compression method used to generate the resource, and thus the compression method one should use to decompress it.

Return type

bytes

get_headers(url)[source]

Returns the HTTP headers of the cached response for the given URL.

Return type

dict

remove(url)[source]

Removes the cached response for the given URL.

update(url, force=False, invalidate_cache=True)[source]

Performs an update of the cached resource. This means that it validates that its most current version is found in the cache by doing a conditional GET request.

Parameters

force – To force the method to perform a full GET request, set the parameter to True

Returns

The original HTTP response and a Boolean indicating whether the cached value was updated.

Return type

two-tuple of (requests.Response, Boolean)

url_to_cache_path(url)[source]

Transforms an arbitrary URL into a relative path within the cache directory. Can be overridden by supplying a custom implementation through the url_to_cache_path argument of __init__().

Parameters

url (str) – The URL to be cached.

Returns

A relative path within the cache directory, used to store a copy of the resource.

distro_tracker.core.utils.http.get_resource_content(url, cache=None, compression='auto', only_if_updated=False, force_update=False, ignore_network_failures=False, ignore_http_error=None)[source]

A helper function which returns the content of the resource found at the given URL.

If the resource is already cached in the cache object and the cached content has not expired, the function will not do any HTTP requests and will return the cached content.

If the resource is stale or not cached at all, it is retrieved from the Web.

If the HTTP request returned an error code, the requests module will raise a requests.exceptions.HTTPError.

In case of network failures, some IOError exception will be raised unless ignore_network_failures is set to True.

Parameters
  • url (str) – The URL of the resource to be retrieved

  • cache (HttpCache or an object with an equivalent interface) – A cache object which should be used to look up and store the cached resource. If it is not provided, an instance of HttpCache with a DISTRO_TRACKER_CACHE_DIRECTORY cache directory is used.

  • compression (str) – Specifies the compression method used to generate the resource, and thus the compression method one should use to decompress it. If auto, then guess it from the url file extension.

  • only_if_updated (bool) – if set to True returns None when no update is done. Otherwise, returns the content in any case.

  • force_update (bool) – if set to True, do a new HTTP request even if we have non-expired data in the cache.

  • ignore_network_failures (bool) – if set to True, then the function will return None in case of network failures and not raise any exception.

  • ignore_http_error (int) – if the request results in an HTTP error with the given status code, then the error is ignored, no exception is raised, and None is returned.

Returns

The bytes representation of the resource found at the given url

Return type

bytes

distro_tracker.core.utils.http.get_resource_text(*args, **kwargs)[source]

Clone of get_resource_content() which transparently decodes the downloaded content into text. It supports the same parameters and adds the encoding parameter.

Parameters

encoding (str) – Specifies an encoding to decode the resource content.

Returns

The textual representation of the resource found at the given url.

Return type

str

distro_tracker.core.utils.http.safe_redirect(to, fallback, allowed_hosts=None)[source]

Redirects to the URL given in to if that URL is safe; otherwise redirects to fallback. allowed_hosts describes the list of valid hosts for the call to django.utils.http.url_has_allowed_host_and_scheme().

Parameters
  • to (str or None) – The URL that one should be returned to.

  • fallback (str) – A safe URL to fall back on if to isn’t safe. WARNING! This URL is NOT checked! Developers should pass only a URL they know to be safe.

  • allowed_hosts (list of str) – A list of “safe” hosts. If None, relies on the default behaviour of django.utils.http.url_has_allowed_host_and_scheme().

Returns

A redirect response instance pointing to either to or fallback.

Return type

django.http.HttpResponseRedirectBase

distro_tracker.core.utils.linkify

Module including some utility functions to inject links in plain text.

class distro_tracker.core.utils.linkify.Linkify[source]

Bases: object

A base class representing ways to inject useful links in plain text data

If you want to recognize a new syntax where links could provide value to a view of the content, just create a subclass and implement the linkify method.

static linkify(text)[source]
Parameters

text – the text where we should inject HTML links

Returns

the text formatted with HTML links

Return type

str

plugins = [<class 'distro_tracker.core.utils.linkify.LinkifyHttpLinks'>, <class 'distro_tracker.core.utils.linkify.LinkifyDebianBugLinks'>, <class 'distro_tracker.core.utils.linkify.LinkifyUbuntuBugLinks'>, <class 'distro_tracker.core.utils.linkify.LinkifyCVELinks'>]
classmethod unregister_plugin()
class distro_tracker.core.utils.linkify.LinkifyHttpLinks

Bases: distro_tracker.core.utils.linkify.Linkify

Detect http:// and https:// URLs and transform them in true HTML links.

static linkify(text)[source]
Parameters

text – the text where we should inject HTML links

Returns

the text formatted with HTML links

Return type

str

classmethod unregister_plugin()
class distro_tracker.core.utils.linkify.LinkifyDebianBugLinks

Bases: distro_tracker.core.utils.linkify.Linkify

Detect “Closes: #123, 234” syntax used in Debian changelogs to close bugs and inject HTML links to the corresponding bug tracker entry. Also handles the “Closes: 123 456” fields of .changes files.

close_prefix = 'Closes:'
close_field = 'Closes:'
bug_url = 'https://bugs.debian.org/'
classmethod linkify(text)[source]
Parameters

text – the text where we should inject HTML links

Returns

the text formatted with HTML links

Return type

str

classmethod unregister_plugin()
class distro_tracker.core.utils.linkify.LinkifyUbuntuBugLinks

Bases: distro_tracker.core.utils.linkify.LinkifyDebianBugLinks

Detect “LP: #123, 234” syntax used in Ubuntu changelogs to close bugs and inject HTML links to the corresponding bug tracker entry.

close_prefix = 'LP:'
close_field = 'Launchpad-Bugs-Fixed:'
bug_url = 'https://bugs.launchpad.net/bugs/'
classmethod unregister_plugin()
class distro_tracker.core.utils.linkify.LinkifyCVELinks

Bases: distro_tracker.core.utils.linkify.Linkify

Detect “CVE-2014-1234” words and transform them into links to the CVE tracker at cve.mitre.org. The exact URL can be overridden with a DISTRO_TRACKER_CVE_URL configuration setting to redirect the URL to a custom tracker.

static linkify(text)[source]
Parameters

text – the text where we should inject HTML links

Returns

the text formatted with HTML links

Return type

str

classmethod unregister_plugin()
distro_tracker.core.utils.linkify.linkify(message)[source]
Parameters

message – the message where we should inject HTML links

Returns

the message formatted with HTML links

Return type

str
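As an illustration of what a plugin's linkify method does, the sketch below wraps CVE identifiers in HTML links using a regular expression. The base URL is a placeholder, the pattern is an assumption, and `linkify_cve` is a hypothetical function; none of this is taken from the module's source.

```python
import re

# Placeholder tracker URL, used only for this sketch.
CVE_BASE_URL = 'https://cve.example.org/'
# Assumed pattern: "CVE-", a four-digit year, then at least four digits.
CVE_RE = re.compile(r'\bCVE-\d{4}-\d{4,}\b')


def linkify_cve(text):
    """Replace each CVE identifier in text with an HTML link to the tracker."""
    def to_link(match):
        cve_id = match.group(0)
        return '<a href="{}{}">{}</a>'.format(CVE_BASE_URL, cve_id, cve_id)
    return CVE_RE.sub(to_link, text)


result = linkify_cve('Fixes CVE-2014-1234 in the parser.')
```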

distro_tracker.core.utils.misc

Miscellaneous utilities that don’t require their own python module.

distro_tracker.core.utils.misc.get_data_checksum(data)[source]

Checksums a dict, without its prospective ‘checksum’ key/value.
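The point of leaving out the 'checksum' key is that the checksum can then be stored inside the very dict it describes without changing the result. A minimal sketch of that idea follows; the JSON serialisation, the SHA-256 hash and the name `data_checksum` are assumptions, and the real function may use a different algorithm.

```python
import hashlib
import json


def data_checksum(data):
    """Hash a dict while ignoring any 'checksum' entry it already contains."""
    payload = {key: value for key, value in data.items() if key != 'checksum'}
    # sort_keys makes the serialisation independent of insertion order.
    serialised = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(serialised.encode('utf-8')).hexdigest()


record = {'name': 'dpkg', 'version': '1.21.1'}
record['checksum'] = data_checksum(record)
```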

distro_tracker.core.utils.misc.call_methods_with_prefix(obj, prefix, *args, **kwargs)[source]

Identifies all the object’s methods that start with the given prefix and calls them in alphabetical order, passing the remaining arguments as positional and keyword arguments.

Parameters
  • obj (object) – The object instance to inspect

  • prefix (str) – The prefix used to identify the methods to call
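The dispatch described above can be sketched with dir() and getattr(). The helper name `call_prefixed_methods` and the `Checks` class are hypothetical and exist only to show the calling order.

```python
def call_prefixed_methods(obj, prefix, *args, **kwargs):
    """Call every method of obj whose name starts with prefix, in name order."""
    for name in sorted(dir(obj)):
        if name.startswith(prefix):
            method = getattr(obj, name)
            if callable(method):
                method(*args, **kwargs)


class Checks:
    def __init__(self):
        self.calls = []

    def check_b(self, value):
        self.calls.append(('b', value))

    def check_a(self, value):
        self.calls.append(('a', value))

    def other(self, value):
        self.calls.append(('other', value))


checks = Checks()
call_prefixed_methods(checks, 'check_', 42)
```

Here check_a runs before check_b even though it is defined later, and other is never called because it lacks the prefix.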

distro_tracker.core.utils.packages

Utilities for processing Debian package information.

distro_tracker.core.utils.packages.package_hashdir(package_name)[source]

Returns the name of the hash directory used to avoid having too many entries in a single directory. It’s usually the first letter of the package except for lib* packages where it’s the first 4 letters.

Parameters

package_name (str) – The package name.

Returns

Name of the hash directory.

Return type

str
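The rule stated above fits in a few lines. This sketch follows the description literally, with `hashdir` as a hypothetical name; edge cases of the real function are not covered.

```python
def hashdir(package_name):
    """Return the first 4 letters for lib* packages, else the first letter."""
    if package_name.startswith('lib'):
        return package_name[:4]
    return package_name[:1]


regular = hashdir('dpkg')
library = hashdir('libreoffice')
```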

distro_tracker.core.utils.packages.package_url(package_name)[source]

Returns the URL of the page dedicated to this package name.

Parameters

package_name (str or PackageName model) – The package name.

Returns

The URL of the page dedicated to the package.

Return type

str

distro_tracker.core.utils.packages.extract_vcs_information(stanza)[source]

Extracts the VCS information from a package’s Sources entry.

Parameters

stanza (dict) – The Sources entry from which to extract the VCS info. Maps Sources key names to values.

Returns

VCS information regarding the package. Contains the following keys: type[, browser, url, branch]

Return type

dict
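To make the returned keys concrete, the sketch below derives type, url and browser from the Vcs-* fields of a stanza. It is a simplified assumption of how such an extraction could work: branch handling is omitted, the URLs are placeholders, and `vcs_info` is a hypothetical name.

```python
def vcs_info(stanza):
    """Collect VCS details from the Vcs-* fields of a Sources stanza."""
    info = {}
    for key, value in stanza.items():
        key = key.lower()
        if not key.startswith('vcs-'):
            continue
        if key == 'vcs-browser':
            info['browser'] = value
        else:
            # For example "vcs-git" gives type "git" and the repository URL.
            info['type'] = key[len('vcs-'):]
            info['url'] = value
    return info


stanza = {
    'Package': 'dpkg',
    'Vcs-Git': 'https://git.example.org/dpkg.git',
    'Vcs-Browser': 'https://git.example.org/dpkg',
}
info = vcs_info(stanza)
```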

distro_tracker.core.utils.packages.extract_dsc_file_name(stanza)[source]

Extracts the name of the .dsc file from a package’s Sources entry.

Parameters

stanza (dict) – The Sources entry from which to extract the .dsc file name. Maps Sources key names to values.

distro_tracker.core.utils.packages.extract_information_from_sources_entry(stanza)[source]

Extracts information from a Sources file entry and returns it in the form of a dictionary.

Parameters

stanza (Case-insensitive dict) – The raw entry’s key-value pairs.

distro_tracker.core.utils.packages.extract_information_from_packages_entry(stanza)[source]

Extracts information from a Packages file entry and returns it in the form of a dictionary.

Parameters

stanza (Case-insensitive dict) – The raw entry’s key-value pairs.

exception distro_tracker.core.utils.packages.SourcePackageRetrieveError[source]

Bases: Exception

class distro_tracker.core.utils.packages.AptCache[source]

Bases: object

A class for handling cached package information.

DEFAULT_MAX_SIZE = 1073741824
QUILT_FORMAT = '3.0 (quilt)'
class AcquireProgress(*args, **kwargs)[source]

Bases: apt.progress.base.AcquireProgress

Instances of this class can be passed to apt.cache.Cache.update() calls. It provides a way to track which files were changed and which were not by an update operation.

done(item)[source]

Invoked when an item is successfully and completely fetched.

ims_hit(item)[source]

Invoked when an item is confirmed to be up-to-date. For instance, when an HTTP download is informed that the file on the server was not modified.

pulse(owner)[source]

Periodically invoked while the Acquire process is underway.

This method gets invoked while the Acquire progress given by the parameter ‘owner’ is underway. It should display information about the current state.

This function returns a boolean value indicating whether the acquisition should be continued (True) or cancelled (False).

source_cache_directory

The directory where source package files are cached

property cache_size
get_directory_size(directory_path)[source]

Returns the total space taken by the given directory in bytes.

Parameters

directory_path (string) – The path to the directory

Return type

int
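One way to compute such a total with the standard library is to walk the tree and sum the file sizes. The sketch below is an assumption about the approach, not the method's source; `directory_size` is a hypothetical name, and symbolic links are skipped by choice.

```python
import os
import tempfile


def directory_size(directory_path):
    """Return the total size in bytes of all regular files under the path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(directory_path):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


# Build a small tree (10 bytes + 5 bytes) and measure it.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'a.bin'), 'wb') as handle:
        handle.write(b'x' * 10)
    with open(os.path.join(tmp, 'sub', 'b.bin'), 'wb') as handle:
        handle.write(b'y' * 5)
    size = directory_size(tmp)
```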

clear_cache()[source]

Removes all cache information. This causes the next update to retrieve fresh repository files.

update_sources_list()[source]

Updates the sources.list file used to list repositories for which package information should be cached.

update_apt_conf()[source]

Updates the apt.conf file which gives general settings for the apt.cache.Cache.

In particular, this updates the list of all architectures which should be considered in package updates based on architectures that the repositories support.

configure_cache()[source]

Configures the cache based on the most current repository information.

get_cached_files(filter_function=None)[source]

Returns cached files, optionally filtered by the given filter_function

Parameters

filter_function (callable) – Takes a file name as the only parameter and returns a bool indicating whether it should be included in the result.

Returns

A list of cached file names

Return type

list

get_sources_files_for_repository(repository)[source]

Returns all Sources files which are cached for the given repository.

For instance, Sources files for different suites are cached separately.

Parameters

repository (Repository) – The repository for which to return all cached Sources files

Return type

iterable of strings

get_packages_files_for_repository(repository)[source]

Returns all Packages files which are cached for the given repository.

For instance, Packages files for different suites are cached separately.

Parameters

repository (Repository) – The repository for which to return all cached Packages files

Return type

iterable of strings

update_repositories(force_download=False)[source]

Initiates a cache update.

Parameters

force_download – If set to True causes the cache to be cleared before starting the update, thus making sure all index files are downloaded again.

Returns

A two-tuple (updated_sources, updated_packages). Each of the tuple’s members is a list of (Repository, component, file_name) tuples representing the repository which was updated, the component, and the file which contains the fresh information. The file is either a Sources or a Packages file respectively.

get_package_source_cache_directory(package_name)[source]

Returns the path to the directory where a particular source package is cached.

Parameters

package_name (string) – The name of the source package

Return type

string

get_source_version_cache_directory(package_name, version)[source]

Returns the path to the directory where the files of a particular source package version are extracted.

Parameters
  • package_name (string) – The name of the source package

  • version (string) – The version of the source package

Return type

string

clear_cached_sources()[source]

Clears all cached package source files.

retrieve_source(source_name, version, debian_directory_only=False)[source]

Retrieve the source package files for the given source package version.

Parameters
  • source_name (string) – The name of the source package

  • version (string) – The version of the source package

  • debian_directory_only (Boolean) – Flag indicating if the method should try to retrieve only the debian directory of the source package. This is usually only possible when the package format is 3.0 (quilt).

Returns

The path to the directory containing the extracted source package files.

Return type

string

distro_tracker.core.utils.packages.html_package_list(packages)[source]

Return an HTML-formatted list of packages.

distro_tracker.core.utils.plugins

Classes to build a plugin mechanism.

class distro_tracker.core.utils.plugins.PluginRegistry(name, bases, attrs)[source]

Bases: type

A metaclass which any class that wants to behave as a registry can use.

When classes derived from classes which use this metaclass are created (that is, when their class definition is executed), they are added to the list plugins. The concrete classes using this metaclass are free to decide how to use this list.

This metaclass also adds an unregister_plugin() classmethod to all concrete classes which removes the class from the list of plugins.
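The registration mechanism can be sketched with a small metaclass. This is an illustration under assumptions: `Registry`, `Base`, `First` and `Second` are hypothetical names, and unregister_plugin is defined on the metaclass here, which may differ from how the real class attaches it.

```python
class Registry(type):
    """Metaclass that records every subclass of its first user."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, 'plugins'):
            # First class using the metaclass: it owns the (empty) registry.
            cls.plugins = []
        else:
            # Subclasses inherit the list and add themselves to it.
            cls.plugins.append(cls)

    def unregister_plugin(cls):
        """Remove the class from the shared plugin list."""
        if cls in cls.plugins:
            cls.plugins.remove(cls)


class Base(metaclass=Registry):
    pass


class First(Base):
    pass


class Second(Base):
    pass


registered = list(Base.plugins)
First.unregister_plugin()
remaining = list(Base.plugins)
```

Note that registration happens when the class statement runs; no instance of First or Second is ever created in this example.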

distro_tracker.core.utils.urls

Utilities for generating URLs of various kinds

distro_tracker.core.utils.urls.RepologyUrl(target_page, repo, package)[source]

Build a repology.org URL

distro_tracker.core.utils.urls.RepologyVersionsUrl(repo, package)[source]

Build a repology.org URL for the project_versions page

distro_tracker.core.utils.urls.RepologyPackagesUrl(repo, package)[source]

Build a repology.org URL for the project_packages page

distro_tracker.core.utils.verp

Module for encoding and decoding Variable Envelope Return Path addresses.

It is implemented following the recommendations laid out in VERP and https://www.courier-mta.org/draft-varshavchik-verp-smtpext.txt

>>> from distro_tracker.core.utils import verp
>>> str(verp.encode('itny-out@domain.com', 'node42!ann@old.example.com'))
'itny-out-node42+21ann=old.example.com@domain.com'
>>> list(map(str, verp.decode('itny-out-node42+21ann=old.example.com@domain.com')))
['itny-out@domain.com', 'node42!ann@old.example.com']
distro_tracker.core.utils.verp.encode(sender_address, recipient_address, separator='-')[source]

Encodes sender_address, recipient_address to a VERP compliant address to be used as the envelope-from (return-path) address.

Parameters
  • sender_address (string) – The email address of the sender

  • recipient_address (string) – The email address of the recipient

  • separator – The separator to be used between the sender’s local part and the encoded recipient’s local part in the resulting VERP address.

Return type

string

>>> str(encode('itny-out@domain.com', 'node42!ann@old.example.com'))
'itny-out-node42+21ann=old.example.com@domain.com'
>>> str(encode('itny-out@domain.com', 'tom@old.example.com'))
'itny-out-tom=old.example.com@domain.com'
>>> str(encode('itny-out@domain.com', 'dave+priority@new.example.com'))
'itny-out-dave+2Bpriority=new.example.com@domain.com'
>>> str(encode('bounce@dom.com', 'user+!%-:@[]+@other.com'))
'bounce-user+2B+21+25+2D+3A+40+5B+5D+2B=other.com@dom.com'
distro_tracker.core.utils.verp.decode(verp_address, separator='-')[source]

Decodes the given VERP-encoded envelope-from address and returns the original sender and recipient addresses as a tuple.

Parameters
  • verp_address – The return path address

  • separator – The separator to be expected between the sender’s local part and the encoded recipient’s local part in the given verp_address

>>> from_email, to_email = 'bounce@domain.com', 'user@other.com'
>>> decode(encode(from_email, to_email)) == (from_email, to_email)
True
>>> list(map(str, decode('itny-out-dave+2Bpriority=new.example.com@domain.com')))
['itny-out@domain.com', 'dave+priority@new.example.com']
>>> list(map(str, decode('itny-out-node42+21ann=old.example.com@domain.com')))
['itny-out@domain.com', 'node42!ann@old.example.com']
>>> list(map(str, decode('bounce-addr+2B40=dom.com@asdf.com')))
['bounce@asdf.com', 'addr+40@dom.com']
>>> s = 'bounce-user+2B+21+25+2D+3A+40+5B+5D+2B=other.com@dom.com'
>>> str(decode(s)[1])
'user+!%-:@[]+@other.com'
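The encoding scheme visible in the examples above (special characters of the recipient's local part escaped as "+" followed by two hex digits, "=" in place of the recipient's "@") can be reproduced with a short sketch. The escaped character set is inferred from those examples, and `verp_encode` and `verp_decode` are hypothetical names rather than the module's implementation.

```python
import re

# Characters of the recipient's local part escaped as +XX (inferred set).
_SPECIAL = '@:%!-[]+'


def verp_encode(sender, recipient, separator='-'):
    """Build the VERP address for the given sender and recipient."""
    sender_local, _, sender_domain = sender.rpartition('@')
    rcpt_local, _, rcpt_domain = recipient.rpartition('@')
    escaped = ''.join(
        '+{:02X}'.format(ord(char)) if char in _SPECIAL else char
        for char in rcpt_local
    )
    return '{}{}{}={}@{}'.format(
        sender_local, separator, escaped, rcpt_domain, sender_domain)


def verp_decode(verp_address, separator='-'):
    """Recover (sender, recipient) from a VERP address."""
    local, _, sender_domain = verp_address.rpartition('@')
    before_equals, _, rcpt_domain = local.rpartition('=')
    # With the default separator, every '-' of the recipient's local part was
    # escaped, so the last separator before '=' ends the sender's local part.
    sender_local, _, escaped = before_equals.rpartition(separator)
    rcpt_local = re.sub(
        r'\+([0-9A-F]{2})', lambda m: chr(int(m.group(1), 16)), escaped)
    return (sender_local + '@' + sender_domain, rcpt_local + '@' + rcpt_domain)


encoded = verp_encode('itny-out@domain.com', 'node42!ann@old.example.com')
decoded = verp_decode(encoded)
```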