utils Package

Various utilities for the distro-tracker project.

class distro_tracker.core.utils.PrettyPrintList(the_list=None, delimiter=' ')

Bases: object

A class which wraps the built-in list object so that when it is converted to a string, its contents are printed using the given delimiter.

The default delimiter is a space.

>>> a = PrettyPrintList([1, 2, 3])
>>> print(a)
1 2 3
>>> print(PrettyPrintList([u'one', u'2', u'3']))
one 2 3
>>> print(PrettyPrintList([1, 2, 3], delimiter=', '))
1, 2, 3
>>> # Still acts as a list
>>> a == [1, 2, 3]
True
>>> a == ['1', '2', '3']
False
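The wrapping behaviour described above can be sketched as follows. This is a minimal sketch, not the project's implementation: the class name SketchPrettyList is hypothetical, and it assumes the wrapper keeps a real list internally and forwards comparison, length, indexing and iteration to it.

```python
class SketchPrettyList:
    """Hypothetical stand-in for PrettyPrintList: a list wrapper whose
    string form joins the items with a delimiter."""

    def __init__(self, the_list=None, delimiter=' '):
        # Store a real list internally; None means "empty".
        self._list = list(the_list) if the_list is not None else []
        self.delimiter = delimiter

    def __str__(self):
        # Convert each item to text before joining.
        return self.delimiter.join(str(item) for item in self._list)

    def __eq__(self, other):
        # Compare against plain lists and other wrappers alike.
        if isinstance(other, SketchPrettyList):
            other = other._list
        return self._list == other

    def __len__(self):
        return len(self._list)

    def __getitem__(self, index):
        return self._list[index]

    def __iter__(self):
        return iter(self._list)
```

Because equality is delegated to the wrapped list, `SketchPrettyList([1, 2, 3]) == [1, 2, 3]` holds while comparison with `['1', '2', '3']` does not, mirroring the doctest above.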
class distro_tracker.core.utils.SpaceDelimitedTextField(verbose_name=None, name=None, primary_key=False, max_length=None, unique=False, blank=False, null=False, db_index=False, rel=None, default=<class 'django.db.models.fields.NOT_PROVIDED'>, editable=True, serialize=True, unique_for_date=None, unique_for_month=None, unique_for_year=None, choices=None, help_text='', db_column=None, db_tablespace=None, auto_created=False, validators=(), error_messages=None)

Bases: django.db.models.fields.TextField

A custom Django model field which stores a list of strings.

It stores the list in a TextField as a space delimited list. It is marshalled back to a PrettyPrintList in the Python domain.

description = 'Stores a space delimited list of strings'
from_db_value(value, expression, connection, context)
get_db_prep_value(value, **kwargs)
get_prep_value(value, **kwargs)
to_python(value)
value_to_string(obj)
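The round trip that the field's conversion methods perform between the database column and Python can be illustrated without Django. The two helper functions below are hypothetical and only sketch the idea, assuming values are split on whitespace when read and joined with single spaces when written; the real field hands back a PrettyPrintList rather than a plain list.

```python
def to_db_text(values):
    """Hypothetical helper: serialize a list of strings for a TextField."""
    if values is None:
        return ''
    # Items must not contain spaces themselves, or they would be split on read.
    return ' '.join(values)


def from_db_text(text):
    """Hypothetical helper: turn the stored text back into a list."""
    if not text:
        return []
    # str.split() with no argument also collapses repeated whitespace.
    return text.split()
```

The comment in to_db_text points at the main design constraint of a space-delimited encoding: it is only lossless for items that contain no whitespace.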
distro_tracker.core.utils.VCS_SHORTHAND_TO_NAME = {'svn': 'Subversion', 'cvs': 'CVS', 'git': 'Git', 'darcs': 'Darcs', 'mtn': 'Monotone', 'bzr': 'Bazaar', 'hg': 'Mercurial'}

A map of currently available VCS systems’ shorthands to their names.

distro_tracker.core.utils.distro_tracker_render_to_string(template_name, context=None)

A custom function to render a template to a string which injects extra distro-tracker specific information into the context, such as the name of the derivative.

This function is necessary since Django’s TEMPLATE_CONTEXT_PROCESSORS are only applied when a template is rendered with a RequestContext, whereas this function can be called independently from any HTTP request.

distro_tracker.core.utils.get_or_none(model, **kwargs)

Gets a Django Model object from the database or returns None if it does not exist.

distro_tracker.core.utils.get_vcs_name(shorthand)

Returns a full name for the VCS given its shorthand.

If the given shorthand is unknown, an empty string is returned.

Parameters: shorthand – The shorthand of a VCS for which a name is required.
Return type: string
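The documented behaviour amounts to a dictionary lookup with an empty-string fallback. A minimal sketch, assuming get_vcs_name consults VCS_SHORTHAND_TO_NAME directly (the mapping is copied from above; whether the real function normalizes case is not stated, so the sketch does not).

```python
# Copied from the VCS_SHORTHAND_TO_NAME value documented above.
VCS_SHORTHAND_TO_NAME = {
    'svn': 'Subversion', 'cvs': 'CVS', 'git': 'Git', 'darcs': 'Darcs',
    'mtn': 'Monotone', 'bzr': 'Bazaar', 'hg': 'Mercurial',
}


def get_vcs_name(shorthand):
    """Sketch: return the full VCS name, or '' for an unknown shorthand."""
    return VCS_SHORTHAND_TO_NAME.get(shorthand, '')
```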
distro_tracker.core.utils.now()

Returns datetime.datetime.now() and can be easily mocked out for tests.

distro_tracker.core.utils.render_to_json_response(response)

Helper function creating an HttpResponse by serializing the given response object to a JSON string.

The resulting HTTP response has Content-Type set to application/json.

Parameters: response – The object to be serialized in the response. It must be serializable by the json module.
Return type: HttpResponse
distro_tracker.core.utils.verify_signature(content)

The function extracts any possible signature information found in the given content.

Uses the DISTRO_TRACKER_KEYRING_DIRECTORY setting to access the keyring. If this setting does not exist, no signatures can be validated.

Returns: Information about the signers of the content as a list, or None if there is no (valid) signature.
Return type: list of (name, email) pairs or None

datastructures Module

Utility data structures for Distro Tracker.

class distro_tracker.core.utils.datastructures.DAG

Bases: object

A class representing a Directed Acyclic Graph.

Allows clients to build up a DAG where the nodes of the graph are any type of object which can be placed in a dictionary.

class Node(id, original)

Bases: object

DAG.add_edge(node1, node2)

Adds an edge between two nodes.

Raises: InvalidDAGException – If the edge would introduce a cycle in the graph structure.
DAG.add_node(node)

Adds a new node to the graph.

DAG.all_nodes

Returns a list of all nodes in the DAG.

DAG.dependent_nodes(node)

Returns all nodes which are directly dependent on the given node, i.e. returns a set of all nodes N where there exists an edge(node, N) in the DAG.

DAG.graph = None

Represents the graph structure of the DAG as an adjacency list.

DAG.in_degree = None

Holds the in-degree of each node to allow constant-time lookups instead of iterating through all nodes in the graph.

DAG.nodes_map = None

Maps original node objects to their internal representation.

DAG.nodes_reachable_from(node)

Returns a set of all nodes reachable from the given node.

DAG.remove_node(node)

Removes a given node from the graph.

The node parameter can be either the internal Node type or the node as the client sees it.

DAG.replace_node(original_node, replacement_node)

Replaces original_node, a node already present in the graph, with a new object. The internal representation of the DAG remains the same, except that the new object now takes the place of the original one.

DAG.topsort_nodes()

Generator which yields the DAG nodes in topological sort order.

exception distro_tracker.core.utils.datastructures.InvalidDAGException

Bases: Exception
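The graph and in_degree attributes documented above (an adjacency list plus per-node counts of incoming edges) are exactly what Kahn's algorithm needs for topological sorting. The class below is a simplified, hypothetical sketch of that technique, not the project's DAG: it uses the client objects directly as nodes instead of wrapping them in Node instances, and it detects cycles in add_edge with a reachability check.

```python
from collections import deque


class InvalidDAGException(Exception):
    pass


class SketchDAG:
    """Simplified DAG keyed directly by hashable client objects."""

    def __init__(self):
        self.graph = {}      # node -> set of nodes it points to
        self.in_degree = {}  # node -> number of incoming edges

    def add_node(self, node):
        self.graph.setdefault(node, set())
        self.in_degree.setdefault(node, 0)

    def nodes_reachable_from(self, node):
        # Iterative depth-first search over the adjacency list.
        seen, stack = set(), [node]
        while stack:
            for nxt in self.graph[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def add_edge(self, node1, node2):
        # Both nodes must have been added with add_node first.
        # An edge node1 -> node2 closes a cycle if node1 is already
        # reachable from node2 (or if it is a self-loop).
        if node1 == node2 or node1 in self.nodes_reachable_from(node2):
            raise InvalidDAGException(f'{node1!r} -> {node2!r} adds a cycle')
        if node2 not in self.graph[node1]:
            self.graph[node1].add(node2)
            self.in_degree[node2] += 1

    def topsort_nodes(self):
        # Kahn's algorithm: repeatedly emit nodes with no remaining
        # incoming edges, decrementing the in-degree of their successors.
        in_degree = dict(self.in_degree)
        queue = deque(n for n, d in in_degree.items() if d == 0)
        while queue:
            node = queue.popleft()
            yield node
            for nxt in self.graph[node]:
                in_degree[nxt] -= 1
                if in_degree[nxt] == 0:
                    queue.append(nxt)
```

Keeping in_degree up to date on every add_edge is what lets topsort_nodes find its starting nodes without scanning all edges, which matches the motivation given for DAG.in_degree above.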

email_messages Module

Module including some utility functions and classes for manipulating email.

class distro_tracker.core.utils.email_messages.CustomEmailMessage(msg=None, *args, **kwargs)

Bases: django.core.mail.message.EmailMessage

A subclass of django.core.mail.EmailMessage which can be fed an email.message.Message instance to define the body of the message.

If msg is set, the body attribute is ignored.

If the user wants to attach additional parts to the message, the attach() method can be used but the user must ensure that the given msg instance is a multipart message before doing so.

Effectively, this is also a wrapper which allows sending instances of email.message.Message via Django email backends.

message()

Returns the underlying email.message.Message object. If the user did not set a msg attribute for this instance, the parent EmailMessage.message method is used.

distro_tracker.core.utils.email_messages.decode_header(header, default_encoding='utf-8')

Decodes an email message header and returns it as a Unicode string.

This is necessary since it is possible that a header is made of multiple differently encoded parts which makes email.header.decode_header() insufficient.
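A sketch of the approach, assuming the goal is to decode every part returned by email.header.decode_header() with its own charset and concatenate the results. The function name decode_header_sketch is hypothetical, and falling back to default_encoding for unlabelled or unknown charsets is an assumption about how such parts should be treated.

```python
from email.header import decode_header as _decode_parts


def decode_header_sketch(header, default_encoding='utf-8'):
    """Decode a possibly multi-part, multi-charset header into one str."""
    if header is None:
        return None
    decoded = []
    for part, charset in _decode_parts(header):
        if isinstance(part, str):
            # A header without any encoded words comes back as plain str.
            decoded.append(part)
            continue
        try:
            decoded.append(part.decode(charset or default_encoding, 'replace'))
        except LookupError:
            # Unknown charset label: fall back to the default encoding.
            decoded.append(part.decode(default_encoding, 'replace'))
    return ''.join(decoded)
```

For example, a header mixing a UTF-8 and an ISO-8859-1 encoded word yields two byte strings with different charsets from the standard function; the sketch decodes each one separately before joining them.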

distro_tracker.core.utils.email_messages.extract_email_address_from_header(header)

Extracts the email address from the From email header.

>>> str(extract_email_address_from_header('Real Name <foo@domain.com>'))
'foo@domain.com'
>>> str(extract_email_address_from_header('foo@domain.com'))
'foo@domain.com'
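The standard library already splits a From-style header into its display name and address. A minimal sketch built on email.utils.parseaddr, assuming only the address half is needed; the function name is hypothetical and the project's implementation may differ.

```python
from email.utils import parseaddr


def extract_address_sketch(header):
    """Return just the address part of a From-style header."""
    _real_name, address = parseaddr(header)
    return address
```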
distro_tracker.core.utils.email_messages.get_decoded_message_payload(message, default_charset='utf-8')

Extracts the payload of the given email.message.Message and returns it decoded based on the Content-Transfer-Encoding and charset.

distro_tracker.core.utils.email_messages.message_from_bytes(message_bytes)

Returns a live-patched email.Message object from the given bytes.

The changes ensure that parsing the message’s bytes with this function and then converting the returned object back with its as_string method is an idempotent operation, so the original message content is preserved.

An as_bytes method is also added, since Django’s SMTP backend relies on it. That method is normally provided by Django’s own django.core.mail.SafeMIMEText objects, which our CustomEmailMessage does not use.

distro_tracker.core.utils.email_messages.name_and_address_from_string(content)

Takes an address in almost-RFC822 format and turns it into a dict {'name': real_name, 'email': email_address}.

Unlike email.utils.parseaddr and rfc822.parseaddr, this routine allows unquoted commas to appear in the real name (in violation of RFC822).
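Because the real name may contain unquoted commas, one forgiving strategy is to treat everything before the last <...> group as the name. The function below is a hypothetical sketch of that idea, not the project's parser; it falls back to treating the whole string as a bare address when no angle brackets are present.

```python
import re

# Lazy name, then the last <...> group anchored at the end of the string.
_NAME_ADDR = re.compile(r'^\s*(?P<name>.*?)\s*<(?P<email>[^<>]*)>\s*$')


def name_and_address_sketch(content):
    """Parse 'Real Name <addr>' into a dict, tolerating commas in the name."""
    match = _NAME_ADDR.match(content)
    if match is None:
        # No angle brackets: assume a bare address with no real name.
        return {'name': '', 'email': content.strip()}
    # Strip optional surrounding double quotes from the real name.
    name = match.group('name').strip('"')
    return {'name': name, 'email': match.group('email').strip()}
```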

distro_tracker.core.utils.email_messages.names_and_addresses_from_string(content)

Takes a string with addresses in RFC822 format and returns a list of dicts {'name': real_name, 'email': email_address}. It tries to be forgiving about unquoted commas in addresses.

distro_tracker.core.utils.email_messages.patch_message_for_django_compat(message)

Live patches the email.message.Message object passed as parameter so that:

  • the as_string() method returns the same set of bytes the message was parsed from (to preserve the original message as much as possible);
  • an as_bytes() method is added too (this method is expected by Django’s SMTP backend).
distro_tracker.core.utils.email_messages.unfold_header(header)

Unfolding is the process of removing the line wrapping added by mail agents. A header is a single logical line, so its value is not meant to contain line breaks.

We need to unfold header values in particular when we want to reuse them to compose a reply message, as Python’s email API chokes on those newline characters.

If header is None, the return value is None as well.

Parameters: header (str) – The header value to unfold.
Returns: The unfolded version of the header.
Return type: str
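RFC 5322 defines unfolding as removing any CRLF that is immediately followed by whitespace, while keeping the whitespace itself. A minimal sketch of that rule, assuming bare LF line endings should be treated like CRLF; the function name is hypothetical.

```python
import re

# A line break (CRLF or bare LF) that is followed by a space or tab.
_FOLD = re.compile(r'\r?\n(?=[ \t])')


def unfold_header_sketch(header):
    """Return the header value with folding line breaks removed."""
    if header is None:
        return None
    return _FOLD.sub('', header)
```

The lookahead keeps the leading space or tab of the continuation line, so words on either side of the fold stay separated.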

http Module

Utilities for handling HTTP resource access.

class distro_tracker.core.utils.http.HttpCache(cache_directory_path)

Bases: object

A class providing an interface to a cache of HTTP responses.

get_content(url, compression='auto')

Returns the content of the cached response for the given URL.

If the cached file is compressed, it is decompressed first; otherwise it is treated as a plain file.

Parameters: compression (str) – Specifies the compression method used to generate the resource, and thus the compression method one should use to decompress it.
Return type: bytes
get_headers(url)

Returns the HTTP headers of the cached response for the given URL.

Return type: dict
is_expired(url)

Returns True if the cached response for the given URL has expired, based on its Cache-Control or Expires headers.

remove(url)

Removes the cached response for the given URL.

update(url, force=False)

Performs an update of the cached resource: a conditional GET request is used to make sure that its most current version is found in the cache.

Parameters: force – Set to True to force the method to perform a full GET request.
Returns: The original HTTP response and a Boolean indicating whether the cached value was updated.
Return type: two-tuple of (requests.Response, Boolean)
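A conditional GET reuses validators from the cached response headers. The helper below is a hypothetical sketch, assuming the cache stores the original ETag and Last-Modified headers; it only builds the request headers and leaves the request itself to an HTTP client, where a 304 Not Modified status would mean the cached copy is still current.

```python
def conditional_request_headers(cached_headers, force=False):
    """Build validator headers for a conditional GET.

    cached_headers: dict of headers saved with the cached response.
    force: when True, return no validators so a full GET is made.
    """
    if force or not cached_headers:
        return {}
    # Header names are case-insensitive; normalize before looking them up.
    lowered = {key.lower(): value for key, value in cached_headers.items()}
    headers = {}
    if 'etag' in lowered:
        headers['If-None-Match'] = lowered['etag']
    if 'last-modified' in lowered:
        headers['If-Modified-Since'] = lowered['last-modified']
    return headers
```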
distro_tracker.core.utils.http.get_resource_content(url, cache=None, compression='auto', only_if_updated=False)

A helper function which returns the content of the resource found at the given URL.

If the resource is already cached in the cache object and the cached content has not expired, the function will not do any HTTP requests and will return the cached content.

If the resource is stale or not cached at all, it is fetched from the Web.

Parameters:
  • url – The URL of the resource to be retrieved
  • cache (HttpCache or an object with an equivalent interface) – A cache object which should be used to look up and store the cached resource. If it is not provided, an instance of HttpCache with a DISTRO_TRACKER_CACHE_DIRECTORY cache directory is used.
  • compression (str) – Specifies the compression method used to generate the resource, and thus the compression method one should use to decompress it. If it is auto, the method is guessed from the URL’s file extension.
  • only_if_updated (bool) – If set to True, returns None when no update is done. Otherwise, the content is returned in any case.
Returns: The bytes representation of the resource found at the given URL.
Return type: bytes

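When compression is auto, the method is guessed from the URL's file extension. A sketch of that guess-then-decompress step using the standard gzip, bz2 and lzma modules; the extension table and the helper names are assumptions, not taken from the project.

```python
import bz2
import gzip
import lzma

# Hypothetical mapping from file extension to compression method.
_EXTENSIONS = {'.gz': 'gzip', '.bz2': 'bzip2', '.xz': 'xz'}
_DECOMPRESSORS = {'gzip': gzip.decompress, 'bzip2': bz2.decompress,
                  'xz': lzma.decompress}


def guess_compression(url):
    """Return the compression method implied by the URL, or None."""
    for extension, method in _EXTENSIONS.items():
        if url.endswith(extension):
            return method
    return None


def decompress_content(content, url, compression='auto'):
    if compression == 'auto':
        compression = guess_compression(url)
    if compression is None:
        return content  # plain, uncompressed resource
    return _DECOMPRESSORS[compression](content)
```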
distro_tracker.core.utils.http.get_resource_text(*args, **kwargs)

Clone of get_resource_content() which transparently decodes the downloaded content into text. It supports the same parameters and adds the encoding parameter.

Parameters: encoding (str) – Specifies an encoding to decode the resource content.
Returns: The textual representation of the resource found at the given url.
Return type: str
distro_tracker.core.utils.http.parse_cache_control_header(header)

Parses the given Cache-Control header’s values.

Returns: The key-value pairs found in the header. If some key did not have an associated value in the header, None is used instead.
Return type: dict
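A sketch of the parsing rule described above, assuming directives are comma-separated and that quoted values should have their quotes removed; the function name is hypothetical. It does not handle quoted values that themselves contain commas. A max-age value obtained this way is the kind of input an is_expired() check could compare against the age of a cached response.

```python
def parse_cache_control_sketch(header):
    """Parse e.g. 'max-age=3600, no-cache' into a dict."""
    directives = {}
    for item in header.split(','):
        item = item.strip()
        if not item:
            continue
        if '=' in item:
            key, value = item.split('=', 1)
            directives[key.strip()] = value.strip().strip('"')
        else:
            # Directives without a value map to None, as documented.
            directives[item] = None
    return directives
```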

packages Module

Utilities for processing Debian package information.

class distro_tracker.core.utils.packages.AptCache

Bases: object

A class for handling cached package information.

class AcquireProgress(*args, **kwargs)

Bases: apt.progress.base.AcquireProgress

Instances of this class can be passed to apt.cache.Cache.update() calls. It provides a way to track which files were changed and which were not by an update operation.

done(item)
ims_hit(item)
pulse(owner)
AptCache.DEFAULT_MAX_SIZE = 1073741824
AptCache.QUILT_FORMAT = '3.0 (quilt)'
AptCache.cache_size
AptCache.clear_cache()

Removes all cache information. This causes the next update to retrieve fresh repository files.

AptCache.clear_cached_sources()

Clears all cached package source files.

AptCache.configure_cache()

Configures the cache based on the most current repository information.

AptCache.get_cached_files(filter_function=None)

Returns cached files, optionally filtered by the given filter_function.

Parameters: filter_function (callable) – Takes a file name as the only parameter and returns a bool indicating whether it should be included in the result.
Returns: A list of cached file names
Return type: list
AptCache.get_directory_size(directory_path)

Returns the total space taken by the given directory in bytes.

Parameters: directory_path (string) – The path to the directory
Return type: int
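Summing file sizes while walking the tree is one way to compute this. A minimal sketch using os.walk, assuming symbolic links should not be followed and that only regular files count; the function name is hypothetical and the project may count differently.

```python
import os


def directory_size_sketch(directory_path):
    """Return the total size in bytes of regular files under a directory."""
    total = 0
    for root, _dirs, files in os.walk(directory_path):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```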
AptCache.get_package_source_cache_directory(package_name)

Returns the path to the directory where a particular source package is cached.

Parameters: package_name (string) – The name of the source package
Return type: string
AptCache.get_packages_files_for_repository(repository)

Returns all Packages files which are cached for the given repository.

For instance, Packages files for different suites are cached separately.

Parameters: repository (Repository) – The repository for which to return all cached Packages files
Return type: iterable of strings
AptCache.get_source_version_cache_directory(package_name, version)

Returns the path to the directory where the files of a particular source package version are extracted.

Parameters:
  • package_name (string) – The name of the source package
  • version (string) – The version of the source package
Return type: string
AptCache.get_sources_files_for_repository(repository)

Returns all Sources files which are cached for the given repository.

For instance, Sources files for different suites are cached separately.

Parameters: repository (Repository) – The repository for which to return all cached Sources files
Return type: iterable of strings
AptCache.retrieve_source(source_name, version, debian_directory_only=False)

Retrieve the source package files for the given source package version.

Parameters:
  • source_name (string) – The name of the source package
  • version (string) – The version of the source package
  • debian_directory_only (Boolean) – Flag indicating if the method should try to retrieve only the debian directory of the source package. This is usually only possible when the package format is 3.0 (quilt).
Returns: The path to the directory containing the extracted source package files.
Return type: string

AptCache.source_cache_directory = None

The directory where source package files are cached.

AptCache.update_apt_conf()

Updates the apt.conf file which gives general settings for the apt.cache.Cache.

In particular, this updates the list of all architectures which should be considered in package updates based on architectures that the repositories support.

AptCache.update_repositories(force_download=False)

Initiates a cache update.

Parameters: force_download – If set to True, causes the cache to be cleared before starting the update, thus making sure all index files are downloaded again.
Returns: A two-tuple (updated_sources, updated_packages). Each of the tuple’s members is a list of (Repository, file_name) pairs representing the repository which was updated and the file which contains the fresh information. The file is either a Sources or a Packages file, respectively.

AptCache.update_sources_list()

Updates the sources.list file used to list repositories for which package information should be cached.

exception distro_tracker.core.utils.packages.SourcePackageRetrieveError

Bases: Exception

distro_tracker.core.utils.packages.extract_dsc_file_name(stanza)

Extracts the name of the .dsc file from a package’s Sources entry.

Parameters: stanza (dict) – The Sources entry from which to extract the .dsc file name. Maps Sources key names to values.
distro_tracker.core.utils.packages.extract_information_from_packages_entry(stanza)

Extracts information from a Packages file entry and returns it in the form of a dictionary.

Parameters: stanza (Case-insensitive dict) – The raw entry’s key-value pairs.
distro_tracker.core.utils.packages.extract_information_from_sources_entry(stanza)

Extracts information from a Sources file entry and returns it in the form of a dictionary.

Parameters: stanza (Case-insensitive dict) – The raw entry’s key-value pairs.
distro_tracker.core.utils.packages.extract_vcs_information(stanza)

Extracts the VCS information from a package’s Sources entry.

Parameters: stanza (dict) – The Sources entry from which to extract the VCS info. Maps Sources key names to values.
Returns: VCS information regarding the package. Contains the following keys: type[, browser, url]
Return type: dict
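In Debian Sources entries, VCS data lives in fields such as Vcs-Git and Vcs-Browser. The function below is a hypothetical sketch of mapping those fields to the documented type, url and browser keys. It assumes a plain dict stanza (field names compared case-insensitively), keeps only the first repository field it finds, and returns an empty dict when there is no repository field at all.

```python
def extract_vcs_sketch(stanza):
    """Map Vcs-* fields of a Sources stanza to {'type', 'url', 'browser'}."""
    vcs = {}
    for key, value in stanza.items():
        field = key.lower()
        if not field.startswith('vcs-'):
            continue
        kind = field[len('vcs-'):]
        if kind == 'browser':
            vcs['browser'] = value
        elif 'type' not in vcs:
            # e.g. 'Vcs-Git' yields type 'git'; keep the first one found.
            vcs['type'] = kind
            vcs['url'] = value
    # Assumption: without a repository field there is nothing to report.
    return vcs if 'type' in vcs else {}
```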
distro_tracker.core.utils.packages.package_hashdir(package_name)

Returns the name of the hash directory used to avoid having too many entries in a single directory. It is usually the first letter of the package name, except for lib* packages, where it is the first four letters.

Parameters: package_name (str) – The package name.
Returns: Name of the hash directory.
Return type: str
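This is the same bucketing scheme the Debian archive uses for its pool directories (for example pool/main/d/dpkg and pool/main/libc/...). A minimal sketch of the rule as described; the function name is hypothetical.

```python
def package_hashdir_sketch(package_name):
    """Return the hash directory for a package name."""
    if package_name.startswith('lib'):
        # 'libfoo' -> 'libf'
        return package_name[:4]
    # 'dpkg' -> 'd'
    return package_name[:1]
```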

plugins Module

class distro_tracker.core.utils.plugins.PluginRegistry(name, bases, attrs)

Bases: type

A metaclass which any class that wants to behave as a registry can use.

When classes derived from a class which uses this metaclass are created (that is, when the metaclass itself is instantiated), they are added to the plugins list. The concrete classes using this metaclass are free to decide how to use this list.

This metaclass also adds an unregister_plugin() classmethod to all concrete classes which removes the class from the list of plugins.
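A minimal sketch of such a metaclass, assuming the first class that uses it receives the empty plugins list and every subclass defined afterwards is appended to it. The names are hypothetical, and as a variation the sketch defines unregister_plugin on the metaclass itself, which makes it callable on every class much like the classmethod described above.

```python
class SketchPluginRegistry(type):
    """Metaclass that collects subclasses into a shared 'plugins' list."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, 'plugins'):
            # The registry's root class: create the shared list.
            cls.plugins = []
        else:
            # Any class derived from the root registers itself.
            cls.plugins.append(cls)

    def unregister_plugin(cls):
        # Defined on the metaclass, so it is callable on every class
        # but not on instances of those classes.
        if cls in cls.plugins:
            cls.plugins.remove(cls)


class Handler(metaclass=SketchPluginRegistry):
    pass


class MailHandler(Handler):
    pass


class WebHandler(Handler):
    pass
```

Registration happens at class-definition time: merely importing the module that defines MailHandler and WebHandler populates Handler.plugins.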

verp Module

Module for encoding and decoding Variable Envelope Return Path addresses.

It is implemented following the recommendations laid out in VERP and http://www.courier-mta.org/draft-varshavchik-verp-smtpext.txt

>>> from distro_tracker.core.utils import verp
>>> str(verp.encode('itny-out@domain.com', 'node42!ann@old.example.com'))
'itny-out-node42+21ann=old.example.com@domain.com'
>>> list(map(str, verp.decode('itny-out-node42+21ann=old.example.com@domain.com')))
['itny-out@domain.com', 'node42!ann@old.example.com']
distro_tracker.core.utils.verp.encode(sender_address, recipient_address, separator='-')

Encodes sender_address, recipient_address to a VERP compliant address to be used as the envelope-from (return-path) address.

Parameters:
  • sender_address (string) – The email address of the sender
  • recipient_address (string) – The email address of the recipient
  • separator – The separator to be used between the sender’s local part and the encoded recipient’s local part in the resulting VERP address.
Return type: string

>>> str(encode('itny-out@domain.com', 'node42!ann@old.example.com'))
'itny-out-node42+21ann=old.example.com@domain.com'
>>> str(encode('itny-out@domain.com', 'tom@old.example.com'))
'itny-out-tom=old.example.com@domain.com'
>>> str(encode('itny-out@domain.com', 'dave+priority@new.example.com'))
'itny-out-dave+2Bpriority=new.example.com@domain.com'
>>> str(encode('bounce@dom.com', 'user+!%-:@[]+@other.com'))
'bounce-user+2B+21+25+2D+3A+40+5B+5D+2B=other.com@dom.com'
distro_tracker.core.utils.verp.decode(verp_address, separator='-')

Decodes the given VERP-encoded from address and returns the original sender and recipient addresses as a tuple.

Parameters:
  • verp_address – The return path address
  • separator – The separator to be expected between the sender’s local part and the encoded recipient’s local part in the given verp_address
>>> from_email, to_email = 'bounce@domain.com', 'user@other.com'
>>> decode(encode(from_email, to_email)) == (from_email, to_email)
True
>>> list(map(str, decode('itny-out-dave+2Bpriority=new.example.com@domain.com')))
['itny-out@domain.com', 'dave+priority@new.example.com']
>>> list(map(str, decode('itny-out-node42+21ann=old.example.com@domain.com')))
['itny-out@domain.com', 'node42!ann@old.example.com']
>>> list(map(str, decode('bounce-addr+2B40=dom.com@asdf.com')))
['bounce@asdf.com', 'addr+40@dom.com']
>>> s = 'bounce-user+2B+21+25+2D+3A+40+5B+5D+2B=other.com@dom.com'
>>> str(decode(s)[1])
'user+!%-:@[]+@other.com'
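The encoding rules exercised by the doctests above can be reproduced in a few lines. This is a minimal sketch, not the module's code: the function names are hypothetical and the set of escaped characters is inferred from the doctest examples. Special characters in the recipient's local part become +XX (uppercase hex), the recipient's @ becomes =, and the result is attached to the sender's local part with the separator.

```python
import re

# Characters escaped as +XX in the recipient's local part; this set is
# inferred from the doctest examples above.
_SPECIAL = '@:%!-[]+'
_ESCAPE = re.compile(r'\+([0-9A-Fa-f]{2})')


def verp_encode(sender_address, recipient_address, separator='-'):
    sender_local, sender_domain = sender_address.rsplit('@', 1)
    recipient_local, recipient_domain = recipient_address.rsplit('@', 1)
    encoded_local = ''.join(
        '+{:02X}'.format(ord(char)) if char in _SPECIAL else char
        for char in recipient_local)
    return '{}{}{}={}@{}'.format(sender_local, separator, encoded_local,
                                 recipient_domain, sender_domain)


def verp_decode(verp_address, separator='-'):
    local, sender_domain = verp_address.rsplit('@', 1)
    # The last '=' separates the encoded local part from the recipient domain.
    local, recipient_domain = local.rsplit('=', 1)
    # With the default separator, '-' inside the encoded part is escaped as
    # +2D, so the last separator marks where the sender's local part ends.
    sender_local, encoded_local = local.rsplit(separator, 1)
    recipient_local = _ESCAPE.sub(
        lambda match: chr(int(match.group(1), 16)), encoded_local)
    return ('{}@{}'.format(sender_local, sender_domain),
            '{}@{}'.format(recipient_local, recipient_domain))
```

Splitting on the last = and then on the last separator is what lets sender local parts such as itny-out contain the separator themselves.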