.. _unique-file-format:

Unique File Format
==================

The ``unique`` file format is a simple binary file format. For each position
of a sequence (identified by its sequence ID, e.g. chr1), it stores the minimum
length k at which the k-mer at that position is unique when matched against a
generated index (see :ref:`index`). The range of values for k depends on the
parameters of :ref:`search`. A value of 0 marks a position in the sequence
where no unique length was found.

``unique`` files can be used to generate mappability datasets for a given
k-mer size, since a k-mer that is unique is assumed to remain unique as a
(k+1)-mer. The conversion to mappability datasets for a given k-mer length is
done with the :ref:`track` tool.

----------
Data types
----------

The suffix of the filename specifies the type of the underlying binary data.
The suffix ``uint8`` specifies that each minimum length is stored as a single
unsigned 8-bit integer (1 byte each), and ``uint16`` likewise specifies that
each length is stored as an unsigned 16-bit integer (2 bytes each). No other
data is stored in the file.

The data type is chosen based on the maximum length in the range given to
:ref:`search`. For example, in a search range from 20 to 255, every minimum
unique length is less than or equal to 255 (the largest value that can be
represented with an unsigned byte), so the ``uint8`` format is used.

-----
Usage
-----

Since the file format is simple, it can be processed easily in any language.
Below is an example of how to read a ``unique.uint8`` file in Python using
numpy.

.. code-block:: python

    import numpy as np

    # Each byte is the minimum unique length at one position (0 = none found)
    unique_lengths = np.fromfile('chr1.unique.uint8', dtype=np.uint8)

    # Print the fraction of positions where a unique length was found
    print(np.count_nonzero(unique_lengths) / len(unique_lengths))
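
The same approach extends to ``uint16`` files and to a simple per-position
uniqueness check for a chosen k. The sketch below is illustrative only and is
not the implementation used by the :ref:`track` tool. The helper names
``read_unique`` and ``unique_at_k`` are hypothetical, and it assumes that
``uint16`` values are stored in the byte order of the machine reading the
file, which this page does not specify.

.. code-block:: python

    import numpy as np

    # Hypothetical mapping from filename suffix to NumPy dtype.
    # Assumption: uint16 values use the reading machine's native byte order.
    SUFFIX_DTYPES = {"uint8": np.uint8, "uint16": np.uint16}

    def read_unique(path):
        """Read a unique file, choosing the dtype from the filename suffix."""
        suffix = str(path).rsplit(".", 1)[-1]
        return np.fromfile(path, dtype=SUFFIX_DTYPES[suffix])

    def unique_at_k(unique_lengths, k):
        """Return True where a unique length was found and is at most k."""
        # 0 means no unique length was found, so it never counts as unique.
        return (unique_lengths > 0) & (unique_lengths <= k)

    # In-memory example instead of a real file
    lengths = np.array([0, 20, 24, 30, 0, 21], dtype=np.uint8)
    mask = unique_at_k(lengths, 24)
    print(mask.mean())  # 0.5

The mask relies on the assumption stated above: a position whose minimum
unique length is at most k is also treated as unique at length k.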