Commands

All commands start with the newmap prefix. Help is available for each command by running newmap <command> --help.

index

Generates an index file for a given sequence file.

Positional Arguments

  • fasta_file: The name of the input sequence file to generate an index for. Required.

Options

  • output: The name of the index file to generate. Defaults to the name of the sequence file with the extension replaced with .awfmi. If the output file already exists, it will be overwritten.

FM-index parameters

  • compression-ratio: The compression ratio of the suffix array. Defaults to 8.

  • seed-length: The length of the k-mer in the seed table. Defaults to 12.

Example:

$ newmap index hg38.fa

This will generate an index file named hg38.awfmi for the sequence hg38.fa.

FM-index technical details

The default parameters are the recommended set for matching dinucleotide sequences and likely do not need to be changed. They may be changed for technical reasons, trading off the disk space and memory available against performance. Each increase in the compression-ratio reduces the index file size at the cost of more operations to count the occurrences of a given k-mer. Each increase in the seed-length increases the memory used to speed up k-mer searches in the index: each increase by 1 multiplies the memory usage of the index by 4.
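The factor of 4 per unit of seed-length follows from the four-letter nucleotide alphabet. The arithmetic can be sketched in Python, assuming the seed table holds one entry per possible k-mer; the function name is hypothetical, and the per-entry size used by newmap is not stated here, so only entry counts are shown:

```python
def seed_table_entries(seed_length: int, alphabet_size: int = 4) -> int:
    """Count the distinct k-mers of the given length over the alphabet.

    Assumption: the seed table stores one entry per possible k-mer.
    """
    if seed_length < 0:
        raise ValueError("seed_length must be non-negative")
    return alphabet_size ** seed_length


if __name__ == "__main__":
    for k in (11, 12, 13):
        # Each step of 1 in k multiplies the entry count by 4.
        print(k, seed_table_entries(k))
```

Under that assumption, the default seed-length of 12 corresponds to 4^12 = 16,777,216 entries, and a seed-length of 13 to four times as many.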

track

Generates mappability tracks from one or more given unique files (see Unique File Format). There are two types of mappability files that can be generated:

  1. Single-read mappability (see Single-read mappability)

  2. Multi-read mappability (see Multi-read mappability)

Positional Arguments

  • read_length: The read length to generate mappability tracks for. Defaults to 24.

  • unique_count_files: One or more unique count files to generate mappability from. The resulting mappability from each unique file will be appended to files specified by the single-read and multi-read options.

Options

  • single-read: The name of the BED file to write the single-read mappability to. Specify - for stdout. Defaults to - if multi-read is not specified; otherwise no single-read output is written.

  • multi-read: The name of the WIG file to write the multi-read mappability to. Specify - for stdout.

  • verbose: Print verbose output. Default is False.

Note

When both single-read and multi-read are specified on the command line, only one of them can write to stdout.

Mappability datasets

The mappability datasets are generated from the minimum unique length dataset and defined for a given k-mer length.

Single-read mappability

Single-read mappability is a binary value (0 or 1) for each position in the sequence. For a given length k, a 1 signifies that at least one unique k-mer overlaps that position, and a 0 signifies that none does.

This command writes the binary value to the “score” column of the resulting BED file.
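The definition of single-read mappability can be illustrated with a small Python sketch. It is not newmap's implementation. It assumes the minimum unique lengths are available as a list of integers, one per start position, with 0 meaning that no unique length was found; that encoding and the function name are assumptions made for illustration.

```python
from typing import List


def single_read_mappability(min_unique_lengths: List[int], k: int) -> List[int]:
    """Return 1 for each position overlapped by at least one unique k-mer.

    Assumption: min_unique_lengths[s] is the shortest unique k-mer length
    starting at position s, or 0 if there is none.
    """
    n = len(min_unique_lengths)
    mappable = [0] * n
    for start, min_len in enumerate(min_unique_lengths):
        # The k-mer at `start` is unique if a unique length exists there,
        # that length is at most k, and the k-mer fits inside the sequence.
        if 0 < min_len <= k and start + k <= n:
            for pos in range(start, start + k):
                mappable[pos] = 1
    return mappable


if __name__ == "__main__":
    # Only the 3-mer starting at position 1 is unique for k = 3.
    print(single_read_mappability([0, 3, 0, 0, 0, 0, 4], k=3))
```

In this toy input, positions 1 to 3 are covered by the unique 3-mer starting at position 1, while the entry of 4 at the last position is ignored because it exceeds k.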

Multi-read mappability

Multi-read mappability is a floating point value between 0 and 1 for each position in the sequence. Each value represents the fraction of the k-mers overlapping that position that are unique. For example, for a k-mer length of 24, if all 24 of the 24-mers that overlap a given position are unique at their respective positions, the resulting value is 1. If only 12 of those 24-mers (half) are unique at their respective positions, the resulting value is 0.5. All values are written to a WIG file in “fixedStep” format, which may be very large.
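The fraction described above can be sketched in Python as follows. This is an illustration rather than newmap's implementation. It assumes the minimum unique lengths are given as a list of integers, one per start position, with 0 meaning no unique length was found, and it divides by k at every position, so positions near the sequence ends (overlapped by fewer than k k-mers) cannot reach 1. Both the encoding and the edge handling are assumptions.

```python
from typing import List


def multi_read_mappability(min_unique_lengths: List[int], k: int) -> List[float]:
    """Fraction of the k overlapping k-mers that are unique, per position.

    Assumption: min_unique_lengths[s] is the shortest unique k-mer length
    starting at position s, or 0 if there is none.
    """
    n = len(min_unique_lengths)
    # True where the k-mer starting at that position is unique and fits.
    unique_start = [
        0 < min_len <= k and start + k <= n
        for start, min_len in enumerate(min_unique_lengths)
    ]
    values = []
    for pos in range(n):
        # k-mers overlapping `pos` start at pos - k + 1 through pos.
        first_start = max(0, pos - k + 1)
        unique_count = sum(unique_start[first_start:pos + 1])
        values.append(unique_count / k)
    return values


if __name__ == "__main__":
    # For k = 4, only the 4-mers starting at positions 0 to 3 are unique.
    print(multi_read_mappability([4, 4, 4, 4, 0, 0, 0, 0], k=4))
```

With this input, position 3 is overlapped by four unique 4-mers and scores 1.0, while position 5 is overlapped by two unique 4-mers out of four and scores 0.5.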

Example:

$ newmap track --multi-read=k24_multiread_mappability.wig --single-read=k24_singleread_mappability.bed 24 chr*.unique.uint8