Description
Assign taxonomy and function to placed query sequences
Epilog
Semidán Robaina Estévez (srobaina@ull.edu.es), 2021
Usage:
usage: metatag assign [-h] [args]
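
As a rough illustration, a call could be assembled from Python as an argument list. This is only a sketch: the helper `build_assign_command` and all file names are hypothetical, only flags listed under Arguments are used, and which of them are required is not stated here. Note how several paths follow a single `--labels` flag as separate items.

```python
import shlex


def build_assign_command(jplace, labels, outdir, prefix="placed_tax_", min_lwr=None):
    """Assemble an argument list for `metatag assign` (nothing is executed here)."""
    cmd = ["metatag", "assign", "--jplace", jplace]
    # --labels accepts one or more space-separated paths
    cmd += ["--labels", *labels]
    cmd += ["--outdir", outdir, "--prefix", prefix]
    # Optional LWR threshold; expected to lie between 0 and 1
    if min_lwr is not None:
        cmd += ["--min_placement_lwr", str(min_lwr)]
    return cmd


# Hypothetical file names, for illustration only
cmd = build_assign_command(
    jplace="epa_result.jplace",
    labels=["ref_labels.pickle", "extra_labels.pickle"],
    outdir="assign_out",
    min_lwr=0.8,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where metatag is installed.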
Arguments
| short | long | default | help |
|---|---|---|---|
| -h | --help | | show this help message and exit |
| | --jplace | None | path to placements jplace file |
| | --labels | None | path to label dict in pickle format. More than one space-separated path can be input |
| | --query_labels | None | path to query label dict in pickle format. More than one space-separated path can be input |
| | --ref_clusters | None | path to tsv file containing the cluster assigned to each reference sequence id. Must contain one column named "id" and another (tab-separated) column named "cluster" |
| | --ref_cluster_scores | None | path to tsv file containing the quality score assigned to each cluster in the reference tree. Contains one column named "cluster" and another (tab-separated) column named "score" |
| | --outgroup | None | path to text file containing IDs of sequences to be considered as an outgroup to root the tree. It can also be a fasta file from which sequence names will be extracted, or a string containing a tag used to filter record labels. The outgroup is used by gappa examine assign to recover missing taxonomic information. |
| | --prefix | placed_tax_ | prefix to be added to output files |
| | --outdir | None | path to output directory |
| | --max_placement_distance | None | Maximum allowed placement distance to consider a placement as valid. The distance measure is set with the parameter "distance_measure" (defaults to pendant length) |
| | --distance_measure | pendant | Distance measure used to remove placements with distance larger than "max_placement_distance". Choose among: 1. "pendant": pendant length of the placement; 2. "pendant_distal_ratio": ratio between pendant and distal distances; 3. "pendant_diameter_ratio": ratio between pendant length and tree diameter (largest pairwise distance). See https://github.com/lczech/gappa/wiki for a description of distal and pendant lengths. |
| | --min_placement_lwr | None | Minimum allowed placement LWR to consider a placement as valid. Values between 0 and 1. |
| | --duplicated_query_ids | None | path to text file containing duplicated query ids as output by seqkit rmdup |
| | --taxonomy_file | None | path to tsv containing taxonomy, formatted like GTDB taxopaths, for each genome ID in reference database. Defaults to None, in which case a custom GTDB taxonomy database of marine prokaryotes is used. |
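
The interplay of `--distance_measure`, `--max_placement_distance` and `--min_placement_lwr` can be pictured with a small sketch. This is not metatag's implementation: the function names, the dictionary keys and the example values are hypothetical, and treating a zero distal length as an infinite ratio is a choice made only for this sketch.

```python
def placement_distance(pendant, distal, tree_diameter, measure="pendant"):
    """Return the distance value for one placement under the chosen measure."""
    if measure == "pendant":
        return pendant
    if measure == "pendant_distal_ratio":
        # Sketch-only choice: a zero distal length yields an infinite ratio
        return pendant / distal if distal > 0 else float("inf")
    if measure == "pendant_diameter_ratio":
        return pendant / tree_diameter
    raise ValueError(f"unknown distance measure: {measure}")


def is_valid_placement(placement, tree_diameter, measure="pendant",
                       max_distance=None, min_lwr=None):
    """Apply the optional LWR and distance thresholds (None disables a check)."""
    if min_lwr is not None and placement["lwr"] < min_lwr:
        return False
    if max_distance is not None:
        dist = placement_distance(placement["pendant"], placement["distal"],
                                  tree_diameter, measure)
        if dist > max_distance:
            return False
    return True


# Made-up placements, for illustration only
placements = [
    {"name": "q1", "pendant": 0.05, "distal": 0.10, "lwr": 0.95},
    {"name": "q2", "pendant": 0.90, "distal": 0.10, "lwr": 0.99},
    {"name": "q3", "pendant": 0.02, "distal": 0.04, "lwr": 0.30},
]
tree_diameter = 2.0  # largest pairwise distance in the (hypothetical) tree

kept = [
    p["name"] for p in placements
    if is_valid_placement(p, tree_diameter, measure="pendant_diameter_ratio",
                          max_distance=0.1, min_lwr=0.5)
]
print(kept)
```

In this example q2 is dropped because its pendant length is large relative to the tree diameter, and q3 is dropped because its LWR falls below the minimum, so only q1 remains.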