Module api
Classes to facilitate usage within Python scripts / Notebooks
Build
Bases: Command
Build command object.
Source code in src/pynteny/api.py
__init__(data, prepend=False, outfile=None, logfile=None, processes=None, tempdir=None)
Translate nucleotide assembly file and assign contig and gene location info to each identified ORF (using prodigal).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Path | path to the input nucleotide assembly data. | required |
outfile | Path | path to file containing built database. Defaults to None. | None |
logfile | Path | path to log file. Defaults to None. | None |
processes | int | maximum number of processes. Defaults to all minus one. | None |
tempdir | Path | path to directory to contain temporary files. Defaults to None. | None |
Source code in src/pynteny/api.py
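The `processes` default of "all minus one" can be read as one fewer than the number of CPUs the operating system reports. The sketch below shows one way such a default could be resolved. `resolve_processes` is a hypothetical helper, not part of Pynteny, and the floor of one process is an assumption.

```python
import os
from typing import Optional


def resolve_processes(processes: Optional[int] = None) -> int:
    """Return the number of worker processes to use (hypothetical helper)."""
    if processes is None:
        # os.cpu_count() may return None on some platforms; fall back to 1
        available = os.cpu_count() or 1
        return max(1, available - 1)
    # never go below a single process (assumption)
    return max(1, processes)


print(resolve_processes())   # machine dependent
print(resolve_processes(4))
```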
run()
Run pynteny build
Source code in src/pynteny/api.py
Command
Parent class for Pynteny command
Source code in src/pynteny/api.py
cite()
staticmethod
Display Pynteny citation
Source code in src/pynteny/api.py
update(argname, value)
Update argument value in pynteny command.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
argname | str | argument name to be updated. | required |
value | str | new argument value. | required |
Source code in src/pynteny/api.py
Download
Bases: Command
Download HMM database from NCBI.
Source code in src/pynteny/api.py
__init__(outdir, logfile=None, force=False, unpack=False)
Download HMM database from NCBI.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outdir | Path | path to output directory in which to store downloaded HMMs. | required |
logfile | Path | path to log file. Defaults to None. | None |
force | bool | force-download database again if already downloaded. Defaults to False. | False |
unpack | bool | whether to unpack downloaded file. If False, then PGAP's database will be unpacked in each Pynteny session. Defaults to False. | False |
Source code in src/pynteny/api.py
run()
Run pynteny download
Source code in src/pynteny/api.py
Search
Bases: Command
Search command object.
Source code in src/pynteny/api.py
__init__(data, synteny_struc, gene_ids=False, unordered=False, reuse=False, hmm_dir=None, hmm_meta=None, outdir=None, prefix='', hmmsearch_args=None, logfile=None, processes=None)
Query sequence database for HMM hits arranged in provided synteny structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Path | path to input labelled database. | required |
synteny_struc | str | a str describing the desired synteny structure, structured as follows: '>hmm_a N_ab hmm_b N_bc <hmm_c', where N_ab corresponds to the maximum number of genes separating the gene found by hmm_a and the gene found by hmm_b, and hmm_a, hmm_b and hmm_c correspond to the names of the HMMs as provided in the keys of hmm_hits. More than two HMMs can be concatenated. Strand location may be specified by using '>' for sense and '<' for antisense. | required |
gene_ids | bool | whether gene symbols are used in synteny string instead of HMM names. Defaults to False. | False |
unordered | bool | whether the HMMs should be arranged in the exact same order displayed in the synteny_structure or in any order. If ordered, the filters would filter collinear rather than syntenic structures. Defaults to False. | False |
reuse | bool | if True then HMMER3 won't be run again for HMMs already searched in the same output directory. Defaults to False. | False |
hmm_dir | Path | path to directory containing input HMM files. Defaults to None, in which case the PGAP database is downloaded if not already present. | None |
hmm_meta | Path | path to PGAP's metadata file. Defaults to None. | None |
outdir | Path | path to output directory. Defaults to None. | None |
prefix | str | prefix of output file names. Defaults to "". | '' |
hmmsearch_args | str | additional arguments to hmmsearch or hmmscan. Each element in the list is a string with additional arguments for each input HMM (arranged in the same order); an element can also take a value of None to avoid passing additional arguments for a specific input HMM. A single string may also be passed, in which case the same additional argument is passed to hmmsearch for all input HMMs. Defaults to None. | None |
logfile | Path | path to log file. Defaults to None. | None |
processes | int | maximum number of threads to be employed. Defaults to all minus one. | None |
Source code in src/pynteny/api.py
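The layout of the synteny structure string described above can be made concrete with a small parser. This is a minimal sketch, not Pynteny's own parser: `parse_synteny_structure` and `ParsedStructure` are hypothetical names, and the "pos"/"neg" strand labels are an assumption chosen for illustration.

```python
from typing import List, NamedTuple


class ParsedStructure(NamedTuple):
    hmms: List[str]
    strands: List[str]
    max_distances: List[int]


def parse_synteny_structure(structure: str) -> ParsedStructure:
    """Split a synteny string into HMM names, strands and maximum gaps."""
    hmms, strands, distances = [], [], []
    for i, token in enumerate(structure.split()):
        if i % 2 == 0:
            # even positions hold HMM names, optionally prefixed by a strand
            if token[0] in "<>":
                strands.append("pos" if token[0] == ">" else "neg")
                token = token[1:]
            else:
                strands.append("")  # strand left unspecified
            hmms.append(token)
        else:
            # odd positions hold the maximum number of genes in between
            distances.append(int(token))
    if len(distances) != len(hmms) - 1:
        raise ValueError("expected one distance between each pair of HMMs")
    return ParsedStructure(hmms, strands, distances)


parsed = parse_synteny_structure(">hmm_a 2 hmm_b 0 <hmm_c")
print(parsed.hmms, parsed.strands, parsed.max_distances)
```

Here `hmm_b` carries no strand symbol, so its strand is left empty and would match either orientation in a downstream check.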
parse_genes(synteny_struc)
Parse gene IDs in synteny structure and find corresponding HMM names in provided HMM database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synteny_struc | str | a str describing the desired synteny structure, structured as follows: '>hmm_a N_ab hmm_b N_bc <hmm_c', where N_ab corresponds to the maximum number of genes separating the gene found by hmm_a and the gene found by hmm_b, and hmm_a, hmm_b and hmm_c correspond to the names of the HMMs as provided in the keys of hmm_hits. More than two HMMs can be concatenated. Strand location may be specified by using '>' for sense and '<' for antisense. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | parsed synteny structure in which gene symbols are replaced by corresponding HMM names. |
Source code in src/pynteny/api.py
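The replacement of gene symbols by HMM names can be pictured as a token-wise lookup that leaves strand symbols and distances untouched. The sketch below assumes a plain dictionary from gene symbol to HMM name. `replace_gene_symbols`, the symbols `geneA`/`geneB` and the names `HMM_0001`/`HMM_0002` are all made up for illustration and do not come from the PGAP metadata.

```python
from typing import Dict


def replace_gene_symbols(structure: str, symbol_to_hmm: Dict[str, str]) -> str:
    """Swap gene symbols for HMM names, keeping strands and distances."""
    parsed = []
    for token in structure.split():
        if token.isdigit():
            parsed.append(token)  # maximum distance between neighbours
            continue
        strand = token[0] if token[0] in "<>" else ""
        symbol = token[len(strand):]
        if symbol not in symbol_to_hmm:
            raise KeyError(f"no HMM found for gene symbol {symbol!r}")
        parsed.append(strand + symbol_to_hmm[symbol])
    return " ".join(parsed)


lookup = {"geneA": "HMM_0001", "geneB": "HMM_0002"}  # hypothetical mapping
print(replace_gene_symbols(">geneA 0 <geneB", lookup))
```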
run()
Run pynteny search
Source code in src/pynteny/api.py
Module filter
Tools to filter HMM hits by synteny structure
SyntenyHMMfilter
Tools to search for synteny structures among sets of hmm models
Source code in src/pynteny/filter.py
__init__(hmm_hits, synteny_structure, unordered=True)
Search for contigs that satisfy the given gene synteny structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hmm_hits | dict | a dict of pandas DataFrames, as output by parseHMMsearchOutput, with keys corresponding to HMM names. | required |
synteny_structure | str | a str describing the desired synteny structure, structured as follows: '>hmm_a N_ab hmm_b N_bc <hmm_c', where N_ab corresponds to the maximum number of genes separating the gene found by hmm_a and the gene found by hmm_b, and hmm_a, hmm_b and hmm_c correspond to the names of the HMMs as provided in the keys of hmm_hits. More than two HMMs can be concatenated. Strand location may be specified by using '>' for sense and '<' for antisense. | required |
unordered | bool | whether the HMMs should be arranged in the exact same order displayed in the synteny_structure or in any order. If ordered, the filters would filter collinear rather than syntenic structures. Defaults to True. | True |
Source code in src/pynteny/filter.py
filter_hits_by_synteny_structure()
Search for contigs that satisfy the given gene synteny structure.
Returns:
Name | Type | Description |
---|---|---|
dict | dict | HMMER3 hits separated by contig. |
Source code in src/pynteny/filter.py
get_all_HMM_hits()
Group and preprocess all hit labels into a single dataframe.
Returns:
Type | Description |
---|---|
pd.DataFrame | HMMER3 hit labels matching provided HMMs. |
Source code in src/pynteny/filter.py
SyntenyHits
Store and manipulate synteny hits by contig
Source code in src/pynteny/filter.py
hits: pd.DataFrame
property
Return synteny hits.
Returns:
Type | Description |
---|---|
pd.DataFrame | Synteny hits as dataframe. |
__init__(synteny_hits)
Initialize from synteny hits object.
Source code in src/pynteny/filter.py
add_HMM_meta_info_to_hits(hmm_meta)
Add molecular metadata to synteny hits.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hmm_meta | Path | path to PGAP metadata file. | required |
Returns:
Name | Type | Description |
---|---|---|
SyntenyHits | SyntenyHits | an instance of class SyntenyHits. |
Source code in src/pynteny/filter.py
from_hits_dict(hits_by_contig)
classmethod
Initialize SyntenyHits object from hits_by_contig dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hits_by_contig | dict | HMMER3 hit labels separated by contig name. | required |
Returns:
Name | Type | Description |
---|---|---|
SyntenyHits | SyntenyHits | initialized object of class SyntenyHits. |
Source code in src/pynteny/filter.py
write_hit_sequences_to_FASTA_files(sequence_database, output_dir, output_prefix=None)
Write matching sequences to FASTA files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sequence_database | Path | path to the peptide or nucleotide sequence database in which the synteny search was conducted. | required |
output_dir | Path | path to output directory. | required |
output_prefix | str | prefix for output files. Defaults to None. | None |
Source code in src/pynteny/filter.py
write_to_TSV(output_tsv)
Write synteny hits to a TSV file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_tsv | Path | path to output tsv file. | required |
Source code in src/pynteny/filter.py
SyntenyPatternFilters
Methods to filter HMM hits in the same contig by synteny structure or collinearity. These filters are inputs to the pandas.DataFrame.rolling method.
Source code in src/pynteny/filter.py
__init__(synteny_structure, unordered=False)
Initialize filter class from synteny structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synteny_structure | str | a str describing the desired synteny structure, structured as follows: '>hmm_a N_ab hmm_b N_bc <hmm_c', where N_ab corresponds to the maximum number of genes separating the gene found by hmm_a and the gene found by hmm_b, and hmm_a, hmm_b and hmm_c correspond to the names of the HMMs as provided in the keys of hmm_hits. More than two HMMs can be concatenated. Strand location may be specified by using '>' for sense and '<' for antisense. | required |
unordered | bool | whether the HMMs should be arranged in the exact same order displayed in the synteny_structure or in any order. If ordered, the filters would filter collinear rather than syntenic structures. Defaults to False. | False |
Source code in src/pynteny/filter.py
contains_distance_pattern(data)
Check if series items satisfy the maximum distance between HMM hits.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | pd.Series | a series resulting from calling rolling on a pandas column. | required |
Returns:
Name | Type | Description |
---|---|---|
int | int | 1 for True, 0 for False. |
Source code in src/pynteny/filter.py
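The maximum-distance check can be illustrated on a single window without pandas. The sketch below assumes the window holds the gene positions of consecutive hits in ascending order, and that each maximum counts the genes allowed between two neighbouring hits. `window_satisfies_distances` is a hypothetical stand-in, not the library method, and the way Pynteny encodes distances internally may differ.

```python
from typing import Sequence


def window_satisfies_distances(
    gene_positions: Sequence[int], max_separations: Sequence[int]
) -> int:
    """Return 1 if consecutive hits are close enough, else 0."""
    # number of genes lying strictly between each pair of consecutive hits
    gaps = [b - a - 1 for a, b in zip(gene_positions, gene_positions[1:])]
    if len(gaps) != len(max_separations):
        return 0
    close_enough = all(
        0 <= gap <= max_gap for gap, max_gap in zip(gaps, max_separations)
    )
    return 1 if close_enough else 0


# hits at gene positions 3, 5 and 6: one gene between the first pair, none after
print(window_satisfies_distances([3, 5, 6], [2, 0]))
print(window_satisfies_distances([3, 5, 6], [0, 0]))
```

Returning an integer rather than a bool mirrors the documented return type, which suits functions applied over rolling windows that expect numeric results.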
contains_hmm_pattern(data)
Check if series items contain a profile HMM
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | pd.Series | a series resulting from calling rolling on a pandas column. | required |
Returns:
Name | Type | Description |
---|---|---|
int | int | 1 for True, 0 for False. |
Source code in src/pynteny/filter.py
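How an ordered check differs from an unordered one is easiest to see by sliding a fixed-size window along the hits of one contig. This is a plain-Python sketch under the assumption that hits are already sorted by gene position. `window_matches_hmms` and `find_matching_windows` are hypothetical names, and the real filters operate on pandas rolling windows instead.

```python
from typing import List, Sequence


def window_matches_hmms(
    window: Sequence[str], target: Sequence[str], unordered: bool = False
) -> int:
    """Return 1 if the window holds the target HMMs, else 0."""
    if unordered:
        matched = sorted(window) == sorted(target)  # same HMMs, any order
    else:
        matched = list(window) == list(target)  # same HMMs, same order
    return 1 if matched else 0


def find_matching_windows(
    contig_hmms: Sequence[str], target: Sequence[str], unordered: bool = False
) -> List[int]:
    """Return the start index of every window that matches the target."""
    size = len(target)
    return [
        start
        for start in range(len(contig_hmms) - size + 1)
        if window_matches_hmms(contig_hmms[start:start + size], target, unordered)
    ]


hits = ["x", "a", "b", "c", "b", "a"]
print(find_matching_windows(hits, ["a", "b"]))
print(find_matching_windows(hits, ["a", "b"], unordered=True))
```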
contains_strand_pattern(data)
Check if series items satisfy the strand pattern between HMM hits.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | pd.Series | a series resulting from calling rolling on a pandas column. | required |
Returns:
Name | Type | Description |
---|---|---|
int | int | 1 for True, 0 for False. |
Source code in src/pynteny/filter.py
filter_FASTA_by_synteny_structure(synteny_structure, input_fasta, input_hmms, unordered=False, hmm_meta=None, hmmer_output_dir=None, reuse_hmmer_results=True, method='hmmsearch', processes=None, additional_args=None)
Generate protein-specific database by filtering sequence database to only contain sequences which satisfy the provided (gene/hmm) structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synteny_structure | str | a str describing the desired synteny structure, structured as follows: '>hmm_a N_ab hmm_b N_bc <hmm_c', where N_ab corresponds to the maximum number of genes separating the gene found by hmm_a and the gene found by hmm_b, and hmm_a, hmm_b and hmm_c correspond to the names of the HMMs as provided in the keys of hmm_hits. More than two HMMs can be concatenated. Strand location may be specified by using '>' for sense and '<' for antisense. | required |
input_fasta | Path | input fasta containing sequence database to be searched. | required |
input_hmms | list[Path] | list containing paths to HMMs contained in synteny structure. | required |
unordered | bool | whether HMM hits should follow the exact order displayed in the synteny structure string or not, i.e., whether to search for only synteny (colocation) or collinearity as well (same order). Defaults to False. | False |
hmm_meta | Path | path to PGAP's metadata file. Defaults to None. | None |
hmmer_output_dir | Path | output directory to store HMMER3 output files. Defaults to None. | None |
reuse_hmmer_results | bool | if True then HMMER3 won't be run again for HMMs already searched in the same output directory. Defaults to True. | True |
method | str | select between 'hmmsearch' or 'hmmscan'. Defaults to 'hmmsearch'. | 'hmmsearch' |
processes | int | maximum number of threads to be employed. Defaults to all minus one. | None |
additional_args | list[str] | additional arguments to hmmsearch or hmmscan. Each element in the list is a string with additional arguments for each input HMM (arranged in the same order); an element can also take a value of None to avoid passing additional arguments for a specific input HMM. A single string may also be passed, in which case the same additional argument is passed to hmmsearch for all input HMMs. Defaults to None. | None |
Returns:
Name | Type | Description |
---|---|---|
SyntenyHits | SyntenyHits | object of class SyntenyHits containing labels matching synteny structure. |
Source code in src/pynteny/filter.py
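The three accepted shapes of `additional_args` (None, a single string, or one entry per HMM) can be reduced to a single per-HMM list before any search is launched. The following is a sketch of that normalization only. `normalize_additional_args` is a hypothetical helper, and raising on a length mismatch is an assumption about reasonable behaviour rather than documented behaviour.

```python
from typing import List, Optional, Union


def normalize_additional_args(
    additional_args: Union[None, str, List[Optional[str]]], n_hmms: int
) -> List[Optional[str]]:
    """Return one argument string (or None) per input HMM."""
    if additional_args is None:
        return [None] * n_hmms  # no extra arguments for any HMM
    if isinstance(additional_args, str):
        return [additional_args] * n_hmms  # same arguments for every HMM
    if len(additional_args) != n_hmms:
        raise ValueError("provide exactly one entry per input HMM")
    return list(additional_args)


print(normalize_additional_args("--cut_ga", 3))
print(normalize_additional_args(["--cut_ga", None], 2))
```

Normalizing up front keeps the code that builds each hmmsearch call free of type checks.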
Module preprocessing
Tools to preprocess sequence databases
- Remove illegal characters from peptide sequences
- Remove illegal symbols from file paths
- Relabel fasta records and make dictionary with old labels
Database
Sequence database constructor
Source code in src/pynteny/preprocessing.py
__init__(data)
Initialize Database object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Path | path to either assembly fasta file (or a directory containing assembly fasta files) or a genbank file containing ORF annotations (or a directory containing genbank files). | required |
Raises:
Type | Description |
---|---|
FileNotFoundError | if file or directory doesn't exist. |
Source code in src/pynteny/preprocessing.py
build(seq_prefix=None, prepend_file_name=False, output_file=None, processes=None, tempdir=None)
Build database from data files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
seq_prefix | str, optional | prefix to be added to each sequence in database. Defaults to None. | None |
prepend_file_name | bool | whether to add file name as genome ID to each record in the resulting merged fasta file. Defaults to False. | False |
output_file | Path | path to output file. Defaults to None. | None |
processes | int | maximum number of threads. Defaults to all minus one. | None |
tempdir | Path | path to temporary directory. Defaults to tempfile default. | None |
Returns:
Name | Type | Description |
---|---|---|
LabelledFASTA | LabelledFASTA | object containing the labelled peptide database. |
Source code in src/pynteny/preprocessing.py
is_fasta(filename)
staticmethod
Check if file is in fasta format
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | Path | path to input file. | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | whether the file is in fasta format. |
Source code in src/pynteny/preprocessing.py
is_gbk(filename)
staticmethod
Check if file is in genbank format
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | Path | path to input file. | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | whether the file is in genbank format. |
Source code in src/pynteny/preprocessing.py
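Format checks such as `is_fasta` and `is_gbk` can be approximated by looking at the first non-blank line of a file: FASTA records open with `>` and GenBank records open with `LOCUS`. The sketch below is a lightweight heuristic under that assumption, not the library's implementation, which may rely on a full parser. The function names are hypothetical.

```python
from pathlib import Path


def first_content_line(filename: Path) -> str:
    """Return the first non-blank line of a text file, or '' if none."""
    with open(filename, errors="replace") as handle:
        for line in handle:
            if line.strip():
                return line
    return ""


def looks_like_fasta(filename: Path) -> bool:
    """Heuristic: a FASTA file starts with a '>' header line."""
    path = Path(filename)
    return path.is_file() and first_content_line(path).startswith(">")


def looks_like_genbank(filename: Path) -> bool:
    """Heuristic: a GenBank record starts with a 'LOCUS' line."""
    path = Path(filename)
    return path.is_file() and first_content_line(path).startswith("LOCUS")
```

A first-line heuristic is cheap but does not validate the rest of the file, so a truncated or malformed record would still pass.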
FASTA
Handle and process fasta files.
Source code in src/pynteny/preprocessing.py
file_path: Path
property
writable
Set new path to fasta file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_path | Path | path to fasta file. Defaults to None. | required |
Returns:
Name | Type | Description |
---|---|---|
Path | Path | path to fasta file. |
__init__(input_file)
Initialize FASTA object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_file | Path | path to input fasta file. | required |
Source code in src/pynteny/preprocessing.py
add_prefix_to_records(prefix, output_file=None, point_to_new_file=True)
Add prefix to sequence records in FASTA
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix | str | prefix to be added. | required |
output_file | Path | path to output filtered fasta file. Defaults to None. | None |
point_to_new_file | bool | whether FASTA object should point to the newly generated file. Defaults to True. | True |
Source code in src/pynteny/preprocessing.py
filter_by_IDs(record_ids, output_file=None, point_to_new_file=True)
Filter records in fasta file matching provided IDs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
record_ids | list | list of record IDs to keep from the original fasta file. | required |
output_file | Path | path to output filtered fasta file. Defaults to None. | None |
point_to_new_file | bool | whether FASTA object should point to the newly generated file. Defaults to True. | True |
Source code in src/pynteny/preprocessing.py
filter_by_minimum_length(min_length, output_file=None, point_to_new_file=True)
Filter records in fasta file by minimum length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
min_length | int | minimal length of sequences to be kept in filtered fasta file. | required |
output_file | Path | path to output filtered fasta file. Defaults to None. | None |
point_to_new_file | bool | whether FASTA object should point to the newly generated file. Defaults to True. | True |
Source code in src/pynteny/preprocessing.py
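Filtering by minimum length amounts to reading records one by one and keeping those whose sequence is long enough. The sketch below uses a small standard-library FASTA reader so that it runs without extra dependencies. `read_fasta` and `keep_long_records` are hypothetical helpers, and whether the library's threshold is inclusive is an assumption here.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def keep_long_records(
    records: Iterable[Tuple[str, str]], min_length: int
) -> List[Tuple[str, str]]:
    """Keep records whose sequence has at least min_length symbols."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


fasta_text = ">a\nMKV\n>b\nMKVLAAGI\nLL\n"
print(keep_long_records(read_fasta(fasta_text.splitlines()), min_length=5))
```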
from_FASTA_directory(input_dir, merged_fasta=None, prepend_file_name=False)
classmethod
Initialize FASTA class from directory of fasta files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_dir | Path | path to input directory. | required |
merged_fasta | Path | path to output merged fasta. Defaults to None. | None |
prepend_file_name | bool | whether to add file name as genome ID to each record in the resulting merged fasta file. Defaults to False. | False |
Returns:
Name | Type | Description |
---|---|---|
FASTA | FASTA | an initialized instance of class FASTA. |
Source code in src/pynteny/preprocessing.py
remove_corrupted_sequences(output_file=None, is_peptide=True, keep_stop_codon=False, point_to_new_file=True)
Filter out (DNA or peptide) sequences containing illegal characters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_file | Path | path to output fasta file. Defaults to None. | None |
is_peptide | bool | select if input is a peptide sequence, otherwise taken as nucleotide. Defaults to True. | True |
keep_stop_codon | bool | whether to keep the stop codon in the peptide sequence. Defaults to False. | False |
point_to_new_file | bool | whether FASTA object should point to the newly generated file. Defaults to True. | True |
Source code in src/pynteny/preprocessing.py
remove_duplicates(output_file=None, export_duplicates=False, point_to_new_file=True)
Removes duplicate entries (either by sequence or ID) from fasta.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_file | Path | path to output fasta file. Defaults to None. | None |
export_duplicates | bool | whether duplicated records are exported to a file. Defaults to False. | False |
point_to_new_file | bool | whether FASTA object should point to the newly generated file. Defaults to True. | True |
Yields:
Name | Type | Description |
---|---|---|
None | None | None |
Source code in src/pynteny/preprocessing.py
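Duplicate removal "either by sequence or ID" can be sketched with two sets that remember what has already been kept. This is an illustration on in-memory records rather than the library's file-based method: `split_duplicates` is a hypothetical name, and keeping the first occurrence is an assumption.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (record ID, sequence)


def split_duplicates(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate records into first occurrences and duplicates."""
    seen_ids, seen_seqs = set(), set()
    unique, duplicates = [], []
    for record_id, seq in records:
        if record_id in seen_ids or seq in seen_seqs:
            duplicates.append((record_id, seq))  # repeated ID or sequence
        else:
            seen_ids.add(record_id)
            seen_seqs.add(seq)
            unique.append((record_id, seq))
    return unique, duplicates


records = [("a", "MK"), ("b", "MK"), ("a", "LL"), ("c", "VV")]
unique, duplicates = split_duplicates(records)
print(unique)
print(duplicates)
```

Returning the duplicates alongside the unique records makes an option like `export_duplicates` a matter of writing the second list to a file.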
split_by_contigs(output_dir=None)
Split large fasta file into several ones containing one contig each.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_dir | Path | path to output directory. Defaults to None. | None |
Source code in src/pynteny/preprocessing.py
FASTAmerger
Source code in src/pynteny/preprocessing.py
merge(output_file=None, prepend_file_name=False)
Merge input fasta files into a single (multi)fasta file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_file | Path | path to output merged fasta file. Defaults to None. | None |
prepend_file_name | bool | whether to add file name as genome ID to each record in the resulting merged fasta file. Defaults to False. | False |
Source code in src/pynteny/preprocessing.py
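Merging with `prepend_file_name` enabled can be pictured as concatenating the input files while rewriting each header line. The sketch below makes several assumptions: inputs are found by a `*.fasta` glob, the file stem is joined to the record name with an underscore, and `merge_fastas` is a hypothetical function rather than the library method.

```python
from pathlib import Path


def merge_fastas(
    input_dir: Path, output_file: Path, prepend_file_name: bool = False
) -> None:
    """Concatenate all *.fasta files in input_dir into output_file."""
    output_file = Path(output_file)
    # list inputs first so the output is never merged into itself
    inputs = sorted(
        path
        for path in Path(input_dir).glob("*.fasta")
        if path.resolve() != output_file.resolve()
    )
    with open(output_file, "w") as out:
        for fasta in inputs:
            with open(fasta) as handle:
                for line in handle:
                    if prepend_file_name and line.startswith(">"):
                        # use the file stem as a genome ID (assumed separator "_")
                        line = f">{fasta.stem}_{line[1:]}"
                    out.write(line if line.endswith("\n") else line + "\n")
```

Prefixing headers with the file name keeps records from different genomes distinguishable when contig names such as `contig_1` repeat across files.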
prepend_filename_to_record_names(output_dir)
Prepend file name to each record label in fasta file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_dir | Path | path to output directory. | required |
Source code in src/pynteny/preprocessing.py
GeneAnnotator
Run prodigal on assembly, predict ORFs and extract location info
Source code in src/pynteny/preprocessing.py
__init__(assembly)
Initialize GeneAnnotator
Parameters:
Name | Type | Description | Default |
---|---|---|---|
assembly | FASTA | fasta object containing assembled nucleotide sequences. | required |
Source code in src/pynteny/preprocessing.py
annotate(processes=None, metagenome=True, output_file=None, prodigal_args=None, tempdir=None)
Run prodigal on assembly and export a single fasta file with peptide ORF predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
processes | int | maximum number of threads. Defaults to all minus one. | None |
metagenome | bool | whether assembled sequences correspond to metagenomic data. Defaults to True. | True |
output_file | Path | path to output fasta file. Defaults to None. | None |
prodigal_args | str | additional arguments to be passed to prodigal CLI. Defaults to None. | None |
tempdir | Path | path to temporary directory. Defaults to tempfile default. | None |
Returns:
Name | Type | Description |
---|---|---|
LabelledFASTA | LabelledFASTA | object containing the labelled peptide database. |
Source code in src/pynteny/preprocessing.py
LabelledFASTA
Bases: FASTA
Tools to add and parse FASTA with positional info on record tags
Source code in src/pynteny/preprocessing.py
from_genbank(gbk_data, output_file=None, prefix=None, nucleotide=False, prepend_file_name=False)
classmethod
Assign gene positional info, such as contig, gene number and loci to each record in genbank database and return LabelledFASTA object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gbk_data |
Path
|
path to file or directory contanining genbank files |
required |
output_file |
Path
|
path to output labelled fasta file. Defaults to None. |
None
|
prefix |
str
|
prefix for output file. Defaults to None. |
None
|
nucleotide |
bool
|
whether records corresponds to nucleotide sequences instead of peptides. Defaults to False. |
False
|
prepend_file_name |
bool
|
whether to add the file name as genome ID to each record in the resulting merged fasta file. Defaults to False. |
False
|
Returns:
Name | Type | Description |
---|---|---|
LabelledFASTA |
LabelledFASTA
|
object containing the labelled peptide database. |
Source code in src/pynteny/preprocessing.py
from_prodigal_output(prodigal_faa, output_file=None)
classmethod
Instantiate class from prodigal output file. Extract positional gene info from prodigal output and export to fasta file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prodigal_faa |
Path
|
path to prodigal output file containing peptide sequences |
required |
output_file |
Path
|
path to output labelled fasta file. Defaults to None. |
None
|
Returns:
Name | Type | Description |
---|---|---|
LabelledFASTA |
LabelledFASTA
|
object containing the labelled peptide database. |
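As a rough illustration of the positional information carried by a Prodigal peptide FASTA header, the sketch below splits one header into its fields. It assumes Prodigal's usual `>contig_N # start # end # strand # attributes` header layout; the function name `parse_prodigal_header` and the returned keys are hypothetical and are not part of Pynteny.

```python
def parse_prodigal_header(header: str) -> dict:
    """Split a Prodigal FASTA header into positional fields (illustrative only)."""
    fields = [field.strip() for field in header.lstrip(">").split("#")]
    orf_id, start, end, strand = fields[:4]
    # Prodigal names each ORF "<contig>_<gene number>"
    contig, gene_number = orf_id.rsplit("_", 1)
    return {
        "contig": contig,
        "gene_number": int(gene_number),
        "start": int(start),
        "end": int(end),
        # Same 'pos' / 'neg' convention used elsewhere in this reference
        "strand": "pos" if strand == "1" else "neg",
    }


info = parse_prodigal_header(">contig_7_3 # 1200 # 1850 # -1 # ID=7_3;partial=00")
```

The header string above is made up for the example; real headers carry more attributes after the last `#`.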
Source code in src/pynteny/preprocessing.py
is_legit_DNA_sequence(record_seq)
Assert that DNA sequence only contains valid symbols.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
record_seq |
str
|
nucleotide sequence. |
required |
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
whether nucleotide sequence only contains legit symbols. |
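A check of this kind can be sketched as a set comparison against an allowed alphabet. The alphabet below (four bases plus `N`) is an assumption made for the example, and `looks_like_dna` is a hypothetical name; the symbols actually accepted by Pynteny may differ.

```python
# Assumed alphabet: the four bases plus N for unknown positions (hypothetical choice).
DNA_SYMBOLS = set("ACGTN")


def looks_like_dna(record_seq: str) -> bool:
    """Return True if the sequence is non-empty and uses only symbols in DNA_SYMBOLS."""
    return len(record_seq) > 0 and set(record_seq.upper()) <= DNA_SYMBOLS
```

Upper-casing first means soft-masked (lowercase) sequences pass the check; drop that call if case should matter.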
Source code in src/pynteny/preprocessing.py
is_legit_peptide_sequence(record_seq)
Assert that peptide sequence only contains valid symbols.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
record_seq |
str
|
peptide sequence. |
required |
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
whether peptide sequence only contains legit symbols. |
Source code in src/pynteny/preprocessing.py
remove_stop_sodon_signals(record_seq)
Remove stop codon signals from peptide sequence
Parameters:
Name | Type | Description | Default |
---|---|---|---|
record_seq |
str
|
peptide sequence. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
a peptide sequence without stop codon symbols. |
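The operation amounts to deleting the stop symbol from the string. The sketch below assumes that stop codons are written as `*`, as in most translated FASTA files; `strip_stop_symbols` is a hypothetical name, not the Pynteny function.

```python
def strip_stop_symbols(record_seq: str, stop_symbol: str = "*") -> str:
    """Return the peptide sequence with every stop symbol removed."""
    return record_seq.replace(stop_symbol, "")


cleaned = strip_stop_symbols("MKT*LLV*")
```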
Source code in src/pynteny/preprocessing.py
Module hmm
Tools to parse Hmmer output and PGAP (HMM) database
HMMER
Run Hmmer on multiple hmms and parse output
Source code in src/pynteny/hmm.py
hmm_names: list[str]
property
Get file names of input HMMs
Returns:
Type | Description |
---|---|
list[str]
|
list[str]: list of file names. |
__init__(input_hmms, hmm_output_dir, input_data, additional_args=None, processes=None)
Initialize class HMMER
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_hmms |
list[Path]
|
list of paths to input HMM files. |
required |
hmm_output_dir |
Path
|
path to output directory for HMMER output files. |
required |
input_data |
Path
|
path to input fasta file with sequence database. |
required |
additional_args |
list[str]
|
additional arguments to hmmsearch or hmmscan. Each element in the list is a string with additional arguments for each input hmm (arranged in the same order), an element can also take a value of None to avoid passing additional arguments for a specific input hmm. A single string may also be passed, in which case the same additional argument is passed to hmmsearch for all input hmms. Defaults to None. |
None
|
processes |
int
|
maximum number of threads to be employed. Defaults to all minus one. |
None
|
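The three accepted shapes of `additional_args` (None, a single string, or one entry per HMM) can be reduced to a single per-HMM list. The helper below is a minimal sketch of that normalization under the rules described in the table above; `expand_additional_args` is a hypothetical name and the length check is an assumption about reasonable behavior, not a description of the class internals.

```python
from typing import List, Optional, Union


def expand_additional_args(
    additional_args: Union[None, str, List[Optional[str]]], n_hmms: int
) -> List[Optional[str]]:
    """Return one additional-arguments entry (or None) per input HMM."""
    if additional_args is None:
        return [None] * n_hmms
    if isinstance(additional_args, str):
        # A single string is applied to every input HMM
        return [additional_args] * n_hmms
    if len(additional_args) != n_hmms:
        raise ValueError("Provide one additional_args entry per input HMM")
    return list(additional_args)


per_hmm = expand_additional_args("--cut_ga", n_hmms=3)
```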
Source code in src/pynteny/hmm.py
get_HMMER_tables(reuse_hmmer_results=True, method='hmmsearch')
Run hmmer for given hmm list
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reuse_hmmer_results |
bool
|
if True then HMMER3 won't be run again for HMMs already searched in the same output directory. Defaults to True. |
True
|
method |
str
|
select between 'hmmsearch' or 'hmmscan'. Defaults to 'hmmsearch'. |
'hmmsearch'
|
Returns:
Type | Description |
---|---|
dict[pd.DataFrame]
|
dict[pd.DataFrame]: dict of HMMER hits as pandas dataframes. |
Source code in src/pynteny/hmm.py
parse_HMM_search_output(hmmer_output)
staticmethod
Parse hmmsearch or hmmscan summary table output file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hmmer_output |
str
|
path to HMMER output file. |
required |
Returns:
Type | Description |
---|---|
pd.DataFrame
|
pd.DataFrame: a dataframe containing parsed HMMER output. |
Source code in src/pynteny/hmm.py
PGAP
Tools to parse PGAP hmm database metadata
Source code in src/pynteny/hmm.py
__init__(meta_file)
Initialize class PGAP
Parameters:
Name | Type | Description | Default |
---|---|---|---|
meta_file |
Path
|
path to PGAP's metadata file. |
required |
Source code in src/pynteny/hmm.py
extract_PGAP_to_directory(pgap_tar, output_dir)
staticmethod
Extract PGAP hmm database (tar.gz) to given directory
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pgap_tar |
Path
|
path to compressed PGAP database. |
required |
output_dir |
Path
|
path to output directory. |
required |
Source code in src/pynteny/hmm.py
get_HMM_gene_ID(hmm_name)
Get gene symbols matching given hmm.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hmm_name |
str
|
query HMM name. |
required |
Returns:
Type | Description |
---|---|
list[str]
|
list[str]: list of gene symbols matching given HMM. |
Source code in src/pynteny/hmm.py
get_HMM_group_for_gene_symbol(gene_symbol)
Get HMMs corresponding to a gene symbol in PGAP metadata. If more than one HMM matches the gene symbol, return an HMM group.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene_symbol |
str
|
gene symbol to be searched for HMM. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
string of HMM names (group separated by |) |
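The grouping idea can be illustrated with a plain dictionary standing in for the PGAP metadata. Everything below is hypothetical: the mapping, the HMM names, and the helper `hmm_group_for_gene` are invented for the example and do not reflect the real metadata layout.

```python
# Hypothetical stand-in for PGAP metadata: HMM name -> gene symbol
hmm_to_gene = {"HMM_A": "geneX", "HMM_B": "geneX", "HMM_C": "geneY"}


def hmm_group_for_gene(gene_symbol: str, metadata: dict) -> str:
    """Join all HMM names annotated with gene_symbol into one '|'-separated group."""
    names = sorted(name for name, gene in metadata.items() if gene == gene_symbol)
    return "|".join(names)


group = hmm_group_for_gene("geneX", hmm_to_gene)
```

A gene symbol with a single HMM yields just that name, and an unknown symbol yields an empty string in this sketch.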
Source code in src/pynteny/hmm.py
get_HMM_names_by_gene_symbol(gene_symbol)
Try to retrieve HMMs by gene symbol; more than one HMM may map to a single gene symbol.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gene_symbol |
str
|
gene symbol to be searched for HMM. |
required |
Returns:
Type | Description |
---|---|
list[str]
|
list[str]: list of HMM names matching gene symbol. |
Source code in src/pynteny/hmm.py
get_meta_info_for_HMM(hmm_name)
Get meta info for given hmm.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hmm_name |
str
|
query HMM name. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict
|
metadata of provided HMM. |
Source code in src/pynteny/hmm.py
remove_missing_HMMs_from_metadata(hmm_dir, output_file=None)
Remove HMMs from metadata that are not in HMM directory
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hmm_dir |
Path
|
path to directory containing PGAP database. |
required |
output_file |
Path
|
path to output file. Defaults to None. |
None
|
Source code in src/pynteny/hmm.py
Module labelparser
Tools to parse record labels to extract coded info
parse(label)
Parse sequence labels to obtain contig and locus info
Parameters:
Name | Type | Description | Default |
---|---|---|---|
label |
str
|
sequence label |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict
|
dictionary with parsed label info. |
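To make the idea of a label with coded positional info concrete, the sketch below parses a label under an assumed layout, `<gene_id>__<contig>_<gene_pos>_<start>_<end>_<strand>`. That layout, the dictionary keys, and the name `parse_label` are assumptions made for this example; check the source file listed below for the format actually used.

```python
def parse_label(label: str) -> dict:
    """Parse a label of the assumed form gene_id__contig_genepos_start_end_strand."""
    gene_id, positional = label.split("__", 1)
    # Split from the right so that underscores inside the contig name are preserved
    contig, gene_pos, start, end, strand = positional.rsplit("_", 4)
    return {
        "gene_id": gene_id,
        "contig": contig,
        "gene_pos": int(gene_pos),
        "locus_pos": (int(start), int(end)),
        "strand": strand,
    }


parsed = parse_label("gene42__contig_1_7_100_400_pos")
```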
Source code in src/pynteny/parsers/labelparser.py
parse_from_list(labels)
Parse labels in list of labels and return DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
labels |
list
|
list of labels as strings. |
required |
Returns:
Type | Description |
---|---|
pd.DataFrame
|
pd.DataFrame: Dataframe containing parsed information from labels. |
Source code in src/pynteny/parsers/labelparser.py
Module syntenyparser
Tools to parse synteny structure strings
contains_HMM_groups(synteny_structure)
Check whether structure contains groups of gene-equivalent HMMs.
Source code in src/pynteny/parsers/syntenyparser.py
get_HMM_groups_in_structure(synteny_structure)
Get hmm names employed in synteny structure, if more than one hmm for the same gene, return a list with all of them.
Source code in src/pynteny/parsers/syntenyparser.py
get_all_HMMs_in_structure(synteny_structure)
Get hmm names employed in synteny structure, if more than one hmm for the same gene, return a list with all of them.
Source code in src/pynteny/parsers/syntenyparser.py
get_gene_symbols_in_structure(synteny_structure)
Retrieve gene symbols contained in synteny structure.
Source code in src/pynteny/parsers/syntenyparser.py
get_maximum_distances_in_structure(synteny_structure)
Get maximum gene distances in synteny structure.
Source code in src/pynteny/parsers/syntenyparser.py
get_strands_in_structure(synteny_structure, parsed_symbol=True)
Get strand sense list in structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synteny_structure |
str
|
a synteny structure. |
required |
parsed_symbol |
bool
|
if True, strand info '>' is parsed as 'pos' and '<' as 'neg'. Defaults to True. |
True
|
Returns:
Type | Description |
---|---|
list[str]
|
list[str]: parsed synteny structure as a list of tuples containing HMM name and strand info for each HMM group. |
Source code in src/pynteny/parsers/syntenyparser.py
is_valid_structure(synteny_structure)
Validate synteny structure format.
Source code in src/pynteny/parsers/syntenyparser.py
parse_genes_in_synteny_structure(synteny_structure, hmm_meta)
Convert gene-based synteny structure into a HMM-based one. If a gene symbol matches more than one HMM, return a HMM group like: (HMM1 | HMM2 | ...).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synteny_structure |
str
|
a string like the following: '>hmm_a n_ab <hmm_b n_bc hmm_c',
where '>' indicates an HMM target located on the positive strand, '<' a target located on the negative strand, and n_ab corresponds to the maximum number of genes separating matched genes a and b. Multiple HMMs may be employed (limited by computational capabilities). An HMM without a strand symbol indicates that results should be independent of strand location. |
required |
hmm_meta |
Path
|
path to PGAP's metadata file. |
required |
Returns:
Type | Description |
---|---|
tuple[str, dict]
|
tuple[str,dict]: parsed synteny structure where gene symbols are replaced by HMM names. |
Source code in src/pynteny/parsers/syntenyparser.py
parse_synteny_structure(synteny_structure)
Parse synteny structure string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
synteny_structure |
str
|
a string like the following: '>hmm_a n_ab <hmm_b n_bc hmm_c',
where '>' indicates an HMM target located on the positive strand, '<' a target located on the negative strand, and n_ab corresponds to the maximum number of genes separating matched genes a and b. Multiple HMMs may be employed (limited by computational capabilities). An HMM without a strand symbol indicates that results should be independent of strand location. |
required |
Returns:
Name | Type | Description |
---|---|---|
dict |
dict
|
parsed synteny structure. |
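The parameter description above implies a string in which HMM tokens and maximum distances alternate. A minimal sketch of such a parser follows, assuming whitespace-separated tokens, an optional leading '>' or '<' on each HMM token, and integer distances. The function name `parse_structure` and the keys of the returned dictionary are hypothetical and may not match what the Pynteny function returns.

```python
def parse_structure(synteny_structure: str) -> dict:
    """Split a structure such as '>hmm_a 0 <hmm_b 2 hmm_c' into its components."""
    tokens = synteny_structure.split()
    # Even positions hold HMM tokens, odd positions hold maximum gene distances
    hmm_tokens, distance_tokens = tokens[0::2], tokens[1::2]
    hmm_groups, strands = [], []
    for token in hmm_tokens:
        if token[0] in ("<", ">"):
            strands.append("pos" if token[0] == ">" else "neg")
            token = token[1:]
        else:
            strands.append("")  # no strand symbol: strand-independent match
        hmm_groups.append(token)
    return {
        "hmm_groups": hmm_groups,
        "strands": strands,
        "distances": [int(distance) for distance in distance_tokens],
    }


parsed_structure = parse_structure(">hmm_a 0 <hmm_b 2 hmm_c")
```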
Source code in src/pynteny/parsers/syntenyparser.py
reformat_synteny_structure(synteny_structure)
Remove illegal symbols and extra white space.
Source code in src/pynteny/parsers/syntenyparser.py
split_strand_from_locus(locus_str, parsed_symbol=True)
Split strand info from locus tag / HMM model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
locus_str |
str
|
a substring of a synteny structure containing a gene symbol / HMM name and strand info. |
required |
parsed_symbol |
bool
|
if True, strand info '>' is parsed as 'pos' and '<' as 'neg'. Defaults to True. |
True
|
Returns:
Type | Description |
---|---|
tuple[str]
|
tuple[str]: tuple with parsed strand info and gene symbol / HMM name. |
Source code in src/pynteny/parsers/syntenyparser.py
Module cli
Pynteny
Main command based on: https://selvakumar-arumugapandian.medium.com/command-line-subcommands-with-pythons-argparse-4dbac80f7110
Source code in src/pynteny/cli.py
__init__(subcommand, subcommand_args)
Initialize main command
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subcommand |
str
|
subcommand name |
required |
subcommand_args |
list[str]
|
list of subcommand arguments and values. |
required |
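The general pattern of a main command that receives a subcommand name plus its arguments and dispatches to a method of the same name can be sketched as follows. `MiniCLI`, its single `build` subcommand, and the `--data` flag are hypothetical stand-ins written for this example, not the Pynteny implementation.

```python
import argparse


class MiniCLI:
    """Dispatch a subcommand name to the method with the same name."""

    def __init__(self, subcommand: str, subcommand_args: list):
        self.subcommand_args = subcommand_args
        handler = getattr(self, subcommand, None)
        # Reject private names and anything that is not a callable method
        if subcommand.startswith("_") or not callable(handler):
            raise ValueError(f"Unknown subcommand: {subcommand}")
        self.result = handler()

    def build(self) -> str:
        parser = argparse.ArgumentParser(prog="minicli build")
        parser.add_argument("--data", required=True)
        args = parser.parse_args(self.subcommand_args)
        return f"building from {args.data}"


cli = MiniCLI("build", ["--data", "genome.fasta"])
```

Each subcommand owns its own `ArgumentParser`, which keeps the option sets of different subcommands independent of one another.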
Source code in src/pynteny/cli.py
build()
Call build subcommand.
Source code in src/pynteny/cli.py
cite()
Print pynteny's citation string
Source code in src/pynteny/cli.py
download()
Call download subcommand.
Source code in src/pynteny/cli.py
parse()
Call parse subcommand.
Source code in src/pynteny/cli.py
search()
Call search subcommand.
Source code in src/pynteny/cli.py
SubcommandParser
Argparse parsers for Pynteny's subcommands
Source code in src/pynteny/cli.py
build()
staticmethod
Parser for the build subcommand.
Returns:
Type | Description |
---|---|
argparse.ArgumentParser
|
argparse.ArgumentParser: ArgumentParser object. |
Source code in src/pynteny/cli.py
cite()
staticmethod
Parser for the cite subcommand.
Returns:
Type | Description |
---|---|
argparse.ArgumentParser
|
argparse.ArgumentParser: ArgumentParser object. |
Source code in src/pynteny/cli.py
download()
staticmethod
Parser for the download subcommand.
Returns:
Type | Description |
---|---|
argparse.ArgumentParser
|
argparse.ArgumentParser: ArgumentParser object. |
Source code in src/pynteny/cli.py
get_help_str(subcommand)
staticmethod
Get help string for subcommand.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subcommand |
str
|
subcommand name. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
help string. |
Source code in src/pynteny/cli.py
parse()
staticmethod
Parser for the parse subcommand.
Returns:
Type | Description |
---|---|
argparse.ArgumentParser
|
argparse.ArgumentParser: ArgumentParser object. |
Source code in src/pynteny/cli.py
search()
staticmethod
Parser for the search subcommand.
Returns:
Type | Description |
---|---|
argparse.ArgumentParser
|
argparse.ArgumentParser: ArgumentParser object. |
Source code in src/pynteny/cli.py
Module subcommands
Functions containing CLI subcommands
build_database(args)
Build annotated peptide database from input assembly or GenBank data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
args |
Union[CommandArgs, ArgumentParser]
|
arguments object. |
required |
Source code in src/pynteny/subcommands.py
download_hmms(args)
Download HMM (PGAP) database from NCBI.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
args |
Union[CommandArgs, ArgumentParser]
|
arguments object. |
required |
Source code in src/pynteny/subcommands.py
get_citation(args, silent=False)
Get Pynteny citation string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
args |
argparse.ArgumentParser
|
arguments object. |
required |
silent |
bool
|
do not print to terminal. Defaults to False. |
False
|
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Pynteny citation text. |
Source code in src/pynteny/subcommands.py
init_logger(args)
Initialize logger object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
args |
Union[CommandArgs, ArgumentParser]
|
arguments object |
required |
Returns:
Type | Description |
---|---|
logging.Logger
|
logging.Logger: initialized logger object |
Source code in src/pynteny/subcommands.py
parse_gene_ids(args)
Convert gene symbols to hmm names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
args |
Union[CommandArgs, ArgumentParser]
|
arguments object. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
synteny structure where gene symbols are replaced by HMM names. |
Source code in src/pynteny/subcommands.py
synteny_search(args)
Search peptide database by synteny structure containing HMMs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
args |
Union[CommandArgs, ArgumentParser]
|
arguments object. |
required |
Returns:
Name | Type | Description |
---|---|---|
SyntenyHits |
SyntenyHits
|
instance of SyntenyHits. |
Source code in src/pynteny/subcommands.py
Module utils
Functions and classes for general purposes
CommandArgs
Base class to hold command line arguments.
Source code in src/pynteny/utils.py
ConfigParser
Handle Pynteny configuration file.
Source code in src/pynteny/utils.py
get_config()
Load config file.
Returns:
Name | Type | Description |
---|---|---|
dict |
dict
|
dict containing fields and values of config file. |
Source code in src/pynteny/utils.py
get_config_path()
Show config file path.
Source code in src/pynteny/utils.py
get_default_config()
classmethod
Initialize ConfigParser with default config file.
Source code in src/pynteny/utils.py
get_field(key)
Get field from config file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key |
str
|
key name to get the value from. |
required |
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
key value. |
Source code in src/pynteny/utils.py
initialize_config_file()
staticmethod
Initialize empty config file.
Returns:
Name | Type | Description |
---|---|---|
Path |
Path
|
path to generated config file. |
Source code in src/pynteny/utils.py
update_config(key, value)
Update config file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key |
str
|
config file key name to be updated. |
required |
value |
str
|
new value. |
required |
Source code in src/pynteny/utils.py
write_config()
Write config dict to file.
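The load, update, and write cycle described by these methods can be sketched with a small JSON-backed class. This is an illustration under the assumption that the configuration is stored as a flat JSON object; `MiniConfig` and the example key are hypothetical, and the real class may store or locate its file differently.

```python
import json
from pathlib import Path


class MiniConfig:
    """Tiny JSON-backed configuration handler (illustrative only)."""

    def __init__(self, config_file: Path):
        self._path = Path(config_file)
        if not self._path.exists():
            # Start from an empty config file
            self._path.write_text(json.dumps({}))

    def get_config(self) -> dict:
        return json.loads(self._path.read_text())

    def update_config(self, key: str, value: str) -> None:
        config = self.get_config()
        config[key] = value
        self._path.write_text(json.dumps(config, indent=4))

    def get_field(self, key: str) -> str:
        return self.get_config()[key]
```

Reading the file on every access keeps the sketch simple and always reflects the on-disk state, at the cost of extra I/O.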
Source code in src/pynteny/utils.py
download_file(url, output_file)
Download file from url
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url |
str
|
URL of the file to be downloaded. |
required |
output_file |
Path
|
path to downloaded file |
required |
Source code in src/pynteny/utils.py
extract_tar_file(tar_file, dest_dir=None)
Extract tar or tar.gz files to dest_dir.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tar_file |
Path
|
path to tar file. |
required |
dest_dir |
Path
|
path to destination directory to store the uncompressed file. Defaults to None. |
None
|
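With the standard library alone, extraction of a tar or tar.gz archive can be sketched as below. The name `extract_tar` and the choice to fall back to the archive's parent directory when `dest_dir` is None are assumptions made for the example.

```python
import tarfile
from pathlib import Path
from typing import Optional


def extract_tar(tar_file: Path, dest_dir: Optional[Path] = None) -> Path:
    """Extract a tar or tar.gz archive and return the destination directory."""
    tar_file = Path(tar_file)
    if not tarfile.is_tarfile(tar_file):
        raise ValueError(f"{tar_file} is not a tar archive")
    # Assumed default: extract next to the archive
    dest_dir = Path(dest_dir) if dest_dir is not None else tar_file.parent
    dest_dir.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (plain, gz, bz2, xz)
    with tarfile.open(tar_file, "r:*") as tar:
        tar.extractall(dest_dir)
    return dest_dir
```

As with any archive extraction, this should only be run on archives from a trusted source.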
Source code in src/pynteny/utils.py
flatten_directory(directory)
Flatten directory, i.e. remove all subdirectories and copy all files to the top level directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
directory |
Path
|
path to directory. |
required |
Source code in src/pynteny/utils.py
is_right_list_nested_type(list_object, inner_type)
Check if all elements in list are of the same type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
list_object |
list
|
list containing elements. |
required |
inner_type |
type
|
type to be checked. |
required |
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
whether list contains elements of the same specified type. |
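A check like this is commonly written with `all` and `isinstance`. The one-liner below is a sketch with a hypothetical name, `all_of_type`, rather than the function's actual body.

```python
def all_of_type(list_object: list, inner_type: type) -> bool:
    """Return True if every element of list_object is an instance of inner_type."""
    return all(isinstance(element, inner_type) for element in list_object)
```

Two edge cases follow from this formulation: an empty list passes, and subclasses count as matches (for example, `True` is accepted as an `int`).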
Source code in src/pynteny/utils.py
is_tar_file(tar_file)
Check whether file is tar-compressed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tar_file |
Path
|
path to file. |
required |
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
whether file is compressed or not. |
Source code in src/pynteny/utils.py
list_tar_dir(tar_dir)
List files within tar or tar.gz directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tar_dir |
Path
|
path to directory containing tar files. |
required |
Returns:
Name | Type | Description |
---|---|---|
list |
list
|
list of tar files. |
Source code in src/pynteny/utils.py
parallelize_over_input_files(callable, input_list, processes=None, max_tasks_per_child=10, **callable_kwargs)
Parallelize callable over a set of input objects using a pool of workers. Inputs in input list are passed to the first argument of the callable. Additional callable named arguments may be passed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
callable |
_type_
|
function to be run. |
required |
input_list |
list
|
list of inputs to callable. |
required |
processes |
int
|
maximum number of worker processes. Defaults to all minus one. |
None
|
max_tasks_per_child |
int
|
maximum number of tasks a child process runs before it is reset. Defaults to 10. |
10
|
Source code in src/pynteny/utils.py
set_default_output_path(input_path, tag=None, extension=None, only_filename=False, only_basename=False, only_dirname=False)
Utility function to generate a default path to output file or directory based on an input file name and path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_path |
Path
|
path to input file. |
required |
tag |
str
|
text tag to be added to file name. Defaults to None. |
None
|
extension |
str
|
change input file extension with this one. Defaults to None. |
None
|
only_filename |
bool
|
output only default filename. Defaults to False. |
False
|
only_basename |
bool
|
output only default basename (no extension). Defaults to False. |
False
|
only_dirname |
bool
|
output only path to default output directory. Defaults to False. |
False
|
Returns:
Name | Type | Description |
---|---|---|
Path |
Path
|
a path or name to a default output file. |
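The core of deriving an output path from an input path (keep the directory, append a tag to the base name, optionally swap the extension) can be sketched with `pathlib`. The helper `default_output_path` is hypothetical and covers only the `tag` and `extension` options, not the `only_*` switches.

```python
from pathlib import Path
from typing import Optional


def default_output_path(
    input_path: Path, tag: Optional[str] = None, extension: Optional[str] = None
) -> Path:
    """Build '<dir>/<basename><tag><extension>' from an input file path."""
    input_path = Path(input_path)
    # Keep the original extension unless a replacement is given
    suffix = extension if extension is not None else input_path.suffix
    return input_path.parent / f"{input_path.stem}{tag or ''}{suffix}"


out_path = default_output_path("data/genome.fasta", tag="_labelled", extension=".faa")
```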
Source code in src/pynteny/utils.py
terminal_execute(command_str, suppress_shell_output=False, work_dir=None, return_output=False)
Execute given command in terminal through Python.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
command_str |
str
|
terminal command to be executed. |
required |
suppress_shell_output |
bool
|
suppress shell output. Defaults to False. |
False
|
work_dir |
Path
|
change working directory. Defaults to None. |
None
|
return_output |
bool
|
whether to return execution output. Defaults to False. |
False
|
Returns:
Type | Description |
---|---|
subprocess.STDOUT
|
subprocess.STDOUT: subprocess output. |
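Running a command string through the shell and optionally capturing its output can be sketched with `subprocess.run`. The helper name `run_in_shell` is hypothetical, and the sketch returns the captured text rather than a subprocess object, which is a simplification made for the example.

```python
import subprocess
from typing import Optional


def run_in_shell(
    command_str: str, work_dir: Optional[str] = None, return_output: bool = False
) -> Optional[str]:
    """Run command_str in a shell; return its stdout if return_output is True."""
    completed = subprocess.run(
        command_str,
        shell=True,  # the command is given as a single string
        cwd=work_dir,
        capture_output=return_output,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout if return_output else None


greeting = run_in_shell("echo hello", return_output=True)
```

Because `shell=True` hands the string to the shell unmodified, command strings should never be assembled from untrusted input.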
Source code in src/pynteny/utils.py
Module wrappers
Simple CLI wrappers to several tools
run_HMM_search(hmm_model, input_fasta, output_file=None, method='hmmsearch', processes=None, additional_args=None)
Simple CLI wrapper to hmmsearch or hmmscan.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hmm_model |
Path
|
path to profile HMM to be used. |
required |
input_fasta |
Path
|
path to fasta containing sequence database to be searched. |
required |
output_file |
Path
|
path to HMMER output table file. Defaults to None. |
None
|
method |
str
|
either 'hmmsearch' or 'hmmscan'. Defaults to 'hmmsearch'. |
'hmmsearch'
|
processes |
int
|
maximum number of threads. Defaults to all minus one. |
None
|
additional_args |
str
|
a string containing additional arguments to hmmsearch/scan. Defaults to None. |
None
|
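A wrapper of this kind mostly assembles a command line. The sketch below builds (but does not run) an argument list for `hmmsearch` or `hmmscan`, using HMMER's `--tblout` and `--cpu` options followed by the profile and the sequence file. The helper name `build_hmmer_command` is hypothetical, and the exact options Pynteny passes may differ.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_hmmer_command(
    hmm_model: Path,
    input_fasta: Path,
    output_file: Path,
    method: str = "hmmsearch",
    processes: int = 1,
    additional_args: Optional[str] = None,
) -> List[str]:
    """Assemble an hmmsearch/hmmscan argument list writing a tabular summary."""
    if method not in ("hmmsearch", "hmmscan"):
        raise ValueError("method must be 'hmmsearch' or 'hmmscan'")
    command = [method, "--tblout", str(output_file), "--cpu", str(processes)]
    if additional_args:
        # Split the extra options the same way a shell would
        command += shlex.split(additional_args)
    # Both programs take the profile first and the sequence file second
    command += [str(hmm_model), str(input_fasta)]
    return command


hmmer_command = build_hmmer_command("model.hmm", "peptides.faa", "hits.txt", additional_args="-E 1e-10")
```

The resulting list can be passed to `subprocess.run` without `shell=True`.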
Source code in src/pynteny/wrappers.py
run_prodigal(input_file, output_file=None, output_dir=None, output_format='fasta', metagenome=True, additional_args=None)
Simple CLI wrapper to prodigal.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_file |
Path
|
path to input fasta file with nucleotide sequences. |
required |
output_file |
Path
|
path to output file containing translated peptides. Defaults to None. |
None
|
output_dir |
Path
|
path to output directory (all prodigal output files). Defaults to None. |
None
|
output_format |
str
|
either 'gbk' or 'fasta'. Defaults to 'fasta'. |
'fasta'
|
metagenome |
bool
|
whether input fasta corresponds to a metagenomic sample. Defaults to True. |
True
|
additional_args |
str
|
a string containing additional arguments to prodigal. Defaults to None. |
None
|
Source code in src/pynteny/wrappers.py
run_seqkit_nodup(input_fasta, output_fasta=None, export_duplicates=False)
Simple CLI wrapper to seqkit rmdup to remove sequence duplicates in fasta file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_fasta |
Path
|
path to input fasta. |
required |
output_fasta |
Path
|
path to output fasta. Defaults to None. |
None
|
export_duplicates |
bool
|
whether to export a file containing duplicated sequences. Defaults to False. |
False
|
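As with the other wrappers, the work is mainly in composing the command. The sketch below builds (without running) a `seqkit rmdup` argument list that removes duplicates by sequence (`-s`) and writes the result with `-o`; when a duplicates file is requested it adds `-D`, which saves the counts and ID lists of duplicated sequences. The helper name `build_rmdup_command` and the `duplicates_file` parameter are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def build_rmdup_command(
    input_fasta: Path, output_fasta: Path, duplicates_file: Optional[Path] = None
) -> List[str]:
    """Assemble a 'seqkit rmdup' argument list that deduplicates by sequence."""
    command = ["seqkit", "rmdup", str(input_fasta), "-s", "-o", str(output_fasta)]
    if duplicates_file is not None:
        # -D: file with numbers and ID lists of duplicated sequences
        command += ["-D", str(duplicates_file)]
    return command


rmdup_command = build_rmdup_command("in.fasta", "out.fasta")
```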
Source code in src/pynteny/wrappers.py