Utilities
taxotagger.utils
download_from_url
download_from_url(
url: str,
root: str | PathLike,
overwrite_existing: bool = False,
http_method: str = "GET",
allow_http_redirect: bool = True,
) -> str
Download data from the given URL.
The output file name is derived from the URL, and the file is saved to the given root directory.
Parameters:
- url (str) – The URL of the file to download.
- root (str | PathLike) – The directory to save the file to.
- overwrite_existing (bool, default: False) – Whether to overwrite the existing file.
- http_method (str, default: "GET") – The HTTP method to use.
- allow_http_redirect (bool, default: True) – Whether to allow HTTP redirects.
Returns:
- str – The path to the downloaded file.
Examples:
Download the MycoAI-CNN model to the current directory
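A minimal sketch of what such a call could look like, using only the signature documented above. The URL is a placeholder, not the real model location, and `expected_download_path` and `download_model` are hypothetical helpers introduced for illustration. The helper assumes "determined by the URL" means the last segment of the URL path, which this page does not confirm.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Placeholder URL for illustration only; substitute the real model URL.
MODEL_URL = "https://example.org/models/MycoAI-CNN.pt"


def expected_download_path(url: str, root: str = ".") -> Path:
    """Guess where the file lands (assumption: name = last URL path segment)."""
    file_name = PurePosixPath(urlparse(url).path).name
    return Path(root) / file_name


def download_model(url: str = MODEL_URL, root: str = ".") -> str:
    """Download `url` into `root` using the documented signature."""
    # Imported lazily so the path helper above works without taxotagger installed.
    from taxotagger.utils import download_from_url

    return download_from_url(url, root, overwrite_existing=False)
```

Calling `download_model()` would need network access and a valid URL. The returned string is the path to the downloaded file, as described under Returns.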
Source code in src/taxotagger/utils.py
load_model
load_model(model_id: str, config: ProjectConfig) -> Any
Load the pretrained model with PyTorch for the given model identifier.
Available models are defined in the default PRETRAINED_MODELS.
If the model file {model_id}.pt is not found in the cache, it is downloaded from the predefined URL.
Parameters:
- model_id (str) – The identifier of the model to load.
- config (ProjectConfig) – The configurations for the project.
Returns:
- Any – The pretrained model loaded with torch.load.
Examples:
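The cache-then-download behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `resolve_model_file`, its `cache_dir`, `model_urls`, and `download` parameters are all hypothetical, and the attributes of `ProjectConfig` are not shown on this page, so the sketch does not use it.

```python
from pathlib import Path
from typing import Callable


def resolve_model_file(
    model_id: str,
    cache_dir: Path,
    model_urls: dict[str, str],
    download: Callable[[str, Path], None],
) -> Path:
    """Return the cached `{model_id}.pt`, downloading it first if it is missing."""
    # Hypothetical registry lookup, standing in for PRETRAINED_MODELS.
    if model_id not in model_urls:
        raise ValueError(f"Unknown model id: {model_id!r}")

    model_path = cache_dir / f"{model_id}.pt"
    if not model_path.exists():
        # Cache miss: fetch the file once; later calls reuse the cached copy.
        cache_dir.mkdir(parents=True, exist_ok=True)
        download(model_urls[model_id], model_path)
    return model_path
```

The resulting path could then be handed to `torch.load`, which is the loader named under Returns.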
parse_fasta
Parse FASTA data and return a dictionary of sequences.
Parameters:
- data (str | PathLike | TextIO) – Can be one of the following:
    - A file-like object (with .read() or .readline() methods)
    - A file path (string or PathLike) to a FASTA file
    - A string containing FASTA content
Returns:
- dict – A dictionary with the FASTA headers as keys and the sequences as values.
Raises:
- ValueError – If there are duplicate FASTA headers.
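To make the return value and the duplicate-header error concrete, here is a minimal sketch that handles only the "string containing FASTA content" case. `parse_fasta_sketch` is a hypothetical stand-in, not the library function. It assumes the leading ">" is kept in each key (matching the header form shown for parse_unite_fasta_header) and that multi-line sequences are concatenated; neither detail is confirmed on this page.

```python
import io


def parse_fasta_sketch(text: str) -> dict[str, str]:
    """Map each FASTA header line to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []

    for raw_line in io.StringIO(text):
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the record collected so far before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            if line in records:
                raise ValueError(f"Duplicate FASTA header: {line}")
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)

    # Store the final record.
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

With this sketch, a two-record input such as `">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"` yields one key per header, and repeating a header raises ValueError.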
parse_unite_fasta_header
Parse metadata from a UNITE FASTA file header.
The header must follow one of these formats:
- the UNITE format: the accession, the taxonomy lineage, and the SHIdentifier, separated by "|"
- only the accession
Note that the SHIdentifier (Species Hypothesis identifier) is optional.
Parameters:
- header (str) – A string representing the header of a FASTA file from the UNITE database.
Returns:
- list[str] – A list of parsed metadata in the following order: [Accession, Kingdom, Phylum, Class, Order, Family, Genus, Species, SH_ID]. Empty strings are returned for missing metadata.
Examples:
Parse the header of a UNITE FASTA file
>>> header = ">MH855962|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Corticiales;f__Corticiaceae;g__Waitea;s__Waitea_circinata|SH1011630.09FU"
>>> parse_unite_fasta_header(header)
['MH855962', 'Fungi', 'Basidiomycota', 'Agaricomycetes', 'Corticiales', 'Corticiaceae', 'Waitea', 'Waitea_circinata', 'SH1011630.09FU']
Parse the header of a FASTA file with only the accession
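Per the Returns description, an accession-only header should yield the accession followed by empty strings for the missing fields. The sketch below illustrates that behavior and reproduces the UNITE example above. `parse_header_sketch` and `RANK_PREFIXES` are hypothetical names, and the splitting rules are inferred from the example header rather than taken from the library's implementation.

```python
# Rank prefixes in output order, as they appear in the example UNITE header.
RANK_PREFIXES = ("k__", "p__", "c__", "o__", "f__", "g__", "s__")


def parse_header_sketch(header: str) -> list[str]:
    """Return [Accession, Kingdom, ..., Species, SH_ID], with "" for missing fields."""
    parts = header.strip().lstrip(">").split("|")
    accession = parts[0]
    ranks = dict.fromkeys(RANK_PREFIXES, "")
    sh_id = ""

    for part in parts[1:]:
        if part.startswith(RANK_PREFIXES):
            # Taxonomy lineage: "k__Fungi;p__Basidiomycota;..."
            for field in part.split(";"):
                prefix, name = field[:3], field[3:]
                if prefix in ranks:
                    ranks[prefix] = name
        elif part.startswith("SH"):
            # Optional Species Hypothesis identifier.
            sh_id = part

    return [accession, *ranks.values(), sh_id]


print(parse_header_sketch(">MH855962"))
# ['MH855962', '', '', '', '', '', '', '', '']
```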