Utilities
taxotagger.utils
download_from_url
download_from_url(
url: str,
root: str | PathLike,
overwrite_existing: bool = False,
http_method: str = "GET",
allow_http_redirect: bool = True,
) -> str
Download data from the given URL.
The output file name is derived from the URL, and the file is saved to the given root directory.
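To illustrate how a file name can be derived from a URL, here is a minimal sketch that takes the last path segment and joins it to the root directory. The helper name target_path_from_url and the URL are hypothetical, and download_from_url may resolve names differently internally.

```python
import os
from typing import Union
from urllib.parse import urlparse


def target_path_from_url(url: str, root: Union[str, os.PathLike]) -> str:
    """Hypothetical helper: build a local path from the last segment of a URL path."""
    # Use only the path component, so the query string and fragment are ignored.
    url_path = urlparse(url).path
    filename = os.path.basename(url_path)
    if not filename:
        raise ValueError(f"Cannot derive a file name from URL: {url}")
    return os.path.join(os.fspath(root), filename)


# The URL below is a placeholder, not a real download location.
print(target_path_from_url("https://example.org/models/demo.pt?download=1", "data"))
```

A URL whose path ends in a slash has no usable last segment, so the sketch raises ValueError rather than guessing a name.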
Parameters:
- url (str) – The URL of the file to download.
- root (str | PathLike) – The directory to save the file to.
- overwrite_existing (bool, default: False) – Whether to overwrite the existing file.
- http_method (str, default: 'GET') – The HTTP method to use.
- allow_http_redirect (bool, default: True) – Whether to allow HTTP redirects.
Returns:
- str – The path to the downloaded file.
Examples:
Download the MycoAI-CNN model to the current directory
Source code in src/taxotagger/utils.py
load_model
load_model(model_id: str, config: ProjectConfig) -> Any
Load the pretrained model with PyTorch for the given model identifier.
Available models are defined in the default PRETRAINED_MODELS.
If the model file {model_id}.pt is not found in the cache, it will be downloaded from the predefined URL.
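The cache-then-download behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: resolve_model_file and fake_download are hypothetical names, the download step is a stand-in that writes placeholder bytes, and the real load_model may organise its cache differently.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def resolve_model_file(
    model_id: str,
    cache_dir: Path,
    download: Callable[[Path], None],
) -> Path:
    """Hypothetical helper: return the cached model file, fetching it only if missing."""
    model_path = cache_dir / f"{model_id}.pt"
    if not model_path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        download(model_path)  # caller-supplied download step
    return model_path


# Stand-in downloader that records each call and writes placeholder bytes.
calls: List[Path] = []


def fake_download(target: Path) -> None:
    calls.append(target)
    target.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    first = resolve_model_file("demo", cache, fake_download)
    second = resolve_model_file("demo", cache, fake_download)
    print(first.name, len(calls))  # prints "demo.pt 1"
```

The second call finds the file already in the cache, so the download step runs only once.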
Parameters:
- model_id (str) – The identifier of the model to load.
- config (ProjectConfig) – The configurations for the project.
Returns:
- Any – The pretrained model loaded with torch.load.
Examples:
Source code in src/taxotagger/utils.py
parse_fasta
Parse FASTA data and return a dictionary of sequences.
Parameters:
- data (str | PathLike | TextIO) – Can be one of the following:
  - A file-like object (with .read() or .readline() methods)
  - A file path (string or PathLike) to a FASTA file
  - A string containing FASTA content
Returns:
- dict – A dictionary with the FASTA headers as keys and the sequences as values.
Raises:
- ValueError – If there are duplicate FASTA headers.
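The behaviour described above can be approximated with a short sketch. It is not the library's implementation: parse_fasta_sketch is a hypothetical name, and stripping the leading ">" from the keys and concatenating multi-line sequences are assumptions.

```python
import os
from typing import Dict, Optional, TextIO, Union


def parse_fasta_sketch(data: Union[str, os.PathLike, TextIO]) -> Dict[str, str]:
    """Hypothetical re-implementation: map FASTA headers to sequences."""
    if hasattr(data, "read"):
        # File-like object.
        text = data.read()
    elif isinstance(data, os.PathLike) or (isinstance(data, str) and os.path.isfile(data)):
        # Path to an existing FASTA file.
        with open(data, encoding="utf-8") as handle:
            text = handle.read()
    else:
        # Otherwise treat the string as FASTA content.
        text = data

    sequences: Dict[str, str] = {}
    header: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]  # assumption: keys omit the leading ">"
            if header in sequences:
                raise ValueError(f"Duplicate FASTA header: {header}")
            sequences[header] = ""
        elif header is not None:
            sequences[header] += line  # join wrapped sequence lines
    return sequences


records = parse_fasta_sketch(">seq1\nACGT\nTTGA\n>seq2\nGGCC\n")
print(records)  # {'seq1': 'ACGTTTGA', 'seq2': 'GGCC'}
```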
Source code in src/taxotagger/utils.py
parse_unite_fasta_header
Parse metadata from a UNITE FASTA file header.
The header must follow one of these formats:
- the UNITE format:
- only the accession:
Note that the SHIdentifier (Species Hypothesis identifier) is optional.
Parameters:
- header (str) – A string representing the header of a FASTA file from the UNITE database.
Returns:
- list[str] – A list of parsed metadata in the following order: [Accession, Kingdom, Phylum, Class, Order, Family, Genus, Species, SH_ID]. Empty strings are returned for missing metadata.
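One way such a header could be split into the nine fields listed above is sketched here. The name parse_unite_header_sketch is hypothetical, and the layout is an assumption taken from the example header in this section: fields separated by "|", taxonomy ranks separated by ";", and each rank prefixed with k__, p__, c__, o__, f__, g__ or s__.

```python
from typing import Dict, List

# Rank prefixes in the order Kingdom, Phylum, Class, Order, Family, Genus, Species.
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def parse_unite_header_sketch(header: str) -> List[str]:
    """Hypothetical re-implementation of the parsing described above."""
    parts = header.lstrip(">").strip().split("|")
    accession = parts[0]
    taxonomy = parts[1] if len(parts) > 1 else ""
    sh_id = parts[2] if len(parts) > 2 else ""

    # Map each rank prefix to its value, for example "k__Fungi" -> {"k__": "Fungi"}.
    ranks: Dict[str, str] = {}
    for field in taxonomy.split(";"):
        prefix, value = field[:3], field[3:]
        if prefix in RANK_PREFIXES:
            ranks[prefix] = value

    # Missing ranks and a missing SH identifier become empty strings.
    return [accession] + [ranks.get(p, "") for p in RANK_PREFIXES] + [sh_id]


print(parse_unite_header_sketch(">MH855962"))
# ['MH855962', '', '', '', '', '', '', '', '']
```

Looking ranks up by prefix rather than by position keeps the output order fixed even when a header omits some ranks.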
Examples:
Parse the header of a UNITE FASTA file
>>> header = ">MH855962|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Corticiales;f__Corticiaceae;g__Waitea;s__Waitea_circinata|SH1011630.09FU"
>>> parse_unite_fasta_header(header)
['MH855962', 'Fungi', 'Basidiomycota', 'Agaricomycetes', 'Corticiales', 'Corticiaceae', 'Waitea', 'Waitea_circinata', 'SH1011630.09FU']
Parse the header of a FASTA file with only the accession