Base classes
taxotagger.abc
EmbedModelBase
Bases: ABC
Base class for embedding models.
embed
abstractmethod
Calculate the embeddings for the given FASTA file.
Parameters:
- fasta_file (str) – The path to the FASTA file to embed.
Returns:
- dict[str, list[dict[str, Any]]] – A dictionary of embeddings for each taxonomy level. The keys are the taxonomy levels, and each value is a list of dictionaries containing the id, embedding and metadata for each sequence.
Each list has shape (n_samples), where n_samples is the number of sequences. Every inner dictionary must contain the keys id and vector, which hold the accession and the embedding vector of the sequence, respectively. For example:
{
    "phylum": [
        {"id": "seq1", "vector": [0.1, 0.2, ...], "phylum": "Ascomycota", ...},
        {"id": "seq2", "vector": [0.3, 0.4, ...], "phylum": "Basidiomycota", ...},
        ...
    ],
    "class": [
        {"id": "seq1", "vector": [0.5, 0.6, ...], "class": "Dothideomycetes", ...},
        {"id": "seq2", "vector": [0.7, 0.8, ...], "class": "Agaricomycetes", ...},
        ...
    ],
    ...
}
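The return contract above can be illustrated with a minimal sketch of a subclass. Everything in it is an assumption made for illustration: `EmbedModelBase` is redefined locally as a stand-in so the snippet runs without taxotagger installed (in real code it would presumably be imported from `taxotagger.abc`), and `ToyEmbedModel`, `read_fasta`, `base_frequencies` and the choice of base frequencies as the "embedding" are hypothetical and not part of the library.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Iterator


class EmbedModelBase(ABC):
    """Local stand-in that mirrors the documented interface (not the real class)."""

    @abstractmethod
    def embed(self, fasta_file: str) -> dict[str, list[dict[str, Any]]]:
        """Calculate the embeddings for the given FASTA file."""


def read_fasta(path: str) -> Iterator[tuple[str, str]]:
    """Hypothetical helper: yield (accession, sequence) pairs from a FASTA file."""
    accession, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if accession is not None:
                    yield accession, "".join(chunks)
                # Use the first whitespace-delimited token of the header as the id.
                accession, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line.upper())
    if accession is not None:
        yield accession, "".join(chunks)


def base_frequencies(sequence: str) -> list[float]:
    """Toy 'embedding': relative frequency of A, C, G and T in the sequence."""
    length = len(sequence) or 1  # avoid division by zero for empty sequences
    return [sequence.count(base) / length for base in "ACGT"]


class ToyEmbedModel(EmbedModelBase):
    """Hypothetical model that returns the same toy vector at every level."""

    levels = ("phylum", "class")  # assumed subset of taxonomy levels

    def embed(self, fasta_file: str) -> dict[str, list[dict[str, Any]]]:
        records = list(read_fasta(fasta_file))
        # One list per taxonomy level; each entry carries the required
        # "id" and "vector" keys (no extra metadata in this sketch).
        return {
            level: [
                {"id": accession, "vector": base_frequencies(sequence)}
                for accession, sequence in records
            ]
            for level in self.levels
        }


# Usage: embed a two-sequence FASTA file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = Path(tmp) / "toy.fasta"
    fasta_path.write_text(">seq1 example\nACGT\n>seq2\nAAAA\n")
    result = ToyEmbedModel().embed(str(fasta_path))
```

A real model would likely produce a different vector per taxonomy level and attach metadata such as the predicted taxon name, as in the example dictionary above; the sketch only demonstrates the required keys and nesting.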