Interested in carrying out highly sensitive and efficient large-scale protein homology searches? STEAM (Search with TEA against Many) runs fast searches against large protein datasets that have been translated to TEA (The Embedded Alphabet).
The preprint "Rewriting protein alphabets with language models" introduces TEA, a new 20-letter alphabet obtained by using contrastive learning to convert protein language model embeddings into discrete letters, enabling highly efficient large-scale protein homology searches.
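To make the idea of converting embeddings into a 20-letter alphabet concrete, here is a minimal sketch. It is not the preprint's model. It assumes each residue has an embedding vector and that a small projection head (here a hypothetical random linear layer, `weights` and `bias`) scores 20 output letters per residue, with the highest-scoring letter chosen. The function name, the dimensions, and the reuse of the standard amino-acid symbols as output letters are all assumptions made for illustration. The contrastive training step is not shown.

```python
import numpy as np

# Assumption: the 20 output letters reuse the standard amino-acid symbols.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def embeddings_to_letters(embeddings, weights, bias):
    """Map per-residue embeddings (L, D) to a string of L letters.

    Hypothetical projection head: a single linear layer followed by
    an argmax over 20 letter scores. A trained model would supply
    the weights; here they are placeholders.
    """
    logits = embeddings @ weights + bias   # shape (L, 20)
    indices = logits.argmax(axis=1)        # best-scoring letter per residue
    return "".join(ALPHABET[i] for i in indices)


# Placeholder data: 8 residues with 16-dimensional embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))
weights = rng.normal(size=(16, 20))
bias = np.zeros(20)

tea_like = embeddings_to_letters(embeddings, weights, bias)
print(tea_like)  # one letter per residue
```

The output is an ordinary string with one letter per residue, which is the property that lets a translated dataset be treated like sequence data.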