%0 Journal Article
%1 alfatlawi2024protein
%A Al-Fatlawi, Ali
%A Hossen, Md. Ballal
%A El-Hendi, Ferras
%A Schroeder, Michael
%D 2024
%I Cold Spring Harbor Laboratory
%J bioRxiv
%K imported
%R 10.1101/2024.09.03.611022
%T Protein secondary structure and remote homology detection
%U https://www.biorxiv.org/content/early/2024/09/06/2024.09.03.611022
%X A protein can be represented by its primary, secondary, or tertiary structure. With recent advances in AI, there is now as much tertiary as primary structural data available. Fast and accurate search methods exist for both types of data, with searches over both representations being highly precise. However, primary structure data can sometimes be incomplete. As a result, tertiary structure has become the gold standard for remote homology detection. How does secondary structure perform in remote homology detection? Secondary structure interprets proteins as a sequence using an alphabet representing helices, strands, or loops. It shares its sequential nature with primary structure while retaining topological information similar to tertiary structure. To assess the effectiveness of secondary structure in remote homology detection, we devised a challenging classification task aimed at determining the superfamily membership of very distantly related protein domains. We used benchmarks from the CATH and SCOP databases and evaluated sequence and structure alignment algorithms on primary, secondary, and tertiary structures. As expected, both basic and advanced sequence alignment algorithms applied to primary structure achieved high precision, but their overall area under the curve was lower compared to the gold standard of structural alignment using tertiary structure. Surprisingly, a simple string comparison algorithm applied to secondary structure performed close to the gold standard. This result supports the hypothesis that key structural information is already encoded in secondary structure and suggests that secondary structure may be a promising representation to use when high-confidence structural data is unavailable, such as in cases involving protein flexibility and disorder. Competing Interest Statement: The authors have declared no competing interest.
@article{alfatlawi2024protein,
abstract = {A protein can be represented by its primary, secondary, or tertiary structure. With recent advances in AI, there is now as much tertiary as primary structural data available. Fast and accurate search methods exist for both types of data, with searches over both representations being highly precise. However, primary structure data can sometimes be incomplete. As a result, tertiary structure has become the gold standard for remote homology detection. How does secondary structure perform in remote homology detection? Secondary structure interprets proteins as a sequence using an alphabet representing helices, strands, or loops. It shares its sequential nature with primary structure while retaining topological information similar to tertiary structure. To assess the effectiveness of secondary structure in remote homology detection, we devised a challenging classification task aimed at determining the superfamily membership of very distantly related protein domains. We used benchmarks from the CATH and SCOP databases and evaluated sequence and structure alignment algorithms on primary, secondary, and tertiary structures. As expected, both basic and advanced sequence alignment algorithms applied to primary structure achieved high precision, but their overall area under the curve was lower compared to the gold standard of structural alignment using tertiary structure. Surprisingly, a simple string comparison algorithm applied to secondary structure performed close to the gold standard. This result supports the hypothesis that key structural information is already encoded in secondary structure and suggests that secondary structure may be a promising representation to use when high-confidence structural data is unavailable, such as in cases involving protein flexibility and disorder. Competing Interest Statement: The authors have declared no competing interest.},
added-at = {2024-12-16T11:35:23.000+0100},
author = {Al-Fatlawi, Ali and Hossen, Md. Ballal and El-Hendi, Ferras and Schroeder, Michael},
biburl = {https://puma.scadsai.uni-leipzig.de/bibtex/24e15f83772cec52a7420b14fc54bf244/scadsfct},
doi = {10.1101/2024.09.03.611022},
elocation-id = {2024.09.03.611022},
eprint = {https://www.biorxiv.org/content/early/2024/09/06/2024.09.03.611022.full.pdf},
interhash = {a73237919ca7d48bcb2d491515479890},
intrahash = {4e15f83772cec52a7420b14fc54bf244},
journal = {bioRxiv},
keywords = {imported},
publisher = {Cold Spring Harbor Laboratory},
timestamp = {2024-12-16T11:35:23.000+0100},
title = {Protein secondary structure and remote homology detection},
url = {https://www.biorxiv.org/content/early/2024/09/06/2024.09.03.611022},
year = 2024
}