
Data

Input

External tools

Only protein sequences are required to run DeepHelicon. The sequences are also used by external tools to generate a set of intermediate files before DeepHelicon is run, as listed in Table 1.

Table 1: External tools for generating intermediate files before running DeepHelicon.

| Tool | Role | Function | Source |
|---|---|---|---|
| HHblits | Input | Generating multiple sequence alignments | Remmert et al. (2011) |
| CCMpred | Input | Predictor of residue contacts | Seemayer et al. (2014) |
| plmDCA | Input | Predictor of residue contacts | Ekeberg et al. (2013) |
| Gaussian DCA | Input | Predictor of residue contacts | Baldassi et al. (2014) |
| FreeContact | Input | Predictor of residue contacts | Kaján et al. (2014) |
| TMHMM | Input | Predictor of transmembrane topologies | Krogh et al. (2001) |
| Uniclust30 database | Intermediate | Sequence database | Mirdita et al. (2016) |
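HHblits can write its alignments in A3M format, while coevolution-based contact predictors such as CCMpred read a plain alignment with one aligned sequence per line. The sketch below shows one common way to bridge the two: drop the A3M insertion characters so that every row has the query's length. Whether DeepHelicon performs this conversion internally is not stated here, and the helper name `a3m_to_aligned` is hypothetical.

```python
def a3m_to_aligned(a3m_text: str) -> list[str]:
    """Convert A3M text into equal-length aligned rows (hypothetical helper).

    In A3M, uppercase letters and '-' mark match-state columns, while lowercase
    letters are insertions relative to the query. Dropping the insertions (and
    any '.' characters) leaves every row with the query's length.
    """
    rows: list[str] = []
    current: list[str] = []
    for raw in a3m_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines such as "#A3M#"
        if line.startswith(">"):
            if current:
                rows.append("".join(current))  # close the previous record
            current = []
            continue
        # Keep match states and deletions; drop insertion characters.
        current.append("".join(c for c in line if not (c.islower() or c == ".")))
    if current:
        rows.append("".join(current))
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows differ in length after removing insertions")
    return rows


if __name__ == "__main__":
    a3m = ">query\nACDEF\n>hit1\nAC-dEF\n>hit2\nAcCDEF\n"
    # One aligned row per line, ready to be written out as a plain alignment file.
    print("\n".join(a3m_to_aligned(a3m)))
```

Sequences split over several lines are joined before filtering, and a length check guards against malformed input.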

Output files

Depending on the model used at each stage, DeepHelicon returns an output file with one of the following suffixes: .s1 (stage 1); .s2i1, .s2i2, .s2i3 or .s2i4 (stage 2); or .deephelicon.

This predictor returns predictions of inter-helical residue contacts in transmembrane proteins. If no transmembrane segment is detected, the program does not return final results. However, you can still use the intermediate results from stages 1 and 2, as described in Sun & Frishman (2020). Because the built-in transmembrane topology predictor may detect no helices at all, we plan to extend the module in future work to generate a file containing the complete results. DeepHelicon outputs results in two formats.
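When the final file is missing, a script can fall back to the intermediate stage 1 and stage 2 outputs. The sketch below picks the most complete file available for one protein. Two points are assumptions, not documented behaviour: that outputs are named `<prot_name><suffix>` in a single directory, and that the preference order runs from .deephelicon through .s2i4 down to .s1. The function name `pick_result_file` is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed preference order, most to least complete. Treating .deephelicon as the
# final result and .s2i4 as the latest stage-2 file is an assumption.
PREFERENCE = (".deephelicon", ".s2i4", ".s2i3", ".s2i2", ".s2i1", ".s1")


def pick_result_file(out_dir: str, prot_name: str) -> Optional[Path]:
    """Return the most complete output file found for one protein, or None.

    Assumes (hypothetically) that outputs are named <prot_name><suffix> inside
    out_dir. Falls back to stage-2, then stage-1 files when no final file exists.
    """
    for suffix in PREFERENCE:
        candidate = Path(out_dir) / f"{prot_name}{suffix}"
        if candidate.is_file():
            return candidate
    return None
```

For example, `pick_result_file('out', 'my_protein')` returns the path of `out/my_protein.deephelicon` if it exists, otherwise the latest stage 2 file, and `None` if no output was written.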

DeepHelicon format

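Predicted inter-helical residue contacts in transmembrane proteins are reported in 5 columns: the position and residue of the first amino acid, the position and residue of the second amino acid, and their contact probability.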

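Table 2: DeepHelicon output format.

| Position 1 | Residue 1 | Position 2 | Residue 2 | Probability |
|---|---|---|---|---|
| 1 | S | 6 | L | 0.14790976 |
| 1 | S | 7 | R | 0.041100707 |
| 1 | S | 8 | W | 0.04841847 |
| ... | ... | ... | ... | ... |
| 170 | F | 176 | K | 0.05994133 |
| 171 | A | 176 | K | 0.07471807 |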
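A DeepHelicon-format file can be read into Python for downstream analysis, for instance to rank residue pairs by probability. The sketch assumes whitespace-separated columns in the order of Table 2 and no header line; `Contact`, `parse_deephelicon` and `top_contacts` are hypothetical names, not part of the deephelicon package.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    pos1: int    # position of the first residue (1-based, as in Table 2)
    res1: str    # one-letter code of the first residue
    pos2: int    # position of the second residue
    res2: str    # one-letter code of the second residue
    prob: float  # predicted contact probability


def parse_deephelicon(text: str) -> list[Contact]:
    """Parse DeepHelicon-format lines into Contact records.

    Blank lines are skipped; a line without exactly five fields raises
    ValueError when it is unpacked.
    """
    contacts: list[Contact] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        p1, r1, p2, r2, prob = fields
        contacts.append(Contact(int(p1), r1, int(p2), r2, float(prob)))
    return contacts


def top_contacts(contacts: list[Contact], k: int) -> list[Contact]:
    """Return the k pairs with the highest predicted probability."""
    return sorted(contacts, key=lambda c: c.prob, reverse=True)[:k]
```

Ranking the top L/5 or L/2 pairs, where L is the sequence length, is a common convention when evaluating contact predictors, so `k` can be set accordingly.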

CASP14 format

The CASP14-format output has 3 columns: the positions of the two residues in each pair and their contact probability.

Table 3: CASP14 output format.

| Position 1 | Position 2 | Probability |
|---|---|---|
| 1 | 6 | .148 |
| 1 | 7 | .041 |
| 1 | 8 | .048 |
| ... | ... | ... |
| 170 | 176 | .060 |
| 171 | 176 | .075 |
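The CASP14 columns can be derived from the DeepHelicon format by dropping the residue names and rounding the probability. The values in Table 3 match those in Table 2 rounded to three decimals (0.14790976 becomes .148), so the sketch rounds to three decimals by default. The helper name `to_casp14_lines` and the exact number formatting (for example, whether a leading zero is written) are assumptions.

```python
def to_casp14_lines(text: str, digits: int = 3) -> list[str]:
    """Convert DeepHelicon-format lines (pos1 res1 pos2 res2 prob) into
    three-column "pos1 pos2 prob" lines with a rounded probability.

    Lines that do not have exactly five fields (blank or malformed) are skipped.
    """
    out: list[str] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        pos1, _res1, pos2, _res2, prob = fields
        out.append(f"{int(pos1)} {int(pos2)} {float(prob):.{digits}f}")
    return out
```

Silently skipping malformed lines keeps the converter simple; a stricter version could raise instead.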

Example data

Users can download example data and inspect a range of input files.

```python
import deephelicon

# Download the example data archive from the GitHub release and save it locally.
deephelicon.predict.download_data(
    url='https://github.com/2003100127/deephelicon/releases/download/example_data/example_data.zip',
    sv_fpn='../../data/deephelicon/example_data.zip',
)
```
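Once the archive is downloaded, it can be unpacked with Python's standard `zipfile` module. The helper name `extract_example_data` and the output directory are hypothetical, and the layout of the archive is not described here, so the sketch simply returns the member names for inspection.

```python
import zipfile
from pathlib import Path


def extract_example_data(zip_fpn: str, out_dir: str) -> list[str]:
    """Extract a downloaded zip archive into out_dir and return its member names."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_fpn) as zf:
        zf.extractall(out_dir)
        return sorted(zf.namelist())


# Hypothetical usage, reusing the sv_fpn path from the download call above:
# names = extract_example_data('../../data/deephelicon/example_data.zip',
#                              '../../data/deephelicon/')
```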
References
  1. Remmert, M., Biegert, A., Hauser, A., & Söding, J. (2011). HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nature Methods, 9(2), 173–175. 10.1038/nmeth.1818
  2. Seemayer, S., Gruber, M., & Söding, J. (2014). CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations. Bioinformatics, 30(21), 3128–3130. 10.1093/bioinformatics/btu500
  3. Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M., & Aurell, E. (2013). Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models. Physical Review E, 87(1). 10.1103/physreve.87.012707
  4. Baldassi, C., Zamparo, M., Feinauer, C., Procaccini, A., Zecchina, R., Weigt, M., & Pagnani, A. (2014). Fast and Accurate Multivariate Gaussian Modeling of Protein Families: Predicting Residue Contacts and Protein-Interaction Partners. PLoS ONE, 9(3), e92721. 10.1371/journal.pone.0092721
  5. Kaján, L., Hopf, T. A., Kalaš, M., Marks, D. S., & Rost, B. (2014). FreeContact: fast and free software for protein contact prediction from residue co-evolution. BMC Bioinformatics, 15(1). 10.1186/1471-2105-15-85
  6. Krogh, A., Larsson, B., von Heijne, G., & Sonnhammer, E. L. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology, 305(3), 567–580. 10.1006/jmbi.2000.4315
  7. Mirdita, M., von den Driesch, L., Galiez, C., Martin, M. J., Söding, J., & Steinegger, M. (2016). Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Research, 45(D1), D170–D176. 10.1093/nar/gkw1081
  8. Marks, D. S., Colwell, L. J., Sheridan, R., Hopf, T. A., Pagnani, A., Zecchina, R., & Sander, C. (2011). Protein 3D Structure Computed from Evolutionary Sequence Variation. PLoS ONE, 6(12), e28766. 10.1371/journal.pone.0028766
  9. Sun, J., & Frishman, D. (2020). DeepHelicon: Accurate prediction of inter-helical residue contacts in transmembrane proteins by residual neural networks. Journal of Structural Biology, 212(1), 107574. 10.1016/j.jsb.2020.107574