
Prediction

Overview

Before running the examples, decompress example_data.zip into a folder of your choice, e.g., deephelicon/.
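The decompression step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name extract_example_data and the archive location are assumptions for illustration, not part of DeepHelicon.

```python
import zipfile
from pathlib import Path


def extract_example_data(zip_path, target_dir):
    """Extract a zip archive into target_dir and return the archive member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical location of the downloaded archive: adjust to your setup.
    archive_path = Path("example_data.zip")
    if archive_path.exists():
        print(extract_example_data(archive_path, "deephelicon/"))
```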

Multi-stage deep learning

DeepHelicon is a multi-stage, iterative deep learning framework for residue-residue contact prediction. We first perform stage-1 prediction.

Stage-1 prediction

We define several parameters required for running DeepHelicon at stage 1.

params_s1 = {
    'prot_name': '2wsc',
    'prot_chain': '2',
    'fasta_fp': '../../data/deephelicon/example_data/',
    'fc_fp': '../../data/deephelicon/example_data/',
    'gdca_fp': '../../data/deephelicon/example_data/',
    'cp_fp': '../../data/deephelicon/example_data/',
    'plmc_fp': '../../data/deephelicon/example_data/',
    'plmc_param_fp': '../../data/deephelicon/example_data/',
    'sv_fp_feature': '../../data/deephelicon/',
    'sv_suffix_feature': '.fs1',
    'model_frozen_fpn': '../../data/deephelicon/model/tf2/frozen_graph/s1.pb',
    'batch_size': 100,
    'seq_sep_inferior': 4,
    'sv_fp_pred': '../../data/deephelicon/',
    'sv_suffix_pred': '.s1',
}
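Because most of these parameters are paths, it can help to check them before starting a prediction. The sketch below is not part of DeepHelicon. It assumes, based on the example values above, that keys ending in _fp or starting with sv_fp name directories and that keys ending in _fpn name files; find_missing_paths is a hypothetical helper.

```python
from pathlib import Path


def find_missing_paths(params):
    """Return the keys of path-like parameters that do not exist on disk."""
    missing = []
    for key, value in params.items():
        if key.endswith("_fpn"):
            # Assumed convention: *_fpn is a full file path (e.g. the frozen model).
            exists = Path(value).is_file()
        elif key.endswith("_fp") or key.startswith("sv_fp"):
            # Assumed convention: *_fp and sv_fp_* are directories.
            exists = Path(value).is_dir()
        else:
            continue  # not a path parameter (names, suffixes, numbers)
        if not exists:
            missing.append(key)
    return missing
```

Calling find_missing_paths(params_s1) returns an empty list when every directory and the model file are in place.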
import deephelicon

deephelicon.predict.rrcontact_stage1(
    prot_name=params_s1['prot_name'],
    prot_chain=params_s1['prot_chain'],
    fasta_fp=params_s1['fasta_fp'],
    fc_fp=params_s1['fc_fp'],
    gdca_fp=params_s1['gdca_fp'],
    cp_fp=params_s1['cp_fp'],
    plmc_fp=params_s1['plmc_fp'],
    plmc_param_fp=params_s1['plmc_param_fp'],
    sv_fp_feature=params_s1['sv_fp_feature'],
    sv_suffix_feature=params_s1['sv_suffix_feature'],
    model_frozen_fpn=params_s1['model_frozen_fpn'],
    sv_fp_pred=params_s1['sv_fp_pred'],
    sv_suffix_pred=params_s1['sv_suffix_pred'],
    seq_sep_inferior=params_s1['seq_sep_inferior'],
    batch_size=params_s1['batch_size'],
    verbose=True,
)

Alternatively, stage 1 can be run from the shell. To see the usage, type

deephelicon_s1 -h

This prints the available parameters.

-pn, --prot_name              Protein name identifier (e.g., "5w1h")
-pc, --prot_chain             Protein chain identifier (e.g., "A")
-fa, --fasta_fp               Path to the FASTA file containing the protein sequence
-fcfp, --fc_fp                Path to the FreeContact feature file
-gdcafp, --gdca_fp            Path to the Gaussian DCA feature file
-cpfp, --cp_fp                Path to the CCMPred feature file
-plmcfp, --plmc_fp            Path to the PLMC coupling score file
-plmcparamfp, --plmc_param_fp Path to the PLMC parameter file
-sv_fp_f, --sv_fp_feature     Path to save the generated input features for the model (default: ./)
-sv_suf_f, --sv_suffix_feature    Suffix name for the saved feature file (e.g., ".fs1", default: .fs1)
-m, --model_frozen_fpn        Path to the frozen model (.pb) file for prediction
-sv_fp_p, --sv_fp_pred        Path to save the model prediction outputs (default: ./)
-sv_suf_p, --sv_suffix_pred   Suffix name for the saved prediction file (e.g., ".s1", default: .s1)
-ss_inf, --seq_sep_inferior   Sequence separation inferior limit (default: 4)
-bs, --batch_size             Batch size used during prediction (default: 100)
-vb, --verbose                Whether to print detailed logs during processing (default: True)

You can run it with the following command.

deephelicon_s1 -pn 2wsc -pc 2 -fa ./data/deephelicon/example_data/ -fcfp ./data/deephelicon/example_data/ -gdcafp ./data/deephelicon/example_data/ -cpfp ./data/deephelicon/example_data/ -plmcfp ./data/deephelicon/example_data/ -plmcparamfp ./data/deephelicon/example_data/ -sv_fp_f ./data/deephelicon/ -sv_suf_f .fs1 -m ./data/deephelicon/model/tf2/frozen_graph/s1.pb -sv_fp_p ./data/deephelicon/ -ss_inf 4 -sv_suf_p .s1 -bs 100
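The command line mirrors the keys of the parameter dictionary used in the Python call, so it can be generated rather than typed by hand. The sketch below assumes that the command accepts the long option names listed in the help text with space-separated values; build_command is a hypothetical helper, not part of DeepHelicon.

```python
import shlex


def build_command(program, params):
    """Turn a parameter dict into a shell-quoted command using long option names."""
    argv = [program]
    for key, value in params.items():
        argv += [f"--{key}", str(value)]
    # shlex.join quotes any argument containing shell-special characters.
    return shlex.join(argv)


if __name__ == "__main__":
    print(build_command("deephelicon_s1", {"prot_name": "2wsc", "prot_chain": "2"}))
```

Quoting through shlex.join also protects values that contain characters such as `;`, which the shell would otherwise treat as a command separator.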

Stage-2 prediction

We define several parameters required for running DeepHelicon at stage 2. Here, dhc_fp and dhc_suffix point to the stage-1 prediction file (suffix .s1) produced above.

params_s2i1 = {
    'prot_name': '2wsc',
    'prot_chain': '2',
    'fasta_fp': '../../data/deephelicon/example_data/',
    'dhc_fp': '../../data/deephelicon/',
    'dhc_suffix': '.s1',
    'sv_fp_feature': '../../data/deephelicon/',
    'sv_suffix_feature': '.fs2i1',
    'model_frozen_fpn': '../../data/deephelicon/model/tf2/frozen_graph/s2i1.pb',
    'batch_size': 100,
    'seq_sep_inferior': 5,
    'sv_fp_pred': '../../data/deephelicon/',
    'sv_suffix_pred': '.s2i1',
}
import deephelicon

deephelicon.predict.rrcontact_stage2(
    prot_name=params_s2i1['prot_name'],
    prot_chain=params_s2i1['prot_chain'],
    fasta_fp=params_s2i1['fasta_fp'],
    dhc_fp=params_s2i1['dhc_fp'],
    dhc_suffix=params_s2i1['dhc_suffix'],
    sv_fp_feature=params_s2i1['sv_fp_feature'],
    sv_suffix_feature=params_s2i1['sv_suffix_feature'],
    model_frozen_fpn=params_s2i1['model_frozen_fpn'],
    sv_fp_pred=params_s2i1['sv_fp_pred'],
    sv_suffix_pred=params_s2i1['sv_suffix_pred'],
    seq_sep_inferior=params_s2i1['seq_sep_inferior'],
    batch_size=params_s2i1['batch_size'],
)
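Since the framework is iterative, each stage-2 run can take the previous prediction as its input through dhc_suffix. The sketch below only builds the parameter dictionaries for such a chain. It assumes that later iterations follow the same naming pattern as s2i1 (for example s2i2.pb and .s2i2), which is extrapolated from the example and should be checked against the models you actually have; stage2_iteration_params is a hypothetical helper.

```python
def stage2_iteration_params(base, model_dir, n_iter):
    """Return one stage-2 parameter dict per iteration, chaining output to input."""
    chain = []
    prev_suffix = ".s1"  # the first iteration reads the stage-1 prediction
    for i in range(1, n_iter + 1):
        tag = f"s2i{i}"
        params = dict(base)  # copy the shared settings (protein, paths, batch size)
        params["dhc_suffix"] = prev_suffix
        params["sv_suffix_feature"] = f".f{tag}"
        params["sv_suffix_pred"] = f".{tag}"
        # ASSUMPTION: later models are named after the s2i1.pb pattern.
        params["model_frozen_fpn"] = f"{model_dir}{tag}.pb"
        chain.append(params)
        prev_suffix = params["sv_suffix_pred"]  # next iteration reads this output
    return chain


if __name__ == "__main__":
    base = {"prot_name": "2wsc", "prot_chain": "2", "batch_size": 100}
    model_dir = "../../data/deephelicon/model/tf2/frozen_graph/"
    for p in stage2_iteration_params(base, model_dir, 2):
        print(p["dhc_suffix"], "->", p["sv_suffix_pred"])
```

Each resulting dictionary can then be merged with the remaining path settings and passed to deephelicon.predict.rrcontact_stage2 as keyword arguments.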

To see the usage in the shell, type

deephelicon_s2 -h

It shows the usage of different parameters.

-pn, --prot_name              Protein name identifier (e.g., "5w1h")
-pc, --prot_chain             Protein chain identifier (e.g., "A")
-fa, --fasta_fp               Path to the FASTA file containing the protein sequence
-dhcfp, --dhc_fp              Path to save a residue-residue contact file
-dhc_suf, --dhc_suffix        Suffix name for the saved residue-residue contact prediction file
                            (e.g., ".s1" or ".s2i1")
-sv_fp_f, --sv_fp_feature     Path to save the generated structural feature file for the model
                            (default: "./")
-sv_suf_f, --sv_suffix_feature
                            Suffix name for the saved structural feature file
                            (e.g., ".s2i2", default: ".fs2i1")
-m, --model_frozen_fpn        Path to the frozen model (.pb file) used for prediction
-sv_fp_p, --sv_fp_pred        Path to save the model prediction output (default: "./")
-sv_suf_p, --sv_suffix_pred   Suffix name for the saved prediction file
                            (e.g., "_pred.csv", default: ".s2i1")
-ss_inf, --seq_sep_inferior   Inferior threshold for sequence separation filtering
                            (residue pairs partitioned less than this number will be ignored, default: 5)
-bs, --batch_size             Batch size used during model inference (default: 100)
-vb, --verbose                Whether to print detailed logs during processing (default: True)
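The effect of seq_sep_inferior can be illustrated with a few lines of Python. This is a sketch of the filtering rule as described in the help text, not DeepHelicon's own code; in particular, treating a separation equal to the threshold as kept is an assumption drawn from the wording "less than this number will be ignored".

```python
def filter_by_sequence_separation(pairs, seq_sep_inferior):
    """Keep residue pairs (i, j) whose sequence separation |i - j| reaches the threshold."""
    return [(i, j) for i, j in pairs if abs(i - j) >= seq_sep_inferior]


if __name__ == "__main__":
    pairs = [(1, 3), (1, 5), (1, 6), (10, 2)]
    # With a threshold of 5, separations 2 and 4 are dropped; 5 and 8 are kept.
    print(filter_by_sequence_separation(pairs, 5))
```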

You can run it with the following command.

deephelicon_s2 -pn 2wsc -pc 2 -fa ./data/deephelicon/example_data/ -dhcfp ./data/deephelicon/ -dhc_suf .s1 -sv_fp_f ./data/deephelicon/ -sv_suf_f .fs2i1 -m ./data/deephelicon/model/tf2/frozen_graph/s2i1.pb -sv_fp_p ./data/deephelicon/ -sv_suf_p .s2i1 -ss_inf 5 -bs 100

Formatting

DeepHelicon can write results in two formats: CASP14, which follows the CASP14 prediction format, and Normal, which is the DeepHelicon format.

params_reformat = {
    'prot_name': '2wsc',
    'prot_chain': '2',
    'file_paths': '../../data/deephelicon/;../../data/deephelicon/',
    'file_suffixs': '.s2i3;.s2i4',
    'file_ids': '1;2',
    'tmhmm_fp': '../../data/deephelicon/example_data/',
    'sv_fp': '../../data/deephelicon/',
    'format_rr': 'CASP14',
}
import deephelicon

deephelicon.predict.reformat(
    prot_name=params_reformat['prot_name'],
    prot_chain=params_reformat['prot_chain'],
    file_paths=params_reformat['file_paths'],
    file_suffixs=params_reformat['file_suffixs'],
    file_ids=params_reformat['file_ids'],
    tmhmm_fp=params_reformat['tmhmm_fp'],
    sv_fp=params_reformat['sv_fp'],
    format_rr=params_reformat['format_rr'],
)
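In this example, file_paths, file_suffixs, and file_ids are semicolon-separated strings whose entries correspond by position. The sketch below shows one way to split and pair them for inspection; split_reformat_inputs is a hypothetical helper, and how DeepHelicon parses these strings internally is not shown here.

```python
def split_reformat_inputs(file_paths, file_suffixs, file_ids, sep=";"):
    """Split the three delimited strings and pair their entries by position."""
    paths = file_paths.split(sep)
    suffixes = file_suffixs.split(sep)
    ids = file_ids.split(sep)
    if not (len(paths) == len(suffixes) == len(ids)):
        raise ValueError("file_paths, file_suffixs and file_ids must have equal lengths")
    return list(zip(ids, paths, suffixes))


if __name__ == "__main__":
    triples = split_reformat_inputs(
        "../../data/deephelicon/;../../data/deephelicon/",
        ".s2i3;.s2i4",
        "1;2",
    )
    for file_id, path, suffix in triples:
        print(file_id, path, suffix)
```

A length mismatch between the three strings is reported early instead of silently dropping an entry.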

Please type the following command for usage in shell.

deephelicon_fmt -h

It shows the usage of different parameters.

-pn, --prot_name       Protein name identifier (e.g., "5w1h")
-pc, --prot_chain      Protein chain identifier (e.g., "A")
-fps, --file_paths     Python dict of input file paths. The files are produced by DeepHelicon at stage 1 and stage 2.
-fss, --file_suffixs   Python dict of corresponding suffixes for each of the input files (file_paths)
-fids, --file_ids      IDs of input files
-tmhmm, --tmhmm_fp     Path to the TMHMM prediction file for transmembrane region annotation
-sv_fp, --sv_fp        Path to save the merged or processed feature file (e.g., .s1, s2i1, s2i2, etc.) (default: ./)
-fmt, --format_rr      Format of the output file for residue-residue contact predictions. The default is "Normal", but "CASP14" is also supported.
-vb, --verbose         Whether to print detailed logs during processing (default: True)

You can run it with the following command.

deephelicon_fmt -pn 2wsc -pc 2 -fps "./data/deephelicon/;./data/deephelicon/" -fss ".s2i3;.s2i4" -fids "1;2" -tmhmm ./data/deephelicon/example_data/ -sv_fp ./data/deephelicon/ -fmt Normal