Residue contact
Protein residue contacts are crucial for protein structure prediction and drug-target interaction prediction. TMKit allows users to parse protein residue contacts predicted by PSICOV[1], FreeContact[2], CCMPred[3], Gremlin[4], GDCA[5], PlmDCA[6], MemConP[7], Membrain2[8], and DeepHelicon[9].
PSICOV, FreeContact, CCMPred, Gremlin, GDCA, and PlmDCA are canonical covariance methods, while MemConP, Membrain2, and DeepHelicon are machine learning methods specialized for transmembrane proteins.
Reminder of data
Please make sure that the built-in example dataset has been downloaded before you walk through the tutorial.
Example usage
We can use the following code to obtain the residue contacts of protein 1xqf chain A. As in other parts of this tutorial, the call takes a set of commonly used parameters. Please see the next section for details.
import tmkit as tmk

df1, df2 = tmk.rrc.read(
    prot_name='1xqf',
    seq_chain='A',
    fasta_fp='data/fasta/',
    pdb_fp='data/pdb/',
    xml_fp='data/xml/',
    dist_fp='data/rrc/',
    tool_fp='data/rrc/tool/',
    seq_sep_inferior=1,
    seq_sep_superior=None,
    tool='membrain2',
)
Attributes
| Attribute | Description |
|---|---|
| pdb_fp | path where a target PDB file is placed |
| fasta_fp | path where a target Fasta file is placed |
| xml_fp | path where a target XML file is placed |
| dist_fp | path where a file containing real distances between residues is placed (please check the file at ./data/rrc in the example dataset) |
| tool_fp | path where a protein residue contact map file is placed |
| tool | name of a contact prediction tool. It can be one of PSICOV, FreeContact, CCMPred, Gremlin, GDCA, PlmDCA, MemConP, Membrain2, and DeepHelicon |
| seq_sep_inferior | lower bound on the sequence separation between the two residues of a pair |
| seq_sep_superior | upper bound on the sequence separation between the two residues of a pair |
| prot_name | name of a protein in the prefix of a PDB file name (e.g., 1xqf in 1xqfA.pdb) |
| seq_chain | chain of a protein in the prefix of a PDB file name (e.g., A in 1xqfA.pdb). Parameter file_chain will be converted within the function |
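The two sequence-separation bounds can be pictured as a filter on residue pairs. The sketch below is not TMKit code: `filter_by_seq_sep` is a hypothetical helper written for illustration. It assumes that both bounds are inclusive and that `None` means no upper bound, neither of which is confirmed by the descriptions above.

```python
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[int, int]


def filter_by_seq_sep(
    pairs: Iterable[Pair],
    inferior: int,
    superior: Optional[int] = None,
) -> List[Pair]:
    """Keep pairs whose sequence separation |i - j| lies within the bounds.

    Assumption (not taken from TMKit): both bounds are inclusive and
    superior=None disables the upper bound.
    """
    kept = []
    for i, j in pairs:
        sep = abs(i - j)
        if sep < inferior:
            continue  # residues too close along the sequence
        if superior is not None and sep > superior:
            continue  # residues too far apart along the sequence
        kept.append((i, j))
    return kept


# Illustrative residue-position pairs (separations 1, 6, 31, and 45).
pairs = [(13, 14), (13, 19), (13, 44), (308, 353)]

print(filter_by_seq_sep(pairs, inferior=1))
print(filter_by_seq_sep(pairs, inferior=6, superior=40))
```

With `inferior=1` and no upper bound, every pair of distinct residues passes the filter; tightening the bounds narrows the selection to a chosen separation range.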
See also
Please see here for a better understanding of the file-naming system.
Output
The function returns two pandas dataframes. The first, df1, contains the contacts predicted by Membrain2.
print(df1)
contact_id_1 contact_id_2 score
0 13 44 0.032846
1 13 45 0.011669
2 13 46 0.019312
3 13 47 0.089862
4 13 48 0.026575
... ... ... ...
19443 308 349 0.044726
19444 308 350 0.080527
19445 308 351 0.039438
19446 308 352 0.034000
19447 308 353 0.074005
[19448 rows x 3 columns]
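A common next step is to rank the predicted pairs by score. The sketch below rebuilds only the first five rows printed above so that it runs without the example dataset. The value of `top_k` is an arbitrary choice for illustration, and the code assumes a higher score means a more confident prediction.

```python
import pandas as pd

# First five rows of the df1 output printed above.
df1_head = pd.DataFrame(
    {
        "contact_id_1": [13, 13, 13, 13, 13],
        "contact_id_2": [44, 45, 46, 47, 48],
        "score": [0.032846, 0.011669, 0.019312, 0.089862, 0.026575],
    }
)

top_k = 2  # hypothetical cut-off, chosen only for this example

# Sort by score in descending order and keep the top_k pairs.
top = df1_head.sort_values("score", ascending=False).head(top_k)
print(top)
```

On the full dataframe the same two calls apply unchanged; only the choice of `top_k` needs to suit your analysis.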
The second, df2, contains the real distances between the two residues of each pair. Its columns are:
| Attribute | Description |
|---|---|
| fasta_id_1 | Fasta id of the first residue |
| aa_1 | Amino acid type of the first residue |
| pdb_id_1 | PDB id of the first residue |
| fasta_id_2 | Fasta id of the second residue |
| aa_2 | Amino acid type of the second residue |
| pdb_id_2 | PDB id of the second residue |
| dist | distance between the two residues |
| is_contact | whether the two residues are in contact |
print(df2)
fasta_id_1 aa_1 pdb_id_1 fasta_id_2 aa_2 pdb_id_2 dist is_contact
0 13 I 15 44 T 46 23.495386 0
1 13 I 15 45 Q 47 22.651615 0
2 13 I 15 46 V 48 18.67347 0
3 13 I 15 47 T 49 19.484049 0
4 13 I 15 48 V 50 21.53894 0
... ... ... ... ... ... ... ... ...
19443 308 F 332 349 G 373 35.690994 0
19444 308 F 332 350 Y 374 32.043457 0
19445 308 F 332 351 K 375 38.532841 0
19446 308 F 332 352 L 376 40.355228 0
19447 308 F 332 353 A 377 40.803558 0
[19448 rows x 8 columns]
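The distance column can be used to select spatially close pairs. The sketch below rebuilds the first five rows printed above. The `cutoff` value is a hypothetical, user-chosen threshold in the same units as `dist`; it is not a statement about how TMKit derives `is_contact`. Because the uneven number of decimals in the printout suggests that `dist` may not be stored as a float column, the values are passed through `pd.to_numeric` first as a precaution.

```python
import pandas as pd

# First five rows of the df2 output printed above.
df2_head = pd.DataFrame(
    {
        "fasta_id_1": [13, 13, 13, 13, 13],
        "aa_1": ["I", "I", "I", "I", "I"],
        "pdb_id_1": [15, 15, 15, 15, 15],
        "fasta_id_2": [44, 45, 46, 47, 48],
        "aa_2": ["T", "Q", "V", "T", "V"],
        "pdb_id_2": [46, 47, 48, 49, 50],
        "dist": [23.495386, 22.651615, 18.67347, 19.484049, 21.53894],
        "is_contact": [0, 0, 0, 0, 0],
    }
)

cutoff = 20.0  # hypothetical threshold, same units as the dist column

# Convert defensively in case dist was read in as text.
dist = pd.to_numeric(df2_head["dist"])

# Keep only the pairs whose distance is below the cutoff.
close = df2_head[dist < cutoff]
print(close[["fasta_id_2", "aa_2", "dist"]])
```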
Because the two dataframes are row-aligned (row i of df1 and row i of df2 describe the same residue pair), you can concatenate them column-wise:
import pandas as pd
df = pd.concat([df1, df2], axis=1)
print(df)
It outputs:
contact_id_1 contact_id_2 score ... pdb_id_2 dist is_contact
0 13 44 0.032846 ... 46 23.495386 0
1 13 45 0.011669 ... 47 22.651615 0
2 13 46 0.019312 ... 48 18.67347 0
3 13 47 0.089862 ... 49 19.484049 0
4 13 48 0.026575 ... 50 21.53894 0
... ... ... ... ... ... ... ...
19443 308 349 0.044726 ... 373 35.690994 0
19444 308 350 0.080527 ... 374 32.043457 0
19445 308 351 0.039438 ... 375 38.532841 0
19446 308 352 0.034000 ... 376 40.355228 0
19447 308 353 0.074005 ... 377 40.803558 0
[19448 rows x 11 columns]
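Before relying on the combined dataframe, it can be worth confirming the row alignment and then scoring the predictions against the real contacts. The sketch below uses only the ten rows printed above (head and tail) and, for brevity, only three of the df2 columns. The value of `top_k` and the precision calculation are illustrative choices, not part of TMKit.

```python
import pandas as pd

# Head and tail rows of df1 as printed above.
df1_rows = pd.DataFrame(
    {
        "contact_id_1": [13, 13, 13, 13, 13, 308, 308, 308, 308, 308],
        "contact_id_2": [44, 45, 46, 47, 48, 349, 350, 351, 352, 353],
        "score": [0.032846, 0.011669, 0.019312, 0.089862, 0.026575,
                  0.044726, 0.080527, 0.039438, 0.034000, 0.074005],
    }
)

# Matching rows of df2, reduced to the columns this sketch needs.
df2_rows = pd.DataFrame(
    {
        "fasta_id_1": [13, 13, 13, 13, 13, 308, 308, 308, 308, 308],
        "fasta_id_2": [44, 45, 46, 47, 48, 349, 350, 351, 352, 353],
        "is_contact": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    }
)

# Check that both dataframes list the same residue pair on every row.
aligned = bool(
    (df1_rows["contact_id_1"] == df2_rows["fasta_id_1"]).all()
    and (df1_rows["contact_id_2"] == df2_rows["fasta_id_2"]).all()
)

df = pd.concat([df1_rows, df2_rows], axis=1)

top_k = 3  # hypothetical number of top-scoring predictions to evaluate
top = df.nlargest(top_k, "score")

# Fraction of the top_k predicted pairs that are real contacts.
precision = pd.to_numeric(top["is_contact"]).mean()
print(aligned, precision)
```

None of the ten printed rows is a real contact, so this small sample says nothing about the predictor's quality; run the same steps on the full dataframes for a meaningful figure.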