Residue contact
Protein residue contacts are crucial for protein structure prediction and drug-target interaction prediction. TMKit allows users to parse protein residue contacts predicted by PSICOV[1], FreeContact[2], CCMPred[3], Gremlin[4], GDCA[5], PlmDCA[6], MemConP[7], Membrain2[8], and DeepHelicon[9].
PSICOV, FreeContact, CCMPred, Gremlin, GDCA, and PlmDCA are canonical covariance methods, while MemConP, Membrain2, and DeepHelicon are machine learning methods specialized for transmembrane proteins.
Reminder of data
Please make sure that the built-in example dataset has been downloaded before you walk through the tutorial.
Example usage
The following code obtains the residue contacts of chain A of protein 1xqf. As elsewhere in this tutorial, the function takes a set of commonly used parameters; please see the next section for details.
```python
import tmkit as tmk

# Returns two row-aligned DataFrames: predicted contacts (df1) and real distances (df2)
df1, df2 = tmk.rrc.read(
    prot_name='1xqf',
    seq_chain='A',
    fasta_fp='data/fasta/',
    pdb_fp='data/pdb/',
    xml_fp='data/xml/',
    dist_fp='data/rrc/',
    tool_fp='data/rrc/tool/',
    seq_sep_inferior=1,
    seq_sep_superior=None,
    tool='membrain2',
)
```
Attributes
| Attribute | Description |
|---|---|
| pdb_fp | path where a target PDB file is placed |
| fasta_fp | path where a target FASTA file is placed |
| xml_fp | path where a target XML file is placed |
| dist_fp | path where a file containing real distances between residues is placed (see the file at ./data/rrc in the example dataset) |
| tool_fp | path where a protein residue contact map file is placed |
| tool | name of a contact prediction tool: one of PSICOV, FreeContact, CCMPred, Gremlin, GDCA, PlmDCA, MemConP, Membrain2, or DeepHelicon (the example above passes the lowercase string 'membrain2') |
| seq_sep_inferior | lower bound of the sequence separation between the two residues of a pair |
| seq_sep_superior | upper bound of the sequence separation between the two residues of a pair |
| prot_name | name of a protein in the prefix of a PDB file name (e.g., 1xqf in 1xqfA.pdb) |
| seq_chain | chain of a protein in the prefix of a PDB file name (e.g., A in 1xqfA.pdb). The file_chain parameter is converted from it within the function |
See also
Please see here for a better understanding of the file-naming system.
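To make the two sequence-separation bounds concrete, here is a minimal pandas sketch of that kind of filter. It is not TMKit's code: `filter_by_seq_sep` is a hypothetical helper, and it assumes the bounds are inclusive and that `None` for the upper bound means "no upper limit". The input is a toy DataFrame with the same columns as `df1`; the values are made up.

```python
import pandas as pd

def filter_by_seq_sep(df, col_i, col_j, inferior=1, superior=None):
    """Hypothetical helper: keep pairs whose separation |j - i| lies in bounds.

    Assumes inclusive bounds and superior=None meaning no upper limit;
    TMKit's own handling may differ.
    """
    sep = (df[col_j] - df[col_i]).abs()
    mask = sep >= inferior
    if superior is not None:
        mask &= sep <= superior
    return df[mask].reset_index(drop=True)

# Toy pairs (made-up values) with the same columns as df1.
pairs = pd.DataFrame({
    "contact_id_1": [1, 1, 2, 3],
    "contact_id_2": [2, 8, 30, 60],   # separations: 1, 7, 28, 57
    "score": [0.9, 0.6, 0.4, 0.2],
})

print(filter_by_seq_sep(pairs, "contact_id_1", "contact_id_2", inferior=6, superior=30))
```

With `inferior=6` and `superior=30`, only the pairs with separations 7 and 28 survive.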
Output
The function returns two pandas DataFrames. The first, df1, contains the contacts predicted by Membrain2.
```python
print(df1)
```

```
       contact_id_1  contact_id_2     score
0                13            44  0.032846
1                13            45  0.011669
2                13            46  0.019312
3                13            47  0.089862
4                13            48  0.026575
...             ...           ...       ...
19443           308           349  0.044726
19444           308           350  0.080527
19445           308           351  0.039438
19446           308           352  0.034000
19447           308           353  0.074005

[19448 rows x 3 columns]
```
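Because df1 is a long list of (residue, residue, score) rows, it can be scattered into a square contact-map matrix for plotting or downstream models. The sketch below is illustrative only: `to_contact_map` is a hypothetical helper, it assumes `contact_id_1`/`contact_id_2` are 1-based sequence positions, and the sequence length is taken from the largest ID unless you pass the real length. The three input rows are copied from the printout above.

```python
import numpy as np
import pandas as pd

def to_contact_map(df, length=None):
    """Hypothetical helper: scatter (i, j, score) rows into a symmetric L x L matrix.

    Assumes 1-based residue IDs; positions without a prediction stay at 0.
    """
    i = df["contact_id_1"].to_numpy() - 1
    j = df["contact_id_2"].to_numpy() - 1
    if length is None:
        length = int(max(i.max(), j.max())) + 1
    cmap = np.zeros((length, length))
    scores = df["score"].to_numpy()
    cmap[i, j] = scores  # upper triangle
    cmap[j, i] = scores  # mirror to keep the map symmetric
    return cmap

# A few rows copied from the df1 printout above.
df1 = pd.DataFrame(
    [(13, 44, 0.032846), (13, 47, 0.089862), (308, 353, 0.074005)],
    columns=["contact_id_1", "contact_id_2", "score"],
)
cmap = to_contact_map(df1)
print(cmap.shape)  # (353, 353)
```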
The second, df2, contains the real distances between the two residues of each pair, with the following columns.
| Attribute | Description |
|---|---|
| fasta_id_1 | position of the first residue in the FASTA sequence |
| aa_1 | amino acid type of the first residue |
| pdb_id_1 | residue number of the first residue in the PDB file |
| fasta_id_2 | position of the second residue in the FASTA sequence |
| aa_2 | amino acid type of the second residue |
| pdb_id_2 | residue number of the second residue in the PDB file |
| dist | distance between the two residues |
| is_contact | flag indicating whether the two residues are in contact |
```python
print(df2)
```

```
       fasta_id_1 aa_1  pdb_id_1  fasta_id_2 aa_2  pdb_id_2       dist  is_contact
0              13    I        15          44    T        46  23.495386           0
1              13    I        15          45    Q        47  22.651615           0
2              13    I        15          46    V        48   18.67347           0
3              13    I        15          47    T        49  19.484049           0
4              13    I        15          48    V        50   21.53894           0
...           ...  ...       ...         ...  ...       ...        ...         ...
19443         308    F       332         349    G       373  35.690994           0
19444         308    F       332         350    Y       374  32.043457           0
19445         308    F       332         351    K       375  38.532841           0
19446         308    F       332         352    L       376  40.355228           0
19447         308    F       332         353    A       377  40.803558           0

[19448 rows x 8 columns]
```
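The FASTA and PDB numbering differ, and in the rows printed above the offset is not constant (pdb_id minus fasta_id is 2 for residue 13 but 24 for residue 308). A lookup table built from df2 is therefore safer than a fixed shift. The sketch below shows one way to build it with pandas; `fasta_to_pdb_map` is a hypothetical helper, not part of TMKit, and the input rows are copied from the printout above.

```python
import pandas as pd

def fasta_to_pdb_map(df):
    """Hypothetical helper: map FASTA positions to PDB residue numbers.

    Both ends of every pair are stacked into one (fasta_id, aa, pdb_id)
    table, deduplicated, sorted, and turned into a plain dict.
    """
    cols = ["fasta_id", "aa", "pdb_id"]
    ends = pd.concat([
        df[[f"fasta_id_{k}", f"aa_{k}", f"pdb_id_{k}"]].set_axis(cols, axis=1)
        for k in (1, 2)
    ])
    ends = ends.drop_duplicates().sort_values("fasta_id")
    return dict(zip(ends["fasta_id"].tolist(), ends["pdb_id"].tolist()))

# A few rows copied from the df2 printout above (the real df2 has 19448 rows).
df2 = pd.DataFrame({
    "fasta_id_1": [13, 13, 308],
    "aa_1": ["I", "I", "F"],
    "pdb_id_1": [15, 15, 332],
    "fasta_id_2": [44, 45, 353],
    "aa_2": ["T", "Q", "A"],
    "pdb_id_2": [46, 47, 377],
})

mapping = fasta_to_pdb_map(df2)
print(mapping)  # {13: 15, 44: 46, 45: 47, 308: 332, 353: 377}
```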
Because the rows of df1 and df2 are aligned (row i of both refers to the same residue pair), you can concatenate them column-wise directly, which makes it easy to compare predicted scores with real distances.
```python
import pandas as pd

df = pd.concat([df1, df2], axis=1)
print(df)
```

It outputs:

```
       contact_id_1  contact_id_2     score  ...  pdb_id_2       dist  is_contact
0                13            44  0.032846  ...        46  23.495386           0
1                13            45  0.011669  ...        47  22.651615           0
2                13            46  0.019312  ...        48   18.67347           0
3                13            47  0.089862  ...        49  19.484049           0
4                13            48  0.026575  ...        50   21.53894           0
...             ...           ...       ...  ...       ...        ...         ...
19443           308           349  0.044726  ...       373  35.690994           0
19444           308           350  0.080527  ...       374  32.043457           0
19445           308           351  0.039438  ...       375  38.532841           0
19446           308           352  0.034000  ...       376  40.355228           0
19447           308           353  0.074005  ...       377  40.803558           0

[19448 rows x 11 columns]
```
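A common next step is to check how many of the highest-scoring predictions are real contacts (top-k precision). The sketch below first verifies the row alignment that the concatenation relies on, then computes that fraction. Both `check_aligned` and `top_k_precision` are hypothetical helpers, it assumes a higher score means a more confident prediction, and the two frames are toy data with the same column names as df1/df2, not TMKit output.

```python
import pandas as pd

def check_aligned(df1, df2):
    """Hypothetical helper: True if row i of df1 and df2 is the same residue pair."""
    return bool(
        len(df1) == len(df2)
        and (df1["contact_id_1"].to_numpy() == df2["fasta_id_1"].to_numpy()).all()
        and (df1["contact_id_2"].to_numpy() == df2["fasta_id_2"].to_numpy()).all()
    )

def top_k_precision(df, k):
    """Hypothetical helper: fraction of the k highest-scoring pairs with is_contact == 1."""
    top = df.nlargest(k, "score")
    return float(top["is_contact"].mean())

# Toy frames (made-up values, not TMKit output).
df1 = pd.DataFrame({
    "contact_id_1": [1, 1, 2, 3],
    "contact_id_2": [5, 9, 8, 10],
    "score": [0.9, 0.2, 0.7, 0.4],
})
df2 = pd.DataFrame({
    "fasta_id_1": [1, 1, 2, 3],
    "fasta_id_2": [5, 9, 8, 10],
    "dist": [6.1, 14.3, 9.5, 7.2],
    "is_contact": [1, 0, 0, 1],
})

assert check_aligned(df1, df2)  # safe to concatenate column-wise
df = pd.concat([df1, df2], axis=1)
print(top_k_precision(df, k=2))  # 0.5
```

In practice, k is often chosen relative to the sequence length (for example L/5), but that choice is left to the user here.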