Lipophilicity score
After generating LIPS results for protein 1xqf chain A, we can use them to annotate the protein at both the surface level and the per-residue level.
In this tutorial, we demonstrate how to use TMKit to annotate each residue with its lipophilicity score derived from the LIPS results.
Reminder of data
Please make sure that the built-in example dataset has been downloaded before you walk through the tutorial.
Example usage
We can first read the results for the helix surface with ID 1 (id=1) using the following code. Recall that, as in the previous tutorial, the LIPS results for protein 1xqf chain A have been placed in the folder ./data/lips/.
import tmkit as tmk
df = tmk.feature.get_surf_lips(
fp='./data/lips/',
prot_name='1xqf',
file_chain='A',
id=1,
)
print(df)
aa_ids lipos
0 2 0.026
1 5 0.804
2 6 0.573
3 9 0.697
4 12 0.973
.. ... ...
150 352 0.828
151 355 0.688
152 356 0.535
153 359 0.984
154 362 0.615
[155 rows x 2 columns]
The output is a dataframe with two columns, aa_ids and lipos, described below.
Attribute | Description
---|---
aa_ids | amino acid ID
lipos | lipophilicity score
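A quick way to look up a single residue's score is to turn the two columns into a dictionary. The sketch below is plain Python; the helper name `residue_score_map` is our own and not part of TMKit, and it assumes only that the two columns are equal-length iterables (pandas Series qualify).

```python
def residue_score_map(aa_ids, lipos):
    """Build a {residue ID: lipophilicity score} dict from two parallel columns.

    Hypothetical helper (not part of TMKit). Residue IDs are cast to int so
    that lookups work whether the IDs were stored as ints or floats.
    """
    aa_ids, lipos = list(aa_ids), list(lipos)
    if len(aa_ids) != len(lipos):
        raise ValueError("aa_ids and lipos must have the same length")
    return {int(a): float(s) for a, s in zip(aa_ids, lipos)}


# First five rows of the surface-1 output printed above.
scores = residue_score_map([2, 5, 6, 9, 12], [0.026, 0.804, 0.573, 0.697, 0.973])
print(scores[5])  # 0.804
```

With the dataframe returned by `get_surf_lips`, the equivalent call would be `residue_score_map(df['aa_ids'], df['lipos'])`.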
To annotate all amino acids with lipophilicity scores, we need the data from all seven helix surfaces. In TMKit, this can be done as follows.
import tmkit as tmk
_, lipos_dict, _, _ = tmk.feature.read(
fp='./data/lips/',
prot_name='1xqf',
file_chain='A',
)
print(lipos_dict)
{1.0: 0.018, 4.0: 0.74, 5.0: 0.804, ..., 357.0: 0.727, 360.0: 1.174}
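The keys of `lipos_dict` are floats (1.0, 4.0, ...), so it can help to normalize them to integer residue IDs before further use. The sketch below does that and ranks residues from most to least lipophilic. `top_lipophilic` is a hypothetical helper name, and we only assume that `lipos_dict` is a plain dict as printed above.

```python
def top_lipophilic(lipos_dict, k=5):
    """Return the k residues with the highest lipophilicity scores.

    Hypothetical helper (not part of TMKit). Float keys such as 360.0 are
    converted to int residue IDs; ties keep the dict's iteration order
    because sorted() is stable.
    """
    normalized = {int(res_id): score for res_id, score in lipos_dict.items()}
    ranked = sorted(normalized.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Only the entries visible in the truncated output above.
visible = {1.0: 0.018, 4.0: 0.74, 5.0: 0.804, 357.0: 0.727, 360.0: 1.174}
print(top_lipophilic(visible, k=2))  # [(360, 1.174), (5, 0.804)]
```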
We can also use TMKit to summarize all seven surfaces, which gives the overall lipophilicity score of each surface.
import tmkit as tmk
df = tmk.feature.get_helix_all_surf_lips(
prot_name='1xqf',
file_chain='A',
fp='./data/lips/',
)
print(df)
surfs lipos
0 5 1.834
1 0 1.770
2 3 1.729
3 1 1.815
4 2 1.791
5 6 1.777
6 4 1.767
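The rows of this summary are not sorted by score (they appear in the order 5, 0, 3, 1, 2, 6, 4). To rank the surfaces, sort on the lipos column; with the dataframe itself, pandas `df.sort_values('lipos', ascending=False)` would do this. The sketch below shows the same ranking in plain Python, using the seven (surfs, lipos) pairs printed above.

```python
# (surface ID, overall lipophilicity score) pairs from the output above.
surface_lipos = [(5, 1.834), (0, 1.770), (3, 1.729), (1, 1.815),
                 (2, 1.791), (6, 1.777), (4, 1.767)]

# Sort surfaces from the highest to the lowest lipophilicity score.
ranked = sorted(surface_lipos, key=lambda row: row[1], reverse=True)
print([surf for surf, _ in ranked])  # [5, 1, 2, 6, 0, 4, 3]
```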
Average lipophilicity scores are also available at the surface level and can be accessed as follows. The column lxe holds the average lipophilicity score of each surface.
import tmkit as tmk
df = tmk.feature.get_helix_all_surf_avelips(
fp='./data/lips/',
prot_name='1xqf',
file_chain='A',
)
print(df)
surfs lxe
0 5 8.889
1 0 8.694
2 3 8.389
3 1 8.865
4 2 8.507
5 6 8.435
6 4 8.741
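Both summaries are keyed by the surfs column, so they can be combined into one record per surface. Joining on the surface ID rather than on row position guards against the two tables being ordered differently. The sketch below uses plain dicts built from the printed values; with the two dataframes, `pandas.merge(left, right, on='surfs')` would be the analogous step. The helper `join_on_surface` is hypothetical.

```python
# Values copied from the two summaries above, keyed by surface ID.
lipos_by_surf = {5: 1.834, 0: 1.770, 3: 1.729, 1: 1.815, 2: 1.791, 6: 1.777, 4: 1.767}
lxe_by_surf = {5: 8.889, 0: 8.694, 3: 8.389, 1: 8.865, 2: 8.507, 6: 8.435, 4: 8.741}


def join_on_surface(left, right):
    """Inner-join two {surface ID: value} dicts into {surface ID: (left, right)}.

    Hypothetical helper (not part of TMKit); output is ordered by surface ID.
    """
    shared = sorted(left.keys() & right.keys())
    return {surf: (left[surf], right[surf]) for surf in shared}


summary = join_on_surface(lipos_by_surf, lxe_by_surf)
print(summary[5])  # (1.834, 8.889)
```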
We can go further and make the most of the average lipophilicity scores to annotate the helix surfaces.
import tmkit as tmk
_, _, _, lips_dict = tmk.feature.read(
fp='./data/lips/',
prot_name='1xqf',
file_chain='A',
)
print(lips_dict)
{5.0: [1.834, 4.846, 8.889], 0.0: [1.77, 4.912, 8.694], 3.0: [1.729, 4.852, 8.389], 1.0: [1.815, 4.885, 8.865], 2.0: [1.791, 4.749, 8.507], 6.0: [1.777, 4.746, 8.435], 4.0: [1.767, 4.948, 8.741]}
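Comparing this dictionary with the two summaries above, the first value of each entry matches the lipos column and the last matches the lxe column. In the printed numbers, the last value also equals the product of the first two to within rounding (for surface 5, 1.834 × 4.846 ≈ 8.888). The sketch below checks that observation for all seven surfaces. It is only a consistency check on the output shown here; we are assuming, not citing, what the middle value represents in LIPS.

```python
import math

# lips_dict as printed above: surface ID -> [lipos, middle value, lxe].
lips_dict = {5.0: [1.834, 4.846, 8.889], 0.0: [1.77, 4.912, 8.694],
             3.0: [1.729, 4.852, 8.389], 1.0: [1.815, 4.885, 8.865],
             2.0: [1.791, 4.749, 8.507], 6.0: [1.777, 4.746, 8.435],
             4.0: [1.767, 4.948, 8.741]}

# Printed values are rounded to 3 decimals, so allow a small absolute tolerance.
TOLERANCE = 5e-3


def product_matches(values, tol=TOLERANCE):
    """True if the third value equals the product of the first two within tol."""
    lipo, middle, lxe = values
    return math.isclose(lipo * middle, lxe, abs_tol=tol)


mismatches = [int(s) for s, v in lips_dict.items() if not product_matches(v)]
print(mismatches)  # []
```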
Finally, we assemble an overall dataframe containing the LIPS results for all amino acids using the code below.
import tmkit as tmk
df_surf = tmk.feature.read_helix_all_surf(
fp='./data/lips/',
prot_name='1xqf',
file_chain='A',
)
df = tmk.feature.torosseta(
fp='./data/lips/',
prot_name='1xqf',
file_chain='A',
df_surf_lips=df_surf,
)
print(df)
======>reading surface 5
======>reading surface 0
======>reading surface 3
======>reading surface 1
======>reading surface 2
======>reading surface 6
======>reading surface 4
aa_ids mean_lipo lipos ents
0 6 8.435 0.573 4.896
1 9 8.435 0.697 4.852
2 10 8.435 1.621 2.735
3 13 8.435 0.838 5.327
4 16 8.435 0.722 4.914
... ... ... ... ...
1080 358 8.507 0.522 2.980
1081 359 8.507 0.984 3.276
1082 362 8.507 0.615 4.539
1083 1 8.507 0.018 1.125
1084 2 8.507 0.026 1.158
[1085 rows x 4 columns]
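The combined table has 1085 rows, exactly 7 × 155, which is consistent with the seven per-surface tables being stacked. A residue can therefore appear more than once: residue 359, for example, is listed for surface 1 in the first example (lipos 0.984) and appears again here with mean_lipo 8.507. If one row per residue is needed, an aggregation rule has to be chosen. The sketch below keeps, for each residue, the row with the highest mean_lipo; that rule and the helper name `collapse_by_residue` are our own illustrative choices, not TMKit behaviour.

```python
def collapse_by_residue(rows):
    """Keep one row per residue: the one with the highest mean_lipo.

    Hypothetical helper (not part of TMKit). `rows` is any iterable of
    (aa_ids, mean_lipo, lipos, ents) tuples, e.g. df.itertuples(index=False).
    Returns {residue ID: (mean_lipo, lipos, ents)}.
    """
    best = {}
    for aa_id, mean_lipo, lipos, ents in rows:
        key = int(aa_id)
        # Replace the stored row only when this surface scores higher.
        if key not in best or mean_lipo > best[key][0]:
            best[key] = (mean_lipo, lipos, ents)
    return best
```

Applied to the dataframe above, this would be `collapse_by_residue(df.itertuples(index=False))`. Averaging over surfaces instead of taking the maximum is an equally valid choice; which one fits depends on the downstream analysis.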
Attributes
Attribute | Description
---|---
prot_name | protein name as it appears in the prefix of a PDB file name (e.g., 1xqf in 1xqfA.pdb)
file_chain | chain ID as it appears in the prefix of a PDB file name (e.g., A in 1xqfA.pdb)
fp | path to the folder containing the LIPS results for the protein
id | helix surface ID, from 0 to 6
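Since prot_name and file_chain together form the PDB file-name prefix, the keyword arguments used throughout this tutorial can be derived from a file name. The sketch below assumes the file stem is the protein name followed by a one-character chain ID (as in 1xqfA.pdb); `lips_kwargs` is a hypothetical helper, not part of TMKit, and it only produces the parameter names documented in the table above.

```python
from pathlib import Path


def lips_kwargs(pdb_file, fp, surface_id=None):
    """Derive TMKit-style keyword arguments from a PDB file name.

    Hypothetical helper (not part of TMKit). Assumes the file stem is the
    protein name followed by a one-character chain ID, e.g. "1xqfA.pdb".
    """
    stem = Path(pdb_file).stem
    if len(stem) < 2:
        raise ValueError(f"cannot split {pdb_file!r} into name and chain")
    kwargs = {"fp": fp, "prot_name": stem[:-1], "file_chain": stem[-1]}
    if surface_id is not None:
        # Surface IDs run from 0 to 6 (seven helix surfaces).
        if surface_id not in range(7):
            raise ValueError(f"surface id must be 0-6, got {surface_id}")
        kwargs["id"] = surface_id
    return kwargs


print(lips_kwargs("1xqfA.pdb", "./data/lips/", surface_id=1))
# {'fp': './data/lips/', 'prot_name': '1xqf', 'file_chain': 'A', 'id': 1}
```

The result could then be unpacked into a call such as `tmk.feature.get_surf_lips(**lips_kwargs('1xqfA.pdb', './data/lips/', surface_id=1))`; for calls that take no `id`, such as `tmk.feature.read`, omit `surface_id`.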
See also
Please see here for a better understanding of the file-naming system.
Output
Please refer to the output shown for each example above.