TMHMM#

This tutorial shows how to parse the topologies predicted by TMHMM2 [1].

Attention

TMKit originally supported transmembrane topology prediction using the TMHMM program entirely within Python. However, in our latest version, we observed that updates in the Python ecosystem, particularly major revisions to NumPy's numeric type aliases (e.g., the deprecation of np.int and np.float), have introduced compatibility issues with the Python-based TMHMM method.

To accommodate these Python version updates, we have discontinued the built-in TMHMM topology prediction feature in TMKit. However, TMKit still fully supports parsing transmembrane topologies predicted externally using the TMHMM method.

You can still generate TMHMM-based topology predictions via this link.

Example usage#

In TMKit, you can obtain the topologies of a transmembrane protein with tmk.topo.from_tmhmm, chiefly by specifying the parameters topo and tmhmm_fpn (see Attributes below). An example TMHMM prediction file is provided at ./data/topo/1xqfA.tmhmm. Suppose the prediction file looks like this:

%pred NB(0): o 1 9, M 10 32, i 33 38, M 39 61, o 62 95, M 96 118, i 119 124, M 125 147, o 148 181, M 182 204, i 205 210, M 211 233, o 234 261, M 262 284, i 285 290, M 291 313, o 314 327, M 328 350, i 351 362
?0 oooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiMMMMMMMMMMMMMMMMMMMMMMMooooooooooo

?0 oooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiMMMMMMMMMMMMMMMMMMMM

?0 MMMooooooooooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiMMMMMM

?0 MMMMMMMMMMMMMMMMMooooooooooooooooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiii

?0 iiMMMMMMMMMMMMMMMMMMMMMMMooooooooooooooMMMMMMMMMMMMMMMMMMMMMMMiiiiiiiiii

?0 ii
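To make the file layout concrete, here is a minimal sketch of how the `%pred` summary line can be read into lower and upper bounds. It is not TMKit's implementation: the helper `parse_pred_line` and the `LABELS` mapping are hypothetical names introduced for illustration, and the sketch assumes the TMHMM convention that `M` marks membrane segments, `i` the inside (cytoplasmic) side, and `o` the outside (extracellular) side.

```python
import re

# Assumed mapping from TMKit's topology names to TMHMM labels:
# M = membrane helix, i = inside (cytoplasmic), o = outside (extracellular).
LABELS = {"tmh": "M", "cyto": "i", "extra": "o"}


def parse_pred_line(line, topo):
    """Return (lower_ids, upper_ids) for one topology kind from a '%pred' line.

    Hypothetical helper for illustration only; not part of TMKit.
    """
    label = LABELS[topo]
    # Drop the '%pred NB(0):' prefix, then read every 'label start end' triple.
    body = line.split(":", 1)[1]
    triples = re.findall(r"([A-Za-z])\s+(\d+)\s+(\d+)", body)
    lower_ids = [int(start) for lab, start, _ in triples if lab == label]
    upper_ids = [int(end) for lab, _, end in triples if lab == label]
    return lower_ids, upper_ids


# The summary line from the example file above.
pred = (
    "%pred NB(0): o 1 9, M 10 32, i 33 38, M 39 61, o 62 95, M 96 118, "
    "i 119 124, M 125 147, o 148 181, M 182 204, i 205 210, M 211 233, "
    "o 234 261, M 262 284, i 285 290, M 291 313, o 314 327, M 328 350, "
    "i 351 362"
)

tmh_lower, tmh_upper = parse_pred_line(pred, "tmh")
cyto_lower, cyto_upper = parse_pred_line(pred, "cyto")
extra_lower, extra_upper = parse_pred_line(pred, "extra")
print("tmh  ", tmh_lower, tmh_upper)
print("cyto ", cyto_lower, cyto_upper)
print("extra", extra_lower, extra_upper)
```

The sketch reads only the summary line. The `?0` lines carry the same information as one label per residue, so they could serve as a cross-check.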

You can put the following code in either a Jupyter notebook or a Python script. To obtain the predicted transmembrane segments, set topo to tmh. To obtain the predicted cytoplasmic or extracellular segments, set topo to cyto or extra, respectively.

import tmkit as tmk

lower_ids, upper_ids = tmk.topo.from_tmhmm(
    topo='tmh',
    tmhmm_fpn='./data/topo/1xqfA.tmhmm',
    from_fasta=False,
    file_kind='Linux',
)
print('---lower bounds', lower_ids)
print('---upper bounds', upper_ids)

Attributes#

Attribute	Description
`tmhmm_fpn`	path to a target TMHMM file
`topo`	name of a topology kind. It can be `cyto`, `extra`, or `tmh`

Output#

Finally, you will see the following output showing the transmembrane segments of chain A of protein 1xqf.

In the output, lower bounds are the starting residue positions of the segments in the PDB structure, and upper bounds are the corresponding ending positions. The two lists are matched element by element. For example, for topology tmh, the first continuous segment runs from residue 10 to residue 32, the second from residue 39 to residue 61, and so on; the last runs from residue 328 to residue 350.

---lower bounds [10, 39, 96, 125, 182, 211, 262, 291, 328]
---upper bounds [32, 61, 118, 147, 204, 233, 284, 313, 350]
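Because the two lists are matched by position, they can be zipped into (start, end) pairs. The sketch below assumes the lower bounds are the start positions of the M segments in the example prediction file. The variable names `segments` and `lengths` are introduced here for illustration.

```python
# Bounds for topo='tmh'; the lower bounds are taken from the M segments
# listed in the '%pred' line of the example file (an assumption of this sketch).
lower_ids = [10, 39, 96, 125, 182, 211, 262, 291, 328]
upper_ids = [32, 61, 118, 147, 204, 233, 284, 313, 350]

# Pair each start position with the end position at the same index.
segments = list(zip(lower_ids, upper_ids))

# Both bounds are inclusive, so a segment spans (end - start + 1) residues.
lengths = [upper - lower + 1 for lower, upper in segments]

for (lower, upper), n_residues in zip(segments, lengths):
    print(f"residues {lower}-{upper}: {n_residues} residues")
```

Pairing by index keeps each segment's two ends together, which is convenient when slicing a sequence or selecting residues from a structure.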