TMKit: transmembrane protein analysis

TMKit is an open-source, modular and scalable Python programming interface designed specifically for processing transmembrane protein data. It enables users to wrangle databases, engineer features at the mutational, domain and topological levels, and visualise protein-protein interaction interfaces. In addition, TMKit includes seqNetRR, a high-performance computing library for the customised construction and rewiring of residue connections, which is particularly well suited to assigning coevolutionary features quickly.

🌟 Features

  • ✅ handling multiple kinds of transmembrane protein data

  • ✅ fast processing speed

  • ✅ structural visualisation

🔧 Functionalities

TMKit provides 9 function classes that address a range of transmembrane protein sequence and structure analysis problems: visualisation, sequence, quality control, topology, mapping, annotation, connectivity, edge extraction, and feature.

Sequence

A fundamental component for reading sequences in diverse formats, retrieving sequences from various sources, and generating multiple sequence alignments (MSAs).
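As a rough illustration of the first of these tasks, the sketch below parses FASTA text into a dictionary of sequences using only the Python standard library. The function name read_fasta and the toy records are hypothetical; they are not part of TMKit's API, where sequence parsing is handled by tmk.seq.

```python
from typing import Dict, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record ID: sequence}.

    Hypothetical helper for illustration, not TMKit's API. The record ID is
    the first whitespace-separated token after '>'; sequence lines are joined
    and upper-cased. Lines before the first header are ignored.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one
            if current_id is not None:
                records[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)  # store the final record
    return records


example = """>protA toy membrane protein
MKTLLVAGF
LLAIV
>protB
mstrk
"""
print(read_fasta(example))  # {'protA': 'MKTLLVAGFLLAIV', 'protB': 'MSTRK'}
```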

Topology

Handling of TM and non-TM topologies (side 1, side 2, strand, coil, inside, loop, and interfacial), either structure-derived (TOPDB) or predicted (TMHMM and Phobius).
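Predicted topologies are often handled as per-residue label strings. The minimal sketch below assumes a string in which M marks membrane-spanning residues (the actual labels differ between TMHMM, Phobius and TOPDB) and collapses it into 1-based transmembrane segments. tm_segments is a hypothetical helper, not a TMKit function.

```python
from itertools import groupby
from typing import List, Tuple


def tm_segments(topology: str, tm_label: str = "M") -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) spans of consecutive TM residues.

    Hypothetical helper: assumes one label character per residue, with
    `tm_label` marking membrane-spanning residues.
    """
    segments: List[Tuple[int, int]] = []
    position = 1
    for label, run in groupby(topology):
        run_length = sum(1 for _ in run)
        if label == tm_label:
            segments.append((position, position + run_length - 1))
        position += run_length
    return segments


topology = "iiiiMMMMMMooooMMMMMMii"  # toy inside/membrane/outside labels
print(tm_segments(topology))  # [(5, 10), (15, 20)]
```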

Annotation

Annotation of amino acid residues with their biological functions, drawing on the MutHTP, Pred-MutHTP and CATH databases.
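Mutation records are commonly written as wild-type residue, position and mutant residue (for example L4P). The sketch below assumes that notation with 1-based positions, checks each mutation against the sequence, and groups mutations by residue. It is an illustration only: annotate_mutations is hypothetical and does not reproduce how TMKit reads MutHTP or Pred-MutHTP data.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Wild-type residue, 1-based position, mutant residue, e.g. "L4P" (assumed notation)
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def annotate_mutations(sequence: str, mutations: List[str]) -> Dict[int, List[str]]:
    """Group mutations by residue position after checking the wild-type residue.

    Hypothetical helper for illustration; it does not parse MutHTP files.
    """
    annotated: Dict[int, List[str]] = defaultdict(list)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation notation: {mutation}")
        wild_type, position = match.group(1), int(match.group(2))
        if not 1 <= position <= len(sequence):
            raise ValueError(f"{mutation}: position outside the sequence")
        if sequence[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: sequence has {sequence[position - 1]} at {position}"
            )
        annotated[position].append(mutation)
    return dict(annotated)


sequence = "MKTLLVAGF"
print(annotate_mutations(sequence, ["K2E", "L4P", "L4R"]))
# {2: ['K2E'], 4: ['L4P', 'L4R']}
```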

Edge extraction

A high-performance computing library for extracting connections between residues by building graphs and assigning features quickly.
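The idea of building residue pairs and then attaching features to them can be sketched in a few lines. This is not seqNetRR's implementation: build_edges, assign_features and the minimum sequence separation parameter are assumptions for illustration, and the scores stand in for values such as coevolution scores keyed by 1-based (i, j) pairs with i < j.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def build_edges(length: int, min_separation: int = 1) -> List[Pair]:
    """All 1-based residue pairs (i, j) with i < j and j - i >= min_separation."""
    return [
        (i, j)
        for i, j in combinations(range(1, length + 1), 2)
        if j - i >= min_separation
    ]


def assign_features(
    edges: List[Pair], scores: Dict[Pair, float], default: float = 0.0
) -> Dict[Pair, float]:
    """Attach a score to each edge; pairs absent from `scores` get `default`."""
    return {edge: scores.get(edge, default) for edge in edges}


edges = build_edges(length=5, min_separation=3)
print(edges)  # [(1, 4), (1, 5), (2, 5)]
features = assign_features(edges, {(1, 4): 0.82, (2, 5): 0.31})
print(features)  # {(1, 4): 0.82, (1, 5): 0.0, (2, 5): 0.31}
```

Keying scores by residue pair in a dictionary keeps each lookup constant-time, so assigning features does not require scanning the score table once per edge.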

Visualisation

Identification of protein-protein interaction (PPI) interfaces, which is critical to understanding biological processes.
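A common way to define a PPI interface is a distance cutoff between atoms of two chains. The sketch below assumes atoms are given as (residue number, (x, y, z)) tuples and uses an illustrative 5 Å cutoff; interface_residues is hypothetical and is not TMKit's implementation.

```python
import math
from typing import Iterable, List, Set, Tuple

# (residue number, (x, y, z) coordinates in angstroms): an assumed input format
Atom = Tuple[int, Tuple[float, float, float]]


def interface_residues(
    chain_a: Iterable[Atom], chain_b: Iterable[Atom], cutoff: float = 5.0
) -> Set[int]:
    """Residues of chain A having any atom within `cutoff` angstroms of chain B."""
    partner_atoms: List[Atom] = list(chain_b)
    hits: Set[int] = set()
    for residue, xyz_a in chain_a:
        if residue in hits:
            continue  # residue already assigned to the interface
        if any(math.dist(xyz_a, xyz_b) <= cutoff for _, xyz_b in partner_atoms):
            hits.add(residue)
    return hits


chain_a = [(10, (0.0, 0.0, 0.0)), (11, (12.0, 0.0, 0.0)), (12, (0.0, 3.0, 4.0))]
chain_b = [(55, (0.0, 0.0, 4.0)), (56, (30.0, 0.0, 0.0))]
print(sorted(interface_residues(chain_a, chain_b)))  # [10, 12]
```

The all-against-all comparison is fine for small inputs; for whole complexes, a spatial index such as a k-d tree is the usual way to avoid comparing every atom pair.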

Quality control

Evaluation criteria for qualifying proteins, including the experimental method used, resolution, subclass, and sequence length.
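Criteria of this kind reduce to a simple filter over structure metadata. This is a sketch under stated assumptions: the Entry fields, the subclass labels and all thresholds are hypothetical defaults chosen for illustration, not values used by TMKit. The method strings follow the PDB's wording (e.g. X-RAY DIFFRACTION).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical per-chain metadata record."""
    chain_id: str
    method: str                  # experimental method, PDB wording
    resolution: Optional[float]  # angstroms; None if not reported
    subclass: str                # e.g. "alpha" or "beta" (assumed labels)
    length: int                  # number of residues


def passes_qc(
    entry: Entry,
    methods: Tuple[str, ...] = ("X-RAY DIFFRACTION", "ELECTRON MICROSCOPY"),
    max_resolution: float = 3.5,
    subclasses: Tuple[str, ...] = ("alpha",),
    min_length: int = 50,
) -> bool:
    """Apply the four criteria; every threshold here is an illustrative default."""
    if entry.method not in methods:
        return False
    if entry.resolution is None or entry.resolution > max_resolution:
        return False
    if entry.subclass not in subclasses:
        return False
    return entry.length >= min_length


def filter_entries(entries: Iterable[Entry], **criteria) -> List[Entry]:
    """Keep the entries that pass every criterion."""
    return [entry for entry in entries if passes_qc(entry, **criteria)]


entries = [
    Entry("protA_A", "X-RAY DIFFRACTION", 2.1, "alpha", 350),
    Entry("protB_A", "SOLUTION NMR", None, "alpha", 120),
    Entry("protC_B", "ELECTRON MICROSCOPY", 4.2, "alpha", 500),
    Entry("protD_A", "X-RAY DIFFRACTION", 1.9, "beta", 280),
]
print([e.chain_id for e in filter_entries(entries)])  # ['protA_A']
print([e.chain_id for e in filter_entries(entries, subclasses=("alpha", "beta"))])
# ['protA_A', 'protD_A']
```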

Mapping

Identifier mapping between structural and sequence data (e.g., FASTA IDs and PDB IDs) to ensure that biological findings are interpreted correctly.
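One detail that matters when mapping identifiers is case: four-character PDB codes are case-insensitive, whereas chain identifiers are case-sensitive. The sketch below builds forward and reverse lookups between PDBcode_chain identifiers and sequence IDs while normalising only the PDB code. The identifiers and the helpers normalise_chain_id and build_id_maps are placeholders, not TMKit's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalise_chain_id(pdb_chain: str) -> str:
    """Lower-case the PDB code but keep the case-sensitive chain ID."""
    pdb_code, chain = pdb_chain.split("_", 1)
    return f"{pdb_code.lower()}_{chain}"


def build_id_maps(
    pairs: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (chain -> sequence ID, sequence ID -> [chains]) lookups."""
    chain_to_seq: Dict[str, str] = {}
    seq_to_chains: Dict[str, List[str]] = defaultdict(list)
    for pdb_chain, seq_id in pairs:
        key = normalise_chain_id(pdb_chain)
        chain_to_seq[key] = seq_id
        seq_to_chains[seq_id].append(key)
    return chain_to_seq, dict(seq_to_chains)


# Placeholder identifiers; one sequence can map to several chains
pairs = [("1ABC_A", "seq1"), ("1abc_B", "seq1"), ("2XYZ_a", "seq2")]
chain_to_seq, seq_to_chains = build_id_maps(pairs)
print(chain_to_seq["1abc_A"])  # seq1
print(seq_to_chains["seq1"])   # ['1abc_A', '1abc_B']
```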

Connectivity

Analysis of a protein's connections to other proteins in a PPI network, which is crucial to understanding its biological role.
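At its simplest, a protein's connectivity can be read off an undirected adjacency structure built from pairwise interactions. Below is a minimal sketch with made-up protein names; build_network is hypothetical, and TMKit's tmk.ppi module may represent networks differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(interactions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; duplicates collapse, self-interactions are skipped."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


ppi = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P1", "P2")]
network = build_network(ppi)
degree = {protein: len(partners) for protein, partners in network.items()}
print(degree)  # {'P1': 2, 'P2': 2, 'P3': 3, 'P4': 1}
print(sorted(network["P3"]))  # ['P1', 'P2', 'P4']
```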

Feature

TMKit provides a set of transmembrane protein-specific and general-purpose features to support machine learning modelling.
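As an example of the general-purpose kind of feature, the sketch below computes amino acid composition and a sliding-window hydropathy profile on the Kyte-Doolittle scale, a classic signal for membrane-spanning segments. The function names and the default 19-residue window are assumptions for illustration; this is not TMKit's feature set.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence: str) -> Dict[str, float]:
    """Fraction of each standard amino acid in the sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence)
    return {aa: counts[aa] / len(sequence) for aa in AMINO_ACIDS}


def hydropathy_profile(sequence: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window; assumes only standard residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


sequence = "MKLLIVAVLAG"
print(round(composition(sequence)["L"], 3))  # 0.273
print(len(hydropathy_profile(sequence, window=5)))  # 7
```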

🎯 Easy to use

After installation, you can import TMKit by putting the following code in a Python script or a Jupyter notebook.

import tmkit as tmk

📌 Modules

You can access the 14 modules covering 9 function classes.

See also: install

No.  Module name  Function class   Description
1    tmk.fetch    Quality control  fetch example data
2    tmk.qc       Quality control  generate and extract metrics of sequences and structures
3    tmk.seq      Sequence         parse sequences and structures
4    tmk.msa      Sequence         produce commands for generating multiple sequence alignments
5    tmk.feature  Feature          protein biological features
6    tmk.collate  Mapping          seek differences between RCSB and PDBTM structures
7    tmk.topo     Topology         transmembrane protein topologies
8    tmk.rrc      Feature          performance evaluation of residue contact prediction
9    tmk.ppi      Connectivity     protein connectivity
10   tmk.mut      Annotation       processing of transmembrane protein mutation data
11   tmk.vs       Visualisation    visualise protein structures
12   tmk.cath     Annotation       access protein domains and families
13   tmk.mapping  Mapping          conversion between protein identifiers
14   tmk.edge     Edge extraction  rewiring of connections between residues
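Residue contact predictors are commonly assessed by the precision of their top-L/k highest-scoring pairs, where L is the sequence length. Below is a minimal sketch of that metric. It assumes predicted and true contacts are keyed by 1-based (i, j) pairs with i < j and a minimum sequence separation of 6 (use 24 to score long-range contacts only). top_l_precision is hypothetical and is not necessarily how tmk.rrc computes its metrics.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # 1-based (i, j) with i < j


def top_l_precision(
    predicted: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    k: int = 5,
    min_separation: int = 6,
) -> float:
    """Precision of the top L/k highest-scoring pairs with j - i >= min_separation."""
    candidates = [
        (pair, score)
        for pair, score in predicted.items()
        if pair[1] - pair[0] >= min_separation
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)  # best scores first
    n_top = max(1, length // k)
    top = [pair for pair, _ in candidates[:n_top]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)  # divides by pairs evaluated, which may be < L/k


predicted = {(1, 10): 0.9, (2, 15): 0.8, (3, 5): 0.95,
             (4, 12): 0.7, (6, 18): 0.6, (7, 20): 0.5}
true_contacts = {(1, 10), (4, 12), (7, 20)}
print(top_l_precision(predicted, true_contacts, length=20))  # 0.5
```

Here (3, 5) is skipped despite its high score because the two residues are too close in sequence, so the top L/5 = 4 pairs contain two true contacts.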

👨‍💻 Developer

Jianfeng Sun, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), Headington, Oxford OX3 7LD, University of Oxford.

📘 Citation

Jianfeng Sun, Arulsamy Kulandaisamy, Jinlong Ru, M Michael Gromiha, Adam P Cribbs, TMKit: a Python interface for computational analysis of transmembrane proteins, Briefings in Bioinformatics, Volume 24, Issue 5, September 2023, bbad288, https://doi.org/10.1093/bib/bbad288