Integrated metric
TMKit can generate quality control (QC) metrics in bulk. The tmk.qc.integrate
function integrates the individual metrics generated throughout this tutorial.
Reminder of data
Please make sure that the built-in example dataset has been downloaded before you walk through the tutorial.
Example usage
First, we use six transmembrane proteins, that is, protein 1xqf chain A, protein 1eq8 chain A, protein 6e3y chain E, protein 3pux chain G, protein 3udc chain A, and protein 3rko chain A, and put them in a Pandas dataframe as follows.
import pandas as pd
# Each entry is [PDB ID, chain ID].
prots = [['1xqf', 'A'], ['1eq8', 'A'], ['6e3y', 'E'], ['3pux', 'G'], ['3udc', 'A'], ['3rko', 'A']]
# The column names are set here, so no separate rename step is needed.
df_prot = pd.DataFrame(prots, columns=['prot', 'chain'])
print(df_prot)
Please refer to this tutorial for generating their PDB structures and XML files.
Then, we can generate QC metrics for all of them at once using the following code.
import tmkit as tmk
df = tmk.qc.integrate(
    df_prot=df_prot,
    pdb_cplx_fp='data/pdb/pdbtm/',
    fasta_fp='data/fasta/',
    xml_fp='data/xml/',
    sv_fp='data/qc/',
    metrics=['rez', 'met', 'bio_name', 'head', 'desc', 'mthm', 'seq'],
)
print(df)
Attributes
Attribute | Description
---|---
df_prot | Pandas dataframe storing protein names and chain names
pdb_cplx_fp | path where a protein complex file from PDBTM is placed
fasta_fp | path where a protein FASTA file is placed
xml_fp | path where a protein XML file from PDBTM is placed
sv_fp | path to save files
metrics | list of QC metrics to compute; the example above uses rez, met, bio_name, head, desc, mthm, and seq
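Because the function reads one file per protein from each of the input paths, it can help to check that the expected files exist before running it. The following is a minimal sketch, not part of TMKit: the helper find_missing_inputs is hypothetical, and the file-name patterns (for example 1xqfA.pdb) are assumptions that should be adjusted to the actual file-naming system.

```python
from pathlib import Path

import pandas as pd


def find_missing_inputs(df_prot, folders, patterns):
    """Return the expected input files that are not present on disk.

    Hypothetical helper (not a TMKit function).
    folders:  maps a file kind to the directory that should hold it.
    patterns: maps the same file kind to a file-name template that may
              use the placeholders {prot} and {chain}.
    """
    missing = []
    for row in df_prot.itertuples(index=False):
        for kind, folder in folders.items():
            fname = patterns[kind].format(prot=row.prot, chain=row.chain)
            path = Path(folder) / fname
            if not path.is_file():
                missing.append(str(path))
    return missing


df_prot = pd.DataFrame([['1xqf', 'A'], ['1eq8', 'A']], columns=['prot', 'chain'])

# Directories taken from the example call above.
folders = {
    'pdb': 'data/pdb/pdbtm/',
    'fasta': 'data/fasta/',
    'xml': 'data/xml/',
}
# ASSUMED naming patterns; replace them with the real naming scheme.
patterns = {
    'pdb': '{prot}{chain}.pdb',
    'fasta': '{prot}{chain}.fasta',
    'xml': '{prot}.xml',
}

missing = find_missing_inputs(df_prot, folders, patterns)
print(f"{len(missing)} expected input file(s) not found")
for path in missing:
    print("  missing:", path)
```

Keeping the patterns in a dictionary means only one place needs to change if the naming scheme differs from the one assumed here.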
See also
Please see here for a better understanding of the file-naming system.
Output
Finally, the output below reports the 7 metrics requested above for each protein, which can be used for further QC analysis.
======>metric: rez
=========>protein 1xqf chain A with rez 1.8
=========>protein 1eq8 chain A with rez nan
=========>protein 6e3y chain E with rez 3.3
=========>protein 3pux chain G with rez 2.3
=========>protein 3udc chain A with rez 3.35
=========>protein 3rko chain A with rez 3.0
======>metric: met
=========>protein 1xqf chain A with met x-ray diffraction
=========>protein 1eq8 chain A with met unknown
=========>protein 6e3y chain E with met x-ray diffraction
=========>protein 3pux chain G with met x-ray diffraction
=========>protein 3udc chain A with met x-ray diffraction
=========>protein 3rko chain A with met x-ray diffraction
======>metric: bio_name
=========>protein 1xqf chain A with bio_name the mechanism of ammonia transport based on the crystal structure of amtb of e. coli.
=========>protein 1eq8 chain A with bio_name three-dimensional structure of the pentameric helical bundle of the acetylcholine receptor m2 transmembrane segment
=========>protein 6e3y chain E with bio_name cryo-em structure of the active, gs-protein complexed, human cgrp receptor
=========>protein 3pux chain G with bio_name crystal structure of an outward-facing mbp-maltose transporter complex bound to adp-bef3
=========>protein 3udc chain A with bio_name crystal structure of a membrane protein
=========>protein 3rko chain A with bio_name crystal structure of the membrane domain of respiratory complex i from e. coli at 3.0 angstrom resolution
======>metric: head
=========>protein 1xqf chain A with head transport protein
=========>protein 1eq8 chain A with head signaling protein
=========>protein 6e3y chain E with head signaling protein
=========>protein 3pux chain G with head hydrolase/transport protein
=========>protein 3udc chain A with head membrane protein
=========>protein 3rko chain A with head oxidoreductase
======>metric: desc
=========>protein 1xqf chain A with desc TRANSPORT PROTEIN
=========>protein 1eq8 chain A with desc SIGNALING PROTEIN
=========>protein 6e3y chain E with desc SIGNALING PROTEIN
=========>protein 3pux chain G with desc HYDROLASE/TRANSPORT PROTEIN
=========>protein 3udc chain A with desc MEMBRANE PROTEIN
=========>protein 3rko chain A with desc OXIDOREDUCTASE
======>metric: mthm
=========>protein 1xqf chain A with mthm 11.0
=========>protein 1eq8 chain A with mthm 1.0
=========>protein 6e3y chain E with mthm 1.0
=========>protein 3pux chain G with mthm 6.0
=========>protein 3udc chain A with mthm 3.0
=========>protein 3rko chain A with mthm 3.0
======>0 extraction items failed using mthm.
======>metric: seq
prot chain ... seq_aa seq_len
0 1xqf A ... AVADKADNAFMMICTALVLFMTIPGIALFYGGLIRGKNVLSMLTQV... 362.0
1 1eq8 A ... EKMSTAISVLLAQAVFLLLTSQR 23.0
2 6e3y E ... EANYGALLRELCLTQFQVDMEAVGETLWCDWGRTIRSYRELADCTW... 115.0
3 3pux G ... AMVQPQKARLFITHLLLLLFIAAIMFPLLMVVAISLRQGNFATGSL... 293.0
4 3udc A ... YDIKAVKFLLDVLKILIIAFIGIKFADFLIYRFYKLYSKSKIQLPQ... 267.0
5 3rko A ... AFAIFLIVAIGLCCLMLVGGWFLGGRARARLRLSAKFYLVAMFFVI... 95.0
[6 rows x 11 columns]
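As one possible follow-up QC step, the integrated dataframe can be filtered with ordinary pandas operations. The sketch below uses a small stand-in dataframe whose values are copied from the output above. The column names rez and mthm are assumed to match the metric names (the printed dataframe elides the middle columns), and both thresholds are purely illustrative.

```python
import pandas as pd

# Stand-in for the dataframe returned by tmk.qc.integrate.
# Values are copied from the printed output; the column names
# 'rez' and 'mthm' are ASSUMED to match the metric names.
df = pd.DataFrame({
    'prot': ['1xqf', '1eq8', '6e3y', '3pux', '3udc', '3rko'],
    'chain': ['A', 'A', 'E', 'G', 'A', 'A'],
    'rez': [1.8, float('nan'), 3.3, 2.3, 3.35, 3.0],
    'mthm': [11.0, 1.0, 1.0, 6.0, 3.0, 3.0],
    'seq_len': [362.0, 23.0, 115.0, 293.0, 267.0, 95.0],
})

# Illustrative thresholds (not recommendations from TMKit).
max_rez = 3.0
min_mthm = 2

# A comparison against NaN evaluates to False, so proteins without a
# reported resolution (here 1eq8) are dropped by the first condition.
keep = df[(df['rez'] <= max_rez) & (df['mthm'] >= min_mthm)]
print(keep[['prot', 'chain', 'rez', 'mthm']])
```

With these thresholds, 1xqf, 3pux, and 3rko remain. If proteins without a resolution value should be kept instead, the first condition can be combined with df['rez'].isna().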