Pred-MutHTP#
The Pred-MutHTP database[1] contains predicted disease-causing and neutral mutations in human transmembrane proteins. Where experimentally verified records are absent, these predictions indicate the likely status of mutations at amino acid sites of human transmembrane proteins.
TMKit offers an interface (tmkit.mut) to access the database.
Reminder of data
Please make sure that the built-in example dataset has been downloaded before you walk through the tutorial.
Download and read#
First, let's download the Pred-MutHTP database. The example dataset contains a folder called mutation, located at ./data/mutation/, where we suggest keeping the data used and generated in this tutorial.
The database is called pred_varhtp_mut.zip. You should decompress it after downloading.
import tmkit as tmk
# Download pred_varhtp_mut.zip into the mutation folder
tmk.mut.download_predmuthtp_db(
    sv_fp='./data/mutation/'
)
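Any archive tool can decompress the file. As a minimal sketch, the standard-library zipfile module can do it from Python. The helper name extract_archive is hypothetical, and the archive location in the commented call is an assumption inferred from sv_fp and the file name above, not something the TMKit documentation states.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_fpn, out_dir):
    """Extract all members of a zip archive into out_dir; return the member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_fpn) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


# Example call (assumed location of the downloaded archive):
# extract_archive('./data/mutation/pred_varhtp_mut.zip', './data/mutation/')
```

The function returns the names of the extracted members, which gives a quick check that pred_varhtp_mut.csv is among them.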
After decompressing it, you will have pred_varhtp_mut.csv. We can now access this database using the following code.
import tmkit as tmk
# Load the decompressed CSV into a Pandas dataframe
df = tmk.mut.read_predmuthtp_db(
    pred_muthtp_fpn='./data/mutation/pred_varhtp_mut.csv'
)
print(df)
The output, shown in the Output section below, lists the 5 features in Pred-MutHTP:
- uniprot_id
- protein_mutation_site
- topology
- mutation_type
- mut_prob
You can extract each feature in Python, e.g., df['topology'].
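Because the database is returned as a Pandas dataframe, ordinary boolean indexing can combine several features. The sketch below uses a toy dataframe built from the first rows of the printed output rather than the real database. The topology column is left out because the printout elides its values, and the helper name select_mutations is hypothetical.

```python
import pandas as pd

# Toy rows copied from the head of the printed Pred-MutHTP table
# (topology omitted: its values are elided in the printout)
df = pd.DataFrame({
    "uniprot_id": ["A0PJX4"] * 5,
    "protein_mutation_site": ["M1A", "M1C", "M1D", "M1E", "M1F"],
    "mutation_type": ["Neutral", "Neutral", "Disease-causing",
                      "Disease-causing", "Disease-causing"],
    "mut_prob": [0.731, 0.731, 0.745, 0.745, 0.745],
})


def select_mutations(df, mutation_type, min_prob=0.0):
    """Keep rows of one mutation type whose mut_prob is at least min_prob."""
    mask = (df["mutation_type"] == mutation_type) & (df["mut_prob"] >= min_prob)
    return df.loc[mask]


disease = select_mutations(df, "Disease-causing", min_prob=0.74)
print(disease["protein_mutation_site"].tolist())
```

The same function should work unchanged on the full dataframe returned by read_predmuthtp_db, assuming it carries the column names listed above.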
Attributes#
| Attribute | Description |
|---|---|
| pred_muthtp_fpn | path where the Pred-MutHTP database is placed |
| sv_fp | path to where you want to save the Pred-MutHTP database to be downloaded |
See also
Please see here for a better understanding of the file-naming system.
Output#
======>reading Pred-MutHTP...
======>Pred-MutHTP features are:
=========>No.1: uniprot_id
=========>No.2: protein_mutation_site
=========>No.3: topology
=========>No.4: mutation_type
=========>No.5: mut_prob
uniprot_id protein_mutation_site ... mutation_type mut_prob
0 A0PJX4 M1A ... Neutral 0.731
1 A0PJX4 M1C ... Neutral 0.731
2 A0PJX4 M1D ... Disease-causing 0.745
3 A0PJX4 M1E ... Disease-causing 0.745
4 A0PJX4 M1F ... Disease-causing 0.745
... ... ... ... ... ...
54962537 Q9Y277 A283S ... Neutral 0.820
54962538 Q9Y277 A283T ... Neutral 0.525
54962539 Q9Y277 A283V ... Disease-causing 0.542
54962540 Q9Y277 A283W ... Disease-causing 0.690
54962541 Q9Y277 A283Y ... Disease-causing 0.658
[54962542 rows x 5 columns]
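The protein_mutation_site values in this output, such as M1A or A283S, appear to follow the common notation of wild-type residue, sequence position, and mutant residue. Assuming that notation holds for every row, a small parser can split a value into its three parts. The function name parse_mutation_site is hypothetical and not part of TMKit.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter mutant residue
_SITE_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation_site(site):
    """Split a string like 'A283S' into ('A', 283, 'S')."""
    match = _SITE_RE.match(site)
    if match is None:
        raise ValueError(f"unrecognised mutation site: {site!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


print(parse_mutation_site("A283S"))
```

Raising on unrecognised strings, rather than returning None, makes it obvious if some rows use a different format.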
Split into individual files#
This step splits the Pred-MutHTP database into individual files according to the UniProt accession codes of human transmembrane proteins.
import tmkit as tmk
# Load the full database first
df = tmk.mut.read_predmuthtp_db(
    pred_muthtp_fpn='./data/mutation/pred_varhtp_mut.csv'
)
# Split the dataframe into individual files saved under sv_fp
tmk.mut.split_predmuthtp(
    pred_muthtp_df=df,
    sv_fp='data/mutation/'
)
Attributes#
| Attribute | Description |
|---|---|
| pred_muthtp_df | Pandas dataframe of the Pred-MutHTP database |
| pred_muthtp_fpn | path where the Pred-MutHTP database is placed |
| sv_fp | path to where you want to save the Pred-MutHTP database to be downloaded |
See also
Please see here for a better understanding of the file-naming system.
Output#
In the console, it prints the output as shown below.
======>5185 uniprot proteins
=========>Splitting No.0 protein from Pred-MutHTP
=========>Splitting No.1 protein from Pred-MutHTP
=========>Splitting No.2 protein from Pred-MutHTP
=========>Splitting No.3 protein from Pred-MutHTP
......
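Conceptually, splitting by accession code resembles a Pandas groupby on the uniprot_id column. The sketch below illustrates that idea on toy rows taken from the database printout. It is not TMKit's implementation: it keeps the groups in memory instead of writing files, and the function name split_by_accession is hypothetical.

```python
import pandas as pd


def split_by_accession(df, id_col="uniprot_id"):
    """Return a dict mapping each accession code to its own sub-dataframe."""
    return {
        acc: sub.reset_index(drop=True)
        for acc, sub in df.groupby(id_col, sort=False)
    }


# Toy rows copied from the head and tail of the printed Pred-MutHTP table
toy = pd.DataFrame({
    "uniprot_id": ["A0PJX4", "A0PJX4", "Q9Y277", "Q9Y277", "Q9Y277"],
    "protein_mutation_site": ["M1A", "M1C", "A283S", "A283T", "A283V"],
    "mut_prob": [0.731, 0.731, 0.820, 0.525, 0.542],
})

groups = split_by_accession(toy)
print({acc: len(sub) for acc, sub in groups.items()})
```

Passing sort=False keeps the accession codes in the order they first appear in the database.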
Access an individual file#
Now we can look at the contents of an individual file. Let's check the protein with UniProt accession code A0PK00 using the following code.
import tmkit as tmk
# Read the individual file produced by the split step
df = tmk.mut.read_split_predmuthtp(
    pred_split_muthtp_fpn='data/mutation/A0PK00.predmuthtp'
)
Attributes#
| Attribute | Description |
|---|---|
| pred_split_muthtp_fpn | path where an individual file from the Pred-MutHTP database is placed |
See also
Please see here for a better understanding of the file-naming system.
Output#
In the console, it prints the output as shown below.
======>reading split Pred-MutHTP...
uniprot_id protein_mutation_site mut_prob
0 W5XKT8 Y34A 0.698
1 W5XKT8 Y34C 0.567
2 W5XKT8 Y34D 0.551
3 W5XKT8 Y34E 0.642
4 W5XKT8 Y34F 0.785
... ... ... ...
6151 W5XKT8 N324S 0.598
6152 W5XKT8 N324T 0.711
6153 W5XKT8 N324V 0.859
6154 W5XKT8 N324W 0.874
6155 W5XKT8 N324Y 0.809
[6156 rows x 3 columns]
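Once an individual file is loaded, the usual Pandas tools apply. As a minimal sketch, the substitutions with the highest mut_prob can be ranked with nlargest. The toy dataframe below copies the last rows of the output above, and the helper name top_substitutions is hypothetical.

```python
import pandas as pd

# Toy rows copied from the tail of the printed table for one protein
df = pd.DataFrame({
    "uniprot_id": ["W5XKT8"] * 5,
    "protein_mutation_site": ["N324S", "N324T", "N324V", "N324W", "N324Y"],
    "mut_prob": [0.598, 0.711, 0.859, 0.874, 0.809],
})


def top_substitutions(df, n=3):
    """Return the n rows with the highest mut_prob, highest first."""
    return df.nlargest(n, "mut_prob").reset_index(drop=True)


top = top_substitutions(df, n=2)
print(top["protein_mutation_site"].tolist())
```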