IntAct
IntAct [1] is another of the most widely used databases cataloguing protein-protein interactions. TMKit offers an interface, tmkit.ppi, to access the database. In this tutorial, we show how to use this database in Python, starting with downloading it.
Example usage
First, let’s download the IntAct database. The example dataset contains a folder called ppi, located at ./data/ppi/, which is where we suggest keeping the data that are used and generated. We can choose either a specific version or the most recent version of the database. Setting version to current downloads the most recent version. The parameter sv_fp specifies where to save it, here ./data/ppi/.
After downloading, you should have a file called intact.zip. The tmk.ppi.download_intact_db function will automatically decompress it as intact.txt.
import tmkit as tmk

tmk.ppi.download_intact_db(
    version='current',
    sv_fp='./data/ppi/',
)
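Before downloading, it can help to make sure the target folder exists, and afterwards to check that both files arrived. The following is a minimal sketch using only the standard library. The helper name prepare_ppi_dir is hypothetical and not part of TMKit, and whether download_intact_db creates a missing folder by itself is not stated in this tutorial, so creating it up front is a precaution rather than a requirement.

```python
from pathlib import Path


def prepare_ppi_dir(sv_fp="./data/ppi/"):
    """Create the save folder if it is missing and return the expected file paths.

    Hypothetical helper (not part of TMKit). The file names intact.zip and
    intact.txt are the ones this tutorial says the download produces.
    """
    folder = Path(sv_fp)
    # parents=True builds intermediate folders; exist_ok=True makes this a no-op
    # when the folder is already there.
    folder.mkdir(parents=True, exist_ok=True)
    return {
        "zip": folder / "intact.zip",
        "txt": folder / "intact.txt",
    }
```

Calling paths = prepare_ppi_dir('./data/ppi/') before the download and then checking paths['txt'].exists() afterwards gives a quick confirmation that the decompressed file is in place.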
Then, using the following code, you can access the database. The file data/ppi/intact.txt is the IntAct database. The tmk.ppi.read_intact_db function will extract a subset of it containing only protein interactors A and B (#ID(s) interactor A and ID(s) interactor B).
Importantly, this function will save the subset as interA_B.intact, at ./data/ppi/interA_B.intact.
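import tmkit as tmk

df = tmk.ppi.read_intact_db(
    # decompressed IntAct file from the download step
    intact_fpn='./data/ppi/intact.txt',
    # features (columns) to keep
    extract_ids=[
        '#ID(s) interactor A',
        'ID(s) interactor B',
    ],
    # where the extracted subset is saved
    sv_fpn='./data/ppi/interA_B.intact',
)
print(df)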
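To make the extraction step more concrete, the sketch below picks named columns out of a tab-delimited table with the standard csv module. It is an illustration of the idea only, not TMKit's implementation. It assumes that intact.txt is a tab-delimited file whose first line holds the feature names, and the helper name extract_columns and the two sample rows (built from identifiers shown in the output of this tutorial, with a placeholder third column) are made up for the example.

```python
import csv
import io

# Tiny illustrative sample: a header line plus two rows, tab-delimited.
# Only 3 of the 42 features are included; the third column holds placeholders.
SAMPLE = (
    "#ID(s) interactor A\tID(s) interactor B\tTaxid interactor A\n"
    "P49418\tO43426\tplaceholder\n"
    "intact:EB7121639\tP49418\tplaceholder\n"
)


def extract_columns(handle, extract_ids):
    """Return one list per row, holding only the requested features.

    Hypothetical helper (not part of TMKit).
    """
    # QUOTE_NONE keeps quote characters inside fields as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [[row[name] for name in extract_ids] for row in reader]


rows = extract_columns(
    io.StringIO(SAMPLE),
    ["#ID(s) interactor A", "ID(s) interactor B"],
)
print(rows)
# [['P49418', 'O43426'], ['intact:EB7121639', 'P49418']]
```

For the full database, read_intact_db remains the intended route, since it also saves the subset for later use.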
Attributes
Attribute | Description |
---|---|
version | version of the IntAct database, for example, current |
intact_fpn | path where the IntAct database is placed |
sv_fp / sv_fpn | path where you want to save files |
extract_ids | a list that can include more than one feature, such as #ID(s) interactor A and ID(s) interactor B |
See also
Please see here for a better understanding of the file-naming system.
Output
Finally, you will see the following output, which lists the 42 features in IntAct, for example Taxid interactor A. Each extracted feature can then be accessed in Python as a column of the returned table, e.g., df['ID(s) interactor B'].
======>reading IntAct...
======>IntAct features are:
=========>No.1: #ID(s) interactor A
=========>No.2: ID(s) interactor B
=========>No.3: Alt. ID(s) interactor A
=========>No.4: Alt. ID(s) interactor B
=========>No.5: Alias(es) interactor A
=========>No.6: Alias(es) interactor B
=========>No.7: Interaction detection method(s)
=========>No.8: Publication 1st author(s)
=========>No.9: Publication Identifier(s)
=========>No.10: Taxid interactor A
=========>No.11: Taxid interactor B
=========>No.12: Interaction type(s)
=========>No.13: Source database(s)
=========>No.14: Interaction identifier(s)
=========>No.15: Confidence value(s)
=========>No.16: Expansion method(s)
=========>No.17: Biological role(s) interactor A
=========>No.18: Biological role(s) interactor B
=========>No.19: Experimental role(s) interactor A
=========>No.20: Experimental role(s) interactor B
=========>No.21: Type(s) interactor A
=========>No.22: Type(s) interactor B
=========>No.23: Xref(s) interactor A
=========>No.24: Xref(s) interactor B
=========>No.25: Interaction Xref(s)
=========>No.26: Annotation(s) interactor A
=========>No.27: Annotation(s) interactor B
=========>No.28: Interaction annotation(s)
=========>No.29: Host organism(s)
=========>No.30: Interaction parameter(s)
=========>No.31: Creation date
=========>No.32: Update date
=========>No.33: Checksum(s) interactor A
=========>No.34: Checksum(s) interactor B
=========>No.35: Interaction Checksum(s)
=========>No.36: Negative
=========>No.37: Feature(s) interactor A
=========>No.38: Feature(s) interactor B
=========>No.39: Stoichiometry(s) interactor A
=========>No.40: Stoichiometry(s) interactor B
=========>No.41: Identification method participant A
=========>No.42: Identification method participant B
======>The file is saved.
         #ID(s) interactor A  ID(s) interactor B
0                     P49418              O43426
1           intact:EB7121639              P49418
2           intact:EB7121654              P49418
3           intact:EB7121715              P49418
4                     P49418    intact:EB7121765
...                      ...                 ...
1262938               Q80TR1              Q9WTS4
1262939               Q92556              P07355
1262940               Q92556              Q14185
1262941               Q92556              P07355
1262942               Q92556              P07355
[1262943 rows x 2 columns]
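The output above contains repeated pairs (for example Q92556 and P07355) as well as identifiers that start with intact:. Depending on the analysis, you may want to collapse repeats and keep only pairs without the intact: prefix. The following is a minimal sketch of one way to do this in plain Python. The helper names are hypothetical and not part of TMKit, and treating A-B and B-A as the same interaction is an assumption that may not suit every use case.

```python
def is_intact_internal(identifier):
    """True for identifiers carrying the 'intact:' prefix seen in the output."""
    return identifier.startswith("intact:")


def unique_pairs(pairs, drop_internal=True):
    """Collapse repeated interactor pairs, ignoring the A/B order.

    Hypothetical helper (not part of TMKit). Each pair is stored as a sorted
    tuple so that (A, B) and (B, A) count as one interaction (an assumption).
    """
    seen = set()
    result = []
    for a, b in pairs:
        if drop_internal and (is_intact_internal(a) or is_intact_internal(b)):
            continue
        key = tuple(sorted((a, b)))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# A few rows copied from the output above.
pairs = [
    ("P49418", "O43426"),
    ("intact:EB7121639", "P49418"),
    ("Q92556", "P07355"),
    ("Q92556", "Q14185"),
    ("Q92556", "P07355"),
]
print(unique_pairs(pairs))
# [('O43426', 'P49418'), ('P07355', 'Q92556'), ('Q14185', 'Q92556')]
```

With the table returned by read_intact_db, the pairs can be supplied as zip(df['#ID(s) interactor A'], df['ID(s) interactor B']).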