IO
1. Get file names
We can get the file names in a specific directory using pp.io.find_from_folder.
Output
29/07/2024 02:31:43 logger: ======>0. Find file (like "Q86V85"): 1a11
29/07/2024 02:31:43 logger: ======>1. Find file (like "Q86V85"): 1a91
29/07/2024 02:31:43 logger: ======>2. Find file (like "Q86V85"): 1afo
...
29/07/2024 02:31:43 logger: ======>8954. Find file (like "Q86V85"): 8wbx
29/07/2024 02:31:43 logger: ======>8955. Find file (like "Q86V85"): 8wd6
0
0 1a11
1 1a91
2 1afo
3 1aig
4 1aij
... ...
8951 8u5b
8952 8wam
8953 8wba
8954 8wbx
8955 8wd6
[8956 rows x 1 columns]
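For readers without PyPropel at hand, the idea can be approximated with the standard library alone. The sketch below is an assumption about the behaviour suggested by the output above (one sorted entry per file prefix). `list_file_prefixes` is a hypothetical helper, not part of PyPropel, and the `.pdb` suffix and the file names are only examples.

```python
import tempfile
from pathlib import Path


def list_file_prefixes(folder, suffix=".pdb"):
    """Return the sorted prefixes of all files in `folder` ending with `suffix`.

    Hypothetical helper for illustration; not PyPropel's implementation.
    """
    return sorted(
        p.name[: -len(suffix)]
        for p in Path(folder).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )


# Self-contained demonstration on a temporary folder with toy file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["1afo.pdb", "1a11.pdb", "1a91.pdb", "notes.txt"]:
        (Path(tmp) / name).touch()
    demo_prefixes = list_file_prefixes(tmp)

print(demo_prefixes)
```

Wrapping the returned list in a pandas DataFrame would give a one-column table of the same shape as the one printed above.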
Info
flag: which method is used to parse file names. Default value: 1
- 1 - a general function for finding the prefixes of files
- 2 - separate protein names and chains from file prefixes, like 1atz_A (PDBTM format)
- 3 - separate protein names and chains from file prefixes, like 1atzA
- 4 - separate protein names and multiple chains from file prefixes, like 1atz_ABCD
- 5 - separate protein names and multiple chains from file prefixes with a regular expression
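To make the flag conventions concrete, the sketch below splits example prefixes the way flags 1 to 4 are described. It is an illustration only: `split_prefix` is a hypothetical helper, and the exact rules PyPropel applies (in particular the regular expression behind flag 5) are not given in this section, so flag 5 is left out.

```python
def split_prefix(prefix, flag=1):
    """Split a file prefix into (protein name, chain information).

    Hypothetical helper mirroring the flag descriptions;
    not PyPropel's implementation.
    """
    if flag == 1:
        # General case: the prefix is returned unchanged, with no chain.
        return prefix, None
    if flag == 2:
        # '1atz_A' -> ('1atz', 'A')
        name, chain = prefix.split("_", 1)
        return name, chain
    if flag == 3:
        # '1atzA' -> ('1atz', 'A'); assumes one trailing chain character.
        return prefix[:-1], prefix[-1]
    if flag == 4:
        # '1atz_ABCD' -> ('1atz', ['A', 'B', 'C', 'D'])
        name, chains = prefix.split("_", 1)
        return name, list(chains)
    raise ValueError(f"unsupported flag in this sketch: {flag}")


print(split_prefix("1atz_A", flag=2))
print(split_prefix("1atzA", flag=3))
print(split_prefix("1atz_ABCD", flag=4))
```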
2. Different and repeated files between 1D lists
We can identify the different and repeated files between two lists, here data/pdbtm_alpha_10.02.2023.txt and df[0].
Output
# print(pds_diff)
Series([], dtype: object)
# print(psd_rept)
0 3zmj
1 5mur
2 7f92
3 7zdv
4 2fyn
...
8814 8a1v
8815 7wu9
8816 5iou
8817 3rce
8818 7f1s
Length: 8819, dtype: object
In addition, we can find newly added proteins between two versions of PDBTM protein lists.
Output
0 6s7o_A
1 8ha2_A
2 8j5p_P
3 1izl_A
4 8p3w_H
...
2951 8t6u_A
2952 8hju_B
2953 6zjy_H
2954 4qkm_B
2955 6hjr_D
Length: 2956, dtype: object
0 5va3_A
1 8f4f_T
2 6yto_F
3 4knf_D
4 2xq3_A
...
33371 6nt7_B
33372 6jlo_Z
33373 8h9j_8
33374 6idf_E
33375 5jtg_E
Length: 33376, dtype: object
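The comparison itself can be illustrated with plain pandas. In this sketch, which is an assumption about the semantics rather than PyPropel's code, "different" means entries of the first Series that are absent from the second, and "repeated" means entries of the first Series that also occur in the second. `diff_and_repeat` is a hypothetical name and the IDs are toy values.

```python
import pandas as pd


def diff_and_repeat(first, second):
    """Split `first` into entries absent from `second` and entries present in it."""
    in_second = first.isin(second)
    differ = first[~in_second].reset_index(drop=True)
    repeat = first[in_second].reset_index(drop=True)
    return differ, repeat


# Toy IDs standing in for two versions of a protein list.
old_version = pd.Series(["1a11_A", "1a91_A", "1afo_B"])
new_version = pd.Series(["1a11_A", "1a91_A", "1afo_B", "8wd6_A"])

# Entries of the newer list that the older list lacks, i.e. newly added proteins.
added, kept = diff_and_repeat(new_version, old_version)
print(added)
print(kept)
```

Swapping the argument order answers the opposite question: which entries of the older list no longer appear in the newer one.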
3. Different and repeated files between 2D lists
We can also get the different and repeated files between 2D lists. Duplicates are dropped by considering the two columns of the lists together.
Output
# df_differ
Empty DataFrame
Columns: []
Index: []
# df_repeat
0 1
0 8t2v A
1 6b8h 3
2 8p3s H
3 7thu K
4 8bx5 C
... ... ..
2951 7a24 V
2952 6vam B
2953 8go3 C
2954 8c29 n
2955 6psn F
[2956 rows x 2 columns]
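For two-column lists (protein name and chain), one way to express this comparison in pandas is a left merge with an indicator column after dropping duplicate rows. This is a sketch under the assumption that a row counts as repeated only when both columns match. `diff_and_repeat_2d` is a hypothetical helper, not PyPropel's function, and the rows are toy data.

```python
import pandas as pd


def diff_and_repeat_2d(first, second):
    """Compare two DataFrames row-wise, using all columns of `first` as the key."""
    first = first.drop_duplicates()
    second = second.drop_duplicates()
    keys = list(first.columns)
    # The indicator column records whether each row of `first` also exists in `second`.
    merged = first.merge(second, on=keys, how="left", indicator=True)
    differ = merged.loc[merged["_merge"] == "left_only", keys]
    repeat = merged.loc[merged["_merge"] == "both", keys]
    return differ.reset_index(drop=True), repeat.reset_index(drop=True)


# Toy data: column 0 holds the protein name, column 1 the chain.
df_a = pd.DataFrame([["8t2v", "A"], ["6b8h", "3"], ["6b8h", "3"], ["8p3s", "H"]])
df_b = pd.DataFrame([["8t2v", "A"], ["6b8h", "3"], ["8p3s", "K"]])

df_differ, df_repeat = diff_and_repeat_2d(df_a, df_b)
print(df_differ)
print(df_repeat)
```

Note that 8p3s with chain H is reported as different even though 8p3s appears in the second list, because the chain does not match.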
4. move, copy, remove, rename, create
PyPropel also provides a few move, copy, remove, rename, and create operations on a list of IDs rendered in pandas Series format.
move_files
copy_files
delete_files
rename_file_suffix
rename_file_prefix
makedir
For example, a list of proteins can be moved from data/isoform/ to data/isoform/transmembrane/.
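Since this section does not show the signature of move_files, the sketch below uses only the standard library to show what moving a list of protein files amounts to. `move_listed_files`, the `.pdb` suffix, and the toy IDs are assumptions for illustration and say nothing about how PyPropel implements the operation.

```python
import shutil
import tempfile
from pathlib import Path


def move_listed_files(ids, src_dir, dst_dir, suffix=".pdb"):
    """Move `<id><suffix>` from src_dir to dst_dir for every ID; return the moved IDs."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for file_id in ids:
        src = src_dir / f"{file_id}{suffix}"
        if src.is_file():  # skip IDs that have no file in the source folder
            shutil.move(str(src), str(dst_dir / src.name))
            moved.append(file_id)
    return moved


# Demonstration in a temporary folder that mimics isoform/ and isoform/transmembrane/.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "isoform"
    dst = src / "transmembrane"
    src.mkdir()
    for name in ["P1.pdb", "P2.pdb", "P3.pdb"]:
        (src / name).touch()
    moved_ids = move_listed_files(["P1", "P3", "P9"], src, dst)
    remaining = sorted(p.name for p in src.iterdir() if p.is_file())
    in_target = sorted(p.name for p in dst.iterdir())

print(moved_ids, remaining, in_target)
```

The `ids` argument can be any iterable of strings, including a pandas Series, since iterating over a Series yields its values.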