Usage

After understanding the concept of cumuCCs, you can use the following example to generate per-residue cumuCCs and perform the feature assignment.

Example code

The attributes listed after the code describe each input.

import tmkit as tmk

tmk.edge.extract(
    method='cumulative',
    fasta_fpn='data/fasta/1xqfA.fasta',
    net_fpn='data/rrc/tool/1xqfA.evfold',
    window_size=5,
    seq_sep_inferior=0,
    seq_sep_superior=None,
    pair_mode='patch',
    assign_mode='hash',
    input_kind='freecontact',
    cumu_ratio=1.,
    is_sv=True,
    sv_fpn='./data/output.txt',
)

| Name | Description |
| --- | --- |
| input_kind | feature format determined by the co-evolution method, such as CCMPred, FreeContact, GDCA, or Plmc. |
| fpn | path to the covariance matrix (co-evolutionary features). The matrix should be generated by CCMPred, FreeContact, GDCA, or plm-DCA. Alternatively, a file with three columns (the first two for the residue pair IDs and the third for the co-evolutionary strength) is also accepted. |
| assign_mode | method used to generate cumuCCs. It can be hash, hash_rf, hash_ori, pandas, or numpy. |
| window_size | window size. |
| window_m_ids | list of residues after applying a window. |
| sequence | molecular sequence. |
| seq_sep_inferior | lower bound on the sequence separation between the two residues of a pair. |
| seq_sep_superior | upper bound on the sequence separation between the two residues of a pair. |
| cumu_ratio | ratio that selects the top-ranked int(len_seq*cumu_ratio) co-evolutionary strengths that a residue of interest is involved in. |
| is_sv | whether to save the result (True or False). If False, sv_fpn can be set to None. |
| sv_fpn | path to save the result. |
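To make the roles of cumu_ratio, seq_sep_inferior, and seq_sep_superior more concrete, here is a minimal sketch in plain Python that works on the three-column input format described above. It is not tmkit's implementation. It assumes that a cumuCC is the sum of a residue's top-ranked co-evolutionary strengths and that both sequence-separation bounds are inclusive. The helper names (`parse_pairs`, `cumulative_scores`) and the toy data are hypothetical.

```python
from collections import defaultdict

# Hypothetical three-column input: residue i, residue j, co-evolutionary strength.
EXAMPLE_PAIRS = """\
1 2 0.5
1 3 0.2
1 4 0.1
2 3 0.4
2 4 0.3
3 4 0.6
"""


def parse_pairs(text):
    """Parse whitespace-separated 'i j strength' lines into tuples."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        i, j, strength = line.split()
        pairs.append((int(i), int(j), float(strength)))
    return pairs


def cumulative_scores(pairs, len_seq, cumu_ratio=1.0,
                      seq_sep_inferior=0, seq_sep_superior=None):
    """Sum the top-ranked strengths per residue (illustrative only)."""
    strengths = defaultdict(list)
    for i, j, strength in pairs:
        sep = abs(i - j)
        # Assumption: both bounds are inclusive; None means no upper bound.
        if sep < seq_sep_inferior:
            continue
        if seq_sep_superior is not None and sep > seq_sep_superior:
            continue
        # Each pair contributes to both of its residues.
        strengths[i].append(strength)
        strengths[j].append(strength)

    # Number of top-ranked strengths kept per residue, as in the table.
    top_n = int(len_seq * cumu_ratio)
    scores = {}
    for residue in range(1, len_seq + 1):
        ranked = sorted(strengths[residue], reverse=True)
        scores[residue] = sum(ranked[:top_n])
    return scores


if __name__ == "__main__":
    pairs = parse_pairs(EXAMPLE_PAIRS)
    scores = cumulative_scores(pairs, len_seq=4, cumu_ratio=0.5)
    for residue, score in scores.items():
        print(residue, round(score, 3))
```

With `len_seq=4` and `cumu_ratio=0.5`, only the `int(4 * 0.5) = 2` strongest couplings of each residue are summed. Raising seq_sep_inferior drops pairs of residues that are close in sequence before any ranking happens.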

Please see here for a better understanding of the file-naming system.

Output

Finally, you will see the following output.

===>Molecular sequence: AVADKADNAFMMICTALVLFMTIPGIALFYGGLIRGKNVLSMLTQVTVTFALVCILWVVYGYSLAFGEGNNFFGNINWLMLKNIELTAVMGSIYQYIHVAFQGSFACITVGLIVGALAERIRFSAVLIFVVVWLTLSYIPIAHMVWGGGLLASHGALDFAGGTVVHINAAIAGLVGAYLPHNLPMVFTGTAILYIGWFGFNAGSAGTANEIAALAFVNTVVATAAAILGWIFGEWALRGKPSLLGACSGAIAGLVGVTPACGYIGVGGALIIGVVAGLAGLWGVTMPCDVFGVHGVCGIVGCIMTGIFAASSLGGVGFAEGVTMGHQLLVQLESIAITIVWSGVVAFIGYKLADLTVGLRVP
===>Molecule length: 362
===>Window size: 5
===>Sequence separation inferior: 0
===>Sequence separation superior: None
===>Mode: internal
===>Input kind: freecontact
cumulative ratio: 1.0
===>pair number: 362
=========>Window molecule generation: 0.0010023117065429688s.
======>cumulative assignment: 0.3800020217895508s.
===>total time: 0.38100433349609375s.
===>saving...
===>saved!