4.5.3 Bulk RNA seq
tresor.gene.simu_umi_len is a Python function that simulates reads across a series of UMI lengths at the bulk RNA-seq level.
Usage
We take the following command as an example to generate FastQ files with UMIs of lengths from 7 to 36.

    import numpy as np
    import tresor as ts

    # `gspl` and the path helper `to` are not defined in this snippet;
    # both must be set up before the call.
    for perm_i in range(1):
        print(perm_i)
        ts.gene.simu_umi_len(
            # initial sequence generation
            gspl=gspl,
            len_params={
                'umi': {
                    'umi_unit_pattern': 3,
                    # np.arange excludes the stop value, hence 36 + 1
                    'umi_unit_lens': np.arange(7, 36 + 1, 1),
                },
                'seq': 100,
            },
            material_params={
                'fasta_cdna_fpn': to('data/Homo_sapiens.GRCh38.cdna.all.fa.gz'),  # alternatives: None, False
            },
            seq_num=50,
            working_dir=to('data/simu/docs/'),
            condis=['umi', 'seq'],
            sim_thres=3,
            permutation=0,
            # PCR amplification
            ampl_rate=0.9,
            err_route='sptree',  # options: bftree, sptree, err1d, err2d, mutation_table_minimum, mutation_table_complete
            pcr_error=1e-4,
            pcr_num=10,
            err_num_met='nbinomial',
            # sequencing
            seq_error=0.01,
            seq_sub_spl_number=200,  # alternative: None
            # seq_sub_spl_rate=0.333,
            use_seed=True,
            seed=1,
            verbose=False,  # True or False
            mode='short_read',  # long_read or short_read
            sv_fastq_fp=to('data/simu/docs/'),
        )

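
The `umi_unit_lens` entry sets how many UMI lengths are simulated. As a quick check that uses only NumPy (nothing here calls Tresor), the grid from the example can be inspected directly:

```python
import numpy as np

# Same expression as in the example: np.arange excludes the stop value,
# so 36 + 1 is needed to include a UMI length of 36.
umi_unit_lens = np.arange(7, 36 + 1, 1)

print(umi_unit_lens[0], umi_unit_lens[-1])  # first and last UMI lengths: 7 36
print(len(umi_unit_lens))                   # number of UMI lengths: 30
```

The thirty lengths match the thirty timing lines printed to the console for this example.
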
Attributes
Illustration
Attribute | Description |
---|---|
seq_num | number of RNA molecules. 50 by default |
len_params | lengths of the different components of a read. UMI lengths vary over a range |
seq_params | sequences of the different components of a read. It allows users to add their customised sequences |
material_params | a Python dictionary. If a cDNA library is provided, pass its path with the key fasta_cdna_fpn. The human cDNA library can be downloaded from the Ensembl genome database |
ampl_rate | amplification rate, a float ranging from 0 to 1 |
err_route | the computational algorithm used to generate errors. There are 6 methods: bftree, sptree, err1d, err2d, mutation_table_minimum, and mutation_table_complete |
pcr_error | PCR error rate |
pcr_num | number of PCR cycles to amplify reads |
err_num_met | the method used to generate errors, either binomial or nbinomial |
seq_error | sequencing error rate |
seq_sub_spl_number | number of PCR-amplified reads to subsample. Used when seq_sub_spl_rate is set to None |
seq_sub_spl_rate | rate at which PCR-amplified reads are subsampled. Used when seq_sub_spl_number is set to None |
sv_fastq_fp | folder to save FastQ files |
use_seed | whether seeds are used to simulate sequencing libraries. This is designed to make in silico experiments reproducible |
working_dir | working directory where all simulation results are saved |
condis | names of the components that a read contains. It can contain any number of read components |
sim_thres | similarity threshold. 3 by default |
permutation | permutation times |
mode | long_read or short_read |
verbose | whether to print intermediate results |
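
The two subsampling attributes are alternatives: one fixes an absolute number of reads, the other a rate. The sketch below illustrates that either-or logic with NumPy. The helper name `subsample_reads`, the rounding of the rate-based count, and sampling without replacement are all assumptions made for illustration; this is not Tresor's internal implementation.

```python
import numpy as np

def subsample_reads(reads, seq_sub_spl_number=None, seq_sub_spl_rate=None, seed=1):
    """Hypothetical helper: keep a subset of amplified reads, by count or by rate."""
    # Exactly one of the two attributes should be set; the other stays None.
    if (seq_sub_spl_number is None) == (seq_sub_spl_rate is None):
        raise ValueError("set exactly one of seq_sub_spl_number / seq_sub_spl_rate")
    n_total = len(reads)
    if seq_sub_spl_number is not None:
        n_keep = min(seq_sub_spl_number, n_total)
    else:
        n_keep = int(round(n_total * seq_sub_spl_rate))
    # A seeded generator makes the draw reproducible (assumed behaviour).
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_total, size=n_keep, replace=False)
    return [reads[i] for i in idx]

reads = [f"read_{i}" for i in range(1000)]
by_number = subsample_reads(reads, seq_sub_spl_number=200)
by_rate = subsample_reads(reads, seq_sub_spl_rate=0.333)
print(len(by_number), len(by_rate))  # 200 333
```
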

Attribute | Description |
---|---|
cfpn | location of the YAML configuration file. Users can specify the attributes illustrated on the Python tab in the .yml file |
snum | number of sequencing molecules |
permut | permutation times |
sthres | similarity threshold. 3 by default |
wd | working directory where all simulation results are saved |
md | long_read or short_read mode |
is | whether seeds are used to simulate sequencing libraries. This is designed for reproducible in silico experiments |
vb | whether to print intermediate results |
Output
Console
======>simulation completes in 2.8730275630950928s
======>simulation completes in 2.9519758224487305s
======>simulation completes in 2.930018663406372s
======>simulation completes in 3.0289931297302246s
======>simulation completes in 3.19893479347229s
======>simulation completes in 3.496039390563965s
======>simulation completes in 3.4660286903381348s
======>simulation completes in 3.4059395790100098s
======>simulation completes in 3.371000051498413s
======>simulation completes in 3.4959704875946045s
======>simulation completes in 3.7300221920013428s
======>simulation completes in 3.6379969120025635s
======>simulation completes in 3.91005277633667s
======>simulation completes in 4.011698961257935s
======>simulation completes in 4.11646032333374s
======>simulation completes in 4.311992883682251s
======>simulation completes in 4.139020919799805s
======>simulation completes in 4.244063138961792s
======>simulation completes in 4.463680982589722s
======>simulation completes in 4.64800238609314s
======>simulation completes in 4.433006048202515s
======>simulation completes in 5.417316675186157s
======>simulation completes in 4.551185369491577s
======>simulation completes in 4.674246311187744s
======>simulation completes in 5.081998348236084s
======>simulation completes in 4.961503744125366s
======>simulation completes in 5.0400168895721436s
======>simulation completes in 5.126001596450806s
======>simulation completes in 4.928617477416992s
======>simulation completes in 5.1075379848480225s
Finished!
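
Each timing line reports the runtime of one simulation. A small sketch for collecting those timings from captured console text follows. Pairing the lines with UMI lengths 7, 8, 9, ... in order is an assumption based on the `umi_unit_lens` range in the example, and only the first three lines are reproduced for brevity.

```python
import re

# First three console lines, copied from the output of the example.
console_text = """\
======>simulation completes in 2.8730275630950928s
======>simulation completes in 2.9519758224487305s
======>simulation completes in 2.930018663406372s
"""

pattern = re.compile(r"simulation completes in ([0-9.]+)s")
seconds = [float(m.group(1)) for m in pattern.finditer(console_text)]

# Assumed pairing: the i-th timing belongs to the i-th UMI length, starting at 7.
timings = {7 + i: s for i, s in enumerate(seconds)}
for umi_len, s in timings.items():
    print(f"UMI length {umi_len}: {s:.2f} s")
```
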
Understanding files
The resultant files of the simulated reads are shown as follows.

It is worth noting that the libraries of UMIs with different lengths are rendered in the following form.
