Gene regulatory network inference
import os
import re
import pandas as pd
from collections import defaultdict
from arboreto.algo import grnboost2, genie3
from arboreto.utils import load_tf_names
from distributed import LocalCluster, Client
tfdf = pd.read_csv("Auxiliary_File/Arabidopsis_TF and family.csv")
tf_names = list(set(tfdf['Protein ID'].values.tolist()))
len(tf_names)
2192
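`set(...)` removes duplicate IDs, but it keeps a missing value (`NaN`) if the column has blanks, and its iteration order changes between Python sessions. A slightly more defensive variant is sketched below on a made-up stand-in for the TF table. Only the `Protein ID` column name comes from the file above; the `Family` column and the toy rows are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for "Auxiliary_File/Arabidopsis_TF and family.csv":
# it includes a duplicated ID and a missing ID to show the cleanup.
tfdf = pd.DataFrame({
    "Protein ID": ["AT1G01010", "AT1G01060", "AT1G01010", None],
    "Family": ["NAC", "MYB", "NAC", "bHLH"],  # hypothetical column name
})

# Drop missing IDs, strip stray whitespace, deduplicate, and sort so the
# TF list is identical from one session to the next.
tf_names = sorted(tfdf["Protein ID"].dropna().str.strip().unique())
print(len(tf_names), tf_names)
```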
Uncut
# Exp = pd.read_csv("1_Expression_data/Expr_Uncut.csv")
# Exp.T
ex_matrix = pd.read_csv("1_Expression_data/Expr_Uncut.csv", sep=',', index_col=0).T
ex_matrix.head()
Locus | AT1G01010 | AT1G01020 | AT1G01030 | AT1G01040 | AT1G01046 | AT1G01050 | AT1G01060 | AT1G01070 | AT1G01073 | AT1G01080 | ... | ATMG01330 | ATMG01340 | ATMG01350 | ATMG01360 | ATMG01370 | ATMG01380 | ATMG01390 | ATMG01400 | ATMG01410 | CFP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
wolsc_kb2_4_10 | 7.702431 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.359552 |
wolsc_kb2_4_1 | 0.000000 | 8.378906 | 0.0 | 0.0 | 0.0 | 10.810870 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
wolsc_kb2_4_18 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 9.858738 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
wolsc_kb2_4_22 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.388047 |
wolsc_kb2_4_26 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 5.779188 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.552472 |
5 rows × 32679 columns
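The CSV stores genes as rows and cells as columns. `grnboost2` expects one row per observation (cell) and one column per gene, which is why the matrix is transposed with `.T`. A TF ID can only act as a regulator if it is also one of the matrix columns, so it is worth checking how many of the 2192 IDs actually match. A toy sketch with made-up values and a hypothetical TF list:

```python
import pandas as pd

# Toy stand-in for 1_Expression_data/Expr_*.csv: genes (Locus) as rows, cells as columns.
raw = pd.DataFrame(
    {"cell_1": [7.7, 0.0, 10.4], "cell_2": [0.0, 8.4, 0.0]},
    index=pd.Index(["AT1G01010", "AT1G01020", "CFP"], name="Locus"),
)

# Transpose so rows are observations (cells) and columns are genes.
ex_matrix = raw.T
print(ex_matrix.shape)  # (2, 3)

# Hypothetical TF list: the second entry is deliberately absent from the matrix.
tf_names = ["AT1G01010", "MISSING_TF"]
tfs_in_matrix = [tf for tf in tf_names if tf in ex_matrix.columns]
print(f"{len(tfs_in_matrix)} of {len(tf_names)} TFs found in the expression matrix")
```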
%%time
# tfdf = pd.read_csv("masterTF-target.txt", sep="\t")
# tf_names = list(set(tfdf.TF.values.tolist()))
# len(tf_names)
# ex_matrix = pd.read_csv("1_Expression_data/GSE10576_Fe_arboreto.tsv", sep='\t')
# ex_matrix.head()
# 10 single-threaded workers, each capped at 8 GB (80 GB in total)
local_cluster = LocalCluster(n_workers=10,
                             threads_per_worker=1,
                             memory_limit=8e9)
custom_client = Client(local_cluster)
# Infer TF -> target edges with GRNBoost2 on the prepared cluster
network = grnboost2(expression_data=ex_matrix,
                    tf_names=tf_names, verbose=True, client_or_address=custom_client)
network.to_csv('3_GRN_data/GSE74488_Uncut_arboreto_regnet.tsv', sep='\t', index=False)
network.head()
preparing dask client
parsing input
creating dask graph
10 partitions
computing dask graph
not shutting down client, client was created externally
finished
CPU times: total: 47min 22s
Wall time: 2h 5min 40s
| | TF | target | importance |
---|---|---|---|
287 | AT1G34190 | AT1G54150 | 123.624165 |
280 | AT1G33240 | AT1G32640 | 118.439476 |
1375 | AT4G01120 | AT3G02180 | 116.013558 |
1837 | AT5G20900 | AT2G46140 | 115.321538 |
41 | AT1G04990 | AT2G42230 | 112.942165 |
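The result is a long edge list of `TF`, `target`, `importance` rows, shown above in decreasing order of importance. Downstream analyses often keep only the strongest edges. One simple option is to retain the top few regulators per target gene. A pandas sketch, where the toy rows and the cutoff `top_n=2` are illustrative assumptions:

```python
import pandas as pd

# Toy edge list in the same TF / target / importance layout as `network`.
network = pd.DataFrame({
    "TF": ["TF_A", "TF_B", "TF_C", "TF_A", "TF_B"],
    "target": ["G1", "G1", "G1", "G2", "G2"],
    "importance": [12.0, 30.5, 4.1, 8.0, 2.5],
})

def top_regulators_per_target(edges: pd.DataFrame, top_n: int = 2) -> pd.DataFrame:
    """Keep the `top_n` highest-importance TFs for each target gene."""
    return (
        edges.sort_values("importance", ascending=False)
        .groupby("target", sort=False)
        .head(top_n)  # first top_n rows per target, i.e. the strongest edges
        .reset_index(drop=True)
    )

pruned = top_regulators_per_target(network, top_n=2)
print(pruned)
```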
3hpc
ex_matrix = pd.read_csv("1_Expression_data/Expr_3hpc.csv", sep=',', index_col=0).T
ex_matrix.head()
Locus | AT1G01010 | AT1G01020 | AT1G01030 | AT1G01040 | AT1G01046 | AT1G01050 | AT1G01060 | AT1G01070 | AT1G01073 | AT1G01080 | ... | ATMG01330 | ATMG01340 | ATMG01350 | ATMG01360 | ATMG01370 | ATMG01380 | ATMG01390 | ATMG01400 | ATMG01410 | CFP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sc_1228_pa_30 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 5.380400 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.009010 |
wolsc_kb2_3_13 | 0.000000 | 7.092747 | 0.0 | 0.0 | 0.0 | 10.358681 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
wolsc_kb2_3_14 | 0.000000 | 5.949744 | 0.0 | 0.0 | 0.0 | 9.528396 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.615233 |
wolsc_kb2_3_2 | 3.829904 | 7.912041 | 0.0 | 0.0 | 0.0 | 10.317229 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.349404 |
wolsc_kb2_3_27 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 8.714814 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.626259 |
5 rows × 32679 columns
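Most entries in the rows shown are zeros. Every column is treated as a potential target, so genes detected in very few cells add run time while probably contributing little signal. For scale, the Uncut run above took about two hours of wall time. The sketch below shows an optional pre-filter that is not part of the original workflow. The threshold `min_cells` is an arbitrary assumption, and TF columns are kept regardless so that they remain available as regulators:

```python
import pandas as pd

def filter_rare_genes(ex_matrix: pd.DataFrame, tf_names, min_cells: int = 3) -> pd.DataFrame:
    """Drop gene columns expressed (> 0) in fewer than `min_cells` cells, keeping TFs."""
    detected_in = (ex_matrix > 0).sum(axis=0)  # number of cells with nonzero expression, per gene
    keep = (detected_in >= min_cells) | ex_matrix.columns.isin(tf_names)
    return ex_matrix.loc[:, keep]

# Toy matrix: 4 cells x 4 genes.
toy = pd.DataFrame(
    {"G1": [1.0, 2.0, 0.5, 0.0], "G2": [0.0, 0.0, 0.0, 3.1],
     "TF_A": [0.0, 0.0, 0.0, 0.0], "G3": [0.0, 0.0, 0.0, 0.0]},
    index=["c1", "c2", "c3", "c4"],
)
filtered = filter_rare_genes(toy, tf_names=["TF_A"], min_cells=2)
print(list(filtered.columns))
```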
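# Release the workers from the Uncut run before starting a new cluster,
# so workers from the earlier run cannot linger alongside the new ones.
custom_client.close()
local_cluster.close()
# 10 single-threaded workers, each capped at 8 GB (80 GB in total)
local_cluster = LocalCluster(n_workers=10,
                             threads_per_worker=1,
                             memory_limit=8e9)
custom_client = Client(local_cluster)
network = grnboost2(expression_data=ex_matrix,
                    tf_names=tf_names, verbose=True, client_or_address=custom_client)
network.to_csv('3_GRN_data/GSE74488_3hpc_arboreto_regnet.tsv', sep='\t', index=False)
network.head()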
preparing dask client
parsing input
creating dask graph
10 partitions
computing dask graph
2022-09-26 15:38:38,710 - distributed.scheduler - WARNING - Worker failed to heartbeat within 300 seconds. Closing: <WorkerState 'tcp://127.0.0.1:59010', name: 3, status: running, memory: 1294, processing: 2851>
2022-09-26 15:38:40,154 - distributed.scheduler - WARNING - Worker failed to heartbeat within 300 seconds. Closing: <WorkerState 'tcp://127.0.0.1:59016', name: 2, status: running, memory: 1290, processing: 4233>
2022-09-26 15:38:43,509 - distributed.scheduler - WARNING - Worker failed to heartbeat within 300 seconds. Closing: <WorkerState 'tcp://127.0.0.1:59022', name: 0, status: running, memory: 1312, processing: 2648>
2022-09-26 15:38:44,025 - distributed.scheduler - WARNING - Worker failed to heartbeat within 300 seconds. Closing: <WorkerState 'tcp://127.0.0.1:59028', name: 1, status: running, memory: 1181, processing: 4211>
2022-09-26 15:38:44,917 - distributed.scheduler - WARNING - Worker failed to heartbeat within 300 seconds. Closing: <WorkerState 'tcp://127.0.0.1:59034', name: 5, status: running, memory: 861, processing: 5912>
2022-09-26 15:38:46,231 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:59010'.
2022-09-26 15:38:46,237 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:59028'.
2022-09-26 15:38:46,240 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:59034'.
2022-09-26 15:38:46,248 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:59022'.
2022-09-26 15:38:46,252 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:59016'.
2022-09-26 15:38:49,243 - distributed.nanny - WARNING - Restarting worker
2022-09-26 15:38:49,265 - distributed.nanny - WARNING - Restarting worker
2022-09-26 15:38:49,274 - distributed.nanny - WARNING - Restarting worker
2022-09-26 15:38:49,286 - distributed.nanny - WARNING - Restarting worker
2022-09-26 15:38:49,957 - distributed.nanny - WARNING - Restarting worker
not shutting down client, client was created externally
finished
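In this run, the scheduler closed five workers that had not sent a heartbeat within 300 seconds. The nanny then restarted them, and the run still finished. The 300 s limit appears to match the 5-minute default of dask's `distributed.scheduler.worker-ttl` setting. If long-running tasks keep triggering these restarts, one option is to raise that limit before the cluster is created. The sketch below uses 15 minutes, which is an arbitrary, untested value:

```python
import dask

# Assumed value: allow workers up to 15 minutes between heartbeats before the
# scheduler closes them. Run this before creating the LocalCluster and Client
# so that the new scheduler picks up the setting.
dask.config.set({"distributed.scheduler.worker-ttl": "15 minutes"})

print(dask.config.get("distributed.scheduler.worker-ttl"))
```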
| | TF | target | importance |
---|---|---|---|
421 | AT1G62990 | AT4G09990 | 123.463492 |
421 | AT1G62990 | AT3G18660 | 96.355794 |
902 | AT2G42680 | AT4G32470 | 91.878618 |
2006 | AT5G49450 | AT5G49448 | 89.479224 |
660 | AT2G18160 | AT2G18162 | 88.655565 |
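With both edge lists saved, one possible follow-up is to check which TF-target pairs the two conditions share. The minimal sketch below uses toy rows. In practice the two frames would be read back from the TSV files written above with `pd.read_csv(path, sep='\t')`:

```python
import pandas as pd

# Toy edge lists standing in for the Uncut and 3hpc networks.
uncut = pd.DataFrame({"TF": ["TF_A", "TF_B", "TF_C"],
                      "target": ["G1", "G2", "G3"],
                      "importance": [10.0, 5.0, 2.0]})
hpc3 = pd.DataFrame({"TF": ["TF_A", "TF_C", "TF_D"],
                     "target": ["G1", "G4", "G3"],
                     "importance": [7.5, 4.0, 1.0]})

# Outer merge on the edge key; `indicator=True` labels each edge as
# "both", "left_only" (Uncut only) or "right_only" (3hpc only).
merged = uncut.merge(hpc3, on=["TF", "target"], how="outer",
                     suffixes=("_uncut", "_3hpc"), indicator=True)
shared = merged[merged["_merge"] == "both"]
print(shared[["TF", "target", "importance_uncut", "importance_3hpc"]])
print(merged["_merge"].value_counts())
```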
END