CentralityCosDist Tutorial#
The CentralityCosDist algorithm has the following steps:#
1. The CentralityCosDist algorithm takes a network and a list of seed nodes as input.
2. It calculates the centrality of each node in the network using multiple centrality measures.
3. It calculates the cosine similarity between each seed node and all other nodes.
4. It calculates the mean similarity of each node across all seed nodes.
5. It ranks all nodes based on the mean similarity score.
6. It sorts the rankings and returns them.
The following is the pseudocode of CentralityCosDist#
# CentralityCosDist
# Input: Network, seeds
# Output: Rankings of nodes

# Perform multiple centrality analyses on the network
centralities = [
    centrality(network)
    for centrality in CENTRALITY_MEASURES
]

# Determine the cosine similarity among the seeds
seed_similarities = cosine_similarity(centralities, seeds)

# (Optional) eliminate seeds that are highly dissimilar
# to the majority of the other seeds
if ELIMINATE_SEEDS:
    seeds = eliminate_seeds(seed_similarities)

# Determine the cosine similarity between each seed and all other nodes
similarities = []
for seed in seeds:
    similarities.append(cosine_similarity(centralities, seed))

# Calculate the mean similarity of each node across all seed nodes
mean_similarities = mean(similarities)

# Rank all nodes based on the mean similarity score
rankings = rank(mean_similarities)

# Sort the rankings
rankings.sort()

# Return the rankings
return rankings
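The flow above can be sketched as runnable Python. This is a minimal illustration, not the package's actual implementation: the centrality matrix, node IDs, and seed list are made up, and scikit-learn's `cosine_similarity` stands in for the similarity step.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy centrality matrix: one row per node, one column per centrality measure.
# Node IDs and values are made up for illustration.
centralities = pd.DataFrame(
    [[0.9, 0.8, 0.7],
     [0.8, 0.9, 0.6],
     [0.1, 0.2, 0.3],
     [0.2, 0.1, 0.4]],
    index=["A", "B", "C", "D"],
    columns=["degree", "betweenness", "closeness"],
)
seeds = ["A", "B"]

# Cosine similarity between every node and every seed node
similarities = cosine_similarity(centralities.values, centralities.loc[seeds].values)

# Mean similarity of each node across all seed nodes
mean_similarity = pd.Series(similarities.mean(axis=1), index=centralities.index)

# Rank nodes by mean similarity (highest first); ties get the average rank
rankings = mean_similarity.rank(ascending=False).sort_values()
print(rankings)
```

Because the seeds themselves have centrality profiles most similar to the seed set, they naturally rank near the top.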
Network centrality analysis#
Network centrality analysis is a way of measuring the importance of nodes in a network. There are many different centrality measures, but some of the most common ones include:
Degree centrality: This measures how many nodes a node is connected to.
Betweenness centrality: This measures how often a node lies on the shortest path between other nodes.
Closeness centrality: This measures how close a node is to all other nodes.
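As a quick illustration of these three definitions (separate from the tutorial's data), they can be computed on a small path graph with NetworkX:

```python
import networkx as nx

# A small path graph: 1 - 2 - 3 - 4
graph = nx.path_graph([1, 2, 3, 4])

# Degree centrality: the fraction of other nodes a node is connected to
degree = nx.degree_centrality(graph)

# Betweenness centrality: the fraction of shortest paths between other
# node pairs that pass through a node
betweenness = nx.betweenness_centrality(graph)

# Closeness centrality: the reciprocal of the average distance to all other nodes
closeness = nx.closeness_centrality(graph)

print(degree, betweenness, closeness)
```

On this graph, the middle nodes score higher on all three measures than the endpoints.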
There are many common tools for network centrality analysis. Some of the most popular ones include:
NetworkX is a Python library for analyzing graphs and networks. It has a number of functions that can be used to calculate centrality measures, visualize networks, and perform other network analysis tasks.
Gephi is an open source software for visualizing and analyzing networks. It has a user-friendly interface that makes it easy to create and explore networks.
Cytoscape is an open source software for visualizing and analyzing networks. It has a variety of features that make it a powerful tool for network analysis.
R is a programming language and environment for statistical computing and graphics. It has a number of packages (e.g., igraph) that can be used for network analysis.
MATLAB is a programming language and environment for scientific computing. It has a number of functions that can be used for network analysis.
Here is how to calculate multiple centrality measures for a graph and export them as a CSV file using NetworkX:
import csv

import networkx as nx

# Create a graph
graph = nx.Graph()

# Add some nodes
graph.add_nodes_from([1, 2, 3, 4])

# Add some edges
graph.add_edges_from([(1, 2), (2, 3), (3, 4)])

# Calculate the centrality measures
centrality_functions = {
    'degree': nx.degree_centrality,
    'betweenness': nx.betweenness_centrality,
    'closeness': nx.closeness_centrality,
}
centralities = {name: function(graph) for name, function in centrality_functions.items()}

# Write the centrality measures to a CSV file
with open('centralities.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Node'] + list(centralities))
    for node in graph.nodes:
        writer.writerow([node] + [centralities[name][node] for name in centralities])
This will create a CSV file called centralities.csv that contains the following columns:
Node
Degree Centrality
Betweenness Centrality
Closeness Centrality
The following is the network centrality file that will be used in this tutorial.
import pandas as pd
from IPython.display import HTML
Network_Centrality_File = "data/Network_Centrality.csv"
Seed_File = "data/Seeds.tsv"
df_centralites = pd.read_csv(Network_Centrality_File)
display(df_centralites.head(5))
| | ID | Information_centrality | Degree_centrality | Betweenness_centrality | Eigenvector_centrality | Closeness_centrality | clustering_coefficient | Load_centrality | Page_rank |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AT5G13650 | 0.000147 | 0.000369 | 0.000509 | 1.836035e-07 | 0.053561 | 0.000000 | 0.000509 | 0.000133 |
| 1 | AT5G65360 | 0.000243 | 0.007556 | 0.001312 | 1.591563e-01 | 0.054834 | 0.273171 | 0.001209 | 0.000353 |
| 2 | AT5G14030 | 0.000105 | 0.000184 | 0.000000 | 8.635333e-13 | 0.037856 | 0.000000 | 0.000000 | 0.000076 |
| 3 | AT3G48070 | 0.000242 | 0.002396 | 0.000404 | 1.440151e-06 | 0.057145 | 0.000000 | 0.000461 | 0.000247 |
| 4 | AT4G35590 | 0.000099 | 0.000369 | 0.000170 | 1.255741e-09 | 0.041956 | 0.000000 | 0.000170 | 0.000181 |
Seed node list#
Seed nodes are a subset of nodes in a network that are used to start the ranking process. The algorithm then ranks the remaining nodes based on their relationship to the seed nodes.
Seed nodes can be chosen in a variety of ways. Some common methods include:
Choosing the nodes with the highest biological process significance.
Choosing the nodes with the highest degree centrality.
Choosing the nodes with the highest betweenness centrality.
Choosing the nodes that are connected to the most other nodes.
Choosing the nodes that are connected to the most important nodes.
The choice of seed nodes can have a significant impact on the accuracy of the ranking algorithm.
Before we move forward, we need to filter out seed nodes for which we don't have centrality information. The centrality measures are used to rank the nodes, and we can't rank a node without information about its centrality. We can filter the seed list by intersecting it with the set of nodes that appear in the centrality file. This ensures that every remaining seed node has centrality information, so the ranking algorithm can rank the nodes accurately.
Seeds = set(open(Seed_File).read().splitlines()[1:]) # [1:] to remove header
Seeds
{'AT1G09100',
'AT1G09770',
'AT1G63290',
'AT3G01850',
'AT3G03900',
'AT3G05530',
'AT3G51840',
'AT5G08670',
'AT5G17310',
'ATCG00480'}
Nodes = set(df_centralites.ID.to_list())
Seeds = list(Nodes.intersection(Seeds))
Seeds
['AT3G05530',
'AT3G03900',
'AT5G08670',
'AT5G17310',
'AT3G01850',
'AT3G51840',
'AT1G63290',
'AT1G09770',
'AT1G09100',
'ATCG00480']
CentralityCosDist#
Load CentralityCosDist and create new instance of CentralityCosDist#
from centralitycosdist import CentralityCosDist
algorithm = CentralityCosDist(Centrality_file=Network_Centrality_File)
Execute CentralityCosDist#
algorithm.run(seed_nodes=Seeds)
Get ranks#
df_rank = algorithm.rank
display(df_rank.head(10))
ID
AT3G03900 1.0
AT1G09100 2.0
AT3G51840 3.0
AT3G05530 4.0
AT5G17310 5.0
ATCG00480 6.0
AT5G08670 7.5
AT5G08680 7.5
AT5G08690 9.0
AT5G19680 10.0
Name: Rank, dtype: float64
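Note the tied value 7.5 above: AT5G08670 and AT5G08680 share the same similarity score, so they receive the average of ranks 7 and 8. A minimal sketch of how pandas assigns such average ranks (the scores below are illustrative, not from the tutorial's data):

```python
import pandas as pd

# Illustrative similarity scores with one exact tie
scores = pd.Series(
    [0.98, 0.97, 0.95, 0.95, 0.90],
    index=["n1", "n2", "n3", "n4", "n5"],
)

# Higher similarity -> better (lower) rank; tied scores share the average rank
ranks = scores.rank(ascending=False, method="average")
print(ranks)
```

Here the two nodes scoring 0.95 each receive rank 3.5, the average of ranks 3 and 4.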
Get similarity score#
display(algorithm.similarity_score.head(10))
ID
AT3G03900 0.984956
AT1G09100 0.984796
AT3G51840 0.983411
AT3G05530 0.981703
AT5G17310 0.977869
ATCG00480 0.975787
AT5G08670 0.973695
AT5G08680 0.973695
AT5G08690 0.971715
AT5G19680 0.970025
Name: Similarity_score, dtype: float64
Check out the ranks of the seed nodes#
display(df_rank.loc[list(Seeds)])
ID
AT3G05530 4.0
AT3G03900 1.0
AT5G08670 7.5
AT5G17310 5.0
AT3G01850 11.5
AT3G51840 3.0
AT1G63290 11.5
AT1G09770 25.0
AT1G09100 2.0
ATCG00480 6.0
Name: Rank, dtype: float64
import session_info
session_info.show()
----- centralitycosdist 0.1.1 pandas 1.5.3 session_info 1.0.0 -----
asttokens NA backcall 0.2.0 colorama 0.4.6 comm 0.1.3 cython_runtime NA dateutil 2.8.2 debugpy 1.5.1 decorator 5.1.1 executing 1.2.0 ipykernel 6.20.2 ipython_genutils 0.2.0 jedi 0.18.2 joblib 1.2.0 jupyter_server 1.23.6 networkx 2.8.4 numpy 1.24.2 packaging 23.0 parso 0.8.3 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA platformdirs 3.2.0 prompt_toolkit 3.0.38 psutil 5.9.0 ptyprocess 0.7.0 pure_eval 0.2.2 pydev_ipython NA pydevconsole NA pydevd 2.6.0 pydevd_concurrency_analyser NA pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.14.0 pytz 2023.2 scipy 1.10.1 setuptools 65.6.3 six 1.16.0 sklearn 1.2.2 sphinxcontrib NA stack_data 0.6.2 threadpoolctl 3.1.0 tornado 6.2 traitlets 5.9.0 typing_extensions NA wcwidth 0.2.6 zmq 23.2.0 zoneinfo NA
----- IPython 8.11.0 jupyter_client 8.1.0 jupyter_core 5.3.0 jupyterlab 3.5.0 notebook 6.5.3 ----- Python 3.11.0 (main, Mar 1 2023, 18:26:19) [GCC 11.2.0] Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31 ----- Session information updated at 2023-03-27 16:58
🔚