The extraction of targeted subnetworks is a powerful way to identify functional modules and pathways within complex networks. Here, we present SubNet, a Java-based standalone program for extracting subnetworks given a basal network and a set of selected nodes. Designed with both graphical user interface (GUI) and command line user interface (CUI), SubNet combines four different extraction methods, which offers the possibility to interrogate a biological network according to the question investigated.
SubNet: a Java application for subnetwork extraction. Zhang Q, Zhang ZD (2013) Bioinformatics 29(19):2509-2511. pubmed reprint
» SubNet
» Network
Source | Version | Edge weights? | |
HPRD | 9 | / | No |
STRING | 9.05 | Yes | No |
GeneMANIA (Human) | Oct 2012 | ||
Coexpression (Shaykhiev,Crystal,2011) | Yes | No | |
Coexpression (Wang,Crystal,2011) | Yes | No | |
Coexpression (Barnes,Colbert,2009) | Yes | No | |
Coexpression (Tilley,Crystal,2011) | Yes | No | |
Coexpression (Barnes,Glass,2010) | Yes | No | |
Colocalization (Johnson,Shoemaker,2003) | Yes | No | |
Colocalization (Schadt,Shoemaker,2004) | Yes | No | |
Physical interaction (IREF.OPHID) | Yes | No | |
Physical interaction (IREF.SmallScaleStudies) | Yes | No | |
Physical interaction (IREF.HPRD) | Yes | No | |
Physical interaction (IREF.GRID) | Yes | No | |
Physical interaction (BIOGRID) | Yes | No | |
Notes: (1) A large number of coexpression and physical interaction networks are available at GeneMANIA. We only list the largest five of each type. (2) All networks are undirected. |
» System requirements
» Input files
» Output files
Output file for | Generated by | Tab-delimited columns | ||
Network (.top) | All methods | Node1 name | Node2 name | |
Edges | Shortest path | Node1 name | Node2 name | Betweenness |
All other | Node1 name | Node2 name | ||
Nodes | Shell | Node name | Its shell number | Its degree |
Shortest path | Node name | Its betweenness | Its degree | |
Emission decay | Node name | Its decay value | ||
PageRank | Node name | Its PageRank score | Its degree | |
Notes: (1) The network file can also contain orphan nodes, which are not linked to any other nodes. (2) The node degree is its degree in the subnetwork. |
» Graphical user interface
» Command line user interface
Formally, given a network G = (V, E) with its ensemble of n vertices V and edges E and a set of preselected nodes S, SubNet constructs a subnetwork W based on S so that W is a subset of G.
» Extraction by shell
A node in a network has neighbors on different levels: one edge away, on the so-called first shell, are the direct neighbors, two edges away are the indirect neighbors on the second shell, and so on. Directly based on the notion of "guilty by association" – the closest molecules are highly likely to be related to each other and involved in the same pathways and mechanisms (Schwikowski et al., 2000), this method extracts the neighbors of each of preselected nodes within a certain distance. Given the set of selected genes S = {vi}, the set of nodes within the k-th shell neighborhood, T, is
in which {vi}j is the set of neighbors of vi on the j-th shell. The subnetwork W is thus composed of the nodes in T and edges among them transferred from G.
» Extraction by shortest path
To make use of indirect interactions, one can take higher-order neighborhoods into consideration. In a complex molecular interaction network, given a pair of connected biomolecules, there are almost always a large number of routes to reach one node from the other. The shortest path between them is of particular interest and significance, as it is assumed to be the route used most for information communication between the two molecules and has been shown to provide a good measure for functional relatedness among genes (Sharan, et al., 2007; Managbanag et al., 2008). We use the shortest path in our second method – if the shortest path between two genes includes at least one transitive gene it is postulated that the transitive genes on the path will be involved in the same biological process as the terminal genes. To find the shortest path between two nodes, we use Dijkstra's algorithm (Dijkstra, 1959), which constructs a shortest-path tree from a selected node to every other node in the network. Because the shortest path between two nodes may not be unique, we will recover all the shortest paths if there is more than one. Given the set of selected genes, S = {vi}, the set of nodes on the shortest paths among them, T, is
in which {vi <-> vj} is the set of nodes on the shortest path between and including vi and vj. The subnetwork, W, is thus composed of the nodes in T and edges among them transferred from G.
» Extraction by emission decay
Given the network connectivity, the preselected seeds can be considered as heat sources, emitting functionality along the edges to other nodes in the network. Such emission decays as the distance increases, and the total retainment at each node (excluding seeds) is the sum of functionality transmitted from all seeds:
in which di and wi are the distance and the total edge weight, respectively, between the node and the seed i and λ is the exponent parameter whose value (λ = 1, 2, or 3) is selected by the user. The subnetwork, W, is thus composed of the nodes whose functionality retainment is greater than a preset threshold and edges among them transferred from G.
Each node sees its decay value updated after each seed is being considered, with the initial decay value at 0 before the first seed. Once every seed has been explored, the final decay value of each node is then compared to a threshold τ, which is initially set by the user. The resulting subnetwork W consists then in the nodes with decay value higher than τ and the edges connecting them.
» Extraction by PageRank
We also developed a method based on the node centrality/influence and adapted the Google PageRank algorithm (Page et al., 1999) to incorporate preselected nodes. Let A be the adjacency matrix of network G with:
in which wij is the weight of the edge between nodes i and j in a undirected network or from node i to j in a directed network (wij = 1 if the weight is unspecified). For nodes that are preselected as seeds, values in their corresponding columns in A are increased: aij = c·wij (c > 1). SubNet uses 100 for c.
The centrality/influence of each node (pi) is the sum of the influence it receives from each of its source node (pj) divided by the number of out-edges from that source node (oj):
It can be written in the matrix form as:
This is the characteristic equation of the eigensystem, and the solution to vector P is an eigenvector of B with the corresponding eigenvalue of 1. However, if B is a stochastic matrix that is also irreducible and aperiodic, 1 is the largest eigenvalue and P is the principal eigenvector (Perron–Frobenius theorem). To make such a matrix, we first replace every cell of a row with 1/n if that row contains only 0 and then add a link from each node to every node with a small transition probability. This operation generates a new matrix C:
in which E is an n×n matrix of ones and d is a parameter controlling the weight of the E component in C. To obtain P, we use the power iteration method (von Mises et al., 1929), which can be applied to large sparse matrices of biological networks:
.
The resulting subnetwork W consists of the nodes with values in P higher than a preselected thresholdand the edges connecting them.
Schwikowski, B., Uetz, P. and Fields, S. (2000) A network of protein-protein interactions in yeast. Nat Biotechnol, 18, 1257-1261.
Sharan, R., Ulitsky, I. and Shamir, R. (2007) Network-based prediction of protein function. Mol Syst Biol, 3, 88-100.
Managbanag, J.R., Witten, T.M., Bonchev, D., Fox, L.A., Tsuchiya, M., Kennedy, B.K., Kaeberlein, M. (2008) Shortest-Path Network Analysis Is a Useful Approach toward Identifying Genetic Determinants of Longevity. PLoS ONE, 3, e3802.
Dijkstra, E. W. (1959) A note on two problems in connexion with graphs, Numerische Mathematik, 1, 269-271.
Page, L., Brin, S., Motwani, R. and Winograd, T. (1999) The PageRank citation ranking: Bringing order to the Web.
von Mises, R. and Pollaczek-Geiringer H. (1929) Praktische Verfahren der Gleichungsauflösung. ZAMM, Zeitschrift für Angewandte Mathematik und Mechanik, 9, 152-164.