This website was developed to support our publication:

Neil R Clark, Ruth Dannenfelser, Christopher M Tan, Michael E Komosinski and Avi Ma'ayan
Sets2Networks: network inference from repeated observations of sets
BMC Systems Biology 6, 89 (2012) PMID: 22824380.

Please cite our paper if you use our algorithm and/or tool.

Abstract

Background
The skeleton of complex systems can be represented as networks where vertices represent entities, and edges represent the relations between these entities. Often it is impossible, or expensive, to determine the network structure by experimental validation of the binary interactions between every vertex pair. It is usually more practical to infer the network from surrogate observations. Network inference is the process by which an underlying network of relations between entities is determined from indirect evidence. While many algorithms have been developed to infer networks from quantitative data, less attention has been paid to methods which infer networks from repeated observations of related sets. This type of data is ubiquitous in the field of systems biology and in other areas of complex systems research, hence such methods would be of great utility and value.

Results
Here we present a general method for network inference from repeated observations of sets of related entities. Given experimental observations of such sets, we infer the underlying network connecting these entities by generating an ensemble of networks consistent with the data. The frequency of occurrence of a given link throughout this ensemble is interpreted as the probability that the link is present in the underlying real network conditioned on the data. Exponential random graphs are used to generate and sample the ensemble of consistent networks, and we take an algorithmic approach to numerically executing the inference method. The effectiveness of the method is demonstrated on synthetic data before employing this inference approach to problems in systems biology and systems pharmacology, as well as to construct a co-authorship collaboration network. We predict direct protein-protein interactions from high-throughput mass-spectrometry proteomics; build networks that connect pluripotency regulators based on ChIP-seq and loss-of-function/gain-of-function followed by expression data; extract a network that connects 53 cancer drugs to each other and to 34 severe adverse events by mining the FDA's Adverse Event Reporting System (AERS); and construct a co-authorship network that connects Mount Sinai School of Medicine investigators. The predicted networks and online software to create networks from entity-set libraries are provided online at http://www.maayanlab.net/S2N.

Conclusions
As empirical data about sets of related entities accrues, there are more constraints on possible network realizations that can fit the data; in the language of statistical mechanics, the size of the microstate ensemble shrinks, until the underlying network resolves. The network inference method presented here can be applied to resolve different types of networks in current systems biology and systems pharmacology as well as in other fields of research.
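
The ensemble idea described in the abstract can be illustrated on a toy scale. The sketch below is not the Sets2Networks implementation: instead of sampling exponential random graphs, it enumerates every possible network on four nodes by brute force. It also assumes, purely for illustration, that a network is "consistent" with an observed set when the set's members form a connected subgraph. The function names and the example sets are hypothetical.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Return True if `nodes` form a connected subgraph using only `edges` among them."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            # Follow an edge only if its other endpoint is also in the observed set.
            if a == u and b in nodes and b not in seen:
                seen.add(b)
                stack.append(b)
            elif b == u and a in nodes and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen == nodes

def link_frequencies(nodes, observed_sets):
    """Enumerate all networks consistent with the sets; return (link frequency, ensemble size)."""
    possible = list(combinations(sorted(nodes), 2))
    counts, ensemble_size = {}, 0
    for mask in range(1 << len(possible)):
        edges = [e for i, e in enumerate(possible) if (mask >> i) & 1]
        # Assumed consistency rule: every observed set must be connected.
        if all(is_connected(s, edges) for s in observed_sets):
            ensemble_size += 1
            for e in edges:
                counts[e] = counts.get(e, 0) + 1
    return {e: c / ensemble_size for e, c in counts.items()}, ensemble_size

# One observed set, then two: the second observation shrinks the ensemble.
freq_one, size_one = link_frequencies("ABCD", [{"A", "B"}])
freq_two, size_two = link_frequencies("ABCD", [{"A", "B"}, {"B", "C", "D"}])
print(size_one, size_two)
print(freq_two[("A", "B")], freq_two[("B", "C")])
```

Under these assumptions, adding the second set halves the number of consistent networks, which mirrors the statement that the ensemble shrinks as observations accrue. Brute-force enumeration is only feasible for a handful of nodes, which is why a sampling approach is needed in practice.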

Presentations and a Poster

PowerPoint slides describing the project, presented by Dr. Neil R. Clark at the SBBQ International Conference at Iguassu, Brazil on 5/23/2012

PowerPoint slides describing the project, presented by Professor Avi Ma'ayan at the National Systems Biology Centers' Annual Meeting in Chicago, USA on 7/20/2012

Poster describing the project, presented by Professor Avi Ma'ayan at the National Systems Biology Centers' Annual Meeting in Chicago, USA on 7/20/2012

Workflow of the Algorithm Applied to a Synthetic Network

Original Network

Random Walks (Gene Sets)

Inferred Network

The synthetic network is first converted into gene sets by following a series of random walks. Running Sets2Networks on these gene sets then yields an inferred network that closely resembles the synthetic network.

Download the original synthetic network or the gene sets.
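
The conversion of a network into sets by random walks can be sketched as follows. This is a minimal illustration, not the script used to produce the downloadable files: the toy adjacency list, the number of walks, the walk length, and the function name `random_walk_sets` are all assumptions chosen for the example.

```python
import random

def random_walk_sets(adjacency, n_walks, walk_length, seed=0):
    """Generate one set of visited nodes per random walk on the network."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    nodes = sorted(adjacency)
    sets = []
    for _ in range(n_walks):
        current = rng.choice(nodes)  # start each walk at a random node
        visited = {current}
        for _ in range(walk_length):
            neighbors = adjacency[current]
            if not neighbors:
                break  # isolated node: the walk cannot continue
            current = rng.choice(neighbors)
            visited.add(current)
        sets.append(visited)
    return sets

# Hypothetical synthetic network: a simple path A - B - C - D.
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

walk_sets = random_walk_sets(adjacency, n_walks=5, walk_length=3)
for s in walk_sets:
    print(sorted(s))
```

Each printed set plays the role of one "gene set". Because a walk only moves along edges, the members of every set are connected in the original network, which is the indirect evidence the inference step exploits.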

Applications of the Method

Protein-Protein Interactions

The highest-confidence inferred protein-protein interactions (PPIs). White edges are confirmed interactions and dark edges are predictions. Edge weight corresponds to the probability of the prediction.

Click on the image to view an interactive version of the network.
Download the interactions.

Drug and Side Effect Interactions

Inferred interactions between side effects and drugs using data from the FDA's Adverse Event Reporting System (AERS). Light brown square nodes are side effects and dark brown circle nodes are drugs. Edges between side effects are colored red, edges between drugs are colored white, and side effect-drug interactions are black.

Click on the image to view an interactive version of the network.
Download the interactions.

Mount Sinai Co-Authorship

Inferred interactions between researchers at Mount Sinai School of Medicine using co-authorship data from PubMed. Only the 5,000 most recent PubMed articles affiliated with Mount Sinai School of Medicine were used as input. Predicted edges with scores higher than 0.67 were preserved in the network, giving a sparse snapshot of collaborations.

Click on the image to view an interactive version of the network.
Download the interactions.
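
The score cutoff described above amounts to a simple filter over scored edges. The sketch below assumes each predicted edge is a (node, node, score) tuple; that layout, the placeholder investigator names, and the function name `keep_confident_edges` are hypothetical and do not reflect the format of the downloadable file.

```python
def keep_confident_edges(scored_edges, threshold=0.67):
    """Keep only edges whose score is strictly higher than the threshold."""
    return [(a, b, score) for a, b, score in scored_edges if score > threshold]

# Hypothetical predictions; the names and scores are placeholders.
scored_edges = [
    ("Investigator 1", "Investigator 2", 0.91),
    ("Investigator 1", "Investigator 3", 0.67),
    ("Investigator 2", "Investigator 3", 0.12),
]

sparse_network = keep_confident_edges(scored_edges)
print(sparse_network)
```

The comparison is strict to match the wording "higher than 0.67", so an edge scoring exactly 0.67 is dropped.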

Stem Cell Networks

ChIP-X

LoGoF

Consensus

The ChIP-X network is made of the highest-confidence interactions inferred from stem cell ChIP-chip and ChIP-seq experiments. The stem cell data covers 203,192 protein-DNA binding interactions in proximity to the coding regions of 48 ESC transcriptional regulators. Similarly, the LoGoF network is derived from 153,920 stem cell protein-mRNA interactions extracted from loss-of-function and gain-of-function studies followed by microarray profiling. The consensus network is inferred from a combination of these two networks.

Click on the images to view an interactive version of the network.
Download the ChIP-X, LoGoF, and consensus interactions.
The original stem cell data can be found in the ESCAPE database.

Predicting New Interactions for Protein Complexes in CORUM

We applied the S2N algorithm to predict new protein-protein interactions for 50 CORUM complexes. The higher the confidence of the prediction, the lighter the color in the left heatmap. The heatmap on the right shows known PPIs in the background.

20S Methylosome	20S Proteasome	26S Proteasome	Anti-BHC110	Anti-Sm
Anti-SMN	ARC-L	BARD1-BRCA1-CSTF	BHC110	BLM
BRAF53-BRCA2	BRD4	Brg1-associated	BRG1-SIN3A	BRG1-SIN3A-HDAC
BRM-SIN3A	CDCA5-PDS5A-RAD21-SMC1A-PDS5B-SMC3	CoREST-HDAC	CtBP core	CtBP
EIF	EIF3	EIF3-2	EIF3S	HES1
Integrator	Integrator-RNAPII	LARC	LSD1	MCM
Mediator	NCOR	NCOR2	NCOR-HDAC3	NCOR-SIN3-HDAC-HESX1
NRD	p300-CBP-p270-SWI	PA28-20S Proteasome	PA28gamma-20S Proteasome	PC2
Polyadenylation	PTIP-DNA Damage Repair	SMN	SMRT	Sororin-cohesin
TFIIH	TFIIH 2	TFIIH 3	TFIIH core	TFTC