You are given a large collection of gene sets with no annotations. Your goal is to construct a network from these gene sets to predict protein-protein interactions (PPI). The algorithm you develop must be self-contained: the executable program may not use any prior knowledge of known PPI. The program must be a UNIX-based command-line executable that takes the file containing the gene sets as input and outputs the top 5000 predicted PPI. You are provided with a network of known PPI to test your program, but we will use other networks to benchmark the submissions. Your code must be made open source once the task deadline has passed, and documentation of its inner workings must be included with your submission.
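
To make the expected shape of a submission concrete, here is a minimal sketch in Python of one possible baseline, not a prescribed method. It scores each gene pair by the Jaccard similarity of the sets the two genes appear in and prints the highest-scoring pairs. Several things are assumptions for illustration only: the input format (one gene set per line, gene identifiers separated by whitespace), the output format (tab-separated pair plus score), the script name `predict_ppi.py`, and the function names `read_gene_sets` and `rank_pairs`. It uses only the gene sets themselves, so no known PPI enter the program.

```python
import sys
from collections import Counter
from itertools import combinations


def read_gene_sets(lines):
    """Parse one gene set per line; genes separated by whitespace (assumed format)."""
    gene_sets = []
    for line in lines:
        genes = set(line.split())
        if len(genes) >= 2:  # a set with fewer than two genes yields no pairs
            gene_sets.append(genes)
    return gene_sets


def rank_pairs(gene_sets, top_n=5000):
    """Rank gene pairs by Jaccard similarity of their gene-set memberships."""
    membership = Counter()  # gene -> number of sets containing it
    together = Counter()    # (gene_a, gene_b) -> number of sets containing both
    for genes in gene_sets:
        membership.update(genes)
        together.update(combinations(sorted(genes), 2))
    scored = []
    for (a, b), both in together.items():
        union = membership[a] + membership[b] - both
        scored.append((both / union, a, b))
    # Highest score first; ties broken alphabetically for reproducible output.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:top_n]


def main(path):
    with open(path) as handle:
        gene_sets = read_gene_sets(handle)
    for score, a, b in rank_pairs(gene_sets):
        print(f"{a}\t{b}\t{score:.4f}")


# Run only when exactly one argument (the gene-set file) is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Under these assumptions the script would be invoked as `python predict_ppi.py gene_sets.txt > predictions.txt`, where both file names are placeholders. Enumerating all pairs within a set is quadratic in the set size, so very large gene sets may need to be filtered or down-weighted before this approach is practical.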


