SNAP : Stanford Network Analysis Platform
Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. SNAP is written in C++ and it scales to massive graphs with hundreds of millions of nodes and billions of edges.
Compiling on Mac OSX
To compile on Mac OSX, using Xcode:
 From the Toolbar, select Scheme (e.g. ‘bigclam’).
 Product > Build. (or Cmd + B).

Run executable via the command line; or Choose the scheme’s executable (Product > Edit Scheme > Run > Info) and run: Product > Run (or Cmd + R). Note: If using Gnuplot, add the PATH to the scheme’s environment variables. or create symlink to /usr/bin:
sudo ln s <gnuplot_dir>/gnuplot /usr/bin/
For code completion, the “docs” target has been created which includes all Snaprelated files and example programs.
SNAP tutorials
Sample programs demonstrating the use of foundational SNAP classes and functionality are available in the tutorials directory.
To compile all the tutorials, execute the following command line:
cd tutorials; make all # generates all the tutorials
SNAP unit tests
Unit tests are available in the test directory.
To run unit tests, install googletest (code.google.com/p/googletest) and execute:
cd test; make run # compiles and runs all the tests
SNAP Package Description
SNAP Directory Structure
The SNAP package contains the following directories:
 snapcore:the core SNAP graph library;
 snapadv:advanced SNAP components, not in the core, but used by examples;
 snapexp:experimental SNAP components, still in development;
 examples:small sample applications that demonstrate SNAP functionality;
 tutorials:simple programs, demonstrating use of various classes;
 glibcore: STLlike library that implements basic data structures, like vectors (TVec), hashtables (THash) and strings (TStr), provides serialization and so on;
 test:unit tests for various classes;
 doxygen:SNAP reference manuals.
Example Applications
 agmfit: Detects network communities from a given network by fitting the Affiliation Graph Model (AGM) to the given network by maximum likelihood estimation.
 agmgen: Implements the Affiliation Graph Model (AGM). AGM generates a realistic looking graph from the community affiliation of the nodes.
 bigclam: Formulates community detection problems into nonnegative matrix factorization and discovers community membership factors of nodes.
 cascades: Simulates a SI (susceptibleinfected) model on a network and computes structural properties of cascades.
 centrality: Computes node centrality measures for a graph: closeness, eigen, degree, betweenness, page rank, hubs and authorities.
 cliques: Finds overlapping dense groups of nodes in networks, based on the Clique Percolation Method.
 community: Implements network community detection algorithms: GirvanNewman and ClausetNewmanMoore.
 concomp: Computes weakly, strongly and biconnected connected components, articulation points and bridge edges of a graph.
 forestfire: Generates graphs using the Forest Fire model.
 graphgen: Generates undirected graphs using one of the many SNAP graph generators.
 graphhash: Demonstrates the use of TGHash graph hash table, useful for counting frequencies of small subgraphs or information cascades.
 infopath: Implements stochastic algorithm for dynamic network inference from cascade data, see Structure and Dynamics of Information Pathways in OnLine Media
 kcores: Computes the kcore decomposition of the network and plots the number of nodes in a kcore of a graph as a function of k.
 kronem: Estimates Kronecker graph parameter matrix using EM algorithm.
 kronfit: Estimates Kronecker graph parameter matrix.
 krongen: Generates Kronecker graphs.
 magfit: Estimates Multiplicative Attribute Graph (MAG) model parameter.
 maggen: Generates Multiplicative Attribute Graphs (MAG).
 mkdatasets: Demonstrates how to load different kinds of networks in various network formats and how to compute various statistics of the network.
 motifs: Counts the number of occurence of every possible subgraph on K nodes in the network.
 ncpplot: Plots the Network Community Profile (NCP).
 netevol: Computes properties of an evolving network, like evolution of diameter, densification power law, degree distribution, etc.
 netinf: Implements netinf algorithm for network inference from cascade data, see Inferring Networks of Diffusion and Influence
 netstat: Computes structural properties of a static network, like degree distribution, hop plot, clustering coefficient, distribution of sizes of connected components, spectral properties of graph adjacency matrix, etc.
 testgraph: Demonstrates some of the basic SNAP functionality.
Functionality of the Core SNAP Library
Brief description of SNAP functionality implemented in various files in the distribution package:
snapcore  

alg.h  Simple algorithms like counting node degrees, simple graph manipulation (adding/deleting self edges, deleting isolated nodes) and testing whether graph is a tree or a star. 
anf.h  Approximate Neighborhood Function: linear time algorithm to approximately calculate the diameter of massive graphs. 
bfsdfs.h  Algorithms based on Breath First Search (BFS) and Depth First Search (DFS): shortest paths, spanning trees, graph diameter, and similar. 
bignet.h  Memory efficient implementation of a network with data on nodes. Use when working with very large networks. 
centr.h  Node centrality measures: closeness, betweenness, PageRank, ... 
cmty.h  Algorithms for network community detection: Modularity, GirvanNewman, ClausetNewmanMoore. 
cncom.h  Connected components: weakly, strongly and biconnected components, articular nodes and bridge edges. 
ff.h  Forest Fire model for generating networks that densify and have shrinking diameters. 
gbase.h  Defines flags that are used to identify functionality of graphs. 
ggen.h  Various graph generators: random graphs, copying model, preferential attachment, RMAT, configuration model, Small world model. 
ghash.h  Hash table with directed graphs (TNGraph) as keys. Uses efficient adaptive approximate graph isomorphism testing to scale to large graphs. Useful when one wants to count frequencies of various small subgraphs or cascades. 
gio.h  Graph input output. Methods for loading and saving various textual and XML based graph formats: Pajek, ORA, DynNet, GraphML (GML), Matlab. 
graph.h  Implements graph types TUNGraph, TNGraph and TNEGraph. 
gstat.h  Computes many structural properties of static and evolving networks. 
gsvd.h  Eigen and singular value decomposition of graph adjacency matrix. 
gviz.h  Interface to Graphviz. 
kcore.h  Kcore decomposition of networks. 
network.h  Implements network types TNodeNet, TNodeEDatNet and TNodeEdgeNet. 
Snap.h  Main include file of the library. 
statplot.h  Plots of various structural network properties: clustering, degrees, diameter, spectrum, connected components. 
subgraph.h  Extracting subgraphs and converting between different graph/network types. 
timenet.h  Temporally evolving networks. 
triad.h  Functions for counting triads (triples of connected nodes in the network) and computing clustering coefficient. 
util.h  Utilities to manipulate PDFs, CDFs and CCDFs. Quick and dirty string manipulation, URL and domain manipulation routines. 
snapadv  
agm.h  Implements the Affiliation Graph Model (AGM). 
cliques.h  Maximal clique detection and Clique Percolation method. 
graphcounter.h  Performs fast graph isomorphism testing to count the frequency of topologically distinct subgraphs. 
kronecker.h  Kronecker Graph generator and KronFit algorithm for estimating parameters of Kronecker graphs. 
mag.h  Implements the Multiplicative Attribute Graph (MAG). 
ncp.h  Network community profile plot. Implements local spectral graph partitioning method to efficiently find communities in networks. 
subgraphenum.h  Subgraph enumeration and network motif computations. 
snapexp  
arxiv.h  Functions for parsing Arxiv data and standardizing author names. 
dblp.h  Parser for XML dump of DBLP data. 
imdbnet.h  Actorstomovies bipartite network of IMDB. 
mxdag.h  Finds the maximum directedacyclic subgraph of a given directed graph. 
signnet.h  Networks with signed (+1, 1) edges that can denote trust/distrust between the nodes of the network. 
sir.h  SIR epidemic model and SIR parameter estimation. 
spinn3r.h  Past parser for loading blog post data from Spinn3r. 
trawling.h  Algorithm of extracting bipartite cliques from the network. 
wgtnet.h  Weighted networks. 
wikinet.h  Networks based on Wikipedia. 
Stanford Large Network Dataset Collection
 Social networks : online social networks, edges represent interactions between people
 Networks with groundtruth communities : groundtruth network communities in social and information networks
 Communication networks : email communication networks with edges representing communication
 Citation networks : nodes represent papers, edges represent citations
 Collaboration networks : nodes represent scientists, edges represent collaborations (coauthoring a paper)
 Web graphs : nodes represent webpages and edges are hyperlinks
 Amazon networks : nodes represent products and edges link commonly copurchased products
 Internet networks : nodes represent computers and edges communication
 Road networks : nodes represent intersections and edges roads connecting the intersections
 Autonomous systems : graphs of the internet
 Signed networks : networks with positive and negative edges (friend/foe, trust/distrust)
 Locationbased online social networks : Social networks with geographic checkins
 Wikipedia networks and metadata : Talk, editing and voting data from Wikipedia
 Twitter and Memetracker : Memetracker phrases, links and 467 million Tweets
 Online communities : Data from online communities such as Reddit and Flickr
 Online reviews : Data from online review systems such as BeerAdvocate and Amazon
SNAP networks are also availalbe from UF Sparse Matrix collection. Visualizations of SNAP networks by Tim Davis.
Name  Type  Nodes  Edges  Description 

egoFacebook  Undirected  4,039  88,234  Social circles from Facebook (anonymized) 
egoGplus  Directed  107,614  13,673,453  Social circles from Google+ 
egoTwitter  Directed  81,306  1,768,149  Social circles from Twitter 
socEpinions1  Directed  75,879  508,837  Whotrustswhom network of Epinions.com 
socLiveJournal1  Directed  4,847,571  68,993,773  LiveJournal online social network 
socPokec  Directed  1,632,803  30,622,564  Pokec online social network 
socSlashdot0811  Directed  77,360  905,468  Slashdot social network from November 2008 
socSlashdot0922  Directed  82,168  948,464  Slashdot social network from February 2009 
wikiVote  Directed  7,115  103,689  Wikipedia whovotesonwhom network 
<h3>Networks with groundtruth communities</h3>
Name  Type  Nodes  Edges  Communities  Description 

comLiveJournal  Undirected, Communities  3,997,962  34,681,189  287,512  LiveJournal online social network 
comFriendster  Undirected, Communities  65,608,366  1,806,067,135  957,154  Friendster online social network 
comOrkut  Undirected, Communities  3,072,441  117,185,083  6,288,363  Orkut online social network 
comYoutube  Undirected, Communities  1,134,890  2,987,624  8,385  Youtube online social network 
comDBLP  Undirected, Communities  317,080  1,049,866  13,477  DBLP collaboration network 
comAmazon  Undirected, Communities  334,863  925,872  151,037  Amazon product network 
<h3>Communication networks</h3>
Name  Type  Nodes  Edges  Description 

emailEuAll  Directed  265,214  420,045  Email network from a EU research institution 
emailEnron  Undirected  36,692  367,662  Email communication network from Enron 
wikiTalk  Directed  2,394,385  5,021,410  Wikipedia talk (communication) network 
Name  Type  Nodes  Edges  Description 

citHepPh  Directed, Temporal, Labeled  34,546  421,578  Arxiv High Energy Physics paper citation network 
citHepTh  Directed, Temporal, Labeled  27,770  352,807  Arxiv High Energy Physics paper citation network 
citPatents  Directed, Temporal, Labeled  3,774,768  16,518,948  Citation network among US Patents 
<h3>Collaboration networks</h3>
Name  Type  Nodes  Edges  Description 

caAstroPh  Undirected  18,772  396,160  Collaboration network of Arxiv Astro Physics 
caCondMat  Undirected  23,133  186,936  Collaboration network of Arxiv Condensed Matter 
caGrQc  Undirected  5,242  28,980  Collaboration network of Arxiv General Relativity 
caHepPh  Undirected  12,008  237,010  Collaboration network of Arxiv High Energy Physics 
caHepTh  Undirected  9,877  51,971  Collaboration network of Arxiv High Energy Physics Theory 
Name  Type  Nodes  Edges  Description 

webBerkStan  Directed  685,230  7,600,595  Web graph of Berkeley and Stanford 
webGoogle  Directed  875,713  5,105,039  Web graph from Google 
webNotreDame  Directed  325,729  1,497,134  Web graph of Notre Dame 
webStanford  Directed  281,903  2,312,497  Web graph of Stanford.edu 
<h3>Product copurchasing networks</h3>
Name  Type  Nodes  Edges  Description 

amazon0302  Directed  262,111  1,234,877  Amazon product copurchasing network from March 2 2003 
amazon0312  Directed  400,727  3,200,440  Amazon product copurchasing network from March 12 2003 
amazon0505  Directed  410,236  3,356,824  Amazon product copurchasing network from May 5 2003 
amazon0601  Directed  403,394  3,387,388  Amazon product copurchasing network from June 1 2003 
amazonmeta  Metadata  548,552  1,788,725  Amazon product metadata: product info and all reviews on around 548,552 products. 
<h3>Internet peertopeer networks</h3>
Name  Type  Nodes  Edges  Description 

p2pGnutella04  Directed  10,876  39,994  Gnutella peer to peer network from August 4 2002 
p2pGnutella05  Directed  8,846  31,839  Gnutella peer to peer network from August 5 2002 
p2pGnutella06  Directed  8,717  31,525  Gnutella peer to peer network from August 6 2002 
p2pGnutella08  Directed  6,301  20,777  Gnutella peer to peer network from August 8 2002 
p2pGnutella09  Directed  8,114  26,013  Gnutella peer to peer network from August 9 2002 
p2pGnutella24  Directed  26,518  65,369  Gnutella peer to peer network from August 24 2002 
p2pGnutella25  Directed  22,687  54,705  Gnutella peer to peer network from August 25 2002 
p2pGnutella30  Directed  36,682  88,328  Gnutella peer to peer network from August 30 2002 
p2pGnutella31  Directed  62,586  147,892  Gnutella peer to peer network from August 31 2002 
Name  Type  Nodes  Edges  Description 

roadNetCA  Undirected  1,965,206  5,533,214  Road network of California 
roadNetPA  Undirected  1,088,092  3,083,796  Road network of Pennsylvania 
roadNetTX  Undirected  1,379,917  3,843,320  Road network of Texas 
<h3>Autonomous systems graphs</h3>
Name  Type  Nodes  Edges  Description 

as733 (733 graphs) 
Undirected  1036,474  24313,233  733 daily instances(graphs) from November 8 1997 to January 2 2000 
asSkitter  Undirected  1,696,415  11,095,298  Internet topology graph, from traceroutes run daily in 2005 
asCaida (122 graphs) 
Directed  8,02026,475  36,406106,762  The CAIDA AS Relationships Datasets, from January 2004 to November 2007 
Oregon1 (9 graphs) 
Undirected  10,67011,174  22,00223,409  AS peering information inferred from Oregon routeviews between March 31 and May 26 2001 
Oregon2 (9 graphs) 
Undirected  10,90011,461  31,18032,730  AS peering information inferred from Oregon routeviews between March 31 and May 26 2001 
oregon020515  Undirected  13,579  37,448  3 different networks based on AS peering information inferred on May 15 2002 
Name  Type  Nodes  Edges  Description 

socsignepinions  Directed  131,828  841,372  Epinions signed social network 
wikiElec  Directed, Bipartite  ~7,000  ~100,000  Wikipedia adminship election data 
socsignSlashdot081106  Directed  77,357  516,575  Slashdot Zoo signed social network from November 6 2008 
socsignSlashdot090216  Directed  81,871  545,671  Slashdot Zoo signed social network from February 16 2009 
socsignSlashdot090221  Directed  82,144  549,202  Slashdot Zoo signed social network from February 21 2009 
<h3>Locationbased online social networks</h3>
Name  Type  Nodes  Edges  Description 

locGowalla  Undirected, GeoLocation  196,591  950,327  Gowalla location based online social network 
locBrightkite  Unirected, GeoLocation  58,228  214,078  Brightkite location based online social network 
<h3>Wikipedia networks and metadata</h3>
Name  Type  Nodes  Edges  Description 

wikiVote  Directed  7,115  103,689  Wikipedia whovotesonwhom network 
wikiTalk  Directed  2,394,385  5,021,410  Wikipedia talk (communication) network 
wikiElec  Bipartite  ~7,000  ~100,000  Wikipedia adminship election data 
wikimeta  Edits  2.3M users, 3.5M pages 
250M edits  Complete Wikipedia edit history (who edited what page) 
<h3>Memetracker and Twitter</h3>
Name  Type  Nodes  Edges  Description 

twitter7  Tweets  17,069,982 users  476,553,560 tweets  A collection of 476 million tweets collected between JuneDec 2009 
memetracker9  Memes  96 million  418 million links  Memetracker phrases and hyperlinks between 96 million blog posts from Aug 2008 to Apr 2009 
ksctimeseries  Time Series 
2,000  418 million links  Time series of volume of 1,000 most popular Memetracker phrases and 1,000 most popular Twitter hashtags 
higgstwitter  Tweets  456,631  14,855,875  Spreading processes of the announcement of the discovery of a new particle with the features of the Higgs boson on 4th July 2012. 
Name  Type  Number of items  Description 

Reddit submissions  132,308 submissions  Resubmitted content on reddit.com  
flickr  Images  2,316,948 related images  Images sharing common metadata on Flickr 
Name  Type  Number of items  Description 

BeerAdvocate  Beer reviews  1,586,259 beer reviews  Beer reviews from BeerAdvocate 
RateBeer  Beer reviews  2,924,127 beer reviews  Beer reviews from RateBeer 
CellarTracker  Wine reviews  2,025,995 wine reviews  Wine reviews from CellarTracker 
Amazon reviews  Amazon reviews (all categories)  34,686,770 product reviews  Reviews from Amazon 
Fine Foods  Food reviews  568,454 food reviews  Food reviews from Amazon 
Movies  Movie reviews  7,911,684 movie reviews  Movie reviews from Amazon 
 Directed : directed network
 Undirected : undirected network
 Bipartite : bipartite network
 Multigraph : network has multiple edges between a pair of nodes
 Temporal : for each node/edge we know the time when it appeared in the network
 Labeled : network contains labels (weights, attributes) on nodes and/or edges
Dataset statistics  

Nodes  Number of nodes in the network 
Edges  Number of edges in the network 
Nodes in largest WCC  Number of nodes in the largest weakly connected component 
Edges in largest WCC  Number of edges in the largest weakly connected component 
Nodes in largest SCC  Number of nodes in the largest strongly connected component 
Edges in largest SCC  Number of edges in the largest strongly connected component 
Average clustering coefficient  Average clustering coefficient 
Number of triangles  Number of triples of connected nodes (considering the network as undirected) 
Fraction of closed triangles  Number of connected triples of nodes / number of (undirected) length 2 paths 
Diameter (longest shortest path)  Maximum undirected shortest path length (sampled over 1,000 random nodes) 
90percentile effective diameter  90th percentile of undirected shortest path length distribution (sampled over 1,000 random nodes) 
</div>
References
[1] SNAP Package Description, 2013.