Stop Thinking, Just Do!

Sungsoo Kim's Blog

Stanford Network Analysis Platform(SNAP)

tagsTags

22 March 2014


SNAP : Stanford Network Analysis Platform

Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. SNAP is written in C++ and it scales to massive graphs with hundreds of millions of nodes and billions of edges.

Compiling on Mac OSX

To compile on Mac OSX, using Xcode:

  1. From the Toolbar, select Scheme (e.g. ‘bigclam’).
  2. Product -> Build. (or Cmd + B).
  3. Run executable via the command line; or Choose the scheme’s executable (Product -> Edit Scheme -> Run -> Info) and run: Product -> Run (or Cmd + R). Note: If using Gnuplot, add the PATH to the scheme’s environment variables. or create symlink to /usr/bin:

      sudo ln -s <gnuplot_dir>/gnuplot /usr/bin/
    

    For code completion, the “docs” target has been created which includes all Snap-related files and example programs.

SNAP tutorials

Sample programs demonstrating the use of foundational SNAP classes and functionality are available in the tutorials directory.

To compile all the tutorials, execute the following command line:

  cd tutorials; make all    # generates all the tutorials

SNAP unit tests

Unit tests are available in the test directory.

To run unit tests, install googletest (code.google.com/p/googletest) and execute:

  cd test; make run    # compiles and runs all the tests

SNAP Package Description

SNAP Directory Structure

The SNAP package contains the following directories:

  • snap-core:the core SNAP graph library;
  • snap-adv:advanced SNAP components, not in the core, but used by examples;
  • snap-exp:experimental SNAP components, still in development;
  • examples:small sample applications that demonstrate SNAP functionality;
  • tutorials:simple programs, demonstrating use of various classes;
  • glib-core: STL-like library that implements basic data structures, like vectors (TVec), hash-tables (THash) and strings (TStr), provides serialization and so on;
  • test:unit tests for various classes;
  • doxygen:SNAP reference manuals.

Example Applications

  • agmfit: Detects network communities from a given network by fitting the Affiliation Graph Model (AGM) to the given network by maximum likelihood estimation.
  • agmgen: Implements the Affiliation Graph Model (AGM). AGM generates a realistic looking graph from the community affiliation of the nodes.
  • bigclam: Formulates community detection problems into non-negative matrix factorization and discovers community membership factors of nodes.
  • cascades: Simulates a SI (susceptible-infected) model on a network and computes structural properties of cascades.
  • centrality: Computes node centrality measures for a graph: closeness, eigen, degree, betweenness, page rank, hubs and authorities.
  • cliques: Finds overlapping dense groups of nodes in networks, based on the Clique Percolation Method.
  • community: Implements network community detection algorithms: Girvan-Newman and Clauset-Newman-Moore.
  • concomp: Computes weakly, strongly and biconnected connected components, articulation points and bridge edges of a graph.
  • forestfire: Generates graphs using the Forest Fire model.
  • graphgen: Generates undirected graphs using one of the many SNAP graph generators.
  • graphhash: Demonstrates the use of TGHash graph hash table, useful for counting frequencies of small subgraphs or information cascades.
  • infopath: Implements stochastic algorithm for dynamic network inference from cascade data, see Structure and Dynamics of Information Pathways in On-Line Media
  • kcores: Computes the k-core decomposition of the network and plots the number of nodes in a k-core of a graph as a function of k.
  • kronem: Estimates Kronecker graph parameter matrix using EM algorithm.
  • kronfit: Estimates Kronecker graph parameter matrix.
  • krongen: Generates Kronecker graphs.
  • magfit: Estimates Multiplicative Attribute Graph (MAG) model parameter.
  • maggen: Generates Multiplicative Attribute Graphs (MAG).
  • mkdatasets: Demonstrates how to load different kinds of networks in various network formats and how to compute various statistics of the network.
  • motifs: Counts the number of occurence of every possible subgraph on K nodes in the network.
  • ncpplot: Plots the Network Community Profile (NCP).
  • netevol: Computes properties of an evolving network, like evolution of diameter, densification power law, degree distribution, etc.
  • netinf: Implements netinf algorithm for network inference from cascade data, see Inferring Networks of Diffusion and Influence
  • netstat: Computes structural properties of a static network, like degree distribution, hop plot, clustering coefficient, distribution of sizes of connected components, spectral properties of graph adjacency matrix, etc.
  • testgraph: Demonstrates some of the basic SNAP functionality.

Functionality of the Core SNAP Library

Brief description of SNAP functionality implemented in various files in the distribution package:

snap-core
alg.hSimple algorithms like counting node degrees, simple graph manipulation (adding/deleting self edges, deleting isolated nodes) and testing whether graph is a tree or a star.
anf.hApproximate Neighborhood Function: linear time algorithm to approximately calculate the diameter of massive graphs.
bfsdfs.hAlgorithms based on Breath First Search (BFS) and Depth First Search (DFS): shortest paths, spanning trees, graph diameter, and similar.
bignet.hMemory efficient implementation of a network with data on nodes. Use when working with very large networks.
centr.hNode centrality measures: closeness, betweenness, PageRank, ...
cmty.hAlgorithms for network community detection: Modularity, Girvan-Newman, Clauset-Newman-Moore.
cncom.hConnected components: weakly, strongly and biconnected components, articular nodes and bridge edges.
ff.hForest Fire model for generating networks that densify and have shrinking diameters.
gbase.hDefines flags that are used to identify functionality of graphs.
ggen.hVarious graph generators: random graphs, copying model, preferential attachment, RMAT, configuration model, Small world model.
ghash.hHash table with directed graphs (TNGraph) as keys. Uses efficient adaptive approximate graph isomorphism testing to scale to large graphs. Useful when one wants to count frequencies of various small subgraphs or cascades.
gio.hGraph input output. Methods for loading and saving various textual and XML based graph formats: Pajek, ORA, DynNet, GraphML (GML), Matlab.
graph.hImplements graph types TUNGraph, TNGraph and TNEGraph.
gstat.hComputes many structural properties of static and evolving networks.
gsvd.hEigen and singular value decomposition of graph adjacency matrix.
gviz.hInterface to Graphviz.
kcore.hK-core decomposition of networks.
network.hImplements network types TNodeNet, TNodeEDatNet and TNodeEdgeNet.
Snap.hMain include file of the library.
statplot.hPlots of various structural network properties: clustering, degrees, diameter, spectrum, connected components.
subgraph.hExtracting subgraphs and converting between different graph/network types.
timenet.hTemporally evolving networks.
triad.h Functions for counting triads (triples of connected nodes in the network) and computing clustering coefficient.
util.hUtilities to manipulate PDFs, CDFs and CCDFs. Quick and dirty string manipulation, URL and domain manipulation routines.
snap-adv
agm.hImplements the Affiliation Graph Model (AGM).
cliques.hMaximal clique detection and Clique Percolation method.
graphcounter.hPerforms fast graph isomorphism testing to count the frequency of topologically distinct sub-graphs.
kronecker.hKronecker Graph generator and KronFit algorithm for estimating parameters of Kronecker graphs.
mag.hImplements the Multiplicative Attribute Graph (MAG).
ncp.hNetwork community profile plot. Implements local spectral graph partitioning method to efficiently find communities in networks.
subgraphenum.hSub-graph enumeration and network motif computations.
snap-exp
arxiv.hFunctions for parsing Arxiv data and standardizing author names.
dblp.hParser for XML dump of DBLP data.
imdbnet.hActors-to-movies bipartite network of IMDB.
mxdag.hFinds the maximum directed-acyclic subgraph of a given directed graph.
signnet.hNetworks with signed (+1, -1) edges that can denote trust/distrust between the nodes of the network.
sir.h SIR epidemic model and SIR parameter estimation.
spinn3r.hPast parser for loading blog post data from Spinn3r.
trawling.hAlgorithm of extracting bipartite cliques from the network.
wgtnet.hWeighted networks.
wikinet.hNetworks based on Wikipedia.

Stanford Large Network Dataset Collection

SNAP networks are also availalbe from UF Sparse Matrix collection. Visualizations of SNAP networks by Tim Davis.

<h3>Social networks</h3>

Name Type Nodes Edges Description
ego-Facebook Undirected 4,039 88,234 Social circles from Facebook (anonymized)
ego-Gplus Directed 107,614 13,673,453 Social circles from Google+
ego-Twitter Directed 81,306 1,768,149 Social circles from Twitter
soc-Epinions1 Directed 75,879 508,837 Who-trusts-whom network of Epinions.com
soc-LiveJournal1 Directed 4,847,571 68,993,773 LiveJournal online social network
soc-Pokec Directed 1,632,803 30,622,564 Pokec online social network
soc-Slashdot0811 Directed 77,360 905,468 Slashdot social network from November 2008
soc-Slashdot0922 Directed 82,168 948,464 Slashdot social network from February 2009
wiki-Vote Directed 7,115 103,689 Wikipedia who-votes-on-whom network

<h3>Networks with ground-truth communities</h3>

Name Type Nodes Edges Communities Description
com-LiveJournal Undirected, Communities 3,997,962 34,681,189 287,512 LiveJournal online social network
com-Friendster Undirected, Communities 65,608,366 1,806,067,135 957,154 Friendster online social network
com-Orkut Undirected, Communities 3,072,441 117,185,083 6,288,363 Orkut online social network
com-Youtube Undirected, Communities 1,134,890 2,987,624 8,385 Youtube online social network
com-DBLP Undirected, Communities 317,080 1,049,866 13,477 DBLP collaboration network
com-Amazon Undirected, Communities 334,863 925,872 151,037 Amazon product network

<h3>Communication networks</h3>

Name Type Nodes Edges Description
email-EuAll Directed 265,214 420,045 Email network from a EU research institution
email-Enron Undirected 36,692 367,662 Email communication network from Enron
wiki-Talk Directed 2,394,385 5,021,410 Wikipedia talk (communication) network

<h3>Citation networks</h3>

Name Type Nodes Edges Description
cit-HepPh Directed, Temporal, Labeled 34,546 421,578 Arxiv High Energy Physics paper citation network
cit-HepTh Directed, Temporal, Labeled 27,770 352,807 Arxiv High Energy Physics paper citation network
cit-Patents Directed, Temporal, Labeled 3,774,768 16,518,948 Citation network among US Patents

<h3>Collaboration networks</h3>

Name Type Nodes Edges Description
ca-AstroPh Undirected 18,772 396,160 Collaboration network of Arxiv Astro Physics
ca-CondMat Undirected 23,133 186,936 Collaboration network of Arxiv Condensed Matter
ca-GrQc Undirected 5,242 28,980 Collaboration network of Arxiv General Relativity
ca-HepPh Undirected 12,008 237,010 Collaboration network of Arxiv High Energy Physics
ca-HepTh Undirected 9,877 51,971 Collaboration network of Arxiv High Energy Physics Theory

<h3>Web graphs</h3>

Name Type Nodes Edges Description
web-BerkStan Directed 685,230 7,600,595 Web graph of Berkeley and Stanford
web-Google Directed 875,713 5,105,039 Web graph from Google
web-NotreDame Directed 325,729 1,497,134 Web graph of Notre Dame
web-Stanford Directed 281,903 2,312,497 Web graph of Stanford.edu

<h3>Product co-purchasing networks</h3>

Name Type Nodes Edges Description
amazon0302 Directed 262,111 1,234,877 Amazon product co-purchasing network from March 2 2003
amazon0312 Directed 400,727 3,200,440 Amazon product co-purchasing network from March 12 2003
amazon0505 Directed 410,236 3,356,824 Amazon product co-purchasing network from May 5 2003
amazon0601 Directed 403,394 3,387,388 Amazon product co-purchasing network from June 1 2003
amazon-meta Metadata 548,552 1,788,725 Amazon product metadata: product info and all reviews on around 548,552 products.

<h3>Internet peer-to-peer networks</h3>

Name Type Nodes Edges Description
p2p-Gnutella04 Directed 10,876 39,994 Gnutella peer to peer network from August 4 2002
p2p-Gnutella05 Directed 8,846 31,839 Gnutella peer to peer network from August 5 2002
p2p-Gnutella06 Directed 8,717 31,525 Gnutella peer to peer network from August 6 2002
p2p-Gnutella08 Directed 6,301 20,777 Gnutella peer to peer network from August 8 2002
p2p-Gnutella09 Directed 8,114 26,013 Gnutella peer to peer network from August 9 2002
p2p-Gnutella24 Directed 26,518 65,369 Gnutella peer to peer network from August 24 2002
p2p-Gnutella25 Directed 22,687 54,705 Gnutella peer to peer network from August 25 2002
p2p-Gnutella30 Directed 36,682 88,328 Gnutella peer to peer network from August 30 2002
p2p-Gnutella31 Directed 62,586 147,892 Gnutella peer to peer network from August 31 2002

<h3>Road networks</h3>

Name Type Nodes Edges Description
roadNet-CA Undirected 1,965,206 5,533,214 Road network of California
roadNet-PA Undirected 1,088,092 3,083,796 Road network of Pennsylvania
roadNet-TX Undirected 1,379,917 3,843,320 Road network of Texas

<h3>Autonomous systems graphs</h3>

Name Type Nodes Edges Description
as-733
(733 graphs)
Undirected 103-6,474 243-13,233 733 daily instances(graphs) from November 8 1997 to January 2 2000
as-Skitter Undirected 1,696,415 11,095,298 Internet topology graph, from traceroutes run daily in 2005
as-Caida
(122 graphs)
Directed 8,020-26,475 36,406-106,762 The CAIDA AS Relationships Datasets, from January 2004 to November 2007
Oregon-1
(9 graphs)
Undirected 10,670-11,174 22,002-23,409 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
Oregon-2
(9 graphs)
Undirected 10,900-11,461 31,180-32,730 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
oregon020515 Undirected 13,579 37,448 3 different networks based on AS peering information inferred on May 15 2002

<h3>Signed networks</h3>

Name Type Nodes Edges Description
soc-sign-epinions Directed 131,828 841,372 Epinions signed social network
wiki-Elec Directed, Bipartite ~7,000 ~100,000 Wikipedia adminship election data
soc-sign-Slashdot081106 Directed 77,357 516,575 Slashdot Zoo signed social network from November 6 2008
soc-sign-Slashdot090216 Directed 81,871 545,671 Slashdot Zoo signed social network from February 16 2009
soc-sign-Slashdot090221 Directed 82,144 549,202 Slashdot Zoo signed social network from February 21 2009

<h3>Location-based online social networks</h3>

Name Type Nodes Edges Description
loc-Gowalla Undirected, Geo-Location 196,591 950,327 Gowalla location based online social network
loc-Brightkite Unirected, Geo-Location 58,228 214,078 Brightkite location based online social network

<h3>Wikipedia networks and metadata</h3>

Name Type Nodes Edges Description
wiki-Vote Directed 7,115 103,689 Wikipedia who-votes-on-whom network
wiki-Talk Directed 2,394,385 5,021,410 Wikipedia talk (communication) network
wiki-Elec Bipartite ~7,000 ~100,000 Wikipedia adminship election data
wiki-meta Edits 2.3M users,
3.5M pages
250M edits Complete Wikipedia edit history (who edited what page)

<h3>Memetracker and Twitter</h3>

Name Type Nodes Edges Description
twitter7 Tweets 17,069,982 users 476,553,560 tweets A collection of 476 million tweets collected between June-Dec 2009
memetracker9 Memes 96 million 418 million links Memetracker phrases and hyperlinks between 96 million blog posts from Aug 2008 to Apr 2009
ksc-time-series Time
Series
2,000 418 million links Time series of volume of 1,000 most popular Memetracker phrases and 1,000 most popular Twitter hashtags
higgs-twitter Tweets 456,631 14,855,875 Spreading processes of the announcement of the discovery of a new particle with the features of the Higgs boson on 4th July 2012.

<h3>Online Communities</h3>

Name Type Number of items Description
Reddit Reddit submissions 132,308 submissions Resubmitted content on reddit.com
flickr Images 2,316,948 related images Images sharing common metadata on Flickr

<h3>Online Reviews</h3>

Name Type Number of items Description
BeerAdvocate Beer reviews 1,586,259 beer reviews Beer reviews from BeerAdvocate
RateBeer Beer reviews 2,924,127 beer reviews Beer reviews from RateBeer
CellarTracker Wine reviews 2,025,995 wine reviews Wine reviews from CellarTracker
Amazon reviews Amazon reviews (all categories) 34,686,770 product reviews Reviews from Amazon
Fine Foods Food reviews 568,454 food reviews Food reviews from Amazon
Movies Movie reviews 7,911,684 movie reviews Movie reviews from Amazon


<h2>Network types</h2>

  • Directed : directed network
  • Undirected : undirected network
  • Bipartite : bipartite network
  • Multigraph : network has multiple edges between a pair of nodes
  • Temporal : for each node/edge we know the time when it appeared in the network
  • Labeled : network contains labels (weights, attributes) on nodes and/or edges

<h2>Network statistics</h2>

Dataset statistics
Nodes Number of nodes in the network
Edges Number of edges in the network
Nodes in largest WCC Number of nodes in the largest weakly connected component
Edges in largest WCC Number of edges in the largest weakly connected component
Nodes in largest SCC Number of nodes in the largest strongly connected component
Edges in largest SCC Number of edges in the largest strongly connected component
Average clustering coefficient Average clustering coefficient
Number of triangles Number of triples of connected nodes (considering the network as undirected)
Fraction of closed triangles Number of connected triples of nodes / number of (undirected) length 2 paths
Diameter (longest shortest path) Maximum undirected shortest path length (sampled over 1,000 random nodes)
90-percentile effective diameter 90-th percentile of undirected shortest path length distribution (sampled over 1,000 random nodes)

</div>

References

[1] SNAP Package Description, 2013.


comments powered by Disqus