Stop Thinking, Just Do!

Sung-Soo Kim's Blog



9 December 2014

Article Source


The goal of DryadLINQ is to make distributed computing on large compute cluster simple enough for every programmer. DryadLINQ combines two important pieces of Microsoft technology: the Dryaddistributed execution engine and the .NET Language Integrated Query (LINQ).

Dryad provides reliable, distributed computing on thousands of servers for large-scale data parallel applications. LINQ enables developers to write and debug their applications in a SQL-like query language, relying on the entire .NET library and using Visual Studio.  

DryadLINQ translates LINQ programs into distributed Dryad computations:

  • C# and LINQ data objects become distributed partitioned files.
  • LINQ queries become distributed Dryad jobs.
  • C# methods become code running on the vertices of a Dryad job.

DryadLINQ has the following features:

  • Declarative programming: computations are expressed in a high-level language containing a superset of the best features of SQL, functional programming and .Net.
  • Automatic parallelization: from sequential declarative code the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. DryadLINQ also exploits multi-core parallelism on each machine.
  • Integration with Visual Studio: programmers in DryadLINQ take advantage of the comprehensive VS set of tools: Intellisense, code refactoring, integrated debugging, build, source code management.
  • Integration with .Net: all .Net libraries, including Visual Basic, and dynamic languages are available.
  • Type safety: distributed computations are statically type-checked.
  • Automatic serialization: data transport mechanisms automatically handle all .Net object types.
  • Job graph optimizations
    • static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
    • dynamic: run-time query plan optimizations automatically adapt the plan taking into account the statistics of the data set processed.
  • Conciseness: the following line of code is a complete implementation of the Map-Reduce computation framework in DryadLINQ:

A commercial implementation of Dryad and DryadLINQ was released in 2011 in beta form under the name Linq to HPC:



comments powered by Disqus