Note: Grappa is no longer under active development.
Grappa makes an entire cluster look like a single, powerful, shared-memory machine. By leveraging the massive amount of concurrency in large-scale data-intensive applications, Grappa can provide this useful abstraction with high performance. Unlike classic distributed shared memory (DSM) systems, Grappa does not require spatial locality or data reuse to perform well.
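To give a feel for this, here is a minimal sketch of a Grappa program (assuming API names from the tutorial and current headers, such as global_alloc, delegate::read, and delegate::write): any core can read or write any word of the global heap without explicitly partitioning data or writing message-passing code.

```cpp
#include <Grappa.hpp>
using namespace Grappa;

int main(int argc, char* argv[]) {
  init(&argc, &argv);
  run([]{
    // Allocate an array in the global heap; its storage is spread across
    // the memories of every node in the job, but it is addressed uniformly.
    GlobalAddress<int64_t> A = global_alloc<int64_t>(1 << 20);

    // Fine-grained access to any element, wherever it happens to live.
    // Each delegate operation executes on the core that owns the data.
    delegate::write(A + 12345, 42);
    int64_t x = delegate::read(A + 12345);
    LOG(INFO) << "A[12345] = " << x;

    global_free(A);
  });
  finalize();
}
```

Because remote accesses are expressed as ordinary reads and writes on global addresses, irregular, pointer-chasing workloads need no locality-aware partitioning to get started.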
Data-intensive, or “Big Data”, workloads are an important class of large-scale computations. However, the commodity clusters they run on are not well suited to these problems, forcing developers to carefully partition both data and computation. A diverse ecosystem of frameworks has arisen to tackle these problems, such as MapReduce, Spark, Dryad, and GraphLab, each of which eases development of large-scale applications by specializing to a particular algorithmic structure and behavior.
Grappa provides abstraction at a level high enough to subsume many of the performance optimizations common to these data-intensive platforms. At the same time, its relatively low-level interface makes it a convenient substrate on which to build such frameworks. Prototype implementations of (simplified) MapReduce, GraphLab, and a relational query engine have been built on Grappa and outperform the original systems.
Grappa’s runtime system consists of three key components: a distributed shared memory that provides fine-grained access to data anywhere in the cluster; a tasking system that supports lightweight multithreading and global distributed work-stealing; and a communication layer that aggregates small messages into larger network packets to sustain high bandwidth.
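In user code these components surface as cheap parallel loops and delegate operations. The rough sketch below (assuming the forall, make_global, and delegate::fetch_and_add calls described in the tutorial and API docs) runs its loop iterations as lightweight tasks spread over all cores, while the many small remote updates are aggregated by the communication layer.

```cpp
#include <Grappa.hpp>
using namespace Grappa;

// A file-scope variable exists on every core; here only core 0's copy is
// used, reached through a global address, as a simple shared counter.
int64_t zero_count = 0;

int main(int argc, char* argv[]) {
  init(&argc, &argv);
  run([]{
    const size_t N = 1 << 20;
    GlobalAddress<int64_t> A = global_alloc<int64_t>(N);

    // The tasking system turns these iterations into lightweight tasks
    // and load-balances them across all cores in the cluster.
    forall(A, N, [](int64_t i, int64_t& a){ a = i % 7; });

    // Count the zeros with fine-grained remote increments. Each
    // fetch_and_add is a small delegate message; the communication layer
    // batches many of them into larger network packets.
    GlobalAddress<int64_t> count = make_global(&zero_count, 0);
    forall(A, N, [count](int64_t& a){
      if (a == 0) delegate::fetch_and_add(count, 1);
    });

    LOG(INFO) << "zeros: " << delegate::read(count);
    global_free(A);
  });
  finalize();
}
```

A single counter on core 0 is a deliberate oversimplification for illustration; real Grappa code would more likely accumulate per-core partial counts and combine them at the end.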
Grappa is freely available on GitHub under a BSD license. Anyone interested in seeing Grappa at work can follow the quick-start directions in the README to build and run it on their own cluster. To learn how to write your own Grappa applications, check out the Tutorial.
Grappa is still quite young, so please don’t hesitate to ask for help if you run into problems. To find answers to questions or ask new ones, please use GitHub Issues. The developers hang out in the #grappa.io IRC channel on freenode; you can join with your favorite IRC client or this web interface. Finally, to stay up-to-date on the latest releases and information about the project, you can subscribe to the mailing list below.
Latency-Tolerant Software Distributed Shared Memory.
Jacob Nelson, Brandon Holt, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, and Mark Oskin
USENIX Annual Technical Conference (USENIX ATC), July 2015 (Best Paper Award)
Alembic: Automatic Locality Extraction via Migration.
Brandon Holt, Preston Briggs, Luis Ceze, and Mark Oskin
OOPSLA 2014
Radish: Compiling Efficient Query Plans for Distributed Shared Memory.
Brandon Myers, Daniel Halperin, Jacob Nelson, Mark Oskin, Luis Ceze, and Bill Howe
Tech report, October 2014
Grappa: A Latency-Tolerant Runtime for Large-Scale Irregular Applications. (Expanded tech report)
Jacob Nelson, Brandon Holt, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, and Mark Oskin
International Workshop on Rack-Scale Computing (WRSC w/EuroSys), April 2014
Flat Combining Synchronized Global Data Structures.
Brandon Holt, Jacob Nelson, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, and Mark Oskin
International Conference on PGAS Programming Models (PGAS), October 2013
Compiled Plans for In-Memory Path-Counting Queries.
Brandon Myers, Jeremy Hyrkas, Daniel Halperin, and Bill Howe
International Workshop on In-Memory Data Management and Analytics (IMDM w/ VLDB), August 2013
Crunching Large Graphs With Commodity Processors.
Jacob Nelson, Brandon Myers, A. H. Hunter, Preston Briggs, Luis Ceze, Carl Ebeling, Dan Grossman, Simon Kahan, and Mark Oskin
USENIX Workshop on Hot Topics in Parallelism (HOTPAR), June 2011
Autogenerated API documentation
Grappa is a project of the Sampa Group at the University of Washington.