Distributed Systems links

A few years ago, while researching Zookeeper for a project I was working on, I realized that there was a whole field of Computer Science, Distributed Systems, that I was totally unfamiliar with. That started a journey of discovery that’s been very rewarding. In response to a question on the Akka mailing list I put together a list of links to Distributed Systems resources. I’ve been meaning to turn that email into a blog post for a while.

To start off, I would definitely recommend checking out a talk that Jonas Bonér from Typesafe gave at Strange Loop called The Road to Akka Cluster and Beyond (slides).

Implementation-oriented books that I would recommend for developers are:

These are all filled with practical advice for building real-world distributed systems.

One thing I found is that there is a big gap between academic and industry knowledge right now. This is discussed in a post on Henry Robinson’s excellent Paper Trail blog, where he provides a guide to digging deeper both on the academic side and by reading research papers written by industry leaders like Google, Yahoo, etc. Definitely read the links in the “First Steps” section. The gap is also the topic of a post on Marc Brooker’s blog and a post on Murat’s blog. Besides papers, he links to some other good people to follow, like Aphyr and Peter Bailis. Two blogs that review Distributed Systems papers are the Morning Paper and MetaData. I also recommend following Brave New Geek, Ben Stopford and Kellabyte, as well as these blogs: Hacking, Distributed; High Scalability; and Highly Scalable.

Papers We Love is a collective of meetups across the world where people present their takes on research papers that they find fascinating.  Their web site has videos taken at these meetups.  They also hold yearly conferences.

YouTubers are also getting into the act – for example, Vivek Haldar has a series of videos called Read a Paper where he summarizes papers in around ten minutes.

Many times the conferences where the papers are presented also publish videos, slide decks and posters that are much easier for a working developer to consume. If you have a paper that you are really interested in, be sure to check out the web site of the conference where the paper was published. Usenix in particular is really good at this. In addition, in the last few years a number of research projects have been creating web sites to promote the research, where you can find code, videos and more. For example, check out the site for Hermes, a replication protocol.

Working to fill the gap between academia and industry:

Essential ACM Queue articles

Notable blog posts

Online Courses

I recommend getting familiar with the CAP Theorem.  You’re going to run into it all over the place.

Zookeeper is a Consensus (or Coordination) system.  Consensus is a major topic in theoretical and practical distributed systems and is what got me started digging into distributed systems originally.  To start getting familiar with Consensus I recommend:

On the academic textbook side, I have these on my stack to read:

This is just the tip of the iceberg.  Besides consensus, other distributed systems topics that I’ve found interesting include distributed databases, group membership, gossip protocols (used in Akka, Cassandra and Consul), time and clocks, and peer-to-peer systems.

My first computer

I was looking at this old issue of Byte Magazine online talking about Smalltalk.

Lo and behold there was an ad for my first computer, an MTI TRS-80 Model III.


It was $1,998 in 1981 dollars, which would be $5,136 today. That was a lot of money for my family and a ton of money to spend on a 14-year-old. But well worth it!

Thanks Mom and Dad!

Adventures in Clustering – part 2

Embedding a Zookeeper Server To minimize the number of moving parts in the message delivery system I wanted to embed the Zookeeper server in the application, rather than running a separate ensemble of Zookeeper servers. The embedded Zookeeper … [Continue reading]

Adventures in Clustering – part 1

Last year I added clustering support to a system I had previously developed for a client. The requirements were to implement automated failover to eliminate a single point of failure and to distribute certain kinds of work among members of the … [Continue reading]

Happy Holidays!

[Continue reading]

An Auto-Updating Caching System – part 2

In the previous post we imagined that we needed to build a caching system in front of a slow backend system. The cache needed to meet the following requirements: The data in the backend system is constantly being updated so the caches need to be … [Continue reading]

An Auto-Updating Caching System – part 1

Imagine you needed to build a caching system in front of a slow backend system with the following requirements: The data in the backend system is constantly being updated so the caches need to be updated every N minutes. Requests to the backend … [Continue reading]

Using Groovy Closures as Scala Functions

I have a Scala trait for persistence and transaction management (which I will blog about in more detail later). The trait looks like: trait DomainManager { def get[E](id: Long)(implicit m: ClassManifest[E]): Option[E] def find[E](namedQuery: … [Continue reading]

Templating XML data with Velocity

Velocity is an easy-to-use templating system for the JVM. It's commonly used to code templates for web pages and email. To use Velocity you pass it a template (a string) and a context, which is a map of Javabeans and collections of Javabeans. The … [Continue reading]

Composing Traits and Declarative Validation

Scala's traits are a nice fit for JSR-303 validation. Here's an example. Suppose we have a web service interface that has methods like: @WebService trait Notification { def deleteTopic(apiKey: String, topicId: Long) def getSubscriber(apiKey: … [Continue reading]