This is a multi-part series:

1. ephemeral nodes
2. observable<zookeeper>
3. where is zookeeper?
4. .net api and tasks
5. c# iobservable
Zookeeper is a distributed database originally developed as part of the Hadoop project. It’s spawned several imitators: Consul, etcd, and Doozerd (itself a clone of Chubby). A lot of the material out there about Zookeeper describes how it works, not necessarily what you’d use it for. In this series of posts, we’ll cover how we used it at one client, and how it also got abused.
The canonical use of Zookeeper is to maintain a service registry. In our application, we had large datasets hosted by data servers. (Each server might be backed by a cluster of machines.) Clients requested visualizations of these datasets, and to produce one they needed to connect to the server hosting the data they were interested in. But how does a client know which data servers are running? Imagine a system with two such data servers, each having registered with Zookeeper. When a server starts, it places a document at a pre-arranged path (/services/{dataset}) describing where it can be found. The document might be JSON, e.g. { "dataset": "X", "host": "abc", "port": "123" }.
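As a rough sketch of that registration step, a data server might do something like the following. This is Java against the official ZooKeeper client (the later posts in this series use .NET, but the same calls exist there); the connection string, session timeout, and document contents are placeholders, and the /services parent node is assumed to already exist.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

public class RegisterService {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble (addresses are placeholders) and
        // wait until the session is actually established before writing.
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> {
            if (event.getState() == KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // The document describing where dataset X can be found.
        String doc = "{ \"dataset\": \"X\", \"host\": \"abc\", \"port\": \"123\" }";

        // Publish it at the pre-arranged path (/services must already exist).
        // PERSISTENT for simplicity; the ephemeral-node discussion below refines
        // this so the entry disappears along with the server.
        zk.create("/services/X",
                  doc.getBytes(StandardCharsets.UTF_8),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE,
                  CreateMode.PERSISTENT);

        zk.close();
    }
}
```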
This arrangement takes care of a couple of things nicely for us: clients can discover which datasets are currently being served, and for each one they learn exactly where (host and port) to connect.
As we’ve seen, the data in Zookeeper can roughly be thought of as a filesystem of small files with clean concurrency guarantees. Text data is added, removed, or updated at a particular path (e.g. /services/X), where it can be seen or observed by everyone connected to that Zookeeper instance. Zookeeper itself also runs on a cluster of machines (the documentation recommends at least three) so that load from many competing clients can be shared, and so that the likelihood of complete data loss is very low.
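To make that model concrete, here is a minimal sketch of a client browsing the registry, with the same assumptions as the earlier example (Java, official client, placeholder paths, an already-connected handle):

```java
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

import java.nio.charset.StandardCharsets;
import java.util.List;

public class RegistryBrowser {
    // Assumes zk is an already-connected handle (see the earlier example).
    static void printRegistry(ZooKeeper zk) throws Exception {
        // List every registered dataset under /services.
        List<String> datasets = zk.getChildren("/services", false);

        for (String name : datasets) {
            // Fetch the small document each server wrote; the Stat carries the
            // version and timestamps ZooKeeper keeps for every node.
            Stat stat = new Stat();
            byte[] data = zk.getData("/services/" + name, false, stat);
            System.out.println(name + " -> "
                    + new String(data, StandardCharsets.UTF_8)
                    + " (version " + stat.getVersion() + ")");
        }
    }
}
```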
Why not use the actual file system? The simple answer is that most file system implementations (e.g. NFS) don’t give clean consistency guarantees. If I launch a new service and put a file in place declaring I’m running (/services/hammer-manager), you might read it at just the wrong time and see an incomplete result (host:123 instead of, say, host:1234). Zookeeper, by contrast, defines strict guarantees about what will appear when and how operations are serialized. You’ll want to read the manual carefully.
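One small illustration of those guarantees: every node carries a version number, and writes can be made conditional on it, so concurrent updaters can’t silently overwrite one another. A hypothetical sketch, reusing the /services/X node from above:

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ConditionalUpdate {
    // Rewrite /services/X only if nobody else has written it since we read it.
    static void updateIfUnchanged(ZooKeeper zk, byte[] newDoc) throws Exception {
        Stat stat = new Stat();
        zk.getData("/services/X", false, stat);
        try {
            // The write succeeds only if the node is still at the version we read;
            // otherwise ZooKeeper rejects it atomically and we can re-read and retry.
            zk.setData("/services/X", newDoc, stat.getVersion());
        } catch (KeeperException.BadVersionException raced) {
            System.out.println("Someone else updated /services/X first; re-read and retry.");
        }
    }
}
```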
Why not use a database? You could, but Zookeeper’s features have been designed to work together in a way that would take a while to reproduce on top of a traditional database.
What happens when the cluster hosting /services/X crashes? How does the node describing its services get removed? How do we keep clients from trying to connect? There are several approaches, but a nice one involves a key Zookeeper feature: ephemeral nodes. When a Zookeeper client (in this case our data server) creates a node, it can mark it as ephemeral. So long as that client is alive (as determined by a heartbeat built into the Zookeeper protocol), the node is kept alive. If the Zookeeper server ever loses the heartbeat, it deletes the node. Now, without any extra work on your part, the service registry is kept up to date, even in the face of recalcitrant servers.
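The naive version of that registration is just the earlier create call with the ephemeral flag. A minimal sketch, under the same assumptions as before:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;

public class EphemeralRegistration {
    // Assumes zk is an already-connected handle and /services exists.
    static void register(ZooKeeper zk, String doc) throws Exception {
        // EPHEMERAL: the node lives only as long as this client's session.
        // If the data server dies and its heartbeat stops, ZooKeeper deletes
        // /services/X automatically and it vanishes from the registry.
        zk.create("/services/X",
                  doc.getBytes(StandardCharsets.UTF_8),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE,
                  CreateMode.EPHEMERAL);
    }
}
```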
Because the server may crash and restart faster than the heartbeat, leaving an ephemeral node in place for a few seconds, and because there may be a race condition in starting two servers for the same dataset, you’ll want to be a little smarter than just ‘create ephemeral node on startup’, but you get the idea.
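One way to be a little smarter, purely as a hypothetical sketch: if a previous incarnation’s ephemeral node is still hanging around after a fast restart, watch it and wait for Zookeeper to expire it before registering again.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

public class CarefulRegistration {
    // Assumes zk is an already-connected handle and /services exists.
    static void register(ZooKeeper zk, String doc) throws Exception {
        byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
        while (true) {
            try {
                zk.create("/services/X", bytes,
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return; // registered
            } catch (KeeperException.NodeExistsException stale) {
                // A previous incarnation's ephemeral node is still there.
                // Watch it and wait for ZooKeeper to delete it, then retry.
                CountDownLatch gone = new CountDownLatch(1);
                boolean stillThere = zk.exists("/services/X", event -> {
                    if (event.getType() == EventType.NodeDeleted) {
                        gone.countDown();
                    }
                }) != null;
                if (stillThere) {
                    gone.await();
                }
                // If it vanished between create() and exists(), just retry immediately.
            }
        }
    }
}
```

Deciding what to do when another live server legitimately owns the dataset (the race mentioned above) takes an extra policy decision, such as backing off or reporting a conflict, which we won’t go into here.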
In the next post, we’ll see that other clients can be told about this failure immediately by watching Zookeeper documents, and we’ll discover why these documents are stored in a hierarchy to begin with.