DSE Graph Partitioning Part 2: Taming Your Supernodes

In the previous blog post, Custom Vertex Ids, I discussed graph partitioning, and focused specifically on how vertices can be distributed around your DSE Graph cluster to minimize the number of internode vertex connections such as edge cuts, thereby reducing query latency. As we saw, in many cases it is difficult to come up with an optimal partitioning scheme, but DSE Graph’s custom vertex ids can be used when there are prominent partitions within your graph. In addition to allowing the developer to control vertex locality, DSE Graph gives the graph developer another tool to handle a common graph gotcha: the dreaded supernode.  

A supernode is a vertex with a large number of edges. In graph theory, these high-degree vertices are known as hubs. Looking at Twitter’s social graph, we can easily find a real-world supernode example, Justin Bieber.  At a whopping 90 million followers, the Biebs easily qualifies for supernode status with his single Bieber vertex having 90 million inbound “follows” edges from his army of Beliebers.

Graph database providers have come up with a number of different in-memory and on-disk representations of property graphs, and these representations and their corresponding indexing structures will determine (along with your hardware, of course) what vertex degrees are reasonable and where you may start running into performance and operational issues. There is no single number where a vertex will suddenly hit supernode status on a vertex, but it would be reasonable to say that my Twitter handle is NOT a supernode, and Mr. Bieber’s is.  

Recall from the previous partitioning blog post that DSE Graph data is stored in Apache Cassandra. There are vertex and edge tables and indexing structures. Edges are duplicated automatically, stored with their inbound and outbound vertices. This denormalization supports speedy traversals of relationships from either direction. The edge tables are partitioned by the inbound or outbound vertex id (depending on direction), so one practical limit you can run into is the maximum number of rows (edges) that can performantly live in a single Cassandra partition. If your partitions get too wide, say in the 10,000s to 100,000s of edges, you may run into operational issues associated with large partitions. These can include poor query performance and possibly failed compactions and repairs. Supernodes can also lead to hotspots when traversals that frequently navigate from the supernode side will unevenly tax the DSE nodes that hold the supernode data.

One of the best ways to deal with supernodes is to avoid them in the first place. Think of a use case where we are storing users and their locations. The user, Susan, lives in the United States along with another 10 million users. It’s tempting to make the United States its own vertex. Years of third normal form can’t be wrong, right!? Your queries will perform fine if you traverse from the user side of the relationship here; Susan’s vertex will only have one outbound edge associated with it. The problem occurs if you try to do anything from the direction of the United States vertex. From this direction, we’re confronted with 10 million edges. Vertex-centric index (VCI) or not, we’ll still end up with an enormous Cassandra partition. Figure 1 illustrates this scenario on a 3-node cluster with a replication factor of 1. Note that the United States vertex and all of its inbound edges are located on node 2.

pvt_blog_1

Figure 1: Inbound edge storage locations

In a case like this, you’re better off making the user’s country a property on the user vertex. OLTP queries requiring user filtering by country can happen just as well with a user vertex country property. Any query you do from the United States side will most likely need to be a non-real-time, analytics query (e.g., get me a count of all users by country across my whole graph). Running that query on any non-trivially sized graph will be untenable in OLTP mode and will instead require DSE Graph Analytics.  

When should you keep a supernode around for OLTP usage? In our example, having the separate country vertex doesn’t really give us anything modeling-wise. We can answer the same sorts of questions cleanly with a vertex property. When the supernode isn’t a terminating stop in most traversals, it makes more sense to keep it around. Instead of countries, let’s take a simple data center monitoring example. Figure 2 shows the model. We have a data center vertex that will have thousands of edges connecting it to all of the servers in the data center. Notice that the data center vertex also connects to other portions of our model. In this case it makes sense to pull it out as its own entity because it’s an important part of other pieces of our data model and, in many cases, an integral hop into another dimension of our graph.

pvt_blog_2

Figure 2: Data center monitoring model

Partitioned Vertices

If you decide that certain supernodes make sense for your use case, you can use DSE Graph’s partitioned vertex support to lessen the impact of high-degree vertices on your query performance and operations. Partitioned vertices distribute their edges around your cluster, reducing some of the problems traditionally associated with supernodes.  

The following schema snippet builds off of the above data center example. We will start by creating “DataCenter” and “Server” vertex labels. Servers are “locatedIn” data centers and that relationship has an “installedOn” timestamp noting when the server was installed in the data center.

schema.propertyKey("installedOn").Timestamp().ifNotExists().create()
schema.propertyKey("serialNumber").Text().ifNotExists().create()
schema.propertyKey("name").Text().ifNotExists().create()
schema.vertexLabel("Server").properties("serialNumber").ifNotExists().create()
schema.vertexLabel("DataCenter").properties("name").ifNotExists().create()
schema.edgeLabel("locatedIn").connection("Server", "DataCenter").ifNotExists().create()

Now for the partitioned vertex portion. Here we will partition the data center vertices by their inbound “locatedIn” edges.

schema.vertexLabel("DataCenter").partition().inE("locatedIn").ifNotExists().add()

You can add a VCI on the “installedOn” property of the “locatedIn” edges in the same way that you would add one to a non-partitioned vertex.

schema.vertexLabel("Server").index("bySerialNumber").materialized().by("serialNumber").ifNotExists().add()
schema.vertexLabel("DataCenter").index("byName").materialized().by("name").ifNotExists().add()
schema.vertexLabel("DataCenter").index("byInstalledOn").inE("locatedIn").by("installedOn").ifNotExists().add()

DSE uses a special keyspace to store the edges of partitioned vertices. If you connect to your cluster with cqlsh and list the keyspaces, you’ll see a keyspace ending with “_pvt”, which stands for partitioned vertex table. Running a describe will show you that this keyspace uses a replication strategy that you may not be familiar with, the “LocalStrategy”. This is a strategy used internally by Cassandra to store data local to each node as opposed to replicated around the cluster. The system keyspace and secondary indexes also make use of this strategy.

Adding Vertices and Edges

Going back to our data center supernode example, as you add new servers and their respective “locatedIn” relationships to the Data Center 1 (DC1) supernode, the new edges will be distributed around your cluster. Instead of the DC1 vertex and edges being stored together, each inbound edge to the DC1 vertex will be placed on the node that owns its outbound “Server” vertex. In other words, the “locatedIn” inbound edge to DC1 for the server with serial number S123 will be located on the same nodes as the S123 server. Cassandra and DSE Graph’s id allocation scheme will more or less evenly spread the user vertices around your cluster, thereby spreading the the data center edges at the same time. Each node will have its own materialized view-based VCI to enable indexed lookups on its local set of “locatedIn” edges by the “installedOn” property.

pvt_blog_3

Figure 3: Partitioned vertex table edge placement

Querying

From a Gremlin language standpoint, traversing a partitioned vertex and its edges is no different from traversing a regular vertex, but behind the scenes, the query is being executed differently. Let’s look at an example to illustrate the difference using the following query. This query will return all of the servers that were installed on January 1, 2015.

g.V().has('DataCenter','name','DC1').inE('locatedIn').has('installedOn', '2015-01-01T00:00:00.00Z').outV()

First, regardless of partitioning, DSE Graph will use an index to lookup the DC1 vertex. After that point, execution paths will diverge. In the non-PVT case, assuming there is a VCI defined on “locatedIn” for “installedOn”, a slice query will be run hitting a materialized view that has all of the DC1 inbound edges stored in a single Cassandra partition.  

pvt_blog_4

Figure 4: VCI query without partitioned vertices

If the data center vertex is partitioned, the edges aren’t in one partition, and it’s possible any of the partitions could have an edge with an “installedOn” of January 1, 2015, so instead DSE Graph will query all of the separate materialized view VCIs in parallel and then reconstitute the results.

pvt_blog_5

Figure 5: VCI query with partitioned vertices

Gotchas

Partitioned vertices are powerful but will not solve all of your supernode problems so it is important to be aware of gotchas. First, you can still end up with Cassandra edge table partitions that are too wide. In our little 3-node cluster example, with a replication factor (RF) of 1, you would have approximately one-third of DC1’s edges being served by each node. If you bump that RF up to 3 (which is really what it should be anyway to safely run in production!), each node will serve as a replica for the other 2 and consequently will serve ALL of DC1’s edges, putting it in essentially the same boat as if you hadn’t partitioned the data center vertices. Of course, a 3-node cluster is a rarity so in most cases with larger clusters, you will still get the benefit of spreading the supernode edges around your cluster even with an RF=3. Say you have a 9-node cluster with an RF=3 and expect to have a supernode with 9 million edges. Is that reasonable? Each node in that cluster will own about one-third of the data when the RF is taken into account, so you’d be looking at partitions with approximately 3 million edges per node, which is much too large, even with partitioning enabled. If instead your largest supernode had a few hundred thousand edges, you should be okay.

In addition to large partitions still popping up, you will also need to be cognizant of your query patterns, especially from the supernode side. Even though certain portions of your query will be able to run concurrently across your cluster, full edge scans should be avoided and any analytical queries should be run with DSE Graph Analytics.

Finally, it is important to approach supernode deletion very carefully. When you delete a supernode, it will delete not only the supernode vertex, but all of its edges, and all of the edge copies that had been denormalized onto the other end of the edge relationships. This operation will likely touch every node in your cluster and in the best-case scenario will take a while but will more likely time out and fail. If you do need to delete a large supernode, hopefully you have defined some sort of index on the supernode edges and then you can write queries to manually “page” through the supernode edges, deleting them in smaller, more manageable chunks. This will allow you to control the load on the cluster and avoid failures as you incrementally make your deletes.

Conclusion

In many cases, it is best to identify supernodes early on in your graph modeling efforts, and you can avoid creating them in the first place. When it does make sense to support them, you can use DSE Graph’s innovative partitioned vertex functionality. Chances are you’ll still need to be careful with access patterns, but a partitioned vertex table can reduce query latency and, in many cases, the operational headaches associated with wide Cassandra partitions.