Multi-Tenant Applications in OrientDB

OrientDB is one of several popular graph data stores on the market today. It takes a multi-model approach, combining the power of a graph database with the flexibility of a document data store. If you have decided to build your multi-tenant application on top of OrientDB, you are in luck: it has several built-in, out-of-the-box methods for handling multi-tenancy. In this post, I’ll look at three specific approaches:

  • Graph Partitioning: logically partitioning the data at query time
  • Clustering: physically partitioning data using OrientDB’s clustering mechanisms
  • Separate Databases: physically partitioning your data by giving each tenant its own database, with tenant routing handled in the application layer using tooling and/or frameworks

Note: In this blog series, a multi-tenant application is one in which two or more tenants are served software by a shared set of resources. A “tenant” in this case is a group of users with common access to a dedicated share of the data, configuration and resources of the system. For an overview of reducing complexity in multi-tenant applications, read Part 1 of this series, Multi-Tenant Applications: Reduce the Complexity.

Graph Partitioning

OrientDB has support for Record-Level Security that can be leveraged to create a Partitioned Graph per tenant. A Partitioned Graph is a subgraph of data that is only available to specifically authorized subsets of users. You accomplish this by leveraging the robust security model that OrientDB provides to logically isolate one tenant’s data from other tenants. In order to enable this functionality, there are three basic steps:

  1. Create a database and make the V and E superclasses extend ORestricted. ORestricted is a special class: once V and E extend it, all vertices and edges in the graph inherit its access-control fields (_allow, _allowRead, _allowUpdate, _allowDelete), which are used to restrict access to those records.
  2. Create users and roles in that database. OrientDB allows you to create roles that can be assigned specific permissions (All, Read, Update, Delete), and then create users who are assigned to those roles. With this security model you can not only isolate one tenant from another but also enforce application roles within a single tenant. For example, administrators can modify master data while all other users can only read it.
  3. Finally, create each tenant’s graph and associate a user/role with it; by default, records created while connected as a given user are assigned to that user through the _allow field. When you connect as a specific user, you will automatically be restricted to viewing only the Partitioned Graph associated with that user.
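As a rough illustration of steps 1 and 2, the sketch below generates the OrientDB SQL statements for one user per tenant. It assumes OrientDB 2.x SQL syntax and reuses the built-in writer role; the tenant names, the _check helper and the :password named parameter (to be bound by your client, never interpolated) are illustrative assumptions, not part of any official tooling.

```python
import re

# Hypothetical guard: only simple identifiers are embedded in the SQL text.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def _check(name):
    """Reject names that are not simple identifiers."""
    if not _IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def partitioned_graph_setup(tenants, role="writer"):
    """Return OrientDB SQL for a Partitioned Graph with one user per tenant.

    Assumes OrientDB 2.x SQL and the built-in 'writer' role. Each password is
    left as the named parameter :password, to be bound by the client when
    the statement is executed.
    """
    # Step 1: make every vertex and edge an ORestricted record.
    statements = [
        "ALTER CLASS V SUPERCLASS ORestricted",
        "ALTER CLASS E SUPERCLASS ORestricted",
    ]
    # Step 2: one user per tenant, assigned an existing role.
    _check(role)
    for tenant in tenants:
        _check(tenant)
        statements.append(
            f"INSERT INTO OUser SET name = '{tenant}', status = 'ACTIVE', "
            "password = :password, "
            f"roles = (SELECT FROM ORole WHERE name = '{role}')"
        )
    return statements
```

Step 3 is not a schema statement: the application connects as, say, tenant1 and creates that tenant’s vertices and edges, and OrientDB by default adds the connecting user to each new record’s _allow field.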

Because the partitioning of the graph happens at a low level within the OrientDB core, this approach also has the advantage of being enforced for other clients, such as those using Gremlin or the Java API. However, there is a price to pay: each record returned by a query passes through a low-level hook that evaluates the security model. In most cases this is not a problem, but if you have a complicated security model, such as one with multiple levels of inheritance, the cost of these checks can become noticeable.
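To make the per-record cost concrete, here is a toy model of the kind of check described above. It is not OrientDB’s internal implementation: the Role, User and Record types and the rule “a record is visible if the user, or any role the user holds or inherits, appears in its _allow set” are simplifying assumptions. It only shows that the check runs once for every returned record and that role inheritance has to be resolved to decide it.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    parent: "Role | None" = None  # role this one inherits from, if any


@dataclass
class User:
    name: str
    roles: list


@dataclass
class Record:
    data: dict
    allow: set = field(default_factory=set)  # names listed in the record's _allow


def identities(user):
    """Names of the user and every role it holds, walking role inheritance."""
    names = {user.name}
    for role in user.roles:
        while role is not None:  # deeper inheritance means a longer walk
            names.add(role.name)
            role = role.parent
    return names


def visible(records, user):
    """Filter query results the way a per-record security hook would."""
    ids = identities(user)
    # One membership test per returned record: this is the per-record overhead.
    return [r for r in records if r.allow & ids]
```

For example, a user holding a t1_admin role that inherits from writer can see records whose _allow contains the user, t1_admin or writer, but not records allowed only to another tenant’s user.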

Note: An excellent walkthrough of how to create Partitioned Graphs is available in the OrientDB documentation.

Pros

  • Easy to set up and develop against since its implementation is invisible to the developer
  • Security is applied even when coming from other clients.
  • Allows for leveraging of role-based permissions within a database

Cons

  • Performance overhead associated with security
  • Scales only to data that can be contained in a single database
  • All tenants are in the same database, so they are vulnerable to the Noisy Neighbor Problem.


Clustering

One of the unique features of OrientDB is Clusters, which let you specify how the data in each class (for example, a vertex class) is grouped on the physical disk. By creating a separate cluster for each class per tenant, you can physically isolate one tenant’s data from another’s. One major drawback of this method is that you also need to logically isolate each tenant’s data: because of the class inheritance structure, a query against a class returns records from all of that class’s clusters (and, polymorphically, from its subclasses). This means that you must explicitly include the tenant’s cluster identifier in each query. The added need for logical isolation increases the development complexity of this method but allows for additional flexibility.

For example, given a database with a class called Customer, you would create a custom cluster for each tenant (Customer_Tenant1, Customer_Tenant2, and so on) as tenants are added to your system. You would then include the cluster identifier in each query to limit the results to a single tenant’s data, e.g., SELECT FROM cluster:Customer_Tenant1.
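A minimal sketch of provisioning a tenant under this scheme might generate the statements below: create the tenant’s cluster, attach it to the Customer class, and write new records into that cluster with INSERT ... CLUSTER. It assumes OrientDB 2.x SQL; the <class>_<tenant> naming rule follows the Customer_Tenant1 example, and the helper names and :field named parameters are hypothetical.

```python
import re

# Hypothetical guard: only simple identifiers are embedded in the SQL text.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def _check(name):
    if not _IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def tenant_cluster(base_class, tenant):
    """Hypothetical naming scheme: <class>_<tenant>, e.g. Customer_Tenant1."""
    return f"{_check(base_class)}_{_check(tenant)}"


def provision_tenant(base_class, tenant):
    """SQL to create a tenant's cluster and attach it to the base class."""
    cluster = tenant_cluster(base_class, tenant)
    return [
        f"CREATE CLUSTER {cluster}",
        f"ALTER CLASS {base_class} ADDCLUSTER {cluster}",
    ]


def insert_for_tenant(base_class, tenant, fields):
    """INSERT that writes into the tenant's cluster rather than the default one.

    Field values are left as named parameters (:field) for the client to bind.
    """
    cluster = tenant_cluster(base_class, tenant)
    assignments = ", ".join(f"{_check(f)} = :{f}" for f in fields)
    return f"INSERT INTO {base_class} CLUSTER {cluster} SET {assignments}"
```

Running provision_tenant once per new tenant keeps cluster creation in one place, which matters because every later query depends on the cluster name being predictable.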

One advantage of this method is that aggregating data across all tenants is as simple as querying the base class name instead of a tenant cluster identifier, e.g., SELECT FROM Customer.
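One way to keep the per-query cluster identifier from leaking into every call site is to route all reads through a single query builder. The sketch below is a hypothetical helper, not an OrientDB API: it targets cluster:<class>_<tenant> for tenant-scoped reads and the base class only when the caller explicitly asks for all tenants, so a forgotten tenant argument raises an error instead of silently becoming a cross-tenant query.

```python
import re

# Hypothetical guard: only simple identifiers are embedded in the SQL text.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def scoped_select(base_class, tenant=None, *, all_tenants=False, where=None):
    """Build a SELECT scoped to one tenant's cluster or, explicitly, to all tenants.

    Exactly one of `tenant` or `all_tenants=True` must be given. `where`
    should use bound parameters (e.g. ':name'), not interpolated values.
    """
    if (tenant is None) != all_tenants:
        raise ValueError("pass either tenant=... or all_tenants=True")
    for name in (base_class, tenant):
        if name is not None and not _IDENT.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    # Base class = every tenant's cluster; otherwise one <class>_<tenant> cluster.
    target = base_class if all_tenants else f"cluster:{base_class}_{tenant}"
    sql = f"SELECT FROM {target}"
    if where:
        sql += f" WHERE {where}"
    return sql
```

Making the cross-tenant case an explicit keyword argument is a design choice: aggregation stays a one-word change, while ordinary application code cannot reach it by accident.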

Pros

  • Each tenant’s data is kept in its own cluster, and clusters can be stored on separate physical disks if desired.
  • The query optimizer knows how to optimize queries for use in clusters.
  • Data is easily aggregated across tenants.

Cons

  • Development complexity is significantly increased by the need to include the cluster identifier in every query to ensure data isolation.
  • Tenants are susceptible to the Noisy Neighbor Problem, which can be mitigated by migrating tenant clusters to different physical disks.


Separate Databases

OrientDB can run multiple databases on a single OrientDB server. You can leverage this to give each tenant in your system its own database. This gives each tenant’s data complete physical isolation from other tenants’ data while still allowing you to leverage a shared infrastructure. It simplifies application development by removing the need to handle logical isolation, but it adds complexity to the operational aspects of the system. With this method you will have to coordinate database upgrades/migrations, handle tenant resource concerns, and route tenants to the correct databases. While these operational aspects are well understood and not unique to OrientDB, they do add to the overall workload required to make this method work efficiently.
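The routing and migration chores mentioned above can be sketched as a small tenant registry in the application layer. This is a hypothetical in-memory example: it maps each tenant to a server and database, builds OrientDB’s remote:<host>/<database> connection URL, records a move when a noisy tenant is relocated, and fans a migration callback out to every tenant database. The tenant_<id> default database name and the callback shape are assumptions; a real system would persist the registry and use your driver to connect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantLocation:
    server: str    # host of the OrientDB server holding this tenant
    database: str  # the tenant's own database

    def url(self):
        # OrientDB remote connection URL: remote:<host>/<database>
        return f"remote:{self.server}/{self.database}"


class TenantRouter:
    """Hypothetical in-memory tenant -> database registry."""

    def __init__(self):
        self._locations = {}

    def register(self, tenant, server, database=None):
        # Default database name is an assumed convention: tenant_<id>.
        self._locations[tenant] = TenantLocation(server, database or f"tenant_{tenant}")

    def url_for(self, tenant):
        try:
            return self._locations[tenant].url()
        except KeyError:
            raise LookupError(f"unknown tenant: {tenant!r}") from None

    def move(self, tenant, new_server):
        # Only updates routing; copying the database itself happens elsewhere.
        loc = self._locations[tenant]
        self._locations[tenant] = TenantLocation(new_server, loc.database)

    def migrate_all(self, apply_migration):
        """Run apply_migration(url) for every tenant; return tenants that failed."""
        failed = []
        for tenant, loc in sorted(self._locations.items()):
            try:
                apply_migration(loc.url())
            except Exception:
                failed.append(tenant)  # keep going so one tenant can't block the rest
        return failed
```

Collecting failures instead of stopping at the first one reflects the operational reality of many databases: migrations are applied tenant by tenant, and the stragglers are retried.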

Pros

  • Each tenant’s data is truly isolated; databases can even be stored on separate physical disks if desired.
  • Supports a growing number of tenants by adding servers/hardware

Cons

  • Operational overhead is increased by dealing with many databases
  • Loss of ability to easily aggregate data across tenants
  • Tenants are susceptible to the Noisy Neighbor Problem, which can be mitigated by migrating tenants to different servers.


Conclusions


OrientDB has robust support for both physical and logical isolation using a variety of methods. Each of the options presented here has pros and cons, but each is also the right fit for certain use cases. If you want to completely physically separate tenant data, look into using Separate Databases. If a logical separation of data is sufficient, then Graph Partitioning is probably the best option. If you need something in between, Clustering may be the right solution.

For other posts in this series, see:

Multi-Tenant Applications: Reduce the Complexity

Multi-Tenant Applications in Neo4j

Multi-Tenant Applications in DataStax Graph