Multi-Tenant Applications in Neo4j

Note: In speaking with members of the Neo4j team, I have been told that a comprehensive plan for multi-tenancy is already in the works, with plans to release a rich set of multi-tenancy features in 2017.

Neo4j is the most popular graph data store available today. It leverages graph technologies to help build modern, high-performing applications, but it has no native multi-tenancy support. Nevertheless, you may have decided that Neo4j is the right graph data store for your multi-tenant application. In any multi-tenant system, the core challenge on the data-store side is isolating each tenant’s data, physically or logically, from every other tenant’s. In this post, I will look at three specific approaches:

  • User-Defined Functions: logically partitioning the data at query time
  • Third-Party Tooling: logically partitioning your data within the application layer using tooling and/or frameworks
  • Virtualization/Containerization: physically partitioning data using virtualization technologies

Note: A “tenant” in this case refers to a group of users with common access to a dedicated share of the data, configuration and resources of the system. For an overview of reducing complexity in multi-tenant applications, read Part 1 of this series, Multi-Tenant Applications – Reduce the Complexity.

User-Defined Functions

Neo4j lets you create user-defined functions that can be called within a Cypher query. Your query would call such a function, passing in the current results along with some security credentials, and the function would return only the subset of those results allowed by a security policy applied to each node and relationship. The filtering logic must be written in Java and deployed as a plug-in to your Neo4j installation. You will end up with a query like the one below, where custom.applySecurity is your custom Java function, which must be called in every query:

     MATCH (node:Label) WITH collect(node) AS nodes
     RETURN custom.applySecurity($userName, nodes) AS visibleNodes
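The Java plug-in is where the real enforcement lives, but the policy it encodes can be pinned down and tested on its own first. The sketch below is a minimal Python model of that policy, not Neo4j's API. It assumes each tenant is identified by a node label such as Tenant_acme and that users are mapped to tenants through a hypothetical lookup table (USER_TENANT).

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Node:
    """Stand-in for a graph node: an id plus its set of labels."""
    node_id: int
    labels: FrozenSet[str] = field(default_factory=frozenset)


# Hypothetical mapping from user name to the only tenant label that user may read.
USER_TENANT: Dict[str, str] = {"alice": "Tenant_acme", "bob": "Tenant_globex"}


def apply_security(user_name: str, nodes: List[Node]) -> List[Node]:
    """Keep only nodes that carry the caller's tenant label.

    Unknown users see nothing (deny by default).
    """
    tenant_label = USER_TENANT.get(user_name)
    if tenant_label is None:
        return []
    return [n for n in nodes if tenant_label in n.labels]


nodes = [
    Node(1, frozenset({"Customer", "Tenant_acme"})),
    Node(2, frozenset({"Customer", "Tenant_globex"})),
]
print([n.node_id for n in apply_security("alice", nodes)])  # [1]
print(apply_security("mallory", nodes))  # []
```

Denying unknown users by default means a missing mapping fails closed rather than exposing every tenant's data.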

One option with this method is to store a separate, disconnected graph for each tenant in the same database. Because Neo4j nodes can carry multiple labels, you can assign a tenant-specific label to every node in a tenant’s graph and use that label to separate data logically and limit access.
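Tagging data at write time is the other half of this approach. The following Python sketch builds a Cypher CREATE that applies both a domain label and a tenant label. The Tenant_<id> naming scheme and the helper names are hypothetical. Cypher does not accept labels as query parameters, so the sketch whitelists and backtick-quotes them in the query text while keeping property values parameterized.

```python
import re
from typing import Any, Dict, Tuple

# Conservative whitelist for anything spliced into query text.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def _checked(name: str) -> str:
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def tenant_label(tenant_id: str) -> str:
    """Hypothetical naming scheme: tenant 'acme' -> label 'Tenant_acme'."""
    return f"Tenant_{_checked(tenant_id)}"


def create_tenant_node(tenant_id: str, domain_label: str,
                       props: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Build a CREATE that gives the node its domain label and its tenant label.

    Labels are validated and backtick-quoted in the query text;
    property values stay in the parameter map.
    """
    labels = f"`{_checked(domain_label)}`:`{tenant_label(tenant_id)}`"
    return f"CREATE (n:{labels} $props) RETURN n", {"props": props}


query, params = create_tenant_node("acme", "Customer", {"name": "Ada"})
print(query)  # CREATE (n:`Customer`:`Tenant_acme` $props) RETURN n
```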

Note: With the release of Neo4j 3.1, security via this method has been simplified by the addition of custom role mapping for subgraph access control. This feature allows you to create users and roles per tenant and have security applied by the user-defined procedure, which reduces the development complexity of this option. For more information, see the Neo4j 3.1 documentation on subgraph access control.
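As a sketch of the per-tenant user and role setup, the Python below generates parameterized statements for the security procedures that Neo4j 3.1 Enterprise provides (dbms.security.createRole, dbms.security.createUser and dbms.security.addRoleToUser). The tenant_<id> role naming is an assumption, and the procedure signatures should be checked against your Neo4j version. An administrator could send each (query, parameters) pair through any Bolt driver session.

```python
from typing import Any, Dict, List, Tuple

Statement = Tuple[str, Dict[str, Any]]


def tenant_security_statements(tenant_id: str,
                               users: Dict[str, str]) -> List[Statement]:
    """One role per tenant, plus one user per tenant member bound to that role.

    Assumes the Neo4j 3.1 Enterprise procedures dbms.security.createRole,
    dbms.security.createUser and dbms.security.addRoleToUser.
    """
    role = f"tenant_{tenant_id}"  # hypothetical role-naming scheme
    statements: List[Statement] = [
        ("CALL dbms.security.createRole($role)", {"role": role}),
    ]
    for user, initial_password in sorted(users.items()):
        statements.append((
            # Third argument: require a password change on first login.
            "CALL dbms.security.createUser($user, $password, true)",
            {"user": user, "password": initial_password},
        ))
        statements.append((
            "CALL dbms.security.addRoleToUser($role, $user)",
            {"role": role, "user": user},
        ))
    return statements


for query, params in tenant_security_statements("acme", {"alice": "changeme"}):
    print(query, params)
```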

Pros

  • Provides logical partitioning of data in a single graph
  • Encapsulates your security policy logic in a single location
  • Easier development because security specifics are implicit in the strategy

Cons

  • Scales to large data volumes, but some cluster members are read-only replicas (see Causal Clustering)
  • Needs to be applied to each query, potentially leading to insecure queries without other measures in place to make sure it’s included (e.g., code reviews, testing)
  • All customers are in the same graph, so they are vulnerable to the Noisy Neighbor Problem.

Third-Party Tools

Several third-party application platforms can help create a secured data layer on top of Neo4j, including Graphileon InterActor, GraphAware Enterprise and Structr. These tools do not implement the security themselves; they provide a framework for efficiently building your own security mechanisms, so the implementation effort varies with your requirements. In addition to these frameworks, another option is to define a domain-specific language (DSL) on top of Cypher and use a library such as Neo4j-OGM, Hibernate OGM or Spring Data Neo4j to generate the query syntax automatically from your DSL.
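To illustrate the DSL idea without tying it to any of the libraries named above, here is a hand-rolled Python sketch; it is not the Neo4j-OGM, Hibernate OGM or Spring Data Neo4j API. The hypothetical Find term describes what to fetch, and the compiler always adds the tenant label (again assuming a Tenant_<id> naming scheme), so application code built on the DSL cannot forget it.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def _checked(name: str) -> str:
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


@dataclass
class Find:
    """Hypothetical DSL term: nodes of one type with equality filters."""
    entity: str
    where: Dict[str, Any] = field(default_factory=dict)


def compile_find(spec: Find, tenant_id: str) -> Tuple[str, Dict[str, Any]]:
    """Compile a Find into parameterized Cypher, always adding the tenant label."""
    labels = f"`{_checked(spec.entity)}`:`Tenant_{_checked(tenant_id)}`"
    clauses, params = [], {}
    # Sort filters so the generated query text is deterministic.
    for i, (prop, value) in enumerate(sorted(spec.where.items())):
        clauses.append(f"n.`{_checked(prop)}` = $p{i}")
        params[f"p{i}"] = value
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"MATCH (n:{labels}){where} RETURN n", params


query, params = compile_find(Find("Customer", {"name": "Ada", "city": "Paris"}), "acme")
print(query)
# MATCH (n:`Customer`:`Tenant_acme`) WHERE n.`city` = $p0 AND n.`name` = $p1 RETURN n
```

A library-based DSL would hide the same rule behind its own mapping layer; the point is that tenant scoping is applied in one place rather than in every hand-written query.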

Pros

  • Provides logical partitioning of data in a single graph
  • Encapsulates your security policy logic in a single location

Cons

  • Data store itself is unsecured; all traffic must go through the application layer
  • Reliant on third parties to maintain compatibility as versions move forward

Virtualization/Containerization

Neo4j is built to run a single graph per Neo4j server, meaning that if you need to physically isolate tenants from one another, each tenant needs its own Neo4j server instance. Each instance (or cluster) has its own configuration and port requirements, so you must plan for configuring and maintaining that information across all tenants.
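Port management is a good candidate for automation. Below is a minimal Python sketch. It assumes standalone instances (cluster ports are ignored) and a hypothetical fixed stride between tenants. It offsets Neo4j's default HTTP (7474), HTTPS (7473) and Bolt (7687) ports and rejects any collision.

```python
from typing import Dict, List

# Neo4j's default ports for a single instance.
DEFAULT_PORTS = {"http": 7474, "https": 7473, "bolt": 7687}


def allocate_ports(tenants: List[str], stride: int = 10) -> Dict[str, Dict[str, int]]:
    """Give each tenant its own HTTP/HTTPS/Bolt ports by offsetting the defaults.

    Tenants are sorted so the same tenant list always yields the same ports.
    """
    allocation = {
        tenant: {name: port + i * stride for name, port in DEFAULT_PORTS.items()}
        for i, tenant in enumerate(sorted(tenants))
    }
    used = [p for ports in allocation.values() for p in ports.values()]
    if len(used) != len(set(used)) or max(used, default=0) > 65535:
        raise ValueError("port collision or overflow; choose another stride")
    return allocation


print(allocate_ports(["globex", "acme"]))
# acme gets the defaults; globex gets 7484 (HTTP), 7483 (HTTPS), 7697 (Bolt)
```

Because tenants are sorted before allocation, adding a tenant that sorts earlier would shift everyone else's ports; a real deployment would persist each assignment once it is made.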

Neo4j allows multiple instances (or clusters) to run from the same installation, with each instance (or cluster) pointed at a different database. Each tenant needs its own configuration file with the dbms.active_database parameter set to that tenant’s database. At startup, you can point the NEO4J_CONF environment variable at the directory containing that tenant’s configuration file. If you run multiple Neo4j instances on the same server, make sure the server has enough resources for all of them.
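The per-tenant configuration can be generated rather than maintained by hand. This Python sketch writes only the tenant-specific settings, using Neo4j 3.1-style setting names (dbms.active_database and the dbms.connector.*.listen_address settings); anything omitted falls back to Neo4j's defaults. The <root>/<tenant>/neo4j.conf directory layout is an assumption.

```python
from pathlib import Path
from typing import Dict


def render_conf(tenant: str, ports: Dict[str, int]) -> str:
    """Tenant-specific neo4j.conf lines (Neo4j 3.1-style setting names)."""
    return "\n".join([
        # Each instance opens its own database under the shared data directory.
        f"dbms.active_database={tenant}.db",
        # Each instance needs its own listen ports.
        f"dbms.connector.http.listen_address=:{ports['http']}",
        f"dbms.connector.https.listen_address=:{ports['https']}",
        f"dbms.connector.bolt.listen_address=:{ports['bolt']}",
    ]) + "\n"


def write_conf_dir(root: Path, tenant: str, ports: Dict[str, int]) -> Path:
    """Write <root>/<tenant>/neo4j.conf and return the directory for NEO4J_CONF."""
    conf_dir = root / tenant
    conf_dir.mkdir(parents=True, exist_ok=True)
    (conf_dir / "neo4j.conf").write_text(render_conf(tenant, ports))
    return conf_dir


print(render_conf("acme", {"http": 7484, "https": 7483, "bolt": 7697}), end="")
```

Each instance could then be started with something like NEO4J_CONF=<root>/acme bin/neo4j start. In practice you would start from a copy of the installation's neo4j.conf, and shared directories such as logs may also need per-tenant values; check the operations manual for your version.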

Another option is to use a containerization platform such as Docker to handle isolation and resource management. Neo4j is officially supported on Docker and has been shown to work on Kubernetes, DC/OS and Mesosphere. These infrastructure automation tools, combined with appropriate CI/CD and configuration management tools, can provide a scalable deployment pipeline.
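As an illustration of per-tenant containers, this Python sketch builds a docker run command for one tenant. It assumes the official neo4j image (the neo4j:3.1 tag is only an example), which keeps its data under /data and listens on 7474 (HTTP) and 7687 (Bolt) inside the container. The container naming and the default resource limits are hypothetical.

```python
from typing import Dict, List


def docker_run_args(tenant: str, ports: Dict[str, int], data_dir: str,
                    memory: str = "2g", cpus: float = 1.0,
                    image: str = "neo4j:3.1") -> List[str]:
    """Build a `docker run` command for one tenant's Neo4j container."""
    return [
        "docker", "run", "--detach",
        "--name", f"neo4j-{tenant}",
        # Map tenant-specific host ports to the container's default ports.
        "--publish", f"{ports['http']}:7474",
        "--publish", f"{ports['bolt']}:7687",
        # Persist this tenant's database outside the container.
        "--volume", f"{data_dir}:/data",
        # Per-container resource caps.
        "--memory", memory,
        "--cpus", str(cpus),
        image,
    ]


print(" ".join(docker_run_args("acme", {"http": 7484, "bolt": 7697}, "/srv/neo4j/acme")))
```

Passing the list to subprocess.run(args, check=True) would start the container. The --memory and --cpus caps soften, but do not eliminate, the Noisy Neighbor Problem noted in the cons.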

Pros

  • Officially supported deployment method
  • Allows for high hardware reuse

Cons

  • Requires investment in operational infrastructure (CI/CD and configuration management) to provide scalability
  • Vulnerable to the Noisy Neighbor Problem

Conclusions

Neo4j can support multi-tenancy with either physical or logical isolation, despite having no built-in multi-tenancy capabilities. This requires implementers to build a custom solution, which has the benefit of being tailored to their specific requirements. Both user-defined functions and third-party tools can logically isolate customers, while Neo4j’s well-supported deployment on virtualization and container platforms offers a solid approach to physically isolating tenants.

For other posts in this series, see:

Multi-Tenant Applications: Reduce the Complexity

Multi-Tenant Applications in DataStax Graph

Multi-Tenant Applications in OrientDB