What are the weaknesses of Neo4j

Business Informatics Wiki - Kewee

Graph databases (also called graph-oriented databases) store data in the form of graphs and are therefore well suited to working with such structures, for example finding and analyzing relationships.

Data model

The data model of graph databases is based on Leonhard Euler's graph theory. Data is modeled as a graph, with nodes for entities and directed edges for the relationships between them. Both nodes and edges can carry properties, which is why the model is also called the "property graph data model". All nodes and edges together form the graph database.

Each node has its own unique identifier and, in contrast to a record in a relational system, also holds the set of its incoming and outgoing relationships to neighboring nodes, along with its properties in the form of key-value pairs. Each edge likewise has a unique identifier, properties, and information about its start and end nodes. (Rouse 2014)
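The structure described above can be sketched in a few lines of Python. This is only an illustration of the property graph idea, not how any particular product stores its data; the class names `Node` and `Edge`, the helper `connect` and the sample data are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # compare by identity; nodes and edges reference each other
class Node:
    node_id: str                                   # unique identifier
    properties: dict[str, Any] = field(default_factory=dict)
    outgoing: list["Edge"] = field(default_factory=list)
    incoming: list["Edge"] = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    edge_id: str                                   # unique identifier
    rel_type: str
    start: Node                                    # every edge knows its start node
    end: Node                                      # ... and its end node
    properties: dict[str, Any] = field(default_factory=dict)


def connect(edge_id, rel_type, start, end, **properties):
    """Create a directed edge and register it on both of its end points."""
    edge = Edge(edge_id, rel_type, start, end, properties)
    start.outgoing.append(edge)
    end.incoming.append(edge)
    return edge


# Hypothetical sample data
alice = Node("n1", {"name": "Alice"})
bob = Node("n2", {"name": "Bob"})
knows = connect("e1", "KNOWS", alice, bob, since=2012)
```

Because each node keeps its own lists of incoming and outgoing edges, the neighbors of `alice` can be read directly from `alice.outgoing` without consulting any other structure.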

A schema does not have to be defined in advance; nodes, properties, relationships and new relationship types can be added or removed dynamically at any time. Note, however, that a relationship always requires a start node and an end node, and that a node can only be deleted once no relationships are attached to it. In contrast to an RDBMS, the database can be interpreted and used in a wide variety of ways, based on the network of relationships, without having to adjust the stored data.
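The two rules mentioned here (a relationship needs both end points, and a node can only be deleted once it has no relationships) can be illustrated with a small in-memory sketch. The class `PropertyGraph` and its method names are hypothetical and chosen for this example; real graph databases enforce these rules internally.

```python
class PropertyGraph:
    """Minimal schema-free graph: no node or edge types are declared up front."""

    def __init__(self):
        self.nodes = {}  # node id -> property dict
        self.edges = {}  # edge id -> (start id, end id, type, property dict)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = dict(properties)

    def add_edge(self, edge_id, start, end, rel_type, **properties):
        # A relationship always requires an existing start and end node.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("a relationship needs an existing start and end node")
        self.edges[edge_id] = (start, end, rel_type, dict(properties))

    def remove_edge(self, edge_id):
        del self.edges[edge_id]

    def remove_node(self, node_id):
        # A node may only be deleted when no relationship is attached to it.
        attached = [e for e, (s, t, _, _) in self.edges.items() if node_id in (s, t)]
        if attached:
            raise ValueError(f"node {node_id!r} still has relationships: {attached}")
        del self.nodes[node_id]


g = PropertyGraph()
g.add_node("alice", name="Alice")
g.add_node("acme", name="Acme", founded=1999)  # different properties, no schema change
g.add_edge("e1", "alice", "acme", "WORKS_AT", role="engineer")

try:
    g.remove_node("alice")      # refused: e1 is still attached
    blocked = False
except ValueError:
    blocked = True

g.remove_edge("e1")
g.remove_node("alice")          # succeeds once the relationship is gone
```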

Graph databases differ from the other, aggregate-oriented NoSQL models. The motivation behind those models was to build systems that scale easily to large clusters and work with relatively large data sets that are only weakly interconnected; with graph databases it is exactly the other way around. They pursue a different goal, which arose from the inability of relational databases (and of other NoSQL systems) to handle strongly linked data. They mostly run on single-server architectures (since graphs cannot be partitioned so easily) and hold rather small data sets that can, however, be interconnected in complex ways. (Sadalage / Fowler 2012: p. 26ff) They can also support full ACID-compliant transactions. For example, the Neo4j database only allows a graph to be changed inside a transaction (Neo4J 2015).

Advantages

When it comes to complex relationships within large amounts of data, graph databases have a clear advantage over RDBMS and also over the other NoSQL databases.

RDBMS are rather unsuitable for this, because finding and displaying connections can require many time-consuming JOINs, which become more expensive as more records are added and as the connections to be followed get deeper. Graph databases address this problem: they were developed and optimized specifically for it and come with corresponding algorithms.

Since relationships are stored as elements in their own right when the data is written, they do not have to be laboriously computed at query time but can be used directly. This means higher "costs" on writing, but it allows queries to traverse the connections (edges) at a consistently high speed, rather than resolving them via keys, regardless of the total amount of data, because only the connections relevant to the query are visited. This way of navigating through relationships is also called index-free adjacency, because it does not rely on global indices but on following the edges directly (Armbruster 2014).
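The idea of index-free adjacency can be approximated in Python by letting each node hold direct references to its neighbors, so that a traversal only ever touches nodes it can reach and never looks at the rest of the data. The following is a minimal sketch under that assumption; the names `Node`, `link` and `reachable_within` and the sample data are made up for the example, and real graph databases implement this at the storage level.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct references to adjacent nodes, no global lookup


def link(a, b):
    """Store the directed relationship a -> b once, at write time."""
    a.neighbors.append(b)


def reachable_within(start, max_hops):
    """Names of all nodes reachable from start in at most max_hops steps."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in node.neighbors:  # follow edges; unrelated nodes are never visited
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return {n.name for n in seen if n is not start}


alice, bob, carol, dave, erin, frank = (
    Node(n) for n in ["alice", "bob", "carol", "dave", "erin", "frank"]
)
link(alice, bob)
link(bob, carol)
link(carol, dave)
link(erin, frank)  # unrelated part of the graph, ignored when starting at alice

two_hops = reachable_within(alice, 2)
```

The work done by `reachable_within` depends on the number of edges it actually follows, not on the total number of nodes, which is the property the paragraph above describes.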

Another advantage shows up in the design phase, during modeling. Here one occasionally hears the motto "If you can whiteboard it, you can graph it". It means that a graph that can be drawn understandably on paper or on a whiteboard can be implemented as a database in the same form.

In addition, in contrast to most other NoSQL data models, graph databases can also offer ACID properties.

Disadvantages

The other NoSQL databases mainly distribute their data as aggregates according to a primary key. Graph databases, by contrast, have to break up their network of relationships in order to scale across a distributed architecture, and they have to provide corresponding operations for this. Sharding is not as simple as with the other systems and tends to cost performance rather than gain it, because these systems were designed for single-server architectures, on which traversal is faster than on distributed systems. (Sadalage / Fowler 2012: p. 119) If the capacity of one server is insufficient and the graph database has to be distributed, the graph must be partitioned into subgraphs; finding a meaningful place to cut it can prove difficult and should be examined in detail.
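Why the choice of cut matters can be shown with a small calculation: every edge whose two end points land on different servers turns a local traversal step into a network hop. The sketch below counts those edges for two partitionings of the same graph. The function `edge_cut`, the sample graph and both assignments are assumptions for illustration only.

```python
def edge_cut(edges, assignment):
    """Number of edges whose end points are assigned to different partitions."""
    return sum(1 for a, b in edges if assignment[a] != assignment[b])


# Hypothetical graph: two tightly connected triangles joined by one edge (c-d)
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]

# Partitioning that follows the structure of the graph
by_community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

# Partitioning that ignores the structure, as a key-based distribution might
by_key = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

cut_community = edge_cut(edges, by_community)  # 1 edge crosses servers
cut_key = edge_cut(edges, by_key)              # 5 edges cross servers
```

In this toy graph the structure-aware split crosses servers on one edge, while the key-based split does so on five of seven edges. In real data the communities are rarely this obvious, which is why the cut has to be examined in detail.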

Use

Their strengths lie in finding and representing networked relationships. They are therefore particularly suitable for scenarios in which the data contains networked relationships, for example in navigation systems, in social networks ("who knows whom through whom?"), in online shops for purchase recommendations ("customers who bought this product also bought…") or for delivering personalized advertising content (possibly including geodata on mobile devices), but also in fraud detection, where hidden relationships have to be found.
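The purchase recommendation case amounts to a two-step traversal: from a product to the customers who bought it, and from those customers to their other products. A minimal sketch, assuming purchases are available as a mapping from customer to bought products; the function `also_bought` and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical "BOUGHT" relationships: customer -> set of products
purchases = {
    "ann": {"laptop", "mouse"},
    "ben": {"laptop", "mouse", "dock"},
    "cat": {"laptop", "mouse"},
    "dan": {"phone"},
}


def also_bought(purchases, product):
    """Count how often other products appear alongside the given product."""
    counts = Counter()
    for basket in purchases.values():
        if product in basket:                   # step 1: customers who bought it
            counts.update(basket - {product})   # step 2: their other products
    return counts


recommendations = also_bought(purchases, "laptop")
ranked = recommendations.most_common()  # [('mouse', 3), ('dock', 1)]
```

In a graph database the same question would be expressed as a traversal over the relationships between customers and products rather than as a loop over all baskets.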

Representatives of this category are: Neo4j, InfiniteGraph, FlockDB.

Last modified: 2015/10/05 21:24 by