ArangoDB Use Case - Blog Brainhub.eu

April 11, 2018


We are a JavaScript software house which builds epic apps using the following on the backend:

    • NodeJS as the platform
    • Express or Koa as an HTTP library/framework
    • Various DBMS, depending on the particular need, mostly MongoDB and Redis
    • RabbitMQ for queuing
    • Consul for microservice registration
    • GitLab CI or CircleCI as a continuous integration tool
    • Google App Engine, Bluemix or our own servers for deployment

and on the frontend:

    • React
    • Redux
    • Redux Saga
    • SCSS





What kind of problem we have

We have a lot of data on many levels, which in a document model means many levels of nested documents. Moreover, we have to be able to operate directly on these nested documents (children, grandchildren, great-grandchildren, etc.).

We have to create an API not only for our frontend but also for external integrations. The user should be able to send a JSON schema. That schema is later used both to validate the data provided when creating or updating documents and to join documents from various collections.

An example of a simple JSON schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "profiles",
  "description": "Profile Schema",
  "type": "object",
  "properties": {
    "id": { "type": "string", "description": "ID" },
    "address": { "type": "string", "description": "Address" },
    "email": { "type": "string", "description": "E-mail address" },
    "firstname": { "type": "string", "description": "First name" },
    "lastname": { "type": "string", "description": "Last name" },
    "transactions": {
      "type": "array",
      "description": "List of transactions connected to the profile",
      "items": {
        "title": "transactions",
        "type": "object",
        "properties": {
          "id": { "type": "string", "description": "ID" },
          "orderTotal": { "type": "string", "description": "Total value of the transaction" },
          "invoices": {
            "type": "array",
            "description": "List of invoices related to the transaction",
            "items": {
              "title": "invoices",
              "type": "object",
              "properties": {
                "id": { "type": "string", "description": "ID" },
                "discountPercent": { "type": "string", "description": "Discount percent" },
                "itemNo": { "type": "string", "description": "Item ID" }
              }
            }
          }
        }
      }
    }
  }
}
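The post doesn't say which validator checks incoming data against such a schema. In practice, a full JSON Schema library such as Ajv would be the natural choice. As a sketch only, here is a hypothetical `validate` function that understands just the draft-04 keywords used in the example: `type` (`object`, `array`, `string`), `properties` and `items`. It returns a list of error paths.

```javascript
// Hypothetical, minimal validator: supports only the keywords used in the
// profile schema (type: object/array/string, properties, items) and
// ignores everything else, so it is NOT a full JSON Schema implementation.
function validate(schema, value, path = '$') {
  const errors = [];

  switch (schema.type) {
    case 'object':
      if (value === null || typeof value !== 'object' || Array.isArray(value)) {
        errors.push(`${path}: expected object`);
        break;
      }
      // Draft-04 "properties" does not make keys required, so absent keys pass.
      for (const [key, subSchema] of Object.entries(schema.properties || {})) {
        if (Object.prototype.hasOwnProperty.call(value, key)) {
          errors.push(...validate(subSchema, value[key], `${path}.${key}`));
        }
      }
      break;
    case 'array':
      if (!Array.isArray(value)) {
        errors.push(`${path}: expected array`);
        break;
      }
      if (schema.items) {
        value.forEach((item, i) => {
          errors.push(...validate(schema.items, item, `${path}[${i}]`));
        });
      }
      break;
    case 'string':
      if (typeof value !== 'string') errors.push(`${path}: expected string`);
      break;
    default:
      // Missing or unsupported "type": accept anything in this sketch.
      break;
  }

  return errors;
}

// A trimmed-down copy of the profile schema, enough to exercise nesting.
const profileSchema = {
  title: 'profiles',
  type: 'object',
  properties: {
    id: { type: 'string' },
    transactions: {
      type: 'array',
      items: {
        title: 'transactions',
        type: 'object',
        properties: { id: { type: 'string' }, orderTotal: { type: 'string' } },
      },
    },
  },
};

console.log(validate(profileSchema, { id: 'p1', transactions: [{ id: 't1', orderTotal: 99 }] }));
// [ '$.transactions[0].orderTotal: expected string' ]
```

Collecting every error path, instead of stopping at the first one, lets an API report all problems in a submitted document at once.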

 
The possible solutions are:

  • Document database with foreign keys inside the documents:
    • Operating on the data with many queries
    • Operating on it with a single query, if the DBMS permits it
  • Relational database:
    • Manual serialization/deserialization
    • ORM
  • Graph database
  • Multi-model document-graph database
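To make the first option concrete: with foreign keys, each nesting level lives in its own collection and points back to its parent. The sketch below splits one nested profile into `profiles`, `transactions` and `invoices` documents. The foreign-key naming convention (`<parentCollection>Id`, e.g. `profilesId`) and the `flatten` helper are hypothetical, chosen for illustration.

```javascript
// Hypothetical sketch: split one nested document into flat documents per
// collection. Each child gets a foreign key to its parent, named
// `<parentCollection>Id` (e.g. `profilesId`) by an assumed convention.
// childTree says which array properties are child collections,
// e.g. { transactions: { invoices: {} } }.
function flatten(doc, collection, childTree, out = {}, parent = null) {
  const flat = {};
  for (const [key, value] of Object.entries(doc)) {
    // Child arrays move to their own collections instead of staying nested.
    if (!Object.prototype.hasOwnProperty.call(childTree, key)) flat[key] = value;
  }
  if (parent) flat[`${parent.collection}Id`] = parent.id;
  (out[collection] = out[collection] || []).push(flat);

  for (const [childCollection, grandChildTree] of Object.entries(childTree)) {
    for (const child of doc[childCollection] || []) {
      flatten(child, childCollection, grandChildTree, out, { collection, id: doc.id });
    }
  }
  return out;
}

const profile = {
  id: 'p1',
  email: 'jane@example.com',
  transactions: [
    { id: 't1', orderTotal: '100', invoices: [{ id: 'i1', itemNo: 'A-1' }] },
    { id: 't2', orderTotal: '50', invoices: [] },
  ],
};

const collections = flatten(profile, 'profiles', { transactions: { invoices: {} } });
// collections.profiles     → [{ id: 'p1', email: 'jane@example.com' }]
// collections.transactions → [{ id: 't1', orderTotal: '100', profilesId: 'p1' },
//                             { id: 't2', orderTotal: '50', profilesId: 'p1' }]
// collections.invoices     → [{ id: 'i1', itemNo: 'A-1', transactionsId: 't1' }]
```

With this layout, a single invoice can be updated directly. Reading a whole profile back, however, means either one query per level or a single query with joins, which is the trade-off behind the first option above.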

What type of DBMS we expect

We wanted a DBMS that meets the following requirements:

  • Open source
  • High performance
  • Good support for JavaScript/NodeJS
  • Good community
  • ACID support (atomicity, consistency, isolation, durability)

Other useful features are:

  • Multi-model
  • Powerful query language

Potential DBMS to choose

We love open source solutions, so we eliminated proprietary DBMS such as Oracle, SQL Server and DB2, as well as MySQL because of licensing issues.

We compared many NoSQL DBMS, not only for this project but also to get an overview for other projects. The data is as of February 18, 2018:

[Comparison table of NoSQL DBMS, data as of February 18, 2018]

We took some DBMS from the top of the ranking above, but eliminated:

  • LevelDB – a great DBMS, but it is a key-value store with no document storage, which we need
  • PouchDB – it ranks higher than CouchDB, but we decided to consider CouchDB instead. PouchDB is designed mainly for synchronizing data between the frontend and the backend, which our app doesn't need, whereas CouchDB is typically used on the backend
  • Memcached – caches data like Redis does, but after more detailed research, Redis looked like the undisputed winner over Memcached. Moreover, we were looking for a document-based DBMS, so we didn't want to consider any key-value DBMS other than Redis
  • Firebase Realtime Database – not open source. Overall, it didn't seem good enough to justify the risk of running into bugs we couldn't fix ourselves
  • Neo4j – designed primarily for Java rather than NodeJS, and not document-based

 
Moreover, among SQL DBMS, we included only PostgreSQL in our research, because it can store JSON data, which we need.

Based on the table above and other research, the DBMS that seemed to suit our needs best were:

  • MongoDB – the most popular document DBMS:
    • Advantages:
      • NodeJS developers know this DBMS best
      • Great clustering options
    • Disadvantages:
      • Limited joins – very important for our data. Joins are available only through the Aggregation Framework ($lookup), which is very limited and hard to debug
      • No multi-document transactions – we need ACID (though they are planned for MongoDB 4.0)
      • No expressive, dedicated query language – queries are written only as JSON
      • Issues can be reported only in Jira, not on GitHub, which is less friendly for the open-source community

  • CouchDB:
    • Advantages:
    • Disadvantages:
      • Missing transactions – we need ACID

  • RethinkDB:
    • Advantages:
    • Disadvantages:
      • Relatively low performance

  • ArangoDB:
    • Advantages:
      • Great community – very helpful team on Slack
      • Multi-model
      • AQL (ArangoDB Query Language)
      • Transactions
      • Sharding and replication
    • Disadvantages:
      • Still not very popular, so it's practically impossible to find developers experienced in ArangoDB
      • Relatively slow writes

  • Redis:
    • Advantages:
      • Super fast
      • Popular – among non-relational databases, only MongoDB is more popular
      • The most stars on GitHub among all DBMS
    • Disadvantages:
      • Not designed for durable persistence (by default, everything is kept in RAM)
      • No query language
      • Very limited queries (only basic commands such as GET or INCR)

  • PostgreSQL:
    • Advantages:
      • Very popular
      • Supports SQL
      • Supports many kinds of data, such as multi-dimensional arrays and user-defined types
      • Proven to be very mature in production
    • Disadvantages:
      • JavaScript cannot be run on the PostgreSQL server out of the box
      • Though user-defined functions and data types are very useful, their syntax feels dated, like something from the era of Fortran or Pascal

Why we chose ArangoDB

ArangoDB feels like MongoDB (the DBMS we have the most experience with) with some extra features. Of course, it lacks some MongoDB features, such as the Aggregation Framework. In practice, though, that feature isn't so much missing as replaced by something more user-friendly: AQL with joins.

Like MongoDB, ArangoDB provides clustering, though ArangoDB clustering has not yet been proven to work stably in production. One of the key factors in our choice was the very active community: the project has a very low ratio of open issues to total issues. Moreover, anyone can easily join the ArangoDB Slack, where the support team is very helpful, and the team also gives adequate answers on Stack Overflow.

Another reason was that ArangoDB is a multi-model DBMS. This is useful because we were planning to extend our documents with graphs.

How we have used ArangoDB

We have used the following ArangoDB features:

  • AQL
  • Transactions
  • Admin UI (only for debugging)

Features we may use in the future:

 
In the ArangoDB shell, we found a very useful feature that doesn't exist in MongoDB. To learn AQL, you don't need any data in collections, because you can type something like this:

db._query('for i in [1,2,3] return i * i')

 
Because one of the requirements was to assemble data from many collections according to the provided JSON schema, we were looking for an ArangoDB query builder.

What we found was rather unpopular and lacked many features, so we created our own ArangoDB query builder.

We put it behind an abstract interface, so that if we replaced ArangoDB with another DBMS, only the inner implementation would have to change.
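The post doesn't show the interface itself, so what follows is only a hypothetical sketch of the idea. Application code talks to a small, DBMS-agnostic object, and the ArangoDB-specific part is one interchangeable implementation behind it. The method names (`findById`, `findByParentId`, `save`) are invented for illustration. The in-memory implementation is synchronous only for brevity; a driver-backed one would return Promises.

```javascript
// Hypothetical DBMS-agnostic interface: the only methods the app may call.
const REQUIRED_METHODS = ['findById', 'findByParentId', 'save'];

function createRepository(implementation) {
  // Fail fast if an implementation does not provide the whole interface.
  for (const name of REQUIRED_METHODS) {
    if (typeof implementation[name] !== 'function') {
      throw new TypeError(`Repository implementation is missing ${name}()`);
    }
  }
  // Expose only the interface methods, so callers cannot come to depend
  // on DBMS-specific extras of one particular implementation.
  const repository = {};
  for (const name of REQUIRED_METHODS) {
    repository[name] = (...args) => implementation[name](...args);
  }
  return Object.freeze(repository);
}

// One interchangeable implementation, kept in memory (e.g. for unit tests).
// An ArangoDB-backed one would implement the same methods with AQL queries.
function createInMemoryImplementation() {
  const store = new Map();
  const docs = (name) => {
    if (!store.has(name)) store.set(name, []);
    return store.get(name);
  };
  return {
    findById: (collection, id) => docs(collection).find((d) => d.id === id) ?? null,
    findByParentId: (collection, parentField, parentId) =>
      docs(collection).filter((d) => d[parentField] === parentId),
    save: (collection, doc) => {
      docs(collection).push({ ...doc });
      return doc;
    },
    dropEverything: () => store.clear(), // extra method, NOT part of the interface
  };
}

const repo = createRepository(createInMemoryImplementation());
repo.save('transactions', { id: 't1', profilesId: 'p1' });
console.log(repo.findByParentId('transactions', 'profilesId', 'p1').length); // 1
console.log('dropEverything' in repo); // false
```

Swapping the DBMS then means writing a new implementation object. Code that only uses `repo` stays unchanged.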

Example code from our query builder:

const QueryBuilder = () => {
  const priv = {
    // private fields and methods, e.g. toString(), which assembles
    // the AQL string from queryTree
    mainCollectionName: null,
    queryTree: {},
    bindings: {},
  };

  const pub = {
    getQueryTree() {
      return priv.queryTree;
    },

    fromSchema(schema) {
      priv.mainCollectionName = schema.title;
      priv.queryTree.loop = `FOR ${schema.title}Item IN ${schema.title}`;
      priv.queryTree.sorting = `SORT ${schema.title}Item.id`;
      // some more code
      return pub;
    },

    withLimit(offset, count) {
      // some code
    },

    byId(id) {
      // some code
    },

    byIdentifiers(identifiers) {
      // some code
    },

    byParentId(collectionName, id) {
      // some code
    },

    toAQL() {
      // the query string and its bind parameters
      return [
        priv.toString(),
        priv.bindings,
      ];
    },
  };

  return pub;
};

export default QueryBuilder;
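The snippet above elides how nested levels are joined. One possible approach, sketched here and not taken from the builder's actual code, is to walk the JSON schema and emit one correlated AQL subquery per nested array, merging each result into its parent with `MERGE()`. The sketch assumes each child document stores its parent's `id` in a field named `<parentCollection>Id`. That convention and the `schemaToAQL` name are both hypothetical.

```javascript
// Hypothetical sketch: generate one AQL query that joins the nested
// collections described by a JSON schema, using correlated subqueries.
// Assumed convention: a child stores its parent's id in `<parentCollection>Id`.
const IDENTIFIER = /^[A-Za-z][A-Za-z0-9_]*$/;

function assertIdentifier(name) {
  // Names are interpolated into the query text here for readability
  // (AQL also offers collection bind parameters, @@name), so only plain
  // identifiers are allowed.
  if (!IDENTIFIER.test(name)) throw new Error(`Unsafe identifier: ${name}`);
  return name;
}

function schemaToAQL(schema, parentName = null, indent = '') {
  const name = assertIdentifier(schema.title);
  const item = `${name}Item`;
  const lines = [`${indent}FOR ${item} IN ${name}`];

  if (parentName) {
    lines.push(`${indent}  FILTER ${item}.${parentName}Id == ${parentName}Item.id`);
  }
  lines.push(`${indent}  SORT ${item}.id`);

  // Each array-of-objects property becomes a subquery merged into the parent.
  const joins = Object.entries(schema.properties || {})
    .filter(([, prop]) => prop.type === 'array' && prop.items && prop.items.title)
    .map(([key, prop]) => {
      const sub = schemaToAQL(prop.items, name, `${indent}      `);
      return `${indent}    ${assertIdentifier(key)}: (\n${sub}\n${indent}    )`;
    });

  lines.push(joins.length > 0
    ? `${indent}  RETURN MERGE(${item}, {\n${joins.join(',\n')}\n${indent}  })`
    : `${indent}  RETURN ${item}`);
  return lines.join('\n');
}

// Trimmed-down version of the profile schema: profiles → transactions → invoices.
const nestedSchema = {
  title: 'profiles',
  properties: {
    email: { type: 'string' },
    transactions: {
      type: 'array',
      items: {
        title: 'transactions',
        properties: {
          invoices: { type: 'array', items: { title: 'invoices', properties: {} } },
        },
      },
    },
  },
};

console.log(schemaToAQL(nestedSchema));
// FOR profilesItem IN profiles
//   SORT profilesItem.id
//   RETURN MERGE(profilesItem, {
//     transactions: (
//       FOR transactionsItem IN transactions
//         FILTER transactionsItem.profilesId == profilesItem.id
//         SORT transactionsItem.id
//         RETURN MERGE(transactionsItem, {
//           invoices: (
//             FOR invoicesItem IN invoices
//               FILTER invoicesItem.transactionsId == transactionsItem.id
//               SORT invoicesItem.id
//               RETURN invoicesItem
//           )
//         })
//     )
//   })
```

Because the whole tree is fetched in one query, the API can return a fully nested profile without one round trip per level.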

 
We have also written some JavaScript code that runs on the ArangoDB server, and we use it for most transactions.
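The post doesn't show this server-side code, so the following is a sketch of the general mechanism only. ArangoDB's JavaScript transactions take a function that runs inside the server, plus the collections it writes to, and either commit all of its writes or none. The helper below only builds the request body for the HTTP endpoint `POST /_api/transaction` and sends nothing. `buildAddTransactionRequest`, the `profilesId` field and `lastTransactionKey` are hypothetical names.

```javascript
// Hypothetical helper: builds the body for ArangoDB's JavaScript transaction
// endpoint (POST /_api/transaction). It does not talk to a server.
function buildAddTransactionRequest(profileKey, transaction) {
  // This function is serialized to a string and executed inside ArangoDB,
  // so it must not rely on closures: everything it needs arrives via `params`.
  const action = function (params) {
    const db = require('@arangodb').db;
    // Both writes belong to one transaction: if anything throws,
    // ArangoDB rolls back all changes made by this function.
    const meta = db.transactions.save(
      Object.assign({}, params.transaction, { profilesId: params.profileKey })
    );
    db.profiles.update(params.profileKey, { lastTransactionKey: meta._key });
    return meta._key;
  };

  return {
    // Collections the action writes to are declared up front.
    collections: { write: ['profiles', 'transactions'] },
    action: String(action),
    params: { profileKey, transaction },
  };
}

const body = buildAddTransactionRequest('p1', { id: 't3', orderTotal: '120' });
console.log(JSON.stringify(body.collections)); // {"write":["profiles","transactions"]}
```

Keeping the action self-contained and passing data through `params` is what lets the same function be shipped to the server as text.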

