The purpose of this post is to give an overview of the background, technical architecture, and features of VoltDB. Traditional RDBMS products such as Oracle and Sybase were designed decades ago, and their architecture reflects the needs of their time. Michael Stonebraker and his team from MIT took a closer look at current database needs, such as high throughput and low latency for query execution, high scalability, high availability, durability, real-time analytics, and data integration, and architected VoltDB accordingly. Big Data requirements have spurred innovation in data storage and retrieval through NoSQL database solutions such as Cassandra and MongoDB. VoltDB distinguishes itself from them as a NewSQL solution: it is a true RDBMS with support for ACID properties.
VoltDB has a highly scalable, distributed, shared-nothing architecture. Its design exploits current hardware and software trends such as multi-core processors, large memory sizes, and MapReduce-style distributed query execution. In traditional RDBMS products, throughput and latency are affected by factors such as logging, latching, locking, and buffer management. By serializing processing, VoltDB avoids these issues. Both scale-up (bigger memory, more CPU cores) and scale-out architectures are supported. In a scale-out architecture, data is stored in partitions residing on different nodes, but the organization of the data is transparent to the application. For queries whose data does not reside in a single partition, MapReduce-style query plans are used, which speeds up query execution by an order of magnitude. VoltDB comes with access libraries for various languages and interfaces, such as Java, C#, Python, C++, PHP, HTTP/JSON, Ruby, and Node.js.
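To make the partitioning idea concrete, here is a minimal Python sketch of key-based routing, serial execution within a partition, and a MapReduce-style scatter/gather for queries that span partitions. It is a toy model, not VoltDB code: the names `Partition`, `Cluster`, `run_single`, `run_multi`, and `put` are hypothetical, and the CRC32 hash used to pick a partition is an assumption made for simplicity.

```python
import zlib


class Partition:
    """One partition: owns a slice of the rows and runs work serially."""

    def __init__(self):
        self.rows = {}  # partitioning key -> row value

    def execute(self, txn):
        # Only one transaction runs against this partition at a time,
        # so the transaction body needs no locks or latches.
        return txn(self.rows)


class Cluster:
    """Hypothetical stand-in for a set of partitions spread over nodes."""

    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: a stable CRC32 hash of the key picks the partition.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.partitions)

    def run_single(self, key, txn):
        # Single-partition transaction: routed to exactly one partition.
        return self.partitions[self.partition_for(key)].execute(txn)

    def run_multi(self, map_fn, reduce_fn):
        # Multi-partition query: run map_fn on every partition,
        # then combine the partial results with reduce_fn.
        partials = [p.execute(map_fn) for p in self.partitions]
        return reduce_fn(partials)


def put(key, value):
    """Build a transaction that stores one row."""
    def txn(rows):
        rows[key] = value
    return txn


cluster = Cluster(num_partitions=4)
for account_id in range(10):
    cluster.run_single(account_id, put(account_id, account_id * 10))

# Scatter the sum to each partition, then gather the partial sums.
total = cluster.run_multi(lambda rows: sum(rows.values()), sum)
print(total)  # prints 450
```

In this sketch the partitions are visited one after another in a single thread. The point of the real design is that each partition can do its share of the work independently and in parallel, while the caller never needs to know which partition holds which row.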
* Stores data in memory in partitions (which are based on CPU cores), keeping the data and its associated processing constructs together.
* Processes data sequentially (one transaction at a time) and hence avoids multi-threading issues around logging and latching. Traditional databases write to disk twice, once for the write-ahead log and once to commit the data, which reduces throughput.
* To achieve high throughput and low latency on SQL operations, VoltDB stores data in memory and by default partitions it on the primary key, or on a key specified by the developer or designer.
* Recommends the use of stored procedures, each of which is treated as a single transaction that either commits or rolls back.
* Supports JDBC, so ad hoc queries can be performed. According to VoltDB, this is a high priority on its product roadmap for continued enhancement. VoltDB encourages the use of stored procedures to speed up transactions.
* Makes it easier to create materialized views, which are refreshed automatically when the underlying table data changes. VoltDB claims a performance hit of 15% in the worst case.
* Durability is achieved through continuous snapshots and command logging, which write data to persistent storage.
* High availability is ensured by what is called K-safety (a K-safety value of 1 means 2 copies of each partition). Automatic network fault detection and live node rejoin are some other features.
* Static data tables (or any table) can be replicated across partitions for faster joins.
* Automatically coordinates fetching data from multiple partitions, and the architecture is designed to keep throughput at its maximum.
* Partitions can be “resynced” automatically (or by manual trigger) after any node in the cluster fails.
* Excellent documentation and support (very responsive and truthful)
* Supports real-time analytics, so it is well suited to business intelligence and fraud detection applications.
* It is a Cloudera-certified technology.
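The K-safety arithmetic mentioned in the list can be sketched in a few lines of Python: with a K-safety value of K, each partition has K + 1 copies, so the data stays available as long as no partition loses all of its copies. This is an illustration only. The function names `place_replicas` and `survives` are hypothetical, and the round-robin placement is an assumption, not a description of how VoltDB actually assigns copies to nodes.

```python
def place_replicas(num_partitions, nodes, k):
    """Assign each partition to k + 1 distinct nodes (round-robin assumption)."""
    if k + 1 > len(nodes):
        raise ValueError("need at least k + 1 nodes to hold k + 1 copies")
    return {
        p: [nodes[(p + r) % len(nodes)] for r in range(k + 1)]
        for p in range(num_partitions)
    }


def survives(placement, failed_nodes):
    """Return True if every partition still has at least one live copy."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in copies)
        for copies in placement.values()
    )


nodes = ["n1", "n2", "n3"]
placement = place_replicas(num_partitions=4, nodes=nodes, k=1)

# With K = 1 every partition has 2 copies, so any single node can fail.
print(all(survives(placement, [n]) for n in nodes))  # prints True

# Losing both nodes that hold partition 0 makes that partition unavailable.
print(survives(placement, ["n1", "n2"]))  # prints False
```

The design point is that K is a guarantee about the worst case: a cluster may happen to survive more than K failures if they hit the right nodes, but it is only guaranteed to survive K.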
A few things to consider and remember:
* Doesn’t have a Hibernate dialect, so Java developers trying to use VoltDB must be aware of this. There is a Java API, however.
* A service window is required to add or remove cluster members (according to VoltDB, eliminating this is a high priority).
* Custom design and development are required to mirror existing SQL databases in order to take advantage of high-throughput querying. This is also an ongoing priority, according to VoltDB.
There is no doubt that VoltDB is a high-performance, innovative RDBMS with high availability and scalability. This post summarizes various aspects of its architecture and capabilities, which you can use as a guide to explore further and see how VoltDB can help in your environment. VoltDB also provides tools for Hadoop integration, which can be used to derive intelligence from your data and detect anomalies in it.
VoltDB technical architecture documentation