Hadoop vs. NoSQL vs. SQL vs. NewSQL: A Tale of Database Evolution

Database technology has shifted significantly since the 1970s, when relational database management systems (RDBMS) emerged and, with SQL as their query language, went on to dominate the market. Hierarchical databases still have a presence on mainframes, but relational databases have proven to be a reliable choice, especially where data integrity matters. The ACID properties (atomicity, consistency, isolation, and durability) are what allow relational databases to keep data consistent and accurate.
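
To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the transfer() helper are made up for illustration; the point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return all balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on this data returns False, and both balances stay exactly as they were, even though both UPDATE statements had already run inside the transaction.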

However, the rise of the web and the growing demand for very high transaction rates have exposed the limitations of relational databases. Their mechanisms were not designed for the volume of data and requests that online platforms such as Amazon generate. This is where NoSQL databases come into play: they offer an alternative mechanism that scales out more easily, but at the cost of relaxing the ACID guarantees.

Some NoSQL vendors have made significant progress on this trade-off by building on a concept called eventual consistency: replicas may briefly disagree after a write, but they converge on the same value once the update has propagated. NewSQL, on the other hand, uses modern programming languages and techniques to build a relational database without its traditional scaling disadvantages. This is how many NewSQL vendors started, aiming to provide a better solution for large-scale data storage and analysis.
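
Eventual consistency is easier to see in a toy model. The sketch below is not how any particular NoSQL product works; the Replica and Cluster classes, the single logical clock, and the last-write-wins rule are all simplifying assumptions. A write lands on one replica immediately and reaches the others only when sync() runs.

```python
class Replica:
    """One copy of the data; keeps the newest version of each key."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        current = self.data.get(key)
        # Last-write-wins: only accept updates newer than what we hold.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    """Hypothetical cluster with asynchronous replication between replicas."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # queued updates: (replica_index, key, version, value)
        self.clock = 0     # single logical clock, an assumption of this toy model

    def write(self, key, value, at=0):
        """Apply the write on one replica now and queue it for the others."""
        self.clock += 1
        self.replicas[at].apply(key, self.clock, value)
        for i in range(len(self.replicas)):
            if i != at:
                self.pending.append((i, key, self.clock, value))

    def sync(self):
        """Deliver all queued updates; afterwards the replicas agree."""
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending.clear()
```

Between write() and sync(), a read from another replica can return a stale value or nothing at all. That window is the price paid for not coordinating every write across all nodes.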

A Different Species: Hadoop

Hadoop is a different kind of system from traditional databases. At its core it is a distributed file system (HDFS) paired with a batch processing framework, rather than a database. Its roots lie in the development of internet search engines, which needed a scalable and fault-tolerant way to handle large volumes of data. Although its companion projects (HBase, MapReduce, Hive, Pig, and ZooKeeper) have given it database-like capabilities, Hadoop remains a distributed file system designed for batch processing and data analysis.
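
The batch model behind MapReduce can be sketched in a few lines of plain Python. This runs in a single process and does not use Hadoop's API; the function names are illustrative. It only shows the three steps that a real cluster spreads across many machines: map, shuffle, and reduce.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Run map, shuffle, and reduce in sequence (single process, no cluster)."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Add up the counts collected for one word."""
    return sum(values)
```

For example, run_job(["Hadoop stores files", "hadoop runs batch jobs"], word_mapper, sum_reducer) counts "hadoop" twice and every other word once. Because each mapper call looks at one record and each reducer call at one key, the work can be split across nodes without shared state.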

A Real-World Example: A Video Game Company

Imagine a video game company that has been in business for a decade and has recently surged in popularity. The company's customer information is stored in a SQL Server database, which has served it well so far. However, as players move to playing the game online, the database struggles to keep up with the rate of data updates, and players experience lag. The rapid growth of the user base has led to significant spending on hardware and software upgrades, none of which has fixed the problem.

Splitting the Online User Base

To address these issues, the IT department decides to split off the online user base and run the online games on NoSQL and NewSQL systems. After careful evaluation, it chooses Couchbase, a document-oriented NoSQL database similar to MongoDB, and VoltDB, a NewSQL database. Couchbase is an open-source database with an integrated cache and automatic propagation of data between nodes. VoltDB, on the other hand, keeps the ACID guarantees of relational databases, is fault-tolerant and scalable, and uses a shared-nothing architecture.
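
What lets such systems scale out is that each node owns its own slice of the data. The sketch below illustrates that idea with simple hash partitioning. It is not the actual partitioning scheme of Couchbase or VoltDB; ShardedStore, partition_for, and the key format are hypothetical.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (same result on every run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class ShardedStore:
    """Toy key-value store where each partition holds a disjoint set of keys."""

    def __init__(self, num_partitions):
        # Each dict stands in for a node with its own private memory.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _owner(self, key):
        return self.partitions[partition_for(key, len(self.partitions))]

    def put(self, key, value):
        self._owner(key)[key] = value

    def get(self, key):
        return self._owner(key).get(key)
```

A call such as store.put("player:42", {"level": 7}) touches exactly one partition, so updates for different players can proceed on different nodes without contending for the same memory or disk.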

Hadoop Debut

With online operations finally running smoothly, the company wants to analyze its data to find the best markets for its product. To do this, it needs to combine the user data from the SQL Server data warehouse with the data from the online games, and then run an analysis report. This is where Hadoop makes its debut: the company builds a Hadoop system to merge the two data sources. The analysis report is then generated with the open-source R language and its MapReduce modules.
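
Merging two sources in a MapReduce job is commonly done with a reduce-side join. The company in the story uses R for this; the sketch below uses Python purely as an illustration, and the record layouts (player_id, country, minutes) are invented. Records from both sources are tagged, grouped by player, and then combined per group.

```python
from collections import defaultdict


def playtime_by_country(customers, sessions):
    """Join customer rows with game sessions and total play minutes per country."""
    # Map: key every record by player id and tag it with its source.
    tagged = [(c["player_id"], ("customer", c)) for c in customers]
    tagged += [(s["player_id"], ("session", s)) for s in sessions]

    # Shuffle: bring all records for the same player together.
    groups = defaultdict(list)
    for player_id, tagged_record in tagged:
        groups[player_id].append(tagged_record)

    # Reduce: combine each player's customer row with their sessions.
    totals = defaultdict(int)
    for player_id, records in groups.items():
        countries = [rec["country"] for tag, rec in records if tag == "customer"]
        minutes = sum(rec["minutes"] for tag, rec in records if tag == "session")
        if countries:  # sessions without a matching customer row are dropped
            totals[countries[0]] += minutes
    return dict(totals)
```

Given customers from the warehouse and session rows from the game databases, the result is a small table of total play time per country, which is the kind of summary a market analysis report would start from.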

By understanding how database technology has evolved and where each system is strong or weak, companies can make informed decisions about which solution best suits their needs. Whether it is a relational database, NoSQL, NewSQL, or Hadoop, each system has its own advantages and disadvantages, and choosing the right one for the workload can make a real difference to the success of a business.