The “NoSQL” way of storing data

Most IT professionals are by now most likely aware of NoSQL and have perhaps even heard of a few differences between it and the relational database model. The usual question is what the real-life applications for NoSQL solutions are, so let’s dig right into it.

In short, NoSQL stands for Not-Only-SQL, and what it typically gives you is horizontal scalability, fast reads and writes, ease of use and no fixed schema. Of course, different NoSQL solutions make different trade-offs (higher performance in exchange for weaker ACID guarantees being the general idea in most cases).

As a developer selecting a NoSQL solution, you should think about what you need from your data store and pick one whose features address your main concerns. In some cases you might be willing to sacrifice full data consistency in order to improve write speed. In others you will prioritize read times over write times, or perhaps the other way around. Quoting one of the larger NoSQL beneficiaries: “There is an app for that!”

The one question which usually arises when discussing NoSQL is: “That’s all great and fun, but where could I really use this in real life?” Well, let’s look at a few examples.

Event Stores:

Imagine you have an ordering system which stores customer details and product information in an MSSQL database and also uses a NoSQL event store. Since there are no schema limitations, you can easily archive your events, including all the relevant data as their properties, and have them serialized for you automatically. Consider your event class:

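using System;
using System.Collections.Generic;

// Product and Customer are the application's own domain types (not shown here).
internal class OrderProcessedEvent
{
    public DateTime Created { get; set; }
    public string Description { get; set; }
    public IEnumerable<Product> Products { get; set; }
    public Customer CustomerData { get; set; }
}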

Let’s assume we are storing an “Order Processed” event. Now, each time you create and store a new instance of this event, all the product and customer data will be serialized “as is”, that is, as it stands in your SQL database at that point in time, and stored right into your NoSQL database along with other events. A clear benefit of this is that if a customer later changes his/her invoicing address, or any other data for that matter, this event’s details will not change, as it holds serialized data with no relation to the outside world or other entities. This makes it perfect for reporting purposes, since the data always reflects the single point in time at which the event was created!
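To make the “frozen in time” idea a bit more tangible, here is a minimal sketch. It assumes System.Text.Json as the serializer and uses made-up Product and Customer classes (the real ones are not shown in this post); an actual event store or document database would do the serialization and storage for you, so treat this purely as an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-ins for the domain types; their properties are assumptions.
public class Product
{
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}

public class Customer
{
    public string Name { get; set; } = "";
    public string InvoicingAddress { get; set; } = "";
}

public class OrderProcessedEvent
{
    public DateTime Created { get; set; }
    public string Description { get; set; } = "";
    public List<Product> Products { get; set; } = new List<Product>();
    public Customer CustomerData { get; set; } = new Customer();
}

public static class EventStoreSketch
{
    // Serializes the whole event graph; the resulting string plays the role
    // of the document that the NoSQL store would persist.
    public static string Snapshot(OrderProcessedEvent e) => JsonSerializer.Serialize(e);

    public static void Main()
    {
        var customer = new Customer { Name = "Jane Doe", InvoicingAddress = "1 Old Street" };
        var processed = new OrderProcessedEvent
        {
            Created = DateTime.UtcNow,
            Description = "Order processed",
            Products = new List<Product> { new Product { Name = "Dog collar", Price = 9.99m } },
            CustomerData = customer
        };

        string stored = Snapshot(processed);

        // Later on, the customer changes the invoicing address in the relational database.
        customer.InvoicingAddress = "2 New Avenue";

        // The stored event still carries the address as it was when the order was processed.
        var restored = JsonSerializer.Deserialize<OrderProcessedEvent>(stored);
        Console.WriteLine(restored?.CustomerData.InvoicingAddress); // prints "1 Old Street"
    }
}
```

The point is simply that the stored document is a copy, not a reference: nothing that happens to the customer row afterwards can reach into it.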

Large scale social networking:

In this example the goal is to create a worldwide social network, let’s call it “The Bark Book”, in which users will store profiles of animals so that other people can Comment/Like/Bark on them. The obvious design problem here is: “How on earth would one implement a repository for all of those hundreds of millions of potential profiles?” Think of all the funky magic you would have to incorporate into your project when trying to design this kind of massive data processing solution using a standard RDBMS.

But before you do that and possibly change your profession to something less absorbing, like a firefighter in California, first consider what it really is that you expect from your repository and what you care less about:

  1. We expect people to request a lot more content than they actually create, possibly at a 10:1 ratio initially.
  2. We need easy and quick replication of data over hundreds of servers spread across tens of countries.
  3. We don’t need confirmation straight away that data has been stored at all the replica destinations.

All this is easily achievable with most NoSQL solutions. One of them is Cassandra (originally developed at Facebook), which lets you specify exactly how reads and writes behave across your set of replicated nodes (replicas). Using Cassandra you can say:

I want each of my writes to be confirmed by the first node I write to

Which means other nodes will eventually get the replicated data (but I don’t care about this now, I just want this to be done quickly).

I also want my reads to take into account the data written to all the replicas.

This means that every read asks all the replicas and returns the most recent version it finds. Once a user has sent a post and a single node has confirmed it, readers will already see that post while it is still being replicated (asynchronously) to the other nodes; the price is that a read has to wait for every replica to answer. This gives us some consistency, just enough for what we need, and our database management system no longer has to care about all of this ACID nonsense; the resulting performance gain is priceless in this particular scenario.
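It is worth spelling out what these two settings actually do. The sketch below is a toy, in-memory model of a single replicated value and is not Cassandra’s API: ToyCluster, WriteOne, ReadOne, ReadAll and Replicate are all made-up names. It assumes a write is acknowledged by one replica and copied to the others later, and that a read at “all” asks every replica and keeps the newest version.

```csharp
using System;
using System.Collections.Generic;

// Toy model of one replicated value. Hypothetical names, not a real driver API.
public class ToyCluster
{
    private readonly long[] versions;   // 0 means the replica holds nothing yet
    private readonly string[] values;
    private readonly Queue<(int Replica, long Version, string Value)> pending =
        new Queue<(int Replica, long Version, string Value)>();
    private long clock;

    public ToyCluster(int replicaCount)
    {
        versions = new long[replicaCount];
        values = new string[replicaCount];
    }

    // Write confirmed by one node: store on replica 0 (a simplification, any
    // replica could play this role) and queue the copies for the others.
    public void WriteOne(string value)
    {
        long version = ++clock;
        Store(0, version, value);
        for (int i = 1; i < versions.Length; i++)
            pending.Enqueue((i, version, value));
    }

    // Read from a single replica: may be stale, or null if nothing arrived yet.
    public string ReadOne(int replica) => values[replica];

    // Read from all replicas: return the value with the highest version.
    public string ReadAll()
    {
        int newest = 0;
        for (int i = 1; i < versions.Length; i++)
            if (versions[i] > versions[newest]) newest = i;
        return values[newest];
    }

    // Simulates the asynchronous replication catching up.
    public void Replicate()
    {
        while (pending.Count > 0)
        {
            var (replica, version, value) = pending.Dequeue();
            Store(replica, version, value);
        }
    }

    // Keeps only the newest version on a replica.
    private void Store(int replica, long version, string value)
    {
        if (version > versions[replica])
        {
            versions[replica] = version;
            values[replica] = value;
        }
    }
}

public static class ConsistencyDemo
{
    public static void Main()
    {
        var cluster = new ToyCluster(3);
        cluster.WriteOne("Rex learned a new trick");

        Console.WriteLine(cluster.ReadOne(2) ?? "(nothing yet)"); // replica 2 has not caught up
        Console.WriteLine(cluster.ReadAll());                     // newest version, found on replica 0

        cluster.Replicate();
        Console.WriteLine(cluster.ReadOne(2));                    // replica 2 now has the post
    }
}
```

In this model a read at “all” sees the post as soon as one replica has it, because the set of replicas read always overlaps the one replica written (with N replicas, R = N and W = 1 gives R + W > N). A read from a single lagging replica can still miss it until replication catches up.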

Some other examples to consider for NoSQL: Caching, Reporting, Video-Sharing, Messaging and all the other non-critical but performance and data hungry projects.

In order to pick the appropriate NoSQL solution out of the many out there, you will need to check their detailed specs, as some offer features not available in others, usually as a result of trade-offs which limit or omit something else. In general, however, there are a lot of solutions likely to meet your design requirements out of the box. To narrow your search, here are some of the more popular, battle-tested systems:

Key/Value Databases

Voldemort, Redis, Scalaris

Columnar Databases

HBase, Hypertable, Cassandra

Document Databases

MongoDB, CouchDB

Graph Databases

InfoGrid, Neo4j

To sum up, NoSQL is not a replacement for a system where ACID (Atomicity, Consistency, Isolation, Durability) is a requirement, hence managing crucial business information is not a valid use case. Nevertheless, if your user data consists of read-later documents like videos, invoices or other non-crucial data which needs to be quickly accessible, but does not necessarily have to be available straight away as the result of a transactionally confirmed write, NoSQL is your friend.

If you decide you can use this and want to give the NoSQL approach a chance, here you can find a detailed comparison of popular NoSQL solutions; just pick whichever suits your needs and join the “BIG(Data)” boys.

Feel free to add comments below.
