A Beginner’s Guide to NoSQL
Traditionally, SQL is used with relational databases, which store data in tables with a fixed schema. That model tends to run into certain issues, for example, complex up-front design due to needing to avoid data duplication. Another big issue is that relational databases are usually scaled vertically, by moving to a bigger server, and can therefore become quite expensive as a project expands.
NoSQL emerged as an alternative and counter to SQL. Interestingly enough, non-relational databases have existed since the late 1960s and early 1970s, but it wasn’t until the early 2000s that the whole thing took off. You can mostly thank Amazon and Google for helping get NoSQL off the ground with Dynamo and Bigtable respectively.
One of the biggest features of NoSQL, and the one that tends to attract the most attention, is that NoSQL doesn’t require a fixed schema. As such, data is not stored in a relational data model, but instead can be structured to suit its purpose.
Advantages and Disadvantages of NoSQL
There are several advantages to NoSQL for programmers. For starters, the whole system is much more programmer-friendly. The inclusion of simple APIs in nearly every language means that a programmer can relatively easily jump on and start working with NoSQL.
We touched on this above, but since NoSQL doesn’t require a schema, it doesn’t need a large amount of complex pre-design; things can be changed on the fly to suit the needs of the project as it develops. This is great if your project is going to go through several iterations or alterations as you move along the timeline and you need that extra flexibility.
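To make the idea concrete, here is a minimal sketch in plain Python that stands in for a schemaless collection. The `users` list, the field names and the `twitter_handle` helper are made up for illustration; a real NoSQL database would expose its own API for this.

```python
# A hypothetical "collection": just a list of dictionaries (documents).
users = [
    {"name": "Ada", "email": "ada@example.com"},
]

# A later iteration of the project needs a new field. There is no
# migration step; new documents simply carry the extra field.
users.append({"name": "Grace", "email": "grace@example.com", "twitter": "@grace"})


def twitter_handle(user: dict) -> str:
    """Read the new field while tolerating older documents that lack it."""
    return user.get("twitter", "(none)")


handles = [twitter_handle(u) for u in users]
print(handles)
```

The trade-off is that the checks a schema would normally perform move into application code, as the `get` call with a default shows.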
Aside from that, one very important advantage is the savings in cost, not only from not needing to hire a good designer for the schema but also because NoSQL is horizontally scalable. With SQL, the usual way to upgrade is to buy more and more expensive equipment, but with NoSQL, you can add another shard on an ordinary machine and that is often all you need in terms of upgrading. So not only is it straightforward, it’s cheap.
Finally, one of the biggest advantages is that NoSQL often performs much better than SQL for the access patterns it was designed around. This is particularly the case if you’re using large volumes of data. Since a NoSQL database typically keeps related information together in one record, a query doesn’t need to be assembled from several tables as it often does with SQL.
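As a rough illustration of what using another shard means, the sketch below routes each key to one of several in-memory dictionaries by hashing it. `shard_for` and `ShardedStore` are hypothetical names invented for this example, and each dictionary stands in for what would be a separate server.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy store: each dict below stands in for a separate machine."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=3)
store.put("user:1", "Ada")
store.put("user:2", "Grace")
print(store.get("user:1"))
```

Note that with plain modulo hashing like this, changing the number of shards moves most keys to a different shard. Production systems generally use more careful schemes, such as consistent hashing, to limit that reshuffling.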
Of course, NoSQL has some disadvantages too. For starters, even though it’s been around since the 70s, interest in it has only really existed for about 10-15 years now. As such, NoSQL isn’t as mature as SQL is and so there isn’t as much widespread support and information on it. Similarly, it’s relatively easy to find an SQL expert, whereas it’s much more difficult to find a NoSQL expert.
Making the last point above even more complex is that NoSQL isn’t so much a database as it is a design philosophy, and in fact encompasses nearly a dozen different types of data models. The reason there are so many is that NoSQL is meant to be very specialized to its use-case, and therefore, you might often end up using several databases. In fact, you may still need to use SQL, since NoSQL is not meant to be a replacement at all.
Finally, a more minor issue: NoSQL databases tend to be truly gargantuan. Since data duplication is largely tolerated in NoSQL, database sizes can balloon to massive proportions. That being said, modern storage is so cheap per TB that you don’t really need to worry about this issue as much; it’s just something to keep in mind.
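The link between duplication, size and read speed can be sketched with a toy example. Everything here (the `customers` and `orders_*` structures and the two lookup functions) is invented for illustration and only mimics the two storage styles in memory.

```python
import json

# Relational style: the customer is stored once and referenced by id.
customers = {1: {"name": "Ada", "city": "London"}}
orders_normalized = [
    {"order_id": 100, "customer_id": 1, "item": "keyboard"},
    {"order_id": 101, "customer_id": 1, "item": "mouse"},
]

# Typical NoSQL style: every order carries its own copy of the customer.
orders_denormalized = [
    {
        "order_id": o["order_id"],
        "item": o["item"],
        "customer": dict(customers[o["customer_id"]]),
    }
    for o in orders_normalized
]


def lookup_normalized(order_id: int):
    order = next(o for o in orders_normalized if o["order_id"] == order_id)
    customer = customers[order["customer_id"]]  # second lookup, like a join
    return order["item"], customer["name"]


def lookup_denormalized(order_id: int):
    order = next(o for o in orders_denormalized if o["order_id"] == order_id)
    return order["item"], order["customer"]["name"]  # everything in one record


# Serialized size as a crude proxy for storage footprint.
normalized_size = len(json.dumps(customers)) + len(json.dumps(orders_normalized))
denormalized_size = len(json.dumps(orders_denormalized))
print(normalized_size, denormalized_size)
```

Both lookups return the same answer, but the denormalized version reads a single record while taking more space, and the gap grows with every extra order that repeats the same customer.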
Types of NoSQL Databases
Document stores
One of the issues with SQL databases, at least nowadays, is that the applications around them rely a lot on XML and JSON. Those documents have to be mapped onto relational tables, so the two get inextricably tied together and both end up being held back by the inefficiency of the arrangement.
With a document store, the lack of a fixed schema means that you don’t have to store data relationally, so an XML or JSON document can be stored as it is. Because the database is built around this specific data model, it can optimize for it and extract metadata from the documents, which gives a better overall experience. Interestingly enough, there are also XML-specific document stores.
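A small sketch of what querying documents looks like, using only Python’s standard `json` module. The `articles` collection and the `get_path` and `find` helpers are hypothetical; real document stores provide their own query languages.

```python
import json

# A JSON document is stored whole, nesting included, rather than being
# split across several tables.
raw = """
{"title": "Intro to NoSQL",
 "author": {"name": "Ada", "country": "UK"},
 "tags": ["database", "nosql"]}
"""
articles = [
    json.loads(raw),
    {"title": "Graphs 101", "author": {"name": "Grace", "country": "US"}},
]


def get_path(doc: dict, path: str):
    """Follow a dotted path such as 'author.name' through nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: list, path: str, value) -> list:
    """Return every document whose field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]


uk_titles = [doc["title"] for doc in find(articles, "author.country", "UK")]
print(uk_titles)
```

Notice that the second document has no `tags` field at all, and the query still works.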
Hierarchical
Based on relevance, this data model uses a parent-child or tree relationship to describe data. Data is stored as a record with a pointer that identifies its location and can be cross-referenced with other records. There are several situations where a hierarchical structure could be called for:
- Website pages
- Map addresses
- File/folder structures and explorers
- Comments with threads (Reddit, Twitter, etc.)
- Warehouse storage and inventory systems
Of course, there’s no reason why you can’t do all this in SQL, but if you’re familiar with it, then you know that modeling a hierarchical database in SQL is problematic, if not downright exhausting. Using NoSQL is straightforward and much, much easier.
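As an illustration of the parent-child idea, here is a minimal sketch of a comment thread in which each record points at its parent. The `comments` data and the `render` function are made up for this example.

```python
from collections import defaultdict

# Each record carries a pointer (here, the parent's id) to its parent.
comments = [
    {"id": 1, "parent": None, "text": "Great article"},
    {"id": 2, "parent": 1, "text": "Agreed"},
    {"id": 3, "parent": 1, "text": "Not sure about that"},
    {"id": 4, "parent": 3, "text": "Why not?"},
]

# Index the children of every record once, so walking the tree is cheap.
children = defaultdict(list)
for comment in comments:
    children[comment["parent"]].append(comment)


def render(parent_id=None, depth: int = 0) -> list:
    """Return the thread as indented lines, depth-first from the root."""
    lines = []
    for comment in children.get(parent_id, []):
        lines.append("  " * depth + comment["text"])
        lines.extend(render(comment["id"], depth + 1))
    return lines


print("\n".join(render()))
```

The same shape works for folders and files or for pages within a website: only the payload of each record changes.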
Graph
Graph data models, which can also be referred to as network data models, are made so that the relationships between data are just as important as the data itself. More specifically, data is handled as nodes and relationships, with each node being a data entity and each relationship describing how a set of nodes is linked.
Since it’s built to show how one piece of data is related to another, this type of data model is really well suited to highly connected information, such as social networks or route maps, that you would naturally draw as a graph, as the name suggests.
Why is there a whole data model just for graph information? We live in a world with absolute tons of data, and finding discrete information can be difficult. Using a graph can present information that human eyes and brains can consume much faster than just data lists. Furthermore, by refining the data that is shown, and increasing the number of relationships shown, you can truly get some great insight into pretty much any data.
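The nodes-and-relationships idea can be sketched in a few lines of Python. The names below (`nodes`, `relationships`, `neighbors`, `reachable`) and the sample data are invented for illustration; graph databases such as Neo4j have their own query languages for this kind of traversal.

```python
from collections import deque

# Nodes are the data entities; relationships say how two nodes are linked.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "dave": {"kind": "person"},
}
relationships = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "dave"),
]


def neighbors(node: str, rel_type: str) -> list:
    """Nodes reached from `node` by one relationship of the given type."""
    return [dst for src, rel, dst in relationships if src == node and rel == rel_type]


def reachable(start: str, rel_type: str, max_hops: int) -> list:
    """Breadth-first search: everything within `max_hops` relationships."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, hops + 1))
    return found


print(reachable("alice", "FOLLOWS", 2))
```

A question like "who is within two hops of alice" is a short traversal here, whereas in a relational model it would usually take one self-join per hop.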
Key-value store
Popularized by Amazon, the key-value store data model is made for managing, storing and retrieving associative arrays, that is, collections of values that are each looked up by a unique key. More importantly, this type of data model is purpose-built for high-volume and high-value applications. For example, it’s no coincidence that Amazon championed this data model, since it’s perfect for the Amazon store, which houses millions of pieces of data and requires access to said data many thousands of times a second.
One thing that you might also notice is that the key-value store is actually a larger umbrella under which other data models exist. For example, some graph data models use key-values internally, using pointers to create a relationship between different sets of records. Also, since both the keys and the values can essentially be any data you want, this data model is extremely flexible.
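In its simplest form, the interface of a key-value store looks like the sketch below. `KeyValueStore` is a hypothetical in-memory class written for this article; real products such as Redis or DynamoDB add persistence, networking and replication on top of the same basic operations.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: put, get and delete by key."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value) -> None:
        self._data[key] = value

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        """Remove a key; return True if it existed."""
        return self._data.pop(key, None) is not None


kv = KeyValueStore()
# Values can be any shape; the store only ever looks at the key.
kv.put("cart:42", ["keyboard", "mouse"])
kv.put("session:abc", {"user": "ada", "logged_in": True})
print(kv.get("cart:42"))
```

Because every operation goes straight to one key, there is nothing to plan or join, which is what makes this model a good fit for very high request rates.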
Object-oriented
Object-oriented is a bit quirky in that it isn’t necessarily a standalone database, as much as it is a database management system.
This specific data model is used to store information as objects, and since this is such a broad definition, pretty much any type of data can be an object. This can be first and last names, addresses, GPS coordinates, counts of different things such as comments, and more. Using this data model/philosophy, data is made transparent to the application, which is really useful when dealing with large volumes of data.
You’ll often find this data model used for research, web-scale and as support for object-oriented programming.
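A very rough sketch of the idea follows, assuming an in-memory store. The `Address`, `Person` and `ObjectStore` classes are hypothetical and leave out persistence entirely, which a real object database would handle for you.

```python
from dataclasses import dataclass


@dataclass
class Address:
    city: str
    country: str


@dataclass
class Person:
    first_name: str
    last_name: str
    address: Address  # objects can reference other objects directly
    comment_count: int = 0


class ObjectStore:
    """Toy store that keeps whole objects and hands them back unchanged."""

    def __init__(self) -> None:
        self._objects = {}
        self._next_id = 1

    def save(self, obj) -> int:
        """Store an object and return the id assigned to it."""
        object_id = self._next_id
        self._objects[object_id] = obj
        self._next_id += 1
        return object_id

    def load(self, object_id: int):
        return self._objects[object_id]

    def find(self, predicate) -> list:
        """Return every stored object for which predicate(obj) is true."""
        return [obj for obj in self._objects.values() if predicate(obj)]


object_store = ObjectStore()
ada = Person("Ada", "Lovelace", Address("London", "UK"), comment_count=3)
ada_id = object_store.save(ada)
print(object_store.load(ada_id).address.city)
```

The point of the sketch is that the program works with the same objects it stores, with no translation step into rows and columns in between.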
Column-oriented
Essentially, instead of organizing data by rows, column-oriented data models organize it by columns. These columns are then grouped into families. A column family can hold a practically unlimited number of columns, and reads and writes are done by column as well.
This data model is great for fast search and access and is similarly great for data aggregation. There are actually quite a few popular use-cases:
- Social media platforms
- Counter maintenance systems
- Content management systems (CMS)
- Systems with heavy write requests
We should also mention at this point that if you need complex querying, this data model is best avoided.
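The sketch below gives a simplified, in-memory picture of columns grouped into families. The sample `rows`, the `families` mapping and both helper functions are made up for illustration, and real column-oriented databases differ considerably in how they lay data out on disk.

```python
# The same data as it would look in a row-oriented layout.
rows = [
    {"user": "ada", "likes": 10, "shares": 2},
    {"user": "grace", "likes": 7, "shares": 5},
    {"user": "linus", "likes": 3, "shares": 1},
]

# Hypothetical grouping of columns into families.
families = {"profile": ["user"], "counters": ["likes", "shares"]}


def to_column_families(rows: list, families: dict) -> dict:
    """Store each column as its own list, grouped under its family."""
    return {
        family: {name: [row[name] for row in rows] for name in names}
        for family, names in families.items()
    }


def read_row(store: dict, index: int) -> dict:
    """Rebuilding one full row has to touch every column in every family."""
    return {
        name: values[index]
        for columns in store.values()
        for name, values in columns.items()
    }


column_store = to_column_families(rows, families)

# Aggregating one column reads only that column's values.
total_likes = sum(column_store["counters"]["likes"])
print(total_likes)
```

Summing `likes` never looks at `user` or `shares`, which is why aggregation is cheap, while `read_row` shows the cost on the other side: anything that needs many columns at once has to stitch them back together.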
Triple Stores
Triple stores model data as triples in the form subject-predicate-object. Thankfully, this one is straightforward: the subject and object function as they typically do in a sentence, and the predicate describes the relationship between the two.
This type of data model is great for webs of data, and its development was, somewhat ironically, helped along by Sir Tim Berners‐Lee and the World Wide Web Consortium (W3C).
We’ll also mention at this point that triple stores work similarly to network models, except that they are focused on semantic queries.
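Here is a minimal sketch of the subject-predicate-object idea, with a pattern-matching query in which any position can be left open. The sample `triples` and the `match` function are illustrative only; real triple stores are queried with dedicated languages rather than a Python function.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("Tim Berners-Lee", "invented", "World Wide Web"),
    ("Tim Berners-Lee", "founded", "W3C"),
    ("W3C", "develops", "web standards"),
}


def match(subject=None, predicate=None, obj=None) -> list:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "What did Tim Berners-Lee found?"
founded = [o for _, _, o in match(subject="Tim Berners-Lee", predicate="founded")]
print(founded)
```

Because the object of one triple can be the subject of another (as with `W3C` here), chaining such queries walks the data much like a graph traversal does.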
How NoSQL Databases Work
NoSQL is more of an overall design philosophy than it is a specific database that you can download and use. In fact, there are a huge number of NoSQL databases you can draw from, each with its own specialization and use-case. Therefore, there is no specific answer to how NoSQL databases work without asking “well, which NoSQL database?”
That being said, we can say in a very general sense that NoSQL databases don’t work with schemas, or at least, they don’t need them to function. They’re also generally more flexible because you can add essentially any kind of data whenever you want, or change the shape of your data somewhat on the fly.
Examples of NoSQL databases
There are probably upwards of 50 NoSQL databases for you to pick from, each having a different use case. That being said, here are some popular ones:
- HBase
- Cassandra
- Neo4j
- FlockDB
- Redis
- Amazon DynamoDB
- MongoDB
- Couchbase
- Hypertable
- Riak
How to learn NoSQL
For starters, I’d suggest checking out edX’s NoSQL offerings, with structured learning for things like DynamoDB and Big Data systems.
Of course, the best way is to pick a specific NoSQL database and search for tutorials that way. For example, Tutorialspoint has a great MongoDB course. If you’re interested in Neo4j, there’s actually a video tutorial series on their website. These are also some great video tutorials to check out if you’re just starting:
- How to Choose the Right Database?
- An Introduction To NoSQL Databases
- Introduction to NoSQL Databases for Beginners
The thing a lot of people tend to get confused about when it comes to NoSQL is what exactly it is. Hopefully, you’ve seen that NoSQL is not really comparable to SQL, either as a database (which the former isn’t) or in philosophy. NoSQL is incredibly versatile and is the scalpel to SQL’s shovel, but the truth is, sometimes you do need that shovel.
All information in this article is provided to you “as is” and represents the views of the authors. TechChannel cannot guarantee or imply absolute reliability, serviceability or function of the information herein.