
Big Data and NoSQL

Traditional relational databases have long dominated web development, but NoSQL is increasingly a viable alternative. The scalability and flexible structure of NoSQL databases make them a strong candidate for cloud-based environments, or for cases where large volumes of unstructured data need to be stored.



The increased flexibility of NoSQL schemas allows for a more Agile approach to both database and code development.

The database structure can change and evolve in real time with your application development. This reduces the need to thoroughly plan the complete structure of an application’s database up front, before development begins.

Traditional relational databases can also struggle to scale when it comes to big data. More resources can be added to the server hosting the database, but eventually the limit of server upgrades is reached. It is possible to distribute a traditional database across multiple servers, but there is a significant overhead involved in setting up the cluster and in adding new servers when needed.

NoSQL, on the other hand, was designed with scalability in mind. It’s optimised to allow data to be spread across multiple nodes with minimal performance loss. The architecture for distribution is already engineered and it’s a simple task to add more servers or cloud instances to the pool.
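As a rough illustration of how records might be spread across nodes, the sketch below hashes each key and uses the result to pick a node. The node names and the `node_for_key` helper are hypothetical, and this is only one simple partitioning scheme, not how any particular NoSQL product does it.

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick a node for a key using simple hash-modulo partitioning."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Hypothetical pool of servers or cloud instances.
nodes = ["node-a", "node-b", "node-c"]

# Every key maps deterministically to exactly one node in the pool.
placement = {key: node_for_key(key, nodes) for key in ("film:1", "film:2", "film:3")}
for key, node in placement.items():
    print(key, "->", node)
```

With plain modulo partitioning, adding a node changes where most keys live, so distributed stores generally use more careful schemes (consistent hashing is one well-known option) to keep rebalancing cheap.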

Types of NoSQL

There is no single method of providing a NoSQL database and, as the approach has evolved, the following classifications have emerged as the most widely used:

  • Key-value store - Data is stored as a collection of key-value pairs, where each key can only be used once in a collection.
  • Document store - The values for an entity are encoded into a document using a standard format such as JSON, and the document is stored against a unique key.
  • Graph - Used when the data represents a known number of relationships between the entities. It is useful for applications such as travel maps.
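The three classifications can be sketched with plain in-memory Python structures. This is only an illustration of the data models, with made-up keys and place names, and not a real database client.

```python
import json
from collections import deque

# Key-value store: each key appears once and maps to a single value.
key_value_store = {}
key_value_store["session:42"] = "logged-in"
key_value_store["session:42"] = "logged-out"  # same key, so the value is replaced

# Document store: a unique key maps to a JSON-encoded document.
document_store = {}
document_store["film:1"] = json.dumps({"title": "Film one", "rating": 4})
film = json.loads(document_store["film:1"])

# Graph: entities are nodes and relationships are edges (adjacency list).
routes = {
    "London": ["Paris", "Dublin"],
    "Paris": ["London", "Rome"],
    "Dublin": ["London"],
    "Rome": ["Paris"],
}


def reachable(graph, start):
    """Return every place reachable from start, using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        place = queue.popleft()
        for neighbour in graph.get(place, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


print(sorted(reachable(routes, "Dublin")))
```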

Cloud Datastore

There are many providers for NoSQL databases, with some offering multiple services for each classification. One of these is Google Cloud Datastore, which provides a document store database that can be accessed via a RESTful API. The advantages of using this service are:

  • The RESTful API can be consumed by many languages and a single datastore can be used by multiple applications.
  • The documents are encoded using the highly supported JSON format.
  • A query engine provides methods ‘out of the box’ to filter data with the use of several operators.
  • It has a dashboard interface to view statistics and back up your data.
  • Developers just need to configure an entity kind, then they can add new documents to the store without having to plan the properties or the data types for those properties.

Together, these advantages mean the datastore can be deployed rapidly and accessed quickly by the application.
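To give a feel for filtering documents with operators, here is a minimal in-memory sketch. The `query` function, the operator table and the sample films are assumptions made for illustration; they are not the Cloud Datastore API, which has its own request format.

```python
import operator

# Comparison operators supported by this hypothetical query helper.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def query(entities, filters):
    """Return the entities that satisfy every (property, operator, value) filter."""
    results = []
    for entity in entities:
        # An entity without the property cannot match a filter on it.
        if all(
            prop in entity and OPERATORS[op](entity[prop], value)
            for prop, op, value in filters
        ):
            results.append(entity)
    return results


films = [
    {"title": "Film one", "rating": 4},
    {"title": "Film two", "rating": 2},
    {"title": "Film three"},  # no rating stored for this document
]

well_rated = query(films, [("rating", ">=", 3)])
print([film["title"] for film in well_rated])
```

Because documents of the same kind need not share the same properties, the helper treats a missing property as a non-match rather than an error.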

Example of Cloud Datastore vs Relational Database

In this example, I’ll compare the process of setting up a Films table in a relational database with the equivalent in Cloud Datastore.

Relational Database

For a relational database, a developer will typically create a new table with a schema similar to the following:

Field Name [Data Type]: Description

  • id [Integer]: This will be the unique identifier for the film and will most likely be set to auto-increment.
  • title [varchar(255)]: This will be the title of the film.
  • description [text]: This will contain the short description of the film.

Now that the table is set up, the developer can begin coding the application.

But what if, after the application has gone live, it is decided that two new fields are required for films? For example, we now need to store the film rating and the director's name.

In this case, we’ll need to edit the schema and update the queries, as well as handle the legacy records that do not have these fields filled in.
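The relational workflow can be sketched with Python's built-in sqlite3 module and an in-memory database. SQLite is used here only because it needs no setup; the table name, sample rows and exact column types are assumptions, and other database engines differ in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Initial schema, planned before development begins.
conn.execute(
    "CREATE TABLE films ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " title VARCHAR(255) NOT NULL,"
    " description TEXT)"
)
conn.execute(
    "INSERT INTO films (title, description) VALUES (?, ?)",
    ("Film one", "A short description."),
)

# After go-live: the schema has to be altered to hold the two new fields.
conn.execute("ALTER TABLE films ADD COLUMN director VARCHAR(255)")
conn.execute("ALTER TABLE films ADD COLUMN rating INTEGER")

# Queries must be updated to read and write the new columns.
conn.execute(
    "INSERT INTO films (title, description, director, rating) VALUES (?, ?, ?, ?)",
    ("Film two", "Another description.", "John Smith", 4),
)

# Legacy rows come back with NULL (None in Python) for the new columns.
rows = conn.execute("SELECT title, director, rating FROM films ORDER BY id").fetchall()
for row in rows:
    print(row)
```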

Cloud Datastore

With the service set up, the developer just needs to create an entity kind of ‘Film’. We don’t need to specify what the fields will be, or whether they should be text or integer format. When creating a new film record, we just need to pass the JSON-encoded object to the create endpoint of the REST API. An example object could be:

{
  "title": "Film one",
  "description": "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
}

Now, when we need to update the application to add the film rating and director’s name, we simply add them to the object we pass to the endpoint, without having to change any database schema.

{
  "title": "Film one",
  "description": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
  "director": "John Smith",
  "rating": 4
}
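The same idea can be sketched in a few lines of Python. The `EntityStore` class below is a hypothetical in-memory stand-in for a schemaless store, not the Cloud Datastore client or its REST API; it only shows that documents of one kind can carry different properties, and that the application decides how to treat older documents.

```python
import json


class EntityStore:
    """Hypothetical in-memory stand-in for a schemaless document store."""

    def __init__(self):
        self._kinds = {}
        self._next_id = 1

    def create(self, kind, payload_json):
        """Decode a JSON document and store it under a new unique id."""
        entity = json.loads(payload_json)
        entity_id = self._next_id
        self._next_id += 1
        self._kinds.setdefault(kind, {})[entity_id] = entity
        return entity_id

    def get(self, kind, entity_id):
        return self._kinds[kind][entity_id]


store = EntityStore()

# Original document shape.
old_id = store.create("Film", json.dumps({"title": "Film one", "description": "Lorem ipsum"}))

# Later shape with two extra properties; no schema change is needed.
new_id = store.create(
    "Film",
    json.dumps({"title": "Film two", "description": "Lorem ipsum", "director": "John Smith", "rating": 4}),
)


def describe(film):
    """Format a film, falling back to defaults for older documents."""
    director = film.get("director", "unknown director")
    rating = film.get("rating", "unrated")
    return f"{film['title']} ({director}, {rating})"


print(describe(store.get("Film", old_id)))
print(describe(store.get("Film", new_id)))
```

Older documents are not rewritten when the shape changes, so the reading code still has to cope with properties that are missing, which is what the `film.get(...)` defaults do here.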

So, for application development in the rapidly evolving world of big data, a NoSQL database is a flexible alternative to traditional relational databases. The ability to scale not only the volume of records, but also the data held as part of each record, helps to future-proof your system against unforeseen changes in scope. As a result, you can be much more Agile in your approach to development and iterate on future system improvements.


Nathan Powis
Head of Development, IE Digital