

Distributed Databases, NoSQL Systems and Big Data

Binayak Niraula | Tue Jan 20 2026

Table of Contents

  1. Distributed Database
  2. Data Fragmentation
  3. Data Replication
  4. Data Allocation
  5. Types of Distributed Database Systems
  6. Distributed Database Architectures
  7. Introduction to NoSQL System
  8. RDBMS vs NoSQL
  9. The CAP Theorem
  10. Big Data
  11. MapReduce
  12. Transparency in Distributed Databases

Distributed Database

A distributed database (DDB) is a collection of multiple logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes the distribution transparent to users. Together, the DDB and the DDBMS are called a distributed database system (DDBS).

A distributed database is a database system spread across multiple computers or nodes connected by a computer network. The data is stored in a distributed manner, with each node holding a subset of the overall data. The nodes can be located at the same physical site or be geographically dispersed.

The primary goal of a distributed database is to provide scalability, fault tolerance, and improved performance compared to centralized databases. A distributed database can handle large data volumes and support higher levels of concurrent users.


Fig: Distributed Database

Characteristics of Distributed Database System

  • A collection of logically related shared data
  • The data is split into a number of fragments
  • Fragments may be replicated
  • Fragments/replicas are allocated to sites
  • The sites are linked by a communications network
  • The data at each site is under the control of a DBMS
  • The DBMS at each site can handle local applications autonomously
  • Each DBMS participates in at least one global application

Components of Distributed Database System

The distributed database system consists of several essential components:

  • Computer workstations or remote devices (Sites or nodes) that form the computer network system. The distributed database system must be independent of the computer system hardware.

  • Network hardware and software components that reside in each workstation or device. The network components allow all sites to interact and exchange data.

  • Communications media that carry the data from one node to another. The DBMS must be communications media independent, that is, it must be able to support several types of communications media.

  • The transaction processor, which is the software component found in each computer or device that requests data. The transaction processor receives and processes the application's data requests, both remote and local. The transaction processor is also known as the application processor or the transaction manager.

  • The data processor, which is the software component residing on each computer or device that stores and retrieves data located at the site. The data processor is also known as the data manager; it may even be a centralized DBMS.

Advantages of DDBS

    • Reflects organizational structure: Many organizations are naturally distributed over several locations.

    • Improved shareability and local autonomy: Users at one site can access data stored on other sites. Data can be placed at the site close to the users who normally use the data. In this way, users have local control of data and they can consequently establish and enforce local policies regarding the use of this data.

    • Improved availability: A computer failure terminates the operation of the DBMS in the centralized DBMS. However, a failure at one site of the DDBMS or failure of a communication link making some sites inaccessible does not make the entire system inoperable. Distributed DBMS are designed to continue to function despite such failures. If a single node fails, the system may be able to reroute the failed node's request to another site.

    • Improved reliability: Because the data may be replicated so that it exists at more than one site, the failure of a node or communication link does not necessarily make the data inaccessible.

    • Improved performance: As the data is located near the site of greatest demand and given the inherent parallelism of distributed DBMS, speed of database access may be better than that achievable for a remote centralized database. Furthermore, since each site handles only a part of the entire database, there may not be the same contention for CPU and I/O services as characterized by centralized DBMS.

    • Economics: The potential cost saving occurs where databases are geographically remote and the applications require access to distributed data.

    • Modular growth: It is much easier to handle expansion. New sites can be added to the network without affecting the operation of other sites.

    Disadvantages of DDBS

    • Complexity: It is more complex than a centralized DBMS. The fact that data can be replicated also adds an extra level of complexity to the distributed DBMS. If the software does not handle the data replication adequately, there will be degradation in availability, reliability, and performance compared to the centralized system. And those advantages will become disadvantages.

    • Cost: Increased complexity means that we can expect the procurement and maintenance costs for a distributed DBMS to be higher than those for a centralized DBMS. A distributed DBMS also requires additional hardware to establish a network between the sites.

    • Security: In a centralized system, access to data can be easily controlled. However, in a distributed DBMS, not only does access to replicated data have to be controlled in multiple locations, but the network itself has to be made secure.

    • Integrity control more difficult: In a distributed DBMS, the communication and processing costs that are required for enforcing integrity constraints may be prohibitive.

    • Lack of standards: Although distributed DBMSs depend on effective communication, we are only now starting to see the appearance of standard communication and data access protocols. This lack of standards has significantly limited the potential of distributed DBMSs.

    • Lack of experience: General-purpose distributed DBMSs have not been widely accepted, although many of the protocols and problems are well understood. We do not yet have the same level of experience in industry as we have with centralized DBMSs.

    • Database design more complex: The design of a distributed database has to take account of fragmentation of data, allocation of fragments to specific sites, and data replication.


    Data Fragmentation

    Data fragmentation refers to the process of dividing a database's tables or relations into smaller, more manageable pieces called fragments. These fragments are distributed across different nodes (locations) in the distributed database system. The goal of fragmentation is to improve performance and efficiency by distributing data across multiple nodes. It allows for parallelism and reduces the amount of data that needs to be transmitted over the network for query processing. It also provides better disaster recovery.

    Types of Fragmentation

    Fragmentation can be of three types:

    • Vertical fragmentation
    • Horizontal fragmentation
    • Hybrid fragmentation

    Fragmentation should be done in such a way that the original table can be reconstructed from the fragments whenever required. This requirement is called reconstructiveness.

    Advantages of Fragmentation

    • Since data is stored close to the site of users, efficiency of the database system is increased.
    • Local query optimization techniques are sufficient for most queries since data is locally available.
    • Since irrelevant data is not available at the site, security and privacy of the database system can be maintained.

    Disadvantages of Fragmentation

    • When data from different fragments are required, the access expense may be very high.
    • In case of recursive fragmentation, the job of reconstruction will need expensive techniques.
    • Lack of backup copies of data in different sites may render the database ineffective in case of failure of a site.

    Vertical Fragmentation

    Vertical fragmentation design involves dividing a table into subsets of columns or attributes. In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to maintain reconstructiveness, each fragment should contain the primary key fields of the table. Vertical fragmentation can be used to enforce privacy of data.


    Fig: Vertical Fragmentation

    Example

    Let us consider a college management system that keeps records of all registered students in a Student table with the following schema.

    Student

    Stu_id | Stu_name | Stu_address | Dept_id
    10     | Maya     | Palpa       | 1
    11     | Abhin    | KTM         | 2
    12     | Arnav    | KTM         | 1

    Now, the address details are maintained in the admin section. In this case, the designer will fragment the database as follows:

    CREATE TABLE stu_address AS 
    SELECT Stu_id, Stu_address 
    FROM Student;
    

    By executing the above query, we get the following result:

    Stu_address

    Stu_id | Stu_address
    10     | Palpa
    11     | KTM
    12     | KTM

    Vertical Data Fragmentation of Student Table
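
    Reconstructiveness for vertical fragments is achieved by joining on the shared primary key. As a minimal sketch, assume a hypothetical second fragment stu_main that holds the remaining columns; the original Student table can then be rebuilt as follows:

    CREATE TABLE stu_main AS
    SELECT Stu_id, Stu_name, Dept_id
    FROM Student;

    -- Rebuild the original table by joining the fragments on the primary key
    SELECT m.Stu_id, m.Stu_name, a.Stu_address, m.Dept_id
    FROM stu_main m
    JOIN stu_address a ON m.Stu_id = a.Stu_id;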


    Horizontal Fragmentation

    Horizontal fragmentation involves dividing a table into subsets of rows or tuples. Each subset is stored on a different node. Horizontal fragmentation groups the tuples of a table in accordance to values of one or more fields. Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal fragment must have all columns of the original base table.


    Fig: Horizontal Fragmentation

    Example

    In the Student schema, if the details of all students of department 1 need to be maintained at the respective faculty, the designer will horizontally fragment the database as follows:

    CREATE TABLE Department AS
    SELECT *
    FROM Student
    WHERE Dept_id = 1;
    

    By executing the above query, we get the following result:

    Department

    Stu_id | Stu_name | Stu_address | Dept_id
    10     | Maya     | Palpa       | 1
    12     | Arnav    | KTM         | 1

    Horizontal Fragmentation of Student table
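
    For horizontal fragments, reconstructiveness requires the fragments to be disjoint and to cover all rows together, so the original table is rebuilt with a union. A minimal sketch, assuming a hypothetical complementary fragment for the remaining departments:

    CREATE TABLE Other_departments AS
    SELECT *
    FROM Student
    WHERE Dept_id <> 1;

    -- Rebuild the original table; UNION ALL is safe because the fragments are disjoint
    SELECT * FROM Department
    UNION ALL
    SELECT * FROM Other_departments;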


    Hybrid Fragmentation

    Hybrid fragmentation is a combination of both vertical and horizontal fragmentation techniques. It involves dividing a table into subsets of rows and columns, which are then distributed across different nodes. This is the most flexible fragmentation technique since it generates fragments with minimal extraneous information. However, reconstruction of the original table is often an expensive task.


    Fig: Hybrid Fragmentation

    Approaches to Hybrid Fragmentation

    Hybrid fragmentation can be done in two alternative ways:

    1. It first generates a set of horizontal fragments, then generates vertical fragments from one or more of the horizontal fragments.

    2. It first generates a set of vertical fragments, then generates horizontal fragments from one or more of the vertical fragments.

    Example

    CREATE TABLE Hybrid AS
    SELECT Stu_id, Stu_name
    FROM Student
    WHERE Stu_id = 12;
    

    By executing the above query, we get the following result:

    Hybrid

    Stu_id | Stu_name
    12     | Arnav

    Hybrid Fragmentation of Student table
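
    Reconstructing hybrid fragments combines both operations: the vertical pieces are joined first, and the horizontal pieces are then unioned. A sketch, assuming hypothetical complementary fragments Hybrid_cols (the remaining columns for Stu_id = 12) and Hybrid_rows (all columns for the other students):

    -- Join the vertical pieces for Stu_id = 12, then union with the remaining rows
    SELECT h.Stu_id, h.Stu_name, c.Stu_address, c.Dept_id
    FROM Hybrid h
    JOIN Hybrid_cols c ON h.Stu_id = c.Stu_id
    UNION ALL
    SELECT Stu_id, Stu_name, Stu_address, Dept_id
    FROM Hybrid_rows;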


    Data Replication

    Data replication is the process of creating and maintaining multiple copies of the same data across different nodes in a distributed system. Each copy of the data is known as a replica. It is an important mechanism because it enables organizations to provide users with access to current data where and when they need it. It is intended to increase the fault tolerance of a system such that if one replica fails, another can continue to serve queries or requests.

    Purpose

    System Availability: A distributed database system removes single points of failure by replicating data so that data items are accessible from multiple sites. Consequently, even when some sites are down, data may be accessible from other sites.

    Performance: Replication enables us to locate the data closer to the access point, thereby localizing most of the access. That contributes to reduction in response time.

    Scalability: As the system grows geographically, in the number of sites, and consequently in the number of access requests, replication provides a way to support this growth with acceptable response times.

    Application requirements: Replication may be dictated by the applications which may wish to maintain multiple data copies as part of their operational specification.

    Advantages

    Reliability: The database system continues to work, since a copy of the data is available at another site in case any site fails.

    Reduction in network load: Since local copies of data are available, query processing can be done with reduced network usage, particularly during prime hours. Data updating can be done at non-prime hours.

    Quicker response: Availability of local copies of data ensures quick query processing and consequently quick response time.

    Simpler transactions: Transactions require a small number of joins of tables located at different sites and minimal coordination across the network.

    Disadvantages of Data Replication

    Increased storage requirements: Maintaining multiple copies of data is associated with increased storage cost.

    Increased cost and complexity of data updating: Each time a data item is updated, the update needs to be reflected in all copies of the data at different sites. This requires complex synchronization techniques and protocols.

    Undesirable application-database coupling: If complex update mechanisms are not used, the removal of data inconsistency requires complex coordination at the application level. This results in undesirable application-database coupling.

    Challenges

    Placement of Replicas

    The major challenge in replication is deciding where to place the replicas. There are three options:

    Permanent replicas: A cluster of servers that may be geographically dispersed.

    Server-initiated replicas: Replicas placed in hosting servers and server caches.

    Client-initiated replicas: Replicas placed in client-side caches, such as the web browser cache.

    Propagation of Updates Among Replicas

    Push-based propagation: When an update occurs at a replica, it pushes the update to all other replicas.

    Pull-based propagation: A replica requests another replica to send the newest data it has.
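
    As a rough single-site illustration of push-based propagation, a trigger can push each local update to a copy. This is only a sketch in MySQL-style SQL, assuming a hypothetical replica table Student_replica; a real DDBMS propagates updates across the network:

    CREATE TRIGGER push_student_update
    AFTER UPDATE ON Student
    FOR EACH ROW
      -- Push the new value to the replica as soon as the update occurs
      UPDATE Student_replica
      SET Stu_address = NEW.Stu_address
      WHERE Stu_id = NEW.Stu_id;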

    Lack of Consistency

    If a copy is modified, it becomes inconsistent with the rest of the copies, and it takes some time for all the copies to become consistent again.


    Data Allocation

    Data allocation refers to the placement of data and processing tasks at different nodes in a distributed database system. It involves determining how to distribute data and workload to achieve optimal performance and resource utilization. The choice of sites and the degree of replication depend on the performance and availability goals of the system and on the types and frequencies of transactions submitted at these sites.

    Strategies for Data Allocation

    There are four alternative strategies regarding placement of data which are as follows.

    Centralized

    This strategy consists of a single database and DBMS stored at one site, with users distributed across the network. Locality of reference is at its lowest, as all sites except the central site have to use the network for all data accesses. This also means that communication costs are high, and reliability and availability are low, since a failure of the central site results in the loss of the entire database system.

    Fragmented or Partitioned

    This strategy partitions the database into different fragments, with each fragment assigned to one site. If data items are located at the site where they are used most frequently, locality of reference is high. As there is no replication, storage costs are low. Reliability and availability are still low, although higher than in the centralized case, since the failure of a site results in the loss of only that site's data. Performance should be good, and communication costs low, if the distribution is designed properly.

    Complete Replication

    This strategy consists of maintaining a complete copy of the database at each site. Therefore, locality of reference, reliability, availability, and performance are maximized. However, storage costs and communication costs for updates are the highest. To overcome some of these problems, snapshots are sometimes used. A snapshot is a copy of the data at a given time. These copies are updated periodically, for example hourly or weekly, so they may not always be up to date. Snapshots are also sometimes used to implement views in distributed databases, to improve the time it takes to perform a database operation on a view.
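
    Snapshots of this kind correspond to materialized views in many DBMSs. A minimal sketch in PostgreSQL-style SQL, assuming the Student table from the earlier examples:

    -- Take a copy of the data as of now
    CREATE MATERIALIZED VIEW student_snapshot AS
    SELECT * FROM Student;

    -- Refresh periodically (e.g., hourly or weekly); the copy is stale in between
    REFRESH MATERIALIZED VIEW student_snapshot;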

    Selective Replication

    This strategy is a combination of fragmentation, replication, and centralization. Some data items are fragmented to achieve high locality of reference; others, which are used at many sites but not frequently updated, are replicated; the remaining data items are centralized. The objective of this strategy is to have all the advantages of the other approaches with none of the disadvantages. It is the most commonly used strategy because of its flexibility.


    Types of Distributed Database Systems

    Homogeneous Distributed Database System

    In a homogeneous distributed database system, all participating nodes use the same DBMS software. The underlying schema is typically replicated across multiple nodes, and each node can process local transactions and queries. Homogeneous systems provide high availability, fault tolerance, and load balancing.

    Fig: Homogeneous Distributed Database System

    Types:

    • Autonomous – Each database is independent and functions on its own. They are integrated by a controlling application and use messages to share updates.

    • Non-autonomous – Data is distributed across homogeneous nodes, and a central DBMS coordinates data updates across sites.

    Heterogeneous Distributed Database System

    In heterogeneous distributed database systems, different nodes may use different DBMS software or have varying data models. These systems often require a middleware layer or data integration tools to facilitate communication and data exchange between different nodes. Heterogeneous systems are useful when integrating existing databases from multiple sources or different data models.

    Fig: Heterogeneous Distributed Database System

    Types:

    • Federated – The component databases are independent in nature but are integrated so that they function as a single database.

    • Unfederated – A central coordinating module is used to access the databases.


    Distributed Database Architectures

    The architectures are generally characterized along three parameters:

    • Distribution – It states the physical distribution of data across different sites.
    • Autonomy – It indicates the distribution of control of DBMS and degree to which each DBMS can operate independently.
    • Heterogeneity – Refers to the uniformity or dissimilarity of the data models, system components, and databases.

    General Architecture of Pure Distributed Databases

    general architecture

    Fig: Schema architecture of pure distributed databases

    This architecture has four levels of schema:

    External view or schema (EV or ES): Presents the user view of data. Different users may have different external schemas. Hides distribution, fragmentation, and replication details.

    Global conceptual schema (GCS): Presents a global logical view of data, which provides network transparency. Integrates all data across sites into a single schema.

    Local conceptual schema (LCS): Depicts logical data organization at each site. Describes how data is logically organized locally.

    Local internal schema (LIS): Presents the physical data organization at each site. Includes file organization, indexes, and access paths. Site-specific and hidden from users.

    Federated Database Schema Architecture

    Local schema: Conceptual schema, full database definition of a local database.

    Component schema: Derived by translating local schema into common data model.

    Export schema: Represents a subset of a component schema. Protects local autonomy and security.

    Federated schema: The global schema that results from integrating all the shareable export schemas. Provides a unified global view.

    External schema: Defines the schema for a user group or an application.


    Fig: Federated Schema Architecture

    Three-Tier Client-Server Architecture

    Presentation layer (client): Provides the UI, interacts with the user, handles inputs and outputs, and accepts user commands, displaying the needed information in static or dynamic web pages.

    Application layer (business layer): Implements the application logic; additional functionality such as security and identity verification is also handled here. It interacts with one or more databases or data sources as needed, connecting to them using ODBC, JDBC, SQL/CLI, or other database access techniques.

    Database server: Handles query and update requests from the application layer, processes the requests, and sends back the results.


    Introduction to NoSQL System

    NoSQL systems are an alternative to traditional relational databases, empowering developers and organizations to efficiently handle large-scale data and build scalable, flexible, and high-performance applications.

    A NoSQL database, which stands for non-SQL or non-relational, is a database that allows for data storage and retrieval. Unlike traditional SQL-based databases that use rigid tabular schemas, NoSQL databases store data using flexible models such as documents, graphs, and key-value pairs. NoSQL databases work with billions of records, are easy to scale, and are used for big data and real-time web applications. They are efficient for organizations that deal with huge volumes of data.

    The Need for NoSQL

    System response time becomes slow when an RDBMS is used for massive volumes of data. To resolve this problem, we could scale up the system by upgrading the existing hardware, but this process is expensive.

    The alternative is to distribute the database load over multiple hosts as the load increases. This method is known as scaling out. NoSQL databases are non-relational and designed with web applications in mind, so they scale out better than relational databases.

    Characteristics or Advantages of NoSQL Databases

    • Scalability: NoSQL databases are designed to support horizontal scalability, which allows organizations to easily add resources and increase storage and processing capacity by using commodity hardware. This scaling can be done quickly and in a non-disruptive manner. Unlike traditional relational database management systems (RDBMS), NoSQL databases reduce or eliminate the need for complex and costly manual sharding, thereby lowering management overhead and improving scalability efficiency.

    • Performance: NoSQL databases achieve high performance by distributing data across multiple nodes. As additional commodity resources are added, system performance can increase, enabling enterprises to maintain fast and reliable application responses. This approach allows organizations to scale performance in a predictable manner, ensuring an improved user experience without the overhead and complexity typically associated with manual sharding in traditional databases.

    • High Availability: Many NoSQL databases are designed with distributed architectures that support data replication and fault tolerance. These systems aim to ensure high availability by allowing applications to continue performing read and write operations even if one or more nodes fail. While architectures may vary (some using master-less designs and others using primary-secondary models), NoSQL databases generally reduce the complexity found in traditional RDBMS setups and improve system reliability.

    • Global Availability: Distributed NoSQL databases can automatically replicate data across multiple servers, data centers, or cloud regions. This replication minimizes data access latency and ensures consistent application performance for users located in different geographical regions. Additionally, automatic replication reduces the need for manual configuration and maintenance common in RDBMS, allowing operations teams to focus on other business-critical tasks.

    • Flexible Data Modeling: NoSQL databases provide flexible and dynamic data models that allow developers to store and manage structured, semi-structured, and unstructured data. Application developers can choose data types and query mechanisms that best suit the application’s requirements rather than being constrained by fixed database schemas. This flexibility simplifies application-database interaction and enables faster, more agile application development.

    • Reduced Management and Administration: Despite significant advancements in RDBMS technology, traditional databases often require intensive involvement from database administrators (DBAs) for tasks such as scaling, schema management, and performance tuning. In contrast, NoSQL databases are built to automate data distribution, scaling, and replication. This results in lower administrative effort, reduced operational complexity, and improved performance efficiency.

    RDBMS vs NoSQL

    Aspect | RDBMS | NoSQL
    Maturity & expertise | Old and well understood; many organizations use it for properly structured data. | Relatively new and still evolving, so experts are fewer.
    User interface tools | Many tools for accessing data are available, which helps users interact with and understand the data. | Very few tools for accessing and manipulating data, so users have limited options for interacting with data.
    Scalability & performance | Faces scalability and performance issues as data volume grows; servers may not cope with the load. | Works well under high load; scalability is very good, performance is better, and huge amounts of data are handled easily.
    Joins | Multiple tables can be joined easily, aided by primary keys, without causing latency. | Joins across multiple tables are not supported well and degrade performance.
    Availability & consistency | Availability depends on server performance; the data provided is consistent. | Databases are readily available, but some offer weaker consistency, so users should check availability and consistency requirements.
    Data analysis | Analysis and querying are easy even for complex queries; data can be sliced and diced for proper analysis. | Analysis is possible and works well for real-time analytics; better suited to applications than to reporting.
    Document storage | Documents cannot be stored, since data must be structured into a fixed format. | Documents can be stored, since data may be unstructured rather than in rows and columns.
    Partitions & key-value pairs | Partitions cannot be created easily; key-value pairs are needed to identify data in the schema. | Partitions can be created easily, and key-value pairs are not needed to identify data; software-as-a-service offerings integrate well with NoSQL.
    Database type | Relational database. | Distributed database.
    Scaling direction | Scales vertically (adding more power to the existing server: CPU, RAM, storage). | Scales horizontally (adding more servers to distribute the load).
    Maintenance | Expensive, since manpower is needed to manage added database servers. | Mostly automatic; performs some repairs on its own.
    Examples | MySQL, Oracle, SQL Server, etc. | IBM Domino, Oracle NoSQL, Apache HBase, etc.

    Types of NoSQL Databases

    1. Document-Based

    A document-based NoSQL database is designed to store and retrieve data in a semi-structured format known as documents. Each document represents a single entity and can be stored in formats such as JSON, BSON, or XML. Each document holds a set of field-value pairs, where the values may be of various types such as text, images, booleans, and arrays. Document-based databases are suitable for unstructured or rapidly evolving data.
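
    The document model can even be approximated in SQL systems that support a JSON column type (e.g., MySQL or PostgreSQL). A sketch, using a hypothetical students_doc table; note that the two documents need not share the same fields:

    CREATE TABLE students_doc (
      id  INT PRIMARY KEY,
      doc JSON  -- the whole entity is stored as one flexible document
    );

    INSERT INTO students_doc VALUES
      (10, '{"name": "Maya",  "address": "Palpa", "dept_id": 1}'),
      (11, '{"name": "Abhin", "address": "KTM",   "hobbies": ["chess", "music"]}');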

    Examples:

    • MongoDB – A widely used document database known for its flexibility and scalability. It stores data in BSON (Binary JSON) format.
    • Couchbase – A NoSQL database that combines key-value and document store concepts.
    • Apache CouchDB – An open-source document database that stores data in JSON format.

    2. Key-Value Stores

    It is a type of NoSQL database that stores data as a collection of key-value pairs. In this model, each piece of data is associated with a unique key, and the database retrieves or updates the value based on this key. Key-Value Stores are known for their simplicity, high scalability, and fast data access.
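
    The key-value model itself can be pictured as a two-column table in which every read and write addresses a value only by its key. A minimal SQL sketch (table and key names are illustrative):

    CREATE TABLE kv_store (
      k VARCHAR(255) PRIMARY KEY,  -- unique key
      v TEXT                       -- opaque value; the store does not interpret it
    );

    INSERT INTO kv_store (k, v) VALUES ('session:42', '{"user": "maya"}');
    SELECT v FROM kv_store WHERE k = 'session:42';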

    Examples:

    • Redis – An in-memory key-value store known for its high performance and versatility.
    • Riak – A distributed key-value store designed for high availability and fault tolerance.
    • DynamoDB – A fully managed key-value store service provided by Amazon Web Services (AWS).

    3. Wide-Column Stores

    Wide-column NoSQL databases store data in tables with rows and columns similar to an RDBMS, but the names and formats of the columns can vary from row to row across the table. Wide-column databases group columns of related data together, so a query can retrieve related data in a single operation, because only the columns relevant to the query are read. In an RDBMS, the same data would be spread over different rows stored in different places on disk, requiring multiple disk operations for retrieval.
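
    Cassandra's CQL is SQL-like but groups the related columns of a row under a partition key so they are stored and fetched together. A sketch with a hypothetical sensor-readings table:

    CREATE TABLE readings (
      sensor_id   text,
      ts          timestamp,
      temperature double,
      humidity    double,
      PRIMARY KEY (sensor_id, ts)  -- the partition key keeps a sensor's data together
    );

    -- Related data is retrieved in a single operation from one partition
    SELECT ts, temperature FROM readings WHERE sensor_id = 's-17';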

    Examples:

    • Apache Cassandra – A highly scalable, distributed wide-column store designed for handling large amounts of data across many servers.
    • Apache HBase – An open-source, distributed wide-column store built on top of Hadoop for real-time read/write access.
    • ScyllaDB – A high-performance wide-column store compatible with Apache Cassandra but optimized for lower latency.

    4. Graph-Based

    Graph-based databases use graph structures to represent and store data. In graph-based databases, data is modeled as vertices (nodes) and edges (relationships), which allows for efficient representation and querying of highly interconnected data.

    Examples:

    • Amazon Neptune – A fully managed graph database service provided by Amazon Web Services.
    • JanusGraph – An open-source distributed graph database that can use Apache Cassandra as a storage backend.
    • OrientDB – A multi-model graph database that also supports document and key-value models.

    The CAP Theorem

    The CAP theorem is a theoretical framework whose name stands for Consistency, Availability, and Partition tolerance. It helps in understanding the limitations of NoSQL systems: a distributed database cannot provide consistency, availability, and partition tolerance all at once. CAP states that we can achieve at most two of these three guarantees.

    Consistency: Every node of the database has exactly the same information at a given time. However, not all NoSQL distributed database nodes can provide this; they provide 'eventual consistency', i.e. while at some point the cluster may not be consistent, it eventually will be.

    Availability: The ability of the database to be always available no matter what happens. A highly available database usually has replicas in multiple geographical zones; if some nodes are unavailable, data remains accessible through one of its other replicas, though without a guarantee that it holds the most recent version of the data.

    Partition Tolerance: The ability to keep operating despite broken links within the cluster across which the database is distributed. The system continues operating despite network failures, in which messages between nodes may be lost or delayed. If the database is partition tolerant, it still works despite the sudden loss of some of its nodes.

    Fig: CAP Theorem

    Classification

    In practical terms, this theorem has led to the classification of distributed systems into the following three categories:

    CP Systems: These systems prioritize consistency and partition tolerance over availability. They ensure that all nodes have consistent data but may sacrifice availability during network partitions.

    Example: MongoDB.

    AP Systems: These systems prioritize availability and partition tolerance over consistency. They provide high availability and partition tolerance but may allow for temporary inconsistencies in data.

    Example: Apache Cassandra.

    CA Systems: These systems prioritize consistency and availability but sacrifice partition tolerance. They operate as a single centralized system and do not tolerate network partitions.

    Example: A traditional database running in a single data center.


    Big Data

    Big Data refers to large and complex sets of data that are difficult to process and analyze using traditional methods. It is characterized by its volume, variety, and velocity. The volume refers to the massive amount of data generated, variety refers to the diverse types and formats of data, and velocity signifies the high speed at which data is generated.

    Big Data offers immense value as it provides organizations with insights, patterns, and trends that were previously challenging to uncover. By analyzing large and diverse data sets, organizations can make data-driven decisions, optimize processes, detect anomalies, and gain a competitive advantage.

    To process and analyze Big Data, various tools and techniques have emerged, such as Hadoop, Apache Spark, NoSQL databases, and MapReduce.

    Characteristics of Big Data: 3V Model

    The 3 V’s of Big Data describe the main characteristics of big data and explain why it is difficult to manage using traditional systems.

    Volume refers to the extremely large amount of data generated from various sources such as social media platforms, online transactions, sensors, mobile devices, and multimedia content. This data is produced in terabytes and petabytes, which makes storage and processing a major challenge for conventional databases.

    Velocity refers to the speed at which data is generated, collected, and processed. In today’s digital world, data is created continuously and often needs to be analyzed in real time or near real time, such as in online banking, stock markets, and live streaming services.

    Variety refers to the different forms of data available. Big data is not limited to structured data stored in tables, but also includes semi-structured data like XML and JSON files, and unstructured data such as emails, text messages, images, audio, and videos. Together, these three V’s explain the complexity and importance of big data in modern information systems.


    MapReduce

    MapReduce is a programming model and computational algorithm used for processing and analyzing large volumes of data in a distributed computing environment. It was developed by Google to address the challenges of processing Big Data efficiently across multiple machines in a cluster.

    The MapReduce model divides a complex computational task into two main phases: the Map phase and the Reduce phase. The Map phase splits the input data into smaller portions that are processed independently on multiple nodes, with each node applying a map function and generating intermediate key-value pairs. In the Reduce phase, the intermediate key-value pairs are sorted and grouped by key, then processed by a reduce function, which performs a specific computation on these pairs and generates the final output.
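
    As a loose analogy in SQL, the canonical MapReduce word-count job corresponds to a grouped aggregation: the map phase emits (word, 1) pairs, and the reduce phase sums the counts per key. A sketch, assuming a hypothetical words table with one word per row:

    -- "Map": each row emits the pair (word, 1)
    -- "Reduce": pairs are grouped by key and the counts are summed
    SELECT word, COUNT(*) AS occurrences
    FROM words
    GROUP BY word;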


    Hadoop

    Hadoop is an open-source framework that provides a distributed computing environment for storing and processing large volumes of data. It is developed by the Apache Software Foundation and is designed to handle Big Data challenges.

    Hadoop consists of two core components: the Hadoop Distributed File System (HDFS) and MapReduce. HDFS provides data storage and handles large data sets by dividing them into smaller blocks. MapReduce enables parallel processing by dividing data into chunks, processing them, and combining the results.


    Transparency in Distributed Databases

    Transparency in a distributed database system refers to the property that hides the complexity of data distribution from users and application programs. The users should be able to access and manipulate data as if it were stored in a single, centralized database, without needing to know where the data is located, how it is fragmented, or how it is replicated across different sites. Transparency improves usability, simplifies application development, and allows the system to manage distribution-related issues internally.

    Types of Transparencies in Distributed Databases

    • Location Transparency: Location transparency ensures that users do not need to know the physical location of the data. Data can be accessed using the same name regardless of the site where it is stored. If data is moved from one location to another, applications remain unaffected.

    • Fragmentation Transparency: Fragmentation transparency hides the way a database is divided into fragments. Whether data is horizontally fragmented, vertically fragmented, or mixed, users and applications can access the data as if it were stored as a single logical table.

    • Replication Transparency: Replication transparency conceals the existence of multiple copies of data stored at different sites. The system automatically selects the appropriate copy for access and ensures consistency among replicas.

    • Concurrency Transparency: Concurrency transparency ensures that simultaneous transactions executed at different sites do not interfere with each other. Users are unaware of concurrent operations, and the database maintains correctness and consistency.

    • Failure Transparency: Failure transparency allows the system to continue operating even when site or network failures occur. The system automatically recovers from failures without affecting users or requiring manual intervention.

    • Performance Transparency: Performance transparency ensures that system performance is optimized automatically as the system load or configuration changes. Users are not required to modify applications when performance-related adjustments are made.

    • Scaling Transparency: Scaling transparency allows the distributed database system to expand by adding new sites or resources without requiring changes to existing applications or database structure.