Distributed database design using CAP theorem


The CAP theorem is a principle that applies when designing distributed systems such as distributed databases. A distributed database is a set of databases (nodes) spread across a network.
These nodes can be organized either in a master/slave model or in a sharded model.

Consistency - The database should return the latest write on every read.
Availability - The database should return data in a reasonable amount of time, but the data might not be the latest.
Partition tolerance - In spite of a network failure between nodes, the database should keep working.

A distributed database can be designed as CP, AP, or CA, but not with all three capabilities at once.

AP - Available and partition tolerant - Sharded model (data is distributed, no master)
    Availability - Because the data is distributed, any of the nodes can return it. So the database is always available.
    Partition tolerance - If any node goes down, the other nodes can step up and return the data, since the data is distributed.
    Consistency - Since there is no master, all the nodes work collaboratively to store the data. Your DB might not return the latest write when requested, because consistency is reached only once all of your nodes are in sync. Although it usually takes only a short time for all the nodes to sync, consistency is sacrificed during that window.
Read-heavy databases (MORE READs, FEWER WRITEs) can use this model.
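
The stale-read window described above can be illustrated with a minimal sketch. The Node class, the version counter, and the sync() helper are hypothetical names invented for this illustration; they are not taken from any real database, and a real system would sync continuously in the background.

```python
class Node:
    """Hypothetical replica that stores key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        # Accept the write locally, without waiting for other nodes.
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)

    def read(self, key):
        # Return whatever this node currently holds, even if it is stale.
        return self.data.get(key, (0, None))[1]


def sync(nodes):
    """Assumed sync step: every node adopts the highest version of each key."""
    merged = {}
    for node in nodes:
        for key, (version, value) in node.data.items():
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    for node in nodes:
        node.data = dict(merged)


a, b = Node("a"), Node("b")
a.write("x", "old")
sync([a, b])              # both nodes now hold "old"
a.write("x", "new")       # accepted by node a only
stale = b.read("x")       # node b still answers, but with the older value
sync([a, b])
fresh = b.read("x")       # after the sync, node b returns the latest write
```

Node b never refuses a request, which is the availability part; the price is that the read taken before the second sync does not see the latest write.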



CP - Consistent and partition tolerant - (Sharded/distributed model, but one node is the master)
    Consistency - Since you have a master, your writes always go through the master. So you always read the latest write.
    Partition tolerance - If any node goes down, the other nodes can step up and return the data, since the data is distributed.
    Availability - If the master goes down while writing, one of the remaining nodes is elected as the new master. Until that election finishes, availability is poor, meaning there is a delay in returning the data.
Write-heavy databases (MORE WRITEs, FEWER READs) can use this model.
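
A rough sketch of the CP behaviour described above: every write goes through the master, and requests are refused while no master is elected. The Cluster class, the Unavailable exception, and the "lowest name wins" election rule are assumptions made up for this example, not how any particular database elects a master.

```python
class Unavailable(Exception):
    """Raised while the cluster has no elected master."""


class Cluster:
    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.master = node_names[0]

    def write(self, key, value):
        if self.master is None:
            raise Unavailable("no master elected yet")
        # The master copies the write to every live node before acknowledging.
        for store in self.nodes.values():
            store[key] = value

    def read(self, key):
        if self.master is None:
            raise Unavailable("no master elected yet")
        return self.nodes[self.master].get(key)

    def fail_master(self):
        # Simulate the master going down.
        del self.nodes[self.master]
        self.master = None

    def elect(self):
        # Assumed election rule: the remaining node with the lowest name wins.
        self.master = sorted(self.nodes)[0]


cluster = Cluster(["n1", "n2", "n3"])
cluster.write("x", 1)
cluster.fail_master()
try:
    cluster.write("x", 2)
    rejected = False
except Unavailable:
    rejected = True       # the write is refused until a master exists
cluster.elect()
cluster.write("x", 2)
latest = cluster.read("x")
```

Between fail_master() and elect() the cluster refuses requests instead of answering with possibly stale data, which is the availability cost of this model.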


CA - Consistent and available (Master and slave, where the slave syncs data from the master; not sharded)
    Consistency - Since you have a master, you always read the latest write.
    Availability - As long as the network is available, the DB is also available.
    Partition tolerance - If the network to the master DB is down, the system is down (until your slave is ready and in sync with the master DB).
NON-PARTITIONED (single node) databases can use this model.
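
The CA trade-off above can be sketched in a few lines. MasterSlaveDB, the master_up flag, and promote_slave() are hypothetical names for this illustration only, and the sketch assumes the slave copies each write as soon as the master accepts it.

```python
class MasterSlaveDB:
    def __init__(self):
        self.master = {}
        self.slave = {}
        self.master_up = True

    def _require_master(self):
        if not self.master_up:
            raise ConnectionError("master unreachable, database is down")

    def write(self, key, value):
        self._require_master()
        self.master[key] = value
        self.slave[key] = value   # assumed: the slave syncs each write

    def read(self, key):
        self._require_master()
        return self.master.get(key)

    def promote_slave(self):
        # The in-sync slave takes over as the new master.
        self.master, self.slave = self.slave, {}
        self.master_up = True


db = MasterSlaveDB()
db.write("x", 1)
db.master_up = False          # simulate the master's network going down
try:
    db.read("x")
    down = False
except ConnectionError:
    down = True               # no partition tolerance: every request fails
db.promote_slave()
value_after_failover = db.read("x")
```

While the master is reachable, every read sees the latest write and is answered immediately. Once the master's network is down, nothing is served until the slave has been promoted.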




So, every model has a tradeoff. To meet two of the aspects, you have to relax the third.
