CASSANDRA
Cassandra Architecture
Cassandra is designed to handle big data. Its main feature is storing data on multiple nodes with no single point of failure.
The reason for this architecture is that hardware failure can occur at any time, and any node can go down. If a node fails, the data stored on another node can be used instead. This is why Cassandra is designed as a distributed system.
Cassandra stores data on different nodes arranged in a "ring", using a "peer to peer" distributed architecture.
All the nodes exchange information with each other using the "Gossip protocol", which is the protocol nodes in a Cassandra cluster use to communicate with each other.
Gossip Protocol

Cassandra uses a gossip protocol to discover the state of all nodes in a cluster. Nodes learn about other nodes by exchanging state information about themselves and about the other nodes they know about. Each exchange involves a maximum of three other nodes. To reduce network load, nodes do not exchange information with every other node in the cluster. They exchange information with only a few nodes, and over time state information about every node propagates throughout the cluster. The gossip protocol also facilitates failure detection.
Gossip protocol: used to relay information between nodes in the cluster.
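The propagation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Cassandra's actual implementation: the function names (`merge`, `gossip_round`, `rounds_until_converged`), the integer version numbers, and the random peer selection are all hypothetical. Each node starts out knowing only its own state and, in every round, exchanges state with up to three randomly chosen peers.

```python
import random


def merge(mine, theirs):
    """Keep, for every known node, the entry with the higher version number."""
    merged = dict(mine)
    for node, version in theirs.items():
        if version > merged.get(node, -1):
            merged[node] = version
    return merged


def gossip_round(views, rng, fanout=3):
    """Each node exchanges state with up to `fanout` randomly chosen peers."""
    nodes = list(views)
    for node in nodes:
        others = [n for n in nodes if n != node]
        for peer in rng.sample(others, k=min(fanout, len(others))):
            combined = merge(views[node], views[peer])
            # Both sides of the exchange end up with the merged view.
            views[node] = combined
            views[peer] = dict(combined)


def rounds_until_converged(num_nodes=10, seed=0, max_rounds=100):
    """Return the round in which every node knows every node, or None."""
    rng = random.Random(seed)
    # Initially every node knows only its own state (version 1).
    views = {n: {n: 1} for n in range(num_nodes)}
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == num_nodes for view in views.values()):
            return round_no
    return None


if __name__ == "__main__":
    print("converged after", rounds_until_converged(), "round(s)")
```

Even though no node talks to every other node, the merged views spread quickly, which is the point of the protocol.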
Data Replication
Replication = replication factor + replica placement strategy
Because a hardware problem can occur or a link can go down at any time while data is being processed, a backup is needed for when a problem occurs. Data is therefore replicated so that there is no single point of failure.
[Figure labels: "Default strategy"; "Preferred when you have information about network map on your nodes"]
Cassandra places replicas of data on different nodes based on these two factors:
· Where to place the next replica is determined by the "Replication Strategy".
· The total number of replicas placed on different nodes is determined by the "Replication Factor".
A replication factor of one means that there is only a single copy of the data, while a replication factor of three means that there are three copies of the data on three different nodes.
To ensure there is no single point of failure, the replication factor must be greater than one; three is a common choice.
There are two kinds of replication strategies in Cassandra.
a. SimpleStrategy
SimpleStrategy is used when you have just one data center. It places the first replica on the node selected by the partitioner. The remaining replicas are then placed clockwise around the node ring.
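The placement rule above can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's code: the `simple_strategy` function, the small integer tokens, and the node names are hypothetical, and the key's token is assumed to have been computed already by the partitioner.

```python
import bisect


def simple_strategy(ring, key_token, replication_factor):
    """Pick replicas for a key on a token ring.

    ring: list of (token, node) pairs sorted by token (hypothetical layout).
    key_token: token the partitioner produced for the key.
    """
    tokens = [token for token, _ in ring]
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if necessary.
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    # Remaining replicas: the next nodes clockwise around the ring.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
    print(simple_strategy(ring, key_token=60, replication_factor=3))
    # prints ['D', 'A', 'B']
```

In the usage example, token 60 falls between nodes C (50) and D (75), so D holds the first replica and the walk wraps around the ring to A and B.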

b. NetworkTopologyStrategy
NetworkTopologyStrategy is used when you have more than one data center. In NetworkTopologyStrategy, the number of replicas is set for each data center separately. Within a data center, it places replicas clockwise around the ring until it reaches the first node in another rack.
This strategy tries to place replicas on different racks in the same data center, because a failure can affect an entire rack. When that happens, replicas on nodes in other racks can still provide the data.
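A simplified sketch of rack-aware placement might look like the following. It is an approximation of the idea described above, not the real algorithm: the function name, the tuple layout of the ring, and the fallback behaviour when a data center has fewer racks than replicas are all assumptions made for illustration.

```python
import bisect


def network_topology_strategy(ring, key_token, replicas_per_dc):
    """Pick replicas per data center, preferring distinct racks.

    ring: list of (token, node, datacenter, rack) sorted by token.
    replicas_per_dc: replica count per data center, e.g. {"dc1": 2, "dc2": 1}.
    """
    tokens = [entry[0] for entry in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    # Nodes in clockwise order, starting at the key's position on the ring.
    clockwise = [ring[(start + i) % len(ring)] for i in range(len(ring))]

    placement = {}
    for dc, wanted in replicas_per_dc.items():
        chosen, racks_used, skipped = [], set(), []
        for _, node, node_dc, rack in clockwise:
            if node_dc != dc or len(chosen) == wanted:
                continue
            if rack in racks_used:
                skipped.append(node)  # same rack as an earlier replica
            else:
                chosen.append(node)
                racks_used.add(rack)
        # Assumed fallback: if the data center has fewer racks than replicas,
        # fill the remaining slots with the nodes that were skipped.
        chosen.extend(skipped[: wanted - len(chosen)])
        placement[dc] = chosen
    return placement


if __name__ == "__main__":
    ring = [
        (0, "n1", "dc1", "r1"), (10, "n2", "dc1", "r1"),
        (20, "n3", "dc1", "r2"), (30, "n4", "dc2", "r1"),
        (40, "n5", "dc2", "r2"),
    ]
    print(network_topology_strategy(ring, 0, {"dc1": 2, "dc2": 1}))
    # prints {'dc1': ['n1', 'n3'], 'dc2': ['n4']}
```

In the usage example, n2 is skipped for dc1 because it shares rack r1 with n1, so the second replica lands on n3 in rack r2.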

Write Operation:
The coordinator sends a write request to the replicas. If all the replicas are up, they all receive the write request regardless of the consistency level.
"Consistency level" determines how many nodes must respond with a success acknowledgment.
A node responds with a success acknowledgment once the data has been written successfully to the commit log and the memtable.
For example, in a single data center with a replication factor of three, three replicas receive the write request. If the consistency level is one, the coordinator waits for a success acknowledgment from only one replica before reporting success. The other two replicas still receive the write, but their acknowledgments are not required.
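The relationship between replicas and consistency level can be sketched as follows. This is a toy model with hypothetical names (`coordinate_write`, replicas represented as dictionaries), not the coordinator's real logic. The point it illustrates is that the write goes to every live replica, while the consistency level only decides how many acknowledgments are needed for the write to count as successful.

```python
def coordinate_write(replicas, key, value, consistency_level):
    """Send a write to all live replicas and report success or failure.

    replicas: list of dicts like {"name": "r1", "up": True, "data": {}}
    (a hypothetical representation used only in this sketch).
    """
    live = [replica for replica in replicas if replica["up"]]
    # Simplification: if too few replicas are up to satisfy the consistency
    # level, the write is rejected without being applied anywhere.
    if len(live) < consistency_level:
        return False
    acks = 0
    for replica in live:
        # Every live replica gets the write, not just `consistency_level` of them.
        replica["data"][key] = value
        acks += 1
    return acks >= consistency_level


if __name__ == "__main__":
    replicas = [
        {"name": "r1", "up": True, "data": {}},
        {"name": "r2", "up": True, "data": {}},
        {"name": "r3", "up": False, "data": {}},
    ]
    print(coordinate_write(replicas, "k", "v", consistency_level=1))  # prints True
    print(coordinate_write(replicas, "k", "v", consistency_level=3))  # prints False
```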
If the remaining two replicas lose data because a node goes down or because of some other problem, Cassandra makes the row consistent using its built-in repair mechanism.
The write process in Cassandra works as follows.
When a write request comes to a node, it is first logged in the commit log.
Cassandra then writes the data to the memtable. Every write that goes to the memtable is therefore also recorded separately in the commit log. The memtable holds data temporarily in memory, while the commit log records writes for backup purposes.
When the memtable is full, its data is flushed to an SSTable data file.
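The three steps above (commit log, memtable, flush to SSTable) can be modelled with a small class. This is a sketch only: the `Node` class, its `memtable_limit` parameter, and the use of in-memory lists and dictionaries in place of files on disk are assumptions made for illustration.

```python
class Node:
    """Toy model of the write path on a single node."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory table holding the latest value per key
        self.sstables = []     # flushed, read-only tables, oldest first
        self.memtable_limit = memtable_limit  # hypothetical "full" threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. memtable is full: flush it
        return True                           # success acknowledgment

    def flush(self):
        # Store the memtable contents sorted by key, then start a fresh memtable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: flushed writes no longer need the commit log.
        self.commit_log.clear()


if __name__ == "__main__":
    node = Node(memtable_limit=2)
    node.write("b", 2)
    node.write("a", 1)   # second key fills the memtable and triggers a flush
    print(node.sstables)  # prints [{'a': 1, 'b': 2}]
```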

Read Operation:
There are three types of read requests that a coordinator sends to replicas:
· Direct request
· Digest request
· Read repair request
The coordinator sends a direct request to one of the replicas. It then sends digest requests to the number of replicas specified by the consistency level and checks whether the returned data is up to date.
After that, the coordinator sends digest requests to all the remaining replicas. If any node returns an out-of-date value, a background read repair request updates that data. This process is called the read repair mechanism.
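The read flow can be sketched in Python as well. This is a simplified model under several assumptions: the names `digest` and `coordinate_read` are hypothetical, each replica is a dictionary mapping a key to a `(timestamp, value)` pair, the digest requests to all replicas are collapsed into one step, and the repair runs inline rather than in the background.

```python
import hashlib


def digest(entry):
    """Hash of a replica's entry, standing in for a digest response."""
    return hashlib.md5(repr(entry).encode("utf-8")).hexdigest()


def coordinate_read(replicas, key):
    """Read `key`, repairing any replica that holds an out-of-date value.

    replicas: list of dicts mapping key -> (timestamp, value).
    """
    direct = replicas[0].get(key)                               # direct request
    digests = [digest(r.get(key)) for r in replicas[1:]]        # digest requests
    if all(d == digest(direct) for d in digests):
        return direct[1] if direct else None
    # A digest differs: compare full entries and keep the newest timestamp.
    versions = [r[key] for r in replicas if key in r]
    newest = max(versions, key=lambda entry: entry[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest                               # read repair
    return newest[1]


if __name__ == "__main__":
    replicas = [{"k": (2, "new")}, {"k": (1, "old")}, {"k": (2, "new")}]
    print(coordinate_read(replicas, "k"))  # prints new
    print(replicas[1]["k"])                # prints (2, 'new')
```

Sending digests instead of full data keeps the comparison cheap; full values are only needed once a mismatch has been detected.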

Cassandra Data Types:
Cassandra supports a variety of data types. The following table shows the data types, their constants, and their descriptions.
