Cassandra and CQL — what they don’t tell you

Our journey with Cassandra has been a long one, and the knowledge we have gained could help developers who are new to Cassandra.

CQL is not SQL

For those who don’t know, the primary way to query Cassandra is the Cassandra Query Language (CQL).

For the query shown above, Cassandra doesn’t know where the data is stored, so it would have to perform a full table scan.

Still, CQL’s similarity to SQL gives you the illusion that you have those features, which can be a real tripping point for newcomers to Cassandra. When you write a query, always keep in mind how Cassandra stores your data and what the most efficient way to query it is.

How many nodes a write propagates to depends on the replication settings of your keyspace and how data is distributed across the cluster.

[Figure: How data is written across a Cassandra cluster with multiple nodes]

With this write, if the writes to nodes n6 and n7 both succeeded, then when we later read this data the two nodes will give a consistent response.

Now let’s consider the scenario where we want to delete this row from the database:

    DELETE FROM customers WHERE customer_id = 123456;

[Figure: How data is deleted across a Cassandra cluster with multiple nodes]

What if one of these requests fails?

Ultimately, Cassandra runs a process called compaction to clean up the tombstones.

When we were using Cassandra, we came across a scenario where certain partitions within our Cassandra instance were undergoing very high data churn, with lots of conflicting reads and writes on the same partition.

If we insert the following row into Cassandra:

    INSERT INTO customers (customer_id, order_id, order_timestamp) VALUES (123456, 100000, NULL);

the data stored on disk looks like this:

    {
      "partition" : { "key" : [ "123456" ], "position" : 34 },
      "rows" : [
        {
          "type" : "row",
          "position" : 66,
          "clustering" : [ 100000 ],
          "liveness_info" : { "tstamp" : "2018-11-21T09:30:08.349095Z" },
          "cells" : [
            {
              "name" : "order_timestamp",
              "deletion_info" : { "local_delete_time" : "2018-11-21T09:30:08Z" }
            }
          ]
        }
      ]
    }

Even though we never deleted anything, the NULL order_timestamp was stored as a tombstone (the cell carrying deletion_info). The reason for this is that Cassandra does not read before it writes to check whether data already exists, as that would be too slow. Because it cannot know whether an older value is present, writing a tombstone is the only way to make sure any such value is overwritten.
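To see why a query without the partition key turns into a full table scan, and why a write lands on two specific nodes such as n6 and n7, here is a minimal Python sketch of a token ring. It is a toy model under stated assumptions. The `Ring` class, `token` function, node names and tokens are hypothetical. MD5 from the standard library stands in for Cassandra's default Murmur3 partitioner, virtual nodes are ignored, and replica placement follows the simple "next nodes clockwise" rule used by SimpleStrategy.

```python
from __future__ import annotations

import bisect
import hashlib

# Toy token ring: a hypothetical sketch, not Cassandra's implementation.
# MD5 stands in for Cassandra's default Murmur3 partitioner; vnodes are ignored.


def token(partition_key: str) -> int:
    """Hash a partition key onto a ring of 2**32 positions."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Ring:
    def __init__(self, node_tokens: dict[str, int], replication_factor: int) -> None:
        # Sort nodes by token so the ring can be walked clockwise.
        self.entries = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]
        self.rf = replication_factor

    def replicas(self, partition_key: str) -> list[str]:
        # SimpleStrategy-style placement: the first node whose token is >= the
        # key's token owns it (wrapping around), and the next rf - 1 nodes
        # clockwise hold the other copies.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(self.rf)]

    def nodes_to_contact(self, partition_key: str | None) -> list[str]:
        # With the partition key, only its replicas need to be asked.
        # Without it, there is no way to locate the data: every node is scanned.
        if partition_key is None:
            return [name for _, name in self.entries]
        return self.replicas(partition_key)


# Eight hypothetical nodes, n1..n8, evenly spaced on the ring, two copies of each row.
ring = Ring({f"n{i}": (i - 1) * 2**29 for i in range(1, 9)}, replication_factor=2)
print(ring.nodes_to_contact("123456"))  # the two replicas for this customer_id
print(len(ring.nodes_to_contact(None)))  # 8
```

The point of the design is that the partition key is the only thing that lets the coordinator jump straight to a small set of nodes; a filter on any other column leaves it with nothing to hash.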
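The delete scenario described above, where the request to one replica fails, can be sketched as a toy Python model of two replicas. Assumptions: the `Replica`, `Cell` and `read` names are hypothetical; timestamps are plain seconds (Cassandra actually uses microsecond write timestamps plus a separate local deletion time); and compaction is reduced to "drop tombstones older than gc_grace_seconds", ignoring the other checks Cassandra makes before purging.

```python
from __future__ import annotations

from dataclasses import dataclass

# Toy model of Cassandra-style deletes (hypothetical names, not Cassandra code).
# A delete does not erase data: it writes a tombstone with a newer timestamp.
TOMBSTONE = object()
GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


@dataclass
class Cell:
    timestamp: int  # write time in seconds (simplified)
    value: object   # the data, or TOMBSTONE for a delete


def newest(cells: list[Cell]) -> Cell:
    # Higher timestamp wins; on a tie, the tombstone wins.
    return max(cells, key=lambda c: (c.timestamp, c.value is TOMBSTONE))


class Replica:
    def __init__(self) -> None:
        self.cells: dict[str, Cell] = {}

    def write(self, key: str, cell: Cell) -> None:
        current = self.cells.get(key)
        self.cells[key] = cell if current is None else newest([current, cell])

    def compact(self, now: int, gc_grace: int = GC_GRACE_SECONDS) -> None:
        # Simplified compaction: purge tombstones older than gc_grace.
        self.cells = {
            k: c for k, c in self.cells.items()
            if not (c.value is TOMBSTONE and c.timestamp + gc_grace < now)
        }


def read(replicas: list[Replica], key: str) -> object | None:
    # Coordinator-style read: reconcile every replica's copy; a tombstone
    # hides any older data for the same key.
    cells = [r.cells[key] for r in replicas if key in r.cells]
    if not cells:
        return None
    cell = newest(cells)
    return None if cell.value is TOMBSTONE else cell.value


n6, n7 = Replica(), Replica()
for replica in (n6, n7):
    replica.write("123456", Cell(timestamp=100, value="customer row"))

# The delete reaches n6, but the request to n7 fails.
n6.write("123456", Cell(timestamp=200, value=TOMBSTONE))
print(read([n6, n7], "123456"))  # None: n6's tombstone shadows n7's stale copy

# If n6 purges the tombstone before n7 ever sees the delete, the row comes back.
n6.compact(now=200 + GC_GRACE_SECONDS + 1)
print(read([n6, n7], "123456"))  # customer row
```

The second read shows why a delete has to be recorded as a tombstone rather than by removing data, and why that tombstone must outlive the failed request: once it is compacted away, the stale copy on the other replica wins again. This is why repairs are expected to run within gc_grace_seconds.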
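The NULL-as-tombstone behaviour described above can be illustrated with a toy Python model of a write path that never reads before writing. The function names, the `Cells` alias and the string `TOMBSTONE` marker are hypothetical. Only the regular column order_timestamp is modelled; the primary-key columns are left out for brevity.

```python
from __future__ import annotations

# Toy model of a blind-write path (hypothetical names, not Cassandra code).
TOMBSTONE = "<tombstone>"

Cells = dict[str, tuple[int, object]]  # column -> (timestamp, value)


def insert_cells(values: dict[str, object], timestamp: int) -> Cells:
    """Turn an INSERT into cells without reading what is already stored.

    Because nothing is read first, a None (CQL NULL) cannot simply be skipped:
    an older value for that column might exist, so a tombstone is written to
    shadow it. A column left out of the INSERT produces no cell at all.
    """
    return {col: (timestamp, TOMBSTONE if v is None else v) for col, v in values.items()}


def merge(stored: Cells, incoming: Cells) -> Cells:
    """Per-column reconciliation: the cell with the higher timestamp wins
    (ties keep the stored cell, a simplification)."""
    merged = dict(stored)
    for col, cell in incoming.items():
        if col not in merged or cell[0] > merged[col][0]:
            merged[col] = cell
    return merged


stored = insert_cells({"order_timestamp": "2018-11-20T10:00:00Z"}, timestamp=1)

# Binding NULL for order_timestamp: the old value is shadowed by a tombstone.
with_null = merge(stored, insert_cells({"order_timestamp": None}, timestamp=2))
print(with_null["order_timestamp"])  # (2, '<tombstone>')

# Omitting order_timestamp from the INSERT leaves the old value untouched.
omitted = merge(stored, insert_cells({}, timestamp=2))
print(omitted["order_timestamp"])  # (1, '2018-11-20T10:00:00Z')
```

Leaving a column out of the INSERT, rather than binding NULL, is therefore one way to avoid creating tombstones on a write-heavy table.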
