Adding a New Cassandra Node

This guide will explain how to expand the Cassandra cluster in a Sysdig Monitor multi-server, distributed on-premise installation.

These steps were first tested with Sysdig Monitor on-premise release 439 and should work with all newer releases.

Expanding the Cluster

Expanding the cluster is performed through the same management server interface that was used for the initial on-premise install procedure.

Our example environment consists of six nodes (hosts/VMs), as shown in the node list of the management server interface. The Tags reflect which component(s) are currently configured to run on each node.

Assume we’ve provisioned an additional Linux host to be our new Cassandra node. Click the button in the management server interface to add the new node.

Insert the new host’s Private IP address and Public IP address (if it has one), and check the box to add the Cassandra Tag. Then log in to the new host and execute the curl command line that the interface provides.

The additional node will now appear in the list.

The new node is now initialized, but the Sysdig Monitor application must be restarted in order for Cassandra to begin running on it. To restart, stop the application from the management server interface; status messages will indicate that services are stopping. Once they have stopped, start the application again.

As part of the restart, a new Cassandra container is started on the newly added node, and it joins the cluster. The joining operation can be monitored using the nodetool status command, as shown in the outputs below.
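
The nodetool commands below are run inside the Cassandra container via docker exec. If you need to find the <CONTAINER_ID> on a node, listing the running containers is one way to do it (a minimal sketch; the exact container name may vary between releases):

# docker ps | grep -i cassandra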

Status during joining (UJ)

# docker exec -it <CONTAINER_ID> nodetool status draios
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns    Host ID                               Rack
UN  10.10.0.117  41.8 MB    256     ?       3c495d49-f311-43f3-9fef-0055f5943f8a  rack1
UJ  10.10.0.123  177.32 KB  256     ?       8199bfb4-92b3-4b74-a1ce-90310769d8b8  rack1

Status when joined (UN)

# docker exec -it <CONTAINER_ID> nodetool status draios
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.0.117  40.39 MB   256     51.2%             3c495d49-f311-43f3-9fef-0055f5943f8a  rack1
UN  10.10.0.123  20.12 MB   256     48.8%             8199bfb4-92b3-4b74-a1ce-90310769d8b8  rack1

Adding a new node can take a long time, depending on how much data is currently stored in the cluster. Do not stop the application until the new node has finished joining.
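
If you want to follow the joining progress without re-running the command by hand, a simple polling loop works (a sketch; the 60-second interval is arbitrary, and the -it flags are omitted because watch does not provide a TTY):

# watch -n 60 'docker exec <CONTAINER_ID> nodetool status draios'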

Cleanup

Once the joining operation has finished, you must perform a cleanup operation on all of the pre-existing nodes of the cluster. This is done using nodetool cleanup:

# docker exec -it <CONTAINER_ID> bash
root@<>:/# nohup nodetool cleanup draios &
[1] 470
root@<>:/# nohup: ignoring input and appending output to 'nohup.out'
root@<>:/# exit
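
To check whether the cleanup is still running, you can tail the nohup.out file created above, or use nodetool compactionstats, which lists in-progress cleanup operations alongside pending compactions (a minimal sketch):

# docker exec -it <CONTAINER_ID> bash
root@<>:/# tail nohup.out
root@<>:/# nodetool compactionstats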

Data Replication Factor

The default Cassandra replication factor used by Sysdig Monitor is 1. It was first set under Advanced Settings in the management server interface during the initial installation.

Note that changing the replication factor on a running cluster is a costly procedure and should be carefully planned. These steps should only be executed by operators who understand the underlying Cassandra operations and are comfortable performing them on their data within appropriately-sized change windows.

Always back up your data first by copying the entire contents of the Cassandra data folder, /opt/cassandra-data-volume.
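
For example, a backup on each Cassandra node could look like the following (a minimal sketch; the archive destination under /root is only an example, and you should confirm there is enough free space for it):

# tar -czf /root/cassandra-data-backup.tar.gz /opt/cassandra-data-volume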

If you want to increase the replication factor, you need to manually modify the replication factor setting of the draios Keyspace in the Cassandra cluster. To do so, pick any node in the Cassandra cluster and execute the following commands to review and then change the replication factor setting (from 1 to 2 in this example):

# docker exec -it <CONTAINER_ID> bash
root@<>:/# cqlsh
cqlsh> DESCRIBE KEYSPACE draios ;
CREATE KEYSPACE draios WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;
...

cqlsh> ALTER KEYSPACE draios WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
cqlsh> quit
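
To confirm the change took effect, you can describe the keyspace again from the same shell; the replication_factor shown should now be 2 (output abbreviated):

root@<>:/# cqlsh -e "DESCRIBE KEYSPACE draios"
CREATE KEYSPACE draios WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'}  AND durable_writes = true;
...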

Modify the Cassandra replication factor in the Advanced Settings of the Sysdig Monitor management server interface to match this new setting (2 in this case). Saving the new setting will trigger a restart of the application.

Once the restart has completed, perform the Cassandra repair operation on the cluster, one node at a time:

root@<>:/# nodetool repair
[2017-01-05 14:55:03,614] Starting repair command #1, repairing 512 ranges for keyspace draios (parallelism=SEQUENTIAL, full=true)

Perform the repair operation on all the nodes of the cluster, waiting until the repair completes on one node before moving on to the next.
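
Because repair, like cleanup, can run for a long time, you may prefer to start it under nohup so it survives a dropped SSH session (same pattern as the cleanup above; the optional draios argument restricts the repair to that keyspace):

# docker exec -it <CONTAINER_ID> bash
root@<>:/# nohup nodetool repair draios &
root@<>:/# tail -f nohup.out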

Note the change in the Owns column of the nodetool status output. In our example with two nodes and a replication factor of 2, the 100% values are expected: every row is replicated to both nodes, so each node effectively owns replication factor / node count = 2/2 = 100% of the data.

# docker exec -it <CONTAINER_ID> nodetool status draios
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.15.5  12.92 MB   256     100.0%            dca3e7f8-fa0b-4de1-8562-c4150121f8c7  rack1
UN  192.168.15.6  12.92 MB   256     100.0%            d8ea6d81-c7aa-4d49-8ea0-1fac2e136a5c  rack1
