Change IP address for existing nodes in CDH 5.3

Today I discovered a way to handle IP address changes for a Hadoop cluster managed by CDH 5.3. I think this should also apply to CDH 5.x in general. It took me a couple of hours to figure this one out :/

I've done this before, but on CDH 4.x. On CDH 4.x, you need to access the DB holding the host information. You can find the tutorial here. Since CDH 5.x, this no longer works: even if you change the value in the DB, it gets overwritten by Cloudera Manager (CM).

How it works

It's best to get a little background on how CM works in CDH 5.x. If you followed the link mentioned above and went through the steps up to accessing the DB under CDH 5.x, you will notice that the column host_identifier in the table hosts is no longer the FQDN of each host but some sort of hash.

This hash (or uuid) is actually a random string generated by the Cloudera Manager agent (run by the daemon cloudera-scm-agent). Now, instead of each host being identified by its IP as in CDH 4.x, each host is identified by its uuid. One advantage is that however you change the IP, the change always gets propagated to the CM server (run by the daemon cloudera-scm-server).

When the IP address changes, the next heartbeat that cloudera-scm-agent sends to its cloudera-scm-server (set in /etc/cloudera-scm-agent/config.ini) carries the uuid, and if the hostname or IP address has changed, the DB gets updated based on that uuid. Smart compared to CDH 4.x, right?
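Since the agent sends its heartbeat to whatever server config.ini names, it can help to check that value on each node. Here is a minimal sketch that prints it. The function name agent_server_host is made up for illustration, and the key name server_host is an assumption about the file layout, so check it against your own config.ini.

```shell
# Print the server_host value from an agent config file.
# Default path is the one from the post; the key name "server_host" is an assumption.
agent_server_host() {
    config="${1:-/etc/cloudera-scm-agent/config.ini}"
    # Match only uncommented server_host lines, drop the "key=" part,
    # keep the first hit and trim trailing whitespace.
    sed -n 's/^[[:space:]]*server_host[[:space:]]*=[[:space:]]*//p' "$config" \
        | head -n 1 \
        | sed 's/[[:space:]]*$//'
}
```

Calling agent_server_host with no argument reads the default path. Passing another path lets you try it on a copy of the file first.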

Conflict

Say, for example, that several nodes have the same uuid. In my case this happened because for development I run a small cluster on ESX: I created one master copy and cloned it to the other servers, so they all have the same uuid. If nodes share a uuid, you will see duplicated hostnames/IPs for that particular uuid in Cloudera Manager under the Hosts tab (menu).
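If you suspect cloned nodes, you can also compare the uuids yourself instead of hunting for duplicates in the Hosts tab. This is a rough sketch, not anything CM ships: the helper name find_duplicate_uuids and the "hostname uuid" input format are my own assumptions.

```shell
# Read "hostname uuid" pairs on stdin and print every uuid shared by more
# than one host, followed by the hosts that share it.
find_duplicate_uuids() {
    awk '
        { count[$2]++; hosts[$2] = hosts[$2] " " $1 }
        END {
            for (u in count)
                if (count[u] > 1)
                    print u ":" hosts[u]
        }
    ' | sort
}
```

One way to feed it is to collect one line per node (the hostname, then the contents of that node's /var/lib/cloudera-scm-agent/uuid) into a file and pipe the file in. Any uuid that gets printed belongs to cloned nodes.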

To solve this, you need to generate a new uuid. But how, since these are automatically managed by CM?

The uuid is kept under /var/lib/cloudera-scm-agent. There are two files there, uuid and response.avro. If you cat uuid, you get the uuid for that particular node. To change it, follow these steps:

  1. Stop CM server - service cloudera-scm-server stop
  2. On all nodes, stop the CM agent - service cloudera-scm-agent hard_stop. hard_stop is needed because we need to restart supervisord as well and flush all the settings pertaining to CM.
  3. Now remove both uuid and response.avro under /var/lib/cloudera-scm-agent, or rename them to be safe.
  4. Make sure /etc/cloudera-scm-agent/config.ini is pointing to the right CM server.
  5. Start CM server - service cloudera-scm-server start
  6. On all nodes, start the CM agent - service cloudera-scm-agent clean_start or service cloudera-scm-agent hard_start.
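The per-node part of the steps above (2, 3 and 6) can be sketched as two small shell functions. This is only a sketch under a few assumptions: the function names and the AGENT_DIR and SERVICE variables are made up for illustration, and it renames the files with a timestamp suffix rather than deleting them. Stopping and starting the CM server (steps 1 and 5) and checking config.ini (step 4) are still done by hand.

```shell
# Both values can be overridden, e.g. SERVICE=echo for a dry run.
AGENT_DIR="${AGENT_DIR:-/var/lib/cloudera-scm-agent}"
SERVICE="${SERVICE:-service}"

# Steps 2 and 3: hard-stop the agent, then move the identity files aside.
reset_agent_identity() {
    $SERVICE cloudera-scm-agent hard_stop || return 1
    stamp=$(date +%Y%m%d%H%M%S)
    for f in uuid response.avro; do
        if [ -e "$AGENT_DIR/$f" ]; then
            # Rename instead of delete so the old files can be restored.
            mv "$AGENT_DIR/$f" "$AGENT_DIR/$f.bak.$stamp" || return 1
        fi
    done
}

# Step 6: start the agent again; with no uuid file present it gets a new one.
start_agent() {
    $SERVICE cloudera-scm-agent clean_start
}
```

Run reset_agent_identity on every node after the CM server is stopped, then start the server, then run start_agent on every node. Setting SERVICE=echo first prints the service commands instead of running them, though the files are still renamed.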

Check CM; you should see all nodes working now.