This blog post is the fourth in a four-part series that discusses a few new Confluent Control Center features that are introduced with Confluent Platform 6.2.0. It focuses on removing residue data via a new cleanup script that helps you remove old Control Center instances easily. The series highlights the following new features that make managing Apache Kafka® clusters via Control Center an even smoother experience:
If you are not too familiar with Control Center, you can always refer to the Control Center overview first. Having a running Control Center instance at hand helps you explore the features discussed in this blog series better.
Now that you are ready, let’s delve into the fourth feature here in part 4: removing residue data with the Control Center Cleanup script.
With each version upgrade or ID update (explained more later), Control Center creates a new set of internal topics that correspond to the new Control Center instance. Consequently, after a Control Center upgrade/update, you may notice topics from old instances are left behind, cluttering the “Topics” overview page as shown below. These old topics are not used by the new Control Center instance, but they continue to take up disk space. Control Center does not automatically delete the old topics in order to avoid accidental removal of wanted data. Unfortunately, manual deletion of the old topics can make the Control Center upgrade/update process cumbersome and error prone.
Version 6.2.0 introduces a new cleanup script, bin/control-center-cleanup, that lets you interactively delete the old instances’ residue (topics and local directories) more easily and quickly when you upgrade/update Control Center. With this new script, you can delete the old instances’ residue while the current instance of Control Center is running.
The example above shows the topic residue from the Control Center upgrade of version 5.4.1 to 6.2.0, where the old set of internal topics prefixed with _confluent-controlcenter-5-4-1-1 are left behind and coexist with the new set prefixed with _confluent-controlcenter-6-2-0-1.
The same issue occurs if you change the Control Center unique identifier using confluent.controlcenter.id in your properties file. Control Center unique identifiers are useful if you want multiple instances of Control Center to coexist on the same server. However, if you decide to keep only one instance after an identifier change, you will encounter the same data residue issues. For example, if you have Control Center version 6.2.0 and changed the ID from 1 to 2, then the old set of internal topics prefixed with _confluent-controlcenter-6-2-0-1 are left behind and coexist with the new set prefixed with _confluent-controlcenter-6-2-0-2.
The cleanup script requires a Control Center properties file to establish the initial connection to the Kafka cluster and to decide what the current running Control Center instance is in order to avoid deleting its data. The cleanup script uses:
For example, suppose the Control Center properties file etc/confluent-control-center/control-center.properties contains the following:
############################# Server Basics #############################
bootstrap.servers=localhost:9092
zookeeper.connect=localhost:2181

######################### Control Center Settings #########################
confluent.controlcenter.data.dir=/tmp/control-center
confluent.controlcenter.id=1
# using default confluent.controlcenter.name, “_confluent-controlcenter”
Therefore, when you run the cleanup script from package confluent-6.2.0, it determines that the running instance is _confluent-controlcenter-6-2-0-1 (<running instance name>-<version>-<id>). It also determines that the local data of all instances resides in /tmp/control-center.
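The derivation described above can be sketched in a few lines of Python. This is an illustration only, not the script's actual implementation: the helper names parse_properties and running_instance are hypothetical, and the rule that dots in the version become hyphens is inferred from the instance names shown in this post.

```python
# Default value of confluent.controlcenter.name, as noted in the properties file above.
DEFAULT_NAME = "_confluent-controlcenter"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def running_instance(props, package_version):
    """Build <name>-<version>-<id>; dots in the version become hyphens (inferred rule)."""
    name = props.get("confluent.controlcenter.name", DEFAULT_NAME)
    cc_id = props["confluent.controlcenter.id"]
    return f"{name}-{package_version.replace('.', '-')}-{cc_id}"


EXAMPLE = """\
bootstrap.servers=localhost:9092
zookeeper.connect=localhost:2181
confluent.controlcenter.data.dir=/tmp/control-center
confluent.controlcenter.id=1
"""

props = parse_properties(EXAMPLE)
print(running_instance(props, "6.2.0"))
print(props["confluent.controlcenter.data.dir"])
```

With the example properties and package version 6.2.0, the sketch yields the same instance name and data directory as described above.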
Assume that only the Control Center instance defined in the properties file—_confluent-controlcenter-6-2-0-1—is up and running.
Navigate to $CONFLUENT_HOME, run the script as ./bin/control-center-cleanup <props_file>, and you will get the following prompt:
./bin/control-center-cleanup etc/confluent-control-center/control-center.properties
============================================================================
The cleanup script found the following instance:
_confluent-controlcenter-6-2-0-1
We believe this COULD be the instance defined in your config file so it will not be prompted for cleanup.

Here are the instances discovered for cleanup:
_confluent-controlcenter-5-4-1-1
_confluent-controlcenter-5-4-1-2
Cleanup ALL of the instances above? [y/N]:
The script avoids cleaning the running instance—_confluent-controlcenter-6-2-0-1—and discovers that there are two old Control Center instances from version 5.4.1 available for cleanup, _confluent-controlcenter-5-4-1-1 and _confluent-controlcenter-5-4-1-2.
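To make the discovery step concrete, here is a minimal sketch of how old instances could be derived from a list of topic names. It is not how the cleanup script is actually implemented: the regular expression, the function name discover_old_instances, and the sample topic list are all assumptions made for illustration, based on the default name _confluent-controlcenter and the naming pattern shown in this post.

```python
import re

# Assumed pattern: <name>-<major>-<minor>-<patch>-<id>, optionally followed by a topic suffix.
INSTANCE_RE = re.compile(r"^(_confluent-controlcenter-\d+-\d+-\d+-\d+)(?:-|$)")


def discover_old_instances(topic_names, running):
    """Return the sorted instance prefixes found in topic_names, excluding the running one."""
    found = set()
    for topic in topic_names:
        match = INSTANCE_RE.match(topic)
        if match and match.group(1) != running:
            found.add(match.group(1))
    return sorted(found)


# Illustrative topic list, not real cluster output.
topics = [
    "_confluent-controlcenter-5-4-1-1-AlertHistoryStore-changelog",
    "_confluent-controlcenter-5-4-1-1-cluster-rekey",
    "_confluent-controlcenter-5-4-1-2-cluster-rekey",
    "_confluent-controlcenter-6-2-0-1-cluster-rekey",
    "_confluent-metrics",
]

old_instances = discover_old_instances(topics, "_confluent-controlcenter-6-2-0-1")
print(old_instances)
```

The running instance is filtered out before anything is offered for cleanup, which mirrors the safeguard described above.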
You can type y to clean all of the old instances without intermissions or prompts.
You can type N to be prompted for each instance individually instead.
Assuming N is used in the previous step, you will receive the following prompt:
Do you want to cleanup _confluent-controlcenter-5-4-1-1 ? [y/N/dryRun]:
For each Control Center instance, you can type y to clean the instance’s topics and the instance’s local directories.
_confluent-controlcenter-5-4-1-1-AlertHistoryStore-changelog
_confluent-controlcenter-5-4-1-1-MetricsAggregateStore-changelog
_confluent-controlcenter-5-4-1-1-cluster-rekey
_confluent-controlcenter-5-4-1-1-expected-group-consumption-rekey
_confluent-controlcenter-5-4-1-1-actual-group-consumption-rekey

/tmp/control-center/1
    cp-command/
        _confluent-controlcenter-5-4-1-1/
    kafka-streams/
        _confluent-controlcenter-5-4-1-1/
You can type N to skip cleanup for the instance at hand.
You can type dryRun to see what topics and local directories will be deleted without any actual impact. After dryRun, you will be prompted to clean up the same instance again with option [y/N/dryRun] until you either type y or N, deciding to clean or skip the instance.
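The per-instance prompt behaves like a small loop: dryRun previews and asks again, while y or N ends the loop. The sketch below models that flow under stated assumptions. The function decide and its parameters are hypothetical, and treating any reply other than y or dryRun as N is an assumption based on N being the capitalized default in the prompt.

```python
def decide(instance, answers, preview):
    """Return True to clean the instance, False to skip it.

    answers is an iterator of replies; preview is called for each dryRun reply.
    """
    for answer in answers:
        if answer == "y":
            return True
        if answer == "dryRun":
            preview(instance)  # show what would be deleted, then prompt again
            continue
        return False  # "N", or any other reply (assumed to fall back to the default)
    return False  # no more input: skip rather than delete


previews = []
cleaned = decide(
    "_confluent-controlcenter-5-4-1-1",
    iter(["dryRun", "dryRun", "y"]),
    previews.append,
)
print(cleaned, len(previews))
```

Defaulting to "skip" when input runs out keeps the sketch on the safe side, in the same spirit as the script's guidance prompts.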
If you would like to avoid being prompted to clean each instance, type y at the earlier prompt, Cleanup ALL of the instances above? [y/N].
A majority of the logs are omitted below, except for high-level logs. Lines that start with # are comments added later and are not part of the original log.
./bin/control-center-cleanup etc/confluent-control-center/control-center.properties
================================================================================
The cleanup script found the following instance:
_confluent-controlcenter-6-2-0-1
We believe this COULD be the instance defined in your config file so it will not be prompted for cleanup.

Here are the instances discovered for cleanup:
_confluent-controlcenter-5-4-1-1
_confluent-controlcenter-5-4-1-2
Cleanup ALL of the instances above? [y/N]: N
Do you want to cleanup _confluent-controlcenter-5-4-1-1 ? [y/N/dryRun]: dryRun
----Dry run displays the actions which will be performed when running Streams Reset Tool----
Reset-offsets for input topics [_confluent-monitoring, _confluent-command, _confluent-metrics]
Seek-to-end for intermediate topics [_confluent-controlcenter-5-4-1-1-cluster-rekey, _confluent-controlcenter-5-4-1-1-monitoring-message-rekey-store, _confluent-controlcenter-5-4-1-1-actual-group-consumption-rekey, _confluent-controlcenter-5-4-1-1-expected-group-consumption-rekey, _confluent-controlcenter-5-4-1-1-group-stream-extension-rekey, _confluent-controlcenter-5-4-1-1-monitoring-trigger-event-rekey, _confluent-controlcenter-5-4-1-1-MetricsAggregateStore-repartition, _confluent-controlcenter-5-4-1-1-metrics-trigger-measurement-rekey]
Following input topics offsets will be reset to (for consumer group _confluent-controlcenter-5-4-1-1)
(...)
Following intermediate topics offsets will be reset to end (for consumer group _confluent-controlcenter-5-4-1-1)
(...)
Deleting all internal/auto-created topics for application _confluent-controlcenter-5-4-1-1
(...)
Deleting intermediate topics (for consumer group _confluent-controlcenter-5-4-1-1)
(...)
Deleting local RocksDB data in /tmp/confluent/control-center/1
Deleting /tmp/confluent/control-center/1/cp-command/_confluent-controlcenter-5-4-1-1-command
Deleting /tmp/confluent/control-center/1/kafka-streams/_confluent-controlcenter-5-4-1-1
Done.
Finished dryRun for _confluent-controlcenter-5-4-1-1 .
Do you want to clean it up? [y/N/dryRun]: y
# Logs omitted. Same steps as above:
# 1. For input topics, reset offsets to specified position (default EARLIEST) ← from Kafka Streams Reset Tool
# 2. For intermediate topics, seek offsets to the end, LATEST ← from Kafka Streams Reset Tool
# 3. Delete internal/auto-created topics ← from Kafka Streams Reset Tool
# 4. Delete intermediate topics
# 5. Delete local RocksDB data in directories
Do you want to cleanup _confluent-controlcenter-5-4-1-2 ? [y/N/dryRun]: y
# Logs omitted. Same 5 steps as above.
================================================================================
If you run the cleanup script again, you will see that _confluent-controlcenter-5-4-1-1 and _confluent-controlcenter-5-4-1-2 were cleaned up successfully and you won’t be prompted again.
./bin/control-center-cleanup etc/confluent-control-center/control-center.properties
================================================================================
The cleanup script found the following instance:
_confluent-controlcenter-6-2-0-1
We believe this COULD be the instance defined in your config file so it will not be prompted for cleanup.

The cleanup script found no instances for cleanup.
================================================================================
Control Center has long had a reset script, bin/control-center-reset, which supports the cleanup of one instance at a time without any guidance prompts. The script only deletes the instance defined in the provided properties file and does not automatically discover other instances. Therefore, to maintain a clean Control Center environment, you should run the reset script upon each version upgrade or unique identifier update.
Before we dive into the benefits of the cleanup script, the following provides a bit more detail about the reset script.
Just like the cleanup script, the reset script also requires a Control Center properties file. It is used to establish the initial connection to the Kafka cluster and to determine the Control Center instance to delete (the reset script only deletes the instance defined in your properties file). New in version 6.2.0, the reset script supports a dryRun flag:
bin/control-center-reset <props_file> [--dryRun]
With the dryRun flag, the script previews the topics and directories pertaining to the Control Center instance defined in your properties file, without actually deleting them.
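If you drive the reset script from automation, a small wrapper can make the preview step explicit. The sketch below only builds the command line shown above. The helper name reset_command is hypothetical, and the script path assumes you run from $CONFLUENT_HOME.

```python
def reset_command(props_file, dry_run=False, script="bin/control-center-reset"):
    """Build the argument list for the reset script, optionally with --dryRun."""
    cmd = [script, props_file]
    if dry_run:
        cmd.append("--dryRun")
    return cmd


preview_cmd = reset_command(
    "etc/confluent-control-center/control-center.properties", dry_run=True
)
print(" ".join(preview_cmd))
```

The resulting list can be passed to subprocess.run, so a dry run can be reviewed before the same command is issued without the flag.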
It is important to note that prior to version 6.2.0, the reset script would clean local directories more “drastically.” It would find the unique identifier in the properties file, confluent.controlcenter.id, and delete the entire ID directory, not just the directories of the target instance.
For example, if the unique identifier is 1, and you have two Control Center instances with ID 1, _confluent-controlcenter-5-4-1-1 (target instance to delete) and _confluent-controlcenter-6-2-0-1, then the entire ID directory /tmp/control-center/1 would be deleted, not just the directories of _confluent-controlcenter-5-4-1-1.
This reset script issue is fixed in version 6.2.0, where only the target instance’s directories are deleted.
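The difference between the two behaviors can be reproduced on a throwaway directory. The sketch below is a simulation under stated assumptions, not the reset script's code: the function names are hypothetical, and the layout (instance directories directly under cp-command and kafka-streams) is simplified from the directory tree shown earlier in this post.

```python
import shutil
import tempfile
from pathlib import Path

SUBDIRS = ("cp-command", "kafka-streams")
OLD = "_confluent-controlcenter-5-4-1-1"  # target instance to delete
NEW = "_confluent-controlcenter-6-2-0-1"  # instance that should survive


def make_layout(data_dir, cc_id, instances):
    """Create <data_dir>/<id>/<subdir>/<instance>/ for every instance."""
    for sub in SUBDIRS:
        for inst in instances:
            (data_dir / cc_id / sub / inst).mkdir(parents=True)


def reset_before_6_2_0(data_dir, cc_id, target):
    """Simulate the old behavior: remove the whole ID directory, whatever the target."""
    shutil.rmtree(data_dir / cc_id)


def reset_6_2_0(data_dir, cc_id, target):
    """Simulate the fixed behavior: remove only the target instance's directories."""
    for sub in SUBDIRS:
        shutil.rmtree(data_dir / cc_id / sub / target, ignore_errors=True)


def surviving_instances(data_dir, cc_id):
    """List the instance directories left under kafka-streams."""
    base = data_dir / cc_id / "kafka-streams"
    return sorted(p.name for p in base.iterdir()) if base.exists() else []


results = {}
for label, reset in (("before 6.2.0", reset_before_6_2_0), ("6.2.0", reset_6_2_0)):
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        make_layout(data_dir, "1", [OLD, NEW])
        reset(data_dir, "1", OLD)
        results[label] = surviving_instances(data_dir, "1")

print(results)
```

In the simulated pre-6.2.0 run nothing survives under ID 1, while in the simulated 6.2.0 run the directories of the newer instance are left intact.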
To summarize, although the two scripts look similar, the reset script and the cleanup script work in opposite ways. The former can only delete the Control Center instance defined in your Control Center configuration file, while the latter can automatically discover and delete any instances except the one defined in your configuration file. To maintain a clean environment, the reset script needs to be run each time before you start a new instance, while the cleanup script can run anytime (before or after starting a new instance, or only periodically). The cleanup script also provides a handful of guidance prompts, giving you full control over which instance(s) to delete.
The cleanup script has a few benefits that make your Control Center upgrade/update process less error prone and cumbersome:
If you are an operator who strives to maintain a clean environment by keeping only the necessary Control Center instances, you can now use the cleanup script to periodically delete all the unused instances in one go. There is no need to manually hunt down each Kafka topic or local data directory from old Control Center instances anymore.
Imagine you are an operator and just performed a Control Center unique identifier update. With the reset script, you would need to modify the properties file to target the instance that you want to delete and repeat the process until all the old instances are deleted. Now with the cleanup script, you only need the latest properties file, which will single out the running instance and delete the old ones.
Let’s say you are an operator and just performed a Control Center version upgrade. With the reset script, in order to delete an old instance, you would need to run the script in the Confluent Platform package whose version matches the target instance. For example, to delete an instance of version 5.4.1, you need to run the script in Confluent Platform package 5.4.1; running the reset script in Confluent Platform package 5.4.2 would delete the 5.4.2 instance, not the target 5.4.1 instance. Now with the cleanup script, you can run the script in any Confluent Platform package that provides it, and it will automatically discover old instances to delete. No need to match the package version with the target instance!
For operators who want to make sure they do not accidentally delete the wrong Control Center instances, the cleanup script provides guidance prompts to avoid accidental deletion.
In summary, removing residue data with the Control Center cleanup script allows you to maintain a clean environment by removing data from unused Control Center instances in one run, making the Control Center upgrade/update process more efficient and less error prone.
To learn about other new features of Control Center 6.2.0, check out the remaining blog posts in this series: