This is a proof of concept for deploying a Debezium-based change data capture (CDC) project using Helm and Strimzi on top of Kubernetes.
As part of any CDC deployment using Debezium, the following are required:
- Kafka
- Kafka Connect with Debezium jars installed on the container image
- Schema Registry, for Avro-encoded messages
- A source database (PostgreSQL, MySQL, Cassandra or any other Debezium supported database)
The basic flow for building a CDC deployment looks like this:
- Decide which tables you want to export from your source database. Ideally these are outbox tables that are written to in the same transaction as your internal tables. Note: this is different from the Debezium outbox router.
- Deploy a Kafka Connect instance for your BC that is connected to Kafka.
- Create a Debezium source job by posting a job configuration to Kafka Connect's REST endpoint.
- Monitor the Kafka Connect tasks.
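As a rough sketch of the "post a job configuration" step, the script below writes a minimal Debezium PostgreSQL source configuration and, only if `CONNECT_URL` is set, posts it to Kafka Connect's `/connectors` endpoint. The connector name, host, credentials, database and table names are all hypothetical placeholders, not values from this project. `topic.prefix` assumes Debezium 2.x; older versions use `database.server.name` instead.

```shell
set -eu

# Hypothetical: base URL of the Kafka Connect REST API, e.g. http://localhost:8083
# (8083 is Kafka Connect's default REST port). Left empty, nothing is posted.
CONNECT_URL="${CONNECT_URL:-}"

# Minimal source connector definition. Every value below is a placeholder.
cat > debezium-source.json <<'EOF'
{
  "name": "example-outbox-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "database.hostname": "source-db",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "change-me",
    "database.dbname": "example",
    "topic.prefix": "example",
    "plugin.name": "pgoutput",
    "table.include.list": "public.outbox"
  }
}
EOF

if [ -n "$CONNECT_URL" ]; then
  # Create the connector, then ask Kafka Connect for its status.
  curl -sS -X POST -H "Content-Type: application/json" \
    --data @debezium-source.json "$CONNECT_URL/connectors"
  curl -sS "$CONNECT_URL/connectors/example-outbox-source/status"
else
  echo "CONNECT_URL not set; wrote debezium-source.json without posting it"
fi
```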
This project is intended to provide an easy way to set up the necessary components. Normally there's some manual intervention in the deployment, specifically hitting a REST API to deploy the Kafka Connect jobs. The Strimzi operator makes it easy to deploy connectors by applying a KafkaConnector custom resource instead.
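For illustration, a KafkaConnector resource could look roughly like the one written by the script below. This is a sketch, not the manifest rendered by this project's chart: the resource name, the `strimzi.io/cluster` value, and all connector settings are hypothetical. It also assumes the KafkaConnect resource carries the `strimzi.io/use-connector-resources: "true"` annotation, which Strimzi requires before it manages connectors this way.

```shell
set -eu

# Write an example KafkaConnector manifest. All names and values are placeholders.
cat > example-kafkaconnector.yaml <<'EOF'
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: example-change-event-connector
  labels:
    # Must match the name of the KafkaConnect resource that runs this connector.
    strimzi.io/cluster: example-connect-cluster
spec:
  class: io.debezium.connector.postgresql.PostgresConnector
  tasksMax: 1
  config:
    database.hostname: source-db
    database.port: 5432
    database.user: postgres
    # Placeholder only; this project supplies secrets through Kubernetes.
    database.password: change-me
    database.dbname: example
    topic.prefix: example
    plugin.name: pgoutput
    table.include.list: public.outbox
EOF

# Apply only when explicitly requested, so the script is safe to run without a cluster.
if [ "${APPLY:-0}" = "1" ]; then
  kubectl apply -n "${NAMESPACE:-default}" -f example-kafkaconnector.yaml
fi
```

Running it with `APPLY=1 NAMESPACE=<namespace>` applies the manifest; the operator then creates the connector through the Kafka Connect REST API on your behalf.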
This project gives you the necessary components to test a CDC deployment with Strimzi on your local environment:
- Source and Destination test databases with example data for sending and receiving CDC data
- The infrastructure needed for Kafka Connect (kafka, schema registry, etc)
- The infrastructure needed for monitoring via prometheus (prometheus operator, prometheus, etc)
- A Helm chart to allow you to easily create (and/or deploy) the required Strimzi manifests
  - KafkaConnect with
    - secrets provided by Kubernetes
    - JMX metrics exported for Prometheus consumption
  - KafkaConnector(s)
    - Change Event Connector - used to monitor all changes to an approved list of tables. Tombstones will be sent when a record is removed from the source database.
    - Domain Event Connector - used to monitor domain event tables. Tombstones are not submitted.
- Have a Kubernetes cluster you can deploy to. I used docker-desktop.
- Have the following prerequisites installed locally:
  - kubectl
  - make
  - helm
- Copy example.env to .env and modify the settings in .env if needed (for example, if you are running on Linux and need to point to minikube).
- Build the kafka-connect image: `make build_image`
- Run `make strimzi_setup`
- Run `make metrics_setup` to install kube-prom-stack. This will enable us to deploy pod monitors and get metrics sent to grafana with prometheus.
- Modify `./cdc-poc/values.yaml` to suit your needs. Do you want to run with metrics? Set that in the values file.
- After the pods have stabilized, run `make helm_install`, which will deploy the KafkaConnect and KafkaConnector custom resources.
- Run `make sinks` to install the sink manifests so that you can verify data is actually being captured. You can (and should) tail the kafka logs to see the data come in. These sinks may make it easier to demo data transit by querying the destination database.
- Port forward `/web:9090` on the prometheus pod to `localhost:9090`. Verify there is a target for kafka connect. It will probably be the very last target in the list found at http://localhost:9090/targets.
- Port forward `/grafana:3000` on the grafana pod to `localhost:3000`. Navigate in your browser to `localhost:3000`. Credentials for grafana are user: `admin`, password: `prom-operator`.
- Load the `grafana-dashboard.json` dashboard in grafana.
You will find `grafana-dashboard.json` in the root directory. It may need to be modified to fit your installation, but it will give you a good start. This file is a slightly modified version of this one: https://github.com/debezium/debezium-examples/blob/master/monitoring/debezium-grafana/debezium-dashboard.json
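If you prefer the command line to a UI for the two port forwards above, a small helper along these lines may work. It looks up the first pod matching a label selector and forwards the same port locally. The namespace and label selectors in the usage note are assumptions about a typical kube-prometheus-stack install, not values taken from this project.

```shell
# Forward localhost:<port> to the same port on the first pod matching a label selector.
forward_first_pod() {
  local namespace selector port pod
  namespace="$1"
  selector="$2"
  port="$3"
  # Resolve the pod name first; pod names usually carry generated suffixes.
  pod="$(kubectl get pods -n "$namespace" -l "$selector" \
    -o jsonpath='{.items[0].metadata.name}')"
  # Blocks until interrupted with Ctrl-C.
  kubectl port-forward -n "$namespace" "pod/$pod" "$port:$port"
}
```

For example, `forward_first_pod monitoring app.kubernetes.io/name=prometheus 9090` in one terminal and `forward_first_pod monitoring app.kubernetes.io/name=grafana 3000` in another. Check the actual namespace and labels with `kubectl get pods -A --show-labels` before relying on these.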
- Postgres might not be quite ready when installing the database. If it dies on `pg_isready`, just run `make strimzi_setup` a second time. It should be ready after a few seconds.
- The operators can take some time to come to life in Kubernetes, and sometimes the setup will time out. In that case, run `make strimzi_setup` again after a few minutes. If that doesn't work, it's possible the pods are not coming up due to other issues (e.g., disk pressure). Check your k8s UI of choice (e.g., Lens or k9s) and see if the pods are not coming up for a particular reason. This is a pretty heavy project to run. You might need to allocate additional resources to Docker.
1. `kubectl get csv -n olm`: check the status of OLM.
2. `kubectl get csv -n operators`: check the status of the Strimzi operator.
3. `kubectl get deployment -n operators`: if this doesn't come back with anything, the operator might still be loading. If that's the case, check that the subscription exists.
4. `kubectl get subscription -n operators`: if this doesn't come back with anything, something more serious happened during the install. Check the logs from when you ran `make strimzi_all` earlier in the process. If there is a subscription, just wait another minute and try step 1 again.
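Since most of these checks come down to waiting and trying again, a generic retry helper can take over the waiting. This is only a convenience sketch and not part of the project's Makefile; the attempt count and delay are arbitrary.

```shell
# Run a command until it succeeds, up to <attempts> times, sleeping <delay> seconds between tries.
retry() {
  local attempts delay i
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

One possible use is `retry 10 30 kubectl wait --for=condition=Available deployment --all -n operators --timeout=60s`, which keeps checking until the operator deployment exists and reports as available, or gives up after ten attempts.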
kubectl port-forward :
The commands below require a connection to the Kafka brokers, and in this project Kafka is running in Kubernetes. That makes connecting to the brokers from a host using the Kafka command line utilities a little tricky. Instead, shell into the broker pod to run the commands. K9s and Lens both make it easy to shell into particular pods.
kafka-topics --list --bootstrap-server localhost:9092
kafka-console-consumer --bootstrap-server localhost:9092 --topic <topic> --from-beginning
If you need to know if a connector was not deployed properly, you'll need to check kubernetes. Look under the status section for information.
kubectl describe kafkaconnector -n <namespace>
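When the full `describe` output is more than you need, a JSONPath query can pull out just the status fields. The helper below is a sketch that assumes the status layout Strimzi normally reports for a KafkaConnector (`status.connectorStatus` and `status.conditions`); the connector name and namespace passed to it are whatever exists in your cluster.

```shell
# Print the connector state, then one line per status condition.
connector_status() {
  local name namespace
  name="$1"
  namespace="$2"
  kubectl get kafkaconnector "$name" -n "$namespace" \
    -o jsonpath='{.status.connectorStatus.connector.state}{"\n"}{range .status.conditions[*]}{.type}={.status} {.message}{"\n"}{end}'
}
```

Call it as `connector_status <connector-name> <namespace>`. A condition such as `NotReady=True` followed by a message usually points at a configuration error reported by Kafka Connect.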
Test the chart:
helm install . --dry-run --generate-name