Setting up Kafka and basic troubleshooting

Mohamed_Sadek · 01-22-2023

What is Kafka?

Apache Kafka is a distributed publish-subscribe messaging system that can handle a high volume of data and lets you pass messages from one endpoint to another. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka is built on top of the ZooKeeper synchronization service. It integrates very well with Apache Storm and Apache Spark for real-time streaming.

Compared with traditional message brokers, Kafka offers higher throughput, built-in partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-scale message-processing applications.

Kafka is very fast and, with replication configured appropriately, is designed to tolerate broker failures without downtime or data loss.

Setting up a Kafka Data Source in Incorta

Currently, the supported Kafka version is 2.10.

In the "Add New Data Source" window:

Select "Kafka" from the Data Source drop-down list.
Provide the data source name in the "Data Source Name" field.
Provide the topic in the "Topic" field.
Type the brokers list in the "Brokers List" field.
Optionally, in the "Message Type" field, provide the name of the root table; this name is also used as the schema name.
Turn the "Trim messageType after dash" toggle on (blue) or off (grey).
Select a Kafka version from the "Select Kafka Version" drop-down list.
Click Choose File to select the Avro file containing the Avro schema generated using the Avro Extractor Tool.
Click Add Data Source.

Specify a Kafka Consumer Service Name (to enable a Loader Service to consume messages)


For an Incorta Cluster with two or more Incorta Nodes that each run a Loader Service, you must specify a Kafka Consumer Service Name in the Cluster Management Console.

A Cluster Management Console (CMC) Administrator for your Incorta Cluster must configure the Kafka Consumer Service Name. Changes to this property require that the CMC Administrator restart each Loader Service.

If your Incorta Cluster contains two or more Incorta Nodes, each with a Loader Service, you must specify the Incorta Node and Loader Service to use. If you do not assign a Loader Service to consume the Kafka messages, Incorta assigns one randomly, which can result in unexpected behavior and duplicated rows.

To specify the Kafka Consumer Service Name in the Server Configurations:

As the CMC Administrator, sign in to the CMC.
In the Navigation bar, select Clusters.
In the cluster list, select a Cluster name.
In the canvas tabs, select Cluster Configurations.
In the panel tabs, select Server Configurations.
In the left pane, select Clustering.
In the right pane, for Kafka Consumer Service Name, enter <NODE_NAME>.<SERVICE_NAME>.
Select Save.

This makes the specified Loader Service consume messages from the brokers defined in the data source as soon as they arrive.

Checking and adjusting the logs

Once a message is produced to Kafka, you should see an entry for it in the Loader tenant logs:

/<Incorta installation path>/IncortaNode/services/<loader id>/logs/incorta/<tenant>

To check whether the message was parsed properly or rejected, check this log:

/<Incorta installation path>/IncortaNode/services/<loader id>/logs/kafka/<tenant>
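To follow these logs while testing, a small helper like the one below can print the end of the most recently modified file in a log directory. This is a minimal sketch: the function name `latest_log_tail` is hypothetical, and the directory you pass it is one of the placeholder paths above with your own installation path, loader ID, and tenant filled in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last N lines of the newest file
# in a log directory (for example the logs/kafka/<tenant> directory).

latest_log_tail() {
  local log_dir="$1"
  local lines="${2:-50}"
  local latest
  # ls -t sorts newest first; -p marks directories with a trailing "/",
  # which grep then filters out so only regular files are considered.
  latest=$(ls -tp "$log_dir" 2>/dev/null | grep -v '/$' | head -n 1)
  if [ -z "$latest" ]; then
    echo "No log files found in $log_dir" >&2
    return 1
  fi
  tail -n "$lines" "$log_dir/$latest"
}

# Usage (fill in the placeholders for your installation):
# latest_log_tail "/<Incorta installation path>/IncortaNode/services/<loader id>/logs/kafka/<tenant>" 100
```

Piping the output through `grep -i` for a table or topic name narrows it to the message you just produced.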

If you need more detailed logging, edit the following file (create it if it does not exist).

You can also add any other consumer properties you need to this file.

<Tenant directory>/<tenant>/KAFKA/kafka-consumer.properties

For example:

kafka.logger.logInfo=true
kafka.logger.logWarning=true
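If you prefer to script this change, the sketch below creates `kafka-consumer.properties` when it is missing and sets the two logger properties shown above, replacing any existing value for those keys so that it is safe to run more than once. The function name `enable_kafka_logging` and its two arguments (tenant directory and tenant name) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable kafka.logger.logInfo and
# kafka.logger.logWarning in <Tenant directory>/<tenant>/KAFKA/kafka-consumer.properties,
# creating the directory and file if needed.

enable_kafka_logging() {
  local props_dir="$1/$2/KAFKA"
  local props_file="$props_dir/kafka-consumer.properties"
  mkdir -p "$props_dir"
  touch "$props_file"
  local key
  for key in kafka.logger.logInfo kafka.logger.logWarning; do
    # Drop any existing line for this key (grep -v exits 1 when no line
    # is left, hence "|| true"), then append key=true.
    grep -v "^${key}=" "$props_file" > "$props_file.tmp" || true
    echo "${key}=true" >> "$props_file.tmp"
    mv "$props_file.tmp" "$props_file"
  done
  # Print the path of the file that was written.
  echo "$props_file"
}

# Usage (placeholders):
# enable_kafka_logging "<Tenant directory>" "<tenant>"
```

Other properties already in the file are left untouched.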

The CSV file that holds consumed messages is stored in the following location:

<Tenant directory>/<tenant>/KAFKA/<datasource name>/<tablename>

This location should be populated once the Loader has successfully consumed the message from the broker.
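A quick way to confirm that consumption is happening is to count the lines written under that location and compare the count before and after producing a message. The sketch below works whether `<tablename>` turns out to be a single file or a directory of files; the function name `count_consumed_lines` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines written under
# <Tenant directory>/<tenant>/KAFKA/<datasource name>/<tablename>.

count_consumed_lines() {
  local target="$1/$2/KAFKA/$3/$4"
  if [ ! -e "$target" ]; then
    echo "Not found: $target" >&2
    return 1
  fi
  # find -type f returns the path itself if it is a file, or every file
  # below it if it is a directory. wc -l counts newline-terminated lines;
  # tr strips the padding some wc implementations add.
  find "$target" -type f -exec cat {} + | wc -l | tr -d ' '
}

# Usage (placeholders):
# count_consumed_lines "<Tenant directory>" "<tenant>" "<datasource name>" "<tablename>"
```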

Steps specific to our support instance:

1- Connect to the support server over SSH.

Once logged in, go to Docker container 1:

2- Access the container: access_docker support_c1

3- Kafka is installed in:

/home/incorta/kafka_10

4- Start ZooKeeper and then Kafka from the bin directory of the Kafka installation, as shown below:

Start ZooKeeper:
./zookeeper-server-start.sh ../config/zookeeper.properties &

Start Kafka:
./kafka-server-start.sh ../config/server.properties &
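Because both services are started in the background, the Kafka broker may fail to start if ZooKeeper is not yet accepting connections. A minimal sketch of a wait step follows; it assumes bash (it relies on bash's `/dev/tcp` redirection, which plain `sh` does not support), and the function name `wait_for_port` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until something accepts TCP connections on
# host:port, giving up after a timeout in seconds. Uses bash's /dev/tcp.

wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local waited=0
  # The subshell tries to open a connection; it exits non-zero if refused.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Nothing listening on ${host}:${port} after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage from the Kafka bin directory, with the support-instance ports:
# ./zookeeper-server-start.sh ../config/zookeeper.properties &
# wait_for_port localhost 33108 || exit 1
# ./kafka-server-start.sh ../config/server.properties &
# wait_for_port localhost 33107
```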

Make sure the ZooKeeper and Kafka ports are set within the allowed port range:

33107 --> Kafka port

33108 --> ZooKeeper port
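Before starting the services, a check like the following can confirm that the configuration files actually use these ports. It assumes the standard Kafka configuration keys: `clientPort` in `zookeeper.properties`, the broker port in `listeners` (or the older `port` setting) in `server.properties`, and a `zookeeper.connect` entry in `server.properties` that must point at the ZooKeeper port. The function name `check_kafka_ports` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the Kafka and ZooKeeper ports in a Kafka
# config directory (for example ../config when run from bin).

check_kafka_ports() {
  local config_dir="$1"
  local kafka_port="${2:-33107}"
  local zk_port="${3:-33108}"
  local ok=0
  # ZooKeeper listens on clientPort.
  if ! grep -q "^clientPort=${zk_port}\$" "$config_dir/zookeeper.properties"; then
    echo "zookeeper.properties: clientPort is not ${zk_port}" >&2
    ok=1
  fi
  # The broker port comes from listeners or the older port= setting.
  if ! grep -Eq "^(listeners=.*:${kafka_port}(,.*)?|port=${kafka_port})\$" "$config_dir/server.properties"; then
    echo "server.properties: broker is not configured on port ${kafka_port}" >&2
    ok=1
  fi
  # The broker must connect to ZooKeeper on the same port.
  if ! grep -Eq "^zookeeper\.connect=.*:${zk_port}([,/].*)?\$" "$config_dir/server.properties"; then
    echo "server.properties: zookeeper.connect does not use port ${zk_port}" >&2
    ok=1
  fi
  return "$ok"
}

# Usage from the Kafka bin directory:
# check_kafka_ports ../config 33107 33108
```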

5- To push a JSON file to a topic, use the console producer (it sends each line of the file as a separate message):

./kafka-console-producer.sh --broker-list localhost:33107 --topic <topicname> < samplerecords.json
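Since the console producer treats every line as one message, each JSON record in `samplerecords.json` must sit on a single line. The sketch below writes such a file; the function name `write_sample_records` and the `id`/`name` fields are hypothetical placeholders, and real records must match the Avro schema the Kafka data source was created with.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write newline-delimited JSON, one message per line.
# Replace the example fields with ones that match your data source schema.

write_sample_records() {
  local out="${1:-samplerecords.json}"
  cat > "$out" <<'EOF'
{"id": 1, "name": "first record"}
{"id": 2, "name": "second record"}
EOF
  echo "$out"
}

# Usage:
# write_sample_records samplerecords.json
```

The resulting file can then be passed to the console producer command above; two lines produce two Kafka messages.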