on 01-22-2023 06:52 AM - edited on 01-23-2023 02:47 AM by FadiB
What is Kafka?
Apache Kafka is a distributed publish-subscribe messaging system that can handle a high volume of data and enables you to pass messages from one endpoint to another. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka is built on top of the ZooKeeper synchronization service. It integrates very well with Apache Storm and Apache Spark for real-time streaming.
Kafka offers high throughput, built-in partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-scale message processing applications.
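To make the partitioning idea concrete, here is a minimal Python sketch of key-based partition assignment. It is only an illustration: Kafka's own default partitioner hashes keys with murmur2, whereas this sketch substitutes CRC32 from the standard library, and the function name pick_partition is hypothetical.

```python
import zlib


def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (illustrative only)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 stands in for Kafka's murmur2 hash; this is an assumption of the sketch.
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # The same key always maps to the same partition, which preserves per-key ordering.
    for key in (b"order-1", b"order-2", b"order-1"):
        print(key, "->", pick_partition(key, 3))
```

The point of the sketch is that messages sharing a key land in the same partition, so they are consumed in the order they were produced.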
Kafka is very fast and, because messages are replicated across the cluster, is designed to avoid downtime and data loss.
Setting up a Kafka Data Source in Incorta
The Kafka version currently supported is 2.10.
In the "Add New Data Source" window:
Select "Kafka" from the Data Source drop-down list.
Provide the data source name in the "Data Source Name" field.
Provide the topic in the "Topic" field.
Type the brokers list in the "Brokers List" field.
Optionally, in the "Message Type Field", provide the name of the root table, which is also used as the schema name.
Turn the "Trim messageType after dash" option on or off; the toggle is blue when on and grey when off.
Select a Kafka version from the "Select Kafka Version" drop-down list.
Click Choose File to select the Avro file containing the Avro schema generated using the Avro Extractor Tool.
Click Add Data Source.
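The values entered in the steps above can be checked beforehand with a small Python sketch. This is not an Incorta API: the class KafkaDataSourceSettings, its field names, and the assumption that the brokers list is a comma-separated list of host:port entries (the usual Kafka bootstrap format) are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


def parse_brokers(brokers_list: str) -> List[Tuple[str, int]]:
    """Split 'host1:9092,host2:9092' into (host, port) pairs."""
    brokers = []
    for entry in brokers_list.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"invalid broker entry: {entry!r}")
        brokers.append((host, int(port)))
    if not brokers:
        raise ValueError("brokers list is empty")
    return brokers


@dataclass
class KafkaDataSourceSettings:
    # Hypothetical container mirroring the dialog fields; not an Incorta class.
    name: str
    topic: str
    brokers_list: str
    message_type: str = ""  # optional root table / schema name

    def validate(self) -> List[Tuple[str, int]]:
        """Check the required fields and return the parsed brokers."""
        if not self.name or not self.topic:
            raise ValueError("data source name and topic are required")
        return parse_brokers(self.brokers_list)
```

For example, KafkaDataSourceSettings("sales_kafka", "orders", "broker1:9092,broker2:9092").validate() returns the two parsed broker pairs; the data source, topic, and broker names here are made up.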
Specify a Kafka Consumer Service Name (to enable the Loader Service to consume messages)
For an Incorta Cluster with two or more Incorta Nodes that each run a Loader Service, you must specify a Kafka Consumer Service Name in the Cluster Management Console.
A Cluster Management Console (CMC) Administrator for your Incorta Cluster must configure the Kafka Consumer Service Name. Changes to this property require that the CMC Administrator restart each Loader Service.
If your Incorta Cluster contains more than two Incorta Nodes each with a Loader Service, then you must specify the Incorta Node and Loader Service to use. If you do not assign a loader service to consume the Kafka messages, Incorta assigns a Loader Service randomly. This can result in unexpected behavior and row duplication.
Here are the steps to specify the required properties for the Server Configurations:
As the CMC Administrator, sign in to the CMC.
In the Navigation bar, select Clusters.
In the cluster list, select a Cluster name.
In the canvas tabs, select Cluster Configurations.
In the panel tabs, select Server Configurations.
In the left pane, select Clustering.
In the right pane, for Kafka Consumer Service Name, enter <NODE_NAME>.<SERVICE_NAME>.
This makes the Loader Service consume messages immediately from the broker defined in the data source.
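The <NODE_NAME>.<SERVICE_NAME> value entered in the last step is simply the node name and the Loader Service name joined by a dot. A minimal Python sketch of composing it follows; the helper name and the example node and service names are hypothetical, and the non-empty check is an assumption rather than a documented Incorta rule.

```python
def consumer_service_name(node_name: str, service_name: str) -> str:
    """Build the '<NODE_NAME>.<SERVICE_NAME>' string for the CMC field."""
    node_name = node_name.strip()
    service_name = service_name.strip()
    # Assumption: both parts must be present for the value to be meaningful.
    if not node_name or not service_name:
        raise ValueError("both node name and service name are required")
    return f"{node_name}.{service_name}"


if __name__ == "__main__":
    # Hypothetical names; use the ones shown in your own CMC.
    print(consumer_service_name("incortaNode1", "loaderService1"))
```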
Log adjustment and checking
Once a message is produced on Kafka, you should find it in the Loader tenant logs:
/<incorta installation path>/IncortaNode/services/<loader id>/logs/incorta/<tenant>
To check whether the message was parsed properly or rejected, check this log:
/<incorta installation path>/IncortaNode/services/<loader id>/logs/kafka/<tenant>
If you need more detail in this logging, you need to adapt this file (create it if it does not exist).
You can also add any other properties you need in this file.
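The two log locations mentioned above share the same prefix, so they can be derived from three inputs. The following Python sketch builds both paths; the function name and the example install path, loader id, and tenant are hypothetical and should be replaced with your own values.

```python
from pathlib import PurePosixPath
from typing import Dict


def loader_log_dirs(install_path: str, loader_id: str, tenant: str) -> Dict[str, PurePosixPath]:
    """Return the tenant log directory and the Kafka log directory for a loader."""
    base = PurePosixPath(install_path) / "IncortaNode" / "services" / loader_id / "logs"
    return {
        "tenant": base / "incorta" / tenant,  # general Loader tenant logs
        "kafka": base / "kafka" / tenant,     # message parsing / rejection logs
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for label, path in loader_log_dirs("/opt/incorta", "loader-id", "demo").items():
        print(label, path)
```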