Mohamed_Sadek
Employee

What is Kafka?

Apache Kafka is a distributed publish-subscribe messaging system that handles a high volume of data and passes messages from one endpoint to another. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka is built on top of the ZooKeeper synchronization service. It integrates well with Apache Storm and Apache Spark for real-time streaming.

Kafka offers high throughput, built-in partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-scale message processing applications.

Kafka is very fast and, when replication is configured appropriately, is designed to tolerate broker failures without downtime or data loss.

Setting up a Kafka Data Source in Incorta

Currently, the supported Kafka version is 2.10.

In the "Add New Data Source" window:

  1. Select "Kafka" from the Data Source drop-down list.
  2. Provide the data source name in the "Data Source Name" field.
  3. Provide the topic in the "Topic" field.
  4. Type the brokers list in the "Brokers List" field.
  5. Optionally, in "Message Type Field", provide a name. This name is
    the root table name and is also used as the schema name.
  6. Turn the "Trim messageType after dash" feature on or off by switching
    the toggle to blue or grey, respectively.
  7. Select a Kafka version from the "Select Kafka Version" drop-down list.
  8. Click Choose File to select the Avro file containing the Avro schema
    generated using the Avro Extractor Tool.
  9. Click Add Data Source.
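
Before typing a value into the "Brokers List" field, it can help to sanity-check it. The following is a minimal sketch, assuming the field takes the usual Kafka bootstrap format of comma-separated host:port pairs; the function name and the broker addresses are hypothetical and used for illustration only.

```python
from typing import List, Tuple


def parse_brokers_list(brokers: str) -> List[Tuple[str, int]]:
    """Split a comma-separated 'host:port' string into (host, port) pairs."""
    pairs = []
    for entry in brokers.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma or stray whitespace
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdecimal():
            raise ValueError(f"expected host:port, got {entry!r}")
        pairs.append((host, int(port)))
    if not pairs:
        raise ValueError("brokers list is empty")
    return pairs


# Hypothetical broker addresses, for illustration only.
print(parse_brokers_list("broker1.example.com:9092, broker2.example.com:9092"))
```

A malformed entry such as a host without a port raises ValueError, which is easier to diagnose here than after the data source has been saved.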

Specify a Kafka Consumer Service Name (to enable the Loader Service to consume messages)


For an Incorta Cluster with two or more Incorta Nodes that each run a Loader Service, you must specify a Kafka Consumer Service Name in the Cluster Management Console.

A Cluster Management Console (CMC) Administrator for your Incorta Cluster must configure the Kafka Consumer Service Name. Changes to this property require that the CMC Administrator restart each Loader Service.

If your Incorta Cluster contains more than two Incorta Nodes, each with a Loader Service, then you must specify the Incorta Node and Loader Service to use. If you do not assign a Loader Service to consume the Kafka messages, Incorta assigns one randomly. This can result in unexpected behavior and row duplication.

Here are the steps to specify the required properties for the Server Configurations:

  • As the CMC Administrator, sign in to the CMC.
  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Cluster Configurations.
  • In the panel tabs, select Server Configurations.
  • In the left pane, select Clustering.
  • In the right pane, for Kafka Consumer Service Name, enter <NODE_NAME>.<SERVICE_NAME>.
  • Select Save.
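
The value entered in the steps above has the shape <NODE_NAME>.<SERVICE_NAME>. A minimal sketch of composing and checking it follows, assuming that neither name contains a dot or whitespace; that restriction, the function name, and the example names are assumptions for illustration, not documented Incorta rules.

```python
import re

# Assumption: node and service names contain no dots or whitespace, so the
# final value has exactly two non-empty parts separated by a single dot.
_PART = re.compile(r"[^.\s]+")


def consumer_service_name(node_name: str, service_name: str) -> str:
    """Build the '<NODE_NAME>.<SERVICE_NAME>' value for the CMC property."""
    for label, value in (("node name", node_name), ("service name", service_name)):
        if not _PART.fullmatch(value):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"{node_name}.{service_name}"


# Hypothetical node and service names, for illustration only.
print(consumer_service_name("incortaNode1", "loaderService1"))
```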


With this setting, the Loader Service consumes messages from the broker defined in the data source as soon as they arrive.

Checking and adjusting logs

Once a message is produced on Kafka, you should find a corresponding entry in the Loader tenant logs:

/<Incorta installation path>/IncortaNode/services/<loader id>/logs/incorta/<tenant>


To check whether the message was parsed properly or rejected, check this log:

/<Incorta installation path>/IncortaNode/services/<loader id>/logs/kafka/<tenant>
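
Both log directories follow the same layout, so they can be derived from three values. The sketch below builds the two paths and finds the most recently modified file in a directory; the function names and the example values are hypothetical, and the layout is taken only from the two paths shown above.

```python
from pathlib import Path
from typing import Dict, Optional


def loader_log_dirs(install_path: str, loader_id: str, tenant: str) -> Dict[str, Path]:
    """Return the tenant log directory and the Kafka log directory for a loader."""
    logs = Path(install_path) / "IncortaNode" / "services" / loader_id / "logs"
    return {"tenant": logs / "incorta" / tenant, "kafka": logs / "kafka" / tenant}


def newest_file(directory: Path) -> Optional[Path]:
    """Return the most recently modified file in a directory, or None."""
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


# Hypothetical installation path, loader id, and tenant name.
dirs = loader_log_dirs("/opt/incorta", "loader1", "demo")
for name, directory in dirs.items():
    print(name, directory, newest_file(directory))
```

On a machine without these directories, newest_file returns None rather than raising, so the script is safe to run anywhere.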


If you need more detail in this logging, edit the following file (create it if it does not exist).

You can also add any other properties you need to this file.

<Tenant directory>/<tenant>/KAFKA/kafka-consumer.properties

For example:

kafka.logger.logInfo=true
kafka.logger.logWarning=true
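
Creating the file when it is missing and setting these two properties can be scripted. This is a minimal sketch, assuming the file is a plain key=value properties file; the function name is hypothetical, and only the two property names shown above come from this article.

```python
from pathlib import Path
from typing import Dict


def ensure_properties(path: Path, wanted: Dict[str, str]) -> None:
    """Create the file if needed and set each key=value, keeping other lines."""
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = path.read_text().splitlines() if path.exists() else []
    seen = set()
    updated = []
    for line in lines:
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if "=" in stripped and not stripped.startswith("#") and key in wanted:
            updated.append(f"{key}={wanted[key]}")  # overwrite an existing value
            seen.add(key)
        else:
            updated.append(line)  # keep comments and unrelated properties
    updated.extend(f"{k}={v}" for k, v in wanted.items() if k not in seen)
    path.write_text("\n".join(updated) + "\n")


LOGGING_PROPERTIES = {
    "kafka.logger.logInfo": "true",
    "kafka.logger.logWarning": "true",
}
```

To apply it, call ensure_properties with the path to kafka-consumer.properties under your own tenant directory and LOGGING_PROPERTIES as the second argument.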

The CSV file that holds the consumed messages is stored at the following location:

<Tenant directory>/<tenant>/KAFKA/<datasource name>/<tablename>

It should be populated if the message sent by the broker is consumed properly by the Loader Service.
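
A quick way to confirm that the location is being populated is to count the rows in each CSV file there. The sketch below assumes the files carry a .csv extension and counts every row, including any header row; the function name and the example path are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict


def csv_row_counts(table_dir: Path) -> Dict[str, int]:
    """Map each CSV file name in table_dir to its number of rows."""
    counts: Dict[str, int] = {}
    if not table_dir.is_dir():
        return counts  # nothing consumed yet, or the path is wrong
    for path in sorted(table_dir.glob("*.csv")):
        with path.open(newline="") as handle:
            counts[path.name] = sum(1 for _ in csv.reader(handle))
    return counts


# Hypothetical tenant directory, tenant, data source, and table names.
print(csv_row_counts(Path("/opt/incorta/tenants/demo/KAFKA/my_kafka_ds/my_table")))
```

An empty result means either that no message has been consumed yet or that the path does not match your installation.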

Steps specific to our support instance:

1- Access the support server over SSH.

Once logged in, go to Docker container 1:

2- Access the container: access_docker support_c1

3- The Kafka installation is located at:

/home/incorta/kafka_10

4- Start Kafka's ZooKeeper and then Kafka, as below:
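
A minimal sketch of this step follows, assuming the installation uses the stock startup scripts and default configuration files that ship with Apache Kafka (bin/zookeeper-server-start.sh and bin/kafka-server-start.sh, each with the -daemon flag). The exact commands used on the support instance may differ, and the function names are hypothetical.

```python
import subprocess
from typing import List

KAFKA_HOME = "/home/incorta/kafka_10"  # installation path from step 3


def start_commands(kafka_home: str = KAFKA_HOME) -> List[List[str]]:
    """Return the ZooKeeper and Kafka start commands, in the order to run them."""
    return [
        # Assumption: stock Apache Kafka scripts and default config file names.
        [f"{kafka_home}/bin/zookeeper-server-start.sh", "-daemon",
         f"{kafka_home}/config/zookeeper.properties"],
        [f"{kafka_home}/bin/kafka-server-start.sh", "-daemon",
         f"{kafka_home}/config/server.properties"],
    ]


def start_all(kafka_home: str = KAFKA_HOME) -> None:
    """Run the commands in order; ZooKeeper must be started before the broker."""
    for command in start_commands(kafka_home):
        subprocess.run(command, check=True)


# Print the commands without running them.
for command in start_commands():
    print(" ".join(command))
```

Printing the commands first makes it easy to verify the paths inside the container before calling start_all.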

Last update:
‎01-23-2023 02:47 AM
Updated by: