JSON - Avro Extractor Tool and Configuring Kafka Data Sources

amit_kothari · 04-27-2022

Introduction

This article provides more information about the Avro Extractor tool that generates an Avro file from a provided JSON file.

To learn how to connect to a Kafka data source, consume JSON messages, and load them into Incorta, see: Create a Kafka Data Source. For more on Kafka, see: Kafka Overview.

Solution

Pre-requisites

To create a Kafka data source, you will need all of the following:

An up-and-running Incorta Server
An up-and-running ZooKeeper instance
An up-and-running Kafka instance
Avro Extractor tool (shipped by Incorta Analytics)
A sample JSON file to be consumed from Kafka

Steps for Creating a Kafka data source

Creating a fresh Avro file from sample JSON messages:

Create an Avro schema using the Avro Extractor tool:

The "Avro Extractor" is an external tool that generates an Avro file from sample JSON messages, producing a schema that the Incorta application can read. Incorta Analytics is designed to consume data with a predefined structure, which Kafka-produced data does not have, so you can use the Avro Extractor tool to create that structure:

1. Copy the "javax.json-1.0.4.jar" file from the "<Installation_Path>/Server/lib" directory.

2. Go to the "/bin" directory and paste the "javax.json-1.0.4.jar" file (copied in the previous step). This is the same location as the "avroExtractor.jar" file (i.e. the Avro Extractor tool).

3. Run the Avro Extractor tool to generate a JSON file containing the Avro schema describing the source message, e.g. employee.avro, by running the following command:

where,

The output is a JSON file containing the Avro schema that describes the source message, e.g. employee.avro. Note the location of this file so that you can import it later into the Incorta Analytics application.
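To make the idea of deriving an Avro schema from a sample JSON message more concrete, the following Python sketch infers a simple Avro record schema from one JSON object. It is only an illustration of the general technique: it is not the Avro Extractor's actual algorithm, and the function name `infer_avro_type`, the sample message, and the type-mapping rules are assumptions made for this example. The file produced by the real tool may differ in structure and detail.

```python
import json


def infer_avro_type(name, value):
    """Map one JSON value to an Avro type (simplified, illustrative rules)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Use the first element as the representative item; fall back to string if empty.
        item = infer_avro_type(name, value[0]) if value else "string"
        return {"type": "array", "items": item}
    if isinstance(value, dict):
        # A nested JSON object becomes a nested Avro record named after its key.
        return {
            "type": "record",
            "name": name,
            "fields": [
                {"name": key, "type": infer_avro_type(key, child)}
                for key, child in value.items()
            ],
        }
    raise TypeError(f"unsupported JSON value for {name!r}: {value!r}")


# Hypothetical sample message; replace with one of your own Kafka messages.
sample = json.loads('{"dsid": 1001, "name": "Ada", "salary": 5200.5, "active": true}')
schema = infer_avro_type("employee", sample)
print(json.dumps(schema, indent=2))
```

A sketch like this shows why a representative sample matters: a field that is missing from the sample, or an empty array, gives the inference nothing to work with.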

Annotations

In the generated Avro file, you may add annotations to ask Incorta to modify column or table definitions, e.g. ignore a table, set certain columns as keys, identify maps, etc.

Kafka Annotations

The annotations available for Kafka consumption are documented below.

Annotations are added to the Avro model to supply details that cannot be deduced directly from the sample JSON files, such as whether a certain field is a key, or whether a record is a map.

Every annotation starts with a “path” attribute, which identifies the Avro field or record to be annotated. The elements of the path are separated by dots.
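As an illustration of how a dot-separated path addresses a field or record, the sketch below walks a nested Python dictionary one path element at a time. It assumes that each path element corresponds to a nested key of the sample message; the helper name `resolve_path` and the sample data are hypothetical and are not part of Incorta.

```python
def resolve_path(document, path):
    """Walk a nested dict following a dot-separated path and return the value found."""
    node = document
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"{part!r} not found while resolving {path!r}")
        node = node[part]
    return node


# Hypothetical message whose root record is "person".
message = {"person": {"dsid": 1001, "organization": {"name": "Acme"}}}

print(resolve_path(message, "person.dsid"))          # a field
print(resolve_path(message, "person.organization"))  # a nested record
```

Checking each path against a sample message in this way is a quick way to catch misspelled field names before the annotations are added to the Avro file.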

#	Name	Syntax	Description	Example
1	Primary Key	isKey	Sets the given field/column to be part of the primary key of its parent table. This flag is set on the field level.	{ "path": "person.dsid", "isKey" : "true" }
2	Set as table	isTable	By default, Incorta flattens the fields of nested child JSON objects into the parent. To make a nested child a separate table, set the isTable flag to true.	{"path": "person.qualifications", "isTable" : "true" }
3	Parent table	isParent	A nested child object in JSON is by default a child table in Incorta. If the table is a parent table, the isParent flag is set to true. Note that isParent must be used in conjunction with isTable set to true.	{ "path": "person.organization", "isTable" : "true", "isParent": "true" }
4	Persistent	isPersistent	If the data for a nested JSON object will actually be sent via another Kafka message, the nested object does not need to be saved as part of the current message. In this case, set isPersistent to false. Note that isPersistent must be used in conjunction with isTable and isParent both set to true.	{ "path": "person.organization", "isTable" : "true", "isParent": "true", "isPersistent": "false" }
5	Map	isMap	If a nested JSON object is actually a map between the keys in the JSON and a set of child records, it becomes a child table when the isMap flag is set to true. Note that isMap cannot be used in conjunction with isTable.	{ "path": "person.addresses", "isMap" : "true" }
6	Map Key	<KEY>	In the “path”, when referencing fields or records that are children of a map, the path includes a variable element, which is referenced by the name <KEY>.	{ "path": "person.addresses.<KEY>.typeName", "isKey": "true" }
7	Array Map	tableName	If a map is actually a set of fields inside a record, the annotation needs to specify the field names and the corresponding table name, because the table would not have a name inside the JSON sample. The field names are listed in the “path”, comma-separated. Note that tableName must be used in conjunction with isMap set to true.	{ "path": "person.addresses.<KEY>.local1,local2", "isMap": "true", "tableName": "person.addresses.LocalAddresses" }
8	Ignored	isIgnored	Any table record marked as isIgnored will not be shown in discovery	{ "path": "person.corrections", "isIgnored" : "true" }
9	Table Name	tableName	An alias table name can be added to any table, map, or array.	{"path": "country.lastModification", "isTable":"true", "tableName": "coun_lsat_mod"}
10	One To One	isOneToOne	When annotating a table with isTable, you can also annotate it as a one-to-one table. Incorta will then not mark any columns as PRIMARY_KEY other than those inherited from the parent table.	{ "path": "person.demographic", "isTable" : "true", "isOneToOne" : "true"}
11	Source encrypted	isEncrypted	Annotating a field as isEncrypted means that it is encrypted at the source and needs to be decrypted by Incorta using a custom Crypto class.	{ "path": "person.basic.firstName", "isEncrypted" : "true"}
12	Encryption name	encryptionName	This annotation must accompany isEncrypted set to true; it is meaningless on its own. Its value is the crypto name, which should be configured in the kafka-crypto.properties file.	{ "path": "person.basic.firstName", "isEncrypted" : "true", "encryptionName" : "person"}
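Several of the annotations above are only valid in combination with others. The following Python sketch checks a list of annotations against four of the pairing rules stated in the table (isParent, isPersistent, isMap, and encryptionName). It is a hypothetical pre-check written for this article, not an Incorta utility; the function names `is_true` and `check_annotation` are assumptions, and the sketch does not cover how annotations are embedded in the Avro file.

```python
def is_true(annotation, flag):
    """Flags in the examples are the strings "true"/"false", so compare as text."""
    return str(annotation.get(flag, "false")).lower() == "true"


def check_annotation(annotation):
    """Return a list of rule violations for one annotation (empty list means OK)."""
    problems = []
    if "path" not in annotation:
        problems.append("missing 'path'")
    if is_true(annotation, "isParent") and not is_true(annotation, "isTable"):
        problems.append("isParent requires isTable set to true")
    if "isPersistent" in annotation and not (
        is_true(annotation, "isTable") and is_true(annotation, "isParent")
    ):
        problems.append("isPersistent requires isTable and isParent set to true")
    if is_true(annotation, "isMap") and "isTable" in annotation:
        problems.append("isMap cannot be combined with isTable")
    if "encryptionName" in annotation and not is_true(annotation, "isEncrypted"):
        problems.append("encryptionName requires isEncrypted set to true")
    return problems


annotations = [
    # Valid: taken from the Persistent example in the table.
    {"path": "person.organization", "isTable": "true", "isParent": "true", "isPersistent": "false"},
    # Invalid: isMap together with isTable.
    {"path": "person.addresses", "isMap": "true", "isTable": "true"},
    # Invalid: encryptionName without isEncrypted.
    {"path": "person.basic.firstName", "encryptionName": "person"},
]

for annotation in annotations:
    print(annotation["path"], check_annotation(annotation) or "OK")
```

Running a check like this before importing the Avro file can surface conflicting flags early, although the authoritative validation is whatever Incorta itself performs on import.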
