
JSON - Avro Extractor Tool and Configuring Kafka Data Sources

Goal

This article provides more information about the Avro Extractor tool that generates an Avro file from a provided JSON file.

To learn how to connect to a Kafka data source, consume JSON messages, and load them into Incorta, see: Create a Kafka Data Source. For more on Kafka, see: Kafka Overview.

Solution

Prerequisites

To create a Kafka data source, you will need all of the following:

  1. an up-and-running Incorta Server
  2. an up-and-running ZooKeeper instance
  3. an up-and-running Kafka instance
  4. the Avro Extractor tool (shipped with Incorta Analytics)
  5. a sample JSON file to be consumed from Kafka

Steps for Creating a Kafka data source

Creating a fresh Avro file from sample JSON messages:

Create an Avro schema using the Avro Extractor tool:

The "Avro Extractor" is an external tool that generates an Avro file from sample JSON messages, producing a schema readable by the Incorta application. Incorta Analytics is designed to consume data with a predefined structure, which Kafka-produced data lacks, so use the Avro Extractor tool as follows:

1. Copy the "javax.json-1.0.4.jar" file from the "<Installation_Path>/Server/lib" directory.

2. Go to the "/bin" directory and paste the "javax.json-1.0.4.jar" file copied in the previous step. This is the same location as the "avroExtractor.jar" file (i.e., the Avro Extractor tool).

3. Run the Avro Extractor tool against the sample JSON file to generate a JSON file containing the Avro schema that describes the source message, e.g. employee.avro.
The output will be a JSON file containing the Avro schema describing the source message, e.g. employee.avro. Make note of this file's location so you can import it later into the Incorta Analytics application.
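For illustration only, the sketch below shows the general idea behind this kind of JSON-to-Avro inference: it derives a minimal Avro record schema from a single sample message. All names here are hypothetical, and the real extractor's type mapping, union handling, and command-line options will differ.

```python
import json

def avro_type(value, name):
    """Map a sample JSON value to an illustrative Avro type."""
    if isinstance(value, bool):  # check bool before int (bool is an int subclass)
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return infer_record(value, name)
    if isinstance(value, list):
        # infer the item type from the first element; empty lists fall back to string
        item = value[0] if value else ""
        return {"type": "array", "items": avro_type(item, name + "_item")}
    return "string"

def infer_record(obj, name):
    """Build an Avro record schema from one sample JSON object."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": k, "type": avro_type(v, k)} for k, v in obj.items()],
    }

sample = json.loads('{"dsid": 101, "name": "Ada", "qualifications": [{"title": "BSc"}]}')
schema = infer_record(sample, "employee")
print(json.dumps(schema, indent=2))
```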

 

Annotations

In the generated Avro file, you may add annotations that instruct Incorta to modify column or table definitions, e.g. ignore a table, set certain columns as keys, or identify maps.

Kafka Annotations

 

Below is the documentation of the annotations used during Kafka consumption.


The annotations are added to the Avro model to supply details that are not directly deducible from the sample JSON files, such as whether a certain field is a key, or whether a record is a map.

All annotations start with a "path" attribute that identifies the Avro field or record being annotated. Path components are separated by dots.
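As a sketch of how such dotted paths address a message (the field names below are illustrative), a path walks one level per component, and the <KEY> placeholder stands for the variable key of a map level:

```python
def resolve(message, path):
    """Walk a nested dict one dotted-path component at a time."""
    node = message
    for part in path.split("."):
        if part == "<KEY>":
            # a map level: the key varies per message; take the first value here
            node = next(iter(node.values()))
        else:
            node = node[part]
    return node

person = {"person": {"dsid": 7, "addresses": {"home": {"typeName": "HOME"}}}}
resolve(person, "person.dsid")                      # -> 7
resolve(person, "person.addresses.<KEY>.typeName")  # -> "HOME"
```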

 

1. Primary Key (isKey)
   Sets the given field/column to be part of the primary key of its parent table. This flag is set at the field level.
   Example: { "path": "person.dsid", "isKey": "true" }

2. Set as table (isTable)
   By default, Incorta flattens the fields of a nested child JSON object into its parent. To make a nested child a separate table instead, set the isTable flag to true.
   Example: { "path": "person.qualifications", "isTable": "true" }

3. Parent table (isParent)
   A nested child object in JSON is by default a child table in Incorta. If the table is instead a parent table, set the isParent flag to true. Note that isParent must be used in conjunction with isTable set to true.
   Example: { "path": "person.organization", "isTable": "true", "isParent": "true" }

4. Persistent (isPersistent)
   If the data for a nested JSON object will actually be sent via another Kafka message, the nested object does not need to be saved as part of the current message. In this case, set isPersistent to false. Note that isPersistent must be used in conjunction with isTable true and isParent true.
   Example: { "path": "person.organization", "isTable": "true", "isParent": "true", "isPersistent": "false" }

5. Map (isMap)
   If a nested JSON object is actually a map between the keys in the JSON and a set of child records, it becomes a child table. Indicate this by setting the isMap flag to true. Note that isMap cannot be used in conjunction with isTable.
   Example: { "path": "person.addresses", "isMap": "true" }

6. Map Key (<KEY>)
   When a "path" references fields or records that are children of a map, the path includes a variable element referenced by the name <KEY>.
   Example: { "path": "person.addresses.<KEY>.typeName", "isKey": "true" }

7. Array Map (tableName)
   If a map is actually a set of fields inside a record, the annotation must specify the field names and the corresponding table name, because the map has no name of its own in the JSON sample. The field names are listed inside the "path", comma-separated. Note that tableName must be used in conjunction with isMap set to true.
   Example: { "path": "person.addresses.<KEY>.local1,local2", "isMap": "true", "tableName": "person.addresses.LocalAddresses" }

8. Ignored (isIgnored)
   Any table or record marked as isIgnored will not be shown in discovery.
   Example: { "path": "person.corrections", "isIgnored": "true" }

9. Table Name (tableName)
   An alias table name can be added to any table, map, or array.
   Example: { "path": "country.lastModification", "isTable": "true", "tableName": "coun_last_mod" }

10. One To One (isOneToOne)
    While annotating a table with isTable, you can also annotate it as a one-to-one table; Incorta will then not mark any columns as PRIMARY_KEY other than those inherited from the parent table.
    Example: { "path": "person.demographic", "isTable": "true", "isOneToOne": "true" }

11. Source encrypted (isEncrypted)
    Annotating a field as isEncrypted means that it is encrypted at the source and needs to be decrypted by Incorta using a custom Crypto class.
    Example: { "path": "person.basic.firstName", "isEncrypted": "true" }

12. Encryption name (encryptionName)
    This annotation must accompany isEncrypted = true; it is meaningless on its own. Its value is the crypto name, which must be configured in the kafka-crypto.properties file.
    Example: { "path": "person.basic.firstName", "isEncrypted": "true", "encryptionName": "person" }
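Taken together, an annotations list for the hypothetical person sample used throughout this table might look like the following (all paths and names are illustrative):

```json
[
  { "path": "person.dsid", "isKey": "true" },
  { "path": "person.qualifications", "isTable": "true" },
  { "path": "person.organization", "isTable": "true", "isParent": "true" },
  { "path": "person.addresses", "isMap": "true" },
  { "path": "person.addresses.<KEY>.typeName", "isKey": "true" },
  { "path": "person.corrections", "isIgnored": "true" }
]
```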

 

Related articles

  • Create a Kafka Data Source
  • Kafka Overview
