on 04-27-2022 09:00 AM
This article provides more information about the Avro Extractor tool that generates an Avro file from a provided JSON file.
To learn how to connect to a Kafka data source, consume JSON messages, and load them into Incorta, see: Create a Kafka Data Source. For more on Kafka, see: Kafka Overview.
To create a Kafka data source, you will need all of the following:
The "Avro Extractor" is an external tool that generates an Avro file from sample JSON messages, producing a schema that the Incorta application can read. Incorta Analytics is designed to consume data with a predefined structure, which Kafka-produced data does not have, so you can use the Avro Extractor tool to create that structure:
1. Copy the "javax.json-1.0.4.jar" file from the "<Installation_Path>/Server/lib" directory.
2. Go to the "/bin" directory and paste the "javax.json-1.0.4.jar" file (copied in the previous step). This is the same location as the "avroExtractor.jar" file (i.e. the Avro Extractor tool).
3. Run the Avro Extractor tool with the following command to generate a JSON file containing the Avro schema that describes the source message, e.g. employee.avro:
where,
The output is a JSON file containing the Avro schema that describes the source message, e.g. employee.avro. Make a note of this file's location so that you can import it into the Incorta Analytics application later.
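As a quick sanity check on the generated file, the following Python sketch parses a schema and lists its top-level field names. It assumes only that the output follows the standard Avro record layout in JSON (a "fields" array whose entries have "name" and "type"). The function name `list_schema_fields` and the inline sample schema are illustrative and are not actual output of the Avro Extractor.

```python
import json

def list_schema_fields(schema_text):
    """Return (record_name, [(field_name, field_type), ...]) for an Avro record schema."""
    schema = json.loads(schema_text)
    if schema.get("type") != "record":
        raise ValueError("expected a top-level Avro record schema")
    fields = [(f["name"], f["type"]) for f in schema.get("fields", [])]
    return schema.get("name"), fields

# Hypothetical sample schema. In practice, read the text from the generated
# file instead, e.g. open("employee.avro").read()
sample = """
{
  "type": "record",
  "name": "employee",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": ["null", "string"]}
  ]
}
"""

name, fields = list_schema_fields(sample)
print(name, fields)
```

If the file does not parse as JSON or is not a record schema, the function raises an error, which is usually a sign that the extractor was run against an unexpected input.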
The annotations used for Kafka consumption are documented below.
Annotations are added to the Avro model to supply details that cannot be deduced directly from the sample JSON files, such as whether a certain field is a key or whether a record is a map.
Every annotation starts with a "path" attribute that identifies the Avro field or record to be annotated. The elements of the path are separated by dots.
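To make the dot-separated path concrete, here is a small Python sketch that walks a sample JSON message along an annotation path. The helper `resolve_path` and the sample message are hypothetical. They only illustrate how a path such as person.dsid addresses a nested field, and do not represent how Incorta resolves paths internally.

```python
import json

def resolve_path(message, path):
    """Follow a dot-separated path through nested JSON objects."""
    node = message
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"path element {part!r} not found in {path!r}")
        node = node[part]
    return node

# Hypothetical sample message and annotation, for illustration only.
sample_message = json.loads(
    '{"person": {"dsid": 42, "organization": {"name": "Acme"}}}'
)
annotation = {"path": "person.dsid", "isKey": "true"}

print(resolve_path(sample_message, annotation["path"]))     # prints 42
print(resolve_path(sample_message, "person.organization"))  # the nested record
```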
| # | Name | Syntax | Description | Example |
| --- | --- | --- | --- | --- |
| 1 | Primary Key | isKey | Sets the given field/column to be part of the primary key of its parent table. This flag is set at the field level. | { "path": "person.dsid", "isKey": "true" } |
| 2 | Set as table | isTable | By default, Incorta flattens the fields of nested child JSON objects into the parent. To make a nested child a separate table, set the isTable flag to true. | { "path": "person.qualifications", "isTable": "true" } |
| 3 | Parent table | isParent | A nested child object in JSON is by default a child table in Incorta. If the table is a parent table, set the isParent flag to true. Note that isParent must be used in conjunction with isTable set to true. | { "path": "person.organization", "isTable": "true", "isParent": "true" } |
| 4 | Persistent | isPersistent | If the data for a nested JSON object will actually be sent via another Kafka message, the nested object does not need to be saved as part of the current message. In this case, set isPersistent to false. Note that isPersistent must be used in conjunction with isTable set to true and isParent set to true. | { "path": "person.organization", "isTable": "true", "isParent": "true", "isPersistent": "false" } |
| 5 | Map | isMap | If a nested JSON object is actually a map between the keys in the JSON and a set of child records, it becomes a child table. This is done by setting the isMap flag to true. Note that isMap cannot be used in conjunction with isTable. | { "path": "person.addresses", "isMap": "true" } |
| 6 | Map Key | <KEY> | When the "path" references fields or records that are children of a map, the path includes a variable element, which is referenced by the name <KEY>. | { "path": "person.addresses.<KEY>.typeName", "isKey": "true" } |
| 7 | Array Map | tableName | If a map is actually a set of fields inside a record, the annotation must specify the field names and the corresponding table name, because the table has no name in the JSON sample. The field names are listed inside the "path", separated by commas. Note that tableName must be used in conjunction with isMap set to true. | { "path": "person.addresses.<KEY>.local1,local2", "isMap": "true", "tableName": "person.addresses.LocalAddresses" } |
| 8 | Ignored | isIgnored | Any table record marked as isIgnored will not be shown in discovery. | { "path": "person.corrections", "isIgnored": "true" } |
| 9 | Table Name | tableName | An alias table name can be added to any table, map, or array. | { "path": "country.lastModification", "isTable": "true", "tableName": "coun_lsat_mod" } |
| 10 | One To One | isOneToOne | When annotating a table with isTable, you can also annotate it as a one-to-one table. Incorta will then not mark any columns as PRIMARY_KEY other than those inherited from the parent table. | { "path": "person.demographic", "isTable": "true", "isOneToOne": "true" } |
| 11 | Source encrypted | isEncrypted | Annotating a field as isEncrypted means that it is encrypted at the source and must be decrypted by Incorta using a custom Crypto class. | { "path": "person.basic.firstName", "isEncrypted": "true" } |
| 12 | Encryption name | encryptionName | This annotation must accompany isEncrypted set to true; it is meaningless on its own. Its value is the crypto name, which should be configured in the kafka-crypto.properties file. | { "path": "person.basic.firstName", "isEncrypted": "true", "encryptionName": "person" } |
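Several annotations are only valid in combination with others. The Python sketch below encodes the combination rules stated in the descriptions above (isParent with isTable, isPersistent with isTable and isParent, isMap never with isTable, encryptionName with isEncrypted) as a simple pre-flight check. The function `check_annotation` is hypothetical and is not part of Incorta or the Avro Extractor. It also assumes that flag values are the strings "true" and "false", as in the examples.

```python
def check_annotation(annotation):
    """Return a list of rule violations for one annotation dict."""
    def flag(name):
        # Flags appear as the strings "true"/"false" in the examples.
        return str(annotation.get(name, "")).lower() == "true"

    problems = []
    if "path" not in annotation:
        problems.append("missing 'path'")
    if "isParent" in annotation and not flag("isTable"):
        problems.append("isParent requires isTable set to true")
    if "isPersistent" in annotation and not (flag("isTable") and flag("isParent")):
        problems.append("isPersistent requires isTable and isParent set to true")
    if flag("isMap") and "isTable" in annotation:
        problems.append("isMap cannot be combined with isTable")
    if "encryptionName" in annotation and not flag("isEncrypted"):
        problems.append("encryptionName requires isEncrypted set to true")
    return problems

annotations = [
    {"path": "person.organization", "isTable": "true",
     "isParent": "true", "isPersistent": "false"},               # valid
    {"path": "person.addresses", "isMap": "true"},               # valid
    {"path": "person.basic.firstName", "encryptionName": "person"},  # invalid
]

for a in annotations:
    print(a["path"], check_annotation(a))
```

An empty list means the annotation satisfies the documented combinations. The check does not verify that the path exists in the schema or that the annotation is otherwise accepted by Incorta.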