dylanwan
Employee

Introduction

This article describes best practices for using Incorta notebooks. Incorta notebooks can be used to create Materialized Views (MVs), a type of Incorta derived table. They are also often used for interactive and exploratory data analysis, thanks to their built-in visualization capabilities.

What you need to know before reading this article

Let's Go

Use markdown to document your notebook

Incorta notebooks support markdown. You can write paragraphs in plain text using markdown syntax and have them rendered as rich text. Use these markdown paragraphs to describe the purpose of your notebook, discuss the output and findings from the charts or tables, and write a conclusion.

Use paragraph titles

Each paragraph can optionally have a title. Adding titles to sections increases the readability of your code.

Use multiple paragraphs to examine your data

Each paragraph has its own output, so separate your code logically into paragraphs. Doing so shows the progression of results from your code. It also makes the notebook interactive and helps reviewers quickly understand how the code runs.

To write a paragraph in markdown, start it with the following directive:

%md

Before:

JoeM_0-1689274772470.png

After:

JoeM_1-1689274799521.png

Use keyboard shortcuts

Incorta Notebooks support many helpful keyboard shortcuts that help you accomplish tasks more quickly.

Here is a list of frequently used keyboard shortcuts:

Shortcut                Action
Shift + Enter           Run the current paragraph
Ctrl + Shift + Up       Run all paragraphs above (exclusive)
Ctrl + Shift + Down     Run all paragraphs below (inclusive)
Ctrl + Option + C       Cancel
Ctrl + P                Move cursor up
Ctrl + N                Move cursor down
Ctrl + Option + D       Remove paragraph
Ctrl + Option + A       Insert new paragraph above
Ctrl + Option + B       Insert new paragraph below
Ctrl + Shift + C        Insert a copy of the paragraph below
Ctrl + Option + K       Move paragraph up
Ctrl + Option + J       Move paragraph down
Ctrl + Option + R       Enable/disable run paragraph
Ctrl + Option + O       Toggle output
Ctrl + Option + E       Toggle editor
Ctrl + Option + M       Toggle line number
Ctrl + Option + T       Toggle title
Ctrl + Option + L       Clear output
Ctrl + Option + W       Link this paragraph
Ctrl + Shift + -        Reduce paragraph width
Ctrl + Shift + +        Increase paragraph width

Save the initial version of the notebook by creating a dummy DataFrame

When starting a notebook, save a dummy DataFrame as the output of the materialized view so that your code is saved along with it. This makes it easier to keep your progress while the notebook is still being developed. Incorta validates the output and performs schema inference to determine the structure of the output table, including the data type of each column. If this isn't convenient, there is another option below.

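Use the following code to create a dummy output:

import pandas as pd

# Placeholder data: a single string column "A" (the values are arbitrary).
data = pd.DataFrame({"A": ["1", "2", "3"]})
# Convert to a Spark DataFrame and save it as the MV output.
output_df = spark.createDataFrame(data)
save(output_df)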
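
If you prefer not to depend on pandas for the placeholder, a possible variant builds the same one-column output from plain Python tuples. This is a sketch under assumptions: `spark` and `save` are taken to be the objects the Incorta notebook session provides (as in the code above), and the guard exists only so the snippet also runs harmlessly outside a notebook.

```python
# Column names and rows for a throwaway placeholder output (arbitrary values).
columns = ["A"]
rows = [("1",), ("2",), ("3",)]

# `spark` and `save` are assumed to be provided by the notebook session.
# Outside of it, this guard skips the Spark calls so the snippet still runs.
if "spark" in globals() and "save" in globals():
    output_df = spark.createDataFrame(rows, columns)
    save(output_df)
```

Because every value is a string, the inferred type of column A should be a string type. Replace the placeholder with your real output DataFrame once the notebook logic is ready.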

 

 

Save the script to avoid losing your work

Incorta Notebook does not maintain a persistent session and does not automatically save your content. If you spend an extended time in a notebook, your Incorta session could time out. While your script is still in progress, that is, not yet complete or valid, a best practice is to save your work periodically using the 'save script only' option.

JoeM_0-1689276206789.png

Use Run All to test your code

Incorta Notebook allows you to run your code from the beginning of the notebook. During development, you can run paragraphs in any order, and the order of execution determines the values in your data. When the Incorta Loader service executes the notebook, however, it runs the paragraphs linearly, from top to bottom, in the order they appear. To emulate how the loader will execute the script, select Run All Paragraphs and verify that the notebook runs as expected before validating the script.

JoeM_0-1689277369349.png
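
To see why execution order matters, here is a minimal sketch in plain Python. It does not use any Incorta API: the three functions are hypothetical stand-ins for notebook paragraphs that share session state, and `run` is an illustrative helper.

```python
# Hypothetical stand-ins for three notebook paragraphs sharing session state.
def load(state):
    state["rows"] = [10, 20, 30]


def filter_rows(state):
    state["rows"] = [r for r in state["rows"] if r > 10]


def summarize(state):
    state["total"] = sum(state["rows"])


def run(paragraphs):
    """Run paragraphs in the given order against a fresh session state."""
    state = {}
    for paragraph in paragraphs:
        paragraph(state)
    return state


# Top-to-bottom order, as a linear run of the notebook would execute it.
linear = run([load, filter_rows, summarize])

# An interactive session in which the summary ran before the filter.
out_of_order = run([load, summarize, filter_rows])

print(linear["total"])        # 50
print(out_of_order["total"])  # 60
```

The two runs disagree because the summary in the second run was computed on unfiltered data. A result you see after running paragraphs out of order may therefore differ from what a top-to-bottom run produces.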

Remove print statements

During the development of a notebook, you are likely to use statements such as incorta.show() or incorta.head() to inspect your data. Print statements that go through incorta, like these, are excluded when the script runs. In contrast, statements such as [df].show() or [df].count() are executed as part of the script.

Some ways to ensure the performance of your script are:

  1. Use Incorta-based print statements when possible.
  2. When using other statements, remove them once the notebook goes to production.
  3. When using other statements, place them in their own paragraph and then de-select the paragraph using the Include in MV script icon.

JoeM_1-1689277682422.png

Note: You can quickly verify that this works as expected by opening the notebook view and then switching to the query view.
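
Leftover calls such as [df].count() cost time because in Spark each one is an action that executes the DataFrame's query plan. The following is a rough sketch in plain Python, not Incorta or Spark code: `FakeDataFrame` is a hypothetical stand-in that only counts how often an action runs, and `DEBUG` is an illustrative flag, not an Incorta setting.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a Spark DataFrame that counts triggered actions."""

    def __init__(self, rows):
        self._rows = rows
        self.actions_run = 0

    def count(self):
        # In Spark, count() is an action: it executes the query plan.
        self.actions_run += 1
        return len(self._rows)


DEBUG = False  # illustrative flag; set to True only while developing


def build_output(df, debug=DEBUG):
    if debug:
        # Diagnostic only; skipped in production so no extra action runs.
        print("row count:", df.count())
    return df


dev_df = build_output(FakeDataFrame([1, 2, 3]), debug=True)
prod_df = build_output(FakeDataFrame([1, 2, 3]))

print(dev_df.actions_run)   # 1
print(prod_df.actions_run)  # 0
```

Guarding diagnostics behind a flag is one alternative to deleting them by hand. The options listed above remain the approaches this article recommends.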

Related Material

 

Best Practices Index
Best Practices


Last update: 07-13-2023 04:23 PM