on 07-13-2023 01:01 PM - edited on 07-13-2023 04:23 PM by Tristan
This article describes best practices for using Incorta notebooks. Incorta notebooks can be used to create Materialized Views (MVs), a type of Incorta derived table. They are also often used for interactive and exploratory data analysis, thanks to their built-in visualization capabilities.
Incorta notebooks support markdown. You can use markdown to write paragraphs in plain text and render them as rich text. Markdown paragraphs help you describe the purpose of your notebook, discuss the output and findings from charts or tables, and state a conclusion.
Each paragraph can optionally have a title. Adding titles to sections increases the readability of your code.
Each paragraph has an output. Logically separate code into paragraphs. By doing so, you can show the progression of results from your code. It makes the notebook interactive and helps others who review it to quickly understand how code is running.
To initiate markdown for a paragraph, use the following command:
%md
[Before and After screenshots omitted]
Incorta notebooks support many keyboard shortcuts that help you accomplish tasks more quickly.
Here is a list of frequently used keyboard shortcuts:
Shortcut | Action |
Shift + Enter | Run the current paragraph. |
Ctrl + Shift + Up | Run all the above paragraphs (exclusive) |
Ctrl + Shift + Down | Run all below paragraphs (inclusive) |
Ctrl + Option + C | Cancel |
Ctrl + P | Move cursor Up |
Ctrl + N | Move cursor Down |
Ctrl + Option + D | Remove paragraph |
Ctrl + Option + A | Insert new paragraph above. |
Ctrl + Option + B | Insert new paragraph below. |
Ctrl + Shift + C | Insert a copy of the paragraph below. |
Ctrl + Option + K | Move paragraph Up |
Ctrl + Option + J | Move paragraph Down |
Ctrl + Option + R | Enable/Disable run paragraph |
Ctrl + Option + O | Toggle output |
Ctrl + Option + E | Toggle editor |
Ctrl + Option + M | Toggle line number |
Ctrl + Option + T | Toggle title |
Ctrl + Option + L | Clear output |
Ctrl + Option + W | Link this paragraph |
Ctrl + Shift + - | Reduce paragraph width |
Ctrl + Shift + + | Increase paragraph width |
When starting a notebook, save a dummy DataFrame as the output of the materialized view so that the code can be saved. This makes it easier to preserve your progress while the notebook is validated: Incorta validates the output and performs schema inference to determine the output table structure, including the data type of each column. If this isn't convenient, another option is described below.
Use the following code to create a dummy output:
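import pandas as pd
# Placeholder data: one string column "A" with three rows
data = pd.DataFrame({"A": ["1", "2", "3"]})
# Convert to a Spark DataFrame (spark is provided by the notebook)
output_df = spark.createDataFrame(data)
# Register the DataFrame as the output of the materialized view
save(output_df)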
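A variant of the snippet above can build the same placeholder without pandas, since `createDataFrame` also accepts a list of tuples plus a list of column names. This is only a sketch: `spark` and `save` are assumed to be the objects the notebook provides in the snippet above, and the function name and constants are hypothetical.

```python
# Sketch: build the placeholder output without pandas.
# `spark` and `save` are passed in so the helper has no hidden dependencies;
# the helper name and the constants below are hypothetical.
DUMMY_ROWS = [("1",), ("2",), ("3",)]  # one string column, three rows
DUMMY_COLUMNS = ["A"]


def save_dummy_output(spark, save):
    """Create the placeholder DataFrame and hand it to save()."""
    output_df = spark.createDataFrame(DUMMY_ROWS, DUMMY_COLUMNS)
    save(output_df)
    return output_df
```

In a notebook paragraph this would be called as `save_dummy_output(spark, save)`.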
Incorta notebooks do not maintain a persistent session and do not automatically save your content. If you spend an extended time in a notebook, your Incorta session could time out. While your script is still in progress, that is, not yet complete or valid, a best practice is to save your work periodically using the 'save script only' option.
Incorta notebooks allow you to run your code from the beginning of the notebook. During development, you can also run paragraphs in arbitrary order, and the order of execution determines the values of the data. When executed by the Incorta Loader service, however, the notebook runs linearly, in the order the paragraphs appear from top to bottom. To emulate how the loader will execute the script, select Run All Paragraphs and verify that it runs as expected before validating the script.
During the development of a notebook, you are likely to use statements like incorta.show() or incorta.head(). Any such display statement that goes through incorta is excluded when the script runs. Conversely, statements like [df].show() or [df].count() are executed with the script.
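To see why execution order matters, the following sketch simulates paragraphs as plain Python functions that share state. It does not use any Incorta API; every name in it is hypothetical and stands in for notebook paragraphs and their shared variables.

```python
# Each function stands in for one notebook paragraph; `state` stands in for
# the variables shared across paragraphs. All names are hypothetical.

def p1_load(state):
    state["amount"] = 100


def p2_double(state):
    state["amount"] = state["amount"] * 2


def p3_add_fee(state):
    state["amount"] = state["amount"] + 5


PARAGRAPHS = [p1_load, p2_double, p3_add_fee]


def run(order):
    """Run the paragraphs in the given order against fresh state."""
    state = {}
    for index in order:
        PARAGRAPHS[index](state)
    return state["amount"]


loader_result = run([0, 1, 2])       # top to bottom, as the loader runs it
interactive_result = run([0, 2, 1])  # last two paragraphs run out of order
```

Here the top-to-bottom run yields 205 while the out-of-order run yields 210, which is the kind of mismatch that Run All Paragraphs is meant to surface before validation.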
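Because DataFrame calls such as show() are executed with the script, one possible way to keep them out of unattended runs is to guard them behind a flag. This is a minimal sketch, not an Incorta feature: `DEBUG` and `preview` are hypothetical names, and `df` is assumed to be any object with a Spark-style `show(n)` method.

```python
# Sketch: guard exploratory DataFrame actions behind a flag so they are
# skipped unless explicitly enabled. DEBUG and preview() are hypothetical.
DEBUG = False  # set to True while developing interactively


def preview(df, rows=5, enabled=None):
    """Call df.show(rows) only when enabled; return whether it ran."""
    if enabled is None:
        enabled = DEBUG
    if enabled:
        df.show(rows)
        return True
    return False
```

With `DEBUG` left at `False`, a call like `preview(output_df)` does nothing, so the exploratory action does not run with the script.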
Some ways to ensure the performance of your script are:
Note: You can quickly verify that this works as expected by entering the notebook view and entering the query view.