cancel
Showing results for 
Search instead for 
Did you mean: 
JoeM
Community Manager
Community Manager

Introduction

Zeppelin Notebooks (Materialized View interactive notebooks) have gained popularity as a versatile and interactive environment for data analysis and visualization. One of the standout features that sets Zeppelin apart is its support for dynamic forms. Dynamic forms empower data scientists and analysts to create more user-friendly and customizable notebooks by providing interactive widgets that allow users to modify variables, parameters, and inputs on the fly. In this knowledgebase article, we will delve into the world of dynamic forms in Zeppelin Notebooks, exploring their functionality, and follow on with an example. Whether you are a seasoned Zeppelin user or just getting started, understanding dynamic forms can significantly enhance your notebook-building capabilities, making your data-driven narratives more engaging and accessible.

What you need to know before reading this article

  • You should have some basic knowledge of SQL, Python, or R and interact with notebooks in general.
  • Incorta Cloud 2022.12 or Later

Why Dynamic Forms?

The interactive notebook can afford data scientists and analysts quite a few capabilities to investigate, discover, and query data. However, when updating your scripts, you might find that many revisions are needed to iterate through different scenarios. 

Wouldn't it be much easier to create a simple form that will, in turn, update your script?

Once you make your script dynamic, you can enhance exploration with:

  • Customized filtering
  • Customized data visualization
  • Parameter tuning for ML models
  • Feature selection for ML models
  • Scenario analysis
  • Product recommendations
  • Collaborative data analysis
  • Parameterized API testing

Let's Go

For this use case, we will use the OnlineStore Data that defaults with the Incorta cloud. We want to do some in-notebook investigation and graph generation to understand our sales trends. However, we want to create a script allowing a user to select Sales realization by Order Date, Due Date, or Ship Date. 

Step 1: Create the Script

If you're good, you can start programming a placeholder variable into your script, but I prefer to choose one sample variable to create a working script. This way, I can quickly identify something wrong with my script logic vs. how forms have been implemented. 

 

 

%pyspark
#Read in table
import pandas as pd
from pyspark.sql.functions import year, month
df = read("OnlineStore.salesorderheader")

#Create a temp view to query against using SparkSQL
df.createOrReplaceTempView('Preaggregate')

# Aggregate sales by day for August 2011
# Group by Month
df_item_with_data = spark.sql("""
select OrderDate, sum(TotalDue) as TotalDue
from    Preaggregate 
where OrderDate >= '2011-08-01' and OrderDate <= '2011-08-31'
Group by 
    OrderDate
Order by 
    OrderDate
""")
#Select fields based on the dynamic form
x_data = df_item_with_data.select("OrderDate").rdd.flatMap(lambda x: x).collect()
y_data = df_item_with_data.select("TotalDue").rdd.flatMap(lambda x: x).collect()

#Plot data

import matplotlib.pyplot as plt
import pandas as pd
plt.figure(figsize=(8, 10))
plt.plot(x_data, y_data, marker='o', linestyle='-', color='b', label='Line Chart')
plt.xticks(rotation=45)
plt.xlabel("OrderDate")
plt.ylabel("Y-axis Label")
plt.title("Line Chart")
plt.legend()
plt.grid(True)  # Add grid lines
plt.show()

 

 

Output:

JoeM_0-1695239459794.png

I find it easier to run basic aggregations in SQL, but more experienced Python pros will undoubtedly see more optimal methods than creating the temp table, as I did above.

Regardless, I have a working output. You, however, might want to view what this looks like to see the total revenue shipped by ShipDate. If interested, you'll have to update every mention of 'OrderDate' in this short script (7 places). We can make this so you can toggle a predefined set of fields.   

Creating a Drop-down List

Create a dynamic form for a drop-down by declaring the form label and each value pair. Each pair is actual value vs. display value.

 

 

selected_col = incorta.noteSelect("Select Date Column",[("OrderDate","Order Date"),("DueDate","Due Date"),("ShipDate","Ship Date")],"OrderDate")

 

 

When running this snippet, you'll see a drop-down at the top of the notebook.

JoeM_1-1695239718241.png

Selecting the drop-down and then running the snippet will map any of these three values to the variable selected_col.

Now, we can use selected_col in our script and it will dynamically update based on our drop-down selection.

 

 

selected_col = incorta.noteSelect("Select Date Column",[("OrderDate","Order Date"),("DueDate","Due Date"),("ShipDate","Ship Date")],"OrderDate")

import pandas as pd
from pyspark.sql.functions import year, month
df = read("OnlineStore.salesorderheader")
df.createOrReplaceTempView('Preaggregate')
# Group by Month
df_item_with_data = spark.sql("""
select """+ selected_col +""", sum(TotalDue) as TotalDue
from    Preaggregate 
where """+ selected_col +""" >= '2011-08-01' and """+ selected_col +""" <= '2011-08-31'
Group by 
    """+ selected_col +"""
Order by 
    """+ selected_col +"""
""")

#Select fields based on the dynamic form
x_data = df_item_with_data.select(selected_col).rdd.flatMap(lambda x: x).collect()
y_data = df_item_with_data.select("TotalDue").rdd.flatMap(lambda x: x).collect()

#Plot data

import matplotlib.pyplot as plt
import pandas as pd
plt.figure(figsize=(8, 10))
plt.plot(x_data, y_data, marker='o', linestyle='-', color='b', label='Line Chart')
plt.xticks(rotation=45)
plt.xlabel(selected_col)
plt.ylabel("Y-axis Label")
plt.title("Line Chart")
plt.legend()
plt.grid(True)  # Add grid lines

 

 

Output:

JoeM_2-1695239970276.png

⚠️ Note that just selecting the drop-down is not enough. You'll need to run the dynamic form snippet to assign the value to selected_col. Then, run the impacted paragraphs using selected_col.

Other form Types

In this example, we've shared a method that allows the notebook user to 'select' an item from a drop-down. There are three other options available in dynamic forms: 

Method Description
noteCheckbox(self, name, options, defaultChecked=[]) Create a checkbox selection in Notebook. Valid for multiple parameter changes.
noteSelect(self, name, options, defaultValue='') Create a radio selection in Notebook. It helps allow only one parameter change at a time.
noteTextbox(self, name, defaultValue='') Create a text box form in Notebook. Useful for defining specified values.

Learn more about the other forms here in the Zepplin documentation.

Related Material