suxinji
Employee Alumni

Overview

What is Sweetviz

Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just a couple of lines of code. The output is a fully self-contained HTML application.

You can also use Sweetviz to compare two data sets, such as a training set and a testing set, to check whether their data distributions are similar.
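
As a rough sketch of that comparison, the snippet below wraps Sweetviz's compare function, which takes each data set as a [DataFrame, display name] pair in the same way as sv.analyze. The helper name compare_train_test, the "Train" and "Test" labels, and the optional sv_module argument (a hook that lets you pass a stand-in object where Sweetviz is not installed) are illustrative assumptions, not part of the original steps.

```python
def compare_train_test(train_pdf, test_pdf, target_feat, sv_module=None):
    """Build a Sweetviz report comparing two pandas DataFrames.

    train_pdf, test_pdf: pandas DataFrames to compare.
    target_feat: name of the label column in train_pdf.
    sv_module: hypothetical hook; defaults to the real sweetviz package.
    """
    if sv_module is None:
        # Imported lazily so the helper can be defined without sweetviz installed.
        import sweetviz as sv_module
    # Each data set is passed as [DataFrame, display name].
    return sv_module.compare([train_pdf, "Train"], [test_pdf, "Test"], target_feat)
```

In a notebook, the returned report object should be usable in the same way as the one returned by sv.analyze.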

Solution

1. Install Sweetviz

Use pip to install Sweetviz. 

pip install sweetviz

To learn how to install the package in Incorta Cloud, please follow this LINK

2. Generate the report

In an Incorta Notebook, you can use Zeppelin's display system to show HTML content as the notebook's output. With the '%angular' directive, Zeppelin treats your output as HTML.
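
A minimal sketch of how such an output string can be assembled is shown here. The helper name angular_output and the placeholder body are assumptions made for illustration; the only part taken from this article is that the printed text starts with the '%angular' directive followed by HTML.

```python
import html


def angular_output(title, body_html):
    """Return a string that starts with %angular, followed by a heading and the HTML body."""
    # Escape the title so characters such as & or < do not break the heading markup.
    heading = "<h2>" + html.escape(title) + "</h2>"
    # The directive goes at the very start of the printed output.
    return "%angular " + heading + body_html


print(angular_output("sweetviz Report", "<p>report HTML goes here</p>"))
```

Building the string in one place keeps the directive, the heading, and the report body together, so the same helper can be reused for any HTML you want to display.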


3. Copy the code

You only need to change two things: the schema table passed to read and the target_feat column. The target_feat should be the column that is used as the label in the ML training and testing data.

import sweetviz as sv
import sweetviz.sv_html as sv_html

# Read the data.
# Replace the argument with your own schema and table.
df = read("Data_Profiler.Titanic_train")
# Convert the Spark DataFrame to a pandas DataFrame.
pdf = df.toPandas()
# Show all column names.
print(df.columns)
# Replace the target_feat value with your own column name.
# comment out this line before save 
my_report  = sv.analyze([pdf,'Train'], target_feat='Survived')
# define function
def show_html(sweetviz_report):

    sweetviz_report.page_layout = 'widescreen'
    sweetviz_report.scale =  1.0
    
    sv_html.load_layout_globals_from_config()
    sv_html.set_summary_positions(sweetviz_report)
    sv_html.generate_html_detail(sweetviz_report)
    
    if sweetviz_report.associations_html_source:
        sweetviz_report.associations_html_source = sv_html.generate_html_associations(sweetviz_report, "source")
    if sweetviz_report.associations_html_compare:
        sweetviz_report.associations_html_compare = sv_html.generate_html_associations(sweetviz_report, "compare")
    
    _page_html = sv_html.generate_html_dataframe_page(sweetviz_report)
    return _page_html

# comment out this line before save 
print("%angular <h2>sweetviz Report</h2>" + show_html(my_report))
# save data frame
save(df)
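
A quick check of the column name before calling sv.analyze can catch a mistyped target_feat early. The helper below is only a sketch: the name check_target and the sample column list are hypothetical, and in the notebook you would pass df.columns instead.

```python
def check_target(columns, target_feat):
    """Return target_feat if it is one of the column names, else raise ValueError."""
    if target_feat not in columns:
        raise ValueError(
            "Column %r not found; available columns: %s"
            % (target_feat, ", ".join(columns))
        )
    return target_feat


# Hypothetical column list; in the notebook this would be check_target(df.columns, 'Survived').
check_target(["PassengerId", "Survived", "Pclass"], "Survived")
```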

Tips:

You can link the paragraph to open its output as a standalone HTML page. You can then view the profile report there as HTML.


To save memory, comment out these two lines before you save the notebook:

# comment out this line before save 
my_report  = sv.analyze([pdf,'Train'], target_feat='Survived')
# comment out this line before save 
print("%angular <h2>sweetviz Report</h2>" + show_html(my_report))

Please note: the output is saved along with the notebook. If the output is too large, you may get an error message when you save the notebook. To avoid this, remove the output from the notebook before you save.
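
One way to guard against an oversized output is to measure the generated HTML before printing it. This is only a sketch: the helper name output_fits and the 5 MB default limit are arbitrary assumptions, since the article does not state the actual size limit, and the placeholder string stands in for the result of show_html(my_report).

```python
def output_fits(report_html, max_bytes=5 * 1024 * 1024):
    """Return True if the HTML, encoded as UTF-8, is at most max_bytes long."""
    return len(report_html.encode("utf-8")) <= max_bytes


# Placeholder HTML; in the notebook this would be show_html(my_report).
report_html = "<p>example</p>"
if output_fits(report_html):
    print("%angular " + report_html)
else:
    print("Report too large to embed; clear the output before saving.")
```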

Last update:
‎05-16-2022 11:28 AM