on 03-08-2022 03:13 PM
In this article, you will learn how to find and fill missing values in Incorta Notebook.
Copy the reusable function below to count the null values in each column.
from pyspark.sql.functions import isnan, when, count, col, lit
def column_with_null(input_df):
    # Count the null values in each column; returns a one-row DataFrame.
    isnull_df = input_df.select([count(when(col(c).isNull(), c)).alias(c) for c in input_df.columns])
    return isnull_df
Find Null Values
df_null = column_with_null(<spark DataFrame>)
incorta.show(df_null)
Replace <spark DataFrame> with the name of your DataFrame. Use incorta.show() to display the output.
Fill the Null Values
df = df.fillna(<value>, subset = <column list>)
Here is an example:
df_sub = df_sub.fillna('N/A',
                       subset=['Sold_to_City',
                               'Sold_to_State',
                               'Sold_to_County',
                               'Sold_to_Country',
                               'Sold_to_Postal_Code',
                               'Sold_to_Customer_Class_Code',
                               'Sold_to_Sales_Channel_Code',
                               'Sold_to_Primary_Phone_Country_Code'])
df_sub = df_sub.fillna(-1,
                       subset=['Salesrep_Id',
                               'Inventory_Item_Id',
                               'Org_Id',
                               'Territory_Id',
                               'Sold_to_Total_Num_of_Orders',
                               'Sold_to_Total_Ordered_Amount',
                               'Sold_to_Last_Ordered_Date_Epoch',
                               'Sold_to_Employees_Total',
                               'Sold_to_Curr_Fy_Potential_Revenue'])
Call the function again to check if the null values are replaced.
df_null = column_with_null(df_sub)
incorta.show(df_null)