Build Machine Learning Models using Incorta Materialized View

An Incorta Materialized View (MV) can run PySpark, Scala, and R code, so it can be used to build machine learning models.

We will discuss PySpark, Scala, and R separately.

Here are the best practices for using Incorta ML for model training and testing:

  • Incorta ML requires you to save a dataframe as the result. This can be any dataframe, for example the result of applying the model to the training or testing data.
  • Model building and inference with the model can live in separate MVs, and those MVs can be placed in different schemas.
  • Use the incorta_ml package, which can simplify the model building process. Currently, incorta_ml is available for PySpark.
  • Set the property spark.dataframe.sampling.enabled to false for any Incorta MV that builds an ML model. By default, Incorta MVs use data sampling when the MV is saved.

  • In an on-premises environment, save all models to the same location on disk. A good choice is a folder under <Incorta Tenant Folder>/data/model. Keeping the models in the shared tenant folder makes them accessible from every node in a multi-node environment.

  • Different ML libraries provide different ways of saving models. Use each library's native API to save its models.

  • The load of an ML job may be very different from that of regular data refresh jobs. Test with a small data set first and assess the impact before you run or deploy the model building MVs.
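
The practices above can be sketched in PySpark roughly as follows. This is a minimal illustration, not Incorta's own implementation: it uses Spark ML's native save and load API, and the function names (model_path, train_and_save, load_and_score), the tenant folder, and the column names are all hypothetical. It also assumes that, inside an MV, the input dataframe comes from the MV's own read step and that the returned dataframe is the one you save as the MV result.

```python
import os


def model_path(tenant_dir, model_name):
    """Build <tenant_dir>/data/model/<model_name> (hypothetical layout)."""
    return os.path.join(tenant_dir, "data", "model", model_name)


def train_and_save(train_df, feature_cols, label_col, path):
    """Model building MV: fit a pipeline, save it, return a dataframe."""
    # Spark imports are deferred so the path helper works without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol=label_col)
    model = Pipeline(stages=[assembler, lr]).fit(train_df)

    # Save with the library's native API, overwriting any earlier version.
    model.write().overwrite().save(path)

    # The MV still needs a dataframe as its result; here it is the
    # model applied back to the training data.
    return model.transform(train_df)


def load_and_score(new_df, path):
    """Inference MV: load the saved pipeline and apply it to new data."""
    from pyspark.ml import PipelineModel

    model = PipelineModel.load(path)
    return model.transform(new_df)


# Example path under a hypothetical tenant folder.
print(model_path("/incorta/tenants/demo", "churn_lr"))
```

Because the training and scoring functions share only the saved model path, they can sit in separate MVs, and in separate schemas, as long as both can reach the shared tenant folder.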

