Incorta On-premises with DataRobot Batch Prediction API

suxinji · 11-09-2022

Overview

This article is part of a series about integrating Incorta with DataRobot through the DataRobot APIs.

In this article, we assume that Incorta is running in an on-premises environment.

The DataRobot Batch Prediction API expects the input data to be stored as a CSV file on the local file system. The client-side batch API reads the file and posts the data to DataRobot Cloud.

The DataRobot Batch Prediction API then writes the predictions to a CSV file on the local file system.

Incorta reads the CSV file generated by the DataRobot API, merges it with the original data, and saves the result as a Materialized View (MV).
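The round trip described above (write a CSV file, have it scored, read the result back, merge) can be sketched with pandas alone. This is a minimal illustration, not the integration itself: `fake_score` is a hypothetical stand-in for the DataRobot call, and the `loan_id` and `amount` columns and their values are made up for the example.

```python
import os
import tempfile

import pandas as pd


def fake_score(in_path, out_path):
    """Hypothetical stand-in for the DataRobot scoring step.

    Reads the input CSV and writes one prediction per input row.
    """
    rows = pd.read_csv(in_path)
    labels = ["high" if amount > 1000 else "low" for amount in rows["amount"]]
    preds = pd.DataFrame({"RiskStatusPrediction_PREDICTION": labels})
    preds.to_csv(out_path, index=False)


def round_trip(original, work_dir):
    """Write the data, score it, and merge predictions back by row position."""
    in_path = os.path.join(work_dir, "input.csv")
    out_path = os.path.join(work_dir, "predictions.csv")
    original.to_csv(in_path, index=False)
    fake_score(in_path, out_path)
    preds = pd.read_csv(out_path)
    # A positional merge is only meaningful if both sides have the same rows.
    if len(preds) != len(original):
        raise ValueError("prediction row count does not match input row count")
    return pd.concat([original.reset_index(drop=True), preds], axis=1)


# Made-up sample data for illustration only.
sample = pd.DataFrame({"loan_id": [1, 2, 3], "amount": [500, 2500, 900]})
with tempfile.TemporaryDirectory() as tmp:
    merged = round_trip(sample, tmp)
print(merged)
```

The row-count check is a design choice of this sketch: a positional merge silently misaligns data if the scored file has fewer or more rows than the input.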

Solution

In Incorta Notebook:

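Add the property that the code sample reads through spark.conf.get (ml.incorta.tenant_path), pointing to the tenant directory.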

Code sample:

import datarobot as dr
import pandas as pd

# DataRobot connection settings; replace these with the values for your own account and deployment
API_KEY = 'NjFhNzNhNzllZjc0ZTljYWEyMDBkZjgxOnBCamZ6dzcvUHVWaEprejhja3VZOXdxaEU0Nm5wSVQ5bDRCeG5CanFjRnM9'
BATCH_PREDICTIONS_URL = 'https://app2.datarobot.com/api/v2'
DEPLOYMENT_ID = '61ca66aebea608e72b70e008'

dr.Client(
    endpoint=BATCH_PREDICTIONS_URL,
    token=API_KEY,
    user_agent_suffix='IntegrationSnippet-ApiClient'
)

# Read the prepared data from Incorta and export it as a CSV file under the tenant directory
df = read('CreditRisk.CreditRiskPredPrepared')
tenant_path = spark.conf.get("ml.incorta.tenant_path")
in_path = tenant_path + '/data/CreditRiskPredPrepared.csv'
out_path = tenant_path + '/data/CreditRiskPredPrediction.csv'
pdf = df.toPandas()
# index=False keeps the pandas row index out of the file sent for scoring
pdf.to_csv(in_path, index=False)

# Upload the input file for scoring and write the predictions to out_path
dr.BatchPredictionJob.score(
    deployment=DEPLOYMENT_ID,
    intake_settings={'type': 'localFile', 'file': in_path},
    output_settings={'type': 'localFile', 'path': out_path}
)

# Read the predictions and keep only the prediction column
pdf_data = pd.read_csv(out_path)
pdf2 = pdf_data[['RiskStatusPrediction_PREDICTION']]

# Concatenate the original rows and the prediction column by row position
result = pd.concat([pdf, pdf2], axis=1)

# Create a Spark DataFrame, preview it, and save it as the MV output
df_output = spark.createDataFrame(result)
incorta.head(df_output, 10)
save(df_output)
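
The sample hardcodes the API key and deployment ID. One possible alternative, sketched below, reads them from environment variables so they stay out of the notebook. The variable names `DATAROBOT_API_TOKEN`, `DATAROBOT_ENDPOINT`, and `DATAROBOT_DEPLOYMENT_ID` and the helper `load_datarobot_settings` are assumptions made for this sketch, not part of the original setup. Whether environment variables are visible to an Incorta notebook depends on how the environment is configured.

```python
import os


def load_datarobot_settings(env=None):
    """Collect connection settings from environment variables.

    The variable names are assumptions chosen for this sketch.
    """
    env = os.environ if env is None else env
    required = ["DATAROBOT_API_TOKEN", "DATAROBOT_DEPLOYMENT_ID"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "token": env["DATAROBOT_API_TOKEN"],
        # Fall back to the endpoint used in the sample when none is set.
        "endpoint": env.get("DATAROBOT_ENDPOINT", "https://app2.datarobot.com/api/v2"),
        "deployment_id": env["DATAROBOT_DEPLOYMENT_ID"],
    }


# Example with a plain dict standing in for os.environ (placeholder values).
settings = load_datarobot_settings(
    {"DATAROBOT_API_TOKEN": "example-token", "DATAROBOT_DEPLOYMENT_ID": "example-deployment"}
)
print(settings["endpoint"])
```

The returned values could then be passed to `dr.Client(endpoint=..., token=...)` and `dr.BatchPredictionJob.score(deployment=...)` in place of the hardcoded constants.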
