suxinji
Employee Alumni
on 11-09-2022 04:06 PM
Overview
This article is part of a series on integrating Incorta with DataRobot through the DataRobot APIs.
In this article, we assume that Incorta is running in an on-premises environment.
The approach uses the DataRobot Batch Prediction API with local files: the input data is stored as a CSV file in the local file system, and the client-side batch API uploads that file to DataRobot Cloud for scoring.
When the job completes, the client writes the predictions to a CSV file in the local file system.
Incorta will read the CSV files generated by the DataRobot API, merge them with the original data, and save the result as a Materialized View (MV).
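The merge step described above can be sketched with pandas alone. This is a minimal illustration, not the article's implementation: it assumes the prediction file contains one row per input row in the same order, the helper name merge_predictions is hypothetical, and the customer_id and income columns and the prediction values are made up for the example. Only the RiskStatusPrediction_PREDICTION column name comes from the code sample in this article.

```python
import io

import pandas as pd


def merge_predictions(original, predictions, column):
    """Append one prediction column to the original rows, matching by position."""
    if len(original) != len(predictions):
        raise ValueError(
            f"row count mismatch: {len(original)} input rows "
            f"vs {len(predictions)} prediction rows"
        )
    # Reset both indexes so concat aligns rows by position, not by label.
    left = original.reset_index(drop=True)
    right = predictions[[column]].reset_index(drop=True)
    return pd.concat([left, right], axis=1)


# Illustrative input rows (hypothetical columns, not the CreditRisk schema).
original = pd.DataFrame(
    {"customer_id": [101, 102, 103], "income": [52000, 31000, 78000]}
)

# Stand-in for the CSV file that the batch prediction job would write.
prediction_csv = io.StringIO("RiskStatusPrediction_PREDICTION\nLow\nHigh\nLow\n")
predictions = pd.read_csv(prediction_csv)

merged = merge_predictions(original, predictions, "RiskStatusPrediction_PREDICTION")
print(merged)
```

Checking the row counts before concatenating is a design choice: a column-wise concat of frames with different lengths would otherwise pad the shorter one with missing values instead of failing.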
Solution
In Incorta Notebook:
Add property (the code sample below reads ml.incorta.tenant_path from the Spark configuration):
Code sample:
import datarobot as dr

# Credentials and deployment for the DataRobot Batch Prediction API
API_KEY = 'NjFhNzNhNzllZjc0ZTljYWEyMDBkZjgxOnBCamZ6dzcvUHVWaEprejhja3VZOXdxaEU0Nm5wSVQ5bDRCeG5CanFjRnM9'
BATCH_PREDICTIONS_URL = 'https://app2.datarobot.com/api/v2'
DEPLOYMENT_ID = '61ca66aebea608e72b70e008'

# Configure the DataRobot client
dr.Client(
    endpoint=BATCH_PREDICTIONS_URL,
    token=API_KEY,
    user_agent_suffix='IntegrationSnippet-ApiClient'
)

# Read the prepared data and export it as a CSV file under the tenant directory
df = read('CreditRisk.CreditRiskPredPrepared')
tenant_path = spark.conf.get("ml.incorta.tenant_path")
in_path = tenant_path + '/data/CreditRiskPredPrepared.csv'
out_path = tenant_path + '/data/CreditRiskPredPrediction.csv'
pdf = df.toPandas()
# index=False keeps the pandas row index out of the file sent for scoring
pdf.to_csv(in_path, index=False)

# Score the input file and write the predictions to out_path
dr.BatchPredictionJob.score(
    deployment=DEPLOYMENT_ID,
    intake_settings={
        'type': 'localFile',
        'file': in_path
    },
    output_settings={
        'type': 'localFile',
        'path': out_path
    }
)

import pandas as pd

# Load the predictions and keep only the prediction column
pdf_data = pd.read_csv(out_path)
pdf2 = pd.DataFrame(pdf_data, columns=['RiskStatusPrediction_PREDICTION'])
# Concatenate the original data frame and the prediction column, row by row
result = pd.concat([pdf, pdf2], axis=1)
# Create a Spark data frame, preview it, and save it as the MV
df_output = spark.createDataFrame(result)
incorta.head(df_output, 10)
save(df_output)
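The code sample above stores the API key directly in the notebook. One possible alternative, sketched here under assumptions, is to read the key and endpoint from environment variables. The helper name load_datarobot_settings is hypothetical, the variable names DATAROBOT_API_TOKEN and DATAROBOT_ENDPOINT are assumed for illustration, and whether environment variables are visible to the notebook process depends on how the Incorta environment is configured.

```python
import os


def load_datarobot_settings(env=os.environ):
    """Return (endpoint, token) read from a mapping of environment variables."""
    # Assumed variable name; fail early if the token is missing or empty.
    token = env.get("DATAROBOT_API_TOKEN")
    if not token:
        raise RuntimeError("DATAROBOT_API_TOKEN is not set")
    # Fall back to the endpoint used in the code sample of this article.
    endpoint = env.get("DATAROBOT_ENDPOINT", "https://app2.datarobot.com/api/v2")
    return endpoint, token


if __name__ == "__main__":
    try:
        endpoint, token = load_datarobot_settings()
        print("endpoint:", endpoint)
    except RuntimeError as err:
        print("configuration error:", err)
```

The returned values could then be passed to dr.Client(endpoint=endpoint, token=token) in place of the hard-coded constants.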