Access AWS services from an Incorta MV

dylanwan · 02-28-2024

Scenario

You need to access AWS services, such as S3, Kafka, or Kinesis, from an Incorta Materialized View (MV) to extract data.

Issues

You do not want to hard-code long-lived AWS credentials in the MV script, where anyone who can view the script can read them.

Solution

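Configure AWS to issue temporary security credentials. These consist of an access key ID, a secret access key, and a session token, and they expire after a limited time.

For the AWS-side setup, see the AWS documentation: Temporary security credentials in IAM.

On the Incorta side, the following sample MV code retrieves temporary security credentials from a credentials service and uses them to read a CSV file from S3.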

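import requests

# Endpoint of the temporary-credentials service inside the cluster.
# Replace <your incorta host> with the value for your environment.
url = (
    "http://aws-tmp-creds-service."
    "<your incorta host>."
    "svc.cluster.local.:8000/credentials"
)

# Fetch the temporary credentials; fail fast on a timeout or an HTTP error.
response = requests.get(url, timeout=30)
response.raise_for_status()
tmp_creds = response.json()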

# Pass the temporary credentials to the S3A connector.
hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", tmp_creds["aws_access_key_id"])
hadoop_conf.set("fs.s3a.secret.key", tmp_creds["aws_secret_access_key"])
hadoop_conf.set("fs.s3a.session.token", tmp_creds["aws_session_token"])

# `schema` must be defined beforehand (for example, as a StructType that matches the CSV file).
s3_path = "s3a://<your s3 bucket>/<your path>/<your file>"
df_data = (
    spark.read.format("csv")
    .option("header", "true")
    .schema(schema)
    .load(s3_path)
)

# Transformation logic on df_data (replace the placeholders with your own logic)
df_1 = df_data.filter("...")
df_2 = df_1.withColumn("Col X", ...)

# Save the final DataFrame as the MV output
save(df_2)
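
If the credentials service returns an incomplete response, the sample above fails later with a KeyError or an S3 access error that is hard to trace. The following is a minimal sketch of one way to validate the response before applying it. The helper names `s3a_settings` and `apply_s3a_settings` are hypothetical and not part of Incorta or Spark. The sketch also sets `fs.s3a.aws.credentials.provider` to Hadoop's `TemporaryAWSCredentialsProvider`; whether that explicit setting is required depends on the Hadoop version bundled with your Spark, so treat it as an assumption to verify in your environment.

```python
# Hypothetical helpers: the names and structure are illustrative only.
REQUIRED_KEYS = (
    "aws_access_key_id",
    "aws_secret_access_key",
    "aws_session_token",
)


def s3a_settings(tmp_creds):
    """Map a temporary-credentials response to Hadoop S3A properties."""
    # Collect every required key that is absent or empty.
    missing = [key for key in REQUIRED_KEYS if not tmp_creds.get(key)]
    if missing:
        raise ValueError(
            "Credentials response is missing: " + ", ".join(missing)
        )
    return {
        # Assumption: explicitly select the session-credentials provider.
        "fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
        "fs.s3a.access.key": tmp_creds["aws_access_key_id"],
        "fs.s3a.secret.key": tmp_creds["aws_secret_access_key"],
        "fs.s3a.session.token": tmp_creds["aws_session_token"],
    }


def apply_s3a_settings(hadoop_conf, tmp_creds):
    """Set each S3A property on a Hadoop configuration object."""
    for name, value in s3a_settings(tmp_creds).items():
        hadoop_conf.set(name, value)
```

In an MV, this could be called as `apply_s3a_settings(spark._jsc.hadoopConfiguration(), tmp_creds)` in place of the three individual `set` calls. Because temporary credentials expire, fetch them at the start of each MV run rather than caching them between runs.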