
MV on a remote (Azure Data Lake) parquet location

I am trying to read parquet files directly from a remote Azure location. I don't want to load the data into Incorta because the files are written to frequently and I want to avoid data inconsistency issues.

First, I created a data source to connect to the Azure Data Lake; so far, no issues. I then created a schema using the Azure connection, and it immediately recognized the table schema from the parquet files, so I can safely assume there are no access or file-corruption issues.

But when I create an MV so that I can use it in my dashboard, it fails. The MV code is below:

# Read the remote table defined in the Azure schema (Incorta's MV read() helper)
df_MPL = read("ADLSGen2_Testing_Without_loading_into_incorta.memBalRemote")
# Register it as a temp view so it can be queried with Spark SQL
df_MPL.createOrReplaceTempView("membalv")
dfmembal = spark.sql("SELECT * FROM membalv")
# Persist the result as the MV output (Incorta's save() helper)
save(dfmembal)

INC_005005001:Failed to load data from [spark://s123vmeinc2.vpc.company.net:7077] with properties [[error, list index out of range ('IndexError', ':', IndexError('list index out of range',)) ]]

Can you please let me know if I am missing anything? Is creating an MV even the right approach for reading files from a remote location?
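
In case it helps, I assume the same files could also be read by pointing Spark directly at the ADLS Gen2 path inside the MV, skipping the remote table entirely. A rough sketch of that idea (the storage account, container, path, and key below are placeholders, not my real values):

# Rough sketch only: read the parquet files straight from ADLS Gen2.
# The Spark cluster needs valid credentials for the storage account;
# the account key shown here is just a placeholder.
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    "<storage-account-key>")

# Placeholder container/account/path pointing at the same parquet files
df_direct = spark.read.parquet(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/path/to/membal/")
save(df_direct)  # Incorta MV helper, same as above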

 

Thanks a whole lot.

  • This question is currently being investigated by Incorta Support. 

    Thanks,
    Dustin

  • Dustin Basil we replied to Srinivasa Vasu over the support channels. The cause was the "Include Filename as a Column" option, which is not supported for remote tables; disabling it works fine. We will address this glitch in an upcoming release.
