<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unable to save the mv for creating a table from using a json keys in Administrative Discussions</title>
    <link>https://community.incorta.com/t5/administrative-discussions/unable-to-save-the-mv-for-creating-a-table-from-using-a-json/m-p/6492#M317</link>
    <description>&lt;P&gt;Although Incorta performs initial schema inference when an MV is saved for the first time, its ongoing incremental refresh relies on the saved schema definition.&amp;nbsp; Deriving the table columns dynamically will not work without some special handling to match the target schema.&lt;/P&gt;
&lt;P&gt;Can you share the code or provide more details?&lt;/P&gt;</description>
    <pubDate>Fri, 09 May 2025 11:41:05 GMT</pubDate>
    <dc:creator>dylanwan</dc:creator>
    <dc:date>2025-05-09T11:41:05Z</dc:date>
    <item>
      <title>Unable to save the mv for creating a table from using a json keys</title>
      <link>https://community.incorta.com/t5/administrative-discussions/unable-to-save-the-mv-for-creating-a-table-from-using-a-json/m-p/6483#M316</link>
      <description>&lt;P&gt;I am trying to create a table by reading a JSON object and dynamically deriving the table columns from the JSON keys. I can display the data in the notebook and get the expected result, but I am unable to save the schema when the columns are derived dynamically. If I hard-code the column names, I can save it. I even set the property&amp;nbsp;&lt;STRONG&gt;&lt;I&gt;spark.dataframe.sampling.enabled&lt;/I&gt;&lt;/STRONG&gt; to false.&lt;/P&gt;&lt;PRE&gt;Failed to generate schema&lt;/PRE&gt;&lt;PRE&gt;ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1&lt;/PRE&gt;</description>
      <pubDate>Thu, 08 May 2025 16:36:32 GMT</pubDate>
      <guid>https://community.incorta.com/t5/administrative-discussions/unable-to-save-the-mv-for-creating-a-table-from-using-a-json/m-p/6483#M316</guid>
      <dc:creator>LB123</dc:creator>
      <dc:date>2025-05-08T16:36:32Z</dc:date>
    </item>
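The dynamic-column approach described in the post above can be sketched in plain Python, independent of Spark: flatten a nested JSON object into underscore-joined keys and use the resulting key set as column names. This is a minimal illustration of the idea, not the poster's actual code; the sample record and key names below are made up:

```python
import json

def flatten_json(obj, prefix=""):
    """Recursively flatten a nested JSON value into a flat
    {column_name: string_value} dict, joining nested keys with '_'."""
    out = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            key = f"{prefix}_{k}".lower() if prefix else k.lower()
            out.update(flatten_json(v, key))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            out.update(flatten_json(item, f"{prefix}_{i}"))
    else:
        out[prefix.lower()] = str(obj)
    return out

# Hypothetical record: the column names are derived from the JSON keys.
record = json.loads('{"id": 1, "address": {"city": "NYC", "zip": "10001"}}')
columns = sorted(flatten_json(record).keys())
print(columns)  # ['address_city', 'address_zip', 'id']
```

The catch the thread turns on is that this key set (and therefore the column list) can differ from one load to the next, while the saved MV schema is fixed.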
    <item>
      <title>Re: Unable to save the mv for creating a table from using a json keys</title>
      <link>https://community.incorta.com/t5/administrative-discussions/unable-to-save-the-mv-for-creating-a-table-from-using-a-json/m-p/6492#M317</link>
      <description>&lt;P&gt;Although Incorta performs initial schema inference when an MV is saved for the first time, its ongoing incremental refresh relies on the saved schema definition.&amp;nbsp; Deriving the table columns dynamically will not work without some special handling to match the target schema.&lt;/P&gt;
&lt;P&gt;Can you share the code or provide more details?&lt;/P&gt;</description>
      <pubDate>Fri, 09 May 2025 11:41:05 GMT</pubDate>
      <guid>https://community.incorta.com/t5/administrative-discussions/unable-to-save-the-mv-for-creating-a-table-from-using-a-json/m-p/6492#M317</guid>
      <dc:creator>dylanwan</dc:creator>
      <dc:date>2025-05-09T11:41:05Z</dc:date>
    </item>
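The "special handling to match the target schema" mentioned in the reply above can be illustrated without Spark: once the MV's schema is saved, every refresh must produce exactly those columns, so dynamically discovered keys have to be projected onto the fixed column list (extras dropped, missing ones padded with nulls). A conceptual sketch with made-up column names; in PySpark the same idea would be a select over the saved column list:

```python
# Hypothetical saved MV schema: the fixed column list every refresh must match.
TARGET_COLUMNS = ["id", "address_city", "address_zip"]

def align_to_schema(row, target=TARGET_COLUMNS):
    """Project a dynamically-flattened row onto the saved schema:
    keep only the target columns, fill absent ones with None."""
    return {col: row.get(col) for col in target}

# A refresh may discover extra keys ('phone') or miss some ('address_zip').
dynamic_row = {"id": "1", "address_city": "NYC", "phone": "555"}
print(align_to_schema(dynamic_row))
# {'id': '1', 'address_city': 'NYC', 'address_zip': None}
```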
    <item>
      <title>Re: Unable to save the mv for creating a table from using a json keys</title>
      <link>https://community.incorta.com/t5/administrative-discussions/unable-to-save-the-mv-for-creating-a-table-from-using-a-json/m-p/6494#M319</link>
      <description>&lt;P&gt;Hey dylanwan,&lt;/P&gt;&lt;P&gt;Let me explain what I am doing. One of the schemas contains a JSON object. I extract the JSON keys dynamically, use those keys as column names, and create a table from them. I can extract the data, and both the columns and the data display correctly in the notebook, but when I save the MV I get the error below. Apart from this code, the JSON object is encrypted, so I also decrypt it. I have also set the property &lt;STRONG&gt;&lt;I&gt;spark.dataframe.sampling.enabled&lt;/I&gt;&lt;/STRONG&gt; to false.&lt;/P&gt;&lt;PRE&gt;Failed to generate schema&lt;/PRE&gt;&lt;PRE&gt;ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1&lt;/PRE&gt;&lt;P&gt;Below is the code snippet.&lt;/P&gt;&lt;PRE&gt;%pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf, explode, trim, from_json, coalesce, length, map_keys, lit
from pyspark.sql.types import MapType, StringType
import json

# Initialize Spark session
spark = SparkSession.builder.getOrCreate()

# Function to flatten JSON
def smart_flatten(json_str):
    def flatten(y, prefix=''):
        out = {}
        if isinstance(y, dict):
            if not y:
                out[prefix.lower()] = "{}"
            else:
                for k, v in y.items():
                    new_prefix = f"{prefix}_{k}".lower() if prefix else k.lower()
                    out.update(flatten(v, new_prefix))
        elif isinstance(y, list):
            if not y:
                out[prefix.lower()] = "[]"
            elif all(isinstance(i, dict) for i in y):
                for idx, item in enumerate(y):
                    for k, v in item.items():
                        out.update(flatten(v, f"{prefix}_{idx}_{k}".lower()))
            else:
                for idx, item in enumerate(y):
                    out[f"{prefix}_{idx}".lower()] = item
        else:
            out[prefix.lower()] = str(y)
        return out

    try:
        parsed = json.loads(json_str)
        flat = flatten(parsed)
        return json.dumps(flat)
    except Exception as e:
        return json.dumps({"error": str(e)})

# Apply UDF to flatten JSON
extract_nested_keys_udf = udf(smart_flatten, StringType())

# Assuming `flattened_table` is defined elsewhere in your code
flattened_table = flattened_table.filter(col("flattened_json").isNotNull() &amp;amp; (length(col("flattened_json")) &amp;gt; 2))

# Convert flattened JSON to MapType
flattened_with_map = flattened_table.withColumn("json_map", from_json(col("flattened_json"), MapType(StringType(), StringType()))).filter(col("json_map").isNotNull())
flattened_with_keys = flattened_with_map.withColumn("json_keys", map_keys(col("json_map")))

# Extract JSON keys
json_key_list = (flattened_with_keys
    .select(explode(col("json_keys")).alias("json_key"))
    .distinct()
    .rdd
    .map(lambda row: row["json_key"])
    .collect())
json_key_list = [k for k in json_key_list if k and k.strip()]

# Rename columns to safe names
for col_name in flattened_with_map.columns:
    safe_name = col_name.strip().replace(" ", "_").lower()
    flattened_with_map = flattened_with_map.withColumnRenamed(col_name, safe_name)

# Select columns based on JSON keys
final_df = flattened_with_map.select(
    *[coalesce(col("json_map").getItem(k), lit(None)).alias(k) for k in json_key_list]
)&lt;/PRE&gt;</description>
      <pubDate>Fri, 09 May 2025 15:41:21 GMT</pubDate>
      <guid>https://community.incorta.com/t5/administrative-discussions/unable-to-save-the-mv-for-creating-a-table-from-using-a-json/m-p/6494#M319</guid>
      <dc:creator>LB123</dc:creator>
      <dc:date>2025-05-09T15:41:21Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to save the mv for creating a table from using a json keys</title>
      <link>https://community.incorta.com/t5/administrative-discussions/unable-to-save-the-mv-for-creating-a-table-from-using-a-json/m-p/6538#M322</link>
      <description>&lt;P&gt;You generate the column names dynamically from&amp;nbsp;json_key_list, which is populated by&amp;nbsp;collect().&amp;nbsp; The Spark action collect() can be expensive, and I am not sure whether it fails when sampling is disabled.&amp;nbsp; We would need to test with your data.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jun 2025 22:37:33 GMT</pubDate>
      <guid>https://community.incorta.com/t5/administrative-discussions/unable-to-save-the-mv-for-creating-a-table-from-using-a-json/m-p/6538#M322</guid>
      <dc:creator>dylanwan</dc:creator>
      <dc:date>2025-06-03T22:37:33Z</dc:date>
    </item>
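One concrete risk with the collect()-driven column list discussed above: if json_key_list comes back empty (no rows, all keys filtered out, or the collect() behaving differently during schema generation than in the notebook), the final select is built over zero columns, which is a plausible trigger for the schema-generation failure. A hedged pure-Python sketch of a cheap guard; the function name is made up for illustration:

```python
def build_projection(json_key_list):
    """Guard before building the final select: refuse to proceed
    with an empty key list rather than produce a zero-column
    projection for the MV."""
    keys = [k for k in json_key_list if k and k.strip()]
    if not keys:
        raise ValueError("No JSON keys discovered; refusing to build "
                         "a zero-column select for the MV.")
    return keys

print(build_projection(["id", " ", "address_city"]))  # ['id', 'address_city']
```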
  </channel>
</rss>

