
Date variable substitution in PySpark

I am writing an MV (materialized view) that incrementally loads data from a table.

 

In the incremental block, I am getting the maximum of last_update_date from the MV and using it as a filter to fetch the delta from the source table:

 

df_mv_Channel = read("EDW.mv_Channel")  # mv_Channel is the MV name

df_mv_Channel.createOrReplaceTempView("mv_Channel")

df_last_update_date = spark.sql("""select max(last_update_date) as ldate from mv_Channel""")

date_param = df_last_update_date.select('ldate').first()[0]

 

Now I'm trying to use the date_param variable in another Spark query, with the WHERE condition given below:

last_update_date > unix_timestamp('${date_param}')
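In full, the query I am attempting would look something like this (the Channel source table name here is just a placeholder):

df_delta = spark.sql("""select * from Channel
                        where last_update_date > unix_timestamp('${date_param}')""")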

 

I am not sure if this is how the variable substitution has to be done.

 

Appreciate your help.

2 replies
  • Here is an example:

     

    df_target = read("EBS_FIN_COMMON.GL_LEDGERS")
    df_target.createOrReplaceTempView("SELF")
    
    # max(LAST_UPDATE_DATE) comes back as a Python datetime;
    # .timestamp() converts it to epoch seconds
    df_target_last_update = spark.sql("SELECT MAX(LAST_UPDATE_DATE) ldate FROM SELF")
    date_param = int(df_target_last_update.select('ldate').first()[0].timestamp())
    
    df_source = read("EBS_GL.GL_JE_HEADERS")
    df_source.createOrReplaceTempView("SRC")
    
    # substitute the epoch seconds into the query; from_unixtime() expects seconds
    df = spark.sql("SELECT * FROM SRC WHERE LAST_UPDATE_DATE > from_unixtime(%d)" % date_param)
    save(df)
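    On Python 3.6+, an f-string does the same substitution a bit more readably (same query, only the formatting differs):

    df = spark.sql(f"SELECT * FROM SRC WHERE LAST_UPDATE_DATE > from_unixtime({date_param})")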
  • But this method will convert the data from Spark back to Python and then pass it back to Spark.

    I would use pure SQL instead:

    SELECT * FROM EBS_GL.GL_JE_HEADERS
    WHERE LAST_UPDATE_DATE > (SELECT MAX(LAST_UPDATE_DATE)
                              FROM EBS_FIN_COMMON.GL_LEDGERS)
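    If the MV still has to be defined in PySpark, the same query can go through spark.sql after registering the two temp views, reusing the read/save pattern from the example above (a sketch under those assumptions, not a tested implementation):

    # register both tables as temp views so spark.sql can see them
    df_source = read("EBS_GL.GL_JE_HEADERS")
    df_source.createOrReplaceTempView("SRC")
    df_target = read("EBS_FIN_COMMON.GL_LEDGERS")
    df_target.createOrReplaceTempView("SELF")
    
    # the max() is evaluated inside Spark as a scalar subquery --
    # no round trip through Python
    df = spark.sql("""SELECT * FROM SRC
                      WHERE LAST_UPDATE_DATE > (SELECT MAX(LAST_UPDATE_DATE) FROM SELF)""")
    save(df)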
