Test Text for Valid Date Content

I am attempting to write a materialized view that selects rows from a table based on whether a text field contains a valid date. I use isdate() to do this in SQL but have not been able to find a similar function in Spark SQL or PySpark.

Does anyone know what function or other filter I can use?

  • Hi Laura Brown, there is currently no library function that performs date validation. We are working on a data prep capability within the Incorta platform, but options are limited in the short term. You can validate the data before ingesting it into Incorta or, if that is not feasible, create a User Defined Function (UDF) to use in your PySpark code. I found an example of UDF date validation here, but please be certain to test it at your expected data scale before moving forward.
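
A minimal sketch of what such a validation function might look like in plain Python, before it is wrapped as a UDF. The name isdate matches the one used later in this thread; the DATE_FORMATS tuple is an assumption and should be replaced with the formats that actually occur in your text field. This is not the linked example, only an illustration of the idea.

```python
from datetime import datetime

# Assumed formats for illustration; adjust to the data in your text field.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def isdate(value, formats=DATE_FORMATS):
    """Return True if value parses as a calendar date in one of the formats."""
    if value is None:
        return False
    text = str(value).strip()
    for fmt in formats:
        try:
            # strptime raises ValueError for a non-matching or impossible date.
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False
```

Because a Python UDF is called once per row, a function like this can be slow on large tables, which is why testing at your expected data scale matters.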

  • I am very new to Spark. Is this something you do within each materialized view, or is it something you do globally?

    • Laura Brown, you do it for each MV, since each MV is a stand-alone application. A global UDF would be great: we are opening an Enhancement Request to add this feature. In the meantime, I checked with our Spark engineering team and there is a way to do this with some extra configuration. If you have any issues with this, please open a Support ticket and they will assist.

      1. Create a file, e.g. my_udfs.py, and define your UDF there (it can live anywhere on the server; I recommend a folder outside <incorta_home> so that it does not get overwritten)
      2. Open spark.env for editing (<incorta_home>/IncortaNode/spark/conf)
      3. Add this line: PYTHON_PATH=$PYTHON_PATH:/path/to/new/py/file/my_udfs.py
      4. Restart Spark
      5. Register the function in every MV that needs it, e.g. spark.udf.register('udf_isdate', isdate)
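
The registration step could be sketched roughly as follows. The helper name filter_valid_dates, the temp view name mv_source, and the way the DataFrame reaches the function are all assumptions made for illustration; only spark.udf.register and the udf_isdate name come from the steps above. The sketch passes "boolean" as the return type because a registered Python function otherwise defaults to returning a string.

```python
def filter_valid_dates(spark, df, column, isdate_func):
    """Keep only the rows of df whose text column passes isdate_func.

    spark       -- the SparkSession available inside the MV
    df          -- a DataFrame already loaded in the MV (loading not shown)
    column      -- name of the text column to validate
    isdate_func -- a Python function returning True or False for one value
    """
    # Register under the name used in this thread; state the boolean
    # return type explicitly so the UDF can be used directly in WHERE.
    spark.udf.register("udf_isdate", isdate_func, "boolean")

    # Hypothetical temp view name so the UDF can be called from Spark SQL.
    df.createOrReplaceTempView("mv_source")
    return spark.sql(f"SELECT * FROM mv_source WHERE udf_isdate(`{column}`)")
```

Inside an MV this might be called as filter_valid_dates(spark, df, "order_date_text", isdate), with isdate imported from my_udfs, assuming the path configuration above took effect; the column name here is made up.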
  • Thanks! This is exactly what I needed.

  • Here's the enhancement request number for the global UDF: INC-33138.

    • R. A. Dawson Sr, that ER is intended for the Incorta engine layer: it would allow UDFs in the Incorta formula language (formula columns in insights and schemas). Laura's need is similar, but it is confined to the Spark layer, for use in Materialized Views.
