
Test Text for Valid Date Content

I am attempting to write a materialized view that selects rows from a table based on whether a text field contains a valid date. I use isdate() to do this in SQL but have not been able to find a similar function in Spark SQL or PySpark.

Does anyone know what function or other filter I can use?

  • Hi Laura Brown, there is currently no library function that performs date validation. We are working on a data prep capability within the Incorta platform, but options are limited in the short term. You can validate your data before ingesting it into Incorta or, if that is not feasible, create a User Defined Function (UDF) to use in your PySpark code. I found an example of UDF date validation here, but please be certain to test it at your expected data scale before moving forward.
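
For readers who want a starting point, here is a minimal sketch of the kind of check such a UDF could wrap. It is not the linked example: the function name `isdate`, the `DATE_FORMATS` tuple, and the three formats in it are assumptions for illustration, so adjust them to the layouts your text field actually contains.

```python
from datetime import datetime

# Assumed set of accepted layouts; replace with the formats your column really holds.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def isdate(value, formats=DATE_FORMATS):
    """Return True if the string `value` parses as a date in any of `formats`."""
    if value is None:
        # NULLs in the source column are treated as "not a date".
        return False
    text = value.strip()
    for fmt in formats:
        try:
            # strptime raises ValueError for a wrong layout or an impossible date.
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False
```

Because this is plain Python, it can be tested outside Spark first. A Python UDF is called once per row, which is why testing at the expected data scale matters.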

  • I am very new to Spark. Is this something you do within each materialized view, or is it something you do globally?

    • Laura Brown, you do this for each MV, since each MV is a stand-alone application. A global UDF would be great: we are opening an Enhancement Request to add this feature. In the meantime, I checked with our Spark engineering team, and there is a way to do this with some extra configuration. If you have any issues with it, please open a Support ticket and they will assist.

      1. Create a file, e.g. my_udfs.py, and define your UDF there (it can live anywhere on the server; I recommend a folder outside <incorta_home> so that it does not get overwritten)
      2. Open spark.env for editing (<incorta_home>/IncortaNode/spark/conf)
      3. Add this line: PYTHON_PATH=$PYTHON_PATH:/path/to/new/py/file/my_udfs.py
      4. Restart Spark
      5. Register the function in every MV where you need it, e.g. spark.udf.register('udf_isdate', isdate)
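
Steps 1 and 5 might look roughly like the sketch below. The file contents, the single `%Y-%m-%d` format, the helper `register_udfs`, and the view and column names in the closing comments are all hypothetical. Only `spark.udf.register` and the name `udf_isdate` come from the steps above.

```python
# my_udfs.py (step 1): hypothetical contents, kept in a folder outside <incorta_home>
from datetime import datetime


def isdate(value, fmt="%Y-%m-%d"):
    """Return True if `value` is a string that parses as a date in `fmt`."""
    if value is None:
        return False
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def register_udfs(spark):
    """Step 5: call once in each MV, passing that MV's SparkSession."""
    # "boolean" is a DDL type string. If the return type is omitted,
    # PySpark declares the UDF's result as a string instead.
    spark.udf.register("udf_isdate", isdate, "boolean")


# Inside the MV, assuming my_udfs is importable after steps 2-4:
#   from my_udfs import register_udfs
#   register_udfs(spark)
#   df = spark.sql("SELECT * FROM some_view WHERE udf_isdate(date_text)")
# `some_view` and `date_text` are placeholder names.
```

Passing an explicit boolean return type lets the UDF be used directly as a filter condition in Spark SQL, which matches the original goal of selecting only rows whose text field holds a valid date.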
  • Thanks! This is exactly what I needed.

  • Here's the enhancement request number for the global UDF: INC-33138.

    • R. A. Dawson Sr, that ER is intended for the Incorta engine layer: it would allow UDFs in the Incorta formula language (formula columns in insights and schemas). Laura's need is similar, but it is confined to the Spark layer, for use in Materialized Views.

  • Hi,

    Dustin Basil, is there any update on this ER?

    Thanks,

    Srinivas Chava

    • Srinivas C, is this in reference to Laura's question about validating dates in Spark SQL, or to R. A. Dawson's comment about Incorta potentially adding User Defined Functions for use with the Incorta formula column syntax? For the former, I'm not aware of an official feature request; for the latter, we have a feature request but no target release date yet. Note that the Product Ideas & Feature Requests section of our community is a good place to suggest a feature, so that other customers can see it and vote on it.
