
Text source has MM/DD/YY - Can I define a date input format so Incorta reads it correctly?

If I set the Incorta data type to "date" (which, er, it is), I get rejects (see attached). <edited to remove a reference to epoch time - I was conflating a different issue>

 

What I'd like to do is have Incorta read it as a date (it is used as such downstream) and be able to define the input format - ideally per table.

 

Maybe this belongs in enhancement requests?

5 replies
  • Update: I added a formula column: parseDate(Date_Started, "MM/dd/yyyy"), which returns a date in the Analyzer just fine. However, when I try to use that formula column in an MV (PostgreSQL), I get a "function not recognized" error. I can't use Spark SQL because I can't select a formula column with Spark, so it's not looking good ... (though a possible Spark-side alternative is sketched below).

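    One possible way around this, as a minimal sketch: instead of selecting the formula column, let the MV parse the raw text column on the Spark side. This assumes the raw column is still named Date_Started, uses SCHEMA.TABLE as a placeholder (as in the workaround further down), and relies on Spark SQL's built-in to_date(str, fmt), whose M and d pattern letters match one or two digits (swap in yy for two-digit years):

    ===========================

    SELECT
        t.*,
        -- parse the raw text column directly; no formula column needed
        to_date(t.Date_Started, 'M/d/yyyy') AS Date_Started_dt
    FROM SCHEMA.TABLE t

    ===========================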
  • With friendly error messages like this, who needs enemies:

     

    Schema: QSR_Metrics
    Start Time: 10:53:21 PM 11/15/2021
    Error: 
    Table: SMG_Core_DeDup
    Message: INC_03070101: Transformation error [2021-11-16 04:53:38,868 ERROR datasources.FileFormatWriter: Aborting job ddbe70a2-f915-4bbc-87a9-c2d9a27f918f.
    org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
        at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
        at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
        at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
        at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
        at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
        at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
        at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132)
        at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:177)
        at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:178)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:126)
        at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:962)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:962)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:414)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:398)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:287)
        at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:847)
        at com.incorta.sqlApp.transformer.TransfomerJob.SaveOrSchema(TransfomerJob.java:70)
        at com.incorta.sqlApp.transformer.TransfomerJob.execute(TransfomerJob.java:80)
        at com.incorta.sqlApp.trans
    Error Trace:
    com.incorta.api.exceptions.DataException: INC_03070101: Transformation error [2021-11-16 04:53:38,868 ERROR datasources.FileFormatWriter: Aborting job ddbe70a2-f915-4bbc-87a9-c2d9a27f918f.
    org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
        [... same stack trace as above ...]

  • Here's how I ended up resolving it - as you can see, it makes a very strong case for the enhancement request to define a UDF, so this doesn't have to be replicated every time I get a text-file source with MM/DD/YYYY-formatted dates!

     

    I swapped out the real schema and table names. The formula locates the first and last "/" to work out whether the month and day have one or two digits, then rebuilds the value as YYYY-MM-DD so date() can parse it (the "1111"/"00" fallbacks flag anything unexpected). A possible one-call alternative is sketched after the formula.

    ===========================

    date(
        concat(
            (CASE(
                AND(
                    find(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 2, 
                    findlast(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 5
                ), 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    6, 
                    4
                ), 
                AND(
                    find(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 2, 
                    findlast(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 4
                ), 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    5, 
                    4
                ), 
                AND(
                    find(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 1, 
                    findlast(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 3
                ), 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    4, 
                    8
                ), 
                AND(
                    find(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 1, 
                    findlast(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 4
                ), 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    5, 
                    9
                ), 
                "1111"
            )), 
            '-', 
            CASE(
                find(
                    "/", 
                    SCHEMA.TABLE.Date_Started
                ) = 2, 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    0, 
                    2
                ), 
                find(
                    "/", 
                    SCHEMA.TABLE.Date_Started
                ) = 1, 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    0, 
                    1
                ), 
                "00"
            ), 
            '-', 
            CASE(
                AND(
                    find(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 2, 
                    findlast(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 5
                ), 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    3, 
                    2
                ), 
                AND(
                    find(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 1, 
                    findlast(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 3
                ), 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    2, 
                    1
                ), 
                AND(
                    find(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 2, 
                    findlast(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 4
                ), 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    3, 
                    1
                ), 
                AND(
                    find(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 1, 
                    findlast(
                        "/", 
                        SCHEMA.TABLE.Date_Started
                    ) = 4
                ), 
                substring(
                    SCHEMA.TABLE.Date_Started, 
                    2, 
                    2
                ), 
                "00"
            )
        )
    )

    ===========================
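    For comparison, if the PostgreSQL dialect used by the MV exposes the standard to_date(text, text) function (an assumption - it rejected Incorta's parseDate, but to_date is plain PostgreSQL), the whole CASE ladder should collapse to a single call, because PostgreSQL stops reading digits at the "/" separator and so copes with one- and two-digit months and days on its own. Same placeholder names as above:

    ===========================

    SELECT
        t.*,
        -- standard PostgreSQL parser; the "/" separators make one- vs
        -- two-digit months and days unambiguous, so no CASE ladder is needed
        to_date(t.Date_Started, 'MM/DD/YYYY') AS Date_Started_dt
    FROM SCHEMA.TABLE t

    ===========================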

  • Dawson, did you try setting the date format in the Data Source UI, as shown in the attached image?

    • Anurag Malik Ah - I did not! I'll add that to my "test this" list since I've got the workaround in place, and I'll add it to the support ticket as well.

       

      Thank you!

