<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic PySpark log stdout in Data &amp; Schema Discussions</title>
    <link>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5149#M403</link>
    <description>&lt;P&gt;I am building a materialized view with Python Spark, and since I am very new to Python I am struggling a little. To debug it, I would like to log some hints from the code, most likely to stdout (but I will be happy with any log that is accessible from the Spark Master or Incorta itself). I tried a few methods I found on Stack Overflow, but none of them worked. Any suggestions?&lt;/P&gt;</description>
    <pubDate>Fri, 13 Oct 2023 08:47:15 GMT</pubDate>
    <dc:creator>Piotrek82X</dc:creator>
    <dc:date>2023-10-13T08:47:15Z</dc:date>
    <item>
      <title>PySpark log stdout</title>
      <link>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5149#M403</link>
      <description>&lt;P&gt;I am building a materialized view with Python Spark, and since I am very new to Python I am struggling a little. To debug it, I would like to log some hints from the code, most likely to stdout (but I will be happy with any log that is accessible from the Spark Master or Incorta itself). I tried a few methods I found on Stack Overflow, but none of them worked. Any suggestions?&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2023 08:47:15 GMT</pubDate>
      <guid>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5149#M403</guid>
      <dc:creator>Piotrek82X</dc:creator>
      <dc:date>2023-10-13T08:47:15Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark log stdout</title>
      <link>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5196#M414</link>
      <description>&lt;P&gt;If you are using Incorta on-prem, you can access the Spark web UI for the Spark standalone cluster. The address depends on your Spark configuration; it typically runs on the Spark machine on port 9091. It looks like this page:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2023-10-24 at 5.14.54 AM.png" style="width: 400px;"&gt;&lt;img src="https://community.incorta.com/t5/image/serverpage/image-id/2475i37779E88128CF1AA/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screen Shot 2023-10-24 at 5.14.54 AM.png" alt="Screen Shot 2023-10-24 at 5.14.54 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;You can then click on the application ID to access the Application page like below.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2023-10-24 at 5.21.27 AM.png" style="width: 400px;"&gt;&lt;img src="https://community.incorta.com/t5/image/serverpage/image-id/2476iABD1DFE25CA41212/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screen Shot 2023-10-24 at 5.21.27 AM.png" alt="Screen Shot 2023-10-24 at 5.21.27 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;The log file is accessible by clicking on the stderr link and the log looks like this page:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2023-10-24 at 5.26.12 AM.png" style="width: 400px;"&gt;&lt;img src="https://community.incorta.com/t5/image/serverpage/image-id/2477i70650C46BA7E302A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screen Shot 2023-10-24 at 5.26.12 AM.png" alt="Screen Shot 2023-10-24 at 5.26.12 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;If you are using Incorta Cloud, the Spark history server page is accessible under &lt;A href="https://xxx.cloudx.incorta.com/applications" target="_blank" rel="noopener"&gt;https://xxx.cloudx.incorta.com/applications&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;It includes two pages, one for completed jobs and the other for incomplete jobs. It looks like this page:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2023-10-24 at 5.35.15 AM.png" style="width: 400px;"&gt;&lt;img src="https://community.incorta.com/t5/image/serverpage/image-id/2478i6D0C6C6458A3E53A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screen Shot 2023-10-24 at 5.35.15 AM.png" alt="Screen Shot 2023-10-24 at 5.35.15 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;You can download the event log, which is available as a JSON file.&lt;/P&gt;
&lt;P&gt;A full guide to reading the event logs or debugging with them is beyond the scope of this reply; the steps above only cover how to access them.&lt;/P&gt;
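&lt;P&gt;As a first orientation only, here is a minimal sketch that counts the event types in a downloaded event log. It assumes the file follows Spark's usual event-log layout of one JSON object per line, each with an "Event" field; if the download is an archive, extract it first. The function name count_event_types and the file name eventlog.json are hypothetical.&lt;/P&gt;

```python
import json
from collections import Counter


def count_event_types(lines):
    """Count Spark listener events by type from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Assumption: each record names its type in the "Event" field.
        counts[event.get("Event", "unknown")] += 1
    return counts


# Usage with a downloaded event log (the path is a placeholder):
# with open("eventlog.json", encoding="utf-8") as fh:
#     for name, n in count_event_types(fh).most_common():
#         print(n, name)
```

&lt;P&gt;The counts give a quick overview of how many jobs, stages, and tasks the application ran before you look at individual records.&lt;/P&gt;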
&lt;P&gt;Hope it helps.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 12:42:09 GMT</pubDate>
      <guid>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5196#M414</guid>
      <dc:creator>dylanwan</dc:creator>
      <dc:date>2023-10-24T12:42:09Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark log stdout</title>
      <link>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5206#M415</link>
      <description>&lt;P&gt;Hi, thanks. Unfortunately, this does not answer my question. I know how to access the Spark Master and how to read its logs. What I don't know is how to write to them from a Python Spark Materialized View.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 15:56:34 GMT</pubDate>
      <guid>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5206#M415</guid>
      <dc:creator>Piotrek82X</dc:creator>
      <dc:date>2023-10-24T15:56:34Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark log stdout</title>
      <link>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5210#M416</link>
      <description>&lt;P&gt;I see. I typically use Incorta Notebook to develop PySpark Materialized Views. We get immediate feedback by running each paragraph, and we can then inspect the data using some of the functions available in the Incorta Notebook.&lt;/P&gt;
&lt;P&gt;Here are examples:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;df = read("EBS_PARTY_COMMON.HZ_PARTIES")
incorta.show(df)&lt;/LI-CODE&gt;
&lt;P&gt;We can use incorta.show(df) to preview the data.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;incorta.printSchema(df)&lt;/LI-CODE&gt;
&lt;P&gt;Use incorta.printSchema(df) to check the column data types before proceeding further.&lt;/P&gt;
&lt;P&gt;Typing a variable name in a paragraph and running the paragraph displays its value.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;df&lt;/LI-CODE&gt;
&lt;P&gt;Some of Python's built-in functions may help.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;type(df)&lt;/LI-CODE&gt;
&lt;P&gt;Use type() to confirm the data type of a variable.&lt;/P&gt;
&lt;P&gt;Spark actions can be useful as well:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;df.count()   # number of rows in the DataFrame
df.show()    # plain-text preview; use incorta.show(df) for formatted output
df.take(10)  # first 10 rows as a list of Row objects (first() takes no argument)&lt;/LI-CODE&gt;
&lt;P&gt;We can examine the data using a SQL statement anytime:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;df.createOrReplaceTempView("TBL")  ## TBL can be any name
spark.sql("""
  SELECT col1, col2
  FROM TBL --the name given above
  WHERE  col3 = '123'
""").show()&lt;/LI-CODE&gt;
&lt;P&gt;We can view the output below the paragraph.&lt;/P&gt;
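&lt;P&gt;If you still want messages written to a log from the Materialized View code itself, one option is Python's standard logging module writing to stderr. This is a minimal sketch, not a confirmed Incorta feature: whether these lines show up behind the stderr link in the Spark UI depends on where the driver process runs, and code inside UDFs runs on the executors instead. The names get_debug_logger, log_df, and mv_debug are hypothetical.&lt;/P&gt;

```python
import logging
import sys


def get_debug_logger(name="mv_debug", stream=None):
    """Return a logger that writes timestamped lines to stderr (or `stream`)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # the script or paragraph may be re-run
        target = stream if stream is not None else sys.stderr
        handler = logging.StreamHandler(target)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def log_df(logger, label, df):
    """Log column names and row count of a DataFrame-like object."""
    # df.count() is a Spark action and triggers a job, so use it sparingly.
    logger.debug("%s: columns=%s rows=%d", label, list(df.columns), df.count())


# Usage inside the MV script (df comes from read(...) as in the examples above):
# log = get_debug_logger()
# log.debug("starting MV")
# log_df(log, "HZ_PARTIES", df)
```

&lt;P&gt;Passing a different stream, such as an open file, redirects the same messages without changing the calling code.&lt;/P&gt;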
&lt;P&gt;Hope this helps&lt;/P&gt;</description>
      <pubDate>Fri, 27 Oct 2023 00:37:28 GMT</pubDate>
      <guid>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5210#M416</guid>
      <dc:creator>dylanwan</dc:creator>
      <dc:date>2023-10-27T00:37:28Z</dc:date>
    </item>
    <item>
      <title>Re: PySpark log stdout</title>
      <link>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5230#M417</link>
      <description>&lt;P&gt;&lt;a href="https://community.incorta.com/t5/user/viewprofilepage/user-id/924"&gt;@Piotrek82X&lt;/a&gt;&amp;nbsp; - did this answer your question? The notebook is a great way to interactively view data and outputs during development.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2023 13:52:36 GMT</pubDate>
      <guid>https://community.incorta.com/t5/data-schema-discussions/pyspark-log-stdout/m-p/5230#M417</guid>
      <dc:creator>JoeM</dc:creator>
      <dc:date>2023-11-03T13:52:36Z</dc:date>
    </item>
  </channel>
</rss>

