
PySpark log stdout

Piotrek82X
Rocketeer

I am building a materialized view with Python Spark, and since I am very new to Python I am struggling a little. To do some debugging, I would like to log some hints from the code, most likely to stdout (but I will be happy with any log that is accessible from Spark Master or Incorta itself). I tried a few methods I found on Stack Overflow, but none of them worked. Any suggestions?

4 REPLIES

dylanwan
Employee

If you are using Incorta on-prem, you can access the Spark web UI for the Spark standalone cluster. Depending on your Spark configuration, it typically runs on the Spark machine on port 9091. It looks like this page:

Screen Shot 2023-10-24 at 5.14.54 AM.png

You can then click on the application ID to access the Application page like below.

Screen Shot 2023-10-24 at 5.21.27 AM.png

The log file is accessible by clicking on the stderr link and the log looks like this page:

Screen Shot 2023-10-24 at 5.26.12 AM.png
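
To get your own messages into that stderr log, one option is Python's standard logging module pointed at sys.stderr. This is only a minimal sketch: it assumes the driver output of the MV script is what ends up behind the stderr link, which may not hold for every deployment. The names get_mv_logger and mv_debug are made up for illustration.

```python
import logging
import sys


def get_mv_logger(name="mv_debug", level=logging.INFO):
    """Return a logger that writes timestamped lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Add the handler only once, so re-running the script or a notebook
    # paragraph does not produce duplicate lines.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    # Keep records away from the root logger so each message is emitted once.
    logger.propagate = False
    return logger


log = get_mv_logger()
log.info("starting materialized view script")
```

Note that code running inside executors (for example inside a UDF) writes to the executor logs rather than the driver log, so messages logged there show up under a different stderr link.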

If you are using Incorta Cloud, the Spark history server page is accessible under https://xxx.cloudx.incorta.com/applications.

It includes two pages, one for completed jobs and one for incomplete jobs. It looks like this page:

Screen Shot 2023-10-24 at 5.35.15 AM.png

You can download the event logs, which are available as a JSON file.

I won't be able to describe how to read the event logs or how to debug here. The above only covers how to access the event logs.

Hope it helps.

Hi, thanks, but unfortunately that does not answer my question. I know how to access Spark Master and how to read it. What I don't know is how to write to it from a Python Spark Materialized View.

I see. I typically use the Incorta Notebook to develop PySpark Materialized Views. We get immediate feedback by running each paragraph, and can then inspect the data using some of the functions available in the Incorta Notebook.

Here are examples:

df = read("EBS_PARTY_COMMON.HZ_PARTIES")
incorta.show(df)

We can use incorta.show(df) to preview the data.

incorta.printSchema(df)

I may print the schema to check the column data types before proceeding further.

Typing a variable name on its own in a paragraph and running the paragraph displays its value.

df

Some of the Python built-in functions may help.

type(df)

Use type() to confirm the data type of a variable.

Spark actions can be useful as well:

df.count()   ## show the number of rows in the dataframe
df.show()    ## use incorta.show(df) to get the formatted data
df.take(10)  ## get the first 10 rows as a list of Row objects (df.first() takes no argument and returns only one row)

We can examine the data using a SQL statement anytime:

df.createOrReplaceTempView("TBL")  ## TBL can be any name
spark.sql("""
  SELECT col1, col2
  FROM TBL --the name given above
  WHERE  col3 = '123'
""").show()

We can view the output below the paragraph.
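
The checks above can also be folded into one helper that returns a single summary line, which is handy to print in a paragraph or pass to a logger. This is a rough sketch, not an Incorta function: summarize is a made-up name, and it only assumes the object has .columns and .count(), as Spark DataFrames do.

```python
def summarize(df, name="df"):
    """Build a one-line debugging summary of a DataFrame-like object."""
    cols = list(df.columns)  # column names, no Spark job needed
    rows = df.count()        # a Spark action: this triggers a job
    return f"{name}: type={type(df).__name__}, rows={rows}, columns={cols}"


# In a notebook paragraph, with df read as shown earlier:
# print(summarize(df, "HZ_PARTIES"))
```

Because count() runs a job, it can be slow on large tables, so it may be worth removing such calls once the MV works.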

Hope this helps.

JoeM
Community Manager

@Piotrek82X  - did this answer your question? The notebook is a great way to interactively view data and outputs during development.