Pivot Raw Data

rahabib — Sun, 13 Aug 2023 11:08:38 GMT

Hello,

Is there any way to pivot some of the raw data columns (not in PySpark) prior to using the data for analytics?

Also, is there any way to use Python instead of PySpark in the MV creation?

Re: Pivot Raw Data

JoeM — Mon, 14 Aug 2023 15:42:20 GMT

@rahabib - I'd like to understand a little more about what you are looking for.

Are you looking to use other languages like Spark Scala or Spark R? Or are you looking to use Python for functions that are easier to express?

PySpark example:

    # Sample data
    data = [
        ("Alice", "Math", 90),
        ("Alice", "Physics", 85),
        ("Bob", "Math", 75),
        ("Bob", "Physics", 80),
        ("Alice", "Chemistry", 88),
        ("Bob", "Chemistry", 92)
    ]

    # Create a DataFrame
    columns = ["student", "subject", "score"]
    df = spark.createDataFrame(data, columns)

    # Pivot the table
    pivot_df = df.groupBy("student").pivot("subject").agg({"score": "first"})
    pivot_df.show()

Pandas example (plain Python, no Spark calls):

    import pandas as pd

    data = [
        ("Alice", "Math", 90),
        ("Alice", "Physics", 85),
        ("Bob", "Math", 75),
        ("Bob", "Physics", 80),
        ("Alice", "Chemistry", 88),
        ("Bob", "Chemistry", 92)
    ]

    # Create a DataFrame
    columns = ["student", "subject", "score"]
    df = pd.DataFrame(data, columns=columns)

    # Pivot the table
    pivot_df = df.pivot(index="student", columns="subject", values="score")
    print(pivot_df)
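
If the goal is Python with no Spark or pandas dependency at all, the same reshaping can be sketched with the standard library only. This is a minimal illustration, not something the platform is confirmed to support in MV creation; the helper name pivot_first and its parameters are hypothetical, and keeping the first value seen for a repeated pair is only similar in spirit to the "first" aggregation in the PySpark example.

```python
from collections import defaultdict


def pivot_first(rows, index, column, value):
    """Pivot a list of dicts (hypothetical helper, for illustration only).

    Returns one entry per distinct `index` value, holding a dict keyed by
    the distinct `column` values. If an (index, column) pair repeats, the
    first value encountered is kept.
    """
    table = defaultdict(dict)
    for row in rows:
        # setdefault only writes when the key is absent, so the earliest
        # value for a repeated (index, column) pair is preserved.
        table[row[index]].setdefault(row[column], row[value])
    return dict(table)


# Same sample data as in the examples above
data = [
    ("Alice", "Math", 90),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
    ("Bob", "Physics", 80),
    ("Alice", "Chemistry", 88),
    ("Bob", "Chemistry", 92),
]
columns = ["student", "subject", "score"]
rows = [dict(zip(columns, record)) for record in data]

pivoted = pivot_first(rows, index="student", column="subject", value="score")
for student, scores in pivoted.items():
    print(student, scores)
```

One difference from the pandas version is worth noting: df.pivot raises an error when an index/column pair appears more than once, whereas this sketch silently keeps the first value.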
