Pivot Raw Data

rahabib · ‎08-13-2023

Hello,

Is there any way to pivot some of the raw data columns (without using PySpark) before using the data for analytics?

Also, is there any way to use Python instead of PySpark when creating the MV?

JoeM · ‎08-14-2023

@rahabib - I'd like to understand a little more about what you are looking for.

Are you looking to use other languages like Spark Scala or Spark R? Or are you looking to use Python for easier-to-express functions?

PySpark example:

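from pyspark.sql import SparkSession

# Reuse the active session if the environment already provides one
spark = SparkSession.builder.getOrCreate()

# Sample data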
data = [
    ("Alice", "Math", 90),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
    ("Bob", "Physics", 80),
    ("Alice", "Chemistry", 88),
    ("Bob", "Chemistry", 92)
]

# Create a DataFrame
columns = ["student", "subject", "score"]
df = spark.createDataFrame(data, columns)

# Pivot the table
pivot_df = df.groupBy("student").pivot("subject").agg({"score": "first"})

pivot_df.show()

Python using pandas example (no PySpark calls):

import pandas as pd
data = [
    ("Alice", "Math", 90),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
    ("Bob", "Physics", 80),
    ("Alice", "Chemistry", 88),
    ("Bob", "Chemistry", 92)
]

# Create a DataFrame
columns = ["student", "subject", "score"]
df = pd.DataFrame(data, columns=columns)

# Pivot the table
pivot_df = df.pivot(index="student", columns="subject", values="score")

print(pivot_df)
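
If neither Spark nor pandas is wanted, the same reshaping can be done with the Python standard library alone. The following is only a minimal sketch, not a platform feature: the helper name pivot_rows is hypothetical, and it assumes the data fits in memory and that keeping the first score seen per student/subject pair is acceptable (mirroring the "first" aggregation in the PySpark example above).

```python
from collections import defaultdict


def pivot_rows(rows, index, column, value):
    """Pivot long-format dict rows into {index_value: {column_value: value}}.

    Hypothetical helper for illustration. If an (index, column) pair
    appears more than once, the first value seen is kept.
    """
    pivoted = defaultdict(dict)
    for row in rows:
        # setdefault only stores the value if the key is not present yet
        pivoted[row[index]].setdefault(row[column], row[value])
    return dict(pivoted)


# Same sample data as in the examples above
data = [
    ("Alice", "Math", 90),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
    ("Bob", "Physics", 80),
    ("Alice", "Chemistry", 88),
    ("Bob", "Chemistry", 92),
]
columns = ["student", "subject", "score"]

# Turn each tuple into a dict keyed by column name
rows = [dict(zip(columns, record)) for record in data]

pivoted = pivot_rows(rows, index="student", column="subject", value="score")

for student, scores in pivoted.items():
    print(student, scores)
```

Each student ends up as one entry whose inner dict holds one key per subject, which is the same wide layout the pivot calls above produce.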