Pivot Raw Data

rahabib
Partner

Hello,

Is there any way to pivot some of the raw data columns (not in PySpark) before using the data for analytics?

Also, is there any way to use Python instead of PySpark in the MV creation?

1 REPLY

JoeM
Community Manager

@rahabib - Could you tell me a little more about what you are looking for?

Are you looking to use other languages like Spark Scala or Spark R? Or are you looking to use Python because functions are easier to express in it?

PySpark example:

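from pyspark.sql import SparkSession

# Reuse the active Spark session, or create one if none exists
spark = SparkSession.builder.getOrCreate()

# Sample data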
data = [
    ("Alice", "Math", 90),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
    ("Bob", "Physics", 80),
    ("Alice", "Chemistry", 88),
    ("Bob", "Chemistry", 92)
]

# Create a DataFrame
columns = ["student", "subject", "score"]
df = spark.createDataFrame(data, columns)

# Pivot the table
pivot_df = df.groupBy("student").pivot("subject").agg({"score": "first"})

pivot_df.show()

Pandas example (plain Python, no Spark):

import pandas as pd
data = [
    ("Alice", "Math", 90),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
    ("Bob", "Physics", 80),
    ("Alice", "Chemistry", 88),
    ("Bob", "Chemistry", 92)
]

# Create a DataFrame
columns = ["student", "subject", "score"]
df = pd.DataFrame(data, columns=columns)

# Pivot the table (df.pivot requires each student/subject pair to be unique
# and raises a ValueError on duplicates)
pivot_df = df.pivot(index="student", columns="subject", values="score")

print(pivot_df)
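
If the raw data can contain more than one row for the same student and subject, a possible variation is pandas' pivot_table, which aggregates duplicates instead of failing. The sketch below is only an illustration: the duplicate Math row for Alice is made-up data, and keeping the first value is an assumption chosen to mirror the "first" aggregation in the PySpark example.

```python
import pandas as pd

# Hypothetical data: Alice has two Math rows, so (student, subject) is not unique
data = [
    ("Alice", "Math", 90),
    ("Alice", "Math", 70),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
]

# Create a DataFrame
df = pd.DataFrame(data, columns=["student", "subject", "score"])

# pivot_table aggregates duplicate pairs; "first" keeps the first score seen.
# Student/subject combinations with no row are filled with NaN.
pivot_df = df.pivot_table(
    index="student", columns="subject", values="score", aggfunc="first"
)

print(pivot_df)
```

Another aggregation such as "mean" or "max" can be passed as aggfunc if that suits the analytics better.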