08-13-2023 04:08 AM
Hello,
Is there any way to pivot some of the raw data columns (without using PySpark) before using the data for analytics?
Also, is there any way to use Python instead of PySpark when creating the MV?
08-14-2023 08:42 AM
@rahabib - I'd like to understand a little more about what you are looking for.
Are you looking to use other languages like Spark Scala or Spark R? Or are you looking to use Python for easier-to-express functions?
PySpark example:
# Sample data
data = [
    ("Alice", "Math", 90),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
    ("Bob", "Physics", 80),
    ("Alice", "Chemistry", 88),
    ("Bob", "Chemistry", 92),
]
# Create a DataFrame (`spark` is the SparkSession that Databricks notebooks provide)
columns = ["student", "subject", "score"]
df = spark.createDataFrame(data, columns)
# Pivot the table
pivot_df = df.groupBy("student").pivot("subject").agg({"score": "first"})
pivot_df.show()
pandas example (plain Python, no Spark):
import pandas as pd
data = [
    ("Alice", "Math", 90),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
    ("Bob", "Physics", 80),
    ("Alice", "Chemistry", 88),
    ("Bob", "Chemistry", 92),
]
# Create a DataFrame
columns = ["student", "subject", "score"]
df = pd.DataFrame(data, columns=columns)
# Pivot the table
pivot_df = df.pivot(index="student", columns="subject", values="score")
print(pivot_df)
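If the goal is to avoid both PySpark and pandas, the same reshaping can be sketched with the standard library alone. This is only an illustration of what the pivot does, not a recommendation for large tables; `pivot_rows` is a hypothetical helper name, and keeping the first score seen per (student, subject) pair is an assumption chosen to resemble the "first" aggregation used above.

```python
from collections import defaultdict

# Same sample data as in the examples above
data = [
    ("Alice", "Math", 90),
    ("Alice", "Physics", 85),
    ("Bob", "Math", 75),
    ("Bob", "Physics", 80),
    ("Alice", "Chemistry", 88),
    ("Bob", "Chemistry", 92),
]


def pivot_rows(rows):
    """Hypothetical helper: reshape (student, subject, score) rows into
    {student: {subject: score}}, keeping the first score seen per pair."""
    table = defaultdict(dict)
    for student, subject, score in rows:
        # setdefault leaves an existing value untouched, so duplicates are ignored
        table[student].setdefault(subject, score)
    return dict(table)


pivoted = pivot_rows(data)

# Column order for display: every subject that appears, sorted by name
subjects = sorted({subject for _, subject, _ in data})

print("student", *subjects)
for student in sorted(pivoted):
    # .get returns None when a student has no score for a subject
    print(student, *(pivoted[student].get(s) for s in subjects))
```

This runs on a single machine in ordinary Python memory, so it only suits data small enough to collect locally.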