HOWTO: Unpivot a table with python
Related to the pivoting of data I also needed to find a good way to unpivot data, specifically from the census data.
The data comes with age groups in separate columns:
Year, State, City, Under_5, age5_9, age10_14, etc
I needed it to be in the format of:
year, state, city, age_group, population
Here's the python code :
df = read("CENSUS_INT.CITY_POP_AGE_PREPIVOT") newdf = df.selectExpr("Year","State","City","stack(18,'Under-5',Under_5,'age5-9',age5_9,'age10-14',age10_14,'age15-19',age15_19,'age20-24',age20_24,'age25-29',age25_29,'age30-34',age30_34,'age35-39',age35_39,'age40-44',age40_44,'age45-49',age45_49,'age50-54',age50_54,'age55-59',age55_59,'age60-64',age60_64,'age65-69',age65_69,'age70-74',age70_74,'age75-79',age75_79,'age80-84',age80_84,'age85-Plus',age85_Plus) as (age_group,population)").where("population is not null") save(newdf)
This pulls out 18 different age groups and specifies the field names to be (age_group,population) for any row where the population isn't null. The 18 would need to be changed for a different number of fields.
The value that gets added to the table is the part in quotes so for the snippet
This means that Under_5 is the fieldname in the original table and 'Under-5' is the value that will get added for the age_group field.
Also, the source table CITY_POP_AGE_PREPIVOT includes only the fields that are referenced in the script.