1

HOWTO: Unpivot a table with python

Related to the pivoting of data I also needed to find a good way to unpivot data, specifically from the census data.

The data comes with age groups in separate columns:

    Year, State, City, Under_5, age5_9, age10_14, etc

I needed it to be in the format of:

    year, state, city, age_group, population

Here's the python code :

df = read("CENSUS_INT.CITY_POP_AGE_PREPIVOT")
newdf = df.selectExpr("Year","State","City","stack(18,'Under-5',Under_5,'age5-9',age5_9,'age10-14',age10_14,'age15-19',age15_19,'age20-24',age20_24,'age25-29',age25_29,'age30-34',age30_34,'age35-39',age35_39,'age40-44',age40_44,'age45-49',age45_49,'age50-54',age50_54,'age55-59',age55_59,'age60-64',age60_64,'age65-69',age65_69,'age70-74',age70_74,'age75-79',age75_79,'age80-84',age80_84,'age85-Plus',age85_Plus) as (age_group,population)").where("population is not null")
save(newdf)

This pulls out 18 different age groups and specifies the field names to be (age_group,population) for any row where the population isn't null. The 18 would need to be changed for a different number of fields.

The value that gets added to the table is the part in quotes so for the snippet

'Under-5',Under_5

This means that Under_5 is the fieldname in the original table and 'Under-5' is the value that will get added for the age_group field.

Also, the source table CITY_POP_AGE_PREPIVOT includes only the fields that are referenced in the script.

1reply Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
Like1 Follow
  • 1 Likes
  • 2 wk agoLast active
  • 1Replies
  • 13Views
  • 2 Following