
Challenge #1 Employees transition between departments using the Sankey chart


In a software development organization, some employees decide to move to another department to gain more experience and to help with implementation and process-improvement work.

As an end user, I need to track and see a complete picture of the flow between the organization’s departments.

Can you help me get this data insight?

I would like to see a Sankey chart showing the movement of employees between departments in an organization, based on the employee count. I believe it should look something like this:


You can use the data set files attached below:




Also, you can check this article as an example of a Sankey chart.

Please try it, and attach your outputs (the schema and the dashboard) here. I hope you enjoy it 😊.



Here is my output.
Employee Transitions Between departments - CommunityChallenge1.png

  1. Upload the CSV files, create a schema, and create a table from each one.
marwansalem_0-1669141475930.png
  2. Create Materialized View (MV) that generates all the employee transitions.
  3. Create another table that shows the total number of employees who moved from one department to another (EmployeeTransitionSummary).
  4. Create a new insight, and use Dept Name as the source, New Dept Name as the target, and Employee Count as the measure. Optionally, sort the source field by Employee Count. (We can also use Insight over Insight instead of creating a separate table for EmployeeTransitionSummary.)
marwansalem_0-1669142207352.png

Now, for the Materialized View in step #2, here is the PySpark script for EmployeeTransitions.

from pyspark.sql.window import Window
from pyspark.sql.functions import lead
dept = read('CommunityChallenge1.department')
dept_emp = read('CommunityChallenge1.dept_emp').withColumnRenamed('dept_no', 'employee_dept_no')

# join with department table to include department name
dept_emp = dept_emp.join(dept, dept.dept_no == dept_emp.employee_dept_no).drop('employee_dept_no')

# partition by the employee number and order by start date "from_date"
windowSpec = Window.partitionBy("emp_no").orderBy("from_date")

# use the lead function to get the "next" department name
transitions = dept_emp.withColumn('new_dept_name', lead("dept_name", 1).over(windowSpec))
transitions = transitions.filter(transitions.new_dept_name.isNotNull()).sort('emp_no')

# persist the result as the Materialized View output
save(transitions)
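As a side note, step 3's EmployeeTransitionSummary boils down to counting employees per (dept_name, new_dept_name) pair. Here is a minimal pure-Python sketch of that aggregation; the toy transitions below are made up for illustration, the real numbers come from the attached CSV files:

```python
# Illustrative only: what the EmployeeTransitionSummary table computes.
from collections import Counter

# Each tuple is one derived transition: (source dept, target dept)
transitions = [
    ("Development", "Sales"),
    ("Development", "Sales"),
    ("Sales", "Marketing"),
]

# Count employees per (source, target) pair -- the Sankey measure
summary = Counter(transitions)
for (source, target), employee_count in sorted(summary.items()):
    print(source, "->", target, employee_count)
```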

You can learn more about window functions in PySpark here.
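To see what the window plus lead() combination in the script does, here is a plain-Python sketch of the same logic. The toy rows and dates are made up; the real script runs over the dept_emp table:

```python
# Illustrative only: lead() over a window partitioned by emp_no and
# ordered by from_date pairs each assignment with the employee's next one.
from itertools import groupby

# Toy rows: (emp_no, from_date, dept_name)
rows = [
    (1, "2019-01-01", "Development"),
    (1, "2021-06-01", "Sales"),
    (2, "2020-03-01", "Marketing"),
]

transitions = []
for emp_no, grp in groupby(sorted(rows), key=lambda r: r[0]):  # partition by emp_no
    grp = sorted(grp, key=lambda r: r[1])                      # order by from_date
    # zip(grp, grp[1:]) mimics lead(dept_name, 1): each row paired with
    # the next; the last row's lead is null, so it drops out naturally
    for cur, nxt in zip(grp, grp[1:]):
        transitions.append((emp_no, cur[2], nxt[2]))

print(transitions)  # [(1, 'Development', 'Sales')]
```

Employee 2 never changed departments, so they contribute no transition, exactly as the isNotNull() filter in the script ensures.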


I am solving this challenge using the power of Incorta's no-code/low-code capabilities.

1.) Upload the CSV files, create a schema, and create tables for the department.csv and dept_emp.csv files.

2.) Create aliases of department and dept_emp.

3.) Join dept_emp to department on dept_no. Join dept_emp to its alias with the joins shown below. Join the dept_emp alias to the department alias on dept_no.

Screen Shot 2022-11-23 at 4.39.49 PM.png

4.) Add a Formula Column to the dept_emp table with a column formula of 1. We create this because, at this time, the Sankey chart only supports sums, not counts.

Screen Shot 2022-11-23 at 4.43.14 PM.png

5.) Create an Aggregate Table insight. Put the Dept Name from the department table in the Grouping Dimension. Put the Dept Name from the department alias table in the Grouping Dimension below the other Dept Name; I recommend renaming it so the two are not confused. Place the formula column you created in the Measure. To filter only on transfers, grab any column from the dept_emp alias table, place it in the Individual Filter, and show only Not Null records.

Screen Shot 2022-11-23 at 4.44.54 PM.png

6.) Change the Insight into a Sankey and add the formula column you created as the Sort By of the Dept Name in the Source.

Screen Shot 2022-11-23 at 4.51.23 PM.png

 Here is the diagram of my Sankey schema for this challenge.

Screen Shot 2022-11-23 at 4.38.33 PM.png

The join between dept_emp and the alias of dept_emp only works because the to date of the previous department matches the from date of the new department. If this were different, we would have to create a different solution using Incorta's First Version filter or SQL/PySpark.
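That assumption is the whole trick, so here is a plain-Python sketch of the self-join condition, using made-up toy rows rather than the real dept_emp data:

```python
# Illustrative only: the self-join of dept_emp to its alias finds a
# transition wherever one assignment's to_date equals another assignment's
# from_date for the same employee.
# Toy rows: (emp_no, dept_name, from_date, to_date)
rows = [
    (1, "Development", "2019-01-01", "2021-06-01"),
    (1, "Sales",       "2021-06-01", "9999-01-01"),
    (2, "Marketing",   "2020-03-01", "9999-01-01"),
]

# Join each row to the alias on emp_no, with to_date = alias.from_date
transitions = [
    (old[0], old[1], new[1])
    for old in rows
    for new in rows
    if old[0] == new[0] and old[3] == new[2]
]
print(transitions)  # [(1, 'Development', 'Sales')]
```

If the dates did not line up exactly, the equality join would silently miss transitions, which is why a gap in the data would force one of the alternative solutions mentioned above.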

Screen Shot 2022-11-23 at 5.00.42 PM.png