Welcome to week 7 of Workout Wednesday for Power BI 2022. We are again using the Data Visualization Society’s annual State of the Industry survey results.
This week we are looking at how to visualize parallel sets. Although we may not use this visual often, it can be very useful for multivariate analysis. We will build a parallel sets visual to show the relationships between job role, time worked, and time spent producing visualizations. Although I used the Sankey Chart AppSource visual from Microsoft, the result is, in fact, not a Sankey chart. You are welcome to use any visual at your disposal (default, custom, Charticulator, or Deneb) to create the parallel sets visualization, but the visual must be interactive, because parallel sets are best consumed by highlighting a specific node or link within the visual.
As with the Sankey Chart, most other visuals used to create parallel sets will require you to reorganize your data into links and nodes. This gives us something fun to do in Power Query!
- Connect to the job titles and tasks time data in the survey results.
- Remove any rows that have a blank answer for RoleMultichoice_composite, TimeWorked, or TimeProducingViz.
- Replace the following role values: “Leadership (Manager, Director, VP, etc.)” should be changed to “Leadership”. “Teacher” should be changed to “Academic/Teacher”.
- Use Power Query to generate a final table that contains 3 columns: Source, Target, and Count. Count should represent the distinct count of altID values. The Source column should contain the values for RoleMultichoice_composite and TimeWorked (where the data flows from). The Target column should contain the values for TimeWorked and TimeProducingViz (where the data flows to). This final table should contain 3,726 rows with 1,863 respondents.
- Create a parallel sets visualization that shows RoleMultichoice_composite on the left, TimeWorked in the middle, and TimeProducingViz on the right.
- Add a title and explanatory text to provide context and tips for use.
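The reshaping described in the requirements above can be sketched outside Power BI to make the target shape concrete. The following is a minimal illustration in Python, not Power Query M and not the challenge solution itself. It assumes each survey row is a dictionary keyed by the column names from the requirements; the function name `build_links` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Role replacements taken from the requirements list.
ROLE_REPLACEMENTS = {
    "Leadership (Manager, Director, VP, etc.)": "Leadership",
    "Teacher": "Academic/Teacher",
}


def build_links(rows):
    """Return a list of Source/Target/Count records (hypothetical helper).

    Count is the number of distinct altID values on each link.
    """
    ids_per_link = defaultdict(set)
    for row in rows:
        role = (row.get("RoleMultichoice_composite") or "").strip()
        worked = (row.get("TimeWorked") or "").strip()
        viz = (row.get("TimeProducingViz") or "").strip()
        # Drop rows with a blank answer in any of the three columns.
        if not role or not worked or not viz:
            continue
        role = ROLE_REPLACEMENTS.get(role, role)
        # Each respondent contributes to two links:
        # role -> time worked, and time worked -> time producing viz.
        ids_per_link[(role, worked)].add(row["altID"])
        ids_per_link[(worked, viz)].add(row["altID"])
    return [
        {"Source": source, "Target": target, "Count": len(ids)}
        for (source, target), ids in sorted(ids_per_link.items())
    ]


# Made-up sample rows for illustration only; not real survey answers.
sample_rows = [
    {"altID": 1, "RoleMultichoice_composite": "Teacher",
     "TimeWorked": "40 hours", "TimeProducingViz": "10 hours"},
    {"altID": 2, "RoleMultichoice_composite": "Leadership (Manager, Director, VP, etc.)",
     "TimeWorked": "40 hours", "TimeProducingViz": "10 hours"},
    {"altID": 3, "RoleMultichoice_composite": "Teacher",
     "TimeWorked": "40 hours", "TimeProducingViz": "20 hours"},
    {"altID": 4, "RoleMultichoice_composite": "Analyst",
     "TimeWorked": "", "TimeProducingViz": "10 hours"},
]

for link in build_links(sample_rows):
    print(link)
```

In Power Query, the same idea would typically be expressed as two grouped queries (role to time worked, and time worked to time producing visualizations) appended into one table, though other approaches may work equally well.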
This week’s data comes from the Data Visualization Society’s annual State of the Industry survey. Data can be found in full (in Google Sheets) here. This challenge uses data from the data_2021_jobtitles_taskstime sheet. You can access a CSV version of the data from the README sheet.
Note: To pull data directly from Google Sheets you will need a Google account. If you do not have an account, you can download the file containing the data and use it locally.
After you finish your workout, share on Twitter using the hashtags #WOW2022 and #PowerBI, and tag @JSBaucke, @MMarie, @shan_gsd, @KerryKolosko, and @NerdyWithData. Also make sure to fill out the Submission Tracker so that we can count you as a participant this week in order to track our participation throughout the year.