Recreating Gapminder in Tableau: A Humble Tribute to Hans Rosling
“My interest is not data, it’s the world. And part of world development you can see in numbers.” — Hans Rosling
Hans Rosling was a visionary. He had a way with numbers. A physician, teacher and statistician, he challenged millions of people’s biased notions about basic issues like poverty and population growth. He did this not through mundane lectures or boring presentations but through clever visualisations, which ushered in an era of smart data visualisation techniques. Together with his son and daughter-in-law, Rosling co-founded the Gapminder Foundation to develop Trendalyzer, software that converts international statistics into moving, interactive graphics.
In the video above, Hans Rosling takes us through 200 years of global development. In this spectacular section of ‘The Joy of Stats’, he tells the story of the world’s 200 countries over 200 years using 120,000 numbers, in just four minutes. By plotting life expectancy against income for every country since 1810, Hans showed how radically different the world we live in is from the one most of us imagine.
I will try to recreate the same visualization (as shown in the video/talk) to analyse how life expectancy in years (health) and GDP per capita (wealth) have changed over time for countries around the world. This will be a small tribute to the master storyteller, who passed away on 7 February 2017.
This article assumes a basic familiarity with Tableau and its functions.
We need data for the following indicators:
- Life Expectancy in years
- Income per person (GDP/capita, PPP$ inflation-adjusted)
- Population, Total
- Regions of the World
So how did I go about recreating Rosling’s impactful story in Tableau? Let’s dive in and find out.
1. Connecting to the Data Sources in Tableau
The first step was to import the downloaded data files into the Tableau workspace.
1. On the Tableau home screen, click Text file and connect to the GDP/income per person CSV file.
2. The first row actually holds the years, because the data in the file is arranged as a cross-tab of countries (rows) against years (columns).
3. We can correct this either by using Data Interpreter or by manually promoting the first row to headers, as shown below. Then rename the first column to Country and carry on.
4. To make sense of this data, we need to convert the cross-tab into a tabular (long) format, and Tableau’s Pivot feature is handy for this. Select all the columns from 1800 to 2018, right-click and choose Pivot.
5. Rename the two pivoted columns (Pivot Field Names and Pivot Field Values) to Year and Income per Person, as shown below.
6. Repeat the same steps for the Life Expectancy data and the Population data, so that each data source ends up with a Country column, a Year column and one value column. The final view should be similar to the image below.
7. Lastly, connect to the country data source that holds the regions of the world. You will notice that the first row is already the header, which is good. However, many of its columns are of no use here, so right-click them to hide the ones we don't need. We will only need the … (alpha-2 in the sheet) column.
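To make the effect of steps 2 to 5 concrete, here is a minimal Python sketch (standard library only) of the same wide-to-long reshape that Tableau's Pivot performs: one row per country with one column per year becomes one row per (Country, Year) pair. The function name `pivot_wide_to_long` and the two-country sample are hypothetical and for illustration only; this is not how Tableau implements Pivot internally.

```python
import csv
import io


def pivot_wide_to_long(csv_text, value_name):
    """Reshape a country-by-year cross-tab into (Country, Year, value) rows.

    Assumes the first column holds the country name and every other
    header is a year, mirroring steps 3 to 5 above.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the years, e.g. ["country", "1800", "1801"]
    years = [int(y) for y in header[1:]]
    rows = []
    for record in reader:
        country, values = record[0], record[1:]
        for year, raw in zip(years, values):
            if raw.strip() == "":  # skip empty cells rather than inventing values
                continue
            rows.append({"Country": country, "Year": year, value_name: float(raw)})
    return rows


# Hypothetical sample with made-up numbers (not real Gapminder figures).
sample = "country,1800,1801\nAlpha,600,610\nBeta,900,\n"
long_rows = pivot_wide_to_long(sample, "Income per Person")
for row in long_rows:
    print(row)
```

This prints three rows, because Beta's empty 1801 cell is skipped. Skipping is a choice made for this sketch; Tableau's Pivot may instead keep such cells as nulls.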
Now we have all the data sources in place. Click the Sheet 1 tab at the bottom left to go to the workspace, where all the data sources appear in the top-left corner. Rename the sheet to Life Expectancy VS Income.
2. Deriving Relationships between Data
We intend to show the relationship between income (GDP per capita) and life expectancy for every country from 1800 to 2018. To achieve this, we will follow these steps:
1. Drag Income/GDP per Capita to the Columns shelf.
2. Drag Life Expectancy to the Rows shelf.
3. Because Life Expectancy comes from a different data source, Tableau needs us to define how the two data sources relate. Go to Data > Edit Relationships and confirm the default relationship between the Income and Life Expectancy data sources; by default, Tableau links fields that share the same name, such as Country and Year. Review the relationship for every secondary data source and click OK.
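Conceptually, this relationship tells Tableau to look up each secondary-source value using the linking fields Country and Year, while keeping every mark from the primary source. The Python sketch below models that as a simple left join. The `blend` function and the sample rows are hypothetical. Tableau's real blending also aggregates the secondary source to the view's level of detail before matching, and this sketch skips that step.

```python
def blend(primary, secondary, keys=("Country", "Year")):
    """Attach secondary-source fields to each primary row, left-join style.

    A simplified model of data blending: every primary row is kept, and
    secondary values are looked up on the linking fields (Country, Year).
    """
    lookup = {}
    for row in secondary:
        # Assumes one secondary row per key; a repeated key overwrites the earlier one.
        lookup[tuple(row[k] for k in keys)] = row
    blended = []
    for row in primary:
        match = lookup.get(tuple(row[k] for k in keys), {})
        merged = dict(row)
        for field, value in match.items():
            if field not in keys:
                merged[field] = value
        # Unmatched rows simply lack the secondary fields,
        # roughly like the nulls a blend shows for missing data.
        blended.append(merged)
    return blended


income = [
    {"Country": "Alpha", "Year": 1800, "Income per Person": 600.0},
    {"Country": "Beta", "Year": 1800, "Income per Person": 900.0},
]
life = [{"Country": "Alpha", "Year": 1800, "Life Expectancy": 30.5}]
for row in blend(income, life):
    print(row)
```

Alpha gains a Life Expectancy value. Beta keeps only its income because the hypothetical secondary source has no matching row.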
3. Creating a Scatter Plot
Now it is time to plot the two measures against each other. We will create the plot first and then format it to mimic Rosling’s graph. Since the Gapminder visualisation represents countries as bubbles, we can use a scatter plot in which the size of each bubble shows the country's population. We will also give each region of the world a unique colour for visual clarity.
- Put Country on Detail (on the Marks card) to get one point per country
- Put Total Population on Size
- Put Region on Color
The graph has started taking shape, but neither the Income axis nor the Life Expectancy axis covers a sensible range. Let us fix that next.
4. Formatting the Axes
Both the GDP axis and the Life Expectancy axis extend well beyond the values we care about. Let us fix the Life Expectancy axis to run from 25 to 85 years and the Income axis from $200 to $50,000.
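1. Changing the Life Expectancy axis
Right-click the axis, choose Edit Axis, set the range to Fixed, and enter 25 as the start and 85 as the end.
2. Changing the Income axis
We will convert the Income axis to a logarithmic scale, so that each doubling of income takes up the same horizontal distance, and format it to show currency. In Edit Axis, tick Logarithmic under Scale and fix the range to 200 and 50,000, then format the axis numbers as currency.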
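Tableau handles the logarithmic scale itself once the box is ticked. The Python sketch below only illustrates the arithmetic behind it: on a log10 axis fixed between $200 and $50,000, going from $200 to $400 covers the same distance as going from $400 to $800. The helpers `log_position` and `currency_label` are hypothetical names used for illustration only.

```python
import math

AXIS_MIN, AXIS_MAX = 200.0, 50_000.0  # fixed Income range chosen in this step


def log_position(income, lo=AXIS_MIN, hi=AXIS_MAX):
    """Return where `income` falls on a log10 axis, as a fraction from 0 to 1."""
    return (math.log10(income) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))


def currency_label(value):
    """Format an axis tick as whole dollars with thousands separators."""
    return f"${value:,.0f}"


# Each doubling of income moves a bubble the same distance to the right.
step_400 = log_position(400) - log_position(200)
step_800 = log_position(800) - log_position(400)
print(round(step_400, 4), round(step_800, 4))
print(currency_label(AXIS_MAX))
```

On a linear axis, most countries would crowd into the left edge for most of the period. The log scale spreads them out, which is why Gapminder uses one.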
5. Creating the Animation
We will now work with the Pages shelf in Tableau, where the real magic happens.
The Pages shelf lets us break a view into a series of pages so that we can better analyse how a specific field affects the rest of the data in the view. When we place a dimension on the Pages shelf, Tableau creates one page for each member of that dimension. When we place a measure on the Pages shelf, Tableau automatically converts it into a discrete measure. Let us use it in our workflow.
- Since the year changes continuously in Rosling’s graph to show the passage of time, we will put Year (from the primary data source) on the Pages shelf.
The Pages shelf has indeed created a set of pages, each showing the view for one member of the field on the shelf, which in this case is Year. We can adjust the page controls, such as playback speed and whether to show history as marks or trails, to suit our preferences, and then play the animation.
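As a mental model of what the Pages shelf and the "show history" trail do, the Python sketch below groups rows into one page per year, in sorted order, and collects one country's values across pages. The names `build_pages` and `trail` and the sample rows are hypothetical. This is only an illustration of the idea, not Tableau's implementation.

```python
from collections import defaultdict


def build_pages(rows, page_field="Year"):
    """Group rows into one page per member of page_field, in sorted order.

    Each page holds only the marks for a single year, as when Year sits
    on the Pages shelf.
    """
    pages = defaultdict(list)
    for row in rows:
        pages[row[page_field]].append(row)
    return [(member, pages[member]) for member in sorted(pages)]


def trail(pages, country, field="Income per Person"):
    """Collect one country's values across pages, like a history trail."""
    return [
        (member, row[field])
        for member, marks in pages
        for row in marks
        if row["Country"] == country
    ]


rows = [
    {"Country": "Alpha", "Year": 1801, "Income per Person": 610.0},
    {"Country": "Alpha", "Year": 1800, "Income per Person": 600.0},
    {"Country": "Beta", "Year": 1800, "Income per Person": 900.0},
]
pages = build_pages(rows)
for year, marks in pages:
    print(year, [m["Country"] for m in marks])
print(trail(pages, "Alpha"))
```

The pages come out in year order even though the input rows are not sorted. That ordering is what lets the animation play forward in time.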
6. Resizing the Bubbles
- We have successfully created the visualisation, but the bubbles are too small to see. Let us enlarge them by clicking Size on the Marks card and dragging the slider to the right.
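Readers judge bubbles by their area, so a common bubble-chart convention is to scale the radius with the square root of the population. A country with four times the population then gets a bubble with four times the area, but only twice the width. The Python sketch below shows that convention. `bubble_radius`, its `max_radius` parameter (standing in for Tableau's Size slider) and the population figures are hypothetical. The sketch does not reproduce how Tableau itself maps Size.

```python
import math


def bubble_radius(population, max_population, max_radius=40.0):
    """Scale radius by sqrt(population) so bubble area tracks population.

    max_radius (in pixels) is a hypothetical knob playing the role of
    Tableau's Size slider in this sketch.
    """
    return max_radius * math.sqrt(population / max_population)


populations = {"Alpha": 1_000_000, "Beta": 4_000_000}  # hypothetical values
biggest = max(populations.values())
for name, pop in populations.items():
    print(name, round(bubble_radius(pop, biggest), 1))
```

Increasing `max_radius` enlarges every bubble while keeping their relative areas, which is the effect we want from the Size slider.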
So here we have the animated visualisation depicting the relationship between life expectancy and income per person over the last 200 years. If you want it to mimic the Gapminder visualisation completely, you can make a few more changes, such as matching the region colour coding to the original graph or adding labels with country names. Feel free to experiment with Tableau.
You can also turn this worksheet into a dashboard for publishing and for more layout options.
Below is the outcome (animated gif) of the steps performed above.
Let us look at the key findings that make this graph so important to the world.
- We have one axis for health, i.e. life expectancy from 25 to 85 years, and another for wealth, i.e. income per person from $200 to $50,000. Countries at the bottom left of the graph are therefore poor and sick, while those at the top right are rich and healthy.
- Contrary to a common misconception, the world is not divided into two categories, i.e. developed and developing. In fact, there are four income levels, and the majority of the population lives in the middle. As we can see, the two large red circles denote the Asian giants, India and China, which lie in income levels 2 and 3 and are slowly inching towards level 4.
- The huge historical gap between the West and the rest is now closing, and we have become an entirely new, converging world. There is a clear trend into the future: with aid, trade and green technology, it is fully possible that everyone can make it to the healthy, wealthy corner.
Data visualisation is not merely a tool; it is the art of storytelling. A story told with data can change the way we see the world, creating a conviction that may even call us to action. The goal should be to present data as a story that has a profound effect and to put it in the hands of decision-makers who can affect outcomes.
Originally Published here