Cluster Analysis in Tableau

Learn how to cluster your data in Tableau easily

Image by Nicky ❕❣️ PLEASE STAY SAFE ❣️❕ from Pixabay

Consider a situation where you have some sales data for your company, and you want to discover patterns in consumers' spending capacity. If you could uncover distinct groups within the data, your company could target each group based on its preferences. This idea is called clustering, and Tableau has a built-in feature that automatically clusters similar data points based on selected attributes. In this article, we will explore this Tableau functionality and apply clustering to a real-world dataset.


What is Clustering?

Clustering, also known as cluster analysis, is an unsupervised machine learning technique that groups similar items together based on a similarity metric.

The figure below gives an intuitive view of how the K-Means algorithm works. K-Means splits the dataset into k clusters, each with a centroid computed as the mean of all the points in that cluster. In the figure, we start by randomly placing 4 centroids (shown as crosses). The algorithm then assigns each data point to its nearest centroid, and each centroid moves to the mean of the points now assigned to it. These two steps repeat until the centroids' positions no longer change noticeably.

 

Visualizing K-Means algorithm
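To make the assign-then-update loop concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is an illustration only, not Tableau's implementation: the function name `kmeans`, the random initialization from existing data points, the `seed` parameter, and the toy points are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Group points into k clusters with Lloyd's K-Means algorithm.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialization: pick k distinct data points at random as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None

    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Converged: no point changed cluster, so the centroids would not move.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]

    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On these two well-separated groups, the first three points end up sharing one label and the last three share the other; which group gets label 0 depends on the random initialization.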

Clustering in Tableau

Tableau also uses the K-Means algorithm under the hood, and it uses the Calinski-Harabasz criterion to assess cluster quality. The criterion is defined as:

$$CH = \frac{SSB}{SSW} \times \frac{N - k}{k - 1}$$

source: https://onlinehelp.tableau.com/v10.0/pro/desktop/en-us/clustering_howitworks.html

Here SSB is the overall between-cluster variance, SSW the overall within-cluster variance, k is the number of clusters, and N the number of observations.

This ratio measures how compact and well separated the clusters are. A higher value indicates better-defined clusters: low within-cluster variance relative to high between-cluster variance.
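The criterion is easy to compute by hand. The sketch below is a plain-Python illustration of the formula above, computing SSB and SSW as sums of squared distances; the function name `calinski_harabasz` and the toy one-dimensional points are assumptions for this example, and it is not Tableau's internal code.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score: (SSB / SSW) * (N - k) / (k - 1)."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 2 <= k < n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")
    dims = len(points[0])
    # Mean of the whole dataset.
    overall = [sum(p[d] for p in points) / n for d in range(dims)]

    ssb = 0.0  # between-cluster sum of squares
    ssw = 0.0  # within-cluster sum of squares
    for c in clusters:
        members = [p for p, lbl in zip(points, labels) if lbl == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # Each cluster contributes its size times the squared distance
        # from its centroid to the overall mean.
        ssb += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dims))
        # Squared distance of every member to its own centroid.
        ssw += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dims)) for p in members)

    return (ssb / ssw) * (n - k) / (k - 1)


points = [(0.0,), (2.0,), (10.0,), (12.0,)]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # 50.0
print(calinski_harabasz(points, [0, 1, 0, 1]))  # 0.08
```

The natural split ({0, 2} and {10, 12}) scores far higher than the interleaved one, which is exactly the behavior described above. Note that the score is undefined when SSW is zero (for example, when every cluster contains identical points).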

Now that we have an idea of what clustering is, let's look at how to apply it in Tableau.


Using clustering to uncover patterns in the dataset

Clustering helps uncover patterns in a dataset. Suppose you are an analyst at a tourism company. It would be useful for the company to understand patterns in people's travel habits; for example, you want to know which age group likes to travel more. Your task is to use the World Indicators sample data to identify the countries that have enough of the right kind of customers.

Tableau Environment

In this tutorial, we will work with Tableau Public, which is free. Download Tableau Public from the official website and follow the installation instructions. If the following screen appears when you open Tableau, you are good to go.

 

Tableau main screen | Image by Author

Connecting to the Dataset

The World Economic Indicators dataset contains indicators that drive the economies of countries around the world, such as life expectancy, ease of doing business, and population. The dataset was obtained from the United Nations website and can be accessed from here.

  • Download the dataset to your system.
  • Import the data into the Tableau workspace from your computer. Use the Data Interpreter, found under the Sheets section, to clean up and realign the data.

 

Connecting to data source | Image by Author

Formatting the Data Source

In the worksheet, the columns from your data source appear as fields in the Data pane on the left. The Data pane contains a variety of fields organized by table. Many of these fields can be grouped under a single category, which makes the data fields easier to navigate.

  • Select Business Tax Rate, Days to Start Business, Ease of Business, Hours to do Tax, and Lending Interest, then right-click and choose Folders > Create Folder.

 

Grouping fields in a folder | Image by Author
  • Name the folder Business. All the fields above are now included in this folder.

 

Folder view | Image by Author
  • In the same way, create three more folders, Development, Health, and Population, and add the corresponding fields to each. This is how the Data pane looks after formatting:

 

The final look | Image by Author
  • Double-click Country in the Data pane. Tableau creates a map view with a filled circle representing each country. On the Marks card, change the mark type to Map.

 

Displaying countries in the dataset | Image by Author

Identifying the variables for clustering

The next step is to identify the variables the clustering algorithm will use. In Tableau, the variables are the fields. There is no single best set of variables that gives ideal clusters, so experiment with different variables until the results are meaningful. In our case, let's work with the following fields:

  • Population Urban

Urban population is a good indicator of population density in a country. The higher the density, the more business opportunities become available.

  • Population 65+

Population 65+ represents senior citizens, the population aged 65 and over. Many senior citizens like to travel, so this could be a useful indicator.

  • Life Expectancy Female and Life Expectancy Male

In countries with a higher life expectancy, people tend to live longer and may be more interested in traveling.

  • Tourism Per Capita

This field doesn't exist in the dataset, but it can be created as a calculated field from the Tourism Outbound and Population Total fields:

Tourism Per Capita = SUM([Tourism Outbound])/SUM([Population Total])

 

Tourism per Capita | Image by Author

Tourism Outbound represents the money (in US dollars) that people spend annually on international travel. To get the amount spent per person, we divide this field by the population of each country.
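Note that the calculated field divides two aggregates: Tableau first computes SUM([Tourism Outbound]) and SUM([Population Total]) for each country in the view and only then divides. The plain-Python sketch below mimics that order of operations; the rows, country names, and numbers are hypothetical, chosen only to show the arithmetic.

```python
from collections import defaultdict

# Hypothetical rows: (country, year, tourism_outbound_usd, population_total)
rows = [
    ("Country A", 2011, 2000.0, 10.0),
    ("Country A", 2012, 4000.0, 40.0),
    ("Country B", 2012, 1000.0, 20.0),
]


def tourism_per_capita(rows):
    """Mimic SUM([Tourism Outbound]) / SUM([Population Total]) per country."""
    outbound = defaultdict(float)
    population = defaultdict(float)
    for country, _year, tourism_outbound, population_total in rows:
        # Aggregate first (the two SUMs) ...
        outbound[country] += tourism_outbound
        population[country] += population_total
    # ... then divide, once per country in the view.
    return {country: outbound[country] / population[country] for country in outbound}


print(tourism_per_capita(rows))  # {'Country A': 120.0, 'Country B': 50.0}
```

Because the sums are taken before dividing, years with a larger population carry more weight. For Country A, the ratio of sums is 6000 / 50 = 120, whereas averaging the two yearly ratios (200 and 100) would give 150.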

Adding a selected field to the view

Before moving ahead, we need to change the default aggregation from SUM to AVERAGE. Tableau can aggregate both measures and dimensions, though aggregating measures is more common. Whenever we add a measure to the view, Tableau applies an aggregation to it by default, and which aggregation is appropriate depends on the context of the view.

 

Adding a selected field to the view | Image by Author

Change the aggregation for all the selected fields, then drag them onto Detail on the Marks card:

 

Changing aggregation from SUM to AVERAGE | Image by Author
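Why does the choice of aggregation matter here? When a country appears in several rows (for example, one per year), SUM adds up values like life expectancy across rows, which inflates countries with more rows, while AVERAGE keeps the value on its natural scale. The sketch below illustrates the difference in plain Python; the rows and values are hypothetical, and it assumes multiple rows per country purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (country, year, life_expectancy_female)
rows = [
    ("Country A", 2011, 80.0),
    ("Country A", 2012, 82.0),
    ("Country B", 2012, 70.0),
]


def aggregate(rows, how):
    """Group rows by country and apply an aggregation such as sum or mean."""
    groups = defaultdict(list)
    for country, _year, value in rows:
        groups[country].append(value)
    return {country: how(values) for country, values in groups.items()}


print(aggregate(rows, sum))   # {'Country A': 162.0, 'Country B': 70.0}
print(aggregate(rows, mean))  # {'Country A': 81.0, 'Country B': 70.0}
```

A summed life expectancy of 162 years is meaningless, and it would dominate the distance calculations in clustering; the averaged values are comparable across countries.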

Clustering

Clustering in Tableau is a simple drag-and-drop process. The following steps outline it:

  • Open the Analytics pane and drag Cluster onto the view. Tableau clusters the data automatically. It is that simple.

 

Clustering in Tableau | Image by Author
  • Although Tableau can decide the number of clusters automatically, we can also set the number of clusters and choose which variables are used to compute them. Drag a field into the Variables box to include it in the clustering algorithm, or drag it out to exclude it.

 

Deciding the number of clusters | Image by Author
  • We'll go with 4 clusters and the default variables. Note that some countries did not fall into any cluster and have been marked as Not Clustered.

 

4 clusters and the default variables | Image by Author
  • The clusters are created as a new pill on the Color shelf of the Marks card. Drag this pill onto the Data pane to save it as a group.

 

Cluster field | Image by Author
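A common reason for a country to land in Not Clustered is missing data: a distance-based algorithm like K-Means cannot place a point that lacks a value for one of its variables. The sketch below assumes, as a simplification, that any row with a missing value in one of the clustering variables is set aside before clustering. The helper name `split_for_clustering` and the sample rows are hypothetical; the variable names are the fields chosen in this article.

```python
# Fields chosen for clustering in this article.
VARIABLES = ["Population Urban", "Population 65+", "Life Expectancy Female",
             "Life Expectancy Male", "Tourism Per Capita"]


def split_for_clustering(rows, variables):
    """Separate rows that have a value for every variable from those that do not."""
    clusterable, not_clustered = [], []
    for row in rows:
        # A missing (None) value in any clustering variable sets the row aside.
        if any(row.get(v) is None for v in variables):
            not_clustered.append(row["Country"])
        else:
            clusterable.append(row)
    return clusterable, not_clustered


# Hypothetical sample rows.
rows = [
    {"Country": "X", "Population Urban": 0.8, "Population 65+": 0.15,
     "Life Expectancy Female": 84.0, "Life Expectancy Male": 80.0,
     "Tourism Per Capita": 900.0},
    {"Country": "Y", "Population Urban": 0.5, "Population 65+": None,
     "Life Expectancy Female": 75.0, "Life Expectancy Male": 70.0,
     "Tourism Per Capita": 100.0},
]
clusterable, not_clustered = split_for_clustering(rows, VARIABLES)
print(not_clustered)  # ['Y']
```

If many countries end up unclustered, dropping a sparsely populated variable from the Variables box is one way to bring them back into the analysis.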

So here, we have clustered the countries according to the chosen measures. But how do we make sense of these results, and how do we make business decisions based on the clusters? The next section addresses these questions.


Describing Clusters

Click the Clusters field on the Marks card and select Describe Clusters.

 

Describing various clusters | Image by Author

This displays a detailed description of the clusters, organized in two tabs, Summary and Models:

1. Summary

The Summary tab summarizes the results and shows the average value of each variable for every cluster.

 

Summary of results | Image by Author

From the results above, we can infer that Cluster 2 has:

  • the highest average life expectancy for both males and females
  • the highest total Tourism Per Capita
  • the highest average urban population

This suggests a wealthy urban population with a longer life expectancy, which looks like a good market for the senior tourism industry. Let us see which countries are included in this cluster.

 

Analysing countries in Cluster 2 | Image by Author
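The Summary tab is essentially a table of per-cluster averages, and the reading above boils down to finding the cluster with the highest average for each variable. The sketch below reproduces that kind of reasoning in plain Python; the function names `describe_clusters` and `top_cluster`, the rows, and the labels are hypothetical and are not Tableau's output.

```python
from collections import defaultdict
from statistics import mean


def describe_clusters(rows, labels, variables):
    """Average of each variable within each cluster, like a cluster summary table."""
    by_cluster = defaultdict(list)
    for row, label in zip(rows, labels):
        by_cluster[label].append(row)
    return {
        label: {v: mean(r[v] for r in members) for v in variables}
        for label, members in sorted(by_cluster.items())
    }


def top_cluster(summary, variable):
    """Return the cluster with the highest average for one variable."""
    return max(summary, key=lambda label: summary[label][variable])


# Hypothetical per-country values and cluster labels.
rows = [
    {"Life Expectancy Female": 84.0, "Tourism Per Capita": 900.0},
    {"Life Expectancy Female": 82.0, "Tourism Per Capita": 700.0},
    {"Life Expectancy Female": 70.0, "Tourism Per Capita": 50.0},
]
labels = ["Cluster 2", "Cluster 2", "Cluster 1"]
variables = ["Life Expectancy Female", "Tourism Per Capita"]

summary = describe_clusters(rows, labels, variables)
print(summary["Cluster 2"])  # {'Life Expectancy Female': 83.0, 'Tourism Per Capita': 800.0}
print(top_cluster(summary, "Tourism Per Capita"))  # Cluster 2
```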

2. Models

The Models tab displays analysis of variance (ANOVA) statistics for each variable, which show how significantly the variable's average differs across the clusters. You can read more about the cluster model statistics here.

 

Analysing the Models Tab | Image by Author
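Assuming the Models tab reports a standard one-way ANOVA per variable, its central number is the F-statistic: the between-cluster variance divided by the within-cluster variance, each scaled by its degrees of freedom. For a single variable, this has the same form as the Calinski-Harabasz criterion described earlier. The plain-Python sketch below computes F for one variable; the function name `anova_f` and the sample values are hypothetical, and it omits the p-value, which would need the F distribution (for example, `scipy.stats.f.sf(F, k - 1, n - k)`).

```python
from collections import defaultdict
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one variable across clusters.

    F = (SSB / (k - 1)) / (SSW / (N - k))
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Between-cluster (model) sum of squares.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-cluster (error) sum of squares.
    ssw = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


print(anova_f([0.0, 2.0, 10.0, 12.0], [0, 0, 1, 1]))  # 50.0
```

A large F (with a small p-value) means the variable separates the clusters well; a variable with a small F contributes little and is a candidate for removal from the Variables box.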

As an analyst, you can now present the list of countries in Cluster 2 to the Sales team so they can focus on these prospective clients. Clustering has given us useful insights. From here, you can experiment with different fields, set a threshold for population or income, and so on. There are many ways to cluster the data, but the basic principle stays the same.


Conclusion

In this article, we learned how to perform cluster analysis on a dataset in Tableau with a simple drag-and-drop mechanism. Clustering is an essential technique, and Tableau puts this kind of statistical analysis in analysts' hands.


References and further study

Find Clusters in Data: a guide by Tableau that goes deeper into the concept of cluster analysis.


Originally published here
