Getting Datasets for Data Analysis tasks — Advanced Google Search

“Data! Data! Data!” he cried impatiently. “I cannot make bricks without clay.”

Sherlock Holmes in “The Adventure of the Copper Beeches,” Sir Arthur Conan Doyle

The importance of data in a data science process cannot be emphasised enough. The outcome of a data analysis task reflects the kind of data that was fed into it. However, getting the data is sometimes a big pain point in itself. Recently, I did a short course titled Data Journalism and Visualization with Free Tools, and some great resources were shared through that course. I’ll be sharing some of the useful tips through a set of articles. In these articles, I’ll try to highlight some of the ways you can find data on the internet for free and then use it to create something meaningful.

Advanced Google Search

Let’s begin with advanced Google search, which is one of the most common ways to get access to publicly available datasets. By merely typing the name of the required dataset in the search bar, we can get access to a plethora of resources. However, here is a simple trick that can ease this process to a great extent and help you find files of specific types on the internet.

1. Using Filename and extension of the file to be downloaded

Let’s say we have a task at hand to find healthcare-related data in CSV format. A CSV (comma-separated values) file stores data in a tabular format. To get such files, go to the Google search bar and type the following:

filetype:<extension of the file to be downloaded> <category of data> data

filetype:csv healthcare data

Google will list the links that most closely match the search. Most of the time, these will be direct links to the specific files on the sites, which can then be downloaded to the local system and analysed later.


2. Using Filename, extension and site name.

If you want to narrow down your search further, this option will come in handy. Mentioning only the file type will point to a lot of files. However, if you want to find data on a specific website, you can mention the site in the search bar too, as follows:

filetype:<extension of the file to be downloaded> site:<website> <category of data>

filetype:xlsx site:who.int health

All the results will now pertain to only WHO, and this helps to narrow down the search results considerably.
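The two search patterns above can also be assembled programmatically, which is handy if you want to try several file types or sites in a row. The following is only a minimal sketch: the helper names `build_query` and `search_url` are hypothetical and made up for illustration, and the sketch assumes that Google's standard search URL accepts the query in the `q` parameter. It only builds the query string and a link that you can open in a browser; it does not download anything.

```python
from urllib.parse import urlencode


def build_query(category, filetype, site=None):
    """Assemble a search string using the filetype: and site: operators.

    build_query is a hypothetical helper, not part of any Google API.
    """
    parts = [f"filetype:{filetype}"]
    if site:
        # Restrict the results to a single website, e.g. who.int
        parts.append(f"site:{site}")
    parts.append(category)
    return " ".join(parts)


def search_url(query):
    """Return a Google search link for the query (assumes the 'q' parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Pattern 1: file type only
print(build_query("healthcare data", "csv"))

# Pattern 2: file type plus site
who_query = build_query("health", "xlsx", site="who.int")
print(who_query)
print(search_url(who_query))
```

Pasting the printed link into the browser should show the same results as typing the query by hand in the search bar.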


Files compatible with the search command

So what different kinds of files are compatible with the search command? This information can be accessed easily through the settings on the homepage as follows:

  • Click Settings > Advanced Search
  • Scroll down to the file type option and look for the available types. You’ll see there are a lot of options, including pdf and ppt file types.

Hopefully, the above tips will help you find your desired datasets faster and more efficiently. In the next article, I will share some other sources, including the Google Dataset Search tool and other useful sites to download datasets.

