Getting Datasets for Data Analysis tasks — Advanced Google Search
The importance of data in a data science process cannot be emphasised enough. The outcome of a data analysis task reflects the data that has been fed into it. However, getting the data itself is sometimes a major pain point. Recently, I took a short course titled Data Journalism and Visualization with Free Tools, which shared some great resources. I'll be passing on some of the useful tips in a series of articles. In them, I'll highlight some of the ways you can find data on the internet for free and then use it to create something meaningful.
Advanced Google Search
Let's begin with advanced Google search, one of the most common ways to access publicly available datasets. Simply typing the name of the required dataset into the search bar gives us access to a plethora of resources. However, a simple trick can ease this process considerably and help you find files of specific types on the internet.
1. Using the extension of the file to be downloaded
Let's say our task is to find healthcare-related data in CSV format. CSV stands for comma-separated values, a plain-text format that stores data in tabular form. To find such files, go to the Google search bar and type the following:
filetype:<extension of the file to be downloaded> <category of data> data
For example: filetype:csv healthcare data
Google will list the links that most closely match the query. Most of the time, these will be direct links to the files themselves, which can then be downloaded to your local system and analysed later.
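If you run these searches often, the query string can also be assembled programmatically. The sketch below is a minimal illustration, assuming you only need a query string and a URL you can open in a browser. The helper names `build_filetype_query` and `google_search_url` are hypothetical, and the code simply uses the public `https://www.google.com/search?q=` URL pattern rather than any official search API.

```python
from urllib.parse import urlencode


def build_filetype_query(extension: str, category: str) -> str:
    """Hypothetical helper: return a query like 'filetype:csv healthcare data'."""
    # Accept both "csv" and ".csv", and normalise case
    ext = extension.lstrip(".").lower()
    return f"filetype:{ext} {category} data"


def google_search_url(query: str) -> str:
    """Hypothetical helper: URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_filetype_query("csv", "healthcare")
    print(query)  # filetype:csv healthcare data
    print(google_search_url(query))  # https://www.google.com/search?q=filetype%3Acsv+healthcare+data
```

`urlencode` escapes the colon and spaces, so the operator survives being pasted into a browser's address bar.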
2. Using the file extension and site name
If you want to narrow down your search further, this option will come in handy. Specifying only the file type still points to a large number of files from across the web. If you want data from a specific website, you can add it to the search as well:
filetype:<extension of the file to be downloaded> site:<website> <category of data>
For example: filetype:xlsx site:who.int health
All results will now come only from the WHO website (who.int), which narrows the search results considerably.
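The `site:` restriction combines with the `filetype:` operator in a single query. A minimal sketch, assuming Google's `filetype:` and `site:` operators as described in this section; `build_site_query` is a hypothetical helper, and the scheme-stripping step is an assumption added for convenience when a full URL is pasted in instead of a bare domain.

```python
from typing import Optional
from urllib.parse import urlencode


def build_site_query(extension: str, topic: str, site: Optional[str] = None) -> str:
    """Hypothetical helper: combine filetype:, optional site:, and a topic keyword."""
    parts = [f"filetype:{extension.lstrip('.').lower()}"]
    if site:
        # site: expects a bare domain such as who.int, so drop a pasted scheme and trailing slash
        domain = site
        for scheme in ("https://", "http://"):
            if domain.startswith(scheme):
                domain = domain[len(scheme):]
        parts.append(f"site:{domain.rstrip('/')}")
    parts.append(topic)
    return " ".join(parts)


query = build_site_query("xlsx", "health", site="who.int")
print(query)  # filetype:xlsx site:who.int health
print("https://www.google.com/search?" + urlencode({"q": query}))
```

Leaving `site` as `None` falls back to a plain file-type search, so the same helper covers both techniques.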
Files compatible with the search command
So which kinds of files are compatible with the search command? You can check this easily through the settings on the Google homepage:
- Click Settings and choose Advanced search.
- Scroll down to the file type option and look at the available types. You'll see there are plenty of options, including PDF and PPT files.
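The Advanced Search page is a form, and submitting it produces an ordinary search URL. The sketch below rebuilds such a URL, assuming the form's field names are `as_q` (all these words), `as_filetype` and `as_sitesearch`. These names are taken from the form itself, not from a documented API, so treat them as an assumption that may change; `advanced_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def advanced_search_url(all_words: str, filetype: str = "", site: str = "") -> str:
    """Hypothetical helper: build a URL resembling an Advanced Search form submission.

    Assumption: as_q, as_filetype and as_sitesearch are the form's field names;
    they are undocumented and could change.
    """
    params = {"as_q": all_words}
    if filetype:
        params["as_filetype"] = filetype.lstrip(".").lower()
    if site:
        params["as_sitesearch"] = site
    return "https://www.google.com/search?" + urlencode(params)


print(advanced_search_url("health", filetype="pdf", site="who.int"))
# https://www.google.com/search?as_q=health&as_filetype=pdf&as_sitesearch=who.int
```

The operator-based queries typed into the search bar do the same job; this form is simply the point-and-click equivalent.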
Hopefully, the above tips will help you find your desired datasets faster and more efficiently. In the next article, I will share some other sources, including the Google Dataset Search tool and other useful sites for downloading datasets.