“Getting information off the internet is like taking a drink from a firehose.” — Mitchell Kapor
The internet is flooded with data. However, getting that data in the right format is the real deal. It takes some amount of cleaning and filtering to bring the data into a format which is ideal for analysis. Last week I came across some pretty good tabular data sets on Wikipedia. As always, I copied the data and pasted on the excel sheet. For most parts, it worked well, but there were few instances when the typical copy-paste methodology failed miserably:
- When the table was typically long and spread across a complete webpage
- When the data wasn’t static, i.e., it updated after regular intervals. So every time, there was a change in the original dataset, I had to fetch the dataset again.
For situations like these, simply copy-pasting wasn’t enough. Scraping was another option, but I was looking for something quick and easy and which didn’t involve a ton of code. It was then that I came across a convenient function in Google Sheets called
IMPORTHTML, which is ideal for importing data from a table or list within an HTML page. In this article, I’ll describe the end to the end process of fetching tables( and lists) into google sheets.[Read More]