Creating custom image datasets for Deep Learning projects.
This weekend I created a simple fruit classifier for my preschool kid. It is a simple image classification app which predicts the fruit in the image. I presented it as a game for my son, to see who predicts the name first — Computer or Human :). Here is a quick preview of the app.
For this app, I needed to download images of many fruits to train an image classifier. In the process, I discovered a few browser extensions that make it easy to bulk download images, and I have compiled them in this article.
However, before you begin using the extensions, there are two crucial things to keep in mind:
- Do not download any image in violation of its copyright terms. Copyrighted images often cannot be reproduced without the owner’s permission. The images downloaded in this article are meant for educational purposes only.
- Make sure the ‘Ask where to save each file before downloading’ option is turned off in your browser’s download settings. Otherwise, the downloader will ask for your permission before saving every single file. The clip below demonstrates how to access the option.
Let’s now look at some of the useful tools to download images easily:
1. Fatkun Batch Download Image
Fatkun Batch Download Image is a powerful and handy browser extension to download images from the web. Some of its capabilities are:
- Filter images based on resolution or link
- Create custom rules to download the desired images
- Batch rename and bulk download images
Let’s now download images of apples, since we want to build a fruit classifier. Because the process is easier to show than to describe, I have included a short video that walks through the download step by step.
2. Imageye — Image downloader
Imageye is another browser extension that allows you to download all images on a web page. Imageye also gives you the following capabilities:
- Filter images based on pixel width and height, or based on their URL.
- As with Fatkun, bulk download all the images at once or manually select the ones you want.
3. Download All Images
This Chrome extension downloads all images from a web page and packs them into a zip file. It cannot filter images by size, but it is excellent for batch downloads from sites like Unsplash, which host only images. It analyzes the current browser page to identify the images and then downloads them into a single zip file. Start the download by clicking the extension icon in the top right corner. It will give you an estimate of how long the download will take to finish.
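Since this extension delivers a zip file rather than loose images, the archive has to be unpacked into a per-class folder before training. Here is a minimal sketch using only the Python standard library. The function name `extract_images`, the suffix list, and the file names in the usage note are my own assumptions for illustration, not something the extension prescribes.

```python
import zipfile
from pathlib import Path

# File extensions treated as images (an assumption; adjust as needed)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def extract_images(zip_path, dest_dir):
    """Extract only image files from zip_path into dest_dir, flattening subfolders."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Drop any folder structure stored inside the zip
            name = Path(info.filename).name
            if info.is_dir() or Path(name).suffix.lower() not in IMAGE_SUFFIXES:
                continue
            # Note: files with the same name in different subfolders overwrite each other
            target = dest / name
            target.write_bytes(archive.read(info))
            extracted.append(target)
    return extracted
```

For example, `extract_images("images.zip", "fruits/Apple")` would place the images from a hypothetical `images.zip` into a `fruits/Apple` folder, matching the one-folder-per-class layout used later in this article.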
4. ImageAssistant Batch Image Downloader
ImageAssistant Batch Image Downloader is an image extractor for sniffing, analyzing, and batch downloading images from a web page. It is pretty flexible and offers many ways to customize the download. For instance, you can extract the pictures on a web page, prefetch image links, or batch extract the URLs of the images. There is also a picture filter that lets you narrow down the displayed pictures by file extension type or by resolution.
5. The fastai way
The last method doesn’t use any browser extension. I picked it up from Zachary Mueller’s Practical-Deep-Learning-for-Coders-2.0 resource, which he has shared on GitHub. The code is by Francisco Ingham and Jeremy Howard, and was in turn inspired by Adrian Rosebrock.
The method requires you to install fastai, a deep learning library, because it uses some of the library’s built-in functions. To understand what is happening under the hood, you need some knowledge of the library, especially of the data block API. Explaining that is outside the scope of this article, but I will quickly go through the steps required to download the images:
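- Go to Google Images and search for the images you are interested in. Scroll down until all the images you want to download have loaded. Let’s say we are interested in finding images of apples and mangoes. Then open the browser’s JavaScript console and run the following snippet, which collects the image URLs and opens them as a CSV file. Repeat this for each search and save the results as one file per category.
// Note: the '.rg_i' selector depends on Google's page markup and may need updating.
urls = Array.from(document.querySelectorAll('.rg_i')).map(
  el => el.hasAttribute('data-src') ? el.getAttribute('data-src') : el.getAttribute('data-iurl')
);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));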
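The URL list produced by the snippet above can contain blank lines (for thumbnails that had neither attribute), duplicates, and inline `data:` thumbnails that are not downloadable links. The following is a minimal sketch, assuming one URL per line, of how the file could be tidied before downloading. The helper name `clean_url_file` and the file names are hypothetical and not part of the fastai code.

```python
from pathlib import Path

def clean_url_file(src, dst):
    """Keep unique http(s) URLs from src (one per line) and write them to dst."""
    seen = set()
    kept = []
    for line in Path(src).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        # Skip blank lines and inline "data:" thumbnails
        if not url.lower().startswith(("http://", "https://")):
            continue
        # Preserve the original order while dropping duplicates
        if url not in seen:
            seen.add(url)
            kept.append(url)
    Path(dst).write_text("".join(u + "\n" for u in kept), encoding="utf-8")
    return kept
```

Calling `clean_url_file("apple_raw.csv", "apple.csv")` would then leave a file with one usable URL per line for the download step.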
- Next, define a folder name for each category of images that you want to download, along with the CSV file that holds its URLs.
folders = ['Apple','Mango']
files = ['apple.csv','mango.csv']
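- Finally, download the images
from fastai.vision.all import *  # provides Path, download_images, verify_images, L
classes = ['Apple','Mango']
path = Path('fruits')
path.mkdir(parents=True, exist_ok=True)
for i, n in enumerate(classes):
    path_f = Path(files[i])  # CSV file with the URLs for this class
    download_images(path/n, path_f, max_pics=50)  # save up to 50 images into fruits/<class>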
- Verify that the downloaded images can be opened
imgs = L()
for n in classes:
    print(n)
    path_n = path/n
    imgs += verify_images(path_n.ls())  # collects the files that failed verification
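As far as I know, fastai’s `verify_images` works by trying to open each file as an image. A cheaper pre-check, sketched below with only the standard library, is to look at the first bytes of each file, which catches cases such as an HTML error page saved with a `.jpg` name. This is an illustrative complement, not what fastai does internally. The names `SIGNATURES`, `sniff_image_type`, and `find_suspect_files` are my own, and only three formats are covered.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}

def sniff_image_type(path):
    """Return the detected format name, or None if no known signature matches."""
    with open(path, "rb") as f:
        header = f.read(16)
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None

def find_suspect_files(folder):
    """List files in folder whose content does not start like a known image."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and sniff_image_type(p) is None
    )
```

Running `find_suspect_files("fruits/Apple")` returns the paths worth inspecting by hand. Formats outside the small signature table (WebP, for instance) will also be flagged, so treat the result as a hint rather than a verdict.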
- Display the images
fruits = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(0.2),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(460),
    batch_tfms=[*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)],
)
dls = fruits.dataloaders(path, bs=32)
dls.show_batch(max_n=9)
Here is a video showing the entire process:
In this article, we saw various ways to gather image data for training a deep learning model. You can either use browser extensions or write code to get the same results. Whichever method you choose, please be mindful of usage restrictions and copyright issues.