Creating custom image datasets for Deep Learning projects.



This weekend I created a simple fruit classifier for my preschool kid. It is a simple image classification app which predicts the fruit in the image. I presented it as a game for my son, to see who predicts the name first — Computer or Human :). Here is a quick preview of the app.

The Fruit Classifier App

For this app, I needed to download images of many fruits to train an image classifier. In the process, I discovered a few browser extensions that make it easy to bulk download images, and I have compiled them in this article.

However, before you begin using the extensions, there are two crucial things to keep in mind:

Copyright issues

Do not download any image in violation of its copyright terms. In many cases, you cannot reproduce copyrighted images without the owner’s permission. The images downloaded in this article are meant for educational purposes only.

Download Settings

Make sure the ‘Ask where to save each file before downloading’ option is not selected in your browser’s download settings. Otherwise, the browser will ask for your permission for every file that is downloaded, which is impractical for bulk downloads. The clip below shows how to access the option.

Download Settings

Let’s now look at some of the useful tools to download images easily:

1. Fatkun Batch Download Image

Fatkun Batch Download Image is a powerful and handy browser extension to download images from the web. Some of its capabilities are:

  • Filter images based on resolution or link
  • Create custom rules to download the desired images, and
  • Batch rename and bulk download images
🔗 Link to download the extension:

Usage

Let’s now download images of apples, since we want to build a fruit classifier. Since it is easier to show the process than to write about it, I have included a short video that walks through the download step by step.

Fatkun Batch Download Image
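
If some files were downloaded without the extension's rename option, a similar batch rename can be done locally afterwards. The sketch below is only an illustration and is not part of Fatkun: the function name `batch_rename`, the list of image extensions, and the `prefix_0001.ext` naming scheme are all assumptions made for this example.

```python
from pathlib import Path

# Assumed set of extensions to treat as images (adjust as needed).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def batch_rename(folder, prefix):
    """Rename image files in `folder` to prefix_0001.ext, prefix_0002.ext, ..."""
    folder = Path(folder)
    # Sort for a deterministic order and skip anything that is not an image file.
    files = sorted(p for p in folder.iterdir()
                   if p.is_file() and p.suffix.lower() in IMAGE_EXTS)
    renamed = []
    for i, src in enumerate(files, start=1):
        dst = folder / f"{prefix}_{i:04d}{src.suffix.lower()}"
        # Never overwrite an existing file; such a file is simply left as it is.
        if dst != src and not dst.exists():
            src.rename(dst)
            renamed.append(dst)
    return renamed
```

For example, `batch_rename("fruits/Apple", "apple")` would rename the images in a hypothetical `fruits/Apple` folder and leave non-image files untouched.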

2. Imageye — Image downloader

Imageye is another browser extension that allows you to download all images on a web page. Imageye also gives you the following capabilities:

  • Filter images based on pixel width and height, or on their URL.
  • Like Fatkun, bulk download all the images at once or manually select the ones you want to download.
🔗 Link to download the extension:

Usage

Imageye — Image downloader demo

3. Download All Images

This Chrome extension downloads all images from a web page and packs them into a zip file. It cannot filter images by size, but it works well for batch downloads from sites like Unsplash that mostly host images. It analyzes the current browser page to identify the images and then downloads them into a single zip file. To start the download, click the extension icon in the top right corner. It will give you an estimate of how long the download will take.

🔗 Link to download the extension:

Usage
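
Since this extension delivers a zip file, the images still have to be unpacked into a class folder before training. Here is a minimal sketch of that step using only Python's standard library. The function name `unpack_images` and the extension list are assumptions for illustration, and the archive's internal folders are flattened, so two files with the same name would overwrite each other.

```python
import zipfile
from pathlib import Path

# Assumed set of extensions to treat as images (adjust as needed).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def unpack_images(zip_path, dest_dir):
    """Extract only the image files from `zip_path` into `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Keep just the file name, dropping any folder part inside the archive.
            name = Path(info.filename).name
            if info.is_dir() or Path(name).suffix.lower() not in IMAGE_EXTS:
                continue
            target = dest / name
            target.write_bytes(zf.read(info))
            extracted.append(target)
    return extracted
```

A call such as `unpack_images("images.zip", "fruits/Apple")` (both names are hypothetical) would place the images where a folder-per-class layout expects them.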


4. ImageAssistant Batch Image Downloader

ImageAssistant Batch Image Downloader is an image extractor for sniffing, analyzing, and batch downloading images from a web page. It is pretty flexible and offers many ways to customize the download. For instance, you can extract the pictures on a web page, prefetch image links, or batch extract the image URLs. There is also a picture filter that lets you narrow the displayed pictures by file extension or resolution.

🔗 Link to download the extension:

Usage


5. The fastai way

The last method doesn’t use any browser extension. I picked it up from Zachary Mueller’s Practical-Deep-Learning-for-Coders-2.0 resource, which he has shared on GitHub. The code was provided by Francisco Ingham and Jeremy Howard and was in turn inspired by Adrian Rosebrock.

The method requires you to install fastai, a deep learning library, since it uses some of the library’s built-in functions. To understand what is happening under the hood, you need some knowledge of the library, especially the data block API. Explaining that is outside the scope of this article, but I will quickly go through the steps required to download the images:

  • Go to Google Images and search for the images you are interested in. Scroll down until you find the images you want to download. Let’s say we are interested in finding images of apple and mango.
  • Open the JavaScript ‘Console’ in Chrome/Firefox, then paste and run the following lines of code. This collects the URLs of all the images and saves them in a CSV file. Repeat the process for every category. You will then have two CSV files, i.e. apple.csv and mango.csv.
urls=Array.from(document.querySelectorAll('.rg_i')).map(el=> el.hasAttribute('data-src')?el.getAttribute('data-src'):el.getAttribute('data-iurl')); 

window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
  • Next, create a folder for each category of images that you want to download.
folders = ['Apple','Mango']
files = ['apple.csv','mango.csv']
  • Finally, download the images
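from fastai.vision.all import *  # provides Path, L, download_images, verify_images, DataBlock

classes = ['Apple','Mango']
path = Path('fruits')
path.mkdir(parents=True, exist_ok=True)
for i, n in enumerate(classes):
    print(n)
    path_f = Path(files[i])  # CSV file holding the URLs for this class
    download_images(path/n, path_f, max_pics=50)  # download at most 50 images per class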
  • Verify that the downloaded images can be opened
imgs = L()
for n in classes:
    print(n)
    path_n = path/n
    imgs += verify_images(path_n.ls())  # verify_images returns the files that could not be opened
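
For a quick check without fastai, the sketch below looks only at the first bytes of each file. This is a different and weaker technique than `verify_images`, which actually tries to open every image. It catches obvious failures, such as an HTML error page saved with a .jpg name, but not truncated or corrupt files. The names `sniff_format` and `find_suspect_files` are made up for this example, and only JPEG, PNG, and GIF signatures are covered.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}

def sniff_format(path):
    """Return the detected format name, or None if the header matches none."""
    with open(path, "rb") as f:
        head = f.read(8)
    for name, signature in SIGNATURES.items():
        if head.startswith(signature):
            return name
    return None

def find_suspect_files(folder):
    """List the files in `folder` whose header is not a known image signature."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and sniff_format(p) is None)
```

Files reported by `find_suspect_files` are worth inspecting by hand before deleting them, since formats outside the small signature table (WebP, for instance) are flagged as well.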

Display the images

fruits = DataBlock(blocks=(ImageBlock, CategoryBlock),
       get_items=get_image_files,
       splitter=RandomSplitter(0.2),
       get_y=parent_label,
       item_tfms=RandomResizedCrop(460),
       batch_tfms=[*aug_transforms(size=224,max_warp=0),Normalize.from_stats(*imagenet_stats)])   
dls = fruits.dataloaders(path,  bs=32)
dls.show_batch(max_n=9)
Downloaded images
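
One optional clean-up that can help before the download step: the URL files saved from the browser console (apple.csv, mango.csv) may contain blank lines or repeated entries, because the console snippet emits one line per matched element whether or not it carries a URL. The sketch below rewrites such a file with one unique web URL per line. The function name `clean_url_file` is hypothetical, and the rule of keeping only lines that start with http:// or https:// is an assumption of this example.

```python
from pathlib import Path

def clean_url_file(csv_path):
    """Rewrite `csv_path` with one unique http(s) URL per line and return the URLs."""
    path = Path(csv_path)
    seen = set()
    urls = []
    for line in path.read_text(encoding="utf-8").splitlines():
        url = line.strip()
        # Skip blanks, inline "data:" images, and anything that is not a web URL.
        if not url.startswith(("http://", "https://")):
            continue
        # Keep only the first occurrence so the original order is preserved.
        if url not in seen:
            seen.add(url)
            urls.append(url)
    path.write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return urls
```

Running `clean_url_file("apple.csv")` before the download loop means the `max_pics` limit is not spent on duplicates.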

Here is a video showing the entire process:


Conclusion

In this article, we saw various ways to gather image data for training a deep learning model. You can either use browser extensions or write code to get the same results. Whichever method you choose, please be mindful of usage restrictions and copyright issues.



