Reviewing the recently released HuggingFace 🤗 Course
Originally published here
Massive Open Online Courses (MOOCs) are an indispensable part of the life of a self-taught data scientist. If you are in a room full of wannabe data scientists, chances are that fifty percent of them have taken the famous Machine Learning course by Andrew Ng. However, here is the twist: even though many of us enroll in various online courses, only a handful complete them. In fact, a study titled Why MOOCs Didn’t Work, in 3 Data Points, claims that the completion and retention rates of online courses are minimal. While some may argue that it is the students who have to be motivated enough to finish a course, the onus also falls on the content creators.
I have interacted with a lot of people and taken their feedback on how content should be delivered. If a course checks the right boxes on that front, I believe it will make for a great learning experience.
So why this sudden deep dive into online courses? Because the team at Hugging Face 🤗 recently released a free course on NLP with the Hugging Face libraries. This course will help many people understand not only the libraries themselves but also how to accomplish state-of-the-art NLP tasks. Hugging Face is a pretty well-known name in the Natural Language Processing ecosystem. Apart from having a cool logo, they are also credited with significantly democratizing NLP.
In this article, we’ll take a tour of the course. We’ll look at its content and offerings, and whether or not it ticks the right boxes. So let’s get started.
What is the course about ❓
This course focuses on teaching the ins and outs of NLP using the Hugging Face ecosystem. Even though it is aimed at beginners, it will be helpful for intermediate and expert practitioners as well. The main objective of the course is to highlight the inner workings and usage of four important Hugging Face libraries:
- Transformers is a library that provides thousands of pre-trained models like BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, etc., to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, and more.
- Tokenizers convert text inputs to numerical data.
- Datasets is a lightweight and extensible library to easily share and access datasets and evaluation metrics for Natural Language Processing (NLP).
- Accelerate enables distributed training of PyTorch models on multiple GPUs or TPUs with just a few adjustments.
In addition to this, the course will also teach you how to use the Hugging Face Hub. The entire course is in the form of short video snippets coupled with explanations in text and reusable code.
What are the prerequisites ❓
The course has a few prerequisites so that you can make the most out of it. It requires a sound understanding of Python and some basic deep learning knowledge. Additionally, experience with either PyTorch or TensorFlow will be helpful.
What does the course comprise ❓
The course is divided into three major modules, and each module is further divided into chapters or subsections. The modules get more advanced as you progress. Currently, only the first module has been released; the remaining two will be made available in the coming months.
The first module introduces the Transformers library and how to use it. It also teaches how to fetch a model from the Hugging Face Hub, fine-tune it on a dataset, and share the results back on the Hub.
The module is further divided into four chapters:
- Chapter 1
This chapter introduces NLP and why text processing poses a challenge to machine learning practitioners. Then it explains the concept of pipeline — the most fundamental object in the 🤗 Transformers library.
> It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer — source: Hugging Face course
You can apply the pipeline method to several NLP tasks such as text generation, text classification, question answering, and many others. You will then dive deeper into the architecture and workings of a Transformer model and learn about encoder, decoder, and encoder-decoder (sequence-to-sequence) models.
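To make the pipeline idea concrete, here is a toy, pure-Python sketch of the preprocess → model → postprocess chain that a pipeline wires together. Everything here (the word lists, the scoring "model") is invented for illustration; the real library runs a pretrained Transformer instead:

```python
# Toy sketch of the preprocess -> model -> postprocess chain behind a
# sentiment-analysis pipeline. All names and logic here are made up for
# illustration; the real 🤗 pipeline uses a tokenizer and a pretrained model.

def preprocess(text):
    # A real tokenizer would map text to input IDs; here we just lowercase and split.
    return text.lower().split()

def model(tokens):
    # Stand-in "model": score tokens against a tiny hand-written word list.
    positive = {"great", "love", "good"}
    negative = {"bad", "hate", "awful"}
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

def postprocess(score):
    # Turn the raw score into a labeled prediction, like the real pipeline does.
    return {"label": "POSITIVE" if score >= 0 else "NEGATIVE"}

def sentiment_pipeline(text):
    return postprocess(model(preprocess(text)))

print(sentiment_pipeline("I love this course"))  # {'label': 'POSITIVE'}
```

In the actual library, the equivalent is a one-liner: `classifier = pipeline("sentiment-analysis")`, then `classifier("I love this course")`.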
- Chapter 2
Now that you know what transformers are and how pipelines work, you will see how it all works under the hood. You’ll use Transformer models and tokenizers to replicate the pipeline API’s behavior. You’ll also learn how tokenizers convert text into inputs a model can understand.
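The core job of a tokenizer — turning text into numbers and back — can be sketched in a few lines. This toy word-level tokenizer and its tiny vocabulary are invented for illustration; real 🤗 tokenizers use learned subword vocabularies and also handle padding, truncation, and special tokens:

```python
# Minimal toy tokenizer illustrating the text -> numbers conversion.
# The vocabulary and the [UNK] token here are invented for illustration.

class ToyTokenizer:
    def __init__(self, vocab):
        self.vocab = vocab                       # token -> id
        self.unk_id = vocab["[UNK]"]             # id for out-of-vocabulary words
        self.inverse = {i: t for t, i in vocab.items()}  # id -> token

    def encode(self, text):
        # Map each whitespace-separated word to its id, or [UNK] if unknown.
        return [self.vocab.get(tok, self.unk_id) for tok in text.lower().split()]

    def decode(self, ids):
        # Map ids back to tokens and rejoin them.
        return " ".join(self.inverse[i] for i in ids)

vocab = {"[UNK]": 0, "transformers": 1, "are": 2, "powerful": 3}
tok = ToyTokenizer(vocab)
ids = tok.encode("Transformers are powerful")
print(ids)              # [1, 2, 3]
print(tok.decode(ids))  # transformers are powerful
```

The round trip (encode then decode) is exactly what lets a model consume numbers while you keep working with text.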
- Chapter 3
Things start getting a little advanced now. This chapter focuses on fine-tuning pretrained models on custom datasets. You’ll learn how to use the high-level Trainer API to fine-tune a model and then use the 🤗 Accelerate library to train your PyTorch models on multiple GPUs or TPUs.
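Under the hood, what the Trainer API (and Accelerate, across devices) automates is an ordinary training loop: forward pass, loss, gradients, parameter update. Here is a conceptual sketch in plain Python — a toy linear model fitted by gradient descent on invented data; nothing Hugging Face-specific:

```python
# Toy training loop illustrating the steps Trainer automates:
# forward pass -> loss gradient -> parameter update, repeated per epoch.
# The model (y = w*x + b) and the data are stand-ins for illustration.

data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples from y = 2x + 1
w, b = 0.0, 0.0   # pretrained weights would start here instead of zero
lr = 0.1          # learning rate

for epoch in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        pred = w * x + b        # forward pass
        err = pred - y          # gradient of 0.5 * (pred - y)**2 w.r.t. pred
        grad_w += err * x       # accumulate parameter gradients
        grad_b += err
    w -= lr * grad_w / len(data)  # optimizer step
    b -= lr * grad_b / len(data)

print(round(w, 2), round(b, 2))  # converges toward 2.0 and 1.0
```

In the course itself, the same pattern appears either as a single `trainer.train()` call or as an explicit PyTorch loop that Accelerate distributes across devices.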
- Chapter 4
The final chapter focuses on the community aspect of the Hugging Face Ecosystem. It will teach you how to navigate the Model Hub so that you can not only use the models trained by the community but also contribute your own.
The second module dives into Hugging Face Datasets and Tokenizers. Once you have a decent understanding of the first two modules, you will be in a position to apply what you’ve learned to tackle the most common NLP tasks.
If you want to learn how to write custom objects for specific use cases or understand specialized architectures, this module will not disappoint you. By the end of it, you should understand the Hugging Face ecosystem well enough to solve complex NLP problems in a meaningful way.
The article will be updated once the chapters of Modules 2 and 3 are released.
Which frameworks are used in the course ❓
The course is available in both PyTorch and TensorFlow, so if you are comfortable with either of the two frameworks, you are good to go. Furthermore, the course can easily be followed along in a Google Colab notebook.
Bonus Quizzes 🏅
Supplementary quizzes are provided at the end of each chapter to test your understanding. However, a better way to check whether you have genuinely grasped the material is to apply what you’ve learned in a project of your own or by collaborating with the community.
Final Remarks 🌟
So, getting back to our question: does this MOOC tick the right boxes? Definitely. The course content is meaningful, interesting, and fills a long-standing need. Each module has been carefully curated to progress systematically from easy to advanced, which prevents beginners from losing interest right at the start. The course also encourages participants to experiment, for instance by using their own examples or data, which is a great idea. MOOCs suffer from a lack of interactivity, and including quizzes and ‘try it yourself’ exercises helps overcome this.
👉 Interested in reading other articles authored by me? This repo contains all the articles written by me, organized by category.