Speed up Inference of your scikit-learn models
Deep learning frameworks use tensors as their basic computational unit. As a result, they can exploit hardware accelerators such as GPUs, speeding up both model training and inference. Traditional machine learning libraries like scikit-learn, however, were developed to run on CPUs and have no notion of tensors. They cannot take advantage of GPUs and therefore miss out on the accelerations that deep learning libraries enjoy.
In this article, we’ll learn about a library called Hummingbird, created to bridge this gap. Hummingbird speeds up inference in traditional machine learning models by converting them into tensor-based models. This lets us run models like scikit-learn’s decision trees and random forests on GPUs and take advantage of the hardware’s capabilities.
What is Hummingbird?
As mentioned above, Hummingbird is a library for accelerating inference in traditional machine learning models. Hummingbird achieves this by compiling these traditional machine learning pipelines into tensor computations. This means you can take advantage of hardware acceleration like GPUs and TPUs, even for traditional machine learning models, without re-engineering the models.
This is beneficial in several aspects. With the help of Hummingbird, users can benefit from:
- the optimizations implemented in neural network frameworks;
- native hardware acceleration;
- having a single platform to support both traditional and neural network models.
Apart from the advantages above, Hummingbird also offers many convenient features, some of which are listed below.
1️⃣. Convenient uniform “inference” API
Hummingbird provides a convenient uniform “inference” API that closely mimics the sklearn API. This allows swapping sklearn models with Hummingbird-generated ones without having to change the inference code.
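To illustrate the idea, here is a minimal sketch: because a Hummingbird-generated model exposes the same `predict()` method as the sklearn estimator it came from, a single inference function can serve either one. The `run_inference` helper is a hypothetical name introduced for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def run_inference(model, X):
    # Works unchanged whether `model` is the original sklearn estimator
    # or the tensor-based model returned by Hummingbird's convert().
    return model.predict(X)

rng = np.random.default_rng(42)
X = rng.random((10, 4))
y = rng.integers(0, 2, 10)

skl_model = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)
preds = run_inference(skl_model, X)
```

Swapping in the converted model later requires no change to `run_inference` itself.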
2️⃣. Support for major models and featurizers.
The current release of Hummingbird supports the following operators:
3️⃣. Conversion capabilities
The main focus of the Hummingbird library is to speed up inference in traditional machine learning models. Many specialized systems have been developed for fast inference, such as ONNX Runtime, TensorRT, and TVM; however, most of them focus on deep learning. The issue with traditional models is that they are expressed as imperative code in an ad hoc way. Let’s understand this via a visual representation.
Traditional models are expressed as imperative code in an ad hoc way
Let’s think of a data frame containing four columns, two categorical and two numerical. These are fed into a machine learning model, say logistic regression, to identify whether each row belongs to class 0 or class 1: a classic binary classification problem. If we look under the hood, we have a directed acyclic graph (DAG) of operators called a pipeline. The pipeline consists of featurizers that preprocess the data and then feed it into a predictor, which outputs the prediction.
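The pipeline just described can be sketched in scikit-learn. The data below is synthetic and the column choices are illustrative: two integer-coded categorical columns are one-hot encoded, two numerical columns are scaled, and the result flows into a logistic regression predictor.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 3, 100),   # categorical column (integer-coded)
    rng.integers(0, 2, 100),   # categorical column
    rng.normal(size=100),      # numerical column
    rng.normal(size=100),      # numerical column
])
y = (X[:, 2] + X[:, 3] > 0).astype(int)  # synthetic binary labels

# Featurizers (the DAG's preprocessing nodes) feeding a predictor.
pipeline = Pipeline([
    ("featurize", ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), [0, 1]),
        ("scale", StandardScaler(), [2, 3]),
    ])),
    ("predict", LogisticRegression()),
])
pipeline.fit(X, y)
preds = pipeline.predict(X)
```

Each named step is one operator node in the pipeline DAG; sklearn alone already counts hundreds of such featurizers and predictors.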
This is just a simple representation of what a traditional model might look like. Across all of the traditional ML frameworks, there are hundreds and hundreds of these featurizers and predictors. As a result, it becomes difficult to represent them in a way that makes sense across all different frameworks.
Deep learning models are expressed as a DAG of tensor operations
Deep learning, on the other hand, relies primarily on the tensor abstraction; a tensor is just a multi-dimensional array. Deep learning models are also expressed as DAGs, but ones composed explicitly of tensor operators. These are generic matrix operations that can easily be represented across a wide variety of systems.
Deep Learning Prediction Serving systems can capitalize on these tensor operations and exploit this abstraction to work across many different target environments.
Hummingbird converts traditional pipelines into tensor operations by reconfiguring algorithmic operators. The following example from their official blog explains one of Hummingbird’s strategies for translating a decision tree into tensors via GEMM (GEneral Matrix Multiplication).
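As a hedged sketch of this GEMM strategy, the NumPy code below hand-builds the matrices for a hypothetical tiny tree (Hummingbird derives them automatically from the fitted model). Matrix A selects which feature each internal node tests, B holds the thresholds, C encodes each leaf’s path as +1 (left edge) / -1 (right edge), D counts the left edges on each path, and E maps leaves to classes, so the whole tree evaluation becomes a chain of matrix operations.

```python
import numpy as np

# Toy tree with two internal nodes and three leaves:
#   node 0: x0 < 0.5 ? go left (leaf L0, class 0) : go to node 1
#   node 1: x1 < 0.3 ? go left (leaf L1, class 1) : go right (leaf L2, class 0)
A = np.array([[1, 0],            # feature selection (features x nodes)
              [0, 1]])
B = np.array([0.5, 0.3])         # per-node thresholds
C = np.array([[ 1, -1, -1],      # path encoding (nodes x leaves):
              [ 0,  1, -1]])     # 1 = left edge, -1 = right edge, 0 = off path
D = np.array([1, 1, 0])          # number of left edges on the path to each leaf
E = np.array([[1, 0],            # leaf-to-class map (leaves x classes)
              [0, 1],
              [1, 0]])

def predict(X):
    T = (X @ A < B).astype(int)        # evaluate every node condition at once
    leaves = (T @ C == D).astype(int)  # each row selects exactly one leaf
    return np.argmax(leaves @ E, axis=1)

X = np.array([[0.2, 0.9],
              [0.8, 0.1],
              [0.8, 0.9]])
print(predict(X))  # [0 1 0]
```

Because every step is a dense matrix operation, the whole batch of inputs traverses the tree in a few GEMM calls that map directly onto GPU kernels.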
Hummingbird’s syntax is very intuitive and minimal. To run your traditional ML model on DNN frameworks, you only need to `import hummingbird.ml` and add `convert(model, 'dnn_framework')` to your code. Below is an example using a scikit-learn random forest model and PyTorch as the target framework.
Hummingbird is a promising library that tackles a core problem in the machine learning space. Giving users the ability to transition seamlessly from CPU to GPU and exploit hardware accelerators to speed up inference helps direct focus toward the problem rather than the code. If you want to go further, make sure to look at the resources below. This article builds upon those references, and you’ll also find them helpful if you decide to dig deeper into the library’s underlying mechanics.