Beware of the Dummy variable trap in pandas

Important caveats to keep in mind when encoding data with pandas.get_dummies(). Handling categorical variables is an essential part of a machine learning pipeline. While machine learning algorithms can naturally handle numerical variables, the same is not true of their categorical counterparts. Although there are algorithms like LightGBM and CatBoost that can inherently handle categorical variables, it is …

A hands-on guide to ‘sorting’ dataframes in Pandas

My tryst with the pandas library continues. Of late, I have been looking deeper into this library and consolidating some of its features in byte-sized articles. I have written articles on reducing memory usage while working with pandas, converting XML files into a pandas dataframe easily, getting started with time series in pandas, and many more. In this article, …