Important caveats to keep in mind when encoding data with pandas.get_dummies()
Handling categorical variables is an essential part of a machine learning pipeline. While machine learning algorithms handle numerical variables natively, the same is not true of their categorical counterparts. Although algorithms such as LightGBM and CatBoost can handle categorical variables on their own, it is … Continue reading Beware of the Dummy variable trap in pandas
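To make the trap named in this title concrete, here is a minimal sketch on a made-up DataFrame (the column name `color` and its values are hypothetical, not taken from the article). When every category gets its own dummy column, the columns always sum to 1 in each row, so any one of them is fully determined by the rest. Passing `drop_first=True` to `pandas.get_dummies()` drops one level and removes that redundancy.

```python
import pandas as pd

# Hypothetical toy data: one categorical column with three levels.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One dummy column per category: the columns are perfectly collinear.
full = pd.get_dummies(df, columns=["color"])

# Drop the first level (alphabetically, "blue") to break the collinearity.
reduced = pd.get_dummies(df, columns=["color"], drop_first=True)

# Every row of the full encoding sums to exactly 1.
row_sums = full.sum(axis=1)

print(list(full.columns))
print(list(reduced.columns))
```

Whether dropping a level matters depends on the model: it is mainly a concern for linear models with an intercept, and usually less so for tree-based methods.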
A tutorial on creating Plotly and Bokeh plots directly with pandas plotting syntax
Data exploration is one of the most important aspects of any data analysis task. The initial probing and preliminary checks that we perform, using the vast catalog of visualization tools, give us actionable insights into the nature of the data. However, the … Continue reading Get Interactive plots directly with pandas.
A deep dive into some of the parameters of the read_csv function in pandas
Pandas is one of the most widely used libraries in the data science ecosystem. This versatile library gives us tools to read, explore, and manipulate data in Python. The primary tool for importing data in pandas is read_csv(). This function accepts the file path of a … Continue reading There is more to ‘pandas.read_csv()’ than meets the eye
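As a small illustration of the kind of parameters this post refers to, the sketch below reads an in-memory CSV and uses `usecols`, `dtype`, and `nrows`. The data and column names are invented for the example, and an `io.StringIO` buffer stands in for a file path so the snippet is self-contained. This selection of parameters is an assumption and is not necessarily the set the article covers.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real call would usually pass a file path.
csv_text = "id,name,score\n1,Ann,9.5\n2,Bob,7.0\n3,Cy,8.25\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "score"],  # load only the columns that are needed
    dtype={"id": "int32"},    # set a column's dtype while parsing
    nrows=2,                  # read only the first two data rows
)

print(df)
print(df.dtypes)
```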
My tryst with the pandas library continues. Of late, I have been trying to look deeper into this library and consolidate some of its features in byte-sized articles. I have written articles on reducing memory usage while working with pandas, converting XML files into a pandas dataframe easily, getting started with time series in pandas, and many more. In this article, … Continue reading A hands-on guide to ‘sorting’ dataframes in Pandas
Optimizing pandas memory usage through the effective use of datatypes
Managing large datasets with pandas is a common problem. As a result, many libraries and tools have been developed to ease that pain. Take, for instance, the pydatatable library, covered in “Using Python’s datatable library seamlessly on Kaggle”. Despite this, there are … Continue reading Reducing memory usage in pandas with smaller datatypes
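The idea in this title can be sketched as follows, assuming a made-up DataFrame (the column names `age` and `city` are hypothetical). Integer values that fit in a narrow range are downcast to `int8`, and a repetitive string column is converted to the `category` dtype. `DataFrame.memory_usage(deep=True)` reports the footprint before and after.

```python
import numpy as np
import pandas as pd

# Hypothetical data: small integers and a string column with few distinct values.
df = pd.DataFrame({
    "age": np.arange(1000) % 90,       # values 0..89 fit comfortably in int8
    "city": ["Paris", "Tokyo"] * 500,  # only two distinct strings
})

before = df.memory_usage(deep=True).sum()

small = df.assign(
    age=df["age"].astype("int8"),         # narrower integer type
    city=df["city"].astype("category"),   # store codes plus a lookup table
)

after = small.memory_usage(deep=True).sum()

print(f"before: {before} bytes, after: {after} bytes")
```

Downcasting is only safe when every value fits in the target type, so it is worth checking a column's minimum and maximum before converting.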