The curious case of Simpson’s Paradox

Statistical tests and analysis can be confounded by a simple misunderstanding of the data. “Statistics rarely offers a single ‘right’ way of doing anything” (Charles Wheelan, Naked Statistics). In 1996, Appleton, French, and Vanderpump conducted an experiment to study the effect of smoking on a sample of people. The study was conducted over twenty years and included 1314 …
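The paradox can be reproduced in a few lines of pandas. The sketch below uses hypothetical counts, invented purely for illustration and not taken from the Appleton, French, and Vanderpump study. Group A has the higher success rate inside each stratum (X and Y) but the lower rate once the strata are pooled, because most of A's trials fall in the harder stratum Y.

```python
import pandas as pd

# Hypothetical counts, invented only to illustrate Simpson's paradox.
data = pd.DataFrame(
    {
        "group": ["A", "A", "B", "B"],
        "stratum": ["X", "Y", "X", "Y"],
        "successes": [90, 150, 400, 20],
        "trials": [100, 500, 500, 100],
    }
)

# Success rate within each stratum: A beats B in both X and Y.
data["rate"] = data["successes"] / data["trials"]
by_stratum = data.pivot(index="stratum", columns="group", values="rate")

# Pooled success rate, ignoring the stratum: now B beats A.
totals = data.groupby("group")[["successes", "trials"]].sum()
pooled = totals["successes"] / totals["trials"]

print(by_stratum)
print(pooled)
```

Here A scores 0.90 vs 0.80 in X and 0.30 vs 0.20 in Y, yet pools to 240/600 = 0.40 against B's 420/600 = 0.70. The stratum acts as a lurking variable that the pooled comparison hides.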

Shapley summary plots: the latest addition to H2O.ai’s Explainability arsenal

Originally published at https://www.h2o.ai on April 21, 2021. AI models cannot be deployed successfully without analyzing the risks they carry. Model overfitting, the perpetuation of historical human bias, and data drift are some of the concerns that must be addressed before models go into production. At H2O.ai, Machine Learning …

There is more to ‘pandas.read_csv()’ than meets the eye

A deep dive into some of the parameters of the read_csv function in pandas. Pandas is one of the most widely used libraries in the Data Science ecosystem. This versatile library gives us tools to read, explore, and manipulate data in Python. The primary tool used for data import in pandas is read_csv(). This function accepts the file path of a …
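As a minimal sketch of a few commonly used read_csv() parameters, the example below reads a small CSV held in memory through io.StringIO, since read_csv accepts file-like objects as well as paths and URLs. The column names, values, and the -999 missing-value sentinel are hypothetical, and the parameters shown are illustrative rather than necessarily the ones the article discusses.

```python
import io

import pandas as pd

# Hypothetical CSV contents; -999 is an assumed sentinel for a missing reading.
csv_text = """date,city,temp_c,notes
2021-01-01,Delhi,7.5,cold
2021-01-02,Delhi,-999,sensor offline
2021-01-03,Mumbai,24.1,humid
"""

df = pd.read_csv(
    io.StringIO(csv_text),               # a file path or URL works the same way
    usecols=["date", "city", "temp_c"],  # skip the "notes" column entirely
    parse_dates=["date"],                # parse this column as datetime64
    dtype={"city": "category"},          # store repeated strings compactly
    na_values=["-999"],                  # treat the sentinel as NaN
)

print(df.dtypes)
print(df)
```

Dropping usecols loads every column, and dropping parse_dates leaves the date column as plain strings.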

H2O AI Hybrid Cloud: Democratizing AI for every person and every organization

Harnessing the true potential of AI by equipping every employee, customer, and citizen with sophisticated AI technology and easy-to-use AI applications. Democratization is an essential step in the development of AI, and AutoML technologies lie at its heart. AutoML tools have played a pivotal role in transforming the way we consume and understand …

A hands-on guide to ‘sorting’ dataframes in Pandas

My tryst with the pandas library continues. Of late, I have been looking deeper into this library and consolidating some of its features into byte-sized articles. I have written articles on reducing memory usage while working with pandas, converting XML files into a pandas dataframe, getting started with time series in pandas, and many more. In this article, …
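A minimal sketch of the two core sorting methods, DataFrame.sort_values() and DataFrame.sort_index(), on a small hypothetical dataframe. The column names and values are made up for illustration, and the article itself may cover further options.

```python
import pandas as pd

# Hypothetical sales data used only for illustration.
df = pd.DataFrame(
    {
        "region": ["South", "North", "North", "South", "East"],
        "sales": [250, 300, None, 120, 300],
    }
)

# Sort by one column in descending order; missing values go last by default.
by_sales = df.sort_values("sales", ascending=False)

# Sort by several columns with a different direction for each,
# and put missing values first instead of last.
by_region_then_sales = df.sort_values(
    by=["region", "sales"], ascending=[True, False], na_position="first"
)

# Recover the original row order by sorting on the index labels.
restored = by_sales.sort_index()

print(by_sales)
print(by_region_then_sales)
```

Both methods return a new dataframe by default; passing inplace=True modifies the original instead.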

Useful pip commands for Data Science

A look at the most used package management system in Python. An in-depth article published in February 2020 by Sebastian Raschka et al. studies the role and importance of Python in the Machine Learning ecosystem. The paper, titled Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence, put …