The best way of learning anything is by doing.
What do you do after you have completed hundreds of MOOCs, consumed thousands of books and notes, and listened to a million people rant about their experience in Data Science? You start applying the concepts. The only way to apply machine learning concepts is by getting your hands dirty: either find real-world problems in your area of interest or participate in hackathons and machine learning competitions.
Competitive Data Science is not all about applying algorithms. An algorithm is essentially a tool, and anybody can use one by writing just a few lines of code. The main takeaway from participating in these competitions is that they provide a great opportunity for learning. Of course, real-life problems are not necessarily the same as the ones posed in competitions; still, these platforms let you apply your knowledge to practical problems and see how you fare against others.
Advantages of participating in Data Science Competitions
You have a lot to gain and practically nothing to lose by participating in these competitions. The benefits are both tangible and intangible:
- Great opportunity for learning.
- Getting exposed to state-of-the-art approaches and datasets.
- Networking with like-minded people. Working in a team is even better, since it helps you think through a problem from different perspectives.
- Showcasing your talent to the world, with a chance of getting recruited.
- It is also fun to participate and see how you fare on the leaderboard.
- The prize is an added bonus but shouldn’t be your sole motivation.
Kaggle is a well-known platform for Data Science competitions. It is an online community of more than 1,000,000 registered users, ranging from novices to experts. However, apart from Kaggle, there are other data mining competition platforms worth knowing and exploring. Here is a brief overview of some of them.
DrivenData
DrivenData hosts data science competitions to build a better world, bringing cutting-edge predictive models to organizations tackling the world’s toughest problems. Its competitions focus on social good in areas like international development, health, education, research and conservation, and public services. You can either join a competition or host one of your own.
The site has a section dedicated to Sample Projects, which presents some of its successful projects as case studies. The datasets on DrivenData come from non-profits and cover topics ranging from wildlife preservation to public health. If you want to apply your skills to real-world problems, this is the platform for you.
CrowdANALYTIX
CrowdANALYTIX is a crowdsourced analytics platform that converts business challenges into competitions. The CrowdANALYTIX community collaborates and competes to build and optimize AI, ML, NLP, and deep learning algorithms. The platform also hosts a community blog with great resources, including interviews and reference materials.
InnoCentive
InnoCentive mainly focuses on problems in the life sciences but hosts other interesting challenges too. Here, Solvers contribute to tackling some of the world’s most pressing problems, from facilitating household-level access to clean water to designing passive solar devices that attract and kill malaria-carrying mosquitoes. Challenges are real problems requiring sustained concentration, critical thinking, research, creativity, and synthesis of knowledge. Developing a solution is incredibly rewarding and an unparalleled mental workout.
TunedIT
TunedIT started as a scientific doctoral project carried out at the University of Warsaw. Its goal was to help data mining scientists conduct repeatable experiments and easily evaluate data-driven algorithms. The research part was later supplemented with the TunedIT Challenges platform for hosting data competitions for educational, scientific, and business purposes.
CodaLab
CodaLab is an open-source, web-based platform that enables researchers, developers, and data scientists to collaborate, with the goal of advancing research fields where machine learning and advanced computation are used. CodaLab helps solve many common problems in data-oriented research through its online community, where people can share worksheets and participate in competitions. You can either participate in an existing competition or host a new one.
Zindi
Zindi is the first data science competition platform in Africa. Zindi hosts an entire data science ecosystem of scientists, engineers, academics, companies, NGOs, governments, and institutions focused on solving Africa’s most pressing problems.
Analytics Vidhya
Analytics Vidhya provides a community-based knowledge portal for analytics and data science professionals. In addition to great resources for learning data science, it hosts hackathons in which real-life industry problems are released as contests. You can either participate in the challenges or sponsor a hackathon. Most companies that organize hackathons on Analytics Vidhya also offer job opportunities to the top scorers.
CrowdAI
The data science challenge platform crowdAI hosts multiple open data science challenges each year. The challenges cover image classification, text recognition, reinforcement learning, adversarial attacks, image segmentation, resource allocation optimization, and many other problems across multiple domains. The platform was awarded over $100,000 from Amazon and Nvidia for its 2017 challenge, “Learning to Run”.
Update, 17 March 2020:
CrowdAI has since shut down (see https://www.crowdai.org/blogs/7). One of its founders has created another site with a similar name that still hosts competitions: https://www.aicrowd.com/. Thanks to Harald Carlens for providing this information.
Numerai
Numerai is an AI-run, crowdsourced hedge fund built by a network of data scientists. It holds a weekly data science competition that powers a real hedge fund. Each week, Numerai provides encrypted data to participants, who then submit their predictions. Numerai combines all the submissions into a meta-model and uses it to make investments.
Data scientists submit their predictions in exchange for the chance to earn Numeraire (NMR), a cryptographic token on the Ethereum blockchain.
Tianchi
Tianchi is a data competition platform by Alibaba Cloud and resembles Kaggle in many ways. It is a community where hundreds of thousands of data scientists cooperate with each other and connect with businesses and governments globally to solve the hardest business problems across industries.
DataScienceChallenge
These Data Science Challenges are sponsored by the Defence Science and Technology Laboratory (Dstl) and a number of other UK government departments, including the Government Office for Science, SIS, and MI5. The challenges are designed to encourage the brightest minds in data science to help solve real-world problems. At the time of writing, both challenges offered by the platform have closed, but new ones are expected that will challenge you to find unorthodox answers to real-world problems.
Apart from these platforms, some competitions are held only annually.
KDD Cup
KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), the leading professional organization of data miners. KDD 2019 was held in Anchorage, Alaska, USA, from August 4 to 8, 2019.
ViZDoom AI Competition (VDAIC)
ViZDoom is a Doom-based AI research platform for reinforcement learning from raw visual information. Participants in the Visual Doom AI Competition submit a controller (written in C++, Python, or Java) that plays Doom.
Machine Learning Contests
Machine Learning Contests is a data science competition aggregator. It lists ongoing machine learning and data science competitions across Kaggle, DrivenData, AIcrowd, and others. It is all open source and community-maintained.
Conclusion
Although this list will change over time, I hope you will find the competitions most relevant and interesting to you. If you know of other data science competition platforms that I haven’t mentioned, please share them in the comment section below.