In conversation with Fatih Öztürk: A Data Scientist and a Kaggle Competition Grandmaster.
In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster.
In this interview, I share my conversation with Fatih Öztürk, a Kaggle Competitions Grandmaster and a Data Scientist at H2O.ai. Fatih obtained a Bachelor’s degree in industrial engineering with honors from Boğaziçi University, Istanbul. He worked as a Data Scientist at UrbanStat before joining H2O.ai. Fatih joined Kaggle almost four years ago and has won seven gold medals, including a solo one. He also holds Master status in the discussion tier.
In this interview, we shall learn more about his academic background, his passion for Kaggle, and his work as a Data Scientist. Here is an excerpt from my conversation with Fatih:
You have a background in Industrial Engineering. What prompted you to choose Data Science as a career?
Fatih: My primary focus in Industrial Engineering was on Operations Research (OR), Supply Chains, and Statistics. Apart from these main courses, we also had the option to choose electives based on our interests. In my last semester, I took “Data Mining” as one of my electives, partly because of its popularity. While studying data mining, I came across concepts like random forests, classification, and prediction for the first time. I found it pretty interesting and analogous to playing a competitive game. I realized that my passion lay in data analysis, and I instantly knew which field I wanted to pursue after graduation.
How did your tryst with Kaggle begin, and what kept you motivated throughout your GrandMaster’s journey?
Fatih: My first job was as a Junior Data Scientist at a tech startup. I was the only data scientist there, and we worked only with insurance-related companies. A few months after I joined the company, my boss found out about the Porto Seguro competition on Kaggle and asked me if I could look at it, since it was an insurance use case. I was pleased with what I found in that competition because I saw that people were sharing a lot. So during that competition, I realized two main things:
- My learning rate was much higher when I was around kernels and discussions.
- My competitive side was triggered, and I learned that I liked competing a lot.
Competing and learning on Kaggle go hand in hand. That combination is my primary motivation for participating in any competition. Being a Master or a GrandMaster is just a natural result of this process.
Can you tell us a little about your favorite Kaggle competition?
Fatih: I liked the Home Credit Default Risk competition. The datasets were not fully anonymized, and hence there was a lot of room for feature engineering. Trying to understand the domain of the competition and then being able to generate useful features was fun. Moreover, our team had a good validation strategy that turned out to be very successful for the private leaderboard in the end. We went from 29th place on the public leaderboard to 10th on the private one.
How do you typically approach a Kaggle problem?
Fatih: For any competition, my first step is always to set up a reliable validation scheme. Having a well-correlated CV-LB relationship is everything. So how do you achieve this? It mostly depends on doing the right exploratory data analysis (EDA). Figuring out how the test set differs from the train set (if it does) and then mimicking this in your validation scheme is a good starting point. Besides doing EDA with plots and numbers, I also check adversarial validation scores in this regard.
After having a good validation strategy, I focus on finding useful things that are not shared on the public forum because having different tricks is crucial to land a good rank at the end.
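For readers unfamiliar with the adversarial validation check Fatih mentions, here is a minimal sketch of the idea, not his actual workflow. Train rows are labelled 0 and test rows 1, a classifier is fitted to tell them apart, and its AUC is measured on held-out rows. An AUC near 0.5 suggests the two sets look alike, while an AUC near 1.0 suggests the test set differs from the train set. To stay dependency-free, the sketch assumes a small hand-rolled logistic regression and a single holdout split. In practice a gradient-boosted model with cross-validation is the more common choice. The toy data and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, lr=0.1, epochs=200):
    """Fit a logistic regression with full-batch gradient descent."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def adversarial_auc(train_rows, test_rows, seed=0):
    """Label train=0 and test=1, fit on one half, report AUC on the other."""
    rows = train_rows + test_rows
    labels = [0] * len(train_rows) + [1] * len(test_rows)
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    fit_idx, eval_idx = idx[:half], idx[half:]
    weights, bias = fit_logistic([rows[i] for i in fit_idx],
                                 [labels[i] for i in fit_idx])
    scores = [sigmoid(bias + sum(w * v for w, v in zip(weights, rows[i])))
              for i in eval_idx]
    return auc([labels[i] for i in eval_idx], scores)


def make_rows(n, shift, rng):
    """Hypothetical toy data: two Gaussian features, the first shifted."""
    return [[rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    train = make_rows(400, 0.0, rng)
    print("same distribution:", round(adversarial_auc(train, make_rows(400, 0.0, rng)), 3))
    print("shifted test set: ", round(adversarial_auc(train, make_rows(400, 2.0, rng)), 3))
```

When the score is high, the features the classifier relies on most are a natural place to look for how the test set differs, which can then be mimicked in the validation split.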
Could you give us a sneak peek into your toolkit, such as your favorite programming language, IDE, algorithms, etc.?
Fatih: I use Python and, most of the time, work with JupyterLab. I also have a Google Colab Pro account to get access to GPUs since I don’t have a local one. I find it a good investment since we have limited GPU hours per week on Kaggle notebooks.
My favorite modeling algorithm is LightGBM. I still think that it is a very efficient and production-friendly algorithm, given how easy it is to tune and how quickly it reaches sufficiently good scores.
You regularly speak at meetup events. How is the data science landscape in and around Turkey?
Fatih: I find people’s interest in data science quite noteworthy in Turkey, and it’s increasing every day. More and more students are choosing Computer Science as their major over other engineering majors. The main reason for this popularity is the overall adoption of data science in every industry.
The number of Turkish people that I encounter in Kaggle competitions is also growing quite fast. This is heartwarming since this was not the case a few years ago. A similar situation is reflected in the meetup community, where both the number of events and the number of students involved have risen rapidly. Recently, a lot of Turkish companies have started hosting in-class competitions on Kaggle.
As a Data Scientist at H2O.ai, what are your roles, and in which specific areas do you work?
Fatih: I’m involved in POCs and other customer-related projects to help customers benefit more from Driverless AI. Besides that, I develop new apps with the Wave framework and test Driverless AI with new datasets.
The Data Science domain is rapidly evolving. How do you manage to keep up with all the latest developments?
Fatih: I think social networks are the key to this. It’s almost impossible to remain up to date just by yourself. However, if you are in the right Slack channels and have a meaningful LinkedIn feed, it’s easier to follow the news. Apart from this, joining Kaggle competitions and regularly following the threads in competition forums is another useful resource.
How do you plan to spend your time on Kaggle in 2021? Any special milestones you want to achieve?
Fatih: I want to join Computer Vision competitions in 2021. I’d be delighted to be placed in the top 50 as a solo competitor in one of these competitions. A gold medal as a team would also be fantastic, of course. 😃
A word of advice for the Data Science aspirants who have just started or wish to start their Data Science journey?
Fatih: I would suggest not worrying too much about questions like where to start, which courses to take, or which tools to learn. Instead of dealing with all these questions initially, it is advisable to jump directly into a data science project or a competition and learn from others’ code. This is how I improved myself: by getting my hands dirty early on. Analyzing other people’s code and asking questions helped me hone my skills. What does this code snippet do here? Why did the author code it like this? How does it help in this project or competition? The next task is to answer these questions. One could either search for the answers on the internet or make use of the discussion forums.
Fatih’s Kaggle achievements reflect his passion for problem-solving and his capacity for hard work. How he transitioned from industrial engineering into data science and then went on to achieve the title of Kaggle GrandMaster in a span of two years is commendable.
Read other interviews in this series:
- Rohan Rao: A Data Scientist’s journey from Sudoku to Kaggle
- Shivam Bansal: The Data Scientist who rules the ‘Data Science for Good’ competitions on Kaggle.
- Meet Yauhen: The first and the only Kaggle Grandmaster from Belarus.
- Sudalai Rajkumar: How a passion for numbers turned this Mechanical Engineer into a Kaggle Grandmaster
- Gabor Fodor: The inspiring journey of the ‘Beluga’ of Kaggle World 🐋
- Meet the Data Scientist who cannot stop winning on Kaggle
Originally published here