An analysis of the 2019 Kaggle ML and DS Survey for Women’s Representation in Machine Learning and Data Science
“Where women are full participants in a country’s politics or economy, societies are more likely to succeed.” (Barack Obama)
I recently participated in the 2019 Kaggle ML & DS Survey Challenge. The survey, now in its third year, aims to offer a comprehensive view of the state of data science and machine learning. The challenge was open to all, and notebooks that told a unique and creative story about the data science community were awarded prizes. The challenge description said: “The challenge is to deeply explore (through data) the impact, priorities, or concerns of a specific group of data science and machine learning practitioners.”
This was an excellent opportunity for me to explore the dataset with respect to women’s participation in the survey worldwide. The objective of the notebook was to analyze the survey data to answer an important question: is women’s participation in STEM really improving, or is it just hype? I used the data to investigate whether things are improving or whether much still remains to be done.
I also won the best notebook award among all notebooks published before November 19th, 2019.

This article is a summary of the results and insights that I obtained while analyzing the dataset. You can view the original Kaggle notebook here.
Methodology
I analyzed female participation in the survey under six categories: Gender, Country, Age, Education, Professional Experience, and Salary.
Analyzing women’s representation under these categories gave me a fair idea of the nature of women’s representation in the ML and DS realm.
I began by analyzing the 2019 survey’s dataset to get the big picture.
Countries with the majority of respondents in 2019

- Most respondents (both male and female) were from India, followed closely by the U.S. In fact, these two countries together make up more than 50% of all respondents.
Note that countries or territories with fewer than 50 respondents have been grouped into a category named “Other” for anonymity.
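The grouping in the note above was applied to the released data, so the published file already carries the “Other” label. Purely to illustrate the rule, here is a minimal Python sketch that replaces any country with fewer than 50 responses by “Other”. The function name and the toy responses are hypothetical, not taken from the survey.

```python
from collections import Counter


def group_small_countries(countries, min_count=50, other_label="Other"):
    """Replace every country with fewer than `min_count` responses by `other_label`."""
    counts = Counter(countries)
    return [c if counts[c] >= min_count else other_label for c in countries]


# Hypothetical toy responses (one entry per respondent), not real survey data.
responses = ["India"] * 60 + ["United States"] * 55 + ["Iceland"] * 3 + ["Malta"] * 2
print(Counter(group_small_countries(responses)))
```

Countries at exactly 50 responses keep their own name, because the note speaks of groups with *fewer than* 50.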
Comparison of the number of respondents over the years

- Whereas the number of responses in 2018 was considerably higher than in 2017, the year 2019 saw a decline.
Let’s now look at the insights from the six key areas: Gender, Country, Age, Education, Professional Experience, and Salary.
1. Gender

The Great Gender Divide

- There was a staggering difference between the numbers of men and women respondents in the survey. Around 82% of the respondents were men, while only 16% were women.
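Percentages like these are each category's count divided by the total number of answers. Below is a minimal Python sketch of that computation. The function name is hypothetical, and the toy answers were picked only to reproduce an 82% / 16% split. They are not survey data.

```python
from collections import Counter


def percentage_breakdown(values, decimals=1):
    """Return each category's share of `values` as a percentage."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: round(100 * v / total, decimals) for k, v in counts.items()}


# Hypothetical toy answers chosen to mirror the reported 82% / 16% split.
answers = ["Male"] * 41 + ["Female"] * 8 + ["Prefer not to say"] * 1
print(percentage_breakdown(answers))
```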
Gender Distribution over the years
The gender distribution in 2019 as compared to the last two years.

- The pattern is almost the same: the participation of women respondents has been consistently low over the years.
2. Country

Let us now look at the countries the female respondents come from. The data shows which countries had the most and the fewest responses.
Countries of Female Respondents


- The majority of the female respondents in 2019 were from India and the U.S.
- Participation from Central Africa was very low, although countries like Nigeria did show their presence on the world map.
- Women from Turkey, Nigeria, and Pakistan also responded to the survey, albeit in very small percentages.
The Indian and the U.S Female Respondents over the years
Since India and the U.S. had the highest percentages of respondents, I also checked whether the same trend held in the previous two years.

- The number of female respondents in the U.S. was considerably higher than in India in 2017 and 2018. In 2019, however, the number of Indian female respondents grew, and their percentage surpassed that of U.S. women.
Daunting obstacles remain in Africa


- The number of African women who responded to the survey increased in 2019 compared with previous years. Around 150 women responded in 2019, whereas in each of the previous years the number was below 100.
- It was interesting to look into the reasons for the increased participation from the African continent in 2019. First, a few Algerian women took the survey for the first time in 2019. Second, there was a sharp spike in Nigerian female respondents in 2019 compared with the previous two years. Both factors contributed to the higher participation of African women in the 2019 survey.
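This breakdown amounts to counting female respondents per year and per African country. Below is a minimal Python sketch, assuming each response has already been reduced to a `(year, gender, country)` tuple. The country set, function name, and records are hypothetical. A real analysis would need the full list of African countries that appear in each year's data.

```python
from collections import Counter

# Hypothetical subset of African countries, for illustration only.
AFRICAN_COUNTRIES = {"Nigeria", "Algeria", "Kenya", "Egypt", "South Africa", "Morocco", "Tunisia"}


def african_female_counts(records):
    """Count female respondents from African countries.

    `records` is an iterable of (year, gender, country) tuples. Returns two
    Counters: one keyed by year, one keyed by (year, country).
    """
    by_year, by_year_country = Counter(), Counter()
    for year, gender, country in records:
        if gender == "Female" and country in AFRICAN_COUNTRIES:
            by_year[year] += 1
            by_year_country[(year, country)] += 1
    return by_year, by_year_country


# Hypothetical toy records, not real survey counts.
records = [
    (2018, "Female", "Nigeria"),
    (2018, "Male", "Nigeria"),
    (2019, "Female", "Nigeria"),
    (2019, "Female", "Nigeria"),
    (2019, "Female", "Algeria"),
    (2019, "Female", "India"),
]
by_year, by_year_country = african_female_counts(records)
print(by_year)
print(by_year_country)
```

The per-country counter is what reveals which countries drive a year-over-year change. A country that first appears in a given year, like Algeria in this toy data, simply has no entries for earlier years.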
3. Age Distribution

Age is an important attribute for any demographic analysis, and some interesting results were obtained on analyzing the age variable.
The Young Brigade dominates in 2019
The age distribution of the female respondents in 2019.

- Most female respondents were in the 25–29 age group, followed closely by the 22–24 age group. Thus, most of the women were in the 20–30 age bracket.
- The 20–30 age group can include both students (undergraduate and postgraduate) and professionals.
- Interestingly, women over 60, and even over 70, also responded to the survey. As the saying goes, age is just a number.
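The survey reports age in bands rather than exact years, so a statement like "most women were in the 20–30 bracket" comes from adding up neighboring bands. The Python sketch below assumes band labels such as "22-24" and "25-29". Both the labels and the choice of which bands count as the twenties are assumptions, and whether "18-21" should partly count is left out. The function name and toy answers are hypothetical.

```python
from collections import Counter

# Assumed band labels treated here as "the twenties".
TWENTIES_BANDS = ("22-24", "25-29")


def share_in_bands(ages, bands):
    """Percentage of responses whose age band is one of `bands`."""
    counts = Counter(ages)
    total = sum(counts.values())
    return 100 * sum(counts[b] for b in bands) / total


# Hypothetical toy answers, not real survey data.
ages = ["18-21"] * 3 + ["22-24"] * 4 + ["25-29"] * 5 + ["30-34"] * 2 + ["60-69"] * 1
print(share_in_bands(ages, TWENTIES_BANDS))
```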
Age distribution pattern over the years

- No notable change in the pattern was observed. Overall, the 20–30 group dominated the survey.
Age distribution country-wise

- The majority of female respondents in India were in their 20s, a higher share than in any other country. Thus, the female respondents from India were predominantly young.
- There was also a considerable number of women respondents in India aged between 18 and 21. This age group generally comprises students, and it was heartening to see them participate in the survey.
- Among U.S. women, the percentage of student respondents was comparatively lower. For other countries, the distribution across age groups was almost the same.
4. Education

It has been rightly said that educated females form the backbone of society. Here is the analysis of the qualification status of the female respondents.
Educational qualifications of the female respondents in 2019

- The education level of the female respondents was impressive: the largest share (~46%) held a Master’s degree, followed by a Bachelor’s degree (~27%). Around 16% of the respondents held a PhD.
- The analysis also revealed that some respondents had no formal education past high school. That they took the survey anyway is commendable.
Educational qualifications of the female respondents, country-wise

- The U.S. had the highest number of women with Master’s and doctoral degrees, followed closely by India. However, it should be kept in mind that many women from India and other countries move to the U.S. for their Master’s and PhDs.
- India topped the list for Bachelor’s degrees. This was expected, since the majority of Indian women respondents were students in their 20s.
- Master’s degrees predominated over other degrees in all countries except Japan, which had a higher incidence of professional degrees.
5. Professional Experience

Let’s look at the various professional roles that females occupy in the industry.
Female respondents’ roles over the years

- Data Scientist has been the most common role among female respondents since 2017, followed by Data Analyst. Other roles such as developers, researchers, and project managers also appear in the data.
Top 20 roles for female respondents in 2019

- Excluding students, Data Scientists (~19.5%) formed the largest group of respondents, followed by Data Analysts (~11%).
- Interestingly, some women who were not employed also responded to the survey. They may be out of work by choice or looking for jobs. We could reach out to these women to understand whether they want to work and help them find opportunities.
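Excluding students changes the denominator of every role's percentage. The Python sketch below drops the excluded roles first and then divides by the remaining respondents. The article does not say which denominator its ~19.5% figure uses, so treat this as one possible convention. The function name and toy answers are hypothetical.

```python
from collections import Counter


def role_shares(roles, exclude=("Student",), decimals=1):
    """Percentage of each role after dropping the roles listed in `exclude`."""
    kept = [r for r in roles if r not in exclude]
    if not kept:
        return {}
    counts = Counter(kept)
    return {r: round(100 * c / len(kept), decimals) for r, c in counts.most_common()}


# Hypothetical toy answers, not real survey data.
roles = (["Student"] * 6 + ["Data Scientist"] * 4 + ["Data Analyst"] * 3
         + ["Software Engineer"] * 2 + ["Not employed"] * 1)
print(role_shares(roles))
```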
Female respondents’ Current Roles country-wise
I combined some of the roles to create broader groups. For instance, Data Engineer and DBA/Database Engineer were grouped together, as were Data Analyst and Business Analyst.
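The two merges described in this paragraph can be expressed as a lookup table, where any role not listed keeps its own name. Below is a minimal Python sketch that applies the table and counts roles per country. The function name and the toy `(country, role)` pairs are hypothetical. Only the two merges come from the text.

```python
from collections import Counter

# Role merges described in the text; any role not listed keeps its own name.
ROLE_GROUPS = {
    "DBA/Database Engineer": "Data Engineer",
    "Business Analyst": "Data Analyst",
}


def group_role(role):
    """Map a survey role to its broader group."""
    return ROLE_GROUPS.get(role, role)


# Hypothetical toy (country, role) pairs, not real survey data.
answers = [
    ("India", "Data Engineer"),
    ("India", "DBA/Database Engineer"),
    ("United States", "Business Analyst"),
    ("United States", "Data Analyst"),
    ("United States", "Data Scientist"),
]
counts = Counter((country, group_role(role)) for country, role in answers)
print(counts)
```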

- Again leaving out the students, the U.S. had the most Data Scientists and Data Analysts among respondents, followed by India.
- India had the most Software Engineers participating in the survey. Interestingly, India also had the highest percentage of unemployed female respondents (just under 2%).
Female Data Scientists distribution over the years

- Even though the percentage of female respondents decreased in 2019, the percentage of Data Scientists among them was higher than in 2018.
6. Salary

Although some research says otherwise, salary is a significant motivational factor in retaining and acquiring new talent. Let’s see how well women are paid in the Data Science space. I analyzed the general trend of female respondents’ salaries in 2019 and then compared it with the 2018 figures.


Salary Range of Female respondents in 2019
- The majority of female respondents did not disclose their annual salary. Of the remaining, about 10% had an annual salary of less than 1,000 USD. This makes sense, since a significant share of respondents were students who may not currently have permanent jobs.
- A tiny percentage of women also earned more than 200k, and even 300k, USD a year.
Comparison of Salaries of Female respondents in 2018 and 2019.
To see whether this salary pattern was exclusive to 2019 or a recurring phenomenon, I compared it with the 2018 salary ranges. I did not include the 2017 salary data, since it was reported in more than ten different currencies.
- The general pattern of the salary distribution appeared to be the same in 2018 and 2019. Annual compensation in 2019 was marginally better than in 2018, which is encouraging.
- Another notable point is that, unlike in 2018, some women in 2019 reported earning more than 500k USD.
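Because the number of respondents changes from year to year, comparing raw counts across years would be misleading. A fairer comparison first turns each year's answers into percentages and then takes the percentage-point difference per bracket. Below is a minimal Python sketch. It assumes the non-disclosed answers are dropped and that both years' answers have already been mapped to one shared bracket scheme. The bracket labels, the "Prefer not to disclose" label, the function names, and the toy answers are all hypothetical, and the mapping step is not shown.

```python
from collections import Counter

# Hypothetical markers for answers without a salary.
NOT_DISCLOSED = {None, "Prefer not to disclose"}


def salary_distribution(brackets):
    """Percentage of disclosed answers in each salary bracket (non-answers dropped)."""
    disclosed = [b for b in brackets if b not in NOT_DISCLOSED]
    if not disclosed:
        return {}
    counts = Counter(disclosed)
    return {b: 100 * c / len(disclosed) for b, c in counts.items()}


def compare_years(dist_a, dist_b):
    """Percentage-point change per bracket from year A to year B."""
    brackets = set(dist_a) | set(dist_b)
    return {b: dist_b.get(b, 0.0) - dist_a.get(b, 0.0) for b in sorted(brackets)}


# Hypothetical toy answers already mapped to one shared bracket scheme.
answers_2018 = ["<10k"] * 6 + ["10k-50k"] * 3 + ["50k+"] * 1 + [None] * 5
answers_2019 = ["<10k"] * 5 + ["10k-50k"] * 3 + ["50k+"] * 2 + [None] * 7
print(compare_years(salary_distribution(answers_2018), salary_distribution(answers_2019)))
```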
Comparison of Male and Female salaries in 2019.

- The general salary trend is the same for both genders, with the largest group earning less than 1,000 USD. However, the percentage earning less than 1,000 USD is higher for women than for men. Also, unlike men, no women respondents reported earning more than 500k USD.
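Comparing "the share earning less than 1,000 USD" between genders requires reading the numeric bounds out of the bracket labels. The Python sketch below assumes labels shaped like "$0-999", "1,000-1,999", and "> $500,000". That label format, the function names, and the toy records are assumptions for illustration. A record counts as "below" the threshold only if its whole bracket lies below it.

```python
import math
from collections import defaultdict


def parse_bracket(label):
    """Turn a label such as "$0-999", "1,000-1,999" or "> $500,000" into (low, high)."""
    cleaned = label.replace("$", "").replace(",", "").strip()
    if cleaned.startswith(">"):
        return int(cleaned[1:].strip()), math.inf  # open-ended top bracket
    low, high = cleaned.split("-")
    return int(low), int(high)


def percent_below(records, threshold):
    """Per gender, the percentage of (gender, label) records whose bracket lies entirely below `threshold`."""
    totals, below = defaultdict(int), defaultdict(int)
    for gender, label in records:
        _, high = parse_bracket(label)
        totals[gender] += 1
        if high < threshold:
            below[gender] += 1
    return {g: 100 * below[g] / totals[g] for g in totals}


# Hypothetical toy records, not real survey data.
records = [
    ("Female", "$0-999"), ("Female", "$0-999"),
    ("Female", "1,000-1,999"), ("Female", "50,000-59,999"),
    ("Male", "$0-999"), ("Male", "30,000-39,999"),
    ("Male", "100,000-124,999"), ("Male", "> $500,000"),
]
print(percent_below(records, 1000))
```

Restricting `records` to Data Scientists before calling `percent_below` would give the same comparison for that role alone.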
Comparison of Male & Female Data Scientists’ salaries in 2019

- The percentage of women Data Scientists earning less than 1,000 USD is considerably higher than that of their male counterparts. Female Data Scientists also seem to be paid less in the high salary ranges: very few women Data Scientists earn more than 200k USD.
A look at Salaries of Female Data Scientists worldwide

- Female Data Scientists in the U.S. are paid more than those in other countries.
Key Takeaways and Recommendations
Some of the significant takeaways from this entire exercise can be summed up in six key points:
- The participation of women in this survey is very low and hasn’t improved much over the years. We need to understand the reasons behind such low participation and how we can encourage more women to take part.
- The response from U.S. and Indian women is heartening, although it is still very low compared with that of their male counterparts. Africa shows a glimmer of hope, with countries like Nigeria and Algeria gearing up. Organizations should team up with African governments and NGOs to provide these women with better opportunities for study and research.
- The number of young women in Data Science is rising, with the majority coming from India. This is not surprising, as India has one of the largest youth populations in the world. This demographic dividend should be tapped efficiently by reforming the existing Indian education model and introducing high-quality Data Science courses into the current curriculum.
- Women in the Data Science space are highly qualified, with the largest share holding a Master’s degree.
- Most of the female Data Scientists are located in the U.S. and India. Some women who are not employed have also responded to the survey, and there could be multiple reasons for that: they could be students, or women who wish to restart their careers after a break. The latter could be supported by encouraging them to attend meetups or community events. For the students, mentorship is a good option.
- The salary distribution has remained broadly similar over the years, but Indian female Data Scientists are paid less than their counterparts in the U.S.
Conclusion
So let’s get back to the initial question: are the Geek Girls Rising? The answer is mixed. Some areas have shown improvement, while others still need a lot of work. Overall, things appear promising.
Data science, in itself, is a combination of diverse scientific disciplines. It makes all the more sense to bring in people of different genders, backgrounds, and ethnicities, who can bring more creativity and allow knowledge, discoveries, and innovation to flourish.
It will require a collaborative effort from society to make diversity and inclusion a vital part of the ecosystem. As women, we should make sure that we create an active support group to assist other women in the Data Science space. After all, empowered women empower women.