Leveraging AI in Education

JULY 27, 2018
Team Impact Analytics
Impact Analytics collaborated with one of the largest non-governmental organizations (NGOs) focused on providing high-quality, low-cost, and replicable interventions to bridge gaps in India's educational system. The NGO's main objective is to design programs that raise learning levels in schools and communities, bring education to all children, whether they are in school or unable to use school facilities, and train youth for job opportunities. The NGO aims to break from traditional methods and challenge the current system of learning in schools. To achieve this objective, the NGO adopted testing tools to evaluate the learning levels of children and determine the best course of action for each student.


The NGO wanted to understand which content and subjects students consumed when given free access to learning. This information would help it develop better content and increase student engagement. To democratize education in rural areas, the NGO distributed ‘smart-tablets’ with a built-in learning app for community-level interaction in a few villages. Initial insights from this grassroots-level interaction led the NGO to expand its testing to the higher community level in rural areas.

Data collected from each tablet was sent to a local server that recorded the variables needed for analysis. Variables such as subjects, quizzes, time logged in, and engagement levels were recorded to understand the overall effect of the smart-tablet distribution. Although the client had access to voluminous data, it struggled to identify patterns that could be easily understood and turned into new strategies to improve engagement.

Impact Analytics performed extensive exploratory data analysis and clustering to understand trends in the data. A cluster is a group of elements that behave similarly to one another and differently from the elements of other clusters. The data obtained from the client did not have highly segregated clusters. In the illustration, the 2nd image (middle) shows an intermediate case and the 3rd image (right) shows completely segregated clusters, a case that is generally not achievable in practice.
Illustrative Figure: Overlapping clusters classified using clustering algorithms; the 3rd image (right) shows an ideal clustering case, which is not feasible in most cases

Using multiple algorithms and various clustering methods, Impact Analytics (IA) was able to create meaningful engagement segments.

Clusters were created to understand which villages were consuming content, grouped by engagement (low, medium, and high). Once the variables were analyzed, additional analysis revealed the content that performed well in each of the low-, medium-, and high-engagement clusters. For example, the most engaging content in the app was the ‘Short Science Videos’, which gave the NGO insight into the high engagement with the Science section. The NGO was able to conclude that students preferred short explanatory videos.

Figure 1: 40% of the Group Ids segmented into the High engagement cluster

Data was clustered using K-means clustering at the group level and the content level. Group-level clustering segmented groups based on their engagement with the tablet. Content-level clustering was based on content consumption in the app. Before clustering, the data had to be cleaned and normalized to a common scale so that the analysis would be sound and the results accurate. The following variables were used for clustering:

  1. Days in a month
    • The average number of days the app was accessed in a month by a group.
    • The metric was divided by 30 days to normalize the variable.

  2. Resources accessed
    • The total number of resources accessed by the group in their history as a percentage of the total number of resources available to the group.

  3. Time spent on the app
    • The total time spent on the app per day (in seconds).
    • The variable was normalized by dividing by 86400 (no. of seconds in a day).
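
The three normalizations listed above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name, argument names, and the sample group are hypothetical, and the resources variable is expressed as a fraction between 0 and 1 rather than a percentage so that all three variables share one scale.

```python
DAYS_PER_MONTH = 30      # divisor for the "days in a month" variable
SECONDS_PER_DAY = 86400  # divisor for the "time spent on the app" variable


def normalize_group(avg_days_per_month, resources_accessed,
                    resources_available, seconds_per_day):
    """Scale the three engagement variables to a common 0-1 range."""
    if resources_available <= 0:
        raise ValueError("resources_available must be positive")
    return (
        avg_days_per_month / DAYS_PER_MONTH,
        resources_accessed / resources_available,
        seconds_per_day / SECONDS_PER_DAY,
    )


# Hypothetical group: 15 days a month, 30 of 120 resources, 1 hour (3600 s) a day.
features = normalize_group(15, 30, 120, 3600)
print(features)
```

Dividing by fixed maxima (30 days, 86400 seconds, the resources available) keeps every variable between 0 and 1, so no single variable dominates the distance calculation used by K-means.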

Analysis and Results:

Groups were clustered to understand how students engage with the content on the app. The objective is to understand the underlying metrics driving engagement. The client wanted to know whether engagement is driven by the student's location or by the type of content.
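
To make the group-level clustering step concrete, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm) with three clusters, one each for low, medium, and high engagement. The sample rows, the deterministic choice of starting centroids, and the rule that names clusters by the size of their centroid are all assumptions made for illustration; they are not details taken from the project.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(points, k):
    # Assumption: start from points spread evenly along the feature-sum
    # ordering so the run is deterministic (not a standard K-means init).
    ordered = sorted(points, key=sum)
    return [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]


def kmeans(points, k, max_iterations=100):
    centroids = initial_centroids(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return labels, centroids


def name_clusters(centroids):
    # Assumption: a larger centroid (summed over features) means higher
    # engagement. Only valid for exactly three clusters.
    order = sorted(range(len(centroids)), key=lambda c: sum(centroids[c]))
    return {cluster: name
            for cluster, name in zip(order, ["low", "medium", "high"])}


# Hypothetical normalized rows: (days / 30, resources fraction, seconds / 86400)
groups = [
    (0.10, 0.05, 0.005), (0.13, 0.08, 0.004), (0.12, 0.06, 0.006),
    (0.40, 0.30, 0.020), (0.43, 0.28, 0.022), (0.38, 0.33, 0.018),
    (0.67, 0.60, 0.045), (0.70, 0.65, 0.050), (0.65, 0.58, 0.042),
]
labels, centroids = kmeans(groups, k=3)
names = name_clusters(centroids)
segments = [names[label] for label in labels]
print(segments)
```

Real data with overlapping clusters would typically call for a library implementation with several random restarts; the sketch only shows the assign-and-update loop that K-means repeats until the centroids stop moving.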

  1. High engagement groups access a wider variety of content and for a longer time than other groups.
    • 20% of the groups observed have shown the best engagement. Engagement is defined based on metrics such as days spent on the app per month, time spent on the app per day, and the number of resources accessed.
Figure 2: 20% of groups have shown best engagement
Figure 3: Number of resources accessed by the Group IDs in the engagement clusters
    • The groups with the best participation access the app 20 days per month on average. They also access a variety of content, which leads to more time spent on the app.
Figure 4: Time Spent on the app (in minutes) per day by engagement cluster
    • Number of days the app was accessed by each cluster, month over month (MoM)
Figure 5: Month-over-Month Trend of number of days accessed by engagement cluster

2. Two districts had the highest percentage of high engagement groups.

    • The next step is to identify student demographics to understand the distribution of the groups. Based on sample data, Place C and Place D have a higher percentage of best-engagement groups.
    • The causes behind the engagement still need to be identified. For instance, is CRL engagement with the students driving them to use the app more?
Figure 6: District wise percentage of engagement clusters
Figure 7: Male and Female students in the engagement clusters

3. Content has been clustered to identify what made certain content popular.

Figure 8: Categories accessed by students in the high engagement clusters
    1. Impact Analytics observed that shorter science videos are the most popular among students, followed by English videos. Further, content developed in-house is more popular among students than third-party content.
    2. Math videos, which are generally longer, seem to be the least popular videos among students.
    3. English and Math games are the most popular games. Language-related games are the least popular.
    4. The ideal next step was to understand the characteristics of the videos that drive engagement among students. The client needed to understand whether more interactive videos are more popular, or whether students are drawn to videos with more illustrations, and so on.
    5. Students also have access to game resources. The games are also categorized based on the subject and the type of games.
    6. Initial analysis was performed on games where students pick one answer from two options.


Impact Analytics categorized clusters based on various engagement metrics. Engagement clusters were isolated, and precise clusters were identified, through extensive exploratory data analysis.

Major insights obtained were:

  1. Students classified under ‘high engagement’ were using the app about 15 days per month on average.
  2. Students in the ‘high engagement’ category were accessing a wide variety of content.


The insights specific to ‘Games’ are:

  1. As students attempt a game more often, the variance among scores decreases, i.e., they become more consistent the more frequently they play.
  2. Students with low engagement seem to perform slightly better than students with higher engagement. One possible explanation is that low-engagement students grasp the concepts more quickly, perform well in fewer tries, and therefore do not engage with the game further. One way to address this would be to increase the difficulty of the games as the student progresses, which could improve retention.
  3. Not all games show a similar scoring pattern. In games that require higher math aptitude, performance seems to decrease as the number of attempts increases.
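
The first and third observations above rest on how the average score and the spread of scores change with the attempt number. A minimal sketch of that calculation follows; the log format (student id, attempt number, score) and the sample scores are hypothetical and serve only to show the computation.

```python
from collections import defaultdict
from statistics import mean, pvariance


def score_spread_by_attempt(attempt_log):
    """Return {attempt number: (mean score, population variance)}."""
    scores_by_attempt = defaultdict(list)
    for _student_id, attempt_number, score in attempt_log:
        scores_by_attempt[attempt_number].append(score)
    return {attempt: (mean(scores), pvariance(scores))
            for attempt, scores in sorted(scores_by_attempt.items())}


# Hypothetical log: (student id, attempt number, score out of 10)
log = [
    ("s1", 1, 2), ("s2", 1, 9), ("s3", 1, 5),
    ("s1", 2, 5), ("s2", 2, 9), ("s3", 2, 6),
    ("s1", 3, 7), ("s2", 3, 9), ("s3", 3, 8),
]
spread = score_spread_by_attempt(log)
for attempt, (avg, var) in spread.items():
    print(f"attempt {attempt}: mean={avg:.2f} variance={var:.2f}")
```

A variance that shrinks from one attempt to the next corresponds to students becoming more consistent, while a falling mean would flag the games where performance decreases with more attempts.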


The next step would be to accurately collect information on student demographics to understand the causes behind engagement. The data would also help IA understand which districts need more attention so that the CRLs can be deployed.

A deep dive into the video characteristics would help the client and IA understand what type of content students prefer. Based on the results, IA is helping the NGO understand the metrics and content consumption patterns that drive engagement among students. In addition to this analysis, the NGO aims to understand the reasons behind the low-, medium-, and high-engagement clusters, the steps that can improve the learning capabilities of rural students, and how to design more engaging content.