Would Taylor Swift date her suitors or not? Guess what? Data scientists may know the answer. But this time it was pupils who found the answer. Pupils? Yes, data science is for everyone, kids included.
During Tech Week, a UK-wide event in July promoted by the Tech Partnership, organisations were encouraged to create interactive and fun activities for schools, universities and colleges – all aimed at inspiring and motivating young people to get interested in STEM careers. The idea is to change the way students learn about technology and ensure they see all the different tech career opportunities available to them.
We hosted a tech careers event at SAS’ UK headquarters over July 5-6, where we opened our doors to Sir William Borlase’s Grammar School and Sir William Ramsay School. Year 9 and 10 pupils from each school spent a fun day on our campus in Marlow, learning about different careers in technology, the job of a data scientist and how to solve real-life business problems with analytics.
Loredana Cornea and Oliver Crowley, two data science graduates, created a fun activity using the abacus: “The Maths in the Dates.” It was a game that let pupils use their curiosity to explore problems and try out some of the technical skills required of a data scientist.
During the hour-long game, pupils were divided into groups, and each group was split into two subgroups. One subgroup was the training team, comprising six members, and the other was the scoring team, comprising three members.
The training team got a complete “data set” (in an abacus) with information on Taylor Swift’s preference in men – referred to as the training data set. In data science, a training data set represents known information and is used to identify patterns within the data. The pupils’ job was to identify logical patterns in Taylor Swift’s dating habits, backing them up with facts. After that, they communicated their conclusions to the matching scoring team.
The scoring team got an incomplete “data set” (in an abacus) with information on people who would like to date Taylor Swift – referred to as the scoring data set. In data science, a scoring data set represents a population you want to test your theories on or use to predict the outcome of an event you’re interested in. These theories are built using the known information in the complete “data set” that was given to the training team.
The two data sets were different: The suitors’ characteristics were different for each team, and for the scoring data set we didn’t know whether Taylor Swift would really date the suitors or not, because she hadn’t accepted or refused them yet.
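For readers who like to see ideas in code, the hand-off between the two teams might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `plays_guitar` characteristic, the rows, and the helper names `learn_rule` and `score` are invented, and a simple majority vote stands in for whatever reasoning the pupils actually used.

```python
from collections import Counter

# Training data: the outcome ("dated") is known. Values are invented.
training = [
    {"plays_guitar": True, "dated": True},
    {"plays_guitar": True, "dated": True},
    {"plays_guitar": True, "dated": False},
    {"plays_guitar": False, "dated": False},
    {"plays_guitar": False, "dated": False},
]

# Scoring data: same characteristic, but the outcome is not known yet.
scoring = [
    {"name": "Suitor A", "plays_guitar": True},
    {"name": "Suitor B", "plays_guitar": False},
    {"name": "Suitor C", "plays_guitar": True},
]


def learn_rule(rows, feature, target):
    """For each value of the feature, remember the most common outcome."""
    votes = {}
    for row in rows:
        votes.setdefault(row[feature], Counter())[row[target]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}


def score(rows, feature, rule):
    """Apply the learned rule to rows whose outcome is unknown."""
    return [rule.get(row[feature]) for row in rows]


rule = learn_rule(training, "plays_guitar", "dated")
predictions = score(scoring, "plays_guitar", rule)
```

The training team's job corresponds to `learn_rule`, and the scoring team's job corresponds to `score`: the rule is built only from rows where the answer is known, then applied to rows where it is not.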
The first step in analysing the data was to look at the different variables and find a threshold that separates those who Taylor chose to date from those she did not date. This process is called binning the variable into two buckets.
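As a rough illustration of binning, the Python sketch below searches for the cut point that best separates the two groups. The heights and outcomes are made up, and `best_threshold` is a hypothetical helper written for this post, not something the pupils used.

```python
def best_threshold(values, dated):
    """Return (threshold, accuracy) for the best single cut point.

    The rule "value > threshold means dated" and its mirror image are
    treated as equally good, since both split the data the same way.
    """
    points = sorted(set(values))
    # Candidate cut points: midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = (None, 0.0)
    for cut in candidates:
        hits = sum((v > cut) == d for v, d in zip(values, dated))
        accuracy = max(hits, len(values) - hits) / len(values)
        if accuracy > best[1]:
            best = (cut, accuracy)
    return best


# Invented example values: a quantitative variable and the known outcome.
heights_cm = [168, 172, 175, 180, 183, 188]
dated = [False, False, False, True, True, True]

threshold, accuracy = best_threshold(heights_cm, dated)

# Binning: replace each number with one of two buckets.
bins = ["tall" if h > threshold else "short" for h in heights_cm]
```

With these invented values the two groups separate cleanly, so one cut point puts every suitor in the right bucket. Real data is rarely that tidy, which is why the sketch reports an accuracy alongside the threshold.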
Next, the teams identified the categories that correspond to people who got a date and those who didn’t. This is called collapsing the levels of a qualitative variable.
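Collapsing levels can be sketched in a similar way. In the Python example below, the occupations, the outcomes, the 50% cut-off and the helper name `collapse_levels` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def collapse_levels(categories, dated, cutoff=0.5):
    """Map each category to one of two groups based on its share of dates."""
    totals = defaultdict(int)
    dates = defaultdict(int)
    for category, outcome in zip(categories, dated):
        totals[category] += 1
        dates[category] += int(outcome)
    return {
        category: "got a date" if dates[category] / totals[category] >= cutoff
        else "no date"
        for category in totals
    }


# Invented example values: a qualitative variable and the known outcome.
jobs = ["actor", "singer", "actor", "athlete", "singer", "chef"]
dated = [True, True, True, False, True, False]

mapping = collapse_levels(jobs, dated)

# Four original levels become two collapsed levels.
collapsed = [mapping[job] for job in jobs]
```

The point of collapsing is that a variable with many categories becomes a simple two-way split, which is much easier to reason about and to explain to the scoring team.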
The main goal was for pupils to understand the concepts of dependent and independent variables, quantitative and qualitative variables, training and scoring data sets, and statistical inference. These are just some of the concepts and activities in the day-to-day work of a data scientist.
Finally, here is a picture of the game’s authors: