Learning from learning: My Stanford machine learning course experience

Last summer, I joined a growing trend: I took an online course on machine learning algorithms run by Stanford University. At SAS, we learn a lot about the user interface and tools. However, I wanted to learn what is happening behind the scenes with these algorithms, and understand them at a much more fundamental level.

It turns out that I’m not alone. Nearly 3 million other people have already taken this course – a huge number. I understand that this is one of the most popular machine learning courses available in terms of both number of students and ratings: its average student rating is 4.9 out of 5. In total, 38% of students reported getting a tangible career benefit from this course. It takes about 56 hours to complete, and is available in English or with subtitles in several languages, including Japanese, Hindi, Chinese and Spanish.


Demonstrating an appetite for learning

The ratings show how good the course is, but the number of students taking it shows the sheer level of interest in machine learning. It doesn’t hurt that the course is taught by one of the world’s top experts in machine learning, Andrew Ng, but that’s not the only attraction.

The course covers both the theory underlying machine learning and how to apply it in practice. It outlines some of the most effective machine learning techniques, then lets students implement them in practical sessions. The course also describes best practices in artificial intelligence and machine learning, so students can see how these techniques should be used. It is, therefore, a broad introduction to machine learning that uses case studies to improve understanding.

Framework for the future

The popularity of the course suggests that people want to understand both the technicalities and the framework for machine learning. It is relatively easy to focus on the technicalities and demonstrate how to program in a language. In fact, it’s not unknown to get completely lost in the technicalities of a language and never emerge again. Each programming language has its own quirks, and they can become all-consuming when you’re working with them. Especially at the beginning, most of your time will be spent getting the language syntax correct instead of focusing on the actual analytics.

This course, however, is different. It provides a framework within which students will be able to work in the future. It uses Octave as its programming language simply because Octave is simple: it is a free, high-level language that is largely compatible with MATLAB (its interpreter is written in C++), and it is easy to use. Students, therefore, don’t need detailed coding skills to do the course, which is a huge advantage. The course tutors don’t directly recommend using Octave in real life, but it is a great option for getting up and running quickly.

Some useful outcomes

I am certainly among the 38% of students who have reported getting some tangible career benefits out of the course. Interestingly, not everything taught in the course transfers directly to the SAS world. For example, the course puts a lot of focus on taking small steps in modelling, such as limiting the data sample size to optimise the learning rate and minimise the number of iterations. The SAS Platform, on the other hand, offers scalable, high-performance, in-memory computing built for big data, and the capability to train multiple algorithms with a flick of the wrist.
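To make the learning-rate point concrete, here is a minimal sketch in Python (the course itself used Octave) of batch gradient descent for linear regression, with the learning rate tuned on a small random sample of a larger data set. The data are synthetic. The function name `gradient_descent`, the stopping tolerance and the candidate learning rates are illustrative assumptions, not the course's own code.

```python
import numpy as np

def gradient_descent(X, y, alpha, tol=1e-10, max_iters=10_000):
    """Batch gradient descent for linear regression with a squared-error cost.

    Returns (theta, iterations, converged). converged is False if the cost
    starts rising, which signals a learning rate (alpha) that is too large.
    """
    m = X.shape[0]
    Xb = np.column_stack([np.ones(m), X])        # prepend an intercept column
    theta = np.zeros(Xb.shape[1])
    prev_cost = np.inf
    for it in range(1, max_iters + 1):
        residual = Xb @ theta - y
        cost = residual @ residual / (2 * m)
        if abs(prev_cost - cost) < tol:
            return theta, it, True                # cost has stopped improving
        if cost > prev_cost:
            return theta, it, False               # diverging: alpha too large
        prev_cost = cost
        theta = theta - alpha * (Xb.T @ residual) / m
    return theta, max_iters, False

# Hypothetical "big" data set: one standardised predictor, known true line y = 2 + 3x.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(100_000, 1))
y_full = 2.0 + 3.0 * X_full[:, 0] + rng.normal(scale=0.5, size=100_000)

# Tune the learning rate on a small random sample rather than on the full data.
idx = rng.choice(len(y_full), size=1_000, replace=False)
X_s, y_s = X_full[idx], y_full[idx]

results = {}
for alpha in (0.01, 0.1, 1.0, 3.0):
    _, iters, ok = gradient_descent(X_s, y_s, alpha)
    results[alpha] = (iters, ok)
    print(f"alpha={alpha}: {iters} iterations, converged={ok}")
```

On data like this, a learning rate that is too small needs many more iterations to converge, while one that is too large makes the cost grow instead of shrink. Trying a few values on a sample is cheap compared with running each one on the full data.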

Another difference concerns the number of variables. We all know that being able to use big data sets is one of the major advantages of machine learning; in fact, it is the reason why we can use machine learning in the first place. However, big data also has some disadvantages. A typical data set might have as many as a thousand variables, and the course stressed that it was worth considering removing some of them. Dropping irrelevant variables reduces the "noise" and helps reveal the true latent relationship between the response and the predictors. SAS offers superior functionality to reduce the number of variables without having to manually select which ones to keep.
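The advice about dropping variables can be illustrated with a deliberately simple approach. The Python sketch below uses synthetic data. It keeps only the predictors most correlated with the response and compares validation error against a model that uses every variable. The correlation filter, the helper names and the data-generating setup are all assumptions for illustration. This is neither the course's method nor how SAS selects variables.

```python
import numpy as np

def correlation_filter(X, y, k):
    """Return sorted indices of the k columns of X with the largest |correlation| with y.

    Deliberately simple (hypothetical helper): it ignores interactions and
    correlations between the predictors themselves.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.sort(np.argsort(-np.abs(corr))[:k])

def ols_validation_mse(X_tr, y_tr, X_va, y_va):
    """Fit least squares with an intercept on the training split; return validation MSE."""
    A_tr = np.column_stack([np.ones(len(y_tr)), X_tr])
    A_va = np.column_stack([np.ones(len(y_va)), X_va])
    theta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return np.mean((A_va @ theta - y_va) ** 2)

# Synthetic data (assumption): 2 informative predictors, 45 pure-noise predictors.
rng = np.random.default_rng(1)
n_train, n_valid, p = 100, 1_000, 47
X = rng.normal(size=(n_train + n_valid, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=n_train + n_valid)
X_tr, y_tr, X_va, y_va = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

keep = correlation_filter(X_tr, y_tr, k=2)
mse_all = ols_validation_mse(X_tr, y_tr, X_va, y_va)
mse_kept = ols_validation_mse(X_tr[:, keep], y_tr, X_va[:, keep], y_va)
print("kept columns:", keep)
print(f"validation MSE, all {p} variables: {mse_all:.3f}")
print(f"validation MSE, kept variables:   {mse_kept:.3f}")
```

With 47 candidate variables and only 100 training rows, the full model spends much of its flexibility fitting noise, so the smaller model typically does better on the validation rows. A one-variable-at-a-time filter like this misses predictors that only matter in combination, which is one reason automated selection methods are useful.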

Dealing with overfitting

The second lesson is that it is possible to "overfit" a model. This is when your model fits your training data almost perfectly, which sounds great. However, some of the variation in those data is just random noise, and a model that fits the noise is learning patterns that will not recur in new data. In other words, by fitting the training data too closely, the model loses accuracy and predictive ability when it is applied to data it has not seen.
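A classic way to see overfitting is to fit polynomials of increasing degree to a handful of noisy points. The Python sketch below uses synthetic data: a sine curve plus Gaussian noise, chosen purely for illustration. The degree-11 polynomial passes through all 12 training points, so its training error is essentially zero. Its error on fresh test points, however, is typically much worse than that of a modest cubic.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def noisy_sine(n):
    """Synthetic data (assumption): a smooth underlying signal plus Gaussian noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

def mse(poly, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return np.mean((poly(x) - y) ** 2)

x_train, y_train = noisy_sine(12)    # a small training set
x_test, y_test = noisy_sine(500)     # fresh data the model never saw

errors = {}
for degree in (1, 3, 11):
    poly = Polynomial.fit(x_train, y_train, degree)    # least-squares polynomial fit
    errors[degree] = (mse(poly, x_train, y_train), mse(poly, x_test, y_test))
    train_err, test_err = errors[degree]
    print(f"degree {degree:2d}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

Training error can only fall as the degree rises, because each model can do anything the smaller one can. Test error is what reveals that the most flexible model has memorised the noise.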

To get around this, one very useful technique is regularisation, which reduces overfitting and can also make the model easier to interpret. Regularisation works by adding a penalty on the size of the model parameters to the cost being minimised during training, which shrinks their magnitude. Finding the optimal penalty is typically an iterative, manual process of trying candidate values and comparing the results. SAS, however, takes a different angle, offering an autotuning capability that eases the search for optimal penalties, and much more.
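To show what the penalty does, here is a minimal Python sketch of L2 (ridge) regularisation for linear regression, solved with the regularised normal equation. It also includes the kind of manual loop over candidate penalty strengths (lambda) described above. The polynomial features, the synthetic data, the helper names and the candidate lambda values are all illustrative assumptions. This is not SAS's autotuning, which automates this kind of search.

```python
import numpy as np

def poly_features(x, degree):
    """Hypothetical helper: map a 1-D input to [x, x^2, ..., x^degree] (no intercept)."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def ridge_fit(X, y, lam):
    """L2-regularised least squares via the regularised normal equation.

    Minimises ||A theta - y||^2 + lam * ||theta[1:]||^2, where A = [1, X].
    The intercept theta[0] is not penalised.
    """
    A = np.column_stack([np.ones(len(y)), X])
    L = np.eye(A.shape[1])
    L[0, 0] = 0.0                          # do not shrink the intercept
    return np.linalg.solve(A.T @ A + lam * L, A.T @ y)

def predict(theta, X):
    return theta[0] + X @ theta[1:]

# Synthetic data (assumption): a sine curve plus noise, with degree-8 polynomial features.
rng = np.random.default_rng(7)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=60)
X = poly_features(x, degree=8)
X_tr, y_tr, X_va, y_va = X[:20], y[:20], X[20:], y[20:]

# The manual loop: try candidate penalties, keep the one with the lowest validation error.
lams = [1e-4, 1e-2, 1.0, 100.0]
results = []
for lam in lams:
    theta = ridge_fit(X_tr, y_tr, lam)
    coef_norm = np.linalg.norm(theta[1:])
    train_mse = np.mean((predict(theta, X_tr) - y_tr) ** 2)
    valid_mse = np.mean((predict(theta, X_va) - y_va) ** 2)
    results.append((lam, coef_norm, train_mse, valid_mse))
    print(f"lambda={lam:g}: |theta|={coef_norm:.2f}, "
          f"train MSE {train_mse:.4f}, valid MSE {valid_mse:.4f}")

best_lam = min(results, key=lambda r: r[3])[0]
print("chosen lambda:", best_lam)
```

As lambda grows, the fitted coefficients shrink and the training error rises. The validation error is what tells you where the trade-off is best. The intercept is deliberately left unpenalised, a common convention, so that regularisation shrinks the shape of the curve rather than its overall level.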