Let's talk about using DLPy to model employee retention through a survival analysis model. Survival analysis is used to model time-to-event. Examples of time-to-event include the time until an employee leaves a company, the time until a disease goes into remission, or the time until a mechanical part fails. The outcome variable in survival analysis is the time to the event. What makes survival analysis interesting is that the usual statistical assumptions of normality do not apply: event times are typically skewed, because events tend to occur either very early or much later in the time period under investigation.
The example used in this video is about employee attrition. In this dataset, there are 15,000 observations, with the time employees spend at the company ranging from 2 to 10 years. At the time of the study, approximately 76% of employees were still at the company. For these employees, the attrition event did not occur within the designated time period. This is called censoring: for these employees, the time-to-event outcome variable records only the tenure so far, not the exact tenure at which they leave.
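To make censoring concrete, here is a minimal sketch of how such data is commonly represented: each employee gets an observed tenure plus a flag that says whether the attrition event was actually seen. The `Employee` class, its field names, and the four toy records are hypothetical and are not taken from the dataset in the video.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    tenure_years: float  # time observed at the company so far
    left: bool           # True = attrition observed, False = censored


# Hypothetical toy records for illustration only.
records = [
    Employee(3.0, True),
    Employee(5.0, False),
    Employee(2.0, True),
    Employee(10.0, False),
]


def censoring_rate(rows):
    """Return the fraction of rows whose event was not observed."""
    censored = sum(1 for r in rows if not r.left)
    return censored / len(rows)


rate = censoring_rate(records)
print(f"censored: {rate:.0%}")  # prints: censored: 50%
```

For a censored employee, `tenure_years` is only a lower bound on the true tenure, which is why ordinary regression on this column alone would be misleading.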
Survival models are needed because they account for censoring, which is required for good prediction and valid inference. In the attrition example, we compare the Cox proportional hazards model with a deep survival model. The Cox proportional hazards model is a type of regression model. The main difference between the Cox model and the deep survival model is that the Cox model has no hidden layer. That is, the Cox model fits a simple linear function for prediction, while the deep survival model with hidden layers can automatically learn a nonlinear and complex function, which can lead to a better-performing model.
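The linear Cox setup described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not DLPy's implementation: the function names are hypothetical, tied event times are not given any special treatment, and whether the deep survival model in the video trains on exactly this loss is an assumption. The sketch shows how censored employees still contribute, because they stay in the risk set of everyone who left before them.

```python
import math


def linear_risk(features, beta):
    """Cox-style risk score: a plain linear combination, no hidden layer."""
    return sum(b * x for b, x in zip(beta, features))


def neg_log_partial_likelihood(times, events, risks):
    """Negative log of the Cox partial likelihood.

    times  -- observed time for each subject
    events -- 1 if the event was observed, 0 if censored
    risks  -- risk score for each subject (higher = expected to leave sooner)
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still at the company at time t_i.
        denominator = sum(
            math.exp(r_j) for t_j, r_j in zip(times, risks) if t_j >= t_i
        )
        total -= risks[i] - math.log(denominator)
    return total


# Hypothetical toy data: two observed departures and one censored employee.
times = [2.0, 3.0, 5.0]
events = [1, 1, 0]
features = [[1.0, 0.5], [0.5, 0.5], [0.0, 0.0]]
beta = [2.0, 0.0]  # hypothetical coefficients

risks = [linear_risk(x, beta) for x in features]
loss = neg_log_partial_likelihood(times, events, risks)
```

In a deep survival model of the kind described, `linear_risk` would be replaced by the output of a network with hidden layers, while the loss keeps the same shape.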
The concordance index (C-index) is the most useful criterion for evaluating a model's overall predictive performance in survival analysis. It evaluates how well a model predicts the ordering of survival times based on individual risk scores. In this case, we want to know who will leave the company and in what order (e.g., employee 101 will leave the company after 5 years, before employee 39, who will leave after 6 years). This is very different from a typical model evaluation metric such as mean squared error, which measures how far predicted values are from actual values.
For example, a C-index of 0.5 corresponds to a random model, whereas a C-index of 1 corresponds to a model that ranks survival times perfectly. The higher the C-index, the better the model is. In our employee attrition model, the C-index is 0.93 for the deep survival model and 0.85 for the Cox model. The higher C-index value tells us that the deep survival model ranks employees' survival times more accurately than the Cox model.
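The ranking idea behind the C-index can be sketched as follows. This is a minimal, unoptimized version of the common pairwise definition, assuming that a higher risk score means an earlier expected departure; the function name and toy data are hypothetical, and the exact variant used in the video is not stated. A pair of employees is comparable only when the one with the shorter time actually left, which is how censoring is respected.

```python
def concordance_index(times, events, risks):
    """Pairwise C-index for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. It is concordant when subject i also has the
    higher risk score. Tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: two departures, two employees still employed.
times = [2.0, 3.0, 5.0, 10.0]
events = [1, 1, 0, 0]

perfect = concordance_index(times, events, [0.9, 0.6, 0.3, 0.1])   # 1.0
reversed_ = concordance_index(times, events, [0.1, 0.3, 0.6, 0.9])  # 0.0
constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])   # 0.5
```

The double loop is quadratic in the number of employees, which is fine for a toy example but would be slow on the full 15,000 observations.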
Try it for yourself
DLPy makes it easy to take advantage of deep learning. Simply choose your models, modify them and begin deep learning using the notebooks and examples on GitHub. And if you want to contribute to the DLPy library, create a pull request on GitHub, as SAS gladly accepts them.
Just a note: to use the package for model development, a SAS Visual Data Mining and Machine Learning license is required. If you do not yet have a license, consider a 14-day free trial at sas.com/tryvdmml.
In case you missed them, here are the previous blogs with videos on DLPy: