Update: The winning sessions are Data mining - open source integration with SAS and Forecasting - multistage models for highly seasonal and/or sparse demand series. Get registered for the conference today. We'll see you in Las Vegas!
The Analytics 2015 conference in Las Vegas, Oct. 26 and 27, is designed for you. So why wouldn’t you help choose the content? New this year, we’re asking the analytics community to vote on one data mining topic and one forecasting topic that they want to hear at the conference. The voting takes place on AllAnalytics.com.
The sessions you can choose from include:
- Data mining - open source integration with SAS
- Data mining - video data mining
- Data mining - ensemble modeling
- Forecasting - count series forecasting (for time series that are discretely valued)
- Forecasting - multistage models for highly seasonal and/or sparse demand series
I asked our forecasting expert, Ken Sanford, and the data mining-meister, Patrick Hall, to break it all down for us. These guys are serious about their areas of expertise. Just watch…
And now that we’re all friends again, I’ve asked Ken and Patrick to answer some questions on each of these hot topics.
Why is open source integration with SAS an important topic for today’s data scientist?
PH: This is the golden age of analytics. There are so many good tools available to data scientists that mixing and matching them has become commonplace. SAS enables data scientists to make calls to their favorite bleeding-edge open-source packages AND allows open-source languages to call into vetted and scalable SAS procedures. This level of flexibility empowers data scientists to tackle complicated problems in whatever way they see fit.
What industries right now are taking advantage of video data mining?
PH: Video mining is used in government and security applications, in medical applications, and there are several emerging use cases in the retail and energy sectors. With the advent of deep learning techniques that can bring automated image recognition to human-level accuracy, it is likely that this sophisticated, big data technology will continue to evolve.
What types of predictive analytics problems do ensemble models help with?
PH: Ensemble models are perfect for real-life problems that involve big, noisy, dirty data. Ensemble models often train on bootstrapped samples, so they can be super scalable. They are known to produce very accurate results, and because ensembles are often built from decision trees, missing values, character variables, and high-cardinality class variables are no problem. So bring on your worst data, and let's see what ensemble models can do!
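To make the bootstrapping idea concrete, here is a minimal sketch of bagging, one common way to build an ensemble. It is an illustration only, not a SAS procedure and not the method from the session: the function names, the toy data, and the choice of one-split regression "stumps" as the base model are all assumptions made for this example. Each stump trains on its own resample of the data, and the ensemble prediction is the average across stumps.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Every x in this resample is identical: fall back to the overall mean.
        overall = mean(ys)
        return (float("inf"), overall, overall)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_bagged(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_bagged(models, x):
    """Average the predictions of all stumps in the ensemble."""
    return mean(predict_stump(m, x) for m in models)

# Hypothetical noisy step-shaped data: low values for x < 10, high otherwise.
noise = random.Random(42)
xs = list(range(20))
ys = [(0.0 if x < 10 else 10.0) + noise.gauss(0, 1) for x in xs]

models = fit_bagged(xs, ys)
low_prediction = predict_bagged(models, 2)
high_prediction = predict_bagged(models, 17)
print(low_prediction, high_prediction)
```

Because every stump is fit independently of the others, the loop in `fit_bagged` could run in parallel, which is one reason bootstrapped ensembles tend to scale well.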
What are the benefits of modeling series as count value versus continuous value?
KS: Most traditional time series analysis techniques assume that the time series values are continuously distributed. For example, autoregressive integrated moving average (ARIMA) models assume that the time series values are generated by continuous white noise passing through various types of filters. When a time series takes on small, discrete values (0, 1, 2, 3, and so on), such as with sales of durable goods or spare parts, this assumption of continuous values is unrealistic. By using discrete probability distributions, count series analysis can better predict future values and, most importantly, produce more realistic confidence intervals. In addition, count series often contain many zero values (a characteristic that is called zero-inflation). Any realistic distribution must account for the “extra” zeros.
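As a rough illustration of the point about intervals and extra zeros, the sketch below compares a normal-approximation interval with an interval taken from a zero-inflated Poisson distribution. The zero-inflated Poisson is just one common choice of discrete distribution, picked here for illustration; the parameter values (`lam`, `pi_zero`) and function names are made up for this example and are not from the session.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: with probability pi_zero the value is a
    structural zero; otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base

def zip_quantile(q, lam, pi_zero):
    """Smallest count whose cumulative probability reaches q (0 < q < 1)."""
    cumulative = 0.0
    k = 0
    while True:
        cumulative += zip_pmf(k, lam, pi_zero)
        if cumulative >= q:
            return k
        k += 1

# Hypothetical parameters: 40% structural zeros, Poisson mean of 2 otherwise.
lam, pi_zero = 2.0, 0.4

# Mean and variance of the zero-inflated Poisson.
zip_mean = (1.0 - pi_zero) * lam
zip_var = (1.0 - pi_zero) * lam * (1.0 + pi_zero * lam)

# A continuous (normal) 95% interval can dip below zero, which is
# impossible for a count of units sold.
normal_low = zip_mean - 1.96 * math.sqrt(zip_var)
normal_high = zip_mean + 1.96 * math.sqrt(zip_var)

# The discrete interval stays on the non-negative integers.
count_low = zip_quantile(0.025, lam, pi_zero)
count_high = zip_quantile(0.975, lam, pi_zero)

print("normal interval:", (normal_low, normal_high))
print("count interval:", (count_low, count_high))
print("P(zero) plain Poisson:", poisson_pmf(0, lam))
print("P(zero) zero-inflated:", zip_pmf(0, lam, pi_zero))
```

The lower bound of the normal interval is negative, while the count-based interval starts at zero and ends on a whole number, which is much easier to act on when deciding how many units to stock.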
What types of data tend to have multiple seasons? Why is this topic important for forecasters?
KS: Sales for a product, such as sunscreen or swimsuits, can vary greatly from one selling location to another during certain times of year. Sporadic sales across time, including weeks of zero sales, within the same selling location further complicate forecasting weekly sales of a particular product for a given store. These challenges require special attention in order to achieve a reasonable replenishment forecast for retailers. We will show several methods of forecasting with data that exhibit these seasonal characteristics.
You can cast your vote for one data mining topic and one forecasting topic on AllAnalytics.com from July 13 to Aug. 7. Look for the Quick Poll section on the right-hand side of the homepage. The data mining topic and the forecasting topic with the most votes will each become a 50-minute breakout session at Analytics 2015.