Editor's Note: There are hundreds of breakout sessions happening at SAS Global Forum in both the Users and Executive programs. Since we couldn’t just pick one to highlight, we decided to put together a SAS Global Forum day 2 session roundup, highlighting some of our very favorites!
Don’t overlook data management when it comes to cybersecurity analytics
There’s a constant buzz in the market around analytics and its role in the cybersecurity space, but the conversation often overlooks the important role data management plays. Data management is a fundamental component that SAS cyber experts want organizations to understand: just because the investment is being made in cyber analytics doesn’t mean companies can ignore data quality and data management.
“There are countless solutions and dollars spent to protect organizations,” said SAS’ Director of Cybersecurity Christopher Smith. “All of those pieces – firewalls, endpoints and email gateways – play a vital role, but those systems don’t communicate with each other.” Even with all the investment organizations are making to protect themselves, they are gaining little new insight into what’s actually happening inside company walls.
What’s needed is business context, and that’s something isolated solutions cannot provide. While those systems are valuable in identifying what’s good, what’s bad and what can be defined, they offer limited business intelligence.
But the challenge isn’t just about obtaining data; it’s about the speed, type, structure and volume of data being generated every second.
“We are working in a society where everyone is looking for a silver bullet,” said Vice President of Business Consulting and Data Management Evan Levy. “People are buying products to solve problems, but it’s more complicated than that. The volume, need and diversity of content and sources aren’t something we could ever have predicted.”
Levy said that’s where data management becomes critical. Companies have to enlist the proper data management techniques to avoid lagging in security and exposing themselves to added risk with every attack. By looking at what’s actually happening, companies can see what the data is saying and then develop an effective response.
The fear today is not what has happened; it’s the unknown of what else has happened that we haven’t yet identified. “Once data is created it will always be an asset to the business,” said Smith, which means it must be catalogued to offer value. Effective cyber protection requires sophisticated analytic prowess paired with a rich data history to defend organizations against clever and skilled hackers.
Learning from past mistakes
In his April 20 Executive Conference breakout session, Sterling Price, Director of Customer Analytics at Wal-Mart Stores, Inc., cautioned against assuming that analytical projects produce accurate and relevant results simply because they use new technologies and massive data sets. He used several historical examples, from the Google Flu Trends prediction mishap to the faulty forecast of the 1936 US presidential race, to help prove the point.
Big data, it turns out, is simply the newest phenomenon tempting leaders to believe their outcomes are statistically sound. "We owe our organizations objective analysis based on science, not wishful thinking," said Price.
Here are five points gleaned from his personal experience at Walmart as well as the historical examples he shared:
- Don't fall prey to the belief that results will be accurate and useful simply because of how much data was used.
- We still need to sample, but a badly chosen large sample, even a really big one, is much worse than a well-chosen small sample (see the sketch after this list).
- Methodology still matters. Big data by itself does nothing. How we use it defines its value.
- Scalability should be considered up front.
- Don't mistake statistical significance for practical significance. They are not the same.
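Points two and five are easy to see in miniature. Below is a minimal Python sketch, ours rather than anything shown in Price's session; the retail scenario, the loyalty-card bias and every number in it are invented for illustration. It shows a 500-customer random sample beating a 100,000-customer biased one, and a five-cent effect registering as "statistically significant" once the sample is large enough.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical population: one million customers, true mean spend of $50.
population = rng.normal(loc=50, scale=10, size=1_000_000)

# Well-chosen small sample: 500 customers drawn uniformly at random.
small_random = rng.choice(population, size=500, replace=False)

# Badly chosen large sample: 100,000 customers skewed toward high spenders
# (imagine surveying only loyalty-card members). Drawn with replacement to
# keep the weighted draw simple.
weights = np.exp(population / 20)
weights /= weights.sum()
big_biased = rng.choice(population, size=100_000, p=weights)

print(f"True population mean:        {population.mean():6.2f}")
print(f"Small random sample (n=500): {small_random.mean():6.2f}")  # lands close
print(f"Big biased sample (n=100k):  {big_biased.mean():6.2f}")    # dollars off

# Statistical vs. practical significance: with two million observations per
# group, a $0.05 difference in mean spend tests as "significant" even though
# no business decision would change because of it.
group_a = rng.normal(50.00, 10, size=2_000_000)
group_b = rng.normal(50.05, 10, size=2_000_000)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"p-value = {p_value:.1e} (significant), effect = $0.05 (negligible)")
```

Run as constructed, the biased estimate drifts roughly five dollars above the truth while the 500-customer sample stays within about half a dollar of it, and the t-test dutifully flags a difference no retailer would act on.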
Arrest Prediction and Analysis in New York City
Analyzing "stop and frisk" data captured by the New York City Police Department can lead to insights that help cops make better decisions about whether to arrest a person or not, say two Oklahoma State University graduate students.
Karan Rudra and Maitreya Kadiyala analyzed publicly available data from the NYPD to understand the propensity for arrest and optimize frisk activities. This type of analysis can potentially reduce the number of stops and impact the arrest rate.
The pair examined 56 variables, including in which precinct a stop occurred, whether a stop led to an arrest, whether the officer produced an ID and shield, and whether a person was stopped inside or outside of a building.
Using SAS® Enterprise Miner™, they built and compared four models, determining that a polynomial regression model performed best (a rough sketch of a comparable model comparison follows the findings below). Some findings from their research include:
- In the Bronx and Manhattan, females have the highest percentage of arrests after a stop and frisk.
- In Staten Island, though there are a high number of stops per area, the number of resulting arrests is comparatively low.
- Blacks and Hispanics have a higher percentage of arrests after a stop.
- The overall arrest rate of the data sample was 6 percent.
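The students built their models in SAS Enterprise Miner's visual interface, so there is no session code to reproduce. As a rough analogue, here is a hedged scikit-learn sketch of the same kind of four-way model comparison on a binary arrest outcome. The file and column names (stop_and_frisk.csv, precinct, inside_outside, id_shown, arrest_made) are placeholders rather than actual NYPD field names, and the polynomial logistic pipeline merely stands in for their polynomial regression model.

```python
# Hypothetical sketch of a four-way model comparison for a binary
# "arrest after stop" outcome; file and column names are invented.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("stop_and_frisk.csv")              # assumed local extract
X = df[["precinct", "inside_outside", "id_shown"]]  # 3 of the 56 variables
y = df["arrest_made"]                               # 1 = stop ended in arrest

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "polynomial logistic": Pipeline([
        ("poly", PolynomialFeatures(degree=2, interaction_only=True)),
        ("lr", LogisticRegression(max_iter=1000)),
    ]),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "gradient boosting": GradientBoostingClassifier(),
}

# With ~6% positives, ROC AUC is a safer yardstick than raw accuracy, which
# a model could score ~94% on by predicting "no arrest" for every stop.
for name, model in candidates.items():
    pipe = Pipeline([
        ("encode", OneHotEncoder(handle_unknown="ignore")),  # inputs are categorical
        ("model", model),
    ])
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:20s} mean AUC = {auc:.3f}")
```

Whichever candidate wins on cross-validated AUC plays the role the polynomial regression model played in the students' own comparison; the metric choice matters precisely because of the 6 percent base rate noted above.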