Perhaps no sector generates more massive volumes of text data than government. At the SAS Government Leadership Summit earlier this week, agencies discussed big data topics, trends and techniques. And while each brought its own areas of focus, shared themes emerged, which Juan Zarate, former Deputy Assistant to the President and Deputy National Security Advisor for Combating Terrorism, poignantly distilled into four baseline functions that summarize many of the conversations.
Baseline Function #1 – Access: How can we improve access to content while maintaining oversight?

Speakers explored alternative techniques to improve content access with managed oversight. One ongoing challenge is examining content silos from a comprehensive perspective. One attendee noted that visibility across thousands of document sources would help reduce the more than five billion dollars spent each year on third-party contracts in one agency alone: knowing the content of the contracts, the associated services, and the rates across all providers would provide the means to spot duplication of effort and cut extraneous costs. Another shared a similar experience, using enterprise content categorization to classify documents across silos and layering the generated metadata on top of existing systems, importantly avoiding the reinvention of any wheels. All agreed that consistent commitment (the OMB was mentioned) was a requirement for success.
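To make the categorization-across-silos idea concrete, here is a minimal sketch in the spirit of that discussion, not any system shown at the summit. It assumes Python with scikit-learn, and the sample contract descriptions and similarity threshold are purely illustrative:

```python
# Minimal sketch: vectorize contract descriptions with TF-IDF, then flag
# pairs whose cosine similarity suggests overlapping scope. The sample
# texts and the 0.4 threshold are illustrative assumptions, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

contracts = [
    "IT helpdesk support services for regional field offices",
    "Help desk and IT support services, regional offices",
    "Bridge inspection and structural engineering services",
]

tfidf = TfidfVectorizer(stop_words="english")
vectors = tfidf.fit_transform(contracts)
similarity = cosine_similarity(vectors)

THRESHOLD = 0.4  # illustrative cutoff for "possible duplicate scope"
for i in range(len(contracts)):
    for j in range(i + 1, len(contracts)):
        if similarity[i, j] >= THRESHOLD:
            print(f"Possible overlap ({similarity[i, j]:.2f}): "
                  f"{contracts[i]!r} <-> {contracts[j]!r}")
```

The same pattern, run across document repositories rather than a three-item list, is one way the generated similarity metadata could sit on top of existing systems without replacing them.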
Information protection was mentioned throughout the day. Nate Silver, founder of FiveThirtyEight.com and author of ‘The Signal and the Noise’, reminded us that although the volume of data is increasing, ninety percent of working knowledge is not new. What is different today, he stated, is that the barrier to information sharing has been lowered. And with those barriers down, privacy and civil liberty issues arise and will continue to be a topic for leadership.
Baseline Function #2 – Validation: How does big data change and shift for different users?
With different levels of confidence needed to support distinct decisions, focus on the end use of big data is paramount. Illustrating that decision-makers have specific needs from collected data, Juan Zarate posed the question: “What level of confidence does a border patrol officer need to take action?” In legal matters there are well-known rules of evidence, and in statistical analysis there are quantified rules of confidence, but with big data there is potential bias, particularly from social media sources. As a result, situational awareness of how an insight will be used was highlighted as critical to defining the level of confidence needed in the source.
Nate Silver indicated that one of the problems is that people are quite sectarian in the information they consume, and if there is bias in the data, then big data simply provides more options to delude yourself. To address this bias, he recommends assessing who is collecting the data and understanding its purpose, among other things. Denise Bedford, Goodyear Professor of Information Architecture and Knowledge Management at Kent State University, further clarified that we need ways to validate social media data in order to adequately assess the risks associated with any conclusions drawn from it.
Baseline Function #3 – Analysis: How can big data analysis help us think differently about what we do?
In analyzing big data, we need to be mindful of how we can think differently about problems and the actions we can take, asking questions like: What actions could we change if we had immediate use of the data? How would that lead to greater efficiencies or preventative measures? Social media, a popular topic at the event, holds value but needs to be grounded in the context of the issues, and analyzing it isn’t simply a matter of matching terms; language, as Denise Bedford pointed out, is much richer than ten to twenty keywords. Dave Wennergren, Assistant Deputy Chief Management Officer, Department of Defense, cautioned the audience not to get “too liquored up by the system” and to focus instead on the outcome or action. In his keynote, Nate Silver offered guidance to those looking to analyze big data: big data may change how you communicate uncertainty and probabilities, there can be a tendency to overfit models, and, above all, you should “try and err” in your analysis process. Resounding loud and clear throughout the day was the directive to critically examine which decisions would be better if they were made faster, and then frame analysis questions to improve those decisions.
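Silver’s overfitting caution is easy to see in miniature. The toy sketch below is our own illustration rather than anything from the keynote: it fits a simple line and a needlessly flexible polynomial to the same noisy linear signal, and the flexible model typically scores better on the data it was fit to while doing worse on held-out points.

```python
# Toy illustration of overfitting: a degree-9 polynomial chases noise in
# the training points and typically does worse on held-out data than a
# straight line, even though the true signal here is linear.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = 2 * x + rng.normal(scale=0.3, size=x.size)  # linear signal + noise

train, test = slice(0, None, 2), slice(1, None, 2)  # interleaved split
for degree in (1, 9):
    coeffs = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coeffs, x)
    train_mse = np.mean((pred[train] - y[train]) ** 2)
    test_mse = np.mean((pred[test] - y[test]) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```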
Baseline Function #4 – Portrayal: How can we best absorb the insight provided by big data?
Recognizing that if you “expose the data, it will be used in ways not previously conceived”, as Dave Wennergren put it, different types of presentation are required at different stages of analysis. The need for flexibility in the presentation layer used to examine big data was explored in a demonstration of Visual Analytics by Randy Guard, Vice President of Product Management at SAS. Carter Hewgley, Director of FEMAStat at FEMA (the Federal Emergency Management Agency), described how high-impact visualization is very important to analysis and evaluation in pre-event situations, whereas post-event the emphasis shifts to reports and statistics that assess effectiveness in addressing the needs of the communities assisted.
Big data perhaps ultimately provides us with the opportunity for reinvention. Juan Zarate described how he and his team redefined the Treasury’s role to provide unique value to Homeland Security investigations. So too does big data give every part of government the opportunity to assess its unique contribution to better government. Policy makers and decision makers are looking to big data analytics to help determine the best possible options, or at least the right ones, for their decisions.
Will you comment here with your baseline functions for big data?