In the big data era, I hear a lot about new and dynamic data sources that are giving companies a wide range of opportunities – and anxiety. When industry wonks talk about the “speeds and feeds” inherent in big data, they are often talking about an avalanche of transactional or external data that is new to the enterprise.
But what about the old stuff? The massive amounts of information that your organization has collected over time? That legacy data is part of your data management equation too – and it's a challenge that has been building for years.
At SAS Global Forum, there were several data management presentations featuring organizations that are migrating existing data systems to newer platforms. For example, the US Census Bureau deals with historical information – dating back to the origins of the United States!
The Census Bureau is currently modernizing its Standard Economic Processing System (StEPS), a system designed to process data from more than 100 economic surveys. Started in 1995, StEPS once encompassed 16 different data collection systems. While it was a big step forward to have a central processing point for economic information, it was difficult to make broad changes across those different platforms.
To make the system easier to adapt, the Census Bureau decided to move to StEPS II, a next-generation version of the architecture. The goal: a single processing system that is more scalable and allows the team to support both batch and real-time analytics. The team had built the original StEPS in the SAS programming language, and it chose a service-oriented architecture for the new framework.
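To make the batch-plus-real-time idea concrete, here is a minimal, purely illustrative sketch – in Python, with invented function and field names, not the Census Bureau's actual SAS code – of the service-oriented pattern: one shared editing routine that both a batch job and a lightweight real-time endpoint can call.

```python
# Illustrative sketch only: hypothetical names, not the Census Bureau's actual code.
# Shows the general service-oriented idea: one shared processing routine that can be
# called from a batch job or exposed as a lightweight service for real-time requests.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def edit_survey_response(record: dict) -> dict:
    """Apply a simple edit rule to one survey record (hypothetical rule)."""
    record = dict(record)
    # Flag negative revenue values for analyst review instead of failing the whole run.
    record["needs_review"] = record.get("revenue", 0) < 0
    return record

def run_batch(records: list[dict]) -> list[dict]:
    """Batch mode: process a whole file of responses at once."""
    return [edit_survey_response(r) for r in records]

class EditService(BaseHTTPRequestHandler):
    """Real-time mode: the same routine behind a minimal HTTP endpoint."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        result = edit_survey_response(json.loads(body))
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    print(run_batch([{"survey": "retail", "revenue": -5}]))
    # HTTPServer(("localhost", 8080), EditService).serve_forever()  # real-time mode
```

The point is the reuse: when the edit logic lives in one place, batch loads and on-demand requests stay consistent, which is what makes broad changes easier to roll out.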
“Because we had everything written in SAS, we were able to build things ourselves,” said Scott Ankers of the US Census Bureau. “We wanted better data harmonization across the board. Luckily, we had the foundation in place. Now, when people want to do a survey, they have the tools they need.”
A continent away, another data management project is underway to adapt a data structure to modern demands. Brazil will face an interesting population shift over the next few decades. According to the government, 7 percent of the Brazilian population was over 65 in 2013; by 2060, that figure is projected to reach 26 percent.
This generational shift will cause a ripple effect through the years. The current youth of Brazil will put a strain on several government systems, especially as this generation goes through different phases – employment changes, parenthood, retirement and the use of social services.
As a result, Brazil's Dataprev turned to SAS Master Data Management (MDM) technology to create a master view of the citizen. Dataprev focuses on social programs such as social security, employment information and passports. This information was once managed in regional units, but to make more effective decisions, Dataprev began centralizing it.
The first two databases covered social security information as well as labor and employment data. The team established business rules, defined a data model, then profiled and loaded the data into SAS MDM. Using clustering and data de-duplication capabilities, the team can select a golden record, giving a more accurate view of each citizen across different systems.
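To illustrate the de-duplication and golden-record idea in general terms, here is a hypothetical Python sketch with made-up field names and survivorship rules – not Dataprev's actual SAS MDM configuration: cluster records that share an identifier, then pick the best value for each field.

```python
# Illustrative sketch only: hypothetical fields and rules, not Dataprev's SAS MDM setup.
# Shows the general idea behind de-duplication and "golden record" selection:
# cluster records that refer to the same citizen, then pick the best value per field.
from collections import defaultdict

def cluster_key(record: dict) -> tuple:
    """Match records on normalized identifiers (real MDM adds fuzzy matching)."""
    return (record["national_id"].strip(), record["birth_date"])

def pick_golden(records: list[dict]) -> dict:
    """Survivorship rule: prefer the most recently updated, most complete record."""
    ranked = sorted(
        records,
        key=lambda r: (r["last_updated"], sum(v is not None for v in r.values())),
        reverse=True,
    )
    golden = dict(ranked[0])
    # Fill any gaps in the winning record from the others in the cluster.
    for other in ranked[1:]:
        for field, value in other.items():
            if golden.get(field) is None and value is not None:
                golden[field] = value
    return golden

def build_master_view(sources: list[dict]) -> list[dict]:
    clusters = defaultdict(list)
    for record in sources:
        clusters[cluster_key(record)].append(record)
    return [pick_golden(group) for group in clusters.values()]

# Example: the same citizen appears in social security and labor databases.
records = [
    {"national_id": "123", "birth_date": "1980-01-01", "name": "Ana Souza",
     "employer": None, "last_updated": "2014-05-01"},
    {"national_id": "123 ", "birth_date": "1980-01-01", "name": "Ana S.",
     "employer": "Acme", "last_updated": "2013-11-20"},
]
print(build_master_view(records))
```

Production MDM tools add fuzzy matching and configurable survivorship rules, but the end product is the same: one trusted record per citizen, assembled from many overlapping sources.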
Both the Census Bureau and Dataprev have decades of information – and plenty of new demands on the horizon. Their data management challenge is similar to what many organizations face. But by finding ways to modernize their current environments, they are setting themselves up for big things in the future.