SAS Master Data Management: a smarter approach to a unified view

Sometimes you have to get small to win big. SAS Data Management breaks solution capabilities into smaller chunks – and deploys services as needed – to help customers reduce their total cost of ownership.

SAS Master Data Management (MDM) is also a pioneer in "phased MDM." It's built on top of a data quality and data governance platform, resulting in reduced services integration costs and faster time to value. Customers can quickly make changes to existing processes, add new ones or easily integrate the technology with other solutions.

The latest release of SAS MDM adds pervasive data governance, improved usability and streamlined enterprise integration and performance to the mix.

  1. Pervasive data governance
    MDM incorporates workflow-based remediation, which helps data stewards and data quality and MDM administrators route and resolve data issues from within a common UI, in collaboration with a larger team. For example, if a customer service representative sees that a customer is 144 years old in the database, it's pretty clear there's a data quality issue. That issue can be surfaced and routed to the right person to verify and correct the birthdate (a conceptual sketch of this kind of rule-based check follows below).

    SAS MDM also provides a unified, role-based web UI called the Data Management Console to increase business user engagement and foster better alignment between business and IT users. In addition to workflow, deep entity linking allows an external application to bring up the Data Management Console with a specific customer pre-loaded onto the screen. A data steward could use this ability to flag a specific entity for an MDM administrator's inspection by emailing a link to that exact URL. It's the difference between saying "Google it" when explaining a concept to someone versus sending them a direct link to the concept in question. Read More »
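To make the remediation idea concrete, here is a minimal sketch of the kind of rule that could surface such an issue and route it for review. This is a conceptual illustration only, not SAS MDM's API; the function, queue and threshold are invented for the example.

```python
from datetime import date

# Hypothetical remediation queue standing in for a workflow engine.
remediation_queue = []

def check_birthdate(customer_id, birthdate, max_plausible_age=120):
    """Flag customers whose computed age is implausible and route them for review."""
    age = (date.today() - birthdate).days // 365
    if age > max_plausible_age or age < 0:
        remediation_queue.append({
            "customer_id": customer_id,
            "issue": f"implausible age: {age}",
            "route_to": "data steward",   # placeholder for workflow routing
        })
        return False
    return True

# A customer recorded as 144 years old trips the rule and lands in the queue.
check_birthdate("cust-001", date(1880, 1, 1))
print(remediation_queue)
```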

CIOs, business-IT collaboration and the journey from information to influence

Cats and dogs. Hatfields and McCoys. Business and IT. Sometimes you just need a couple of names. Great rivalries need no further explanation.

At SAS Executive Conference 2014, Jill Dyché, vice president of best practices at SAS, led a panel exploring the way that business leaders and their IT counterparts interact in today’s business. The main theme: CIOs are adapting – or need to adapt – to the current state of business. Or the business will pass them by.

Dyché’s panel included H. James Dallas, a former CIO at Medtronic and Georgia Pacific; Mary Turner, president, Canadian Tire Bank and COO, Canadian Tire Financial Services; and Peter Moore, a technology consultant with Wild Oak Consulting. The group covered topics ranging from how the reporting structure of CIOs can affect their roles to the ways that IT can drive innovation in companies. Read More »


Are you making data quality a design task?

Ask any battle-hardened data quality practitioner and they will tell you that one of the leading causes of data quality defects is an inability to design quality into information systems. I am going to take a specific example of bad system design to explain how data defects quickly become a reality.

Earlier in my career I was asked to migrate data into a system that was built on an abstract data model design.

Most businesses have customers, employees, suppliers, partners and many other types of parties that interact with their business. All of this information needs to be modelled in a system somewhere. To get around the problem of needing multiple entities (or tables) to store this data, a lot of companies opt for an abstract modelling approach. In this design, the identifying information for multiple entity types is stored in one table, and each fact, or attribute, lives in related fact tables that reference it via foreign keys.

This concept of modelling provides incredible flexibility when designing new systems. It allows you to add new entities relatively easily because you don’t typically need to create new tables.

When implemented correctly, this kind of approach brings data quality benefits. For example, if all of your address information is located in one table, it is easy to standardise your data validation rules in one place. Abstract models also make it easier to master your core subject data centrally, which has obvious data quality benefits.
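To make the abstract approach concrete, here is a minimal sketch of such a "party" model using Python's built-in sqlite3 module. The table and column names are illustrative assumptions, not the schema from the system described above: one party table identifies customers, employees and suppliers alike, and attributes such as addresses live in separate tables linked back by foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table identifies every kind of party (customer, employee, supplier ...),
# distinguished only by a party_type column -- no new table per entity type.
cur.execute("""
    CREATE TABLE party (
        party_id   INTEGER PRIMARY KEY,
        party_type TEXT NOT NULL,      -- 'CUSTOMER', 'EMPLOYEE', 'SUPPLIER', ...
        name       TEXT NOT NULL
    )""")

# Attributes such as addresses sit in their own table and point back to the
# party via a foreign key, so validation rules can be standardised in one place.
cur.execute("""
    CREATE TABLE party_address (
        address_id INTEGER PRIMARY KEY,
        party_id   INTEGER NOT NULL REFERENCES party(party_id),
        line1      TEXT NOT NULL,
        city       TEXT NOT NULL,
        postcode   TEXT NOT NULL
    )""")

# Adding a brand-new entity type ("PARTNER") needs no DDL change at all.
cur.execute("INSERT INTO party (party_type, name) VALUES (?, ?)",
            ("PARTNER", "Acme Logistics"))
cur.execute("INSERT INTO party_address (party_id, line1, city, postcode) VALUES (?, ?, ?, ?)",
            (cur.lastrowid, "1 Example Way", "Springfield", "12345"))
conn.commit()

print(cur.execute("SELECT party_type, name FROM party").fetchall())
```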

So what is the problem? Read More »


Putting new life into “old” data

In the big data era, I hear a lot about new and dynamic data sources that are giving companies a wide range of opportunities – and anxiety. When industry wonks talk about the “speeds and feeds” inherent in big data, they are often talking about an avalanche of transactional or external data that is new to the enterprise.

But what about the old stuff? The massive amounts of information that your organization has collected over time? That’s part of your data management equation, and it’s one that’s been at the forefront for years.

At SAS Global Forum, several data management presentations featured organizations migrating existing data systems to newer platforms. For example, the US Census Bureau deals with historical information – dating back to the origins of the United States!

The Census Bureau is currently modernizing its Standard Economic Processing System (StEPS), a system designed to process data from more than 100 economic surveys. Started in 1995, StEPS once encompassed 16 different data collection systems. While having a common processing point for economic information was a big step forward, it was difficult to make broad changes across those different platforms.

To make it easier to adapt, the Census Bureau decided to move to StEPS II, a next-generation version of the architecture. The goal: a single processing system that is more scalable and allows the team to support both batch and real-time analytics. The team used the SAS programming language to build the original StEPS and chose a service-oriented architecture for the new framework. Read More »


The difference between traditional statisticians and data scientists

It's hard to imagine a hotter job right now than the data scientist. Supply trails demand and, as a result, there's no shortage of myths around the role.

But is there any real difference between traditional statisticians and what we now call data scientists?

I asked my friend Melinda Thielbar, a research statistician developer at JMP (a SAS company). Her first answer to the question was "Yes, about $10,000 in salary." No argument from me there, but I probed deeper and asked, "Would it be fair to say that the former have typically worked with existing datasets while the latter have been more involved with the retrieval, analysis and generation of (usable) data?"

I found her response fascinating:

We started as statisticians. Then, we became "data miners." Now, it's "data scientists." Just like toothpaste now comes in about 14 different flavors, even though its main function is to clean teeth. Read More »


Big data hubris

While big data is rife with potential, as Larry Greenemeier explained in his recent Scientific American blog post Why Big Data Isn’t Necessarily Better Data, context is often lacking when data is pulled from disparate sources, leading to questionable conclusions. His blog post examined the difficulties that Google Flu Trends (GFT) has experienced while attempting to accurately provide real-time monitoring of influenza cases worldwide based on Google searches that matched terms for flu-related activity.

Greenemeier cited university researchers who posited that one contributing factor in GFT's mistakes is big data hubris, which they explained is the "often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis." The mistake of many big data projects, the researchers noted, is that they are not based on technology designed to produce valid and reliable data amenable to scientific analysis. The data comes from sources such as smartphones, search results and social networks rather than from carefully vetted participants and scientific instruments. Read More »


Virtualizing the master data replicas

Last time we discussed two different models for syndicating master data. One model was replicating copies of the master data and pushing them out to the consuming applications, while the other was creating a virtual layer on top of the master data in its repository and funneling access through a data virtualization framework.

The benefit of the replication model is that it can scale to meet the performance needs of all the downstream consumers, at the risk of introducing asynchrony and inconsistency. The benefit of the virtualization approach is synchronization and consistency, but at the risk of creating a data access bottleneck. Either may be satisfactory for certain types of applications, but neither is optimal for all applications.

There is, however, a hybrid approach that blends the two: selectively replicating the master repository, maintaining a consistent view via change data capture, and enabling federated access via a virtualization layer on top of the replicas. In this approach, the repository can be replicated to one or more high-performance platforms (such as a Hadoop cluster), with each instance intended to support a limited number of simultaneous client applications. Read More »
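As a rough illustration of that hybrid pattern (a sketch, not the behaviour of any particular MDM product; the class and method names are invented), the code below keeps two in-memory replicas in sync by pushing change data capture events from the master, while a thin virtualization layer federates reads across the replicas.

```python
import itertools

class MasterRepository:
    """Source of truth; every write emits a change-data-capture (CDC) event."""
    def __init__(self):
        self.records = {}
        self.subscribers = []

    def upsert(self, key, value):
        self.records[key] = value
        event = ("upsert", key, value)
        for replica in self.subscribers:      # push the CDC event to every replica
            replica.apply(event)

class Replica:
    """Read-only copy kept consistent by replaying CDC events."""
    def __init__(self):
        self.records = {}

    def apply(self, event):
        op, key, value = event
        if op == "upsert":
            self.records[key] = value

    def get(self, key):
        return self.records.get(key)

class VirtualizationLayer:
    """Federates reads across replicas, here with simple round-robin routing."""
    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)

    def get(self, key):
        return next(self._cycle).get(key)

# Wire up one master, two replicas and a federated access point.
master = MasterRepository()
replicas = [Replica(), Replica()]
master.subscribers.extend(replicas)
virtual = VirtualizationLayer(replicas)

master.upsert("cust-001", {"name": "Jane Doe", "segment": "retail"})
print(virtual.get("cust-001"))   # served from a replica, consistent with the master
```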


Re-thinking issue management

For most organisations, issue management is seen as an administrative chore. Scattered across the organisation, data workers diligently resolve issues, often via their own local issue management process.

With silos of data come silos of maintenance, and this is a real shame, because the data these systems possess is a vital tool in your data quality armoury – but only if you can pool this resource.

Another problem with conventional data defect management is that it is often a wasteful activity. I once met a diligent data analyst who worked tirelessly for several months fixing data issues in a data warehouse, only for the same issues to be repeated week after week. Such is the nature of upstream data defects. They will continue to flow downstream unless someone sees the bigger picture.

Data quality leaders need to have visibility of this big picture. They need to understand the various issue management systems in their scope of control and develop a transitional strategy to get them onto the data quality radar. Whilst the data quality team may not be on the front-line for issue management, they still need to have visibility of the metadata surrounding these issues because it can help them dramatically improve the impact that data quality management has on the organisation. Read More »
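As a loose sketch of what pooling that issue metadata could look like (the sources, field names and records below are hypothetical), this example consolidates issue records from separate local trackers and counts recurring root causes, the kind of signal that tells a data quality team where an upstream fix would pay off.

```python
from collections import Counter

# Hypothetical issue records exported from separate, local issue trackers.
warehouse_issues = [
    {"source": "warehouse", "root_cause": "missing postcode", "status": "fixed"},
    {"source": "warehouse", "root_cause": "missing postcode", "status": "fixed"},
]
crm_issues = [
    {"source": "crm", "root_cause": "duplicate customer", "status": "open"},
    {"source": "crm", "root_cause": "missing postcode", "status": "open"},
]

# Pool the silos into one list so recurring defects become visible.
all_issues = warehouse_issues + crm_issues
recurring = Counter(issue["root_cause"] for issue in all_issues)

# Root causes that keep reappearing are candidates for an upstream fix
# rather than repeated downstream clean-up.
for root_cause, count in recurring.most_common():
    if count > 1:
        print(f"{root_cause}: reported {count} times across systems")
```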


Data metavisualization

How does one select the "right" or "best" way to visually represent data? Of course, the short answer is: it depends. (In fact, that theoretically best way might not even exist.)

Beyond that, it's an interesting question and, as I argue in The Visual Organization, a harder one to answer these days for two reasons. First, there's just so much more data flying around today. Second, and in a related vein, there are more ways to represent this information.

Against that backdrop, wouldn't it be useful to consult a dataviz guide? A visualization of the different types of visualizations available to us (read: a metavisualization)?

It turns out that such a thing does, in fact, exist. The Data Visualization Catalogue is an ongoing project developed by Severino Ribecca. From the site:

Originally, this project was a way for me to develop my own knowledge of data visualization and create a reference tool for me to use in the future for my own work. However, I thought it would also be a useful tool to not only other designers, but also anyone in a field that requires the use of data visualization regularly (economists, scientists, statisticians etc). Read More »


What magic teaches us about data science

Teller, the normally silent half of the magician duo Penn & Teller, revealed some of magic’s secrets in a Smithsonian Magazine article about how magicians manipulate the human mind. Given the big data-fueled potential of data science to manipulate our decision-making, we should listen to what Teller has to tell us.

“Magicians,” Teller explained, “have done controlled testing in human perception for thousands of years. Magic is not really about the mechanics of your senses. Magic is about understanding — and then manipulating — how viewers digest the sensory information.”

In his article, Teller explains seven principles that magicians employ to alter our perceptions. The first principle is pattern recognition. I have previously compared its role in data-driven decision-making to how we listen to music. We search for any pattern in data relevant to our decision that allows us to discover a potential source of insight. Once our brain finds a decision pattern, we start making predictions and imagining what data will come next. But sometimes the music of the data is the sound of pattern recognition directing our search for decision consonance among data dissonance toward comforting, but false, conclusions. Read More »
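To illustrate how pattern recognition can steer us toward comforting but false conclusions, here is a small self-contained example (mine, not Teller's): search enough unrelated random series for a relationship with a target and one of them will almost always look convincingly correlated, even though every series is pure noise.

```python
import random

random.seed(7)

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A "target" series and 200 candidate predictors, all pure random noise.
target = [random.gauss(0, 1) for _ in range(30)]
candidates = [[random.gauss(0, 1) for _ in range(30)] for _ in range(200)]

# Search widely enough and pattern recognition will "find" something.
best = max(abs(correlation(c, target)) for c in candidates)
print(f"Strongest correlation found in pure noise: {best:.2f}")
```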
