Creating analytics? Don't forget a supplier SLA policy.

During a data quality assessment, one of my clients discovered that a large chunk of data that ultimately fed into their business analytics engine was sourced externally. After examining the contracts surrounding this data, I found that none of it carried service-level agreements (SLAs) for the quality of information expected.

I see this problem time after time in organisations of all sizes. Data just isn’t factored into service-level contracts. As a result, any defects get passed down the line as service issues, unplanned administrative costs and decreased profitability.
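A data SLA only has teeth if it is enforced automatically at the point of intake. As a minimal sketch, here is what a per-feed quality gate might look like; the field names and thresholds are hypothetical examples, not terms from any actual supplier contract:

```python
# Minimal sketch of automated SLA checks on an incoming supplier feed.
# The required fields and the 2% missing-value threshold are hypothetical.

def check_sla(records, max_null_rate=0.02, required_fields=("id", "price")):
    """Return (passed, violations) for a batch of supplier records."""
    violations = []
    total = len(records)
    if total == 0:
        return False, ["empty feed"]
    for field in required_fields:
        nulls = sum(1 for r in records if r.get(field) in (None, ""))
        rate = nulls / total
        if rate > max_null_rate:
            violations.append(
                f"{field}: {rate:.1%} missing exceeds {max_null_rate:.0%} SLA"
            )
    return not violations, violations

feed = [
    {"id": 1, "price": 9.99},
    {"id": 2, "price": None},   # missing price breaches the quality SLA
    {"id": 3, "price": 4.50},
]
ok, issues = check_sla(feed)
```

A gate like this turns a vague contractual expectation into a measurable pass/fail result that can be attached to each delivery.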

What if you operated a restaurant but failed to hold your food suppliers to adequate service levels for timeliness or quality of produce? Your business wouldn’t last long.


Identity, identification and the proliferation of identifiers

I was surprised to learn recently that despite the reams of laws and policies directing the protection of personally identifiable information (PII) across industries and government agencies, more than 50 million Medicare beneficiaries were issued cards with a Medicare Beneficiary Number that's based on their Social Security Number (SSN). That's right – the same SSN that's the key artifact required for identity theft is used as part of millions of existing Medicare Health Insurance Claim Numbers (HICNs). This poses a significant threat of PII exposure.


Managing data where it lives

Historically, before data was managed it was moved to a central location. For a long time that central location was the staging area for an enterprise data warehouse (EDW). While EDWs and their staging areas are still in use – especially for structured, transactional and internally generated data – big data has given rise to another central location, namely the data lake. As a storage repository for vast amounts of raw data in its native format, a data lake often contains data that is semistructured or unstructured, nontransactional or event-driven, and externally generated.

Centralizing data before it’s managed has its advantages. The main benefit is that there’s only one place the enterprise has to build and maintain the processes for cleansing, deduplicating, transforming and structuring data. Of course, this approach is based on the assumption that most of the data business users consume will be sourced from this data management hub. But these days business users have access to an abundance of alternative data sources, both within and outside of the enterprise.
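The cleansing and deduplication work a central hub performs can be sketched in a few lines. This is an illustrative toy, not the method of any particular product, and the field names and standardization rules are made up for the example:

```python
# Sketch of the cleansing and deduplication a central data hub performs.
# Field names and standardization rules are illustrative only.

def cleanse(record):
    """Standardize values so duplicate records become comparable."""
    return {
        "name": record["name"].strip().lower(),
        "email": record["email"].strip().lower(),
    }

def deduplicate(records):
    """Keep the first occurrence of each cleansed (name, email) pair."""
    seen, unique = set(), []
    for r in map(cleanse, records):
        key = (r["name"], r["email"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

raw = [
    {"name": "Ada Lovelace ", "email": "ADA@example.com"},
    {"name": "ada lovelace", "email": "ada@example.com "},
]
clean = deduplicate(raw)
```

The point of centralization is that rules like these are written and maintained once, rather than reimplemented by every consuming team.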


Is it wise to crowdsource data governance?

Data governance doesn't lack for definitions and never has. Here is one that's as good as any:

The process of creating and agreeing to standards and requirements for the collection, identification, storage and use of data.


How SAS supports the four pillars of a data quality initiative

Data quality initiatives challenge organizations because the discipline encompasses so many issues, approaches and tools. Across the board, there are four main activity areas – or pillars – that underlie any successful data quality initiative. Let’s look at what each pillar means, then consider the benefits SAS Data Management brings to each.


How can data privacy and protection help drive better analytics?

Balance. This is the challenge facing any organisation wishing to exploit their customer data in the digital age.

On one side we have the potential for a massive explosion of customer data. We can collect real-time social media data, machine data, behavioural data and of course our traditional master and transactional customer data. By combining the growth of data sources with the accessibility of big data processing, we can now exploit more and more opportunities to extract value from customer information.

But this has to be tempered by the other side of the equation – the growing demand for greater data protection and privacy controls. Take the EU General Data Protection Regulation (GDPR) as an example.


Harmonizing semantics for consistency in interpreting analytical results

One aspect of high-quality information is consistency. We often think about consistency in terms of consistent values. A large portion of the effort expended on “data quality dimensions” essentially focuses on data value consistency. For example, when we describe accuracy, what we often mean is consistency with a defined source of record.

You might say that consistency of data values provides a single-dimensional perspective on data usability. On the one hand, consistent values enable consistency in query results – we always want to make sure that our JOINs are executing correctly. Yet as more diverse data sets are added to the mix, we continue to see scenarios where the issue of consistency extends way beyond the realm of the values themselves.
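The JOIN point is easy to see concretely: when key values are inconsistent, matching rows silently drop out of the result rather than raising an error. A small illustration, with made-up tables and values:

```python
# Illustration of how inconsistent key values silently break joins.
# The tables and customer IDs below are made up for the example.

orders = [("CUST-001", 250.0), ("cust-001", 75.0)]  # same customer, inconsistent IDs
customers = {"CUST-001": "Acme Corp"}

# Naive join: the lowercase variant silently falls out of the result.
naive = [(cid, amt, customers[cid]) for cid, amt in orders if cid in customers]

# After standardizing the key values, the join sees both rows.
standardized = [(cid.upper(), amt) for cid, amt in orders]
consistent = [
    (cid, amt, customers[cid]) for cid, amt in standardized if cid in customers
]
```

Nothing fails loudly in the naive case; the analysis is simply wrong, which is why value consistency has to be established before query results can be trusted.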


The growing importance of big data quality

Our world is now so awash in data that many organizations have an embarrassment of riches when it comes to available data to support operational, tactical and strategic activities of the enterprise. Such a data-rich environment is highly susceptible to poor-quality data. This is especially true when swimming in data lakes – the increasingly popular (and, arguably, increasingly necessary) storage repositories that hold a vast amount of raw data in its native format, including structured, semistructured and unstructured data. In data lakes, of course, data structures and business requirements do not have to be defined until the data is needed.

As the TDWI Best Practices Report Improving Data Preparation for Business Analytics explains, a key reason why organizations are creating data lakes is simply to make more data available for analytics, even if consistency and data quality are uncertain. The report noted that Hadoop is playing an important role in this data availability.
