Is it wise to crowdsource data governance?

Data governance doesn't lack for definitions and never has. Here is one that's as good as any:

The process of creating and agreeing to standards and requirements for the collection, identification, storage and use of data.

How SAS supports the four pillars of a data quality initiative

Data quality initiatives challenge organizations because the discipline encompasses so many issues, approaches and tools. Across the board, there are four main activity areas – or pillars – that underlie any successful data quality initiative. Let’s look at what each pillar means, then consider the benefits SAS Data Management brings to each.

How can data privacy and protection help drive better analytics?

Balance. This is the challenge facing any organisation wishing to exploit their customer data in the digital age.

On one side is the potential for a massive explosion of customer data. We can collect real-time social media data, machine data, behavioural data and, of course, our traditional master and transactional customer data. By combining the growth of data sources with the accessibility of big data processing, we can now find more and more opportunities to wring value out of customer information.

But this has to be tempered by the other side of the equation – the growing demand for greater data protection and privacy controls. Take the EU General Data Protection Regulation (GDPR) as an example.

Harmonizing semantics for consistency in interpreting analytical results

One aspect of high-quality information is consistency. We often think about consistency in terms of consistent values. A large portion of the effort expended on “data quality dimensions” essentially focuses on data value consistency. For example, when we describe accuracy, what we often mean is consistency with a defined source of record.

You might say that consistency of data values provides a single-dimensional perspective on data usability. On the one hand, consistent values enable consistent query results – we always want to make sure that our JOINs are executing correctly. Yet as more diverse data sets are added to the mix, we continue to see scenarios where the issue of consistency extends well beyond the values themselves.

The growing importance of big data quality

Our world is now so awash in data that many organizations have an embarrassment of riches when it comes to data available to support the operational, tactical and strategic activities of the enterprise. Such a data-rich environment is highly susceptible to poor-quality data. This is especially true when swimming in data lakes – the increasingly popular (and, arguably, increasingly necessary) storage repositories that hold a vast amount of raw data in its native format, including structured, semistructured and unstructured data. In data lakes, of course, data structures and business requirements do not have to be defined until the data is needed.

As the TDWI Best Practices Report “Improving Data Preparation for Business Analytics” explains, a key reason organizations are creating data lakes is simply to make more data available for analytics, even if consistency and data quality are uncertain. The report noted that Hadoop is playing an important role in this data availability.

3 Thanksgiving lessons about data warehouses, Hadoop and self-service data prep

It's that time of year again when almost 50 million Americans travel home for Thanksgiving. We'll share a smorgasbord of turkey, stuffing and vegetables and discuss fun political topics, all to celebrate the ironic friendship between colonists and Native Americans. Because my family is part Italian, we augment the 20-pound turkey with pasta – more specifically, cavatelli (pronounced "cav-a-deal"). And, being part Swedish, we add Swedish meatballs to the fray.

Before the feast, these dishes have to be prepared, often using a time-tested recipe. The perfect gravy or Swedish meatball – which might have taken my nana or grandma several years of experimentation to master – is deployed and consumed in mere minutes. Later, the meal is stored away in Tupperware bins to be deployed and consumed another day.

Meals and data need to be prepared and blended while still hot.

Think of these time-tested entrees as data dishes that people across your organization are anxious to consume. Imagine business analysts, data scientists and chief data officers all sitting at the table together consuming different combinations and slices of your data. How can you serve precisely the right combinations of data to the right people while it's still hot?

One way is to use SAS Data Management, which has a number of recent updates that can help you cleanse, prepare and deploy your data faster and better than ever.

Getting back to Thanksgiving... Let's look at three lessons Thanksgiving can teach us about data warehouses, Hadoop and self-service data prep.

Data preparation: Empowering business users via self-service

There are many reasons organizations like Netflix and Amazon can glean fascinating insights into consumer behavior, but a big one is that their data is vast and eerily accurate. That is, employees don't have to spend a great deal of time scrubbing data on location, past purchases, preferences and the like. Put differently, these companies need not speculate on who their customers are, merely on what they want.

Big difference.

The “tarnished record” – Alternatives to gold for fraud analytics

We often talk about full customer data visibility and the need for a “golden record” that provides a 360-degree view of the customer to enhance our customer-facing processes. The rationale is that by accumulating all the data about a customer (or, for that matter, any entity of interest) from multiple sources, you can apply sets of rules that pick the best (perhaps “cleanest”) values from among the available data attributes and construct a golden customer record. The quality of the golden record is expected to exceed the quality of any of the individual source records.

Under the right circumstances and with appropriate controls, it might be argued that the processes associated with identity resolution, record linkage and the application of data value survivorship rules can support the provision of improved information for both operational and analytical purposes. For example, having current, accurate customer telephone information supports an operational aspect of customer support (such as automatically recognizing that a customer is calling from a telephone number associated with an account). It also supplements analytical profiles and patterns linked to location analytics used for real-time retention efforts.
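To make the idea of survivorship rules more concrete, here is a minimal Python sketch of building a golden record attribute by attribute. It assumes the source records have already been linked to the same customer (the identity resolution step is out of scope). The source names, the trust ranking and the specific rules (skip blank values, prefer the most recently updated value, break ties by source priority) are hypothetical choices for illustration, not how any particular product implements survivorship.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical trust ranking: a lower number means a more trusted source.
SOURCE_PRIORITY = {"crm": 0, "billing": 1, "web_signup": 2}


@dataclass
class SourceRecord:
    source: str       # hypothetical source-system name, e.g. "crm"
    updated: date     # when this source last updated the record
    attributes: dict  # attribute name -> value


def is_usable(value) -> bool:
    """Treat None and blank strings as missing values."""
    return value is not None and str(value).strip() != ""


def build_golden_record(records: list[SourceRecord]) -> dict:
    """Build one record from already-linked source records using
    illustrative survivorship rules, applied attribute by attribute:
    1. skip missing values,
    2. prefer the most recently updated value,
    3. break ties by source priority (unknown sources rank last)."""
    golden = {}
    names = {name for r in records for name in r.attributes}
    for name in sorted(names):
        candidates = [r for r in records if is_usable(r.attributes.get(name))]
        if not candidates:
            continue  # no source has a usable value for this attribute
        best = min(
            candidates,
            key=lambda r: (
                -r.updated.toordinal(),  # newer dates sort first
                SOURCE_PRIORITY.get(r.source, len(SOURCE_PRIORITY)),
            ),
        )
        golden[name] = best.attributes[name]
    return golden


if __name__ == "__main__":
    records = [
        SourceRecord("crm", date(2015, 3, 1),
                     {"name": "Jane Doe", "phone": "555-0100", "email": ""}),
        SourceRecord("billing", date(2016, 6, 15),
                     {"name": "J. Doe", "phone": "555-0199"}),
        SourceRecord("web_signup", date(2016, 6, 15),
                     {"email": "jane@example.com", "phone": "555-0142"}),
    ]
    print(build_golden_record(records))
    # {'email': 'jane@example.com', 'name': 'J. Doe', 'phone': '555-0199'}
```

In this toy run the recency rule picks the abbreviated "J. Doe" over the fuller "Jane Doe", and the phone tie between billing and web_signup is settled by source priority. That is one reason survivorship rules are typically tuned per attribute (for example, favoring completeness for names) rather than applied uniformly.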