Data quality and cloud computing: What are the risks?

If I have one gripe about technology today, it's the seemingly universal belief that it raises social and business issues for the first time. Sure, Twitter is a relatively recent arrival, and only recently have people been fired for tweeting really inappropriate jokes. Still, if you think that losing your job for publicly saying something objectionable is new, think again.

And the same holds true with respect to data quality. Yes, contemporary cloud computing arrived fairly recently, although its roots in grid computing go back decades. Make no mistake, though: the notion that duplicate, erroneous, invalid and incomplete information harms a business is hardly new. It predates today's rampant technology, big data and even the modern-day computer. Bum handwritten general-ledger entries caused problems centuries ago.

This raises the question: What are the data quality risks specific to cloud computing? In a nutshell, I see three. Read More »


Health insurance, healthcare and pharmaceutical perspectives on data quality – Part 1

Does it upset you when you log onto your health insurance portal and find that your name is misspelled, your dependents are listed incorrectly or your address is wrong? Well, it's definitely not a warm fuzzy feeling for me! After working for many years in the healthcare, pharmaceutical and insurance fields, my perspective is a bit skewed.

So, ask yourself: Why is the data presented the way it is? The healthcare, pharmaceutical and insurance industries go through constant change. They buy one another, they buy new operational systems, and they forget that integration or migration of the data will be required. One of the tools that makes integration or migration easier is a data profiling and/or data quality tool. These tools streamline the joins of pertinent data across source systems as we analyze it for integrity and quality. Read More »
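To make the profiling idea concrete, here is a minimal sketch of a cross-system comparison, assuming two hypothetical member extracts keyed by a shared `member_id`; the field names and systems are illustrative, not from any specific profiling tool.

```python
def profile_members(system_a, system_b):
    """Join member records from two source systems and flag quality issues."""
    a_by_id = {rec["member_id"]: rec for rec in system_a}
    issues = []
    for rec in system_b:
        mid = rec["member_id"]
        match = a_by_id.get(mid)
        if match is None:
            # Record exists in one system but not the other.
            issues.append((mid, "missing in system A"))
        elif match["name"] != rec["name"]:
            issues.append((mid, "name mismatch"))
        elif match["address"] != rec["address"]:
            issues.append((mid, "address mismatch"))
    return issues
```

A real profiling tool does far more (pattern analysis, frequency counts, fuzzy matching), but even a simple join like this surfaces the misspelled names and wrong addresses the post describes.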


"Real MDM" and the quest for long-term data quality improvement

I'm frequently asked: "What causes poor data quality?" There are, of course, many culprits:

  • Lack of a data culture.
  • Poor management attitude.
  • Insufficient training.
  • Incorrect reward structure.

But there is one reason that is common to all organizations – poor data architecture.

Read More »


Self-service data prep versus data quality

Many data quality issues are a result of the distance separating data from the real-world object or entity it attempts to describe. This is the case with master data, which describes parties, products, locations and assets. Customer (one of the roles within party) master data quality issues are rife with examples, especially within the data quality dimension currency (i.e., whether data is current with the real world it models). The current postal addresses, email addresses, phone numbers and preferred contact method for customers often change faster than updates can be applied – or the need for updates is even detected. Often the only way a company discovers its customer contact data is out of date is by failing in attempts to contact customers using the data currently on file.

Read More »
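A basic currency check can be sketched in a few lines, assuming each contact record carries a hypothetical `last_verified` date; the 180-day threshold is purely illustrative.

```python
from datetime import date, timedelta

def stale_contacts(contacts, today, max_age_days=180):
    """Return IDs of customers whose contact data hasn't been verified recently."""
    cutoff = today - timedelta(days=max_age_days)
    # Anything verified before the cutoff is a candidate for re-verification
    # before the company finds out the hard way (a failed contact attempt).
    return [c["customer_id"] for c in contacts if c["last_verified"] < cutoff]
```

Proactive checks like this shift currency problems from "discovered on failure" to "flagged on a schedule."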


SAS Event Stream Processing with Hortonworks – The future is now

Even the most casual observers of the IT space over the last few years are bound to have heard about Hadoop and the advantages it brings. Consider its ability to store data in virtually any format and process it in parallel. Hadoop distributors, such as Hortonworks, can also provide enterprise-level data governance and security. Hadoop certainly has been a game changer.

These days, organizations are not only seeking to gain insights on data that wasn’t available to them in the past (think unstructured data). Now they’re looking to do it in a manner and speed not possible before (think streaming data). As the Hadoop space matures with continual innovations, organizations are sure to uncover even more opportunities for gleaning insights from their data. Read More »


Five data quality lessons from Amazon

About a year ago on this site, I penned a post titled "Analytics lessons from Amazon." In it, I described the analytics lessons that employees and even entire companies can learn from the retail giant.

But there's so much more that Jeff Bezos et al. can teach us. Today, I'll focus on the data quality lessons we can glean from the largest Internet retailer.

Read More »


Big data, data standards and cross-platform integration

At a recent TDWI conference, I was strolling the exhibition floor when I noticed an interesting phenomenon. A surprising percentage of the exhibiting vendors fell into one of two product categories. One group was selling cloud-based or hosted data warehousing and/or analytics services. The other group was selling data integration products.

Of course, when you think about it, this makes a lot of sense. The economics of cloud computing have already shown their benefits with software-as-a-service products like Salesforce.com. This paradigm can significantly reduce the costs of developing and managing big data projects using tools like Hadoop, without having to pay for the necessary hardware. But as data moves off-premises, the need for internal data accessibility for in-house reporting does not go away. That means being able to integrate data wherever the data lives.

Read More »
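The integration requirement can be sketched simply: combine extracts from a hosted source and an in-house store into one view for reporting. This is a toy illustration under assumed field names (`order_id`, `status`, `amount`), not any vendor's API.

```python
def merge_sources(cloud_rows, local_rows, key="order_id"):
    """Combine rows by key so in-house reports see cloud and local data together."""
    merged = {}
    for row in cloud_rows + local_rows:
        # Later sources fill in (or overwrite) fields for the same key.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())
```

Real integration products add schema mapping, change capture and error handling, but the core job is the same: one keyed view across wherever the data lives.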


Should easy access to data change the data strategy?

In the past, we've always protected our data to create an integrated environment for reporting and analytics. And we tried to protect people from themselves when using and accessing data, which sometimes could have been considered a bottleneck in the process. We instituted guidelines and procedures around:

  • Certification of the data for enterprise reports.
  • Specific areas in the data warehouse where data could be accessed.
  • Security on the layers to protect the data.
  • Quality of the data (when possible).
  • Integrity of the data.

With the new order of data – and with enterprises using technologies like Hadoop – where does the data strategy change? Or does it?

The quest for good quality, integrated data has not gone away. It's just that if we do not thoughtfully load our new technologies with data, we could end up back in the same boat we were in at the beginning of our BI endeavors – with unrelated data that may not relate to other data very elegantly.

In the insurance industry, for example, most companies have multiple claim engines. These engines do NOT speak the same language, nor does the data map well between them. Integration is of utmost importance to create a complete view of the claim data. If we load this data without giving thought to how it will be used, it could be reported incorrectly. Claims are money, and money has to be correct in our corporation. Hence the need for thoughtful access to the data via our new technologies. Consider loading the new technology from a trusted, integrated source (i.e., a data warehouse or operational data store), then allowing the new users access to that store of data. It's still important to implement procedures and guidelines surrounding the reporting of this data to internal and external users.
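The claim-engine problem above can be sketched as a canonical mapping applied before the load: each engine's codes are translated into one shared schema, and anything unmapped is rejected rather than loaded. The engine names, status codes and target fields here are all hypothetical.

```python
# Illustrative code translations for two hypothetical claim engines.
STATUS_MAP = {
    "engine_a": {"P": "paid", "D": "denied"},
    "engine_b": {"01": "paid", "02": "denied"},
}

def to_canonical(engine, claim):
    """Translate one engine-specific claim record into the shared schema."""
    status = STATUS_MAP[engine].get(claim["status"])
    if status is None:
        # Fail loudly instead of loading an unmappable claim: claims are money.
        raise ValueError(f"unmapped status {claim['status']!r} from {engine}")
    return {"claim_id": claim["id"], "status": status, "amount": claim["amount"]}
```

Running every engine's output through a mapping like this before it reaches the new data store is one way to keep the "complete view of the claim data" trustworthy.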


Data governance is nothing without data quality

When you spend long enough writing and working in any industry, you inevitably see trends emerge and reach varying levels of maturity. Data governance is one such trend, as you can see from the following Google Trends chart:

Read More »


Top 5 data quality mistakes organizations make

There's no shortage of talk today about newfangled tools, technologies and concepts. The Internet of Things, big data, cloud computing, Hadoop, and countless other new terms, apps and trends have inundated many business folks over the last few years.

Against this often confusing backdrop, it's easy to forget the importance of basic blocking and tackling. Yes, I'm talking about good old-fashioned data quality, something that still vexes many departments, groups, organizations and industries. Without further ado, here are the five biggest data-quality mistakes that organizations routinely make.

Read More »
