If I have one gripe about technology today, it's the seemingly universal belief that it creates social and business issues for the first time. Sure, Twitter may be a relatively new invention, and only recently have people been fired for tweeting really inappropriate jokes. Still, if you think that losing your job for publicly saying something objectionable is new, think again.
And the same holds true with respect to data quality. Yes, contemporary cloud computing arrived fairly recently, although its roots in grid computing go back decades. Make no mistake, though: the notion that duplicate, erroneous, invalid and incomplete information harms a business is hardly new. It predates today's rampant technology, big data and even the modern-day computer. Bum handwritten general-ledger entries caused problems centuries ago.
This raises the question: What are the data quality risks specific to cloud computing? In a nutshell, I see three.
Integration
In the traditional on-premises world, data is often integrated and stored through a very controlled series of periodic ETL batch jobs. For the most part, that data is internal to the enterprise, although exceptions certainly exist, such as interfaces to banks, insurance carriers and the like. In an era of cloud computing, though, data flows much more frequently and quickly, often via APIs. That data is often external to the enterprise. Examples include feeds from Twitter, LinkedIn, Google Maps, etc.
With much less control over the data, organizations could certainly see "their" data quality take a hit. What's more, whatever company created the API can do with it what it wishes. Don't be surprised if data elements all of a sudden stop appearing in a feed – or now come with a price. Case in point: Twitter.
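One way to notice when a feed quietly drops a data element is to audit each batch against the fields you expect. What follows is only a minimal sketch in Python: the field names in `EXPECTED_FIELDS`, the sample records, and the helper names `missing_fields` and `audit_feed` are all hypothetical and do not reflect any real provider's API.

```python
# Hypothetical field names; a real feed defines its own schema.
EXPECTED_FIELDS = {"id", "text", "created_at", "user_location"}


def missing_fields(record, expected=EXPECTED_FIELDS):
    """Return the expected fields that are absent or empty in one record."""
    return sorted(
        name for name in expected
        if record.get(name) in (None, "")
    )


def audit_feed(records, expected=EXPECTED_FIELDS):
    """Count how often each expected field is missing across a batch."""
    counts = {name: 0 for name in expected}
    for record in records:
        for name in missing_fields(record, expected):
            counts[name] += 1
    return counts


# Made-up sample batch: the second record has lost its location element.
batch = [
    {"id": 1, "text": "hello", "created_at": "2013-05-01", "user_location": "NJ"},
    {"id": 2, "text": "world", "created_at": "2013-05-02"},
]

report = audit_feed(batch)
for field, count in sorted(report.items()):
    print(f"{field}: missing in {count} of {len(batch)} records")
```

A count that jumps from zero to "every record" overnight is a decent hint that the provider changed the feed rather than that your own data went bad.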
Opportunity for error
To be sure, definitions of cloud computing certainly vary, a point that I make in Too Big to Ignore. Suffice it to say that it enables people to get to their apps wherever and whenever they like. That is, they no longer have to schlep to the office to pay an invoice, check e-mail, cut an employee check or enter a lead in a CRM application.
Cloud computing introduces new types and sources of potential data quality errors. For one, it's easier to fat-finger an entry on a mobile device's keyboard than on a proper QWERTY keyboard. And employees, vendors, partners and users typically enter data via different devices, apps and even different operating systems. Sure, thanks to massive advances in web services, organizations can overcome these challenges. But it's folly to think that all data today stems from the same place. This is not 1998.
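To make that concrete, here is a rough Python sketch of how the same lead, typed once on a phone and once on a desktop, might be reconciled before it becomes a duplicate. The record layout, the `email` and `phone` keys, and the function names are assumptions for illustration only. Lowercasing e-mail addresses and stripping phone numbers down to digits are simplifications, not complete matching rules.

```python
import re


def normalize_email(raw):
    """Trim stray whitespace and ignore case (a simplifying assumption)."""
    return raw.strip().lower()


def normalize_phone(raw):
    """Keep digits only, so '(555) 010-1234' and '555.010.1234' match."""
    return re.sub(r"\D", "", raw)


def dedupe_leads(leads):
    """Keep the first lead seen for each normalized (email, phone) pair."""
    seen = set()
    unique = []
    for lead in leads:
        key = (normalize_email(lead["email"]), normalize_phone(lead["phone"]))
        if key not in seen:
            seen.add(key)
            unique.append(lead)
    return unique


# Made-up entries: the same person keyed in on two different devices.
leads = [
    {"email": " Jane@Example.com", "phone": "(555) 010-1234"},
    {"email": "jane@example.com", "phone": "555.010.1234"},
    {"email": "raj@example.com", "phone": "555 010 9876"},
]

unique_leads = dedupe_leads(leads)
print(len(unique_leads), "unique leads out of", len(leads))
```

Normalizing at the point of entry will not catch a genuinely wrong digit, but it does remove the formatting noise that different devices and apps add on their own.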
Security
Contrary to what many people believe, there is no "one" cloud. Vendors offer different types of clouds based on an organization's budget, tenancy, security needs, and other business and technical requirements. Without going too far down the rabbit hole here, device portability increases enterprise risk. Few people have "lost" desktop computers because they stayed, you know, on desks. Laptops were certainly lost and stolen, but I suspect that we misplace our tablets and smartphones more often than their five-pound computing equivalents.
Feedback
What say you?