
.@philsimon lists the gravest data-quality errors.
I've been doing some investigation into Apache Spark, and I'm particularly intrigued by the concept of the resilient distributed dataset, or RDD. According to the Apache Spark website, an RDD is “a fault-tolerant collection of elements that can be operated on in parallel.” Two aspects of the RDD are particularly
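To make the idea concrete, here is a minimal PySpark sketch of an RDD in action (assuming a local Spark installation and the pyspark package; the app name, master setting and sample numbers are arbitrary choices for illustration). It builds a fault-tolerant collection from a plain Python range and runs a couple of transformations on it in parallel:

```python
# Minimal PySpark sketch: create an RDD and operate on it in parallel.
# Assumes pyspark is installed; "local[*]" and "rdd-demo" are arbitrary.
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-demo")

# parallelize() turns a Python collection into a fault-tolerant RDD
numbers = sc.parallelize(range(1, 1001))

# Transformations (filter/map) are lazy; the take() action triggers execution
squares_of_evens = numbers.filter(lambda n: n % 2 == 0).map(lambda n: n * n)

print(squares_of_evens.take(5))   # e.g. [4, 16, 36, 64, 100]

sc.stop()
```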
Data quality has always been relative and variable, meaning data quality is relative to a particular business use and can vary by user. Data of sufficient quality for one business use may be insufficient for other business uses, and data considered good by one user may be considered bad by others.
I recently presented a webinar (via the IAIDQ) on the topic of 7 Habits of Effective Data Quality Leaders. To prepare, I looked back at the many interviews of leading data quality practitioners I had undertaken over the years. A common trait among all these interviews stood out – they
As I explained in Part 1 of this series, creating a strategy for the data in an organization is not a straightforward task. Two of the most important issues you'll want to address in your data strategy are data quality and big data. Data quality: There can be no data that is
"I skate to where the puck is going to be, not where it has been." - Wayne Gretzky I love this quote from Wayne Gretzky. It sums up how most organizations approach data strategy. Data strategy typically starts with a strategic plan laid down by the board. The CEO will
When my band first started and was in need of a sound system, we bought a pair of cheap yet indestructible Peavey speakers, some Radio Shack microphones and a power mixer. The result? We sounded awful and often split our eardrums from high-pitched feedback and raw, untrained vocals. It took us years
In this two-part series, which posts as the calendar turns to a new year, I revisit the top data management topics of 2015 (Part 1) and then try to predict a few of the data management trends of 2016 (Part 2). Data management in 2016: The Internet of Things (IoT) made significant
In this two-part series, which posts as the calendar prepares to turn 2015 into 2016, I revisit the top data management topics of 2015 (Part 1) and then try to predict a few of the data management trends of 2016 (Part 2). Data management in 2015: Big data continued to make
Most people have logged on to a social media site, maybe to look up an old friend, acquaintance or family member. Some people play games, or post funny pictures or other information they want to share with everyone. Do you ever ask yourself what happens with this information? What if your business wanted to purchase this information and
In 2014, big data was on everyone’s mind. So in 2015, I expected to see data quality initiatives make a major shift toward big data. But I was surprised by a completely new requirement for data quality, which proves that the world is not all about big data – not
Sometimes, when fuzzy matching names, you want to match on just a portion of the name: for example, Family Name and/or Given Name. A common mistake that people make is to feed in the Family Name and Given Name columns separately into the Match Codes node instead of
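For readers who have never worked with match codes, here is a small, self-contained Python sketch of the general idea of matching on a single name component. The simple_match_code() helper is a hypothetical stand-in for illustration only; it is not the algorithm used by the SAS Match Codes node, and the sample records are made up:

```python
# Toy illustration of fuzzy matching on the Given Name portion of a name.
# simple_match_code() is a hypothetical helper, NOT the SAS match-code algorithm.

def simple_match_code(value: str) -> str:
    """Collapse a name component into a crude comparison key:
    uppercase, keep letters only, keep the first letter, drop later vowels."""
    cleaned = "".join(ch for ch in value.upper() if ch.isalpha())
    if not cleaned:
        return ""
    return cleaned[0] + "".join(ch for ch in cleaned[1:] if ch not in "AEIOU")

records = [
    {"given": "Jonathan", "family": "Smith"},
    {"given": "Jonathon", "family": "Smyth"},
    {"given": "Maria",    "family": "Smith"},
]

# Compare records on the Given Name component only: two records "match"
# when their given-name keys are identical.
for i, a in enumerate(records):
    for b in records[i + 1:]:
        if simple_match_code(a["given"]) == simple_match_code(b["given"]):
            print(f'{a["given"]} {a["family"]}  ~  {b["given"]} {b["family"]}')
```

Running the sketch pairs "Jonathan Smith" with "Jonathon Smyth" because their given-name keys collapse to the same value, while "Maria Smith" is left unmatched.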
Confusion is one of the big challenges companies experience when defining the data governance function – particularly among the technical community. I recently came across a profile on LinkedIn for a senior data governance practitioner at an insurance firm. His profile typified this challenge. He cited his duties as: Responsible for the collection
Time. It flies. It does so whether or not you’re having fun or otherwise putting it to good use. To know where it flies, you’d need to watch. But most of us can’t make the time to watch. How we use time is important since it’s the one resource we
To prepare for the data challenges of 2015 and beyond, health care fraud, waste and abuse investigative units (government funded and commercial insurance plans, alike) need a data management infrastructure that provides access to data across programs, products and channels. This goes well beyond sorting and filtering small sets of
Jim Harris explains why it's especially important to assess the quality of metadata when it comes to big data.
Jim Harris discusses perspectives on the question of how much quality big data really needs.
Jim Harris addresses some of the most common questions and challenges big data poses for data quality.
.@philsimon on bridging the IT-business divide once and for all.
As a youngster in the 70s and 80s, Star Trek inspired my imagination and fostered a great love for science, technology and reading. (See the embedded Star Trek infographic for some interesting factoids – did you know that there were 28 crew member deaths among those wearing red shirts?) Captain Kirk and the
Data integration, on any project, can be very complex – and it requires a tremendous amount of detail. The person I would pick for my data integration team would have the following skills and characteristics: Has an enterprise perspective of data integration, data quality and extraction, transformation and load (ETL): Understands
Integrating big data into existing data management processes and programs has become something of a siren call for organizations on the odyssey to become 21st century data-driven enterprises. To help save some lost time, this post offers a few tips for successful big data integration.
There is a time and a place for everything, but the time and place for data quality (DQ) in data integration (DI) efforts always seems to be something no one is quite sure about. I have previously blogged about the dangers of waiting until the middle of DI to consider, or become forced
“Garbage in, garbage out” is more than a catchphrase – it’s the unfortunate reality in many analytics initiatives. For most analytical applications, the biggest problem lies not in the predictive modeling, but in gathering and preparing data for analysis. When the analytics seems to be underperforming, the problem almost invariably
Bigger doesn’t always mean better. And that’s often the case with big data. Your data quality (DQ) problem – no denial, please – often only magnifies when you get bigger data sets. Having more unstructured data adds another level of complexity. The need for data quality on Hadoop is shown by user
.@philsimon on whether companies should apply some radical tactics to DG.
If your organization is large enough, it probably has multiple data-related initiatives going on at any given time. Perhaps a new data warehouse is planned, an ERP upgrade is imminent or a data quality project is underway. Whatever the initiative, it may raise questions around data governance – closely followed by discussions about the
In recent years, we practitioners in the data management world have been pretty quick to conflate “data governance” with “data quality” and “metadata.” Many tools marketed under "data governance" have emerged – yet when you inspect their capabilities, you see that these tools largely encompass data validation and data standardization. Unfortunately, we
After doing some recent research with IDC®, I got to thinking again about the reasons that organizations of all sizes in all industries are so slow at adopting analytics as part of their ‘business as usual’ operations. While I have no hard statistics on who is and who isn’t adopting