We had just completed a four-week data quality assessment of an inside plant installation. It wasn't looking good. There were huge gaps in the data, particularly when we cross-referenced the systems. In theory, each system was meant to hold identical information about the plant equipment. But when we consolidated the
Tag: data quality
The term data quality is primarily associated with customer and address information. Beyond duplicate detection and the cleansing of address data, however, the quality of product master data is also extremely important for improving automated processes or, for example, increasing the hit rate of search queries in an online shop.
In my last post we started to look at two different Internet of Things (IoT) paradigms. The first only involved streaming automatically generated data from machines (such as sensor data). The second combined human-generated and machine-generated data, such as social media updates that are automatically augmented with geo-tag data by
The concept of the internet of things (IoT) is used broadly to cover any organization of communication devices and methods, messages streaming from the device pool, data collected at a centralized point, and analysis used to exploit the combined data for business value. But this description hides the richness of
Users in risk or controlling departments generally have no in-depth knowledge of querying databases. Excel is the world where they are at home and feel comfortable. Complex database questions, such as identifying relationships between database tables, are handled by the IT department, which provides the results
What characterizes a successful company? The decisive indicator of success is revenue and the resulting profit for the current fiscal year. But what is hard-earned on the one hand is often carelessly lost on the other. According to analyst studies, many companies forfeit about eight percent of their profit
Throughout my long career of building and implementing data quality processes, I've consistently been told that data quality could not be implemented within data sources, because doing so would disrupt production systems. Therefore, source data was often copied to a central location – a staging area – where it was cleansed, transformed, deduplicated, restructured
In my first blog article I explained that many insurance companies have implemented a standard data model as the basis for their business analytics data warehouse (DWH) solutions. But why should a standard data model be more appropriate than an individual one designed specifically for a particular insurance company?
A soccer fairy tale: Imagine it's Soccer Saturday. You've got 10 kids and 10 loads of laundry – along with buried soccer jerseys – that you need to clean before the games begin. Oh, and you have two hours to do this. Fear not! You are a member of an advanced HOA
While it’s obvious that chickens hatch from eggs that were laid by other chickens, what’s less obvious is which came first – the chicken or the egg? This classic conundrum has long puzzled non-scientists and scientists alike. There are almost as many people on Team Chicken as there are on Team
.@philsimon on the specific risks to data quality posed by cloud computing.
Does it upset you when you log onto your healthcare insurance portal and find that they spelled your name wrong, have your dependents listed incorrectly or your address is not correct? Well, it's definitely not a warm fuzzy feeling for me! After working for many years in the healthcare, pharmaceutical and
I'm frequently asked: "What causes poor data quality?" There are, of course, many culprits: Lack of a data culture. Poor management attitude. Insufficient training. Incorrect reward structure. But there is one reason that is common to all organizations – poor data architecture.
Many data quality issues are a result of the distance separating data from the real-world object or entity it attempts to describe. This is the case with master data, which describes parties, products, locations and assets. Customer (one of the roles within party) master data quality issues are rife with examples, especially
@philsimon on what we can learn about data quality from Jeff Bezos's behemoth.
At a recent TDWI conference, I was strolling the exhibition floor when I noticed an interesting phenomenon. A surprising percentage of the exhibiting vendors fell into one of two product categories. One group was selling cloud-based or hosted data warehousing and/or analytics services. The other group was selling data integration products. Of
When you spend long enough writing and working in any industry, you inevitably see trends emerge and reach varying levels of maturity. Data governance is one such trend, as you can see from the following Google Trends chart:
.@philsimon lists the gravest data-quality errors.
I've been doing some investigation into Apache Spark, and I'm particularly intrigued by the concept of the resilient distributed dataset, or RDD. According to the Apache Spark website, an RDD is “a fault-tolerant collection of elements that can be operated on in parallel.” Two aspects of the RDD are particularly
Data quality has always been relative and variable, meaning data quality is relative to a particular business use and can vary by user. Data of sufficient quality for one business use may be insufficient for other business uses, and data considered good by one user may be considered bad by others.
I recently presented a webinar (via the IAIDQ) on the topic of 7 Habits of Effective Data Quality Leaders. To prepare, I looked back at the many interviews of leading data quality practitioners I had undertaken over the years. A common trait among all these interviews stood out – they
As I explained in Part 1 of this series, creating a strategy for the data in an organization is not a straightforward task. Two of the most important issues you'll want to address in your data strategy are data quality and big data. Data quality: There can be no data that is
"I skate to where the puck is going to be, not where it has been." - Wayne Gretzky I love this quote from Wayne Gretzky. It sums up how most organizations approach data strategy. Data strategy typically starts with a strategic plan laid down by the board. The CEO will
When my band first started and was in need of a sound system, we bought a pair of cheap yet indestructible Peavey speakers, some Radio Shack microphones and a power mixer. The result? We sounded awful and often split our eardrums with high-pitched feedback and raw, untrained vocals. It took us years
In this two-part series, which posts as the calendar turns to a new year, I revisit the top data management topics of 2015 (Part 1) and then try to predict a few of the data management trends of 2016 (Part 2). Data management in 2016: The Internet of Things (IoT) made significant
In this two-part series, which posts as the calendar prepares to turn 2015 into 2016, I revisit the top data management topics of 2015 (Part 1) and then try to predict a few of the data management trends of 2016 (Part 2). Data management in 2015: Big data continued to make
Most people have logged on to a social media site, maybe to look up an old friend, acquaintance or family member. Some people play games, or post funny pictures or other information they want to share with everyone. Do you ever ask yourself what happens with this information? What if your business wanted to purchase this information and
In 2014, big data was on everyone’s mind. So in 2015, I expected to see data quality initiatives make a major shift toward big data. But I was surprised by a completely new requirement for data quality, which proves that the world is not all about big data – not
Sometimes when fuzzy matching names, you want to match on just a portion of the name: for example, Family Name and/or Given Name. A common mistake is to feed the Family Name and Given Name columns separately into the Match Codes node instead of