The physical data model should represent the tables and columns exactly as they are designed in the database management system. I recommend keeping storage, partitioning, indexing and other physical characteristics in the data model if at all possible. This will make upkeep and comparison with the development, test
We've explored data provenance and the importance of data lineage before on the Data Roundtable (see here). If you are working in a regulated sector such as banking, insurance or healthcare, it is especially important right now, and one of the essential elements of data quality that regulators look for
I have a question --- do we need a logical data model for a conversion? Here are my thoughts. I believe the answer is yes if the conversion has any of the following characteristics: The target application is created in-house. This application will more than likely be enhanced in the
In my previous post I explained that even if your organization does not have anyone with data steward as their official job title, data stewardship plays a crucial role in data governance and data quality. Let’s assume that this has inspired you to formally make data steward an official job title. How
To perform a successful data conversion, you have to know a number of things. In this series, we have uncovered the following about our conversion:
- Scope of the conversion
- Infrastructure for the conversion
- Source of the conversion
- Target for the conversion
- Management for the conversion
- Testing and Quality Assurance for
Here on the Data Roundtable we've discussed many topics such as root-cause analysis, continual improvement and defect prevention. Every organization must focus on these disciplines to create long-term value from data quality improvement instead of some fleeting benefit. Nowhere is this more important than in the need for an appropriate education strategy, both in
The bigness of your data is likely not its most important characteristic. In fact, it probably doesn’t even rank among the Top 3 most important data issues you have to deal with. Data quality, the integration of data silos, and handling and extracting value from unstructured data are still the most
There are multiple types of data models, and some companies choose to NOT data model purchased software applications. I view this a bit differently. I think that any purchased application is part of our enterprise, thus it is part of our enterprise data model (or that concept is part of the
When you examine where most data quality defects arise, you soon realise that your source applications are a prime culprit. You can argue that the sales team always enter incomplete address details, or that the surgeons can't remember the correct patient type codes, but in my experience the majority of
Data. Our industry really loves that word, making it seem like the whole world revolves around it. We certainly enjoy revolving a lot of words around it. We put words like master, big, and meta before it, and words like management, quality, and governance after it. This spins out disciplines
Don't be shy! Interviewing people BEFORE or AFTER a facilitated session just takes a bit of confidence, and good preparation. Building your confidence gets easier and easier the more you participate in interviews. The objective is to prepare and not waste anyone’s valuable time. I like to prepare notes based on
Many managers still perceive data quality projects to be a technical endeavour: data being the domain of IT, and therefore an initiative that can be mapped out on a traditional project plan with well-defined exit criteria and a clear statement of requirements. I used to believe this myth too. Coming
.@philsimon on the proliferation of "as a service" terms.
A few weeks back I noted that one of the objectives of an inventory process for reference data was data harmonization: determining when two reference sets refer to the same conceptual domain and harmonizing their contents into a conformed standard domain. Conceptually it sounds relatively straightforward, but as
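The harmonization step described above can be sketched in a few lines. This is a minimal illustration, not a real tool: the reference sets, the crosswalk mapping, and the `harmonize` helper are all invented for the example, with country codes standing in for any conceptual domain.

```python
# Hypothetical sketch of reference-data harmonization: two source reference
# sets for the same conceptual domain ("country") are mapped through a
# crosswalk into one conformed standard domain. All values are illustrative.

# Two source reference sets using different representations:
app_a_codes = {"US", "GB", "DE"}                      # ISO 3166-1 alpha-2
app_b_names = {"United States", "Great Britain", "Germany"}

# The crosswalk maps each source value to the conformed standard value:
crosswalk = {
    "US": "USA", "GB": "GBR", "DE": "DEU",            # alpha-2 -> alpha-3
    "United States": "USA", "Great Britain": "GBR", "Germany": "DEU",
}

def harmonize(values, crosswalk):
    """Return (conformed set, values with no mapping to review manually)."""
    conformed, unmapped = set(), set()
    for v in values:
        if v in crosswalk:
            conformed.add(crosswalk[v])
        else:
            unmapped.add(v)
    return conformed, unmapped

merged, needs_review = harmonize(app_a_codes | app_b_names, crosswalk)
print(sorted(merged))  # ['DEU', 'GBR', 'USA']
```

The hard part in practice, as the post suggests, is building the crosswalk itself: deciding that "Great Britain" and "GB" really do refer to the same member of the domain.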
My previous post pondered the term disestimation, coined by Charles Seife in his book Proofiness: How You’re Being Fooled by the Numbers to warn us about understating or ignoring the uncertainties surrounding a number, mistaking it for a fact instead of the error-prone estimate that it really is. Sometimes this fact appears to
In my previous post Sisyphus didn’t need a fitness tracker, I recommended that you only collect, measure and analyze big data if it helps you make a better decision or change your actions. Unfortunately, it’s difficult to know ahead of time which data will meet that criteria. We often, therefore, collect, measure and analyze
In his pithy style, Seth Godin’s recent blog post Analytics without action said more in 32 words than most posts say in 320 words or most white papers say in 3200 words. (For those counting along, my opening sentence alone used 32 words). Godin’s blog post, in its entirety, stated: “Don’t measure
A lot of data quality projects kick off in the quest for root-cause discovery. Sometimes they’ll get lucky and find a coding error or some data entry ‘finger flubs’ that are the culprit. Of course, data quality tools can help a great deal in speeding up this process by automating
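The kind of automation mentioned above often starts with simple pattern profiling: count how many values conform to an expected format and surface the ones that don't. A minimal sketch, assuming a hypothetical phone-number field; the sample records and the pattern are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical sketch of automated pattern profiling, the sort of check a
# data quality tool runs to surface likely entry errors ("finger flubs").
# Sample records are invented; two contain deliberate typos.
records = ["555-0142", "555-0199", "55-0123", "555-O144"]

pattern = re.compile(r"\d{3}-\d{4}")                  # expected NNN-NNNN format
profile = Counter(
    "match" if pattern.fullmatch(r) else "defect" for r in records
)
defects = [r for r in records if not pattern.fullmatch(r)]

print(dict(profile))  # {'match': 2, 'defect': 2}
print(defects)        # ['55-0123', '555-O144'] - dropped digit, letter O for zero
```

Profiling like this only locates the defective values; deciding *why* they keep appearing, as the post argues, still requires root-cause analysis of the process that produced them.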
My previous post explained how confirmation bias can prevent you from behaving like the natural data scientist you like to imagine you are by driving your decision making toward data that confirms your existing beliefs. This post tells the story of another cognitive bias that works against data science. Consider the following scenario: Company-wide
What kind of security do we need for this conversion? In fact, where are the security people? Including security personnel upfront in any conversion project can save time and heartache later. It is important to include security for the following: Source system access – You must be able
Nowadays we hear a lot about how important it is that we are data-driven in our decision-making. We also hear a lot of criticism aimed at those that are driven more by intuition than data. Like most things in life, however, there’s a big difference between theory and practice. It’s
Many people, myself included, occasionally complain about how noisy big data has made our world. While it is true that big data does broadcast more signal, not just more noise, we are not always able to tell the difference. Sometimes what sounds like meaningless background static is actually a big insight. Other times
Data-driven journalism has driven some of my recent posts. I blogged about turning anecdote into data and how being data-driven means being question-driven. The latter noted the similarity between interviewing people and interviewing data. In this post I want to examine interviewing people about data, especially the data used by people to drive
At the Journalism Interactive 2014 conference, Derek Willis spoke about interviewing data, his advice for becoming a data-driven journalist. “The bulk of the skills involved in interviewing people and interviewing data are actually pretty similar,” Willis explained. “We want to get to know it a little bit. We want to figure
In my previous post, I discussed sampling error (i.e., when a randomly chosen sample doesn’t reflect the underlying population, aka margin of error) and sampling bias (i.e., when the sample isn’t randomly chosen at all), both of which big data advocates often claim can, and should, be overcome by using all the data. In this
In his recent Financial Times article, Tim Harford explained the big data that interests many companies is what we might call found data – the digital exhaust from our web searches, our status updates on social networks, our credit card purchases and our mobile devices pinging the nearest cellular or WiFi network.
As an unabashed lover of data, I am thrilled to be living and working in our increasingly data-constructed world. One new type of data analysis eliciting strong emotional reactions these days is the sentiment analysis of the directly digitized feedback from customers provided via their online reviews, emails, voicemails, text messages and social networking
We sometimes describe the potential of big data analytics as letting the data tell its story, casting the data scientist as storyteller. While the journalist has long been a newscaster, in recent years the term data-driven journalism has been adopted to describe the process of using big data analytics to
Sometimes you have to get small to win big. SAS Data Management breaks solution capabilities into smaller chunks – and deploys services as needed – to help customers reduce their total cost of ownership. SAS Master Data Management (MDM) is also a pioneer in "phased MDM." It's built on top of a data