Today, I was in a conversation about using Hadoop (a big data platform) for master data management (MDM). I still find it amazing when we have the discussion of what systems feed another system. Many of our friends have spent years creating MDM for customer, product, etc. with success. I'm a
How many companies are using Hadoop as part of their master data management initiative? Come on, raise your hands! Well, maybe a better question is this: How many companies are using Hadoop for enterprise data? From what I have seen, Hadoop is coming along quite nicely. However, it may not be the
As I explained in Part 1 of this series, spelling my name wrong does bother me! However, life changes quickly at health insurance, healthcare and pharmaceutical companies. That said, taking unintegrated or uncleansed data and propagating it to Hadoop may only help one issue. That would be the issue of getting the data
Does it upset you when you log onto your health insurance portal and find that they have spelled your name wrong, listed your dependents incorrectly or recorded the wrong address? Well, it's definitely not a warm fuzzy feeling for me! After working for many years in the healthcare, pharmaceutical and
In the past, we've always protected our data to create an integrated environment for reporting and analytics. And we tried to protect people from themselves when using and accessing data, which sometimes could have been considered a bottleneck in the process. We instituted guidelines and procedures around: Certification of the data
As I explained in Part 1 of this series, creating a strategy for the data in an organization is not a straightforward task. Two of the most important issues you'll want to address in your data strategy are data quality and big data. Data quality There can be no data that is
Creating a strategy for the data in an organization is not a straightforward task. Not only does our business change – our software solutions also change before we can ever get done with a data strategy. So, I choose to understand that a strategy has a vision, and my vision may change
While setting up meetings with business consumers developing a data warehouse environment, I was involved in some very interesting conversations. Following are some of the assumptions that were made during these conversations, as well as a few observations. To get a well-rounded view of this topic, read my earlier post that focuses on the IT perspective.
The other day I was in a meeting with a client and there was an argument about who owns the data. Those arguing were IT people. In this scenario, the assumption was that data from source systems would flow into and integrate with a data warehouse. I found the discussion very interesting. Here are some of the
How many times have you gone onto a website, put a few things in a shopping cart, and then exited the Internet? I do it all the time. Sometimes when I log on to that site during my next visit, those same items are still in my cart – ready for purchase. I find
Most people have logged on to a social media site, maybe to look up an old friend, acquaintance or family member. Some people play games, or post funny pictures or other information they want to share with everyone. Do you ever ask yourself what happens with this information? What if your business wanted to purchase this information and
Twenty-five years ago (when I was 12 years old), we realized that data, across the corporation, was not integrated. Nor did our data let us predict the future by looking at the past. So we started creating these stores of historical data, soon to be called “data warehouses.” Here are
The other day I was chatting with an ETL developer and he said he always pushes queries into the database instead of dragging data across the network. I thought “Hmm, I remember talking about those topics when I was a DBA.” I'd like to share those thoughts with you now.
I feel like I'm singing a song called Data in the Sky – With Options! The cloud is forever in our minds these days as a lower cost option because it requires fewer resources to address our data needs. Cloud solutions are an increasing part of many organizations' budgets every year. Whether enterprise data is
I don’t know about you, but I'm asked every day where some type of data lives in our enterprise. I keep thinking that we have not done a good job of helping people learn to help themselves! A few things I have learned about corporate data assets are: The data
There are many ways to do data integration. Those include: Extract, transform and load (ETL) – which moves and transforms data (with some redundancy) from a source to a target. While ETL can be implemented (somewhat) in real time, it is usually executed at intervals (15 minutes, 30 minutes, 1
Data integration, on any project, can be very complex – and it requires a tremendous amount of detail. The person I would pick for my data integration team would have the following skills and characteristics: Has an enterprise perspective of data integration, data quality and extraction, transformation and load (ETL): Understands
Guess what? Data governance can be considered a bottleneck and a bothersome activity at some organizations. So let’s discuss how NOT TO BE the BOTTLENECK. Defining what the data governance initiative will entail is very important here.
Determining the life cycle of event stream data requires us to first understand our business and how fast it changes. If event data is analyzed, it makes sense that the results of that analysis would feed another process. For example, a customer relationship management (CRM) system or campaign management system like
I believe most people become overwhelmed when considering the data that can be created during event processing. Number one, it is A LOT of data – and number two, the data needs real-time analysis. For the past few years, most of us have been analyzing data after we collected it,
(Otherwise known as Truncate – Load – Analyze – Repeat!) After you’ve prepared data for analysis and then analyzed it, how do you complete this process again? And again? And again? Most analytical applications are created to truncate the prior data, load new data for analysis, analyze it and repeat
In the last post, we talked about creating the requirements for the data analytics, and profiling the data prior to load. Now, let’s consider how to filter, format and deliver that data to the analytics application. Filter – the act of selecting the data of interest to be used in the
What data do you prepare for analysis? Where does that data come from in the enterprise? Hopefully, by answering these questions, we can understand what is required to supply data for an analytics process. Data preparation is the act of cleansing (or not) the data required to meet the business
The other day, I was looking at an enterprise architecture diagram, and it actually showed a connection between the marketing database, the Hadoop server and the data warehouse. My response can be summed up in two ways. First, I was amazed! Second, I was very interested in how this customer uses
If you are looking for a way to fund your data quality objectives, consider looking in the closets of the organization. For example, look for issues that cost the company money that could have been avoided by better availability of data, better quality of the data or reliability of the
Once in a while, people run into an issue with the data that doesn't really need to be fixed right away to ensure success of a specific project. So, the data issues are put into production and forgotten. Everyone always says, “We will go back and correct this later.” But that
There are companies that have no data quality initiative, and truly do believe that they have no data problem. In effect, they say that if it does not interfere with day-to-day business, then there is no data quality problem. From what I have seen in my consulting experience, it usually
The last three parts of our conversion blog (see all of the posts here) go hand-in-hand and require the most time on the project plan. Development - During development of the conversion routines, you may want to consider using error handling standards based on corporate standards. This is where data
I have a rule – any conversion or upgrade will require the creation of a decommission plan. A decommission plan should include the following: A list and definition of each database, table and column (source and target). A list and definition of each of the current programs in use (you
The physical data model should represent exactly the way the tables and columns are designed in the database management system. I recommend keeping storage, partitioning, indexing and other physical characteristics in the data model if at all possible. This will make upkeep and comparison with the development, test