Blend, cleanse and prepare data for analytics, reporting or data modernization efforts

.@philsimon says that even the "best governed" organization today isn't safe from inquiring minds.
Data integration teams often find themselves in the middle of discussions where the quality of their data outputs is called into question. Without proper governance procedures in place, though, it's hard to address these accusations in a reasonable way. Here's why.
GDPR, or the European General Data Protection Regulation, will be upon us in just 15 months’ time. Companies not just in Europe but around the world are preparing for it, because it affects any personal data held about any European customer, no matter where a company is based. But how
"Data governance must encompass management of the full life cycle of a data policy – its definition, approval, implementation and the means of ensuring its observance." - David Loshin, Data Policies and Data Governance. I was checking out my Google stats on Data Quality Pro recently and observed that "How
Data governance has been the topic of many of the recent posts here on the Data Roundtable. And rightfully so, since data governance plays such an integral role in the success of many enterprise information initiatives – such as data quality, master data management and analytics. These posts can help you prepare for discussing
Machine learning is taking a significant role in many big data initiatives today. Large retailers and consumer packaged goods (CPG) companies are using machine learning combined with predictive analytics to help them enhance consumer engagement and create more accurate demand forecasts as they expand into new sales channels like the
Real world data collected in a functioning health care setting instead of a controlled clinical environment can provide opportunities for new and deeper insights across life science and health care organizations. However, managing, analyzing and extracting actionable information from the varied available sources can present unique challenges. The sheer size of these
Lately, the definitions of data governance and data management look very much alike. In this two-part series, we'll define data governance and data management. And we'll see that there's a big difference between the two.
.@philsimon asks, Rather than trying to tackle a new form of governance, wouldn't your organization do better to shore up its existing data-governance practices?
Start with the end in mind -- wise words that apply to everything, and in the world of big data it means we have to change the way we look at managing the data we have. There was a time when we managed data quality, and the main goal was
We've witnessed a significant rise in data governance adoption in recent years. Careers, technology, education, frameworks, practitioners – there's growth in all aspects of the discipline. Regulatory compliance across many sectors is a typical driver for data governance. But I also believe one of the main reasons is the realisation by
Just in time for the Strata + Hadoop World Conference, SAS became the first software vendor to achieve ODPi Interoperability with our Base SAS® and SAS/ACCESS® Interface to Hadoop products. Now, that's a lot to digest – so let me back up a second and give some background as to what this
What if you could predict with near-perfect accuracy what you’re going to sell and when your customer is going to buy? Right supply, right time is the goal German manufacturers have set themselves, without reducing the configuration options customers expect. Having almost completed stage 1 of their plan – changing
In my last post, we explored the operational facet of data governance and data stewardship. We focused on the challenges of providing a scalable way to assess incoming data sources, identify data quality rules and define enforceable data quality policies. As the number of acquired data sources increases, it becomes
As I've previously written, data analytics historically analyzed data after it stopped moving and was stored, often in a data warehouse. But in the era of big data, data needs to be continuously analyzed while it’s still in motion – that is, while it’s streaming. This allows for capturing the real-time value of data
.@philsimon on the need to adopt agile methodologies for data prep and analytics.
In Part 1 of this two-part series, I defined data preparation and data wrangling, then raised some questions about requirements gathering in a governed environment (i.e., ODS and/or data warehouse). Now – all of us very-managed people are looking at the horizon, and we see the data lake. How do
Data governance can encompass a wide spectrum of practices, many of which are focused on the development, documentation, approval and deployment of policies associated with data management and utilization. I distinguish the facet of “operational” data governance from the fully encompassed practice to specifically focus on the operational tasks for
Lately I've been binge-watching a lot of police procedural television shows. The standard format for almost every episode is the same. It starts with the commission or discovery of a crime, followed by forensic investigation of the crime scene, analysis of the collected evidence, and interviews or interrogations with potential suspects. It ends
.@philsimon chimes in on new data-gathering methods and what they mean for analytics.
I'm a very fortunate woman. I have the privilege of working with some of the brightest people in the industry. But when it comes to data, everyone takes sides. Do you “govern” the use of all data, or do you let the analysts do what they want with the data to
Since the idea of an “IoT analytical lifecycle” may be understood in many different ways, let’s start with a definition. Performing analytics at the data center and the cloud is a well-established practice, and is still quite relevant. With growing numbers of connected devices and availability of computing capabilities at
.@philsimon on the downside of the Band-Aid approach.
Critical business applications depend on the enterprise creating and maintaining high-quality data. So, whenever new data is received – especially from a new source – it’s great when that source can provide data without defects or other data quality issues. The recent rise in self-service data preparation options has definitely improved the quality of
It’s nearly impossible to avoid the debate. From politicians and pundit commentary, to dinner table discussions across the United States, the hot topic for the last several years has been the rising cost of health care. Consider that health care expenditures in the US were $3 trillion in 2014 and are
Have you ever had problems matching data that has typographical errors in it? Because typos and incorrectly spelled words are arbitrary by nature, a specific matching technique is required to tackle those cases. SAS Data Quality, with its traditional, deterministic matching approach, is by nature not best
Hadoop has driven an enormous amount of data analytics activity lately. And this poses a problem for many practitioners coming from the traditional relational database management system (RDBMS) world. Hadoop is well known for having lots of variety in the structure of data it stores and processes. But it's fair to
Some organizations I visit don’t seem to have changed their analytics technology environment much since the early days of IT. I often encounter companies with 70s-era base statistical packages running on mainframes or large servers, data warehouses (originated in the 80s), and lots of reporting applications. These tools usually continue
Two years ago, I found myself the proud, first-time owner of a garage. My wife and I quickly started to add new items to the garage – a battery-powered lawn mower, two beach cruisers and four Tommy Bahama beach chairs. They were stored with ease. What a fantastic world I'd been missing out on. But it wasn't long before we outstripped our
.@philsimon continues his series on data prep and analytics.