The rise of self-service analytics, and the idea of the ‘citizen data scientist’, has also brought a number of issues to the fore in organizations. In particular, two common areas of discussion are the twin pillars of data quality and data preparation. There is no doubt that good quality, well-prepared
Tag: data preparation
“IT doesn't deliver, and the business unit doesn't know what data it will want today or tomorrow”… Both are right, a dilemma that ends with people resorting to self-help. The hunger for information persists, and what isn't delivered gets obtained some other way. The options include: the SAP screen, Excel, database(s),
One aspect of high-quality information is consistency. We often think about consistency in terms of consistent values. A large portion of the effort expended on “data quality dimensions” essentially focuses on data value consistency. For example, when we describe accuracy, what we often mean is consistency with a defined source
It's that time of year again where almost 50 million Americans travel home for Thanksgiving. We'll share a smorgasbord of turkey, stuffing and vegetables and discuss fun political topics, all to celebrate the ironic friendship between colonists and Native Americans. Being part Italian, my family augments the 20-pound turkey with pasta –
.@philsimon says don't treat data self-service as a binary.
Most enterprises employ multiple analytical models in their business intelligence applications and decision-making processes. These analytical models include descriptive analytics that help the organization understand what has happened and what is happening now, predictive analytics that determine the probability of what will happen next, and prescriptive analytics that focus on
.@philsimon on the need to adopt agile methodologies for data prep and analytics.
In Part 1 of this two-part series, I defined data preparation and data wrangling, then raised some questions about requirements gathering in a governed environment (i.e., ODS and/or data warehouse). Now – all of us very-managed people are looking at the horizon, and we see the data lake. How do
Lately I've been binge-watching a lot of police procedural television shows. The standard format for almost every episode is the same. It starts with the commission or discovery of a crime, followed by forensic investigation of the crime scene, analysis of the collected evidence, and interviews or interrogations with potential suspects. It ends
.@philsimon chimes in on new data-gathering methods and what they mean for analytics.
I'm a very fortunate woman. I have the privilege of working with some of the brightest people in the industry. But when it comes to data, everyone takes sides. Do you “govern” the use of all data, or do you let the analysts do what they want with the data to
Critical business applications depend on the enterprise creating and maintaining high-quality data. So, whenever new data is received – especially from a new source – it’s great when that source can provide data without defects or other data quality issues. The recent rise in self-service data preparation options has definitely improved the quality of
Hadoop has driven an enormous amount of data analytics activity lately. And this poses a problem for many practitioners coming from the traditional relational database management system (RDBMS) world. Hadoop is well known for having lots of variety in the structure of data it stores and processes. But it's fair to
In my last post, I talked about how data still needs to be cleaned up – and data strategy still needs to be re-evaluated – as we start to work with nontraditional databases and other new technologies. There are lots of ways to use these new platforms (like Hadoop). For example, many
I'm hard-pressed to think of a trendier yet more amorphous term today than analytics. It seems that every organization wants to take advantage of analytics, but few really are doing that – at least to the extent possible. This topic interests me quite a bit, and I hope to explore
What does it really mean when we talk about the concept of a data asset? For the purposes of this discussion, let's say that a data asset is a manifestation of information that can be monetized. In my last post we explored how bringing many data artifacts together in a
If your enterprise is working with Hadoop, MongoDB or other nontraditional databases, then you need to evaluate your data strategy. A data strategy must adapt to current data trends based on business requirements. So am I still the clean-up woman? The answer is YES! I still work on the quality of the data.
Data access and data privacy are often fundamentally at odds with each other. Organizations want unfettered access to the data describing customers. Meanwhile, customers want their data – especially their personally identifiable information – to remain as private as possible. Organizations need to protect data privacy by only granting data access to authorized
A long time ago, I worked for a company that had positioned itself as basically a third-party “data trust” to perform collaborative analytics. The business proposition was to engage different types of organizations whose customer bases overlapped, ingest their data sets, and perform a number of analyses using the accumulated
Analytics, statistics, operations research, data science and machine learning - with which term do you prefer to associate? Are you from the House of Capulet or Montague, or do you even care? Shakespeare's Juliet derides excess identification with names in the famous play, Romeo and Juliet. "What's in a name? That which we call
A soccer fairy tale: Imagine it's Soccer Saturday. You've got 10 kids and 10 loads of laundry – along with buried soccer jerseys – that you need to clean before the games begin. Oh, and you have two hours to do this. Fear not! You are a member of an advanced HOA
When my band first started and was in need of a sound system, we bought a pair of cheap yet indestructible Peavey speakers, some Radio Shack microphones and a power mixer. The result? We sounded awful and often split our eardrums from high-pitched feedback and raw, untrained vocals. It took us years
In two previous posts (Part 1 and Part 2), I explored some of the challenges of managing data beyond enterprise boundaries. These posts focused on issues around managing and governing extra-enterprise data. Let’s focus a bit on one specific challenge now – satisfying the need for business users to rapidly ingest new data sources. Sophisticated business
As a youngster in the 70s and 80s, Star Trek inspired my imagination and fostered a great love for science, technology and reading. (See the embedded Star Trek infographic for some interesting factoids – did you know that there were 28 crew member deaths by those wearing red shirts?) Captain Kirk and the
Now that another summer of 12-hour family road trips to Maine and Ohio, pricey engineering and basketball camps for the kids, and beating the heat at the beach is over, I've taken a fresh look at what people are focused on with their data – and what SAS is providing in the data management space.
In April, the free trial of SAS Data Loader for Hadoop became available globally. Now, you can take a test drive of our new technology designed to increase the speed and ease of managing data within Hadoop. The downloads might take a while (after all, this is big data), but I think you’ll
We are in the age of big data. But just because modern software makes it easy to handle large data volumes, it's worth asking: do you always need all that data? In other words, if your analytics software can accommodate and even thrive in this big data environment, does that mean
As the “Year of Statistics” comes to a close, I write this blog in support of the many statisticians who carefully fulfil their analysis tasks day by day, and to defend what may appear to be demanding behavior when it comes to data requirements. How do statisticians get this reputation? Are we