Blend, cleanse and prepare data for analytics, reporting or data modernization efforts
Jim Harris explains why it's especially important to assess the quality of metadata when it comes to big data.
Integrating big data into existing data management processes and programs has become something of a wake-up call for organizations in their drive to become 21st century companies. Jim Harris, blogger and data quality obsessive-compulsive, gives us these
With Big Data, there are far more technical questions than answers.
No one knows for sure who coined the term Big Data. Despite etymological studies, we are still no closer to attributing provenance to any one person, or indeed any one period. Some say the term was coined in the '80s, others believe the '90s – and many are convinced the term originated
In today’s brave new technological world, most of us live cocooned in thermo-regulated cars and buildings. Our food and drink is on tap, and we experience little inconvenience beyond death and taxes. It’s easy to forget that life as we know it would come to an abrupt halt without the
Jim Harris discusses perspectives on the question of how much quality big data really needs.
Sizing is a topic that solutions managers typically leave until the end, after decisions about the application have been settled. But there are often many variables that can affect the final size requirement. We have seen across our customer base that sizing and the number of environments have been determined
There are many ways to do data integration. Those include: Extract, transform and load (ETL) – which moves and transforms data (with some redundancy) from a source to a target. While ETL can be implemented (somewhat) in real time, it is usually executed at intervals (15 minutes, 30 minutes, 1
Jim Harris addresses some of the most common questions and challenges big data poses for data quality.
.@philsimon on bridging the IT-business divide once and for all.
If I were to believe the feedback I get, statisticians are among the most difficult people to work with. What’s more, they’re the only group that should be allowed to work in data analytics. It sounds harsh, but this may explain why big data projects continually fail. Businesses need statisticians who are both
As a youngster in the 70s and 80s, Star Trek inspired my imagination and fostered a great love for science, technology and reading. (See the embedded Star Trek infographic for some interesting factoids – did you know that 28 of the crew members who died were wearing red shirts?) Captain Kirk and the
Why are so many companies across a diverse set of industries investing in and around the Internet of Things? Everywhere I go, every blog I read … I sound like my favorite band from the 80s: the Internet of Things is watching me. In reality, it’s the reverse: I'm seeing
Healthcare IT News recently published an article on 18 health technologies poised for big growth, a list culled from a HIMSS database. The database is used to track an extensive list of technology products that have seen growth of 4-10 percent since 2010, but have not yet reached a 70
It’s been an amazing journey with Hadoop. As we discussed in an earlier blog, Hadoop is forming the basis of a comprehensive enterprise data platform that can power an ecosystem of analytic applications to uncover rich insights on large sets of data. With YARN (Yet Another Resource Negotiator) as its
.@philsimon on the new role of IT.
A week from today, we'll be in New York City for Strata + Hadoop World, where we’ll kick things off at the Opening Reception. Be sure to stop by booth 543 to meet the team IRL (in real life)! They are excited about the event and eager to talk with attendees.
It’s rather appropriate that the rock band Europe recorded the hit “The Final Countdown”, because today, September 22nd, represents 100 days until the much-anticipated (and delayed) European insurance legislation Solvency II comes into effect on January 1st, 2016. Designed to introduce a harmonized, EU-wide insurance regulation, Solvency II
.@philsimon on the new challenges of an old problem.
It’s me again!! We're at the halfway point of meeting our Strata + Hadoop World dream team. So far, you’ve met machine learning guru Patrick Hall; data management expert Clark Bradley; and advanced analytics specialist Rachel Hawley. Next up … Dan Zaratsian! I met Dan a few years back while preparing for Analytics 2013
In the UK, technology trends move a little more slowly than they do for our US counterparts. It was about 5 years ago when I first met a data leader at a conference on this side of the pond who was actively engaged in large-scale big data projects. This wasn’t a presenter
Data integration, on any project, can be very complex – and it requires a tremendous amount of detail. The person I would pick for my data integration team would have the following skills and characteristics: Has an enterprise perspective of data integration, data quality and extraction, transformation and load (ETL): Understands
Meet Clark Bradley: SAS technical architect by day and comedian by night. When he’s not demoing SAS Data Loader for Hadoop, he’s blogging about it on The Data Roundtable. Clark and a core SAS team of thought leaders, developers and executives will be in New York City on September 29 at Strata
In my prior two posts, I explored some of the issues associated with data integration for big data and particularly, the conceptual data lake in which source data sets are accumulated and stored, awaiting access from interested data consumers. One of the distinctive features of this approach is the transition
with Natalie Osborn, Senior Industry Consultant, Hospitality and Gaming Practice, SAS. It’s back to school time, and back to school reminds me of getting back to the basics. So, we thought we’d start the fall with a “back to the basics” refresher series on analytics. To accomplish this, Natalie and
This is my final entry in the Education Meets Big Data blog series. Let’s review what we've covered so far… In my first post, I explained that statewide longitudinal data systems (SLDSs) track student data from preschool through college and workforce across the state. SLDSs can be used to see one
Integrating big data into existing data management processes and programs has become something of a siren call for organizations on the odyssey to become 21st century data-driven enterprises. To help save some lost time, this post offers a few tips for successful big data integration.
There is a time and a place for everything, but the time and place for data quality (DQ) in data integration (DI) efforts always seem to be something no one is quite sure about. I have previously blogged about the dangers of waiting until the middle of DI to consider, or become forced
While they are not on the same level as Rush for me, I do fancy myself a fan of The Who. I'm particularly fond of the band's 1973 epic, Quadrophenia. From the track "5:15": Inside outside, leave me alone / Inside outside, nowhere is home / Inside outside, where have I been? The inside-outside distinction is rather apropos
In my last post, I noted that the flexibility of the schema-on-read paradigm typical of a data lake had to be tempered with a metadata repository, so that anyone wanting to use that data could figure out what was really in