Jim Harris takes a deep dive into data lakes and how they relate to the cloud.
Tag: Data Lake
Guest blogger Khari Villela shares tips to help you avoid common pitfalls of building a data lake.
David Loshin suggests that reference data could be the foundation of future governance structures.
Guest blogger Khari Villela says data lakes are not a cure-all – they're just one part of a comprehensive, strategic architecture.
Jim Harris warns against allowing your data lake to become a poorly managed and ungoverned data dumping ground.
Kim Kaluba gives examples of the benefits of data governance for data lakes.
David Loshin describes how data as a service supports fast deployment, easy accessibility and data reuse.
David Loshin explains how to set up a data catalog that will help you get more value from a data lake.
Phil Simon shares his thoughts on this simple yet often-overlooked question.
In part 2, Jim Harris explains more about why you should address data quality and governance issues on the way to data lakes and Hadoop.
Jim Harris advocates addressing data quality and governance issues on the way to data lakes and Hadoop.
Historically, before data was managed it was moved to a central location. For a long time that central location was the staging area for an enterprise data warehouse (EDW). While EDWs and their staging areas are still in use – especially for structured, transactional and internally generated data – big
Start with the end in mind: wise words that apply to everything. In the world of big data, it means we have to change the way we look at managing the data we have. There was a time when we managed data quality, and the main goal was
I'm a very fortunate woman. I have the privilege of working with some of the brightest people in the industry. But when it comes to data, everyone takes sides. Do you “govern” the use of all data, or do you let the analysts do what they want with the data to
A long time ago, I worked for a company that had positioned itself as basically a third-party “data trust” to perform collaborative analytics. The business proposition was to engage different types of organizations whose customer bases overlapped, ingest their data sets, and perform a number of analyses using the accumulated
In my last post, I started to look at the use of Hadoop in general and the data lake concept in particular as part of a plan for modernizing the data environment. There are surely benefits to the data lake, especially when it's deployed using a low-cost, scalable hardware platform.
More and more organizations are considering the use of maturing scalable computing environments like Hadoop as part of their enterprise data management, processing and analytics infrastructure. But there's a significant difference between the evaluation phase of technology adoption and its subsequent production phase. This seems apparent in terms of how organizations are
A data lake is a concept for storing data in a Hadoop cluster. Today, data is generated in many places that, for cost reasons, does not flow into the classic data warehouse. Yet this data could be used to generate additional assets, provided it is stored in one place and there is then an analytical
“Field of dreams warehouse”: a historic phrase I used in the early days of data warehouse development. It describes the frenzy of activity that took place to create enterprise data infrastructure, before the business rationale for the data use was even understood. Those were the early days. In some ways
In my prior two posts, I explored some of the issues associated with data integration for big data and, in particular, the conceptual data lake in which source data sets are accumulated and stored, awaiting access from interested data consumers. One of the distinctive features of this approach is the transition
In my last post, I noted that the flexibility of the schema-on-read paradigm typical of a data lake had to be tempered with a metadata repository so that anyone wanting to use that data could figure out what was really in
A few of our clients are exploring the use of a data lake as both a landing pad and a repository for collecting enterprise data sets. However, after probing a little about what they expected to do with this data lake, I found that the simple use of
In the last few days, I have heard the term “data lake” bandied about in various client conversations. As with all buzz-term simplifications, the concept of a “data lake” seems appealing, particularly when it is implied to mean “a framework enabling general data accessibility for enterprise information assets.” And of
Adoption of Hadoop, a low-cost open-source platform used for processing and storing massive amounts of data, has exploded by almost 60 percent in the last two years alone, according to Gartner. One primary use case for Hadoop is as a data lake: a vast store of raw, minimally processed data. But, in many ways, because
All hail the data lake, destroyer of enterprise data warehouses and the solution to all our enterprise data access problems! OK, well, maybe not. In part four of this series, I want to talk about the confusion I am seeing in the market around the data lake phrase, including
Working out where Hadoop might fit alongside components of existing IT architectures, or where it might replace them, is a question on the mind of every organization drawn toward the promises of Hadoop. That is the main focus of this blog, along with a discussion of some of the reasons they