Even though it sounds like something you hear on a Montessori school playground, the theme “Share your cluster” echoes across many modern Apache Hadoop deployments. Data architects are plotting to assemble all their big data in one system – something that is now achievable thanks to the economics of modern
Tag: big data
My previous post explained how confirmation bias can prevent you from behaving like the natural data scientist you like to imagine you are by driving your decision making toward data that confirms your existing beliefs. This post tells the story of another cognitive bias that works against data science. Consider the following scenario: Company-wide
Nowadays we hear a lot about how important it is to be data-driven in our decision-making. We also hear a lot of criticism aimed at those who are driven more by intuition than data. Like most things in life, however, there’s a big difference between theory and practice. It’s
Many people, myself included, occasionally complain about how noisy big data has made our world. While it is true that big data does broadcast more signal, not just more noise, we are not always able to tell the difference. Sometimes what sounds like meaningless background static is actually a big insight. Other times
In my last three posts on data ethics, I explored a few of the ethical dilemmas in our data-driven world. From examining the ethical practices of free internet service providers to the problem of high-frequency trading, I’ve come to realize the depth and complexity of these issues. Anyone who's aware of these
Perhaps you're a big data expert who is fluent in Pig, Hive, MapR and all the technologies associated with the open source big data framework. Or maybe your role hasn't yet been touched by the increasingly popular Hadoop system. It's certainly worth a few minutes of your time (or perhaps
Imagine if your ability to feed your family depended upon how fast you could run. Imagine the aisles of your grocery store as lanes on a running track. If you can outrun your fellow shoppers, grab food off the shelves and race through the checkout at the finish line, then
Although I’m not particularly excited about football (I admit, I don’t completely understand what offside means), I did follow the last World Cup with more than average attention. Not only for the handsome players, but especially for all the fascinating statistics that appeared. It struck me that heat maps popped
Data matters more than ever. Forward-thinking organizations such as Netflix and the University of Texas, among others, are using data visualization tools to make sense of the large amounts of information generated every day. In the book The Visual Organization, Phil Simon, a recognized expert in
In my previous post, I examined ethics in a data-driven world with an example of how Facebook experiments on its users. Acknowledging the conundrum facing users of free services like Facebook, Phil Simon commented that “users and customers aren’t the same thing. Maybe users are there to be, you know... used.” What about when a
SAS users love to look at data. And the data grid in SAS Enterprise Guide is a convenient way to view the contents of a data set. While small data sets can be rendered lickety-split for quick viewing, sometimes people get justifiably anxious when opening very large data. Perhaps they've
I have previously blogged about how the dark side of our mood skews the sentiment analysis of customer feedback negatively since we usually only provide feedback when we have a negative experience with a product or service. Reading only negative reviews from its customers could make a company sad, but could reading only
Data science, as Deepinder Dhingra recently blogged, “is essentially an intersection of math and technology skills.” Individuals with these skills have been labeled data scientists and organizations are competing to hire them. “But what organizations need,” Dhingra explained, “are individuals who, in addition to math and technology, can bring in
You have probably heard of the Internet of Things. In case you haven't, it is the concept that devices connected to the Internet can transmit data back and forth and communicate with each other to make decisions or send notifications without human intervention.
Data-driven journalism has driven some of my recent posts. I blogged about turning anecdote into data and how being data-driven means being question-driven. The latter noted the similarity between interviewing people and interviewing data. In this post I want to examine interviewing people about data, especially the data used by people to drive
At the Journalism Interactive 2014 conference, Derek Willis spoke about interviewing data, his advice for becoming a data-driven journalist. “The bulk of the skills involved in interviewing people and interviewing data are actually pretty similar,” Willis explained. “We want to get to know it a little bit. We want to figure
Oil companies are being forced to explore in geologically complex and remote areas to exploit more unconventional hydrocarbon deposits. New engineering technology has pushed the envelope of previous upstream experience. No guidebook existed on how computing methodologies can contribute to E&P performance at reduced risk. Until now. A new book
Call it transformation, paradigm shift, evolution or revolution. Whatever name they go by, cloud services are currently changing the reality of business. The possibility of, and need for, accessing information from anywhere, at any time and with practically any device has
Data management is the practice that allows companies to organize and manage data as a valuable asset and to drive both core operational processes and strategic decision-making within organizations. Data governance, meanwhile, refers to the framework of
Data federation is a relatively new term used to describe a form of data virtualization. Data virtualization, however, is not new. It has been around since at least the 1960s, when virtual memory was introduced to simulate additional memory beyond what was physically available on a machine. While data federation is a
My previous post made the point that it’s not a matter of whether it is good for you to use samples, but how good the sample you are using is. The comments on that post raised two different, and valid, perspectives about sampling. These viewpoints reflected two different use cases for data,
Big data and the cloud are becoming inseparable: “cloud resources are needed for storing and running big data projects, and big data gives companies a good opportunity to move to the cloud.” We could say that big data and
In my previous post, I discussed sampling error (i.e., when a randomly chosen sample doesn’t reflect the underlying population, aka margin of error) and sampling bias (i.e., when the sample isn’t randomly chosen at all), both of which big data advocates often claim can, and should, be overcome by using all the data. In this
If you've followed technology news in the last few weeks, you've probably seen a proliferation of awestruck faces wearing blacked-out goggles, similar to the image I'm using here. The goggles are Oculus Rift immersive VR headsets, and the news is the $2 billion acquisition of Oculus by Facebook.
Due to the massive growth of information technology over the last decade, the Chief Information Officer (CIO) has gone from a back-office support function to a business unit critical to profitable business decisions. Today it ranks among the five areas with
In his recent Financial Times article, Tim Harford explained that the big data that interests many companies is what we might call found data – the digital exhaust from our web searches, our status updates on social networks, our credit card purchases and our mobile devices pinging the nearest cellular or WiFi network.
I was asked to speak recently on a topic that includes two hyped terms: Big data and sustainability. At the risk of igniting an anti-buzzword campaign, I added a third over-used term to that list: analytics. Even though individuals and companies use those three words – big data, sustainability, and
What if virtual reality technology allowed you to immerse yourself in big data? That could be your future – and sooner than you think. It happened in August 2013 at the University of Washington: the first direct brain-to-brain communication, with one researcher controlling another researcher’s brain over Skype. Maybe in
Have you ever imagined a futuristic red spaceship that landed in the middle of the desert? That is what the world's largest pre-coated aluminum roof looks like: the Ferrari World indoor theme park in Abu Dhabi, United Arab Emirates. Now consider the extraordinary color durability of this roof
As an unabashed lover of data, I am thrilled to be living and working in our increasingly data-constructed world. One new type of data analysis eliciting strong emotional reactions these days is the sentiment analysis of the directly digitized feedback from customers provided via their online reviews, emails, voicemails, text messages and social networking