In my last two posts, we reached two conclusions. First, JOINs over files distributed using HDFS can suffer performance degradation, because fully executing a JOIN query in Hadoop requires broadcasting data across the internal network. Second, there are
Tag: Hadoop
Using Hadoop: Emerging options for improved query performance
Explore the reasons why the giants use Hadoop
You have surely heard about Hadoop and all its powerful capabilities. If not, this system is simply an open-source software framework that makes it possible to store and process large volumes of data in a distributed manner across a large number of commodity hardware machines. In
Using Hadoop: Query optimization
In my last post, I pointed out that an uninformed approach to running queries on top of data stored in Hadoop HDFS may lead to unexpected performance degradation for reporting and analysis. The key issue had to do with JOINs in which all the records in one data set needed