ALOJA is an initiative to produce mechanisms for the automated characterization of the cost-effectiveness of Hadoop deployments. This work reports its initial results.
Although Hadoop has become the de facto platform for Big Data deployments in recent years, little is yet understood about how the different layers of software and hardware deployment options affect its performance.
Early ALOJA findings show that Hadoop's runtime performance, and therefore its price, is critically affected by relatively simple software and hardware configuration choices, e.g., the number of mappers, compression, or volume configuration.
ALOJA presents a vendor-neutral repository (hadoop.bsc.es) featuring thousands of Hadoop runs, a test bed, and tools to evaluate the cost-effectiveness of different hardware, parameter tuning, and Cloud services for Hadoop.
As few organizations have the time or performance profiling expertise,