Pedro/Gustavo,

How do you plan to benchmark our Hadoop implementation? The TeraSort benchmark suite [1] seems like an interesting option. Maybe not with the full 1 TB data set right away, but eventually, why not? Especially now that we can easily run a 500-node cluster on GCE. When you guys start benchmarking our Hadoop impl, I would love to see whether we can give TeraSort a run on a regular Map/Reduce implementation as well.

What do you think?

Vladimir

[1] http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/#terasort-benchmark-suite