[infinispan-dev] Nutch atop Hadoop+ISPN

Pierre Sutra pierre.sutra at unine.ch
Thu Aug 28 07:48:10 EDT 2014


Hello,

As announced previously, we developed a Gora connector for Infinispan
(https://github.com/otrack/gora). The code is now functional enough to
run Apache Nutch 2.x on top of Infinispan and YARN+HDFS (Hadoop 2.x).
Nutch is a pipeline of M/R jobs that access web pages from a data store
(in this case Infinispan). Queries to fetch (and store) pages are
executed via the Gora connector, which itself relies on an Apache Avro
remote query module in Infinispan and Hot Rod.
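
To make the access pattern concrete, here is a minimal, self-contained
sketch of one pipeline step reading pages from a key-value store and
writing them back. Everything in it is hypothetical: PageStore, Page and
InMemoryPageStore are illustrative stand-ins, not the Gora DataStore API
or the Nutch WebPage class, and the in-memory map only takes the place
of the Infinispan-backed store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

class FetchStepSketch {

    /** Hypothetical page record; not the Nutch WebPage class. */
    static final class Page {
        final String url;
        String status = "unfetched"; // becomes "fetched" after the step
        String content;              // null until the page is fetched

        Page(String url) { this.url = url; }
    }

    /** Hypothetical store interface; the real Gora API is richer. */
    interface PageStore {
        Page get(String key);
        void put(String key, Page page);
        Iterable<Page> all();
    }

    /** In-memory stand-in for the Infinispan-backed store. */
    static final class InMemoryPageStore implements PageStore {
        private final Map<String, Page> pages = new LinkedHashMap<>();

        public Page get(String key) { return pages.get(key); }
        public void put(String key, Page page) { pages.put(key, page); }
        // Return a snapshot so callers can write back while iterating.
        public Iterable<Page> all() { return new ArrayList<>(pages.values()); }
    }

    /** One step: read unfetched pages, fill them in, store them again. */
    static int fetchStep(PageStore store) {
        int fetched = 0;
        for (Page page : store.all()) {
            if ("unfetched".equals(page.status)) {
                // Placeholder for the actual HTTP fetch.
                page.content = "placeholder content for " + page.url;
                page.status = "fetched";
                store.put(page.url, page);
                fetched++;
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        PageStore store = new InMemoryPageStore();
        store.put("http://example.org/", new Page("http://example.org/"));
        System.out.println("pages fetched: " + fetchStep(store));
    }
}
```

In the real setup each such step runs as an M/R job, and the get/put
calls go through the Gora connector to Infinispan instead of a local map.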

The next step to foster integration would be to remove the need for
stable storage (distributing jars to the workers) and to move to
Infinispan's native M/R support. I have seen that this is related to
https://issues.jboss.org/browse/ISPN-2941. Could someone please give me
more details about the next steps in this direction, in particular
regarding stable storage? Many thanks.

Cheers,
Pierre


