Infinispan and Gora
by Pierre Sutra
Hello,
As part of the LEADS project, we are planning to run Apache Nutch on top
of Infinispan. To that end, we implemented an Infinispan module for
Apache Gora. The implementation is accessible on GitHub (projects
https://github.com/otrack/Leads-infinispan.git and
https://github.com/vschiavoni/gora). At its core, it relies on
preliminary support for Avro in Infinispan (impacting the
remote-query-client, remote-query-server and hotrod-client modules).
This support uses the self-describing capabilities of Avro data to avoid
declaring types in advance via Google Protocol Buffers (protobuf). In
the current state, our modifications are not fully compatible with the
existing protobuf-based remote operations, but if they look interesting
we can improve this.
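The actual Avro integration lives in the branches linked above; as a rough illustration of why self-describing data removes the need to declare types in advance (the property Pierre contrasts with protobuf, where both sides must know the .proto definition), here is a toy framing, not Avro's real binary format, in which the schema travels in front of each record so a reader can decode a type it has never seen:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative only: a record that carries its own schema, so the server
// needs no pre-declared type to interpret it.
public class SelfDescribing {
    static byte[] write(String schemaJson, byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        byte[] schema = schemaJson.getBytes(StandardCharsets.UTF_8);
        out.writeInt(schema.length);   // schema travels in front of the data
        out.write(schema);
        out.writeInt(payload.length);
        out.write(payload);
        return bos.toByteArray();
    }

    static String readSchema(byte[] record) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        byte[] schema = new byte[in.readInt()];
        in.readFully(schema);          // recovered without any prior declaration
        return new String(schema, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] rec = write("{\"type\":\"record\",\"name\":\"Page\"}", new byte[]{1, 2});
        System.out.println(readSchema(rec));
    }
}
```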
Cheers,
Pierre
LevelDBStore's expiryEntryQueue
by Pedro Ruivo
Hi guys,
Is there a way to replace the expiryEntryQueue with a non-blocking
structure?
Long history:
In high throughput systems, this queue gets full very often and it is
blocking all the writes (that throws timeout exceptions everywhere).
Also, I don't fully understand why this queue exists. It is drained in
purgeExpired, which deletes the expired keys, but the not-yet-expired
keys are never tested (or I am missing something).
Can someone explain?
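A minimal sketch of the trade-off being asked about, using plain java.util.concurrent types rather than the actual LevelDBStore internals (the real field is the expiryEntryQueue; the classes below are stand-ins): a bounded ArrayBlockingQueue makes producers wait when it is full, which is exactly the write-path stall described above, while a CAS-based ConcurrentLinkedQueue never blocks the producer, at the cost of being unbounded:

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

public class ExpiryQueueSketch {
    // Bounded queue: when full, the writer waits up to the timeout and then
    // gives up -- the behaviour that surfaces as timeout exceptions.
    static boolean offerBounded(ArrayBlockingQueue<String> q, String key)
            throws InterruptedException {
        return q.offer(key, 100, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<String> bounded = new ArrayBlockingQueue<>(1);
        bounded.put("k1");                            // queue is now full
        System.out.println(offerBounded(bounded, "k2")); // writer stalls, then fails

        Queue<String> lockFree = new ConcurrentLinkedQueue<>();
        lockFree.offer("k1");
        System.out.println(lockFree.offer("k2"));     // lock-free, never blocks
    }
}
```

The flip side is that an unbounded queue shifts the pressure to memory, so a non-blocking replacement would likely want draining on write when the queue grows large.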
Cheers, (and have a nice weekend)
Pedro
Hadoop and ISPN first and next steps
by Gustavo Fernandes
Hi all,
Last week Pedro, Mircea and I met in London to start prototyping the integration between Hadoop and ISPN.
We discussed several scenarios where Hadoop and ISPN could work together, and decided to start with the ISPN server as the source and/or sink for a Hadoop Map Reduce job.
After creating an InputFormat and OutputFormat for ISPN [1], we generated some data [2] and ran a sample job [3] using Hadoop v1.x, both in Docker [4] and on a 4-node physical cluster (installed with the help of Puppet [5]).
We also ran the same job on the same cluster with the same data, but using HDFS as the data source and sink, so that we could verify correctness.
In this setup, each Hadoop slave runs the TaskTracker, the DataNode and an ISPN server, and the idea was to generate splits [6] based on segments and redirect each map task to the nodes associated with those segments. This routing and filtering of the data is still a work in progress, carried out by Pedro.
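The split-per-segment idea can be sketched as follows. This is a hypothetical illustration, not the code in [1]: real Infinispan segment ownership comes from the consistent hash, so the round-robin ownership function below is just a stand-in. The point is that grouping the segments each node owns into one split lets the JobTracker schedule the map task for that split on the node that already holds the data:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SegmentSplits {
    // Build one split per node, containing the segments that node owns,
    // so map tasks can run local to their data.
    public static Map<String, List<Integer>> splitsByNode(List<String> nodes,
                                                          int numSegments) {
        Map<String, List<Integer>> splits = new LinkedHashMap<>();
        for (String node : nodes) {
            splits.put(node, new ArrayList<>());
        }
        for (int segment = 0; segment < numSegments; segment++) {
            // Stand-in ownership function (segment -> node); Infinispan
            // would consult the consistent hash here instead.
            splits.get(nodes.get(segment % nodes.size())).add(segment);
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(
            splitsByNode(List.of("node1", "node2", "node3", "node4"), 8));
    }
}
```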
Next steps?
- For sure, optimise the current Input/OutputFormat so that they can efficiently read/write data. This will allow ISPN to become part of the Hadoop ecosystem and make it easier to integrate with tools like Apache Hive [7] or Pig [8].
- Investigate closer integration for Map Reduce, potentially usable in library mode. As you might know, YARN (the overhaul of the Hadoop architecture) is not only about Map Reduce, and it offers more extension points than Hadoop Map Reduce v1.
- I read the Spark paper [9] with great interest. Spark provides a DSL with functional language constructs like map, flatMap and filter to process distributed data in memory. In this model, Map Reduce is just a special case achieved by chaining functions [10]. As Spark is much more than Map Reduce, and can run many machine learning algorithms efficiently, I was wondering if we should shift attention to Spark rather than focus too much on Map Reduce. Thoughts?
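The "Map Reduce as a special case of chained functions" point can be illustrated with plain Java streams, which expose the same map/flatMap/filter vocabulary as the Spark DSL (this is an illustration in standard Java, not Spark code): word count becomes a pipeline rather than a dedicated framework job.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ChainedWordCount {
    // Word count expressed as chained functional operations.
    public static Map<String, Long> count(String text) {
        return Stream.of(text.split("\n"))
                .flatMap(line -> Arrays.stream(line.split("\\s+"))) // flatMap: line -> words
                .map(String::toLowerCase)                           // map: normalise case
                .filter(w -> !w.isEmpty())                          // filter: drop empty tokens
                .collect(Collectors.groupingBy(Function.identity(),
                        Collectors.counting()));                    // the "reduce" step
    }

    public static void main(String[] args) {
        System.out.println(count("to be or not to be"));
    }
}
```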
[1] https://github.com/pruivo/infinispan-hadoop-integration/tree/master/src/m...
[2] http://www.skorks.com/2010/03/how-to-quickly-generate-a-large-file-on-the...
[3] https://github.com/pruivo/hadoop-wordcount-example/tree/master/src/main/j...
[4] https://github.com/gustavonalle/docker/tree/master/hadoop
[5] https://gist.github.com/gustavonalle/95dfdd771f31e1e2bf9d
[6] https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/Input...
[7] https://hive.apache.org/
[8] http://pig.apache.org/
[9] http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
[10] https://spark.apache.org/docs/0.9.0/quick-start.html#more-on-rdd-operations
Cheers,
Gustavo