I would like to test that everything works with Infinispan set up in DIST mode.
Is there an example anywhere that I can look at?
On Tue, Mar 19, 2013 at 11:19 AM, Sanne Grinovero <sanne(a)hibernate.org> wrote:
On Friday we pair-programmed and have likely finished the
implementation:
it looks good, but we couldn't run the test.
The blocker is that Map/Reduce in Infinispan only works in DIST mode, and since
we can't iterate over entries we need M/R, so we might need to
reconfigure Infinispan in our tests.
That's annoying, as DIST will make our test suite significantly slower;
an alternative is to have Infinispan lift this limitation first.
Sanne
On 13 March 2013 11:14, Davide D'Alto <daltodavide(a)gmail.com> wrote:
> No problem.
>
> The association is lazy, but I will look into Hibernate.initialize.
>
> On Tue, Mar 12, 2013 at 8:01 PM, Emmanuel Bernard
> <emmanuel(a)hibernate.org> wrote:
>> I have not forgotten; I'm just in the middle of a Bean Validation crisis
>> that has delayed my looking into this issue.
>> Could it be, BTW, that the mass indexer does not ask for these objects to
>> be loaded using Hibernate.initialize? It could also be a bug in OGM, but
>> not necessarily. In particular, is the association lazy or eager?
>>
>> Emmanuel
>>
>> On Mon 2013-03-11 11:00, Davide D'Alto wrote:
>>> I have created a branch for OGM-228 (OGM MassIndexer) that includes
>>> OGM-151 (Metamodel) and OGM-273 (load entities from tuple):
>>> https://github.com/DavideD/hibernate-ogm/tree/OGM-228
>>>
>>> A test I've added fails, though (AssociationMassIndexerTest):
>>> https://github.com/DavideD/hibernate-ogm/blob/74549a4d264af30fa88960c30e2...
>>>
>>> The test uses two entities, IndexedNews and IndexedLabel, with a
>>> one-to-many relationship from news to label.
>>> The mass indexing works fine, but when I retrieve the list of indexed
>>> labels with the query "FROM IndexedLabel", the result contains a list
>>> of proxies, and the equals check fails because the class of the objects in the
>>> list is not IndexedLabel.
>>>
>>> If I first get the list of news and then call news.getLabels() on each
>>> of them, everything works fine.
>>>
>>> Any thoughts?
>>>
>>> Thanks
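A likely explanation, assuming the test compares entities through an `equals` that checks `getClass()` (the test source is only linked above, so this is a guess): Hibernate lazy proxies are generated subclasses of the entity, so a strict class comparison rejects them even when the ids match. The JDK-only sketch below models this; `LabelProxy` is a hypothetical hand-written stand-in for a generated proxy, and `strictEquals` exists only to contrast the two checks.

```java
import java.util.Objects;

// Hypothetical stand-in for the test's IndexedLabel entity (only an id is modelled).
class IndexedLabel {
    private final String id;

    IndexedLabel(String id) { this.id = id; }

    public String getId() { return id; }

    // Strict equality: rejects every subclass, including lazy-loading proxies.
    public boolean strictEquals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(getId(), ((IndexedLabel) o).getId());
    }

    // Proxy-tolerant equality: accepts subclasses and reads state through getters,
    // so a proxy gets the chance to delegate to its loaded target.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IndexedLabel)) return false;
        return Objects.equals(getId(), ((IndexedLabel) o).getId());
    }

    @Override
    public int hashCode() { return Objects.hashCode(getId()); }
}

// Hand-written stand-in for a generated proxy: a subclass whose getters delegate.
class LabelProxy extends IndexedLabel {
    private final IndexedLabel target;

    LabelProxy(IndexedLabel target) {
        super(null); // the proxy's own fields stay empty, like an uninitialized proxy
        this.target = target;
    }

    @Override
    public String getId() { return target.getId(); }
}
```

Initializing a proxy (for instance via `Hibernate.initialize`) loads its state but does not change its runtime class, so a `getClass()`-based comparison would keep failing; an `instanceof`-based `equals` that reads state through getters avoids that.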
>>>
>>> On Thu, Mar 7, 2013 at 10:15 AM, Emmanuel Bernard
>>> <emmanuel(a)hibernate.org> wrote:
>>> > I have no more time for this one, so I have dumped what I have so far:
>>> > https://github.com/hibernate/hibernate-ogm/pull/175
>>> >
>>> > Emmanuel
>>> >
>>> > On Wed 2013-03-06 19:18, Emmanuel Bernard wrote:
>>> >> I've successfully implemented OGM-151 for EntityKey, which is the one we
>>> >> need to move OGM-273 forward for now.
>>> >> I am trying to implement it for AssociationKey, but caching here is
>>> >> significantly harder, as data is cross-referenced across associations.
>>> >>
>>> >> Sanne, when you worked on the profiling of OGM, do you remember
>>> >> AssociationKey putting pressure on build time or memory? I ask because
>>> >> caching them per persister means some rather complex race conditions and
>>> >> more memory used permanently (as opposed to on demand).
>>> >>
>>> >> So I'm wondering if that's worth it. As an intermediate step, I could
>>> >> introduce AssociationKeyMetadata but build it on demand; that one is
>>> >> easier to achieve.
>>> >>
>>> >> Emmanuel
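The trade-off between the two options can be sketched in plain Java. The fields of `AssociationKeyMetadata` and the `AssociationMetadataSource` holder are hypothetical, not OGM's classes: building on demand retains nothing but allocates on every call, while a per-persister cache keeps instances alive permanently. `ConcurrentHashMap.computeIfAbsent` is one way to get atomic, at-most-once construction without hand-written race handling, though it does nothing about the cross-referencing between associations.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical immutable metadata describing how an association is keyed.
final class AssociationKeyMetadata {
    final String table;
    final List<String> columnNames;

    AssociationKeyMetadata(String table, String... columnNames) {
        this.table = table;
        this.columnNames = Collections.unmodifiableList(Arrays.asList(columnNames.clone()));
    }
}

// Hypothetical per-persister source contrasting the two strategies.
class AssociationMetadataSource {
    final AtomicInteger builds = new AtomicInteger(); // how many objects were built
    private final Map<String, AssociationKeyMetadata> cache = new ConcurrentHashMap<>();

    private AssociationKeyMetadata build(String role) {
        builds.incrementAndGet();
        return new AssociationKeyMetadata(role + "_table", role + "_id");
    }

    // On demand: nothing is retained, a fresh object is built on every call.
    AssociationKeyMetadata onDemand(String role) {
        return build(role);
    }

    // Cached: computeIfAbsent builds at most once per role, even under concurrency.
    AssociationKeyMetadata cached(String role) {
        return cache.computeIfAbsent(role, this::build);
    }
}
```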
>>> >>
>>> >> On Wed 2013-03-06 15:32, Davide D'Alto wrote:
>>> >> > It's OK with me.
>>> >> >
>>> >> > Davide
>>> >> >
>>> >> > On Wed, Mar 6, 2013 at 3:28 PM, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>>> >> > > I'm planning on working on OGM-151. Fine with everyone?
>>> >> > > That will likely be my last one before I move back to BVAL and close the
>>> >> > > final issues there.
>>> >> > >
>>> >> > > Emmanuel
>>> >> > >
>>> >> > > On Tue 2013-03-05 19:04, Sanne Grinovero wrote:
>>> >> > >> Nice!
>>> >> > >> n+1 is something Hibernate Search has to deal with too; that's why I
>>> >> > >> was interested in the fetch profiles and graph loading in JPA 2.1.
>>> >> > >>
>>> >> > >> On 5 March 2013 17:44, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>>> >> > >> > I have implemented a solution that builds an entity from a tuple.
>>> >> > >> >
>>> >> > >> > https://hibernate.onjira.com/browse/OGM-273#comment-50082
>>> >> > >> >
>>> >> > >> > Note that it does not currently work for MongoDB, but that's waiting
>>> >> > >> > for the dedicated GridDialect method as well as OGM-151.
>>> >> > >> > Also note that I have no idea how that will work for associations. I
>>> >> > >> > suspect some nasty n+1 is happening at best. Worst case, an exception :)
>>> >> > >> >
>>> >> > >> > Emmanuel
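The n+1 suspicion can be illustrated with a toy in-memory model, assuming tuples are column-name-to-value maps and each association is fetched with its own lookup (`News`, `ToyStore` and `TupleLoader` are hypothetical, not the OGM API): hydrating N entities costs one scan, and resolving their associations adds one round trip per entity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity rebuilt from a tuple (a column-name -> value map).
class News {
    final String id;
    final String title;
    List<String> labels; // association, resolved with a separate lookup

    News(String id, String title) { this.id = id; this.title = title; }
}

// Toy datastore that counts round trips.
class ToyStore {
    final List<Map<String, Object>> newsTuples = new ArrayList<>();
    final Map<String, List<String>> labelsByNewsId = new HashMap<>();
    int lookups = 0;

    List<Map<String, Object>> scanNews() {
        lookups++; // a single round trip returns every news tuple
        return newsTuples;
    }

    List<String> labelsFor(String newsId) {
        lookups++; // one extra round trip per owning entity
        return labelsByNewsId.getOrDefault(newsId, Collections.emptyList());
    }
}

class TupleLoader {
    // Tuple -> entity: copy the column values into a new instance.
    static News fromTuple(Map<String, Object> tuple) {
        return new News((String) tuple.get("id"), (String) tuple.get("title"));
    }

    // Loads every entity, then each association separately: 1 + N lookups.
    static List<News> loadAllWithLabels(ToyStore store) {
        List<News> result = new ArrayList<>();
        for (Map<String, Object> tuple : store.scanNews()) {
            News news = fromTuple(tuple);
            news.labels = store.labelsFor(news.id);
            result.add(news);
        }
        return result;
    }
}
```

With three news tuples, `loadAllWithLabels` performs four lookups: one scan plus three association fetches.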
>>> >> > >> >
>>> >> > >> > On Tue 2013-03-05 10:30, Emmanuel Bernard wrote:
>>> >> > >> >> We might hope for a stable enough contract in Hibernate Search, and
>>> >> > >> >> that we won't break serializability between micro or minor
>>> >> > >> >> versions. That will need to be taken into account in the test suite and
>>> >> > >> >> design.
>>> >> > >> >> On the OGM side, though, we are not at that level of maturity, and we will
>>> >> > >> >> force a homogeneous Hibernate OGM version across the whole cluster. The grid
>>> >> > >> >> will have to go down for upgrades, or we must ensure that no map/reduce job
>>> >> > >> >> using OGM runs while the version rollout is in progress.
>>> >> > >> >>
>>> >> > >> >> Emmanuel
>>> >> > >> >>
>>> >> > >> >> On Mon 2013-03-04 18:30, Sanne Grinovero wrote:
>>> >> > >> >> > I found an example; this is all the code it takes to have a MassIndexer working
>>> >> > >> >> > on top of Infinispan's Map/Reduce:
>>> >> > >> >> >
>>> >> > >> >> > https://github.com/infinispan/infinispan/blob/master/query/src/main/java/...
>>> >> > >> >> >
>>> >> > >> >> > Note its initialize method, which injects the needed components; the
>>> >> > >> >> > implementation is serialized across nodes.
>>> >> > >> >> >
>>> >> > >> >> > Sanne
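The pattern described here (serialize the task, then inject heavyweight components on the receiving node) can be sketched with plain Java serialization. `SearchFactoryStub`, `ComponentRegistry` and `initialize` are hypothetical stand-ins, not the Infinispan or Hibernate Search API. The component field is `transient`, so it never travels, and the pinned `serialVersionUID` relates to the cross-version serializability concern raised above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical heavyweight component that must stay on each node (not Serializable).
class SearchFactoryStub {
    final String nodeName;
    SearchFactoryStub(String nodeName) { this.nodeName = nodeName; }
}

// Hypothetical per-node registry the task looks its components up from.
class ComponentRegistry {
    private final Map<Class<?>, Object> components = new HashMap<>();

    <T> void register(Class<T> type, T instance) { components.put(type, instance); }

    <T> T lookup(Class<T> type) { return type.cast(components.get(type)); }
}

// Only the task's small state travels; components are re-injected on arrival.
class IndexingTask implements Serializable {
    // Pinned explicitly so builds with compatible fields can read each other's tasks.
    private static final long serialVersionUID = 1L;

    private final String entityName;                   // serialized
    private transient SearchFactoryStub searchFactory; // skipped by serialization

    IndexingTask(String entityName) { this.entityName = entityName; }

    // Invoked on the receiving node after deserialization.
    void initialize(ComponentRegistry registry) {
        this.searchFactory = registry.lookup(SearchFactoryStub.class);
    }

    String describe() {
        return entityName + "@" + (searchFactory == null ? "uninitialized" : searchFactory.nodeName);
    }

    // Simulates shipping the task to another node with plain Java serialization.
    static IndexingTask roundTrip(IndexingTask task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (IndexingTask) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }
}
```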
>>> >> > >> >> >
>>> >> > >> >> > On 4 March 2013 18:26, Sanne Grinovero <sanne(a)hibernate.org> wrote:
>>> >> > >> >> > > We finished this discussion on IRC, in case someone else is interested:
>>> >> > >> >> > >
>>> >> > >> >> > > <sanne> hum, I forgot the first step: transformation from entry into entity
>>> >> > >> >> > > <sanne> updated
>>> >> > >> >> > > <sanne> emmanuel, the "hydrate" step is what DavideD is bashing his
>>> >> > >> >> > > head against, but let's assume he finds a workaround and we focus on
>>> >> > >> >> > > the pattern as a first step?
>>> >> > >> >> > > <emmanuel> https://gist.github.com/emmanuelbernard/5084039
>>> >> > >> >> > > <emmanuel> sanne: ^ that's how I would do it if I had an Iterator from the tuple
>>> >> > >> >> > > <emmanuel> assuming pushToExecutor pushes to whatever concurrent work
>>> >> > >> >> > > mechanism you planned to use on consumes
>>> >> > >> >> > > <emmanuel> Plus I am not following exactly how you plan for consumes(Entry)
>>> >> > >> >> > > to be executed concurrently
>>> >> > >> >> > > <emmanuel> is that the GridDialect's responsibility?
>>> >> > >> >> > > <emmanuel> That looks like a lot of work on the dialect's side
>>> >> > >> >> > > <sanne> emmanuel, imagine the backend is Infinispan and has some large
>>> >> > >> >> > > amount of data per node, plus that each node has its own backend
>>> >> > >> >> > > IndexManager (like an ideal sharding)
>>> >> > >> >> > > <emmanuel> i.e. pool mgt and cap + queuing
>>> >> > >> >> > > <sanne> then with your approach the iterator needs to fetch data from
>>> >> > >> >> > > all remote nodes, and then enqueue it in a local blocking queue, which is
>>> >> > >> >> > > returning the data to the original owners
>>> >> > >> >> > > <sanne> but if you skip that step, you can just forward the stateless
>>> >> > >> >> > > consumer to each node and have it run with data locality
>>> >> > >> >> > > <emmanuel> I was thinking that if you had the Lucene index locally on
>>> >> > >> >> > > each node you would have a different impl of the MassIndexer anyway
>>> >> > >> >> > > <emmanuel> that would simply send a command to each local node
>>> >> > >> >> > > <sanne> To answer your question: that would be an optional GridDialect
>>> >> > >> >> > > responsibility. I would endorse a trivial first draft doing a
>>> >> > >> >> > > single-threaded loop.
>>> >> > >> >> > > <emmanuel> and have GridDialect.getDataFor() return local data
>>> >> > >> >> > > <sanne> The "consumes" implementation can be implemented with a
>>> >> > >> >> > > simple iterator (as in your design), so I don't think it pushes much
>>> >> > >> >> > > complexity to the GridDialect implementor?
>>> >> > >> >> > > <sanne> The benefit of the consumer is that *optionally* it can be
>>> >> > >> >> > > mapped onto the Map phase, and that's trivial if your backend supports
>>> >> > >> >> > > Map/Reduce
>>> >> > >> >> > > <emmanuel> sanne: I don't follow that, sorry
>>> >> > >> >> > > <emmanuel> how does that make it mappable to the Map phase?
>>> >> > >> >> > > <sanne> "public void consume(Entry e)" is a degenerate (simplified)
>>> >> > >> >> > > form of map.
>>> >> > >> >> > > <sanne> mm, infinispan IDE crashes at the right moment.
>>> >> > >> >> > > <emmanuel> I thought Map was about *filtering*
>>> >> > >> >> > > <emmanuel> not processing
>>> >> > >> >> > > <sanne> you can decide to accept 100% of values (without filtering),
>>> >> > >> >> > > but actually you might want to filter on the specified tables only.
>>> >> > >> >> > > <sanne> also, the return type doesn't have to match the input type:
>>> >> > >> >> > > hence you define a transformation function, which is inherently
>>> >> > >> >> > > applied in parallel on all matching entries.
>>> >> > >> >> > > <emmanuel> sanne: but then you require the OGM code to be everywhere
>>> >> > >> >> > > (i.e. on each node of the target NoSQL)
>>> >> > >> >> > > <emmanuel> to be able to do tuple -> entity
>>> >> > >> >> > > <emmanuel> that's not realistic
>>> >> > >> >> > > <emmanuel> assuming your transform phase is about tuple -> entity and
>>> >> > >> >> > > some HSearch ops
>>> >> > >> >> > > <sanne> yes right
>>> >> > >> >> > > <sanne> but isn't it worth it? it's optional and much more efficient,
>>> >> > >> >> > > as you avoid transferring any data.
>>> >> > >> >> > > <sanne> btw we often assume all nodes in the grid are equally
>>> >> > >> >> > > configured, so they have the same apps & libraries deployed.
>>> >> > >> >> > > <emmanuel> sanne: let me try and summarize what I understand
>>> >> > >> >> > > <emmanuel> it's more efficient if you store the Lucene index locally
>>> >> > >> >> > > with the data, and if the grid is written in Java or at least can run
>>> >> > >> >> > > Java code including libraries, and if you distribute the OGM
>>> >> > >> >> > > configuration across the whole grid
>>> >> > >> >> > > <emmanuel> Otherwise, it does not make any difference
>>> >> > >> >> > > <emmanuel> Also the GridDialect implementation needs to know if you are
>>> >> > >> >> > > doing this trick, to only return local data
>>> >> > >> >> > > <sanne> no, there are other drawbacks which get defeated, but minor so
>>> >> > >> >> > > I didn't mention them
>>> >> > >> >> > > <emmanuel> am I right?
>>> >> > >> >> > > <sanne> mainly, you skip the need for the contention point as there
>>> >> > >> >> > > is no push to a shared blocking queue
>>> >> > >> >> > > <sanne> no, the GridDialect doesn't need to know.
>>> >> > >> >> > > <emmanuel> sanne: sure, if you can process the code on each node you
>>> >> > >> >> > > avoid the shared blocking queue, at least until you reach the
>>> >> > >> >> > > IndexManager
>>> >> > >> >> > > <sanne> you'll just forward a simple (standard) M/R task, and it will
>>> >> > >> >> > > need to execute it as always.
>>> >> > >> >> > > <sanne> the IndexManager is parallel ;)
>>> >> > >> >> > > <emmanuel> sanne: parallel on a single node
>>> >> > >> >> > > <sanne> yes, but no contention points other than the internal
>>> >> > >> >> > > structure of the IW
>>> >> > >> >> > > <emmanuel> I mean updating the index for a given table is better done
>>> >> > >> >> > > on a single node
>>> >> > >> >> > > <sanne> IndexWriter
>>> >> > >> >> > > <emmanuel> sorry, I meant IndexWriter
>>> >> > >> >> > > <emmanuel> ah but you mention perfect sharding
>>> >> > >> >> > > <emmanuel> you need cosmological alignment for this shit to happen
>>> >> > >> >> > > <sanne> not if we plan for it :)
>>> >> > >> >> > > <sanne> you might remember the changes to Segments in the ISPN code,
>>> >> > >> >> > > to accommodate index storage consistent with the data locality
>>> >> > >> >> > > <sanne> that's expected in 6.0
>>> >> > >> >> > > <emmanuel> So gridDialect.getData(Consumer consumer, String... tables) is wrong
>>> >> > >> >> > > <emmanuel> it's more gridDialect.getData(ConsumerImpl.class, String... tables)
>>> >> > >> >> > > <emmanuel> as you need to send the Consumer impl
>>> >> > >> >> > > <emmanuel> not simply use it
>>> >> > >> >> > > <sanne> huh, it needs a reference to the current SearchFactory at the very least
>>> >> > >> >> > > <emmanuel> sanne: but you're telling me you send the M/R task
>>> >> > >> >> > > <emmanuel> so you need to send the M/R code as well
>>> >> > >> >> > > <sanne> yes, but here we enter Infinispan-specific implementation
>>> >> > >> >> > > <sanne> I would register the needed components in Infinispan and use
>>> >> > >> >> > > the ServiceRegistry to look them up remotely
>>> >> > >> >> > > <sanne> not to mention Infinispan could accommodate a custom command for it
>>> >> > >> >> > > <emmanuel> What I am saying is that you don't pass the Consumer
>>> >> > >> >> > > *instance* to the grid dialect but rather the impl, no?
>>> >> > >> >> > > <sanne> the impl class definition?
>>> >> > >> >> > > <emmanuel> sanne: you tell me. How do I send M/R code today?
>>> >> > >> >> > > <emmanuel> certainly not an impl instance
>>> >> > >> >> > > <sanne> yes you do
>>> >> > >> >> > > <sanne> JBMar will take care of it, including state.
>>> >> > >> >> > > <sanne> but in this case that would be wrong of course, as I don't want
>>> >> > >> >> > > to serialize the whole SearchFactory, so I'd use injection and lookup,
>>> >> > >> >> > > but that's a detail of Infinispan.
>>> >> > >> >> > > <sanne> But this shouldn't be MassIndexer-specific, right? it's good to
>>> >> > >> >> > > expose a general "execute on all" method, and I think accepting
>>> >> > >> >> > > instances would make life easier for most, even though we might need
>>> >> > >> >> > > to document some limitations.
>>> >> > >> >> > > <emmanuel> alright, I guess I'll have to live with a visitor pattern
>>> >> > >> >> > > for a feature that has a 5% chance of happening :)
>>> >> > >> >> > > <sanne> I'm going to punch Davide
>>> >> > >> >> > > <sanne> as he's yelling "it's not a visitor" but doesn't have the guts
>>> >> > >> >> > > to write it down :)
>>> >> > >> >> > > <emmanuel> sanne: DavideD would have nothing to do with it; that
>>> >> > >> >> > > requires a lot of config and Infinispan machinery I'm not sure is here
>>> >> > >> >> > > today
>>> >> > >> >> > > <DavideD> :)
>>> >> > >> >> > > <emmanuel> ah
>>> >> > >> >> > > <emmanuel> I don't care what it's called, it's one of those patterns
>>> >> > >> >> > > that make the code harder to follow
>>> >> > >> >> > > <DavideD> I was actually trying to remember the name of the pattern
>>> >> > >> >> > > <sanne> ok now we agree :)
>>> >> > >> >> > > <emmanuel> Obfuscator pattern family
>>> >> > >> >> > > <sanne> very popular among consultants, I don't understand why you complain :P
>>> >> > >> >> > > <sanne> Anyway, let's wrap up and broaden the horizon:
>>> >> > >> >> > > <emmanuel> ok so we are left with finding how to load an entity from a tuple
>>> >> > >> >> > > <sanne> you don't think it's useful as a general-purpose method?
>>> >> > >> >> > > <emmanuel> sanne: it will be for queries
>>> >> > >> >> > > <emmanuel> It's just that it's non-obvious
>>> >> > >> >> > > <sanne> Exactly. Also I think lambda methods are becoming much better known.
>>> >> > >> >> > > <emmanuel> syntactically yes
>>> >> > >> >> > > <emmanuel> VM-wise, perf improvements will come later
>>> >> > >> >> > > <sanne> what I mean is that by defining the SPI this way, I don't
>>> >> > >> >> > > expect it to be more complex for the GridDialect implementors, while
>>> >> > >> >> > > we can reuse it for a wider scope of needs.
>>> >> > >> >> > >
>>> >> > >> >> > > --Sanne
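The consumer-based SPI discussed in the log might look roughly like this, using a toy partitioned store; `Tuple`, `TupleConsumer`, `PartitionedDialect` and `forEachTuple` are hypothetical names, not the GridDialect contract. The dialect filters tuples by table and calls the consumer inside one task per partition, so nothing is funnelled through a shared blocking queue; as suggested in the log, a trivial dialect could honour the same signature with a single-threaded loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical tuple: owning table plus column values.
final class Tuple {
    final String table;
    final Map<String, Object> columns;

    Tuple(String table, Map<String, Object> columns) {
        this.table = table;
        this.columns = columns;
    }
}

// Hypothetical callback: the degenerate "map" step applied to each matching tuple.
// It may be invoked from several threads at once, so it must be thread-safe.
@FunctionalInterface
interface TupleConsumer {
    void consume(Tuple tuple);
}

// Toy dialect over partitions, each standing in for the data owned by one node.
class PartitionedDialect {
    private final List<List<Tuple>> partitions;

    PartitionedDialect(List<List<Tuple>> partitions) { this.partitions = partitions; }

    // Runs the consumer next to the data: one task per partition, no shared queue.
    void forEachTuple(TupleConsumer consumer, String... tables) {
        Set<String> wanted = new HashSet<>(Arrays.asList(tables));
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (List<Tuple> partition : partitions) {
                tasks.add(() -> {
                    for (Tuple tuple : partition) {
                        if (wanted.contains(tuple.table)) {
                            consumer.consume(tuple); // filter by table, then process
                        }
                    }
                    return null;
                });
            }
            for (Future<Void> done : pool.invokeAll(tasks)) {
                done.get(); // surfaces any exception thrown by a consumer
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing tuples", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("consumer failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```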
>>> >> > >> >> > >
>>> >> > >> >> > > On 4 March 2013 17:02, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>>> >> > >> >> > >>
>>> >> > >> >> > >>
>>> >> > >> >> > >> On 4 March 2013, at 17:39, Sanne Grinovero <sanne(a)hibernate.org> wrote:
>>> >> > >> >> > >>
>>> >> > >> >> > >>> On 4 March 2013 16:20, Emmanuel Bernard <emmanuel(a)hibernate.org> wrote:
>>> >> > >> >> > >>>> I already gave what I knew on how to load an entity from a tuple (which
>>> >> > >> >> > >>>> isn't much), but we can try and dig together. Something I thought about
>>> >> > >> >> > >>>> is that ORM probably has a mechanism to load an entity from a resultset
>>> >> > >> >> > >>>> via the query parser. And that probably also looks like the second half
>>> >> > >> >> > >>>> of OgmLoader.load. We could look at this part and see if we can make an
>>> >> > >> >> > >>>> OGM version of it. We never had the need before, as we never had query
>>> >> > >> >> > >>>> support (the way SQL does it).
>>> >> > >> >> > >>>
>>> >> > >> >> > >>> I would also need to study the ORM code, but to add a high-level observation,
>>> >> > >> >> > >>> the methods currently defined by the GridDialect focus on
>>> >> > >> >> > >>> loading from well-known key instances;
>>> >> > >> >> > >>> there is nothing that lets us scan/inspect all values.
>>> >> > >> >> > >>>
>>> >> > >> >> > >>> In other words: even if we wanted to load keys first, we don't have definitions
>>> >> > >> >> > >>> of functions from raw -> primary key instances either.
>>> >> > >> >> > >>
>>> >> > >> >> > >> I understand that. I'm not denying the need for the method.
>>> >> > >> >> > >>
>>> >> > >> >> > >>>
>>> >> > >> >> > >>>
>>> >> > >> >> > >>>> On the visitor vs Iterator approach, I still don't see how implementing
>>> >> > >> >> > >>>> an Iterator on a map/reduce backend would be harder than the visitor,
>>> >> > >> >> > >>>> but maybe I'm missing something.
>>> >> > >> >> > >>>>
>>> >> > >> >> > >>>> class IteratorAsStream {
>>> >> > >> >> > >>>>     final Query someMapReduceQuery = ...;
>>> >> > >> >> > >>>>
>>> >> > >> >> > >>>>     public Object next() {
>>> >> > >> >> > >>>>         if (!someMapReduceQuery.started()) {
>>> >> > >> >> > >>>>             // execute and collect results in parallel
>>> >> > >> >> > >>>>             someMapReduceQuery.execute();
>>> >> > >> >> > >>>>         }
>>> >> > >> >> > >>>>         Object result = someMapReduceQuery.getNextOrBlock();
>>> >> > >> >> > >>>>         return result;
>>> >> > >> >> > >>>>     }
>>> >> > >> >> > >>>> }
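A runnable variant of the iterator idea, assuming the map/reduce query is simulated by one task per (hypothetical) partition: work starts on the first `hasNext()`, partitions are processed in parallel, and the caller pulls results one at a time from a `LinkedBlockingQueue`. That queue is the shared contention point Sanne objects to in the IRC log, which is the practical difference from pushing a consumer to each partition.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Iterator over results produced in parallel; the work starts on first access.
// Elements must be non-null, since LinkedBlockingQueue rejects null.
class IteratorAsStream<T> implements Iterator<T> {
    private static final Object DONE = new Object(); // one marker per finished partition

    private final List<List<T>> partitions; // stand-in for the data held by each node
    private final BlockingQueue<Object> results = new LinkedBlockingQueue<>();
    private boolean started = false;
    private int finishedPartitions = 0;
    private Object next; // prefetched element, or null when none is buffered

    IteratorAsStream(List<List<T>> partitions) { this.partitions = partitions; }

    private void startIfNeeded() {
        if (started) return;
        started = true;
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        for (List<T> partition : partitions) {
            pool.execute(() -> {
                for (T item : partition) results.add(item); // producers share one queue
                results.add(DONE);
            });
        }
        pool.shutdown(); // already-submitted work still runs to completion
    }

    @Override
    public boolean hasNext() {
        startIfNeeded();
        try {
            while (next == null && finishedPartitions < partitions.size()) {
                Object item = results.take(); // blocks until some partition delivers
                if (item == DONE) finishedPartitions++;
                else next = item;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        }
        return next != null;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T result = (T) next;
        next = null;
        return result;
    }
}
```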
>>> >> > >> >> > >>>
>>> >> > >> >> > >>> That could work to *load* all entities in parallel, but I'd like to
>>> >> > >> >> > >>> process the entities in parallel as well.
>>> >> > >> >> > >>> And I'd rather not force the GridDialect implementors to write some
>>> >> > >> >> > >>> Hibernate Search-specific code,
>>> >> > >> >> > >>> so to break out of that we need some form of "Execute X on each": a closure or a lambda.
>>> >> > >> >> > >>>
>>> >> > >> >> > >>
>>> >> > >> >> > >> I can't see how the visitor model helps in your processing of entities
>>> >> > >> >> > >> in parallel. To me both approaches are strictly equivalent. Care to show
>>> >> > >> >> > >> some pseudo-code?
>>> >> > >> >> _______________________________________________
>>> >> > >> >> hibernate-dev mailing list
>>> >> > >> >> hibernate-dev(a)lists.jboss.org
>>> >> > >> >> https://lists.jboss.org/mailman/listinfo/hibernate-dev