[infinispan-dev] Non-blocking state transfer (ISPN-1424)

Sanne Grinovero sanne at infinispan.org
Fri Mar 9 11:03:37 EST 2012


I agree with Bela: this looks scary, can't imagine how tricky it would
be to implement it correctly. Could you split the problem?

Also, what happened to the super-simple ideas I shared in London?
Is it the same? I'm assuming it's very different.
I might have overlooked some important aspect, but I'd like to know why
that approach was discarded. Was it looking too simple? :P

I'm sorry I'm skimming through it, will have more time next week but
as Galder said as well I'll need to draw this to understand it better.

Some first-impression notes:
# Ownership information
  Deciding when to start a state transfer is a different problem:
move it to another page or drop it?
  Deciding when to have a node join - same as above?

# Cache entries
##  "We need a snapshot of the iterator" < Can we avoid it? We could
just start refusing to serve write commands by checking every incoming
command. An additional interceptor would check whether the incoming
command is "appropriate" to be handled by us or whether we need to
return some kind of rejection code.
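
A minimal sketch of such a check, to make the idea concrete. Everything
here is hypothetical and is not Infinispan API: the names WriteCommand,
WriteOutcome and OwnershipCheckInterceptor are invented for illustration,
and ownership is modelled as a plain set of keys rather than being derived
from a consistent hash.

```java
import java.util.Set;

// Hypothetical result codes; not part of any real Infinispan API.
enum WriteOutcome { ACCEPTED, REJECTED_NOT_OWNER }

// Hypothetical stand-in for a write command.
final class WriteCommand {
    final String key;
    final String value;

    WriteCommand(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Checks every incoming write against the keys this node currently owns.
final class OwnershipCheckInterceptor {
    // Assumption: ownership is a simple key set swapped on each view change.
    private volatile Set<String> ownedKeys;

    OwnershipCheckInterceptor(Set<String> ownedKeys) {
        this.ownedKeys = Set.copyOf(ownedKeys);
    }

    // Called when a new view changes what this node owns.
    void updateOwnership(Set<String> newOwnedKeys) {
        this.ownedKeys = Set.copyOf(newOwnedKeys);
    }

    // Returns a rejection code instead of applying a write we no longer own.
    WriteOutcome handle(WriteCommand cmd) {
        return ownedKeys.contains(cmd.key)
                ? WriteOutcome.ACCEPTED
                : WriteOutcome.REJECTED_NOT_OWNER;
    }
}
```

With this shape the sender, not the old owner, is responsible for retrying
against the new owner after a rejection, which is what would make the
snapshot unnecessary.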
## Need for tombstones < I think we can avoid that actually. We'll
need it for MVCC replace/delete operations, but for state transfer
it's not needed if we decide that a Write operation has to send the
value to the new owners only and send an "authoritative invalidation"
to all previous owners.
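
A rough sketch of how a write could be routed under that rule. WritePlan
and forWrite are hypothetical names, nodes are plain strings, and one
assumption is added that the text above does not state: a previous owner
that is also a new owner receives the value rather than the invalidation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical routing decision for one write during state transfer.
final class WritePlan {
    final Set<String> sendValueTo;   // nodes that receive the new value
    final Set<String> invalidateOn;  // nodes that receive the invalidation

    private WritePlan(Set<String> sendValueTo, Set<String> invalidateOn) {
        this.sendValueTo = sendValueTo;
        this.invalidateOn = invalidateOn;
    }

    static WritePlan forWrite(List<String> previousOwners, List<String> newOwners) {
        // The value goes to the new owners only.
        Set<String> value = new LinkedHashSet<>(newOwners);
        // Assumption: previous owners that remain owners are not invalidated.
        Set<String> invalidate = new LinkedHashSet<>(previousOwners);
        invalidate.removeAll(value);
        return new WritePlan(value, invalidate);
    }
}
```

For previous owners [A, B] and new owners [B, C], the value goes to B and C
and the invalidation goes to A, so A never has to keep a tombstone for the
key.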

# Lock Information
This is trivial if you stop thinking of them as being special. A lock
is a marker, and a marker is a value stored in the grid. These values
are transferred as any other value, with one single differentiator:
since there is always only one, they are manipulated via CAS
operations and are guaranteed to be consistent without the need of
being locked when changed.
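
To illustrate, a small sketch of locks as ordinary entries changed only by
compare-and-set style calls. A local ConcurrentHashMap stands in for the
grid, and MarkerLocks, the "lock:" key prefix and the owner strings are all
hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Locks modelled as plain marker values, updated only via atomic map calls.
final class MarkerLocks {
    // Assumption: a local map stands in for the distributed store.
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    private static String lockKey(String key) {
        return "lock:" + key;
    }

    // Acquire: succeeds only if no marker exists for this key yet.
    boolean tryLock(String key, String owner) {
        return store.putIfAbsent(lockKey(key), owner) == null;
    }

    // Release: removes the marker only if it still names this owner.
    boolean unlock(String key, String owner) {
        return store.remove(lockKey(key), owner);
    }

    // Current holder, or null if the key is not locked.
    String holder(String key) {
        return store.get(lockKey(key));
    }
}
```

Because the marker is just another entry, state transfer could move it
with the same code path as any other value.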

#L1
Let's keep it simple initially and just flush them out as decided.
## the cleanup you mention: is that not a current bug, orthogonal to
this design page? (trying to identify more things to move out)

#Handling merges
Could we simplify this by saying that the views are not actually a
linked list but a tree?
In this document we're not attempting to solve consistent merging of
split brain, right? So we only need to know how to move the state to
the rightful new owner. For conflicts, let's assume there is a
"ConflictResolver object" which we'll describe/implement somewhere
else.
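
A sketch of how that separation could look. The ConflictResolver signature
and the MergeSketch helper are assumptions made for illustration only; each
partition's state is a plain map, and the resolver is consulted only when
the partitions disagree on a key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract: pick one value when partitions disagree on a key.
interface ConflictResolver<V> {
    V resolve(String key, List<V> candidates);
}

final class MergeSketch {
    // Combines per-partition state; only real conflicts reach the resolver.
    static <V> Map<String, V> merge(List<Map<String, V>> partitions,
                                    ConflictResolver<V> resolver) {
        // Collect the distinct values seen for each key across partitions.
        Map<String, List<V>> byKey = new LinkedHashMap<>();
        for (Map<String, V> partition : partitions) {
            for (Map.Entry<String, V> e : partition.entrySet()) {
                List<V> seen = byKey.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
                if (!seen.contains(e.getValue())) {
                    seen.add(e.getValue());
                }
            }
        }
        Map<String, V> merged = new LinkedHashMap<>();
        for (Map.Entry<String, List<V>> e : byKey.entrySet()) {
            List<V> candidates = e.getValue();
            V chosen = candidates.size() == 1
                    ? candidates.get(0)
                    : resolver.resolve(e.getKey(), candidates);
            merged.put(e.getKey(), chosen);
        }
        return merged;
    }
}
```

The state-moving code never needs to know the resolution policy, so the
policy can be described and implemented elsewhere.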

#State transfer disabled
We should think about the cases in which this option makes sense to be
enabled. In those cases, would people still be interested in L1
consistency and transactions? If not, this is not a problem to solve.

After getting to the end, it's not a bad document at all, but I still
think it looks too scary :D

Cheers,
Sanne

On 9 March 2012 14:19, Bela Ban <bban at redhat.com> wrote:
> Wow !
>
> Does this need to be so complex ? I've spent an hour trying to understand
> it, and am still overwhelmed... :-)
>
> My understanding (based on my changes in 4.2) is that state transfer
> moves/deletes keys based on the diff between 2 subsequent views:
> - Each node checks all of the affected keys
> - If a key should be stored in additional nodes, the key is pushed there
> - If a key shouldn't be stored locally anymore, it is removed
>
> IMO, there's no need to handle a merge differently from a regular view,
> and we might end up with inconsistent state, but that's unavoidable
> until we have eventual consistency. Fine...
>
> Also, why do we need to transfer ownership information ? Can't ownership
> be calculated purely on local information ?
>
> I'm afraid that the complexity will increase the state space (hard to
> test all possible state transitions), lead to unnecessary messages being
> sent and most importantly, might lead to blocks.
>
> The section on locking outright scares me :-) Perhaps reducing the level
> of details here - as Galder suggested - might help to understand the
> basic design.
>
> Sorry for being a bit negative, but I think state transfer is one of the
> most critical and important pieces of code in DIST mode, and this needs
> to work against large (say a couple of hundreds) clusters and nodes
> joining, leaving or crashing all the time...
>
> I'm going to re-read the design again, maybe what I said above is just
> BS ... :-)
>
>
> On 3/8/12 11:55 AM, Dan Berindei wrote:
>> Hi guys
>>
>> It's been a long time coming, but I finally published the non-blocking
>> state transfer draft on the wiki:
>> https://community.jboss.org/wiki/Non-blockingStateTransfer
>>
>> Unlike my previous state transfer design document, I think I've
>> fleshed out most of the implications. Still, there are some things I
>> don't have a clear solution for yet. As you would expect it's mostly
>> around merging and delayed state transfer.
>>
>> I'm looking forward to hearing your comments/advice!
>>
>> Cheers
>> Dan
>>
>> PS: Let's discuss this over the mailing list only.
>>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

