[infinispan-dev] Using infinispan as quorum-based nosql

Dan Berindei dan.berindei at gmail.com
Thu Jun 6 05:09:22 EDT 2013


Say you have two transactions, tx1 and tx2. They both send a
LockControlCommand(k1) to the primary owner of k1 (let's call it B).
If the lock commands use SYNCHRONOUS_IGNORE_LEAVERS and B dies while
processing the commands, both tx1 and tx2 will think they have succeeded in
locking k1.

So you're right: in pessimistic mode everything should be locked before the
prepare, but LockControlCommands are also susceptible to SuspectExceptions.
On the other hand, you can keep SYNCHRONOUS mode for LockControlCommands and
simply retry the transaction when you get a SuspectException.
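
Roughly, the retry would look like this (an untested sketch: it assumes a
JTA TransactionManager and pessimistic locking via AdvancedCache.lock(), and
in practice the SuspectException may arrive wrapped in a CacheException, so
the catch here is simplified):

import javax.transaction.TransactionManager;
import org.infinispan.AdvancedCache;
import org.infinispan.remoting.transport.jgroups.SuspectException;

// Sketch only: retry the whole tx when the lock owner leaves mid-lock.
void putWithRetry(AdvancedCache<String, String> cache, String key, String value)
      throws Exception {
   TransactionManager tm = cache.getTransactionManager();
   for (int attempt = 1; attempt <= 3; attempt++) {
      tm.begin();
      try {
         cache.lock(key);   // pessimistic mode: sends LockControlCommand(key)
         cache.put(key, value);
         tm.commit();
         return;
      } catch (SuspectException e) {
         tm.rollback();     // the primary owner left; retry on the new owner
      }
   }
   throw new IllegalStateException("Gave up after 3 attempts");
}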

Unfortunately, you can't retry the transaction if the PrepareCommand fails
(in pessimistic mode; or the CommitCommand, in optimistic mode), because it
is executed in the commit phase. The transaction manager swallows all the
exceptions in the commit phase, making it impossible to see whether the
transaction failed because of a node leaving. I guess this means I should
increase the priority of https://issues.jboss.org/browse/ISPN-2402 ...
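
To make that concrete, this is roughly what the originator sees (a sketch
only; the exact cause chain depends on the transaction manager
implementation):

import javax.transaction.RollbackException;
import javax.transaction.TransactionManager;
import org.infinispan.Cache;

// Sketch: the commit-phase failure is opaque to the caller.
void commitHidesLeaver(Cache<String, String> cache, TransactionManager tm)
      throws Exception {
   tm.begin();
   cache.put("k1", "v1");   // pessimistic mode: k1 is locked here
   try {
      tm.commit();          // the PrepareCommand runs inside commit()
   } catch (RollbackException e) {
      // The TM reports only that the tx rolled back; the SuspectException
      // from the failed prepare is typically absent from the cause chain,
      // so we can't tell whether a leaver caused it (the ISPN-2402 gap).
   }
}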



On Thu, Jun 6, 2013 at 11:49 AM, <vitalii.tymchyshyn at ubs.com> wrote:

> Hello.
>
> We are using pessimistic transaction mode. In that case everything is
> already locked by the time of the prepare, isn't it?
> As for the merge, in quorum mode it's simple: take the data from the quorum.
> I think I will try to simply suppress sending data from non-quorum members
> on merge, because currently every member sends its data, which creates a
> complete mess of unsynchronized data after the merge (depending on the
> timing).
>
> Best regards, Vitalii Tymchyshyn
>
>  ------------------------------
> From: infinispan-dev-bounces at lists.jboss.org [mailto:
> infinispan-dev-bounces at lists.jboss.org] On Behalf Of Dan Berindei
> Sent: Wednesday, June 05, 2013 12:04 PM
>
> To: infinispan-dev List
> Subject: Re: [infinispan-dev] Using infinispan as quorum-based nosql
>
>
>
> On Mon, Jun 3, 2013 at 4:23 PM, <vitalii.tymchyshyn at ubs.com> wrote:
>
>> Hello.
>>
>> Thanks for the information. I will subscribe to and vote for the issues
>> you mentioned.
>> In the meantime I've implemented a hacky JGroupsTransport that "downgrades"
>> all SYNCHRONOUS invokeRemotely calls (except CacheViewControlCommand and
>> StateTransferControlCommand) to SYNCHRONOUS_IGNORE_LEAVERS and checks with
>> a filter whether the required number of answers was received (I tried to
>> use the original invokeRemotely return value, but it often returns
>> something strange, like an empty map). It seems to do the trick for me,
>> but I am still not sure whether it has any side effects.
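>>
>> (For reference, roughly what that override looks like; this is a sketch
>> from memory, the class name is made up and the exact invokeRemotely
>> signature varies between versions:)
>>
>> import java.util.Collection;
>> import java.util.Map;
>> import org.infinispan.commands.ReplicableCommand;
>> import org.infinispan.commands.control.CacheViewControlCommand;
>> import org.infinispan.commands.control.StateTransferControlCommand;
>> import org.infinispan.remoting.responses.Response;
>> import org.infinispan.remoting.rpc.ResponseFilter;
>> import org.infinispan.remoting.rpc.ResponseMode;
>> import org.infinispan.remoting.transport.Address;
>> import org.infinispan.remoting.transport.jgroups.JGroupsTransport;
>>
>> public class IgnoreLeaversTransport extends JGroupsTransport {
>>    @Override
>>    public Map<Address, Response> invokeRemotely(Collection<Address> recipients,
>>          ReplicableCommand cmd, ResponseMode mode, long timeout,
>>          boolean usePriorityQueue, ResponseFilter filter) throws Exception {
>>       // Downgrade plain SYNCHRONOUS calls so a leaver no longer aborts
>>       // them, but leave view and state-transfer control traffic alone.
>>       if (mode == ResponseMode.SYNCHRONOUS
>>             && !(cmd instanceof CacheViewControlCommand)
>>             && !(cmd instanceof StateTransferControlCommand)) {
>>          mode = ResponseMode.SYNCHRONOUS_IGNORE_LEAVERS;
>>       }
>>       return super.invokeRemotely(recipients, cmd, mode, timeout,
>>             usePriorityQueue, filter);
>>    }
>> }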
>>
>>
> Indeed, I started working on a solution, but I over-engineered it and then
> I got side-tracked with other stuff. Sorry about that.
>
> The problem with using SYNCHRONOUS_IGNORE_LEAVERS everywhere, as I found
> out, is that you don't want to ignore the primary owner of a key leaving
> during a prepare/lock command (or the coordinator, in REPL mode prior to
> 5.3.0.CR1/ISPN-2772). If that happens, you have to retry on the new primary
> owner, otherwise you can't know whether the prepare command has locked the
> key.
>
> A similar problem appears in non-transactional caches with
> supportsConcurrentUpdates=true: there the primary owner can ignore any of
> the backup owners leaving, but the originator can't ignore the primary
> owner leaving.
>
>
>
>> For now I can see a merge problem in my test: different values are picked
>> during the merge. I am going to dig a little deeper and follow up, but it
>> already seems a little strange to me, since the test algorithm is:
>> 1) Assign the "old" value across the full cluster (it's REPL_SYNC mode)
>> 2) Block the coordinator
>> 3) Write the "new" value to one of the two remaining nodes; it is
>> synchronized to the second remaining node
>> 4) Unblock the coordinator
>> 5) Wait (I could not find a good way to wait for state transfer, so I
>> just wait in this case)
>> 6) Check the value on the coordinator
>>
>> And in my test the assert randomly sees "old" or "new". I am now going to
>> check why. Maybe I will need to "reinitialize" the smaller cluster
>> partition to ensure the data is taken from the quorum part of the cluster.
>>
>>
> We don't handle merges properly. See
> https://issues.jboss.org/browse/ISPN-263 and the discussion at
> http://markmail.org/message/meyczotzobuva7js
>
> What happens right now is that after a merge, all the caches are assumed
> to have up-to-date data, so there is no state transfer. We had several
> ideas floating around on how we could force the smaller partition to
> receive data from the quorum partition, but I think with the public API
> your best option is to stop all the caches in the smaller partition after
> the split and start them back up after the merge.
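>
> Something along these lines might work (just a sketch from memory: the
> quorum test is simplistic, and in a real deployment the restart should be
> handed off to a separate thread rather than done on the notification
> thread):
>
> import java.util.List;
> import org.infinispan.Cache;
> import org.infinispan.notifications.Listener;
> import org.infinispan.notifications.cachemanagerlistener.annotation.Merged;
> import org.infinispan.notifications.cachemanagerlistener.event.MergeEvent;
> import org.infinispan.remoting.transport.Address;
>
> @Listener
> public class MinorityRestartListener {
>    private final Cache<?, ?> cache;
>
>    public MinorityRestartListener(Cache<?, ?> cache) {
>       this.cache = cache;
>    }
>
>    @Merged
>    public void onMerge(MergeEvent e) {
>       List<List<Address>> subgroups = e.getSubgroupsMerged();
>       List<Address> mine = null;
>       for (List<Address> sg : subgroups) {
>          if (sg.contains(e.getLocalAddress())) mine = sg;
>       }
>       if (mine == null) return;
>       for (List<Address> sg : subgroups) {
>          if (sg.size() > mine.size()) {
>             // We were on the minority side: restart so this node re-joins
>             // the cluster and receives state from the quorum partition.
>             cache.stop();
>             cache.start();
>             return;
>          }
>       }
>    }
> }
>
> You would register it on every node with cacheManager.addListener(...).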
>
> Cheers
> Dan
>
>
>
>> Best regards, Vitalii Tymchyshyn
>>
>> -----Original Message-----
>> From: infinispan-dev-bounces at lists.jboss.org [mailto:
>> infinispan-dev-bounces at lists.jboss.org] On Behalf Of Galder Zamarreno
>> Sent: Monday, June 03, 2013 9:04 AM
>> To: infinispan-dev List
>> Subject: Re: [infinispan-dev] Using infinispan as quorum-based nosql
>>
>>
>> On May 30, 2013, at 5:10 PM, vitalii.tymchyshyn at ubs.com wrote:
>>
>> > Hello.
>> >
>> > We are going to use Infinispan in our project as a NoSQL solution. It
>> > performs quite well for us, but we've now hit the following problem.
>> > Note: we are using Infinispan 5.1.6 in SYNC_REPL mode in a small cluster.
>> > The problem is that when any node fails, any running transactions wait
>> > for JGroups to decide whether it has really failed, and then roll back
>> > because of a SuspectException. While we can live with the delay, we'd
>> > really like to skip the rollback. In my opinion there is actually no
>> > reason to roll back, because transactions started after the leave will
>> > succeed, so previously running transactions could do the same.
>>
>> We're aware of the problem (https://issues.jboss.org/browse/ISPN-2402).
>>
>> @Dan, has there been any updates on this?
>>
>> > The question for us is whether a node that left will synchronize its
>> > state after a merge (even if the merge happened without an Infinispan
>> > restart). In my opinion it has to, or this won't work correctly at all.
>>
>> This is not in yet: https://issues.jboss.org/browse/ISPN-263
>>
>> > So, I've found RpcManager's ResponseMode.SYNCHRONOUS_IGNORE_LEAVERS
>> > and am thinking of switching to it for RpcManager calls that don't
>> > specify a ResponseMode explicitly. In my opinion it should do the trick.
>> > I am also going to enforce a quorum number of responses, but that's
>> > another story. So, what do you think, would it work?
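>> >
>> > (Roughly the filter I have in mind; just a sketch, and the counter
>> > would need to be thread-safe in real code:)
>> >
>> > import org.infinispan.remoting.responses.Response;
>> > import org.infinispan.remoting.rpc.ResponseFilter;
>> > import org.infinispan.remoting.transport.Address;
>> >
>> > // Stop waiting once a majority of the cluster has answered.
>> > public class QuorumResponseFilter implements ResponseFilter {
>> >    private final int quorum;
>> >    private int received;
>> >
>> >    public QuorumResponseFilter(int clusterSize) {
>> >       this.quorum = clusterSize / 2 + 1;
>> >    }
>> >
>> >    public boolean isAcceptable(Response response, Address sender) {
>> >       received++;   // count every answer towards the quorum
>> >       return true;
>> >    }
>> >
>> >    public boolean needMoreResponses() {
>> >       return received < quorum;
>> >    }
>> > }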
>>
>> ^ Not sure if that'll work. @Dan?
>>
>> > P.S. Another question for me: how does it work now when a
>> > SuspectException is thrown while broadcasting the CommitCommand? As far
>> > as I can see, the commit is still done on some remote nodes (those that
>> > are still in the cluster), but rolled back on the local node because of
>> > this exception. Am I correct?
>>
>> ^ How Infinispan reacts in these situations depends a lot on the type of
>> communications (synchronous or asynchronous) and the transaction
>> configuration. Mircea can provide more details on this.
>>
>> Cheers,
>>
>> > This can cause inconsistencies, but we must live with something in the
>> > peer-to-peer world :) The only other option is to switch from a
>> > write-all/read-local scenario to write-quorum/read-quorum, which in my
>> > opinion is too complex a move for Infinispan.
>> >
>> > Best regards, Vitalii Tymchyshyn
>> >
>>
>>
>> --
>> Galder Zamarreño
>> galder at redhat.com
>> twitter.com/galderz
>>
>> Project Lead, Escalante
>> http://escalante.io
>>
>> Engineer, Infinispan
>> http://infinispan.org
>>
>>