[infinispan-dev] DIST-SYNC, put(), a problem and a solution

Bela Ban bban at redhat.com
Mon Aug 11 03:42:11 EDT 2014



On 06/08/14 22:32, Erik Salter wrote:
> Hi Sanne,
>
> So I guess I'm one of the five.  This particular issue has been happening
> over and over again in my production environment.  It is a performance and
> availability killer, since any sort of network blip will result in these
> lock scenarios.  And network blips in my data center environment are very
> real and surprisingly common.  In the past 5-6 weeks, we've had 3
> redundant routers fail; lock contention is the best-case outcome of
> such a failure.
>
> Almost any sort of MERGE event will result in "stale" locks -- locks that
> are never released by the system.


This is separate from the put-while-holding-the-lock issue. Have you
created a JIRA to track it?

> Now consider that I can't throw more
> threads at the problem.  I have 1100 OOB + ISPN threads available per
> cluster.


Understood.


> So what I've hacked into production is something that examines the
> LockManager (and TxTable) and forcibly releases locks that are older
> than 5x the lock acquisition timeout.  If I don't, user threads and OOB
> threads start backing up, and it's surprising how quickly these pools
> can exhaust.  Nowhere near a scalable and elastic solution, but
> desperate times and all that.
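
For anyone else hitting this, I assume the hack looks roughly like the
sketch below (hypothetical code: StaleLockReaper, LockRecord and the
registration hooks are invented for illustration; this is neither Erik's
actual code nor Infinispan API). A background task scans the held locks
and forcibly releases anything older than 5x the lock acquisition
timeout:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class StaleLockReaper {

        /** Invented holder: when a lock was taken and how to break it. */
        static final class LockRecord {
            final long acquiredAt = System.currentTimeMillis();
            final Runnable forceRelease;
            LockRecord(Runnable forceRelease) { this.forceRelease = forceRelease; }
        }

        private final ConcurrentHashMap<Object, LockRecord> locks =
                new ConcurrentHashMap<>();
        private final long acquisitionTimeoutMs;

        public StaleLockReaper(long acquisitionTimeoutMs) {
            this.acquisitionTimeoutMs = acquisitionTimeoutMs;
            ScheduledExecutorService ses =
                    Executors.newSingleThreadScheduledExecutor();
            // Scan once per timeout period; anything 5x older is stale
            ses.scheduleAtFixedRate(this::reap, acquisitionTimeoutMs,
                    acquisitionTimeoutMs, TimeUnit.MILLISECONDS);
        }

        /** Hook called by whatever acquires a lock (wiring not shown). */
        void onAcquire(Object key, Runnable forceRelease) {
            locks.put(key, new LockRecord(forceRelease));
        }

        /** Hook called on normal release. */
        void onRelease(Object key) {
            locks.remove(key);
        }

        private void reap() {
            long cutoff = System.currentTimeMillis() - 5 * acquisitionTimeoutMs;
            for (Map.Entry<Object, LockRecord> e : locks.entrySet()) {
                if (e.getValue().acquiredAt < cutoff) {
                    e.getValue().forceRelease.run(); // break the stale lock
                    locks.remove(e.getKey());
                }
            }
        }
    }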
>
>
> I can't agree more that any solution must take NBST into consideration,
> especially WRT pessimistic locks.
>
> Regards,
>
> Erik
>
>
> On 8/6/14, 2:50 PM, "Sanne Grinovero" <sanne at infinispan.org> wrote:
>
>> On 6 August 2014 18:49, Dan Berindei <dan.berindei at gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, Aug 6, 2014 at 6:19 PM, Bela Ban <bban at redhat.com> wrote:
>>>>
>>>> Hey Dan,
>>>>
>>>> On 06/08/14 16:13, Dan Berindei wrote:
>>>>> I could create the issue in JIRA, but I wouldn't make it high
>>>>> priority, because I think it has lots of corner cases with NBST and
>>>>> would cause headaches for the maintainers of state transfer ;)
>>>>
>>>> I do believe the put-while-holding-the-lock issue *is* a critical
>>>> issue; anyone banging a cluster of Infinispan nodes with more than 1
>>>> thread will run into lock timeouts, with or without transactions. The
>>>> only workaround for now is to use total order, but at the cost of
>>>> reduced performance. However, once a system starts hitting the lock
>>>> timeout issues, performance drops to a crawl, way slower than TO, and
>>>> work starts to pile up, which compounds the problem.
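
To spell out the pattern (a minimal sketch of the problematic flow, not
the actual Infinispan code path; PrimaryOwnerPut and replicateToBackups
are made-up names): the primary owner's thread takes the key lock,
applies the change, then blocks in a *sync* RPC to the backups while
still holding that lock. Every concurrent put() on the same key parks
behind it, and each parked put() pins another OOB/ISPN thread:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    public class PrimaryOwnerPut {
        private final ConcurrentHashMap<String, ReentrantLock> keyLocks =
                new ConcurrentHashMap<>();
        private final ConcurrentHashMap<String, String> data =
                new ConcurrentHashMap<>();

        public void put(String key, String value) throws InterruptedException {
            ReentrantLock lock =
                    keyLocks.computeIfAbsent(key, k -> new ReentrantLock());
            lock.lock();                        // this thread owns the key lock
            try {
                data.put(key, value);           // apply the update locally
                replicateToBackups(key, value); // sync RPC: we block here with
                                                // the lock still held; other
                                                // writers queue up behind us
            } finally {
                lock.unlock();
            }
        }

        private void replicateToBackups(String key, String value)
                throws InterruptedException {
            Thread.sleep(50); // stand-in for waiting on the backups' acks
        }
    }

If the RPC takes longer than the lock acquisition timeout (e.g. during a
network blip), every queued writer times out, which is exactly the
behavior Erik describes.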
>>>
>>>
>>> I wouldn't call it critical because you can always increase the number
>>> of threads. It won't be pretty, but it will work around the thread
>>> exhaustion issue.
>>
>> If Infinispan doesn't do it automatically, I wouldn't count that as a
>> solution.
>> Consider that the project goal is to make it easy to scale up/down
>> dynamically... if it requires experts to be on alert all the time for
>> such manual interventions, it's a failure.
>> Besides, I can count on one hand the people able to figure such
>> trouble out... so I agree this is critical.
>>
>>
>>>> I believe doing a sync RPC while holding the lock on a key is asking
>>>> for trouble and is (IMO) an anti-pattern.
>>>
>>>
>>> We also hold a lock on a key between the LockControlCommand and the
>>> TxCompletionNotificationCommand in pessimistic-locking caches, and
>>> there's at least one sync PrepareCommand RPC between them...
>>>
>>> So I don't see it as an anti-pattern; the only problem is that we should
>>> be able to do that without blocking internal threads in addition to the
>>> user thread (which is how tx caches do it).
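
To illustrate Dan's point (a hedged sketch, not Infinispan's actual
locking code; TxKeyLocks and the txId strings are invented): in tx
caches the lock is owned by the *transaction*, not by a thread, so the
lock can stay held across a sync PrepareCommand RPC without any internal
thread blocking on it, and a different thread can release it later:

    import java.util.concurrent.ConcurrentHashMap;

    public class TxKeyLocks {
        // key -> id of the transaction currently holding the lock
        private final ConcurrentHashMap<Object, String> owner =
                new ConcurrentHashMap<>();

        /** LockControlCommand: record the tx as owner; the thread returns. */
        boolean lock(Object key, String txId) {
            String prev = owner.putIfAbsent(key, txId);
            return prev == null || prev.equals(txId);
        }

        /** TxCompletionNotificationCommand: release by tx id, any thread. */
        void unlock(Object key, String txId) {
            owner.remove(key, txId);
        }
    }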
>>>
>>>>
>>>> Sorry if this has a negative impact on NBST, but should we not fix this
>>>> because we don't want to risk a change to NBST?
>>>
>>>
>>> I'm not saying it will have a negative impact on NBST, I'm just saying I
>>> don't want to start implementing an incomplete proposal for the basic
>>> flow and leave the state transfer/topology change issues for "later".
>>> What happens when a node leaves, when a backup owner is added, or when
>>> the primary owner changes should be part of the initial discussion, not
>>> an afterthought.
>>
>> Absolutely!
>> No change should be made that leaves questions open, and I don't
>> presume to have suggested a solution; I was just trying to start a
>> conversation on using "such a pattern".
>> But I also believe we already had such conversations in past meetings,
>> so my words were terse and short because I just wanted to remind
>> everyone of those.
>>
>>
>>> E.g. with your proposal, any updates in the replication queue on the
>>> primary owner will be lost when that primary owner dies, even though we
>>> told the user that we successfully updated the key. To quote from my
>>> first email on this thread: "OTOH, if the primary owner dies, we have to
>>> ask a backup, and we can lose the modifications not yet replicated by
>>> the primary."
>>>
>>> With Sanne's proposal, we wouldn't report to the user that we stored the
>>> value until all the backups confirmed the update, so we wouldn't have
>>> that problem. But I don't see how we could keep the sequence of versions
>>> monotonic when the primary owner of the key changes without some extra
>>> sync RPCs (also done while holding the key lock). IIRC TOA also needs
>>> some sync RPCs to generate its sequence numbers.
>>
>> I don't know how NBST v.21 works today, but I trust it doesn't lose
>> writes; we should break the problem down into smaller problems, and in
>> this case I hope to build on the solid foundations of NBST.
>>
>> When the key is re-possessed by a new node (and this starts to
>> generate "reference" write commands), you could restart the sequences:
>> you don't need a universal monotonic number; all backup owners need is
>> an ordering rule and the understanding that commands coming from the
>> new owner are more recent than those from the old owner. AFAIK you
>> already have the notion of a view generation id?
>> Essentially we'd need to store, together with the entry, not only its
>> sequence number but also the viewId. It's a very simplified (compact)
>> vector clock: in practice, from this viewId we can extrapolate
>> addresses and owners. But it's simpler than the full pattern, as you
>> only need the last one; the longer tail of events is handled by NBST,
>> I think?
>>
>> One catch: I think you need tombstones, but those are already needed
>> for so many other things that we can't avoid them :)
>>
>> Cheers,
>> Sanne
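
If I understand the viewId + seqno idea correctly, the ordering rule
could be as simple as the sketch below (hypothetical: EntryVersion and
shouldApply() are invented names, not Infinispan code). Each entry
carries the view generation it was written under plus the per-key
sequence number, and a backup applies an incoming write only if its
version is strictly newer:

    public final class EntryVersion implements Comparable<EntryVersion> {
        final int viewId;  // view/topology generation of the writing primary
        final long seqNo;  // per-key sequence; each new primary restarts it

        EntryVersion(int viewId, long seqNo) {
            this.viewId = viewId;
            this.seqNo = seqNo;
        }

        @Override
        public int compareTo(EntryVersion other) {
            // A write from a newer view always wins; within a view,
            // the per-key sequence number orders the writes.
            if (viewId != other.viewId)
                return Integer.compare(viewId, other.viewId);
            return Long.compare(seqNo, other.seqNo);
        }

        /** Backup rule: apply 'incoming' only if strictly newer than 'stored'. */
        static boolean shouldApply(EntryVersion stored, EntryVersion incoming) {
            return stored == null || incoming.compareTo(stored) > 0;
        }
    }

This is presumably where the tombstones come in: a backup that physically
removed an entry would have no stored version left to compare against.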
>>
>>
>>>
>>>>
>>>>> Besides, I'm still not sure I understood your proposals properly,
>>>>> e.g. whether they are meant only for non-tx caches or whether you
>>>>> want to change something for tx caches as well...
>>>>
>>>> I think this can be used for both cases; however, I think either
>>>> Sanne's solution of using seqnos *per key* and updating in the order
>>>> of seqnos or Pedro's total order impl is probably a better solution.
>>>>
>>>> I'm not pretending these solutions are final (e.g. Sanne's solution
>>>> needs more thought when multiple keys are involved), but we should at
>>>> least acknowledge the issue exists, create a JIRA to prioritize it and
>>>> then start discussing solutions.
>>>>
>>>
>>> We've been discussing solutions without a JIRA just fine :)
>>>
>>> My feeling so far is that the thread exhaustion problem would be better
>>> served by porting TO to non-tx caches and/or changing non-tx locking to
>>> not require a thread. I have created an issue for TO [1], but IMO the
>>> locking rework [2] should be higher priority, as it can help both tx
>>> and non-tx caches.
>>>
>>> [1] https://issues.jboss.org/browse/ISPN-4610
>>> [2] https://issues.jboss.org/browse/ISPN-2849
>>>
>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 6, 2014 at 1:02 PM, Bela Ban <bban at redhat.com> wrote:
>>>>>
>>>>>      Seems like this discussion has died with the general agreement
>>>>>      that this is broken and with a few proposals on how to fix it,
>>>>>      but without any follow-up action items.
>>>>>
>>>>>      I think we (= someone from the ISPN team) need to create a JIRA,
>>>>>      preferably blocking.
>>>>>
>>>>>      WDYT?
>>>>>
>>>>>      If not, here's what our options are:
>>>>>
>>>>>      #1 I'll create a JIRA
>>>>>
>>>>>      #2 We'll hold the team meeting in Krasnojarsk, Russia
>>>>>
>>>>>      #3 There will be only vodka, no beers in #2
>>>>>
>>>>>      #4 Bela will join the ISPN team
>>>>>
>>>>>      Thoughts?
>>>>
>>>>
>>>> --
>>>> Bela Ban, JGroups lead (http://www.jgroups.org)
>

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

