[infinispan-issues] [JBoss JIRA] Commented: (ISPN-1016) Hash-aware view update causing lock ups in Hot Rod
Galder Zamarreño (JIRA)
jira-events at lists.jboss.org
Thu Mar 31 07:46:37 EDT 2011
[ https://issues.jboss.org/browse/ISPN-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592922#comment-12592922 ]
Galder Zamarreño commented on ISPN-1016:
----------------------------------------
The logs confirm what I said. The previous, correctly installed view is:
{code}[JBoss] 09:38:21,342 TRACE [HotRodEncoder$] Write hash distribution change response header HashDistAwareResponse(TopologyView(7,List(TopologyAddress(perf20,11222,Map( -> 3058),perf20-12777), TopologyAddress(perf19,11222,Map( -> 3411),perf19-25830), TopologyAddress(perf17,11222,Map( -> 8136),perf17-33962))),2,1,10240){code}
And the new one being added, which causes the lock-up, is:
{code}[JBoss] 09:39:36,222 TRACE [HotRodEncoder$] Write hash distribution change response header HashDistAwareResponse(TopologyView(8,List(TopologyAddress(perf20,11222,Map( -> 3058),perf20-12777), TopologyAddress(perf19,11222,Map( -> 3411),perf19-25830), TopologyAddress(perf17,11222,Map( -> 8136),perf17-33962), TopologyAddress(perf18,11222,Map(),perf18-56988))),2,1,10240){code}
perf18 is being started and its hash id map is still empty from startup. The fix here is to calculate that map on startup rather than at encoding time. The benefit is twofold: first, only one node updates the view, as opposed to potentially all of them at the same time; and second, it removes code from the response encoding path, making requests faster to serve.
We can do it on startup because all caches are defined in advance and we start them all :)
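For illustration only, here's a minimal sketch of the idea. The names (TopologyAddress, TopologyView, hashIdFor, selfAddress) and the hash function are assumptions, not the actual Infinispan API; it just shows the shape of computing the hash id map once at startup:
{code}
// Sketch: compute the hash id map at startup instead of at encoding time.
// All names and the hash function below are assumptions, not Infinispan code.
case class TopologyAddress(host: String, port: Int,
                           hashIds: Map[String, Int], nodeName: String)
case class TopologyView(topologyId: Int, members: List[TopologyAddress])

object StartupHashIds {

   // Stand-in for however the node's position on the hash wheel is derived;
   // only here so the sketch is self-contained and runnable.
   def hashIdFor(cacheName: String, nodeName: String): Int =
      math.abs((cacheName + nodeName).hashCode % 10240)

   def selfAddress(host: String, port: Int, cacheNames: Seq[String]): TopologyAddress = {
      // All caches are defined in advance and started with the server, so the
      // hash id map is complete before this node is added to the topology view.
      val nodeName = host + "-" + port
      val hashIds = cacheNames.map(c => c -> hashIdFor(c, nodeName)).toMap
      TopologyAddress(host, port, hashIds, nodeName)
   }

   def main(args: Array[String]): Unit = {
      // Only the starting node (e.g. perf18) builds and publishes its address;
      // the encoder no longer needs to update the view per response.
      println(selfAddress("perf18", 11222, Seq("")))
   }
}
{code}
With the hash ids fixed at startup, the encoder only has to serialise the current view, which is also why responses become cheaper.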
> Hash-aware view update causing lock ups in Hot Rod
> --------------------------------------------------
>
> Key: ISPN-1016
> URL: https://issues.jboss.org/browse/ISPN-1016
> Project: Infinispan
> Issue Type: Bug
> Components: Cache Server
> Affects Versions: 4.2.1.FINAL
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Fix For: 4.2.2.BETA1, 5.0.0.BETA1
>
>
> When encoding a Hot Rod response, if the encoder discovers that the client has an old view, it decides that a new topology needs to be sent to the client. Now, when building this view for distributed caches, the encoder checks whether any of the nodes' hash ids have changed in the new view, and if so, it sends a cluster-wide replace with the view containing the new hash ids.
> This seems to cause some deadlocks, as shown in JBPAPP-6113, where one node times out sending the replace to another node while that other node times out doing the same. This needs further thinking, but I have some ideas in mind...
> On top of that, it appears that a failure here causes problems for the requests that follow, so some thinking needs to be done to see if that replace() call can be moved out of there...
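For context, a rough sketch of the encoding-time pattern the issue describes; the names (topologyCache, maybeUpdateView) and the hash function are assumptions, not the real HotRodEncoder code:
{code}
// Illustrative only: a stale client view triggers hash id recomputation and a
// replace of the shared view while encoding the response. Names are assumptions.
import java.util.concurrent.ConcurrentHashMap

case class TopologyAddress(host: String, port: Int,
                           hashIds: Map[String, Int], nodeName: String)
case class TopologyView(topologyId: Int, members: List[TopologyAddress])

object EncodingTimeViewUpdate {

   // Local stand-in for the topology cache shared by the cluster.
   val topologyCache = new ConcurrentHashMap[String, TopologyView]()

   def recomputeHashId(cache: String, member: TopologyAddress): Int =
      math.abs((cache + member.nodeName).hashCode % 10240)

   def maybeUpdateView(clientTopologyId: Int): Option[TopologyView] = {
      val current = topologyCache.get("view")
      if (current != null && clientTopologyId < current.topologyId) {
         // Recompute every member's hash ids while encoding the response...
         val updated = current.copy(members = current.members.map(m =>
            m.copy(hashIds = m.hashIds.map { case (c, _) => c -> recomputeHashId(c, m) })))
         // ...then replace the stored view. In the real server this is a
         // replicated cache, so the replace takes cluster-wide locks; several
         // nodes serving stale clients can attempt it at once, giving the
         // mutual timeouts seen in JBPAPP-6113.
         topologyCache.replace("view", current, updated)
         Some(updated)
      } else None
   }
}
{code}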
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira