[
https://issues.jboss.org/browse/ISPN-1016?page=com.atlassian.jira.plugin....
]
Galder Zamarreño commented on ISPN-1016:
----------------------------------------
The logs confirm this. The previous, correctly installed view is:
{code}[JBoss] 09:38:21,342 TRACE [HotRodEncoder$] Write hash distribution change response
header HashDistAwareResponse(TopologyView(7,List(TopologyAddress(perf20,11222,Map( ->
3058),perf20-12777), TopologyAddress(perf19,11222,Map( -> 3411),perf19-25830),
TopologyAddress(perf17,11222,Map( -> 8136),perf17-33962))),2,1,10240){code}
And the new one being installed, which causes the lock-up, is:
{code}[JBoss] 09:39:36,222 TRACE [HotRodEncoder$] Write hash distribution change response
header HashDistAwareResponse(TopologyView(8,List(TopologyAddress(perf20,11222,Map( ->
3058),perf20-12777), TopologyAddress(perf19,11222,Map( -> 3411),perf19-25830),
TopologyAddress(perf17,11222,Map( -> 8136),perf17-33962),
TopologyAddress(perf18,11222,Map(),perf18-56988))),2,1,10240){code}
perf18 is being started and its hash id map is still empty from startup. The fix here is
to calculate that map on startup rather than at encoding time. The benefit is twofold:
first, only one node updates the view, as opposed to potentially all of them at the same
time; and second, it removes code from the response encoding path, making requests
faster to respond to.
We can do it on startup because all caches are defined in advance and we start them all :)
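The idea above can be sketched as follows. This is a minimal, self-contained model of computing each node's hash-id map once at startup and caching it on the topology address, instead of recomputing it while encoding every response. All class and method names here (TopologyAddress, Node.startUp, computeHashId) are illustrative assumptions, not the actual Infinispan API; 10240 is the hash space size seen in the logs above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the real Infinispan API: a topology address that
// carries its per-cache hash ids, filled in once when the node starts.
class TopologyAddress {
    final String host;
    final int port;
    final Map<String, Integer> hashIds; // computed once, at node startup

    TopologyAddress(String host, int port, Map<String, Integer> hashIds) {
        this.host = host;
        this.port = port;
        this.hashIds = hashIds;
    }
}

class Node {
    // Called once at node startup: every cache is defined in advance and
    // started here, so all hash ids can be calculated up front rather than
    // at response-encoding time.
    static TopologyAddress startUp(String host, int port) {
        Map<String, Integer> hashIds = new TreeMap<>();
        for (String cacheName : definedCaches()) {
            hashIds.put(cacheName, computeHashId(host, port, cacheName));
        }
        return new TopologyAddress(host, port, hashIds);
    }

    static Iterable<String> definedCaches() {
        return java.util.List.of("", "namedCache"); // "" = default cache
    }

    // Placeholder for the consistent-hash position calculation; 10240 is
    // the hash space size shown in the trace logs.
    static int computeHashId(String host, int port, String cache) {
        return Math.abs((host + ":" + port + "/" + cache).hashCode() % 10240);
    }
}
```

With this in place, the encoder only reads the pre-computed map when writing a HashDistAwareResponse header: no per-response calculation, and no reason for every node to attempt a view update at the same time.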
Hash-aware view update causing lock ups in Hot Rod
--------------------------------------------------
Key: ISPN-1016
URL:
https://issues.jboss.org/browse/ISPN-1016
Project: Infinispan
Issue Type: Bug
Components: Cache Server
Affects Versions: 4.2.1.FINAL
Reporter: Galder Zamarreño
Assignee: Galder Zamarreño
Fix For: 4.2.2.BETA1, 5.0.0.BETA1
When encoding a Hot Rod response, if the encoder discovers that the client has an old
view, it decides that a new topology needs to be sent to the client. Now, when building
this view for distributed caches, the encoder checks whether any of the nodes' hash ids
have changed in the new view, and if so, it sends a cluster-wide replace with the view
containing the new hash ids.
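The check described above can be modelled roughly like this. This is a hypothetical, self-contained sketch of the encoder-side comparison (a joiner such as perf18 still carrying an empty hash-id map counts as a change and would trigger the cluster-wide replace); the names Member and EncoderCheck are assumptions for illustration, not the actual Infinispan classes.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model: a cluster member as seen in the topology view, with
// its per-cache hash-id map. Not the real Infinispan API.
class Member {
    final String name;
    final Map<String, Integer> hashIds;

    Member(String name, Map<String, Integer> hashIds) {
        this.name = name;
        this.hashIds = hashIds;
    }
}

class EncoderCheck {
    // Returns true when membership or any member's hash ids differ between
    // the old and new views; in the reported scenario this is what makes
    // each node attempt a cluster-wide replace of the cached view.
    static boolean hashIdsChanged(List<Member> oldView, List<Member> newView) {
        if (oldView.size() != newView.size()) return true;
        for (int i = 0; i < oldView.size(); i++) {
            if (!oldView.get(i).hashIds.equals(newView.get(i).hashIds))
                return true;
        }
        return false;
    }
}
```

The danger is that this check runs on the response path of every node, so several nodes can detect the change simultaneously and race each other on the replace, which is what the deadlock below stems from.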
This seems to cause deadlocks, as shown in JBPAPP-6113, where one node is timing out
sending the replace to another node, and that node is timing out doing the same. This
needs further thought, but I have some ideas in mind...
On top of that, it appears that a failure here causes problems for subsequent requests,
so some thought is needed on whether that replace() call can be moved out of
there...
--
This message is automatically generated by JIRA.
For more information on JIRA, see:
http://www.atlassian.com/software/jira