[infinispan-issues] [JBoss JIRA] (ISPN-2580) Do not request segments from all nodes at once

Radim Vansa (JIRA) jira-events at lists.jboss.org
Tue Dec 4 08:38:21 EST 2012


    [ https://issues.jboss.org/browse/ISPN-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739253#comment-12739253 ] 

Radim Vansa commented on ISPN-2580:
-----------------------------------

Yes, the state transfer does not finish in this case. However, I have not marked this as critical because the ST failure may be avoidable with a different JGroups configuration, a different chunk size, etc.
I think I have seen it with clusters of size 8 and up, but I am not 100% sure that size 8 itself is affected (we don't keep track of test runs in Hyperion, so I can't verify that). I am definitely seeing this in the resilience test, when I kill one of 32 nodes and then try to start it again and have it rejoin the cluster.

Yes, 881070 is probably one of the victims as I can see long delays between some of the StateResponseCommands. But it's hard to confirm without trace logging on org.jgroups.
                
> Do not request segments from all nodes at once
> ----------------------------------------------
>
>                 Key: ISPN-2580
>                 URL: https://issues.jboss.org/browse/ISPN-2580
>             Project: Infinispan
>          Issue Type: Enhancement
>          Components: State transfer
>    Affects Versions: 5.2.0.Beta5
>            Reporter: Radim Vansa
>            Assignee: Mircea Markus
>             Fix For: 5.2.0.Beta6
>
>
> When a new node joins a large cluster filled with data, it receives the new CH and the REBALANCE_START command, and requests data from all nodes at once (or almost all, given an even distribution of segments). It may not be able to handle this many parallel transfers, even at the JGroups level: data sent to the node is discarded at the receiver and then sent again and again. Under heavy congestion the node just buffers fragments of a message from one sender and never passes it up.
> The number of StateRequestCommands(START_STATE_TRANSFER) should be limited so that the node is not congested.
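
A minimal sketch of the kind of limit proposed above, assuming a simple semaphore-based throttle would be acceptable. This is not Infinispan code: the class ThrottledStateRequester, its methods, and the transfer callback are hypothetical names used only for illustration, and the callback merely stands in for sending a StateRequestCommand(START_STATE_TRANSFER) to one source node and waiting for its state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Infinispan: caps how many source nodes are asked for state at once. */
class ThrottledStateRequester {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peakInFlight = new AtomicInteger();

    ThrottledStateRequester(int maxConcurrentSources) {
        this.permits = new Semaphore(maxConcurrentSources);
    }

    /**
     * Runs the transfer callback once per source, never for more than
     * maxConcurrentSources sources at the same time. Blocks until all finish.
     */
    void requestAll(List<String> sources, Consumer<String> transfer) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (String source : sources) {
                // Wait here until an earlier transfer has completed.
                permits.acquireUninterruptibly();
                pending.add(CompletableFuture.runAsync(() -> {
                    try {
                        int now = inFlight.incrementAndGet();
                        peakInFlight.accumulateAndGet(now, Math::max);
                        transfer.accept(source); // stand-in for the real state request
                    } finally {
                        // Decrement before releasing so inFlight never exceeds the limit.
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                }, pool));
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }

    /** Highest number of transfers that were running at the same time. */
    int peakInFlight() {
        return peakInFlight.get();
    }
}
```

With 32 source nodes and an arbitrarily chosen limit of 4, the joiner would have at most four transfers in progress at any moment, and the next request would only go out once one of them completes. The right limit for a real cluster is not known from this report and would need to be tuned.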


