[jboss-jira] [JBoss JIRA] Commented: (JGRP-1060) NAKACK has inconsistent internal state after concurrent node startup

Dennis Reed (JIRA) jira-events at lists.jboss.org
Thu Sep 24 17:49:49 EDT 2009


    [ https://jira.jboss.org/jira/browse/JGRP-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12487332#action_12487332 ] 

Dennis Reed commented on JGRP-1060:
-----------------------------------

What I think is happening is:
- Node2 gets view (Node1, Node2).  NAKACK sets "members" and "received_msgs" to (Node1, Node2)
- Node1 sends the state to Node2, including its current digest (Node1, Node2)
- Node2 gets view (Node1, Node2, Node3).  NAKACK sets "members" and "received_msgs" to (Node1, Node2, Node3)
- Node2 processes the state.  STATE_TRANSFER sends Event.SET_DIGEST (Node1, Node2) to NAKACK
- NAKACK sets "received_msgs" to (Node1, Node2), which drops the entry for Node3 added by the second view.

At this point, "received_msgs" is missing Node3 while "members" still lists it, so Node2 drops all messages from Node3.
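
The sequence above can be replayed in a small toy model. This is only a sketch of the suspected ordering problem, not the actual NAKACK code: the class NakackRaceSketch and its methods handleView, setDigest, accepts, and replay are hypothetical names, and the assumption that SET_DIGEST replaces "received_msgs" wholesale is the hypothesis under test, not confirmed behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of the suspected race on Node2. NOT the real NAKACK code:
// the class, field, and method names are hypothetical stand-ins.
class NakackRaceSketch {
    final Set<String> members = new LinkedHashSet<String>();
    // Real NAKACK keeps per-sender state; only the sender keys matter here.
    final Set<String> receivedMsgs = new LinkedHashSet<String>();

    // A view change brings both structures in line with the new view.
    void handleView(List<String> view) {
        members.clear();
        members.addAll(view);
        receivedMsgs.retainAll(view);
        receivedMsgs.addAll(view);
    }

    // Assumed behavior: the digest replaces receivedMsgs wholesale,
    // without being checked against the current members.
    void setDigest(List<String> digestSenders) {
        receivedMsgs.clear();
        receivedMsgs.addAll(digestSenders);
    }

    // A message is accepted only if its sender has an entry in receivedMsgs.
    boolean accepts(String sender) {
        return receivedMsgs.contains(sender);
    }

    // Replays the steps in the order described and returns Node2's final state.
    static NakackRaceSketch replay() {
        NakackRaceSketch node2 = new NakackRaceSketch();
        node2.handleView(Arrays.asList("Node1", "Node2"));
        // Digest captured by Node1 while the view was still (Node1, Node2).
        List<String> digest = Arrays.asList("Node1", "Node2");
        node2.handleView(Arrays.asList("Node1", "Node2", "Node3"));
        // The stale digest is applied after the newer view.
        node2.setDigest(digest);
        return node2;
    }

    public static void main(String[] args) {
        NakackRaceSketch node2 = replay();
        System.out.println("members       = " + node2.members);
        System.out.println("received_msgs = " + node2.receivedMsgs);
        System.out.println("accepts Node3 = " + node2.accepts("Node3"));
    }
}
```

In this model Node2 ends with three entries in "members" but only two in "received_msgs", so accepts("Node3") is false even though Node3 is in the view. If the real SET_DIGEST handling merges the digest with the current members instead of replacing it, the model does not apply.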


> NAKACK has inconsistent internal state after concurrent node startup
> --------------------------------------------------------------------
>
>                 Key: JGRP-1060
>                 URL: https://jira.jboss.org/jira/browse/JGRP-1060
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.4.5
>            Reporter: Dennis Reed
>            Assignee: Bela Ban
>
> Three nodes are started concurrently.  The log from the second node to join shows the following (IPs/ports have been replaced):
>     05:26:00,594 45102 INFO  [org.jboss.cache.TreeCache] (main:) viewAccepted(): [node1:1234|1] [node1:1234, node2:1234]
>     05:26:00,732 45240 INFO  [org.jboss.cache.TreeCache] (main:) TreeCache local address is node2:1234
>     05:26:00,852 45360 INFO  [org.jboss.cache.TreeCache] (IncomingPacketHandler (channel=Tomcat-DefaultPartition):) viewAccepted(): [node1:1234|2] [node1:1234, node2:1234, node3:1234]
>     05:26:00,861 45369 INFO  [org.jboss.cache.TreeCache] (IncomingPacketHandler (channel=Tomcat-DefaultPartition):) received the state (size=1024 bytes)
> Then many instances of the following logs for more than a day:
>     WARN  [org.jgroups.protocols.pbcast.NAKACK] (IncomingPacketHandler (channel=Tomcat-DefaultPartition):) node2:1234] discarded message from non-member node3:1234, my view is [node1:1234|2] [node1:1234, node2:1234, node3:1234]
>     ERROR [org.jgroups.protocols.pbcast.NAKACK] (Timer-3:) sender node3:1234 not found in received_msgs
> For these messages to be logged, NAKACK must be in an inconsistent internal state.  The addresses in "members" do not match those in "received_msgs".


        


