[jboss-jira] [JBoss JIRA] (JGRP-1549) TCP: handle concurrent connections more gracefully

Dan Berindei (JIRA) jira-events at lists.jboss.org
Wed Dec 5 08:35:21 EST 2012


    [ https://issues.jboss.org/browse/JGRP-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739617#comment-12739617 ] 

Dan Berindei commented on JGRP-1549:
------------------------------------

I've written an Infinispan test that reproduces the problem fairly reliably, although it requires an invasive modification in ClusterTopologyManagerImpl that delays the rebalance confirmation. The sources are here: https://github.com/danberindei/infinispan/tree/t_jgrp-1549_m

I tried to reproduce it with plain JGroups, but I wasn't successful. I think the initial discovery phase changed the way connections were created (the Infinispan test suite uses our custom TEST_PING protocol, so discovery doesn't create any connections).

I've also looked at the code in TCPConnectionMap and I think I see two problems:

1. After creating a connection, a node should check the just-created connection against any existing connection in the map (only if the existing one is open, obviously) and only replace the existing connection if the new one satisfies the same check that's in ConnectionAcceptor.

2. On a send exception, the sender should only close the connection that it used to send the message. The acceptor might have replaced the connection in the map while sending was in progress.
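
To make the two points above concrete, here is a minimal sketch of how they might look. It is not the TCPConnectionMap code: the class names, the String addresses, and the tie-break rule (the connection initiated by the higher address is kept) are assumptions for illustration, and the real check in ConnectionAcceptor may differ.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch, not the JGroups TCPConnectionMap. */
final class ConnectionMapSketch {

    /** Hypothetical stand-in for a TCP connection. */
    static final class Conn {
        final String name;
        private volatile boolean open = true;
        Conn(String name) { this.name = name; }
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    private final ConcurrentMap<String, Conn> conns = new ConcurrentHashMap<>();
    private final String localAddr;

    ConnectionMapSketch(String localAddr) { this.localAddr = localAddr; }

    /**
     * Point 1: the same check runs whether the candidate was created locally
     * or accepted from the peer. Returns the connection that ends up in the map.
     * Assumed rule: the connection initiated by the higher address wins.
     */
    Conn register(String peer, Conn candidate, boolean initiatedLocally) {
        // compute() is atomic per key; closing inside it keeps the sketch short.
        return conns.compute(peer, (key, existing) -> {
            if (existing == null || !existing.isOpen())
                return candidate;              // nothing usable to compete with
            boolean localHigher = localAddr.compareTo(peer) > 0;
            boolean candidateWins = (initiatedLocally == localHigher);
            if (candidateWins) {
                existing.close();
                return candidate;
            }
            candidate.close();                 // loser is closed, map is unchanged
            return existing;
        });
    }

    /**
     * Point 2: on a send failure, close only the connection that was used.
     * remove(key, value) leaves the entry alone if it was replaced meanwhile.
     */
    void onSendFailure(String peer, Conn used) {
        used.close();
        conns.remove(peer, used);
    }

    Conn current(String peer) { return conns.get(peer); }
}
```

With local address "A" and peer "B", a locally created connection loses against an open accepted one under the assumed rule, and a later send failure on the losing connection no longer evicts the accepted connection from the map.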

I think either one of these could explain why the initial message was dropped in the test. I'm not sure why UNICAST2:STABLE doesn't kick in and force the re-transmission of those messages for 15 seconds though...
                
> TCP: handle concurrent connections more gracefully
> --------------------------------------------------
>
>                 Key: JGRP-1549
>                 URL: https://issues.jboss.org/browse/JGRP-1549
>             Project: JGroups
>          Issue Type: Enhancement
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.3
>
>         Attachments: cft.log.gz
>
>
> When A connects to B and B connects to A *concurrently*, and no existing connections are present, then one member (with the higher address) will prevail, and the other one will close its connection and drop the message.
> This is not usually an issue, as higher-up layers will retransmit the message, thus re-establishing the connection.
> However, if we have a protocol based on negative acks, such as UNICAST2, the retransmission might take a while if that message was the last one.
> SOLUTION:
> The end that closes the connection should simply resend the message *once*, thus re-creating the connection.
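
The "resend once" idea from the description above could be sketched roughly as follows. The Transport interface and method names are hypothetical, and the sketch assumes that a send opens a new connection on demand when none exists, as the description implies.

```java
import java.io.IOException;

/** Hypothetical sketch of a single resend after a failed send. */
final class ResendOnceSketch {

    /** Hypothetical transport; send() is assumed to connect on demand. */
    interface Transport {
        void send(String dest, byte[] data) throws IOException;
    }

    /**
     * Tries to send, and on failure retries exactly once. The retry is what
     * would re-create the connection; a second failure propagates to the caller.
     */
    static void sendWithOneRetry(Transport transport, String dest, byte[] data)
            throws IOException {
        try {
            transport.send(dest, data);
        } catch (IOException firstFailure) {
            // One retry only, so a persistently broken peer cannot cause a loop.
            transport.send(dest, data);
        }
    }
}
```

Limiting the retry to one attempt keeps the behavior bounded; anything beyond that is left to the retransmission protocols higher in the stack.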
