[jboss-jira] [JBoss JIRA] (JGRP-1401) RELAY: messages lost when relay coordinator crashes

Bela Ban (JIRA) jira-events at lists.jboss.org
Mon Jan 16 07:47:18 EST 2012


     [ https://issues.jboss.org/browse/JGRP-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-1401:
---------------------------

    Fix Version/s: 3.2
                       (was: 3.1)

    
> RELAY: messages lost when relay coordinator crashes
> ---------------------------------------------------
>
>                 Key: JGRP-1401
>                 URL: https://issues.jboss.org/browse/JGRP-1401
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.2
>
>
> When we have sites {A,B,C} and {X,Y,Z} (with relay coordinators A and X), then between X leaving (or crashing) and Y taking over, none of the messages sent by the first site are relayed to the second site.
> Because the sites are autonomous, there won't be any retransmission of the dropped messages.
> This can have an adverse effect, e.g. in Infinispan:
> - Say key K is stored on A, B and Z
> - Now we update K on A and B, but X crashes before the change is relayed to the other site
> - If there is no rebalancing (e.g. because K is still to be stored on A, B and Z), Z has a stale value, since the update to Z was dropped!
> SOLUTION 1:
> - Have a backup coordinator B cache the last N messages in memory (with overflow to disk)
> - A numbers the messages it relays
> - As soon as A has relayed message #50, it sends this info to B. Alternatively, this could be done periodically, or based on the number of relayed messages (e.g. every 10 messages)
> - B can then purge the messages up to that number
> - When A crashes, B runs a reconciliation protocol with X to determine whether to relay some backed up messages
> - C now starts acting as backup relay to B
> This solution is probably the simplest to implement, and doesn't require any code changes in Infinispan. However, there is still a chance of message loss if both the relay *and* the backup relay crash at the same time.
> SOLUTION 2:
> - After a crash (not a graceful leave !) of a relay coordinator, there has to be a full rebalancing of all keys
> - This is wasteful though
> - A full rebalancing may not always be needed; perhaps Infinispan could check whether one is required?
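
The bookkeeping that SOLUTION 1 asks of the backup coordinator could look roughly like the sketch below. It is only an illustration under stated assumptions: `BackupRelayBuffer` and its methods are hypothetical names, not part of the JGroups API. The buffer is memory-only (no overflow to disk), and the reconciliation step is reduced to "the remote site reports the highest relayed message number it received".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch of the backup relay's buffer from SOLUTION 1; not JGroups API. */
class BackupRelayBuffer {
    // Relayed-message number -> payload, kept sorted by message number.
    private final ConcurrentSkipListMap<Long, byte[]> pending = new ConcurrentSkipListMap<>();

    /** Called on the backup (B) for each numbered message the relay (A) forwards. */
    void store(long seqno, byte[] payload) {
        pending.put(seqno, payload);
    }

    /** Called when A reports that all messages up to and including seqno were relayed. */
    void purgeUpTo(long seqno) {
        pending.headMap(seqno, true).clear();
    }

    /**
     * Called after A crashed: given the highest message number the remote site
     * says it received, return the backed-up messages that still need relaying,
     * in message-number order.
     */
    List<byte[]> messagesToReplay(long highestReceivedByRemote) {
        return new ArrayList<>(pending.tailMap(highestReceivedByRemote, false).values());
    }

    /** Number of messages currently held by the backup. */
    int size() {
        return pending.size();
    }
}
```

For example, if B has stored messages 1 to 5 and A has confirmed relaying up to 3, B holds only 4 and 5. If A then crashes and the remote site reports having received up to 4, B would relay only message 5. As the description notes, this does not help when the relay and the backup relay crash at the same time.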


        

