[jboss-jira] [JBoss JIRA] (JGRP-1785) TOA, inconsistent message delivery

Ryan Emerson (JIRA) issues at jboss.org
Mon Jan 27 11:55:29 EST 2014


    [ https://issues.jboss.org/browse/JGRP-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12939066#comment-12939066 ] 

Ryan Emerson commented on JGRP-1785:
------------------------------------

I have encountered this problem using several different configuration files; however, at the moment I am just using the "toa.xml" file that ships with JGroups.

{code:title=toa.xml|borderStyle=solid}
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.3.xsd">
    <UDP
         mcast_port="${jgroups.udp.mcast_port:45588}"
         tos="8"
         loopback="true"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         ip_ttl="${jgroups.udp.ip_ttl:8}"
         enable_diagnostics="true"
         thread_naming_pattern="cl"

         timer_type="new3"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"

         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="8"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="10000"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"/>

    <PING timeout="2000"
            num_initial_members="20"/>
    <MERGE2 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK2 xmit_interval="1000"
                    xmit_table_num_rows="100"
                    xmit_table_msgs_per_row="2000"
                    xmit_table_max_compaction_time="30000"
                    max_msg_batch_size="500"
                    use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
    <UNICAST3 xmit_interval="500"
              xmit_table_num_rows="100"
              xmit_table_msgs_per_row="2000"
              xmit_table_max_compaction_time="60000"
              conn_expiry_timeout="0"
              max_msg_batch_size="500"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
    <RSVP resend_interval="2000" timeout="10000"/>
    <tom.TOA />
    <pbcast.STATE_TRANSFER />
    <!-- pbcast.FLUSH  /-->
</config>
{code}
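
For illustration only, here is a minimal sketch of how total-order messages can be sent over this stack and each node's delivery order recorded for later comparison. It assumes the JGroups 3.x API (JChannel, ReceiverAdapter, AnycastAddress); the class name ToaOrderCheck, the cluster name "toa-test" and the message count are placeholders, not taken from the actual test.

{code:title=ToaOrderCheck.java|borderStyle=solid}
import org.jgroups.*;
import java.util.*;

// Sketch only: addresses messages to an AnycastAddress so that TOA totally
// orders them among the targets, and records the delivery order on this node
// so the per-node sequences can be diffed afterwards.
public class ToaOrderCheck extends ReceiverAdapter {

    // Delivery order observed by this node (payloads in arrival order)
    private final List<String> delivered =
            Collections.synchronizedList(new ArrayList<String>());

    @Override
    public void receive(Message msg) {
        delivered.add((String) msg.getObject());
    }

    public static void main(String[] args) throws Exception {
        ToaOrderCheck receiver = new ToaOrderCheck();
        JChannel ch = new JChannel("toa.xml");   // the stack shown above
        ch.setReceiver(receiver);
        ch.connect("toa-test");

        // Wait until both nodes have joined, then anycast to all members
        while (ch.getView().size() < 2)
            Thread.sleep(100);
        List<Address> members = ch.getView().getMembers();

        for (int i = 0; i < 10000; i++) {
            // AnycastAddress as destination makes TOA order the message
            Message msg = new Message(new AnycastAddress(members), ch.getAddress() + "-" + i);
            ch.send(msg);
        }

        Thread.sleep(10000);  // crude wait for delivery to settle
        System.out.println("Delivered " + receiver.delivered.size() + " messages");
        // Dump the order so it can be diffed against the other node's output
        for (String s : receiver.delivered)
            System.out.println(s);
        ch.close();
    }
}
{code}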
                
> TOA, inconsistent message delivery
> ----------------------------------
>
>                 Key: JGRP-1785
>                 URL: https://issues.jboss.org/browse/JGRP-1785
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.5
>         Environment: Fedora release 17 (Beefy Miracle)
>            Reporter: Ryan Emerson
>            Assignee: Pedro Ruivo
>             Fix For: 3.5
>
>
> When sending total order messages between two nodes for a prolonged period of time, an inconsistency is encountered when comparing each node's total order of messages.  
> I believe this issue is related to how sequence numbers are handled in the implementation, not to the protocol itself.  I appreciate that TOA is designed for environments where the subset of destinations in the network varies; however, I have been unable to reproduce this error when the total number of nodes is > 2.  It is still possible that this inconsistency may occur after a long period of time when the number of nodes is > 2.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

