[jboss-cvs] JBoss Messaging SVN: r8277 - branches/Branch_1_4/docs/userguide/en/modules.

Sat Apr 23 11:33:01 EDT 2011

Author: gaohoward
Date: 2011-04-23 11:33:01 -0400 (Sat, 23 Apr 2011)
New Revision: 8277

Modified:
   branches/Branch_1_4/docs/userguide/en/modules/c_configuration.xml
   branches/Branch_1_4/docs/userguide/en/modules/configuration.xml
Log:
JBMESSAGING-1842


Modified: branches/Branch_1_4/docs/userguide/en/modules/c_configuration.xml
===================================================================

--- branches/Branch_1_4/docs/userguide/en/modules/c_configuration.xml	2011-04-23 14:10:05 UTC (rev 8276)
+++ branches/Branch_1_4/docs/userguide/en/modules/c_configuration.xml	2011-04-23 15:33:01 UTC (rev 8277)
@@ -112,4 +112,43 @@
     connection, lookup a new connection factory from HA JNDI and recreate the
     connection.</para>
   </section>
-</chapter>
\ No newline at end of file
+  
+    <section id="c_conf.newfailovermodel">
+    <title>Handling an Isolated Node in a Cluster</title>
+
+    <para>Sometimes, because of temporary network issues, a node is falsely
+    reported to have left the cluster and a failover is performed on behalf of this node.
+    As a result, the cluster considers the node dead although it is actually alive.
+    This can cause problems such as duplicate message delivery and orphaned messages.</para>
+
+    <para>To prevent this situation, a JBM node can be configured to write
+    timestamps to the database so that other nodes in the cluster can tell its true status. A failover
+    for the node can happen only when it fails to update its timestamp. To enable this, set the KeepOldFailoverModel
+     attribute to false and give NodeStateRefreshInterval a suitable value. See the configuration for the Post Office.</para>
+
+    <para>If a node is falsely reported to have left the cluster and at the same time
+    loses its database connection, the node cannot update its timestamp during the failure
+    yet stays alive. The cluster cannot tell the node's real state, so it treats the node as dead and fails over for it,
+    which can again cause the problems described above.</para>
+    
+    <para>To avoid this, a new MessagingClusterHealthMBean mbean is introduced. This mbean
+monitors the node's state
+and stops and starts the node in the failure situation described above. When a node is 'shunned'
+by the cluster and also loses
+its DB connection, this mbean shuts down the node immediately, waits for it to be failed
+over, observes the JGroups
+status and DB status, and restarts the node once the DB connection is restored and JGroups is back to
+normal.
+To enable this feature, deploy this mbean and add a 'MessagingClusterHealthMBean'
+attribute to your ServerPeer mbean, for example:
+    </para>
+
+<programlisting>
+&lt;depends
+    optional-attribute-name="MessagingClusterHealthMBean"
+    &gt;jboss.messaging:service=MessagingClusterHealthMBean&lt;/depends&gt;
+</programlisting>
+
+  </section>
+
+</chapter>

Modified: branches/Branch_1_4/docs/userguide/en/modules/configuration.xml
===================================================================
--- branches/Branch_1_4/docs/userguide/en/modules/configuration.xml	2011-04-23 14:10:05 UTC (rev 8276)
+++ branches/Branch_1_4/docs/userguide/en/modules/configuration.xml	2011-04-23 15:33:01 UTC (rev 8277)
@@ -405,6 +405,12 @@
         shuts down itself immediately to allow it to be failed over to another node. Default is false.</para>
       </section>
 
+      <section id="conf.serverpeer.attributes.messagingclusterhealthmbean">
+        <title>MessagingClusterHealthMBean</title>
+
+        <para>Optional. The object name of the MessagingClusterHealthMBean that monitors the node's status in the cluster.</para>
+      </section>
+
       <section id="conf.serverpeer.operations">
         <title>We now discuss the MBean operations of the ServerPeer
         MBean.</title>
@@ -660,7 +666,39 @@
     </warning>
 
   </section>
+  
+  <section id="common.attributes.db">
+    <title>Common Attributes for Database Access</title>
+    
+    <para>Three mbeans, the Persistence Manager, the Post Office and the JMS User
+    Manager, require access to a data source. If the data source fails, e.g. a broken connection or
+    a failed SQL operation, the mbeans can be configured to retry the DB operation. Each of the
+    three mbeans is configured separately.</para>
+    
+    <section id="common.attributes.db.retryonconnectionfailure">
+      <title>RetryOnConnectionFailure</title>
 
+      <para>This is a boolean type parameter. It indicates whether to retry on DB connection failures.
+      The default is false.</para>
+    </section>
+    
+    <section id="common.attributes.db.maxretry">
+      <title>maxretry</title>
+
+      <para>This is an Integer type parameter. It specifies the maximum number of retries on DataSource failures.
+The default is 25. To retry forever, set it to -1.</para>
+    </section>
+    
+    <section id="common.attributes.db.retryinterval">
+      <title>retryinterval</title>
+
+      <para>This is an Integer type parameter. It specifies the interval (in milliseconds) between two
+consecutive retries. The default is 1000 (1 second).
+      </para>
+    </section>
+
+  </section>
+
   <section id="conf.postoffice">
     <title>Configuring the Post office</title>
 
@@ -1037,6 +1075,23 @@
         if it is failed to get a connection. Default value is false.</para>
       </section>
 
+      <section id="conf.postoffice.attributes.keepoldfailovermodel">
+        <title>KeepOldFailoverModel</title>
+
+        <para>This is a boolean type parameter indicating whether to keep the old failover model. Default is
+        true (the new failover behavior is disabled). Set it to false to enable the new failover model.
+        </para>
+      </section>
+
+      <section id="conf.postoffice.attributes.nodestaterefreshinterval">
+        <title>NodeStateRefreshInterval</title>
+
+        <para>This long type parameter specifies the interval (in milliseconds) between two consecutive timestamp
+updates. A node must constantly update its timestamp to tell the cluster that it is alive, and the cluster uses this
+parameter to decide when to treat a node as dead. Default is 30000 (30 seconds). This parameter takes effect only
+when KeepOldFailoverModel is set to false.</para>
+      </section>
+
     </section>
   </section>