[jboss-user] [JBoss Cache Users] - Buddy replication - initial state transfer after node restart

RichardTaylor do-not-reply at jboss.com
Fri Oct 9 10:39:04 EDT 2009

We are seeing an issue that I'd like to confirm with others before I write it up as a JIRA issue.

JBoss 5.1.0.
JBoss Cache 3.2.1
Buddy Replication (two nodes, separate machines)

We currently use total replication for HTTP session replication, but we are increasing our node count (to three for now) and would like to switch from total to buddy replication.  After a significant amount of testing, buddy replication works very well for us, except in one case.

The problem:
Two servers, A and B.
- Start server A.
- Log into the web app, creating an HTTP session, S1, on server A.
- Start server B.  When B starts for the first time, session S1 is fully replicated from A into the "_BUDDY_BACKUP_" tree for server A on B.
- Restart server B.  On this second start, session S1 from server A is only partially replicated to the "_BUDDY_BACKUP_" tree on server B.  In particular, it appears that (at least) the "DistributableSessionMetadata" was not replicated (it should be stored under key "2" in the replicated cache entry).
- Shut down server A.  When the user hits server B, JBoss tries to deserialize and use S1, but fails because some data, such as the session metadata, is missing from S1.

Example session S1 that replicated successfully after the initial startup of server B:
--- Cache1 ---
  | /  {}
  |   /_BUDDY_BACKUP_  {}
  |     /  {}
  |       /JSESSION  {}
  |         /ROOT_localhost  {}
  |           /TYqa9j30si-QlGaVL9OVvQ__  {0=16, 1=1255098672537, org.jboss.seam.security.rememberMe=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, 2=org.jboss.web.tomcat.service.session.distributedcache.spi.DistributableSessionMetadata at c1e908b, org.jboss.seam.CONVERSATION#1$org.jboss.seam.persistence.persistenceContexts=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.core.conversationEntries=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.international.localeSelector=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.CONVERSATION#1$org.jboss.seam.core.conversation=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.CONVERSATION#1$org.jboss.seam.international.statusMessages=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.security.identity=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, javax.faces.request.charset=UTF-8, org.jboss.seam.CONVERSATION#1$org.jboss.seam.faces.redirect=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.security.credentials=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, pier=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.web.session=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}}
  | ------------

Example session S1 that failed to replicate fully to B after the restart (note the missing "2=..." entry, among many others):
--- Cache1 ---
  | /  {}
  |   /_BUDDY_BACKUP_  {}
  |     /  {}
  |       /JSESSION  {}
  |         /ROOT_localhost  {}
  |           /TYqa9j30si-QlGaVL9OVvQ__  {0=153, 1=1255097288878, foo=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}}
  | ------------
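To make the difference between the two dumps easier to see, here is a minimal sketch in plain Java (it does not use the JBoss Cache API) that lists the attribute keys present in a good backup entry but absent from a partial one. The class name BackupEntryDiff, the method missingKeys, and the sample maps are hypothetical; the keys are a small subset of those in the dumps above, and the values are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BackupEntryDiff {

    /** Returns the keys present in the reference entry but absent from the candidate entry. */
    static Set<String> missingKeys(Map<String, ?> reference, Map<String, ?> candidate) {
        Set<String> missing = new TreeSet<>(reference.keySet());
        missing.removeAll(candidate.keySet());
        return missing;
    }

    public static void main(String[] args) {
        // Subset of the entry seen after B's first start (placeholder values).
        Map<String, Object> full = new LinkedHashMap<>();
        full.put("0", 16);
        full.put("1", 1255098672537L);
        full.put("2", "DistributableSessionMetadata");
        full.put("org.jboss.seam.security.identity", "marshalled");

        // Entry seen after B's restart.
        Map<String, Object> partial = new LinkedHashMap<>();
        partial.put("0", 153);
        partial.put("1", 1255097288878L);
        partial.put("foo", "marshalled");

        System.out.println("Missing after restart: " + missingKeys(full, partial));
    }
}
```

In a real check, the two maps would be filled from the node data under the "_BUDDY_BACKUP_" tree on B before and after the restart.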

What appears to be happening is that during the initial startup of the secondary server, server A properly calls the "AssignToBuddyGroupCommand" on server B, passing the initial state.  However, on subsequent restarts of server B, server A never executes that command.

I believe the problem is that server A fails to recognize that server B was shut down, so the BuddyManager never removes B as a buddy (at least not within the time it takes server B to restart).  When I look at the BuddyManager on server A in the jmx-console, I can see that its buddy group is never updated when server B restarts.  I believe the data that does make it to B after a restart comes only from the standard "replicate commands" that are issued when something changes in session S1 on server A.

For example, if server B was restarted at 6:50am, server A's buddy information in JMX looks like this (last modified at 6:40am, which implies that A never processed B leaving):
BuddyGroup: (dataOwner:, groupName:, buddies: [],lastModified: Fri Oct 09 06:40:27 PDT 2009)   

In conjunction with that, I see this in the log output from server A while server B is restarting:
2009-10-09 06:50:52,477 DEBUG [org.jboss.cache.buddyreplication.BuddyManager] Nothing has changed; new buddy list is identical to the old one.
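The suspected mechanism can be illustrated with a toy model. This is a sketch of the hypothesis only, written in plain Java; it is not the BuddyManager code, and every name in it (BuddyRestartModel, Owner, onView, put, simulate) is made up for illustration. The assumption is that the data owner pushes its full state only when its buddy list changes, so a buddy that restarts without the list changing receives nothing but later per-attribute updates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BuddyRestartModel {

    /** Toy stand-in for the data owner (server A). */
    static class Owner {
        final Map<String, String> session = new HashMap<>();
        List<String> buddies = new ArrayList<>();

        /** Handles a new buddy list; returns true if the list changed. */
        boolean onView(List<String> newBuddies, Map<String, Map<String, String>> backups) {
            if (newBuddies.equals(buddies)) {
                return false; // "Nothing has changed": no full state is pushed
            }
            buddies = new ArrayList<>(newBuddies);
            for (String b : buddies) {
                backups.get(b).putAll(session); // full state transfer to each buddy
            }
            return true;
        }

        /** Normal replication: only the changed attribute is sent to the buddies. */
        void put(String key, String value, Map<String, Map<String, String>> backups) {
            session.put(key, value);
            for (String b : buddies) {
                backups.get(b).put(key, value);
            }
        }
    }

    /** Returns B's backup copy of the session after B restarts and A changes one attribute. */
    static Map<String, String> simulate(boolean ownerNoticesRestart) {
        Map<String, Map<String, String>> backups = new HashMap<>();
        backups.put("B", new HashMap<>());
        Owner a = new Owner();
        a.put("0", "16", backups);                  // session created before B joins
        a.put("2", "metadata", backups);
        a.onView(Arrays.asList("B"), backups);      // first start of B: full state pushed

        backups.put("B", new HashMap<>());          // B restarts with an empty backup tree
        if (ownerNoticesRestart) {
            a.onView(new ArrayList<>(), backups);   // A sees B leave
        }
        a.onView(Arrays.asList("B"), backups);      // A sees B (again)
        a.put("foo", "bar", backups);               // later change to the session
        return backups.get("B");
    }

    public static void main(String[] args) {
        System.out.println("restart noticed:   " + simulate(true));
        System.out.println("restart unnoticed: " + simulate(false));
    }
}
```

In the unnoticed case, B ends up holding only the attribute changed after the restart, which matches the shape of the partial entry I see. The model is simplified: in the real dump, keys "0" and "1" are also present after the restart.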

Here are my relevant configurations (let me know if I missed any sections):

  |         <replication-config>
  |                 <replication-trigger>SET_AND_NON_PRIMITIVE_GET</replication-trigger>
  |                 <replication-granularity>ATTRIBUTE</replication-granularity>
  |         </replication-config>

<bean name="StandardSessionCacheConfig" class="org.jboss.cache.config.Configuration">
  |          <!-- Provides batching functionality for caches that don't want to interact with regular JTA Transactions -->
  |          <property name="transactionManagerLookupClass">org.jboss.cache.transaction.BatchModeTransactionManagerLookup</property>
  |          <!-- Name of cluster. Needs to be the same for all members -->
  |          <property name="clusterName">${jboss.partition.name:DefaultPartition}-SessionCache</property>
  |          <!-- Use a UDP (multicast) based stack. Need JGroups flow control (FC)
  |               because we are using asynchronous replication. -->
  |          <property name="multiplexerStack">${jboss.default.jgroups.stack:tcp}</property>
  |          <property name="fetchInMemoryState">true</property>
  |          <property name="nodeLockingScheme">PESSIMISTIC</property>
  |          <property name="isolationLevel">REPEATABLE_READ</property>
  |          <property name="useLockStriping">false</property>
  |          <property name="cacheMode">REPL_ASYNC</property>
  |          <!-- Number of milliseconds to wait until all responses for a
  |               synchronous call have been received. Make this longer 
  |               than lockAcquisitionTimeout.-->
  |          <property name="syncReplTimeout">17500</property>
  |          <!-- Max number of milliseconds to wait for a lock acquisition -->
  |          <property name="lockAcquisitionTimeout">15000</property>
  |          <!-- The max amount of time (in milliseconds) we wait until the
  |           state (ie. the contents of the cache) are retrieved from
  |           existing members at startup. -->
  |          <property name="stateRetrievalTimeout">60000</property>
  |          <!-- Not needed for a web session cache that doesn't use FIELD -->
  |          <property name="useRegionBasedMarshalling">false</property>
  |          <!-- Must match the value of "useRegionBasedMarshalling" -->
  |          <property name="inactiveOnStartup">false</property>
  |          <!-- Disable asynchronous RPC marshalling/sending -->
  |          <property name="serializationExecutorPoolSize">0</property>        
  |          <!-- We have no asynchronous notification listeners -->
  |          <property name="listenerAsyncPoolSize">0</property>
  |          <property name="exposeManagementStatistics">true</property>
  |          <property name="buddyReplicationConfig">
  |             <bean class="org.jboss.cache.config.BuddyReplicationConfig">
  |                <!--  Just set to true to turn on buddy replication -->
  |                <property name="enabled">true</property>
  |                <!-- A way to specify a preferred replication group.  We try
  |                     and pick a buddy who shares the same pool name (falling 
  |                     back to other buddies if not available). -->
  |                <property name="buddyPoolName">default</property>
  |                <property name="buddyCommunicationTimeout">17500</property>
  |                <!-- Do not change these -->
  |                <property name="autoDataGravitation">false</property>
  |                <property name="dataGravitationRemoveOnFind">true</property>
  |                <property name="dataGravitationSearchBackupTrees">true</property>
  |                <property name="buddyLocatorConfig">
  |                   <bean class="org.jboss.cache.buddyreplication.NextMemberBuddyLocatorConfig">
  |                      <!-- The number of backup copies we maintain -->
  |                      <property name="numBuddies">1</property>
  |                      <!-- Means that each node will *try* to select a buddy on 
  |                           a different physical host. If not able to do so 
  |                           though, it will fall back to colocated nodes. -->
  |                      <property name="ignoreColocatedBuddies">true</property>
  |                    </bean>
  |                </property>
  |             </bean>
  |          </property>
  |          <property name="cacheLoaderConfig">
  |             <bean class="org.jboss.cache.config.CacheLoaderConfig">
  |                    <!-- Do not change these -->
  |                    <property name="passivation">true</property>
  |                    <property name="shared">false</property>
  |                    <property name="individualCacheLoaderConfigs">
  |                      <list>
  |                         <bean class="org.jboss.cache.loader.FileCacheLoaderConfig">
  |                            <!-- Where passivated sessions are stored -->
  |                            <property name="location">${jboss.server.data.dir}${/}session</property>
  |                            <!-- Do not change these -->
  |                            <property name="async">false</property>
  |                            <property name="fetchPersistentState">true</property>
  |                            <property name="purgeOnStartup">true</property>
  |                            <property name="ignoreModifications">false</property>
  |                            <property name="checkCharacterPortability">false</property>
  |                         </bean>
  |                      </list>
  |                    </property>
  |             </bean>
  |          </property>
  |       </bean>  

I have tried both UDP and TCP, with and without passivation, and confirmed that everything works fine again when using "total" rather than "buddy" replication.

Has anyone else seen this?  Let me know if more information is needed.

View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4259659#4259659
