[jboss-user] [JBossCache] - Buddyrep issue

FredrikJ do-not-reply@jboss.com
Fri Dec 28 09:57:38 EST 2007


Hi.
I am currently using JBoss Cache 2.1.0 GA and JGroups 2.6.1 with buddy replication. Buddy replication is configured to use one buddy only.
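
For reference, a single-buddy setup is configured along these lines in the JBoss Cache 2.x XML format (a minimal sketch only; apart from numBuddies = 1, the pool name and timeout values below are placeholders, not our exact settings):

   <attribute name="BuddyReplicationConfig">
      <config>
         <buddyReplicationEnabled>true</buddyReplicationEnabled>
         <!-- NextMemberBuddyLocator picks the next member(s) in the cluster view -->
         <buddyLocatorClass>org.jboss.cache.buddyreplication.NextMemberBuddyLocator</buddyLocatorClass>
         <buddyLocatorProperties>
            numBuddies = 1
         </buddyLocatorProperties>
         <buddyPoolName>somePoolName</buddyPoolName>
         <buddyCommunicationTimeout>2000</buddyCommunicationTimeout>
         <autoDataGravitation>false</autoDataGravitation>
         <dataGravitationRemoveOnFind>true</dataGravitationRemoveOnFind>
         <dataGravitationSearchBackupTrees>true</dataGravitationSearchBackupTrees>
      </config>
   </attribute>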

The setup is four nodes with IP addresses:

172.16.0.5
172.16.0.6
172.16.0.7
172.16.0.8

They are all started in the stated order, so that .5 is the coordinator. The first node up (.5) inserts data into the cache, and data then gravitates to the other nodes when needed; this happens mostly in the initial phase, when load is applied to the system. Data affinity is handled by a layer above the cache.

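To show what I mean by gravitation, this is roughly how a forced, per-invocation gravitation call looks in 2.x (a sketch only; the helper class, the Fqn layout and the key are made up for illustration and may differ from what our affinity layer actually does):

   import org.jboss.cache.Cache;
   import org.jboss.cache.Fqn;
   import org.jboss.cache.config.Option;

   public class GravitatingReader {
       // Sketch: read one table node, forcing data gravitation for this single call
       // so the entry is pulled over from whichever node (or backup tree) owns it.
       // "cache" is assumed to be an already started Cache instance; the Fqn layout
       // ("/91" holding key "91") just mirrors the dumps below.
       public static Object readWithGravitation(Cache<String, Object> cache, String tableId) {
           Option opt = new Option();
           opt.setForceDataGravitation(true);               // applies to the next invocation only
           cache.getInvocationContext().setOptionOverrides(opt);
           return cache.get(Fqn.fromString("/" + tableId), tableId);
       }
   }
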
Using this scenario with 2.0.0 GA presented no problems, except when adding new nodes under load, so we are now investigating 2.1.0.

The issue I'm facing is that the coordinator seems to get two buddy backups, one being itself.

This is the contents on 172.16.0.5 (coordinator):

null  {}
  /91  {91=com.cubeia.firebase.game.table.InternalMetaData@1134d9b com.cubeia.testgame.server.game.TestGame@1cdc8ce}
  /63  {63=com.cubeia.firebase.game.table.InternalMetaData@cc9ff5 com.cubeia.testgame.server.game.TestGame@951aeb}
  /92  {92=com.cubeia.firebase.game.table.InternalMetaData@e60a39 com.cubeia.testgame.server.game.TestGame@185c84e}
   ...

  /_BUDDY_BACKUP_  {}
    /172.16.0.8_8786  {}
      /15  {15=com.cubeia.firebase.game.table.InternalMetaData@d9e1c0 null}
      /16  {16=com.cubeia.firebase.game.table.InternalMetaData@742062 null}
    /172.16.0.5_8786  {}
      /31  {}

Notice that there are two members listed under /_BUDDY_BACKUP_: one is .8 and the other is .5, which is the node itself.

Now, on 172.16.0.8 we get a lot of lock timeouts like the one below:

Caused by: org.jboss.cache.lock.TimeoutException: read lock for /_BUDDY_BACKUP_/172.16.0.5_8786 could not be acquired by GlobalTransaction:<172.16.0.6:8786>:41 after 5000 ms. Locks: Read lock owners: []
Write lock owner: GlobalTransaction:<172.16.0.6:8786>:1
, lock info: write owner=GlobalTransaction:<172.16.0.6:8786>:1 (activeReaders=0, activeWriter=Thread[Incoming,TableSpace,172.16.0.8:8786,5,Thread Pools], waitingReaders=25, waitingWriters=0, waitingUpgrader=0)

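(For reference, the 5000 ms above presumably comes from the cache's lock acquisition timeout, i.e. something like

   <attribute name="LockAcquisitionTimeout">5000</attribute>

in the cache config; I only mention it to show where the number comes from.)
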
172.16.0.8 also shows two members under the buddy backup:

null  {}
  /28  {}
  /29  {}
  /92  {}
   ...
  /_BUDDY_BACKUP_  {}
    /172.16.0.7_8786  {}
      /91  {91=com.cubeia.firebase.game.table.InternalMetaData@1fbeed6 null}
      /41  {41=com.cubeia.firebase.game.table.InternalMetaData@fd3922 null}
      /115  {115=com.cubeia.firebase.game.table.InternalMetaData@b215d9 null}
       ...
    /172.16.0.5_8786  {}
      /31  {}

It seems that the buddy whose data .8 backs up is in fact .7, but we still hold some buddy reference to the .5 member as well. In fact, all the lock timeouts on .8 are related to the .5 buddy backup fqn:

failure acquiring lock: fqn=/_BUDDY_BACKUP_/172.16.0.5_8786

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4115924#4115924
