[jboss-user] [JBoss Cache: Core Edition] - Re: FD Issue

karnivas do-not-reply at jboss.com
Thu Apr 2 12:59:42 EDT 2009


We brought down one of the Solaris machines (P1, the co-ordinator) to check the view on all machines.

As expected, the co-ordinator role moved to one of the RHEL machines and P1 was removed from all views, but the dead RHEL members were not removed from the view.

Please find the DEBUG messages from jgroups.log below:

org.jgroups.protocols.pbcast.GMS --> new=[172.16.11.200:32790], suspected=[], leaving=[], new view: [172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790]
  | org.jgroups.protocols.pbcast.GMS --> mcasting view {[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790]} (9 mbrs)
  | org.jgroups.protocols.UDP --> sending msg to null (src=172.16.11.20:35858), headers are {NAKACK=[MSG, seqno=3782], GMS=GmsHeader[VIEW]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UDP=[channel_name=ProvCache-LABS]}
  | org.jgroups.protocols.UDP --> message is [dst: 224.7.8.9:45567, src: 172.16.11.20:35858 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], NAKACK=[MSG, seqno=3782], UDP=[channel_name=ProvCache-LABS]}
  | org.jgroups.protocols.pbcast.GMS --> view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.13:38538, 172.16.11.200:32790]
  | org.jgroups.protocols.pbcast.GMS --> [local_addr=172.16.11.20:35858] view is [172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790]
  | org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.12:40087 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
  | org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.10:51918 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
  | org.jgroups.protocols.UDP --> sending msg to 172.16.11.20:35858 (src=172.16.11.20:35858), headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UDP=[channel_name=ProvCache-LABS], UNICAST=[UNICAST: DATA, seqno=1]}
  | org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.20:35858 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
  | org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.11:51210 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
  | org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.191:37204 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
  | org.jgroups.protocols.pbcast.GMS --> failed to collect all ACKs (11) for view [172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790] after 2000ms, missing ACKs from [172.16.11.13:38513, 172.16.11.13:38515, 172.16.11.13:38520, 172.16.11.13:38533] (received=[172.16.11.11:51210, 172.16.11.20:35858, 172.16.11.191:37204, 172.16.11.12:40087, 172.16.11.10:51918]), local_addr=172.16.11.20:35858
  | org.jgroups.protocols.UDP --> sending msg to 172.16.11.200:32790 (src=172.16.11.20:35858), headers are {GMS=GmsHeader[JOIN_RSP]: join_rsp=view: [172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], digest: 172.16.11.11:51210: [0 : 0], 172.16.11.13:38513: [0 : 0], 172.16.11.10:51918: [4481 : 4482], 172.16.11.12:40087: [0 : 0], 172.16.11.13:38520: [0 : 0], 172.16.11.200:32790: [0 : 0], 172.16.11.20:35858: [3781 : 3782], 172.16.11.13:38533: [0 : 0], 172.16.11.191:37204: [3685 : 3686], UDP=[channel_name=ProvCache-LABS], UNICAST=[UNICAST: DATA, seqno=1]}
  | org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.200:32790 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=2], UDP=[channel_name=ProvCache-LABS]}


org.jgroups.protocols.UDP --> message is [dst: 224.7.8.9:45567, src: 172.16.11.12:40087 (2 headers), size = 0 bytes], headers are {UDP=[channel_name=ProvCache-LABS], FD=[FD: SUSPECT (suspected_mbrs=[172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533], from=172.16.11.12:40087)]}
  | org.jgroups.protocols.FD --> [SUSPECT] suspect hdr is [FD: SUSPECT (suspected_mbrs=[172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533], from=172.16.11.12:40087)]
  | org.jgroups.protocols.VERIFY_SUSPECT --> verifying that 172.16.11.13:38513 is dead
  | org.jgroups.protocols.UDP --> sending msg to 172.16.11.13:38513 (src=172.16.11.10:51918), headers are {VERIFY_SUSPECT=[VERIFY_SUSPECT: ARE_YOU_DEAD], UDP=[channel_name=ProvCache-LABS]}
  | org.jgroups.protocols.VERIFY_SUSPECT --> diff=2034, mbr 172.16.11.13:38513 is dead (passing up SUSPECT event)
  | org.jgroups.protocols.VERIFY_SUSPECT --> diff=2034, mbr 172.16.11.13:38533 is dead (passing up SUSPECT event)
  | org.jgroups.protocols.VERIFY_SUSPECT --> diff=2034, mbr 172.16.11.13:38520 is dead (passing up SUSPECT event)
  | org.jgroups.protocols.pbcast.GMS --> processing [SUSPECT(172.16.11.13:38513), SUSPECT(172.16.11.13:38533), SUSPECT(172.16.11.13:38520)]
  | org.jgroups.blocks.RequestCorrelator --> suspect=172.16.11.13:38513
  | org.jgroups.blocks.RequestCorrelator --> suspect=172.16.11.13:38533
  | org.jgroups.blocks.RequestCorrelator --> suspect=172.16.11.13:38520
  | org.jgroups.protocols.pbcast.GMS --> suspected members=[172.16.11.13:38513, 172.16.11.13:38533, 172.16.11.13:38520], suspected_mbrs=[172.16.11.13:38513, 172.16.11.13:38533, 172.16.11.13:38520]


According to these logs, the co-ordinator identifies the dead members correctly but does not update the view accordingly. Please advise on this.

Please tell us how to overcome this.
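
To make the mismatch easier to see, here is a minimal Python sketch that assumes the log format shown above. It is not part of JGroups, and the function names are hypothetical. It extracts the membership from a GMS view line and the addresses from an FD SUSPECT line, then lists the suspected members that are still present in the view. The two sample lines are copied from the log excerpt above.

```python
import re

# IPv4 address followed by a port, e.g. 172.16.11.13:38513
ADDR = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}:\d+")


def view_members(line):
    """Return the member list of the first '[coord|id] [m1, m2, ...]' view in a line."""
    match = re.search(r"\[[\d.]+:\d+\|\d+\]\s*\[([^\]]*)\]", line)
    return ADDR.findall(match.group(1)) if match else []


def suspected_members(line):
    """Return the addresses listed in 'suspected_mbrs=[...]' on a line."""
    match = re.search(r"suspected_mbrs=\[([^\]]*)\]", line)
    return ADDR.findall(match.group(1)) if match else []


def stale_members(view_line, suspect_line):
    """Return suspected members that still appear in the view, in view order."""
    suspects = set(suspected_members(suspect_line))
    return [addr for addr in view_members(view_line) if addr in suspects]


# Sample lines taken from the jgroups.log excerpt in this post.
VIEW_LINE = (
    "org.jgroups.protocols.pbcast.GMS --> new=[172.16.11.200:32790], "
    "suspected=[], leaving=[], new view: [172.16.11.20:35858|259] "
    "[172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, "
    "172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, "
    "172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790]"
)
SUSPECT_LINE = (
    "org.jgroups.protocols.FD --> [SUSPECT] suspect hdr is [FD: SUSPECT "
    "(suspected_mbrs=[172.16.11.13:38513, 172.16.11.13:38520, "
    "172.16.11.13:38533], from=172.16.11.12:40087)]"
)

if __name__ == "__main__":
    for addr in stale_members(VIEW_LINE, SUSPECT_LINE):
        print("suspected but still in view:", addr)
```

Run against these two lines, the sketch reports the three 172.16.11.13 addresses, which matches what we see: the members are suspected, yet view 259 still lists them.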

View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4223083#4223083
