[JBoss Cache: Core Edition] - Re: FD Issue
by karnivas
We brought down one of the Solaris machines (P1, the co-ordinator) to check the view on all machines.
As expected, the co-ordinator role moved to one of the RHEL machines and P1 was removed from all views, but the dead RHEL members were not removed from the view.
Please find the DEBUG messages from jgroups.log below:
org.jgroups.protocols.pbcast.GMS --> new=[172.16.11.200:32790], suspected=[], leaving=[], new view: [172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790]
| org.jgroups.protocols.pbcast.GMS --> mcasting view {[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790]} (9 mbrs)
| org.jgroups.protocols.UDP --> sending msg to null (src=172.16.11.20:35858), headers are {NAKACK=[MSG, seqno=3782], GMS=GmsHeader[VIEW]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UDP=[channel_name=ProvCache-LABS]}
| org.jgroups.protocols.UDP --> message is [dst: 224.7.8.9:45567, src: 172.16.11.20:35858 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], NAKACK=[MSG, seqno=3782], UDP=[channel_name=ProvCache-LABS]}
| org.jgroups.protocols.pbcast.GMS --> view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.13:38538, 172.16.11.200:32790]
| org.jgroups.protocols.pbcast.GMS --> [local_addr=172.16.11.20:35858] view is [172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790]
| org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.12:40087 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
| org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.10:51918 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
| org.jgroups.protocols.UDP --> sending msg to 172.16.11.20:35858 (src=172.16.11.20:35858), headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UDP=[channel_name=ProvCache-LABS], UNICAST=[UNICAST: DATA, seqno=1]}
| org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.20:35858 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
| org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.11:51210 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
| org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.191:37204 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=1], UDP=[channel_name=ProvCache-LABS]}
| org.jgroups.protocols.pbcast.GMS --> failed to collect all ACKs (11) for view [172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790] after 2000ms, missing ACKs from [172.16.11.13:38513, 172.16.11.13:38515, 172.16.11.13:38520, 172.16.11.13:38533] (received=[172.16.11.11:51210, 172.16.11.20:35858, 172.16.11.191:37204, 172.16.11.12:40087, 172.16.11.10:51918]), local_addr=172.16.11.20:35858
| org.jgroups.protocols.UDP --> sending msg to 172.16.11.200:32790 (src=172.16.11.20:35858), headers are {GMS=GmsHeader[JOIN_RSP]: join_rsp=view: [172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], digest: 172.16.11.11:51210: [0 : 0], 172.16.11.13:38513: [0 : 0], 172.16.11.10:51918: [4481 : 4482], 172.16.11.12:40087: [0 : 0], 172.16.11.13:38520: [0 : 0], 172.16.11.200:32790: [0 : 0], 172.16.11.20:35858: [3781 : 3782], 172.16.11.13:38533: [0 : 0], 172.16.11.191:37204: [3685 : 3686], UDP=[channel_name=ProvCache-LABS], UNICAST=[UNICAST: DATA, seqno=1]}
| org.jgroups.protocols.UDP --> message is [dst: 172.16.11.20:35858, src: 172.16.11.200:32790 (3 headers), size = 0 bytes], headers are {GMS=GmsHeader[VIEW_ACK]: view=[172.16.11.20:35858|259] [172.16.11.20:35858, 172.16.11.11:51210, 172.16.11.191:37204, 172.16.11.10:51918, 172.16.11.12:40087, 172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533, 172.16.11.200:32790], UNICAST=[UNICAST: DATA, seqno=2], UDP=[channel_name=ProvCache-LABS]}
org.jgroups.protocols.UDP --> message is [dst: 224.7.8.9:45567, src: 172.16.11.12:40087 (2 headers), size = 0 bytes], headers are {UDP=[channel_name=ProvCache-LABS], FD=[FD: SUSPECT (suspected_mbrs=[172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533], from=172.16.11.12:40087)]}
| org.jgroups.protocols.FD --> [SUSPECT] suspect hdr is [FD: SUSPECT (suspected_mbrs=[172.16.11.13:38513, 172.16.11.13:38520, 172.16.11.13:38533], from=172.16.11.12:40087)]
| org.jgroups.protocols.VERIFY_SUSPECT --> verifying that 172.16.11.13:38513 is dead
| org.jgroups.protocols.UDP --> sending msg to 172.16.11.13:38513 (src=172.16.11.10:51918), headers are {VERIFY_SUSPECT=[VERIFY_SUSPECT: ARE_YOU_DEAD], UDP=[channel_name=ProvCache-LABS]}
| org.jgroups.protocols.VERIFY_SUSPECT --> diff=2034, mbr 172.16.11.13:38513 is dead (passing up SUSPECT event)
| org.jgroups.protocols.VERIFY_SUSPECT --> diff=2034, mbr 172.16.11.13:38533 is dead (passing up SUSPECT event)
| org.jgroups.protocols.VERIFY_SUSPECT --> diff=2034, mbr 172.16.11.13:38520 is dead (passing up SUSPECT event)
| org.jgroups.protocols.pbcast.GMS --> processing [SUSPECT(172.16.11.13:38513), SUSPECT(172.16.11.13:38533), SUSPECT(172.16.11.13:38520)]
| org.jgroups.blocks.RequestCorrelator --> suspect=172.16.11.13:38513
| org.jgroups.blocks.RequestCorrelator --> suspect=172.16.11.13:38533
| org.jgroups.blocks.RequestCorrelator --> suspect=172.16.11.13:38520
| org.jgroups.protocols.pbcast.GMS --> suspected members=[172.16.11.13:38513, 172.16.11.13:38533, 172.16.11.13:38520], suspected_mbrs=[172.16.11.13:38513, 172.16.11.13:38533, 172.16.11.13:38520]
|
As per these logs, the co-ordinator identifies the dead members correctly but does not update the view properly. Please advise on this.
Please tell us how to overcome this.
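To cross-check this from the log text itself, here is a minimal sketch that extracts the member list of a view line and the list from a "suspected members=" line, then reports which suspected addresses are still listed in the view. It is plain Java and not part of JGroups; the class name ViewCheck, its method names, and the regular expressions are my own assumptions based on the line formats shown above.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a JGroups class): compares addresses found in log lines.
class ViewCheck {
    // Matches "[coordinator|viewId] [member, member, ...]" and captures the member list.
    private static final Pattern VIEW =
            Pattern.compile("\\[[^\\]|]+\\|\\d+\\] \\[([^\\]]*)\\]");
    // Matches "suspected members=[member, ...]" and captures the list.
    private static final Pattern SUSPECTS =
            Pattern.compile("suspected members=\\[([^\\]]*)\\]");

    // Returns the comma-separated addresses captured by the first match, in log order.
    private static Set<String> capture(Pattern pattern, String line) {
        Set<String> out = new LinkedHashSet<>();
        Matcher m = pattern.matcher(line);
        if (m.find()) {
            for (String addr : m.group(1).split(",")) {
                String trimmed = addr.trim();
                if (!trimmed.isEmpty()) {
                    out.add(trimmed);
                }
            }
        }
        return out;
    }

    static Set<String> viewMembers(String viewLine) {
        return capture(VIEW, viewLine);
    }

    static Set<String> suspectedMembers(String suspectLine) {
        return capture(SUSPECTS, suspectLine);
    }

    // Suspected addresses that the given view line still lists as members.
    static Set<String> stillInView(String viewLine, String suspectLine) {
        Set<String> stale = suspectedMembers(suspectLine);
        stale.retainAll(viewMembers(viewLine));
        return stale;
    }
}
```

Feeding it the "view is [...|259] [...]" line and the "suspected members=[...]" line from the log above should list the three 172.16.11.13 addresses, i.e. all suspected members are still present in view 259.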
View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4223083#4223083
Reply to the post : http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=4223083
15 years, 9 months
[JBossMQ] - Re: Improvement suggestions: Messaging in a clustered enviro
by FrankTheTank
"adrian(a)jboss.org" wrote : bind.address:1100 should always be able to see things bound into jndi somewhere in the cluster.
|
Yes, it works with bind.address.
I was just mentioning that the example uses 'localhost' which would lead to false results if JBoss is bound to a different address.
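To illustrate what I mean, here is a rough sketch of building the client-side JNDI properties from the bind address instead of hard-coding 'localhost'. The class name HaJndiEnv and its methods are hypothetical. I am also assuming the usual JNP client settings (org.jnp.interfaces.NamingContextFactory), the HA-JNDI port 1100 mentioned above, and that the server's address is available in the jboss.bind.address system property.

```java
import java.util.Properties;
import javax.naming.Context;

// Hypothetical helper: builds HA-JNDI client properties for a given bind address.
class HaJndiEnv {
    // Port 1100 is the HA-JNDI port discussed in this thread.
    private static final int HA_JNDI_PORT = 1100;

    static Properties forBindAddress(String bindAddress) {
        Properties env = new Properties();
        // Assumed JNP client settings; adjust to whatever your jndi.properties uses.
        env.setProperty(Context.INITIAL_CONTEXT_FACTORY,
                "org.jnp.interfaces.NamingContextFactory");
        env.setProperty(Context.URL_PKG_PREFIXES,
                "org.jboss.naming:org.jnp.interfaces");
        env.setProperty(Context.PROVIDER_URL,
                "jnp://" + bindAddress + ":" + HA_JNDI_PORT);
        return env;
    }

    // Assumption: jboss.bind.address is set when the server is started with -b.
    // Falls back to localhost only when the property is missing.
    static Properties fromSystemProperty() {
        return forBindAddress(System.getProperty("jboss.bind.address", "localhost"));
    }
}
```

With the JBoss client jars on the classpath, the result could be passed to new InitialContext(...) before looking up the connection factory.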
"adrian(a)jboss.org" wrote :
| * Use JBoss Messaging to get a proper clustered solution
|
Would this save me the hassle of doing all the below?
"adrian(a)jboss.org" wrote :
| Improvements:
|
| * Use hajndi-jms-ds.xml to create a pool of connections (this is what is used in the "all" config anyway) bound at connection factory java:/JmsXA.
|
Might you have an example for this?
Or do I not need to do anything except use it (java:/JmsXA) instead of ConnectionFactory?
"adrian(a)jboss.org" wrote :
| * Use jndi tricks to use the in-memory connector when co-located with the server
| in which the singleton jms server is running
|
| | <mbean code="org.jboss.naming.LinkRefPairService"
| |        name="jboss.jms:alias=MyConnectionFactory">
| |    <!-- jndi binding MyConnectionFactory -->
| |    <attribute name="JndiName">MyConnectionFactory</attribute>
| |    <!-- Use TCP/IP when remote -->
| |    <attribute name="RemoteJndiName">ConnectionFactory</attribute>
| |    <!-- Use the in-memory connector when local -->
| |    <attribute name="LocalJndiName">java:/ConnectionFactory</attribute>
| |    <depends>jboss:service=Naming</depends>
| | </mbean>
| |
|
Thanks for the help!
View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4223069#4223069
Reply to the post : http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=4223069
15 years, 9 months