[ http://jira.jboss.com/jira/browse/JGRP-259?page=comments#action_12340899 ]
Bela Ban commented on JGRP-259:
-------------------------------
This works after changing some parameters: either reduce port_range to 2, or set
sock_conn_timeout to a small value, e.g. 200 ms. When a member starts up, TCPPING tries
to connect to H:7800, H:7801 ... H:7805. The default sock_conn_timeout is 2000 ms,
so this takes about 10 seconds (assuming at least 1 host is up).
Otherwise, a member will start as a singleton and then merge later:
NEW VIEW: MergeView::[192.168.5.2:7800|3] [192.168.5.2:7800, 192.168.5.2:7801],
subgroups=[[192.168.5.2:7800|2] [192.168.5.2:7800], [192.168.5.2:7801|0]
[192.168.5.2:7801]]
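
The start-up delay estimated above is simple arithmetic, sketched below. This is only an illustration: it assumes the connection attempts are made one after another and that every port that does not answer consumes the full sock_conn_timeout, which matches the estimate of 5 failed attempts at 2000 ms each. The class and method names are hypothetical and are not part of JGroups.

```java
// Hypothetical helper, not part of JGroups: estimates how long sequential
// connection attempts take when each unanswered port waits out the timeout.
final class DiscoveryEstimate {

    // Assumption: attempts are sequential and each failure costs the full timeout.
    static long worstCaseMillis(int unreachablePorts, long sockConnTimeoutMs) {
        if (unreachablePorts < 0 || sockConnTimeoutMs < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return unreachablePorts * sockConnTimeoutMs;
    }

    public static void main(String[] args) {
        // Default timeout of 2000 ms, 5 ports that do not answer.
        System.out.println(worstCaseMillis(5, 2000)); // 10000
        // Same 5 ports with sock_conn_timeout lowered to 200 ms.
        System.out.println(worstCaseMillis(5, 200));  // 1000
    }
}
```

Under these assumptions, lowering the timeout to 200 ms cuts the wait from about 10 seconds to about 1 second, and reducing port_range lowers the number of ports probed in the same way.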
Unidentified errors with TCP/TCPPING/MERGE2 protocol stack.
-----------------------------------------------------------
Key: JGRP-259
URL: http://jira.jboss.com/jira/browse/JGRP-259
Project: JGroups
Issue Type: Bug
Affects Versions: 2.2.9.2
Environment: Debian GNU/Linux 2.4.27, JDK 1.5.0_05
Reporter: Tomasz Skutnik
Assigned To: Bela Ban
Fix For: 2.4
Attachments: bla.java, Test1.java
I've encountered connect/disconnect problems during my experiments with
the TCP/TCPPING/MERGE2 protocol stack. I run a client program (see attachment)
on 2 different hosts (within two VMware virtual hosts). One of the hosts
(h1) plays the role of the well-known host from which the initial membership
is retrieved; the other plays the role of a plain client.
I use the following command lines to run the client programs:
- on host h1:
$ java -cp \
jgroups-all-2.2.9.2.jar:concurrent.jar:commons-logging-1.1.jar:\
classes test.Test1 "h1" "h1[7800]"
- on host h2:
$ java -cp \
jgroups-all-2.2.9.2.jar:concurrent.jar:commons-logging-1.1.jar:\
classes test.Test1 "h2" "h1[7800]"
When I run the client programs in h1, h2 order, everything works fine. But if
I run them in h2, h1 order, or if I kill h1 and then rerun it, something
weird happens: the two disjoint groups merge (a MergeView is
transmitted), but then the following exception occurs:
2006-07-10 12:01:32 org.jgroups.Message readHeader
SEVERE: magic number 306192 is not available in magic map
2006-07-10 12:01:32 org.jgroups.protocols.TP handleIncomingPacket
SEVERE: failed unmarshalling message
java.io.IOException: failed read header: java.lang.NullPointerException
at org.jgroups.Message.readHeader(Message.java:697)
at org.jgroups.Message.readFrom(Message.java:614)
at org.jgroups.protocols.TP.bufferToMessage(TP.java:974)
at org.jgroups.protocols.TP.handleIncomingPacket(TP.java:830)
at org.jgroups.protocols.TP.receive(TP.java:781)
at org.jgroups.protocols.TCP.receive(TCP.java:226)
at org.jgroups.blocks.ConnectionTable.receive(ConnectionTable.java:471)
at org.jgroups.blocks.ConnectionTable$Connection.run(ConnectionTable.java:813)
at java.lang.Thread.run(Thread.java:595)
Because there's no such magic number in the magic map (whatever that means), it
causes an NPE within the org.jgroups.Message class (line 687).
From then on everything goes downhill: jgroups logs an infinite stream
of retransmission errors, e.g.:
SEVERE: magic number 306192 is not available in magic map
2006-07-10 12:01:54 org.jgroups.protocols.pbcast.NAKACK handleXmitRsp
SEVERE: message did not contain a list (LinkedList) of retransmitted
messages: java.io.IOException: failed read header:
java.lang.NullPointerException
This flow of exceptions does not stop until all clients (h1 and h2) are
killed - stopping h1 does not stop exceptions on h2 and vice versa.