[ http://jira.jboss.com/jira/browse/JGRP-550?page=comments#action_12368353 ]
Bela Ban commented on JGRP-550:
-------------------------------
Works for me. The important thing is to make sure the members (2, in my case) actually
find each other. Check the group membership in the log, as shown below:
13:53:01,843 [DEBUG] [main] SimpleExample.runTestReplicatedHashMap(): Group membership = 2
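The membership check can also be done programmatically before requesting the cache. The following is only a minimal sketch, not code from the attached example: MembershipWait and waitForMembers are hypothetical names, and the IntSupplier is a stand-in for whatever reports the current view size in the application (with a JGroups channel, presumably something along the lines of channel.getView().size()).

```java
import java.util.function.IntSupplier;

final class MembershipWait {
    /**
     * Polls currentSize until it reports at least `expected` members
     * or until timeoutMs has elapsed. Returns true if the size was reached.
     * currentSize is a hypothetical stand-in for the real view-size lookup.
     */
    static boolean waitForMembers(IntSupplier currentSize, int expected,
                                  long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (currentSize.getAsInt() >= expected) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Fake view for illustration: reports 1 member twice, then 2.
        int[] calls = {0};
        IntSupplier fakeView = () -> ++calls[0] < 3 ? 1 : 2;
        boolean found = waitForMembers(fakeView, 2, 1000, 10);
        System.out.println("Both members found each other: " + found);
    }
}
```

If this returns false, the members never formed a common group, which points at the bind_addr or initial_hosts settings rather than at the cache transfer itself.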
Also make sure you have a correct bind_addr (or -Djgroups.bind_addr) in TCP and a correct
hostname for initial_hosts in TCPPING, as shown in my config (running on 192.168.5.2):
<TCP bind_addr="192.168.5.2"
     start_port="9800"
     loopback="true"
     discard_incompatible_packets="true"
     use_send_queues="false"
     max_bundle_size="64000"
     max_bundle_timeout="30"
     enable_bundling="true"
     sock_conn_timeout="300"
     skip_suspected_members="true"
     use_concurrent_stack="true"
     thread_pool.enabled="true"
     thread_pool.min_threads="1"
     thread_pool.max_threads="8"
     thread_pool.keep_alive_time="5000"
     thread_pool.queue_enabled="true"
     thread_pool.queue_max_size="100"
     thread_pool.rejection_policy="Run"
     oob_thread_pool.enabled="true"
     oob_thread_pool.min_threads="1"
     oob_thread_pool.max_threads="16"
     oob_thread_pool.keep_alive_time="5000"
     oob_thread_pool.queue_enabled="true"
     oob_thread_pool.queue_max_size="100"
     oob_thread_pool.rejection_policy="Run"
/>
<TCPPING
     timeout="3000"
     initial_hosts="192.168.5.2[9800]"
     num_initial_members="2"
     port_range="3"/>
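
To double-check which addresses a TCPPING entry such as the one above would probe, the initial_hosts string can be expanded by hand. This is only an illustrative sketch: InitialHosts and expand are hypothetical names, not JGroups API, and it assumes that port_range="3" means the start port plus the next two ports. The exact upper bound may differ between JGroups versions, so treat the result as an approximation.

```java
import java.util.ArrayList;
import java.util.List;

final class InitialHosts {
    /**
     * Expands a comma-separated "host[port]" list into "host:port" strings,
     * one per port in [port, port + portRange).
     * ASSUMPTION: the range is exclusive of port + portRange.
     */
    static List<String> expand(String initialHosts, int portRange) {
        List<String> out = new ArrayList<>();
        for (String entry : initialHosts.split(",")) {
            String e = entry.trim();
            if (e.isEmpty()) {
                continue;
            }
            int open = e.indexOf('[');
            int close = e.indexOf(']');
            if (open < 1 || close < open) {
                throw new IllegalArgumentException("expected host[port], got: " + e);
            }
            String host = e.substring(0, open);
            int port = Integer.parseInt(e.substring(open + 1, close));
            for (int p = port; p < port + portRange; p++) {
                out.add(host + ":" + p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values taken from the config above.
        for (String target : expand("192.168.5.2[9800]", 3)) {
            System.out.println(target);
        }
    }
}
```

Each member's TCP start_port (9800 here, incremented if the port is taken) has to fall inside the expanded list, otherwise a joining member will not find the coordinator.
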
When a new member joins a group and requests the cache from the coordinator, it always fails first time
-------------------------------------------------------------------------------------------------------
Key: JGRP-550
URL: http://jira.jboss.com/jira/browse/JGRP-550
Project: JGroups
Issue Type: Bug
Affects Versions: 2.5
Environment: Example code was run on Windows
Reporter: Dipak Kothari
Assigned To: Bela Ban
Fix For: 2.5
Attachments: example.zip
Service A starts and becomes the coordinator. After it has started properly, Service B is
started. As part of the join, it:
1) Gets the members using TCPPING
2) Determines the coordinator
3) Joins
4) Applies the view change via installView. This re-adjusts the members and closes any
connections to peers that are no longer members (so the connection to Service A is removed).
5) Requests the cache from the coordinator. In response, Service A tries to send
the cache but fails because the peer connection has been closed. It tries twice and then
removes the connection. Service B times out and tries again, and this time it succeeds. This
happens every time. I don't think this should happen: Service A should return the cache, as
it knows where it needs to be sent.
I have added additional trace statements (these start with APM:) to show the flow for my
understanding. I have deliberately set get_cache_timeout to a high number to
highlight this. I have also provided the source and protocol properties in the zip for
convenience. There are 2 logs from the run I carried out: cord.log is the coordinator's log
and cord1.log is the second service's log.
Please let me know if there is a workaround or a fix I can apply. If I have
misconfigured the properties, please advise how to rectify it.
To run the example, run the bat script, passing in the service name. Note that the service
name needs to be unique, as the log name is based on it.