[
https://issues.jboss.org/browse/WFLY-6762?page=com.atlassian.jira.plugin....
]
Preeta Kuruvilla commented on WFLY-6762:
----------------------------------------
Thanks, Paul. We have configured UDP-based multicast for Ehcache cache replication in the
WildFly cluster. This uses JGroups.
Below is the udp.xml.
<!--
Default stack using IP multicasting. It is similar to the "udp"
stack in stacks.xml, but doesn't use streaming state transfer and flushing
author: Bela Ban
-->
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <UDP
        mcast_port="${jgroups.udp.mcast_port:45588}"
        ip_ttl="4"
        tos="8"
        ucast_recv_buf_size="5M"
        ucast_send_buf_size="5M"
        mcast_recv_buf_size="5M"
        mcast_send_buf_size="5M"
        max_bundle_size="64K"
        max_bundle_timeout="30"
        enable_diagnostics="true"
        thread_naming_pattern="cl"
        timer_type="new3"
        timer.min_threads="2"
        timer.max_threads="4"
        timer.keep_alive_time="3000"
        timer.queue_max_size="500"
        thread_pool.enabled="true"
        thread_pool.min_threads="2"
        thread_pool.max_threads="8"
        thread_pool.keep_alive_time="5000"
        thread_pool.queue_enabled="true"
        thread_pool.queue_max_size="10000"
        thread_pool.rejection_policy="discard"
        oob_thread_pool.enabled="true"
        oob_thread_pool.min_threads="1"
        oob_thread_pool.max_threads="8"
        oob_thread_pool.keep_alive_time="5000"
        oob_thread_pool.queue_enabled="false"
        oob_thread_pool.queue_max_size="100"
        oob_thread_pool.rejection_policy="discard"/>
    <PING />
    <MERGE3 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT timeout="1500" />
    <BARRIER />
    <pbcast.NAKACK2 xmit_interval="500"
                    xmit_table_num_rows="100"
                    xmit_table_msgs_per_row="2000"
                    xmit_table_max_compaction_time="30000"
                    max_msg_batch_size="500"
                    use_mcast_xmit="true"
                    discard_delivered_msgs="true"/>
    <UNICAST3 xmit_interval="500"
              xmit_table_num_rows="100"
              xmit_table_msgs_per_row="2000"
              xmit_table_max_compaction_time="60000"
              conn_expiry_timeout="0"
              max_msg_batch_size="500"/>
    <pbcast.STABLE stability_delay="1000"
                   desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="2000"
                view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K" />
    <RSVP resend_interval="2000" timeout="10000"/>
    <pbcast.STATE_TRANSFER />
    <!-- pbcast.FLUSH /-->
</config>
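For reference, the <FD_ALL/> element in the udp.xml above carries no attributes, so it runs
with the JGroups defaults. A minimal sketch of how the heartbeat settings might be made
explicit in this native JGroups format follows. The values 10000 and 3000 are illustrative
assumptions, not tested recommendations, and the timeout and interval attributes are as I
understand them from the JGroups 3.x manual.

```xml
<!-- Fragment of udp.xml: only the failure detection protocols are shown. -->
<FD_SOCK/>
<!-- Assumed values: suspect a member after 10 s without a heartbeat,
     with heartbeats multicast every 3 s. -->
<FD_ALL timeout="10000" interval="3000"/>
<VERIFY_SUSPECT timeout="1500"/>
```

A shorter timeout should make a disconnected node be suspected sooner, at the cost of more
heartbeat traffic and a higher chance of false suspicions on a slow network.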
Wildfly cluster failover test not working as expected on windows OS,
when network is disabled on a VM(Node) or by shutting down the VM (Node).
----------------------------------------------------------------------------------------------------------------------------------------------
Key: WFLY-6762
URL: https://issues.jboss.org/browse/WFLY-6762
Project: WildFly
Issue Type: Quality Risk
Components: Clustering
Affects Versions: 8.2.0.Final
Reporter: Preeta Kuruvilla
Assignee: Paul Ferraro
Priority: Blocker
In your mail related to WFLY-6749, you said the following:
**The default stack contains the following failure detection protocols:
FD_SOCK
FD_ALL
These protocols are described here:
http://www.jgroups.org/manual/index.html#FailureDetection
I suspect that your method of simulating a failure (disabling the network of the host
machine) is not being detected by FD_SOCK. It will, however, be detected by FD_ALL, but only
after 1 minute. The heartbeat timeout used by FD_ALL can be manipulated via the timeout
property.
e.g.
<protocol type="FD_ALL"><property name="timeout">60000</property></protocol>
**************************************************************************************************
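The snippet quoted above uses the WildFly jgroups subsystem syntax. As a rough sketch of
where it would sit inside a stack definition in standalone-ha.xml: the stack name, the
socket-binding names, the surrounding protocols, and the 10000 ms value are assumptions for
illustration, and the other protocols of the stack are omitted. Presumably this would affect
only channels created from the WildFly subsystem, not a channel that Ehcache creates from a
separate udp.xml.

```xml
<stack name="udp">
    <transport type="UDP" socket-binding="jgroups-udp"/>
    <protocol type="PING"/>
    <protocol type="FD_SOCK" socket-binding="jgroups-udp-fd"/>
    <!-- Assumed value: suspect a member after 10 s without a heartbeat -->
    <protocol type="FD_ALL">
        <property name="timeout">10000</property>
    </protocol>
    <protocol type="VERIFY_SUSPECT"/>
    <!-- remaining protocols of the stack omitted -->
</stack>
```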
Thanks for the quick response on WFLY-6749.
Based on your suggestion, I took a look at the testing scenarios listed in
"Table 29. Failure detection behavior" at the link you provided:
http://www.jgroups.org/manual/index.html#FailureDetection. Nowhere does it mention that
disabling the network on a node is a valid testing scenario for a WildFly cluster.
Failover works properly for our application on a WebLogic cluster when the network on a
node is disabled. On a WildFly cluster, however, failover does not work when we disable the
network on a node, and this hampers the application's functionality.
As I said earlier, failover on the WildFly cluster does work when we stop a node from the
admin console or press Ctrl + C to stop the services on a node.
We would like confirmation from you that disabling the network on a node is not a
valid failover testing scenario for a WildFly cluster.
We also tested the same failover scenario by shutting down/powering off a VM (node)
in a WildFly cluster. It did not work in the Windows environment, although it worked in the
Linux environment.
Note: we are using a Windows 2012 environment. Here are two links I found:
http://stackoverflow.com/questions/31218710/unable-to-stop-wildfly-8-2-se...
https://developer.jboss.org/thread/238135?tstart=0
Thanks,
Preeta