[JBoss JIRA] (JGRP-1529) RELAY2: Intra-site view not being accepted upon inter-site installation failure
by Radim Vansa (JIRA)
Radim Vansa created JGRP-1529:
---------------------------------
Summary: RELAY2: Intra-site view not being accepted upon inter-site installation failure
Key: JGRP-1529
URL: https://issues.jboss.org/browse/JGRP-1529
Project: JGroups
Issue Type: Bug
Reporter: Radim Vansa
Assignee: Bela Ban
When a node becomes coordinator, it sends the VIEW_CHANGE event up the stack, which should result in a Receiver.viewAccepted(...) call. However, when RELAY2 is in the stack and the coordinator cannot be reached, RELAY2 blocks the thread (sending discovery pings), and the viewAccepted() callback is therefore postponed.
In my opinion the inter-site stack should be created and handled in a different thread, as sketched below.
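A minimal sketch of this suggestion (hypothetical class and method names, not the actual RELAY2 code): pass the view up immediately and connect the inter-site bridge stack on a separate executor thread, so discovery pings can no longer delay viewAccepted().
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not the actual RELAY2 implementation: the view is
// passed up right away; the potentially blocking inter-site join runs on
// its own thread.
public class AsyncRelaySketch {
    private final ExecutorService relayExecutor = Executors.newSingleThreadExecutor();

    public void onViewChange(Object newView) {
        passUp(newView);                                 // viewAccepted() fires without delay
        relayExecutor.execute(this::connectBridgeStack); // discovery pings block only this worker
    }

    private void passUp(Object view)  { /* deliver VIEW_CHANGE up the stack */ }
    private void connectBridgeStack() { /* potentially blocking inter-site join */ }
}
{code}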
Context:
In my case, the coordinator of both the local cluster and the global (inter-site) cluster was killed. FD_SOCK on the inter-site stack somehow failed to notice that the coordinator had crashed (more investigation required), and the nodes in the global cluster still reported the crashed node as the global coordinator.
Therefore, the new coordinator of the local cluster failed to join the global cluster (it obviously got no response from the dead global coordinator).
The restarted node joined the local cluster and then tried to join the local Infinispan cache with a new local view ID. However, the coordinator did not notice that it had already installed a new JGroups view (its Infinispan viewAccepted handler was never called), so it did not respond to the cache join request: it kept waiting for the new JGroups view, which JGroups had in fact already installed but had never reported to Infinispan via viewAccepted().
[JBoss JIRA] (JGRP-1528) RELAY2: batch messages forwarded to remote sites
by Bela Ban (JIRA)
Bela Ban created JGRP-1528:
------------------------------
Summary: RELAY2: batch messages forwarded to remote sites
Key: JGRP-1528
URL: https://issues.jboss.org/browse/JGRP-1528
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.3
If a site is down, then forwarding a message to it will block the thread (actually: wait) for max_forward_attempts * forward_sleep milliseconds.
If we have many incoming messages (from the local site) to be forwarded to the remote site, then having every such thread sleep for (say) 10 seconds will grow the thread pool.
We should therefore add *batching* to RELAY2; messages to be forwarded are queued until a certain time has elapsed or the accumulated total size of all queued messages exceeds a certain threshold.
Incoming messages to be forwarded would be added to the queue (allowing the sending thread to be returned to the thread pool). A separate thread (or task) would monitor the queue and do the forwarding, and it would also notice that a site is down and send unreachable messages back to the original senders.
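A minimal sketch of such a batching queue (illustrative names; forwardBatch() and the thresholds are assumptions, not JGroups API): sender threads enqueue and return immediately, and a flusher task forwards the accumulated batch when either the max delay elapses or the size threshold is exceeded.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed batching, not actual RELAY2 code.
public class ForwardQueueSketch {
    private static final long MAX_BATCH_BYTES = 64 * 1024; // assumed threshold
    private final List<byte[]> queue = new ArrayList<>();
    private long queuedBytes;
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

    public ForwardQueueSketch(long maxDelayMs) {
        // Time-based flush: forward whatever has accumulated every maxDelayMs
        flusher.scheduleAtFixedRate(this::flush, maxDelayMs, maxDelayMs, TimeUnit.MILLISECONDS);
    }

    // Called by incoming (sender) threads: enqueue and return immediately,
    // so the thread goes back to the pool instead of sleeping.
    public synchronized void forward(byte[] msg) {
        queue.add(msg);
        queuedBytes += msg.length;
        if (queuedBytes >= MAX_BATCH_BYTES) // size-based flush
            flush();
    }

    private synchronized void flush() {
        if (queue.isEmpty())
            return;
        List<byte[]> batch = new ArrayList<>(queue);
        queue.clear();
        queuedBytes = 0;
        forwardBatch(batch); // one bridge call; on failure, send unreachable
                             // notifications back to the original senders
    }

    private void forwardBatch(List<byte[]> batch) { /* relay to the remote site */ }
}
{code}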
[JBoss JIRA] (JGRP-1532) Heartbeats not received in NIC teaming configuration after NIC switch
by PASCAL BROUWET (JIRA)
PASCAL BROUWET created JGRP-1532:
------------------------------------
Summary: Heartbeats not received in NIC teaming configuration after NIC switch
Key: JGRP-1532
URL: https://issues.jboss.org/browse/JGRP-1532
Project: JGroups
Issue Type: Bug
Affects Versions: 2.12.2
Environment: Windows Server Standard 2008 SP2.
Two Broadcom BCM5709S NetXtreme II (dual-port) network cards with NIC teaming software: BACS3 version 12.2.9.0 (Broadcom Advanced Control Suite 3)
Reporter: PASCAL BROUWET
Assignee: Bela Ban
We have no problems in a single-card configuration without NIC teaming.
But when all machines use dual cards with NIC teaming activated, we get "haven't received a heartbeat" failures.
With the Wireshark analyzer, we observed that as long as the heartbeat multicast packets stay on the same card there is no problem, but when the heartbeat multicast packets switch to the second card, failure detections appear in the log files.
For example: the first heartbeat failures appear in the logs at 03:41:25 and last until 05:03:20.
2012-10-23 03:41:25.234 [FINE] - FD_ALL: haven't received a heartbeat from ctc809091084-27510(5ae571864ef0) for 11061 ms, adding it to suspect list
2012-10-23 03:41:25.234 [FINE] - FD_ALL: suspecting [ctc809091084-27510(5ae571864ef0), ctc804291084-11401(de9a6a421087)]
2012-10-23 03:41:28.245 [FINE] - FD_ALL: haven't received a heartbeat from ctc809091084-27510(5ae571864ef0) for 14072 ms, adding it to suspect list
2012-10-23 03:41:28.245 [FINE] - FD_ALL: haven't received a heartbeat from ctc804291084-11401(de9a6a421087) for 12044 ms, adding it to suspect list
2012-10-23 03:41:28.245 [FINE] - FD_ALL: suspecting [ctc809091084-27510(5ae571864ef0), ctc804291084-11401(de9a6a421087)]
2012-10-23 03:41:31.255 [FINE] - FD_ALL: haven't received a heartbeat from ctc809091084-27510(5ae571864ef0) for 17082 ms, adding it to suspect list
2012-10-23 03:41:31.255 [FINE] - FD_ALL: haven't received a heartbeat from ctc804291084-11401(de9a6a421087) for 15054 ms, adding it to suspect list
2012-10-23 03:41:31.255 [FINE] - FD_ALL: suspecting [ctc809091084-27510(5ae571864ef0), ctc804291084-11401(de9a6a421087)]
2012-10-23 03:41:34.266 [FINE] - FD_ALL: haven't received a heartbeat from ctc809091084-27510(5ae571864ef0) for 20093 ms, adding it to suspect list
2012-10-23 03:41:34.266 [FINE] - FD_ALL: haven't received a heartbeat from ctc804291084-11401(de9a6a421087) for 18065 ms, adding it to suspect list
2012-10-23 03:41:34.266 [FINE] - FD_ALL: suspecting [ctc809091084-27510(5ae571864ef0), ctc804291084-11401(de9a6a421087)]
2012-10-23 03:41:37.277 [FINE] - FD_ALL: haven't received a heartbeat from ctc809091084-27510(5ae571864ef0) for 23104 ms, adding it to suspect list
2012-10-23 03:41:37.277 [FINE] - FD_ALL: haven't received a heartbeat from ctc804291084-11401(de9a6a421087) for 21076 ms, adding it to suspect list
2012-10-23 03:41:37.277 [FINE] - FD_ALL: suspecting [ctc809091084-27510(5ae571864ef0), ctc804291084-11401(de9a6a421087)]
2012-10-23 03:41:40.288 [FINE] - FD_ALL: haven't received a heartbeat from ctc809091084-27510(5ae571864ef0) for 26115 ms, adding it to suspect list
2012-10-23 03:41:40.288 [FINE] - FD_ALL: haven't received a heartbeat from ctc804291084-11401(de9a6a421087) for 24087 ms, adding it to suspect list
...
The logs of card 1 during the period:
----------------------------------------------------
2012-10-23 03:41:15.563 MULTICAST id=321 src=/10.120.180.64:45588 dest=/228.8.8.8:45588 (47 bytes)
Msg1 src=cc74a22f-6e18-1b7a-5521-3abebdd47ab6(3ba17876e725) dest=ALL
flags=[OOB]
headers=[
HeartbeatHeader:heartbeat
]
----------------------------------------------------
2012-10-23 03:41:15.996 MULTICAST id=7481 src=/10.120.120.64:45588 dest=/228.8.8.8:45588 (47 bytes)
Msg1 src=17da3e81-158b-4440-50c7-412aebce41e2(de9a6a421087) dest=ALL
flags=[OOB]
headers=[
HeartbeatHeader:heartbeat
]
----------------------------------------------------
2012-10-23 04:25:49.221 MULTICAST id=2868 src=/10.120.180.64:45588 dest=/228.8.8.8:45588 (47 bytes)
Msg1 src=cc74a22f-6e18-1b7a-5521-3abebdd47ab6(3ba17876e725) dest=ALL
flags=[OOB]
headers=[
HeartbeatHeader:heartbeat
]
The card was in standby between 03:41:15 and 04:25:49.
The logs of card 0 during the period:
----------------------------------------------------
2012-10-23 03:41:25.029 MULTICAST id=74b1 src=/10.120.120.64:45588 dest=/228.8.8.8:45588 (47 bytes)
Msg1 src=17da3e81-158b-4440-50c7-412aebce41e2(de9a6a421087) dest=ALL
flags=[OOB]
headers=[
HeartbeatHeader:heartbeat
]
----------------------------------------------------
2012-10-23 03:41:25.961 MULTICAST id=5adb src=/10.120.220.64:45588 dest=/228.8.8.8:45588 (47 bytes)
Msg1 src=f1e9fdac-6d36-d321-6f9d-ec0cbf771608(5ae571864ef0) dest=ALL
flags=[OOB]
headers=[
HeartbeatHeader:heartbeat
]
----------------------------------------------------
2012-10-23 03:41:26.874 MULTICAST id=5ae0 src=/10.120.220.64:45588 dest=/228.8.8.8:45588 (91 bytes)
Msg1 src=f1e9fdac-6d36-d321-6f9d-ec0cbf771608(5ae571864ef0) dest=ALL
flags=[OOB]
headers=[
PingHeader:[PING: type=GET_MBRS_REQ, cluster=REPL, view_id=[f1e9fdac-6d36-d321-6f9d-ec0cbf771608(5ae571864ef0)|2]]
]
----------------------------------------------------
2012-10-23 03:41:27.607 MULTICAST id=362 src=/10.120.180.64:45588 dest=/228.8.8.8:45588 (47 bytes)
Msg1 src=cc74a22f-6e18-1b7a-5521-3abebdd47ab6(3ba17876e725) dest=ALL
flags=[OOB]
headers=[
HeartbeatHeader:heartbeat
]
----------------------------------------------------
2012-10-23 03:41:28.040 MULTICAST id=74bf src=/10.120.120.64:45588 dest=/228.8.8.8:45588 (47 bytes)
Msg1 src=17da3e81-158b-4440-50c7-412aebce41e2(de9a6a421087) dest=ALL
flags=[OOB]
headers=[
HeartbeatHeader:heartbeat
]
----------------------------------------------------
2012-10-23 03:41:28.962 MULTICAST id=5ae8 src=/10.120.220.64:45588 dest=/228.8.8.8:45588 (47 bytes)
Msg1 src=f1e9fdac-6d36-d321-6f9d-ec0cbf771608(5ae571864ef0) dest=ALL
flags=[OOB]
headers=[
HeartbeatHeader:heartbeat
]
----------------------------------------------------
2012-10-23 03:41:30.617 MULTICAST id=36f src=/10.120.180.64:45588 dest=/228.8.8.8:45588 (47 bytes)
Msg1 src=cc74a22f-6e18-1b7a-5521-3abebdd47ab6(3ba17876e725) dest=ALL
flags=[OOB]
headers=[
HeartbeatHeader:heartbeat
]
etc. ... heartbeats received every 3 seconds until 06:00
The two cards are configured with the same IP address (10.120.180.64), as is the virtual NIC (10.120.180.64).
We tested with Mcast.exe on this configuration without problems.
Everything works as if JGroups (or Java) were "plugged into" card 1 only.
JGroups was configured with these parameters:
<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="urn:org:jgroups">
<UDP bind_addr="10.120.180.64" bind_interface="eth10" bind_port="7800"
     diagnostics_addr="224.0.75.75" discard_incompatible_packets="true"
     enable_bundling="true" enable_diagnostics="true" ip_ttl="10" loopback="true"
     max_bundle_size="64K" max_bundle_timeout="30"
     mcast_group_addr="228.8.8.8" mcast_port="45588"
     mcast_recv_buf_size="25M" mcast_send_buf_size="640K"
     oob_thread_pool.enabled="true" oob_thread_pool.keep_alive_time="5000"
     oob_thread_pool.max_threads="8" oob_thread_pool.min_threads="1"
     oob_thread_pool.queue_enabled="false" oob_thread_pool.queue_max_size="100"
     oob_thread_pool.rejection_policy="Run" singleton_name="UDP"
     thread_naming_pattern="pl" thread_pool.enabled="true"
     thread_pool.keep_alive_time="5000" thread_pool.max_threads="8"
     thread_pool.min_threads="2" thread_pool.queue_enabled="false"
     thread_pool.queue_max_size="100" thread_pool.rejection_policy="Run"
     tos="8" ucast_recv_buf_size="20M" ucast_send_buf_size="640K"/>
<PING num_initial_members="3" timeout="2000"/>
<MERGE2 max_interval="30000" min_interval="10000"/>
<FD_SOCK bind_addr="10.120.180.64" bind_interface="eth10"/>
<FD_ALL/>
<VERIFY_SUSPECT bind_addr="10.120.180.64" bind_interface="eth10" timeout="1500"/>
<pbcast.NAKACK discard_delivered_msgs="false" exponential_backoff="150" gc_lag="0" retransmit_timeout="300,600,1200" use_mcast_xmit="true" use_stats_for_retransmission="false"/>
<UNICAST timeout="300,600,1200"/>
<pbcast.STABLE desired_avg_gossip="50000" max_bytes="4M" stability_delay="1000"/>
<pbcast.GMS join_timeout="5000" print_local_addr="true" view_bundling="true"/>
<UFC max_credits="2M" min_threshold="0.4"/>
<MFC max_credits="2M" min_threshold="0.4"/>
<FRAG2 frag_size="60K"/>
<pbcast.STREAMING_STATE_TRANSFER bind_addr="10.120.180.64" bind_interface="eth10" bind_port="7810" socket_buffer_size="16384" use_default_transport="false"/>
</config>
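One possible explanation (my assumption, not confirmed here): a Java multicast socket joined on a specific interface (effectively what bind_addr/bind_interface select) only receives group traffic that arrives on that interface, so packets that fail over to the second physical card never reach the socket. A minimal sketch of such an interface-pinned join:
{code}
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;

// Minimal sketch (assumption, not the actual JGroups transport code):
// joining a multicast group on one specific interface. Group traffic
// arriving on another NIC is not delivered to this socket.
public class McastJoinSketch {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("228.8.8.8");
        NetworkInterface nic =
            NetworkInterface.getByInetAddress(InetAddress.getByName("10.120.180.64"));
        try (MulticastSocket sock = new MulticastSocket(45588)) {
            sock.joinGroup(new InetSocketAddress(group, 45588), nic); // pinned to this NIC
            byte[] buf = new byte[1024];
            DatagramPacket p = new DatagramPacket(buf, buf.length);
            sock.receive(p); // only packets arriving via 'nic' show up here
        }
    }
}
{code}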
Have you ever heard of NIC teaming problems?
Thanks.
Pascal BROUWET
[JBoss JIRA] (JGRP-1531) Message Headers Not Encrypted
by Edward Sutter (JIRA)
Edward Sutter created JGRP-1531:
-----------------------------------
Summary: Message Headers Not Encrypted
Key: JGRP-1531
URL: https://issues.jboss.org/browse/JGRP-1531
Project: JGroups
Issue Type: Bug
Affects Versions: 3.1
Reporter: Edward Sutter
Assignee: Bela Ban
I am running 3.1.0-FINAL and have observed that message headers are not
being encrypted even though the "encrypt_entire_message" option is set to
true. After stepping through the code I suspect that the headers are being
appended to the output stream that is used at the TCP layer.
Specifically...
org.jgroups.protocols.ENCRYPT.sendDown(Event evt) will encrypt the message (including headers) and send it down the stack via this block...
private void sendDown(Event evt) throws Exception {
    ...
    EncryptHeader hdr=new EncryptHeader(EncryptHeader.ENCRYPT, getSymVersion());
    hdr.encrypt_entire_msg=this.encrypt_entire_message;
    if(encrypt_entire_message) {
        byte[] serialized_msg=Util.streamableToByteBuffer(msg);
        byte[] encrypted_msg=encryptMessage(symEncodingCipher,
                                            serialized_msg,
                                            0,
                                            serialized_msg.length);
        Message tmp=msg.copy(false); // we need to preserve headers which may already be present
        tmp.setBuffer(encrypted_msg);
        if(tmp.getSrc() == null)
            tmp.setSrc(local_addr);
        tmp.putHeader(this.id, hdr);
        passItDown(new Event(Event.MSG, tmp));
        return;
    }
    ...
}
Note that the unencrypted headers are preserved even though the encrypted headers are included in "encrypted_msg".
Later on, org.jgroups.protocols.TP.writeMessage(Message msg, DataOutputStream dos, boolean multicast) calls Message.writeTo(DataOutput) to build the message that is sent. The problem seems to be that Message.writeTo(DataOutput) includes the encrypted byte[] (expected) but also the unencrypted headers that were preserved.
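A quick way to confirm the leak (a hypothetical test sketch, not from the original report): serialize a message the way the transport does and dump the printable bytes; with encrypt_entire_message=true the payload comes out as ciphertext, but header contents still read in the clear.
{code}
import org.jgroups.Message;
import org.jgroups.util.Util;

// Hypothetical verification sketch, illustrative only.
public class HeaderLeakCheck {
    public static void main(String[] args) throws Exception {
        Message msg = new Message(null, null, "payload".getBytes());
        // ... pass msg through the ENCRYPT layer here, as sendDown() does ...
        byte[] wire = Util.streamableToByteBuffer(msg); // what TP ultimately writes
        StringBuilder sb = new StringBuilder();
        for (byte b : wire) {
            char c = (char) (b & 0xff);
            sb.append(c >= 0x20 && c < 0x7f ? c : '.'); // printable bytes as-is
        }
        System.out.println(sb); // header names/fields appear in cleartext
    }
}
{code}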
[JBoss JIRA] (AS7-2516) Arquillian throws OutOfMemoryError: PermGen space after deploying several big wars on JBoss 7
by Geoffrey De Smet (Created) (JIRA)
Arquillian throws OutOfMemoryError: PermGen space after deploying several big wars on JBoss 7
---------------------------------------------------------------------------------------------
Key: AS7-2516
URL: https://issues.jboss.org/browse/AS7-2516
Project: Application Server 7
Issue Type: Bug
Components: Test Suite
Affects Versions: 7.0.2.Final
Reporter: Geoffrey De Smet
Assignee: Andrew Rubinger
Priority: Minor
Here's the proof (that build is saved forever):
https://hudson.qa.jboss.com/hudson/view/Drools%20jBPM/job/guvnor/1161/con...
Our WAR is apparently so big that the third deployment fails with a PermGen OutOfMemoryError.
A few tests later it even hangs forever (until it times out).
{code}
10:41:56,393 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/dc25ad71-9358-412e-8a90-c4471cb3422f].[ArquillianServletRunner]] (http--127.0.0.1-8080-2) Servlet.service() for servlet ArquillianServletRunner threw exception: java.lang.OutOfMemoryError: PermGen space
{code}
How to reproduce?
=================
git clone git@github.com:droolsjbpm/guvnor.git guvnor
cd guvnor
mvn clean install -DskipTests
cd guvnor-webapp
gedit src/test/resources/arquillian.xml
// remove the maxTestClassesBeforeRestart property
mvn test
Using Arquillian 1.0.0.CR5 and JBoss AS 7.0.2.
Workaround: use maxTestClassesBeforeRestart = 1 (see the sketch below).
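A sketch of the workaround in arquillian.xml (the container qualifier and surrounding elements are assumptions; only the maxTestClassesBeforeRestart property comes from this report):
{code}
<arquillian xmlns="http://jboss.org/schema/arquillian">
    <!-- Assumed layout; restarting the server after every test class
         keeps PermGen from filling up across deployments. -->
    <container qualifier="jboss" default="true">
        <configuration>
            <property name="maxTestClassesBeforeRestart">1</property>
        </configuration>
    </container>
</arquillian>
{code}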
[JBoss JIRA] Created: (JBAS-6326) InterruptedException on shutdown
by Jeff Yu (JIRA)
InterruptedException on shutdown
--------------------------------
Key: JBAS-6326
URL: https://jira.jboss.org/jira/browse/JBAS-6326
Project: JBoss Application Server
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: JCA service
Affects Versions: JBossAS-4.2.2.GA
Reporter: Jeff Yu
Assignee: Jeff Zhang
Priority: Minor
I see the following exception on shutdown with the default, all, and production servers (using all's log4j configuration). I'm not sure users need to see it; it should probably be changed to DEBUG level if it's really not relevant. A sketch of such a change follows the stack trace.
09:07:59,647 WARN [NewMsgsWorker] Interrupted waiting for new msg check
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:474)
at EDU.oswego.cs.dl.util.concurrent.Semaphore.acquire(Semaphore.java:108)
at EDU.oswego.cs.dl.util.concurrent.SemaphoreControlledChannel.take(SemaphoreControlledChannel.java:131)
at org.jboss.resource.adapter.mail.inflow.NewMsgsWorker.run(NewMsgsWorker.java:75)
at org.jboss.resource.work.WorkWrapper.execute(WorkWrapper.java:204)
at org.jboss.util.threadpool.BasicTaskWrapper.run(BasicTaskWrapper.java:275)
at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java:756)
at java.lang.Thread.run(Thread.java:595)
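A minimal sketch of the suggested change (illustrative names, not the actual org.jboss.resource.adapter.mail.inflow.NewMsgsWorker code): demote the message to DEBUG when the interrupt happens during an orderly shutdown.
{code}
// Hypothetical sketch; the real worker blocks on a semaphore-controlled channel.
public class NewMsgsWorkerSketch implements Runnable {
    private volatile boolean stopped; // set by the shutdown path

    public void shutdown() {
        stopped = true;
    }

    public void run() {
        try {
            waitForNewMsgCheck();
        } catch (InterruptedException e) {
            // DEBUG during orderly shutdown, WARN otherwise
            if (stopped)
                System.out.println("DEBUG: interrupted waiting for new msg check (shutting down)");
            else
                System.out.println("WARN: interrupted waiting for new msg check");
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }

    private void waitForNewMsgCheck() throws InterruptedException {
        Thread.sleep(Long.MAX_VALUE); // stand-in for the real blocking wait
    }
}
{code}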
[JBoss JIRA] (JBREM-1314) Port 3873 was closed suddenly
by Richard Zhang (JIRA)
Richard Zhang created JBREM-1314:
------------------------------------
Summary: Port 3873 was closed suddenly
Key: JBREM-1314
URL: https://issues.jboss.org/browse/JBREM-1314
Project: JBoss Remoting
Issue Type: Bug
Security Level: Public (Everyone can see)
Reporter: Richard Zhang
While JBoss 5.1.0 GA was running, the following exception appeared in server.log:
2012-05-30 07:43:43,176 INFO (main)[org.jboss.bootstrap.microcontainer.ServerImpl] JBoss (Microcontainer) [5.1.0.GA (build: SVNTag=JBoss_5_1_0_GA date=200905221053)] Started in 38s:998ms
2012-06-26 21:00:08,515 ERROR (WorkerThread#0[2001:1890:1001:240c:0:0:0:4:50783])[org.jboss.remoting.transport.socket.ServerThread] WorkerThread#0[2001:1890:1001:240c:0:0:0:4:50783] exception occurred during first invocation
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor422.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.jboss.remoting.transport.socket.ServerThread.createServerSocketWrapper(ServerThread.java:909)
at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:491)
at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:232)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2281)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2750)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:780)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:280)
at org.jboss.remoting.loading.ObjectInputStreamWithClassLoader.<init>(ObjectInputStreamWithClassLoader.java:100)
at org.jboss.remoting.serialization.impl.java.JavaSerializationManager.createInput(JavaSerializationManager.java:54)
at org.jboss.remoting.marshal.serializable.SerializableUnMarshaller.getMarshallingStream(SerializableUnMarshaller.java:75)
at org.jboss.remoting.marshal.serializable.SerializableUnMarshaller.getMarshallingStream(SerializableUnMarshaller.java:58)
at org.jboss.remoting.transport.socket.ClientSocketWrapper.createInputStream(ClientSocketWrapper.java:179)
at org.jboss.remoting.transport.socket.ClientSocketWrapper.createStreams(ClientSocketWrapper.java:162)
at org.jboss.remoting.transport.socket.ClientSocketWrapper.<init>(ClientSocketWrapper.java:66)
at org.jboss.remoting.transport.socket.ServerSocketWrapper.<init>(ServerSocketWrapper.java:46)
... 6 more
We checked port 3873 using the "netstat -antp" command; we don't know when the port was closed.
Could someone tell me how to prevent this, or what the root cause is? Thanks in advance.