We are running JBoss Cache 1.4.1.SP8 with JGroups 2.4.1.SP4 on WebSphere 6.1 (IBM JDK, AIX).
This is our config:
| <config>
|   <TCP start_port="58000" sock_conn_timeout="500"
|        send_buf_size="150000" recv_buf_size="80000"
|        loopback="false" use_send_queues="false" />
|   <TCPPING timeout="2000" down_thread="false" up_thread="false"
|            initial_hosts="host1[58000],host2[58000]"
|            port_range="100" num_initial_members="1" />
|   <MERGE2 min_interval="10000" max_interval="20000" />
|   <FD_SOCK />
|   <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
|   <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
|                  max_xmit_size="8192" up_thread="false" down_thread="false" />
|   <UNICAST timeout="600,1200,2400" window_size="100"
|            min_threshold="10" down_thread="false" />
|   <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
|   <FRAG frag_size="8192" down_thread="false" up_thread="false" />
|   <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
|               shun="true" print_local_addr="true" />
|   <pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />
| </config>
|
Our current configuration contains only two nodes.
After about a week of normal operation we start seeing random attempts to allocate 1.6 GB byte[]s. One of our apps attempted 33 of these 1.6 GB allocations, which hung the app. We took a thread dump and a GC dump from one of our frozen applications and noticed the following.
1. We had 33 JGroups send threads hung. 32 of these threads had the following stack trace:
| 3XMTHREADINFO "ConnectionTable.Connection.Sender [10.98.111.61:58001 - 10.98.111.61:58001]"
|   (TID:0x36E07400, sys_thread_t:0x37379850, state:CW, native ID:0x001D20B5) prio=5
| 4XESTACKTRACE at java/lang/Object.wait(Native Method)
| 4XESTACKTRACE at java/lang/Object.wait(Object.java:199(Compiled Code))
| 4XESTACKTRACE at org/jgroups/util/Queue.remove(Queue.java:257(Compiled Code))
| 4XESTACKTRACE at org/jgroups/blocks/BasicConnectionTable$Connection$Sender.run(BasicConnectionTable.java:686(Compiled Code))
| 4XESTACKTRACE at java/lang/Thread.run(Thread.java:810(Compiled Code))
|
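For reference, our reading of those frames: the sender thread is parked in org/jgroups/util/Queue.remove() waiting for a message to send, which would explain the Object.wait() frames and the CW (condition-wait) state. A minimal sketch of that blocking pattern (our simplified illustration, not the actual JGroups source):

```java
import java.util.LinkedList;

public class SenderSketch {
    private final LinkedList<byte[]> queue = new LinkedList<>();

    // Blocks until an element is available, roughly like org.jgroups.util.Queue.remove().
    public synchronized byte[] remove() throws InterruptedException {
        while (queue.isEmpty()) {
            wait();              // shows up as Object.wait() in the thread dump
        }
        return queue.removeFirst();
    }

    public synchronized void add(byte[] msg) {
        queue.addLast(msg);
        notifyAll();             // wakes the parked sender thread
    }

    public static void main(String[] args) throws Exception {
        SenderSketch q = new SenderSketch();
        q.add(new byte[] {1, 2, 3});
        System.out.println(q.remove().length);   // prints 3
    }
}
```

So the 32 CW threads themselves look idle, not busy; the question is why 32 such connections exist at all.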
The other thread was:
| 3XMTHREADINFO "ConnectionTable.Connection.Receiver [10.98.111.61:58000 - 10.98.111.62:52906]"
|   (TID:0x36CEAA00, sys_thread_t:0x36D12D08, state:R, native ID:0x00125049) prio=5
| 4XESTACKTRACE at java/net/SocketInputStream.socketRead0(Native Method)
| 4XESTACKTRACE at java/net/SocketInputStream.read(SocketInputStream.java:155(Compiled Code))
| 4XESTACKTRACE at java/io/BufferedInputStream.fill(BufferedInputStream.java:229(Compiled Code))
| 4XESTACKTRACE at java/io/BufferedInputStream.read1(BufferedInputStream.java:267(Compiled Code))
| 4XESTACKTRACE at java/io/BufferedInputStream.read(BufferedInputStream.java:324(Compiled Code))
| 4XESTACKTRACE at java/io/DataInputStream.readFully(DataInputStream.java:202(Compiled Code))
| 4XESTACKTRACE at java/io/DataInputStream.readInt(DataInputStream.java:380(Compiled Code))
| 4XESTACKTRACE at org/jgroups/blocks/BasicConnectionTable$Connection.run(BasicConnectionTable.java:575)
| 4XESTACKTRACE at java/lang/Thread.run(Thread.java:810)
|
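One thing we noticed while reading the receiver frames: the thread is inside DataInputStream.readInt(), and in a length-prefixed protocol a read like that is typically followed by allocating a buffer of that length. If something other than a well-behaved JGroups peer writes to one of these connections, four garbage bytes interpreted as a length could produce exactly the kind of huge allocation we saw. A hypothetical sketch of that failure mode (our guess, not the actual JGroups code):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LengthPrefixSketch {
    // Reads a length-prefixed frame; a bogus length leads straight to a huge byte[] allocation.
    static byte[] readFrame(DataInputStream in) throws IOException {
        int len = in.readInt();      // 4 arbitrary bytes become the "length"
        byte[] buf = new byte[len];  // e.g. 0x60000000 -> ~1.6 GB allocation attempt
        in.readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // Four bytes of garbage: 0x60 0x00 0x00 0x00 == 1610612736 (~1.6 GB)
        byte[] garbage = {0x60, 0x00, 0x00, 0x00};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(garbage));
        System.out.println(in.readInt());   // prints 1610612736
    }
}
```

We have no proof this is what happened; it just fits the readInt frame and the ~1.6 GB size suspiciously well.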
The 32 sender threads are associated with the sequential ports 58001-58033, so it appears JGroups is scanning those ports to determine whether there are any new nodes in the cluster.
We have not been able to reproduce this problem when using MPING instead of TCPPING for member discovery; however, we are not allowed to use multicast in our production environment.
We are going to try changing our port_range to a smaller number to see if that helps.
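Concretely, we would change only the TCPPING line, roughly like this (untested; the port_range value is just a placeholder we picked, everything else unchanged):

```xml
<TCPPING timeout="2000" down_thread="false" up_thread="false"
         initial_hosts="host1[58000],host2[58000]"
         port_range="5" num_initial_members="1" />
```

Since both members bind to their start_port of 58000, our understanding is that a large port_range only widens the scan without helping discovery in a fixed two-node setup.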
Does anyone on the board have any other ideas?
Mike