[
https://jira.jboss.org/jira/browse/JGRP-1003?page=com.atlassian.jira.plug...
]
Victor N commented on JGRP-1003:
--------------------------------
The problem has occurred again on one of our servers. I do not see anything specific in
the log files; I only see that all my threads are blocked trying to send via JGroups, and
JGroups itself cannot complete its join/merge procedure (because of the queue problem
mentioned earlier). There are no exceptions in my log file around the time the problem
appeared. It just hangs waiting on the queue.
Bela, do you have any ideas? Maybe we could add some try/catch(Throwable) blocks and log
these events in all critical places? Maybe some threads are not running?
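Since nothing shows up in the logs, one option is to have the application log which threads are waiting for a lock and who holds it. The following is a minimal sketch using the standard java.lang.management API; the class name LockWaitReport and the method waitingThreads() are hypothetical and not part of JGroups.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class LockWaitReport {
    /** Returns one line per thread that is currently waiting for a lock or monitor. */
    public static List<String> waitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Ask for monitor/synchronizer details only if this JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : infos) {
            if (info == null || info.getLockName() == null) {
                continue; // thread is not waiting for any lock
            }
            String owner = info.getLockOwnerName(); // may be null if the owner is unknown
            lines.add("\"" + info.getThreadName() + "\" waits for " + info.getLockName()
                    + " held by " + (owner != null ? owner : "<unknown>"));
        }
        return lines;
    }
}
```

Calling LockWaitReport.waitingThreads() periodically from a timer that does not go through JGroups, and writing the result to the log, would record a hang like this one even when no exception is thrown.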
deadlock in TP.send - socket write hangs inside a Lock, preventing
other threads to do TP.send()
------------------------------------------------------------------------------------------------
Key: JGRP-1003
URL: https://jira.jboss.org/jira/browse/JGRP-1003
Project: JGroups
Issue Type: Bug
Affects Versions: 2.7
Environment: Linux (Debian), kernel 2.6.x
Reporter: Victor N
Assignee: Bela Ban
Fix For: 2.10
Attachments: j1__.txt, jgroups_s2.txt, jgroups_stacktrace_09feb2010.txt, stacktrace.txt
I am using JGroups 2.7.0 GA with the typical TCP-based protocol stack (taken from
tcp.xml).
Sometimes a socket write operation hangs inside TP.send() (this is not unusual with
blocking I/O). After that, JGroups stops working entirely because most threads are
blocked inside TP.send(), waiting for that first thread to release
"out_stream_lock". Below is a fragment from my stack trace.
1) The thread that hangs in the socket write indefinitely (maybe due to a broken socket or
something else); this thread owns "out_stream_lock" in the TP class:
"Timer-1,name,IP:port" daemon prio=10 tid=0x084e5c00 nid=0x15f0 runnable [0x211c1000..0x211c2040]
java.lang.Thread.State: RUNNABLE
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
- locked <0x43427300> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.jgroups.blocks.TCPConnectionMap$TCPConnection.doSend(TCPConnectionMap.java:472)
at org.jgroups.blocks.TCPConnectionMap$TCPConnection._send(TCPConnectionMap.java:453)
at org.jgroups.blocks.TCPConnectionMap$TCPConnection.send(TCPConnectionMap.java:436)
at org.jgroups.blocks.TCPConnectionMap$TCPConnection.access$100(TCPConnectionMap.java:341)
at org.jgroups.blocks.TCPConnectionMap.send(TCPConnectionMap.java:137)
at org.jgroups.protocols.TCP.send(TCP.java:53)
at org.jgroups.protocols.BasicTCP.sendToSingleMember(BasicTCP.java:141)
at org.jgroups.protocols.TP.doSend(TP.java:1105)
at org.jgroups.protocols.TP.send(TP.java:1088)
at org.jgroups.protocols.TP.down(TP.java:907)
at org.jgroups.protocols.Discovery.down(Discovery.java:363)
at org.jgroups.protocols.MERGE2.down(MERGE2.java:169)
at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:333)
at org.jgroups.protocols.FD.down(FD.java:327)
at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:72)
at org.jgroups.protocols.BARRIER.down(BARRIER.java:96)
at org.jgroups.protocols.pbcast.NAKACK.retransmit(NAKACK.java:1530)
at org.jgroups.protocols.pbcast.NAKACK.retransmit(NAKACK.java:1476)
at org.jgroups.stack.Retransmitter$Task.run(Retransmitter.java:207)
at org.jgroups.util.TimeScheduler$TaskWrapper.run(TimeScheduler.java:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Locked ownable synchronizers:
- <0x2b5522e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
- <0x2b5550f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
- <0x434273f0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
2) There are many threads like this one, blocked on the "out_stream_lock.lock()"
call inside the TP class:
"Connection.Receiver [ip:port - ip:port],name,ip:port" prio=10 tid=0x020dc000 nid=0x1fab waiting on condition [0x1365f000..0x1365ffc0]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x2b5550f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
at org.jgroups.protocols.TP.send(TP.java:1082)
at org.jgroups.protocols.TP.down(TP.java:907)
at org.jgroups.protocols.Discovery.up(Discovery.java:274)
at org.jgroups.protocols.TP.passMessageUp(TP.java:995)
at org.jgroups.protocols.TP.access$100(TP.java:52)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1451)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1427)
at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:1738)
at org.jgroups.util.ShutdownRejectedExecutionHandler.rejectedExecution(ShutdownRejectedExecutionHandler.java:34)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at org.jgroups.protocols.TP.dispatchToThreadPool(TP.java:1061)
at org.jgroups.protocols.TP.receive(TP.java:1038)
at org.jgroups.protocols.BasicTCP.receive(BasicTCP.java:180)
at org.jgroups.blocks.TCPConnectionMap$TCPConnection$ConnectionPeerReceiver.run(TCPConnectionMap.java:553)
at java.lang.Thread.run(Thread.java:619)
Locked ownable synchronizers:
- None
(Since all threads in the JGroups thread pool are blocked, you can see
"rejectedExecution" in the last stack trace.)
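The last trace shows TP$IncomingPacket.run being invoked from CallerRunsPolicy.rejectedExecution on the Connection.Receiver thread itself. The standalone sketch below illustrates that JDK behaviour: when a pool is saturated, CallerRunsPolicy runs the rejected task on the submitting thread. It is not JGroups code; the class name CallerRunsSketch, the method name, and the pool of size 1 are assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsSketch {
    /** Returns the name of the thread that ran a task submitted to a saturated pool. */
    public static String runWhenSaturated() throws InterruptedException {
        // One worker, no queue capacity: a second task cannot be accepted while the first runs.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch workerBusy = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Occupy the only worker, standing in for a thread stuck waiting on a lock.
        pool.execute(() -> {
            workerBusy.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        workerBusy.await();

        // The pool is full, so this task is rejected and run by the caller instead.
        String[] ranOn = new String[1];
        pool.execute(() -> ranOn[0] = Thread.currentThread().getName());

        release.countDown();
        pool.shutdown();
        return ranOn[0];
    }
}
```

In the trace above, this means the receiver thread ends up doing the send itself and then parks on the same lock as the pool threads.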
Maybe we could implement some solution for TCP (blocking I/O)? Would it be possible to
release the lock before the socket write?
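To make the suggestion concrete, here is a minimal sketch of the idea, assuming the lock only needs to protect a shared marshalling buffer: the message is serialized while holding the lock, copied, and the blocking write happens after the lock is released. This is not the actual TP code; the class UnlockedSendSketch, its fields, and the String payload are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.locks.ReentrantLock;

public class UnlockedSendSketch {
    // Stand-ins for a shared marshalling buffer and the lock guarding it (assumed design).
    private final ReentrantLock outStreamLock = new ReentrantLock();
    private final ByteArrayOutputStream sharedBuffer = new ByteArrayOutputStream(1024);

    /** Marshals under the lock, then writes to the possibly blocking stream without it. */
    public void send(String payload, OutputStream destination) throws IOException {
        byte[] frame;
        outStreamLock.lock();
        try {
            sharedBuffer.reset();
            DataOutputStream out = new DataOutputStream(sharedBuffer);
            out.writeUTF(payload);
            out.flush();
            frame = sharedBuffer.toByteArray(); // private copy, safe to use after unlock
        } finally {
            outStreamLock.unlock();
        }
        // If this write hangs, only the calling thread is stuck; the lock is already free.
        destination.write(frame);
        destination.flush();
    }

    /** Exposed only so a caller can check whether the lock is held at a given moment. */
    public boolean isLockHeld() {
        return outStreamLock.isLocked();
    }
}
```

With this shape, a write that hangs on one broken socket no longer holds up sends to other members. Writes to the same destination from several threads would still need to be serialized per connection so that frames do not interleave.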