[jboss-jira] [JBoss JIRA] Commented: (JGRP-1003) deadlock in TP.send - socket write hangs inside a Lock, preventing other threads to do TP.send()
Victor N (JIRA)
jira-events at lists.jboss.org
Fri Feb 19 04:35:17 EST 2010
[ https://jira.jboss.org/jira/browse/JGRP-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12515353#action_12515353 ]
Victor N commented on JGRP-1003:
--------------------------------
The problem is reproduced again at one of our servers. I do not see anything specific in log files - I just see that all my threads are blocked trying to send via JGroups. And JGroups itself can not process its join/merge procedure (because of the queue problem mentioned earlier). No exceptions in my log file around the time the problem has appeared. It just hangs waiting on the queue.
Bela, do you have any ideas? Maybe we could add some try/catch(Throwable) blocks and log these events in all critical places? Maybe some threads are not working?
> deadlock in TP.send - socket write hangs inside a Lock, preventing other threads to do TP.send()
> ------------------------------------------------------------------------------------------------
>
> Key: JGRP-1003
> URL: https://jira.jboss.org/jira/browse/JGRP-1003
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.7
> Environment: Linux (Debian), kernel 2.6.x
> Reporter: Victor N
> Assignee: Bela Ban
> Fix For: 2.10
>
> Attachments: j1__.txt, jgroups_s2.txt, jgroups_stacktrace_09feb2010.txt, stacktrace.txt
>
>
> I am using JGroups 2.7.0 GA with the typical protocols stack based on TCP (taken from tcp.xml).
> Sometimes it occurs that socket write operation hangs inside TP.send() -- this is not so untypical for blocking I/O approach! -- after that JGroups is not working at all because most of threads are blocked inside TP.send() waiting for that first thread releasing "out_stream_lock". Below is a fragment from my stack trace.
> 1) the thread that hangs in socket write until forever (maybe due to a broken socket or something else); this thread owns "out_stream_lock in TP class":
> "Timer-1,name,IP:port" daemon prio=10 tid=0x084e5c00 nid=0x15f0 runnable [0x211c1000..0x211c2040]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> - locked <0x43427300> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:106)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection.doSend(TCPConnectionMap.java:472)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection._send(TCPConnectionMap.java:453)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection.send(TCPConnectionMap.java:436)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection.access$100(TCPConnectionMap.java:341)
> at org.jgroups.blocks.TCPConnectionMap.send(TCPConnectionMap.java:137)
> at org.jgroups.protocols.TCP.send(TCP.java:53)
> at org.jgroups.protocols.BasicTCP.sendToSingleMember(BasicTCP.java:141)
> at org.jgroups.protocols.TP.doSend(TP.java:1105)
> at org.jgroups.protocols.TP.send(TP.java:1088)
> at org.jgroups.protocols.TP.down(TP.java:907)
> at org.jgroups.protocols.Discovery.down(Discovery.java:363)
> at org.jgroups.protocols.MERGE2.down(MERGE2.java:169)
> at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:333)
> at org.jgroups.protocols.FD.down(FD.java:327)
> at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:72)
> at org.jgroups.protocols.BARRIER.down(BARRIER.java:96)
> at org.jgroups.protocols.pbcast.NAKACK.retransmit(NAKACK.java:1530)
> at org.jgroups.protocols.pbcast.NAKACK.retransmit(NAKACK.java:1476)
> at org.jgroups.stack.Retransmitter$Task.run(Retransmitter.java:207)
> at org.jgroups.util.TimeScheduler$TaskWrapper.run(TimeScheduler.java:218)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Locked ownable synchronizers:
> - <0x2b5522e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> - <0x2b5550f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> - <0x434273f0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> 2) and there are many threads like this blocked "on out_stream_lock.lock()" call inside TP class:
> "Connection.Receiver [ip:port - ip:port],name,ip:port" prio=10 tid=0x020dc000 nid=0x1fab waiting on condition [0x1365f000..0x1365ffc0]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x2b5550f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
> at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
> at org.jgroups.protocols.TP.send(TP.java:1082)
> at org.jgroups.protocols.TP.down(TP.java:907)
> at org.jgroups.protocols.Discovery.up(Discovery.java:274)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:995)
> at org.jgroups.protocols.TP.access$100(TP.java:52)
> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1451)
> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1427)
> at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:1738)
> at org.jgroups.util.ShutdownRejectedExecutionHandler.rejectedExecution(ShutdownRejectedExecutionHandler.java:34)
> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
> at org.jgroups.protocols.TP.dispatchToThreadPool(TP.java:1061)
> at org.jgroups.protocols.TP.receive(TP.java:1038)
> at org.jgroups.protocols.BasicTCP.receive(BasicTCP.java:180)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection$ConnectionPeerReceiver.run(TCPConnectionMap.java:553)
> at java.lang.Thread.run(Thread.java:619)
>
> Locked ownable synchronizers:
> - None
> (Since all threads from JGroups' ThreadPool are locked, you can see "rejectedExecution" in the last stack trace).
> Maybe we could implement some solution for TCP (blocking I/O)? Maybe it is possible to release the lock before socket write?
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
More information about the jboss-jira
mailing list