[jboss-jira] [JBoss JIRA] Commented: (JGRP-1003) deadlock in TP.send - socket write hangs inside a Lock, preventing other threads to do TP.send()
Victor N (JIRA)
jira-events at lists.jboss.org
Tue Feb 9 04:54:26 EST 2010
[ https://jira.jboss.org/jira/browse/JGRP-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12511225#action_12511225 ]
Victor N commented on JGRP-1003:
--------------------------------
Seems the cluster splits into 2 partitions:
1) with server3 as coordinator - its view contains 9 members (all excepting server8)
2) with server8 as coordinator - its view contains 6 members (I don't know why - maybe there were 6 members visible before everything stalled in JGroups)
and these 2 partitions do not merge. But where is the root cause of the problem? Is this because of blocked threads (one queue is full, another is empty) in serve8 or something else? Maybe some timeouts in my config are too small?
Can we monitor/diagnose somehow why that queues are full/empty? If I should turn on some logging in JGroups, please let me know.
> deadlock in TP.send - socket write hangs inside a Lock, preventing other threads to do TP.send()
> ------------------------------------------------------------------------------------------------
>
> Key: JGRP-1003
> URL: https://jira.jboss.org/jira/browse/JGRP-1003
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.7
> Environment: Linux (Debian), kernel 2.6.x
> Reporter: Victor N
> Assignee: Bela Ban
> Fix For: 2.10
>
> Attachments: j1__.txt, jgroups_s2.txt, jgroups_stacktrace_09feb2010.txt, stacktrace.txt
>
>
> I am using JGroups 2.7.0 GA with the typical protocols stack based on TCP (taken from tcp.xml).
> Sometimes it occurs that socket write operation hangs inside TP.send() -- this is not so untypical for blocking I/O approach! -- after that JGroups is not working at all because most of threads are blocked inside TP.send() waiting for that first thread releasing "out_stream_lock". Below is a fragment from my stack trace.
> 1) the thread that hangs in socket write until forever (maybe due to a broken socket or something else); this thread owns "out_stream_lock in TP class":
> "Timer-1,name,IP:port" daemon prio=10 tid=0x084e5c00 nid=0x15f0 runnable [0x211c1000..0x211c2040]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> - locked <0x43427300> (a java.io.BufferedOutputStream)
> at java.io.DataOutputStream.flush(DataOutputStream.java:106)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection.doSend(TCPConnectionMap.java:472)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection._send(TCPConnectionMap.java:453)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection.send(TCPConnectionMap.java:436)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection.access$100(TCPConnectionMap.java:341)
> at org.jgroups.blocks.TCPConnectionMap.send(TCPConnectionMap.java:137)
> at org.jgroups.protocols.TCP.send(TCP.java:53)
> at org.jgroups.protocols.BasicTCP.sendToSingleMember(BasicTCP.java:141)
> at org.jgroups.protocols.TP.doSend(TP.java:1105)
> at org.jgroups.protocols.TP.send(TP.java:1088)
> at org.jgroups.protocols.TP.down(TP.java:907)
> at org.jgroups.protocols.Discovery.down(Discovery.java:363)
> at org.jgroups.protocols.MERGE2.down(MERGE2.java:169)
> at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:333)
> at org.jgroups.protocols.FD.down(FD.java:327)
> at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:72)
> at org.jgroups.protocols.BARRIER.down(BARRIER.java:96)
> at org.jgroups.protocols.pbcast.NAKACK.retransmit(NAKACK.java:1530)
> at org.jgroups.protocols.pbcast.NAKACK.retransmit(NAKACK.java:1476)
> at org.jgroups.stack.Retransmitter$Task.run(Retransmitter.java:207)
> at org.jgroups.util.TimeScheduler$TaskWrapper.run(TimeScheduler.java:218)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Locked ownable synchronizers:
> - <0x2b5522e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> - <0x2b5550f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> - <0x434273f0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> 2) and there are many threads like this blocked "on out_stream_lock.lock()" call inside TP class:
> "Connection.Receiver [ip:port - ip:port],name,ip:port" prio=10 tid=0x020dc000 nid=0x1fab waiting on condition [0x1365f000..0x1365ffc0]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x2b5550f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
> at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
> at org.jgroups.protocols.TP.send(TP.java:1082)
> at org.jgroups.protocols.TP.down(TP.java:907)
> at org.jgroups.protocols.Discovery.up(Discovery.java:274)
> at org.jgroups.protocols.TP.passMessageUp(TP.java:995)
> at org.jgroups.protocols.TP.access$100(TP.java:52)
> at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1451)
> at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1427)
> at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:1738)
> at org.jgroups.util.ShutdownRejectedExecutionHandler.rejectedExecution(ShutdownRejectedExecutionHandler.java:34)
> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
> at org.jgroups.protocols.TP.dispatchToThreadPool(TP.java:1061)
> at org.jgroups.protocols.TP.receive(TP.java:1038)
> at org.jgroups.protocols.BasicTCP.receive(BasicTCP.java:180)
> at org.jgroups.blocks.TCPConnectionMap$TCPConnection$ConnectionPeerReceiver.run(TCPConnectionMap.java:553)
> at java.lang.Thread.run(Thread.java:619)
>
> Locked ownable synchronizers:
> - None
> (Since all threads from JGroups' ThreadPool are locked, you can see "rejectedExecution" in the last stack trace).
> Maybe we could implement some solution for TCP (blocking I/O)? Maybe it is possible to release the lock before socket write?
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
More information about the jboss-jira
mailing list