[infinispan-dev] JGRP-1877

Sanne Grinovero sanne at infinispan.org
Wed Sep 10 17:03:43 EDT 2014


Very interesting, I didn't know about the negative value being an
option; don't replace all occurrences though as in some cases
System.currentTimeMillis() is more appropriate, you can find some
interesting discussions here:
http://lists.jboss.org/pipermail/infinispan-dev/2011-October/009277.html

I'm having a test shutdown "hung" right now; all other nodes have
stopped since minutes already, but the following is still hung.. could
it be the same problem?
I'm extremely surprised I could hit it.

java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x000000071b442558> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
at org.jgroups.blocks.Request.responsesComplete(Request.java:197)
at org.jgroups.blocks.Request.execute(Request.java:89)
at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:406)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:370)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:167)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:536)
at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinator(LocalTopologyManagerImpl.java:324)
at org.infinispan.topology.LocalTopologyManagerImpl.leave(LocalTopologyManagerImpl.java:128)
at org.infinispan.statetransfer.StateTransferManagerImpl.stop(StateTransferManagerImpl.java:236)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:168)
at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:869)
at org.infinispan.factories.AbstractComponentRegistry.internalStop(AbstractComponentRegistry.java:674)
at org.infinispan.factories.AbstractComponentRegistry.stop(AbstractComponentRegistry.java:552)
- locked <0x00000007195f4dc8> (a org.infinispan.factories.ComponentRegistry)
at org.infinispan.factories.ComponentRegistry.stop(ComponentRegistry.java:241)
at org.infinispan.cache.impl.CacheImpl.stop(CacheImpl.java:782)
at org.infinispan.cache.impl.CacheImpl.stop(CacheImpl.java:777)
at org.infinispan.query.helper.TestableCluster$Node.kill(TestableCluster.java:270)
at org.infinispan.query.helper.TestableCluster$Node.access$2(TestableCluster.java:265)
at org.infinispan.query.helper.TestableCluster.killAll(TestableCluster.java:94)

 -- Sanne


On 10 September 2014 14:08, Bela Ban <bban at redhat.com> wrote:
>
>
> On 10/09/14 13:58, Alan Field wrote:
>> Hey Bela,
>>
>>> Just a quick heads up. I'm currently working on
>>> https://issues.jboss.org/browse/JGRP-1877, which it critical as it
>>> may: - cause RPCs to return prematurely (possibly with a
>>> TimeoutException), or - cause RPCs to blocks for a long time (pick
>>> which one is worse :-))
>>
>> How frequently can these errors occur? Is this something that is not
>> very likely to happen or something that requires an external action
>> to trigger it? (i.e. changing the time via NTP) Just trying to
>> determine the priority of this issue.
>
>
> Changing the system time will definitely screw up code that relies on
> System.currentTimeMillis(). Once I replace this with nanoTime(), this
> problem should be eliminated.
>
> The nanoTime() problem is that an 'origin' chosen by the JVM can be in
> the future, so all calls to nanoTime() return negative values. Or - if
> positive - due to numeric overflow, the long can wrap around and become
> negative.
>
> Once this happens, all RPCs (for example) will return immediately,
> without any response, or throw TimeoutExceptions. This will last for 292
> years... :-)
>
>
>> Thanks, Alan
>>
>> ----- Original Message -----
>>> From: "Bela Ban" <bban at redhat.com> To:
>>> infinispan-dev at lists.jboss.org Sent: Wednesday, September 10, 2014
>>> 12:05:11 PM Subject: [infinispan-dev] JGRP-1877
>>>
>>> Just a quick heads up. I'm currently working on
>>> https://issues.jboss.org/browse/JGRP-1877, which it critical as it
>>> may: - cause RPCs to return prematurely (possibly with a
>>> TimeoutException), or - cause RPCs to blocks for a long time (pick
>>> which one is worse :-))
>>>
>>> This is due to my misunderstanding of the semantics of
>>> System.nanoTime(), I frequently have code like this, which computes
>>> a future deadline for a timeout:
>>>
>>> long wait_time=TimeUnit.NANOSECONDS.convert(timeout,
>>> TimeUnit.MILLISECONDS); final long target_time=System.nanoTime() +
>>> wait_time; while(wait_time > 0 && !hasResult) { /* Wait for
>>> responses: */ wait_time=target_time - System.nanoTime();
>>> if(wait_time > 0) { try {cond.await(wait_time,
>>> TimeUnit.NANOSECONDS);} catch(Exception e) {} } } if(!hasResult &&
>>> wait_time <= 0) throw new TimeoutException();
>>>
>>> Variable target_time can possibly become *negative* if nanoTime()
>>> returns a negative value. If so, hasResult is false and wait_time
>>> negative, and therefore a TimeoutException would be thrown !
>>>
>>> While I'm at it, I'll also fix my uses of
>>> System.currentTimeMillis(), and replace it with nanoTime(). Our
>>> good friend Erik has run into issues with RPCs (using
>>> currentTimeMillis()) hanging forever when their NTP-based servers
>>> adjusted the time .... backwards !
>>>
>>> Please be aware of nanoTime() in your own code, e.g. long
>>> t0=nanoTime(); ... long t1=nanoTime();
>>>
>>> It is *not* guaranteed that t1 > t0 because of numeric overflow
>>> (t0 might be Long.MAX_VALUE-1 and t1 Long.MAX_VALUE +2 !). The only
>>> way to compare them is t1 - t0 > 0 (t1 is more recent) or < 0 t0 is
>>> more recent.
>>>
>>> Just thought I wanted to pass this on, in case somebody made the
>>> same stupid mistake...
>>>
>>> Thanks to David Lloyd for pointing this out !
>>>
>>> -- Bela Ban, JGroups lead (http://www.jgroups.org)
>>> _______________________________________________ infinispan-dev
>>> mailing list infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>> _______________________________________________ infinispan-dev
>> mailing list infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


More information about the infinispan-dev mailing list