[infinispan-dev] JGRP-1877

Bela Ban bban at redhat.com
Thu Sep 11 07:44:50 EDT 2014



On 10/09/14 23:03, Sanne Grinovero wrote:
> Very interesting, I didn't know about the negative value being an
> option; don't replace all occurrences though as in some cases
> System.currentTimeMillis() is more appropriate, you can find some
> interesting discussions here:
> http://lists.jboss.org/pipermail/infinispan-dev/2011-October/009277.html

I don't see a single case where I should use currentTimeMillis() (except 
for some simple testing code): nanoTime() is not affected by system 
clock changes, currentTimeMillis() is.

JGroups mostly uses both calls to measure elapsed time, or to determine 
when something has timed out (e.g. an RPC).


> I'm having a test shutdown "hung" right now; all other nodes have
> stopped since minutes already, but the following is still hung.. could
> it be the same problem?

Could be, but I don't know. Chances of this happening are very low.

I'm going to see if I should log the time to wait in a TRACE statement, 
so we can determine extremely high waits (or maybe just log if it 
exceeds a threshold). This would also show negative wait times

> I'm extremely surprised I could hit it.
>
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000000071b442558> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> at org.jgroups.blocks.Request.responsesComplete(Request.java:197)
> at org.jgroups.blocks.Request.execute(Request.java:89)
> at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:406)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:370)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:167)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:536)
> at org.infinispan.topology.LocalTopologyManagerImpl.executeOnCoordinator(LocalTopologyManagerImpl.java:324)
> at org.infinispan.topology.LocalTopologyManagerImpl.leave(LocalTopologyManagerImpl.java:128)
> at org.infinispan.statetransfer.StateTransferManagerImpl.stop(StateTransferManagerImpl.java:236)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:168)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:869)
> at org.infinispan.factories.AbstractComponentRegistry.internalStop(AbstractComponentRegistry.java:674)
> at org.infinispan.factories.AbstractComponentRegistry.stop(AbstractComponentRegistry.java:552)
> - locked <0x00000007195f4dc8> (a org.infinispan.factories.ComponentRegistry)
> at org.infinispan.factories.ComponentRegistry.stop(ComponentRegistry.java:241)
> at org.infinispan.cache.impl.CacheImpl.stop(CacheImpl.java:782)
> at org.infinispan.cache.impl.CacheImpl.stop(CacheImpl.java:777)
> at org.infinispan.query.helper.TestableCluster$Node.kill(TestableCluster.java:270)
> at org.infinispan.query.helper.TestableCluster$Node.access$2(TestableCluster.java:265)
> at org.infinispan.query.helper.TestableCluster.killAll(TestableCluster.java:94)
>
>   -- Sanne
>
>
> On 10 September 2014 14:08, Bela Ban <bban at redhat.com> wrote:
>>
>>
>> On 10/09/14 13:58, Alan Field wrote:
>>> Hey Bela,
>>>
>>>> Just a quick heads up. I'm currently working on
>>>> https://issues.jboss.org/browse/JGRP-1877, which it critical as it
>>>> may: - cause RPCs to return prematurely (possibly with a
>>>> TimeoutException), or - cause RPCs to blocks for a long time (pick
>>>> which one is worse :-))
>>>
>>> How frequently can these errors occur? Is this something that is not
>>> very likely to happen or something that requires an external action
>>> to trigger it? (i.e. changing the time via NTP) Just trying to
>>> determine the priority of this issue.
>>
>>
>> Changing the system time will definitely screw up code that relies on
>> System.currentTimeMillis(). Once I replace this with nanoTime(), this
>> problem should be eliminated.
>>
>> The nanoTime() problem is that an 'origin' chosen by the JVM can be in
>> the future, so all calls to nanoTime() return negative values. Or - if
>> positive - due to numeric overflow, the long can wrap around and become
>> negative.
>>
>> Once this happens, all RPCs (for example) will return immediately,
>> without any response, or throw TimeoutExceptions. This will last for 292
>> years... :-)
>>
>>
>>> Thanks, Alan
>>>
>>> ----- Original Message -----
>>>> From: "Bela Ban" <bban at redhat.com> To:
>>>> infinispan-dev at lists.jboss.org Sent: Wednesday, September 10, 2014
>>>> 12:05:11 PM Subject: [infinispan-dev] JGRP-1877
>>>>
>>>> Just a quick heads up. I'm currently working on
>>>> https://issues.jboss.org/browse/JGRP-1877, which it critical as it
>>>> may: - cause RPCs to return prematurely (possibly with a
>>>> TimeoutException), or - cause RPCs to blocks for a long time (pick
>>>> which one is worse :-))
>>>>
>>>> This is due to my misunderstanding of the semantics of
>>>> System.nanoTime(), I frequently have code like this, which computes
>>>> a future deadline for a timeout:
>>>>
>>>> long wait_time=TimeUnit.NANOSECONDS.convert(timeout,
>>>> TimeUnit.MILLISECONDS); final long target_time=System.nanoTime() +
>>>> wait_time; while(wait_time > 0 && !hasResult) { /* Wait for
>>>> responses: */ wait_time=target_time - System.nanoTime();
>>>> if(wait_time > 0) { try {cond.await(wait_time,
>>>> TimeUnit.NANOSECONDS);} catch(Exception e) {} } } if(!hasResult &&
>>>> wait_time <= 0) throw new TimeoutException();
>>>>
>>>> Variable target_time can possibly become *negative* if nanoTime()
>>>> returns a negative value. If so, hasResult is false and wait_time
>>>> negative, and therefore a TimeoutException would be thrown !
>>>>
>>>> While I'm at it, I'll also fix my uses of
>>>> System.currentTimeMillis(), and replace it with nanoTime(). Our
>>>> good friend Erik has run into issues with RPCs (using
>>>> currentTimeMillis()) hanging forever when their NTP-based servers
>>>> adjusted the time .... backwards !
>>>>
>>>> Please be aware of nanoTime() in your own code, e.g. long
>>>> t0=nanoTime(); ... long t1=nanoTime();
>>>>
>>>> It is *not* guaranteed that t1 > t0 because of numeric overflow
>>>> (t0 might be Long.MAX_VALUE-1 and t1 Long.MAX_VALUE +2 !). The only
>>>> way to compare them is t1 - t0 > 0 (t1 is more recent) or < 0 t0 is
>>>> more recent.
>>>>
>>>> Just thought I wanted to pass this on, in case somebody made the
>>>> same stupid mistake...
>>>>
>>>> Thanks to David Lloyd for pointing this out !
>>>>
>>>> -- Bela Ban, JGroups lead (http://www.jgroups.org)
>>>> _______________________________________________ infinispan-dev
>>>> mailing list infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>
>>> _______________________________________________ infinispan-dev
>>> mailing list infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>
>> --
>> Bela Ban, JGroups lead (http://www.jgroups.org)
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)


More information about the infinispan-dev mailing list