[JBoss JIRA] Created: (ISPN-425) Stale data read when L1 invalidation happens while UnionConsistentHash is in use
by Galder Zamarreno (JIRA)
Stale data read when L1 invalidation happens while UnionConsistentHash is in use
--------------------------------------------------------------------------------
Key: ISPN-425
URL: https://jira.jboss.org/jira/browse/ISPN-425
Project: Infinispan
Issue Type: Bug
Components: Distributed Cache
Affects Versions: 4.1.0.BETA1
Reporter: Galder Zamarreno
Assignee: Galder Zamarreno
Fix For: 4.1.0.CR1
See below:
----- "Manik Surtani" <manik(a)jboss.org> wrote:
> On 3 May 2010, at 08:51, Galder Zamarreno wrote:
>
> > Resending without log until the message is approved.
> >
> > --
> > Galder Zamarreño
> > Sr. Software Engineer
> > Infinispan, JBoss Cache
> >
> > ----- Forwarded Message -----
> > From: galder(a)redhat.com
> > To: "infinispan -Dev List" <infinispan-dev(a)lists.jboss.org>
> > Sent: Friday, April 30, 2010 6:30:05 PM GMT +01:00 Amsterdam /
> Berlin / Bern / Rome / Stockholm / Vienna
> > Subject: Stale data read when L1 invalidation happens while
> UnionConsistentHash is in use
> >
> > Hi,
> >
> > I've spent all day chasing down a random Hot Rod testsuite failure
> related to distribution. This is the last hurdle to close
> https://jira.jboss.org/jira/browse/ISPN-411. In
> HotRodDistributionTest, which is still to be committed, I test adding
> a new node, doing a put on this node, and then doing a get in a
> different node and making sure that I get what was put. The test
> intermittently fails because the get returns the old value. The failure
> has nothing to do with Hot Rod itself; it is a race condition that occurs
> while the union consistent hash is in use. Let me explain:
> >
> > 1. An earlier operation had set
> "k-testDistributedPutWithTopologyChanges" key to
> "v5-testDistributedPutWithTopologyChanges".
> > 2. Start a new Hot Rod server in eq-7969.
> > 3. The eq-7969 node calls a put on that key with
> "v6-testDistributedPutWithTopologyChanges". Recipients for the put
> are: eq-7969 and eq-61332.
> > 4. eq-7969 sends an L1 invalidation to all nodes, including eq-13415.
> > 5. eq-13415 should invalidate
> "k-testDistributedPutWithTopologyChanges" but it doesn't, since it
> considers that "k-testDistributedPutWithTopologyChanges" is local to
> eq-13415:
> >
> > 2010-04-30 18:02:19,907 6046 TRACE
> [org.infinispan.distribution.DefaultConsistentHash]
> (OOB-2,Infinispan-Cluster,eq-13415:) Hash code for key
> CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
> 116, 101, 115, 116, 68, 105, 115, 116, ..]}} is 344897059
> > 2010-04-30 18:02:19,907 6046 TRACE
> [org.infinispan.distribution.DefaultConsistentHash]
> (OOB-2,Infinispan-Cluster,eq-13415:) Candidates for key
> CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
> 116, 101, 115, 116, 68, 105, 115, 116, ..]}} are {5458=eq-7969,
> 6831=eq-61332}
> > 2010-04-30 18:02:19,907 6046 TRACE
> [org.infinispan.distribution.DistributionManagerImpl]
> (OOB-2,Infinispan-Cluster,eq-13415:) Is local
> CacheKey{data=ByteArray{size=39, hashCode=17b1683, array=[107, 45,
> 116, 101, 115, 116, 68, 105, 115, 116, ..]}} to eq-13415 query returns
> true and consistentHash is
> org.infinispan.distribution.UnionConsistentHash@10747b4
> >
> > This log includes messages that I added to debug it. The key
> factor here is that UnionConsistentHash is in use, probably because
> rehashing has not fully finished.
> >
> > 6. The end result is that a read of
> "k-testDistributedPutWithTopologyChanges" in eq-13415 returns
> "v5-testDistributedPutWithTopologyChanges".
> >
> > I thought that maybe we could be more conservative here: if
> rehashing is in progress (or UnionConsistentHash is in use), invalidate
> regardless. Assuming that in distribution a put always follows an
> invalidation and not vice versa, that would be fine. The only downside
> is that you'd be invalidating too much, but the put would replace the data
> on any node where invalidation should not have happened but did, so
> that is not a problem.
> >
> > Thoughts? Alternatively, maybe I need to shape my test so that I
> wait for rehashing to finish, but the problem would still be there.
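A minimal sketch of the conservative rule proposed above, written as a standalone Java class. The class name, the `owners` function and the `rehashInProgress` flag are hypothetical stand-ins for whatever the distribution manager actually exposes; only the decision logic matters here: while a rehash is in progress, an incoming L1 invalidation always drops the local copy.

```java
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch, not Infinispan code: decides whether an incoming
// L1 invalidation should remove the local entry for a key.
final class ConservativeL1Invalidation {

    private final String self;
    // Hypothetical: maps a key to the nodes that own it under the current hash.
    private final Function<Object, Set<String>> owners;
    // Hypothetical: true while a rehash is running (e.g. a union hash is in use).
    private volatile boolean rehashInProgress;

    ConservativeL1Invalidation(String self, Function<Object, Set<String>> owners) {
        this.self = self;
        this.owners = owners;
    }

    void setRehashInProgress(boolean inProgress) {
        this.rehashInProgress = inProgress;
    }

    boolean shouldInvalidate(Object key) {
        if (rehashInProgress) {
            // Ownership is ambiguous mid-rehash, so invalidate anyway. If this node
            // really is an owner, the put restores the value (assuming, as above,
            // that the put is applied after the invalidation).
            return true;
        }
        // Outside a rehash, owners keep their copy; only L1 copies are dropped.
        return !owners.apply(key).contains(self);
    }
}
```

The cost is the one already named above: some owners may briefly lose an entry they should have kept, in exchange for never serving a stale L1 value during a rehash.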
>
> Yes, this seems to be a bug with concurrent rehashing and invalidation
> rather than HotRod.
>
> Could you modify your test to do the following:
>
> 1. start 2 caches C1 and C2.
> 2. put a key K such that K maps on to C1 and C2
> 3. add a new node, C3. K should now map to C1 and C3.
> 4. Modify the value on C1 *before* rehashing completes.
> 5. See if we see the stale value on C2.
>
> To do this you would need a custom object for K that hashes the way
> you would expect (this could be hardcoded) and a value which blocks
> when serializing so we can control how long rehashing takes.
Since logical addresses are used underneath and these change from one run to the next, I'm not sure how I can generate such a key programmatically. It's even more complicated to find a key that will later, when C3 starts, map to it. Without having these addresses, or their hash codes, pinned somehow, I can't see how this is doable. In other words, to do this I need to mock these addresses so that they return fixed hash codes. I'll dig further into this.
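One way to get deterministic placement, sketched under loudly stated assumptions: a hypothetical `FixedHashAddress` whose `hashCode()` is chosen by the test, and a simplified wheel (`SimpleWheel`, also hypothetical) that assigns a key to the first `numOwners` nodes clockwise from the key's hash. This is not how DefaultConsistentHash is implemented; the trace above shows it computing a hash (344897059) that differs from the hashCode printed for the key's byte array (17b1683, in hex), so real mocks would have to account for that extra hashing step. The sketch reproduces steps 2 and 3 of the plan above: with `Integer` keys, whose hashCode is the value itself, key 500 maps to C1 and C2, and after C3 joins at position 3000 it maps to C1 and C3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical test address whose position on the wheel is pinned by the test.
final class FixedHashAddress {
    private final String name;
    private final int hash;

    FixedHashAddress(String name, int hash) {
        this.name = name;
        this.hash = hash;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FixedHashAddress)) return false;
        FixedHashAddress other = (FixedHashAddress) o;
        return hash == other.hash && name.equals(other.name);
    }

    @Override
    public String toString() {
        return name;
    }
}

// Hypothetical, simplified wheel: a key is owned by the first numOwners nodes
// found walking clockwise from the key's hashCode, wrapping around at the end.
final class SimpleWheel {
    private final TreeMap<Integer, FixedHashAddress> wheel = new TreeMap<>();

    void add(FixedHashAddress address) {
        wheel.put(address.hashCode(), address);
    }

    List<FixedHashAddress> owners(Object key, int numOwners) {
        int h = key.hashCode();
        // Nodes at or after the key's position, followed by the ones before it.
        List<FixedHashAddress> clockwise = new ArrayList<>(wheel.tailMap(h, true).values());
        clockwise.addAll(wheel.headMap(h, false).values());
        return new ArrayList<>(clockwise.subList(0, Math.min(numOwners, clockwise.size())));
    }
}
```

Pinning positions this way makes the "K maps to C1 and C2, then to C1 and C3" setup reproducible across runs, which is exactly what per-run logical addresses prevent.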
>
> I never promised the test would be simple! :)
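The value that blocks when serializing could look roughly like this. `BlockingValue` is hypothetical, and it relies on standard Java serialization invoking a private `writeObject` hook; whether the marshaller used during rehashing honours that hook in the same way is an assumption worth checking. A test would keep the static latch closed while it modifies K on C1, then release it to let rehashing complete.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.CountDownLatch;

// Hypothetical test value: serializing it blocks until the test releases the latch.
final class BlockingValue implements Serializable {
    private static final long serialVersionUID = 1L;

    // Static, so it is never serialized and one countDown() releases every writer.
    static volatile CountDownLatch release = new CountDownLatch(1);

    private final String payload;

    BlockingValue(String payload) {
        this.payload = payload;
    }

    String payload() {
        return payload;
    }

    // Invoked by Java serialization; waits for the test before writing the fields.
    private void writeObject(ObjectOutputStream out) throws IOException {
        try {
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while blocked in serialization", e);
        }
        out.defaultWriteObject();
    }
}
```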
>
> Cheers
> Manik
> --
> Manik Surtani
> manik(a)jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[JBoss JIRA] Created: (ISPN-799) JoinTask as it invalidates L1 entries should be given precedence in acquiring locks
by Vladimir Blagojevic (JIRA)
JoinTask as it invalidates L1 entries should be given precedence in acquiring locks
------------------------------------------------------------------------------------
Key: ISPN-799
URL: https://jira.jboss.org/browse/ISPN-799
Project: Infinispan
Issue Type: Bug
Components: Locking and Concurrency
Affects Versions: 4.2.0.CR1
Reporter: Vladimir Blagojevic
Assignee: Manik Surtani
Fix For: 4.2.0.Final, 5.0.0.Final
The SingleJoinTest transaction test failure itself is intermittent due to the way addresses are organised on the hash wheel, so you are correct that it is a timing issue. Still, it is a very real problem. Just to reiterate, and to make sure we are talking about the same thing:
1. View is {A, B, C}
2. K is mapped to {A, B}
3. A tx starts to update K, and is prepared. Locks now held for K on {A, B}
4. D joins. D is placed on the hash wheel between A and B. So the new view is {A, D, B, C}
5. As per the test (artificial, I know, but it could still happen), the tx waits for a long time before committing. In the test, it artificially waits, by means of a latch, until D has finished joining before committing.
6. D never finishes joining: even though it receives the prepare for the tx and could potentially commit it itself (as a new owner), it fails because it is unable to invalidate K on B.
There are a few solutions here:
1) This is pretty easy to detect. Attempt to acquire the lock with a smaller lock acquisition timeout and if the transaction is still stuck, abort the transaction and proceed with the join.
2) If the blocking node is *not* the transaction originator (as in this case: the tx was started on A), then just force lock removal and tx rollback on B *only*. Let the tx complete on A, since the new joiner will receive the transactional event and will be able to apply it as a new owner.
My vote is to go for solution 1: it is a bit cruder, but 2 would be very complex to implement. And even then, 2 would only solve the case where the invalidation is blocked on a node that did not originate the transaction, e.g. the tx originated on A but the lock issue was on B. If, however, the tx originated on B, *and* B no longer owns the entry in question, then 2 is no longer a solution and the only solution would be 1.
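Solution 1 can be sketched with plain `java.util.concurrent` locks. `JoinInvalidation` and its `TxAborter` hook are hypothetical; in Infinispan the lock would come from its own lock manager and the abort would be a transaction rollback, neither of which is modelled here. The sketch tries the lock with a short timeout, aborts the holder if it is still stuck, and retries once before giving up.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

// Hypothetical sketch of solution 1, not Infinispan's lock manager.
final class JoinInvalidation {

    // Hypothetical hook standing in for "roll back the tx that holds the key's lock".
    interface TxAborter {
        void abortHolderOf(Object key);
    }

    private final long shortTimeoutMillis;
    private final TxAborter aborter;

    JoinInvalidation(long shortTimeoutMillis, TxAborter aborter) {
        this.shortTimeoutMillis = shortTimeoutMillis;
        this.aborter = aborter;
    }

    // Returns true if the L1 entry was removed while holding the key's lock.
    boolean invalidate(Object key, Lock keyLock, Runnable removeFromL1) throws InterruptedException {
        if (!keyLock.tryLock(shortTimeoutMillis, TimeUnit.MILLISECONDS)) {
            // Still stuck after the short timeout: abort the blocking tx so the join can proceed.
            aborter.abortHolderOf(key);
            // Retry once, expecting the rollback to release the lock within the timeout.
            if (!keyLock.tryLock(shortTimeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        }
        try {
            removeFromL1.run();
            return true;
        } finally {
            keyLock.unlock();
        }
    }
}
```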
--
[JBoss JIRA] Created: (ISPN-244) Enable external user/framework defined Externalizers
by Galder Zamarreno (JIRA)
Enable external user/framework defined Externalizers
----------------------------------------------------
Key: ISPN-244
URL: https://jira.jboss.org/jira/browse/ISPN-244
Project: Infinispan
Issue Type: Feature Request
Components: RPC
Reporter: Galder Zamarreno
Fix For: 4.1.0.BETA1
Create an internal magic number (i.e. -1 or 255) for user defined externalizers. This is done to avoid users using our number space.
So, internally: <magic_number> <stream>
Users: <magic_number><user defined magic number (int)> <stream>
Mandate unsigned ints so that we can optimise by sending them as variable-length integers.
Internal frameworks could use high enough numbers, for example up to 2 bytes: 5000, 7000, 20000
1 byte: 128
2 bytes: 32767
3 bytes: ...
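To make the proposed `<magic_number><user defined magic number (int)>` layout concrete, here is a sketch that writes a marker byte followed by the user ID in a variable-length encoding. The marker value, the class name and the choice of a 7-bit scheme (low 7 bits per byte, high bit set when more bytes follow) are assumptions for illustration; the proposal does not fix an encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical framing for user externalizer IDs: <magic><varint id>, payload follows.
final class UserExternalizerHeader {
    // Hypothetical reserved marker byte (0xFF, i.e. -1 as a signed byte).
    static final int USER_MAGIC = 0xFF;

    // Writes an unsigned int 7 bits at a time; the high bit means "more bytes follow".
    static void writeUnsignedInt(OutputStream out, int value) throws IOException {
        if (value < 0) throw new IllegalArgumentException("IDs must be unsigned: " + value);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static int readUnsignedInt(InputStream in) throws IOException {
        int value = 0;
        for (int shift = 0; ; shift += 7) {
            if (shift > 28) throw new IOException("variable-length int too long");
            int b = in.read();
            if (b < 0) throw new IOException("unexpected end of stream");
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
        }
    }

    // Marker byte followed by the encoded user ID.
    static byte[] header(int userId) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(USER_MAGIC);
        writeUnsignedInt(out, userId);
        return out.toByteArray();
    }
}
```

Under this particular scheme, one byte covers IDs up to 127 and two bytes cover IDs up to 16383, so whether values such as 20000 fit in two bytes depends on the encoding that is finally chosen.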
GlobalConfiguration.registerMarshallable(Class type, Externalizer ext, int id);
Maybe CacheManager would be better?
CacheManager.registerMarshallable(Class type, Externalizer ext, int id);
Future improvement: maybe generate IDs automatically for user-defined classes?
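A sketch of what could sit behind the proposed `registerMarshallable(Class type, Externalizer ext, int id)` call, whether it ends up on GlobalConfiguration or CacheManager. `Externalizer` here is a stand-in interface declared only for this sketch, not Infinispan's own type, and rejecting negative or duplicate IDs is an assumed policy rather than something the proposal specifies.

```java
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.HashMap;
import java.util.Map;

// Stand-in interface for this sketch only, not Infinispan's own Externalizer type.
interface Externalizer<T> {
    void writeObject(ObjectOutput out, T object) throws IOException;

    T readObject(ObjectInput in) throws IOException, ClassNotFoundException;
}

// Hypothetical registry behind a registerMarshallable(type, ext, id) call.
final class ExternalizerRegistry {
    private final Map<Integer, Externalizer<?>> byId = new HashMap<>();
    private final Map<Class<?>, Integer> idByType = new HashMap<>();

    synchronized <T> void registerMarshallable(Class<T> type, Externalizer<T> ext, int id) {
        if (id < 0) {
            throw new IllegalArgumentException("IDs must be unsigned: " + id);
        }
        if (byId.containsKey(id)) {
            throw new IllegalStateException("ID " + id + " is already registered");
        }
        byId.put(id, ext);
        idByType.put(type, id);
    }

    // Used when writing: which ID goes on the wire for this type.
    synchronized Integer idFor(Class<?> type) {
        return idByType.get(type);
    }

    // Used when reading: which externalizer handles the ID read from the stream.
    synchronized Externalizer<?> forId(int id) {
        return byId.get(id);
    }
}
```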
--
[JBoss JIRA] Created: (ISPN-826) RemoteLockCleanupStressTest throws EOFException when reading state
by Galder Zamarreño (JIRA)
RemoteLockCleanupStressTest throws EOFException when reading state
------------------------------------------------------------------
Key: ISPN-826
URL: https://jira.jboss.org/browse/ISPN-826
Project: Infinispan
Issue Type: Bug
Components: Marshalling, State transfer
Affects Versions: 4.2.0.CR3, 4.1.0.Final
Reporter: Galder Zamarreño
Assignee: Galder Zamarreño
Priority: Blocker
Fix For: 4.2.0.CR4, 4.2.0.Final
While doing some work to verify ISPN-244, I've spotted that RemoteLockCleanupStressTest throws an exception. Note that this is not related to ISPN-244, because the issue is also present in 4.2.x, where ISPN-244 has no bearing:
2010-12-08 18:16:25,945 13951 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-2,Infinispan-Cluster,NodeC-2057:) Caught while requesting or applying state
org.infinispan.statetransfer.StateTransferException: java.io.EOFException: Read past end of file
at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:331)
at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:102)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:598)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:712)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:772)
at org.jgroups.JChannel.up(JChannel.java:1422)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)
at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:478)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:525)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:464)
at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:225)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)
at org.jgroups.protocols.FC.up(FC.java:483)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:888)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:615)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:295)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:707)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
at org.jgroups.protocols.FD.up(FD.java:266)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:269)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:210)
at org.jgroups.protocols.Discovery.up(Discovery.java:292)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1093)
at org.jgroups.protocols.TP.access$100(TP.java:56)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1633)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1615)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.EOFException: Read past end of file
at org.jboss.marshalling.AbstractUnmarshaller.eofOnRead(AbstractUnmarshaller.java:184)
at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByteDirect(AbstractUnmarshaller.java:319)
at org.jboss.marshalling.AbstractUnmarshaller.readUnsignedByte(AbstractUnmarshaller.java:280)
at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:207)
at org.jboss.marshalling.AbstractUnmarshaller.readObject(AbstractUnmarshaller.java:85)
at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:162)
at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:184)
at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:228)
at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:250)
at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:320)
... 29 more
Now, in spite of this issue, the test passes!!! (WTF?)
So, first things first: the test needs improving so that these issues are sent back up. That probably means switching from a Runnable to a Callable, and then deciding which exceptions we want to swallow and which ones we want to propagate. Then I'll get going on figuring out the cause.
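A sketch of the Runnable-to-Callable change, assuming the stress test drives its workers through an `ExecutorService`. `StressWorkers` and its `swallow` predicate are hypothetical; the predicate is where the decision about which exceptions to tolerate would live. This only surfaces failures thrown inside the workers themselves: the EOFException in the trace above was raised on a JGroups incoming thread (Incoming-2), so catching that one would need a separate mechanism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper: runs stress-test workers as Callables so that their
// failures reach the test thread instead of dying silently in the pool.
final class StressWorkers {

    static void runAll(List<Callable<Void>> workers, Predicate<Throwable> swallow) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, workers.size()));
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (Callable<Void> worker : workers) {
                futures.add(pool.submit(worker));
            }
            for (Future<Void> future : futures) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (swallow.test(cause)) {
                        continue; // an exception the test has decided to tolerate
                    }
                    // Rethrow the worker's own exception so the test fails with it.
                    if (cause instanceof Exception) throw (Exception) cause;
                    if (cause instanceof Error) throw (Error) cause;
                    throw e;
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```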
--
[JBoss JIRA] Created: (ISPN-827) DefaultCacheManager don't fallback gracefully to file system based configuration when initiated from within glassfish.
by Jonas Lasson (JIRA)
DefaultCacheManager don't fallback gracefully to file system based configuration when initiated from within glassfish.
----------------------------------------------------------------------------------------------------------------------
Key: ISPN-827
URL: https://jira.jboss.org/browse/ISPN-827
Project: Infinispan
Issue Type: Bug
Components: Configuration
Affects Versions: 4.2.0.CR3, 4.1.0.Final
Environment: Windows 7 64 bit.
Reporter: Jonas Lasson
Assignee: Manik Surtani
When instantiating DefaultCacheManager with an absolute Windows path from within GlassFish, an IllegalArgumentException is thrown before any attempt is made to read the configuration with FileInputStream.
SEVERE: org.infinispan.config.ConfigurationException: java.lang.IllegalArgumentException: name
at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:256)
The problem is easily reproducible by calling the following from within a servlet:
new DefaultCacheManager("c:\\cachetest.xml");
The cause is in FileLookup.java, line 68: the classloader throws an IllegalArgumentException for the path provided.
A simple solution would be to catch runtime exceptions when we try to load the resource from the classloader, and then fall back to reading the config file with FileInputStream.
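The suggested fix can be sketched as a small lookup helper. `ConfigLookup.open` is hypothetical and is not the actual FileLookup code; it asks the classloader first, treats any RuntimeException (such as the IllegalArgumentException reported here) or a null result as "not found", and then falls back to FileInputStream.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

// Hypothetical helper, not the real FileLookup: classloader first, file system second.
final class ConfigLookup {

    static InputStream open(String name, ClassLoader classLoader) throws FileNotFoundException {
        InputStream in;
        try {
            in = classLoader.getResourceAsStream(name);
        } catch (RuntimeException e) {
            // Some container classloaders reject names such as "c:\\cachetest.xml"
            // with an IllegalArgumentException; treat that as "not found on the classpath".
            in = null;
        }
        if (in != null) {
            return in;
        }
        // Fall back to the file system, as suggested in the report.
        return new FileInputStream(name);
    }
}
```

Catching the broad RuntimeException mirrors the suggestion above; a narrower catch of IllegalArgumentException would also cover the reported case.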
--