deadlock detection - coin toss
by Mircea Markus
This is regarding: http://tinyurl.com/nunmyu
In order to determine which tx should commit and which should roll
back, I need to generate a coin toss for each node - a random number.
To do that, I'm thinking of using one of the following:
1) Use java.util.Random. The issue with this class is that its behavior
under concurrent access is undocumented, so I cannot rely on it
behaving correctly unless I enforce proper synchronization. I'm afraid
this might become a bottleneck, as all the prepare requests might want
to synchronize on it.
2) Another approach would be to build a Random instance for each call.
I ran a test, and the cost of new Random() + Random.nextLong is about
4 times that of calling random.nextLong alone (on my laptop only),
still about 15 mils for 1k creations.
3) Use System.nanoTime(). Even though the conflicting tx are created at
the same time, the chance of that happening in the same nanosecond is
very small, I would say. I don't know how this behaves on all
architectures, though...
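A minimal sketch of options 2 and 3, plus one variant that is not in the list above: a per-thread Random kept in a ThreadLocal, which avoids both synchronizing on a shared instance and constructing a new one per call. The class and method names (CoinToss, localWins) are hypothetical, and the "larger toss commits" rule is only an assumption for illustration.

```java
import java.util.Random;

class CoinToss {
    // Option 2: a fresh Random per call. No shared state, but every call
    // pays the construction cost.
    static long freshRandom() {
        return new Random().nextLong();
    }

    // Option 3: the current nanosecond reading used directly as the toss.
    static long nanoTime() {
        return System.nanoTime();
    }

    // Variant (not in the original list): one Random per thread, created
    // lazily, so threads never contend on the same instance.
    private static final ThreadLocal<Random> PER_THREAD = new ThreadLocal<Random>() {
        @Override
        protected Random initialValue() {
            return new Random();
        }
    };

    static long perThread() {
        return PER_THREAD.get().nextLong();
    }

    // Assumed rule, for illustration only: the tx with the larger toss
    // commits. Equal tosses are left unresolved here.
    static boolean localWins(long localToss, long remoteToss) {
        return localToss > remoteToss;
    }
}
```

Whichever source is used, two nodes can still draw the same value, so a real implementation would need some tie-breaking rule in addition to the toss.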
Any feedback much welcomed!
Cheers,
Mircea
15 years, 8 months
Minor semantic rules in conf files
by Vladimir Blagojevic
Guys,
Currently, as soon as the transactional element is in the XML config
file, even if transactionManagerLookupClass is not specified, it
defaults to a certain value. Having these minor semantic rules requires
special processing of elements during XML reading. Of course, the design
accommodates these special rules (case of CacheLoaderManagerConfig where
we add individual CacheLoaderConfigs). So far there are only two
elements with special processing (the other is clustering).
Can we avoid it for the transaction element, and if yes, how? By
requiring transactionManagerLookupClass to be mandatory?
Cheers,
Vladimir
15 years, 8 months
Re: Distributed hashing - heterogenous nodes
by Alex Kluge
Hi,
I have been working on a use of JBoss Cache, which has a lot of overlap with the Infinispan project. I expect to be able to employ significant parts of this work in Infinispan as well. One point of overlap is the use of a consistent hash.
I have looked at the org.infinispan.distribution.DefaultConsistentHash, and this is actually a simple hash, and not a consistent hash. Luckily I have a version of a consistent hash that can almost be dropped in here. There are a number of properties of a consistent hash that make it valuable for a distributed hash table.
- If a server is removed, the number of keys that shift to a different
bin (different server) is minimal.
- The same key is always mapped to the same server.
- If a server is added, the number of keys that shift is minimal.
The current DefaultConsistentHash doesn't deliver on these. I hope you don't mind if I go into some details here.
For example, the hash is sensitive to the order in which servers appear in the initial collection of caches. If one cache is built with a list of servers (S1, S2, S3), and another is built with a list (S3, S2, S1), keys will be mapped to different servers, even though the set of servers is actually the same.
If one server is removed, many, or even all, keys will be shifted. For example, one hash with the set of servers (S1, S2, S3) will map many keys to different servers than one with (S2, S3). In a true consistent hash, the keys originally mapped to S2 will remain mapped to S2, and those mapped to S3 will remain mapped to S3. The keys that were mapped to S1 will (depending on the exact implementation) be divided between S2 and S3.
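To illustrate the properties listed above, here is a minimal hash-ring sketch. It is not the implementation described in this message (that one uses arrays and supports weights); the class name HashRing, the 100 points per server, and the mixing function are all assumptions made for the example.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

class HashRing {
    private static final int POINTS_PER_SERVER = 100; // assumed value
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Requires at least one server.
    HashRing(Collection<String> servers) {
        for (String server : servers) {
            for (int i = 0; i < POINTS_PER_SERVER; i++) {
                // If two points collide, keep the smaller server name so the
                // ring never depends on the order of the input collection.
                ring.merge(hash(server + "#" + i), server,
                        (a, b) -> a.compareTo(b) <= 0 ? a : b);
            }
        }
    }

    // The owner is the first point at or after the key's hash, wrapping
    // around to the start of the ring if there is none.
    String locate(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        if (e == null) {
            e = ring.firstEntry();
        }
        return e.getValue();
    }

    // String.hashCode followed by an integer mixing step to spread points.
    static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

With this structure, a ring built from (S1, S2, S3) and one built from (S3, S2, S1) locate every key on the same server, and removing S1 only moves the keys that S1 owned.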
There are a few differences. Specifically, I work with arrays rather than collections (in part for performance), I support weights for the servers, and the replication count is an instance variable rather than an argument to the locate method. How wedded are you to supplying the replication count as part of the locate method? Other than this, it looks like adapting my implementation to Infinispan would be fairly painless, and I suggest replacing the current implementation with it.
Alex Kluge
15 years, 8 months
Re: [infinispan-dev] Re: Distributed hashing - heterogenous nodes
by Alex Kluge
Hi,
> First of all, apologies for the delay getting back to you.
No problem, we are all busy. (OK, I confess it was the Fourth of July and I was on vacation :))
> With regards to keeping replication count as instance variable or method
> parameter, I need to look at it in more detail. I'll reply asap.
That's actually an easy change; there are a number of options. A slightly more
difficult issue is that I use arrays. It is much easier to locate the additional
nodes for replication with an array than with other data structures.
> Apologies for my ignorance if this has been mentioned before but I don't
> remember, do you have committer access to the Infinispan code already?
Nope – I could always send a copy to the dev mailing list – but I am not sure about
the appropriateness of dropping about 480 lines of code onto a mailing list. And
it's anybody's guess how it would be formatted.
Thanks,
Alex
15 years, 8 months
[Fwd: [Fwd: Sometimes TCP responses not getting through on localhost]]
by Galder Zamarreno
Zipped version :)
-------- Original Message --------
Subject: [Fwd: Sometimes TCP responses not getting through on localhost]
Date: Wed, 08 Jul 2009 17:52:14 +0200
From: Galder Zamarreno <galder.zamarreno(a)redhat.com>
To: infinispan-dev(a)lists.jboss.org <infinispan-dev(a)lists.jboss.org>
As an FYI: Not sure if you've seen anything similar, but when running
DIST tests, I randomly get stoppages like the one below. Just sending it
to the rest of the team in case they have more info.
Vladimir is planning to have a look at them at some point.
-------- Original Message --------
Subject: Sometimes TCP responses not getting through on localhost
Date: Tue, 07 Jul 2009 09:10:26 +0200
From: Galder Zamarreno <galder.zamarreno(a)redhat.com>
To: Vladimir Blagojevic <vladimir.blagojevic(a)jboss.com>
Hi Vladimir,
I'm running one of the Infinispan distribution tests locally and from
time to time, I'm seeing some stoppage receiving a response from one of
the nodes. See attached infinispan and jgroups TRACE log.
More importantly, focus on request id=1246949535122.
First, 47089 sends a clustered get request:
2009-07-07 08:52:15,122 5920 TRACE [org.jgroups.protocols.TCP] (main:)
sending msg to null, src=localhost.localdomain-47089, headers are
MsgDisp: [Header: name=MsgDisp, type=REQ, id=1246949535122,
rsp_expected=true], dest_mbrs=[localhost.localdomain-44649,
localhost.localdomain-15543], NAKACK: [MSG, seqno=1], TCP:
[channel_name=Infinispan-Cluster]
44649 deals with it and replies:
infinispan.log:2749:2009-07-07 08:52:15,124 5922 TRACE
[org.jgroups.blocks.RequestCorrelator]
(Incoming-2,Infinispan-Cluster,localhost.localdomain-44649:) calling
(org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher)
with request 1246949535122
infinispan.log:2761:2009-07-07 08:52:15,125 5923 TRACE
[org.jgroups.blocks.RequestCorrelator]
(Incoming-2,Infinispan-Cluster,localhost.localdomain-44649:) sending rsp
for 1246949535122 to localhost.localdomain-47089
infinispan.log:2765:2009-07-07 08:52:15,126 5924 TRACE
[org.jgroups.protocols.TCP]
(Incoming-2,Infinispan-Cluster,localhost.localdomain-44649:) sending msg
to localhost.localdomain-47089, src=localhost.localdomain-44649, headers
are MsgDisp: [Header: name=MsgDisp, type=RSP, id=1246949535122,
rsp_expected=false], UNICAST: [UNICAST: DATA, seqno=1,
conn_id=1246949535125, first], TCP: [channel_name=Infinispan-Cluster]
15543 deals with it too and replies:
infinispan.log:2788:2009-07-07 08:52:15,129 5927 TRACE
[org.jgroups.blocks.RequestCorrelator]
(Incoming-1,Infinispan-Cluster,localhost.localdomain-15543:) calling
(org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher)
with request 1246949535122
infinispan.log:2800:2009-07-07 08:52:15,129 5927 TRACE
[org.jgroups.blocks.RequestCorrelator]
(Incoming-1,Infinispan-Cluster,localhost.localdomain-15543:) sending rsp
for 1246949535122 to localhost.localdomain-47089
infinispan.log:2804:2009-07-07 08:52:15,130 5928 TRACE
[org.jgroups.protocols.TCP]
(Incoming-1,Infinispan-Cluster,localhost.localdomain-15543:) sending msg
to localhost.localdomain-47089, src=localhost.localdomain-15543, headers
are MsgDisp: [Header: name=MsgDisp, type=RSP, id=1246949535122,
rsp_expected=false], UNICAST: [UNICAST: DATA, seqno=1,
conn_id=1246949535130, first], TCP: [channel_name=Infinispan-Cluster]
47089 notes receiving it from 44649:
infinispan.log:2806:2009-07-07 08:52:15,130 5928 TRACE
[org.jgroups.protocols.TCP]
(OOB-1,Infinispan-Cluster,localhost.localdomain-47089:) message is [dst:
localhost.localdomain-47089, src: localhost.localdomain-15543 (3
headers), size=14 bytes, flags=OOB], headers are MsgDisp: [Header:
name=MsgDisp, type=RSP, id=1246949535122, rsp_expected=false], UNICAST:
[UNICAST: DATA, seqno=1, conn_id=1246949535130, first], TCP:
[channel_name=Infinispan-Cluster]
But after 15 seconds, the response from 15543 has not been received:
2009-07-07 08:52:30,132 20930 TRACE [org.jgroups.blocks.GroupRequest]
(main:) timed out waiting for responses
2009-07-07 08:52:30,133 20931 TRACE [org.jgroups.blocks.GroupRequest]
(main:) call did not execute correctly, request is [req_id=1246949535122
caller=localhost.localdomain-47089
entries:
localhost.localdomain-15543: sender=localhost.localdomain-15543,
retval=null, received=false, suspected=false
localhost.localdomain-44649: sender=localhost.localdomain-44649,
retval=SuccessfulResponse, received=true, suspected=false
Any idea what could be causing this? This is happening randomly, not in
all runs. Is this related to that SingletonCacheStore stuff we saw last
time around?
Cheers,
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
15 years, 8 months
https://jira.jboss.org/jira/browse/ISPN-38
by Mircea Markus
Hi,
This is related to https://jira.jboss.org/jira/browse/ISPN-38 (Early
distributed deadlock detection).
When I determine that a deadlock scenario has happened, I try to
interrupt a tx originated locally. This translates to a
thread.interrupt() call on a thread that is replicating to other caches.
The issue is that this call is ignored in JGroups' GroupRequest:
private boolean collectResponses(long timeout) {
    .....
    if(timeout > 0) {
        try {
            completed.await(timeout, TimeUnit.MILLISECONDS); //LINE 552
        }
        catch(Exception e) {
        }
    }
    ....
}
So my call to interrupt is ignored (line 552) and the collectResponses
method will only return after the timeout elapses - not good for me, as
I want the method to return immediately if the thread is interrupted.
Would it be possible to enhance the code to be aware of interrupt calls?
This is in JGroups 2.8.0 Beta2.
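A minimal sketch of what an interrupt-aware wait could look like. This is not JGroups code: the class InterruptAwareWait and the method awaitResponses are hypothetical, and a CountDownLatch stands in for the 'completed' object in the snippet above. The idea is to catch InterruptedException separately, restore the interrupt flag, and return straight away instead of swallowing the exception.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class InterruptAwareWait {
    // Returns true if all responses arrived within the timeout, false if
    // the timeout elapsed or the waiting thread was interrupted.
    static boolean awaitResponses(CountDownLatch completed, long timeoutMillis) {
        try {
            return completed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the flag so the caller can see the interrupt, then
            // return immediately rather than waiting out the timeout.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller can then check Thread.currentThread().isInterrupted() after the call to tell an interrupt apart from a plain timeout.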
Cheers,
Mircea
15 years, 8 months
[ISPN-32] API changes for query module
by Navin Surtani
Heya guys,
I'm just getting going with the query module for Infinispan (ISPN-32
in branches) and I wanted to send around a couple of emails to get
feedback on some changes that I was thinking about making.
1 - query module: -
I decided this would be a better name than 'searchable' from JBCS.
JBCS never really took off imo, so I don't really want to stick with
the old name.
2 - createQuery(): -
This was JBCS's only extra API method on the cache interface (or
SearchableCache) and I was thinking that maybe this is quite a long
method name - given that the cache has put(), get() etc ... I was
thinking of using query() but given that Lucene and Hibernate Search
both have Query classes that users deal with I'm wondering if there's
too much use of the word query. On the flipside, those are classes and
not API methods so it might work.
I'm not too worried about continuing with createQuery(), but I think
query() might work.
Another alternative would be search() but I'm not too convinced by that.
3 - SearchableCache: -
Should the name of this class be changed? I don't think so but if
anyone has any ideas ...
Thanks guys,
Navin Surtani
Intern Infinispan
Intern JBoss Cache Searchable
15 years, 8 months
Preview of configuration reference
by Vladimir Blagojevic
Hi,
Here is what (before all the nice stylesheets are applied) the Infinispan
configuration reference could look like! Note this is only a partial
reference - I do not want to proceed with annotating all the
configuration beans yet.
Feedback appreciated!
Regards,
Vladimir
15 years, 8 months
Running GUI from Eclipse
by Galder Zamarreno
Hi guys,
Running the GUI demo from Eclipse throws an NPE:
Exception in thread "main" java.lang.NullPointerException
at org.infinispan.demo.InfinispanDemo.<init>(InfinispanDemo.java:100)
at org.infinispan.demo.InfinispanDemo.main(InfinispanDemo.java:89)
It's clear that IntelliJ does some kind of magic underneath with
InfinispanDemo.form, but I haven't been able to figure it out yet. If
anyone has seen this before and knows how to get this UI running from
Eclipse, help would be appreciated. Maybe it is something related to
the magic done by
https://svn.codehaus.org/mojo/tags/ideauidesigner-maven-plugin-1.0-alpha-2/
?
Alternatively, does anyone know how to enable the right logging to see
why the IntelliJ classes are not doing what's needed?
Cheers,
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
15 years, 8 months