DLD continued/optimizing modifications within a tx
by Mircea Markus
Hi,
Right now we support DLD in the following scenarios: local caches,
symmetric tx that create deadlocks over two replicated caches (symmetric
in the sense that A replicates to B and B replicates to A at the same
time).
In the case of distribution (and replication as well, especially the async
one) there might still be a deadlock if both A and B replicate to C
transactions that lock the same keys in different orders; the deadlock
then occurs on C.
Let's take an example, A wants to replicate Tx1 that affects (key1,
key2) in this sequence. B wants to replicate Tx2 that affects (key2,
key1) in this sequence. While replicating on C, Tx1 and Tx2 would result
in a deadlock (classic scenario).
Now a simple way of solving this (if I'm not missing something!!) is to
order the keys in the replicated transaction based on some criterion
(e.g. lexicographically) and apply them in that same sequence on every
node: Tx1(key1, key2) and Tx2(key1, key2). This will avoid deadlocks ->
increased throughput.
Now, at the core of this approach is the assumption that the order of
operations within the transaction is not relevant - which does not hold
in the current implementation. E.g. let's say we have a tx that does
(remove(key), put(key, 'otherVal')): if we change the order, the result
is totally different - not good! A way to avoid this (and also to reduce
the amount of replication, by compacting changes by key) is to keep the
modifications in a Map<key, Modification>. For each key we keep only the
last value set within the tx, so if we modify the same key 1000 times
within a tx we only replicate the last modification, versus the current
approach where all 1000 operations are replicated. If a key is deleted,
we still keep it in the map, with a marker for deletion. This way we only
replicate the delta of modifications, and the order of operations is no
longer relevant, so that we can apply the key ordering described above
and avoid the deadlock.
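Roughly what I have in mind, as a sketch (the class names are made up and
I'm assuming String keys just to illustrate the compaction + deterministic
ordering; the real ordering criterion would have to work for arbitrary keys):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: collapse a tx's operations into one modification per
// key, then apply the delta in a deterministic (sorted) key order on the
// receiving node.
class TxModifications {

   // marker meaning "this key was removed within the tx"
   static final Object REMOVED = new Object();

   // only the last modification per key survives; 1000 puts on the same key
   // replicate as a single entry
   private final Map<String, Object> modifications = new LinkedHashMap<>();

   void put(String key, Object value) { modifications.put(key, value); }

   void remove(String key) { modifications.put(key, REMOVED); }

   // on the node replaying the tx (e.g. C): apply in sorted key order so that
   // Tx1 and Tx2 always acquire locks in the same sequence
   void applyTo(Map<String, Object> cache) {
      for (Map.Entry<String, Object> e : new TreeMap<>(modifications).entrySet()) {
         if (e.getValue() == REMOVED) {
            cache.remove(e.getKey());
         } else {
            cache.put(e.getKey(), e.getValue());
         }
      }
   }
}
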
The advantages of this would be higher throughput, by reducing the chance
of 'asymmetric' deadlocks (A and B encounter a deadlock while replicating
on C), and a possible reduction in the size of the transaction (if it has
multiple operations on the same key). The drawback is mainly that some
additional computation is needed to manage the map content (not a lot I
reckon: instead of a List.add we'll do a map.get() + map.put() for each
modification within the transaction).
Wdyt?
Cheers,
Mircea
Maven.assembly.useTransitiveDependencies true or false? WAS Re: More demo
by Galder Zamarreno
Hmmmmm, this has a domino effect. Because s3 module depends on
infinispan-core, it now pulls all its dependencies, so we need to
exclude them somehow, cos they're dependencies of the core module and
hence they should reside there rather than have them in each and every
module.
Personally, in spite of the repetition, I prefer explicit dependency
definition rather than letting maven bring all transitive dependencies
and then having to go and do exclusions.
More globally, I think this needs to be put to the infinispan-dev list.
During assembly, do we either:
- Leave useTransitiveDependencies as it is, which is false, and make
modules like s3 explicitly define the 3rd party dependencies that
jclouds depends on.
- Or set useTransitiveDependencies to true so that they're all brought
in, and exclude direct infinispan-core dependencies from getting to the
lib directories of submodules such as s3.
For the moment, and to make my life easier getting the demo working, I'm
using the latter option.
Thoughts?
On 06/30/2009 05:47 PM, Adrian Cole wrote:
> beware of the assemblies... they are extremely crufty compared to ant ;)
>
> On Tue, Jun 30, 2009 at 5:45 PM, Galder Zamarreno<
> galder.zamarreno(a)redhat.com> wrote:
>
>>
>> On 06/30/2009 05:22 PM, Adrian Cole wrote:
>>
>>> Well.. :) I setup the dependencies of jclouds, so it shouldn't bring
>>> anything crazy in! Besides, a missing dep might be symptomatic of a
>>> larger
>>> issue in the distribution assembly.
>>>
>> Ahh, maybe just found the reason why transitive dependencies are not
>> included:
>>
>> src/main/resources/assemblies/bin.xml:
>> <useTransitiveDependencies>false</useTransitiveDependencies>
>>
>> I'm trying out with that parameter set to true.
>>
>>
>>
>>> Anyway, let me know how it goes.
>>>
>>> on the SNAPSHOT bit, it was irrelevant to the topic. no worries on that.
>>>
>>> Thanks again for sorting this out, Galder.
>>> -a
>>>
>>> On Tue, Jun 30, 2009 at 5:11 PM, Galder Zamarreno<
>>> galder.zamarreno(a)redhat.com> wrote:
>>>
>>>
>>>> On 06/30/2009 05:00 PM, Galder Zamarreno wrote:
>>>>
>>>>
>>>>> On 06/30/2009 04:55 PM, Adrian Cole wrote:
>>>>>
>>>>> many thanks, Galder. It should be a transitive dep from jclouds-s3...
>>>>>> mvn
>>>>>> dependency:tree
>>>>>>
>>>>>> maybe the distribution isn't pulling in deps properly.
>>>>>>
>>>>>> Yeah, that looks to be the issue. At the moment I'm trying to redefine
>>>>> them in the s3 project but maybe there's another way to do this? Google
>>>>> to the rescue...
>>>>>
>>>>> Hmmm, even though it's repetition, I think it might be safer to
>>>> redefine
>>>> the transitive dependencies rather than let it bring them all. You never
>>>> know what Maven might bring with it...
>>>>
>>>>
>>>>
>>>> I will be cutting a new beta within the next couple days, so maybe you
>>>>>> can
>>>>>> switch temporarily to snapshot to ensure it will work?
>>>>>>
>>>>>> Hmmm, why would I need a snapshot of jclouds? I've simply been adding
>>>>> the transitive dependencies manually in s3/pom.xml
>>>>>
>>>>>
>>>>> Thanks,
>>>>>> -Adrian
>>>>>>
>>>>>> On Tue, Jun 30, 2009 at 4:40 PM, Galder Zamarreno<
>>>>>> galder.zamarreno(a)redhat.com> wrote:
>>>>>>
>>>>>> Hi Adrian,
>>>>>>
>>>>>>> I'm trying to make further progress with the demo and I'm
>>>>>>> encountering the
>>>>>>> following issue:
>>>>>>>
>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>> org.jclouds.logging.config.LoggingModule
>>>>>>>
>>>>>>> The list of jars that is distributed does not include
>>>>>>> jclouds-core-1.0-beta-1.jar and this looks to be cos there's no direct
>>>>>>> dependency on this jar in the pom.xml for the s3 cache store module.
>>>>>>>
>>>>>>> FYI: I'm adding this dependency. I think there's a similar issue with
>>>>>>> google guice...
>>>>>>>
>>>>>>> Anyway, just letting you know of issues as I find them.
>>>>>>>
>>>>>>> See ya,
>>>>>>> --
>>>>>>> Galder Zamarreño
>>>>>>> Sr. Software Engineer
>>>>>>> Infinispan, JBoss Cache
>>>>>>>
>>>>>>>
>>>>>>> --
>>>> Galder Zamarreño
>>>> Sr. Software Engineer
>>>> Infinispan, JBoss Cache
>>>>
>>>>
>> --
>> Galder Zamarreño
>> Sr. Software Engineer
>> Infinispan, JBoss Cache
>>
>
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
DeadlockDetection (DLD) - benchmarks and status
by Mircea Markus
Hi,
I've extended the original DLD design to also support deadlock detection
on local caches and updated the design forum [1].
This, together with the replicated deadlock detection, is implemented in
trunk (some minor stuff still to do: DLD for aggregation methods like
clear and addAll + unit tests).
I've also created a benchmark to compare the throughput (tx/min) of
caches running with and without DLD.
You can find the full test description within the test class:
http://anonsvn.jboss.org/repos/infinispan/trunk/core/src/test/java//org/i...
Local DLD does a good job (cca 5.5 times better) but replicated DLD does
extraordinarily well: cca 101 times better throughput (see attached).
I think DLD is cool stuff and differentiates us a bit from the
competition; afaik none of them have DLD.
One more thing that's worth mentioning: while running the DLD tests I've
noticed that if all txs acquire locks on keys in the same order, then no
deadlocks can occur. This is logical and might seem obvious, but it is
not stated anywhere, and the performance increase from doing this can be
very dramatic. I think we should document this as a best practice when it
comes to transactions - any suggestion where?
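Roughly the sort of snippet such a doc could contain (the plain Map
standing in for the cache and all the names are just illustrative; the
point is only the ordering):

import java.util.Map;
import java.util.TreeMap;

// Illustrative only: whatever keys a tx touches, touch them in one agreed
// order (here the keys' natural order) so two txs can never lock in
// opposite orders.
class OrderedTxUpdate {

   static void update(Map<String, String> cache, Map<String, String> changes) {
      // tm.begin();  // assuming the usual JTA TransactionManager is in scope
      for (Map.Entry<String, String> e : new TreeMap<>(changes).entrySet()) {
         cache.put(e.getKey(), e.getValue()); // locks acquired in ascending key order
      }
      // tm.commit();
   }
}
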
I also intend to blog about it shortly.
Cheers,
Mircea
[1] http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4244838#4244838
JPA and the query API
by Manik Surtani
Guys
I was just wondering whether it makes sense to make the Query API a
public one if we're going to support querying via the JPA interface.
Does it make more sense to only allow querying via the JPA API?
What do folks think? One less API to support?
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
Distribution, take 2
by Manik Surtani
So another good thing that came out of this week's travelling between
JUGs is that I was able to sit down with Adrian Cole in Berlin and,
over a record-breaking number of coffees in a single sitting, we were
able to rehash the distribution algorithm, to remove the weird
complexities and race conditions we have in the current setup. I've
made a note of this below - please have a read through and provide
feedback.
Cheers,
Manik
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org
DIST, take 2!
Previous design failed due to two flaws:
1. Transaction logs maintained on the sender, for each recipient.
Plenty of scope for races, or heavy synchronization.
2. Consistent hash attempted to be overly fair in evenly dispersing
nodes across the hash space. This meant there was often a large and
unnecessary amount of rehashing to do, which exacerbated the problem
in 1.
So we have a new approach, based on the following premises.
1. Consistent hash (CH) based on fixed positions in the hash space
rather than relative ones.
1.1. Pros: limited and finite rehashing, particularly when
there is a leave.
1.1.1. If the leaver is L, only node (L - 1) and node (L + 1) will
have to push state, and only (L + 1) and (L + replCount) will have to
receive state.
1.2. Cons: uneven spread of load (mitigated with grid size)
1.3. Virtual nodes are a good way to reduce this, but leaving VN
out for now since it adds complexity. VN won't change the rest of
the algorithm anyway.
2. State is both pulled (on JOIN) and pushed (on LEAVE)
3. State transfers are implemented using RPC commands and byte
array payloads
3.1. Will later evolve into streaming scheme
4. Implementation notes:
4.1. Each node has a DistributionManager (DM)
4.2. Each DM has a RehashExecutor (with 1 low prio thread)
4.3. LeaveTask and JoinTask are 2 types of tasks handled by this
executor encapsulating the processing that takes place when there is a
leave or join.
4.4. InstallConsistentHashCommand - an RPC command that "installs" a
consistent hash instance on remote nodes.
4.5. GetConsistentHashCommand - an RPC command that "asks" a node to
serialize and transmit across its current CH impl.
4.6. PullStateCommand - an RPC command that "asks" for state.
Command should have an Address of the requestor. Return value for
this command is the state.
4.7. PushStateCommand - an RPC command that "pushes" state by
providing a payload of state.
4.8. Current RPC responses are either SuccessfulResponses or
UnsuccessfulResponses. Introducing a new type, UnsureResponse, to deal
with the eventual consistency semantics of concurrent operations during a
rehash. ClusteredGetResponseFilter should look for a quorum of
UnsureResponses.
5. JoinTask: This is a PULL based rehash. JoinTask is kicked off
on the JOINER.
5.1. Obtain OLD_CH from coordinator (using
GetConsistentHashCommand)
5.2. Generate TEMP_CH (which is a union of OLD_CH and NEW_CH)
5.3. Broadcast TEMP_CH across the cluster (using
InstallConsistentHashCommand)
5.4. Log all incoming writes/txs and respond with a positive ack.
5.5. Ignore incoming reads, forcing callers to check next owner of
data.
5.6. Ping each node in OLD_CH's view and ask for state
(PullStateCommand)
5.7. Apply state received from 5.6.
5.8. Drain tx log and apply, stop logging writes once drained.
5.9. Reverse 5.5.
5.10. Broadcast NEW_CH so this is applied (using
InstallConsistentHashCommand)
5.11. Loop through data container and unicast invalidations for keys
that "could" exist on OLD_CH and not in NEW_CH
6. When a leave is detected (view change with 1 less member):
6.1. Make a note of the Leaver (L)
6.2. All nodes switch to a NEW_CH, which does not include L
6.3. Nodes (L - replCount + 1), (L + 1) and (L + replCount) kick
off a LeaveTask.
6.3.1. Not everyone in the cluster need be involved in this
rehash thanks to fixed CH positions.
7. LeaveTask: This is PUSH based
7.1. If an existing LeaveTask is running, the existing task should
be cancelled first.
7.2. if is_receiver(): Log all writes/incoming txs
7.3. if is_receiver(): Incoming reads:
7.3.1. If the key was owned by L in OLD_CH and not by self, and in
NEW_CH is owned by self, return UnsureResponse. Else,
SuccessfulResponse.
7.4. if is_pusher():
7.4.1. Iterate over all keys in data container. If any key maps
to L in OLD_CH, that key needs to be rehashed. Call
addState(stateMap, k). See below for details of this algo.
7.4.2. Call PushStateCommand for each recipient in stateMap.
7.5. if is_receiver() and on receiving PushStateCommand
7.5.1. Drain tx log and apply, stop logging writes once drained.
7.5.2. Reverse 7.3.
8. A state map is a data structure representing state generated by
each node to be pushed.
8.1. StateMap{
Map<Address, Map<Object, InternalCacheEntry>> state
}
8.2. addState(stateMap, k):
        OL = OLD_CH.locate(k)
        if (OL.contains(L) and
            ((position(L, OL) == last and current_address == L - 1) or
             (position(L, OL) != last and current_address == L + 1))) {
          stateMap.add(NEW_CH.locate(k) - OL, k)
          break
        }
     (a rough Java sketch of 8, 8.2 and 9 follows after this list)
9. Functions to determine who pushes and who receives state during
a leave:
9.1. is_pusher(address): return address == (L + 1) or (L - 1)
9.2. is_receiver(address): return address == (L + 1) or (L +
replCount)
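To make 8, 8.2 and 9 a bit more concrete, here is a rough Java sketch.
All class and method names are illustrative, not actual Infinispan
classes, and it assumes the OLD view's member list (including L) sorted
by fixed hash position:

import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

interface ConsistentHash {
   // owners of k in hash order; the first is the node at/after k's fixed position
   List<Address> locate(Object k, int replCount);
}

class Address { /* placeholder for a node address */ }

class StateMap {
   // recipient -> (key -> entry); entries kept as Object here for brevity
   final Map<Address, Map<Object, Object>> state = new HashMap<>();

   void add(List<Address> recipients, Object key, Object entry) {
      for (Address a : recipients)
         state.computeIfAbsent(a, x -> new HashMap<>()).put(key, entry);
   }
}

class LeaveRehash {
   ConsistentHash oldCH, newCH;
   Address leaver, self;
   int replCount;
   List<Address> membersInHashOrder; // OLD view incl. leaver, sorted by fixed hash position

   // 8.2: decide whether *this* node pushes k, and to whom
   void addState(StateMap stateMap, Object key, Object entry) {
      List<Address> oldOwners = oldCH.locate(key, replCount);
      if (!oldOwners.contains(leaver)) return;          // leaver never owned k

      boolean leaverWasLast = oldOwners.indexOf(leaver) == oldOwners.size() - 1;
      boolean iPush = (leaverWasLast && self.equals(predecessorOf(leaver)))
            || (!leaverWasLast && self.equals(successorOf(leaver)));
      if (!iPush) return;

      // recipients = NEW_CH.locate(k) - OL
      List<Address> recipients = new LinkedList<>(newCH.locate(key, replCount));
      recipients.removeAll(oldOwners);
      stateMap.add(recipients, key, entry);
   }

   // 9.1 / 9.2
   boolean isPusher(Address a) {
      return a.equals(successorOf(leaver)) || a.equals(predecessorOf(leaver));
   }

   boolean isReceiver(Address a) {
      return a.equals(successorOf(leaver)) || a.equals(nthSuccessorOf(leaver, replCount));
   }

   Address successorOf(Address a)   { return nthSuccessorOf(a, 1); }
   Address predecessorOf(Address a) { return nthSuccessorOf(a, membersInHashOrder.size() - 1); }

   Address nthSuccessorOf(Address a, int n) {
      int i = membersInHashOrder.indexOf(a);
      return membersInHashOrder.get((i + n) % membersInHashOrder.size());
   }
}
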
Concurrency
1. Deal with concurrent JOINs
1.1. Concurrent JOINs in completely disjoint parts of the hash space
will have no overlap whatsoever.
1.2. Concurrent JOINs in the same hash space would work as well,
since JOINs are controlled by the joiner.
2. Deal with concurrent LEAVEs
2.1. Concurrent LEAVEs in completely disjoint parts of the hash
space will have no overlap whatsoever.
2.2. Concurrent LEAVEs that do not affect the same nodes involved in
providing or receiving state will not be a problem.
2.3. Concurrent LEAVES affecting the same nodes already providing or
receiving state will interrupt the LeaveTasks on those nodes. Leave
tasks will need to clean up appropriately, which may involve issuing a
"cleanup command" to ensure partially transmitted state is reconciled.
3. Deal with concurrent LEAVE and JOIN
3.1. Concurrent JOIN and LEAVEs in completely disjoint parts of the
hash space will have no overlap whatsoever.
3.2. For overlapping areas of the hash space, this needs further thought.
ISPN-105 and ISPN-106
by Klaus Friedel
Hi !
I recently submitted two RFEs to enhance the Jdbc-Store(s).
The first one suggests adding support for JNDI DataSources.
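For that one, I'm thinking of something along these lines (the class name
and the overall shape are just a sketch, not the actual cache store code;
only the JNDI lookup itself is the point):

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch only: obtain connections from a container-managed DataSource looked
// up via JNDI instead of creating them from a JDBC URL + driver class.
class JndiConnectionFactory {

   private final DataSource dataSource;

   JndiConnectionFactory(String jndiName) throws NamingException {
      // e.g. jndiName = "java:/DefaultDS" on JBoss AS
      this.dataSource = (DataSource) new InitialContext().lookup(jndiName);
   }

   Connection getConnection() throws SQLException {
      return dataSource.getConnection();
   }
}
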
The second one suggests extending the backing JDBC table with a
"cacheName" column to make CacheManagers connected with a Jdbc-Store
usable for more than one Cache.
I could spend some time implementing these.
How should I contribute?
Klaus Friedel
Re: [infinispan-dev] [jboss-dev] Infinispan configuration - all in one solution
by Vladimir Blagojevic
On 7/17/09 5:16 PM, Bill Burke wrote:
> Wouldn't it be better to write a javadoc-like generator that
> introspected for JAXB and Inifispan annotations instead of writing your
> own XML parser and schema generator? Or maybe I'm misunderstanding your
> post.
>
> In other words, use JAXB for marshalling/unmarshalling/schema
> generation. Use inifinispan javadoc generator for documentation.
>
I looked around a bit more and saw that in JAXB 2.x we now have
annotations as well :) If we annotate our configuration beans with JAXB
annotations I suppose we can do exactly what you suggest - use JAXB for
marshalling/unmarshalling/schema generation and at the same time use
those annotations in a custom tool that creates the reference documentation.
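For example, something like this (the element/attribute names are just an
illustration, not the actual Infinispan configuration beans):

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

// Sketch of a JAXB-annotated configuration bean: the same annotations drive
// unmarshalling of the XML, schema generation (e.g. via schemagen), and could
// be introspected by a javadoc-like tool to produce reference documentation.
@XmlRootElement(name = "locking")
@XmlAccessorType(XmlAccessType.FIELD)
class LockingConfig {

   @XmlAttribute
   private long lockAcquisitionTimeout = 10000;

   @XmlAttribute
   private int concurrencyLevel = 32;

   public long getLockAcquisitionTimeout() { return lockAcquisitionTimeout; }

   public int getConcurrencyLevel() { return concurrencyLevel; }
}
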
> By creating your own XML parsing you've just complicated the maintenance
> problems for future maintainers of infinispan.
>
I agree.
Cheers,
Vladimir
Clustering mode attribute
by Vladimir Blagojevic
Hey dudes,
The ambiguity that we allow in the mode attribute of the <clustering>
element is not good IMHO. Normally, we use the allowedValues attribute of
the ConfigurationElement annotation to capture that a type, even though
it is a String or int or whatever, only allows a certain set of values.
We properly constrain locking, shutdownHook and what not. However, for
the mode attribute in <clustering> we allow all possible combinations of
substrings, upper case, lower case and what not.
I was wondering if we can constrain the allowed values to LOCAL,
INVALIDATION, REPL, and DIST. The presence of either an <async> or a
<sync> subelement of <clustering> determines whether it is synchronous or
not. Having these clear constraints means they will be automatically
documented in the configuration reference and reflected in the generated
schema. The downside is that current alpha customers who created their
own configuration files will have to change their values to the new ones.
WDYT?
Vladimir