Re: [infinispan-dev] infinispan-dev Digest, Vol 22, Issue 30
by Ales Justin
> The last item Manik and I disagree on is use of DistributedTaskContext.
> DistributedTaskContext is given to each DistributedCallable once it has
> migrated to remote node for execution. DistributedTaskContext might
> evolve and I'd rather keep it in the framework while Manik wants to have
> a simple setter on DistributedCallable:
>
> setEnvironment(Cache, K...)
>
> I think of it as an insurance policy in case we need to bootstrap
> DistributedCallable with more parameters rather than only Cache and
> input keys K.
>
> Let's hear your thoughts and comments.
I like Vladimir's suggestion better.
At least in MC this design proved useful,
especially for the reason mentioned: evolution.
OTOH, we have a ton of features in MC, probably just from this decision,
as it was super easy to add them, but no one uses them. :-)
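To make the trade-off above concrete, here is a minimal sketch of the two shapes under discussion. Only the names DistributedCallable, DistributedTaskContext and setEnvironment come from the thread; every type below is a hypothetical stand-in, a plain Map plays the role of the Cache, and a List replaces the K... varargs to keep the generics simple. It is not the code from the proposal branches.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; a plain Map plays the role of an Infinispan Cache.

// Option 1 (simple setter): the framework injects exactly a cache and input keys.
interface SetterStyleCallable<K, V, T> {
    void setEnvironment(Map<K, V> cache, List<K> inputKeys);
    T call();
}

// Option 2 (context object): accessors can be added to the context later
// without changing the method signature that every callable implements.
interface TaskContext<K, V> {
    Map<K, V> getCache();
    List<K> getInputKeys();
}

interface ContextStyleCallable<K, V, T> {
    T call(TaskContext<K, V> context);
}

final class ContextVersusSetterSketch {
    // Sums the cached values of the input keys through the setter-style API.
    static int sumWithSetter(Map<String, Integer> cache, List<String> keys) {
        SetterStyleCallable<String, Integer, Integer> task =
            new SetterStyleCallable<String, Integer, Integer>() {
                private Map<String, Integer> injectedCache;
                private List<String> injectedKeys;

                public void setEnvironment(Map<String, Integer> c, List<String> k) {
                    this.injectedCache = c;
                    this.injectedKeys = k;
                }

                public Integer call() {
                    int sum = 0;
                    for (String key : injectedKeys) sum += injectedCache.getOrDefault(key, 0);
                    return sum;
                }
            };
        task.setEnvironment(cache, keys); // in the real design the framework would do this
        return task.call();
    }

    // Same computation through the context-style API.
    static int sumWithContext(final Map<String, Integer> cache, final List<String> keys) {
        TaskContext<String, Integer> context = new TaskContext<String, Integer>() {
            public Map<String, Integer> getCache() { return cache; }
            public List<String> getInputKeys() { return keys; }
        };
        ContextStyleCallable<String, Integer, Integer> task = ctx -> {
            int sum = 0;
            for (String key : ctx.getInputKeys()) sum += ctx.getCache().getOrDefault(key, 0);
            return sum;
        };
        return task.call(context);
    }
}
```

With the setter, a new bootstrap parameter means a new setEnvironment signature for every implementer; with the context, it means one more accessor on TaskContext.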
14 years
Distributed execution framework - API proposal(s)
by Vladimir Blagojevic
Hey,
I spent the last week working on concrete API proposals for the distributed
execution framework. I believe that we are close to finalizing the
proposal and your input and feedback is important now! Here are the main
ideas where I think we made progress since we last talked.
Access to multiple caches during task execution
While we have agreed to allow access to multiple caches during task
execution, including this logic in the task API complicates it greatly. The
compromise I found is to focus the whole API on one specific cache but
allow access to other caches through the DistributedTaskContext API. The
focus on one specific cache and its input keys will allow us to
properly CH map task units across the Infinispan cluster and will cover most
of the use cases. DistributedTaskContext can also easily be mapped to a
single cache. See DistributedTask and DistributedTaskContext for more
details.
DistributedTask and DistributedCallable
I found it useful to separate task characteristics in general and actual
work/computation details. Therefore the main task characteristics are
specified through DistributedTask API and details of actual task
computation are specified through DistributedCallable API.
DistributedTask specifies coarse task details, the failover policy, the
task splitting policy, cancellation policy and so on while in
DistributedCallable API implementers focus on actual details of a
computation/work unit.
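As a rough illustration of that separation, the sketch below keeps coarse policy in a descriptor and the computation in a plain Callable. The class name TaskDescriptorSketch, the maxFailoverAttempts field and the retry loop are all assumptions made for this example; the real DistributedTask and DistributedCallable interfaces are in the linked branches and may look quite different.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: coarse, task-wide policy lives in the descriptor,
// while the Callable holds only the actual computation.
final class TaskDescriptorSketch<T> {
    private final Callable<T> work;        // the unit of computation
    private final int maxFailoverAttempts; // stand-in for a failover policy

    TaskDescriptorSketch(Callable<T> work, int maxFailoverAttempts) {
        if (maxFailoverAttempts < 0) {
            throw new IllegalArgumentException("maxFailoverAttempts must be >= 0");
        }
        this.work = work;
        this.maxFailoverAttempts = maxFailoverAttempts;
    }

    // Runs the work, retrying after a failure until the failover budget is spent.
    T execute() throws Exception {
        Exception lastFailure = null;
        for (int attempt = 0; attempt <= maxFailoverAttempts; attempt++) {
            try {
                return work.call();
            } catch (Exception e) {
                lastFailure = e; // a real framework would pick another node here
            }
        }
        throw lastFailure;
    }
}
```

The point of the split is that the failover rule can change without touching the code that does the work.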
I have updated the original document [1] to reflect the API update. You can
see the actual proposal in git here [2] and I have also included a
variation of this approach [3] that separates map and reduce task phases
with separate interfaces and removes the DistributedCallable interface. I
have also kept Trustin's ideas in another proposal [4] since I would
like to include them as well if possible.
Regards,
Vladimir
[1] http://community.jboss.org/wiki/InfinispanDistributedExecutionFramework
[2] https://github.com/vblagoje/infinispan/tree/t_ispn-39_master_prop1
[3] https://github.com/vblagoje/infinispan/tree/t_ispn-39_master_prop2
[4] https://github.com/vblagoje/infinispan/tree/t_ispn-39_master_prop3
14 years
ISPN-872 - MIME type for pure binary forms?
by Galder Zamarreño
Hi all,
Re: https://issues.jboss.org/browse/ISPN-872
As stated in my last comment, it appears to me that Infinispan's REST server is not distinguishing between a byte array that comes from a Java client that serialized an object, and a pure byte[] that is not necessarily a serialized form of a Java object.
The test added by Michal might not make sense to a lot of you (http://goo.gl/8qfnw - testByteArrayStorage), but the underlying issue is still present.
Looking back at past tests, IntegrationTest.testSerializedObjects is not correct. The byte array passed is not "application/x-java-serialized-object". In fact, the server tries to deserialize it and fails, which leads to the byte[] being stored as is, but still under the "application/x-java-serialized-object" banner.
My question is, what should be the type to use for pure byte arrays? Should it be "application/octet-stream"? Any other suggestions?
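One way to picture the distinction is to look at the stream header that Java serialization writes at the start of its output. The sketch below is only a heuristic under the assumption that "application/octet-stream" is the label chosen for raw bytes; the class and method names are made up for illustration and this is not what the REST server does. A raw byte[] could also begin with the same four bytes by coincidence, so an explicit content type from the client remains the reliable signal.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Infinispan.
final class ContentTypeGuess {
    static final String SERIALIZED = "application/x-java-serialized-object";
    static final String BINARY = "application/octet-stream";

    // True when the payload starts with the Java serialization stream header:
    // the two-byte magic 0xACED followed by the two-byte stream version.
    static boolean looksLikeJavaSerialized(byte[] data) {
        if (data == null || data.length < 4) return false;
        int magic = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
        int version = ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
        return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)
            && version == ObjectStreamConstants.STREAM_VERSION;
    }

    // Picks a label for the payload; anything without the header is plain binary.
    static String guess(byte[] data) {
        return looksLikeJavaSerialized(data) ? SERIALIZED : BINARY;
    }

    // Serializes a value with standard Java serialization (used to build test input).
    static byte[] serialize(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```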
Cheers,
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
14 years
Amanuensis: the infinispan clustered indexwriter
by Tristan Tarrant
Dear all,
just wanted to let you know that I have published Amanuensis on GitHub:
https://github.com/tristantarrant/amanuensis
As the subject says, Amanuensis is a clustered IndexWriter for Infinispan's
Lucene Directory which overcomes the usual limitation of having only one
writer open at any one time on a Directory by using JGroups muxed channels
to stream changes from the slaves to the coordinator.
I have to thank Sanne for help in learning about JGroups muxed channels and
also for pointing me at Hibernate Search's backend. My approach is nearly
identical, but I wanted something that was a bit closer to a Lucene
IndexWriter.
Tell me what you think.
Tristan
14 years
5.0.0.ALPHA2
by Galder Zamarreño
I plan to release an ALPHA2 sometime mid next week (Tue/Wed), primarily with the changes associated with https://issues.jboss.org/browse/ISPN-857.
Vladimir, I believe you might have an initial impl of the executors ready for then?
If anyone else has anything urgent to add to this release, let me know.
Cheers,
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
14 years
Re: [infinispan-dev] XAResource.isSameRM
by Mircea Markus
On 5 Jan 2011, at 17:19, Jonathan Halliday wrote:
> On 01/05/2011 03:42 PM, Mircea Markus wrote:
>>
>> On 5 Jan 2011, at 14:51, Mircea Markus wrote:
>>
>>> FYI, a discussion I had with Jonathan around recovery support from TM
>>>
>>> On 5 Jan 2011, at 14:43, Jonathan Halliday wrote:
>>>> On 01/05/2011 02:18 PM, Mircea Markus wrote:
>>>>
>>>>> I don't know how the TM recovery process picks up the XAResource instance on which to call XAResource.recover, but I imagine it expects this method to return all the prepared (or heuristically completed) transactions from the _whole transaction branch_, i.e. from the entire cluster.
>>>>
>>>> all from the logical RM, which you happen to implement as a cluster, yes.
>>>>
>>>>> I'm asking this because right now there's no way for a node to know all the prepared transactions in the entire cluster. This is doable but would involve a broadcast to query the cluster, which might be costly (time and bandwidth).
>>>>
>>>> right. not to mention it should, strictly speaking, block or fail if any node is unreachable, which kinda sucks from an availability perspective.
>> So if a node does not respond to the broadcast, it is incorrect to return the prepared transactions received from the other nodes? (is this because the TM expects to receive some tx that it knows for sure to be prepared?) Or would a "best effort" be "good enough"? (e.g. I broadcast the query and return all the results received in 1 sec)
>
> hmm, interesting question.
>
> Keep in mind that the XA spec dates from a time when a typical large clustered RM was 2-3 oracle nodes on the same LAN segment. It simply isn't geared to a world where the number of nodes is so large and widely distributed that the probability of *all* of them being available simultaneously is pretty slim. Likewise the number of transaction managers connected to a resource was assumed to be small, often 1, rather than the large N we see on modern clusters / clouds. As a result, the spec either fails to give guidance on some issues because they weren't significant at the time it was written, or implies/mandates behaviour that is counter productive in modern environments.
>
> Thus IMO some compromises are necessary to make XA usable in the real world, especially at scale. To further complicate matters, these are split across RM and TM, with different vendors having different views on the subject. My advice is geared to the way JBossTS drives XA recovery - other TMs may behave differently and make greater or lesser assumptions about compliance with the letter of the spec. As a result you may find that making your RM work with multiple vendor's TMs requires a) configuration options and b) a lot of painful testing. Likewise JBossTS contains code paths and config options geared to dealing with bugs or non-compliant behaviour in various vendor's RMs.
>
> Now, on to the specific question: The list returned should, strictly speaking, be complete. There are two problems with that. First, you have to be able to reach all your cluster nodes to build a complete list which, as previously mentioned, is pretty unlikely in a sufficiently large cluster. Your practical strategies are thus as you say: either a) throw an XAException(XAER_RMFAIL) if any node is unreachable within a reasonable timeout and accept that this may mean an unnecessary delay in recovering the subset of tx that are known or b) return a partial list on a best effort basis. The latter approach allows the transaction manager to deal with at least some of the in-doubt tx, which may in turn mean releasing resources/locks in the RM. In general I'd favour that option as having higher practical value in terms of allowing the best possible level of service to be maintained in the face of ongoing failures.
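The two strategies described above, (a) failing with XAER_RMFAIL when a node is unreachable and (b) returning a partial list on a best-effort basis, can be sketched as follows. This is an assumption-laden illustration: transaction ids are plain strings rather than Xid objects, each node's reply is modelled as an Optional that is empty when the node did not answer within the timeout, and RecoveryScanSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.transaction.xa.XAException;

// Hypothetical sketch of the list-building step behind XAResource.recover.
final class RecoveryScanSketch {
    // nodeReplies holds one entry per cluster node; Optional.empty() means the
    // node did not respond within the timeout.
    static List<String> collectInDoubt(List<Optional<List<String>>> nodeReplies,
                                       boolean bestEffort) throws XAException {
        List<String> inDoubt = new ArrayList<>();
        for (Optional<List<String>> reply : nodeReplies) {
            if (reply.isPresent()) {
                inDoubt.addAll(reply.get());
            } else if (!bestEffort) {
                // Strategy (a): refuse to return an incomplete list.
                throw new XAException(XAException.XAER_RMFAIL);
            }
            // Strategy (b): skip the silent node and report what is known.
        }
        return inDoubt;
    }
}
```

Under strategy (b), transactions held by a node that was silent during this scan simply show up in a later scan once the node is reachable again.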
>
> JBossTS will rescan every N minutes (2 by default) and thus you can simply include any newly discovered in-doubt tx as they become known due to e.g. partitioned nodes rejoining the cluster, and the TM will deal with them when they are first seen. Note however that some TMs assume that if they scan an RM and that RM does not subsequently crash, no new in-doubt transactions will occur except from heuristics. Let's gloss over how they can even detect a crash/recover of the RM if the driver masks it with failover or the event happens during a period when the TM makes no call on the driver. Such a TM will perform a recovery scan once at TM startup and not repeat. In such a case you may have in-doubt tx from nodes unavailable at that crucial time subsequently sitting around for a prolonged period, tying up precious resources and potentially blocking subsequent updates. Most RM vendors provide some kind of management capability for admins to view and manually force completion of in-doubt tx: command line tool, JMX, web GUI, whatever, just so long as it exists.
When a node crashes, all the transactions that node owns (i.e. tx which originated on that node, with the XAResource instance residing on that node) are automatically rolled back, so that no resources (locks mainly) are held. The only thing we need to make sure of, though, is that the given transaction ids (the ones that were heuristically rolled back) are returned by the XAResource.recover method - doable in the same way we handle prepares. I imagine that we'll have to keep these XIDs until XAResource.forget(XID) is called, am I right? Is it common/possible for people to use a TM _without_ recovery? If so, this "held heuristic completed TX" functionality should be configurable (enabled/disabled) in order to avoid memory leaks (no recovery means .forget never gets called).
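A minimal sketch of that idea, assuming the answer to the question is "yes, keep them until forget": ids of heuristically rolled-back transactions are retained until forget is called, with a switch to disable retention when no recovery is in use. The class name, the retainUntilForget flag and the use of strings for transaction ids are all hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of heuristic outcomes; not Infinispan code.
final class HeuristicOutcomeRegistry {
    private final boolean retainUntilForget; // hypothetical configuration switch
    private final Set<String> heuristicallyRolledBack = ConcurrentHashMap.newKeySet();

    HeuristicOutcomeRegistry(boolean retainUntilForget) {
        this.retainUntilForget = retainUntilForget;
    }

    // Called when a crashed node's transaction is rolled back on its behalf.
    void recordHeuristicRollback(String txId) {
        if (retainUntilForget) {
            heuristicallyRolledBack.add(txId);
        }
        // With retention disabled nothing is stored, so nothing can leak.
    }

    // Ids that a recover scan should report alongside the prepared ones.
    Set<String> recoverCandidates() {
        return Collections.unmodifiableSet(new HashSet<>(heuristicallyRolledBack));
    }

    // Mirrors XAResource.forget: the TM has acknowledged the outcome.
    void forget(String txId) {
        heuristicallyRolledBack.remove(txId);
    }
}
```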
> Another interesting issue is what constitutes an 'in-doubt' tx. Pretty much all RMs will include heuristically completed tx in the recovery list. Some will include tx branches that have prepared but not yet committed or rolled back. Some will include such only if they have been in the prepared state for greater than some threshold length of time (a few seconds i.e. a couple of orders of magnitude longer than a tx would normally be expected to hold that state). There is also the question of when a tx should be removed from the list. The wording of the spec
>
> 'Two consecutive invocation of [recover] that starts from the beginning of the list must return the same list
> of transaction branches unless one of the following takes place:
> - the transaction manager invokes the commit, forget, prepare, or rollback method for that resource
> manager, between the two consecutive invocation of the recovery scan
> ...'
>
> seems to imply a single transaction manager.
Doesn't this also imply that the prepare threshold isn't the spec's way? I.e. even though the TM doesn't call any method on the RM, the RM returns a new XID in the result of XAResource.recover once the threshold is reached.
> In cases where more than one TM is connected to the RM, the list clearly can't be considered stable between calls, as that would require prolonged blocking. Thus a reasonable TM should not expect a stable list. However, it's less clear how it should react to items that appear and disappear arbitrarily over consecutive calls to recover.
>
> In the case of JBossTS, it assumes the RM may include tx branches that are actually proceeding normally in the results of a recovery scan. It thus caches the list, sleeps for an interval considered long enough for any normal prepared tx to complete, then rescans and compares the results. Any tx appearing in both scans is considered genuinely in need of recovery. If an RM includes in its recover results a normally proceeding tx branch and the TM does not perform such a backoff, it may, in a clustered environment, rollback a tx branch that another TM will try to commit a split second later, thus resulting in an unnecessary heuristic outcome. The worst scenario I've ever seen was a certain large DB vendor who considered tx to be in-doubt not just from the time they were prepared, but from the moment they were started. ouch.
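The scan, back off, rescan, compare behaviour described above boils down to a set intersection. The sketch below illustrates that idea only; it is not JBossTS code, the names are invented, and transaction ids are plain strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration of a two-pass recovery scan.
final class TwoPassRecoverySketch {
    // Keeps only the ids reported by both scans; an id seen once may belong to
    // a transaction that was merely passing through the prepared state.
    static Set<String> confirmedInDoubt(Set<String> firstScan, Set<String> secondScan) {
        Set<String> confirmed = new LinkedHashSet<>(firstScan);
        confirmed.retainAll(secondScan);
        return confirmed;
    }

    // Scans, waits long enough for normal prepared tx to complete, scans again.
    static Set<String> scanTwice(Supplier<Set<String>> recoverScan, long backoffMillis) {
        Set<String> first = new LinkedHashSet<>(recoverScan.get());
        try {
            Thread.sleep(backoffMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", e);
        }
        return confirmedInDoubt(first, recoverScan.get());
    }
}
```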
>
> Naturally most well behaved TMs will have some sense of tx branch ownership and not recover unrecognised tx, but it's one of those delightful gray areas where you may benefit from being a little paranoid.
So each TM would only care to recover the transactions it manages? Sort of makes sense.
> Which, as it happens, gives you a perfect excuse to return a partial list of tx - in terms of the way TM recovery works there is nothing to distinguish a tx that's missing due to node outage from one that is missing due to not yet having reached the in-doubt time threshold.
>
> Of course it does not end with the recovery scan. Let's take the case where a specific tx is identified as in-doubt and returned from the recover call. The TM may then call e.g. commit on it. Processing that commit may involve the driver/server talking to multiple cluster nodes, some of which may not respond. Indeed this is the case during a normal commit too, not just one resulting from a recovery scan. You need to think very carefully about what the result of a failed call should be. A lot of bugs we've seen result from resource managers using incorrect XAException codes or transaction managers misinterpreting them. Be aware of the semantics of specific error codes and be as concise as possible when throwing errors.
Good to know! Something to discuss/review at our next meeting.
>
>> I would expect a single (periodic) XAResource.recover call per cluster (assuming the cluster is the RM) / transaction manager. Am I wrong?
>
> Only in so far as you are assuming one logical TM per cluster, with cluster here meaning JBossAS cluster as distinct from a possible separate infinispan RM cluster.
>
> In practice although a group of JBossAS servers may be clustering e.g. web sessions or EJBs, they don't cluster TMs. Each JBossAS instance is a logically separate and independently functioning TM and each is responsible for running its own recovery. Note that there are ways of clustering the JTS version of the TM, but we don't advertise or support them and as a result to the best of my knowledge no end user actually runs with such a configuration. Likewise for running a single out of process recovery subsystem which is shared by multiple TMs/JBossAS nodes.
>
>>>> you mean a logically separate resource manager, yes. You are basically talking about not doing interposition in the driver/server but rather relying on the transaction manager to handle multiple resources. It may make your implementation simpler but probably less performant on the critical path (transaction commit) vs. recovery.
>> I don't know exactly how many RPCs happen in this approach, with the TM handling multiple resources. I imagine the TM would do an XAResource.prepare for each of the nodes involved. In this XAResource.prepare call I would have to implement the logic of going remotely to each involved node. Then the same for XAResource.commit. Is that so? (If so then this is pretty much what we already do in ISPN when it comes to commit/rollback).
>> One of the advantages of allowing the TM to handle each individual node is that we can benefit from some nice TM features like read-only optimisation or 1PC for single participants (these are to be implemented anyway in ISPN).
>
> Although the RPC topology can matter, particularly where WAN hops are involved, the critical difference is actually where the logging responsibility lies.
>
> If the TM has more than one XAResource enlisted, it has to write to disk between the prepare and commit phases. So where it has got e.g. two infinispan nodes as separate XAResources, the tx is bottlenecked on disk I/O even though the RM is entirely in (network distributed) RAM.
Right! I was looking in the wrong place.
>
> Where the clustered RM presents itself as a single resource, the TM won't necessarily log, due to automatic one phase commit optimisation.
>
> However...
>
> In that case your driver/server has to guarantee correct behaviour in the event of node failure during the commit call. In other words, it has to do its own persistence if the state updates span more than one cluster node. In the case of infinispan that's likely to be via additional RPCs for replication rather than by a disk write as in most database cluster RMs.
>
> In short, you're trading disk I/O in the TM against additional network I/O in the clustered RM. All else being equal I think having the RM do the work will perform better at scale, but that's just a guess.
+1
> You more or less need the code for that either way, as even where a single infinispan node is a logically separate RM, it still needs to talk to its cluster mates (or the disk I guess) for persistence or it can't guarantee durability of the tx.
Thanks Jonathan!
>
> Jonathan.
>
> --
> ------------------------------------------------------------
> Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom.
> Registered in UK and Wales under Company Registration No. 3798903 Directors: Michael Cunningham (USA), Charlie Peters (USA), Matt Parsons (USA) and Brendan Lane (Ireland)
14 years
Re: [infinispan-dev] Can we replace Hypersonic with...
by Manik Surtani
Ah right, I get your point. So there are 2 cases:
1) When the failing resource manager is *not* an Infinispan node: returning a list of prepared or heuristically committed Xids is trivial for Infinispan since (a) we maintain an internal list of prepared txs, and (b) we don't heuristically commit. (Mircea can confirm this)
2) When the failing resource manager is an Infinispan node (failed and restarted, for example), and the TM calls recover on this node. In this case, recover will return a null, which, correctly, will lead the TM to believe that there are no prepared or heuristically committed txs on this node - which is correct, since the node would have been reset to the last-known stable state prior to the failure.
So what we have right now is the ability to deal with case (2). Case (1) should be implemented as well to be a "good participant" in a distributed transaction.
Adding infinispan-dev in cc, as this would be of interest there.
Cheers
Manik
On 16 Dec 2010, at 10:36, Mark Little wrote:
> So XAResource.recover and the XA protocol have well defined semantics for recovery. If you do a no-op and there are resources that need recovering, the transaction may still not be entirely ACID even though the transaction manager believes it to be the case. Unless you do the recovery for the nodes in the recover call and return a null list of Xids, your XAResource implementation is breaking the XA protocol.
>
> Mark.
>
>
> On 16 Dec 2010, at 10:31, Manik Surtani wrote:
>
>>
>> On 16 Dec 2010, at 10:23, Mark Little wrote:
>>
>>> So what happens when the recover method is invoked on the Infinispan XAResource implementation? I'm assuming it obeys the protocol if you're saying "we do support XA" ;-)
>>
>> Well, this is what I mean by "we don't support recover" right now. At the moment recover() is a no-op and we just log it, expecting manual intervention (node restart), but this should be automated (wipe in-memory state and rejoin the cluster).
>>
>>
>>>
>>> Mark.
>>>
>>>
>>> On 16 Dec 2010, at 10:18, Manik Surtani wrote:
>>>
>>>> Well, it hinges on how we implement recover. Recovery for Infinispan is, simply, restarting the node at fault and allowing it to regain state from a neighbour. As opposed to more "traditional" impls of XA recovery, involving maintaining a tx log (fsync'd to disk). One may say then that we do support recovery, only that the tx log is maintained "in the cluster".
>>>>
>>>> On 16 Dec 2010, at 09:23, Mark Little wrote:
>>>>
>>>>> So we support a bit of XA then, i.e., not the recover operation?
>>>>>
>>>>> Mark.
>>>>>
>>>>>
>>>>> On 15 Dec 2010, at 17:29, Manik Surtani wrote:
>>>>>
>>>>>>
>>>>>> On 15 Dec 2010, at 17:24, Bill Burke wrote:
>>>>>>
>>>>>>>>
>>>>>>>> eh - you would have the same problems with Infinispan as with Hypersonic explaining users that if you want ACID database access you need
>>>>>>>> to use a real database and not a glorified hashmap ;)
>>>>>>>>
>>>>>>>
>>>>>>> sounds like a good feature request, to support XA/recovery. If you're gonna use Infinispan for a data grid, prolly a lot of people gonna want this.
>>>>>>
>>>>>> We do support XA. Not recovery though - since it is a p2p grid. ("Recovering" would simply involve the node wiping in-memory state, and re-joining the cluster since non-corrupted copies of its data exist elsewhere in the cluster).
>>>>>>
>>>>>> Cheers
>>>>>> Manik
>>>>>>
>>>>>> --
>>>>>> Manik Surtani
>>>>>> manik(a)jboss.org
>>>>>> twitter.com/maniksurtani
>>>>>>
>>>>>> Lead, Infinispan
>>>>>> http://www.infinispan.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> ---
>>>>> Mark Little
>>>>> mlittle(a)redhat.com
>>>>>
>>>>> JBoss, by Red Hat
--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org
14 years
Feedback from Mobicents Cluster Framework on top of Infinispan 5.0 Alpha1
by Eduardo Martins
Hi all, I just completed the first iteration of Mobicents Cluster Framework
2.x, which includes an impl using the first alpha release of Infinispan 5. We
have everything we had in jboss cache working; it's a dumbed-down but
higher-level framework, with stuff like fault-tolerant timers, which
is then reused across the whole Mobicents platform to provide cluster-related
features. I believe we now already use a lot of stuff from Infinispan's
rich feature set, so I guess it's good timing to give some feedback,
report some minor issues, and clear up some doubts.
1) Marshallers
Generally I'm "just OK" with the current API. The fact that
Externalizers depend on annotations makes it impossible for a higher
level framework to allow its clients to plug in their own kind of
Externalizers without exposing Infinispan classes. Concretely, our
framework exposes its own Externalizer model, but to use them we need
to wrap the instances of the classes these handle in classes bound to
Infinispan Externalizers, which when invoked by Infinispan then look up
our "own" Externalizers, that is, 2 Externalizer lookups per call. If
no annotations were required we could instead wrap our clients'
Externalizers and plug them into Infinispan, which would mean a single
Externalizer lookup. By the way, since the Externalizer now uses
generics the annotation values look bad, especially since it's
possible to introduce errors that still compile.
Another issue, which I consider of minor importance due to the
low-level nature of the Externalizer concept, is the lack of inheritance for
Externalizers. For instance, we extend DefaultConsistentHash; instead
of extending its Externalizer I had to write an Externalizer from
scratch and copy DefaultConsistentHash's Externalizer code. This is
messy to manage: if the code in Infinispan changes we will always have
to check whether the code we copied is still valid.
2) CacheContainer vs CacheManager
The API is now deprecating CacheManager, replacing it with
CacheContainer, but some functionality is missing. For instance,
CacheContainer is not a Listenable, so there is no option to register
listeners except by unsafely casting the instance exposed by the Cache,
or by keeping a ref to the original CacheManager. On a related matter,
IMHO a Cache listener should be able to listen to CacheManager events,
becoming a global listener.
3) Configuration stuff
Generally I think Configuration and GlobalConfiguration could be
simplified a bit; I found myself several times looking at the impl
code to understand how to achieve some configurations. Better
documentation wouldn't hurt either. It's great to have a complete
reference, but the configuration samples are not OK: one is basically
empty, the other has all possible stuff and is very unrealistic. It would be
better to have reusable examples for each mode, followed by
recommendations on how to improve them.
Infinispan 5 introduces a new global configuration setter to provide
an instance, with same method name as the one to provide the class
name. I believe one is enough, and to be more friendly with
Microcontainer and similar frameworks I would choose the one to set
the instance.
4) AtomicMap doubt
I read in the Infinispan blog that AtomicMap provides colocation of
all entries; is that idea outdated? If not we may need a way to turn
that off :) For instance, wouldn't that mean the Tree API does not
work well with distribution mode? I apologize in advance if I'm
missing something, but if AtomicMap defines colocation, AtomicMap is
good for the node's data map, but not for the node's child FQNs.
Shouldn't each child FQN be freely distributed, being colocated
instead with the related node cache entry and data (atomic) map? Our
impl is kind of a "hybrid" of the Tree API: it allows cache entry
references (similar to children) but no data map, and storing the
references through AtomicMap in the same way as the Tree API worries me.
Please clarify.
5) Minor issues found
I see these a lot; forgotten info logging?
03:39:06,603 INFO [TransactionLoggerImpl] Starting transaction logging
03:39:06,623 INFO [TransactionLoggerImpl] Stopping transaction logging
MBean registration tries the same MBean twice; the second attempt fails
and prints a log message (no harm besides that, the process continues without
failures):
03:39:06,395 INFO [ComponentsJmxRegistration] Could not register
object with name:
org.infinispan:type=Cache,name="___defaultcache(dist_sync)",manager="DefaultCacheManager",component=Cache
6) Final thoughts
I kind of feel bad providing all this negative stuff in a single mail
(it took me an hour to write), but don't get me wrong, I really enjoy
Infinispan, it's a great improvement. I'm really excited to have it
plugged into AS7 (any plans for this?) and then migrate our platform to
this new cluster framework. I expect a big performance improvement on
something already pretty fast, and much lower memory usage; our
Infinispan impl "feels" very fine-grained and optimized from all points
of view. Of course, the distribution mode is the cherry on top of the
cake: hello true scalability.
I hope to find time to contribute back more, and in better ways, like
concrete enhancements or issues with test cases as happened with jboss
cache, but right now that's the best I could do. By the way, I'm using
the nick mart1ns in the infinispan freenode irc channel, feel free to
ping me there.
Regards,
-- Eduardo
..............................................
http://emmartins.blogspot.com
http://redhat.com/solutions/telco
14 years
Re: [infinispan-dev] XAResource.isSameRM
by Mircea Markus
FYI, a discussion I had with Jonathan around recovery support from TM
On 5 Jan 2011, at 14:43, Jonathan Halliday wrote:
> On 01/05/2011 02:18 PM, Mircea Markus wrote:
>
>> I don't know how the TM recovery process picks up the XAResource instance on which to call XAResource.recover, but I imagine it expects this method to return all the prepared (or heuristically completed) transactions from the _whole transaction branch_, i.e. from the entire cluster.
>
> all from the logical RM, which you happen to implement as a cluster, yes.
>
>> I'm asking this because right now there's no way for a node to know all the prepared transactions in the entire cluster. This is doable but would involve a broadcast to query the cluster, which might be costly (time and bandwidth).
>
> right. not to mention it should, strictly speaking, block or fail if any node is unreachable, which kinda sucks from an availability perspective.
>
> Keep in mind the global list of known in-doubt tx is a point in time snapshot anyhow - in an active system it's out of date as soon as another running tx is prepared. So, you can serve one that's slightly stale without too much risk. Not that periodic broadcast of the in-doubt list between nodes is necessarily better than doing it on-demand in response to a recovery call, but at least it's O(1) rather than O(number of clients calling recover). The (mild) problem we've seen in the past is where a large cluster of app server nodes, i.e. tx managers, is started more or less simultaneously and the RMs get a storm of recovery requests every two minutes. If the impl of that is expensive and not cached it can cause performance spikes in the RM.
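Serving a slightly stale snapshot, as suggested above, could look roughly like the sketch below: the expensive cluster query runs at most once per maxAgeMillis regardless of how many transaction managers call recover. All names are hypothetical, ids are plain strings, and the clock is injected only so the behaviour can be exercised without waiting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical time-bounded cache of the cluster-wide in-doubt list.
final class CachedInDoubtList {
    private final Supplier<List<String>> clusterQuery; // stand-in for the broadcast
    private final long maxAgeMillis;
    private final LongSupplier clock;                  // e.g. System::currentTimeMillis
    private List<String> snapshot;
    private long takenAt;

    CachedInDoubtList(Supplier<List<String>> clusterQuery, long maxAgeMillis, LongSupplier clock) {
        this.clusterQuery = clusterQuery;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    // Returns the cached snapshot, refreshing it only when it is too old.
    synchronized List<String> get() {
        long now = clock.getAsLong();
        if (snapshot == null || now - takenAt >= maxAgeMillis) {
            snapshot = Collections.unmodifiableList(new ArrayList<>(clusterQuery.get()));
            takenAt = now;
        }
        return snapshot;
    }
}
```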
>
> That said, keep in mind a recovery pass is (in the default config) only every two minutes and run on a background thread. It's not something you want to worry about performance optimization of in the initial implementation. Do it right rather than fast. Optimize it only when users scream.
>
>> On the other hand I imagine this call is performed asynchronously and doesn't impact TM performance in managing ongoing transactions, so it might not be that bad after all.
>
> correct
>
>> Another approach would be to consider each node as a transaction branch.
>
> you mean a logically separate resource manager, yes. You are basically talking about not doing interposition in the driver/server but rather relying on the transaction manager to handle multiple resources. It may make your implementation simpler but probably less performant on the critical path (transaction commit) vs. recovery.
>
>
>> The advantage here is that recovery can be easily implemented, as the TM recovery would ask all the nodes that were registered for prepared transactions
>
> 'registered'? you plan on having the config on each transaction manager contain a list of all cluster nodes? That's not admin friendly.
>
>> , and no cluster broadcast would be required when XAResource.recover is called.
>
> the broadcast is automatic. Maintaining the list of known nodes in a config file is not. No contest.
>
>> Considering this approach, do you see any drawbacks compared with the other one?
>> E.g. each node being a branch might involve multiple RPC between remote TM and XAResource on each node (v.s. one in prev example).
>
> yeah, as mentioned above it's a non-interposed model rather than one where your driver/server is doing interposition. Favour the one that makes the commit path fast, even though it makes recovery an utter pain.
>
> Jonathan.
>
14 years
TreeCache needs Flag(s) to be maintained for the duration of a batch/tx
by Galder Zamarreño
Hi,
Re: https://issues.jboss.org/browse/ISPN-841
The issue here is the fact that if you call a TreeCache operation passing flags, you want these flags to apply to all cache operations that make up the tree cache op. Now, the thing to remember about flags is that they get cleared after each cache invocation, so we must somehow pass flags around to all methods that operate on the cache as a result of a treecache.put, for example.
A rudimentary way to do so would be to pass Flag... to all methods involved, which is neither pretty nor easy to maintain. An alternative would be to have a thread local of flags that gets populated at the start of a tree cache operation and cleared at the end of the operation. Although this might work, isn't this very similar to what CacheDelegate does to maintain flags, except that instead of keeping them for a cache invocation, it would keep them hanging around until the end of the operation? TreeCache operations are bounded by start/stop atomic calls that are essentially calls to start/stop batches. So, it seems to me that what this is asking for is a wider functionality to keep flags for the duration of a transaction/batch, which would most likely be solved better in core/
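The thread-local alternative mentioned above could be sketched like this. It is a standalone illustration, not a patch for ISPN-841: BatchScopedFlags and its methods are invented names, and the nested Flag enum is a local stand-in rather than Infinispan's own Flag type.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical holder that keeps flags alive for a whole batch on one thread.
final class BatchScopedFlags {
    enum Flag { SKIP_LOCKING, CACHE_MODE_LOCAL } // stand-in for the real Flag enum

    private static final ThreadLocal<EnumSet<Flag>> ACTIVE =
        ThreadLocal.withInitial(() -> EnumSet.noneOf(Flag.class));

    // Populated at the start of the tree cache operation (batch start).
    static void begin(Flag... flags) {
        EnumSet<Flag> forThisThread = ACTIVE.get();
        forThisThread.clear();
        Collections.addAll(forThisThread, flags);
    }

    // Read by every cache invocation made while the batch is open.
    static Set<Flag> current() {
        return EnumSet.copyOf(ACTIVE.get());
    }

    // Cleared at the end of the operation (batch end).
    static void end() {
        ACTIVE.remove();
    }

    // Runs an operation with the flags set, clearing them even on failure.
    static <T> T withFlags(Supplier<T> operation, Flag... flags) {
        begin(flags);
        try {
            return operation.get();
        } finally {
            end();
        }
    }
}
```

The try/finally matters here: a flag left behind on a pooled thread would silently apply to an unrelated later operation.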
Thoughts?
--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
14 years