A very interesting thread in the JSR-107 group, which appears just as Mircea has looked
into XA transactions and cache loaders/stores. Going back to that thread, it
wasn't very clear what would happen if Infinispan caches were configured with XA
transactions and also had a cache store. What should a user expect in that case?
IOW, how does our approach here compare to what's being suggested in the thread below?
My feeling is that we're doing a variant of Option 3, where each cache store runs
its own transaction (if it supports them...)
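For reference, the setup in question looks roughly like this (a sketch in Infinispan 5.x
declarative config; the store type and attribute values are just illustrative):

    <namedCache name="xaCacheWithStore">
       <!-- an XA-transactional cache... -->
       <transaction transactionMode="TRANSACTIONAL" useSynchronization="false"/>
       <!-- ...that also persists to a cache store -->
       <loaders>
          <loader class="org.infinispan.loaders.file.FileCacheStore"
                  fetchPersistentState="false" purgeOnStartup="false">
             <properties>
                <property name="location" value="/tmp/store"/>
             </properties>
          </loader>
       </loaders>
    </namedCache>

The open question is what that store should do when the cache-level XA transaction
prepares and commits.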
@Manik, it's also interesting from a data grid perspective, since it highlights the
boundaries of a cache vs a data grid in this area.
Cheers,
Begin forwarded message:
From: Brian Oliver <brian.oliver(a)oracle.com>
Subject: Re: Transaction Semantics when using CacheLoaders and CacheWriters
Date: August 1, 2013 5:55:14 PM GMT+02:00
To: jsr107(a)googlegroups.com
Reply-To: jsr107(a)googlegroups.com
Thanks for your feedback. It's much appreciated.
Interestingly, Oracle Coherence takes largely the same approach. Transactional (XA)
Multi-Version Concurrency Control Caches don't allow Cache Loaders or Cache Writers
(or Expiry), i.e. a stronger form of Option 2.
Personally I don't really classify these Caches as Caches (as eviction and expiry
aren't supported). In essence they are really transactional maps that leverage the
Coherence NamedCache interface. Ultimately it's pure "Data Grid"
functionality.
While I think developers may like to think Option 1 is possible, when anyone explains the
"cost" of this, they reluctantly decide to use Option 2, or move to using Entry
Processors - which provide the atomicity they need, for the most part.
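To illustrate what that buys you, here's a minimal sketch using the JCache
EntryProcessor API (javax.cache.processor; assumes a Cache<String, Integer> named
cache). The read-modify-write executes atomically against the single entry, wherever
that entry lives, without any surrounding transaction:

    Integer updated = cache.invoke("counter",
        new EntryProcessor<String, Integer, Integer>() {
            @Override
            public Integer process(MutableEntry<String, Integer> entry,
                                   Object... args) {
                // exists(), getValue() and setValue() are applied
                // atomically to this one entry
                int current = entry.exists() ? entry.getValue() : 0;
                entry.setValue(current + 1);
                return current + 1;
            }
        });

Of course, that atomicity stops at the boundary of a single entry, which is exactly why
people reach for transactions across entries.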
Historically Coherence also supported a form of Option 3 - but that also presents some
challenges.
I'm trying hard to find an answer to these challenges, but the way forward is
unclear. From what I can tell from our discussions here, in this group and at conferences,
those who have shown interest in the "transactionality" of Caches aren't really
looking for Caches. They want a "fast in-memory" data store, perhaps like a map
or a NoSQL store, to transact against, because they don't want to transact against a
database. Why? Because databases are seen as a bottleneck, or as being too "slow",
and they are trying to solve an architectural problem in the layer below their
application tier. They like to call these "Caches" because they are "in-memory", but
technically they aren't Caches: when you get down to it, the features and
semantics being requested aren't really those of caches. So perhaps this is where the Data
Grid specification can come into play?
With my "standardization hat" on, my biggest concern is that any time a
developer needs to change their application, say between vendors, especially to adopt
transactions that are "implementation specific", it leads me to believe
there's something wrong with the specification. Personally I think we should be making
adoption "easier", not harder.
On Thursday, August 1, 2013 10:55:21 AM UTC-4, Brian Martin wrote:
Brian,
I think you are spot-on with the problem, and this is why we don't currently (in
WebSphere eXtreme Scale) allow Loaders to be part of a distributed transaction that
crosses containers [your Option 2]. If the transaction is to a single container, then we
allow the local transaction (I believe this is equivalent to a variation of your Option 3).
As your dialog indicates, the scenario is messy and I don't like the state we are in
currently, with different capabilities depending on how many containers are enlisted in
your transaction. At the moment I don't have a better suggestion, but I think your
concern is valid and we should hash out a solution the community agrees with.
Brian Martin
IBM
WebSphere eXtreme Scale
On Thu, Aug 1, 2013 at 9:55 AM, Brian Oliver <brian....(a)oracle.com> wrote:
Hi All,
I'd like to propose the challenge of how we think vendors should deal with
transactions in the context of Caches with CacheLoaders/Writers configured, especially
for a distributed Cache. While this is an "implementation concern",
it's very important to see how this may be implemented, as it very much affects the API
design.
As part of reviewing the specification with the Java EE team, and in particular how
multiple servers will interact, we've found a few challenges. In the spirit of
openness, I've added some commentary to the following issue:
https://github.com/jsr107/jsr107spec/issues/153
Currently I feel that, as the API is defined, all CacheLoader and CacheWriter
operations will need to be performed "locally", which fundamentally prevents an
efficient (or any) implementation in a highly concurrent and distributed manner.
Furthermore, interaction across multiple application processes, Java SE or otherwise, may
be a problem, simply because the API doesn't provide enough fidelity for CacheLoader
and CacheWriter operations to be part of a larger transaction. eg: there's no
"prepare" and "commit" for CacheWriters! Just "store".
Even with a few changes, as I've suggested in the issue above, I honestly feel
we're essentially forcing vendors to implement fully recoverable XA Transaction
Managers as part of their Caching infrastructure, simply to coordinate transactions across
the underlying Cache Writers in a distributed setting. Why? Because the API basically
implies this coordination would need to be performed by the Cache implementation itself -
even in "local" mode!
eg: Say a developer starts a transaction that updates n entries, which are
partitioned across n servers. As part of the "commit", all n servers will need
to take care of committing, say to memory. Behind this are the Cache Writers, which also
need to be coordinated, as the entries need to be stored as part of the Caching contract.
Unfortunately our current API provides no mechanism to coordinate this, eg: to share a
global transaction against a single database across the n Cache Writers. Without this,
what essentially happens at the moment is that each CacheWriter starts its own individual
transaction, not attached to or part of the application transaction. That may seem
reasonable to some, but consider the case where there is a parent-child or some other
relationship between the cache entries that are being updated (which is why you're using
a transaction in the first place). If individual transactions are used by the Cache
Writers and are committed in some non-deterministic order (as there are no ordering
constraints or ways to control this in the API), database integrity constraints are
likely to be violated.
So while the "commit" to the Cache may seem to be atomic, the "stores"
to the underlying Cache Writers aren't.
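To make that concrete, here's a sketch (Parent and Child are hypothetical value types;
both caches write through to the same database, which enforces a foreign key from
CHILD.PARENT_ID to PARENT.ID):

    void updateFamily(UserTransaction tx,
                      Cache<Long, Parent> parents,
                      Cache<Long, Child> children) throws Exception {
        tx.begin();
        parents.put(1L, new Parent(1L));       // stored by the writer on server A
        children.put(2L, new Child(2L, 1L));   // stored by the writer on server B;
                                               // the child references parent 1
        tx.commit();
        // At commit, each cache's CacheWriter opens its own, separate
        // database transaction, and nothing in the API orders them. If the
        // child INSERT commits before the parent INSERT, the foreign-key
        // constraint is violated: the cache commit looked atomic, but the
        // stores were not.
    }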
Essentially there are a few options (as I've covered in the issue).
1. Allow a global transaction to be provided to all of the Cache Writers. Wow... that
would be pretty crazy and horribly slow. Every server would need to contact the
transaction manager, do a bunch of work, etc., just to set things up.
This sort of contradicts the entire reason people would be using a cache in the first
place. To even achieve this I think we'd need to change the CacheLoader/Writer API.
Specifically we'd need to add "prepare", "commit" and
"rollback" (a rough sketch of such an interface follows these options).
2. Don't allow CacheLoaders/Writers to be configured with Caches. I think this is
pretty easy to do, but again, wow... that would force developers to change their
application code significantly to use Transactional Caches with external stores.
3. Only allow "local" transactions to be performed. This would ultimately
mean that Caches would be the last-local-resource in XA transactions (not too bad, though
it's a challenge if there are others as well). Additionally, in the distributed case,
while entries may be distributed, the loading / writing would always occur locally. This
works, but significantly reduces scalability, as all "versioning" of data being
touched may need to be held locally. It's highly likely a huge number of distributed
locks would be required (if the Cache isn't using MVCC), which we know is horribly
slow. eg: imagine a transaction with a "putAll" containing a few million
entries. In pessimistic mode, an implementation may need to do a lot of work locally to
ensure versioning is held and updated correctly. It may also need to perform a few
million locks! Saying that a developer shouldn't use "putAll" with
transactions probably isn't a solution either.
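As mentioned under option 1, here's a rough sketch of the kind of API change it would
imply. This is purely hypothetical - it appears in no draft - and Xid is
javax.transaction.xa.Xid:

    public interface TransactionalCacheWriter<K, V> extends CacheWriter<K, V> {
        void prepare(Xid xid);   // flush pending writes to the store and vote
        void commit(Xid xid);    // make the prepared writes durable
        void rollback(Xid xid);  // discard the prepared writes
    }

In effect every CacheWriter becomes a recoverable XA resource, which is exactly the cost
described above.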
Personally I'm not sure any of this is desirable. I haven't really seen much
of this discussed or addressed. Perhaps I'm missing something? I'd certainly
be happy to do some further research!
The bottom line is that while we're trying to define an API that provides developers
with a means to improve the performance, throughput and scalability of an application
through the temporary storage of data, the requirements to implement transactions, even
optionally, may throw much of the benefit away.
It would be great to get your thoughts on this. I don't think we can get away with
the statement "transactions are implementation specific" in the specification,
especially if the API doesn't provide enough fidelity to cover these simple
use cases.
-- Brian
--
You received this message because you are subscribed to the Google Groups
"jsr107" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
jsr107+unsubscribe(a)googlegroups.com.
For more options, visit
https://groups.google.com/groups/opt_out.