cache loader: purgerThreads and purgeSynchronously
by Mircea Markus
Hi,
I'm not sure these config attributes are needed.
- *purgeThreads* configures the number of threads that run storage purging (removal of expired entries from the store). More threads make the purge faster, but is there a reason to put effort into making purging fast (and parallel)? The cache store implementations check whether an entry is expired before returning it anyway. Actually, I'll move that code into the CacheLoader interceptor to make sure this happens for all stores.
- *purgeSynchronously*. I think the reason for this parameter is that purging is invoked by the eviction thread: if purging takes long, eviction is delayed. It is false by default (I doubt users ever change this, btw), so a thread other than the eviction thread runs the purging. I'd rather remove the config option and always run the purging in its own thread.
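The "always purge on its own thread" idea could look roughly like this. This is an illustrative sketch, not Infinispan code; the class and thread names are made up:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: if purgeSynchronously is dropped, purging always
// runs on a dedicated thread, so the eviction thread merely submits the
// purge task and moves on without being delayed.
public class PurgeScheduler {
    private final ExecutorService purgeExecutor =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "cache-store-purge");
                t.setDaemon(true);
                return t;
            });

    // 'purgeTask' stands in for the configured store's purge routine
    public void schedulePurge(Runnable purgeTask) {
        purgeExecutor.submit(purgeTask);
    }
}
```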
Opinions?
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
10 years, 8 months
Re: [infinispan-dev] https://issues.jboss.org/browse/ISPN-2719
by Mircea Markus
On 13 Aug 2013, at 00:55, Madhu Venugopal (vmadhu) <vmadhu(a)cisco.com> wrote:
> Hi Mircea,
>
> I asked this question in the IRC #infinispan.
>
> Madhu:Recently I faced an issue wherein, putIfAbsent is not atomic for a NON_TRANSACTIONAL cache (Infinispan 5.2.3.Final)
What is your use case precisely?
Does it happen during rehashing? ISPN-3366 and ISPN-3357 should fix that.
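For reference, the atomicity contract putIfAbsent is expected to honour is the one from java.util.concurrent.ConcurrentMap: exactly one writer wins, and the loser gets the winner's value back. A minimal illustration using a plain ConcurrentHashMap (not Infinispan):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Demonstrates the ConcurrentMap putIfAbsent contract: the first write
// installs the value and returns null; a second write is a no-op and
// returns the previously installed value.
public class PutIfAbsentContract {
    public static String[] demo() {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        String first = cache.putIfAbsent("k", "v1");  // null: "v1" installed
        String second = cache.putIfAbsent("k", "v2"); // "v1": second writer loses
        return new String[] { first, second, cache.get("k") };
    }
}
```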
> [4:37pm]Madhu:Advice was to use TRANSACTIONAL Cache.
> [4:38pm]Madhu:Hence we replaced the DummyTransactionManager with the JBossTransactionManager and used a TRANSACTIONAL cache.
> [4:39pm]Madhu:With this change, I see that putIfAbsent is behaving better. But I am consistently running into an exception:
> [4:39pm]Madhu:org.infinispan.CacheException: Remote transaction for global transaction (RecoveryAwareGlobalTransaction{xid=< 131077, 29, 36, 0000000000-1-1-64-88-1100-573382911275000649, 0000000000-1-1-64-88-1100-573382911275000700000000 >, internalId=281483566645250} GlobalTransaction:<Madhu-Mac-19065>:2:remote) not found
> [4:39pm]Madhu:after this exception, the cache goes completely out of sync.
> [4:39pm]Madhu:Can Anyone help ?
> [4:43pm]Madhu:seems similar to https://issues.jboss.org/browse/ISPN-2719
>
> I see that you are working on it. Can you please let me know if you want me to try out any patch. I will be glad to :-)
>
> Thanks,
> Madhu
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
loaders/shared
by Mircea Markus
Currently the "shared" attribute is configured at *loaders* level, so all the defined cache loaders inherit this attribute.
It should be configured on a per-loader basis. Does anyone see any problem with that?
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
How to get key ID passivated into DB
by Vitalii Chepeliuk
Hi all! I have a question about entry passivation into a DB, concretely about the class LockSupportCacheStore and its store method:
@Override
public final void store(InternalCacheEntry ed) throws CacheLoaderException {
   if (trace) {
      log.tracef("store(%s)", ed);
   }
   if (ed == null) {
      return;
   }
   if (ed.canExpire() && ed.isExpired(timeService.wallClockTime())) {
      if (containsKey(ed.getKey())) {
         if (trace) {
            log.tracef("Entry %s is expired! Removing!", ed);
         }
         remove(ed.getKey());
      } else {
         if (trace) {
            log.tracef("Entry %s is expired! Not doing anything.", ed);
         }
      }
      return;
   }
   L keyHashCode = getLockFromKey(ed.getKey()); // <<< here the key ID is generated like ed.getKey().hashCode() & 0xfffffc00
   lockForWriting(keyHashCode);
   try {
      storeLockSafe(ed, keyHashCode); // <<< here it should be stored into a Bucket and then stored in the DB
   } finally {
      unlock(keyHashCode);
   }
   if (trace) {
      log.tracef("exit store(%s)", ed);
   }
}
When I use RemoteCacheManager and RemoteCache, I put entries into the cache:
cache.put("key1", "v1");
cache.put("key2", "v2");
cache.put("key3", "v3");
Then 2 entries are passivated and stored in the DB:
ID DATA VERSION
183713792 0301fe032a01034c422b21033e286d7942657374506572736f6e616c4b657957686963684861734e657665724265656e426574746572420521033e02763203620003630000000000000002 -1
23486464 0301fe032a01034c420721033e046b657931420521033e02763103620003630000000000000001 -1
The IDs are generated by the method above:
byte[] keyBytes = marshaller.objectToByteBuffer(key, 64); // <<< data are marshalled
long keyID = ByteArrayEquivalence.INSTANCE.hashCode(keyBytes) & 0xfffffc00; // <<< computation taken from BucketBasedCacheStore - this does not work for me
As a next step I'd like to retrieve the data from the DB:
SELECT ID, DATA FROM JDBC_BINARY_DEFAULT WHERE ID=keyID
But in the method
@Override
public Integer getLockFromKey(Object key) {
   return key.hashCode() & 0xfffffc00; // <<< this should use Arrays.hashCode((byte[]) key) & 0xfffffc00 when the key is a byte[], or use ByteArrayEquivalence instead of a plain byte[] argument
}
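A likely explanation for the mismatch: byte[].hashCode() in Java is identity-based, so two equal key byte arrays produce different IDs. A content-based hash such as Arrays.hashCode gives the same ID for equal contents, though the exact ID still depends on the bytes the server-side marshaller produced, and ByteArrayEquivalence may use a different hash function than Arrays.hashCode. An illustrative sketch of the content-based computation (the class name is made up):

```java
import java.util.Arrays;

// Content-based bucket ID: equal byte arrays yield the same ID, unlike
// byte[].hashCode(), which is identity-based. The mask is the same one
// used in getLockFromKey above.
public class BucketIdDemo {
    public static int bucketId(byte[] marshalledKey) {
        return Arrays.hashCode(marshalledKey) & 0xfffffc00;
    }
}
```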
Re: [infinispan-dev] the new CacheLoader API
by Manik Surtani
On 9 Aug 2013, at 16:29, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>
>> My guess is Mircea was going for an overloaded method for bulkLoadKeys where we want both one that takes a Collection and one that takes a KeyFilter? If so, it seems to me it would be simpler to just have a single method that takes a KeyFilter only, but then have another class like KeyFilters that has various static factory methods that can take an Iterable or Iterator, for example, so we don't have too many methods on the loader itself.
>
> I think we'll end up needing both actually, as bulkLoadAll with a collection still makes sense. E.g. for JDBC queries it's easier to build a WHERE clause and select all the elements in one go.
>
> Sanne had a good alternative suggestion to the bulkLoadAll:
>
> public process(KeyFilter, j.u.c.Executor, CacheLoaderTask clt);
>
> and
>
> interface CacheLoaderTask {
> //return false if don't need to process any longer
> boolean process (CacheLoaderEntry cle);
> }
>
> interface CacheLoaderEntry {
> Object getKey();
> ICV getInternalCacheValue();
> //.. ongoing discussion about some other byte[] based methods
> }
>
>
> This would allow the CacheStore to iterate over the entries in parallel, whilst still allowing sequential iteration. Pretty awesome.
All sounds good, but may I suggest the following API (slight changes in naming, reasons in comments):
https://gist.github.com/maniksurtani/97c62352347e61d60768#file-cacheloade...
--
Manik Surtani
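Sanne's process(KeyFilter, Executor, CacheLoaderTask) idea quoted above could be sketched in-memory like this. All names are illustrative, not the final API; KeyFilter is modelled as a Predicate, and a task returning false requests early termination:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;

// Toy sketch of the proposed parallel-iteration contract: the store fans
// matching entries out to the executor, and the task may return false to
// stop processing early.
public class ParallelProcessSketch {
    public interface CacheLoaderTask {
        boolean process(Map.Entry<Object, Object> entry); // false = stop
    }

    public static void process(Predicate<Object> keyFilter,
                               ExecutorService executor,
                               CacheLoaderTask task,
                               List<Map.Entry<Object, Object>> entries)
            throws InterruptedException {
        AtomicBoolean stop = new AtomicBoolean();
        for (Map.Entry<Object, Object> e : entries) {
            if (stop.get()) break;                 // honour early termination
            if (!keyFilter.test(e.getKey())) continue;
            executor.submit(() -> {
                if (!task.process(e)) stop.set(true);
            });
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```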
Re: [infinispan-dev] Design session today
by Manik Surtani
We should actually move all of this to infinispan-dev - cc'ing infinispan-dev on my response.
On 9 Aug 2013, at 11:19, Mircea Markus <mmarkus(a)redhat.com> wrote:
> Hi,
>
> I've been giving some thought last evening and here are some second-day thoughts:
>
> 1. parallel processing is a great idea and I think it's really something that would make a difference compared to our competition
+1. We should consider the JDK 8 collections APIs as a reference, as I mentioned.
>
> 2. using two interfaces (CacheLoader, CacheWriter) instead of one. I'm still not totally sold on the idea
> Pros: cleaner design (interface segregation principle [1]) which would allow users to only implement what they need
> Cons: the difference between a cache loader and a cache store (or writer) has been a source of confusion among users, as most users [2] only use the combined version
> I'll continue the discussion on the public list
Also, JSR 107 (and from that, most other data grids) will also follow a separate CacheLoader/CacheWriter. I think people will get used to the separation of interfaces.
>
> 3. allowing the cache loader to expose unserialised data directly (ValueHolder.getBytes[]).
I used the name ValueHolder but this is a really poor term - how about ContentsProxy? It is a proxy for the contents of the entry, exposing methods:
interface ContentsProxy {
   ByteBuffer getValueBuffer();
   ByteBuffer getInternalCacheValueBuffer();
   InternalCacheValue getInternalCacheValue();

   // Same as above, except this method only deserializes timestamps and metadata, not the actual value.
   InternalCacheValue getSparseInternalCacheValue();
}
> The use cases we had for this are:
> a) streaming data during rolling upgrades. This works for scenarios where the data format (user classes) haven't changed and the data is written directly to a persistent store in the destination cluster
> b) backups. This can be a generic and efficient (no serialisation) way of creating a backup tool.
There are two more:
c) Pre-populating a cache store from an external resource.
d) Exposing the underlying byte buffers directly for placement into, say, a native data container or directly onto the network stack for transmission (once JGroups has moved to JDK 7).
> I haven't thought a) through entirely, but it seems to me that it only applies to a rather specific rolling upgrade scenario.
> Re: b) there might be some more efficient ways of backing up data: take a database dump (JDBC cache store), copy the files (file cache store) etc. Also I'm not sure that the speed with which you take the dump is critical, i.e. even if you serialise/deserialise the data it might just work.
It's not just the performance hit we take on serialisation/de-serialisation, but also the additional CPU load we place on the system which should be running, performing transactions!
> Also, in order to solve a) and b) I don't think ValueHolder.getBytes[] is the way to go. E.g. the bucket cache stores use an entire bucket as their read (and serialisation) unit, so forcing them to return the bytes on a per-entry basis would mean:
> - read the bucket as byte[]
> - deserialise the bucket structure
> - iterate over the entries in the bucket and serialise them again in order to satisfy ValueHolder.getBytes[]
That's just the way buckets are currently designed. If, for example, each bucket has a header with a structure that looks like: [key1][position1][key2][position2][end-of-keys marker][value1][value2], then just by reading the header part of the bucket, we can grab chunks based on the position information for the values without deserializing them. Of course, this is an "efficient" implementation. A naive one could do what you said above and still comply with the contract.
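The header-based bucket layout described above can be illustrated with a few lines. This is a toy model (keys omitted, fixed-width header of offset/length pairs, all names made up), showing how a single value can be sliced out of the bucket bytes without deserialising the others:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Toy bucket format: [count][off1][len1][off2][len2]...[value bytes].
// Reading one value needs only the header plus a byte-range copy; no
// per-entry deserialisation of the other values.
public class BucketSliceDemo {
    public static byte[] writeBucket(byte[]... values) {
        int header = 4 + values.length * 8;      // count + (offset, length) per value
        int payload = Arrays.stream(values).mapToInt(v -> v.length).sum();
        ByteBuffer buf = ByteBuffer.allocate(header + payload);
        buf.putInt(values.length);
        int offset = 0;
        for (byte[] v : values) {                // header: where each value lives
            buf.putInt(offset).putInt(v.length);
            offset += v.length;
        }
        for (byte[] v : values) buf.put(v);      // then the raw value bytes
        return buf.array();
    }

    // Slice value #i straight out of the bucket bytes, touching nothing else.
    public static byte[] sliceValue(byte[] bucket, int i) {
        ByteBuffer buf = ByteBuffer.wrap(bucket);
        int count = buf.getInt();
        int headerEnd = 4 + count * 8;
        buf.position(4 + i * 8);
        int offset = buf.getInt();
        int length = buf.getInt();
        return Arrays.copyOfRange(bucket, headerEnd + offset, headerEnd + offset + length);
    }
}
```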
> A better approach for this is to have toStream and fromStream methods similar to what we currently have in CacheStore, so that the whole marshalling/unmarshalling business is delegated to the CacheStore itself. Also, now that we're here: the CacheStore.toStream/fromStream API was added with the intention of solving this same problem some 4 years ago and is not used at all at this stage, though implemented by all the existing stores.
Yes, but I think we can do better than the toStream/fromStream API.
> Bottom line for 3: I think this is a case in which we should stick to the "if you're not sure don't add it" rule. We can always add it later: a new interface StreamableCacheLoader extending CacheLoader.
Not always true, since in this case, the API we choose may dictate storage format on disk, which in turn will become a compatibility issue when reading data written using an older version of the same cache loader.
>
> [1] http://en.wikipedia.org/wiki/Interface_segregation_principle
> [2] but Sanne / joke
>
> On 8 Aug 2013, at 17:26, Manik Surtani <msurtani(a)redhat.com> wrote:
>
>> Hey guys
>>
>> This was good fun today.
>>
>> Regarding the parallelised "process()" method, we should also look at Java 8 collections (which are introducing similar methods) and see if there is something we can learn (API-wise) there.
>>
>> http://www.javabeat.net/2012/05/enhanced-collections-api-in-java-8-suppor...
>> http://download.java.net/jdk8/docs/api/
>> http://download.java.net/jdk8/docs/api/java/util/Map.html#compute(K, java.util.function.BiFunction)
>>
>> Also, what we didn't chat about: exposing ByteBuffers. I suppose instead of exposing byte[] in ValueHolder, we should provide a reference to a ByteBuffers - http://download.java.net/jdk8/docs/api/java/nio/ByteBuffer.html - and also provide similar techniques for writing ByteBuffers on AdvancedCacheWriter.
>>
>> And then all we have to do is re-implement the DataContainer to use ByteBuffers as well, and we can take advantage of Bela's upcoming changes to JGroups! :)
>>
>>
>>
>> --
>> Manik Surtani
>>
>>
>>
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
--
Manik Surtani
Fwd: [jgroups-dev] Request for comments on FORK: grabbing a private channel for communication from an existing channel
by Bela Ban
FYI.
-------- Original Message --------
Subject: [jgroups-dev] Request for comments on FORK: grabbing a private
channel for communication from an existing channel
Date: Fri, 09 Aug 2013 17:37:46 +0200
From: Bela Ban <belaban(a)yahoo.com>
To: jg-dev <javagroups-development(a)lists.sourceforge.net>
I wanted to solicit feedback on FORK [1].
Using FORK, one can get a private light-weight channel off of an
existing channel, e.g. to send messages, create an RpcDispatcher on top
etc. The forked channel is then used exclusively by the application
which forks it off of the main channel, which means it will neither
interfere with the main channel, nor will it see messages other than its
own.
A forked channel can also add protocols, these are then private to it,
too. For example, an application could grab the JGroups channel from
Infinispan
(Cache.getAdvancedCache().getRpcManager().getTransport().getChannel()),
then fork a light-weight channel with CENTRAL_LOCK on top and use it to
manage cluster wide locks. The forked channel would be private to the
application which creates it, but it would piggyback its messages on the
existing channel, without interfering with it.
I'm currently implementing a prototype, and would like to get feedback
for the current design [1]. The design doc and impl are in branch JGRP-1613.
Feedback on the mailing list is appreciated !
Cheers,
[1] https://github.com/belaban/JGroups/blob/JGRP-1613/doc/design/FORK.txt
--
Bela Ban, JGroups lead (http://www.jgroups.org)
Re: [infinispan-dev] the new CacheLoader API
by Mircea Markus
On 7 Aug 2013, at 14:57, Galder Zamarreño <galder(a)redhat.com> wrote:
> A few things to note that have not been mentioned:
>
> - BulkCacheLoader.bulk* methods: if what you're doing is return an iterator, why not call that method iterator(), just like Collection defines iterator?
>
> - BulkCacheLoader and Iterator classes: I find it a bit confusing that we return Infinispan specific iterators from bulk* methods but removeAll and others take a different Iterator… I don't have an answer for this… :|
This will change based on Sanne's suggestion of parallel processing.
However, the reasons I preferred a custom iterator over the one in java.util are:
- the j.u.Iterator.remove method won't be supported, so it's not nice having it there
- when iterating over the CacheEntries with a j.u.Iterator, I have to create a Map.Entry object for each entry I process. This object doesn't need to be created with a native iterator that has key(), value() and next() methods.
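The allocation-free "native" cursor style described above could look like this. A minimal, illustrative sketch (array-backed, names made up), not the proposed API:

```java
// A cursor exposing key() and value() accessors plus an advance step, so
// no per-element Map.Entry object is allocated during iteration.
public class ArrayEntryCursor {
    private final Object[] keys;
    private final Object[] values;
    private int pos = -1;

    public ArrayEntryCursor(Object[] keys, Object[] values) {
        this.keys = keys;
        this.values = values;
    }

    public boolean next()  { return ++pos < keys.length; } // advance; false when exhausted
    public Object key()    { return keys[pos]; }           // key of the current entry
    public Object value()  { return values[pos]; }         // value of the current entry
}
```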
> - We're moving away from the cache loader vs cache store interface separation? I kinda liked the way JSR-107 had defined these, keeping them separate and keeping them independent [1]. A read-only cache store would implement only cache loader. A read-write one would implement both interfaces.
Yeah, people seem to prefer this for some reason :-)
Here's my thought on this: I think CacheLoader + CacheWriter is a nice OOP design, but it is rather theoretical. In all the external user scenarios I know of, the interaction with the store is read+write, so most people think about a store along these lines. Having a distinction between loads and stores seems unnatural and creates confusion. The few that only need a loader can simply leave store() empty - as simple as that.
>
> More below...
>
> [1] https://github.com/jsr107/jsr107spec/tree/v0.8/src/main/java/javax/cache/...
>
> On Aug 6, 2013, at 7:13 PM, William Burns <wburns(a)redhat.com> wrote:
>
>>
>> ----- Original Message -----
>>> From: "Manik Surtani" <msurtani(a)redhat.com>
>>> To: "Mircea Markus" <mmarkus(a)redhat.com>
>>> Cc: "Infinispan Core Devs" <infinispan-core-dev(a)infinispan.org>
>>> Sent: Monday, August 5, 2013 9:23:41 AM
>>> Subject: Re: the new CacheLoader API
>>>
>>>
>>> On 1 Aug 2013, at 11:50, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Based on the feedback I received and after doing some prototyping, here's
>>>> the new CacheLoader API I came up
>>>> with:https://github.com/mmarkus/infinispan/tree/t_ISPN-3290/CS_redesign/c...
>>>>
>>>> I know everybody's quite busy but can you please take some time and review
>>>> this? It's a very important API change and this would help to get it
>>>> off on the right foot.
>>>>
>>>> It should cover all the gathered requirements:
>>>>
>>>> - support for non-distributed transaction cache stores (1PC) and support
>>>> for XA capable cache store
>>>> [Decision: all the transaction support is delegated to Infinispan. The
>>>> cache store SPI is much simplified.]
>>>
>>> +1
>>>
>>>>
>>>> - support iteration over all the keys/entries in the store
>>>> - needed for efficient Map/Reduce integration
>>>> - needed for efficient implementation of Cache.keySet(), Cache.entrySet(),
>>>> Cache.values() methods
>
> ^ We all agree that these methods are problematic. I don't think we should work towards making them efficient. Ultimately we should phase them out...
>
>>>> [look in BulkCacheLoader]
>>>
>>> I think you meant for bulkLoadKeys() to take in a KeyFilter, not a
>>> Collection?
>>
>> My guess is Mircea was going for an overloaded method for bulkLoadKeys where we want both one that takes a Collection and one that takes a KeyFilter? If so, it seems to me it would be simpler to just have a single method that takes a KeyFilter only, but then have another class like KeyFilters that has various static factory methods that can take an Iterable or Iterator, for example, so we don't have too many methods on the loader itself.
>
> +1
>
>> Also removeAll should probably take a KeyFilter similarly.
>
> +1
>
>> I noticed there is an overloaded bulkLoad which takes no arguments, which I assume just streams/loads all entries from the cache store. Do we need to have similarly overloaded methods for bulkLoadKeys? If not, then do we even need the no-arg bulkLoad method, and could we just have null/AllKeyFilter be passed in instead? Also, similarly, would we need the clear method then? Just trying to think if we can make the interface have as few methods as needed.
>
> +1, and +100 to keeping as few methods as needed.
>
>> Some methods don't have types on them. I am guessing we want to put Object type just to prevent random warnings for user and CacheStore implementations.
>
> +1
>
>>
>>>
>>>>
>>>> - a simple read(k) + write(k,v) interface to be implemented by users that
>>>> just want to position ISPN as a cache between an app and a legacy system
>>>> and which don't need/want to be bothered with all the other complex
>>>> features
>>>> [look at BasicCacheLoader]
>>
>> We should move to the immutable CacheStoreConfiguration class now instead of CacheStoreConfig, right?
>
> +1
>
>>
>>>
>>> Shouldn't remove() return a boolean?
>>
>> It looks like we don't use the return value in ISPN code currently, but I would agree for future usage we should probably have it return a boolean. Also to be more inline with Collection interface do we want a boolean to be returned from the removeAll method if something was removed?
>
> It all depends what the guarantees should be at the end of the remove… if the method has no return, implementations are more free to do the remove asynchronously, without having to wait to see if anything was removed. The return forces some kind of result out of that removal, which could help Infinispan make decisions in the future...
>
>>
>>>
>>>>
>>>> - support for expiration notification (ISPN-3064)
>>>> [look at ExpiryCacheLoader.purgeExpired]
>>
>> I am wondering if there is some way to have this be more of a streaming approach instead of returning a Set with all of the expired keys in memory. What do you think if we passed in a callback of some type to the purgeExpired method instead? This would make it the burden of the CacheLoader to properly invoke the method for each one, but it would allow for more efficient use of memory when the cache loader can implement a streaming approach to eviction.
>
> That sounds like a good idea :). Also, why does ExpiryCacheLoader need to extend BasicCacheLoader? Couldn't it be treated like a mixin? IOW, have a cache loader that can optionally purgeExpired entries.
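The streaming purge suggested above could be sketched as follows. A toy, in-memory model (the listener interface and store shape are illustrative, not the proposed API): the store pushes each expired key to a callback as it scans, so nothing is accumulated in memory.

```java
import java.util.Iterator;
import java.util.Map;

// Streaming purge: instead of building and returning a Set of every
// expired key, notify a callback one key at a time while scanning.
public class StreamingPurgeSketch {
    public interface ExpiredKeyListener {
        void onExpired(Object key);
    }

    private final Map<Object, Long> expiryTimes; // key -> expiry timestamp

    public StreamingPurgeSketch(Map<Object, Long> expiryTimes) {
        this.expiryTimes = expiryTimes;
    }

    public void purgeExpired(long now, ExpiredKeyListener listener) {
        Iterator<Map.Entry<Object, Long>> it = expiryTimes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Object, Long> e = it.next();
            if (e.getValue() <= now) {
                it.remove();                     // drop from the store...
                listener.onExpired(e.getKey());  // ...and notify, one key at a time
            }
        }
    }
}
```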
>
>>
>>>
>>> Returning the set of purged keys could be unnecessarily expensive... e.g., in
>>> a JDBC-backed store, expiration could be a simple DELETE WHERE... statement,
>>> whereas if the keys need to be returned, this becomes far more complex.
>>>>
>>>> - support for size (efficient implementation of the cache.size() method)
>>>> [look at BulkCacheLoader.size()]
>>>
>>> I'm not so sure if I buy the separate interfaces approach. E.g., what
>>> happens to calling cache.size() when the configured cache loader is not a
>>> BulkCacheLoader? YOu effectively change the contract of cache.size()?
>>
>> Actually that is an interesting point. I would normally say it has to fallback to the bulkLoadKeys method but that is also on the BulkCacheLoader interface. Actually without defining your Cache to implement BulkCacheLoader you can't even have preload enabled either. It seems we maybe need another interface or move some of the methods down to the BasicCacheLoader
>
> I also agree that this separation of bulk vs basic cache loader interfaces is a bit weird, particularly since the difference between the two is not only configuration (whether preload is enabled or not), but also the fact that some cache operations currently require an iterable cache store (size, keys, values, entries and iterator!!).
>
> To reiterate what I said above, we shouldn't fixate ourselves with size, keys, values and entries methods, since they're difficult to implement and potentially expensive, but iterator() is something we're always going to be needing in the Cache interface. So, iterator() should really be living in CacheLoader (or BasicCacheLoader) interface.
>
> With all these in mind, I can see the point of separating any cache loader methods that support cache methods we'll be phasing out (keys, values, size) into a separate interface, i.e. BulkCacheLoader. This way, when we phase them out, we don't need to modify the base cache loader interface. By simply removing the BulkCacheLoader interface you've achieved that.
>
> Cheers,
>
>>
>>>
>>>>
>>>> Cheers,
>>>> --
>>>> Mircea Markus
>>>> Infinispan lead (www.infinispan.org)
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> Manik Surtani
>>>
>>>
>>>
>>>
>
>
> --
> Galder Zamarreño
> galder(a)redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] the new CacheLoader API
by Mircea Markus
My bad, this was intended for ISPN-dev in the first place.
On 9 Aug 2013, at 16:34, Manik Surtani <msurtani(a)redhat.com> wrote:
> Can we move this back to infinispan-dev, pls?
>
> On 9 Aug 2013, at 16:29, Mircea Markus <mmarkus(a)redhat.com> wrote:
>
>>
>> On 6 Aug 2013, at 18:13, William Burns <wburns(a)redhat.com> wrote:
>>
>>>>> Based on the feedback I received and after doing some prototyping, here's
>>>>> the new CacheLoader API I came up
>>>>> with:https://github.com/mmarkus/infinispan/tree/t_ISPN-3290/CS_redesign/c...
>>>>>
>>>>> I know everybody's quite busy but can you please take some time and review
>>>>> this? It's a very important API change and this would help to get it
>>>>> off on the right foot.
>>>>>
>>>>> It should cover all the gathered requirements:
>>>>>
>>>>> - support for non-distributed transaction cache stores (1PC) and support
>>>>> for XA capable cache store
>>>>> [Decision: all the transaction support is delegated to Infinispan. The
>>>>> cache store SPI is much simplified.]
>>>>
>>>> +1
>>>>
>>>>>
>>>>> - support iteration over all the keys/entries in the store
>>>>> - needed for efficient Map/Reduce integration
>>>>> - needed for efficient implementation of Cache.keySet(), Cache.entrySet(),
>>>>> Cache.values() methods
>>>>> [look in BulkCacheLoader]
>>>>
>>>> I think you meant for bulkLoadKeys() to take in a KeyFilter, not a
>>>> Collection?
>>>
>>> My guess is Mircea was going for an overloaded method for bulkLoadKeys where we want both one that takes a Collection and one that takes a KeyFilter? If so, it seems to me it would be simpler to just have a single method that takes a KeyFilter only, but then have another class like KeyFilters that has various static factory methods that can take an Iterable or Iterator, for example, so we don't have too many methods on the loader itself.
>>
>> I think we'll end up needing both actually, as bulkLoadAll with a collection still makes sense. E.g. for JDBC queries it's easier to build a WHERE clause and select all the elements in one go.
>>
>> Sanne had a good alternative suggestion to the bulkLoadAll:
>>
>> public process(KeyFilter, j.u.c.Executor, CacheLoaderTask clt);
>>
>> and
>>
>> interface CacheLoaderTask {
>> //return false if don't need to process any longer
>> boolean process (CacheLoaderEntry cle);
>> }
>>
>> interface CacheLoaderEntry {
>> Object getKey();
>> ICV getInternalCacheValue();
>> //.. ongoing discussion about some other byte[] based methods
>> }
>>
>>
>> This would allow the CacheStore to iterate over the entries in parallel, whilst still allowing sequential iteration. Pretty awesome.
>>
>>>
>>> Also removeAll should probably take a KeyFilter similarly.
>>
>> +1
>>
>>>
>>> I noticed there is an overloaded bulkLoad which takes no arguments, which I assume just streams/loads all entries from the cache store. Do we need to have similarly overloaded methods for bulkLoadKeys? If not, then do we even need the no-arg bulkLoad method, and could we just have null/AllKeyFilter be passed in instead? Also, similarly, would we need the clear method then? Just trying to think if we can make the interface have as few methods as needed.
>>>
>>> Some methods don't have types on them. I am guessing we want to put Object type just to prevent random warnings for user and CacheStore implementations.
>>
>> I'm not sure parametrized types are very useful for stores, I'll think about it a bit.
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>>
>>
>>
>
> --
> Manik Surtani
>
>
>
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
Fwd: Transaction Semantics when using CacheLoaders and CacheWriters
by Galder Zamarreño
A very interesting thread in the JSR-107 group, which appears just as Mircea has been looking into XA transactions and cache loaders/stores. Going back to that thread, it wasn't very clear what would happen if Infinispan caches were configured with XA transactions and they had a cache store. What should a user expect in that case? IOW, how does our approach here compare to what's being suggested in the thread below? My feeling is that we're doing a variant of Option 3, where each cache store runs its own transaction (if it supports one...)
@Manik, It's also interesting from a data grid perspective since it highlights the boundaries of a cache vs data grid in this area.
Cheers,
Begin forwarded message:
> From: Brian Oliver <brian.oliver(a)oracle.com>
> Subject: Re: Transaction Semantics when using CacheLoaders and CacheWriters
> Date: August 1, 2013 5:55:14 PM GMT+02:00
> To: jsr107(a)googlegroups.com
> Reply-To: jsr107(a)googlegroups.com
>
> Thanks for your feedback. It's much appreciated.
>
> Interestingly Oracle Coherence mostly takes much the same approach. Transactional (XA) Multi-Version-Concurrency-Control Caches don't allow Cache Loaders or Cache Writers (or Expiry) aka: a stronger form of Option 2.
>
> Personally I don't really classify these Caches as Caches (as eviction and expiry isn't supported). In essence they are really a transactional map, but leverage the Coherence NamedCache interface. Ultimately it's pure "Data Grid" functionality.
>
> While I think developers may like to think Option 1 is possible, when anyone explains the "cost" of this, they reluctantly decide to use Option 2, or move to using Entry Processors - which provides the atomicity for the most part.
>
> Historically Coherence also supported a form of Option 3 - but that also presents some challenges.
>
> I'm trying hard to find an answer to these challenges, but the way forward is unclear. From what I can tell from our discussions here, in this group and at conferences, those that have shown interest in the "transactionality" of Caches aren't really wanting Caches. They want fast "in-memory" data stores, perhaps like a map or a NoSQL store, to transact against, because they don't want to transact against a database. Why? Databases are seen as a bottleneck, or as being too "slow", and they are trying to solve the architectural problem of the layer below their application tier. They like to call these "Caches" because they are "in-memory", but technically they aren't Caches. When you get down to it, ultimately the features and semantics being requested aren't really caches. So perhaps this is where the Data Grid specification can come into play?
>
> With my "standardization hat" on, my biggest concern is that anytime a developer needs to change their application, say between vendors, especially to adopt transactions that are "implementation specific", it leads me to believe there's something wrong with the specification. Personally I think we should be making it "easier" to adopt not harder.
>
> On Thursday, August 1, 2013 10:55:21 AM UTC-4, Brian Martin wrote:
> Brian,
>
> I think you are spot-on with the problem, and this is why we don't currently (in WebSphere eXtreme Scale) allow Loaders to be part of a distributed transaction that crosses containers [your option 2]. If the transaction is to a single container, then we allow the local transaction (I believe this is equivalent to a variation of your option 3). As your dialog indicates, the scenario is messy and I don't like the state we are in currently, with different capabilities depending on how many containers are enlisted in your transaction. At the moment I don't have a better suggestion, but I think your concern is valid and we should hash out a solution the community agrees with.
>
> Brian Martin
> IBM
> WebSphere eXtreme Scale
>
>
> On Thu, Aug 1, 2013 at 9:55 AM, Brian Oliver <brian....(a)oracle.com> wrote:
> Hi All,
>
> I'd like to propose the challenge of how we think vendors should deal with transactions in the context of Caches with CacheLoaders/Writers configured, especially in the context of a distributed Cache. While this is an "implementation concern", it's very important to see how this may be implemented as it very much effects the API design.
>
> As part of reviewing the specification with the Java EE team, and in particular how multiple-servers will interact, we've found a few challenges. In the spirit of openness, I've added some commentary to the following issue: https://github.com/jsr107/jsr107spec/issues/153
>
> Currently I feel that the way the API is defined, all CacheLoader and CacheWriter operations will need to be performed "locally" which fundamentally prevents efficient (or any) implementation in a highly concurrent and distributed manner. Furthermore, interaction across multiple application processes, Java SE or otherwise may be a problem, simply because the API doesn't provide enough fidelity for CacheLoader and CacheWriter operations to be part of a larger transaction. eg: there's no "prepare" and "commit" for CacheWriters! Just "store".
>
> Even with a few changes, as I've suggested in the issue above, I honestly feel we're essentially forcing vendors to implement fully recoverable XA Transaction Managers as part of their Caching infrastructure, simply to coordinate transactions across the underlying Cache Writers in a distributed setting. Why? because the API basically implies this coordination would need to be performed by the Cache implementation itself - even in "local" mode!
>
> eg: Say a developer starts a transaction that updates n entries, which are partitioned across n servers. As part of the "commit", all n servers will need to take care of committing, say to memory. Behind this are the Cache Writers, which also need to be coordinated. The entries need to be stored as part of the Caching contract.
>
> Unfortunately our current API provides no mechanism to coordinate this, eg: share a global transaction to a single database across the n Cache Writers. Without this, what essentially happens at the moment is that each CacheWriter starts its own individual transaction, not attached to or part of the application transaction. That may seem reasonable to some, but consider the case where there is a parent-child or some other relationship between the cache entries that are being updated (which is why you're using a transaction in the first place). If individual transactions are used by the Cache Writers and are committed in some non-deterministic order (as there are no ordering constraints or ways to control this in the API), database integrity constraints are likely to be violated. So while the "commit" to the Cache may seem to be atomic, the "stores" to the underlying Cache Writers aren't.
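The parent-child problem above can be sketched in a few lines. This is a hypothetical simulation only (the class, method names, and the in-memory "database" are all invented, not part of JSR-107): two independent writer commits land in an unfortunate order, and a foreign-key-style constraint is violated even though the cache commit looked atomic.

```java
import java.util.HashSet;
import java.util.Set;

// Invented illustration: each CacheWriter commits its own local
// transaction, so nothing stops the child row reaching the database
// before its parent exists.
public class WriteOrderDemo {
    // Simulated database table enforcing a parent-child foreign key.
    static final Set<String> committedParents = new HashSet<>();

    static void parentWriterCommit(String id) {
        committedParents.add(id);
    }

    static void childWriterCommit(String parentId) {
        // Analogous to a FOREIGN KEY constraint check in a real database.
        if (!committedParents.contains(parentId)) {
            throw new IllegalStateException(
                "integrity violation: parent " + parentId + " not committed yet");
        }
    }

    public static void main(String[] args) {
        boolean violated = false;
        try {
            // Non-deterministic ordering: the child's writer happens
            // to commit first, because the API imposes no ordering.
            childWriterCommit("p1");
        } catch (IllegalStateException e) {
            violated = true;
        }
        parentWriterCommit("p1"); // the parent's writer commits too late
        System.out.println("violated=" + violated);
    }
}
```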
>
> Essentially there are a few options (as I've covered in the issue).
>
> 1. Allow a global transaction to be provided to all of the Cache Writers. Wow... that would be pretty crazy and horribly slow. Every server would need to contact the transaction manager, do a bunch of work, etc, just to set things up.
>
> This sort of contradicts the entire reason people would be using a cache in the first place. To even achieve this I think we'd need to change the CacheLoader/Writer API. Specifically we'd need to add "prepare", "commit" and "rollback".
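As a rough sketch of what that API change might look like, here is an invented two-phase writer interface with a trivial in-memory implementation. None of these names exist in JSR-107; `TwoPhaseCacheWriter`, `prepare`, `commit`, and `rollback` are assumptions drawn from the suggestion above, staged here against a plain `HashMap` rather than a real resource manager.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical extension of the CacheWriter contract: writes are
// staged in prepare() and only become visible on commit().
interface TwoPhaseCacheWriter<K, V> {
    void prepare(Map<K, V> pendingWrites); // stage the batch, acquire resources
    void commit();                         // make the staged batch durable
    void rollback();                       // discard the staged batch
}

class InMemoryTwoPhaseWriter implements TwoPhaseCacheWriter<String, String> {
    private final Map<String, String> store = new HashMap<>();
    private Map<String, String> staged = Collections.emptyMap();

    public void prepare(Map<String, String> pendingWrites) {
        staged = new HashMap<>(pendingWrites); // nothing visible yet
    }
    public void commit() {
        store.putAll(staged);                  // apply the whole batch
        staged = Collections.emptyMap();
    }
    public void rollback() {
        staged = Collections.emptyMap();       // drop the staged batch
    }
    int size() { return store.size(); }
}

public class TwoPhaseDemo {
    public static void main(String[] args) {
        InMemoryTwoPhaseWriter w = new InMemoryTwoPhaseWriter();

        w.prepare(Map.of("k1", "v1"));
        w.rollback(); // coordinator voted no: store stays empty
        System.out.println("afterRollback=" + w.size());

        w.prepare(Map.of("k1", "v1"));
        w.commit();   // coordinator voted yes: batch becomes visible
        System.out.println("afterCommit=" + w.size());
    }
}
```

A transaction coordinator could then drive `prepare` on every enlisted writer before asking any of them to `commit`, which is exactly the coordination cost the paragraph above warns about.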
>
> 2. Don't allow CacheLoaders/Writers to be configured with Caches. I think this is pretty easy to do, but again, wow... that would force developers to change their application code significantly to use Transactional Caches with external stores.
>
> 3. Only allow "local" transactions to be performed. This would ultimately mean that Caches would be the last-local-resource in XA transactions (not too bad, though it's a challenge if there are others as well). Additionally in the distributed case, while entries may be distributed, the loading / writing would always occur locally. This works, but significantly reduces scalability as all "versioning" of data being touched may need to be held locally. It's highly likely a huge number of distributed locks would be required (if the Cache isn't using MVCC), which we know is horribly slow. eg: imagine a transaction with a "putAll" containing a few million entries. In pessimistic mode, an implementation may need to do a lot of work locally to ensure versioning is held and updated correctly. It may also need to perform a few million locks! Saying that a developer shouldn't use "putAll" with transactions probably isn't a solution either.
>
> Personally I'm not sure any of this is desirable. I haven't really seen much of this discussed or addressed. Perhaps I'm missing something? I'd certainly be happy to do some further research!
>
> The bottom line is that while we're trying to define an API that provides developers with a means to improve the performance, through-put and scalability of an application through the temporary storage of data, the requirements to implement transactions, even optionally, may throw much of the benefit away.
>
> It would be great to get your thoughts on this. I don't think we can get away with the statement "transactions are implementation specific" in the specification, especially if the API doesn't provide enough fidelity to cover these simple use-cases.
>
> -- Brian
>
>
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups "jsr107" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to jsr107+un...(a)googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
--
Galder Zamarreño
galder(a)redhat.com
twitter.com/galderz
Project Lead, Escalante
http://escalante.io
Engineer, Infinispan
http://infinispan.org
10 years, 9 months