Re: [infinispan-dev] Extend GridFS
by Yuri de Wit
Hi Galder,
Thanks for your reply. Let me continue this discussion here first to
validate my thinking before I create any issues in JIRA (forgive me
for the lengthy follow up).
First of all, thanks for this wonderful project! I started looking
into Ehcache as the default caching implementation, but found it
lacking some key features when using JGroups. My guess is that all
the development there is going towards the Terracotta distribution
instead of JGroups. Terracotta does seem like a wonderful product,
but I was hoping to stick to a JGroups-based caching impl. So I was
happy to have found Infinispan.
I need to create a distributed cache that loads data from the file
system. It's a tree of folders/files containing mostly metadata info
that seldom changes, but does change. Our mid-term goal is to move the
metadata away from the file system and into a database, but that is
not feasible now due to a tight deadline and the risk of refactoring
too much of the code base.
So I was happy to see the GridFilesystem implementation in Infinispan
and the fact that clustered caches can be lazily populated (the
metadata tree in the FS can be large, and having all nodes in the
cluster preloaded with all the data would not work for us). However,
it defines its own persistence scheme with specific file names and
serialized buckets, which would require us to use a cache-aside
strategy to read our metadata tree and populate the GridFilesystem
with it.
What I am looking for is the ability to plug into the GridFilesystem a
new FileCacheStore that can load directly and transparently from an
existing directory tree. This would lazily load FS content across the
cluster without having to pre-populate the GridFilesystem
programmatically.
At first I was hoping to extend the existing FileCacheStore to support
this (hence my asking for GridInputStream.skip()/available()
implementations and for the constructors to be made protected instead
of package-private), but I later realized that what I needed was an
entirely new implementation, since the bucket abstraction there is not
really appropriate.
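(As a sketch of the idea being described, not the actual Infinispan CacheStore SPI: the class and method names below are made up for illustration, but they show the lazy fall-through to the real directory tree that the custom store is meant to provide.)

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch of a lazily-loading file-system-backed store:
 * a cache miss falls through to the real directory tree, so nothing
 * needs to be pre-populated. Not the Infinispan CacheStore SPI.
 */
public class LazyFsStore {
    private final Path root;
    private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();

    public LazyFsStore(Path root) {
        this.root = root;
    }

    /** Return cached bytes, loading from the real file system on a miss. */
    public byte[] get(String relativePath) {
        byte[] hit = cache.get(relativePath);
        if (hit != null) {
            return hit;
        }
        Path file = root.resolve(relativePath);
        if (!Files.isRegularFile(file)) {
            return null; // unknown key: nothing on disk either
        }
        try {
            byte[] loaded = Files.readAllBytes(file);
            cache.putIfAbsent(relativePath, loaded);
            return loaded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```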
The good news is that I am about 75% done with the implementation.
It is working, with a few caveats, beautifully on a single node, but
I am facing some issues trying to launch a second node in the cluster
(most of them due to my own ignorance, I am sure).
** Do you see any issues with this approach that I am not aware of?
In addition, I am having a couple of issues launching the second node
in the cluster: a couple of NPEs and a
"java.net.NoRouteToHostException: No route to host". I will send the
details of these exceptions in a follow-up email.
This is where I am stuck at the moment. In my setup I have two
configuration files:
* cache-master.xml
* cache-slave.xml
Both define the data and metadata caches required by GridFilesystem,
but -master.xml configures the custom FileCacheStore I implemented and
-slave.xml uses the ClusterCacheLoader.
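(For reference, the -slave.xml side might look roughly like the following. This is a sketch against the Infinispan 5.x configuration schema; the cache name and attribute values are assumptions and the exact element names vary between versions.)

```xml
<namedCache name="metadata">
   <clustering mode="distribution"/>
   <loaders>
      <!-- Delegates cache misses to other cluster members instead of local disk -->
      <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader">
         <properties>
            <property name="remoteCallTimeout" value="20000"/>
         </properties>
      </loader>
   </loaders>
</namedCache>
```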
These are some of the items/todos for this custom FileCacheStore impl:
** Implement chunked writes, with a chunking protocol that signals
when the last chunk has been delivered
** Custom configuration to simplify it for GridFilesystem.
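(The "last chunk" protocol in the first todo could be sketched as follows; all names are illustrative, and an in-memory map stands in for the real file write.)

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a last-chunk protocol: chunk writes are buffered per file,
 * and the underlying file is only rewritten once the chunk flagged as
 * last arrives. Illustrative only, not Infinispan code.
 */
public class ChunkAssembler {
    private final Map<String, ByteArrayOutputStream> pending = new ConcurrentHashMap<>();
    private final Map<String, byte[]> committed = new ConcurrentHashMap<>();

    /**
     * Buffer one chunk; when {@code last} is true, publish the fully
     * assembled content (here: into a map standing in for the real file).
     */
    public void write(String file, byte[] chunk, boolean last) {
        ByteArrayOutputStream buf =
            pending.computeIfAbsent(file, f -> new ByteArrayOutputStream());
        buf.write(chunk, 0, chunk.length);
        if (last) {
            pending.remove(file);
            committed.put(file, buf.toByteArray());
        }
    }

    /** Fully assembled content, or null while chunks are still pending. */
    public byte[] read(String file) {
        return committed.get(file);
    }
}
```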
regards,
-- yuri
With the exception of supporting a safe chunked write (for now I am
sending the whole file content when writing to the cache, since
chunked writes would require additional changes to GridFS, such as a
protocol to let the loader know that the current chunk is the last one
so it can finally update the underlying file as a whole, etc.),
keys can be parsed and translated into file reads on the real file
system.
Any chance of implementing the skip() and available() methods in
GridInputStream, or of making the constructors in the GridFileSystem
package public so I can easily extend them?
I am trying to plug custom FileMetadataCacheStore and
FileDataCacheStore implementations under the metadata/data caches used
by the GridFS, so that loading from an existing FS is completely
transparent and lazy (I'll be happy to contribute if it makes sense).
The problem is that any BufferedInputStream wrapped around the
GridInputStream calls available()/skip(), but they are not implemented
in GridFS.
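(The two missing methods are straightforward for a stream that knows its total length and current position. The class below is not the actual GridInputStream source, just a minimal stream showing the shape of skip()/available() that a BufferedInputStream wrapper needs.)

```java
import java.io.InputStream;

/**
 * Minimal sketch of skip()/available() for a stream that tracks its
 * length and position, as GridInputStream does. The byte array stands
 * in for the grid-backed chunks.
 */
public class LengthAwareStream extends InputStream {
    private final byte[] content;
    private int position;

    public LengthAwareStream(byte[] content) {
        this.content = content;
    }

    @Override
    public int read() {
        return position < content.length ? (content[position++] & 0xff) : -1;
    }

    @Override
    public int available() {
        // Bytes readable without blocking: total length minus position.
        return content.length - position;
    }

    @Override
    public long skip(long n) {
        if (n <= 0) {
            return 0;
        }
        int skipped = (int) Math.min(n, content.length - position);
        position += skipped;
        return skipped;
    }
}
```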
Do you also see any issues with the above approach?
regards,
On Mon, Jul 11, 2011 at 12:06 PM, galderz
<reply+m-9778632-9f501737dc4435143baf6908afcd349935f887a8(a)reply.github.com>
wrote:
> Hey Yuri,
>
> Why do you need two new file based stores? Can't you plug Infinispan with a file based cache store to give you FS persistence?
>
> Anyway, I'd suggest you discuss it in the Infinispan dev list (http://lists.jboss.org/pipermail/infinispan-dev/) and in parallel, create an issue in https://issues.jboss.org/browse/ISPN
>
> Cheers,
> Galder
>
> --
> Reply to this email directly or view it on GitHub:
> http://github.com/inbox/9770320#reply
>
Faster LRU
by Vladimir Blagojevic
Hey guys,
In the past few days I've looked at how to squeeze every bit of
performance out of BCHM, and particularly our LRU impl. What I do not
like about the current LRU is that searching for an element in the
queue is not a constant-time operation but requires a full queue
traversal if we need to find an element [1].
It would be particularly nice to have a hashmap with a constant cost
for lookup operations. Something like LinkedHashMap. LinkedHashMap
seems to be a good container for LRU, as it provides constant-time
lookup but also a hook, a callback for evicting the oldest entry, in
the form of the removeEldestEntry callback. So why not implement our
segment eviction policy using a LinkedHashMap [2]?
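(The LinkedHashMap trick in its simplest form, separate from the BCHM segment machinery below: access-order iteration plus removeEldestEntry gives an O(1) LRU cache with no queue scanning.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal LRU cache built on LinkedHashMap: accessOrder=true moves an
 * entry to the tail on every get(), and removeEldestEntry() drops the
 * least-recently-used entry once capacity is exceeded.
 */
public class SimpleLru<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public SimpleLru(int capacity) {
        // accessOrder=true reorders entries on access, not just insertion
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put(); returning true evicts the LRU entry.
        return size() > capacity;
    }
}
```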
I've seen about a 50% performance increase for smaller caches (100K),
and even more for larger and more contended caches: about a 75%
increase. After this change, BCHM performance was not that much worse
than CHM, and it was faster than a synchronized hashmap.
Should we include this impl as FAST_LRU, as I would not want to remove
the current LRU just yet? We have to prove this one is correct and
that it does not have any unforeseen issues.
Let me know what you think!
Vladimir
[1]
https://github.com/infinispan/infinispan/blob/master/core/src/main/java/o...
[2] Source code snippet for LRU in BCHM.
static final class LRU<K, V> extends LinkedHashMap<HashEntry<K, V>, V>
      implements EvictionPolicy<K, V> {

   /** The serialVersionUID */
   private static final long serialVersionUID = -6627108081544347068L;

   private final ConcurrentLinkedQueue<HashEntry<K, V>> accessQueue;
   private final Segment<K, V> segment;
   private final int maxBatchQueueSize;
   private final int trimDownSize;
   private final float batchThresholdFactor;
   private final Set<HashEntry<K, V>> evicted;

   public LRU(Segment<K, V> s, int capacity, float lf, int maxBatchSize,
         float batchThresholdFactor) {
      super((int) (capacity * lf));
      this.segment = s;
      this.trimDownSize = (int) (capacity * lf);
      this.maxBatchQueueSize = maxBatchSize > MAX_BATCH_SIZE ? MAX_BATCH_SIZE : maxBatchSize;
      this.batchThresholdFactor = batchThresholdFactor;
      this.accessQueue = new ConcurrentLinkedQueue<HashEntry<K, V>>();
      this.evicted = new HashSet<HashEntry<K, V>>();
   }

   @Override
   public Set<HashEntry<K, V>> execute() {
      Set<HashEntry<K, V>> evictedCopy = new HashSet<HashEntry<K, V>>();
      for (HashEntry<K, V> e : accessQueue) {
         put(e, e.value);
      }
      evictedCopy.addAll(evicted);
      accessQueue.clear();
      evicted.clear();
      return evictedCopy;
   }

   @Override
   public Set<HashEntry<K, V>> onEntryMiss(HashEntry<K, V> e) {
      return Collections.emptySet();
   }

   /*
    * Invoked without holding a lock on Segment
    */
   @Override
   public boolean onEntryHit(HashEntry<K, V> e) {
      accessQueue.add(e);
      return accessQueue.size() >= maxBatchQueueSize * batchThresholdFactor;
   }

   /*
    * Invoked without holding a lock on Segment
    */
   @Override
   public boolean thresholdExpired() {
      return accessQueue.size() >= maxBatchQueueSize;
   }

   @Override
   public void onEntryRemove(HashEntry<K, V> e) {
      remove(e);
      // we could have multiple instances of e in accessQueue; remove them all
      while (accessQueue.remove(e)) {
         continue;
      }
   }

   @Override
   public void clear() {
      super.clear();
      accessQueue.clear();
   }

   @Override
   public Eviction strategy() {
      return Eviction.LRU;
   }

   protected boolean removeEldestEntry(Entry<HashEntry<K, V>, V> eldest) {
      HashEntry<K, V> evictedEntry = eldest.getKey();
      segment.evictionListener.onEntryChosenForEviction(evictedEntry.value);
      segment.remove(evictedEntry.key, evictedEntry.hash, null);
      evicted.add(evictedEntry);
      return size() > trimDownSize;
   }
}
Atomic operations and transactions
by Sanne Grinovero
Hello all,
some team members had a meeting yesterday; one of the subjects
discussed was using atomic operations (putIfAbsent, etc.).
Mircea has just summarised it in the following proposal:
The atomic operations, as defined by ConcurrentHashMap, don't fit
well within the scope of an optimistic transaction: there can be a
discrepancy between the value returned by the operation and whether
the operation is actually applied.
E.g. putIfAbsent(k, v) might report success, as there's no entry for k
in the scope of the current transaction, while in fact there is a
value committed by another transaction, hidden by the fact that we're
running in repeatable-read mode.
Later on, at prepare time, when the same operation is applied on the
node that actually holds k, it might not succeed because another
transaction has updated k in between, but the return value of the
method was already evaluated long before this point.
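(The contract at stake is easiest to see outside any transaction. The helper below is an illustration, not Infinispan code: it exercises the ConcurrentMap rule that the return value of putIfAbsent must agree with whether the write was applied, which is exactly the invariant repeatable read can break.)

```java
import java.util.concurrent.ConcurrentMap;

/**
 * Demonstrates the ConcurrentMap putIfAbsent contract: the returned
 * previous value tells the caller whether the write took effect.
 * Under repeatable read, a transaction can see no value for k and
 * report success even though another transaction has already
 * committed one, violating this invariant.
 */
public class PutIfAbsentContract {
    public static boolean contractHolds(ConcurrentMap<String, String> map) {
        String first = map.putIfAbsent("k", "v1");  // applied: returns null
        String second = map.putIfAbsent("k", "v2"); // not applied: returns "v1"
        return first == null && "v1".equals(second) && "v1".equals(map.get("k"));
    }
}
```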
In order to solve this problem, if an atomic operation happens within
the scope of a transaction, Infinispan eagerly acquires a lock on the
remote node. This lock is held for the entire duration of the
transaction, and it is an expensive lock as it involves an RPC. If
keeping the lock remotely for a potentially long time is a problem,
the user can suspend the running transaction, run the atomic operation
out of the transaction's scope, then resume the transaction.
In addition to this, what would you think about adding a flag to
these methods which suspends the transaction just before and resumes
it right after? I don't know the cost of suspending and resuming a
transaction, but such a flag could optionally be optimized in the
future by simply ignoring the current transaction instead of really
suspending it, or by other clever tricks we might come across.
I also think we should discuss whether such a behaviour should be the
default: anybody using an atomic operation is going to make
assumptions which are clearly incompatible with the transaction, so
I'm wondering what the path of "least surprise" is for the default
invocation.
Regards,
Sanne
Tidy up of forum + wiki
by Pete Muir
All,
Today I have spent some time tidying up the wiki and forum.
Wiki
------
* All documentation articles have been moved to the "Infinispan Archive". This space is not generally writable by the community. Contact me if an article has been moved there by mistake
* The wiki "home page" is now the SBS document view, as all documents in the Infinispan space are wiki content (design docs, notes etc.). This means we no longer need to manually maintain a contents page for the wiki :-)
Forums
----------
* All open questions older than 2 months have been marked as answered regardless of their status. This was necessary to give us a clean slate to work from.
* I have added http://community.jboss.org/wiki/HowToAskAForumQuestion to encourage people to ask questions with all necessary info up front and also to mark questions as answered. This will appear as an "Announcement" when there are no other announcements showing as well.
* I have added guidance on how the community can contribute by answering questions https://docs.jboss.org/author/display/ISPN/Contributing+to+Infinispan#Con... - this section also addresses how we manage the forum. I will add sections on stackoverflow etc. here as I get to them ;-)
* I have asked the jboss.org team if we can have all discussions be questions, this will allow much easier tracking of "open user questions"
Core team, please can you go through the open questions we currently have (http://community.jboss.org/en/infinispan?view=discussions#/?filter=open) and try to answer any that are open. Some you may feel you have already answered and not seen any activity from the user for a couple of weeks. Feel free to mark these as "Assumed Answered".
Pete
splitting infinispan-lucene-directory.jar
by Sanne Grinovero
Hello all,
it seems that people defining an Infinispan configuration in the
application server for the Lucene directory, and using the ad-hoc
TwoWayKey2StringMapper, need to move
infinispan-lucene-directory.jar into the commons-lib directory of the
application server.
Since this module depends on Lucene too, people would need to move the
Lucene jar as well, and this is not desirable, as applications might
want to use different Lucene versions.
The mapper of course depends on the objects it creates: the only clean
option I'm seeing is to split the jar in two, making sure that the
keyMapper and keys are independent from Lucene, but I'm not a big
fan of this split.
Thoughts?
http://community.jboss.org/message/613078#613078
Sanne