Solved - turned out to be an issue on the Cassandra side.
For the benefit of the list, in case anyone is Googling and has the same
problem...
I searched the logs and found some failing assertions in
/var/log/cassandra/system.log on some of the nodes in the cluster. It
turns out this bug was the issue:
https://issues.apache.org/jira/browse/CASSANDRA-4687
I've followed the suggested workaround of disabling the key cache until
there's a fix. For anyone following that workaround: first update your
schema to disable the key cache, then run nodetool invalidatekeycache to
make sure anything currently cached on each node is evicted before the
change takes effect. Also, if your app is still running and has pooled
connections that previously received TimedOutExceptions, you'll need to
restart it, as those connections will be hanging around in a bad state.
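For reference, the steps above look roughly like this. The keyspace and
table names are placeholders for whatever your setup uses, and the exact
caching syntax depends on your Cassandra version, so treat this as a
sketch rather than copy-paste:

```shell
# Schema update: turn off the key cache for the affected table.
# 'rows_only' keeps the row cache but disables the key cache; 'NONE'
# disables both. (my_keyspace/my_table are placeholder names.)
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH caching = 'rows_only';"

# Then, on *each* node in the cluster, evict anything already sitting
# in the key cache so the schema change takes effect immediately:
nodetool invalidatekeycache
```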
James.
On 10 September 2013 11:15, James Aley <james.aley(a)swiftkey.net> wrote:
Hey guys,
Not sure if this is the most appropriate list - please let me know if this
question is best asked elsewhere!
I'm using Infinispan's Lucene Directory with a Cassandra CacheStore
backing it, version 5.3.0.Final. Everything seems to be more-or-less
working fine, except for some kind of apparent race condition on start-up.
At the first point where my code accesses the InfinispanDirectory, I
usually get a Cassandra TimedOutException from the Client.get() operation
in the CacheLoader. I've been googling around, and the suggestions on the
web (relating to using Cassandra directly) don't seem particularly
relevant, as the most common cause there is reading too much data in a
single operation. This is merely a 32 KB read.
I was wondering if the problem is that my code is supposed to somehow wait
for the Cache instances to be "ready" before using them? Is there some kind
of listener I'm supposed to use to be notified of the Cache being ready for
read operations?
Any suggestions greatly appreciated!
Thanks,
James.