[infinispan-issues] [JBoss JIRA] Commented: (ISPN-1255) RequestIgnoredException on rehash using the Distributed Executor Service

Fri Jul 22 07:49:23 EDT 2011

    [ https://issues.jboss.org/browse/ISPN-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615864#comment-12615864 ] 

Dan Berindei commented on ISPN-1255:
------------------------------------

I decreased the GMS timeouts and the RequestIgnoredResponses disappeared from my log:

{code:xml}
    <pbcast.GMS print_local_addr="true" 
        join_timeout="3000"
        leave_timeout="3000"
        merge_timeout="60000"
        view_ack_collection_timeout="2000"
        view_bundling="true"
        max_bundling_time="1000"/>
{code}
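
To double-check which GMS timeouts a stack file actually sets, something like the following could be used. It is only a sketch using the JDK's XML parser: the class name {{GmsTimeouts}}, the method {{readGmsAttributes}} and the bare {{<config>}} wrapper element are assumptions made for illustration, not part of JGroups or Infinispan.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of JGroups or Infinispan.
public class GmsTimeouts {

    /**
     * Returns the attributes of the first pbcast.GMS element in the given
     * XML, sorted by attribute name, or an empty map if there is no such element.
     */
    public static Map<String, String> readGmsAttributes(String stackXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(stackXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> attrs = new TreeMap<>();
        NodeList nodes = doc.getElementsByTagName("pbcast.GMS");
        if (nodes.getLength() == 0) {
            return attrs;
        }

        NamedNodeMap raw = ((Element) nodes.item(0)).getAttributes();
        for (int i = 0; i < raw.getLength(); i++) {
            attrs.put(raw.item(i).getNodeName(), raw.item(i).getNodeValue());
        }
        return attrs;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the GMS fragment is wrapped in a plain <config> root element.
        String stackXml =
                "<config>"
              + "<pbcast.GMS print_local_addr=\"true\""
              + " join_timeout=\"3000\""
              + " leave_timeout=\"3000\""
              + " merge_timeout=\"60000\""
              + " view_ack_collection_timeout=\"2000\""
              + " view_bundling=\"true\""
              + " max_bundling_time=\"1000\"/>"
              + "</config>";

        // Print every attribute so the effective timeouts can be compared by eye.
        for (Map.Entry<String, String> e : readGmsAttributes(stackXml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

This only reads the XML text. It does not show the values a running channel ended up with.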

I'm pretty sure that this is because GMS delays the new node's start procedure by {{join_timeout}} milliseconds, and 

> RequestIgnoredException on rehash using the Distributed Executor Service
> ------------------------------------------------------------------------
>
>                 Key: ISPN-1255
>                 URL: https://issues.jboss.org/browse/ISPN-1255
>             Project: Infinispan
>          Issue Type: Bug
>    Affects Versions: 5.0.0.CR7
>            Reporter: Erik Salter
>            Assignee: Vladimir Blagojevic
>             Fix For: 5.0.0.FINAL
>
>         Attachments: cacheTest.zip, server_node1.log, server_node2.log
>
>
> My application exposes its distributed operations via a REST-based infrastructure.  To minimize the delta between JBoss starting and the cache starting, I used the new Distributed Executor to "sticky" a task to the data owner of a set of keys (with the same hash code). 
> NOTE:  Rehash still causes problems seen in ISPN-1106.  (Attached new logs)
> I see a lot of the following error from the DistributedExecutorService when the new node's cache doesn't start in a timely manner: 
> Reason: java.lang.IllegalStateException: Invalid response {Satriani-52149(PHL)=RequestIgnoredResponse}
> In addition, I see:
> org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!
> It takes the cache about 2+ minutes to recover at a low throughput rate (30 tx/s).  At a high throughput rate, the cluster doesn't recover. 
