[infinispan-issues] [JBoss JIRA] (ISPN-9345) TimeutException involving the org.infinispan.CONFIG cache

Mon Jul 9 03:22:00 EDT 2018

    [ https://issues.jboss.org/browse/ISPN-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602142#comment-13602142 ] 

Dan Berindei commented on ISPN-9345:
------------------------------------

[~belaban] after reading some more about IPv6, I realized I had jumped to conclusions: the do-not-fragment rule applies only to routers, and the network stack on the sender is free to fragment packets:

{quote}
   In order to send a packet that is too large to fit in the MTU of the
   path to its destination, a source node may divide the packet into
   fragments and send each fragment as a separate packet, to be
   reassembled at the receiver.
{quote}

These logs from `tcpdump` actually show that a packet was fragmented. `tcpdump` logs the "bad length" message whenever the UDP datagram is bigger than the IP packet that carries it, which is the case for the first fragment here:

{noformat}
09:30:32.070507 IP6 (class 0x08, flowlabel 0xbb8fe, hlim 2, next-header Fragment (44) payload length: 1456) denulu-tp3 > ff0e::e406:708: frag (0xbbd85d01:0|1448) 56784 > 46655: UDP, bad length 1453 > 1440
09:30:32.070563 IP6 (class 0x08, flowlabel 0xbb8fe, hlim 2, next-header Fragment (44) payload length: 21) denulu-tp3 > ff0e::e406:708: frag (0xbbd85d01:1448|13)
{noformat}

This is a non-fragmented packet, with the length just below the threshold:

{noformat}
09:31:21.935633 IP6 (class 0x08, flowlabel 0x839b7, hlim 2, next-header UDP (17) payload length: 1460) denulu-tp3.57807 > ff0e::e406:708.46655: [udp sum ok] UDP, length 1452
{noformat}
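
To make the numbers in the two captures concrete, here is a minimal sketch of the arithmetic. It assumes a 1500-byte link MTU (the MTU itself is not shown in the logs, but it is consistent with them); the function name and constants are only illustrative.

```python
IPV6_HEADER = 40      # fixed IPv6 header, not counted in tcpdump's "payload length"
FRAGMENT_HEADER = 8   # IPv6 Fragment extension header (next-header 44)
UDP_HEADER = 8


def ipv6_udp_packets(udp_data_len, mtu=1500):
    """Return (offset, fragment_len, ipv6_payload_len) for each packet sent.

    offset is None when the datagram fits in a single packet.
    mtu=1500 is an assumption, not a value taken from the logs.
    """
    datagram = UDP_HEADER + udp_data_len
    if IPV6_HEADER + datagram <= mtu:
        return [(None, datagram, datagram)]
    # Every fragment except the last must carry a multiple of 8 bytes.
    max_frag = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    packets = []
    offset = 0
    while offset < datagram:
        frag_len = min(max_frag, datagram - offset)
        packets.append((offset, frag_len, FRAGMENT_HEADER + frag_len))
        offset += frag_len
    return packets


print(ipv6_udp_packets(1452))  # single packet, payload length 1460
print(ipv6_udp_packets(1453))  # two fragments: 0|1448 and 1448|13
```

With 1453 bytes of UDP data, the first fragment carries 1448 bytes, of which 8 are the UDP header, so only 1440 bytes of UDP data are present in that packet. That matches the "bad length 1453 > 1440" message, and the remaining 13 bytes end up in the second fragment.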

Initially I thought it was a problem with the wireless router I was connected to at the time. But I was able to reproduce the problem while connected only to an Ethernet switch, so I'm now convinced the problem is in my machine's IPv6 stack.

{quote}
Sending a large packet at startup is too cumbersome, but this could be integrated into some configuration task perhaps?
{quote}

Well, it wouldn't be a simple configuration change, but we can surely modify {{FRAGx}} to send a message with {{frag_size}} bytes on startup/view change and log an error if there is no confirmation from the coordinator?
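
Roughly, the check could look like the sketch below. It is not JGroups code: {{send_to_coordinator}} and {{wait_for_ack}} are hypothetical callbacks standing in for whatever the protocol would use to send the probe and to wait for the coordinator's confirmation, and the timeout is an arbitrary placeholder.

```python
import logging

log = logging.getLogger("frag_probe")


def probe_frag_size(send_to_coordinator, wait_for_ack, frag_size, timeout_s=5.0):
    """Send one frag_size-byte probe; return True if the coordinator confirms it.

    Both callbacks are hypothetical placeholders, not part of any real API:
      send_to_coordinator(payload: bytes) -> None
      wait_for_ack(timeout_s: float) -> bool
    """
    payload = bytes(frag_size)  # zero-filled probe of exactly frag_size bytes
    send_to_coordinator(payload)
    if wait_for_ack(timeout_s):
        return True
    log.error(
        "no confirmation for %d-byte probe within %.1fs; "
        "messages of frag_size bytes may be getting dropped",
        frag_size, timeout_s,
    )
    return False
```

The same function would be called on startup and again after each view change, since the coordinator (and the path to it) can change.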

I'm not sure what to do with all the protocols below {{FRAGx}} that send data in the message body. Maybe they can check {{TP.max_bundle_size}} and do their own fragmentation based on that? The only inconvenience would be that {{TP.max_bundle_size}} is always set, even when using {{TCP}}, where we wouldn't need fragmentation.

> TimeutException involving the org.infinispan.CONFIG cache
> ---------------------------------------------------------
>
>                 Key: ISPN-9345
>                 URL: https://issues.jboss.org/browse/ISPN-9345
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 9.3.0.Final
>            Reporter: Gustavo Fernandes
>            Assignee: Dan Berindei
>
> {noformat}
> Caused by: org.infinispan.commons.CacheException: Initial state transfer timed out for cache org.infinispan.CONFIG on jedha-64980
>     at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:233)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:79)
> {noformat}
