Hey Galder,

I haven't sent an email because I didn't have enough time to create a proper reproducer or investigate what was going on.

During the summit work, I switched from a custom build of 9.2.1.Final to the latest master. This resulted in all sites repeatedly going up and down. I struggled for 5 hours but couldn't stabilize it. Then, 30 minutes before the rehearsal session, I decided to revert to 9.2.1.Final.

I wish I had more clues. Maybe I didn't do the migration properly, or I used timeouts that were too short for some of the FD* protocols. It's hard to say.

Thanks,
Sebastian

On Mon, Apr 30, 2018 at 5:16 PM Galder Zamarreno <galder@redhat.com> wrote:
Oops, sent too early! So, the NYC site is not up, and I see this in the logs:

2018-04-30 16:53:49,411 ERROR [org.infinispan.test.fwk.TEST_RELAY2] (testng-ProtobufMetadataXSiteStateTransferTest[DIST_SYNC, tx=false]:[]) ProtobufMetadataXSiteStateTransferTest[DIST_SYNC, tx=false]-NodeA-55452: no route to NYC: dropping message

But the put hangs and never completes [2]. I've traced the code, and [3] never gets called; no events arrive at all.

I think this might be a JGroups bug: ChannelCallbacks implements UpHandler, but JChannel never checks whether its receiver also implements UpHandler [4], so it never delivers the site-unreachable event up the stack.
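To make the suspected dispatch problem concrete, here is a minimal, self-contained Java sketch. All the types in it (Receiver, UpHandler, SimpleChannel, Callbacks) are hypothetical, simplified stand-ins, not the real JGroups or Infinispan classes. It only shows the pattern: if the channel never checks whether its receiver also implements UpHandler, non-message events such as "site unreachable" are silently dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a message receiver (NOT the real JGroups API).
interface Receiver {
    void receive(String msg);
}

// Hypothetical stand-in for a handler of non-message events (NOT the real JGroups API).
interface UpHandler {
    Object up(String event);
}

// Simplified channel; the flag toggles between the suspected buggy and fixed behavior.
class SimpleChannel {
    private Receiver receiver;
    private final boolean dispatchToUpHandler;

    SimpleChannel(boolean dispatchToUpHandler) {
        this.dispatchToUpHandler = dispatchToUpHandler;
    }

    void setReceiver(Receiver r) {
        this.receiver = r;
    }

    // Called for non-message events, e.g. a hypothetical "SITE_UNREACHABLE:NYC".
    Object up(String event) {
        if (dispatchToUpHandler && receiver instanceof UpHandler) {
            return ((UpHandler) receiver).up(event);
        }
        // Without the instanceof check the event is swallowed here,
        // so the caller waiting on it (e.g. a put) never hears back.
        return null;
    }
}

// Hypothetical stand-in for Infinispan's ChannelCallbacks: implements both interfaces.
class Callbacks implements Receiver, UpHandler {
    final List<String> events = new ArrayList<>();

    public void receive(String msg) {
        // Regular messages are not relevant to this sketch.
    }

    public Object up(String event) {
        events.add(event);
        return null;
    }
}
```

With dispatchToUpHandler set to false the Callbacks instance never records the event; with it set to true the event reaches Callbacks.up().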

@Bela?

Cheers,
Galder

[2] https://gist.github.com/galderz/ada0e9317889eaa272845430b8d36ba1
[3] https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/remoting/transport/jgroups/JGroupsTransport.java#L1366
[4] https://github.com/belaban/JGroups/blob/master/src/org/jgroups/JChannel.java#L953-L983



On Mon, Apr 30, 2018 at 5:09 PM Galder Zamarreno <galder@redhat.com> wrote:
Hi Sebastian,

Did you mention something about x-site not working on master?

The reason I ask is that I was trying to create a state transfer test for [1], and some odd things are happening.

In my test, I start the LON site configured with NYC, but NYC is not up yet.