On Oct 13, 2011, at 5:57 PM, David M. Lloyd wrote:
> On 10/13/2011 10:45 AM, Paul Ferraro wrote:
>> On Tue, 2011-10-11 at 10:59 -0500, David M. Lloyd wrote:
>>> There are at least two basic paths we can follow for clustered
>>> invocation based on the current architecture. The right choice is going
>>> to depend primarily upon the expected use cases, which I am not in a
>>> position to properly judge.
>>>
>>> Option 1: Clustered Invocation Transport
>>> ----------------------------------------
>>>
>>> In this option, we introduce a new "LAN" transport type for invocation
>>> on the cluster. The transport would use direct TCP connections or UDP
>>> messages (or both, depending on request size) to convey the invocation.
>>> The characteristics of this option are as follows:
>>>
>>> - Security: reliance on physical network security only (no TLS or
>>> authentication)
>>> - Latency is very low, even to new nodes
>>> - Topology changes can be conveyed as separate asynchronous messages
>>> - Invocations from external networks would happen through a proxy node,
>>> with Remoting being bridged to the LAN, to perform security functions
>>>
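To make the size-based dispatch concrete, here's a rough sketch of what the
sending side might look like. The 8 KB cutoff and all names here are invented
for illustration, and the marshalling and reply handling are elided:

    import java.io.IOException;
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    final class LanInvocationSender {
        // Assumed cutoff between "fits in one datagram" and "needs a stream".
        private static final int MAX_DATAGRAM_PAYLOAD = 8 * 1024;

        void send(byte[] invocation, InetSocketAddress node) throws IOException {
            if (invocation.length <= MAX_DATAGRAM_PAYLOAD) {
                // Small request: one UDP datagram, no connection setup cost.
                try (DatagramSocket socket = new DatagramSocket()) {
                    socket.send(new DatagramPacket(invocation, invocation.length, node));
                }
            } else {
                // Large request: fall back to a direct TCP connection.
                try (Socket socket = new Socket(node.getAddress(), node.getPort())) {
                    socket.getOutputStream().write(invocation);
                    socket.getOutputStream().flush();
                }
            }
        }
    }
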
>>> Option 2: Load-balanced Remoting Connections
>>> --------------------------------------------
>>>
>>> In this option, we rely on the client to establish one or more Remoting
>>> connection(s) to one or more of the nodes of the cluster. Logic in the
>>> client will be used to determine what connection(s) to use for what
>>> clusters. We have the option of automatically connecting as topology
>>> changes or requiring the user to set up the connections in advance.
>>> Note that automatic connection cannot work in the case of
>>> user-interactive authentication. Characteristics:
>>>
>>> - Security: full authentication and TLS supported
>>> - Latency is low once the connection is established, however there is
>>> some overhead involved in authentication and security negotiation
>>> - Topology changes should be asynchronous notifications
>>> - Each connection has to be separately authenticated
>>> - Automatically establishing connections is not presently supported, so
>>> we'd need a bit of infrastructure for that: handling user-interactive
>>> authentication, connection lifecycle management, and configuration.
>>> This will be a point of fragility.
>>>
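For option 2, that "bit of infrastructure" would presumably center on
something like the sketch below. All names are hypothetical; "Connection"
stands in for whatever the real Remoting API provides. It also shows where
the first-use latency penalty comes from:

    import java.io.IOException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    abstract class ClusterConnectionManager {
        interface Connection extends java.io.Closeable {}

        private final ConcurrentMap<String, Connection> connections =
                new ConcurrentHashMap<String, Connection>();

        Connection getOrConnect(String nodeName) throws IOException {
            Connection existing = connections.get(nodeName);
            if (existing != null) {
                return existing;
            }
            // Authentication and TLS negotiation happen in connect(), which is
            // why the first invocation to a new node pays a latency penalty and
            // why user-interactive auth can't simply be repeated on every
            // topology change.
            Connection fresh = connect(nodeName);
            Connection raced = connections.putIfAbsent(nodeName, fresh);
            if (raced != null) {
                fresh.close(); // another thread connected first; discard ours
                return raced;
            }
            return fresh;
        }

        /** Stand-in for the real Remoting connect + authentication. */
        protected abstract Connection connect(String nodeName) throws IOException;
    }

Connection lifecycle (reconnect on failure, teardown on undeploy) is exactly
the part this sketch leaves out, and is where I'd expect the fragility to
concentrate.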
>>> Summary
>>> -------
>>>
>>> For both options, we have to determine an appropriate load-balancing
>>> strategy. The choice of direction will affect how our clustering and
>>> transaction interceptors function. We also have to suss out the logic
>>> around dealing with conflicting or wrongly-ordered topology updates;
>>> hopefully our existing policies will continue to apply.
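On the load-balancing question: for either option, the simplest starting
point would be plain round-robin over the latest known topology, with
stickiness layered on top where needed. A sketch, assuming topology is just
an immutable node list:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicReference;

    final class RoundRobinSelector {
        private final AtomicReference<List<String>> topology =
                new AtomicReference<List<String>>(Collections.<String>emptyList());
        private final AtomicInteger next = new AtomicInteger();

        // Called from whatever delivers topology updates.
        void updateTopology(List<String> nodes) {
            topology.set(Collections.unmodifiableList(new ArrayList<String>(nodes)));
        }

        String pick() {
            List<String> nodes = topology.get();
            if (nodes.isEmpty()) {
                throw new IllegalStateException("no known cluster nodes");
            }
            // Mask keeps the index non-negative after counter wraparound.
            int i = next.getAndIncrement() & Integer.MAX_VALUE;
            return nodes.get(i % nodes.size());
        }
    }
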
>>
>> Do topology changes really need to be asynchronous notifications? Can
>> we simply update cluster topology per invocation?
>>
>> Maintaining an accurate cluster topology via asynchronous notifications
>> has the following benefits:
>> 1. Topology changes between invocations won't require failover in the
>> event of a load balanced invocation (as opposed to a sticky one).
>> 2. Load balancing will potentially be more effective following topology
>> changes by leveraging new cluster members.
>> 3. Minimizes invocation payload (since we don't need to tack on cluster
>> topology to every invocation response). We can optimize this somewhat
>> by sending a topology view ID with the invocation request, and only
>> including the topology in the response if the topology changed (i.e.
>> request view ID != current view ID).
As a side note, this is what Hot Rod does in Infinispan in order to detect stale views.
We might optimise this further in the future as indicated in
https://issues.jboss.org/browse/ISPN-1403, but this optimisation is particular to the
Infinispan use case.
Alternative optimisation avenues could be investigated for clustered invocations.
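In code, the server side of that check is tiny; something like this (types
invented for illustration):

    import java.util.Collections;
    import java.util.List;

    final class TopologyPiggyback {
        static final class Topology {
            final long viewId;
            final List<String> nodes;
            Topology(long viewId, List<String> nodes) {
                this.viewId = viewId;
                this.nodes = Collections.unmodifiableList(nodes);
            }
        }

        private volatile Topology current = new Topology(0, Collections.<String>emptyList());

        void install(Topology latest) {
            current = latest;
        }

        /** Topology to attach to the invocation response, or null when the
         *  client already holds the current view. */
        Topology forResponse(long requestViewId) {
            Topology snapshot = current;
            return snapshot.viewId == requestViewId ? null : snapshot;
        }
    }
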
>>
>> The only disadvantage of which I can think is implementation complexity.
>> Topology update ordering is not an issue if we take the simpler
>> approach. However, we can also make an assumption that topology changes
>> are not common - so it becomes a matter of whether or not to optimize
>> for frequent topology changes.
>
> The problem is that (with R3 anyway) many threads may concurrently use a
> single connection, and invocation replies can come in any order with
> respect to the original invocation. So in effect, even if we attach
> topology information to the reply, they're still essentially asynchronous,
> with the disadvantage that topology changes also bog down invocation
> response times.
>
> And if we did a non-persistent-connection-based transport, there's even
> less of a guarantee because each reply could come in separate packets or
> connections which can be arbitrarily reordered at a network level.
>
> In other words, topology update ordering is always an issue, even more
> so when multiple nodes come into play.
>
> Using a view ID is fine as long as all nodes in the cluster always agree
> on what view ID is the latest (which afaik is essentially impossible to
> guarantee).
Well, if they're running in a cluster, JGroups provides a guarantee via GMS that a
common viewId is maintained, at least until a cluster partition occurs.
When a partition occurs, several cluster islands can evolve their viewId independently.
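Which suggests that, whatever the transport, the client needs a guard that
only installs a strictly newer view and drops stale or duplicate
notifications. A partition is precisely where this monotonic-viewId
assumption breaks, since two islands advance their ids independently.
A sketch, again with invented types:

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicReference;

    final class TopologyHolder {
        static final class View {
            final long viewId;
            final List<String> nodes;
            View(long viewId, List<String> nodes) {
                this.viewId = viewId;
                this.nodes = Collections.unmodifiableList(nodes);
            }
        }

        private final AtomicReference<View> current =
                new AtomicReference<View>(new View(-1, Collections.<String>emptyList()));

        /** Installs the view only if it is newer; returns true if installed. */
        boolean offer(View candidate) {
            while (true) {
                View existing = current.get();
                if (candidate.viewId <= existing.viewId) {
                    return false; // stale or out-of-order notification; drop it
                }
                if (current.compareAndSet(existing, candidate)) {
                    return true;
                }
            }
        }
    }
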
Okay, so as long as our cluster client node is directly participating in
the JGroups group, then we can rely on that, I guess?
This still brings us no closer to a determination about what approach we
should take for the invocation protocol itself.
--
- DML