On 10/13/2011 10:45 AM, Paul Ferraro wrote:
> On Tue, 2011-10-11 at 10:59 -0500, David M. Lloyd wrote:
>> There are at least two basic paths we can follow for clustered
>> invocation based on the current architecture. The right choice is going
>> to depend primarily upon the expected use cases, which I am not in a
>> position to properly judge.
>>
>> Option 1: Clustered Invocation Transport
>> ----------------------------------------
>>
>> In this option, we introduce a new "LAN" transport type for invocation
>> on the cluster. The transport would use direct TCP connections or UDP
>> messages (or both, depending on request size) to convey the invocation.
>> The characteristics of this option are as follows:
>>
>> - Security: reliance on physical network security only (no TLS or
>> authentication)
>> - Latency is very low, even to new nodes
>> - Topology changes can be conveyed as separate asynchronous messages
>> - Invocations from external networks would happen through a proxy node,
>> with Remoting being bridged to the LAN, to perform security functions
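>>
>> As a rough illustration of the transport-selection idea above (not a
>> design; every name here is hypothetical), the send path might use UDP
>> for requests that fit in one datagram and fall back to TCP otherwise:
>>
>>     import java.io.IOException;
>>     import java.io.OutputStream;
>>     import java.net.*;
>>
>>     final class LanInvocationSender {
>>         // Illustrative threshold; a real value would come from configuration.
>>         static final int MAX_DATAGRAM_PAYLOAD = 8192;
>>
>>         // Send one serialized invocation to a cluster node. Framing and
>>         // reply handling are omitted; this only sketches transport choice.
>>         void send(byte[] payload, InetSocketAddress node) throws IOException {
>>             if (payload.length <= MAX_DATAGRAM_PAYLOAD) {
>>                 DatagramSocket socket = new DatagramSocket();
>>                 try {
>>                     socket.send(new DatagramPacket(payload, payload.length, node));
>>                 } finally {
>>                     socket.close();
>>                 }
>>             } else {
>>                 Socket socket = new Socket(node.getAddress(), node.getPort());
>>                 try {
>>                     OutputStream out = socket.getOutputStream();
>>                     out.write(payload);
>>                     out.flush();
>>                 } finally {
>>                     socket.close();
>>                 }
>>             }
>>         }
>>     }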
>>
>> Option 2: Load-balanced Remoting Connections
>> --------------------------------------------
>>
>> In this option, we rely on the client to establish one or more Remoting
>> connection(s) to one or more of the nodes of the cluster. Logic in the
>> client will be used to determine which connection(s) to use for which
>> clusters. We have the option of automatically connecting as topology
>> changes or requiring the user to set up the connections in advance.
>> Note that automatic connection cannot work in the case of
>> user-interactive authentication. Characteristics:
>>
>> - Security: full authentication and TLS supported
>> - Latency is low once the connection is established; however, there is
>> some overhead involved in authentication and security negotiation
>> - Topology changes should be asynchronous notifications
>> - Each connection has to be separately authenticated
>> - Automatically establishing connections is not presently supported, so
>> we'd need a bit of infrastructure for that: it would have to deal with
>> user-interactive authentication, connection lifecycle management, and
>> configuration. This will be a point of fragility.
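>>
>> To make the client-side logic concrete, here is a rough sketch (all
>> types and names are hypothetical, not an existing API) of mapping
>> cluster names to established connections and picking one per
>> invocation:
>>
>>     import java.util.List;
>>     import java.util.Map;
>>     import java.util.concurrent.ConcurrentHashMap;
>>     import java.util.concurrent.atomic.AtomicInteger;
>>
>>     // Placeholder for an authenticated Remoting connection handle.
>>     interface Connection {}
>>
>>     final class ClusterConnectionRegistry {
>>         // Cluster name -> connections established (manually or automatically).
>>         private final Map<String, List<Connection>> byCluster =
>>                 new ConcurrentHashMap<String, List<Connection>>();
>>         private final AtomicInteger next = new AtomicInteger();
>>
>>         Connection select(String clusterName) {
>>             List<Connection> candidates = byCluster.get(clusterName);
>>             if (candidates == null || candidates.isEmpty()) {
>>                 throw new IllegalStateException("No connection for " + clusterName);
>>             }
>>             // Plain round-robin; the real load-balancing policy is an open
>>             // question (see the summary below).
>>             int i = Math.abs(next.getAndIncrement() % candidates.size());
>>             return candidates.get(i);
>>         }
>>     }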
>>
>> Summary
>> -------
>>
>> For both options, we have to determine an appropriate load-balancing
>> strategy. The choice of direction will affect how our clustering and
>> transaction interceptors function. We also have to suss out the logic
>> around dealing with conflicting or wrongly-ordered topology updates;
>> hopefully our existing policies will continue to apply.
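>>
>> For what it's worth, the pluggable strategy could be as small as the
>> following interface (purely illustrative; none of these types exist yet):
>>
>>     import java.util.List;
>>
>>     // ClusterNode is a placeholder for whatever node handle we end up with.
>>     interface ClusterNode {}
>>
>>     interface LoadBalancingStrategy {
>>         // Pick the target for the next invocation from the live membership;
>>         // implementations might be round-robin, random, or sticky.
>>         ClusterNode pick(List<ClusterNode> liveNodes);
>>     }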
> Do topology changes really need to be asynchronous notifications? Can
> we simply update cluster topology per invocation?
>
> Maintaining an accurate cluster topology via asynchronous notifications
> has the following benefits:
> 1. Topology changes between invocations won't require failover in the
> event of a load-balanced invocation (as opposed to a sticky one).
> 2. Load balancing will potentially be more effective following topology
> changes by leveraging new cluster members.
> 3. Minimizes invocation payload (since we don't need to tack on cluster
> topology to every invocation response). We can optimize this somewhat
> by sending a topology view ID with the invocation request, and only
> including the topology in the response if the topology changed (i.e.
> request view ID != current view ID); see the sketch below.
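>
> A rough sketch of that server-side check (the wire types here are
> hypothetical, shown only to pin the idea down):
>
>     final class ClusterView {
>         final long id;
>         final Object topology;
>         ClusterView(long id, Object topology) { this.id = id; this.topology = topology; }
>     }
>
>     final class InvocationResponse {
>         final Object result;
>         final Object topologyUpdate; // null means "your view is current"
>         InvocationResponse(Object result, Object topologyUpdate) {
>             this.result = result;
>             this.topologyUpdate = topologyUpdate;
>         }
>     }
>
>     final class InvocationResponder {
>         // Piggyback the topology only when the client's view is stale.
>         InvocationResponse respond(long clientViewId, Object result, ClusterView current) {
>             Object update = (clientViewId == current.id) ? null : current.topology;
>             return new InvocationResponse(result, update);
>         }
>     }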
>
> The only disadvantage of which I can think is implementation complexity.
>
> Topology update ordering is not an issue if we take the simpler
> approach. However, we can also make an assumption that topology changes
> are not common - so it becomes a matter of whether or not to optimize
> for frequent topology changes.

The problem is that (with R3 anyway) many threads may concurrently use a
single connection, and invocation replies can come in any order with
respect to the original invocations. So even if we attach topology
information to the reply, the updates are still essentially asynchronous,
with the added disadvantage that topology changes also bog down
invocation response times.

And if we used a transport without persistent connections, there's even
less of a guarantee, because each reply could arrive in separate packets
or connections which can be arbitrarily reordered at the network level.

In other words, topology update ordering is always an issue, even more
so when multiple nodes come into play.

Using a view ID is fine as long as all nodes in the cluster always agree
on what view ID is the latest (which AFAIK is essentially impossible to
guarantee).
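
To illustrate why ordering matters: a client could guard against stale
updates with a monotonic view ID check along these lines (hypothetical
names, and it presumes exactly the total ordering we just said we can't
guarantee):

    import java.util.concurrent.atomic.AtomicLong;

    final class TopologyTracker {
        private final AtomicLong lastViewId = new AtomicLong(Long.MIN_VALUE);

        // Apply an update only if its view ID is newer than anything seen
        // so far; stale or duplicate updates are silently dropped.
        boolean maybeApply(long viewId, Object topology) {
            for (;;) {
                long seen = lastViewId.get();
                if (viewId <= seen) {
                    return false;
                }
                if (lastViewId.compareAndSet(seen, viewId)) {
                    applyTopology(topology); // swap in the real update logic
                    return true;
                }
            }
        }

        private void applyTopology(Object topology) {
            // Illustrative stub.
        }
    }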

But again, all this ties back to the transport implementation. An R3
transport means persistent connections, but we likely cannot
automatically bring up new connections to new nodes; a custom transport
would mean no persistent connections, but new nodes can be accessed
instantly.
--
- DML