Author: jmesnil
Date: 2009-10-28 09:25:50 -0400 (Wed, 28 Oct 2009)
New Revision: 8155
Added:
trunk/docs/user-manual/en/diagrams/ha-replicated-store.odg
trunk/docs/user-manual/en/diagrams/ha-shared-store.odg
trunk/docs/user-manual/en/images/ha-replicated-store.png
trunk/docs/user-manual/en/images/ha-shared-store.png
Modified:
trunk/docs/user-manual/en/client-reconnection.xml
trunk/docs/user-manual/en/examples.xml
trunk/docs/user-manual/en/ha.xml
Log:
HA documentation (WIP)
Modified: trunk/docs/user-manual/en/client-reconnection.xml
===================================================================
--- trunk/docs/user-manual/en/client-reconnection.xml 2009-10-28 01:01:50 UTC (rev 8154)
+++ trunk/docs/user-manual/en/client-reconnection.xml 2009-10-28 13:25:50 UTC (rev 8155)
@@ -78,7 +78,7 @@
<para><literal>reconnect-attempts</literal>. This optional
parameter determines the
total number of reconnect attempts the bridge will make before giving up
and
shutting down. A value of <literal>-1</literal> signifies an
unlimited number of
- attempts. The default value is
<literal>-1</literal>.</para>
+ attempts. The default value is
<literal>0</literal>.</para>
</listitem>
</itemizedlist>
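A bridge configured for unlimited reconnection attempts might look like the sketch below in `hornetq-configuration.xml`; the bridge, queue, address, and connector names are hypothetical:

```xml
<bridges>
   <!-- forwards messages from a local queue to a remote server -->
   <bridge name="my-bridge">
      <queue-name>jms.queue.source-queue</queue-name>
      <forwarding-address>jms.queue.target-address</forwarding-address>
      <!-- -1 = retry forever; the default is now 0 (no retries) -->
      <reconnect-attempts>-1</reconnect-attempts>
      <connector-ref connector-name="remote-connector"/>
   </bridge>
</bridges>
```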
<para>If you're using JMS, and you're using the JMS Service on the
server to load your JMS
Added: trunk/docs/user-manual/en/diagrams/ha-replicated-store.odg
===================================================================
(Binary files differ)
Property changes on: trunk/docs/user-manual/en/diagrams/ha-replicated-store.odg
___________________________________________________________________
Name: svn:mime-type
+ application/octet-stream
Added: trunk/docs/user-manual/en/diagrams/ha-shared-store.odg
===================================================================
(Binary files differ)
Property changes on: trunk/docs/user-manual/en/diagrams/ha-shared-store.odg
___________________________________________________________________
Name: svn:mime-type
+ application/octet-stream
Modified: trunk/docs/user-manual/en/examples.xml
===================================================================
--- trunk/docs/user-manual/en/examples.xml 2009-10-28 01:01:50 UTC (rev 8154)
+++ trunk/docs/user-manual/en/examples.xml 2009-10-28 13:25:50 UTC (rev 8155)
@@ -33,10 +33,7 @@
<para>Here's a listing of the examples with a brief
description.</para>
<section id="application-level-failover">
<title>Application-Layer Failover</title>
- <para>HornetQ implements fully transparent automatic failover of
connections from a live
- to backup node, this requires no special coding for failover, and is
described in a
- different example. Automatic failover requires server
replication.</para>
- <para>However, HornetQ also supports Application-Layer failover, useful
in the case that
+ <para>HornetQ also supports Application-Layer failover, useful in the
case that
replication is not enabled on the server side.</para>
<para>With Application-Layer failover, it's up to the application
to register a JMS
<literal>ExceptionListener</literal> with HornetQ which
will be called by
@@ -50,18 +47,6 @@
uncommitted work in the old session will be lost, and any unacknowledged
messages
might be redelivered.</para>
</section>
- <section id="automatic-failover">
- <title>Automatic (Transparent) Failover</title>
- <para>The <literal>automatic-failover</literal> example
demonstrates two servers coupled
- as a live-backup pair for high availability (HA), and a client
connection
- transparently failing over from live to backup when the live server is
- crashed.</para>
- <para>HornetQ implements seamless, transparent failover of client
connections between
- live and backup servers. This is implemented by the replication of state
between
- live and backup nodes. When replication is configured and a live node
crashes, the
- client connections can carry on as if nothing happened and carry on
sending and
- consuming messages.</para>
- </section>
<section>
<title>Automatic Reconnect</title>
<para>The <literal>reconnect-same-node</literal> example
demonstrates how HornetQ
@@ -334,6 +319,19 @@
<para>In some cases buffering is not desirable, and HornetQ allows it to
be switched off.
This example demonstrates that.</para>
</section>
+ <section id="examples.non-transaction-failover">
+ <title>Non-Transaction Failover With Server Data
Replication</title>
+ <para>The <literal>non-transaction-failover</literal>
example demonstrates two servers coupled
+ as a live-backup pair for high availability (HA), and a client using a
<emphasis>non-transacted
+ </emphasis> JMS session failing over from live to backup when the
live server is
+ crashed.</para>
+ <para>HornetQ implements failover of client connections between
+ live and backup servers. This is implemented by the replication of state
between
+ live and backup nodes. When replication is configured and a live node
crashes, the
+ client connections can carry on and continue to send and consume messages.
When non-transacted
+ sessions are used, once-and-only-once message delivery is not guaranteed
and it is possible
+ that some messages will be lost or delivered twice.</para>
+ </section>
<section id="examples.paging">
<title>Paging</title>
<para>The <literal>paging</literal> example shows how
HornetQ can support huge queues
@@ -438,6 +436,18 @@
<para>The <literal>topic-selector-example1</literal>
example shows you how to
selectively consume messages using message selectors with topic
consumers.</para>
</section>
+ <section id="examples.transaction-failover">
+ <title>Transaction Failover With Data Replication</title>
+ <para>The <literal>transaction-failover</literal> example
demonstrates two servers coupled
+ as a live-backup pair for high availability (HA), and a client using a
transacted JMS
+ session failing over from live to backup when the live server is
+ crashed.</para>
+ <para>HornetQ implements failover of client connections between
+ live and backup servers. This is implemented by the replication of data
between
+ live and backup nodes. When replication is configured and a live node
crashes, the
+ client connections can carry on and continue to send and consume messages.
When transacted
+ sessions are used, once and only once message delivery is
guaranteed.</para>
+ </section>
<section>
<title>Transactional Session</title>
<para>The <literal>transactional</literal> example shows
you how to use a transactional
Modified: trunk/docs/user-manual/en/ha.xml
===================================================================
--- trunk/docs/user-manual/en/ha.xml 2009-10-28 01:01:50 UTC (rev 8154)
+++ trunk/docs/user-manual/en/ha.xml 2009-10-28 13:25:50 UTC (rev 8155)
@@ -24,110 +24,127 @@
client connections to migrate from one server to another in event of server
failure so
client applications can continue to operate</emphasis>.</para>
<para>HornetQ provides high availability by replicating servers in pairs. It
also provides both
- automatic client failover and application-level client failover.</para>
+ client failover and application-level client failover.</para>
<section>
- <title>Server replication</title>
+ <title>Live - Backup Pairs</title>
<para>HornetQ allows pairs of servers to be linked together as
<emphasis>live -
backup</emphasis> pairs. In this release there is a single backup
server for each
live server. Backup servers are not operational until failover occurs. In
later releases
we will most likely support replication onto multiple backup
servers.</para>
- <para>When a <emphasis>live - backup</emphasis> pair is
configured, HornetQ ensures that the
- live server state is replicated to the backup server. Replicated state
includes session
- state, and also global state such as the set of queues and addresses on the
server. </para>
- <para>When a client fails over from live to backup server, the backup
server will already
- have the correct global and session state, so the client will be able to
resume its
- session(s) on the backup server as if nothing happened.</para>
- <para>Replication is performed in an asynchronous fashion between live and
backup server.
- Data is replicated one way in a stream, and responses that the data has
reached the
- backup is returned in another stream. By pipelining replications and
responses to
- replications in separate streams allows replication throughput to be much
higher than if
- we synchronously replicated data and waited for a response serially in an RPC
manner
- before replicating the next piece of data.</para>
- <section id="configuring.live.backup">
- <title>Configuring live-backup pairs</title>
- <para>First, on the live server, in
<literal>hornetq-configuration.xml</literal>
- configure the live server with knowledge of its backup server. This is
done by
- specifying a <literal>backup-connector-ref</literal> element.
This element
- references a connector, also specified on the live server which contains
knowledge
- of how to connect to the backup server. Here's a snippet from
<literal
- >hornetq-configuration.xml</literal> showing a live server
configured with a
- backup server:</para>
- <programlisting><backup-connector-ref
connector-name="backup-connector"/>
-
-<!-- Connectors -->
+ <para>Before failover, only the live server serves the HornetQ clients
while the backup server remains passive.
+ When clients fail over to the backup server, the backup server becomes active
and starts to serve the HornetQ clients.</para>
+
+ <section id="ha.mode">
+ <title>HA modes</title>
+ <para>HornetQ provides two different modes for High Availability, either
by <emphasis>replicating data</emphasis> from the live server journal
+ to the backup server or using a <emphasis>shared
store</emphasis> for both servers.</para>
+ <section id="ha.mode.replicated">
+ <title>Data Replication</title>
+ <para>In this mode, data stored in the HornetQ journal is replicated from
the live server's journal to the
+ backup server's journal.</para>
+ <para>Replication is performed in an asynchronous fashion between live
and backup server.
+ Data is replicated one way in a stream, and responses that the data has
reached the
+ backup is returned in another stream. Pipelining replications and
responses to
+ replications in separate streams allows replication throughput to be
much higher than if
+ we synchronously replicated data and waited for a response serially in
an RPC manner
+ before replicating the next piece of data.</para>
+ <graphic fileref="images/ha-replicated-store.png"
align="center"/>
+ <section id="configuring.live.backup">
+ <title>Configuration</title>
+ <para>First, on the live server, in
<literal>hornetq-configuration.xml</literal>,
+ configure the live server with knowledge of its backup server. This is
done by
+ specifying a <literal>backup-connector-ref</literal>
element. This element
+ references a connector, also specified on the live server which
contains knowledge
+ of how to connect to the backup server.</para>
+ <para>Here's a snippet from live server's <literal
+ >hornetq-configuration.xml</literal> configured to connect
to its backup server:</para>
+ <programlisting>
+ <backup-connector-ref connector-name="backup-connector"/>
-<connectors>
+ <connectors>
+ <!-- This connector specifies how to connect to the backup server -->
+ <!-- backup server is located on host "192.168.0.11" and port
"5445" -->
+ <connector name="backup-connector">
+
<factory-class>org.hornetq.integration.transports.netty.NettyConnectorFactory</factory-class>
+ <param key="hornetq.remoting.netty.host"
value="192.168.0.11" type="String"/>
+ <param key="hornetq.remoting.netty.port" value="5445"
type="Integer"/>
+ </connector>
+ </connectors></programlisting>
+ <para>Secondly, on the backup server, we flag the server as a backup
and make sure it has an acceptor that the live server can connect to:</para>
+ <programlisting>
+ <backup>true</backup>
- ...
-
- <!-- This connector specifies how to connect to the backup server -->
- <connector name="backup-connector">
- <factory-class>
- org.hornetq.integration.transports.netty.NettyConnectorFactory
- </factory-class>
- <param key="hornetq.remoting.netty.port" value="5445"
type="Integer"/>
- </connector>
-
-</connectors></programlisting>
- <para>Secondly, on the backup server, also in <literal
- >hornetq-configuration.xml</literal> , the element
<literal>backup</literal>
- must be set to true. I.e. :</para>
- <programlisting><backup>true</backup>
-</programlisting>
+ <acceptors>
+ <acceptor name="acceptor">
+
<factory-class>org.hornetq.integration.transports.netty.NettyAcceptorFactory</factory-class>
+ <param key="hornetq.remoting.netty.host"
value="192.168.0.11" type="String"/>
+ <param key="hornetq.remoting.netty.port" value="5445"
type="Integer"/>
+ </acceptor>
+ </acceptors>
+ </programlisting>
+ <para>For a backup server to function correctly it's also
important that it has the same
+ set of bridges, predefined queues, cluster connections, broadcast
groups and
+ discovery groups as defined on the live node. The easiest way to ensure
this is just
+ to copy the entire server side configuration from live to backup and
just make the
+ changes as specified above. </para>
+ </section>
+ <section>
+ <title>Synchronization of live-backup pairs</title>
+ <para>In order for live - backup pairs to operate properly, they
must be identical
+ replicas. This means you cannot just use any backup server
that's previously been
+ used for other purposes as a backup server, since it will have
different data in its
+ persistent storage. If you try to do so you will receive an
exception in the logs
+ and the server will fail to start.</para>
+ <para>To create a backup server for a live server that's
already been used for other
+ purposes, it's necessary to copy the
<literal>data</literal> directory from the live
+ server to the backup server. This means the backup server will have
an identical
+ persistent store to the live server.</para>
+ <para>After failover, when the live server is restarted, the
backup server will copy its
+ journal back to the live server. When the live server has the updated
journal, it will
+ become active again and the backup server will become
passive.</para>
+ </section>
+ </section>
+ <section id="ha.mode.shared">
+ <title>Shared Store</title>
+ <para>When using a shared store, both live and backup servers share
the <emphasis>same</emphasis> journal
+ using a shared file system. When failover occurs and the backup server takes
over, it will load the journal and
+ clients can connect to it.</para>
+ <graphic fileref="images/ha-shared-store.png"
align="center"/>
+ <section id="ha.mode.shared.configuration">
+ <title>Configuration</title>
+ <para>To configure the live and backup servers to share their store,
set the following in both servers' <literal>hornetq-configuration.xml</literal>:</para>
+ <programlisting>
+ <shared-store>true</shared-store>
+ </programlisting>
+ <para>In order for live - backup pairs to operate properly with a
shared store, both servers
+ must configure the location of their journal directory to point
+ to the <emphasis>same shared location</emphasis> (as
explained in <xref linkend="configuring.message.journal" />).</para>
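For example, both servers might point their persistence directories at the same network-mounted path. The snippet below is a sketch: the mount point is hypothetical, and the exact set of directory elements should be checked against the persistence chapter:

```xml
<!-- identical on both live and backup servers -->
<bindings-directory>/mnt/shared/hornetq/bindings</bindings-directory>
<journal-directory>/mnt/shared/hornetq/journal</journal-directory>
<large-messages-directory>/mnt/shared/hornetq/large-messages</large-messages-directory>
<paging-directory>/mnt/shared/hornetq/paging</paging-directory>
```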
+ </section>
+ <section>
+ <title>Synchronization of live-backup pairs</title>
+ <para>As both live and backup servers share the same journal, they
do not need to be synchronized.
+ However, until both live and backup servers are up and running,
high availability cannot be provided by a single server.
+ After failover, at the first opportunity, stop the backup server (which
is active) and restart the live and backup servers.</para>
+ </section>
+ </section>
</section>
- <section>
- <title>Synchronization of live-backup pairs</title>
- <para>In order for live - backup pairs to operate properly, they must
be identical
- replicas. This means you cannot just use any backup server that's
previously been
- used for other purposes as a backup server, since it will have different
data in its
- persistent storage. If you try to do so you will receive an exception in
the logs
- and the server will fail to start.</para>
- <para>To create a backup server for a live server that's already
been used for other
- purposes, it's necessary to copy the
<literal>data</literal> directory from the live
- server to the backup server. This means the backup server will have an
identical
- persistent store to the backup server.</para>
- <para>Similarly when a client fails over from a live server
<literal>L</literal> to a
- backup server <literal>B</literal>, the server
<literal>L</literal> becomes invalid
- since, from that point on, the data on <literal>L</literal>
and <literal>B</literal>
- may diverge. After such a failure, at the next available opportunity the
<literal
- >B</literal> server should be taken down, and its
<literal>data</literal>
- directory copied back to the <literal>L</literal> server.
Live and backup servers
- can then be restarted. In this release of HornetQ we do not provide any
automatic
- facility for re-assigning a backup node with a live node while it is
running.</para>
- <para>For a backup server to function correctly it's also important
that it has the same
- set of bridges, predefined queues, cluster connections, broadcast groups
and
- discovery groups as defined on the live node. The easiest way to ensure
this is just
- to copy the entire server side configuration from live to backup and just
make the
- changes as specified in the previous section. </para>
- </section>
- <section id="queue.activation.timeout">
- <title>Queue activation timeout</title>
- <para>If a live server fails, as client connections failover from the
live node to the
- backup, they do so at a rate determined by the client, and it might be
the case that
- some client connections never fail over.</para>
- <para>Different client connections may have different consumers on the
same queue(s).
- The queue on the backup will wait for all its consumers to reattach
before
- activating delivery on itself. If all connections have not reattached
with this
- timeout then the queue will activate regardless.</para>
- <para>This param is defined in
<literal>hornetq-configuration.xml</literal> using the
- setting <literal>queue-activation-timeout</literal>. Its
default value is <literal
- >30000</literal> milliseconds.</para>
- </section>
</section>
+
+ <section id="failover">
+ <title>Failover Modes</title>
+ <para>HornetQ defines three types of failover:</para>
+ <itemizedlist>
+ <listitem><para>100% transparent re-attach to a single server as
explained in <xref linkend="client-reconnection"
/></para></listitem>
+ <listitem><para>automatic failover</para></listitem>
+ <listitem><para>application-level
failover</para></listitem>
+ </itemizedlist>
+
<section id="ha.client.automatic">
- <title>Automatic client failover</title>
+ <title>Automatic Client Failover</title>
<para>HornetQ clients can be configured with knowledge of live and backup
servers, so that
in event of connection failure of the client - live server connection, the
client will
- detect this and reconnect its sessions to the backup server. Because of
server
- replication, then backup server will already have those sessions in the same
state they
- were left on the live server and the client will be able to reconnect them
and resume
- them 100% transparently as if nothing happened.</para>
- <para>For automatic failover HornetQ requires
<emphasis>zero</emphasis> coding of special
- failover code on the client or server. This differs from other messaging
systems which
- intrusively require you to code special failover handling code. HornetQ
automatic
- failover preserves all your normal JMS or core API semantics and allows your
client code
- to continue 100% uninterrupted on event of connection failure and failover
from a live
- to a backup server.</para>
+ detect this and reconnect to the backup server. The backup server will have
recreated the sessions
+ and consumers but it will not preserve the session state from the live
server.</para>
<para>HornetQ clients detect connection failure when it has not received
packets from the
server within the time given by
<literal>client-failure-check-period</literal> as
explained in section <xref linkend="connection-ttl"/>. If the
client does not receive
@@ -137,6 +154,8 @@
way of doing this is to use <emphasis>server discovery</emphasis>
for the client to
automatically discover the list. For full details on how to configure clients
please see
<xref
linkend="clusters.server-discovery"/>.</para>
+ <para>To enable automatic client failover, the client must be configured to
allow non-zero reconnection attempts
+ (as explained in <xref linkend="client-reconnection"
/>).</para>
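For instance, a JMS connection factory could be given unlimited reconnection attempts. The snippet below is a sketch based on a typical <literal>hornetq-jms.xml</literal> connection factory definition; the factory, connector, and entry names are hypothetical:

```xml
<connection-factory name="ConnectionFactory">
   <connector-ref connector-name="netty-connector"/>
   <entries>
      <entry name="/ConnectionFactory"/>
   </entries>
   <!-- -1 = unlimited attempts; any non-zero value enables failover -->
   <reconnect-attempts>-1</reconnect-attempts>
</connection-factory>
```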
<para>Sometimes you want a client to failover onto a backup server even if
the live server
is just cleanly shutdown rather than having crashed or the connection failed.
To
configure this you can set the property
<literal>FailoverOnServerShutdown</literal> to
@@ -147,19 +166,18 @@
this property is <literal>false</literal>, this means that by
default <emphasis>HornetQ
clients will not failover to a backup server if the live server is simply
shutdown
cleanly.</emphasis></para>
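If failover on clean shutdown is wanted, the property can be set on the client's connection factory. For JMS factories defined in <literal>hornetq-jms.xml</literal> this would look roughly like the sketch below, assuming the configuration element mirrors the property name:

```xml
<connection-factory name="ConnectionFactory">
   <!-- connector-ref, entries, etc. as usual -->
   <!-- fail over to the backup even on clean shutdown of the live server -->
   <failover-on-server-shutdown>true</failover-on-server-shutdown>
</connection-factory>
```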
- <para>For a fully functional example of automatic failover, please see
<xref
- linkend="automatic-failover"/>.</para>
- </section>
+ <para>For examples of automatic failover with transacted and non-transacted
JMS sessions, please see <xref
+ linkend="examples.transaction-failover"/> and <xref
linkend="examples.non-transaction-failover" />.</para>
</section>
<section>
- <title>Application-level client failover</title>
+ <title>Application-Level Failover</title>
<para>In some cases you may not want automatic client failover, and prefer
to handle any
connection failure yourself, and code your own manually reconnection logic in
your own
failure handler. We define this as
<emphasis>application-level</emphasis> failover,
since the failover is handled at the user application level.</para>
- <para>If all your clients use application-level failover then you do not
need server
+ <para>If all your clients use application-level failover then you do not
need data
replication on the server side, and should disable it. Server replication
has some
performance overhead and should be disabled if it is not required. To disable
server
- replication simply do not specify a
<literal>backup-connector</literal> element for each
+ replication simply do not specify a
<literal>backup-connector</literal> element on each
live server.</para>
<para>To implement application-level failover, if you're using JMS then
you need to code an
<literal>ExceptionListener</literal> class on the JMS
connection. The <literal
@@ -175,4 +193,5 @@
<literal>FailureListener</literal> on your core
<literal>ClientSession</literal>
instances.</para>
</section>
+ </section>
</chapter>
Added: trunk/docs/user-manual/en/images/ha-replicated-store.png
===================================================================
(Binary files differ)
Property changes on: trunk/docs/user-manual/en/images/ha-replicated-store.png
___________________________________________________________________
Name: svn:mime-type
+ application/octet-stream
Added: trunk/docs/user-manual/en/images/ha-shared-store.png
===================================================================
(Binary files differ)
Property changes on: trunk/docs/user-manual/en/images/ha-shared-store.png
___________________________________________________________________
Name: svn:mime-type
+ application/octet-stream