[Design of Messaging on JBoss (Messaging/JBoss)] - Re: split-brain between live and backup node
by jmesnil
Using FailBackupServerTest, I've written a test to reach a split-brain test but it is not working as expected.
It's very similar to FailBackupServerTest except that I failover the session to the backup node before failing all the replication channels of the live node.
The failover is successful but when I create a new session on live node, the code blocks when creating the session:
| ClientSessionFactoryInternal sf2 = new ClientSessionFactoryImpl(
| new TransportConfiguration(InVMConnectorFactory.class.getName()),
| new TransportConfiguration(InVMConnectorFactory.class.getName(),
| backupParams));
| sf2.setSendWindowSize(32 * 1024);
|
| ClientSession session2 = sf2.createSession(false, true, true);
|
Tim, from what you said today on the logs, you were expecting it to succeed, right?
I let the code as is but when debugging, I saw that the replicating connections on live node are destroyed when the client failover to the backup node.
When the backup is activated, it closes its connection to the live node which in turn destroy the replication connections at the other side of the wire on the live node.
| package org.jboss.messaging.tests.integration.cluster.failover;
|
| import java.util.HashMap;
| import java.util.Map;
| import java.util.Set;
| import java.util.concurrent.CountDownLatch;
| import java.util.concurrent.TimeUnit;
|
| import junit.framework.TestCase;
|
| import org.jboss.messaging.core.client.ClientConsumer;
| import org.jboss.messaging.core.client.ClientMessage;
| import org.jboss.messaging.core.client.ClientProducer;
| import org.jboss.messaging.core.client.ClientSession;
| import org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl;
| import org.jboss.messaging.core.client.impl.ClientSessionFactoryInternal;
| import org.jboss.messaging.core.client.impl.ClientSessionImpl;
| import org.jboss.messaging.core.config.Configuration;
| import org.jboss.messaging.core.config.TransportConfiguration;
| import org.jboss.messaging.core.config.impl.ConfigurationImpl;
| import org.jboss.messaging.core.exception.MessagingException;
| import org.jboss.messaging.core.logging.Logger;
| import org.jboss.messaging.core.remoting.FailureListener;
| import org.jboss.messaging.core.remoting.RemotingConnection;
| import org.jboss.messaging.core.remoting.impl.invm.InVMAcceptorFactory;
| import org.jboss.messaging.core.remoting.impl.invm.InVMConnectorFactory;
| import org.jboss.messaging.core.remoting.impl.invm.InVMRegistry;
| import org.jboss.messaging.core.remoting.impl.invm.TransportConstants;
| import org.jboss.messaging.core.server.Messaging;
| import org.jboss.messaging.core.server.MessagingService;
| import org.jboss.messaging.integration.transports.netty.NettyAcceptorFactory;
| import org.jboss.messaging.integration.transports.netty.NettyConnectorFactory;
| import org.jboss.messaging.jms.client.JBossTextMessage;
| import org.jboss.messaging.util.SimpleString;
|
| /**
| *
| * A FailBackupServerTest
| *
| * Make sure live sever continues ok if backup server fails
| *
| * @author <a href="mailto:tim.fox@jboss.com">Tim Fox</a>
| *
| * Created 6 Nov 2008 11:27:17
| *
| *
| */
| public class FailLiveServerTest extends TestCase
| {
| private static final Logger log = Logger.getLogger(FailLiveServerTest.class);
|
| // Constants -----------------------------------------------------
|
| // Attributes ----------------------------------------------------
|
| private static final SimpleString ADDRESS = new SimpleString("FailLiveServerTest");
|
| private MessagingService liveService;
|
| private MessagingService backupService;
|
| private final Map<String, Object> backupParams = new HashMap<String, Object>();
|
| // Static --------------------------------------------------------
|
| // Constructors --------------------------------------------------
|
| // Public --------------------------------------------------------
|
| public void testFailBackup() throws Exception
| {
| ClientSessionFactoryInternal sf1 = new ClientSessionFactoryImpl(new TransportConfiguration(InVMConnectorFactory.class.getName()),
| new TransportConfiguration(InVMConnectorFactory.class.getName(),
| backupParams));
|
| sf1.setSendWindowSize(32 * 1024);
|
| ClientSession session1 = sf1.createSession(false, true, true);
|
| session1.createQueue(ADDRESS, ADDRESS, null, false, false);
|
| ClientProducer producer = session1.createProducer(ADDRESS);
|
| final int numMessages = 1000;
|
| for (int i = 0; i < numMessages; i++)
| {
| ClientMessage message = session1.createClientMessage(JBossTextMessage.TYPE,
| false,
| 0,
| System.currentTimeMillis(),
| (byte)1);
| message.putIntProperty(new SimpleString("count"), i);
| message.getBody().putString("aardvarks");
| message.getBody().flip();
| producer.send(message);
| }
|
| ClientConsumer consumer1 = session1.createConsumer(ADDRESS);
|
| session1.start();
|
| for (int i = 0; i < numMessages; i++)
| {
| ClientMessage message = consumer1.receive(1000);
|
| assertNotNull(message);
|
| assertEquals("aardvarks", message.getBody().getString());
|
| assertEquals(i, message.getProperty(new SimpleString("count")));
|
| if (i == 0)
| {
| ///// DIFF WITH FailBackupServerTest /////
| log.info("Failing client connection");
| assertEquals("invm:0", ((ClientSessionImpl)session1).getConnection().getRemoteAddress());
| ((ClientSessionImpl)session1).getConnection().fail(new MessagingException(MessagingException.NOT_CONNECTED, "simulated failure b/w client and live node"));
| assertEquals("must have failed over on backup node", "invm:1", ((ClientSessionImpl)session1).getConnection().getRemoteAddress());
| ///// END OF DIFF WITH FailBackupServerTest /////
|
| // Fail all the replicating connections - this simulates the backup server crashing
|
| Set<RemotingConnection> conns = liveService.getServer().getRemotingService().getConnections();
|
| for (RemotingConnection conn : conns)
| {
| log.info("Failing replicating connection");
| assertEquals("invm:1", conn.getReplicatingConnection().getRemoteAddress());
| System.out.println(conn.getReplicatingConnection().getID());
| conn.getReplicatingConnection().fail(new MessagingException(MessagingException.NOT_CONNECTED, "simulated failure b/w live and backup node"));
| }
| }
|
| message.acknowledge();
| }
|
| assertEquals("invm:1", ((ClientSessionImpl)session1).getConnection().getRemoteAddress());
|
| ClientMessage message = consumer1.receive(1000);
|
| assertNull(message);
|
| // Send some more
|
| for (int i = 0; i < numMessages; i++)
| {
| message = session1.createClientMessage(JBossTextMessage.TYPE, false, 0, System.currentTimeMillis(), (byte)1);
| message.putIntProperty(new SimpleString("count"), i);
| message.getBody().putString("aardvarks");
| message.getBody().flip();
| producer.send(message);
| }
|
| for (int i = 0; i < numMessages; i++)
| {
| message = consumer1.receive(1000);
|
| assertNotNull(message);
|
| assertEquals("aardvarks", message.getBody().getString());
|
| assertEquals(i, message.getProperty(new SimpleString("count")));
|
| message.acknowledge();
| }
|
| message = consumer1.receive(1000);
|
| assertNull(message);
|
| // create another client session factory
|
| ClientSessionFactoryInternal sf2 = new ClientSessionFactoryImpl(new TransportConfiguration(InVMConnectorFactory.class.getName()),
| new TransportConfiguration(InVMConnectorFactory.class.getName(),
| backupParams));
|
| sf2.setSendWindowSize(32 * 1024);
|
| ClientSession session2 = sf2.createSession(false, true, true);
|
| assertEquals("invm:0", ((ClientSessionImpl)session2).getConnection().getRemoteAddress());
|
| session1.close();
| session2.close();
|
| }
|
| // Package protected ---------------------------------------------
|
| // Protected -----------------------------------------------------
|
| @Override
| protected void setUp() throws Exception
| {
| Configuration backupConf = new ConfigurationImpl();
| backupConf.setSecurityEnabled(false);
| backupParams.put(TransportConstants.SERVER_ID_PROP_NAME, 1);
| backupConf.getAcceptorConfigurations()
| .add(new TransportConfiguration(InVMAcceptorFactory.class.getName(),
| backupParams));
| backupConf.setBackup(true);
| backupService = Messaging.newNullStorageMessagingService(backupConf);
| backupService.start();
|
| Configuration liveConf = new ConfigurationImpl();
| liveConf.setSecurityEnabled(false);
| liveConf.getAcceptorConfigurations()
| .add(new TransportConfiguration(InVMAcceptorFactory.class.getName()));
| Map<String, TransportConfiguration> connectors = new HashMap<String, TransportConfiguration>();
| TransportConfiguration backupTC = new TransportConfiguration(InVMConnectorFactory.class.getName(),
| backupParams, "backup-connector");
| connectors.put(backupTC.getName(), backupTC);
| liveConf.setConnectorConfigurations(connectors);
| liveConf.setBackupConnectorName(backupTC.getName());
| liveService = Messaging.newNullStorageMessagingService(liveConf);
| liveService.start();
| }
|
| @Override
| protected void tearDown() throws Exception
| {
| backupService.stop();
|
| liveService.stop();
|
| assertEquals(0, InVMRegistry.instance.size());
| }
|
| // Private -------------------------------------------------------
|
| // Inner classes -------------------------------------------------
| }
|
|
View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4208691#4208691
Reply to the post : http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=4208691
15 years, 5 months
[Design of Messaging on JBoss (Messaging/JBoss)] - Journal Clean up / Compactor
by clebert.suconic@jboss.com
Let me first explain the issues we have on the journal:
Issue I - The biggest issue we have on the journal right now is the second criteria on Reclaiming.
The first criteria is a simple reference counting of adds and deletes.
But, as we mix Adds and Deletes on the same files, we can't delete a file if it has deletes towards another file. And if delay ACKing a message at any point, that will lead to a lot of files hanging on the disk.
To fix this first issue, I see two possible solutions:
Solution I) Have a Cleaner Thread removing the links between the files. Something that would phisically eliminate deleted records that will have the second level dependency, and that would allow several files to be reclaimed.
Solution II) Put Deletes and Adds on separate journal files. We would have JournalLogs only used to add, and JournalLogs only used to delete.
Soluion III) Read all valid records, place them on another journal-file and replace the files.
Solution I and II are very easy to implement.
Solution III will have a little extra complexity due to issues on locking. (deletes coming as we are still doing the compacting)
______________________________________________________________________________________________________
Issue II) Supposing you don't have an issue with linked deletes any more (Criteria 2 on reclaiming) , there is a also a need for compacting files.
Say you have an Address where you Produce and Consume Messages very fast, say at a rate of 1000/second, and another Address where you Produce a message without any consumers, say at a rate of 1 message per Second.
This won't be as bad as if you had the linked-deletes, but it may still be an issue depending on the usage.
Solution IV) To solve this we need to compact several files into a single file, as a slight variation of Solution III.
There is some complexity on keeping original file-ids on the file, as the commit holds a count of records per file (transaction-summary) to detect incomplete commits, besides the issue on locks also.
______________________________________________________________________________________________________
So far, independently of any of those solutions, I need to be able to read JournalFiles, outside of the context of Loading the files. For that I have refactored the method load, as a private method readJournalFile(JournalFile....), and I'm passing an interface as a parameter.
This way readJournal will know how to deal with the journal data-format, and the implementation passed by parameter will deal with the specifics of loading or compacting.
I don't want to duplicate the data-format of the journal file among two distinct methods.
(I'm actually considering moving that method to JournalFile, but I'm not sure yet)
View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4208683#4208683
Reply to the post : http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=4208683
15 years, 5 months