Bela Ban commented on JGRP-1672:
--------------------------------
Any design that uses shared memory to exchange messages must be faster than a multicast (UDP) or N-1 unicasts (TCP). If this is not the case, we can scrap this issue.
Here's a possible design:
* Each sender (producer) sends a message that's to be received by all receivers (consumers) in the same cluster on the same box
* MPMC (multiple producers multiple consumers)
* Use a memory mapped file (or a direct ByteBuffer). I don't want to use Unsafe at this point; I'd rather wait and see what its replacement will be.
* A ring buffer is implemented in shared memory.
* There's a write pointer (WP) and a read pointer (RP)
* Producers advance the WP
* Consumers (essentially processes in the same cluster on the same box) register by adding their own TRP (thread read pointer) to shared memory (location TBD)
* A consumer reads messages by advancing its TRP until it reaches the WP
* The highest value the RP can take is the min of all TRPs
* The RP is advanced as follows:
** The first writer to reach the RP (i.e. when the ring buffer is full) advances the RP by setting it to the min of all TRPs
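A minimal single-process sketch of the ring buffer design above, to make the pointer arithmetic concrete. Everything named here is hypothetical: the `ShmRingBuffer` class, the header layout (WP, RP, then one TRP per consumer), fixed 64-byte slots, and ever-increasing positions mapped to a slot with `% CAPACITY`. It deliberately ignores exclusive access and visibility (plain `getLong()`/`putLong()` on a `MappedByteBuffer`), which are the open questions of the next section.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical single-process sketch: a header holding WP, RP and one TRP per
 * registered consumer, followed by fixed-size slots, all in a memory-mapped file.
 * Positions only ever increase; slot index = position % CAPACITY.
 * No exclusive access or visibility guarantees: plain getLong()/putLong() only.
 */
public class ShmRingBuffer {
    static final int MAX_CONSUMERS = 8, CAPACITY = 16, SLOT_SIZE = 64;
    static final int WP = 0, RP = 8, TRPS = 16;        // header offsets (bytes)
    static final int SLOTS = TRPS + 8 * MAX_CONSUMERS; // offset of the first slot

    private final MappedByteBuffer buf;
    private int consumers; // assumption: in real use this count would live in shared memory too

    ShmRingBuffer(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed
            buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, SLOTS + (long) CAPACITY * SLOT_SIZE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buf.putLong(WP, 0L); // only the process creating the buffer would reset the header
        buf.putLong(RP, 0L);
    }

    /** A consumer registers its own TRP, starting at the current WP. */
    int register() {
        if (consumers == MAX_CONSUMERS) throw new IllegalStateException("too many consumers");
        int id = consumers++;
        buf.putLong(TRPS + 8 * id, buf.getLong(WP));
        return id;
    }

    /** The highest value the RP can take: the min of all TRPs. */
    private long minTrp() {
        long min = buf.getLong(WP);
        for (int i = 0; i < consumers; i++)
            min = Math.min(min, buf.getLong(TRPS + 8 * i));
        return min;
    }

    /** Producer: returns false if the buffer is still full after advancing the RP. */
    boolean write(byte[] msg) {
        if (msg.length > SLOT_SIZE - 4) throw new IllegalArgumentException("message too large");
        long wp = buf.getLong(WP);
        if (wp - buf.getLong(RP) == CAPACITY) { // this writer reached the RP
            buf.putLong(RP, minTrp());          // advance RP to the min of all TRPs
            if (wp - buf.getLong(RP) == CAPACITY) return false; // slowest consumer still needs the slot
        }
        int off = SLOTS + (int) (wp % CAPACITY) * SLOT_SIZE;
        buf.putInt(off, msg.length);
        for (int i = 0; i < msg.length; i++) buf.put(off + 4 + i, msg[i]);
        buf.putLong(WP, wp + 1); // advance the WP only after the slot is written
        return true;
    }

    /** Consumer: next message, or null once its TRP has reached the WP. */
    byte[] read(int id) {
        long trp = buf.getLong(TRPS + 8 * id);
        if (trp == buf.getLong(WP)) return null;
        int off = SLOTS + (int) (trp % CAPACITY) * SLOT_SIZE;
        byte[] msg = new byte[buf.getInt(off)];
        for (int i = 0; i < msg.length; i++) msg[i] = buf.get(off + 4 + i);
        buf.putLong(TRPS + 8 * id, trp + 1);
        return msg;
    }
}
```

With two registered consumers, a writer that reaches the RP can only reuse slot 0 once both TRPs have moved past it, since the RP is only ever advanced to the slowest consumer's TRP.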
h4. Exclusive access and visibility
Producers need to make sure that only one of them at a time advances the WP/RP, and reads/writes of those values need proper visibility, so that other producers/consumers see the changes.
Investigate whether volatile reads/writes can be used.
Investigate whether \[1\] can be used as an MPSC queue.
\[1\]
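One option for the volatile question that doesn't rely on Unsafe: VarHandles (Java 9+) can view a direct or memory-mapped ByteBuffer as longs and support volatile reads and compare-and-set on it. In this hypothetical sketch (`ShmPointers`, WP at byte offset 0), each producer claims a position with a CAS, so only one producer advances the WP from any given value; checking the claim against the RP and publishing the written slot to consumers are left out. That these atomics also hold between processes mapping the same file is an assumption to verify: it relies on the JVM using hardware atomic instructions on the mapped memory, while the Java Memory Model only covers threads within one JVM.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch: volatile access and CAS on a long inside a direct or
 * memory-mapped ByteBuffer through a VarHandle view (Java 9+), without Unsafe.
 */
public class ShmPointers {
    // Views the buffer as longs; atomic access modes need 8-byte-aligned offsets
    private static final VarHandle LONGS =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());
    static final int WP = 0; // hypothetical byte offset of the write pointer

    private final ByteBuffer buf;

    ShmPointers(ByteBuffer buf) {
        // This sketch only accepts direct (or mapped) buffers
        if (!buf.isDirect()) throw new IllegalArgumentException("needs a direct or mapped buffer");
        this.buf = buf;
    }

    /** Volatile read: sees the WP most recently published by any producer. */
    long readWp() {
        return (long) LONGS.getVolatile(buf, WP);
    }

    /** Claims the next position; only the producer whose CAS succeeds advances the WP. */
    long claim() {
        while (true) {
            long wp = readWp();
            if (LONGS.compareAndSet(buf, WP, wp, wp + 1)) return wp;
            Thread.onSpinWait(); // another producer won the race: re-read and retry
        }
    }
}
```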
Shared memory to send message between different processes on the same box
-------------------------------------------------------------------------
Key: JGRP-1672
URL: https://issues.jboss.org/browse/JGRP-1672
Project: JGroups
Issue Type: Feature Request
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 4.1
Attachments: ShmTest.java
Investigate whether it makes sense to use shared memory to pass messages between processes on the same box. Say we have A, B and C on box-1 and X, Y, Z on box-2: when A multicasts a message, it could loop it back to itself, place it into shared memory for B and C to read, and multicast it to X, Y, Z. The multicast socket could be non-loopback, so box-1 would not receive it.
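Both routing ideas in this issue, delivering to B and C through shared memory here and, in the TCP case further down, sending M only to X, require knowing which box a member runs on. A minimal sketch, assuming a hypothetical `Member` record whose address carries a host identifier (the 'affinity' mentioned at the end of the issue); the record requires Java 16+.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: routing decisions that need to know which box each member is on. */
public class ShmRouting {
    /** Assumption: an address that carries its host ("affinity"). */
    record Member(String name, String host) {}

    /** Members on the sender's box (B and C for sender A): deliver via shared memory. */
    static List<Member> sameBox(Member sender, List<Member> view) {
        List<Member> out = new ArrayList<>();
        for (Member m : view)
            if (!m.equals(sender) && m.host().equals(sender.host())) out.add(m);
        return out;
    }

    /**
     * TCP case: pick one member per remote box (X for box-2) to send M to; that member
     * would then place M into its own box's shared memory for the others (Y, Z).
     */
    static Map<String, Member> forwarders(Member sender, List<Member> view) {
        Map<String, Member> out = new LinkedHashMap<>();
        for (Member m : view)
            if (!m.host().equals(sender.host())) out.putIfAbsent(m.host(), m);
        return out;
    }
}
```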
Problems:
* Shared memory in Java can only be done via memory mapped (sequential or random access) files. To pass a lot of messages, something like a ring buffer would have to be created in shared memory.
* Unless we use FileLock, or polling/busy reading, there is no way to know when a producer has written a message into shared memory. We'd therefore have to use a signalling mechanism, probably a small JGroups message, to notify the consumer(s) of new messages.
** Alternatively, we could do busy waiting: the producer writes into a memory location when a message is ready to be consumed. Perhaps this memory location could hold the number of messages ready to be read. The consumer would busy-wait on it and decrement it for each message read. This variable could be protected by a file lock, so that after some amount of busy-waiting, the consumer could fall back to a real (blocking) wait on the file lock instead of burning CPU on busy-waits.
* For multicast messages, we'd have 1 producer but many consumers. A RingBuffer would not work here, as we don't know when all consumers have read a given message, i.e. when to advance the read pointer.
** As an alternative, we could have one shared memory buffer per member on the same host. This would also cater to unicast messages, but it would use up a lot of memory.
* How would this work for TCP? We'd have to send the message only to members outside the local box. How do we identify those members?
* Message reception: a multicast message that is received and targeted at all members on the same box could also be placed into shared memory, so everyone on the same box receives it.
** How would this work for TCP? E.g. when A sends a multicast message M, it would use shared memory to deliver M to B and C on box-1, but also sending M to X, Y and Z is unneeded work: A could send it only to X, which could place it into shared memory for Y and Z to consume.
*** We'd have to include knowledge of 'affinity' (which box a member runs on) in an address
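The busy-waiting alternative for signalling new messages could look roughly like the sketch below. Assumptions, none of them from the issue: a hypothetical `ShmSignal` class, a `long` counter at byte offset 0 of a direct or mapped buffer accessed through a VarHandle (Java 9+), and a single consumer per counter (with several consumers, two could both see a count of 1 and both decrement it). Where the issue suggests falling back to a wait on a file lock, this sketch substitutes timed parking with `LockSupport.parkNanos()`: a FileLock is held on behalf of the whole JVM, so it can't coordinate threads inside one process, and a multi-process version would replace that branch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.locks.LockSupport;

/**
 * Hypothetical sketch of counter-based signalling: a "messages ready" counter in shared
 * memory that the producer increments and a single consumer busy-waits on, then
 * decrements. After maxSpins the consumer backs off with parkNanos() instead of
 * burning CPU (a stand-in for the blocking file-lock wait suggested in the issue).
 */
public class ShmSignal {
    private static final VarHandle LONGS =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());
    static final int READY = 0; // hypothetical byte offset of the counter (8-byte aligned)

    private final ByteBuffer buf;

    ShmSignal(ByteBuffer buf) {
        if (!buf.isDirect()) throw new IllegalArgumentException("needs a direct or mapped buffer");
        this.buf = buf;
    }

    /** Producer: call after a message has been fully written; returns the new count. */
    long publish() {
        return (long) LONGS.getAndAdd(buf, READY, 1L) + 1;
    }

    /** Consumer: waits until a message is ready, consumes it and returns the remaining count. */
    long awaitAndConsume(int maxSpins) {
        int spins = 0;
        while ((long) LONGS.getVolatile(buf, READY) == 0) {
            if (spins < maxSpins) {
                spins++;
                Thread.onSpinWait();             // busy-wait
            } else {
                LockSupport.parkNanos(100_000L); // back off: stop burning CPU
            }
        }
        // Safe only with a single consumer: nobody else decrements between the check and here
        return (long) LONGS.getAndAdd(buf, READY, -1L) - 1;
    }
}
```

A smaller `maxSpins` burns less CPU but makes the consumer park sooner, so a message that arrives while it is parked may wait up to about one park interval (an arbitrary 100 µs here) before being noticed.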