[infinispan-dev] Musings on ISPN/JGRPs OSI transport choices and ambitions
cotton-ben
ben.cotton at ALUMNI.RUTGERS.EDU
Sat Mar 1 05:22:04 EST 2014
Ø In a nutshell, you want to use a zero-copy transport between processes
that run on the *same physical box* (e.g. shipping only pointers to
native shared memory between processes).
This is exactly what we want.
Without this capability our beloved RedHat ISPN/JGRPS stack pays no homage
to our architected locality premiums. Remember when EJB made the fatal
mistake of forcing remoteness on its API's end-users? Just plain LAZY, these
EJB designers! How dare they? They ended up with the grossly inelegant
hack of providing a localremote() interface. OBSCENE. When you hack up
something to the point of being forced to cattle-prod the API with something
called localremote() (a literal oxymoron in your API -- are you kidding
me?) you just have to laugh and say: damn, that is BROKE!
It prejudiced Java in the eyes of the HPC community. For years.
Now Bela, we totally respect that you won't ever put C++/JNI into JGRPs,
even to accommodate our architected supercomputing locality premiums. We
get that. But now, Bela! But now! We have a 100% Java solution to take us
where we want to go: an IPC transport over /dev/shm without a single line
of C++ or JNI. It is beautiful. It solves a problem of ours
<https://groups.google.com/forum/#!topic/mechanical-sympathy/rBG7hcamt1k> .
By doing this soon in JGRPs, Bela, you can realize two beautiful outcomes:
1. JGRPs remains magnifique: 100% Pure Java, not a single JNI bridge
to C++ to native kernel system calls (we agree that is unattractive)
2. You liberate the JGRPs end-user from having remoteness (which OSI
loopback is) forced down their throats
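The pure-Java /dev/shm idea above can be sketched with nothing more than
java.nio's memory-mapped files. A minimal illustration only -- this is not
OpenHFT's actual API, and the file path, size, and layout are made up:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Two JVMs on the same box map the same file under /dev/shm (tmpfs),
// so a write by one process is visible to the other without a socket
// or a kernel copy on the data path. Path and 4 KiB size are illustrative.
public class ShmIpcSketch {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("/dev/shm/jgrps-demo", "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer shm = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            shm.putInt(0, 42);              // "send": write into shared memory
            int received = shm.getInt(0);   // "receive": the peer process reads here
            System.out.println(received);   // 42
        }
    }
}
```

A real transport would of course add a ring-buffer layout and memory fences
on top of the mapping; the point is only that the whole data path is plain
JDK NIO.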
Ø I'm interested in adding such a transport in JGroups 4,
This thrills us!
Ø in which I plan to revamp the transport to adopt an NIO based scheme
I may be mistaken, but I think the OpenHFT solution for using SHM as an IPC
transport has big advantages over using the NIO bridges to off-heap
capabilities. Doesn't NIO's DirectByteBuffer have a capacity limitation (an
int-bounded capacity, so at most ~2 GiB per buffer) that will immediately
betray big-data views? I'm not sure about this; I'll get back to you
(publicly, here).
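The capacity concern can be seen directly in the API: ByteBuffer's capacity,
position, and limit are all Java ints. A tiny sketch (class name is mine):

```java
import java.nio.ByteBuffer;

// ByteBuffer.allocateDirect(int) takes an int, so no single direct buffer
// can address more than Integer.MAX_VALUE bytes (~2 GiB) -- a real
// constraint for "big data" views over off-heap memory.
public class DirectBufferCap {
    public static void main(String[] args) {
        ByteBuffer small = ByteBuffer.allocateDirect(1024);
        System.out.println(small.capacity());   // 1024
        // ByteBuffer.allocateDirect(3_000_000_000L) would not even compile:
        // the parameter is int, so > 2 GiB means stitching buffers together.
    }
}
```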
Ø a group-wide message (a multicast) would be sent via SHR_MEM *and* UDP.
Perfect.
Ø why don't you post an edited version of my private replies to you to
this topic as well, so we have some background ?
Very good idea. Will do.
The bottom line is this: let's fix this. Let's work together with OpenHFT
to find the best "locality is a premium!" solution (I don't think it is NIO,
but I might be wrong). After we fix this, let's demand the JGRID world fix
it. We can go to 347 (BTW, OpenHFT's Peter Lawrey is now being seated on
the 347 EG) and specify that providing a transport that accommodates
locality is required to be JGRID standard. Let's make it better. :-)
Thank you Bela (and RedHat).
From: Bela Ban [via Infinispan Developer List]
[mailto:ml-node+s980875n4028928h85 at n3.nabble.com]
Sent: Saturday, March 1, 2014 4:30 AM
To: cotton-ben
Subject: Re: [infinispan-dev] Musings on ISPN/JGRPs OSI transport choices
and ambitions
Hi Ben,
why don't you post an edited version of my private replies to you to
this topic as well, so we have some background ?
In a nutshell, you want to use a zero-copy transport between processes
that run on the *same physical box* (e.g. shipping only pointers to
native shared memory between processes). Currently, using TCP or UDP
between processes on the same box still requires 1 or 2 copies, even
when a loopback device is used.
I'm interested in adding such a transport in JGroups 4, in which I plan
to revamp the transport to adopt an NIO based scheme, accommodating both
UDP and TCP. This is all still in the planning phase, but one feature
will be to have multiple transports running in the same stack and
sending messages alternatively via different transports. E.g. multicasts
would use UDP whereas unicasts would use TCP (by default), but this
could be overridden per message (with flags).
If we then had 5 physical boxes, with 20 processes on each box, for a
total of 100 nodes, then we could configure the stacks to run both
SHR_MEM and UDP: a group-wide message (a multicast) would be sent via
SHR_MEM *and* UDP.
The SHR_MEM transport would disseminate the message to all 20 processes
on the same physical box, using shared memory. The UDP transport would
be configured as non-loopback (IP_MULTICAST_LOOP=false), which means
that the message would be multicast to the other 3 physical boxes, but
the local multicast would be dropped. The other boxes would then use
SHR_MEM to disseminate the message locally to all 20 processes.
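The non-loopback UDP leg Bela describes maps onto a standard socket option.
A minimal JDK sketch of just that option (this is plain java.nio, not
JGroups configuration; the class name is mine, and the exact loopback
semantics are platform-dependent):

```java
import java.net.StandardProtocolFamily;
import java.net.StandardSocketOptions;
import java.nio.channels.DatagramChannel;

// With IP_MULTICAST_LOOP disabled, a multicast sent by this process is
// delivered to the other boxes but not echoed back to the local host --
// local delivery would instead come via the shared-memory transport.
public class NoLoopMulticast {
    public static void main(String[] args) throws Exception {
        try (DatagramChannel ch = DatagramChannel.open(StandardProtocolFamily.INET)) {
            ch.setOption(StandardSocketOptions.SO_REUSEADDR, true);
            ch.setOption(StandardSocketOptions.IP_MULTICAST_LOOP, false);
            System.out.println(ch.getOption(StandardSocketOptions.IP_MULTICAST_LOOP));
        }
    }
}
```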
Just an idea atm, this could also be done via RELAY2, but the QoS would
not be the same.
I'm planning on releasing 3.5 in 6-8 weeks from now. This includes a
community baking phase during which I'll be working on a deep-dive
course on JGroups.
So a *very tentative* schedule is to start on 4.0 at the beginning of
summer.
On 28/02/14 19:16, cotton-ben wrote:
> Hi Mircea, Manik, Bela, et al.
>
> I want to more publicly muse on this SUBJ line. Here now, then maybe in
> ISPN /user/ forum, then maybe JSR-347 provider wide. I know we had a
> semi-private (Bela led) exchange, but I want to be more public with this
> conversation.
>
> Long post again. sorry.
>
> This is just an open musing. I realize this musing should not expect to be
> accommodated by any "oh, we've got to do this in ISPN/JGRPs now!" response ...
> there is absolutely only the most infrequent use-case that would /today/ be
> served by addressing this musing ... but tomorrow that /will/ be a different
> story.
>
> Questions::
>
> Does the concept of ISPN/JGRPs transport between "Cluster" nodes currently
> depend on OSI transport-layer sockets' participation?
>
> In other words, if all the nodes on my "Cluster" have locality=127.0.0.1, is
> ISPN/JGRPs accommodating enough to use a native OS IPC choice as an
> intra-node transport?
>
> Or, is it true that my transport choices are always limited to just
> {TCP,UDP} -- independent of the participating nodes' locality (and that I
> am thus forced to go over an OSI loopback)?
>
> If my transport choices are only limited to {TCP,UDP} for all node locality,
> then I might ask that you consider additional upcoming modern Java
> transport options.
>
> With the ambitions of upcoming OpenJDK JEPs that will make mainstream an
> API capability that today is only available via sun.misc.Unsafe, Java will
> soon have "more complete" transport options that will include all of
>
> { TCP, UDP, RDMA/SDP, IPC }
>
> Some examples of upcoming accommodating providers:
>
> 1. RDMA/SDP: via InfiniBand verbs (works today in JDK 7 on OSI physical-layer
> IB NICs; does not work over Ethernet)
> 2. IPC: via OpenHFT's SHM-as-IPC solution (will work this year)
>
> Again, I realize that these transport choices are useful today only in a
> very rare use case. However, should these transports be in your offering to
> ISPN/JGRPs customers, then ISPN/JGRPs becomes -- like all of Java has
> become in recent years -- increasingly attractive to /all/ HPC Linux
> supercomputing use cases (not just ours).
>
>
> --
> View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Musings-on-ISPN-JGRPs-OSI-transport-choices-and-ambitions-tp4028925.html
> Sent from the Infinispan Developer List mailing list archive at Nabble.com.
> _______________________________________________
> infinispan-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
--
Bela Ban, JGroups lead (http://www.jgroups.org)