[infinispan-dev] Musings on ISPN/JGRPs OSI transport choices and ambitions

cotton-ben ben.cotton at ALUMNI.RUTGERS.EDU
Sat Mar 1 05:22:04 EST 2014


> In a nutshell, you want to use a zero-copy transport between processes 
> that run on the *same physical box* (e.g. shipping only pointers to 
> native shared memory between processes).

 

This is exactly what we want.  

 

Without this capability, our beloved RedHat ISPN/JGRPS stack pays no homage
to our architected locality premiums.  Remember when EJB made the fatal
mistake of forcing remoteness on its API’s end-user?  Just plain LAZY, these
EJB designers!  How dare they?  They ended up with the grossly inelegant
hack of providing a localremote() interface.  OBSCENE.  When you hack up
something to the point of being forced to cattle-prod the API with something
called “localremote()” (a literal oxymoron in your API. Are you kidding
me?), you just have to laugh and say “damn, that is BROKE!”.

 

It prejudiced Java in the eyes of the HPC community.  For years.

 

Now Bela, we totally respect that you won’t ever put C++/JNI into JGRPs –
even to accommodate our architected supercomputing locality premiums.  We
get that.  But now, Bela!  But now!  We have a 100% Java solution to take us
where we want to go – an IPC transport over /dev/shm – without a single line
of C++ or JNI.  It is beautiful.  It solves a problem of ours
<https://groups.google.com/forum/#!topic/mechanical-sympathy/rBG7hcamt1k>.
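For anyone who has not seen the idea before, here is a minimal sketch of the
principle only (this is NOT OpenHFT’s actual API, which adds proper queuing,
indexing and much more on top): two JVMs on the same box exchange bytes by
mapping the same file under /dev/shm, with no socket and no JNI anywhere.

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class DevShmSketch {
        public static void main(String[] args) throws Exception {
            // /dev/shm is a tmpfs mount on Linux; every JVM that maps the same
            // file sees the same physical pages -- that *is* the transport.
            try (RandomAccessFile raf = new RandomAccessFile("/dev/shm/jgrps-demo", "rw");
                 FileChannel ch = raf.getChannel()) {
                MappedByteBuffer shm = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                shm.putLong(0, System.nanoTime()); // "send": a plain store into shared memory
                long seen = shm.getLong(0);        // "receive": a peer process would read here
                System.out.println("exchanged " + seen + " without touching a socket");
            }
        }
    }

A real transport of course needs a ring buffer, memory fences and a wake-up
scheme layered on top of that mapping; that is the kind of plumbing the
OpenHFT work supplies, still in 100% Java.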

 

By doing this soon in JGRPs, Bela, you can realize two beautiful outcomes:

 

1.  JGRPs remains magnifique: 100% Pure Java, not a single JNI bridge to
C++ to native kernel system calls (we agree that is unattractive)

2.  You liberate the JGRPs end-user from having remoteness (which OSI
loopback is) forced down their throats

 

> I'm interested in adding such a transport in JGroups 4, 

 

This thrills us!

 

> in which I plan to revamp the transport to adopt an NIO based scheme

 

I may be mistaken, but I think the OpenHFT solution for using SHM as an IPC
transport has big advantages over using the NIO bridges to Off-Heap
capabilities.  Doesn’t NIO’s DirectByteBuffer have a capacity limitation
(int-based indexing) that will immediately betray big-data views?  I’m not
sure about this; I’ll get back to you (publicly, here).
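To be concrete about the concern: java.nio.ByteBuffer is int-indexed, so a
single DirectByteBuffer cannot address more than Integer.MAX_VALUE bytes
(roughly 2 GB), which means a truly big off-heap view has to be stitched
together from many buffers.  A tiny standard-JDK illustration, nothing
OpenHFT-specific:

    import java.nio.ByteBuffer;

    public class DirectBufferCeiling {
        public static void main(String[] args) {
            long fourGiB = 4L << 30;           // the kind of big-data view we care about
            // ByteBuffer.allocateDirect(int) forces the size through an int, so
            // anything past Integer.MAX_VALUE (~2 GiB) simply cannot be expressed:
            System.out.println((int) fourGiB); // prints 0 -- the 4 GiB size is lost in the cast
            ByteBuffer ok = ByteBuffer.allocateDirect(64 << 20); // 64 MiB is fine
            System.out.println("capacity = " + ok.capacity());
        }
    }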

 

> a group-wide message (a multicast) would be sent via SHR_MEM *and* UDP. 

 

Perfect.
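Just to picture it, here is a purely hypothetical sketch of such a
dual-transport stack.  SHR_MEM does not exist yet; its name and its shm_dir
attribute are placeholders of mine, not real JGroups syntax; only the UDP
attributes shown are real configuration:

    <config xmlns="urn:org:jgroups">
        <!-- Hypothetical shared-memory transport for peers on the same physical
             box; protocol name and shm_dir are placeholders, not real JGroups. -->
        <SHR_MEM shm_dir="/dev/shm/jgroups"/>
        <!-- Real UDP multicast transport reaching the other physical boxes.
             Per Bela's sketch, its socket would set IP_MULTICAST_LOOP=false so
             the local copy of each multicast is dropped. -->
        <UDP mcast_addr="228.6.7.8" mcast_port="46655"/>
        <!-- discovery, reliability and group-membership protocols as usual -->
    </config>

Whether it ends up shaped like this, or is done via RELAY2 as Bela notes, the
point is the same: traffic between peers on one box never has to leave memory.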

 

 

> why don't you post an edited version of my private replies to you to 
> this topic as well, so we have some background ? 




Very good idea.  Will do.

 

The bottom line is this: let’s fix this.  Let’s work together with OpenHFT
to find the best “locality is a premium!” solution (I don’t think it is NIO,
but I might be wrong).  After we fix this, let’s demand that the JGRID world
fix it.  We can go to 347 (BTW, OpenHFT’s Peter Lawrey is now being seated
on the 347 EG) and specify that providing a transport that accommodates
locality is a required part of the JGRID standard.  Let’s make it better.  :-) 

 

Thank you Bela (and RedHat).

 

 

 

 

From: Bela Ban [via Infinispan Developer List]
[mailto:ml-node+s980875n4028928h85 at n3.nabble.com] 
Sent: Saturday, March 1, 2014 4:30 AM
To: cotton-ben
Subject: Re: [infinispan-dev] Musings on ISPN/JGRPs OSI transport choices
and ambitions

 

Hi Ben, 

why don't you post an edited version of my private replies to you to 
this topic as well, so we have some background ? 

In a nutshell, you want to use a zero-copy transport between processes 
that run on the *same physical box* (e.g. shipping only pointers to 
native shared memory between processes). Currently, using TCP or UDP 
between processes on the same box still requires 1 or 2 copies, even 
when a loopback device is used. 

I'm interested in adding such a transport in JGroups 4, in which I plan 
to revamp the transport to adopt an NIO based scheme, accommodating both 
UDP and TCP. This is all still in the planning phase, but one feature 
will be to have multiple transports running in the same stack and 
sending messages alternatively via different transports. E.g. multicasts 
would use UDP whereas unicasts would use TCP (by default), but this 
could be overridden per message (with flags). 

If we then had 5 physical boxes, with 20 processes on each box, for a 
total of 100 nodes, then we could configure the stacks to run both 
SHR_MEM and UDP: a group-wide message (a multicast) would be sent via 
SHR_MEM *and* UDP. 

The SHR_MEM transport would disseminate the message to all 20 processes 
on the same physical box, using shared memory. The UDP transport would 
be configured as non-loopback (IP_MULTICAST_LOOP=false), which means 
that the message would be multicast to the other 4 physical boxes, but 
the local multicast would be dropped. The other boxes would then use 
SHR_MEM to disseminate the message locally to all 20 processes. 

Just an idea atm, this could also be done via RELAY2, but the QoS would 
not be the same. 

I'm planning on releasing 3.5 in 6-8 weeks from now. This includes a 
community baking phase during which I'll be working on a deep-dive 
course on JGroups. 

So a *very tentative* schedule is to start on 4.0 at the beginning of 
summer. 


On 28/02/14 19:16, cotton-ben wrote: 


> Hi Mircea, Manik, Bela, et al. 
> 
> I want to more publicly muse on this SUBJ line.  Here now, then maybe in 
> ISPN /user/ forum, then maybe JSR-347 provider wide.  I know we had a 
> semi-private (Bela led) exchange, but I want to be more public with this 
> conversation. 
> 
> Long post again.  Sorry. 
> 
> This is just an open musing.  I realize this musing should not expect to 
> be accommodated by any "oh, we've got to do this in ISPN/JGRPs now!" 
> response ... there is absolutely only the most infrequent use-case that 
> would /today/ be served by addressing this musing ... but tomorrow that 
> /will/ be a different story. 
> 
> Questions: 
> 
> Does the concept of ISPN/JGRPs transport between "Cluster" nodes 
> currently depend on OSI transport layer sockets' participation(s)? 
> 
> In other words, if all the nodes on my "Cluster" have locality=127.0.0.1, 
> is ISPN/JGRPs accommodating enough to use a native OS IPC choice as an 
> intra-node transport? 
> 
> Or, is it true that my transport choices are always limited to just 
> {TCP,UDP} -- independent of the participating nodes' locality (and that I 
> am thus forced to go over an OSI loopback)? 
> 
> If my transport choices are only limited to {TCP,UDP} for all node 
> locality, then I might ask that you consider additional upcoming modern 
> Java transport options. 
> 
> With the ambitions of upcoming OpenJDK JEPs that will make mainstream an 
> API capability that today is only available via sun.misc.Unsafe, Java 
> will soon have "more complete" transport options that will include all of 
> 
>   { TCP, UDP, RDMA/SDP, IPC } 
> 
> Some examples of upcoming accommodating providers: 
> 
> 1.  RDMA/SDP: via InfiniBand VERBS (works today in JDK 7 on OSI physical 
> layer IB NICs, does not work over Ethernet) 
> 2.  IPC: via OpenHFT's SHM-as-IPC solution (will work this year) 
> 
> Again, I realize that these transport choices are useful today only in a 
> very rare use case.  However, should these transports be in your offering 
> to ISPN/JGRPs customers, then ISPN/JGRPs becomes -- like all of Java has 
> become in recent years -- increasingly more attractive to /all/ HPC Linux 
> supercomputing use cases (not just ours). 
> 
> 
> 
> 
> 
> 
> 
> -- 
> View this message in context:
> http://infinispan-developer-list.980875.n3.nabble.com/Musings-on-ISPN-JGRPs-OSI-transport-choices-and-ambitions-tp4028925.html
> Sent from the Infinispan Developer List mailing list archive at Nabble.com. 
> _______________________________________________ 
> infinispan-dev mailing list 
> [hidden email] 
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 


-- 
Bela Ban, JGroups lead (http://www.jgroups.org) 
_______________________________________________ 
infinispan-dev mailing list 
[hidden email] 
https://lists.jboss.org/mailman/listinfo/infinispan-dev








--
View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Musings-on-ISPN-JGRPs-OSI-transport-choices-and-ambitions-tp4028925p4028929.html
Sent from the Infinispan Developer List mailing list archive at Nabble.com.

