Improving the performance of index writers
by Sanne Grinovero
Hi all,
we have been breaking down the problem of latency during index
writing into smaller, manageable tasks; you can find the general
overview JIRA here:
- https://issues.jboss.org/browse/ISPN-4847
As you can see, some minor improvements have been implemented already.
Most of them provide only a 10% to 30% improvement each, but some
provide more, and combined the overall gain is getting interesting.
While these minor issues (even combined) won't give us the many orders
of magnitude performance improvements that we'd like to see, they are
important as they are paving the road to the more significant
efficiency improvements.
I documented the main idea here, as it belongs in the Hibernate Search engine:
https://hibernate.atlassian.net/browse/HSEARCH-1699
I don't expect that to be implemented overnight, but Gustavo already
sent a PR for the ASYNC case, which is based on the same principle of
avoiding the commits but is simpler to implement:
https://hibernate.atlassian.net/browse/HSEARCH-1693
We expect this one to be a proof of concept for the performance that
we'll get from HSEARCH-1699, and I also think it's very useful on its
own: previously users of ASYNC indexing were forced into a "very
async" architecture which might have been a bit too hard to manage.
Now that a maximum delay can be set for the async operation, I
expect it to be an acceptable compromise for a much wider range
of use cases.
Essentially this will decouple the achievable throughput of indexed
caches from the RPC latency, although obviously this latency will
still be the limiting factor for some dimensions. In particular, the
response time for a single synchronous indexed write will still
depend primarily on Infinispan's ability to reduce the number
of blocking RPCs needed for a single write.
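To make the "avoid the commits, but bound the delay" principle concrete, here is a minimal sketch in plain Java. It is not the Hibernate Search or Infinispan implementation: the class name DelayedCommitWriter, its methods, and the 100 ms figure are all made up for illustration, and time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not a Hibernate Search API: many index writes are
 * coalesced into a single commit, and no write waits longer than maxDelayMillis.
 */
final class DelayedCommitWriter {
    private final long maxDelayMillis;
    private final List<String> pending = new ArrayList<>();
    private long oldestPendingAt = -1; // time of the first uncommitted write, -1 if none
    private int commitCount = 0;
    private int committedWrites = 0;

    DelayedCommitWriter(long maxDelayMillis) {
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Applies a write without committing; the expensive commit is deferred. */
    synchronized void write(String document, long nowMillis) {
        if (pending.isEmpty()) {
            oldestPendingAt = nowMillis;
        }
        pending.add(document);
    }

    /** Meant to be called periodically; commits once the oldest write has waited maxDelayMillis. */
    synchronized boolean commitIfDue(long nowMillis) {
        if (pending.isEmpty() || nowMillis - oldestPendingAt < maxDelayMillis) {
            return false;
        }
        committedWrites += pending.size(); // stands in for one real index commit
        pending.clear();
        oldestPendingAt = -1;
        commitCount++;
        return true;
    }

    synchronized int commitCount() { return commitCount; }

    synchronized int committedWrites() { return committedWrites; }

    public static void main(String[] args) {
        DelayedCommitWriter writer = new DelayedCommitWriter(100);
        for (int i = 0; i < 1000; i++) {
            writer.write("doc-" + i, i / 20); // 1000 writes spread over the first 50 ms
        }
        writer.commitIfDue(50);  // too early, nothing is committed
        writer.commitIfDue(100); // the oldest write has now waited 100 ms
        System.out.println(writer.committedWrites() + " writes in "
                + writer.commitCount() + " commit(s)");
    }
}
```

The point of the sketch is only that throughput stops depending on the cost of a commit per write, while the maximum delay keeps the staleness bounded.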
Feedback very welcome!
Sanne
9 years, 6 months
my status
by Ion Savin
Hi all,
I'll be missing the meeting today so here is my status:
Last week:
* resolved ISPN-4784 and ISPN-4251
* spent a good amount of time studying JBoss Modules and how we package
infinispan as AS modules
This week still:
* ISPN-3836 TxCleanupService can cause TCCL leak
* HRCPP-173 The HotRod client should support a separate CH for each cache
--
Ion Savin
9 years, 6 months
Feedback and requests on clustered and remote listeners
by Emmanuel Bernard
Hi all,
I have had a good exchange on how someone would use clustered / remote listeners to do custom continuous query features.
I have a few questions and requests to make this fully and easily doable.
## Value as bytes or as objects
Assume Hot Rod based usage with protobuf as the serialization layer. What are KeyValueFilter and Converter seeing?
I assume today the bytes are unmarshalled and the Java object is provided to these interfaces.
In a protobuf based storage, does that mean that the user must generate the Java classes with a protobuf compiler and deploy these classes in the classpath of each server node?
Alternatively, could we pass the raw protobuf data to the KeyValueFilter and Converter? They could read the relevant properties at no deserialization cost and with fewer problems related to the classloader.
Thoughts?
## Synced listeners
In a transactional clustered listener marked as sync, does the transaction commit and then wait for the relevant clustered listeners to proceed before returning control to the Tx client? Or is there something else going on?
## oldValue and newValue
I understand why the oldValue was not provided in the initial work. It requires sending more data across the network and at least doubles the number of values unmarshalled.
But for continuous queries, being able to compare the old and the new value is critical to reduce the number of events sent to the listener.
Imagine the following use case. A listener exposes the average age for a certain type of customer. You would implement it the following way.
1. Add a KeyValueFilter that
   - upon creation, filters out the customers of the wrong type
   - upon update, keeps customers that
     - *were* of the right type but no longer are
     - were not of the right type but now *are*
     - remain of the right type and whose age has changed
   - upon deletion, keeps customers that *were* of the right type
2. Converter
   In the converter, one could send the whole customer, but it would be more efficient to send only the age of the customer as well as whether it is added to or removed from the matching customers
   - upon creation, you send the customer age and mark it as addition
   - upon deletion, you send the customer age and mark it as deletion
   - upon update
     - if the customer was of the right type but no longer is, send the age as well as a deletion flag
     - if the customer was not of the right type but now is, send the age as well as an addition flag
     - if the customer age has changed, send the difference with a modification flag
3. The listener then needs to keep the total sum of all ages as well as the total number of customers of the right type. Based on the sent events, it can adjust these two counters.
That requires us to be able to provide the old and new value to the KeyValueFilter and the Converter interface as well as the type of event (creation, update, deletion).
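The three steps above can be sketched in plain Java as follows. None of these types are Infinispan APIs: EventType, Flag, AgeDelta, the convert() signature and the "gold" customer type are all hypothetical stand-ins, and the filter and converter are merged into one function that returns an empty Optional to silence an event.

```java
import java.util.Optional;

/** Hypothetical sketch of the average-age use case; not Infinispan API. */
final class AverageAgeSketch {
    enum EventType { CREATED, UPDATED, REMOVED }
    enum Flag { ADDITION, DELETION, MODIFICATION }

    static final class Customer {
        final String type;
        final int age;
        Customer(String type, int age) { this.type = type; this.age = age; }
    }

    /** Compact payload sent to the listener instead of the whole customer. */
    static final class AgeDelta {
        final Flag flag;
        final int value;
        AgeDelta(Flag flag, int value) { this.flag = flag; this.value = value; }
    }

    /** Assumption: "the right type" means customers whose type is "gold". */
    static boolean matches(Customer c) {
        return c != null && "gold".equals(c.type);
    }

    /** Filter and converter in one step; needs the event type, the old and the new value. */
    static Optional<AgeDelta> convert(EventType type, Customer oldValue, Customer newValue) {
        boolean was = type != EventType.CREATED && matches(oldValue);
        boolean is = type != EventType.REMOVED && matches(newValue);
        if (!was && is) return Optional.of(new AgeDelta(Flag.ADDITION, newValue.age));
        if (was && !is) return Optional.of(new AgeDelta(Flag.DELETION, oldValue.age));
        if (was && is && oldValue.age != newValue.age) {
            return Optional.of(new AgeDelta(Flag.MODIFICATION, newValue.age - oldValue.age));
        }
        return Optional.empty(); // event is silenced
    }

    /** Listener-side state: only two counters, no list of matching keys. */
    static final class AverageAge {
        private long sum;
        private long count;

        void apply(AgeDelta d) {
            switch (d.flag) {
                case ADDITION: sum += d.value; count++; break;
                case DELETION: sum -= d.value; count--; break;
                case MODIFICATION: sum += d.value; break;
            }
        }

        double average() { return count == 0 ? 0.0 : (double) sum / count; }
    }

    public static void main(String[] args) {
        AverageAge avg = new AverageAge();
        Customer a = new Customer("gold", 30);
        Customer b = new Customer("silver", 50);
        convert(EventType.CREATED, null, a).ifPresent(avg::apply);
        convert(EventType.CREATED, null, b).ifPresent(avg::apply); // silenced
        convert(EventType.UPDATED, b, new Customer("gold", 50)).ifPresent(avg::apply);
        System.out.println("average age: " + avg.average());
    }
}
```

Without oldValue, convert() could not tell an addition from a modification, which is exactly why the listener would otherwise have to track every matching key itself.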
If you keep the existing interfaces and their data, the data sent and the memory consumed become much bigger. I leave it as an exercise, but I think you need to:
- send *all* remove and update events regardless of the value (essentially no KeyValueFilter)
- in the listener, keep a list of *all* matching keys so that you know whether a new event is about an entry that was already matching your criteria or not, and act accordingly.
BTW, you need the old and new value even if your listener returns actual matching results instead of an aggregation. More or less for the same reasons.
Continuous query is about the most important use case for remote and clustered listeners and I think we should address it properly and as efficiently as possible. Adding continuous query to Infinispan will then “simply” be a matter of agreeing on the query syntax and implementing the predicates as smartly as possible.
With the use case I describe, I think the best approach is to merge the KVF and Converter into a single Listener-like interface that is able to send or silence an event payload. But that’s a guesstimate.
Because oldValue / newValue implies an unmarshalling overhead, we might want to make it an annotation-based flag on the class that is executed on each node (somewhat similar to the settings hosted on @Listener).
## includeCurrentState and very narrow filtering
The existing approach is fine (send a create event notification for all existing keys and queue changes in the meantime) as long as the listener plans to consume most of these events.
But in the case of a big data grid with a lot of passivated entries, the cost would become non-negligible.
An alternative approach is to first run a query matching the elements the listener is interested in and queue up the events until the query is fully processed. Can a listener access a cache and run a query? Should we offer such an option in a more packaged way?
For a listener that is only interested in keys whose value city contains Springfield, Virginia, the gain would be massive.
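A rough sketch of the "query first, queue events meanwhile" idea, in plain Java. The class and method names are hypothetical and events are plain strings. How the query is actually run against the cache is left out, since that is precisely the open question.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch: buffer live events until the initial query has been delivered. */
final class QueryThenReplaySketch {
    private final Queue<String> buffered = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>();
    private boolean initialQueryDone = false;

    /** Called for every live event that passes the filter. */
    synchronized void onEvent(String event) {
        if (initialQueryDone) {
            delivered.add(event);
        } else {
            buffered.add(event); // hold back until the query results are out
        }
    }

    /** Called once with the results of the narrow initial query. */
    synchronized void completeInitialQuery(List<String> queryResults) {
        delivered.addAll(queryResults);
        while (!buffered.isEmpty()) {
            delivered.add(buffered.poll()); // replay in arrival order
        }
        initialQueryDone = true;
    }

    synchronized List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    public static void main(String[] args) {
        QueryThenReplaySketch listener = new QueryThenReplaySketch();
        listener.onEvent("update:key7");
        listener.completeInitialQuery(Arrays.asList("match:key1", "match:key7"));
        listener.onEvent("remove:key1");
        System.out.println(listener.delivered());
    }
}
```

Note that the sketch does nothing about an entry that shows up both in the query results and in a buffered event; a real implementation would have to decide how to reconcile the two.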
## Remote listener and non Java HR clients
Does the API of non-Java HR clients support registering listeners and attaching registered KeyValueFilter / Converter instances? Or is that planned? Just curious.
Emmanuel
9 years, 6 months
Multiple Spring modules
by Sebastian Łaskawiec
Hey!
Currently I'm working on Spring 3 and 4 support and because these
versions are not compatible (in terms of Cache API), we probably would
need to have 2 modules for Spring.
Now the question is - how to maintain them? Here are the options which
come to mind:
1. Create a copy of the Spring 3 module and put everything into a newly
created Spring 4 module, then update versions and implement the new
methods of the Cache interface.
Pros:
- 1 OSGi bundle - transparent upgrade - just replace spring bundle
- Easy to maintain Spring 4 only fixes
Cons:
- Code duplication
2. Extract common part and create 2 modules which depend on it - very
hard because Cache interface is logically at the bottom of the
structure. Everything depends on it.
Pros:
- No code duplication
Cons:
- Increased code complexity
- 2 bundles needed - common + spring 3/4
3. Make Spring 4 module depend on Spring 3 and replace Cache
implementations, run Maven Shade plugin to put everything together
Pros:
- No code duplication
Cons:
- Hacking into code, no intuitive design
- Will probably work in this specific case, further maintenance
might be hard.
4. Implement the 2 missing methods in the Spring module without the
@Override annotation. This way it should work against Spring 3 and 4
Pros:
- Really small change and single jar will support both spring 3
and 4
Cons:
- Spring version ranges in pom (not sure if it fits into
Infinispan design and BOMs)
- Not intuitive
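To illustrate why option 4 compiles against both versions, here is a small self-contained sketch. OldCacheApi is a simplified, hypothetical stand-in for the older Cache interface, not the real Spring type, and the two extra method signatures are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of option 4; the interface below is NOT the real Spring Cache. */
final class NoOverrideSketch {

    /** Stand-in for the older interface, which lacks the two newer methods. */
    interface OldCacheApi {
        Object get(Object key);
        void put(Object key, Object value);
    }

    static final class DualVersionCache implements OldCacheApi {
        private final Map<Object, Object> store = new ConcurrentHashMap<>();

        @Override
        public Object get(Object key) { return store.get(key); }

        @Override
        public void put(Object key, Object value) { store.put(key, value); }

        // No @Override on purpose: against the older interface these are just
        // extra public methods; against a newer interface that declares them,
        // they would implement it. With @Override the older build would fail.
        public <T> T get(Object key, Class<T> type) { return type.cast(store.get(key)); }

        public Object putIfAbsent(Object key, Object value) { return store.putIfAbsent(key, value); }
    }

    public static void main(String[] args) {
        DualVersionCache cache = new DualVersionCache();
        cache.put("answer", 42);
        Integer typed = cache.get("answer", Integer.class);
        System.out.println(typed);
    }
}
```

The trade-off is the one listed under Cons: the compiler no longer checks that these two methods really match the newer interface, so a signature drift would go unnoticed until runtime.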
I like option #1 - much easier maintenance + we might start using Spring
4 features without breaking Spring 3 module. Option #4 is also not that
bad...
Which option would you prefer?
Best regards
Sebastian
9 years, 6 months
Naming of project modules
by Sanne Grinovero
All,
I occasionally have to hard-reset my whole workspace, delete the
Eclipse projects, and re-import them, especially when I switch between
branches.
I have lots of projects, and they are all nicely "grouped" as Eclipse
shows projects in alphabetical order, and all projects use a
consistent prefix like "hibernate-ogm-" or "wildfly-", etc.
But Infinispan often manages to fool me, as most modules have an
"infinispan-" prefix, but not all of them follow this rule so some get
to hide out of sight (I literally have hundreds of projects in my
primary workspace).
Could we please make sure they all have a name starting with "infinispan-" ?
If you agree I'm happy to send a PR to fix the couple of exotic ones.
Sanne
9 years, 6 months
Re: [infinispan-dev] Clustering standalone Infinispan w/ WF running Infinispan
by Tristan Tarrant
Markdown chewed on my markup :)
https://raw.githubusercontent.com/tristantarrant/infinispan-playground-hy...
On 10/10/14 15:42, Kurt T Stam wrote:
> Hi Tristan,
>
> I'm trying to follow your instructions but I am a bit confused by the
> following:
>
> "You will also need to modify the following file:
>
> modules/system/layers/base/org/jboss/as/clustering/infinispan/main/module.xml
>
>
> by adding the following line to its dependencies:"
>
> What do I have to add?
>
> Thx,
>
> --Kurt
>
>
>
> On 10/2/14, 9:21 AM, Tristan Tarrant wrote:
>> I have successfully created a "hybrid" cluster between an application
>> using Infinispan in embedded mode and an Infinispan server by doing
>> the following on the embedded side:
>>
>> - use a JGroups Channel wrapped in a MuxHandler
>> - use a custom class resolver which simulates (or rather... hacks)
>> the behaviour of the ModularClassResolver when not using modules
>>
>> You can find the code at my personal GitHub repo:
>>
>> https://github.com/tristantarrant/infinispan-playground/tree/master/src/m...
>>
>>
>> suggestions and improvements are welcome.
>>
>> Tristan
>>
>> On 30/09/14 10:01, Stelios Koussouris wrote:
>>> Hi,
>>>
>>> To give a bit of context on this. We are doing a POC where the
>>> customer wishes to utilize JDG to speed up their application. We
>>> need (due to some customer requirements) to cluster
>>> EMBEDDED JDG (infinispan library mode) with REMOTE JDG (Infinispan
>>> Server) nodes. The infinispan jars should be the same as they are
>>> only libraries and they
>>> are on the same version. However, during "clustering" of the caches
>>> we started seeing errors which looked like they were due to the
>>> fact that the clustering of the caches contained different
>>> info between the 2 types of cache instantiation (embedded vs server).
>>>
>>> The result was a suggestion to create our own MuxChannel (I
>>> don't know if we have any other alternatives at this stage to
>>> cluster embedded with server infinispan caches) but at the moment we
>>> are facing https://gist.github.com/skoussou/5edc5689446b67f85ae8
>>>
>>> Regards,
>>>
>>> Stylianos Kousouris
>>> Red Hat Middleware Consultant
>>>
>>> ----- Original Message -----
>>> From: "Tristan Tarrant" <ttarrant(a)redhat.com>
>>> To: "infinispan -Dev List" <infinispan-dev(a)lists.jboss.org>, "Kurt T
>>> Stam" <kurt.stam(a)jboss.com>
>>> Cc: "Stelios Koussouris" <stkousso(a)redhat.com>, "Richard
>>> Achmatowicz" <rachmato(a)redhat.com>
>>> Sent: Tuesday, 30 September, 2014 8:02:27 AM
>>> Subject: Re: [infinispan-dev] Clustering standalone Infinispan w/ WF
>>> running Infinispan
>>>
>>> I don't know what Kurt is doing, but Stelios is attempting to
>>> cluster an
>>> application using embedded Infinispan deployed within WF together with
>>> an Infinispan Server instance.
>>> The application is managing its own caches, and therefore it is not
>>> interacting with the underlying Infinispan and JGroups subsystems in
>>> WF.
>>> Infinispan Server uses its Infinispan and JGroups subsystems (which are
>>> forked from WF's) and therefore are using MuxChannels.
>>>
>>> I told Stelios to use a MuxChannel-wrapped Channel in his application
>>> and it solved part of the issue (he was initially importing the one
>>> included in the WF's jgroups subsystem, but now he's using his local
>>> copy), but now he has run into further problems and I believe what Paul
>>> & Dennis have written might be correct.
>>>
>>> The code that configures this is in
>>> EmbeddedCacheManagerConfigurationService:
>>>
>>> GlobalConfigurationBuilder builder = new GlobalConfigurationBuilder();
>>> ModuleLoader moduleLoader = this.dependencies.getModuleLoader();
>>> builder.serialization().classResolver(ModularClassResolver.getInstance(moduleLoader));
>>>
>>>
>>> I don't know how you'd get a ModuleLoader from within a WF deployment,
>>> but I'm sure it can be done.
>>>
>>> Tristan
>>>
>>> On 29/09/14 18:57, Paul Ferraro wrote:
>>>> You should not need to use a MuxChannel. This would only be
>>>> necessary if there are other EAP services sharing the channel.
>>>> Using a MuxChannel allows your standalone Infinispan instance to
>>>> filter these irrelevant messages. However, in JDG, there should be
>>>> no other services other than Infinispan using the channel - hence
>>>> the MuxChannel stuff is unnecessary.
>>>>
>>>> I think Dennis' earlier response was spot on. EAP/JDG configures
>>>> its cache managers using a ModularClassResolver (which includes a
>>>> module name along with the class name when marshalling). Your
>>>> standalone Infinispan instances do not use this and therefore
>>>> cannot make sense of the message body.
>>>>
>>>> Paul
>>>>
>>>> ----- Original Message -----
>>>>> From: "Kurt T Stam" <kurt.stam(a)jboss.com>
>>>>> To: "Stelios Koussouris" <stkousso(a)redhat.com>, "Radoslav Husar"
>>>>> <rhusar(a)redhat.com>
>>>>> Cc: "Galder Zamarreño" <galder(a)redhat.com>, "Paul Ferraro"
>>>>> <paul.ferraro(a)redhat.com>, "Richard Achmatowicz"
>>>>> <rachmato(a)redhat.com>, "infinispan -Dev List"
>>>>> <infinispan-dev(a)lists.jboss.org>
>>>>> Sent: Monday, September 29, 2014 11:39:59 AM
>>>>> Subject: Re: Clustering standalone Infinispan w/ WF running
>>>>> Infinispan
>>>>>
>>>>> Thanks for following up Stelios, I think Galder is traveling the
>>>>> next 2
>>>>> weeks.
>>>>>
>>>>> So - do we need fixes on both ends then so that the boot order
>>>>> does not
>>>>> matter? In which project(s) would we apply
>>>>> these changes? Or can they be applied in the end-user's code?
>>>>>
>>>>> Thx,
>>>>>
>>>>> --Kurt
>>>>>
>>>>>
>>>>>
>>>>> On 9/26/14, 11:19 AM, Stelios Koussouris wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Rado: It is both ways. ie. if I start first the JDG Server I get
>>>>>> the issue
>>>>>> on the library mode side when I start that one. If reverse the
>>>>>> order of
>>>>>> startup I get it in the JDG Server side.
>>>>>>
>>>>>> Question:
>>>>>> -----------------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> ...IMO the channel needs to be wrapped as
>>>>>> org.jboss.as.clustering.jgroups.MuxChannel before passing to
>>>>>> infinispan.
>>>>>> ...
>>>>>> -----------------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> For now this is not being done. If I wanted to do it
>>>>>> manually on the
>>>>>> library side where I can create the protocol programmatically we are
>>>>>> talking about something like this?
>>>>>>
>>>>>> ProtocolStackConfigurator configurator =
>>>>>> ConfiguratorFactory.getStackConfigurator("jgroups-udp.xml");
>>>>>> MuxChannel channel = new MuxChannel(configurator);
>>>>>> org.infinispan.remoting.transport.Transport transport = new
>>>>>> org.infinispan.remoting.transport.jgroups.JGroupsTransport(channel);
>>>>>>
>>>>>> ....
>>>>>> then replace the below
>>>>>> new
>>>>>> GlobalConfigurationBuilder().clusteredDefault().globalJmxStatistics().cacheManagerName("RDSCacheManager").allowDuplicateDomains(true).enable().transport().clusterName("UDM-CLUSTER").addProperty("configurationFile",
>>>>>>
>>>>>> "jgroups-udp.xml")
>>>>>> WITH
>>>>>> new
>>>>>> GlobalConfigurationBuilder().clusteredDefault().globalJmxStatistics().cacheManagerName("RDSCacheManager").allowDuplicateDomains(true).enable().transport(Transport).clusterName("UDM-CLUSTER")
>>>>>>
>>>>>>
>>>>>> Btw, someone mentioned that if I follow this method I need to
>>>>>> know the
>>>>>> assigned mux ids, but that is not quite clear what it means with
>>>>>> regards
>>>>>> to the JGroupsTransport configuration
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Stylianos Kousouris
>>>>>> Red Hat Middleware Consultant
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Radoslav Husar" <rhusar(a)redhat.com>
>>>>>> To: "Galder Zamarreño" <galder(a)redhat.com>, "Paul Ferraro"
>>>>>> <paul.ferraro(a)redhat.com>
>>>>>> Cc: "Richard Achmatowicz" <rachmato(a)redhat.com>, "infinispan -Dev
>>>>>> List"
>>>>>> <infinispan-dev(a)lists.jboss.org>, "Stelios Koussouris"
>>>>>> <stkousso(a)redhat.com>, "Kurt T Stam" <kurt.stam(a)jboss.com>
>>>>>> Sent: Friday, 26 September, 2014 3:47:16 PM
>>>>>> Subject: Re: Clustering standalone Infinispan w/ WF running
>>>>>> Infinispan
>>>>>>
>>>>>> From what Stelios is telling me the question is a little bit
>>>>>> other way
>>>>>> round: he is using library mode infinispan and jgroups in EAP and
>>>>>> connecting to JDG. So the question is what JDG is doing with the
>>>>>> stack,
>>>>>> not AS/WF as its infinispan/jgroups subsystem is not used.
>>>>>>
>>>>>> Unfortunately I don't have access to the JDG repo so I don't know
>>>>>> what
>>>>>> changes have been made there but if you are using the same jgroups
>>>>>> logic, IMO the channel needs to be wrapped as
>>>>>> org.jboss.as.clustering.jgroups.MuxChannel before passing to
>>>>>> infinispan.
>>>>>>
>>>>>> Rado
>>>>>>
>>>>>> On 26/09/14 15:03, Galder Zamarreño wrote:
>>>>>>> Hey Paul,
>>>>>>>
>>>>>>> In the last couple of days, a couple of people have encountered the
>>>>>>> exception in [1] when trying to cluster a standalone Infinispan
>>>>>>> app with
>>>>>>> its own JGroups configuration file with a AS/WF running
>>>>>>> Infinispan cache.
>>>>>>>
>>>>>>> From my POV, 3 possible causes:
>>>>>>>
>>>>>>> 1. Dependency mismatches between AS/WF and the standalone app.
>>>>>>> Having done
>>>>>>> some quick study of Kurt’s case, apart from micro version
>>>>>>> changes, all
>>>>>>> looks good.
>>>>>>>
>>>>>>> 2. Mismatch in the Infinispan and/or JGroups configuration file.
>>>>>>>
>>>>>>> 3. AS/WF puts something on the clustered wire that standalone
>>>>>>> Infinispan
>>>>>>> does not expect. Are you still doing multiplexing? Could you be
>>>>>>> adding
>>>>>>> extra info to the wire?
>>>>>>>
>>>>>>> With this email, I’m trying to get some clarification from you
>>>>>>> if the
>>>>>>> issue could be due to 3rd option. If it’s either of the first
>>>>>>> two, it’s a
>>>>>>> matter of digging and finding the difference, but if it’s 3rd
>>>>>>> one, it’s
>>>>>>> more problematic.
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> [1] https://gist.github.com/skoussou/92f062f2d0bd17168e01
>>>>>>> --
>>>>>>> Galder Zamarreño
>>>>>>> galder(a)redhat.com
>>>>>>> twitter.com/galderz
>>>>>>>
>>>
>>
>
9 years, 6 months
Infinispan 7.0.0.CR1 is out!
by Dan Berindei
Dear Community,
We are gearing up towards a great Infinispan 7.0.0, and we are happy to
announce our first candidate release!
Notable features and improvements in this release:
* Cross-site state transfer now handles failures (ISPN-4025)
* Easier management of Protobuf schemas (ISPN-4357)
* New uberjars-based distribution (ISPN-4728)
* The HotRod protocol and Java client now have a size() operation
(ISPN-4736)
* Cluster listeners' filters and converters can now see the old value and
metadata (ISPN-4753)
See the full announcement here:
http://goo.gl/ERslmk
Cheers
Dan
9 years, 7 months