Infinispan and change data capture
by Randall Hauch
The Debezium project [1] is working on building change data capture connectors for a variety of databases. MySQL is available now, MongoDB will be soon, and PostgreSQL and Oracle are next on our roadmap.
One way in which Debezium and Infinispan can be used together is when Infinispan is being used as a cache for data stored in a database. In this case, Debezium can capture the changes to the database and produce a stream of events; a separate process can consume these changes and evict entries from an Infinispan cache.
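As an illustration of that consumer (this is not Debezium's actual event API — the event shape and the plain Map standing in for the cache are simplified stand-ins), the eviction loop might look roughly like:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheInvalidator {

    // Simplified stand-in for a change event; a real Debezium event also
    // carries source metadata, before/after state, timestamps, etc.
    record ChangeEvent(String key, String op) {}

    private final Map<String, String> cache;

    CacheInvalidator(Map<String, String> cache) {
        this.cache = cache;
    }

    // Called for every change event read from the stream; any database
    // write invalidates the corresponding cache entry.
    void onEvent(ChangeEvent event) {
        if (!"r".equals(event.op())) { // ignore snapshot reads
            cache.remove(event.key());
        }
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("user:42", "stale value");
        CacheInvalidator invalidator = new CacheInvalidator(cache);
        invalidator.onEvent(new ChangeEvent("user:42", "u")); // update event
        System.out.println(cache.containsKey("user:42")); // prints false
    }
}
```

In practice the stream would be consumed from Kafka and the cache would be a local or remote Infinispan cache, but the shape of the consumer is the same.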
If Infinispan is to be used as a data store, then it would be useful for Debezium to be able to capture those changes so other apps/services can consume the changes. First of all, does this make sense? Secondly, if it does, then Debezium would need an Infinispan connector, and it’s not clear to me how that connector might capture the changes from Infinispan.
Debezium typically monitors the log of transactions/changes that are committed to a database. Of course how this works varies for each type of database. For example, MySQL internally produces a transaction log that contains information about every committed row change, and MySQL ensures that every committed change is included and that non-committed changes are excluded. The MySQL mechanism is actually part of the replication mechanism, so slaves update their internal state by reading the master’s log. The Debezium MySQL connector [2] simply reads the same log.
Infinispan has several mechanisms that may be useful:
Interceptors - See [3]. This seems pretty straightforward and IIUC provides access to all internal operations. However, it’s not clear to me whether a single interceptor will see all the changes in a cluster (perhaps in local and replicated modes) or only those changes that happen on that particular node (in distributed mode). It’s also not clear whether this interceptor is called within the context of the cache’s transaction: if a failure happens at just the wrong time, could a change be made to the cache without being seen by the interceptor (or vice versa)?
Cross-site replication - See [4][5]. A potential advantage of this mechanism appears to be that it is defined (more) globally, and it appears to function if the remote backup comes back online after being offline for a period of time.
State transfer - is it possible to participate as a non-active member of the cluster, and to effectively read all state transfer activities that occur within the cluster?
Cache store - tie into the cache store mechanism, perhaps by wrapping an existing cache store and sitting between the cache and the cache store.
Monitor the cache store - don’t monitor Infinispan at all, and instead monitor the store in which Infinispan is storing entries. (This is probably the least attractive, since some stores can’t be monitored, or because the store is persisting an opaque binary value.)
Are there other mechanisms that might be used?
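To make the cache-store option above concrete, here is a minimal sketch of the wrapping idea. The `Store` interface is a stand-in of my own — Infinispan's real CacheLoader/CacheWriter SPI is much richer — but it shows how a delegating store sitting between the cache and the real store could forward every write to a change consumer:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Stand-in for a cache store SPI, purely to illustrate the wrapping idea.
interface Store {
    void write(String key, String value);
    void delete(String key);
}

// A delegating store that sits between the cache and the real store and
// forwards every write/delete to a change consumer before delegating.
class CapturingStore implements Store {
    private final Store delegate;
    private final BiConsumer<String, String> onChange; // (key, newValue or null)

    CapturingStore(Store delegate, BiConsumer<String, String> onChange) {
        this.delegate = delegate;
        this.onChange = onChange;
    }

    @Override
    public void write(String key, String value) {
        onChange.accept(key, value);
        delegate.write(key, value);
    }

    @Override
    public void delete(String key) {
        onChange.accept(key, null); // null signals removal; real events would carry "before" state
        delegate.delete(key);
    }
}
```

Note this only sees operations that reach the store on that node, which is exactly the kind of per-node visibility question raised for interceptors above.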
There are a couple of important requirements for change data capture to be able to work correctly:
Upon initial connection, the CDC connector must be able to obtain a snapshot of all existing data, followed by seeing all changes to data that may have occurred since the snapshot was started. If the connector is stopped/fails, upon restart it needs to be able to reconnect and either see all changes that occurred since it was last capturing changes, or perform a snapshot. (Performing a snapshot upon restart is very inefficient and undesirable.) This works as follows: the CDC connector only records the “offset” in the source’s sequence of events; what this “offset” entails depends on the source. Upon restart, the connector can use this offset information to coordinate with the source where it wants to start reading. (In MySQL and PostgreSQL, every event includes the filename of the log and the position in that file; MongoDB includes in each event the monotonically increasing timestamp of the transaction.)
No change can be missed, even when things go wrong and components crash.
When a new entry is added, the “after” state of the entry will be included in the event. When an entry is updated, the “after” state will be included in the event; if possible, the event should also include the “before” state. When an entry is removed, the “before” state should be included in the event.
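The offset handling in the first requirement can be sketched in plain Java (Debezium's actual offset storage goes through Kafka Connect; the in-memory log and `AtomicLong` here are stand-ins):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of offset-based resume: the connector records only the
// position of the last event it processed; on restart it asks the source
// to replay from that position instead of re-snapshotting.
class OffsetTrackingConnector {
    private final AtomicLong committedOffset = new AtomicLong(-1);

    // Deliver all events after the committed offset, committing as we go.
    List<String> poll(List<String> sourceLog) {
        List<String> delivered = new java.util.ArrayList<>();
        for (long i = committedOffset.get() + 1; i < sourceLog.size(); i++) {
            delivered.add(sourceLog.get((int) i));
            committedOffset.set(i); // a real connector persists this durably
        }
        return delivered;
    }

    long committedOffset() { return committedOffset.get(); }
}
```

A connector restarted with the persisted offset resumes exactly where it left off; only a source that can replay from an arbitrary position (like a transaction log) makes this possible.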
Any thoughts or advice would be greatly appreciated.
Best regards,
Randall
[1] http://debezium.io
[2] http://debezium.io/docs/connectors/mysql/
[3] http://infinispan.org/docs/stable/user_guide/user_guide.html#_custom_inte...
[4] http://infinispan.org/docs/stable/user_guide/user_guide.html#CrossSiteRep...
[5] https://github.com/infinispan/infinispan/wiki/Design-For-Cross-Site-Repli...
Spring module - change dependencies to Uber Jars
by Sebastian Laskawiec
Hey!
I'm currently trying to solve a tricky class loading issue connected to
Spring, CDI and Uber Jars. Here's the scenario:
- Remote Uber Jar contains CDI module
- Our Hot Rod client uses a newer version of JBoss Logging than the one
present in Wildfly/EAP modules
- However, EAP and Wildfly will load (and make available for deployment)
their own version of JBoss Logging [1]
- The easiest fix for this is to relocate the JBoss Logging package in the
Uber Jar
- The Spring module requires some classes from Infinispan Commons, and they
in turn need BasicLogger from JBoss Logging
- If we relocate JBoss Logging and then try to use the Uber Jar with
Spring, we end up with a classloading issue [2]
So it seems the best approach is to make Spring depend on Uber Jars instead
of "small ones". Of course, users who use small jars will probably be
affected by this change (they would have to either accept using Uber Jars
or exclude them in their poms and add dependencies manually).
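For reference, the package relocation mentioned above is the kind of thing typically done with the maven-shade-plugin when building the Uber Jar; a sketch (the target package name here is only an example, not the one we would actually pick):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>org.jboss.logging</pattern>
        <!-- example target package; the actual name is a project decision -->
        <shadedPattern>org.infinispan.commons.logging.jboss</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```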
Is anyone against this solution? JIRA tracking ticket: [3].
Thanks
Sebastian
[1] Scenario with Weld enabled WAR
https://docs.jboss.org/author/display/AS7/Implicit+module+dependencies+fo...
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1266831#c7
[3] https://issues.jboss.org/browse/ISPN-6132
Missing Externalizers for the Lucene Query classes
by Sanne Grinovero
Following up on today's meeting minutes.
Galder asked if Hibernate Search was going to provide externalizers
for the Lucene Query classes; let me clarify that I don't think
those belong in the Hibernate Search code base, not least because we
hope to avoid ever needing to implement them.
A soft reason to not have them in Hibernate Search is that this
project never needs to serialise any Query; this is a requirement of
Infinispan Query only, needed to implement Infinispan specific
extensions to the query engine.
These are very nice extensions, though, so I'd not like to see them
dropped: I'd hope that Infinispan can work around the lack of proper
externalizers for the moment, as it has always been able to do so far.
A stronger reason is that this would introduce circular dependencies
between the two projects, and a big overhead of release coordination:
we had this in the past, and we're all very glad it is in the past!
When we have IQL, it will define a good "on the wire"
representation, which will solve the serialization problem, and it
will also limit the number of Query types we need to support: at
that point we will be able to limit the support for Clustered
Queries (the feature that needs to serialize the queries) to those
which IQL can express, and thus serialize.
At that point we'll be able to deprecate the Clustered Query API
which accepts a user instance of the Lucene Query, and only run
clustered queries for queries expressed over IQL. Not least, we'll be
able to automatically determine whether a query is best run as a
clustered query or as a local query, removing this complexity from the
user's responsibilities.
In conclusion, we'll still be using the "Clustered Query"
functionality, but not exposing it, and by doing so we won't need any
externalizer. But for now please keep the tests running with the
existing externalizer strategies: we just need to keep it functional,
but there's no need to optimise performance of these externalizers as
we'll get rid of them.
Thanks,
Sanne
Data Container configuration changes
by William Burns
I have been working on adding off-heap support for a given cache. I
wanted to check in and let you all know what I was thinking for the
configuration and the changes that would come with it.
TLDR;
New config under data container to enable off heap, StoreAsBinary removed,
Equivalence removed
First, I was planning on adding new sub-elements of data container. These
would be instance, binary and off-heap. Only one of the three could be
picked, as they are mutually exclusive. Instance is as we operate now,
where we store the instance of the object passed to us. Binary is
essentially what we have now that is called storeAsBinary, with both keys
and values converted. Lastly, off-heap would store the entry as a byte[]
stored completely in native memory.
Example:
<data-container>
   <off-heap/>
</data-container>
The reason it is a sub-element instead of a property is that off-heap
will most likely require some additional configuration to tell how many
entries to store in a bucket (think non-resizing HashMap).
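For example, something along these lines (the attribute name and value are purely hypothetical; the point is just that a sub-element leaves room for such knobs):

```xml
<data-container>
   <!-- "bucket-entries" is a hypothetical attribute, not a decided name -->
   <off-heap bucket-entries="1024"/>
</data-container>
```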
With these changes storeAsBinary becomes redundant, so I was planning on
removing this configuration completely. I would rather remove it than
deprecate it, since this is 9.0. As far as I know nobody really used it
before.
Another side effect is that I would be removing all of the Equivalence
classes. I am not sure if I can plainly remove them since they have lived
in commons for quite a while, but it would be best if I could, although I
am fine with deprecating them. In their place, the instance setting for
data-container will always wrap byte[] to satisfy the equals and hashCode
methods.
Any feedback would be appreciated.
Thanks,
- Will
Names, names, names...
by Tristan Tarrant
Hi all,
something trivial and fun for a Monday morning.
I've just issued a PR [1] to update the codename for Infinispan 9.0.
And while we're at it, let's give a name to the new query language that
Adrian (and Emmanuel) have designed. We already have a number of
suggestions (which I summarize below) but please feel free to add your
own. Please vote.
IQL (Infinispan Query Language, already used by others).
Ickle (Alternate pronunciation of above, also means "small")
LQID (Language for Querying Infinispan Datagrids)
QuIL ("Query Infinispan" Language)
Tristan
[1] https://github.com/infinispan/infinispan/pull/4617
--
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat
Func API in tx cache
by Radim Vansa
Hi,
seems I'll have to implement the functional stuff on tx caches [1][2] if
I want to get rid of DeltaAware et al.
The general idea is quite simple - ReadOnly* commands should behave very
similarly to non-tx mode, WriteOnly* commands will just be added as
modifications to the PrepareCommand, and ReadWrite* commands will both be
added to the modifications list and sent to remote nodes, where the result
won't be stored yet.
The results of operations should not be stored in the transactional
context - the command will execute remotely (if the owners are remote)
unless the value was read by a Get* beforehand.
With repeatable-reads isolation, the situation gets more complicated. If
we use a ReadOnly* that performs an identity lookup (effectively the same
as Get*) and the entry was modified during the transaction, we can
return two different results - so read-committed semantics. With write
skew check enabled, we could at least fail the transaction at the end
(the check would be performed for reads as well if the transaction contains
functional reads), but we cannot rely on WSC always being on with RR.
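To make the anomaly concrete, here is a plain-Java sketch (not the actual Infinispan transactional context — the shared Map and `TxContext` class are stand-ins): a Get* that records its first read repeats that value, while a functional read that bypasses the local context observes a concurrent commit.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of why uncached functional reads break repeatable reads:
// reads recorded in the transactional context see a stable value,
// while reads that go back to the shared store see concurrent commits.
class TxContext {
    private final Map<String, String> store;                       // shared cache state
    private final Map<String, String> readCache = new HashMap<>(); // RR context

    TxContext(Map<String, String> store) { this.store = store; }

    // Get*: records the first value read, so later reads repeat it (RR).
    String get(String key) {
        return readCache.computeIfAbsent(key, store::get);
    }

    // Functional read executed remotely, bypassing the local context (RC).
    String functionalRead(String key) {
        return store.get(key);
    }
}
```

Mixing the two in one transaction is exactly the read-committed-under-RR behaviour described above.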
Retrieving the whole entry and applying the functional command is not a
viable solution, IMO - that would completely defeat the purpose of using a
functional command.
A possible solution would be to send the global transaction ID with
those read commands and keep a remote transactional context with the read
entries for the duration of the transaction on remote nodes, too. However,
if we do a Read* command on the primary owner, it's possible that a further
Get* command will hit a backup. So, we could go to all owners with Read*
already during the transaction (slowing down functional reads
considerably), or read only from the primary owner (which slows down Get*s
even if we don't use the functional APIs - this makes it a no-go). I am not
100% sure how a transaction transfer during ST will play into that.
We could also do it the ostrich way - "Yes, we've promised RR but Func
will be only RC". I'll probably do that in the first draft anyway.
Comments & opinions appreciated.
Radim
[1] https://issues.jboss.org/browse/ISPN-5806
[2] https://issues.jboss.org/browse/ISPN-6573
--
Radim Vansa <rvansa(a)redhat.com>
JBoss Performance Team
Some memory usage data points of the Hibernate Search / Lucene query engine.
by Sanne Grinovero
Hi all,
at our last meeting we had some chats about heap usage at runtime.
I wasn't actually investigating this, but since I was running some
tests now I thought it was interesting to share the following figures
about the Hibernate Search / Lucene query engine.
# Index size
My test is running with an off-heap index of about 5 million entries,
which translates to about 170MB of disk space.
# Query type
We're running a large query which matches every single one of the 5
million entries, and sorting the results on a Float property.
# TLAB usage
Running such a query takes 2.78 MB of TLAB; this data is never
promoted, so it's "cheap" to collect as there's not much interaction
with other threads.
# Beyond TLAB
Allocating zero bytes ;)
In terms of memory usage this looks quite good; kudos to the Lucene
team of course as they do the heavy lifting, but I'm proud of the
Hibernate Search team as well as we don't add significant overhead.
We still have some work to do, as while memory looks good, the sorting
on Floats is still very heavy... but we'll figure it out.
Looking forward to see similar metrics from get/put tests on Infinispan ;-)
N.B. this is an "off heap" index, but even if I use an "on heap"
index, the figures above are pretty much the same, and the performance
doesn't get better - it actually gets slower.
Thanks,
Sanne
Early Access builds for JDK 8u122 b02 , JDK 9 & JDK 9 with Project Jigsaw b140 are available on java.net
by Rory O'Donnell
Hi Galder,
Early Access b02 <https://jdk8.java.net/download.html> for JDK 8u122 is
available; a summary of changes is listed here.
<http://www.java.net/download/java/jdk8u122/changes/jdk8u122-b02.html>
Early Access b140 <https://jdk9.java.net/jigsaw/> (#5625) for JDK 9 with
Project Jigsaw is available on java.net; a summary of changes is listed
here.
<http://www.java.net/download/java/jigsaw/archive/140/binaries/jdk-9+140.html>
Early Access b140 <https://jdk9.java.net/download/> for JDK 9 is
available on java.net; a summary of changes is listed here
<http://www.java.net/download/java/jdk9/changes/jdk-9+140.html>.
A couple of items to point out with regard to b140:
1. A fix for "Cyclic interface initialization causes JVM crash" is
included in b140.
2. We are requesting feedback on a change that went into JDK 9 b140:
the java.io.FilePermission class was changed to remove pathname
canonicalization from its creation, along with a system property
to revert the behavior back to the way it worked in the previous
JDK release. We did this mainly as a performance enhancement, so
that there is no need to consult the file system every time a
FilePermission is created. If you use a security manager and file
permissions then you should read the details as described on the
jdk9 mailing list [1].
Feedback is requested via core-libs-dev(a)openjdk.java.net
The new GA date of JDK 9 has been updated on the JDK 9 Project page [2],
further details in Mark Reinhold's email [3].
Rgds, Rory
[1] http://mail.openjdk.java.net/pipermail/jdk9-dev/2016-October/005062.html
[2] http://openjdk.java.net/projects/jdk9/
[3]
http://mail.openjdk.java.net/pipermail/jdk9-dev/2016-October/005092.html
--
Rgds,Rory O'Donnell
Quality Engineering Manager
Oracle EMEA, Dublin,Ireland