Proposal - encrypted cache
by Sebastian Laskawiec
Hey!
A while ago I stumbled upon [1]. The article talks about encrypting data
before it reaches the server, so that the server cannot decrypt it. This
makes the data more secure.
The idea is definitely not new and I have been asked about something
similar several times during local JUGs meetups (in my area there are lots
of payments organizations who might be interested in this).
Of course, this can easily be done inside an app, so that it encrypts the
data and passes a byte array to the Hot Rod client. I'm just thinking about
making it a bit easier by adding a default encryption/decryption mechanism
to the Hot Rod client.
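A rough sketch of what such a default mechanism could look like on the client side (the class name and wiring are illustrative, only javax.crypto is real): the client would run values through encrypt() before put() and decrypt() after get(), so the server only ever sees ciphertext.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical client-side codec: the Hot Rod client would call encrypt()
// before sending a value and decrypt() after receiving one.
public class ClientSideCodec {
    private static final int IV_LEN = 12;    // recommended GCM IV length
    private static final int TAG_BITS = 128; // GCM auth tag length

    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    public ClientSideCodec(SecretKey key) { this.key = key; }

    public byte[] encrypt(byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_LEN];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext);
        // prepend the IV so decrypt() can recover it
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    public byte[] decrypt(byte[] blob) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(blob, 0, IV_LEN)));
        return cipher.doFinal(blob, IV_LEN, blob.length - IV_LEN);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        ClientSideCodec codec = new ClientSideCodec(kg.generateKey());
        byte[] value = "sensitive payload".getBytes("UTF-8");
        byte[] stored = codec.encrypt(value); // what the server would see
        assert !Arrays.equals(stored, value);
        assert Arrays.equals(codec.decrypt(stored), value);
    }
}
```

The key never leaves the client, which is the whole point of the article: the server stores only opaque bytes.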
What do you think? Does it make sense?
Thanks
Sebastian
[1] https://eprint.iacr.org/2016/920.pdf
6 years, 5 months
Infinispan and change data capture
by Randall Hauch
The Debezium project [1] is working on building change data capture connectors for a variety of databases. MySQL is available now, MongoDB will be soon, and PostgreSQL and Oracle are next on our roadmap.
One way in which Debezium and Infinispan can be used together is when Infinispan is being used as a cache for data stored in a database. In this case, Debezium can capture the changes to the database and produce a stream of events; a separate process can consume these changes and evict entries from an Infinispan cache.
If Infinispan is to be used as a data store, then it would be useful for Debezium to be able to capture those changes so other apps/services can consume the changes. First of all, does this make sense? Secondly, if it does, then Debezium would need an Infinispan connector, and it’s not clear to me how that connector might capture the changes from Infinispan.
Debezium typically monitors the log of transactions/changes that are committed to a database. Of course how this works varies for each type of database. For example, MySQL internally produces a transaction log that contains information about every committed row change, and MySQL ensures that every committed change is included and that non-committed changes are excluded. The MySQL mechanism is actually part of the replication mechanism, so slaves update their internal state by reading the master’s log. The Debezium MySQL connector [2] simply reads the same log.
Infinispan has several mechanisms that may be useful:
- Interceptors - See [3]. This seems pretty straightforward and IIUC provides access to all internal operations. However, it's not clear to me whether a single interceptor will see all the changes in a cluster (perhaps in local and replicated modes) or only those changes that happen on that particular node (in distributed mode). It's also not clear whether this interceptor is called within the context of the cache's transaction, so if a failure happens at just the wrong time, whether a change might be made to the cache but not be seen by the interceptor (or vice versa).
- Cross-site replication - See [4][5]. A potential advantage of this mechanism appears to be that it is defined (more) globally, and it appears to function if the remote backup comes back online after being offline for a period of time.
- State transfer - is it possible to participate as a non-active member of the cluster, and to effectively read all state transfer activities that occur within the cluster?
- Cache store - tie into the cache store mechanism, perhaps by wrapping an existing cache store and sitting between the cache and the cache store.
- Monitor the cache store - don't monitor Infinispan at all, and instead monitor the store in which Infinispan is storing entries. (This is probably the least attractive, since some stores can't be monitored, or because the store is persisting an opaque binary value.)
Are there other mechanisms that might be used?
There are a couple of important requirements for change data capture to be able to work correctly:
Upon initial connection, the CDC connector must be able to obtain a snapshot of all existing data, followed by all changes to data that may have occurred since the snapshot was started. If the connector is stopped or fails, upon restart it needs to be able to reconnect and either see all changes that occurred since it was last capturing changes, or perform a new snapshot. (Performing a snapshot upon restart is very inefficient and undesirable.) This works as follows: the CDC connector only records the "offset" in the source's sequence of events; what this "offset" entails depends on the source. Upon restart, the connector can use this offset information to tell the source where it wants to start reading. (In MySQL and PostgreSQL, every event includes the filename of the log and the position in that file. MongoDB includes in each event the monotonically increasing timestamp of the transaction.)
No change can be missed, even when things go wrong and components crash.
When a new entry is added, the "after" state of the entry will be included in the event. When an entry is updated, the "after" state will be included in the event; if possible, the event should also include the "before" state. When an entry is removed, the "before" state should be included in the event.
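To make the first requirement concrete, here is a minimal, self-contained sketch of offset-based resume. The names (SourceLog, Connector) are illustrative, not Debezium API: the source keeps an ordered event log, the connector only remembers the offset of the last event it handled, and on restart it asks the source for everything after that offset instead of re-snapshotting.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of offset-based change capture and resume.
public class OffsetResume {
    static class SourceLog {
        final List<String> events = new ArrayList<>();
        void append(String e) { events.add(e); }
        // replay every event strictly after the given offset
        List<String> readFrom(long offset) {
            return events.subList((int) offset, events.size());
        }
    }

    static class Connector {
        long committedOffset = 0; // would be persisted across restarts
        final List<String> consumed = new ArrayList<>();
        void run(SourceLog source) {
            for (String e : source.readFrom(committedOffset)) {
                consumed.add(e);     // hand the event downstream
                committedOffset++;   // then record the new offset
            }
        }
    }

    public static void main(String[] args) {
        SourceLog log = new SourceLog();
        Connector c = new Connector();
        log.append("insert k1");
        log.append("update k1");
        c.run(log);              // first run sees both events
        log.append("delete k1"); // change happens while "stopped"
        c.run(log);              // restart: resumes at offset 2
        assert c.consumed.size() == 3; // nothing missed, no duplicates
    }
}
```

This is exactly why a mechanism like the interceptor approach is tricky: it only observes live operations, so there is no replayable log (and hence no offset) to resume from.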
Any thoughts or advice would be greatly appreciated.
Best regards,
Randall
[1] http://debezium.io
[2] http://debezium.io/docs/connectors/mysql/
[3] http://infinispan.org/docs/stable/user_guide/user_guide.html#_custom_inte...
[4] http://infinispan.org/docs/stable/user_guide/user_guide.html#CrossSiteRep...
[5] https://github.com/infinispan/infinispan/wiki/Design-For-Cross-Site-Repli...
8 years
Default cache
by Tristan Tarrant
In the discussion for [1] the subject of the default cache and the way
it affects configuration inheritance came up.
My proposal is:
- remove the default cache as a special cache altogether
- CacheManager.getCache() should return the named cache specified as
default in the configuration.
- the programmatic GlobalConfigurationBuilder/GlobalConfiguration should
have the notion of the default named cache (currently this is handled in
the parser)
- Retrieving the cache named "___defaultcache" should actually retrieve
the above named cache
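For illustration, the declarative side might then look something like this (element and attribute names are only a sketch, not final); CacheManager.getCache() would return the cache named "transactional":

```xml
<cache-container default-cache="transactional">
   <local-cache name="transactional"/>
</cache-container>
```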
Opinions?
Tristan
[1] https://github.com/infinispan/infinispan/pull/4631
--
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat
8 years
Spring module - change dependencies to Uber Jars
by Sebastian Laskawiec
Hey!
I'm currently trying to solve a tricky class loading issue connected to
Spring, CDI and Uber Jars. Here's the scenario:
- Remote Uber Jar contains CDI module
- Our Hot Rod client uses a newer version of JBoss Logging than the one
present in Wildfly/EAP modules
- However, EAP and Wildfly will load (and make available to deployments)
their own version of JBoss Logging [1]
- The easiest fix for this is to relocate the JBoss Logging package in the
Uber Jar
- The Spring module requires some classes from Infinispan Commons, and they
in turn need BasicLogger from JBoss Logging
- If we relocate JBoss Logging and then try to use the Uber Jar with
Spring, we end up with a classloading issue [2]
So it seems the best approach is to make Spring depend on Uber Jars instead
of "small ones". Of course, users who use small jars will probably be
affected by this change (they would have to either accept using Uber Jars
or exclude them in their poms and add dependencies manually).
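For small-jar users, that workaround could look roughly like this in a pom (the artifact names here are illustrative and may not match the actual module names):

```xml
<dependency>
  <groupId>org.infinispan</groupId>
  <artifactId>infinispan-spring4-embedded</artifactId>
  <version>${infinispan.version}</version>
  <exclusions>
    <!-- drop the transitively pulled Uber Jar... -->
    <exclusion>
      <groupId>org.infinispan</groupId>
      <artifactId>infinispan-embedded</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- ...and declare the small jars explicitly instead -->
```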
Is anyone against this solution? JIRA tracking ticket: [3].
Thanks
Sebastian
[1] Scenario with Weld enabled WAR
https://docs.jboss.org/author/display/AS7/Implicit+module+dependencies+fo...
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1266831#c7
[3] https://issues.jboss.org/browse/ISPN-6132
8 years
Off-Heap Storage Query
by Alan Kash
Hi,
Is there any architecture document for the off-heap storage feature? Do we
have an ETA for it?
Are we using any standard library for storage? I looked online; the
Apache Cassandra project has abstracted its off-heap functionality into
https://github.com/snazy/ohc
Thanks,
Alan
8 years, 1 month
Missing Externalizers for the Lucene Query classes
by Sanne Grinovero
Following up on today's meeting minutes.
Galder asked if Hibernate Search was going to provide externalizers
for the Lucene Query classes; let me clarify that I don't think
those belong in the Hibernate Search code base, not least because we hope
to avoid ever needing to implement them.
A soft reason to not have them in Hibernate Search is that this
project never needs to serialise any Query; this is a requirement of
Infinispan Query only, needed to implement Infinispan specific
extensions to the query engine.
That said, these are very nice extensions, so I'd not like to see them
dropped: I'd hope that Infinispan could work around the lack of proper
externalizers for the moment, as it has always been able to do so far.
A stronger reason is that this would introduce circular dependencies
between the two projects, and a big overhead of release coordination:
we had this in the past, and we are all very glad it is behind us!
Once we have IQL, it will both define a good "on the wire"
representation, which will solve the serialization problem, and limit
the number of Query types we need to support: at that point we will be
able to restrict the support for Clustered Queries (the feature that
needs to serialize queries) to those which IQL can express, and thus
serialize.
At that point we'll be able to deprecate the Clustered Query API
which accepts a user instance of the Lucene Query, and only run
clustered queries for queries expressed in IQL. Better still, we'll be
able to automatically determine whether a query is best run as a
clustered query or as a local query, removing this complexity from the
user's responsibilities.
In conclusion, we'll still be using the "Clustered Query"
functionality, but not exposing it, and by doing so we won't need any
externalizer. But for now please keep the tests running with the
existing externalizer strategies: we just need to keep it functional,
but there's no need to optimise performance of these externalizers as
we'll get rid of them.
Thanks,
Sanne
8 years, 1 month
Triangle and ISPN-3918
by Radim Vansa
Hi,
I was thinking about ISPN-3918 [1] and I've realized that while this
happens in current implementation only rarely during state transfer,
with Triangle v4 this could happen more often.
A conditional command is always executed on the primary owner, and so far,
during the execution of a conditional command (incl. replication to
backup owners), other commands on the same key were blocked in the
locking layer. Triangle v4 removes this blocking, and if in thread T1
you do:
T1: replace(key, A, B)
and in second thread T2
T2: replace(key, A, C)
T2: get(key)
the T2.replace can now fail before the (successful) T1.replace is
replicated to the backup owner. When T2 is, by chance, the backup owner,
the T2.replace completes with false, yet the T2.get will be served
locally and will still return A.
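A toy, self-contained model of that interleaving (plain maps standing in for the two owners; none of this is Infinispan code): the primary applies both replaces in order, but the backup has not yet received T1's replication when T2 reads locally.

```java
import java.util.HashMap;
import java.util.Map;

// Two-node sketch of the anomaly: replace() is decided on the primary,
// while the read is served from a backup that is lagging behind.
public class TriangleRace {
    static final Map<String, String> primary = new HashMap<>();
    static final Map<String, String> backup = new HashMap<>();

    // conditional replace, executed on the primary owner
    static boolean replaceOnPrimary(String k, String expected, String value) {
        if (expected.equals(primary.get(k))) {
            primary.put(k, value);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        primary.put("key", "A");
        backup.put("key", "A");

        boolean t1 = replaceOnPrimary("key", "A", "B"); // T1: succeeds
        // ...replication of B to the backup is still in flight...
        boolean t2 = replaceOnPrimary("key", "A", "C"); // T2: fails, value is B
        String read = backup.get("key");                // T2 is the backup: local read

        assert t1;
        assert !t2;               // T2 learned its replace failed (value != A)...
        assert read.equals("A");  // ...yet its local get still returns A
    }
}
```

The anomaly is the last two lines together: T2 has already observed that the value is no longer A, but its subsequent read contradicts that.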
We should decide if this is an issue, and either close ISPN-3918 (not a
bug) or think about triangle routing of unsuccessful commands.
Radim
[1] https://issues.jboss.org/browse/ISPN-3918
--
Radim Vansa <rvansa(a)redhat.com>
JBoss Performance Team
8 years, 1 month
Data Container configuration changes
by William Burns
I have been working on adding in off heap support for a given cache. I
wanted to check in and let you all know what I was thinking for the
configuration and changes that would come about with it.
TLDR;
New config under data container to enable off heap, StoreAsBinary removed,
Equivalence removed
First, I was planning on adding new sub-elements of data-container. These
would be instance, binary and off-heap. Only one of the three can be picked,
as they are mutually exclusive. Instance is how we operate now, where we
store the instance of the object passed to us. Binary is essentially what
we have now with storeAsBinary, with both keys and values
converted. Lastly, off-heap would store the entry as a byte[] kept
completely in native memory.
Example:
<data-container>
   <off-heap/>
</data-container>
The reason it is a sub-element instead of a property is that off-heap
will most likely require some additional configuration, e.g. to tell how
many entries to store in a bucket (think of a non-resizing HashMap).
With these changes storeAsBinary becomes redundant, so I was planning on
removing this configuration completely. Since this is 9.0, I would rather
remove it than deprecate it. As far as I know, nobody really used it before.
Another side effect is that I would be removing all of the Equivalence
classes. I am not sure I can simply remove them, since they have lived in
commons for quite a while, but it would be best if I could, although I am
fine with deprecating them. In their place, the instance setting for
data-container will always wrap byte[] to satisfy the equals and hashCode
methods.
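The byte[] wrapping could be as simple as this sketch (class name illustrative): raw byte[] keys use identity equals/hashCode, so a plain map cannot look them up by content; a thin wrapper restores value semantics without pluggable Equivalence classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Thin wrapper giving byte[] content-based equals/hashCode.
public class WrappedBytes {
    final byte[] bytes;

    WrappedBytes(byte[] bytes) { this.bytes = bytes; }

    @Override
    public boolean equals(Object o) {
        return o instanceof WrappedBytes
                && Arrays.equals(bytes, ((WrappedBytes) o).bytes);
    }

    @Override
    public int hashCode() { return Arrays.hashCode(bytes); }

    public static void main(String[] args) {
        Map<WrappedBytes, String> map = new HashMap<>();
        map.put(new WrappedBytes(new byte[]{1, 2}), "v");
        // same content, different array instance: found thanks to the wrapper
        assert map.containsKey(new WrappedBytes(new byte[]{1, 2}));
    }
}
```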
Any feedback would be appreciated.
Thanks,
- Will
8 years, 1 month
Names, names, names...
by Tristan Tarrant
Hi all,
something trivial and fun for a Monday morning.
I've just issued a PR [1] to update the codename for Infinispan 9.0.
And while we're at it, let's give a name to the new query language that
Adrian (and Emmanuel) have designed. We already have a number of
suggestions (which I summarize below) but please feel free to add your
own. Please vote.
IQL (Infinispan Query Language, already used by others).
Ickle (Alternate pronunciation of above, also means "small")
LQID (Language for Querying Infinispan Datagrids)
QuIL ("Query Infinispan" Language)
Tristan
[1] https://github.com/infinispan/infinispan/pull/4617
--
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat
8 years, 1 month