On Dec 8, 2016, at 3:13 AM, Gustavo Fernandes
<gustavo(a)infinispan.org> wrote:
On Wed, Dec 7, 2016 at 9:20 PM, Randall Hauch <rhauch(a)redhat.com> wrote:
Reviving this old thread, and as before I appreciate any help the Infinispan community
might provide. There definitely is interest in Debezium capturing the changes being made
to an Infinispan cluster. This isn’t as important when Infinispan is used as a cache, but
when Infinispan is used as a store then it is important for other apps/services to be able
to accurately keep up with the changes being made in the store.
> On Jul 29, 2016, at 8:47 AM, Galder Zamarreño <galder(a)redhat.com> wrote:
>
>
> --
> Galder Zamarreño
> Infinispan, Red Hat
>
>> On 11 Jul 2016, at 16:41, Randall Hauch <rhauch(a)redhat.com> wrote:
>>
>>>
>>> On Jul 11, 2016, at 3:42 AM, Adrian Nistor <anistor(a)redhat.com> wrote:
>>>
>>> Hi Randall,
>>>
>>> Infinispan supports both push and pull access models. The push model is
supported by events (and listeners), which are cluster wide and are available in both
library and remote mode (hotrod). The notification system is pretty advanced as there is a
filtering mechanism available that can use a hand coded filter / converter or one
specified in jpql (experimental atm). Getting a snapshot of the initial data is also
possible. But infinispan does not produce a transaction log to be used for determining all
changes that happened since a previous connection time, so you'll always have to get a
new full snapshot when re-connecting.
>>>
>>> So if Infinispan is the data store I would base the Debezium connector
implementation on Infinispan's event notification system. Not sure about the other use
case though.
>>>
>>
>> Thanks, Adrian, for the feedback. A couple of questions.
>>
>> You mentioned Infinispan has a pull model — is this just using the normal API to
read the entries?
>>
>> With event listeners, a single connection will receive all of the events that
occur in the cluster, correct? Is it possible (e.g., a very unfortunately timed crash) for
a change to be made to the cache without an event being produced and sent to listeners?
>
> ^ Yeah, that can happen due to the async nature of remote events. However, clients can,
upon receiving a new topology, receive the current state of the server as events; see [1]
and [2].
>
> [1]
http://infinispan.org/docs/dev/user_guide/user_guide.html#client_event_li...
> [2]
http://infinispan.org/docs/dev/user_guide/user_guide.html#client_event_li...
It is critical that any change event stream is consistent with the store; without that
consistency the change event stream is worthless. Only when the change event stream is an
accurate representation of what changed can downstream consumers use the stream to rebuild
their own perfect copy of the upstream store and to keep those copies consistent with the
upstream store.
So, given that the events are handled asynchronously, how are multiple changes to a single
entry handled in a cluster? For example, if a client sets entry <A,Foo> and a short time
later that (or another) client sets entry <A,Bar>, is it guaranteed that a client listening
to events will see <A,Foo> first and <A,Bar> some time later? Or is it possible that a
listening client might first see <A,Bar> and then <A,Foo>?
>
>> What happens if the network fails or partitions? How does cross site replication
address this?
>
> In terms of cross-site, it depends on what the client is connected to. Clients can now
fail over between sites, so they should be able to deal with events in the same way as
explained above.
>
>>
>> Has there been any thought about adding to Infinispan a write ahead log or
transaction log to each node or, better yet, for the whole cluster?
>
> Not that I'm aware of, but we've recently added a security audit log, so a
transaction log might make sense too.
Without a transaction log, Debezium would have to use a client listener with
includeCurrentState=true to obtain the state every time it reconnects. If Debezium simply
included all of this state in the event stream, the stream might contain many superfluous
events, forcing all downstream consumers to spend a lot of time processing changes that
never really happened. The only way to avoid that would be for Debezium to use an external
store to track the changes it has seen so far, so that it doesn’t include unnecessary
events in the change event stream. It’d be a shame to have to require this much
infrastructure.
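To make the external-store idea concrete, here is a minimal sketch in plain Java. It assumes each entry carries a monotonically increasing version (whether events expose versions is still an open question in this thread), and the class and method names are hypothetical, not part of Debezium or Infinispan. An in-memory map stands in for the external store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical filter that suppresses state entries already emitted before a reconnect.
final class ReconnectFilter {
    // Stand-in for the external store: last version emitted downstream, per key.
    private final Map<String, Long> lastEmittedVersion = new HashMap<>();

    // Returns true if this (key, version) is new and should go into the change stream.
    boolean shouldEmit(String key, long version) {
        Long previous = lastEmittedVersion.get(key);
        if (previous != null && previous >= version) {
            return false; // already emitted this version (or a newer one)
        }
        lastEmittedVersion.put(key, version);
        return true;
    }
}
```

Note that a filter like this would not, on its own, detect entries removed while the connector was disconnected; that would need a comparison of the stored key set against the received state.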
A transaction log would really be a great way to solve this problem. Has there been any
more thought about Infinispan using and exposing a transaction log? Or perhaps Infinispan
could record the changes in a Kafka topic directly?
(I guess if the Infinispan cache used relational database(s) as a cache store(s), then
Debezium could just capture the changes from there. That seems like a big constraint,
though.)
Thoughts?
I recently updated a proposal [1] based on several discussions we had in the past that is
essentially about introducing an event storage mechanism (write ahead log) in order to
improve reliability, failover and "replayability" for the remote listeners, any
feedback greatly appreciated.
[1]
https://github.com/infinispan/infinispan/wiki/Remote-Listeners-improvemen...
Hi, Gustavo. Thanks for the response. I like the proposal a lot, and have a few specific
comments and questions. Let me know if there is a better forum for this feedback.
It is smart to require the application using the HotRod client to know/manage the id of
the latest event it has seen. This allows an application to restart from where it left
off, but it also allows the application to replay some events if needed. For example, an
application may “fully-process” an event asynchronously from the “handle” method (e.g.,
the event handler method just puts the event into a queue and immediately returns), so
only the application knows which ids it has fully processed. If anything goes wrong, the
client is in full control over where it wants to restart.
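A minimal sketch of what I mean, in plain Java. The event shape and the listener class are hypothetical and not the HotRod client API; the only assumption taken from the proposal is that each event carries an id.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical event shape: the proposal's event id plus the changed key and value.
final class ChangeEvent {
    final long id;
    final String key;
    final String value;

    ChangeEvent(long id, String key, String value) {
        this.id = id;
        this.key = key;
        this.value = value;
    }
}

// Hypothetical application-side listener; not part of the HotRod client API.
final class CheckpointingListener {
    private final BlockingQueue<ChangeEvent> pending = new LinkedBlockingQueue<>();
    // Id of the last fully processed event; -1 means nothing processed yet.
    private final AtomicLong lastProcessedId = new AtomicLong(-1);

    // Invoked on the client's delivery thread: enqueue and return immediately.
    void handle(ChangeEvent event) {
        pending.add(event);
    }

    // Invoked by an application worker: process one event, then advance the checkpoint.
    boolean processNext() {
        ChangeEvent event = pending.poll();
        if (event == null) {
            return false; // nothing queued
        }
        // ... application-specific processing would happen here ...
        lastProcessedId.set(event.id);
        return true;
    }

    // The id the application would pass when re-registering after a failure.
    long restartFrom() {
        return lastProcessedId.get();
    }
}
```

Events that were delivered but not yet processed are not covered by the checkpoint, so after a crash they would be requested again.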
When a client first registers, it should always obtain the id of the most recent event in
the log. When using “includeState=true”, the client will first receive the state of
all entries, and then needs to start reading events from the point at which the state
transfer started (this is the only way to ensure that every change is seen at least
once).
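The ordering I have in mind can be sketched as follows. This is plain Java with an in-memory stand-in for the cache and its log; every name here is hypothetical, and it assumes event ids are simply positions in the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SnapshotThenReplay {
    // Hypothetical stand-in for a cache with an append-only event log.
    static final class LoggedCache {
        private final Map<String, String> entries = new LinkedHashMap<>();
        private final List<String[]> log = new ArrayList<>(); // index = event id; {key, value}

        void put(String key, String value) {
            entries.put(key, value);
            log.add(new String[] {key, value});
        }

        long latestEventId() {
            return log.size() - 1; // -1 when the log is empty
        }

        Map<String, String> snapshot() {
            return new LinkedHashMap<>(entries);
        }

        List<String[]> eventsAfter(long id) {
            return new ArrayList<>(log.subList((int) (id + 1), log.size()));
        }
    }

    // Record the log position first, take the snapshot, then replay from that position.
    static Map<String, String> bootstrap(LoggedCache cache, Runnable writesDuringTransfer) {
        long startId = cache.latestEventId();
        Map<String, String> replica = cache.snapshot();
        writesDuringTransfer.run(); // simulates writes the snapshot did not capture
        for (String[] event : cache.eventsAfter(startId)) {
            replica.put(event[0], event[1]); // puts are idempotent, so repeats are harmless
        }
        return replica;
    }
}
```

Because the position is recorded before the snapshot, a change may be seen twice (once in the state and once as an event) but never missed.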
It must be possible to enable this logging on an existing cache, and doing this will
likely mean the log starts out capturing only the changes made since the log was enabled.
This should be acceptable, since clients that want all entries can optionally start out
with a snapshot (e.g., “includeState=true”).
Is the log guaranteed to record changes in the same order in which they were applied to the cache?
Will the log be configured with a TTL for the events or a fixed size? TTLs are easy to
understand but require a variable amount of storage; capped storage size is easy to manage
but harder to understand.
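To illustrate the trade-off, here is a small sketch in plain Java of both trimming policies and of the check a reconnecting client would care about. The names and the log representation are hypothetical; the proposal does not specify any of this.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RetentionSketch {
    // Hypothetical log entry: event id plus the time it was written.
    static final class Entry {
        final long id;
        final long timestampMillis;

        Entry(long id, long timestampMillis) {
            this.id = id;
            this.timestampMillis = timestampMillis;
        }
    }

    // Capped size: storage is bounded, but the time window covered depends on write rate.
    static void trimToSize(Deque<Entry> log, int maxEvents) {
        while (log.size() > maxEvents) {
            log.removeFirst();
        }
    }

    // TTL: the time window is fixed, but storage grows with write rate.
    static void trimByAge(Deque<Entry> log, long nowMillis, long ttlMillis) {
        while (!log.isEmpty() && nowMillis - log.peekFirst().timestampMillis > ttlMillis) {
            log.removeFirst();
        }
    }

    // A client can resume only if the event right after its checkpoint is still retained.
    // An empty log is treated as not resumable here because the oldest id is unknown.
    static boolean canResume(Deque<Entry> log, long lastSeenId) {
        return !log.isEmpty() && log.peekFirst().id <= lastSeenId + 1;
    }
}
```

With a TTL, a client knows how long it may stay disconnected; with a cap, that window shrinks as the write rate grows, which is the part I find harder to reason about.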
Will the log store the “before” state of the entry? This increases the size of the events
and therefore the log, but it means client applications can do a lot more with the events
without storing (as much) state.
It is very useful for the HotRod client to fail over automatically when it loses its
connectivity with Infinispan. I presume this is based upon the id of the last event
successfully provided to and handled by the listener method.
Will the log include transaction boundaries, or at least a transaction id/number in each
event?
Do/will the events include the entry version numbers? Are the versions included in events
when “includeCurrentState=true” is set?
I hope this helps; let me know if you want clarification on any of these. I can’t wait to
have this feature!
Best regards,
Randall
Thanks,
Gustavo
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev