| [
Yes, I agree, but I meant to point out that, as a user, I'd expect the framework to help define which events it listens to (i.e. which tables and columns we care about) and then, from the received events, to push the updates to the object model.
Different databases do different things, which is why change data capture (CDC) is so difficult for apps and frameworks that deal with multiple DBMSes. The MySQL binlog, for example, by default records all of the values in each modified record, so that's what Debezium ships in its events. Other DBMS logs (or streaming APIs) might only expose the values that changed (MongoDB does this). The result is that the events are not homogeneous. My goal is that Debezium can eventually provide services that one can optionally chain together to make the events homogeneous. For example, one service can keep track of each record, consume the updates, and spit out events with the full state. Another service might keep track of each record and spit out a patch of the record (only what changed). BTW, another thing that is really difficult about CDC is that you can't miss an event (and you don't want to process an event twice). If your consuming process goes down and comes back up, you want to resume consuming events where you left off, even though events kept coming while you were down.
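To make the idea of a chained "homogenizing" service a bit more concrete, here is a minimal sketch in Python. It is not Debezium code: the `RecordTracker` class, the `(table, primary key)` key and the integer `offset` are all assumptions made for illustration. The tracker accepts either full rows or partial updates, emits both the full state and a patch, and ignores events at or before the last applied offset so that a redelivered event is not processed twice.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical event key: (table name, primary key value).
Key = Tuple[str, Any]


class RecordTracker:
    """Keeps the last known state of each record and normalizes events."""

    def __init__(self) -> None:
        self.state: Dict[Key, Dict[str, Any]] = {}
        self.last_offset: int = -1  # position of the last event applied

    def apply(
        self, offset: int, key: Key, values: Dict[str, Any]
    ) -> Optional[Dict[str, Any]]:
        # Skip events that were already applied, e.g. redelivered after a restart.
        if offset <= self.last_offset:
            return None
        before = self.state.get(key, {})
        # Merging works for full rows and for partial updates alike.
        after = {**before, **values}
        # The patch holds only the fields that are new or whose value changed.
        patch = {
            k: v for k, v in after.items() if k not in before or before[k] != v
        }
        self.state[key] = after
        self.last_offset = offset
        return {"key": key, "full": after, "patch": patch}
```

A real service would also have to persist `state` and `last_offset` together so that a restart resumes from the right place, and it would need to handle deletes and removed fields, which this sketch leaves out.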
The alternative is that the RDBMS listener pushes not just the delta but the full graph of needed fields, but I'm afraid that gets complex too quickly and would require our libraries.
It'd be very interesting if a Debezium service could be set up with entity definitions (e.g., from Hibernate & Co) so that, once deployed, it tracks the state of each entity, applies the individual record changes to that entity, and spits out a "patch" for the entity any time it changes. That's obviously non-trivial and requires a lot of integration, but I think the whole point is to provide functionality so that apps and libraries (like Hibernate Search) can set up Debezium to see the changes they need in near real-time, so they don't have to query the database. |
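As a rough illustration of the entity-level idea, the sketch below folds row changes from several tables into one entity and returns an entity patch. Everything in it is hypothetical: the `CUSTOMER_DEF` mapping stands in for an entity definition that a real integration would derive from Hibernate metadata, and the rows are assumed to arrive as full rows (as with the MySQL binlog).

```python
from typing import Any, Dict, Tuple

# Hypothetical entity definition:
# table -> (column holding the entity id, {column: entity field}).
CUSTOMER_DEF: Dict[str, Tuple[str, Dict[str, str]]] = {
    "customers": ("id", {"name": "name", "email": "email"}),
    "addresses": ("customer_id", {"city": "city"}),
}

_MISSING = object()  # marks a field the entity does not have yet


class EntityTracker:
    """Applies row-level changes to entities and reports what changed."""

    def __init__(self, definition: Dict[str, Tuple[str, Dict[str, str]]]) -> None:
        self.definition = definition
        self.entities: Dict[Any, Dict[str, Any]] = {}

    def on_row_change(self, table: str, row: Dict[str, Any]) -> Dict[str, Any]:
        """Apply one changed row and return the entity patch (may be empty)."""
        if table not in self.definition:
            return {}  # this table is not part of the entity
        id_column, columns = self.definition[table]
        entity = self.entities.setdefault(row[id_column], {})
        patch: Dict[str, Any] = {}
        for column, field in columns.items():
            # Record only mapped columns whose value differs from the entity.
            if column in row and entity.get(field, _MISSING) != row[column]:
                patch[field] = row[column]
        entity.update(patch)
        return patch
```

For example, after a `customers` row for id 7 has been applied, a change to the matching `addresses` row yields a patch containing only `city`, which is the kind of event a library like Hibernate Search could consume without querying the database.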