[Hibernate-JIRA] Created: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

Emmanuel Bernard (JIRA)

Friday, 26 August 2011

10:21 a.m.

Discussion on how to support backward / forward compatible serialization layer
------------------------------------------------------------------------------

Key: HSEARCH-880
URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880
Project: Hibernate Search
Issue Type: New Feature
Components: serialization
Reporter: Emmanuel Bernard

h1. General principles

The serialized message needs the following elements:
* index name: to redirect the flux to the appropriate backend
* serialization provider id: if not present, a cluster must make sure to use the same SerializationProvider for a given IndexManager
* protocol version: today the version is major.minor, where a major increase means incompatibility at the stream level, whereas a minor increase means compatibility but with missing features
* stream: this is the SerializationProvider-specific byte[]

bq. Do we need a serialization provider id? In other words, do we need to be able to hot-upgrade the SerializationProvider in a cluster?

h1. Exchanging messages in a heterogeneous cluster

h2. Cluster with one way communication (JMS)

In this case the master receives a message and must try to process it. It receives an index name + serial provider id, and uses the serial provider id to deserialize the message.

If message_major > node_major, the serialization provider fails.
If message_minor > node_minor, the serialization provider proceeds, but some features might not be supported and the deserialization might fail.

In the minor bump case:
* some features might not be deserialized and are simply ignored. A user is aware of the list of feature differences between each node.
* the stream might not be readable by an old version after all, due to the use of some new features => Exception

If message_major or message_minor < node_major or node_minor, we use the older protocol deserializer.

bq. could there ever be a problem where a new HSearch Engine cannot deal with an old HSearch engine's message?

h2. Cluster with two way communication (JMS)

Each time a node A needs to send a message to a node B for the first time, it sends the list of supported SerializationProvider ids and, for each, the list of Versions supported. The first SerializationProvider id is preferred and the latest versions are preferred. A version is more recent if majorA > majorB, or, with majorA = majorB, if minorA > minorB.

Node B receives the handshake message and returns the appropriate serialization provider id and version. Subsequent messages are exchanged with this accepted version between A and B.

bq. Is the JGroups clustering using multicast to send change messages, i.e. does it know which node it sends the message to to do the handshake?

bq. What happens if B goes down and back up? Does it have a "new" name that uniquely identifies it?

h1. API changes

SerializationProvider will need the following adjustments:
* a getSupportedVersions()
* a getSerializer(Version)
* a getDeserializer(Version)

bq. could it be that Serializer / Deserializer / LuceneWorksBuilder lead to the inability to support a version n-1 (by adding new methods or stuff like that)?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
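
The version rules in the one way (JMS) case can be sketched in Java roughly as follows. This is only an illustration of the rules as stated in the issue, not Hibernate Search code: `ProtocolVersion`, `ReceiveDecision` and `OneWayReceiver` are hypothetical names, and the handling of an older major combined with a newer minor (treated here as "use the older deserializer") is an assumption, since the description does not spell it out.

```java
/** Hypothetical major.minor protocol version (not an actual Hibernate Search type). */
final class ProtocolVersion implements Comparable<ProtocolVersion> {
    final int major;
    final int minor;

    ProtocolVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** A is more recent than B if majorA > majorB, or majors are equal and minorA > minorB. */
    @Override
    public int compareTo(ProtocolVersion other) {
        if (major != other.major) {
            return Integer.compare(major, other.major);
        }
        return Integer.compare(minor, other.minor);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

/** Hypothetical outcomes for a master that receives a message it cannot negotiate. */
enum ReceiveDecision {
    REJECT,                 // message_major > node_major: incompatible stream
    BEST_EFFORT,            // same major, message_minor > node_minor: may ignore features or fail
    SAME_VERSION,           // identical version
    USE_OLDER_DESERIALIZER  // message is older than the node
}

final class OneWayReceiver {
    /** Applies the rules from the issue description to one incoming message. */
    static ReceiveDecision decide(ProtocolVersion message, ProtocolVersion node) {
        if (message.major > node.major) {
            return ReceiveDecision.REJECT;
        }
        if (message.major < node.major) {
            // Assumption: an older major always selects the older deserializer,
            // whatever the minor number is.
            return ReceiveDecision.USE_OLDER_DESERIALIZER;
        }
        if (message.minor > node.minor) {
            return ReceiveDecision.BEST_EFFORT;
        }
        if (message.minor < node.minor) {
            return ReceiveDecision.USE_OLDER_DESERIALIZER;
        }
        return ReceiveDecision.SAME_VERSION;
    }
}
```

For example, a node at 1.2 would reject a 2.0 message, attempt a 1.3 message on a best-effort basis, and read a 1.1 message with the older deserializer.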

Emmanuel Bernard (JIRA)

Friday, 26 August

10:23 a.m.

New subject: [Hibernate-JIRA] Updated: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Emmanuel Bernard updated HSEARCH-880:
-------------------------------------

Fix Version/s: 4.0

Emmanuel Bernard (JIRA)

10:25 a.m.

New subject: [Hibernate-JIRA] Updated: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Emmanuel Bernard updated HSEARCH-880:
-------------------------------------

Description:

h1. General principles

The serialized message needs the following elements:
* index name: to redirect the flux to the appropriate backend
* serialization provider id: if not present, a cluster must make sure to use the same SerializationProvider for a given IndexManager
* protocol version: today the version is major.minor, where a major increase means incompatibility at the stream level, whereas a minor increase means compatibility but with missing features
* stream: this is the SerializationProvider-specific byte[]

bq. Do we need a serialization provider id? In other words, do we need to be able to hot-upgrade the SerializationProvider in a cluster?

h1. Exchanging messages in a heterogeneous cluster

h2. Cluster with one way communication (JMS)

In this case the master receives a message and must try to process it. It receives an index name + serial provider id, and uses the serial provider id to deserialize the message.

If message_major > node_major, the serialization provider fails.
If message_minor > node_minor, the serialization provider proceeds, but some features might not be supported and the deserialization might fail.

bq. this requires to send the Avro schema with each message which would be a huge loss to support message_minor > node_minor

In the minor bump case:
* some features might not be deserialized and are simply ignored. A user is aware of the list of feature differences between each node.
* the stream might not be readable by an old version after all, due to the use of some new features => Exception

If message_major or message_minor < node_major or node_minor, we use the older protocol deserializer.

bq. could there ever be a problem where a new HSearch Engine cannot deal with an old HSearch engine's message?

h2. Cluster with two way communication (JMS)

Each time a node A needs to send a message to a node B for the first time, it sends the list of supported SerializationProvider ids and, for each, the list of Versions supported. The first SerializationProvider id is preferred and the latest versions are preferred. A version is more recent if majorA > majorB, or, with majorA = majorB, if minorA > minorB.

Node B receives the handshake message and returns the appropriate serialization provider id and version. Subsequent messages are exchanged with this accepted version between A and B.

bq. Is the JGroups clustering using multicast to send change messages, i.e. does it know which node it sends the message to to do the handshake?

bq. What happens if B goes down and back up? Does it have a "new" name that uniquely identifies it?

h1. API changes

SerializationProvider will need the following adjustments:
* a getSupportedVersions()
* a getSerializer(Version)
* a getDeserializer(Version)

bq. could it be that Serializer / Deserializer / LuceneWorksBuilder lead to the inability to support a version n-1 (by adding new methods or stuff like that)?

Emmanuel Bernard (JIRA)

10:29 a.m.

New subject: [Hibernate-JIRA] Updated: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Emmanuel Bernard updated HSEARCH-880:
-------------------------------------

Description:

h1. General principles

The serialized message needs the following elements:
* index name: to redirect the flux to the appropriate backend
* serialization provider id: if not present, a cluster must make sure to use the same SerializationProvider for a given IndexManager
* protocol version: today the version is major.minor, where a major increase means incompatibility at the stream level, whereas a minor increase means compatibility but with missing features
* stream: this is the SerializationProvider-specific byte[]

bq. Do we need a serialization provider id? In other words, do we need to be able to hot-upgrade the SerializationProvider in a cluster?

h1. Exchanging messages in a heterogeneous cluster

h2. Cluster with one way communication (JMS)

In this case the master receives a message and must try to process it. It receives an index name + serial provider id, and uses the serial provider id to deserialize the message.

If message_major > node_major, the serialization provider fails.
If message_minor > node_minor, the serialization provider proceeds, but some features might not be supported and the deserialization might fail.

bq. this requires to send the Avro schema with each message which would be a huge loss to support message_minor > node_minor

In the minor bump case:
* some features might not be deserialized and are simply ignored. A user is aware of the list of feature differences between each node.
* the stream might not be readable by an old version after all, due to the use of some new features => Exception

If message_major or message_minor < node_major or node_minor, we use the older protocol deserializer.

bq. could there ever be a problem where a new HSearch Engine cannot deal with an old HSearch engine's message?

h2. Cluster with two way communication (JGroups)

Each time a node A needs to send a message to a node B for the first time, it sends the list of supported SerializationProvider ids and, for each, the list of Versions supported. The first SerializationProvider id is preferred and the latest versions are preferred. A version is more recent if majorA > majorB, or, with majorA = majorB, if minorA > minorB.

Node B receives the handshake message and returns the appropriate serialization provider id and version. Subsequent messages are exchanged with this accepted version between A and B.

bq. Is the JGroups clustering using multicast to send change messages, i.e. does it know which node it sends the message to to do the handshake?

bq. What happens if B goes down and back up? Does it have a "new" name that uniquely identifies it?

h1. API changes

SerializationProvider will need the following adjustments:
* a getSupportedVersions()
* a getSerializer(Version)
* a getDeserializer(Version)

bq. could it be that Serializer / Deserializer / LuceneWorksBuilder lead to the inability to support a version n-1 (by adding new methods or stuff like that)?

Sanne Grinovero (JIRA)

11:14 a.m.

New subject: [Hibernate-JIRA] Commented: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Sanne Grinovero commented on HSEARCH-880:
-----------------------------------------

Some first-impact questions:

h3. General principles - protocol version

Should the protocol version be a global number across serialization providers? Should it be mandatory for each serialization provider?
I guess that each provider should be free to handle it as it thinks best. I'd of course support the idea that our reference providers should have versioning, but I wouldn't impose this to other implementors.

h3. What is a _serialization provider id_ in practice?

I'd say it should be short as it's transmitted, but also, if these ids are global (scoped to the SearchFactory), then ids should be able to be assigned dynamically, which makes me think of the fully qualified class name of an implementor.
Proposal: remove it, and consider the serialization provider coupled to the IndexManager (identified by the index name already). As far as dynamic configuration goes, we'll support the option to start/stop new IndexManagers but not to reconfigure an existing one (at least not without stop+start).

bq. Do we need a serialization provider id? In other words, do we need to be able to hot-upgrade the SerializationProvider in a cluster?

Exactly, I would say no for the reasons I just mentioned.

h3. Cluster with one way communication - minor bump case

You say it's allowed to fail when a new feature is being used. Is transmitting a new feature not something that we're supposed to bump the major version for? Maybe an example could clarify. Is our switch from a Delete+Add LuceneWork to an Update LuceneWork something you would bump the major version for?
I would rather have expected an option on the sender side to send "backwards compatible messages", i.e. convert each Update to the pair of operations. So people could define a version number in their configuration, then update the software but have it still send messages the old way.

bq. If message_major or message_minor < node_major or node_minor, we use the older protocol deserializer.

In practice, how are "older" implementations loaded by the factory? I wouldn't assume a classloader really loading the older jar? Duplicating the packages under different names for each byte-format change?

bq. could there ever be a problem where a new HSearch Engine cannot deal with an old HSearch engine's message?

I think we should always be able to compensate, in theory. The problem is how to compensate for our mistakes, i.e. how should the engine deal with the fact that we might not do it in practice: even with the best effort we might fail to test unexpected message combinations.

bq. Is the JGroups clustering using multicast to send change messages ie does it know which node it sends the message to to do the handshake?

It should be unicast, being a point-to-point communication (for each backend we can't have more than one IndexWriter).

bq. What happens if B goes down and back up? Does it have a "new" name that uniquely identify it?

What do you mean by _name_? JGroups networks are identified mainly by their network address, and a string defined in the configuration named _cluster name_. So if it's the same node coming up again, it will have the same name (assuming I understood your question).
It becomes more tricky if a different node takes over the role of Master, and happens to have different protocol versions. We will receive an event when the cluster elements change, and then we should start a new handshake.

bq. could it be that Serializer / Deserializer / LuceneWorksBuilder lead to the inability to support a version n-1 (by adding of new methods or stuff like that?

I didn't understand this question. Do you mean we won't be able to support the previous major version?
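
The sender-side option suggested in the comment above ("backwards compatible messages") could look roughly like the following Java sketch. Everything here is hypothetical: `WorkType`, `SimpleWork` and `BackwardCompatibleSender` are simplified stand-ins rather than the real LuceneWork classes, and the assumption that Update first appears in protocol 1.1 is made up purely for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the kinds of index work exchanged between nodes. */
enum WorkType { ADD, DELETE, UPDATE }

/** Simplified stand-in for a LuceneWork: an operation on one document id. */
final class SimpleWork {
    final WorkType type;
    final String documentId;

    SimpleWork(WorkType type, String documentId) {
        this.type = type;
        this.documentId = documentId;
    }
}

final class BackwardCompatibleSender {
    // Assumption for the example: UPDATE is understood from protocol 1.1 onwards.
    static final int UPDATE_SINCE_MAJOR = 1;
    static final int UPDATE_SINCE_MINOR = 1;

    /**
     * Rewrites a work list so that it only uses operations known to the
     * configured target version: each UPDATE becomes DELETE followed by ADD.
     */
    static List<SimpleWork> downgrade(List<SimpleWork> works, int targetMajor, int targetMinor) {
        boolean supportsUpdate = targetMajor > UPDATE_SINCE_MAJOR
                || (targetMajor == UPDATE_SINCE_MAJOR && targetMinor >= UPDATE_SINCE_MINOR);
        List<SimpleWork> result = new ArrayList<>();
        for (SimpleWork work : works) {
            if (work.type == WorkType.UPDATE && !supportsUpdate) {
                result.add(new SimpleWork(WorkType.DELETE, work.documentId));
                result.add(new SimpleWork(WorkType.ADD, work.documentId));
            } else {
                result.add(work);
            }
        }
        return result;
    }
}
```

With a configured target of 1.0, an Update on a document would be sent as a Delete and an Add for the same id; with a target of 1.1 or later the list is sent unchanged.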

Emmanuel Bernard (JIRA)

Monday, 29 August

3:34 a.m.

New subject: [Hibernate-JIRA] Commented: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Emmanuel Bernard commented on HSEARCH-880:
------------------------------------------

h2. General principles - protocol version

{quote}
Should the protocol version be a global number across serialization providers? Should it be mandatory for each serialization provider? I guess that each provider should be free to handle it as it thinks best. I'd of course support the idea that our reference providers should have versioning, but I wouldn't impose this on other implementors.
{quote}

The version should not be a global number across all providers but specific to each provider. That's one of the reasons for the serialization provider id. However, you cannot make the version number opaque if you want to implement a handshake in the two way communication case. That is, unless you want each protocol to implement the handshake, but I don't think that would be a good idea.
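To make the point about non-opaque versions concrete, here is a minimal Java sketch of a handshake that is generic across providers: node B picks the first provider id in A's preference order that it also supports, then the most recent version both sides share. The class and method names, the {major, minor} array encoding and the provider ids used in the usage note are all assumptions for illustration. The description does not say how B chooses when the lists only partly overlap, so the "most recent common version" rule is one possible reading.

```java
import java.util.List;
import java.util.Map;

final class Handshake {

    /** Provider id and version that node B sends back to node A. */
    static final class Agreement {
        final String providerId;
        final int[] version; // {major, minor}

        Agreement(String providerId, int[] version) {
            this.providerId = providerId;
            this.version = version;
        }
    }

    /** Orders {major, minor} pairs: major first, then minor. */
    static int compare(int[] a, int[] b) {
        return a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]);
    }

    /**
     * offeredByA must iterate in A's preference order (for example a LinkedHashMap).
     * Returns null when the two nodes share no provider id and version.
     */
    static Agreement negotiate(Map<String, List<int[]>> offeredByA,
                               Map<String, List<int[]>> supportedByB) {
        for (Map.Entry<String, List<int[]>> offer : offeredByA.entrySet()) {
            List<int[]> local = supportedByB.get(offer.getKey());
            if (local == null) {
                continue; // B does not know this provider at all
            }
            int[] best = null;
            for (int[] candidate : offer.getValue()) {
                for (int[] mine : local) {
                    boolean shared = compare(candidate, mine) == 0;
                    if (shared && (best == null || compare(candidate, best) > 0)) {
                        best = candidate; // keep the most recent shared version
                    }
                }
            }
            if (best != null) {
                return new Agreement(offer.getKey(), best);
            }
        }
        return null;
    }
}
```

For example, if A offers "avro" with 1.0, 1.1 and 2.0 and B supports "avro" with 1.0 and 1.1, the agreement would be "avro" at 1.1. The negotiation only works because it can compare version numbers itself, which is the reason the version cannot be opaque to the handshake.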


Emmanuel Bernard (JIRA)

3:43 a.m.

New subject: [Hibernate-JIRA] Commented: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Emmanuel Bernard commented on HSEARCH-880:
------------------------------------------

h2. What is a serialization provider id in practice?

{quote}
I'd say it should be short as it's transmitted, but also if these ids are global (scoped to the SearchFactory) then ids should be able to be assigned dynamically, making me think about a fully qualified class name of an implementor.

Proposal: remove it, and consider the serialization provider coupled to the IndexManager (identified by the index name already). As far as dynamic configuration goes, we'll support the option to start/stop new IndexManagers but not to reconfigure an existing one (at least not without stop+start).

bq. Do we need a serialization provider id? In other words, do we need to be able to hot-upgrade the SerializationProvider in a cluster?

Exactly, I would say no for the reasons I just mentioned.
{quote}

Dynamically assigning them scares me, as the numbers might not be unique across a cluster or upon restart. It also makes debugging very hard, as these numbers would change "at random".

Also, if someone wants to change the serialization provider, they need to stop the whole cluster, making sure the master has processed all remaining messages before changing the configuration and restarting the cluster. The serialization id could prevent that.

Also, as discussed in [the previous comment|#comment-43371], the serialization id helps differentiate a version number from that of another provider.


Sanne Grinovero (JIRA)

3:57 a.m.

New subject: [Hibernate-JIRA] Commented: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Sanne Grinovero commented on HSEARCH-880:
-----------------------------------------

{quote}
Dynamically assigning them is scaring me as the numbers might not be unique across a cluster or upon restart. It also makes debugging very hard as these numbers would change "at random". Also, if someone wants to change the serialization provider, it needs to stop the whole cluster making sure the master has processed all remaining messages before changing the configuration and restarting the cluster. The serialization id could prevent that. Also as discussed in the previous comment, the serialization id helps differentiate a version number from another provider
{quote}

I think we agree on the fact that dynamic assignment is scary and hard to debug: my first paragraph was exactly to make the point that if you want such ids they would have to be dynamic, and that brings complexity. So I think we should not support more than a single SerializationProvider (per IndexManager), making it unnecessary to have such ids. People will be able to upgrade serialization versions of the same type, but switching dynamically from one technology to another doesn't seem worth the effort.


Emmanuel Bernard (JIRA)

4:16 a.m.

New subject: [Hibernate-JIRA] Commented: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Emmanuel Bernard commented on HSEARCH-880:
------------------------------------------

h2. Cluster with one way communication - minor bump case

{quote}
you say it's allowed to fail when a new feature is being used. Is transmitting a new feature not something that we're supposed to bump the major version for?
{quote}

The reason for bumping the protocol version was related to the inability to deserialize it with a previous version. Let's say Lucene introduces a new way of optimizing or a new Fieldable type. You can safely parse the message (at least with Avro) and you can process the message safely as long as the new optimization operation and the new Fieldable type are not used.

Note that your Delete+Add example is so impacting that we might want to bump the version number anyway. Let me return the question: when would you only bump the minor version? In which scenario?

{quote}
I would rather have expected to have an option on the sender side to send "backwards compatible messages", i.e. convert each Update to the couple of operations. So people could define a version number in their configuration, then update the software but have it still send messages the old way.
{quote}

That's kind of orthogonal and on a case by case basis. Update has a backward compatible mode, but not all operations do. Imagine boost did not exist and is introduced in Lucene: if the user makes use of boost, it has to fail on the other side. If the user does not, we can process the message.

{quote}
bq. If message_major or message_minor < node_major or node_minor, we use the older protocol deserializer.

In practice, how are "older" implementations loaded by the factory? I won't assume with a classloader really loading the older jar? Duplicating the packages into different names for each byte-format change?
{quote}

I have no clear idea, to be honest. It depends on the serialization provider. For Avro, we would probably have different versions of Works.avpr and the corresponding .avro files. Should the parsing code be different? Maybe in some cases but not in all. For a provider like the JavaSerializationProvider, then yes, you need different packages (assuming you don't play with read/writeObject).

{quote}
bq. could there ever be a problem where a new HSearch Engine cannot deal with an old HSearch engine's message?

I think we should always be able to compensate, in theory. The problem is how to compensate for our mistakes, i.e. how should the engine deal with the fact that we might not do it in practice: even with our best effort we might fail to test unexpected message combinations.
{quote}

Right. The idea IMO is to allow people to migrate smoothly from HSearch n to n+1 (probably not even for major bumps). We won't try to support reading messages v 1.0 when we are at 23.45 :o)

{quote}
bq. What happens if B goes down and back up? Does it have a "new" name that uniquely identify it?

What do you mean by name? JGroups networks are identified mainly by their network address, and a string defined in the configuration named cluster name. So if it's the same node coming up again, it will have the same name (assuming I understood your question). It becomes more tricky if a different node takes over the role of Master, and happens to have different protocol versions. We will receive an event when the cluster elements change, and then we should start a new handshake.
{quote}

Right, my concern was that every time a node goes back up, a new handshake has to be initiated to potentially update the protocol version.

{quote}
bq. could it be that Serializer / Deserializer / LuceneWorksBuilder lead to the inability to support a version n-1 (by adding of new methods or stuff like that?

Didn't understand this question. You mean we won't be able to support the previous major version?
{quote}

It depends. What I am saying is that if you change, remove or add a new essential method to Serializer / Deserializer / LuceneWorksBuilder, you need to implement the behavior related to this method on all protocols you aim to support. This is a legacy cost. So my question was, could it be that we might not be able to do that at times? I'm trying to think of scenarios where the model breaks.
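The minor bump behaviour argued for above (parse the message, fail only if a feature unknown to the receiver is actually used) could look roughly like the following Java sketch. The class name and the operation names in the usage note are hypothetical, and a real message would carry structured works rather than strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical receiver running an older minor version of the protocol. */
final class OlderNodeDeserializer {
    private final Set<String> knownOperations;

    OlderNodeDeserializer(String... knownOperations) {
        this.knownOperations = new HashSet<>(Arrays.asList(knownOperations));
    }

    /**
     * Returns the number of operations accepted. Throws as soon as the message
     * uses an operation this node does not know, so nothing is silently dropped.
     */
    int process(List<String> operationsInMessage) {
        for (String operation : operationsInMessage) {
            if (!knownOperations.contains(operation)) {
                throw new UnsupportedOperationException(
                        "Operation not supported by this protocol version: " + operation);
            }
        }
        return operationsInMessage.size();
    }
}
```

A node built with "add" and "delete" would accept a newer message containing only those two operations, and would throw on a message that also contains "update".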


Sanne Grinovero (JIRA)

4:34 a.m.

New subject: [Hibernate-JIRA] Commented: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Sanne Grinovero commented on HSEARCH-880:
-----------------------------------------

All clear for me on the previous question, some doubts on the last two points:

bq. Right my concern was that everytime a node goes back up, a new handshake has to be initiated to potentially update the protocol version.

I was assuming this is unavoidable, and therefore I was expecting the class-id mapping from the review of pull/133 to fit nicely as part of the information sent over during the handshake, having all nodes agree on the protocol version to use and the class-id mapping.

bq. It depends. What I am saying is that if you change, remove or add a new essential method to Serializer / Deserializer / LuceneWorksBuilder, you need to implement the behavior related to this method on all protocols you aim to support. This is a legacy cost. So my question was, could it be that we might not be able to do that at time. I'm trying to think of scenarios where the model breaks.

Right, but I thought we had chosen to maintain the Avro implementation only for now; if after some experience we realize that an alternative based on X would be much better we might have a second implementation, but we should always keep the number of maintained serializers very limited.
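One way to picture the "new handshake when the cluster changes" behaviour is a small cache of agreed versions that forgets every node no longer present after a membership change. This is only a sketch under assumptions: the class and method names are hypothetical, nodes are identified by plain strings, and the call that would feed onMembershipChange from the clustering layer is left out on purpose.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-node record of the protocol version agreed during the handshake. */
final class NegotiatedVersions {
    private final Map<String, String> agreedByNode = new HashMap<>();

    /** True when no agreement is cached, so a handshake must precede the next message. */
    boolean needsHandshake(String node) {
        return !agreedByNode.containsKey(node);
    }

    void recordAgreement(String node, String version) {
        agreedByNode.put(node, version);
    }

    /** Drops agreements for nodes that left; a node that comes back must handshake again. */
    void onMembershipChange(Set<String> currentMembers) {
        agreedByNode.keySet().retainAll(currentMembers);
    }
}
```

If node B leaves and later rejoins, the agreement recorded for B is gone after the first membership change, so the next message to B starts with a fresh handshake even if B kept the same name.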


Emmanuel Bernard (JIRA)

Friday, 4 November

5:11 a.m.

New subject: [Hibernate-JIRA] Updated: (HSEARCH-880) Discussion on how to support backward / forward compatible serialization layer

[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-880?pag... ]

Emmanuel Bernard updated HSEARCH-880:
-------------------------------------

Fix Version/s: (was: 4.0.0.Final)

Revisit for a later version

