[keycloak-user] Database problems running a clustered multi-site keycloak on MariaDB

Mon Oct 7 05:33:16 EDT 2019

Hello Alistair,

On Fri, Oct 4, 2019 at 4:20 PM Doswald Alistair <alistair.doswald at elca.ch>
wrote:

> Hello,
>
> We're running into some serious errors when running Keycloak on a
> multi-site cluster with MariaDB as our multi-master database. We have a
> setup similar to
> https://www.keycloak.org/docs/latest/server_installation/index.html#crossdc-mode,
> with keycloak 7.0.0 and MariaDB 10.1.37. Each site will write to its own
> database cluster, and we thought that MariaDB would handle the replication
> and transactions correctly.
>
> It works well, until we get the following types of errors on the database,
> and then everything crashes:
>
> 2019-10-03 14:09:46 140205469263616 [ERROR] Slave SQL: Could not execute
> Delete_rows_v1 event on table cloudtrust-int-keycloak.EVENT_ENTITY; Can't
> find record in 'EVENT_ENTITY', Error_code: 1032; handler error
> HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 883,
> Internal MariaDB error code: 1032
>

See MDEV-15405 <https://jira.mariadb.org/browse/MDEV-15405>. Could you
retry with MariaDB 10.3.5+ and check whether the issue still occurs?

If the MariaDB upgrade doesn't help, I would retry with "showSql" enabled
(start Keycloak with "-Dkeycloak.connectionsJpa.showSql=true"),
reproduce the issue, and try to isolate the SQL statement or set of SQL
statements that leads to this state. After repeating the scenario / crash
a couple of times, it may be possible to identify that set.
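
To help narrow things down once the showSql output is being captured, a
small filter along these lines could be used. This is only a sketch: it
assumes each statement is logged on a single line prefixed with
"Hibernate:" (the exact format depends on your logging setup), and the
function name statements_touching is hypothetical.

```python
import re
from typing import Iterable, List


def statements_touching(lines: Iterable[str], table: str) -> List[str]:
    """Return the SQL text of logged statements that mention `table`.

    Assumption: each statement sits on one log line after a "Hibernate:"
    prefix. Adjust `marker` if your log format differs.
    """
    marker = "Hibernate:"
    # Match the table name as a whole identifier, ignoring case.
    pattern = re.compile(r"\b" + re.escape(table) + r"\b", re.IGNORECASE)
    hits = []
    for line in lines:
        _, found, sql = line.partition(marker)
        if found and pattern.search(sql):
            hits.append(sql.strip())
    return hits
```

Running it over the server log with table set to EVENT_ENTITY, and
comparing the result with the timestamps of the MariaDB errors, may help
to spot the statements involved.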

Once that set of SQL statements is identified, the question is whether:

   - this is another MariaDB bug (hitting the same error message & error
   code) triggered by those SQL statements (and thus something to be fixed
   on the MariaDB side), or
   - this is a serialization issue of some kind (e.g. it happens sometimes
   because the SQL slave failed to ...), in which case the circumstances
   would need to be identified.

> 2019-10-03 14:09:46 140205469263616 [Warning] WSREP: RBR event 2
> Delete_rows_v1 apply warning: 120, 591931
> 2019-10-03 14:09:46 140205469263616 [Warning] WSREP: Failed to apply app
> buffer: seqno: 591931, status: 1
>          at galera/src/trx_handle.cpp:apply():351
> Retrying 4th time
> 2019-10-03 14:09:46 140205469263616 [ERROR] Slave SQL: Could not execute
> Delete_rows_v1 event on table cloudtrust-int-keycloak.EVENT_ENTITY; Can't
> find record in 'EVENT_ENTITY', Error_code: 1032; handler error
> HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 883,
> Internal MariaDB error code: 1032
> 2019-10-03 14:09:46 140205469263616 [Warning] WSREP: RBR event 2
> Delete_rows_v1 apply warning: 120, 591931
> 2019-10-03 14:09:46 140205469263616 [ERROR] WSREP: Failed to apply trx:
> source: 4f98589f-e5bd-11e9-9eb9-12b92fd5aeef version: 3 local: 0 state:
> APPLYING flags: 1 conn_id: 395 trx_id: 991166 seqnos (l: 18625, g: 591931,
> s: 591930, d: 584704, ts: 31567167461519)
> 2019-10-03 14:09:46 140205469263616 [ERROR] WSREP: Failed to apply trx
> 591931 4 times
> 2019-10-03 14:09:46 140205469263616 [ERROR] WSREP: Node consistency
> compromized, aborting...
> .....................
>
> From our analysis, it seems that a transaction was not able to be
> replayed, which caused the database to shut down to protect consistency.

Were you able to identify in which part of the code this transaction
deadlock happens, and after which actions / steps? Or is it enough to
start Keycloak with that setup, and it then happens after some time, every
time? Did you try different Keycloak / MariaDB versions?

> This seems to happen with race conditions from multiple writes. Looking
> into it, we found the following passage in
> https://galeracluster.com/library/kb/trouble/multi-master-conflicts.html:
> "When two transactions are conflicting, the later of the two
> is rolled back by the cluster. The client application registers this
> rollback as a deadlock error. Ideally, the client application should retry
> the deadlocked transaction. However, not all client applications have this
> logic built in."
>
> Does anyone else have a similar setup? If yes, have you encountered this
> problem? Is there a known resolution?
>
> Best regards,
>
> Alistair Doswald
>
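
Regarding the Galera passage quoted above about retrying the deadlocked
transaction: a generic client-side retry could look like the sketch below.
It is an illustration only, not a description of what Keycloak does.
DeadlockError is a hypothetical stand-in for whatever exception your
database driver raises for MariaDB error 1213 (ER_LOCK_DEADLOCK), and
run_with_retry is a made-up name. Note also that the log above shows a
failure while applying a replicated event, so it is not established that
client-side retries would address it.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver's deadlock error (MariaDB 1213)."""


def run_with_retry(transaction, attempts=3, delay=0.0):
    """Call `transaction()` and retry it if it is rolled back as a deadlock.

    `transaction` must perform the whole unit of work (begin, statements,
    commit) so that a retry starts from a clean state.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == attempts:
                raise  # give up and let the caller see the error
            time.sleep(delay * attempt)  # simple linear backoff
```

The retry wraps the complete transaction rather than a single statement,
since the cluster rolls back the whole conflicting transaction.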

Thank you && Regards, Jan
--
Jan iankko Lieskovsky / Keycloak / RH-SSO Team
