On 08/03/16 09:41, Stian Thorgersen wrote:
I actually think the chance of someone killing it during an upgrade is relatively high. It could be that they forgot to include the bind address or used the wrong server config. It could be that the migration takes longer than they expect. We shouldn't require users to manually unlock.

The lock should be done in association with the transaction. JPA provides pessimistic locks so you can do:

DatabaseLockEntity lock = em.find(DatabaseLockEntity.class, "lock", LockModeType.PESSIMISTIC_WRITE);
OK, I will look into this possibility and whether it works reliably with all databases. It will take some time, though...


That will work for all databases (except Mongo of course). If the process dies, the transaction will time out, and it's safe to run again at that point because no changes would have been committed to the DB.

On 8 March 2016 at 09:22, Marek Posolda <mposolda@redhat.com> wrote:
On 08/03/16 06:48, Stian Thorgersen wrote:
What about obtaining a database lock on a table/column? That would automatically be freed if the transaction dies.
You mean something like "Give me a lock on table XY until the end of the transaction"? I doubt there is a universal solution for something like this that will work reliably with all the databases we need to support :/ Otherwise I guess liquibase would already use it too?

Currently the lock is obtained by updating a column in the database, with something similar to "UPDATE DATABASECHANGELOGLOCK set LOCKED=true where ID=1".
Note there is always a single record in this table, with ID=1. Something similar is done for Mongo too.

The lock is released in a "finally" block if something fails. The only ways the DB can remain locked are if someone forcibly kills the process (e.g. with "kill -9", in which case finally blocks are not called) or if the network connection between the server and the DB is lost. The chance of this is very low IMO, and we have the option to manually recover from it.
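
The acquire/release pattern described above could be sketched roughly as follows. This is an in-memory illustration only: an AtomicBoolean stands in for the LOCKED column of the single row, the conditional "only if not already locked" check is an assumption, and the names LockRowSketch, tryAcquire and runExclusively are hypothetical rather than anything from Keycloak or liquibase.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the single lock row (ID=1); not Keycloak code.
class LockRowSketch {
    // Plays the role of the LOCKED column.
    private final AtomicBoolean locked = new AtomicBoolean(false);

    // Mimics a conditional update: succeeds only if the row was not locked yet
    // (assumption; the thread only shows an unconditional UPDATE).
    boolean tryAcquire() {
        return locked.compareAndSet(false, true);
    }

    void release() {
        locked.set(false);
    }

    boolean isLocked() {
        return locked.get();
    }

    // Runs the task while holding the lock; returns false if the lock was
    // already held by someone else.
    boolean runExclusively(Runnable task) {
        if (!tryAcquire()) {
            return false;
        }
        try {
            task.run();
        } finally {
            // Runs on normal completion and on exceptions, but not if the
            // process is killed (e.g. "kill -9"); then the row stays locked.
            release();
        }
        return true;
    }
}
```

If the task throws, the exception still propagates to the caller, but the row is unlocked first, which matches the "released in finally" behaviour described above.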


-1 to having a timeout. I agree it's dangerous and could leave the DB inconsistent, so we shouldn't do it.

On 7 March 2016 at 21:59, Marek Posolda <mposolda@redhat.com> wrote:
Then the record in the DB will remain locked and needs to be fixed manually. That is actually the same behaviour as liquibase. The ways to recover from this state are:
- Run keycloak with the system property "-Dkeycloak.dblock.forceUnlock=true". Keycloak will then release the existing lock at startup and acquire a new lock. A warning is written to server.log that this property should be used carefully, only to repair the DB
- Manually delete the lock record from the DATABASECHANGELOGLOCK table (or the "dblock" collection in mongo)
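
A rough sketch of how the first option might be wired at startup. Only the property name comes from this thread; the class ForceUnlockSketch, the method name and the AtomicBoolean standing in for the lock record are all hypothetical, and the real code would talk to the database instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical startup check; only the property name comes from the thread.
class ForceUnlockSketch {
    static final String FORCE_UNLOCK_PROPERTY = "keycloak.dblock.forceUnlock";

    // 'lockedColumn' stands in for the lock record in the database.
    // Returns true if an existing lock was forcibly released.
    static boolean releaseStaleLockIfRequested(AtomicBoolean lockedColumn) {
        // Boolean.getBoolean reads the named system property and is true
        // only when the property is set to "true".
        if (!Boolean.getBoolean(FORCE_UNLOCK_PROPERTY)) {
            return false;
        }
        boolean wasLocked = lockedColumn.getAndSet(false);
        if (wasLocked) {
            System.err.println("WARN: DB lock was forcibly released; use this only to repair the DB");
        }
        return wasLocked;
    }
}
```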

The other possibility is that after the timeout, node2 assumes the current lock has timed out, forcefully releases it and replaces it with its own lock. However, I didn't do it this way as it's potentially dangerous: there is some chance that 2 nodes run migration or import at the same time and the DB ends up in an inconsistent state. Or is that an acceptable risk?


On 07/03/16 19:50, Stian Thorgersen wrote:
900 seconds is probably ok, but what happens if the node holding the lock dies?

On 7 March 2016 at 11:03, Marek Posolda <mposolda@redhat.com> wrote:
Sent a PR with added support for $subject .
https://github.com/keycloak/keycloak/pull/2332 .

A few details:
- Added DBLockProvider, which handles acquiring and releasing the DB lock.
While node1 holds the lock, cluster node2 needs to wait until node1
releases it

- The lock is acquired at startup for migrating the model (both model
specific and generic migration), importing realms and adding the initial
admin user. So these can only ever be done by one node at a time.

- The lock is implemented at DB level, so it works even if the infinispan
cluster is not correctly configured. For JPA, I've added an
implementation that reuses liquibase DB locking, with a fix for the bug
that prevented the builtin liquibase lock from working correctly. I've
added an implementation for Mongo too.

- Added DBLockTest, which simulates 20 threads racing to acquire the lock
concurrently. It passes with all databases.

- The default timeout for acquiring the lock is 900 seconds and the lock
recheck interval is 2 seconds. So if node2 is not able to acquire the lock
within 900 seconds, it fails to start. This can be changed in
keycloak-server.json. Is 900 seconds too much? I was thinking about the
case when some large realm file is being imported at startup.
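
To make the wait/recheck behaviour and the racing test above more concrete, here is a minimal in-memory sketch. It is not the DBLockProvider or DBLockTest code from the PR: the class PollingLockSketch, its methods and the AtomicBoolean standing in for the DB lock record are all hypothetical, and the timeout and recheck values are constructor parameters so the example can run quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names are illustrative, not the DBLockProvider API.
class PollingLockSketch {
    // Stands in for the lock record in the database.
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final long timeoutMillis;
    private final long recheckMillis;

    PollingLockSketch(long timeoutMillis, long recheckMillis) {
        this.timeoutMillis = timeoutMillis;
        this.recheckMillis = recheckMillis;
    }

    // Retries every recheckMillis until the lock is acquired; fails once
    // timeoutMillis has elapsed (the "node fails to start" case).
    void waitForLock() {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!locked.compareAndSet(false, true)) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Could not acquire lock within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(recheckMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for lock", e);
            }
        }
    }

    void releaseLock() {
        locked.set(false);
    }

    // Races 'threads' workers for the lock and returns the highest number of
    // simultaneous holders observed; 1 means mutual exclusion held.
    static int maxConcurrentHolders(int threads, PollingLockSketch lock) {
        AtomicInteger holders = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                start.await(); // release all workers at once
                lock.waitForLock();
                try {
                    maxSeen.accumulateAndGet(holders.incrementAndGet(), Math::max);
                    Thread.sleep(1); // pretend to migrate or import
                } finally {
                    holders.decrementAndGet();
                    lock.releaseLock();
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return maxSeen.get();
    }
}
```

With the defaults mentioned above this would be constructed as new PollingLockSketch(900_000, 2_000). A call such as PollingLockSketch.maxConcurrentHolders(20, new PollingLockSketch(5_000, 1)) mirrors the 20-thread race with shorter timings.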

keycloak-dev mailing list