[keycloak-dev] Support concurrent startup by more cluster nodes

Tue Mar 8 03:41:47 EST 2016

I actually think the chance of someone killing the server during an upgrade
is relatively high. It could be that they forgot to include the bind address
or used the wrong server config. It could be that migration takes longer
than they expect. We shouldn't require users to manually unlock.

The lock should be done in association with the transaction. JPA provides
pessimistic locks so you can do:

DatabaseLockEntity lock = em.find(DatabaseLockEntity.class, "lock",
LockModeType.PESSIMISTIC_WRITE);

That will work for all databases (except Mongo of course). If the process
dies the transaction will time out and it's safe to run again at that point
because no changes would have been committed to the db.
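
To illustrate the difference between the two approaches, here is a toy
in-memory model. It is not Keycloak, JPA or liquibase code; the class and
method names are made up for illustration, and the "database" is just two
fields. The point is only that a committed flag survives the death of its
holder, while a lock owned by a transaction goes away when the database ends
that transaction.

```java
/** Toy in-memory stand-in for a database; hypothetical, for illustration only. */
class LockModelSketch {
    // Committed state: survives a crash of the client process.
    private boolean committedFlagLocked = false;
    // Row lock held by an open transaction: dropped when the transaction ends.
    private Object rowLockOwner = null;

    /** Flag approach: the "LOCKED=true" update is committed immediately. */
    boolean acquireFlag() {
        if (committedFlagLocked) {
            return false;
        }
        committedFlagLocked = true;
        return true;
    }

    /** Only runs if the holder is still alive to call it. */
    void releaseFlag() {
        committedFlagLocked = false;
    }

    /** Transaction approach: the lock is owned by the transaction object. */
    boolean acquireRowLock(Object tx) {
        if (rowLockOwner != null) {
            return false;
        }
        rowLockOwner = tx;
        return true;
    }

    /** Commit or rollback, including a rollback done by the database after the client died. */
    void endTransaction(Object tx) {
        if (rowLockOwner == tx) {
            rowLockOwner = null;
        }
    }
}
```

If node1 calls acquireFlag() and is killed before releaseFlag(), node2's
acquireFlag() keeps returning false until someone unlocks manually. If node1
calls acquireRowLock(tx) and is killed, the database ends tx on its own and
node2 can acquire the lock afterwards.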

On 8 March 2016 at 09:22, Marek Posolda <mposolda at redhat.com> wrote:

> On 08/03/16 06:48, Stian Thorgersen wrote:
>
> What about obtaining a database lock on a table/column? That would
> automatically be freed if the transaction dies.
>
> You mean something like "Give me lock for table XY until end of
> transaction" ? I doubt there is some universal solution for something like
> this, which will reliably work with all databases which we need to support
> :/ Otherwise I guess liquibase would already use it too?
>
> Currently the lock is obtained by updating a column in the database,
> with something similar to "UPDATE DATABASECHANGELOGLOCK set
> LOCKED=true where ID=1".
> Note there is always a single record in this table, with ID=1. Something
> similar is done for Mongo too.
>
> The lock is released in a "finally" block if something fails. The only
> ways the DB can remain locked are if someone force-kills the process
> (e.g. with "kill -9", in which case finally blocks are not called) or if
> the network connection between the server and the DB is lost. The chance
> of this is very low IMO and we have the option to recover manually.
>
> Marek
>
>
> -1 To having a timeout, I agree it's dangerous and could leave the DB
> inconsistent so we shouldn't do it
>
> On 7 March 2016 at 21:59, Marek Posolda <mposolda at redhat.com> wrote:
>
>> Then the record in DB will remain locked and needs to be fixed manually.
>> This is actually the same behaviour as liquibase. The possibilities to
>> recover from this state are:
>> - Run keycloak with system property "-Dkeycloak.dblock.forceUnlock=true".
>> Then Keycloak will release the existing lock at startup and acquire a new
>> lock. A warning is written to server.log that this property should be
>> used carefully, just to repair the DB
>> - Manually delete the lock record from the DATABASECHANGELOGLOCK table (or
>> the "dblock" collection in mongo)
>>
>> The other possibility is that after the timeout, node2 will assume the
>> current lock has timed out, forcefully release the existing lock and
>> replace it with its own lock. However, I didn't do it this way as it's
>> potentially dangerous: there is some chance that 2 nodes run migration
>> or import at the same time and the DB will end up in an inconsistent
>> state. Or is that an acceptable risk?
>>
>> Marek
>>
>>
>>
>> On 07/03/16 19:50, Stian Thorgersen wrote:
>>
>> 900 seconds is probably ok, but what happens if the node holding the lock
>> dies?
>>
>> On 7 March 2016 at 11:03, Marek Posolda <mposolda at redhat.com> wrote:
>>
>>> Sent a PR adding support for $subject:
>>> https://github.com/keycloak/keycloak/pull/2332
>>>
>>> A few details:
>>> - Added DBLockProvider, which handles acquiring and releasing the DB
>>> lock. While node1 holds the lock, cluster node2 has to wait until node1
>>> releases it
>>>
>>> - The lock is acquired at startup for migrating the model (both
>>> model-specific and generic migration), importing realms and adding the
>>> initial admin user, so these steps are only ever done by one node at a
>>> time.
>>>
>>> - The lock is implemented at DB level, so it works even if the infinispan
>>> cluster is not correctly configured. For JPA, I've added an
>>> implementation that reuses the liquibase DB locking, with a fix for the
>>> bug that prevented the built-in liquibase lock from working correctly.
>>> I've added an implementation for Mongo too.
>>>
>>> - Added DBLockTest, which simulates 20 threads racing to acquire the
>>> lock concurrently. It's passing with all databases.
>>>
>>> - The default timeout for acquiring the lock is 900 seconds and the
>>> interval for rechecking the lock is 2 seconds. So if node2 is not able to
>>> acquire the lock within 900 seconds, it fails to start. This can be
>>> changed in keycloak-server.json. Is 900 seconds too much? I was thinking
>>> about the case when some large realm file is imported at startup.
>>>
>>> Marek
>>
>>
>>
>
>
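
For reference, the acquire-with-timeout behaviour described in the original
PR message (wait up to 900 seconds, recheck every 2 seconds, fail to start
on timeout) can be sketched roughly as below. This is only an illustration,
not the DBLockProvider code from the PR: the class name, the method name and
the BooleanSupplier standing in for "try to take the DB lock once" are all
assumptions.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a poll-until-timeout lock acquisition loop. */
class AcquireWithTimeoutSketch {

    /**
     * Calls tryAcquire repeatedly until it returns true or timeoutMillis elapses.
     * Returns true if the lock was obtained, false on timeout or interruption.
     */
    static boolean waitForLock(BooleanSupplier tryAcquire, long timeoutMillis, long recheckMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            // One attempt to take the lock (assumed to be atomic on the DB side).
            if (tryAcquire.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // caller would fail startup here
            }
            try {
                // Sleep for the recheck interval, but never past the deadline.
                Thread.sleep(Math.min(recheckMillis, Math.max(1L, remainingNanos / 1_000_000L)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With the defaults mentioned in the thread this would be called as
waitForLock(attempt, 900_000, 2_000). Note that the loop only decides how
long a waiting node keeps trying; it never touches a lock held by another
node, which is the forced-release variant rejected above as dangerous.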