[keycloak-dev] Support concurrent startup by more cluster nodes
Marek Posolda
mposolda at redhat.com
Tue Mar 8 03:51:28 EST 2016
On 08/03/16 09:41, Stian Thorgersen wrote:
> I actually think the chance of someone killing it during upgrade is
> relatively high. It could be that they forgot to include the bind address
> or used the wrong server config. It could be that migration takes longer
> than they expect. We shouldn't require users to manually unlock.
>
> The lock should be tied to the transaction. JPA provides pessimistic
> locks, so you can do:
>
> DatabaseLockEntity lock = em.find(DatabaseLockEntity.class, "lock",
> LockModeType.PESSIMISTIC_WRITE);
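>
> A rough, untested sketch of how that could look (runGuarded is a
> hypothetical name; assumes a resource-local EntityManager and that
> DatabaseLockEntity is mapped to a single-row table keyed by the
> String "lock"):
>
> import javax.persistence.EntityManager;
> import javax.persistence.LockModeType;
>
> void runGuarded(EntityManager em, Runnable work) {
>     em.getTransaction().begin();
>     try {
>         // Blocks (SELECT ... FOR UPDATE on most databases) until any
>         // other node holding the row lock commits or rolls back.
>         em.find(DatabaseLockEntity.class, "lock",
>                 LockModeType.PESSIMISTIC_WRITE);
>         work.run();                    // migration/import guarded here
>         em.getTransaction().commit();  // commit releases the row lock
>     } catch (RuntimeException e) {
>         em.getTransaction().rollback(); // rollback releases it too
>         throw e;
>     }
> }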
Ok, I will take a look at this possibility and whether it works reliably
with all databases. It will take some time though...
Marek
>
> That will work for all databases (except Mongo of course). If the
> process dies, the transaction will time out and it's safe to run again
> at that point because no changes would have been committed to the DB.
>
>
>
> On 8 March 2016 at 09:22, Marek Posolda <mposolda at redhat.com> wrote:
>
> On 08/03/16 06:48, Stian Thorgersen wrote:
>> What about obtaining a database lock on a table/column? That
>> would automatically be freed if the transaction dies.
> You mean something like "Give me a lock for table XY until the end of
> the transaction"? I doubt there is a universal solution for
> something like this that will reliably work with all the databases
> we need to support :/ Otherwise I guess Liquibase would already
> be using it too?
>
> Currently the lock is obtained by updating a column in the
> database, with something like "UPDATE DATABASECHANGELOGLOCK SET
> LOCKED=true WHERE ID=1".
> Note there is always a single record in this table, with ID=1.
> Something similar is done for Mongo too.
>
> The lock is released in a "finally" block if something fails. The
> only way the DB can remain locked is if someone forcefully kills
> the process (e.g. with "kill -9", in which case finally blocks are
> not called) or if the network connection between the server and the
> DB is lost. The chance of this is very low IMO, and we have an
> option to manually recover from it.
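>
> For illustration, the current flow is roughly this (a simplified
> JDBC sketch, not the actual Keycloak code; "conn" is an open
> java.sql.Connection and runMigration() is a placeholder):
>
> // Acquire: flip the LOCKED flag on the single row with ID=1, but
> // only if nobody else holds it. executeUpdate() returns 1 on
> // success and 0 when another node already holds the lock.
> boolean acquired;
> try (PreparedStatement ps = conn.prepareStatement(
>         "UPDATE DATABASECHANGELOGLOCK SET LOCKED=true WHERE ID=1 AND LOCKED=false")) {
>     acquired = ps.executeUpdate() == 1;
> }
>
> if (acquired) {
>     try {
>         runMigration(); // the work guarded by the lock
>     } finally {
>         // Runs even on failure; only "kill -9" or a lost DB
>         // connection can skip this and leave the row locked.
>         try (PreparedStatement ps = conn.prepareStatement(
>                 "UPDATE DATABASECHANGELOGLOCK SET LOCKED=false WHERE ID=1")) {
>             ps.executeUpdate();
>         }
>     }
> }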
>
> Marek
>
>>
>> -1 To having a timeout, I agree it's dangerous and could leave
>> the DB inconsistent so we shouldn't do it
>>
>> On 7 March 2016 at 21:59, Marek Posolda <mposolda at redhat.com> wrote:
>>
>> Then the record in the DB will remain locked and needs to be
>> fixed manually. It's actually the same behaviour as Liquibase.
>> The possibilities to repair from this state are:
>> - Run Keycloak with the system property
>> "-Dkeycloak.dblock.forceUnlock=true". Keycloak will then
>> release the existing lock at startup and acquire a new lock.
>> A warning is written to server.log that this property
>> should be used carefully, just to repair the DB.
>> - Manually delete the lock record from the DATABASECHANGELOGLOCK
>> table (or the "dblock" collection in Mongo), as shown below.
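>>
>> For reference, the two recovery options would look roughly like
>> this (assuming the standard server distribution; the table and
>> property names are the ones above):
>>
>> # Option 1: let Keycloak force-release the stale lock at startup
>> bin/standalone.sh -Dkeycloak.dblock.forceUnlock=true
>>
>> -- Option 2: delete the lock record manually (single row, ID=1)
>> DELETE FROM DATABASECHANGELOGLOCK WHERE ID=1;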
>>
>> The other possibility is that after a timeout, node2 will
>> assume the current lock has timed out, forcefully release the
>> existing lock, and replace it with its own lock. However, I
>> didn't do it this way as it's potentially dangerous -
>> there is some chance that 2 nodes run migration or import at
>> the same time and the DB will end up in an inconsistent state.
>> Or is that an acceptable risk?
>>
>> Marek
>>
>>
>>
>> On 07/03/16 19:50, Stian Thorgersen wrote:
>>> 900 seconds is probably ok, but what happens if the node
>>> holding the lock dies?
>>>
>>> On 7 March 2016 at 11:03, Marek Posolda <mposolda at redhat.com> wrote:
>>>
>>> Sent a PR with added support for $subject:
>>> https://github.com/keycloak/keycloak/pull/2332
>>>
>>> A few details:
>>> - Added DBLockProvider, which handles acquiring and
>>> releasing the DB lock.
>>> When the lock is acquired, cluster node2 needs to wait
>>> until node1
>>> releases the lock.
>>>
>>> - The lock is acquired at startup for migrating the
>>> model (both model-specific
>>> and generic migration), importing realms and
>>> adding the initial
>>> admin user. So these can always be done by just one node
>>> at a time.
>>>
>>> - The lock is implemented at the DB level, so it works even
>>> if the infinispan
>>> cluster is not correctly configured. For JPA, I've added an
>>> implementation which reuses the Liquibase DB locking,
>>> with a bugfix
>>> for the issue that prevented the builtin Liquibase lock from
>>> working correctly. I've added an
>>> implementation for Mongo too.
>>>
>>> - Added DBLockTest, which simulates 20 threads racing
>>> to acquire the lock
>>> concurrently. It's passing with all databases.
>>>
>>> - The default timeout for acquiring the lock is 900 seconds,
>>> and the time between
>>> lock rechecks is 2 seconds. So if node2 is not able to acquire
>>> the lock within 900
>>> seconds, it fails to start (rough sketch of the loop below).
>>> There is a possibility to
>>> change this in keycloak-server.json. Is 900 seconds too much?
>>> I was thinking about the
>>> case when there is some large realm file being imported at
>>> startup.
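>>>
>>> For illustration, the acquire-with-timeout behaviour is roughly
>>> this (a sketch, not the actual DBLockProvider code; tryAcquire()
>>> is a hypothetical helper, e.g. the "UPDATE ... WHERE
>>> LOCKED=false" trick from earlier in the thread):
>>>
>>> void waitForLock() throws InterruptedException {
>>>     long timeoutMs = 900 * 1000L; // default, configurable in
>>>                                   // keycloak-server.json
>>>     long recheckMs = 2 * 1000L;   // time between lock rechecks
>>>     long started = System.currentTimeMillis();
>>>     while (!tryAcquire()) {
>>>         if (System.currentTimeMillis() - started > timeoutMs) {
>>>             // node2 gives up and fails to start
>>>             throw new IllegalStateException(
>>>                     "Could not acquire DB lock within timeout");
>>>         }
>>>         Thread.sleep(recheckMs); // recheck every 2 seconds
>>>     }
>>> }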
>>>
>>> Marek