I'm not convinced that the proposed changes would make a significant
difference in performance. We heavily cache things and complicated queries
are usually limited to admin operations. That being said if we can do a POC
to check the performance it may be worth it, especially with large number
of entries (we recently had someone talking about 150 million users!).
As Bill points out is has to be backwards compatible and it has to be
possible to automatically perform the changes to the database. From talking
to Hynek it seems like this may be possible. Since the User Storage SPI is
being supported in RHSSO 7.1 there's also no option for us to make changes
to this, but it doesn't seem like we'd have to.
This is something that is worth considering to investigate more at some
point (first step would be to evaluate the real benefits if any), but when
we can prioritize is another question.
On 16 December 2016 at 21:30, Bill Burke <bburke(a)redhat.com> wrote:
Not every key is a simple UUID. Specifically any table that
references
a user: UserFederatedStorageProvider tables and offline tokens. THis
is because of the user storage spi. These tables must be searchable by
user id which can either be a UUID (keycloak database) or "f:" +
componentUUID + ":" + opaque_external_identifier (External User Storage
Provider database). So breaking up a User Storage SPI USER_ID isn't
going to help as you'll still need to create an index based on these
columns. Also, there's been some talk of using this ID format for other
model types like Roles and Groups.
Also, Everything has to remain backward compatible. We will need to
support existing databases. So, can you support changing to a
completely different id format and type without screwing up existing
databases??? I don't think doing a full export/import to/from JSON is
going to work with large deployments. Will this @Nationalize Hibernate
annotation allow us to turn off the SQl Server implicit unicode
conversion and send everything via ascii by default?
Finally, I'm going to freak if I have to do a lot of refactoring to the
User Storage SPI. A solution that keeps existing user storage and SPIs
backward compatible with very little work on the backend should be made
a priority instead of completely rearchitecting something so
fundamental to our datamodel.
On 12/16/16 2:12 PM, Hynek Mlnarik wrote:
> Hi All,
>
> Apologies for a long e-mail.
>
> TLDR: We need to define format of primary keys (UUID) so that it is
possible to transform the primary and foreign keys from VARCHAR(36) columns
into database-native binary format. This is in particular important to
document in 2.5 in Storage Ids section of the new User Storage SPI [1]
>
> Long version:
>
> I have looked at current database model and while generally it looks
well, there is an interesting issue with primary / foreign keys that causes
performance degradation on both Java and - most importantly - database
side, causing even deadlocks for some databases.
>
> The issue comes from database handling of IDs. IDs are in fact UUIDs,
i.e. series of bytes that are represented by Strings in KC JPA classes. Why
this causes performance degradation is due to various representations
conversions (byte array vs String in Java) and - most importantly for
database - character set conversions. In the worst case, The conversions
occur both in JDBC driver and the database. The consequences are
demonstrated by Jira issue KEYCLOAK-3210 [2] when several simultaneous
requests lead to deadlocking the database.
>
> When JDBC driver obtains a string, it converts its representation into a
character set understood by database. Database might need to convert the
string to a character set specified for the column. This is nicely
illustrated in MSSQL which makes distinction between VARCHAR (8-bit
codepages) and NVARCHAR (UCS-2 Unicode charset). IDs are VARCHARs which is
indeed an efficient way to use strings that consist of ASCII-only
characters (though not optimal for UUID, read below). However, if Unicode
characters are to be supported, MSSQL JDBC driver sends all character
parameters as Unicode Strings [3]. Database then performs a conversion from
Unicode to 8-bit charset which generally loses some data. To account for
this loss, instead of performing an index scan that directly points to a
requested row, it returns a range where the requested record should be.
This has fatal impact on performance. For more detailed analysis of the
resulting plans, see comment in [2].
>
> Clearly, the scan by id should be fast and the format of IDs in database
matters. It should avoid conversions as much as possible. Hence the
following plan came:
>
> * In the result, all primary keys and corresponding keys have to be
represented by binary UUID data type (where supported, some databases
represent UUID as e.g. VARBINARY(16)), i.e. 16 bytes instead of 36 bytes
> * All keys in the JPA classes should be of type UUID, not String
>
> As a result, database indices get smaller (16 bytes of indexed data per
record vs 36 bytes as it is now in case of 8-bit storage of characters),
and no character conversions are in place, hence the overall performance
increases.
>
> This task is a slighty big one so it won't fit into KC 2.5 timeframe,
but we should definitely aim for 3.0.
>
> This has several preconditions:
> * The String keys in keycloak JPA classes, wherever used, are restricted
to UUID format
> * This format is documented and respected by all custom implementations,
namely User Storage implementations.
> * There exists conversion from String to native UUID for used databases
(this is certainly possible for PostgreSQL MSSQL, DB2, and MySQL, most
likely others)
>
> Similarly to JPA, Infinispan classes should be revisited and optimized
to save some bytes that might be important for cluster replication by
replacing String with UUIDs
>
> Thoughts?
>
> --Hynek
>
> [1]
https://github.com/keycloak/server_development_guide/blob/
6b82f0868c0d7a148a084a30e0d8fda192f01502/topics/user-
storage/model-interfaces.adoc
> [2]
https://issues.jboss.org/browse/KEYCLOAK-3210
> [3]
https://msdn.microsoft.com/en-us/library/ms378857(v=sql.110).aspx
>
> _______________________________________________
> keycloak-dev mailing list
> keycloak-dev(a)lists.jboss.org
>
https://lists.jboss.org/mailman/listinfo/keycloak-dev
_______________________________________________
keycloak-dev mailing list
keycloak-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/keycloak-dev