One other point revealed by this thread (and a couple of others) is that it is quite difficult to get enough information from the logs to see accurately why a comparison is failing. All logging that outputs passwords has been removed so that passwords are not accidentally stored in the logs, but I think there are still a couple of scenarios where there is no option other than logging the values so that they can be compared manually to make sure they are as expected.
I am going to look at what options we have, for a development environment at least, to evaluate the end-to-end authentication process so that the point of failure can be pinpointed more quickly.
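
As a rough illustration of one possible middle ground (this is only a sketch, not something that exists in the code today), a development-only switch could log the length and a short digest of each value instead of the value itself. That is often enough to spot a trailing space, an encoding difference or a plainly wrong value when the two lines are compared by eye. The names `fingerprint`, `compare_with_diagnostics`, `dev_mode` and the `auth.debug` logger are all hypothetical, and an unsalted digest of a password is still sensitive, so this would only be suitable for a development environment.

```python
import hashlib
import hmac
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("auth.debug")


def fingerprint(value: str, length: int = 12) -> str:
    """Return a short SHA-256 hex prefix that can be compared by eye."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def compare_with_diagnostics(supplied: str, expected: str, dev_mode: bool = False) -> bool:
    """Compare two credential values; in dev mode, log why a mismatch occurred.

    Only lengths and digest prefixes are logged, never the raw values.
    """
    matches = hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
    if dev_mode and not matches:
        logger.debug(
            "comparison failed: supplied len=%d fp=%s, expected len=%d fp=%s",
            len(supplied), fingerprint(supplied),
            len(expected), fingerprint(expected),
        )
    return matches
```

With `dev_mode` left at its default of `False` nothing extra is logged, so the behaviour in production would be unchanged.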