[JBoss JIRA] (DROOLS-5687) NullPointerException on second consecutive declared type update
by Matteo Casalino (Jira)
[ https://issues.redhat.com/browse/DROOLS-5687?page=com.atlassian.jira.plug... ]
Matteo Casalino commented on DROOLS-5687:
-----------------------------------------
Thank you for the quick fix! While testing it, we realized that it still breaks if an update is attempted on a uninitialized KieBase:
{noformat}
java.lang.NullPointerException
at org.drools.compiler.kie.builder.impl.AbstractKieModule.getPackage(AbstractKieModule.java:153)
at org.drools.compiler.kie.util.ChangeSetBuilder.build(ChangeSetBuilder.java:107)
at org.drools.compiler.kie.builder.impl.InternalKieModule.getChanges(InternalKieModule.java:132)
at org.drools.compiler.kie.builder.impl.KieContainerImpl.update(KieContainerImpl.java:241)
at org.drools.compiler.kie.builder.impl.KieContainerImpl.update(KieContainerImpl.java:237)
at org.drools.compiler.kie.builder.impl.KieContainerImpl.updateToVersion(KieContainerImpl.java:195)
at org.example.reproducer.KjarUpdateTest.update(KjarUpdateTest.java:37)
at org.example.reproducer.KjarUpdateTest.testConsecutiveUpdates(KjarUpdateTest.java:48)
{noformat}
see the latest attached reproducer [^npe-consecutives-declare-update-skip-kiebase-init.zip]
> NullPointerException on second consecutive declared type update
> ---------------------------------------------------------------
>
> Key: DROOLS-5687
> URL: https://issues.redhat.com/browse/DROOLS-5687
> Project: Drools
> Issue Type: Bug
> Components: core engine
> Affects Versions: 7.43.0.Final
> Reporter: Matteo Casalino
> Assignee: Mario Fusco
> Priority: Major
> Attachments: npe-consecutives-declare-update-skip-kiebase-init.zip, npe-consecutives-declare-update-skip-kiebase-init.zip, npe-consecutives-declare-update.zip
>
>
> When updating a {{KieContainer}} with {{updateToVersion}} with a change in a declared type two consecutive times, a {{NullPointerException}} is raised.
> Minimal example DRL triggering the issue (notice at least two declared types are needed):
> {noformat}
> declare FactType1
> x : int
> end
> declare FactType2
> y : int
> end
> {noformat}
> first update:
> {noformat}
> declare FactType1
> x : int
> z : int
> end
> declare FactType2
> y : int
> end
> {noformat}
> second update:
> {noformat}
> declare FactType1
> x : int
> z : int
> w : int
> end
> declare FactType2
> y : int
> end
> {noformat}
> The second update will raise the following exception:
> {noformat}
> java.lang.NullPointerException
> at org.drools.compiler.kie.util.ChangeSetBuilder.build(ChangeSetBuilder.java:106)
> at org.drools.compiler.kie.builder.impl.InternalKieModule.getChanges(InternalKieModule.java:132)
> at org.drools.compiler.kie.builder.impl.KieContainerImpl.update(KieContainerImpl.java:241)
> at org.drools.compiler.kie.builder.impl.KieContainerImpl.update(KieContainerImpl.java:237)
> at org.drools.compiler.kie.builder.impl.KieContainerImpl.updateToVersion(KieContainerImpl.java:195)
> at org.example.reproducer.KjarUpdateTest.update(KjarUpdateTest.java:37)
> at org.example.reproducer.KjarUpdateTest.testConsecutiveUpdates(KjarUpdateTest.java:48)
> {noformat}
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
5 years, 7 months
[JBoss JIRA] (DROOLS-5687) NullPointerException on second consecutive declared type update
by Matteo Casalino (Jira)
[ https://issues.redhat.com/browse/DROOLS-5687?page=com.atlassian.jira.plug... ]
Matteo Casalino updated DROOLS-5687:
------------------------------------
Attachment: npe-consecutives-declare-update-skip-kiebase-init.zip
> NullPointerException on second consecutive declared type update
> ---------------------------------------------------------------
>
> Key: DROOLS-5687
> URL: https://issues.redhat.com/browse/DROOLS-5687
> Project: Drools
> Issue Type: Bug
> Components: core engine
> Affects Versions: 7.43.0.Final
> Reporter: Matteo Casalino
> Assignee: Mario Fusco
> Priority: Major
> Attachments: npe-consecutives-declare-update-skip-kiebase-init.zip, npe-consecutives-declare-update-skip-kiebase-init.zip, npe-consecutives-declare-update.zip
>
>
> When updating a {{KieContainer}} with {{updateToVersion}} with a change in a declared type two consecutive times, a {{NullPointerException}} is raised.
> Minimal example DRL triggering the issue (notice at least two declared types are needed):
> {noformat}
> declare FactType1
> x : int
> end
> declare FactType2
> y : int
> end
> {noformat}
> first update:
> {noformat}
> declare FactType1
> x : int
> z : int
> end
> declare FactType2
> y : int
> end
> {noformat}
> second update:
> {noformat}
> declare FactType1
> x : int
> z : int
> w : int
> end
> declare FactType2
> y : int
> end
> {noformat}
> The second update will raise the following exception:
> {noformat}
> java.lang.NullPointerException
> at org.drools.compiler.kie.util.ChangeSetBuilder.build(ChangeSetBuilder.java:106)
> at org.drools.compiler.kie.builder.impl.InternalKieModule.getChanges(InternalKieModule.java:132)
> at org.drools.compiler.kie.builder.impl.KieContainerImpl.update(KieContainerImpl.java:241)
> at org.drools.compiler.kie.builder.impl.KieContainerImpl.update(KieContainerImpl.java:237)
> at org.drools.compiler.kie.builder.impl.KieContainerImpl.updateToVersion(KieContainerImpl.java:195)
> at org.example.reproducer.KjarUpdateTest.update(KjarUpdateTest.java:37)
> at org.example.reproducer.KjarUpdateTest.testConsecutiveUpdates(KjarUpdateTest.java:48)
> {noformat}
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
5 years, 7 months
[JBoss JIRA] (JGRP-2504) Poor throughput over high latency TCP connection when recv_buf_size is configured
by Bela Ban (Jira)
[ https://issues.redhat.com/browse/JGRP-2504?page=com.atlassian.jira.plugin... ]
Bela Ban commented on JGRP-2504:
--------------------------------
OK, so setting SO_RCVBUF works now, good to know. JGroups also sets SO_SNDBUF on sockets, but always before calling connect (client side). Also, setting SO_SNDBUF on a socket received as result of calling accept() apparently works. Besides, as discussed, there is no way of setting this on a ServerSocket in Java...
Thanks for your detailed analysis; always great to work with experts in the field!
Note to self: I should make myself familiar with the Linux networking code, all the more since I have kernel guys in my company that I can ask for advice!
Cheers,
> Poor throughput over high latency TCP connection when recv_buf_size is configured
> ---------------------------------------------------------------------------------
>
> Key: JGRP-2504
> URL: https://issues.redhat.com/browse/JGRP-2504
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 5.0.0.Final
> Reporter: Andrew Skalski
> Assignee: Bela Ban
> Priority: Minor
> Fix For: 5.1
>
> Attachments: SpeedTest.java, bla5.java, bla6.java, bla7.java, delay-ip.sh, rcvbuf.png
>
>
> I recently finished troubleshooting a unidirectional throughput bottleneck involving a JGroups application (Infinispan) communicating over a high-latency (~45 milliseconds) TCP connection.
> The root cause was JGroups improperly configuring the receive/send buffers on the listening socket. According to the tcp(7) man page:
> {code:java}
> On individual connections, the socket buffer size must be set prior to
> the listen(2) or connect(2) calls in order to have it take effect.
> {code}
> However, JGroups does not set the buffer size on the listening side until after accept().
> The result is poor throughput when sending data from client (connecting side) to server (listening side.) Because the issue is a too-small TCP receive window, throughput is ultimately latency-bound.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
5 years, 7 months
[JBoss JIRA] (JGRP-2506) Loads of error messages of type IllegalArgumentException: Invalid UUID string in org.jgroups.fromString(UUID.java:139) in error log
by Bela Ban (Jira)
[ https://issues.redhat.com/browse/JGRP-2506?page=com.atlassian.jira.plugin... ]
Bela Ban updated JGRP-2506:
---------------------------
Workaround Description:
Hi Valeska
thanks for the thorough analysis!
The root cause may be concurrent access (a thread reading data that is being changed by the coord), or a missing resource (like you mentioned).
I agree that JGroups can get into an endless loop here, and I need to change this code, regardless of the root cause.
Fix Version/s: 4.2.7
5.1
> Loads of error messages of type IllegalArgumentException: Invalid UUID string in org.jgroups.fromString(UUID.java:139) in error log
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: JGRP-2506
> URL: https://issues.redhat.com/browse/JGRP-2506
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.2.0
> Reporter: Valeska Dreßler
> Assignee: Bela Ban
> Priority: Major
> Fix For: 5.1, 4.2.7
>
>
> The customer encounters situations, when at server start or after some time one of their server starts to write very big error logs very fast, because of some jgroups issue. This can only be stopped by a server restart.
> They use AWS based on jgroups for communication
> Here are two example log messages:
> server start:
> {code:java}
> [2020-09-29 06:29:01.625 +0200] ERROR <*host name*> ES0 appserver0 [] [] org.jgroups.protocols.FILE_PING [] [] [] [] "ip-<**-**-**-**><*host name*>-startStop-1" JGRP000187: failed reading line of input stream java.lang.IllegalArgumentException: Invalid UUID string:
> at org.jgroups.util.UUID.fromString(UUID.java:139)
> at org.jgroups.protocols.Discovery.read(Discovery.java:455)
> at org.jgroups.protocols.FILE_PING.read(FILE_PING.java:269)
> at org.jgroups.protocols.FILE_PING.read(FILE_PING.java:263)
> at org.jgroups.protocols.FILE_PING.readAll(FILE_PING.java:239)
> at org.jgroups.protocols.FILE_PING.findMembers(FILE_PING.java:124)
> at org.jgroups.protocols.Discovery.invokeFindMembers(Discovery.java:217)
> at org.jgroups.protocols.Discovery.findMembers(Discovery.java:244)
> at org.jgroups.protocols.Discovery.down(Discovery.java:383)
> at org.jgroups.protocols.FILE_PING.down(FILE_PING.java:119)
> at org.jgroups.protocols.MERGE3.down(MERGE3.java:274)
> at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:375)
> at org.jgroups.protocols.FD_ALL2.down(FD_ALL2.java:214)
> at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:102)
> at org.jgroups.protocols.pbcast.NAKACK2.down(NAKACK2.java:555)
> at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:595)
> at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:350)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:72)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:41)
> at org.jgroups.protocols.pbcast.GMS.down(GMS.java:1054)
> at org.jgroups.protocols.FlowControl.down(FlowControl.java:300)
> at org.jgroups.protocols.FRAG2.down(FRAG2.java:138)
> at org.jgroups.protocols.RSVP.down(RSVP.java:103)
> at org.jgroups.protocols.pbcast.StreamingStateTransfer.down(StreamingStateTransfer.java:183)
> at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:908)
> at org.jgroups.JChannel.down(JChannel.java:627)
> at org.jgroups.JChannel._connect(JChannel.java:855)
> at org.jgroups.JChannel.connect(JChannel.java:352)
> at org.jgroups.JChannel.connect(JChannel.java:343)
> at com.intershop.beehive.messaging.internal.jgroups.JGroupsMessenger.createJGroupsObjAndReceiver(JGroupsMessenger.java:230)
> at com.intershop.beehive.messaging.internal.jgroups.JGroupsMessenger.createResourcesAndReceivers(JGroupsMessenger.java:183)
> at com.intershop.beehive.messaging.capi.AbstractMessenger.initialize(AbstractMessenger.java:154)
> at com.intershop.beehive.messaging.capi.MessengerProvider.createMessengerInstance(MessengerProvider.java:133)
> at com.intershop.beehive.messaging.capi.MessengerProvider.createMessenger(MessengerProvider.java:68)
> at com.intershop.beehive.core.internal.event.EventMgrImpl.onStartupHook(EventMgrImpl.java:168)
> at com.intershop.beehive.core.internal.environment.ServerEnvironment.onStartupHook(ServerEnvironment.java:1126)
> at com.intershop.beehive.core.internal.environment.ServerEnvironment.startup(ServerEnvironment.java:2064)
> at com.intershop.beehive.core.internal.environment.ApplicationContextListener.contextInitialized(ApplicationContextListener.java:47)
> at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4939)
> at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5434)
> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
> at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
> at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> issue happening later:
> {code:java}
> [2020-09-27 03:55:14.342 +0200] ERROR <*host name*> ES0 appserver0 [] [] org.jgroups.protocols.FILE_PING [] [] [] [] "Timer runner-2,eventChannel,ip-<**-**-**-**>-21247" JGRP000187: failed reading line of input stream java.lang.IllegalArgumentException: Invalid UUID string:
> at org.jgroups.util.UUID.fromString(UUID.java:139)
> at org.jgroups.protocols.Discovery.read(Discovery.java:455)
> at org.jgroups.protocols.FILE_PING.read(FILE_PING.java:269)
> at org.jgroups.protocols.FILE_PING.read(FILE_PING.java:263)
> at org.jgroups.protocols.FILE_PING.readAll(FILE_PING.java:239)
> at org.jgroups.protocols.FILE_PING.findMembers(FILE_PING.java:124)
> at org.jgroups.protocols.Discovery.invokeFindMembers(Discovery.java:217)
> at org.jgroups.protocols.Discovery.findMembers(Discovery.java:244)
> at org.jgroups.protocols.Discovery.down(Discovery.java:387)
> at org.jgroups.protocols.FILE_PING.down(FILE_PING.java:119)
> at org.jgroups.protocols.MERGE3$InfoSender.run(MERGE3.java:412)
> at org.jgroups.util.TimeScheduler3$Task.run(TimeScheduler3.java:328)
> at org.jgroups.util.TimeScheduler3$RecurringTask.run(TimeScheduler3.java:362)
> at org.jgroups.util.TimeScheduler3.submitToPool(TimeScheduler3.java:244)
> at org.jgroups.util.TimeScheduler3.run(TimeScheduler3.java:200)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> In the stack traces the original IPs/ host names of the servers were replaced by <**--****-**>/<*host name*>.
> In both cases the message is written very fast very often and always from the same thread.
> We have error log examples of these sizes:
> 9,1 GB (server startup) - 2453051 occurrences from [2020-09-29 06:19:01.013 +0200] until [2020-09-29 06:29:01.983 +0200] (~10 min)
> 450 MB ((the timer issue above) - 349586 occurrences from [2020-09-27 03:55:14.342 +0200] until [2020-09-27 04:10:20.213 +0200] (~15 min)
> *Steps to repeat:*
> - this happens occasionally. no steps to repeat available
> *Investigations:*
> Especially the second stack trace consists of jgroups classes only.
> With the given sources I analyzed how these messages could be created.
> message JGRP000187 is defined here: JGroups/conf/jg-messages.properties
> {code:java}
> FailedReadingLineOfInputStream = JGRP000187: failed reading line of input stream
> {code}
> Look around this line from the stack trace
> ---------------------------------------------------------------------------
> at org.jgroups.util.UUID.fromString(UUID.java:139)
> ---------------------------------------------------------------------------
> {code:java}
>
> /*136*/ public static UUID fromString(String name) {
> /*137*/ String[] components = name.split("-");
> /*138*/ if (components.length != 5)
> /*139*/ throw new IllegalArgumentException("Invalid UUID string: "+name);
> {code}
> The exception is thrown when the provided string cannot be split into 5 parts by '-".
> We see that the string name is logged appended to the message "Invalid UUID string: "
> In our error message example no value can be seen for name so it must be an empty string that was given to the method UUID.fromString(String name). An empty string obviously does not have 5 parts when split by '-'
> Where is the UUID str given to org.jgroups.util.UUID.fromString?
> Look around this line from the stack trace
> ---------------------------------------------------------------------------
> at org.jgroups.protocols.Discovery.read(Discovery.java:455)
> ---------------------------------------------------------------------------
> {code:java}
> /*437*/ protected List<PingData> read(InputStream in) {
> /*438*/ List<PingData> retval=null;
> /*439*/ try {
> /*440*/ while(true) {
> /*441*/ try {
> /*442*/ String name_str=Util.readToken(in);
> /*443*/ String uuid_str=Util.readToken(in);
> /*444*/ String addr_str=Util.readToken(in);
> /*445*/ String coord_str=Util.readToken(in);
> /*446*/ if(name_str == null || uuid_str == null || addr_str == null || coord_str == null)
> /*447*/ break;
> /*448*/
> /*449*/ UUID uuid=null;
> /*450*/ try {
> /*451*/ long tmp=Long.parseLong(uuid_str);
> /*452*/ uuid=new UUID(0, tmp);
> /*453*/ }
> /*454*/ catch(Throwable t) {
> /*455*/ uuid=UUID.fromString(uuid_str);
> /*456*/ }
> /*457*/
> /*458*/ PhysicalAddress phys_addr=new IpAddress(addr_str);
> /*459*/ boolean is_coordinator=coord_str.trim().equals("T") || coord_str.trim().equals("t");
> /*460*/
> /*461*/ if(retval == null)
> /*462*/ retval=new ArrayList<>();
> /*463*/ retval.add(new PingData(uuid, true, name_str, phys_addr).coord(is_coordinator));
> /*464*/ }
> /*465*/ catch(Throwable t) {
> /*466*/ log.error(Util.getMessage("FailedReadingLineOfInputStream"), t);
> /*467*/ }
> /*468*/ }
> /*469*/ return retval;
> /*470*/ }
> /*471*/ finally {
> /*472*/ Util.close(in);
> /*473*/ }
> /*474*/ }
> {code}
> Here from an input stream four values are read for the different servers in the jgroups cluster: name_str, uuid_str, addr_str and coord_str
> This happens in a while(true) loop.
> We can leave this loop only by a break.
> This break happens at line 447 when the condition in 446 is met:
> if(name_str == null || uuid_str == null || addr_str == null || coord_str == null)
> So if at least one of the four strings read before would be null we could leave this loop.
> We can see that the catch block simply logs the message but does not inerrupt the outer while loop.
> The string uuid_str is read into the local variable in line 443 by String uuid_str=Util.readToken(in)
> Lets have a look at Util.readToken:
> {code:java}
> /*1774*/ /** Returns whitespace-separated strings from the input stream, or null if the end of the stream has been reached */
> /*1775*/ public static String readToken(InputStream in) {
> /*1776*/ StringBuilder sb=new StringBuilder();
> /*1777*/ boolean first=true;
> /*1778*/ int ch;
> /*1779*/ while(true) {
> /*1780*/ try {
> /*1781*/ ch=in.read();
> /*1782*/ if(ch == -1)
> /*1783*/ return sb.length() > 0? sb.toString() : null;
> /*1784*/ if(Character.isWhitespace(ch)) {
> /*1785*/ if(first)
> /*1786*/ continue;
> /*1787*/ break;
> /*1788*/ }
> /*1789*/ sb.append((char)ch);
> /*1790*/ first=false;
> /*1791*/ }
> /*1792*/ catch(IOException e) {
> /*1793*/ break;
> /*1794*/ }
> /*1795*/ }
> /*1796*/ return sb.toString();
> /*1797*/ }
> {code}
> A new instance of StringBuilder is created at line 1776.
> The boolean variable first is set to true in line 1777
> Then a while(true) loop is started.
> It can only be left by a break.
> In the try block we try to read from the given input stream in.
> 1. If the read character is -1 ( end of the stream ) the while loop is left and a value is returned.
> If the string builder sb already has a length > 0 it will be converted to a string and returned. If sb is still empty we return a null.
> 2. If the character is a white space character and we are still at the beginning of the reading (first is true) we will continue until we find a none-whitespace character or the end of the stream. If first is no longer true when a white space character is read the break will bring us to the end of the loop.
> Here all read content is returned as a string converted from sb.
> 3. The character that was read is not a -1 and not a whitespace: We append it to sb and set first to false.
> 4. In case of an IOException the while loop is left, no log message is written.
> Here all read content is returned as a string converted from sb.
> The only possibility for this method to return an empty string (not null) would be: We encounter an IOException at the first reading from in or after only white space characters were read.
> Then the code on line 1796 would return an empty string.
> Now have a look at the code read of Discovery again:
> Reaching the end of the input stream in Util.readToken(in) would be the condition to end reading from the input stream and to return all found PingData as a list.
> With an empty string for every read string variable the break will never be reached and the while loop will not be left, writing an error message at the caught exception on the trial to convert the empty string in uuid_str to a valid UUID.
> So in case the provided input stream throws an IOException on every trial to read from it, we would be in an endless loop.
> We can have a look at which input stream we are in here
> Look around this line from the stack trace
> ---------------------------------------------------------------------------
> at org.jgroups.protocols.FILE_PING.readAll(FILE_PING.java:239)
> ---------------------------------------------------------------------------
> {code:java}
> /*256*/ protected void readAll(List<Address> members, String clustername, Responses responses) {
> /*257*/ File dir=new File(root_dir, clustername);
> /*253*/ if(!dir.exists())
> /*254*/ dir.mkdir();
> /*230*/
> /*231*/ File[] files=dir.listFiles(filter); // finds all files ending with '.list'
> /*232*/ for(File file: files) {
> /*233*/ List<PingData> list=null;
> /*234*/ // implementing a simple spin lock doing a few attempts to read the file
> /*235*/ // this is done since the file may be written in concurrency and may therefore not be readable
> /*236*/ for(int i=0; i < 3; i++) {
> /*237*/ if(file.exists()) {
> /*238*/ try {
> /*239*/ if((list=read(file)) != null)
> /*240*/ break;
> /*241*/ }
> /*242*/ catch(Exception e) {
> /*243*/ }
> /*244*/ }
> /*245*/ Util.sleep(50);
> /*246*/ }
> /*247*/
> /*248*/ if(list == null) {
> /*249*/ log.warn("failed reading " + file.getAbsolutePath());
> /*250*/ continue;
> /*251*/ }
> /*252*/ for(PingData data: list) {
> /*253*/ if(members == null || members.contains(data.getAddress()))
> /*254*/ responses.addResponse(data, true);
> /*255*/ if(local_addr != null && !local_addr.equals(data.getAddress()))
> /*256*/ addDiscoveryResponseToCaches(data.getAddress(), data.getLogicalName(), data.getPhysicalAddr());
> /*257*/ }
> /*258*/ }
> /*259*/ }
> {code}
> Here we read the ping data from a file in a directory that was scanned for all files with the extension .list.
> In our application server that would be one of these directories:
> [IS_SHARE]/system/jgroups/appserverSyncChannel
> [IS_SHARE]/system/jgroups/cacheSyncChannel
> [IS_SHARE]/system/jgroups/ormSyncChannel
> [IS_SHARE]/system/jgroups/tcmSyncChannel
> So my assumption is that our error occurs when somehow a file that just existed is not reachable anymore (if deleted or unreachable because of network issues) and the input stream for it gets unusable, throwing IOExceptions on every trial to read from it.
> My idea how to fix that would be to change the return value of Util readToken on line 1796 like this:
> {code:java}
> old: return sb.toString();
> new: return sb.length() > 0? sb.toString() : null;
> {code}
> Then we would not return an empty value on an IOException and could break the loop in Discovery for a better error handling.
> I already had a look at the code in the most recent version 5.0.4: All involved methods do look still the same.
> I assume we should file a bug report to the jgroups community.
> We suspect an endless loop in the jgroups code.
> The used jgroups version is 4.2.0.
> The warn log does not contain any relevant messages for jgroups at this time
>
> I can provide the original logs for analysis
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
5 years, 7 months
[JBoss JIRA] (ELY-1998) IllegalStateException: unable to create JcaTlsCrypto: DEFAULT SecureRandom not available when configuring BC FIPS on JDK 11
by Diana Vilkolakova (Jira)
[ https://issues.redhat.com/browse/ELY-1998?page=com.atlassian.jira.plugin.... ]
Diana Vilkolakova commented on ELY-1998:
----------------------------------------
I managed to avoid this exeption and configure _server-ssl-context_ by adding *bc* to the *initial-providers*:
{code:java}
module add --name=org.bouncycastle.fips --resources=/path/to/bc-fips-1.0.2.jar:/path/to/bctls-fips-1.0.10.jar
/subsystem=elytron/provider-loader=bc:add(module=org.bouncycastle.fips)
/subsystem=elytron:write-attribute(name=initial-providers,value=bc)
reload
{code}
> IllegalStateException: unable to create JcaTlsCrypto: DEFAULT SecureRandom not available when configuring BC FIPS on JDK 11
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: ELY-1998
> URL: https://issues.redhat.com/browse/ELY-1998
> Project: WildFly Elytron
> Issue Type: Bug
> Components: SSL
> Reporter: Diana Vilkolakova
> Priority: Major
>
> The below steps require ELY-1982 bugfix to work.
> Configure security providers in java.security file:
> {code}
> security.provider.1=org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider
> security.provider.2=org.bouncycastle.jsse.provider.BouncyCastleJsseProvider
> security.provider.3=SUN
> {code}
> Add the bc-fips.jar and bctls-fips-1.0.10.jar to the CLASSPATH and generate keystore in JBOSS_HOME/standalone/configuration folder:
> {code}
> keytool -genkeypair -alias appserver -keyalg RSA -keysize 2048 -keypass password -keystore "fips.keystore" -provider org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider -providerpath $CLASSPATH -storetype BCFKS -storepass password -dname "CN=testserver,OU=TESTOU,O=TESTO,L=TESTL,ST=TESTCZ,C=TESTCZ" -validity 730 -v
> {code}
> Try to configure `server-ssl-context`:
> {code}
> module add --name=org.bouncycastle.fips --resources=/path/to/bc-fips-1.0.2.jar:/path/to/bctls-fips-1.0.10.jar
> /subsystem=elytron/provider-loader=bc:add(module=org.bouncycastle.fips)
> /subsystem=elytron/key-store=fipsKS:add(path=fips.keystore, relative-to=jboss.server.config.dir, credential-reference={clear-text=password}, type="BCFKS", providers=bc)
> /subsystem=elytron/key-manager=fipsKM:add(key-store=fipsKS, algorithm="X509", credential-reference={clear-text=password}, providers=bc)
> /subsystem=elytron/server-ssl-context=fipsSSC:add(key-manager=fipsKM, protocols=["TLSv1.2"], providers=bc)
> {code}
> The last command results in:
> {code}
> {
> "outcome" => "failed",
> "failure-description" => {"WFLYCTL0080: Failed services" => {"org.wildfly.security.ssl-context.fipsSSC" => "Failed to start service
> Caused by: java.lang.IllegalStateException: unable to create JcaTlsCrypto: DEFAULT SecureRandom not available
> Caused by: java.security.NoSuchAlgorithmException: DEFAULT SecureRandom not available"}},
> "rolled-back" => true
> }
> {code}
> The exception is happening [on this line|https://github.com/Skyllarr/wildfly-elytron/blob/ELY-1982/ssl/src/ma...] . This exception can be avoided by either using *new SecureRandom()* instead of null during initialization of sslContext, or by configuring securerandom with using *CryptoServicesRegistrar.setSecureRandom(new SecureRandom());* in code beforehand (this would require bc dependency).
> I tried to configure secure random statically by setting *securerandom.strongAlgorithms=DEFAULT:BCFIPS* in java.security or by trying to pass secure random as parameter to constructor with
> {code}
> security.provider.1=org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider "C:DEFRND[SHA512];ENABLE{ALL};"
> {code}
> but neither had any effect. I did not find how to configure this statically for Java 11 in BC documentation.
> We could pass new instance of SecureRandom when initializing sslContext (if bouncycastle is used), or set secureRandom beforehand, or catch this exception and then use `new SecureRandom()`. But should we force the users to use SecureRandom set in the code by us? If users want to use Bouncycastle they should configure the secure random themselves since it is needed by the provider?
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
5 years, 7 months
[JBoss JIRA] (JGRP-2506) Loads of error messages of type IllegalArgumentException: Invalid UUID string in org.jgroups.fromString(UUID.java:139) in error log
by Valeska Dreßler (Jira)
Valeska Dreßler created JGRP-2506:
-------------------------------------
Summary: Loads of error messages of type IllegalArgumentException: Invalid UUID string in org.jgroups.fromString(UUID.java:139) in error log
Key: JGRP-2506
URL: https://issues.redhat.com/browse/JGRP-2506
Project: JGroups
Issue Type: Bug
Affects Versions: 4.2.0
Reporter: Valeska Dreßler
Assignee: Bela Ban
The customer encounters situations, when at server start or after some time one of their server starts to write very big error logs very fast, because of some jgroups issue. This can only be stopped by a server restart.
They use AWS based on jgroups for communication
Here are two example log messages:
server start:
{code:java}
[2020-09-29 06:29:01.625 +0200] ERROR <*host name*> ES0 appserver0 [] [] org.jgroups.protocols.FILE_PING [] [] [] [] "ip-<**-**-**-**><*host name*>-startStop-1" JGRP000187: failed reading line of input stream java.lang.IllegalArgumentException: Invalid UUID string:
at org.jgroups.util.UUID.fromString(UUID.java:139)
at org.jgroups.protocols.Discovery.read(Discovery.java:455)
at org.jgroups.protocols.FILE_PING.read(FILE_PING.java:269)
at org.jgroups.protocols.FILE_PING.read(FILE_PING.java:263)
at org.jgroups.protocols.FILE_PING.readAll(FILE_PING.java:239)
at org.jgroups.protocols.FILE_PING.findMembers(FILE_PING.java:124)
at org.jgroups.protocols.Discovery.invokeFindMembers(Discovery.java:217)
at org.jgroups.protocols.Discovery.findMembers(Discovery.java:244)
at org.jgroups.protocols.Discovery.down(Discovery.java:383)
at org.jgroups.protocols.FILE_PING.down(FILE_PING.java:119)
at org.jgroups.protocols.MERGE3.down(MERGE3.java:274)
at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:375)
at org.jgroups.protocols.FD_ALL2.down(FD_ALL2.java:214)
at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:102)
at org.jgroups.protocols.pbcast.NAKACK2.down(NAKACK2.java:555)
at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:595)
at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:350)
at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:72)
at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:41)
at org.jgroups.protocols.pbcast.GMS.down(GMS.java:1054)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:300)
at org.jgroups.protocols.FRAG2.down(FRAG2.java:138)
at org.jgroups.protocols.RSVP.down(RSVP.java:103)
at org.jgroups.protocols.pbcast.StreamingStateTransfer.down(StreamingStateTransfer.java:183)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:908)
at org.jgroups.JChannel.down(JChannel.java:627)
at org.jgroups.JChannel._connect(JChannel.java:855)
at org.jgroups.JChannel.connect(JChannel.java:352)
at org.jgroups.JChannel.connect(JChannel.java:343)
at com.intershop.beehive.messaging.internal.jgroups.JGroupsMessenger.createJGroupsObjAndReceiver(JGroupsMessenger.java:230)
at com.intershop.beehive.messaging.internal.jgroups.JGroupsMessenger.createResourcesAndReceivers(JGroupsMessenger.java:183)
at com.intershop.beehive.messaging.capi.AbstractMessenger.initialize(AbstractMessenger.java:154)
at com.intershop.beehive.messaging.capi.MessengerProvider.createMessengerInstance(MessengerProvider.java:133)
at com.intershop.beehive.messaging.capi.MessengerProvider.createMessenger(MessengerProvider.java:68)
at com.intershop.beehive.core.internal.event.EventMgrImpl.onStartupHook(EventMgrImpl.java:168)
at com.intershop.beehive.core.internal.environment.ServerEnvironment.onStartupHook(ServerEnvironment.java:1126)
at com.intershop.beehive.core.internal.environment.ServerEnvironment.startup(ServerEnvironment.java:2064)
at com.intershop.beehive.core.internal.environment.ApplicationContextListener.contextInitialized(ApplicationContextListener.java:47)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4939)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5434)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
issue happening later:
{code:java}
[2020-09-27 03:55:14.342 +0200] ERROR <*host name*> ES0 appserver0 [] [] org.jgroups.protocols.FILE_PING [] [] [] [] "Timer runner-2,eventChannel,ip-<**-**-**-**>-21247" JGRP000187: failed reading line of input stream java.lang.IllegalArgumentException: Invalid UUID string:
at org.jgroups.util.UUID.fromString(UUID.java:139)
at org.jgroups.protocols.Discovery.read(Discovery.java:455)
at org.jgroups.protocols.FILE_PING.read(FILE_PING.java:269)
at org.jgroups.protocols.FILE_PING.read(FILE_PING.java:263)
at org.jgroups.protocols.FILE_PING.readAll(FILE_PING.java:239)
at org.jgroups.protocols.FILE_PING.findMembers(FILE_PING.java:124)
at org.jgroups.protocols.Discovery.invokeFindMembers(Discovery.java:217)
at org.jgroups.protocols.Discovery.findMembers(Discovery.java:244)
at org.jgroups.protocols.Discovery.down(Discovery.java:387)
at org.jgroups.protocols.FILE_PING.down(FILE_PING.java:119)
at org.jgroups.protocols.MERGE3$InfoSender.run(MERGE3.java:412)
at org.jgroups.util.TimeScheduler3$Task.run(TimeScheduler3.java:328)
at org.jgroups.util.TimeScheduler3$RecurringTask.run(TimeScheduler3.java:362)
at org.jgroups.util.TimeScheduler3.submitToPool(TimeScheduler3.java:244)
at org.jgroups.util.TimeScheduler3.run(TimeScheduler3.java:200)
at java.lang.Thread.run(Thread.java:748)
{code}
In the stack traces the original IPs/ host names of the servers were replaced by <**--****-**>/<*host name*>.
In both cases the message is written very fast very often and always from the same thread.
We have error log examples of these sizes:
9,1 GB (server startup) - 2453051 occurrences from [2020-09-29 06:19:01.013 +0200] until [2020-09-29 06:29:01.983 +0200] (~10 min)
450 MB ((the timer issue above) - 349586 occurrences from [2020-09-27 03:55:14.342 +0200] until [2020-09-27 04:10:20.213 +0200] (~15 min)
*Steps to repeat:*
- this happens occasionally. no steps to repeat available
*Investigations:*
Especially the second stack trace consists of jgroups classes only.
With the given sources I analyzed how these messages could be created.
message JGRP000187 is defined here: JGroups/conf/jg-messages.properties
{code:java}
FailedReadingLineOfInputStream = JGRP000187: failed reading line of input stream
{code}
Look around this line from the stack trace
---------------------------------------------------------------------------
at org.jgroups.util.UUID.fromString(UUID.java:139)
---------------------------------------------------------------------------
{code:java}
/*136*/ public static UUID fromString(String name) {
/*137*/ String[] components = name.split("-");
/*138*/ if (components.length != 5)
/*139*/ throw new IllegalArgumentException("Invalid UUID string: "+name);
{code}
The exception is thrown when the provided string cannot be split into 5 parts by '-".
We see that the string name is logged appended to the message "Invalid UUID string: "
In our error message example no value can be seen for name so it must be an empty string that was given to the method UUID.fromString(String name). An empty string obviously does not have 5 parts when split by '-'
Where is the UUID str given to org.jgroups.util.UUID.fromString?
Look around this line from the stack trace
---------------------------------------------------------------------------
at org.jgroups.protocols.Discovery.read(Discovery.java:455)
---------------------------------------------------------------------------
{code:java}
/*437*/ protected List<PingData> read(InputStream in) {
/*438*/ List<PingData> retval=null;
/*439*/ try {
/*440*/ while(true) {
/*441*/ try {
/*442*/ String name_str=Util.readToken(in);
/*443*/ String uuid_str=Util.readToken(in);
/*444*/ String addr_str=Util.readToken(in);
/*445*/ String coord_str=Util.readToken(in);
/*446*/ if(name_str == null || uuid_str == null || addr_str == null || coord_str == null)
/*447*/ break;
/*448*/
/*449*/ UUID uuid=null;
/*450*/ try {
/*451*/ long tmp=Long.parseLong(uuid_str);
/*452*/ uuid=new UUID(0, tmp);
/*453*/ }
/*454*/ catch(Throwable t) {
/*455*/ uuid=UUID.fromString(uuid_str);
/*456*/ }
/*457*/
/*458*/ PhysicalAddress phys_addr=new IpAddress(addr_str);
/*459*/ boolean is_coordinator=coord_str.trim().equals("T") || coord_str.trim().equals("t");
/*460*/
/*461*/ if(retval == null)
/*462*/ retval=new ArrayList<>();
/*463*/ retval.add(new PingData(uuid, true, name_str, phys_addr).coord(is_coordinator));
/*464*/ }
/*465*/ catch(Throwable t) {
/*466*/ log.error(Util.getMessage("FailedReadingLineOfInputStream"), t);
/*467*/ }
/*468*/ }
/*469*/ return retval;
/*470*/ }
/*471*/ finally {
/*472*/ Util.close(in);
/*473*/ }
/*474*/ }
{code}
Here from an input stream four values are read for the different servers in the jgroups cluster: name_str, uuid_str, addr_str and coord_str
This happens in a while(true) loop.
We can leave this loop only by a break.
This break happens at line 447 when the condition in 446 is met:
if(name_str == null || uuid_str == null || addr_str == null || coord_str == null)
So if at least one of the four strings read before would be null we could leave this loop.
We can see that the catch block simply logs the message but does not inerrupt the outer while loop.
The string uuid_str is read into the local variable in line 443 by String uuid_str=Util.readToken(in)
Lets have a look at Util.readToken:
{code:java}
/*1774*/ /** Returns whitespace-separated strings from the input stream, or null if the end of the stream has been reached */
/*1775*/ public static String readToken(InputStream in) {
/*1776*/ StringBuilder sb=new StringBuilder();
/*1777*/ boolean first=true;
/*1778*/ int ch;
/*1779*/ while(true) {
/*1780*/ try {
/*1781*/ ch=in.read();
/*1782*/ if(ch == -1)
/*1783*/ return sb.length() > 0? sb.toString() : null;
/*1784*/ if(Character.isWhitespace(ch)) {
/*1785*/ if(first)
/*1786*/ continue;
/*1787*/ break;
/*1788*/ }
/*1789*/ sb.append((char)ch);
/*1790*/ first=false;
/*1791*/ }
/*1792*/ catch(IOException e) {
/*1793*/ break;
/*1794*/ }
/*1795*/ }
/*1796*/ return sb.toString();
/*1797*/ }
{code}
A new instance of StringBuilder is created at line 1776.
The boolean variable first is set to true in line 1777
Then a while(true) loop is started.
It can only be left by a break.
In the try block we try to read from the given input stream in.
1. If the read character is -1 ( end of the stream ) the while loop is left and a value is returned.
If the string builder sb already has a length > 0 it will be converted to a string and returned. If sb is still empty we return a null.
2. If the character is a white space character and we are still at the beginning of the reading (first is true) we will continue until we find a none-whitespace character or the end of the stream. If first is no longer true when a white space character is read the break will bring us to the end of the loop.
Here all read content is returned as a string converted from sb.
3. The character that was read is not a -1 and not a whitespace: We append it to sb and set first to false.
4. In case of an IOException the while loop is left, no log message is written.
Here all read content is returned as a string converted from sb.
The only possibility for this method to return an empty string (not null) would be: We encounter an IOException at the first reading from in or after only white space characters were read.
Then the code on line 1796 would return an empty string.
Now have a look at the code read of Discovery again:
Reaching the end of the input stream in Util.readToken(in) would be the condition to end reading from the input stream and to return all found PingData as a list.
With an empty string for every read string variable the break will never be reached and the while loop will not be left, writing an error message at the caught exception on the trial to convert the empty string in uuid_str to a valid UUID.
So in case the provided input stream throws an IOException on every trial to read from it, we would be in an endless loop.
We can have a look at which input stream we are in here
Look around this line from the stack trace
---------------------------------------------------------------------------
at org.jgroups.protocols.FILE_PING.readAll(FILE_PING.java:239)
---------------------------------------------------------------------------
{code:java}
/*256*/ protected void readAll(List<Address> members, String clustername, Responses responses) {
/*257*/ File dir=new File(root_dir, clustername);
/*253*/ if(!dir.exists())
/*254*/ dir.mkdir();
/*230*/
/*231*/ File[] files=dir.listFiles(filter); // finds all files ending with '.list'
/*232*/ for(File file: files) {
/*233*/ List<PingData> list=null;
/*234*/ // implementing a simple spin lock doing a few attempts to read the file
/*235*/ // this is done since the file may be written in concurrency and may therefore not be readable
/*236*/ for(int i=0; i < 3; i++) {
/*237*/ if(file.exists()) {
/*238*/ try {
/*239*/ if((list=read(file)) != null)
/*240*/ break;
/*241*/ }
/*242*/ catch(Exception e) {
/*243*/ }
/*244*/ }
/*245*/ Util.sleep(50);
/*246*/ }
/*247*/
/*248*/ if(list == null) {
/*249*/ log.warn("failed reading " + file.getAbsolutePath());
/*250*/ continue;
/*251*/ }
/*252*/ for(PingData data: list) {
/*253*/ if(members == null || members.contains(data.getAddress()))
/*254*/ responses.addResponse(data, true);
/*255*/ if(local_addr != null && !local_addr.equals(data.getAddress()))
/*256*/ addDiscoveryResponseToCaches(data.getAddress(), data.getLogicalName(), data.getPhysicalAddr());
/*257*/ }
/*258*/ }
/*259*/ }
{code}
Here we read the ping data from a file in a directory that was scanned for all files with the extension .list.
In our application server that would be one of these directories:
[IS_SHARE]/system/jgroups/appserverSyncChannel
[IS_SHARE]/system/jgroups/cacheSyncChannel
[IS_SHARE]/system/jgroups/ormSyncChannel
[IS_SHARE]/system/jgroups/tcmSyncChannel
So my assumption is that our error occurs when somehow a file that just existed is not reachable anymore (if deleted or unreachable because of network issues) and the input stream for it gets unusable, throwing IOExceptions on every trial to read from it.
My idea how to fix that would be to change the return value of Util readToken on line 1796 like this:
{code:java}
old: return sb.toString();
new: return sb.length() > 0? sb.toString() : null;
{code}
Then we would not return an empty value on an IOException and could break the loop in Discovery for a better error handling.
I already had a look at the code in the most recent version 5.0.4: All involved methods do look still the same.
I assume we should file a bug report to the jgroups community.
We suspect an endless loop in the jgroups code.
The used jgroups version is 4.2.0.
The warn log does not contain any relevant messages for jgroups at this time
I can provide the original logs for analysis
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
5 years, 7 months