[
https://issues.jboss.org/browse/DROOLS-1030?page=com.atlassian.jira.plugi...
]
Artur Kronenberg edited comment on DROOLS-1030 at 1/21/16 12:00 PM:
--------------------------------------------------------------------
Ok,
I think I have the issue (or some issue) reproduced. I will attach this in a second. Here
is what I do:
1. The standalone has 2 methods; create and load. Load takes the id from the create run to
load the thing. I made the code as close as I could to where I found the bug/feature in
the first place. The exception is not 100% the same, but I think I may have been hunting a
red herring. I think those exceptions are all related, but they might happen at a slightly
different point in time - so different symptoms with maybe the same root.
2. First run: create
Just run the main, wait until the thing is executed and then shut down the run. This is
important as I think it has something to do with persistence + expiry which means the run
needs to terminate so that drools can't update the session anymore (timerjobs and
friends I believe).
3. Wait for 1-2 minutes. I recommend 9gag, cracks me up :D
4. Change load to take the ID of the run in step 2. Comment out the create method, and run
the same thing again.
This has worked for me 3 out of 3 times with an exception talking about transaction and
stuff:
http://pastebin.com/RqKr3FcE
Observations:
1. Load under 1 minute wait time. (so immediately)
Everything works fine. It appears no timer jobs are started and no exceptions are thrown.
2. Load after 1 minute wait
The 1k models that event 1 loads are set to expire after 1 minute.
When loading the session, it appears that for each fact that ought to be expired, 1
JpaTimerJob is started. The logs show heeeaaaapps of hibernate operations going on. I
suspect that all the jobs try to do their thing and there may or may not be a bunch of
retrying happening in the background.
Conclusion is that I am still not sure what is happening exactly. But the signs stand in a
way that I suspect loading a session which has facts/events that are supposed to be
expired seems to be what the problem is. Please let me know if there's anything else I
can do. If I manage to reproduce the NPE properly I will do that. Sadly the exception of
my actual system vs my standalone example are slightly wrong.
Note: I believe it is essential for this exception that:
* Facts/events are inserted in a transaction - see TransactionHelper class
* fireAllRules is called in a separate thread
Update:
Maybe this is off some value. I don't think this is correct behaviour. After
re-running (loading) the same session 3 or 4 times, the update count (OPTLOCK) on that
session skyrocketed:
MariaDB [drools]> select id, OPTLOCK from SEssioninfo;
+----+---------+
| id | OPTLOCK |
+----+---------+
| 89 | 12923 |
+----+---------+
1 rows in set (0.00 sec)
was (Author: pandaadb):
Ok,
I think I have the issue (or some issue) reproduced. I will attach this in a second. Here
is what I do:
1. The standalone has 2 methods; create and load. Load takes the id from the create run to
load the thing. I made the code as close as I could to where I found the bug/feature in
the first place. The exception is not 100% the same, but I think I may have been hunting a
red herring. I think those exceptions are all related, but they might happen at a slightly
different point in time - so different symptoms with maybe the same root.
2. First run: create
Just run the main, wait until the thing is executed and then shut down the run. This is
important as I think it has something to do with persistence + expiry which means the run
needs to terminate so that drools can't update the session anymore (timerjobs and
friends I believe).
3. Wait for 1-2 minutes. I recommend 9gag, cracks me up :D
4. Change load to take the ID of the run in step 2. Comment out the create method, and run
the same thing again.
This has worked for me 3 out of 3 times with an exception talking about transaction and
stuff:
http://pastebin.com/RqKr3FcE
Observations:
1. Load under 1 minute wait time. (so immediately)
Everything works fine. It appears no timer jobs are started and no exceptions are thrown.
2. Load after 1 minute wait
The 1k models that event 1 loads are set to expire after 1 minute.
When loading the session, it appears that for each fact that ought to be expired, 1
JpaTimerJob is started. The logs show heeeaaaapps of hibernate operations going on. I
suspect that all the jobs try to do their thing and there may or may not be a bunch of
retrying happening in the background.
Conclusion is that I am still not sure what is happening exactly. But the signs stand in a
way that I suspect loading a session which has facts/events that are supposed to be
expired seems to be what the problem is. Please let me know if there's anything else I
can do. If I manage to reproduce the NPE properly I will do that. Sadly the exception of
my actual system vs my standalone example are slightly wrong.
Note: I believe it is essential for this exception that:
* Facts/events are inserted in a transaction - see TransactionHelper class
* fireAllRules is called in a separate thread
Nullpointer in JpaTimerJobInstance.call on startup
---------------------------------------------------
Key: DROOLS-1030
URL:
https://issues.jboss.org/browse/DROOLS-1030
Project: Drools
Issue Type: Feature Request
Components: core engine
Environment: Mac OS 10.10.5, Eclipse Mars Release, Java 1.8, Drools 6.3.0.Final
Reporter: Artur Kronenberg
Assignee: Mario Fusco
Attachments: test-standalone.zip
Hi,
I have noticed Nullpointer exceptions when I try and reload a persisted session on
startup. It is a bit hard to recreate (I am actually not managing it now since I deleted
all old sessions to attempt recreation) but I figured maybe someone has an idea.
Essentially I am getting this NPE twice:
java.lang.NullPointerException: null
at
org.drools.persistence.jpa.JpaTimerJobInstance.call(JpaTimerJobInstance.java:50)
[drools-persistence-jpa-6.3.0.Final.jar:6.3.0.Final]
at
org.drools.persistence.jpa.JpaTimerJobInstance.call(JpaTimerJobInstance.java:30)
[drools-persistence-jpa-6.3.0.Final.jar:6.3.0.Final]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_51]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[na:1.8.0_51]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[na:1.8.0_51]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_51]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_51]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
When debugging I noticed the following behaviour which points to a race condition:
If I start my server and try to recreate the session, those NPEs happen 2x.
If I start my server and put a breakpoint BEFORE creating the CommandService for
execution, I can wait for a few seconds and the service can be found.
It appears that the scheduler's timerJobFactoryManager is not fully there at the time
the sessions is being loaded? Or something else is racing with the service creation.
Is there a way for me to make sure everything is fully instantiated before I use drools?
Does a workaround exist? Is this even a bug?
Thanks and let me know your thoughts. If I run into this again I will attempt to try and
reproduce it. Let me know if more info is needed!
UPDATE:
I noticed that the reason I can't reproduce it at the moment is that the
JpaTimerJobInstance is never called with the newly persisted session. It appears that that
timer job depends on something else to happen so that it thinks it needs to start the
timerJob
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)