Are exceptions counted for JVM lifetime in Java Flight Recorder? - profiling

I have run a Java Flight Recorder recording for 2 minutes on a JBoss EAP 6.1 app server under load. I enabled exception counting (Java Application => Java Exception => Enabled=true) and I'm surprised by the number of reported exceptions.
When I look in the Events => Histogram view with event type "Java Application/ Java Exception" and Group by "Event Thread", 10 threads have over 2000 exceptions each. 3 of them have over 3000 exceptions.
This is the total number of reported creations of Throwable or Error:
Stack Trace Sample Count
java.lang.Throwable.<init>() 128 059
java.lang.Throwable.<init>(String) 116 107
java.lang.Throwable.<init>(Throwable) 39 207
java.lang.Error.<init>() 7
java.lang.Throwable.<init>(String, Throwable) 2
So I'm wondering: did all these exceptions occur during the 2-minute period I recorded, or are they counted since the start of the JVM?

"Sample Count" column in the Histogram tab aggregates the number of events with respect to a field value, in your case I believe the top frame of the stack trace. So the number 128 059 means there were that many events emitted with a top frame of "java.lang.Throwable.<init>()" during your recording.
This may not be the information you are looking for.
I recommend using the recording template to enable Exceptions / Errors, and looking at the Exceptions tab, instead of editing settings for individual events and using the Histogram tab.

TL;DR: Java Exception events only count what happens during the recording. The Exception Statistics event counts exceptions over the JVM lifetime (or some other "JVM-global" interval).
There are two different data points: the Java Exception and Java Error events, and the Statistics/Throwables event.
If you only look at Java Exception/Error, the events you have in the recording are those that occurred during that time. The Statistics/Throwables event is sampled at regular intervals, and might count from the start of the JVM, or possibly from the start of the JFR engine, or from the start of the first JFR recording in the running JVM. It is mostly interesting to compare these values relative to each other. This event is displayed at the top of the Code/Exceptions tab, in the two text fields.
Also note that the Exception/Error events occur in the constructor, not when the exception is actually thrown.
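To make this concrete, here is a minimal, self-contained Java sketch (the class name and loop are my own, not from the question): with the Java Exception event enabled, every constructor call below is recorded by JFR even though nothing is ever thrown.
public class ExceptionConstructionDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            // Constructing the Throwable is what gets recorded; the exception
            // is never thrown, yet each of these 1 000 constructions shows up
            // as an event in the recording.
            new IllegalStateException("created but never thrown");
        }
    }
}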
If your program creates a lot of Java Error events, there is some trouble with double bookkeeping of these, so the numbers might be incorrect (this is something we have compensated for in the next JMC version, but not in JMC 5.5).
Other Throwable subclasses will show up as Exception events.

Related

MQTTnet poor performance when multi-threaded despite same number of messages being sent

I am sending around 20,000 messages per second; these can come from a number of arbitrary threads (the messages are processed before being handed to MQTTnet).
I have found that the fewer the threads, the better the performance; going over 16 simultaneous senders causes MQTTnet to grind to a halt, even with 10k messages per second.
It is not the threads that are slow: I poll the MQTTnet Managed Client buffer size every 10 seconds and see it increasing to the point where it becomes full (at the limit that I have set).
This is with the most recent code version and is something I noticed a number of months ago (as of August 2020). It was highlighted by my recent ThreadRipper system upgrade, because my code creates a number of threads equal to the number of environment processors: the same code base used 8 threads on the previous hardware but 48 on the new hardware, which caused the "failure".
48 decode/send threads caused MQTTnet to grind to a halt, whereas 4 to 8 threads were OK and performant. I can see the speed on the NIC to the MQTT server drop from 8 Mbps (with 4 to 8 sending threads) to less than 100 kbps when higher thread counts are used.
A local or remote MQTT server makes no difference; as mentioned, I can see the send buffer within MQTTnet increase, and it will do so until memory exhaustion unless a limit is set (either way, it will drop messages once under duress from threads in this state).
Note, in both cases, the total number of messages being sent per second remained the same - the only variable was the number of worker threads the messages were being sent from.
Is this a bug, or something I am doing wrong? Should I create my own queue to front-end the managed client and dispatch one message at a time? (I don't want to reinvent the wheel, but want to ensure I am using the library correctly.)
I have found this seems to be related to running with the debugger attached versus starting without debugging: starting without debugging is orders of magnitude faster and can scale all the way to 48 threads (matching the environment processor count) without any issue and without any queuing whatsoever.
Strange, as the message volume is the same in both cases; the only difference is the thread count, as mentioned (and even with the debugger attached, 8 threads can keep up without issue).
It seems there is an overhead when debugging with multiple sending threads, which may be obvious, but I couldn't find this warned about anywhere.

Progress Bar with Gtkmm

Hello, I am looking for a signal in gtkmm. Basically I am running some simulations, and what I want is something like this:
Assume I run 5 simulations:
progressBar.set_fraction(0);
// simulation 1
progressBar.set_fraction(progressBar.get_fraction() + 1.0 / 5);
// simulation 2
progressBar.set_fraction(progressBar.get_fraction() + 1.0 / 5);
// simulation 3
progressBar.set_fraction(progressBar.get_fraction() + 1.0 / 5);
// simulation 4
progressBar.set_fraction(progressBar.get_fraction() + 1.0 / 5);
// simulation 5
progressBar.set_fraction(progressBar.get_fraction() + 1.0 / 5);
But I don't know which signal I have to use and how to translate this into working code.
Thank you a lot for your help!
The pseudocode you presented in your question should actually work; no signal is necessary. However, you could introduce a signal into your simulation to update the progress bar. IMHO this will not solve your problem, and I will try to explain why, and what to do to solve it:
You provided a bit too little context, so I will make a few more assumptions: you have a main window with a button, toolbar item, or menu item (or even all of them) which starts the simulation.
Let's imagine you set a breakpoint at Gtk::ProgressBar::set_fraction().
Once the debugger stops at this breakpoint you will find the following calls on the stack trace (probably with many other calls in between):
Gtk::Main::run()
the signal handler of the widget or action which started the simulation
the function which runs the five simulations
and last the call of Gtk::ProgressBar::set_fraction().
If you could inspect the internals of Gtk::ProgressBar you would notice that everything in Gtk::ProgressBar::set_fraction() is done properly. So what's wrong?
When you call Gtk::ProgressBar::set_fraction() it probably generates an expose event (i.e. it adds an event to the event queue inside Gtk::Main requesting its own refresh). The problem is that you probably do not process that request until all five runs of the simulation are done. (Remember that Gtk::Main::run(), which is responsible for this, is the uppermost/outermost call of my imaginary stack trace.) Thus, the refresh does not happen until the simulation is over, which is too late. (By the way, the authors of Gtk+ state somewhere in the manual how cleverly events are optimized; i.e. there might finally be only one expose event for the Gtk::ProgressBar in the event queue, but this does not make your situation any better.)
Thus, after you call Gtk::ProgressBar::set_fraction() you must somehow flush the event queue before making further progress with your simulation.
This sounds like leaving the simulation, leaving the calling widget's signal handler, returning to Gtk::Main::run() for further event processing, and finally coming back for the next simulation step - a terrible idea. But it can be done much more simply. For this, we essentially use the following code (in gtkmm 2.4):
while (Gtk::Main::events_pending()) Gtk::Main::iteration(false);
(This should hopefully be the same in the gtkmm version you use but if in doubt consult the manual.)
It should be done immediately after updating the progress bar fraction and before the simulation continues.
This recursively enters (parts of) the main loop and processes all pending events in the event queue of Gtk::Main, and thus the progress bar is redrawn before the simulation continues. You may be concerned about "recursively entering the main loop", but I read somewhere in the GTK+ manual that it is allowed (and reasonable for solving problems like this) and what to take care about (i.e. limit the number of recursions and provide a proper "roll-back").
What is the simulation in your case we call, in general, a long-running function. Because such long-running functions are algorithms (in libraries for anything) which should not be polluted with any GUI stuff, we built some administrative infrastructure around this basic concept, including
a progress "proxy" object with an update(double) method and a signal slot
a customized progress dialog which can connect a signal handler to such a progress object (i.e. its signal slot).
The long-running function gets a progress object (as an argument) and is responsible for calling the Progress::update() method at appropriate intervals with an appropriate progress factor. (We simply use values in the range [0, 1].)
One issue is the interval at which the progress update is called. If it is called too often, the GUI will slow down your long-running function significantly. The opposite case (calling it too rarely) results in a less responsive GUI. Thus, we decided in favor of frequent progress updates. To reduce the GUI overhead, we remember the time of the last update in our progress dialog and skip refreshes until a certain duration has passed since the last refresh. Thus, the long-running function still has some extra effort for progress updates, but it is no longer noticeable. (A good refresh interval is IMHO 0.1 s, the perception threshold of humans, but you may choose 0.05 s if in doubt.)
Flushing all pending events results in processing of mouse events (and other GTK+ signals) as well. This allows another useful feature: aborting the long-running function.
When the "Cancel" button of our progress dialog is pressed, it sets an internal flag. The next time the progress is updated, the update checks the flag. If the flag is set, it throws a special exception. The throw immediately aborts the caller of the progress update (the long-running function). This exception must be caught in the signal handler of the button (or whatever called the long-running function). Otherwise it would "fall through" to the event dispatcher in Gtk::Main, where it is ultimately caught, which would abort your application. (I have seen that often enough whenever I forgot to catch it.) On the other hand, catching the special exception tells you clearly that the long-running function was aborted (as opposed to ending with a regular return). This may or may not be something worth showing in the GUI as well.
Finally, the above solution can cause another issue: it makes it possible to start the simulation (via the GUI) while a simulation is already running. This is possible because button presses that start the simulation can be processed during a progress update. To prevent this, there is a simple solution: set a flag in the GUI when the simulation starts, keep it until the simulation has finished, and prevent further starts while the flag is set. Another option is to make the widget/action insensitive while the simulation is running. This topic becomes more complicated if you have multiple distinct long-running functions in your application which may or may not exclude each other; this leads to something like an exclusion matrix. Well, we solved it pragmatically... (but without the matrix).
And last but not least, I want to mention that we use a similar concept for the output of log views (e.g. visual logging of infos, warnings, and errors while anything long-running is in progress). IMHO it is always good to provide some visual activity for end users. Otherwise they might get bored and pick up the telephone to complain about the (too) slow software, which actually steals the time you need to make it faster (a vicious cycle you have to break...).

Could not find the ColdFusion component or interface only rarely

In a ColdBox application I have this code in my main handler's onException function:
getModel('JVMUtils#cbcommons')
In the past month I've seen this throw the error that it can't find it 17 times. During the same time, the application has (sadly) had hundreds if not thousands of unhandled exceptions that hit the onException handler and this particular line of code and didn't die from not finding the component.
What could cause it to find it nearly every time, but not on these rare occasions?

Siddhi CEP 4.x: Multiple results per group when using time batch window

Using Siddhi 4.1.0
Is there any possibility to apply window.timeBatch to upcoming events? I understood that the time window works based on already-arrived events.
Say for example,
I am getting multiple results while using window.timeBatch(2 min) with a group by clause.
Within the given 2-minute duration I sent 50 input events periodically. The expected behavior is that all those events are put together and given as a single result (I used the count function to verify). But it gives two results, like 40 and 10. Is it that the first 40 events fall into one time-window period and the remaining 10 into the next window? In that case, how can I merge all those events into a single output for the 2 minutes?
Also I want to start the time-window once the first event arrived.
I have observed that the time window runs in the background; if events arrive in the middle of the first time window, it collects events for only 1 min, and the remaining one minute of events is collected by the next time window. So, in the end, I got 2 batched results.
Please suggest whether there is any other solution to achieve this.
Use case:
My use case is based on a time duration (window.timeBatch(1 min)) for monitoring switches. I would like to implement the following:
The switch sends SNMP traps to the CEP. The traps are like switchFanFailed and switchFanOk.
If I receive a switchFanFailed trap, the next trap I expect, switchFanOk, should arrive within 1 min. In case the switchFanOk trap is not received within 1 min, the CEP should generate a notification via email; otherwise it will discard the trap.
Even though my trap generator generates the switchFanFailed and switchFanOk traps within a constant 1-minute duration, in some cases I am not able to receive both traps in the same window.
Say, for example, switchFanFailed arrives at the end of 0:50; from there I should wait 1 min for the switchFanOk trap.
Sorry, I am a bit confused by your use case. :)
Is your use case based on time, length, or both? A time batch window starts only after the first event arrives.
If you want to wait until 50 events (or any number of events) have arrived, then you have to use a lengthBatch window. If you want to process based on time and batch the events, then use a timeBatch window.
Do you have a fixed number of events? If not, CEP/Siddhi cannot wait/batch indefinitely; there has to be something that marks the end of a batch, doesn't there?
I had the same issue, and my grouping query always created two summarised results for any number of records sent in. The fix in my case was that one of the values used in the grouping differed from the others, so I suggest you check the grouping.
If you want to merge the two records, I suggest you use a time batch window,
timeBatch(1 min), which will summarise the output of your current data set.
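As a rough illustration of the timeBatch behaviour discussed above (stream and attribute names are invented, so adapt them to your own trap schema; this assumes the Siddhi 4.x org.wso2.siddhi libraries are on the classpath), a minimal Siddhi app with a grouped time batch window can be driven from Java roughly like this; the window only starts once the first event arrives and then emits one summarised row per group every minute:
import org.wso2.siddhi.core.SiddhiAppRuntime;
import org.wso2.siddhi.core.SiddhiManager;
import org.wso2.siddhi.core.event.Event;
import org.wso2.siddhi.core.query.output.callback.QueryCallback;
import org.wso2.siddhi.core.stream.input.InputHandler;

public class TimeBatchDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical stream/attribute names; adapt to your trap schema.
        String app =
            "define stream TrapStream (switchId string, trap string); " +
            "@info(name = 'countPerSwitch') " +
            "from TrapStream#window.timeBatch(1 min) " +
            "select switchId, count() as trapCount " +
            "group by switchId " +
            "insert into SummaryStream;";

        SiddhiManager manager = new SiddhiManager();
        SiddhiAppRuntime runtime = manager.createSiddhiAppRuntime(app);

        runtime.addCallback("countPerSwitch", new QueryCallback() {
            @Override
            public void receive(long timestamp, Event[] inEvents, Event[] removeEvents) {
                // One batch of summarised rows (one per switchId) per minute.
                if (inEvents != null) {
                    for (Event e : inEvents) {
                        System.out.println(e.getData(0) + " -> " + e.getData(1));
                    }
                }
            }
        });

        runtime.start();
        InputHandler input = runtime.getInputHandler("TrapStream");
        input.send(new Object[]{"switch-1", "switchFanFailed"});
        input.send(new Object[]{"switch-1", "switchFanOk"});
        Thread.sleep(61_000);   // wait for the 1-minute batch to flush
        manager.shutdown();
    }
}
If you see two rows per minute instead of one, check that every grouped attribute really is identical across the events you expect to be merged, as noted above.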

Particular NServiceBus Sagas: Concurrent Access to Saga Data Persisted in Azure Table Storage

This question is in regards to concurrent access to saga data when saga data is persisted in Azure Table Storage. It also references information found in Particular's documentation: http://docs.particular.net/nservicebus/nservicebus-sagas-and-concurrency
We've noticed that, within a single saga executing handlers concurrently, modifications to saga data appear to operate in a "last one to post changes to Azure Table Storage wins" scenario. Is this the intended behavior when using NSB in conjunction with Azure Table Storage as the saga data persistence layer?
Example:
Integer property in Saga Data, assume it currently = 5
5 commands are handled by 5 instances of the same handler in this saga
Each command handler decrements the integer property in saga data
The final value of the integer property in saga data could actually be 4 after handling these 5 messages: if each message was handled by a new instance of the saga, potentially on different servers, each instance had a copy of saga data indicating the integer property is 5, decremented it to 4, and posted it back up. That is the extreme fully-concurrent case; in practice, the integer is likely to be greater than 0 if any of the 5 messages were handled concurrently. The only time the saga data integer property reaches 0 is when the 5 commands happen to execute serially.
Also, as Azure Table Storage supports optimistic concurrency, is it possible to enable the use of this feature for Table Storage just as it is enabled for RavenDB when Raven is used as the persistence technology?
If this is not possible, what is the recommended approach for handling this? Currently we are subscribing to the paradigm that any handler in a saga that could ever potentially be handling multiple messages concurrently is not allowed to modify saga data, meaning our coordination of saga messages is being accomplished via means external to the saga rather than using saga data as we'd initially intended.
After working with Particular support, the symptoms described above turned out to be a defect in NServiceBus.Azure. The issue has been patched by Particular in NServiceBus.Azure 5.3.11 and 6.2+. I can personally confirm that updating to 5.3.11 resolved our issues.
For reference, a tell-tale sign of this issue manifesting itself is the following exception getting thrown and not getting handled.
Failed to process message
Microsoft.WindowsAzure.Storage.StorageException: Unexpected response code for operation : 0
The details of the exception will indicate "UpdateConditionNotSatisfied", referring to the optimistic concurrency check.
Thanks to Yves Goeleven and Sean Feldman from Particular for diagnosing and resolving this issue.
The Azure saga storage persister uses optimistic concurrency; if multiple messages arrive at the same time, the last one to update should throw an exception, retry, and make the data correct again.
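For readers unfamiliar with the pattern, here is a minimal, language-agnostic sketch of what "throw on conflict, then retry" means for the decrement example in the question (written in Java purely for illustration; none of these names come from NServiceBus or the Azure SDK, and the in-memory store merely mimics an ETag-checked conditional update):
import java.util.concurrent.atomic.AtomicReference;

// Illustrative stand-ins for an entity with a version tag (like an ETag)
// and a store that rejects writes made against a stale version.
class ConflictException extends RuntimeException {}

record Versioned(int counter, long version) {}

class OptimisticStore {
    private final AtomicReference<Versioned> current =
            new AtomicReference<>(new Versioned(5, 0));

    Versioned read() { return current.get(); }

    // The write succeeds only if the caller read the latest version,
    // analogous to Table Storage's conditional update ("UpdateConditionNotSatisfied" on failure).
    void write(Versioned expected, int newCounter) {
        Versioned next = new Versioned(newCounter, expected.version() + 1);
        if (!current.compareAndSet(expected, next)) {
            throw new ConflictException();
        }
    }
}

public class OptimisticConcurrencyDemo {
    static void decrement(OptimisticStore store) {
        while (true) {
            Versioned snapshot = store.read();
            try {
                store.write(snapshot, snapshot.counter() - 1);
                return;                      // committed
            } catch (ConflictException e) {
                // Someone else updated first: re-read and retry,
                // so no decrement is ever lost.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OptimisticStore store = new OptimisticStore();
        Thread[] workers = new Thread[5];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> decrement(store));
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(store.read().counter());   // prints 0, never 4
    }
}
The point is that a lost update surfaces as a conflict instead of silently winning, so a retry re-reads the latest saga data and no decrement is dropped.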
So this sounds like a bug, can you share which version you're on?
PS: last year we resolved an issue that sounds very similar to this: https://github.com/Particular/NServiceBus.Azure/issues/124 - it was fixed in NServiceBus.Azure 5.2 and upwards.