Why does JProfiler 9.11 stop profiling (CPU views - Hotspots) after some time?

JProfiler 9.11 stops profiling after ~20 minutes. Is there any hidden configuration that allows profiling for a longer period of time? Any help is appreciated!

The UI is not responding! I increased the memory size and changed the configuration so that the graphs and statistics are updated at a lower frequency: updates are now triggered every 30 s instead of the default 5 s.

Related

High CPU usage of initAggregator method in AttributeAggregator

We are using siddhi-3.1.2 as our CEP engine, with a single thread processing all our data in Siddhi. The data processing speed is very slow.
Profiling the CPU usage, we see that the majority of the time is spent in the initAggregator method: YourKit snapshot analysis shows that about 954 ms of a 1 s sampling interval is spent in this particular method.
Analytics (1/1) TID=322 STATE=RUNNABLE CPU_TIME=1978 (87.24%) USER_TIME=1910 (84.20%) Allocted: 512423096
org.wso2.siddhi.core.query.selector.attribute.aggregator.AttributeAggregator.initAggregator(AttributeAggregator.java:47)
org.wso2.siddhi.core.query.selector.attribute.processor.executor.GroupByAggregationAttributeExecutor.execute(GroupByAggregationAttributeExecutor.java:52)
org.wso2.siddhi.core.query.selector.attribute.processor.AttributeProcessor.process(AttributeProcessor.java:38)
org.wso2.siddhi.core.query.selector.QuerySelector.processInBatchGroupBy(QuerySelector.java:225)
org.wso2.siddhi.core.query.selector.QuerySelector.process(QuerySelector.java:78)
org.wso2.siddhi.core.query.processor.stream.window.WindowProcessor.processEventChunk(WindowProcessor.java:57)
org.wso2.siddhi.core.query.processor.stream.AbstractStreamProcessor.process(AbstractStreamProcessor.java:101)
The CPU hot spot is shown below.
[YourKit snapshot screenshot]
I'm wondering whether this time spent in initAggregator is expected in general, or whether it depends on the Siddhi query being processed.

High CPU utilisation while using an MQTT daemon

I am using an MQTT daemon in the background to receive and send data to the server. This is a cyclic process, so I have written this functionality in a thread (C++ & Qt) and the code is working fine. The problem is that it is consuming high CPU usage, between 91% and 99%. I have gone through my code several times but I was unable to spot the offending area.
Please guide me in finding it. I am using Linux with kernel version 3.1.
Thanks in advance,
Rohith.G
The MQTT client has a loop that checks for messages. Include a short sleep (even a millisecond) in that loop; this reduces the high CPU usage drastically.
import time
# mqttc is the already-connected MQTT client instance
while True:
    mqttc.loop(timeout=0.1)  # process pending network events once
    time.sleep(0.001)        # brief sleep so the loop does not spin at 100% CPU
To reduce the CPU usage of the mosquitto daemon I changed the keepalive value in the library source; it worked for me!
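If the client side uses the paho MQTT bindings for Python (which the snippet above resembles), the keepalive can usually be set per connection instead of patching the library source; a minimal sketch, assuming a paho-mqtt 1.x client and a placeholder broker address:

import time
import paho.mqtt.client as mqtt

client = mqtt.Client()                      # paho-mqtt 1.x style constructor
client.connect("broker.example.com", 1883,  # placeholder broker address
               keepalive=60)                # keepalive negotiated per connection

client.loop_start()    # network loop runs in its own background thread
try:
    while True:
        time.sleep(1)  # main thread idles instead of busy-waiting
finally:
    client.loop_stop()
    client.disconnect()

loop_start() only needs to be called once; the background thread it spawns handles pings and incoming messages on its own, so no manual polling loop is required.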

How to use the QNX Momentics Application Profiler?

I'd like to profile my (multi-threaded) application in terms of timing. Certain threads are supposed to be re-activated frequently, i.e. a thread executes its main job once every fixed time interval. In other words, there's a fixed time slice in which all the threads are getting re-activated.
More precisely, I expect certain threads to get activated every 2ms (since this is the cycle period). I made some simplified measurements which confirmed the 2ms to be indeed effective.
For the purpose of profiling my app more accurately it seemed suitable to use Momentics' tool "Application Profiler".
However, when I do so, I fail to interpret the timing figures the tool presents. I would be interested in the average as well as the min and max time it takes before a certain thread is re-activated. So far it seems the idea is only to monitor the time certain functions occupy, and even that does not really seem to be the case. E.g. I've got two lines of code that sit literally next to each other:
if (var1 && var2 && var3) var5 = 1;   // takes 1 ms (avg)
if (var4) var5 = 0;                   // takes 5 ms (avg)
What is that supposed to tell me?
Another thing confuses me: the parent thread "takes" up 33 ms on avg, 2 ms on max and 1 ms on min. Aside from the fact that the avg shouldn't be bigger than the max (in fact I'd expect the avg to be no bigger than 2 ms, since that is the cycle time), it actually increases the longer I run the profiling tool. So if I ran the tool for half an hour, the 33 ms would actually be something like 120 s. It seems, then, that the avg is actually the total amount of time the thread occupies the CPU.
If that is the case, I would expect to be able to offset it against the total run time using the count figure, but that doesn't work either, mostly because the figure is almost never available; it only shows up as a separate list entry (for every parent thread) that does not represent a specific process scope.
So I read the QNX community wiki about the "Application Profiler", including the article on "New IDE Application Profiler Enhancements", as well as the official manual articles about how to use the profiler tool, but I couldn't figure out how to use the tool to serve my purpose.
Bottom line: I'm pretty sure I'm misinterpreting the numbers and misusing the tool relative to what it was intended for. Thus my question: how would I interpret the numbers, or use the tool's feedback properly, to get my 2 ms cycle time confirmed?
Additional information
CPU: single core
QNX SDP 6.5 / Momentics 4.7.0
Profiling Method: Sampling and Call Count Instrumentation
Profiling Scope: Single Application
I enabled "Build for Profiling (Sampling and Call Count Instrumentation)" in the Build Options1
The System Profiler should give you what you are looking for. It hooks into the microkernel and lets you see the state of all threads on the system. I used it in a similar setup to find out why our system was getting unexpected time-outs. (The cause turned out to be Page Waits on critical threads.)

Locust.io: Controlling the requests-per-second parameter

I have been trying to load test my API server using Locust.io on EC2 compute-optimized instances. It provides an easy-to-configure option for setting the wait time between consecutive requests and the number of concurrent users. In theory, rps = #_users / wait_time. However, while testing, this rule breaks down beyond a fairly low #_users threshold (around 1200 users in my experiment). The variables hatch_rate and #_of_slaves, even in a distributed test setting, had little to no effect on the rps.
Experiment info
The test was run on a c3.4xlarge AWS EC2 compute node (AMI image) with 16 vCPUs, general-purpose SSD storage and 30 GB RAM. During the test, CPU utilization peaked at 60% (depending on the hatch rate, which controls the number of concurrent processes spawned), staying under 30% on average.
Locust.io
setup: uses pyzmq, with each vCPU core set up as a slave. A single POST request, with request body ~20 bytes and response body ~25 bytes. Request failure rate < 1%, with a mean response time of 6 ms.
variables: time between consecutive requests set to 450 ms (min 100 ms, max 1000 ms), hatch rate at a comfy 30 per second, and RPS measured by varying #_users.
The RPS follows the equation as predicted for up to 1000 users. Increasing #_users beyond that has diminishing returns, with a cap reached at roughly 1200 users. #_users isn't the only independent variable here; changing the wait time affects the RPS as well. However, changing the experiment setup to a 32-core instance (c3.8xlarge) or 56 cores (in a distributed setup) doesn't affect the RPS at all.
So really, what is the way to control the RPS? Is there something obvious I am missing here?
(one of the Locust authors here)
First, why do you want to control the RPS? One of the core ideas behind Locust is to describe user behavior and let that generate load (requests in your case). The question Locust is designed to answer is: How many concurrent users can my application support?
I know it is tempting to go after a certain RPS number and sometimes I "cheat" as well by striving for an arbitrary RPS number.
But to answer your question: are you sure your Locust users don't end up in a deadlock? As in, they complete a certain number of requests and then become idle because they have no other task to perform? It's hard to tell what's happening without seeing the test code.
Distributed mode is recommended for larger production setups, and most real-world load tests I've run have been on multiple but smaller instances. But it shouldn't matter if you are not maxing out the CPU. Are you sure you are not saturating a single CPU core? Not sure what OS you are running, but if it's Linux, what is your load value?
While there is no direct way of controlling RPS, you can try the constant_pacing and constant_throughput options for wait_time.
From the docs:
https://docs.locust.io/en/stable/api.html#locust.wait_time.constant_throughput
In the following example the task will always be executed once every second, no matter the task execution time:
from locust import User, constant_throughput

class MyUser(User):
    wait_time = constant_throughput(1)
constant_pacing is the inverse of this.
So if you run with 100 concurrent users, the test will run at 100 RPS (assuming each request takes less than 1 second in the first place).
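A slightly fuller, runnable sketch of that approach, mirroring the single small POST request of the original experiment (the endpoint, payload and class name below are placeholders):

from locust import HttpUser, task, constant_throughput

class ApiUser(HttpUser):
    # Each simulated user fires at most one task per second, so total RPS
    # is roughly user_count * 1 as long as responses stay under 1 second.
    wait_time = constant_throughput(1)

    @task
    def post_payload(self):
        # placeholder endpoint with a small JSON body, similar to the ~20-byte request
        self.client.post("/ingest", json={"key": "value"})

Pointed at the target host (e.g. via --host), 100 users should plateau near 100 RPS, which makes the user count, rather than the instance size, the lever for RPS.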

How to turn off Flash protection against long-executing scripts?

I am profiling some AS code by measuring wall-clock time. In order to minimize the error I need to run the code for a long period of time. However, Flash seems to protect itself from unresponsive scripts by throwing an exception after some period of unresponsiveness, namely: Error #1502: A script has executed for longer than the default timeout period of 15 seconds.
Is there any way to disable this protection, or at least extend the timeout period?
If you are publishing with Adobe Flash CS4/CS5 etc.:
Go to the Publish Settings and select "Flash". At the bottom of this screen there is a textbox which says "Script Timeout"; I know you can increase this. I think the limit is 90 seconds, even though you can enter any value here.
Can you move execution of the script across separate frames, and add a timer to advance the frame before the timeout period has elapsed? I believe the error only occurs when you've dwelled on a single frame for more than 15 seconds.