High CPU utilisation while using MQTT daemon! - c++

I am using an MQTT daemon in the background to receive and send data to the server. This is a cyclic process, so I have written this functionality in a thread (C++ & Qt), and the code works fine. The problem is that it consumes very high CPU, between 91% and 99%. I have gone through my code several times but could not spot the offending area.
Please guide me in finding it. I am using a Linux OS with kernel version 3.1.
Thanks in advance,
Rohith.G

The MQTT client has a network loop that checks for messages. Start it once and include a short sleep in your own loop; this reduces the high CPU usage drastically.
import time
mqttc.loop_start()      # start the network loop once, in its own thread
while True:
    time.sleep(0.001)   # short sleep so this loop does not spin the CPU
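Since the question is about C++ & Qt rather than Python, here is a rough equivalent sketch, assuming the libmosquitto C API (an assumption; the poster's actual client library is not named). mosquitto_loop() with a non-zero timeout blocks waiting for network activity instead of busy-polling, and a short sleep covers the error path.
#include <mosquitto.h>
#include <atomic>
#include <chrono>
#include <thread>

// Worker-thread loop: mosquitto_loop() waits up to 100 ms for network
// activity, so the thread no longer spins at 100% CPU.
void mqttWorker(struct mosquitto *mosq, const std::atomic_bool &running)
{
    while (running) {
        if (mosquitto_loop(mosq, 100 /* ms timeout */, 1) != MOSQ_ERR_SUCCESS) {
            mosquitto_reconnect(mosq);
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    }
}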

To reduce the CPU usage of the mosquitto daemon I changed the keepalive value in the library source; that worked for me!
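For reference, a minimal sketch under the assumption that libmosquitto is the client library in use: the keepalive interval is simply the last argument to mosquitto_connect(), so on the client side it can usually be tuned without patching the library source.
#include <mosquitto.h>

// broker.example.com and port 1883 are placeholders; 60 s keepalive shown.
int connect_with_keepalive(struct mosquitto *mosq)
{
    return mosquitto_connect(mosq, "broker.example.com", 1883, 60);
}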

Related

Network Adapter Received/Sent bits in C++

Oh almighty coding gurus, hear my plea *bows down*
I am relatively new to C++ and I am trying to make an agent server to sit on a Windows machine and return statistics when polled by a client program. I have already created the UDP communication (for speed) and have figured out how to report back memory, processor utilization and disk space. I am having a hard time figuring out how to get the network statistics for a given adapter.
My plan is this: you provide the index number of an interface to the program, and it spits out the network bits per second both received and sent. This way I can monitor the network utilization and see if an adapter is getting slammed (similar to what you see in Task Manager).
The reason for this is that SNMP on Windows only has 32-bit counters, which is fine if your network bandwidth is 100 Mbps, but when it is gigabit, the counter wraps around faster than I can poll it, giving unreliable results.
Is there any way to do this?
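One hedged sketch of a possible approach (not from the original post): on Windows Vista and later, the IP Helper API exposes 64-bit per-interface counters through GetIfEntry2() and MIB_IF_ROW2, which sidesteps the 32-bit wrap problem; on XP only the 32-bit GetIfEntry() is available.
#include <winsock2.h>
#include <iphlpapi.h>
#pragma comment(lib, "iphlpapi.lib")

// Returns the 64-bit received/sent byte counters for one interface index.
// Requires Windows Vista or later.
bool queryOctets(NET_IFINDEX ifIndex, ULONG64 &inOctets, ULONG64 &outOctets)
{
    MIB_IF_ROW2 row = {};
    row.InterfaceIndex = ifIndex;   // index supplied by the caller
    if (GetIfEntry2(&row) != NO_ERROR)
        return false;
    inOctets  = row.InOctets;       // cumulative bytes received
    outOctets = row.OutOctets;      // cumulative bytes sent
    return true;
}
// Poll twice (e.g. one second apart) and multiply the difference by 8
// to get received/sent bits per second.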

I/O starvation causes?

I have a somewhat complex application that uses heavy I/O: it includes ffmpeg and does video transcoding (in software; no HW acceleration is available).
This might be unimportant, but I wanted to emphasize it.
All video transcoding functions run on their own std::thread and use libev for I/O management.
Hardware details:
CPU architecture is ARM
OS is Debian.
Now, I'm trying to use ping to check if a particular server is available. I found ping's source code and included it in the application. This runs on a completely different std::thread.
Problem:
If no video transcoding is in progress, the ping function works as intended.
However, when CPU-intensive transcoding is running, the ping function returns with a timeout 99% of the time.
I suspected some kind of I/O starvation, so I dove into ping's source:
I found that ping uses the old select() call to detect when I/O is available. I was almost sure this was causing the problem, so I refactored the code: I dropped select() and put libev into action.
Unfortunately the starvation stayed the same.
I had almost accepted this, as the video transcoding really puts a huge load on the CPU (70-80%).
But if I run my application from SSH session #A and run ping from another SSH session #B, my application can do the transcoding and there is not a single lost packet from ping in session #B.
Conclusion:
The hardware seems capable of running my heavy application in parallel with ping.
Question:
I am really curious about the following:
Is there a per-process limit on Linux on how many I/O operations a process can use? (I suspect so, but how can I find this limit? How can I raise it? How can I check current usage?)
If there is no problem with I/O limits, what else can cause this kind of "starvation-like" problem between std::threads? (The ping thread does not seem to be blocked, since it does receive the timeout; it just never gets a free I/O operation.) To be more accurate: ping actually CAN send out packets, but the replies never seem to arrive. I am almost sure those ping replies are coming back and my application is simply not getting a green light for the READ operation.
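Not an answer from the original thread, just one hedged possibility: if this is plain CPU starvation of the ping thread rather than an I/O limit, raising that thread's scheduling priority through its native pthread handle is a cheap experiment (requires root or CAP_SYS_NICE on Linux).
#include <pthread.h>
#include <sched.h>
#include <thread>

// Give one std::thread a modest real-time priority so the transcoding
// threads (SCHED_OTHER) can no longer starve it of CPU time.
void bumpPriority(std::thread &t)
{
    sched_param sp{};
    sp.sched_priority = 10;                          // 1..99 for SCHED_RR
    pthread_setschedparam(t.native_handle(), SCHED_RR, &sp);
}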

DPDK 19.11, ixgbe PMD shows imissed of up to 8 Mpps (a decrease of 3 Mpps); does anyone have a solution or know the reason?

Using DPDK 17.02 with my custom application, I see imissed of 11 Mpps. With DPDK 19.11 it has reduced to 8 Mpps. Are there compile flags or code changes in the ixgbe PMD that have reduced this?
New updates:
The application architecture is rx cores (3) --> worker cores (16) --> tx cores (2).
When I increased the tx cores to 3, it could reach 10 Mpps, but increasing to 4 had no further effect.
imissed stands for packets missed in hardware, caused by too few poll cycles in the RX thread. The lower the value, the better; hence DPDK 19.11 is having a reduced impact.
Possible reasons the imissed count is lower with 19.11:
better compiler flags
better code optimization
assuming you are using static libraries, the code logic might fit better into the instruction cache.
Note: you should really run a profiler and use objdump to verify this.
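As a side note, a sketch assuming the DPDK 19.11 ethdev API: imissed can be read per port from rte_eth_stats, which makes it easy to log while profiling.
#include <rte_ethdev.h>
#include <cinttypes>
#include <cstdio>

// imissed counts packets dropped by the NIC because the RX queues were
// not drained fast enough by rte_eth_rx_burst().
void print_imissed(uint16_t port_id)
{
    struct rte_eth_stats stats;
    if (rte_eth_stats_get(port_id, &stats) == 0)
        printf("port %u imissed=%" PRIu64 "\n", (unsigned)port_id, stats.imissed);
}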
Possible reasons for imissed in your application:
The CPU frequency governor is in powersave rather than performance mode (an effect of not disabling C-states in the BIOS).
Additional work or a sleep in the RX thread loop prevents rx_burst from being invoked frequently enough.
RSS is enabled, but the traffic sent to the DPDK port lands on one queue only. With too many RX queues (e.g. 16), this increases the delay before a packet is picked from the relevant queue.
If the RX thread feeds the worker cores based on flow ID, ring-full scenarios can cause retries, leading to lost rx_burst cycles.
If testpmd, examples/skeleton or examples/l2fwd do not show imissed, here are my suggestions for debugging your application:
try running RX-TX from a single core
set the power governor from powersave to performance (see the sketch after this list)
ensure you are running the core threads on isolated cores.
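A small sketch of the governor suggestion above (assumes the standard Linux cpufreq sysfs layout and root privileges; cpupower frequency-set -g performance does the same thing from the shell):
#include <fstream>
#include <string>

// Switch one CPU core's frequency governor from powersave to performance.
bool set_performance_governor(int cpu)
{
    std::ofstream gov("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                      "/cpufreq/scaling_governor");
    gov << "performance" << std::flush;
    return static_cast<bool>(gov);
}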

How to figure out why UDP is only accepting packets at a relatively slow rate?

I'm using Interix on Windows XP to port my C++ Linux application to Windows XP more readily. My application sends and receives packets over a socket to and from a nearby machine running Linux. When sending, I'm only getting throughput of around 180 KB/sec, and when receiving I'm getting around 525 KB/sec. The same code running on Linux gets closer to 2,500 KB/sec.
When I attempt to send at a higher rate than 180 KB/sec, packets get dropped to bring the rate back down to about that level.
I feel like I should be able to get better throughput on sending than 180 KB/sec but am not sure how to go about determining what is the cause of the dropped packets.
How might I go about investigating this slowness in the hopes of improving throughput?
--Some More History--
To reach the above numbers, I have already improved the throughput a bit by doing the following (which made no difference on Linux, but helped throughput on Interix):
I changed SO_RCVBUF and SO_SNDBUF from 256 KB to 25 MB; this improved throughput by about 20%.
I ran an optimized build instead of a debug build; this improved throughput by about 15%.
I turned off all logging messages going to stdout and a log file; this doubled throughput.
So it would seem that CPU is a limiting factor on Interix, but not on Linux. Further, I am running on a virtual machine hosted in a hypervisor. The Windows XP guest is given 2 cores and 2 GB of memory.
I notice that the profiler shows the CPU on the two cores never exceeding 50% utilization on average. This happens even when I have two instances of my application running; it still hovers around 50% on both cores. Perhaps my application, which is multi-threaded with a dedicated thread to read from the UDP socket and a dedicated thread to write to it (only one is active at any given time), is not being scheduled well on Interix, and thus my packets are dropping?
In answering your question, I am making the following assumptions based on your description of the problem:
(1) You are using the exact same program on Linux when achieving the throughput of 2,500 KB/sec, other than the socket library, which is, of course, going to be different between Windows and Linux. If this assumption is correct, we probably shouldn't have to worry about other pieces of your code affecting the throughput.
(2) When using Linux to achieve 2,500 KB/sec throughput, the node is in the exact same location in the network. If this assumption is correct, we don't have to worry about network issues affecting your throughput.
Given these two assumptions, I would say that you likely have a problem in your socket settings on the Windows side. I would suggest checking the size of the send-buffer first. The size of the send-buffer is 8192 bytes by default. If you increase this, you should see an increase in throughput. Use setsockopt() to change this. Here is the usage manual: http://msdn.microsoft.com/en-us/library/windows/desktop/ms740476(v=vs.85).aspx
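A minimal sketch of that suggestion in POSIX style (which is what Interix exposes); the asker already tried 25 MB buffers, so treat the value as something to experiment with and confirm afterwards with getsockopt().
#include <sys/socket.h>

// Enlarge the UDP socket's send buffer; returns true on success.
bool grow_send_buffer(int sockfd, int bytes)
{
    return setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF,
                      &bytes, sizeof(bytes)) == 0;
}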
EDIT: It looks like I misread your post going through it too quickly the first time. I just noticed you're using Interix, which means you're probably not using a different socket library. Nevertheless, I suggest checking the send buffer size first.

C++ process CPU usage jumps: detecting the cause

Given: a multithreaded (~20 threads) C++ application under RHEL 5.3.
When testing under load, top shows that CPU usage jumps around in the 10-40% range every second.
The design is mostly pretty simple: most of the threads implement the active object design pattern. Each thread has a thread-safe queue; requests from other threads are pushed onto the queue, while the thread polls the queue and processes incoming requests. A processed request causes a new request to be pushed to the next processing thread.
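An aside, not the poster's code: "polls the queue" is ambiguous, and a busy-polled queue by itself can produce high, jumpy CPU. A minimal blocking active-object queue looks roughly like this (shown with C++11 primitives for brevity; on RHEL 5.3 the same idea applies with pthreads or ACE):
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename Request>
class BlockingQueue {
public:
    void push(Request r) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(r)); }
        cv_.notify_one();                 // wake the waiting worker
    }
    Request pop() {                       // blocks instead of spinning
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        Request r = std::move(q_.front());
        q_.pop();
        return r;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Request> q_;
};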
The process has several TCP/UDP connections, over each of which data is received/sent under high load.
I know I have not provided sufficient data. This is a pretty big application, and I'm not familiar with all of its parts. It was recently ported from Windows to Linux over the ACE library (used for the networking part).
Supposing the problem is in the application and not an external one, what techniques/tools/approaches can be used to discover it? For example, I suspect this may be caused by some mutex contention.
I faced a similar problem some time back, and here are the steps that helped me.
1) Start by using strace to see where the application is spending its time in system calls (for example, strace -c -p <pid> prints a per-syscall time summary).
2) Use OProfile to profile both the application and the kernel.
3) If you are using an SMP system, look at the NUMA settings; in my case that caused havoc.
/proc/<appPID>/numa_maps gives a quick look at how memory is being accessed.
NUMA misses can cause the jumps.
4) You mentioned TCP connections in your app. Check that the MTU size is set to the right value and, depending on the type of data being transferred, use Nagle's delay (Nagle's algorithm) appropriately; a sketch of toggling it follows.
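A minimal sketch of toggling Nagle's algorithm per TCP socket (standard sockets API, not specific to ACE): disable it with TCP_NODELAY for small, latency-sensitive messages; leave it enabled for bulk transfers.
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// flag = 1 disables Nagle (small packets are sent immediately),
// flag = 0 re-enables it (small writes are coalesced).
bool set_nodelay(int sockfd, bool disable_nagle)
{
    int flag = disable_nagle ? 1 : 0;
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY,
                      &flag, sizeof(flag)) == 0;
}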