trace-cmd by process ID seems not to work - ftrace

I am working on a SUSE Linux worker node and want to investigate a performance issue on it.
The kernel version is:
5.3.18-24.99-default #1 SMP Sun Jan 23 19:03:51 UTC 2022 (712a8e6) x86_64 x86_64 x86_64 GNU/Linux
After installing trace-cmd, I can get a trace with a command like trace-cmd record -p function_graph ls.
But when I trace by process ID, e.g. trace-cmd record -p function_graph -P 87166, nothing is recorded, even though the process exists on the system:
**87166** root 20 0 21.829g 577592 22040 S 33.33 0.748 456:41.80 java
root 87166 86790 29 Oct06 ? 07:39:39 java -server -Dservice.log.base.dir=/var/log/java-performance-benchmark -XX:ErrorFile=/var/log/jvm/java-performance-benchmark_hs_err.log -XX:+PrintCommandLineFlags -jar /opt/java-performance-benchmark/java-performance-benchmark-app.jar
worker-pool1-p1aooifw-eccd-ibd-udm80635:/home/eccd # trace-cmd record -P 87166 -p function_graph
plugin 'function_graph'
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x646000
0 bytes in size
CPU1 data recorded at offset=0x646000
0 bytes in size
CPU2 data recorded at offset=0x646000
0 bytes in size
CPU3 data recorded at offset=0x646000
0 bytes in size
CPU4 data recorded at offset=0x646000
0 bytes in size
CPU5 data recorded at offset=0x646000
0 bytes in size
CPU6 data recorded at offset=0x646000
0 bytes in size
CPU7 data recorded at offset=0x646000
0 bytes in size
CPU8 data recorded at offset=0x646000
0 bytes in size
CPU9 data recorded at offset=0x646000
0 bytes in size
CPU10 data recorded at offset=0x646000
0 bytes in size
CPU11 data recorded at offset=0x646000
0 bytes in size
CPU12 data recorded at offset=0x646000
0 bytes in size
CPU13 data recorded at offset=0x646000
0 bytes in size
CPU14 data recorded at offset=0x646000
0 bytes in size
CPU15 data recorded at offset=0x646000
0 bytes in size
CPU16 data recorded at offset=0x646000
0 bytes in size
CPU17 data recorded at offset=0x646000
0 bytes in size
How could I debug such an issue?

Related

CentOS 7: LVM swap extension not shown by the "free" command

I'm running a CentOS 7 guest on VirtualBox 6 on a Windows host. The output of the free command is as follows:
$ free -h
total used free shared buff/cache available
Mem: 15G 2.4G 11G 162M 1.5G 12G
Swap: 1.2G 0B 1.2G
This shows that the swap partition is 1.2 GB. I need to extend it to at least 2 GB, so, with the guest stopped, I added a new 1.2 GB disk and, after rebooting, ran the following:
$ sudo pvcreate /dev/sdb
$ sudo vgextend centos /dev/sdb
$ sudo lvextend -L+1G /dev/centos/swap
Now the lvdisplay command shows the extended volume:
$ sudo lvdisplay
--- Logical volume ---
LV Path /dev/centos/swap
LV Name swap
VG Name centos
LV UUID 1OT4R8-69eL-vczL-zydM-XrwS-jA47-YfikMS
LV Write Access read/write
LV Creation host, time localhost, 2019-12-30 22:01:35 +0100
LV Status available
# open 2
LV Size <2.20 GiB
Current LE 563
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:1
--- Logical volume ---
LV Path /dev/centos/root
LV Name root
VG Name centos
LV UUID hGDGPf-iPMB-TUtM-nqRv-aDNd-D3mw-W15H8Z
LV Write Access read/write
LV Creation host, time localhost, 2019-12-30 22:01:35 +0100
LV Status available
# open 1
LV Size <76.43 GiB
Current LE 19565
Segments 3
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:0
The fstab file looks as follows:
/dev/mapper/centos-root / xfs defaults 0 0
UUID=4ef0416f-1617-40da-99d2-83896d808eed /boot xfs defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
This shows that the swap is on the /dev/mapper/centos-swap device. Here is the output of the fdisk -l command:
Disk /dev/mapper/centos-root: 82.1 GB, 82061557760 bytes, 160276480 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/centos-swap: 2361 MB, 2361393152 bytes, 4612096 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
But after reboot the swapon command doesn't seem to reflect the extension:
$ sudo swapon -s
Filename Type Size Used Priority
/dev/dm-1 partition 1257468 0 -2
For some reason, the swap doesn't seem to be on the /dev/mapper/centos-swap partition but on /dev/dm-1, which doesn't even exist. And the free command still shows the same result as at the beginning:
$ free -h
total used free shared buff/cache available
Mem: 15G 2.4G 11G 155M 1.5G 12G
Swap: 1.2G 0B 1.2G
and the /proc/swaps:
$ cat /proc/swaps
Filename Type Size Used Priority
/dev/dm-1 partition 1257468 0 -2
What am I missing here?
Seymour
I'm answering my own question. The issue is simply solved by running the following command:
sudo mkswap /dev/mapper/centos-swap
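After that, the free command shows the increased swap space, and /proc/swaps reflects it as well.
I found the solution by chance while searching for another topic. It turns out that, after creating the physical volume and extending the volume group and the logical volume, enabling the swap with swapon is not enough: you also have to re-create the swap area with mkswap. The reason is that the swap header written by mkswap records the size of the swap area, and the kernel keeps using that recorded size even after the underlying logical volume grows. Turn the swap off with swapoff before running mkswap on it, then enable it again with swapon.
Also, /dev/dm-1 does exist: it is the kernel's name for the same device-mapper volume, listed as Block device 253:1 in the lvdisplay output.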

In Hadoop HDFS, how many DataNodes are used to store a 1 GB file?

I have a 1 GB file to store on HDFS. My cluster has 10 DataNodes and one NameNode. Is there a calculation the NameNode uses (replicas aside) to pick a particular number of DataNodes for storing a file? Or is there a parameter we can configure for this? If so, what is the default number of DataNodes Hadoop uses to store a file when it is not explicitly configured?
I want to know whether it uses all the DataNodes of the cluster or only a specific number of them.
Assume the HDFS block size is 64 MB and all DataNodes have free space.
Thanks in advance.
If the configured block size is 64 MB and you have a 1 GB file, the file size is 1024 MB.
So the file is split into 1024 / 64 = 16 blocks.
Now, say you have a 10-node cluster and the default replication factor of 3: each of those blocks is stored on 3 different DataNodes, so your 1 GB file occupies *16 * 3 = 48 block replicas*. The NameNode chooses DataNodes block by block, so the replicas are usually spread over many of the 10 nodes rather than confined to just 3.
With 64 MB per block, the total space your 1 GB file consumes is:
*64 * 48 = 3072 MB*.
Hope that clears your doubt.
In the second generation of Hadoop (Hadoop 2):
If the configured block size is 128 MB (the Hadoop 2 default) and you have a 1 GB file, the file size is 1024 MB.
So the file is split into 1024 / 128 = 8 blocks.
With a 10-node cluster and the default replication factor of 3, each block is stored on 3 different DataNodes, so your 1 GB file occupies *8 * 3 = 24 block replicas*, again spread across the cluster.
With 128 MB per block, the total space your 1 GB file consumes is:
*128 * 24 = 3072 MB*.

Determine if an allocation via malloc() is backed by a huge page

I understand pretty well how transparent hugepages work, and that any allocation, such as those performed by malloc, may be satisfied by a huge page.
What I'd like to know is whether there is any check I can make (possibly heuristic) after an allocation to determine if the memory is backed by a huge page.
You can determine the exact status of any page, including whether it is backed by a transparent (or non-transparent) hugepage by looking up the "pfn" (page frame number) in the /proc/kpageflags file. You get the pfn for a page by reading from the /proc/$PID/pagemap file for your process, which is indexed by virtual address.
Unfortunately, both the pfn value from pagemap (see footnote 1) and the entire /proc/kpageflags file are accessible only to root. Still, if you can run your process as root, at least in the testing or benchmarking scenario you are interested in, this works well.
I wrote a small library called page-info which does the relevant parsing for you. Give it a range of memory and it will return you info on each page, including whether it is present in memory, backed by a hugepage, etc.
For example, running the included test process as sudo ./page-info-test THP gives the following output:
PAGE_SIZE = 4096, PID = 18868
size memset FLAG SET UNSET UNAVAIL
0.25 MiB BEFORE THP 0 1 64
0.25 MiB AFTER THP 0 65 0
0.50 MiB BEFORE THP 0 1 128
0.50 MiB AFTER THP 0 129 0
1.00 MiB BEFORE THP 0 1 256
1.00 MiB AFTER THP 0 257 0
2.00 MiB BEFORE THP 0 1 512
2.00 MiB AFTER THP 0 513 0
4.00 MiB BEFORE THP 0 1 1024
4.00 MiB AFTER THP 512 513 0
8.00 MiB BEFORE THP 0 1 2048
8.00 MiB AFTER THP 1536 513 0
16.00 MiB BEFORE THP 0 1 4096
16.00 MiB AFTER THP 3584 513 0
32.00 MiB BEFORE THP 0 1 8192
32.00 MiB AFTER THP 7680 513 0
64.00 MiB BEFORE THP 0 1 16384
64.00 MiB AFTER THP 15872 513 0
128.00 MiB BEFORE THP 0 1 32768
128.00 MiB AFTER THP 32256 513 0
256.00 MiB BEFORE THP 0 1 65536
256.00 MiB AFTER THP 65024 513 0
512.00 MiB BEFORE THP 0 1 131072
512.00 MiB AFTER THP 124416 6657 0
1024.00 MiB BEFORE THP 0 1 262144
1024.00 MiB AFTER THP 0 262145 0
DONE
The UNAVAIL column means that no information about the mapping was available, usually because the page has never been accessed and so isn't yet backed by any page at all. You can see that for these "largeish" allocations only a single page is mapped in following the allocation, since we haven't touched the memory.
The AFTER rows show the same information after calling memset() on the entire allocation, which causes all pages to be physically allocated. Here we can see that no allocations are backed by transparent hugepages until we hit allocations of 4 MiB, at which point the majority of each allocation is backed by THP, except for 513 pages (which turn out to be at the edges of the allocated region). At 512 MiB the system starts running out of available hugepages but still satisfies most of the allocation with them, while at 1024 MiB the entire allocation is satisfied with small pages.
This library isn't production ready so don't use it for anything critical (e.g., some failures simply call exit()). Contributions welcome.
1 Since kernel 4.0 approximately, before that the pfn was accessible to non-root user processes. From 4.0 to 4.1 or thereabouts, the entire pagemap was off-limits to non-root processes, but since then the file is again available but with the pfn masked out (it will always appear as zero).
There is a difference between traditional hugepages and transparent huge pages (THP). With THP, the application can use huge pages without any developer support (mmap, shmget, etc.) or sysadmin intervention.
I'm afraid there may be no straightforward way to check this from within the code. However, if you know the size of the allocated data structures or buffers, it is worth checking the system-wide THP usage with the following command. The value should increase while your application is running:
# grep AnonHugePages /proc/meminfo
AnonHugePages: 2648064 kB

c/c++: How can I know the size of used flash memory?

I recently ran into a flash overflow problem. After some code optimization, I freed up enough flash for the software to run successfully. I want to know how much flash memory my changes saved. How can I check the used and available flash memory? I would also like to know how much flash a particular function or file uses.
Below is some information about my development environment:
- AVR microcontroller with 64 KB RAM and 512 KB flash.
- Using FreeRTOS.
- Using the GNU C++ compiler.
- Using AVRATJTAGEICE for programming and Debugging.
Please let me know the solution.
Regards,
Jagadeep.
The size program from GNU binutils (shipped with GCC toolchains) is what you're looking for.
You can pass size the fully linked .elf file. By default, it outputs something like this:
$ size linked-file.elf
text data bss dec hex filename
11228 112 1488 12828 321c linked-file.elf
This is saying:
There are 11228 bytes in the .text "section" of this file. This is generally for functions.
There are 112 bytes of initialized data: global variables in the program with initial values.
There are 1488 bytes of uninitialized data: global variables without initial values.
dec is simply the sum of the previous 3 values: 11228 + 112 + 1488 = 12828.
hex is simply the hexadecimal representation of the dec value: 0x321c == 12828.
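For embedded systems, what usually matters is text + data for flash (the initial values of .data are stored in flash and copied to RAM at startup) and data + bss for RAM. text + data needs to be smaller than the flash size of your target device (or the space available on it).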
It is generally sufficient to watch the dec or text outputs of size to monitor the size of your compiled code over time. A large jump in size often indicates a poorly implemented new feature or constexpr values that are not being compiled away. (Don't forget -ffunction-sections and -fdata-sections, combined with the linker's --gc-sections, so that unused functions and data can be discarded.)
Note: For AVR, you'll want to use avr-size to check the linked size of AVR .elf files. avr-size takes an extra argument naming the target chip and will automatically calculate the percentage of flash used on that chip.
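size also works directly on intermediate object files.
This is particularly useful if you want to check the compiled size of individual functions: when compiling with -ffunction-sections, each function is placed in its own .text.<mangled name> section, so size -A lists it separately.
You should see something like this excerpt: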
$ size -A main.cpp.o
main.cpp.o :
section size addr
.group 8 0
.group 8 0
.text 0 0
.data 0 0
.bss 0 0
.text._Z8sendByteh 8 0
.text._ZN3XMC5IOpin7setModeENS0_4ModeE 64 0
.text._ZN7NamSpac6OptionIN5Clock4TimeEEmmEi 76 0
.text.Default_Handler 24 0
.text.HardFault_Handler 16 0
.text.SVC_Handler 16 0
.text.PendSV_Handler 16 0
.text.SysTick_Handler 28 0
.text._Z5errorPKc 8 0
.text._ZN7NamSpac5Motor2goEi 368 0
.text._ZN7NamSpac5Motor3getEv 12 0
.rodata.cst1 1 0
.text.startup.main 632 0
.text._ZN7NamSpac7Program3runEv 380 0
.text._ZN7NamSpac8Position4tickEv 24 0
.text.startup._GLOBAL__sub_I__ZN7NamSpac7displayE 292 0
.init_array 4 0
.bss._ZN5Debug9formatterE 4 0
.rodata._ZL10dispDigits 8 0
.bss.position 4 0
.bss.motorState 4 0
.bss.count 4 0
.rodata._ZL9diameters 20 0
.bss._ZN7NamSpac8diameterE 16 0
.bss._ZN5Debug3pinE 12 0
.bss._ZN7NamSpac7displayE 24 0
.rodata.str1.4 153 0
.rodata._ZL12dispSegments 32 0
.bss._ZL16diametersDisplay 10 0
.bss.loadAggregate 4 0
.bss.startCount 4 0
.bss._ZL15runtimesDisplay 10 0
.bss._ZN7NamSpac7runtimeE 16 0
.bss.startTime 4 0
.rodata._ZL8runtimes 20 0
.comment 111 0
.ARM.attributes 49 0
Total 2494
Please let me know the solution.
Sorry, there's no single solution! You have to go through everything that gets linked into your final ELF and decide whether it was linked intentionally or pulled in as an unwanted default.
Please let me know how can I check for used flash / available flash memory.
That primarily depends on your actual target hardware platform; you have to get your .text section to fit into the flash available there.
Also I want to how much flash is utilized by particular function/file.
The nm tool from GNU binutils provides detailed information about every (global) symbol in an ELF file and the space it occupies in its associated section (for example, nm -S -C --size-sort prints symbol sizes, demangled and sorted by size). You then just need to grep the results for particular functions, classes, or namespaces (preferably demangled!) and accumulate the outputs, filtered by section type and symbol, for analysis.
That's the approach I've been using for a little tool called nmalyzr. Sorry to say, the version currently in the Git repo doesn't really work as intended (I have working versions that haven't been pushed yet).
In general, it's a good strategy to hunt for code that contains #include <iostream> statements (whether or not std::cout and the like are used, the static instances are pulled in!), or for unwanted newlib/libstdc++ dependencies such as default exception handling.
Use the size command from binutils on the generated ELF file. Since you seem to be using an AVR chip, use avr-size.
To get the size of individual functions, use the nm command from binutils (avr-nm for AVR chips).

Linux Network Interface Usage Monitoring in C/C++

I am in a situation where I am extremely bandwidth limited and must devote most of the bandwidth to transferring one type of measurement data. Sometimes I will be sending out lots of this measurement data and other times I will just be waiting for events to occur (all of this is over a TCP socket).
I would like to be able to stream out the full data capture file (different than the measurements) in the background at a speed that is inversely proportional to the amount of measurements that I am sending back.
I am looking for a way to monitor how many bytes are being sent out of the network interface, in much the same way as the System Monitor on Ubuntu. The System Monitor source code relies on GNOME libraries, and since my program runs on an embedded device, I would like to keep the number of external libraries small. Does anybody know of a way to do this in C/C++ without many additional libraries on a standard Linux distribution?
One of the simplest ways is to parse the file: /proc/net/dev
Mine contains:
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo: 44865 1431 0 0 0 0 0 0 44865 1431 0 0 0 0 0 0
eth0:150117850 313734 0 0 0 0 0 0 34347178 271210 0 0 0 0 0 0
pan0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
You could then write a parser that uses nothing other than the C/C++ libraries.
Bytes transmitted and received are also accessible via the
/sys/class/net/eth0/statistics/tx_bytes and /sys/class/net/eth0/statistics/rx_bytes files (substitute your interface name for eth0):
$ cat /sys/class/net/net1/statistics/rx_bytes
1055448
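Use netlink sockets with the rtnetlink interface; an RTM_GETLINK request returns per-interface statistics as struct rtnl_link_stats (IFLA_STATS) or struct rtnl_link_stats64 (IFLA_STATS64), which mirror the kernel's struct net_device_stats counters.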