why does use "perf stat -B -e branches,branch-misses ./a.out" to get one process branch prediction miss is always zero in centos7 [duplicate]

why does use "perf stat -B -e branches,branch-misses ./a.out" to get one process branch prediction miss is always zero in centos7 [duplicate] - c++

I want to evaluate the performance of one process by "branch-misses" hardware event. But when I used perf stat to get "branch-misses" data, it always return 0 just because my os is in KVM.
Because it's trouble for me to get one real machine to do the test. So I want to know that is there any method to get "branch-misses" by "perf stat" when I am in KVM.
I'm really need your help. Thanks a lot.

Related

HTCondor - Partitionable slot not working

I am following the tutorial on
Center for High Throughput Computing and Introduction to Configuration in the HTCondor website to set up a Partitionable slot. Before any configuration I run
condor_status
and get the following output.
I update the file 00-minicondor in /etc/condor/config.d by adding the following lines at the end of the file.
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=4
SLOT_TYPE_1_PARTITIONABLE = TRUE
and reconfigure
sudo condor_reconfig
Now with
condor_status
I get this output as expected. Now, I run the following command to check everything is fine
condor_status -af Name Slotype Cpus
and find slot1#ip-172-31-54-214.ec2.internal undefined 1 instead of slot1#ip-172-31-54-214.ec2.internal Partitionable 4 61295 that is what I would expect. Moreover, when I try to summit a job that asks for more than 1 cpu it does not allocate space for it (It stays waiting forever) as it should.
I don't know if I made some mistake during the installation process or what could be happening. I would really appreciate any help!
EXTRA INFO: If it can be of any help have have installed HTCondor with the command
curl -fsSL https://get.htcondor.org | sudo /bin/bash -s – –no-dry-run
on Ubuntu 18.04 running on an old p2.xlarge instance (it has 4 cores).
UPDATE: After rebooting the whole thing it seems to be working. I can now send jobs with different CPUs requests and it will start them properly.
The only issue I would say persists is that Memory allocation is not showing properly, for example:
But in reality it is allocating enough memory for the job (in this case around 12 GB).
If I run again
condor_status -af Name Slotype Cpus
I still get something I am not supposed to
But at least it is showing the correct number of CPUs (even if it just says undefined).

What is the output of condor_q -better when the job is idle?

Get inode for lookup in /proc/net/udp

I am trying to programmatically track the amount of data in my receive buffer. I am receiving UDP data. After doing some research it seems that the only way to do this in Linux is to look at /proc/net/udp. This seems like a good solution until I realized that two applications could be listening to the same multicast group and I need to tell them apart. It seems that I am supposed to do this by determining what my inode is.
I spend some time looking into this and there are suggestions that sockfd_lookup or sock_from_file is the way to go but on my CentOS Linux machine, these functions do not seem to be available.
Can someone please help me to figure out which line in /proc/net/udp belongs to my application?
I started using the ioctl (handle, FIONREAD, &bytesInBuffer) call only to discover that in Linux this only returns the size of the first datagram packet in the buffer.
Google seems to suggest that the sockfd_lookup call can be used to get the inode but a grep in my /usr/local/include/ does not return these functions.
My linux/net.h seems pretty bare-bone compared to some I can find on google which includes structs like "socket" which has the sock member which I believe has the inode information. My linux/net.h on CentOs only is 58 lines long and has only a few #defines and an enum.

after a bit of fiddling I noticed that readlink("/proc/self/fd/$fd") (under Linux 5.3) gives me something like:
socket:[3753088]
back. I can parse this and use the resulting digits to look up the relevant line in /proc/net/udp:
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
2867: 00000000:0BB8 00000000:0000 07 00000000:00000000 00:00000000 00000000 1000 0 3753088 2 000000003ae8e911 0
that said, I don't understand why you'd want to do this, but never mind!
I'm also not sure why you don't just look up by sock&peer name, which might be easier

Not seeing performance increase running django test /w postgresql on ramdrive

For a django project, the tests are running slower and slower as the models grow and the number of migrations as well. A single unittest i have timed now at 200s.
I followed this post: http://tech.marksblogg.com/test-django-on-ram-drive.html to work on a ramdrive only for the testing but strangly enough i'm not seeing any improvement.... So i expect something is not going as it should...
I have debugged some and did see the TABLESPACE statements being generated towards postgres, for instance:
... CREATE TABLE ""django_content_type"" (""id"" serial NOT NULL PRIMARY KEY USING INDEX TABLESPACE ""ram_disk"", ""name"" varchar(100) NOT NULL, ""app_label"" varchar(100) NOT NULL, ""model"" varchar(100) NOT NULL) TABLESPACE ""ram_disk""",,,,,,,,,"
Could it be that postgresql rejects it? And how can i test/see if the ram drive is actually being used?
Paul

The advice given in that blog is pretty poor. Don't do that. Not only is it unsafe if you have anything else on your database that you care about, but it's also not going to address WAL and fsync costs, so it won't even be all that much faster.
A ramdisk offers little to no benefit on a modern virtual memory system. You're better off just using postgres as normal, with durability protections turned off for testing purposes.
See Optimise PostgreSQL for fast testing for tips on that.
If you must use a ramdisk / tempfs for some reason, initdb a whole new postgres instance there when you bring up your system. You'll get better results and it's safer.

Thanks for that link.
I don't really need the ramdrive, i just eager for performance increase on running my tests.
Turning off fsync=off and full_page_writes does not affect performance much:
python manage.py test -v3 --noinput 192,49s user 0,69s system 92% cpu 3:28,44 total
vs
python manage.py test -v3 --noinput 200,73s user 0,71s system 94% cpu 3:32,38 total
I think actually it's my ssd driver that seems to get slower and slower? In the post about ramdrive there is a simple performance check, when i run on against my ssd:
dd if=/dev/zero \
of=/tmp/benchmark \
conv=fdatasync \
bs=4k \
count=100000 \
&& rm -f /tmp/benchmark
409600000 bytes (410 MB) copied, 11,6737 s, 35,1 MB/s
versus the same test against the ram driver:
409600000 bytes (410 MB) copied, 0,16099 s, 2,5 GB/s
Actually i expect more than 35MB/s....
Could also be ubuntu/driver issue maybe?
Paul

Linux - Detecting idleness

I need to detect when a computer is idle for a certain time period. My definition of idleness is:
No users logged in, either by remote methods or on the local machine
X server inactivity, with no movement of mouse or key presses
TTY keyboard inactivity (hopefully)
Since the majority of distros have now moved to logind, I should be able to use its DBUS interface to find out if users are logged in, and also to monitor logins/logouts. I have used xautolock to detect X idleness before, and I could continue using that, but xscreensaver is also available. Preferably however I want to move away from any specific dependencies like the screensaver due to different desktop environments using different components.
Ideally, I would also be able to base idleness on TTY keyboard inactivity, however this isn't my biggest concern. According to this answer, I should be able to directly query the /dev/input/* interfaces, however I have no clue how to go about this.
My previous attempts at making such a monitor have used Bash, due to the ease of changing a plain text script file, howver I am happy using C++ in case more advanced methods are required to accomplish this.

From a purely shell standpoint (since you tagged this bash), you can get really close to what you want.
#!/bin/sh
users_are_logged_in() {
who |grep -q .
return $?
}
x_is_blanked() {
local DISPLAY=:0
if xscreensaver-command -time |grep -q 'screen blanked'; then
return 0 # we found a blanked xscreensaver: return true
fi
# no blanked xscreensaver. Look for DPMS modes
xset -q |awk '
/DPMS is Enabled/ { dpms = 1 } # DPMS is enabled
/Monitor is On$/ { monitor = 1 } # The monitor is on
END { if(dpms && !monitor) { exit 0 } else { exit 1 } }'
return $? # true when DPMS is enabled and the monitor is not on
}
nobody_here() {
! users_are_logged_in && x_is_blanked
return $?
}
if nobody_here; then
sleep 2m
if nobody_here; then
# ...
fi
fi
This assumes that a user can log in in two minutes and that otherwise, there is no TTY keyboard activity.
You should verify that the who |grep works on your system (i.e. no headers). I had originally grepped for / but then it won't work on FreeBSD. If who has headers, maybe try [ $(who |grep -c .) -gt 1 ] which will tell you that the number of lines that who outputs is more than one.
I share your worry about the screensaver part; xscreensaver likely isn't running in the login manager (any other form of X would involve a user logged in, which who would detect), e.g. GDM uses gnome-screensaver, whose syntax would be slightly different. The DPMS part may be good enough, giving a far larger buffer for graphical logins than the two minutes for console login.
Using return $? in the last line of a function is redundant. I used it to clarify that we're actually using the return value from the previous line. nobody_here short circuits, so if no users are logged in, there is no need to run the more expensive check for the status of X.
Side note: Be careful about using the term "idle" as it more typically refers to resource (hardware, that is) consumption (e.g. CPU load). See the uptime command for load averages for the most common way of determining system (resource) idleness. (This is why I named my function nobody_here instead of e.g. is_idle)

Gameboy emulator testing strategies?

I'm writing a gameboy emulator, and am struggling with making sure opcodes are emulated correctly. Certain operations set flag registers, and it can be hard to track whether the flag is set correctly, and where.
I want to write some sort of testing framework, but thought it'd be worth asking here for some help. Currently I see a few options:
Unit test each and every opcode with several test cases. Issues are there are 256 8 bit opcodes and 50+ (can't remember exact number) 16 bit opcodes. This would take a long time to do properly.
Write some sort of logging framework that logs a stacktrace at each operation and compares it to other established emulators. This would be pretty quick to do, and allows a fairly rapid overview of what exactly went wrong. The log file would look a bit like this:
...
PC = 212 Just executed opcode 7c - Register: AF: 5 30 BC: 0 13 HL: 5 ce DE: 1 cd SP: ffad
PC = 213 Just executed opcode 12 - Register: AF: 5 30 BC: 0 13 HL: 5 ce DE: 1 cd SP: ffad
...
Cons are I need to modify the source of another emulator to output the same form. And there's no guarantee the opcode is correct as it assumes the other emulator is.
What else should I consider?
Here is my code if it helps: https://github.com/dbousamra/scalagb

You could use already established test roms. I would recommend Blargg's test roms. You can get them from here: http://gbdev.gg8.se/files/roms/blargg-gb-tests/.

To me the best idea is the one you already mentioned:
take an existing emulator that is well known and you have the source code. let's call it master emulator
take some ROM that you can use to test
test these ROMs in the emulator that is known to work well.
modify the master emulator so it produces log while it is running for each opcode that it executes.
do the same in your own emulator
compare the output
I think this one has more advantage:
you will have the log file from a good emulator
the outcome of the test can be evaluated much faster
you can use more than one emulator
you can go deeper later like putting memory to the log and see the differences between the two implementations.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js