There is a method in our codebase which used to work fine, but no longer does (without any modification to the method):
void XXX::setCSVFileName()
{
    // get current working directory
    char the_path[1024];
    getcwd(the_path, 1023);
    printf("current dir: %s\n", the_path);
    std::string currentPath(the_path);
    std::string currentPathTmp = currentPath + "/tmp_" + pathSetParam->pathSetTravelTimeTmpTableName;
    std::string cmd = "mkdir -p " + currentPathTmp;
    if (system(cmd.c_str()) == 0) // stops here
    {
        csvFileName = currentPathTmp + "/" + pathSetParam->pathSetTravelTimeTmpTableName + ".csv";
    }
    //...
}
I tried to debug it and found the culprit line to be if (system(cmd.c_str()) == 0). I put a breakpoint on that line and tried to step over it; it just stays there.
The value of cmd as debugger shows is:
Details:{static npos = , _M_dataplus =
{> = {<__gnu_cxx::new_allocator> = {}, }, _M_p = 0x306ae9e78 "mkdir -p
/home/fm-simmobility/vahid/simmobility/dev/Basic/tmp_xuyan_pathset_exp_dy_traveltime_tmp"}}
I don't know what system() is doing, but top shows my application at around 100% CPU usage.
Have you ever hit such a situation?
IMPORTANT UPDATE
As usual, I started reverting changes in my code one by one, back to the state prior to the problem. Surprisingly, I found the problem (but not the solution... yet).
I had added -pg to my compilation options to enable gprof, and that is what caused the issue.
Maybe you have some knowledge of why gprof doesn't like system() or mkdir?
thanks
You said in a comment on your other question that you needed to use gprof to support the results generated by your own profiler.
In other words, you want to write a profiler, compare it to gprof, and you're questioning whether the -pg flag is making system() hang.
I'm saying forget about the -pg flag. All that does is put call-counting code for gprof in the functions the compiler sees.
If I were you I would find something better to compare your profiler to.
Remember the typical reason why people use a profiler is to find speedups,
and they may think collecting measurements will help them do that.
It doesn't.
What it does instead is convince them there are no speedups to be found.
(They ask questions like "new is taking 5% of the time, and that's my bottleneck, how can I speed it up?")
That's what gprof has done for us.
Here's a table of profiler features, from poor to better to best:
feature                       | gprof | perf | zoom | pausing
samples program counter       |   X   |  X   |  X   |   X
show self % by function       |   X   |  X   |  X   |   X
show inclusive % by function  |       |  X   |  X   |   X
samples stack                 |       |  X   |  X   |   X
detects extra calls           |       |  X   |  X   |   X
show self % by line           |       |  X   |  X   |   X
show inclusive % by line      |       |  ?   |  X   |   X
handles recursion properly    |       |  ?   |  X   |   X
samples on wall-clock time    |       |      |  X   |   X
lets you examine samples      |       |      |      |   X
The reason these are important is that speedups are really good at hiding from profilers:
If % by line not shown, speedup may be anywhere in a large function.
If inclusive % not shown, extraneous calls are not seen.
If samples not taken on wall-clock time, extraneous I/O or blocking not seen.
If hot-path is shown, speedups can hide on either side of it.
If call-graph is shown, speedups can hide in it by not being localized to A calls B, such as by a "tunnel" function.
If flame-graph is shown, speedups can hide in it by not aggregating samples that could be removed.
But they can't hide from simply examining stack samples.
P.S. Here are some examples of how speedups can hide from profilers.
If the profiler shows a "hot path", it only shows a small subset of the stack samples, so it can only expose small problems. But there could be a large problem that would become evident by comparing stack samples for similarity, not equality.
Speedups can also hide in call graphs. In one pattern (illustrated with images in the original answer), A1 always calls C2 and A2 always calls C1, but that fact is obscured by a "tunnel" function B (which might be multiple layers deep); a human reading the raw call stacks recognizes the pattern easily.
In another pattern, the fact that A always calls C is obscured by A calling any of a number of Bi functions (possibly over multiple layers) that then call C. Again, the pattern is easily recognized in the call stacks themselves.
Another way speedups hide: the stack samples show a lot of time spent calling functions that have the same name but belong to different classes (and are therefore different functions), or that have different names but are related by a similar purpose. In a profiler these conspire to divide the time into small amounts, telling you there is nothing big going on. That's a consequence of people "looking for slow functions", which is actually a form of blinders.
Related
What does the error "ANOMALY: meaningless REX prefix used" mean? I have googled, and all the information I got was completely scattered, relating it to Java or AVG or Minecraft (because of Java).
However, I got this error in the console output of my Visual Studio console application after I merged several branches of my C++ OpenGL 4.0 graphics engine, and it suddenly popped up. I might have updated the AMD graphics driver between the times I wrote those branches, so that could be one source. After the error popped up, the depth buffer test was also suddenly disabled.
After using Clean and Rebuild in Visual Studio the error is gone, so I do not need help fixing it, but I would like to know what it means and what in general causes it. It makes me curious, as I have not found ANYTHING useful searching for this error.
Myria in the comments said:
It's referring to an x86-64 assembly instruction using a REX prefix byte when it didn't need to
To expand upon this, REX prefixes are ignored in a few different scenarios.
If the ModR/M field specifies other registers or an extended opcode.
If more than one REX prefix is used in an instruction (though I read on osdev.org that this is undefined).
If general formatting isn't followed. For example, the REX prefix must precede the opcode or escape-opcode byte, unless it is used in conjunction with a mandatory prefix, in which case REX can come right after the opcode/escape byte.
If you try to use the single byte form of INC/DEC in 64 bit mode.
It looks like this ANOMALY message shows up in a variety of contexts, from git to Java-related programs (maybe the one you are referencing), in which a new driver seems to have been the problem. The culprit: Raptr, which comes with AMD's Radeon drivers. In the Java post someone reported using a SAPPHIRE Radeon HD 5850, and on the next site I'll link you to, one person was using an AMD R9 390 and another a 380. In that context someone saw the message on the console of their 64-bit Win7 system. That person's site led me to a hooking library Raptr was using (which hooks into opengl32.dll) called mhook. I started digging through this "Windows API hooking library" and found this, starting on line 1230:
assert(X86Instruction->AddressSize >= 4);
if (X86Instruction->rex.w)
{
    X86Instruction->OperandSize = 8;
    X86Instruction->HasOperandSizePrefix = FALSE;
}
else if (X86Instruction->HasOperandSizePrefix)
{
    assert(X86Instruction->OperandSize == 2);
}
else if (X86Instruction->rex_b == REX_PREFIX_START)
{
    if (!Instruction->AnomalyOccurred)
    {
        if (!SuppressErrors) printf("[0x%08I64X] ANOMALY: meaningless REX prefix used\n", VIRTUAL_ADDRESS);
        Instruction->AnomalyOccurred = TRUE;
    }
    X86Instruction->rex_b = 0;
}
To summarize, this ANOMALY message occurs when software that decodes x86 instructions, like this Windows API hooking library, encounters a REX prefix that the CPU would ignore.
So there you have it, you were in all the right places. The mhook library even has a long list of Visual Studio files to ignore.
Additional note
I found this comment from the os2museum site to be a good clue to this whole mystery:
The Windows amd64 ABI requires that the first opcode of a function be at least 2 bytes in length. (I think this is so the function can be hotpatched.) Many times the first instruction is “push ” but the instruction has a 1-byte encoding! To comply with the ABI, a rex prefix is added to the instruction, making it 2 bytes — “rex push rbp” or “rex push rbx” or whatever. The compiler does this for you, but if you are writing a function in assembler, you need to remember the rule.
Other fun error messages (just a few of many!) in this particular hook library include
ANOMALY: Meaningless segment override
ANOMALY: REX prefix before legacy prefix 0x%02X\n
ANOMALY: Conflicting prefix\n
ANOMALY: Reached maximum prefix count %d\n
and my favorite:
ANOMALY: branch into the middle of an instruction\n
And just because I can't help myself, it might be worth noting these are the instructions that default to 64-bit operands:
+--------------+------------+-------------+
| CALL (near) | ENTER | Jcc |
+--------------+------------+-------------+
| JrCXZ | JMP (near) | LEAVE |
+--------------+------------+-------------+
| LGDT | LIDT | LLDT |
+--------------+------------+-------------+
| LOOP | LOOPcc | LTR |
+--------------+------------+-------------+
| MOV CR(n) | MOV DR(n) | POP reg/mem |
+--------------+------------+-------------+
| POP reg | POP FS | POP GS |
+--------------+------------+-------------+
| POPFQ | PUSH imm8 | PUSH imm32 |
+--------------+------------+-------------+
| PUSH reg/mem | PUSH reg | PUSH FS |
+--------------+------------+-------------+
| PUSH GS | PUSHFQ | RET (near) |
+--------------+------------+-------------+
I am teaching a class to intro C++ students and I want to design a lab that shows how recursive functions differ from iteration. My idea was to track the memory/call stack usage for both and display the difference. I was almost positive that I had done something similar when I did my degree, but can't remember. My experience doesn't lie in C/C++ so any guidance would be appreciated.
Update 1:
I believe I may have misrepresented my task. I had hoped to find a way to show how recursion increases the overhead/stack usage compared to iteration. I followed some suggested links and came up with the following script.
#!/bin/bash
loops=100
counter=0
total1=0
echo "Iteration"
while [ "$counter" -lt "$loops" ]; do
    "$1" &                  # Run the given command line in the background.
    pid=$!
    peak1=0
    echo -e "$counter.\c"
    # Sample the process's memory with pmap until it exits.
    while true; do
        #sleep 0.1
        sample="$(pmap "$pid" 2> /dev/null | tail -n1 | sed 's/[^0-9]*//g')" || break
        if [ -z "$sample" ]; then
            break           # the process has exited
        fi
        peak1=$(( sample > peak1 ? sample : peak1 ))
    done
    # echo "Peak: $peak1" 1>&2
    total1=$(( total1 + peak1 ))
    counter=$(( counter + 1 ))
done
The program implements a binary search, either iteratively or recursively. The idea is to get the average memory use of one version and compare it to that of the recursive version of the same program. This does not work: the iterative version often has a larger memory average than the recursive one, which fails to show my students that recursion has drawbacks. Therefore I am pretty sure I am doing something incorrect.
Is pmap not going to provide me with what I want?
Something like this, I think:
#include <stdio.h>

void recursive(char *ptop) {
    char dummy = 0;
    /* the stack grows downward on most platforms, so ptop - &dummy
       is the current stack depth in bytes */
    printf("stack size %td\n", ptop - &dummy);
    recursive(ptop);
}

void start(void) {
    char dummy = 0;
    recursive(&dummy);
}
until it crashes. (Compile without optimization; otherwise the compiler may turn the tail call into a loop and it will never overflow.)
On any platform that knows them (Linux), backtrace(3), or even better backtrace_symbols(3) and their companions should be of great help.
I have a code base (mostly C++) which is well tested and crash free. Mostly. A part of the code -- which is irreplaceable, hard to maintain or improve, and links against a binary-only library* -- causes all the crashes. These do not happen often, but when they do, the entire program crashes.
+----------------------+
| Shiny new sane |
| code base |
| |
| +-----------------+ | If the legacy code crashes,
| | | | the entire program does, too.
| | Legacy Code | |
| | * Crash prone * | |
| | int abc(data) | |
| +-----------------+ |
| |
+----------------------+
Is it possible to extract that part of the code into a separate program, start that from the main program, move the data between these programs (on Linux, OS X and, if possible, Windows), tolerate crashes in the child process and restart the child? Something like this:
+----------------+ // start,
| Shiny new sane | ------. // re-start on crash
| code base | | // and
| | v // input data
| | +-----------------+
| return | | |
| results <-------- | Legacy Code |
+----------------+ | * Crash prone * |
| int abc(data) |
(or not results +-----------------+
because abc crashed)
Ideally the communication would be fast enough so that the synchronous call to int abc(char *data) can be replaced transparently with a wrapper (assuming the non-crash case). And because of slight memory leaks, the legacy program should be restarted every hour or so. Crashes are deterministic, so bad input data should not be sent twice.
The code base is C++11 and C, notable external libraries are Qt and boost. It runs on Linux, OSX and Windows.
--
*: some of the crashes/leaks stem from this library which has no source code available.
Well, if I were you, I wouldn't start from here ...
However, you are where you are. Yes, you can do it. You are going to have to serialize your input arguments, send them, deserialize them in the child process, run the function, serialize the outputs, return them, and then deserialize them. Boost will have lots of useful code to help with this (see asio).
Global variables will make life much more "interesting". Does the legacy code use Qt? Qt probably won't like being split across two processes.
If you were using Windows only, I would say "use DCOM" - it makes this very simple.
Restarting is simple enough if the legacy is only used from one thread (the code which handles "return" just looks to see if it needs to restart, and kills the processes.) If you have multiple threads, then the shiny code will need to check if a restart is required, block any further threads, wait until all calls have returned, restart the process, and then unblock everything.
Boost::interprocess looks to have everything you need for the communication - it's got shared memory, mutexes, and condition variables. Boost::serialization will do the job for marshalling and unmarshalling.
I have been trying to fully understand how memory works in the programs I write. I do not have any specific code; I just want a better general understanding of memory addresses and their contents. Does anyone know of a way that I can step through a given program and view the memory line by line?
for example:
int main(){
int x;
x = 5;
}
After line 1 (int x;) the table would look like this:
Address | Contents | Variable
AB68 9034 | FF847392(Garbage) | x
After line 2 (x = 5;) the table would look like this:
AB68 9034 | 5 | x
I want to look at code more complicated than this sample, and I could write a function to print the memory after each part of the code if nothing exists. I am just curious whether there is something built in that would do this automatically.
I would like to change my stack size to allow a project with many non-tail-recursive functions to run on larger data. To do so, I tried to set OCAMLRUNPARAM="l=xxx" for varying values of xxx (in the range 0 through 10G), but it did not have any effect. Is setting OCAMLRUNPARAM even the right approach?
In case it is relevant: The project I am interested in is built using OCamlMakefile, target native-code.
Here is a minimal example where simply a large list is created without tail recursion. To quickly check whether the setting of OCAMLRUNPARAM has an effect, I compiled the program stacktest.ml:
let rec create l =
match l with
| 0 -> []
| _ -> "00"::(create (l-1))
let l = create (int_of_string (Sys.argv.(1)))
let _ = print_endline("List of size " ^ string_of_int (List.length l) ^ " created.")
using the command
ocamlbuild stacktest.native
and found out roughly at which length of the list a stack overflow occurs by (more or less) binary search with the following bash script foo.sh:
#!/bin/bash
export OCAMLRUNPARAM="l=$1"
increment=1000000
length=1
while [[ $increment -gt 0 ]] ; do    # -gt: inside [[ ]], > compares strings
while [[ $(./stacktest.native $length) ]]; do
length=$(($length+$increment))
done
length=$(($length-$increment))
increment=$(($increment/2))
length=$(($length+$increment))
done
length=$(($length-$increment))
echo "Largest list without overflow: $length"
echo $OCAMLRUNPARAM
The results vary between runs of this script (and the intermediate results are not even consistent within one run, but let's ignore that for now), but they are similar no matter whether I call
bash foo.sh 1
or
bash foo.sh 1G
i.e. whether the stack size is set to 1 or 2^30 words.
Changing the stack limit via OCAMLRUNPARAM works only for bytecode executables, which are run by the OCaml runtime interpreter. A native program is managed by the operating system and executed directly on the CPU, so to change its stack limit you need to use the facilities provided by your operating system.
For example, on Linux there is the ulimit command that handles many process parameters, including the stack limit. Add the following to your script
ulimit -s $1
And you will see that the result is changing.