I have a situation in which I need to interpose function calls made by a third-party static library to an iOS system framework (a shared library). Chances of cost-effective or timely maintenance support from the vendor of the static library are slim to non-existent.
In effect changing the calling sequence from:
+-------------+ +-------------+ +---------------------+
| Application | ---> | libVendor.a | ----> | FrameworkA (.dylib) |
+-------------+ +-------------+ +---------------------+
to:
+-------------+ +-------------+ +------------+ +---------------------+
| Application | ---> | libVendor.a | ----> | Interposer | ----> | FrameworkA (.dylib) |
+-------------+ +-------------+ +------------+ +---------------------+
The orthodox solutions to this problem are either persuading the run-time linker to load a different library in place of FrameworkA, or loading libraries with dlopen(). Neither of these is an option on iOS.
One solution that does work is using sed to rename symbols in the symbol table of libVendor.a.
Suppose I want to interpose calls to FrameworkA_Foo() in FrameworkA made by functions in libVendor.a:
sed s/FrameworkA_Foo/InterposeA_Foo/g < libVendor.a > libInterposedVendor.a
And then interpose it with:
void InterposeA_Foo()
{
    // do some stuff here
    // ....
    // then (maybe) forward
    FrameworkA_Foo();
}
This works just so long as the length of the symbol names remains the same.
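For the forwarding call to keep reaching the real framework, only libVendor.a is run through sed; the interposer's translation unit is compiled normally, so its reference to FrameworkA_Foo still binds to FrameworkA at link time. In other words, the file defining InterposeA_Foo() just needs the original declaration in scope (a sketch; in practice you would include the framework's own header):

/* in the interposer's source file, which is NOT run through sed */
extern void FrameworkA_Foo(void);   /* still the original, un-renamed symbol */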
Whilst this approach works, it lacks elegance and feels fairly hacky. Do better solutions exist?
Other approaches considered:
Changing the link order so that the interposing function links against libVendor.a rather than the framework: Apple's linkers (unlike those on most UNIX platforms) resolve symbols in libraries recursively, and the order in which libraries are presented on the command line makes little difference.
Linker scripts: Not supported by lld
mach-o object-file editing tools: Found nothing that worked
[For clarity, this is a C (rather than Objective-C) API and the toolchain is clang/LLVM. Using an Apple-supplied GCC is not an option due to deprecation and lack of C++11 support.]
Thanks for the reply; I have edited my question.
I want to distinguish whether the rpc_address belongs to the local host or to a remote node, so I added some logging in event_stats.cc:
const auto &rpc_address = CoreWorkerProcess::GetCoreWorker().GetRpcAddress();
RAY_LOG(INFO) << rpc_address.SerializeAsString() << "\n\n";
Then I added core_worker_lib to ray_common's dependencies in BUILD.bazel, but I get the following error:
ERROR: /home/Ray/ray/BUILD.bazel:2028:11: in cc_library rule //:gcs_client_lib: cycle in dependency graph:
//:ray_pkg
//:cp_raylet
//:raylet
//:raylet_lib
.-> //:gcs_client_lib
| //:gcs_service_rpc
| //:pubsub_lib
| //:pubsub_rpc
| //:grpc_common_lib
| //:ray_common
| //:core_worker_lib
`-- //:gcs_client_lib
So my questions are:
How can I use CoreWorkerProcess in event_stats.cc? (One possible way to avoid the cycle is sketched below.)
When I change the C++ code of event_stats.cc, I use "pip3 install -e . --verbose" to recompile the project, but it is too slow. Is there a faster way to recompile after altering Ray's C++ code?
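For illustration only, here is a minimal sketch of one common way to break such a cycle without making ray_common depend on core_worker_lib: the lower layer exposes a registration hook, and core_worker_lib (which already depends on ray_common) injects the address at startup. The names RegisterRpcAddressProvider and RpcAddressForLogging are hypothetical, not part of Ray.

// event_stats.cc side (lives in ray_common) -- hypothetical sketch
#include <functional>
#include <string>
#include <utility>

namespace {
std::function<std::string()> rpc_address_provider;  // filled in by a higher layer
}

void RegisterRpcAddressProvider(std::function<std::string()> provider) {
  rpc_address_provider = std::move(provider);
}

std::string RpcAddressForLogging() {
  // Empty until a provider is registered, so no reverse dependency is needed.
  return rpc_address_provider ? rpc_address_provider() : std::string();
}

The core-worker side would then call RegisterRpcAddressProvider during its own initialization, passing a lambda that returns CoreWorkerProcess::GetCoreWorker().GetRpcAddress().SerializeAsString().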
I'm looking for the most portable and most organized way to include headers in C++. I'm making a game, and right now, my project structure looks like this:
game
| util
| | foo.cpp
| | foo.h
| ...
game-client
| main.cpp
| graphics
| | gfx.cpp
| | gfx.h
| ...
game-server
| main.cpp
| ...
Say I want to include foo.h from gfx.cpp. As far as I know, there are 3 ways to do this:
#include "../../game/util/foo.h. This is what I'm currently doing, but it gets messier the deeper into the folder structure I go.
#include "foo.h". My editor (Xcode) compiles fine with just this, but I'm not sure about other compilers.
#include "game/util/foo.h", and adding the base directory to the include path.
Which one is the best? (most portable, most organized, scales the best with many folders, etc.)
I found the below approach most useful when you are dealing with a large code base.
Public headers
module_name/include/module_name/public_header.hpp
module_name/include/module_name/my_class.hpp
...
Private headers and source
module_name/src/something_private.cpp
module_name/src/something_private.hpp
module_name/src/my_class.cpp
Notes:
module_name is repeated to ensure that the module name appears whenever a public header from this library/module is included.
This improves readability and also avoids the time wasted locating a header when the same file name is used in multiple modules.
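For illustration, a minimal sketch of how the includes then look from gfx.cpp, assuming the util code were laid out this way (util/include/util/foo.hpp) and the build added each module's include/ directory to the header search path (e.g. -Iutil/include); the exact names are hypothetical:

// game-client/graphics/gfx.cpp
#include <util/foo.hpp>    // public header of the util module: always module-qualified
#include "gfx.h"           // headers private to this directory stay plain

Because the include line names the module, a reader can tell at a glance where foo.hpp lives, even if several modules ship a header with the same file name.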
What does the error ANOMALY: meaningless REX prefix used mean? I have googled it, and all the information I found was completely random: that it is related to Java, or AVG, or Minecraft (because of Java).
However, I got this error in the console output of my Visual Studio console application after I merged several branches of my C++ OpenGL 4.0 graphics engine; it suddenly popped up. I might have updated the AMD graphics driver between the times those branches were written, so that could be one source. After the error popped up, the depth buffer test was also suddenly disabled.
After a clean and rebuild in Visual Studio the error is now gone, so I do not need help fixing it, but I would like to know what it means and what in general causes this error. It makes me curious, as I have not found ANYTHING useful searching for it.
Myria in the comments said:
It's referring to an x86-64 assembly instruction using a REX prefix byte when it didn't need to
To expand upon this, REX prefixes are ignored in a few different scenarios.
If the ModR/M field specifies other registers or an extended opcode.
If more than one REX prefix is used in an instruction (though I read on osdev.org that this is undefined).
If general formatting isn't followed. For example, the REX prefix must immediately precede the opcode or escape opcode byte; if a mandatory prefix is used as well, REX comes right after the mandatory prefix (still immediately before the opcode/escape byte).
If you try to use the single-byte form of INC/DEC in 64-bit mode (a byte-level illustration follows).
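To make the last point concrete, here is a small byte-level illustration of my own (not taken from mhook):

/* In 32-bit mode, 0x40-0x47 encode "inc r32" and 0x48-0x4F encode "dec r32".
   In 64-bit mode those same bytes are the REX prefixes, so the one-byte
   INC/DEC forms no longer exist, and a decoder sees a lone 0x40 (all REX
   bits clear) as a prefix that changes nothing about the next instruction. */
unsigned char bytes[] = { 0x40, 0x90 };  /* 32-bit mode: inc eax ; nop           */
                                         /* 64-bit mode: a do-nothing REX prefix
                                            followed by nop                       */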
It looks like this ANOMALY message shows up in a variety of contexts, from git to Java-related programs (maybe the one you are referencing), in which a new driver seems to have been the problem. The culprit: Raptr, which comes with AMD's Radeon drivers. In the Java post someone reported using a SAPPHIRE Radeon HD 5850, and on the next site I'll link you to, one person was using an AMD R9 390 and another the 380. In that context someone saw the message on the console of their 64-bit Win7 system. That person's site took me through a hook Raptr was using (which attaches to opengl32.dll) called mhook. I started digging through this 'Windows API hooking library' and found this, starting on line 1230:
assert(X86Instruction->AddressSize >= 4);
if (X86Instruction->rex.w)
{
    X86Instruction->OperandSize = 8;
    X86Instruction->HasOperandSizePrefix = FALSE;
}
else if (X86Instruction->HasOperandSizePrefix)
{
    assert(X86Instruction->OperandSize == 2);
}
else if (X86Instruction->rex_b == REX_PREFIX_START)
{
    if (!Instruction->AnomalyOccurred)
    {
        if (!SuppressErrors) printf("[0x%08I64X] ANOMALY: meaningless REX prefix used\n", VIRTUAL_ADDRESS);
        Instruction->AnomalyOccurred = TRUE;
    }
    X86Instruction->rex_b = 0;
}
To summarize, this ANOMALY message occurs when software that decodes instructions, such as this Windows API hooking library, encounters a REX prefix the CPU would ignore.
So there you have it, you were in all the right places. The mhook library even has a long list of Visual Studio files to ignore.
Additional note: I found this comment from the os2museum site to be a good clue to this whole mystery:
The Windows amd64 ABI requires that the first opcode of a function be at least 2 bytes in length. (I think this is so the function can be hotpatched.) Many times the first instruction is “push ” but the instruction has a 1-byte encoding! To comply with the ABI, a rex prefix is added to the instruction, making it 2 bytes — “rex push rbp” or “rex push rbx” or whatever. The compiler does this for you, but if you are writing a function in assembler, you need to remember the rule.
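To see what that means in bytes (my own illustration, not part of the quoted comment or of mhook):

/* push rbp has a one-byte encoding, 0x55. Prefixing it with 0x40 -- a REX
   byte with W, R, X and B all clear -- changes nothing about the instruction
   but pads it to two bytes, satisfying the hotpatch rule. A strict decoder
   can reasonably flag that REX as "meaningless". */
unsigned char push_rbp[]     = { 0x55 };        /* push rbp                */
unsigned char rex_push_rbp[] = { 0x40, 0x55 };  /* rex push rbp, two bytes */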
Other fun error messages (just a few of many!) in this particular hook library include
ANOMALY: Meaningless segment override
ANOMALY: REX prefix before legacy prefix 0x%02X\n
ANOMALY: Conflicting prefix\n
ANOMALY: Reached maximum prefix count %d\n
and my favorite:
ANOMALY: branch into the middle of an instruction\n
And just because I can't help myself, it might be worth noting these are the instructions that default to 64-bit operands:
+--------------+------------+-------------+
| CALL (near) | ENTER | Jcc |
+--------------+------------+-------------+
| JrCXZ | JMP (near) | LEAVE |
+--------------+------------+-------------+
| LGDT | LIDT | LLDT |
+--------------+------------+-------------+
| LOOP | LOOPcc | LTR |
+--------------+------------+-------------+
| MOV CR(n) | MOV DR(n) | POP reg/mem |
+--------------+------------+-------------+
| POP reg | POP FS | POP GS |
+--------------+------------+-------------+
| POPFQ | PUSH imm8 | PUSH imm32 |
+--------------+------------+-------------+
| PUSH reg/mem | PUSH reg | PUSH FS |
+--------------+------------+-------------+
| PUSH GS | PUSHFQ | RET (near) |
+--------------+------------+-------------+
I have a code base (mostly C++) which is well tested and crash free. Mostly. A part of the code -- which is irreplaceable, hard to maintain or improve, and links against a binary-only library* -- causes all the crashes. They do not happen often, but when they do, the entire program crashes.
+----------------------+
| Shiny new sane |
| code base |
| |
| +-----------------+ | If the legacy code crashes,
| | | | the entire program does, too.
| | Legacy Code | |
| | * Crash prone * | |
| | int abc(data) | |
| +-----------------+ |
| |
+----------------------+
Is it possible to extract that part of the code into a separate program, start that from the main program, move the data between these programs (on Linux, OS X and, if possible, Windows), tolerate crashes in the child process and restart the child? Something like this:
+----------------+ // start,
| Shiny new sane | ------. // re-start on crash
| code base | | // and
| | v // input data
| | +-----------------+
| return | | |
| results <-------- | Legacy Code |
+----------------+ | * Crash prone * |
| int abc(data) |
(or not results +-----------------+
because abc crashed)
Ideally the communication would be fast enough so that the synchronous call to int abc(char *data) can be replaced transparently with a wrapper (assuming the non-crash case). And because of slight memory leaks, the legacy program should be restarted every hour or so. Crashes are deterministic, so bad input data should not be sent twice.
The code base is C++11 and C, notable external libraries are Qt and boost. It runs on Linux, OSX and Windows.
--
*: some of the crashes/leaks stem from this library which has no source code available.
Well, if I were you, I wouldn't start from here ...
However, you are where you are. Yes, you can do it. You are going to have to serialize your input arguments, send them, deserialize them in the child process, run the function, serialize the outputs, return them, and then deserialize them. Boost will have lots of useful code to help with this (see asio).
Global variables will make life much more "interesting". Does the legacy code use Qt? - that probably won't like being split into two processes.
If you were using Windows only, I would say "use DCOM" - it makes this very simple.
Restarting is simple enough if the legacy code is only used from one thread (the code which handles the "return" just checks whether a restart is needed and kills the process). If you have multiple threads, then the shiny code will need to check whether a restart is required, block any further threads, wait until all outstanding calls have returned, restart the process, and then unblock everything.
Boost.Interprocess looks to have everything you need for the communication - it has shared memory, mutexes, and condition variables. Boost.Serialization will do the job for marshalling and unmarshalling.
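A minimal, POSIX-only sketch of the wrapper idea (Linux/OS X; Windows would need CreateProcess and a different transport). The worker binary name ./legacy_worker and the length-prefixed wire format are assumptions; error handling, partial reads, the hourly restart and Boost-based serialization are left out.

// Parent-side wrapper: starts the legacy worker as a child process and
// replaces the synchronous abc(data) call with a pipe round-trip.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdint>
#include <string>

struct LegacyWorker {
    pid_t pid = -1;
    int to_child = -1, from_child = -1;

    void start() {
        int in[2], out[2];
        pipe(in); pipe(out);
        pid = fork();
        if (pid == 0) {                       // child: run the crash-prone code
            dup2(in[0], 0); dup2(out[1], 1);  // stdin/stdout become the pipes
            execl("./legacy_worker", "legacy_worker", (char*)0);
            _exit(127);                       // exec failed
        }
        close(in[0]); close(out[1]);          // parent keeps the other ends
        to_child = in[1]; from_child = out[0];
    }

    // Returns false if the child died (crash); the caller can then restart
    // it via start() and make sure the same bad input is not sent again.
    bool call(const std::string& data, std::string& result) {
        uint32_t len = data.size();
        write(to_child, &len, sizeof len);
        write(to_child, data.data(), len);
        if (read(from_child, &len, sizeof len) != (ssize_t)sizeof len) {
            int status;
            waitpid(pid, &status, 0);         // reap the crashed child
            return false;
        }
        result.resize(len);
        read(from_child, &result[0], len);
        return true;
    }
};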
There is a method in our code base which used to work fine, but not any more (without any modification to the method):
void XXX::setCSVFileName()
{
    // get current working directory
    char the_path[1024];
    getcwd(the_path, 1023);
    printf("current dir: %s \n", the_path);
    std::string currentPath(the_path);
    std::string currentPathTmp = currentPath + "/tmp_" + pathSetParam->pathSetTravelTimeTmpTableName;
    std::string cmd = "mkdir -p " + currentPathTmp;
    if (system(cmd.c_str()) == 0) // stops here
    {
        csvFileName = currentPathTmp + "/" + pathSetParam->pathSetTravelTimeTmpTableName + ".csv";
    }
    //...
}
I tried to debug it and found the culprit line to be if (system(cmd.c_str()) == 0). I put a breakpoint on that line and tried to step over it; it just stays there.
The value of cmd as the debugger shows it is:
Details:{static npos = , _M_dataplus =
{> = {<__gnu_cxx::new_allocator> = {}, }, _M_p = 0x306ae9e78 "mkdir -p
/home/fm-simmobility/vahid/simmobility/dev/Basic/tmp_xuyan_pathset_exp_dy_traveltime_tmp"}}
I don't know what system() is doing, but top shows my application at around 100% CPU usage.
Have you ever hit such a situation?
IMPORTANT UPDATE
As usual, I started reverting the changes in my code one by one, back to the state prior to the problem. Surprisingly, I found the problem (but not the solution... yet).
I had added -pg to my compilation options to enable gprof, and that is what caused the issue.
Maybe you have some knowledge of why gprof doesn't like system() or mkdir?
Thanks
You said in a comment on your other question that you needed to use gprof to support the results generated by your own profiler.
In other words, you want to write a profiler, and compare it to gprof, and you're questioning if the -pg flag is making system hang.
I'm saying forget about the -pg flag. All that does is put call-counting code for gprof in the functions the compiler sees.
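If you want to see exactly what that inserted code is, a small experiment of my own (not from the original post) makes it visible:

/* tiny.cpp -- compile twice and diff the assembly:
 *     g++ -S     tiny.cpp -o plain.s
 *     g++ -pg -S tiny.cpp -o profiled.s
 * The -pg version gains a call to the profiling hook (mcount or __fentry__,
 * depending on the toolchain) at the entry of each compiled function; that
 * hook is what gathers gprof's call counts. */
int add(int a, int b)
{
    return a + b;
}

int main()
{
    return add(1, 2);
}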
If I were you I would find something better to compare your profiler to.
Remember the typical reason why people use a profiler is to find speedups,
and they may think collecting measurements will help them do that.
It doesn't.
What it does instead is convince them there are no speedups to be found.
(They ask questions like "new is taking 5% of the time, and that's my bottleneck, how can I speed it up?")
That's what gprof has done for us.
Here's a table of profiler features, from poor to better to best:
                              | gprof | perf | zoom | pausing |
samples program counter       |   X   |  X   |  X   |    X    |
show self % by function       |   X   |  X   |  X   |    X    |
show inclusive % by function  |       |  X   |  X   |    X    |
samples stack                 |       |  X   |  X   |    X    |
detects extra calls           |       |  X   |  X   |    X    |
show self % by line           |       |  X   |  X   |    X    |
show inclusive % by line      |       |  ?   |  X   |    X    |
handles recursion properly    |       |  ?   |  X   |    X    |
samples on wall-clock time    |       |      |  X   |    X    |
let you examine samples       |       |      |      |    X    |
The reason these are important is that speedups are really good at hiding from profilers:
If % by line not shown, speedup may be anywhere in a large function.
If inclusive % not shown, extraneous calls are not seen.
If samples not taken on wall-clock time, extraneous I/O or blocking not seen.
If hot-path is shown, speedups can hide on either side of it.
If call-graph is shown, speedups can hide in it by not being localized to A calls B, such as by a "tunnel" function.
If flame-graph is shown, speedups can hide in it by not aggregating samples that could be removed.
But they can't hide from simply examining stack samples.
P.S. Here are some examples of how speedups can hide from profilers.
If the profiler shows a "hot-path", it only shows a small subset of the stack samples, so it can only show small problems.
But there could be a large problem that would become evident by comparing stack samples for similarity rather than equality.
Speedups can also hide in call graphs, as when the fact that A1 always calls C2 and A2 always calls C1 is obscured by a "tunnel function" B (which might be several layers deep). The call stacks make the pattern easy for a human to recognize; a sketch of that shape follows.
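A tiny, purely hypothetical C++ illustration of that shape:

#include <cstdio>

void C1() { std::puts("C1: expensive work"); }
void C2() { std::puts("C2: expensive work"); }

void B(int which)            // the "tunnel" (could be several layers deep)
{
    if (which == 1) C2();
    else            C1();
}

void A1() { B(1); }          // every stack sample here reads A1 -> B -> C2
void A2() { B(2); }          // every stack sample here reads A2 -> B -> C1

int main() { A1(); A2(); return 0; }

A call graph reports only the edges A1->B, A2->B, B->C1 and B->C2, so the fact that A1's time always ends up in C2 (and A2's in C1) is invisible there, yet it jumps out when you read whole stack samples.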
In another case, the fact that A always calls C is obscured by A calling any of a number of Bi functions (possibly over multiple layers) that then call C. Again, the pattern is easily recognized in the call stacks.
Another way is if the stack samples show that a lot of time is spent calling functions that have the same name but belong to different classes (and are therefore different functions), or have different names but are related by a similar purpose.
In a profiler these conspire to divide the time into small amounts, telling you there is nothing big going on.
That's a consequence of people "looking for slow functions", which is actually a form of blinders.