clang++ Name mangling [duplicate] - c++

I have c++filt command to demangle a symbol, what is the tool to do the opposite and mangle a symbol name?
This would be useful if I wanted to call dlsym() on a mangled C++ function name. I'd rather not hard-code the mangled name, since it could change over time with new compiler versions or different compiler brands, or, as at present, when compiling for multiple platforms.
Is there a programmatic way to get the string that represents a C++ function at runtime so that the code is compiler-independent? One possible way would be to call a utility at compile time that performs the name mangling for the compiler being used and inserts the appropriate mangled C++ symbol name into a string for dlsym() to use.
Here is the closest to a solution I've found on this site which is accomplished by using a fixed C style name to indirect to C++ symbols that are defined in the library you wish to dlsym(), but if you do not have control over what that library provides, this is not an option.
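For reference, the C-style indirection mentioned above can be sketched roughly as follows. This is only an illustration of the idea, assuming you control the library source: doit_c is a hypothetical wrapper name, and the body of doit is a placeholder.

```cpp
#include <cassert>
#include <cstring>

// An ordinary C++ function; its symbol name is mangled and depends on the compiler's ABI.
// (Placeholder body: returns the string length when `is` is true, otherwise 0.)
int doit(const char* toto, bool is) {
    return is ? static_cast<int>(std::strlen(toto)) : 0;
}

// Hypothetical fixed entry point. With extern "C" the symbol is not C++-mangled,
// so a client can look it up by the plain name "doit_c".
extern "C" int doit_c(const char* toto, int is) {
    assert(toto != nullptr);
    return doit(toto, is != 0);
}
```

A client would then call dlsym(handle, "doit_c") and cast the result to int (*)(const char*, int), without ever needing the mangled name of doit.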

That's how g++ mangles names. You might implement those mangling rules in your program.
Another (crazy) solution would be to list all of the symbols in the library you want to use (it's not so difficult if you understand the format), demangle them all, and search for your function's name in that list. The advantage of this method is that demangling is easier, as there is a function call to do it: abi::__cxa_demangle, from the cxxabi.h header.
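A minimal sketch of that demangle-and-search idea, assuming a GCC or Clang toolchain that provides cxxabi.h. The helper names demangle and find_mangled are made up for illustration, and the symbol list is assumed to have been read from the library's symbol table by some other means.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <vector>

// Demangle one symbol; returns an empty string if __cxa_demangle rejects it.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && raw != nullptr) ? raw : "";
    std::free(raw);
    return result;
}

// Return the mangled symbol whose demangled form equals `wanted`, or "" if none matches.
std::string find_mangled(const std::vector<std::string>& symbols,
                         const std::string& wanted) {
    for (const std::string& sym : symbols) {
        if (demangle(sym.c_str()) == wanted) return sym;
    }
    return "";
}
```

Note that the wanted string has to be spelled the way the demangler prints it, for example "doit(char const*, bool)" rather than "doit(const char*, bool)".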

You may be able to get what you want by looking at the symbol table
of the .so you are looking at: Someone else answered this already
Returning a shared library symbol table.
However, if there are too many symbols ... that may not work.
So here's a crazy idea. Caveat emptor!
A potential solution is to:
1. Create a file containing a stub with exactly one name, the name you want: void myfunction() { }
2. Compile that file (with -fPIC and -shared so it's a dynamic library).
3. Call dlopen/dlsym on that particular file.
4. Iterate through the symbols (there should be only the one you want, plus other regular junk you can filter out). Iterating through the symbols is clumsy, but you can do it: see "Returning a shared library symbol table".
5. Call dlclose() to free it up (and lose the stub from your symbols).
6. Open the file you want with dlopen.
Basically, you would invoke the compiler from your code, it would create
a .so you could look at, get the only value out, then unload that .so
so you could load in the one you want.
It's crazy.

Name mangling is implementation specific.
There is no standard for name mangling so your best bet is to find a compiler to do it for you.
Name mangling
There is a table here that may help you if you wish to do this manually

An easier method than the first posted.
Write a little C++ program like:
#include <stdlib.h>
extern int doit(const char *toto, bool is);
int main(int argc, char *argv[])
{
exit(doit (argv[0], true));
}
Build it with
# g++ -S test.cpp
And extract symbol name from assembler source
# cat test.s | grep call | grep doit | awk '{print $2}'
You get:
rcoscali#srjlx0001:/tmp/TestC++$ cat test.s | grep call | grep doit | awk '{print $2}'
_Z4doitPKcb
rcoscali#srjlx0001:/tmp/TestC++$
The doit symbol mangled is _Z4doitPKcb
Use the compiler you plan to build with, because each compiler has its own name mangling rules (as has been said before, these rules may change from one compiler to another).
Have fun !

If you're using g++ on x86 or ARM then you can try this one(ish)-liner:
echo "<your-type> <your-name>(<your-parameters>) {}" \
| g++ -x c++ - -o - -S -w \
| grep '^_' \
| sed 's/:$//'
g++ is the driver that invokes the cc1plus compiler.
g++ -x c++ says to interpret the input language as C++.
g++ -x c++ - says to get the input from the stdin (the piped echo).
g++ -x c++ - -o - says to output to the stdout (your display).
g++ -x c++ - -o - -S says to output assembler/assembly language.
g++ -x c++ - -o - -S -w says to silence all warnings from cc1plus.
This gives us the raw assembly code output.
For x86(_64) or ARM(v7/v8) machines, the mangled name in the assembly output will start at the beginning of a line, prefixed by an underscore (_) (typically _Z).
For simple inputs like the ones shown here, no other lines begin this way, so a line beginning with an underscore is the label of a function you defined. This is an observation about typical output, not a guarantee.
grep '^_' says to filter the output down to only lines beginning with an underscore (_).
Now we have the mangled names (one on each line--depending on how many you echoed into g++).
However, all the names in the assembly are suffixed by a colon (:) character. We can remove it with the Stream-EDitor, sed.
sed 's/:$//' says to remove the colon (:) character at the end of each line.
Lastly, a couple of concrete examples, showing mangling and then demangling for you to use as reference (output from an x86 machine):
Example 1:
echo "int MyFunction(int x, char y) {}" \
| g++ -x c++ - -o - -S -w \
| grep '^_' \
| sed 's/:$//'
_Z10MyFunctionic # This is the output from the command pipeline
c++filt _Z10MyFunctionic
MyFunction(int, char) # This is the output from c++filt
Example 2:
echo \
"\
namespace YourSpace { int YourFunction(int, char); }
int YourSpace::YourFunction(int x, char y) {}
"\
| g++ -x c++ - -o - -S -w \
| grep '^_' \
| sed 's/:$//'
_ZN9YourSpace12YourFunctionEic # This is the output from the command pipeline
c++filt _ZN9YourSpace12YourFunctionEic
YourSpace::YourFunction(int, char) # This is the output from c++filt
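If you would rather do the filtering step inside a program than with grep and sed, the same two filters could be sketched in C++ roughly like this. The function name extract_labels is made up for illustration, and the assembly text is assumed to have been captured from g++ -S already.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Collect the lines of `assembly` that start with '_' and strip one trailing ':'.
std::vector<std::string> extract_labels(const std::string& assembly) {
    std::vector<std::string> labels;
    std::istringstream in(assembly);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] != '_') continue;  // same as: grep '^_'
        if (line.back() == ':') line.pop_back();       // same as: sed 's/:$//'
        labels.push_back(line);
    }
    return labels;
}
```

Like the shell pipeline, this relies on the label appearing at the start of a line, so it carries the same platform assumptions.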
I originally saw how to apply g++ to stdin in Romain Picard's article:
How To Mangle And Demangle A C++ Method Name
I think it's a good read.
Hope this helped you.
Additional Info:
Primary source: GNU <libstdc++> Manual: Chapter 28 Part 3: Demangling

Related

resolve name mangling without c++filt

I need to remove unused functions from a big C++ project. After reading a while I used this link: How can I know which parts in the code are never used?
I compile on RedHat using makefiles. I added to compiler the flags:
-Wall -Wconversion -ffunction-sections -fdata-sections
and to the linker the flags:
-Wl,-rpath,--demangle,--gc-sections,--print-gc-sections
For some annoying reason I receive the output after mangling even after using --demangle option. For example:
/usr/bin/ld: Removing unused section '.text._ZN8TRACABLED0Ev' in file 'CMakeFiles/oded.dir/oded.cpp.o'
/usr/bin/ld: Removing unused section '.text._ZN8TRACABLED1Ev' in file 'CMakeFiles/oded.dir/oded.cpp.o'
So I have 6000 function names I need to unmangle and I cannot use extern C.
I can write a script to parse it and use c++filt, but I'm looking for a solution that makes the linker demangle the names by itself.
Does anyone know if such a solution exists?
For some annoying reason I receive the output after mangling even after using --demangle option
From man ld:
--demangle[=style]
--no-demangle
These options control whether to demangle symbol names in
error messages and other output.
But these messages:
Removing unused section '.text._ZN8TRACABLED0Ev' in file
are not about symbol names. They are about section names, which just happen to sometimes include the symbol name. So this is working as documented.
Now, if you really wanted to do something about it, you could develop a linker patch to also demangle section names, and send it to GNU binutils maintainers.
But an easier option might be to simply pipe the messages you want to be demangled through c++filt. For example:
echo "Removing unused section '.text._ZN8TRACABLED0Ev' in file" |
sed -e 's/_ZN/ _ZN/' | c++filt
produces:
Removing unused section '.text. TRACABLE::~TRACABLE()' in file
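If piping through sed and c++filt is inconvenient, the same post-processing could be sketched in C++ with abi::__cxa_demangle. This is only an illustration under simplifying assumptions: demangle_line is a made-up name, and a token is taken to be any run of letters, digits and underscores starting with _Z, which is cruder than what c++filt does.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Replace every "_Z..." token in `line` with its demangled form; other text is copied as-is.
std::string demangle_line(const std::string& line) {
    std::string out;
    std::size_t i = 0;
    while (i < line.size()) {
        if (line.compare(i, 2, "_Z") == 0) {
            // Extend the token over identifier characters.
            std::size_t end = i;
            while (end < line.size() &&
                   (std::isalnum(static_cast<unsigned char>(line[end])) || line[end] == '_')) {
                ++end;
            }
            assert(end >= i + 2);  // the token contains at least "_Z"
            const std::string token = line.substr(i, end - i);
            int status = 0;
            char* raw = abi::__cxa_demangle(token.c_str(), nullptr, nullptr, &status);
            // Keep the original token if it does not demangle.
            out += (status == 0 && raw != nullptr) ? std::string(raw) : token;
            std::free(raw);
            i = end;
        } else {
            out += line[i++];
        }
    }
    return out;
}
```

Because the token is found inside the line, no extra space has to be inserted before _ZN as in the sed step.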

How to print symbol list for .so file in OSX?

I have an .SO file (note, not .a, not .dylib and not .o) and I need to get symbol information from it on OSX.
I have tried
nm -gU lib.so
However, nothing is printed out.
I can't use otool because it's not an object file, and readelf does not exist on OSX. How do I get the symbol information?
Please note, that I am using this .so file in another project, and there is symbol information. I am able to load the library, and reference functions from it. However, I have yet to find a tool on OSX to let me print the symbol information from it.
As asked,
file lib.so
ELF 32-bit LSB shared object, ARM, version 1 (SYSV), dynamically linked, stripped
Try using c++filt piped from nm:
nm lib.so | c++filt -p -i
c++filt - Demangle C++ and Java symbols.
-p
--no-params
When demangling the name of a function, do not display the types of
the function's parameters.
-i
--no-verbose
Do not include implementation details (if any) in the demangled
output.
EDIT: Based upon the new (ARM) info provided in the question, try using symbols instead:
symbols lib.so -arch arm | awk '{print $4}'
I've used awk to simplify output; remove to output everything.
Manual page : Symbols
https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/nm.1.html
Nm displays the name list (symbol table) of each object file in the argument list. If an argument is
an archive, a listing for each object file in the archive will be produced. File can be of the form
libx.a(x.o), in which case only symbols from that member of the object file are listed. (The paren-
theses have to be quoted to get by the shell.) If no file is given, the symbols in a.out are listed.

What are the GCC default include directories?

When I compile a very simple source file with gcc I don't have to specify the path to standard include files such as stdio or stdlib.
How does GCC know how to find these files?
Does it have the /usr/include path hardwired inside, or it will get the paths from other OS components?
In order to figure out the default paths used by gcc/g++, as well as their priorities, you need to examine the output of the following commands:
For C:
gcc -xc -E -v -
For C++:
gcc -xc++ -E -v -
The credit goes to Qt Creator team.
There is a command with shorter output, which makes it easy to extract the include paths automatically: they are the lines that start with a single space:
$ echo | gcc -Wp,-v -x c++ - -fsyntax-only
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../x86_64-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../include/c++/4.8.2
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../include/c++/4.8.2/x86_64-redhat-linux
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../include/c++/4.8.2/backward
/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include
/usr/local/include
/usr/include
End of search list.
The credit goes to the libc++ front-page.
To summarise the other answers:
For C++:
c++ -xc++ /dev/null -E -Wp,-v 2>&1 | sed -n 's,^ ,,p'
For C:
cc -xc /dev/null -E -Wp,-v 2>&1 | sed -n 's,^ ,,p'
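For tooling that captures the compiler's verbose output itself, the sed filter above could be mirrored in C++ roughly as follows. This is a sketch: include_dirs is a made-up name, and the text is assumed to be the stderr output of one of the commands above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Return the lines that start with a space, minus that one leading space,
// mirroring: sed -n 's,^ ,,p'
std::vector<std::string> include_dirs(const std::string& verbose_output) {
    std::vector<std::string> dirs;
    std::istringstream in(verbose_output);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] != ' ') continue;  // only the indented path lines
        dirs.push_back(line.substr(1));                // drop exactly one leading space
    }
    return dirs;
}
```

The directories come back in the order the compiler printed them, which is also the search order.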
Though I agree with Ihor Kaharlichenko’s answer for considering C++ and with abyss.7’s answer for the compactness of its output, they are still incomplete for the multi-arch versions of gcc because input processing depends on the command line parameters and macros.
Example:
echo | /opt/gcc-arm-none-eabi-9-2019-q4-major/bin/arm-none-eabi-g++ -specs=nano.specs -mcpu=cortex-m4 -march=armv7e-m -mthumb -mfloat-abi=soft -x c++ -E -Wp,-v - -fsyntax-only yields
⋮
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/include/newlib-nano
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include/c++/9.2.1
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include/c++/9.2.1/arm-none-eabi/thumb/v7e-m/nofp
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include/c++/9.2.1/backward
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
⋮
whereas echo | /opt/gcc-arm-none-eabi-9-2019-q4-major/bin/arm-none-eabi-g++ -x c++ -E -Wp,-v - -fsyntax-only yields
⋮
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include/c++/9.2.1
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include/c++/9.2.1/arm-none-eabi
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include/c++/9.2.1/backward
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
/opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
⋮
The former invocation utilizes newlib (see lines 1 and 3 of the output), the latter goes with the standard includes. The common files at the end of the list are an example for the usage of include_next.
Bottom line: Always consider all macros and compiler options when printing the include directories.
Just run the following to list the default search paths:
$(gcc -print-prog-name=cc1) -v

Find function signature in Linux

Given a .so file and function name, is there any simple way to find the function's signature through bash?
Return example:
#_ZN9CCSPlayer10SwitchTeamEi
Thank you.
My compiler (OSX g++) mangles things a little differently from yours, but changing your leading # to an underscore and passing the result to c++filt gives me the result that I think you want:
bash> echo __ZN9CCSPlayer10SwitchTeamEi | c++filt
CCSPlayer::SwitchTeam(int)
Doing the reverse is trickier, as CCSPlayer could be a namespace or a class (and I suspect they're mangled differently). However, since you have the .so, you can do this:
bash> nm library.so | c++filt | grep CCSPlayer::SwitchTeam
000ca120 S CCSPlayer::SwitchTeam
bash> nm library.so | grep 000ca120
000ca120 S __ZN9CCSPlayer10SwitchTeamEi
Though you might need to be a bit careful about getting some extra results. (There are some funny symbols in those .so files sometimes.)
nm has a useful --demangle flag that can demangle your .so all at once:
nm --demangle library.so
Try
strings <library.so>
nm -D library.so | grep FuncName