How do I find where a symbol is defined among static libraries - c++

Suppose you work with a codebase comprising several tools and libraries, and you want to port (or resurrect) some component within it. Any clue about which library holds which symbol is either lost or would take ages to recover by reading the code itself (yes, better documentation would avoid this, but it is quite demanding to maintain). What is the fastest way to discover which library defines the symbols used in the code?

Assuming a Linux box, the nm tool, which lists the names in library files, comes to the rescue.
It can be used for an exhaustive search as follows: first find all the libraries available (assuming the project has been successfully compiled without the component you are adding) with find, then wrap that find in a loop that calls nm on every library found, and grep the output, discarding "U" entries (undefined symbols, i.e. places where the symbol is merely used). On a single bash line that gives:
for lib in $(find base_path -name '*.a') ; do echo "$lib" ; nm "$lib" | grep my_symbol | grep -v " U " ; done
where:
base_path is the root of your codebase
my_symbol is the symbol you are looking for
The echo prints every library found, which is not so clean since it also outputs the names of libs that do not hold the symbol, but it was the fastest way I found to get a direct reference to the library. So when you see:
base_path/component/libA.a
0000000000000080 D my_symbol
you have found your usual suspect.
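If the extra library names from the echo get in the way, one option is to let nm label each line itself. A minimal sketch, assuming a find that supports `-exec ... {} +` and an nm that supports -A (print the file name on every line); the function name find_symbol_defs is made up for illustration:

```bash
# Print every definition of SYMBOL found in the static libraries under BASE_PATH.
# Usage: find_symbol_defs BASE_PATH SYMBOL   (hypothetical helper name)
find_symbol_defs() {
    base_path=$1
    symbol=$2
    # nm -A prefixes each output line with the archive (and member) name,
    # so no separate echo of the library name is needed.
    find "$base_path" -name '*.a' -exec nm -A {} + 2>/dev/null |
        grep -F -- "$symbol" |   # -F: treat the symbol as a literal string
        grep -v ' U '            # drop undefined references
}
```

Calling `find_symbol_defs base_path my_symbol` should then print only the lines for libraries that actually define the symbol, each one starting with the library path.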

Using nm, it is possible to list the symbols defined in a binary, and the --defined-only switch ignores undefined references.
Option 1: find
In a single command:
find $path -name \*.a -exec bash -c "nm --defined-only {} 2>/dev/null | grep $symbol && echo {}" \;
where $path is the root of the file tree containing the binaries, and $symbol is the name of the symbol you are looking for.
Option 2: find + GNU parallel
Running nm on all files can take time, so it could be helpful to process the results of find in parallel (using GNU parallel):
find $path -name \*.a | parallel "nm --defined-only {} 2>/dev/null | grep $symbol && echo {}"
Option 3: fd
And finally, my favourite: the fd tool, which has a simpler syntax than find, is generally faster, and processes the results in parallel by default:
fd '.*\.a$' -x bash -c "nm --defined-only {} 2>/dev/null | grep $symbol && echo {}"
Simple benchmark
Searching for the gz_write symbol in /usr/lib on my laptop:
find takes around 23 seconds
find | parallel takes around 10 seconds
fd takes around 8 seconds

Using nm's --defined-only switch is helpful here since it will remove the undefined references. Below is a csh script that may be useful to others.
#!/bin/csh
#
#recurse from current dir and output name of any .a files
#that contain the desired symbol.
echo "Search for: $1"
foreach i (`find . -name '*.a'`)
nm --defined-only $i | grep $1
if ($status == 0) then
echo $i
endif
end

Related

Where do I see what parts of LLVM a library contains?

I know how to see which libraries a certain component corresponds to with the command:
llvm-config --libs core
Now, suppose I get a linker error and want to include another library to resolve it.
Say the linker can't resolve some symbol A. Then how do I:
1) Find the library that contains the specific symbol, e.g. LLVMCore.lib?
2) Look up the contents of a library to see what symbols it defines?
I don't understand how to do this reading the documentation.
As you have already discovered, the proper LLVM way to do this is to use llvm-config, indicating the components you intend to link against or use, e.g.
llvm-config --cxxflags --ldflags --system-libs --libs core
Other common, non-LLVM-specific methods for finding a symbol: on Windows (use the VS native tools command prompt or an equivalent environment):
for %f in (*.lib) do (dumpbin.exe /symbols %f | findstr /C:"your_symbol")
If you can't live with findstr's limitations, GNU grep might be a better choice.
If you have Unix tools installed and on your PATH, you can also use
for %f in (*.lib) do (nm -gC %f | findstr /C:"your_symbol")
as baddger964 suggests.
On a Unix system:
for lib in $(find . -name '*.so') ; do nm -AgC "$lib" | grep my_symbol | grep -v " U " ; done
(search the *.so libraries under this directory for my_symbol: print the file name on each line, extern-only, demangle, and exclude undefined symbols)
Given the above, question 2 is trivial.
One way to see the symbols of your lib is to use the nm command:
nm -gC mylib.so
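For question 1 in the LLVM case, the two ideas above can be combined: ask llvm-config for the library files and run nm over them. A rough sketch, assuming a Unix-like shell, an llvm-config that supports --libfiles, and an install path without spaces; llvm_find_symbol is a hypothetical helper name:

```bash
# Show which LLVM library defines SYMBOL (matched against demangled names).
# Usage: llvm_find_symbol SYMBOL [COMPONENT...]   (hypothetical helper name)
llvm_find_symbol() {
    symbol=$1
    shift
    # Search every component unless specific ones were named.
    [ $# -gt 0 ] || set -- all
    # --libfiles prints the full path of each library for the components.
    # The command substitution is unquoted on purpose so that each library
    # becomes its own argument; this assumes paths without spaces.
    nm -A -gC --defined-only $(llvm-config --libfiles "$@") 2>/dev/null |
        grep -F -- "$symbol"
}
```

For example, `llvm_find_symbol 'llvm::Value::getName'` would print one line per definition, each prefixed with the library file that holds it.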

Why is this pattern search hanging?

I am running CentOS Linux and I am trying to find some malicious code in my WordPress installation with this command:
grep -r 'php \$[a-zA-Z]*=.as.;' * |awk -F : '{print $1}'
When I hit enter, the process just hangs... I want to double-check that I have the syntax right and that all I have to do is wait.
How can I get some sort of feedback while it's searching?
Thanks
Instead of using grep -r to recursively grep, one option is to use find to get the list of filenames, and feed them to grep one at a time. That lets you add other commands alongside the grep, such as echos. For example, you could create a script called is-it-malware.sh that contains this:
#!/bin/bash
# Report whether the file given as $1 matches the suspicious pattern.
if grep 'php \$[a-zA-Z]*=.as.;' "$1" >/dev/null
then
    echo "!!! $1 is malware!!!"
else
    echo " $1 is fine."
fi
and run this command:
find -type f -exec ./is-it-malware.sh '{}' ';'
to run your script over every file in the current directory and all of its subdirectories (recursively).
It's probably taking its time due to the -r * (recursively, all files/dirs).
Consider
find -type f -print0 | xargs -0trn10 grep -l 'php \$[a-zA-Z]*=.as.;'
which will process the files in batches of (max) 10, printing each command as it goes.
Of course, you can probably optimize the heck out of it with a simple measure like
find -type f -iname '*.php' -print0 | xargs -0trn10 grep -l 'php \$[a-zA-Z]*=.as.;'
Kind of related:
You can do similar things without find for smaller trees, with recent bash:
shopt -s globstar
grep -l 'pattern' **/*.php
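To get feedback while the search runs, another option is to count files as they are scanned and report on stderr. A small sketch, assuming file names contain no newlines; grep_with_progress and its optional third parameter are names invented here:

```bash
# Print the names of .php files under DIR that match PATTERN, and report
# progress on stderr every EVERY files (default 100).
# Usage: grep_with_progress PATTERN [DIR] [EVERY]   (hypothetical helper name)
grep_with_progress() {
    pattern=$1
    dir=${2:-.}
    every=${3:-100}
    count=0
    find "$dir" -type f -name '*.php' |
    while IFS= read -r file; do
        count=$((count + 1))
        # Progress goes to stderr so stdout holds only the matching names.
        if [ $((count % every)) -eq 0 ]; then
            printf '... %d files scanned\n' "$count" >&2
        fi
        if grep -q -- "$pattern" "$file"; then
            printf '%s\n' "$file"
        fi
    done
}
```

Something like `grep_with_progress 'php \$[a-zA-Z]*=.as.;' . > suspects.txt` keeps the list of matches in a file while the progress lines still show up on the terminal.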

Unix find with wildcard directory structure

I am trying to do a find where I can specify wildcards in the directory structure and then grep for www.domain.com in all the files within the data directory,
i.e.
find /a/b/c/*/WA/*/temp/*/*/data -type f -exec grep -l "www.domain.com" {} /dev/null \;
This works fine where there is only one possible level between c/ and /WA.
How would I go about doing the same thing where there could be multiple levels between c/ and /WA?
So it could be at
/a/b/c/*/*/WA/*/temp/*/*/data
or
/a/b/c/*/*/*/WA/*/temp/*/*/data
There is no defined number of directories between /c/ and /WA/; there could be multiple levels and at each level there could be the /WA/*/temp/*/*/data.
Any ideas on how to do a find such as that?
How about using a for loop to find the WA directories, then go from there:
for DIR in $(find /a/b/c -type d -name WA -print); do
    find "$DIR"/*/temp/*/*/data -type f \
        -exec grep -l "www.domain.com" {} /dev/null \;
done
You may be able to get all that in a single command, but I think clarity is more important in the long run.
Assuming no spaces in the paths, I'd think in terms of:
find /a/b/c -type f |
grep -E '/WA/[^/]+/temp/[^/]+/[^/]+/data/' |
xargs grep -l "www.domain.com" /dev/null
This uses find to list the files (rather than making the shell do most of the work), then uses grep -E (equivalent to egrep) to select the names with the correct pattern in the path, and then uses xargs and grep (again) to find the target pattern.
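If paths may contain spaces, the same idea can be kept by passing NUL-terminated names through the whole pipeline. A sketch assuming GNU find, grep and xargs (for -print0, -z and -0); grep_in_data_dirs is a hypothetical name, and -F is my addition so that the dots in www.domain.com are matched literally:

```bash
# List files below any .../WA/*/temp/*/*/data/ directory under ROOT
# that contain PATTERN as a literal string.
# Usage: grep_in_data_dirs ROOT PATTERN   (hypothetical helper name)
grep_in_data_dirs() {
    root=$1
    pattern=$2
    find "$root" -type f -print0 |
        # -z: records are separated by NUL instead of newline, so file
        # names with spaces pass through intact.
        grep -zE '/WA/[^/]+/temp/[^/]+/[^/]+/data/' |
        # /dev/null guarantees grep always has at least one file argument.
        xargs -0 grep -lF -- "$pattern" /dev/null
}
```

For the tree in the question this would be called as `grep_in_data_dirs /a/b/c www.domain.com`.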

Why is my gdb debugger setting 2 break points?

Is this normal? I swear it was setting only one breakpoint until recently. How do I make it set a breakpoint only in my running file and not the source file?
(gdb) break main
Breakpoint 1 at 0x1dbf
Breakpoint 2 at 0x1ed8: file arrays.c, line 17.
warning: Multiple breakpoints were set.
Use the "delete" command to delete unwanted breakpoints.
(gdb)
There are multiple main symbols :) Perhaps look at 'info breakpoints' in gdb or
objdump -C -t myprog
to see why/where.
Use cscope to interactively search for declarations.
ctags -R . && grep -w main tags
[ -x /usr/bin/vim ] && vim +'tj main'
Should be helpful as well if you have ctags (and optionally, vim) installed
If all else fails, brute force grep -RIw main . should work. If even that fails, you should find yourself with very strange external header #defines or even a (static) library with a surplus main symbol. To brute force search for the main identifier through the preprocessed sources:
find -name '*.c' -print0 | xargs -0 -I Q cpp -I/usr/include/... -DDEBUG Q Q.ii
find -name '*.c.ii' -print0 | xargs -0 grep -wI main
(replace -I/usr/include/... -DDEBUG with the relevant preprocessor defines)

Find function signature in Linux

Given a .so file and function name, is there any simple way to find the function's signature through bash?
Return example:
#_ZN9CCSPlayer10SwitchTeamEi
Thank you.
My compiler mangles things a little differently from yours (OSX g++), but changing your leading # to an underscore and passing the result to c++filt gives me the result that I think you want:
bash> echo __ZN9CCSPlayer10SwitchTeamEi | c++filt
CCSPlayer::SwitchTeam(int)
Doing the reverse is trickier, as CCSPlayer could be a namespace or a class (and I suspect they're mangled differently). However, since you have the .so, you can do this:
bash> nm library.so | c++filt | grep CCSPlayer::SwitchTeam
000ca120 S CCSPlayer::SwitchTeam(int)
bash> nm library.so | grep 000ca120
000ca120 S __ZN9CCSPlayer10SwitchTeamEi
Though you might need to be a bit careful about getting some extra results. (There are some funny symbols in those .so files sometimes.)
nm has a useful --demangle flag that can demangle your .so all at once:
nm --demangle library.so
Try
strings <library.so>
nm -D library.so | grep FuncName
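Putting these together: the symbol table only gives you a signature for C++ functions, because the parameter types are encoded in the mangled name (the return type of an ordinary function is not, and plain C symbols carry no type information at all). A sketch that prints the mangled and demangled names side by side, assuming GNU nm and c++filt are installed; find_signature is a made-up helper name:

```bash
# Print "mangled<TAB>demangled" for each defined dynamic symbol in LIBRARY
# whose mangled or demangled name contains NAME.
# Usage: find_signature LIBRARY NAME   (hypothetical helper name)
find_signature() {
    library=$1
    name=$2
    symbols=$(mktemp) || return 1
    # Keep only the last column of nm's output: the (mangled) symbol name.
    nm -D --defined-only "$library" | awk '{ print $NF }' > "$symbols"
    # c++filt emits one output line per input line, so paste can pair
    # each mangled name with its demangled form.
    c++filt < "$symbols" | paste "$symbols" - | grep -F -- "$name"
    status=$?
    rm -f "$symbols"
    return $status
}
```

With the library from the question, `find_signature library.so CCSPlayer::SwitchTeam` should list the matching symbol together with its demangled form, such as CCSPlayer::SwitchTeam(int).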