How to use Angr to analyze JNI functions in shared libraries? - java-native-interface

I'm new to the binary analysis field. What I want to do is to analyze the JNI native interface functions (e.g., RegisterNatives or other functions listed here by using the SimProcedures provided by Angr. The shared libraries (*.so files) suppose to be part of Android apps. However, I noticed that these JNI native interface functions do not show as symbols in the shared libraries. So my questions are:
Why these JNI native interface functions do not have corresponding symbols in the shared libraries? Did I do something wrong or they suppose like this?
In Angr, SimProcedures can only bind to symbols if I did not miss anything. So if there are no such symbols, what should I do to make it work?

The various functions are exposed by the JVM as table of function pointers. See here, for example.
A call to env->FindClass would be represented in assembly as something like (C pseudocode):
fp = env + 6 * sizeof(void *);
fp(env, ...)
Perhaps you can teach this angr thing about this function pointer table?

Related

ABI stability on pointers to symbols vs regular symbol lookup

General Scenario
Using dlsym(), I dynamically load a shared object addon from my main thread.
I follow either of these two ways.
Way A
Pass a struct of pointers to symbols to the addon so it can call the host's functions and access other variables, knowing their data type of course.
Way B
Let the addon call symbols by their extern "C" identifier and have the runtime normally lookup them.
Question
Is there any difference between these two methods regarding ABI stability? For example: would one of this methods guarantee more chance of compatibility from an addon to the host program in case they were compiled in different environments?
One advantage of "Way A" is that it gives you the chance to pass different pointers to different plugins. So you could for example make a "v1" struct of pointers, and then later a "v2" that newer plugins could request.
If anything works well two methods are equivalent except some performance loss. But runtime lookup resolves named symbols in global scope, which could be influenced by flag like RTLD_GLOBAL used in dlopen. It will lead to different behaviour under different context even though using the same addon.
So I think method A is better.

How to compile app in C with module?

I want to do application, which can be compiled with external modules, for example like in php. In php you can load modules in runtime, or compile php with modules together, so modules are available without loading in runtime. But i don't understand how this can be done. If i have module in module.c and there is one function, called say_hello, how can i register it to main application, if you understand what i mean?
/* module.c */
#include <stdio.h>
// here register say_hello function, but how, if i can't in global scope
// call another function?
void say_hello()
{
printf("hello!");
}
If i compile all that files(main app + modules) together, there isn't some reference to say_hello function from main app, because it is called only if user call it in its code. So how can i say to my app, hey, there is say_hello function, if someone want to call it, you know it exists.
EDIT1: I need to have something like table at runtime, where i can see if user called function exists (have C equivavent). Header files doesn't help to me.
EDIT2: My app is interpret for my script langugage.
EDIT3: If someone call function in php, php interpret must know that function exists. I know about dynamic linking and if .so or .dll is loaded, then some start routine is called and you can simple register function in that dll, so php interpret can see, if some module registred for example function called "say_hello". But if i want compile php with for example gd support, then how gd functions are registred to some php function list, hashtable or whatever?
I guess what you are looking for is dynamic libraries (we call runtime loadable modules as dynamic/shared libraries in C and in the OS world, in general). Take, for example, Pidgin which supports plugins to extend it's functionalities. It gives a particular interface to it's plugin-makers to abide by, say functions to register, load, unload and use, which the plugins will have to follow.
When the program loads, it looks for such dynamic libraries in it's plugins directory, if present, it'll load and use it, else it'll skip exposing the functionality. The reason why an interface is needed is that since different modules can have different functionalities which are unknown uptil runtime, an app. has to have a common, agreed-upon way of "talking" to it's plugins/modules.
Every C program can be linked to a static or a dynamic library; static will copy the code from the library to the said program, there by leaving no dependencies for the program to run, while linking to a dynamic library expects the dynamic library to be present when the program is launched. A third way of doing it, is not to link to a DLL, but just asking the OS to perform a load operation of the library. If this succeeds, then the dynamic module is used, else ignored. The functionality which the dynamic library should perform is exposed to the user, only if the load call succeeds.
It is to be noted that this is a operating system provided feature and it has nothing to do with the language used (C or C++ or Python doesn't matter here); as far as C is concered, the compiler still links to known code i.e. code which is available # compile time. This is the reason for different operating system, one needs to write different code to load a dynamic module. Even more, the file type/format of syuch libraries vary from system to system. In Linux it's called shared objects (.so), in Mac it's called dynamic libraries (.dylib) and in Windows as Dynamic link libraries (.dll).
C is not interpreted language. So you need linking, you may want static linking or dynamic linking.
Program building consists of 2 major phases: compiling and linking. During compiling all c-files are translated into machine code, leaving called functions unresolved (obj or o files). Then linker merges all these files into one executable, resolving what was unresolved.
This is static linking. Linked module becomes integral part of executable.
Dynamic linking is platform specific. Under windows these are DLLs. You should issue a system call to load DLL after which you will be able to call functions from it.
What you need is dynamic library. Let's first take a look at the example provided in the Linux manpage of dlopen(3):
/* Load the math library, and print the cosine of 2.0: */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
int main(int argc, char **argv) {
void *handle;
double (*cosine)(double);
char *error;
handle = dlopen("libm.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any existing error */
/* Writing: cosine = (double (*)(double)) dlsym(handle, "cos");
would seem more natural, but the C99 standard leaves
casting from "void *" to a function pointer undefined.
The assignment used below is a workaround. */
*(void **) (&cosine) = dlsym(handle, "cos");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(EXIT_FAILURE);
}
printf("%f\n", (*cosine)(2.0));
dlclose(handle);
exit(EXIT_SUCCESS);
}
There's also a C++ dlopen mini HOWTO.
For more general information about dynamic loading, start from the wikipedia page first.
I think it is impossible, if i understand what you mean. Because it is compiled language.

C++/CLI lib with 1 function to be statically linked with native C++

I would like to accomplish a simple task:
I want to produce a C++/CLI lib that calls some .NET routines, and exposes 1 static method that I can call from purely native C++ app. I want to statically link to that lib from my native app. The signature of the method that the native C++ app would call should be
void (unsigned char* data, int length, _TCHAR* args[])
I am pretty new to C++/CLI and C++ in general. So, can you please help me and answer these questions:
For my tiny C++/CLI project, I assume the type needs to be class library with lib as output, targeting vc9 runtime (so that I can be sure it is available on end users PCs). Am I correct in this assumption?
Can you provide an example of the outline of the method with that signature I will need to write in my C++/CLI project. In particular how should I do conversion to CLR types properly (i.e. byte[], int32, and string)? And how do I decorate that method with something like "extern "C" __declspec(dllexport)" to make this method callable from my native C++ app?
How would I separate the code between cpp and h files in C++/CLI properly?
Finally how would I actually call it from my native app once I add lib reference?
Thank you.
Class library: correct, but you'll also need to change your project's configuration type from Dynamic Library to Static Library. I'm not sure what you mean by 'targeting vc9 runtime' – you need to target the same runtime as the native code that will be using your static library.
Since this is a static library, no __declspec(dllexport) is needed. If you want to know how to do conversion to .NET types, you'll need to post what your code is actually doing. In general, you'll want Marshal::Copy to copy a C-array into a .NET array, and marshal_as<> to copy C-strings into .NET strings, but there's still the question as to whether that data is intended to be mutable and needs to be marshaled back to native types before returning...
The exact same as in C++ – declaration in a header, definition in a source file.
The exact same as any other function – #include the header containing the declaration and call the function.

How to call a JNI DLL from C++

I have a task to interface with a dll from a third party company using C++.
The dll package comes with:
the dll itself
a sample Java implementation - consists of a java wrapper(library) generated using the SWIG tool and the the java source file
documentation that states all the public datatypes, enumeration and member functions.
My other colleague is using Java(based on the example in package) to interface with the dll while I'm asked to use C++. The Java example looks straight forward... just import the wrapper and instantiate any class described in the docs..
More info on the dll:
From the docs, it says the dll was programmed using C++
From a hexdump, it shows that it was compiled using VC90 (VS C++ 2008 right?) and something from Dinkumware.
From a depends.exe output, the functions seems to be wrapped under JNI. For example: _Java_mas_com_oa_rollings_as_apiJNI_Server_1disconnect#20
My dilemma:
The dll company is not changing anything in the dll and not providing any other info.
How do i use the member functions in the class from the dll?
I did some simple LoadLibrary() and GetProcAddress and manage to get the address of the public member functions.
But i dunno how to use the functions that has the datatype parameters defined in the dll. For example:
From the docs, the member function is defined as:
void Server::connect(const StringArray, const KeyValueMap) throw(std::invalid_argument,std::out_of_range)
typedef std::map Server::KeyValueMap
typedef std::vector Server::StringArray
how do i call that function in C++. The std::map and std::vector in my compiler (VS 2005) has different functions listing that the one in the dll. For example, from the depends.exe output:
std::map // KeyValueMap - del, empty, get, has_1key,set
std::vector // StringArray - add, capacity, clear, get, isEMPTY, reserve, set, size
Any advice/strategy on how i should solve this? Is it possible to simply instantiate the class like the Java example?
If you are trying to use VS 2005 to try and interface with a DLL that is built using VS2008, your attempts will be mostly doomed unless you can use a plain C interface. Given your description, this is not the case; The runtime libraries differ between VS2005 and VS2008 so there is little chance that the object layout has stayed the same between compilers. The 'something from Dinkumware' that you're referring to is most likely the C++ standard library as ISTR that Microsoft uses the Dinkumware one.
With your above example you're also missing several important pieces of information - the types you describe (Server::StringArray and Server::KeyValueMap) are standard library containers. OK fine, but standard library containers of what? These containers are templates and unless you know the exact types these templates have been instantiated with, you're a little stuck.
Is this DLL intended to be called from C++ at all? The fact that it export a JNI interface suggests that it might not be in the first place. Does it export any other public symbols apart from those that are of the format _Java_...?
Of course if there is no other way in and you must use C++ instead of Java, you might want to look into embedding a JVM into your C++ app and use that to call through to the C++ dll. It's not what I'd call an elegant solution but it might well work.
I don't quite understand the use of C++ standard library data types here. How can Java code provide a std::map argument? Are the arguments you pass in always just "opaque" values you would get as output from a previous call to the library? That's the only way you're going to be able to make it work from code under a different runtime.
Anyway...
When you make a JNI module, you run javah.exe and it generates a header file with declarations like:
JNIEXPORT void JNICALL Java_Native_HelloWorld(JNIEnv *, jobject);
Do you have any such header file for the module?
These symbols are exported as extern "C" if I recall correctly, so if you can get the correct signatures, you should have no issues with name mangling or incompatible memory allocators, etc..
The "#20" at the end of the method signature means that the function is declared "stdcall" and that 20 bytes are put on the stack when the function is called. All these methods should start with a JNIEnv* and a jobject, these will total 8 bytes I believe, on a 32-bit environment, so that leaves 12 bytes of parameters you will need to know in order to generate a correct function prototype.
Once you figure out what the parameters are, you can generate something like this:
typedef void (__stdcall *X)(JNIEnv *, jobject, jint i, jboolean b);
Then, you can cast the result of GetProcAddress to an X and call it from your C++ code.
X x = (X)GetProcAddress(module, "name");
if (x) x(params...);
Unfortunately, what you have doesn't quite look like what I have seen in the past. I am used to having to deal with Java data types from C/C++ code, but it looks like this module is dealing with C++ data types in Java code, so I don't know how relevant any of my experience is. Hopefully this is some help, at least.

Compiling a DLL with gcc

Sooooo I'm writing a script interpreter. And basically, I want some classes and functions stored in a DLL, but I want the DLL to look for functions within the programs that are linking to it, like,
program dll
----------------------------------------------------
send code to dll-----> parse code
|
v
code contains a function,
that isn't contained in the DLL
|
list of functions in <------/
program
|
v
corresponding function,
user-defined in the
program--process the
passed argument here
|
\--------------> return value sent back
to the parsing function
I was wondering basically, how do I compile a DLL with gcc? Well, I'm using a windows port of gcc. Once I compile a .dll containing my classes and functions, how do I link to it with my program? How do I use the classes and functions in the DLL? Can the DLL call functions from the program linking to it? If I make a class { ... } object; in the DLL, then when the DLL is loaded by the program, will object be available to the program? Thanks in advance, I really need to know how to work with DLLs in C++ before I can continue with this project.
"Can you add more detail as to why you want the DLL to call functions in the main program?"
I thought the diagram sort of explained it... the program using the DLL passes a piece of code to the DLL, which parses the code, and if function calls are found in said code then corresponding functions within the DLL are called... for example, if I passed "a = sqrt(100)" then the DLL parser function would find the function call to sqrt(), and within the DLL would be a corresponding sqrt() function which would calculate the square root of the argument passed to it, and then it would take the return value from that function and put it into variable a... just like any other program, but if a corresponding handler for the sqrt() function isn't found within the DLL (there would be a list of natively supported functions) then it would call a similar function which would reside within the program using the DLL to see if there are any user-defined functions by that name.
So, say you loaded the DLL into the program giving your program the ability to interpret scripts of this particular language, the program could call the DLLs to process single lines of code or hand it filenames of scripts to process... but if you want to add a command into the script which suits the purpose of your program, you could say set a boolean value in the DLL telling it that you are adding functions to its language and then create a function in your code which would list the functions you are adding (the DLL would call it with the name of the function it wants, if that function is a user-defined one contained within your code, the function would call the corresponding function with the argument passed to it by the DLL, the return the return value of the user-defined function back to the DLL, and if it didn't exist, it would return an error code or NULL or something). I'm starting to see that I'll have to find another way around this to make the function calls go one way only
This link explains how to do it in a basic way.
In a big picture view, when you make a dll, you are making a library which is loaded at runtime. It contains a number of symbols which are exported. These symbols are typically references to methods or functions, plus compiler/linker goo.
When you normally build a static library, there is a minimum of goo and the linker pulls in the code it needs and repackages it for you in your executable.
In a dll, you actually get two end products (three really- just wait): a dll and a stub library. The stub is a static library that looks exactly like your regular static library, except that instead of executing your code, each stub is typically a jump instruction to a common routine. The common routine loads your dll, gets the address of the routine that you want to call, then patches up the original jump instruction to go there so when you call it again, you end up in your dll.
The third end product is usually a header file that tells you all about the data types in your library.
So your steps are: create your headers and code, build a dll, build a stub library from the headers/code/some list of exported functions. End code will link to the stub library which will load up the dll and fix up the jump table.
Compiler/linker goo includes things like making sure the runtime libraries are where they're needed, making sure that static constructors are executed, making sure that static destructors are registered for later execution, etc, etc, etc.
Now as to your main problem: how do I write extensible code in a dll? There are a number of possible ways - a typical way is to define a pure abstract class (aka interface) that defines a behavior and either pass that in to a processing routine or to create a routine for registering interfaces to do work, then the processing routine asks the registrar for an object to handle a piece of work for it.
On the detail of what you plan to solve, perhaps you should look at an extendible parser like lua instead of building your own.
To your more specific focus.
A DLL is (typically?) meant to be complete in and of itself, or explicitly know what other libraries to use to complete itself.
What I mean by that is, you cannot have a method implicitly provided by the calling application to complete the DLLs functionality.
You could however make part of your API the provision of methods from a calling app, thus making the DLL fully contained and the passing of knowledge explicit.
How do I use the classes and functions in the DLL?
Include the headers in your code, when the module (exe or another dll) is linked the dlls are checked for completness.
Can the DLL call functions from the program linking to it?
Yes, but it has to be told about them at run time.
If I make a class { ... } object; in the DLL, then when the DLL is loaded by the program, will object be available to the program?
Yes it will be available, however there are some restrictions you need to be aware about. Such as in the area of memory management it is important to either:
Link all modules sharing memory with the same memory management dll (typically c runtime)
Ensure that the memory is allocated and dealloccated only in the same module.
allocate on the stack
Examples!
Here is a basic idea of passing functions to the dll, however in your case may not be most helpfull as you need to know up front what other functions you want provided.
// parser.h
struct functions {
void *fred (int );
};
parse( string, functions );
// program.cpp
parse( "a = sqrt(); fred(a);", functions );
What you need is a way of registering functions(and their details with the dll.)
The bigger problem here is the details bit. But skipping over that you might do something like wxWidgets does with class registration. When method_fred is contructed by your app it will call the constructor and register with the dll through usage off methodInfo. Parser can lookup methodInfo for methods available.
// parser.h
class method_base { };
class methodInfo {
static void register(factory);
static map<string,factory> m_methods;
}
// program.cpp
class method_fred : public method_base {
static method* factory(string args);
static methodInfo _methoinfo;
}
methodInfo method_fred::_methoinfo("fred",method_fred::factory);
This sounds like a job for data structures.
Create a struct containing your keywords and the function associated with each one.
struct keyword {
const char *keyword;
int (*f)(int arg);
};
struct keyword keywords[max_keywords] = {
"db_connect", &db_connect,
}
Then write a function in your DLL that you pass the address of this array to:
plugin_register(keywords);
Then inside the DLL it can do:
keywords[0].f = &plugin_db_connect;
With this method, the code to handle script keywords remains in the main program while the DLL manipulates the data structures to get its own functions called.
Taking it to C++, make the struct a class instead that contains a std::vector or std::map or whatever of keywords and some functions to manipulate them.
Winrawr, before you go on, read this first:
Any improvements on the GCC/Windows DLLs/C++ STL front?
Basically, you may run into problems when passing STL strings around your DLLs, and you may also have trouble with exceptions flying across DLL boundaries, although it's not something I have experienced (yet).
You could always load the dll at runtime with load library