Freeing jni references from memory - java-native-interface

Freeing jni references from memory - java-native-interface

I create a byte array in java and pass it by reference to the jni function. This I do in a loop and and sometimes get a out of memory error in the jni. I wanted to know if java automatically frees the array on every iteration or since it is passed to the jni function, it doesn't ??
JNI Code (bOldArray is the java byte array that i pass to jni as an argument)
len = (*env)->GetArrayLength(env,bOldArray);
char *oldBuff = (char *)calloc(sizeof(char),MAX_SIZE);
jbyte* bytes = (*env)->GetByteArrayElements(env,bOldArray,0);
memcpy(oldBuff,bytes,len);
(*env)->ReleaseByteArrayElements(env,bOldArray,(jbyte *)bytes,0);

you have 2 buffers here, one handed from your java code (bOldArray)and the local buffer (oldbuff) which you allocated in line 2.
in fact, you might have more buffers, because
(*env)->GetArrayLength
with almost certainly makes a unmovable copy of the memory (needed for c-pointer-access), which holds the array in you java code and with
(*env)->ReleaseByteArrayElements(env,bOldArray,(jbyte *)bytes,0);
this memory is copied back onto the memory of your java array (check documentation of the last argument of ReleaseByteArrayElements)
but about you problem: you should free oldBuff too.
free(oldBuff);
else the VM does free your c-copy of the java array but not the self-allocated part directly (it might do thiz later due to object-lifetime and garbage collection, but this is unpredictable and therefore the out-of-memory error is also unpredictable)
to avoid the java-c-copy mechanism (speeds up performance) use a shared/static buffer like ByteBuffer

If you are using GetByteArrayElements you have to call ReleaseByteArrayElements after you are done with the array in JNI, because the JVM will prevent the freeing of this array in java until you do so.
Please post the code to get a clearer idea

Related

Are Node.js buffers in C++ addon null terminated?

I am trying to create buffer in a .js file, and then passing that buffer to a c++ addon where I will call a Windows API.
In the c++ addon I have:
auto buf = node::Buffer::Data(args[0]);
auto len = node::Buffer::Length(args[0]);
Are there any guarantees that node::Buffers are null-terminated? Or does node::Buffer::Length have any other form of safety check to prevent an overrun?

No. Think of buffers as a minimal data structure containing length and memory; they are not raw memory like what malloc() provides. A buffer is protected against memory overruns within a JavaScript context but not if you pass a pointer to the buffer memory to C/C++. And because a Buffer can be used to store binary data (as well as a string) it doesn't provide an extra byte to null-terminate a string; it doesn't "know" what is in the buffer.
I don't think the code you've posted will work; This post does a good job of explaining how to use buffers in C/C++: https://community.risingstack.com/using-buffers-node-js-c-plus-plus/

Pinning Unsafe pointer

I'm designing a JNI interface that passes string parameters from Java to C++. I need high performance and have been able to use Direct ByteBuffer and String.getBytes() to do that fairly well, but the penalty for passing strings to C/C++ still remains fairly high. I recently read about the Open JDK's Unsafe class. This excellent page got me started, but I'm finding Unsafe to be woefully, but understandably poorly documented.
I'm wondering, if I use the Unsafe class to obtain a pointer to a string and pass it to C++, is there a risk that the object has moved before the C++ code is entered? And even while C++ is executing? Or are these addresses provided by the Unsafe code somehow pinned? If they aren't pinned, how are these Unsafe pointers ever useful?

Unsafe is not meant to interop with JNI. So obtained via Unsafe could change any time (even in parallel with your C++).
JNI API has ability to pin object in memory to access array content (in HotSpot JVM it would block GC thus may have negative effect on GC pause duration).
In particular, Get*ArrayElements would pin array until you explicitly do Release*ArrayElements. GetStringChars work similar way.
Direct ByteBuffer hold pointer to memory buffer outside of heap, hense this buffer is not moving and you can access it for Native code.

I've read the Java source for java.misc.Unsafe and have a bit more insight.
Unsafe has at least two ways of dealing with memory.
allocateMemory/reallocateMemory/freeMemory/etc -- As far as I can tell this allocation of memory is outside the heap so faces no GC'ing challenges. I have indirectly tested this and it seems that the long returned is simply a pointer to the memory. It seems very likely that this type of memory is safe to pass through JNI to native code. And the application Java code should be able to quickly modify/query it before and after JNI calls by using some of the other intrinsic Unsafe methods that support this style of memory pointer.
object+offset - These methods accept a pointer to an object and an "offset" token to indicate where in the object to fetch/modify the value. The objects presumably are always in the Java heap, but passing the object to these methods probably helps resolve GC complications. It does sounds like the "offset" is sometimes a "cookie" rather than an actual offset, but it also sounds like that in the case of arrays, arrayBaseOffset() returns an "offset" that one can manipulate arithmetically. I don't know if this object+offset is safe for JNI code. I don't see a method to generate a pointer directly to the Java object in the heap that one could (dangerously) pass through JNI. One could pass an object and offset, but given the cost of passing Objects through JNI, this approach is not appealing anyway.
Like (1), the code associated with the page I referenced in my posting is probably pretty safe for JNI interactions. It takes the object+offset approach when dealing with String, but uses approach (1) when dealing with the direct ByteBuffer, which always reside outside the Java heap. Direct ByteBuffer's are very JNI friendly and often they can be used in ways that avoids the JNI Object passing costs I allude to in my comment to Tom above.

Determining when functions allocate memory in C++

So one major problem I have is determining whether a given function in C/C++ does memory allocation or not. I frequently work with external libraries, some of which have functions that return pointers to new objects. Is there some basic design paradigm or convention that will let me know ahead of time if something allocates memory?
It would seem like any function that returns a pointer to a new object must be allocating memory, but this does not always seem to be the case. For example, fopen does not
Edit: To be clear, I don't have access to the source code so I can't just check if it uses new or malloc, ect.

Read the documentation of all libraries you use. They should tell you if certain things should be freed.

If they documented the libraries well (as all the built in libraries are), then it should state something along the lines of "caller must free" in the post condition, sub section side effects of the function.

For C++ anything that calls new or new [] allocates memory. So a function does if it calls those or calls any function (that calls any function.... and so on) that calls new.
The same in C except the calls are malloc, calloc and family.

The best, or at least simplest, solution is the documentation, of course.
But, if you want to be sure that the function doesn't use malloc, you wrap malloc (and its friends calloc, realloc and eventually free) to gather some stats.
Writing wrappers is quiet simple, at least if can use dlsym(3) (sorry I don't know the windows way for that), here is the code for malloc:
void *malloc(size_t s) {
// Retrieve the pointer to the libc's malloc
// I use a static var to avoid time penality
static void* (*real_malloc)(size_t) = NULL;
if (!real_malloc) real_malloc = dlsym(RTLD_NEXT,"malloc");
stat.nmalloc += 1; // count malloc calls
stat.smalloc += s; // count malloced size
// You can also directly print malloc's parameters
// but you first need to check that stdio functions
// doesn't use malloc, or write your own printer
return real_malloc(s);
}
In my example, I use a static global struct to store the number of calls for each functions and the sum of size at each call. The wrapper code is but in a small lib that you can link with your test code (or, if you diretly print statistic, you can use LD_PRELOAD.)
The result is interesting, for example, you said that fopen doesn't use malloc, using that kind of tricks, you can see that it's false. On a 64bits recent linux system, for example, I got one malloc call for 568 bytes when using fopen ([edit] of course the free is done in fclose.)

How to detect if memory is dynamic or static, from a callee perspective?

Note: when I say "static string" here I mean memory that can not be handled by realloc.
Hi, I have written a procedure that takes a char * argument and I would like to create a duplicate IF the memory is not relocatable/resizable via realloc. As is, the procedure is a 'heavy' string processor, so being ignorant and duplicating the string whether or not it is static will surely cause some memory overhead/processing issues in the future.
I have tried to use exception handlers to modify a static string, the application just exits without any notice. I step back, look at C and say: "I'm not impressed." That would be an exception if I have ever heard of one.
I tried to use exception handlers to call realloc on a static variable... Glib reports that it can't find some private information to a structure (I'm sure) I don't know about and obviously calls abort on the program which means its not an exception that can be caught with longjmp/setjmp OR C++ try, catch finally.
I'm pretty sure there must be a way to do this reasonably. For instance dynamic memory most likely is not located anywhere near the static memory so if there is a way to divulge this information from the address... we might just have a bingo..
I'm not sure if there are any macros in the C/C++ Preprocessors that can identify the source and type of a macro argument, but it would be pretty dumb if it didn't. Macro Assemblers are pretty smart about things like that. Judging by the lack of robust error handling, I would not be a bit surprised if it did not.

C does not provide a portable way to tell statically allocated memory blocks from dynamically allocated ones. You can build your own struct with a string pointer and a flag indicating the type of memory occupied by the object. In C++ you can make it a class with two different constructors, one per memory type, to make your life easier.
As far as aborting your program goes, trying to free or re-allocate memory that has not been allocated dynamically is undefined behavior, so aborting is a fair game.

You may be able to detect ranges of memory and do some pointer comparisons. I've done this in some garbage collection code, where I need to know whether a pointer is in the stack, heap, or elsewhere.
If you control all allocation, you can simply keep min and max bounds based on every dynamic pointer that ever came out of malloc, calloc or realloc. A pointer lower than min or greater than max is probably not in the heap, and this min and max delimited region is unlikely to intersect with any static area, ever. If you know that a pointer is either static or it came from malloc, and that pointer is outside of the "bounding box" of malloced storage, then it must be static.
There are some "museum" machines where that sort of stuff doesn't work and the C standard doesn't give a meaning to comparisons of pointers to different objects using the relational operators, other than exact equality or inequality.

Any solution you would get would be platform specific, so you might want to specify the platform you are running on.
As for why a library should call abort when you pass it unexpected parameters, that tends to be safer than continuing execution. It's more annoying, certainly, but at that point the library knows that the code calling into it is in an state that cannot be recovered from.

I have written a procedure that takes a char * argument and I would like to create a duplicate IF the memory is not relocatable/resizable via realloc.
Fundamentally, the problem is that you want to do memory management based on information that isn't available in the scope you're operating in. Obviously you know if the string is on the stack or heap when you create it, but that information is lost by the time you're inside your function. Trying to fix that is going to be nearly impossible and definitely outside of the Standard.
I have tried to use exception handlers to modify a static string, the application just exits without any notice. I step back, look at C and say: "I'm not impressed." That would be an exception if I have ever heard of one.
As already mentioned, C doesn't have exceptions. C++ could do this, but the C++ Standards Committee believes that having C functions behave differently in C++ would be a nightmare.
I'm pretty sure there must be a way to do this reasonably.
You could have your application replace the default stack with one you created (and, as such, know the range of addresses in) using ucontext.h or Windows Fibers, and check if the address is inside the that range. However, (1) this puts a huge burden on any application using your library (of course, if you wrote the only application using your library, then you may be willing to accept that burden); and (2) doesn't detect memory that can't be realloced for other reasons (allocated using static, allocated using a custom allocator, allocated using SysAlloc or HeapAlloc on Windows, allocated using new in C++, etc.).
Instead, I would recommend having your function take a function pointer that would point at a function used to reallocate the memory. If the function pointer is NULL, then you duplicate the memory. Otherwise, you call the function.

original poster here. I neglected to mention that I have a working solution to the problem, it is not as robust as I would have hoped for. Please do not be upset, I appreciate everyone participating in this Request For Comments and Answers. The 'procedure' in question is variadic in nature and expects no more than 63 anonymous char * arguments.
What it is: a multiple string concatenator. It can handle many arguments but I advise the developer against passing more than 20 or so. The developer never calls the procedure directly. Instead a macro known as 'the procedure name' passes the arguments along with a trailing null pointer, so I know when I have met the end of statistics gathering.
If the function recieves only two arguments, I create a copy of the first argument and return that pointer. This is the string literal case. But really all it is doing is masking strdup
Failing the single valid argument test, we proceed to realloc and memcpy, using record info from a static database of 64 records containing each pointer and its strlen, each time adding the size of the memcopy to a secondary pointer (memcpy destination) that began as a copy of the return value from realloc.
I've written a second macro with an appendage of 'd' to indicate that the first argument is not dynamic, therefore a dynamic argument is required, and that macro uses the following code to inject a dynamic argument into the actual procedure call as the first argument:
strdup("")
It is a valid memory block that can be reallocated. Its strlen returns 0 so when the loop adds the size of it to the records, it affects nothing. The null terminator will be overwritten by memcpy. It works pretty damned well I should say. However being new to C in only the past few weeks, I didn't understand that you can't 'fool proof' this stuff. People follow directions or wind up in DLL hell I suppose.
The code works great without all of these extra shenanigans do-hickies and whistles, but without a way to reciprocate a single block of memory, the procedure is lost on loop processing, because of all the dynamic pointer mgmt. involved. Therefore the first argument must always be dynamic. I read somehwere someone had suggested using a c-static variable holding the pointer in the function, but then you can't use the procedure to do other things in other functions, such as would be needed in a recursive descent parser that decided to compile strings as it went along.
If you would like to see the code just ask!
Happy Coding!
mkstr.cpp
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
struct mkstr_record {
size_t size;
void *location;
};
// use the mkstr macro (in mkstr.h) to call this procedure.
// The first argument to mkstr MUST BE dynamically allocated. i.e.: by malloc(),
// or strdup(), unless that argument is the sole argument to mkstr. Calling mkstr()
// with a single argument is functionally equivalent to calling strdup() on the same
// address.
char *mkstr_(char *source, ...) {
va_list args;
size_t length = 0, item = 0;
mkstr_record list[64]; /*
maximum of 64 input vectors. this goes beyond reason!
the result of this procedure is a string that CAN be
concatenated by THIS procedure, or further more reallocated!
We could probably count the arguments and initialize properly,
but this function shouldn't be used to concatenate more than 20
vectors per call. Unless you are just "asking for it".
In any case, develop a workaround. Thank yourself later.
*/// Argument Range Will Not Be Validated. Caller Beware!!!
va_start(args, source);
char *thisArg = source;
while (thisArg) {
// don't validate list bounds here.
// an if statement here is too costly for
// for the meager benefit it can provide.
length += list[item].size = strlen(thisArg);
list[item].location = thisArg;
thisArg = va_arg(args, char *);
item++;
}
va_end(args);
if (item == 1) return strdup(source); // single argument: fail-safe
length++; // final zero terminator index.
char *str = (char *) realloc(source, length);
if (!str) return str; // don't care. memory error. check your work.
thisArg = (str + list[0].size);
size_t count = item;
for (item = 1; item < count; item++) {
memcpy(thisArg, list[item].location, list[item].size);
thisArg += list[item].size;
}
*(thisArg) = '\0'; // terminate the string.
return str;
}
mkstr.h
#ifndef MKSTR_H_
#define MKSTR_H_
extern char *mkstr_(char *string, ...);
// This macro ensures that the final argument to "mkstr" is null.
// arguments: const char *, ...
// limitation: 63 variable arguments max.
// stipulation: caller must free returned pointer.
#define mkstr(str, args...) mkstr_(str, ##args, NULL)
#define mkstrd(str, args...) mkstr_(strdup(str), ##args, NULL)
/* calling mkstr with more than 64 arguments should produce a segmentation fault
* this is not a bug. it is intentional operation. The price of saving an in loop
* error check comes at the cost of writing code that looks good and works great.
*
* If you need a babysitter, find a new function [period]
*/
#endif /* MKSTR_H_ */
Don't for get to mention me in the credits. She's fine and dandy.

CString to char*

We are using the CString class throughout most of our code. However sometimes we need to convert to a char *. at the moment we have been doing this using variable.GetBuffer(0) and this seems to work ( this mainly happens when passing the Csting into a function where the function requires a char *). The function accepts this and we keep going.
However we have lately become worried about how this works, and whether there is a better way to do it.
The way i understand it to work is it passes a char pointer into the function that points at the first character in the CString and all works well.
I Guess we are just worried about memory leaks or any unforseen circumstances where this might not be a good idea.

If your functions only require reading the string and not modifying it, change them to accept const char * instead of char *. The CString will automatically convert for you, this is how most of the MFC functions work and it's really handy. (Actually MFC uses LPCTSTR, which is a synonym for const TCHAR * - works for both MBC and Unicode builds).
If you need to modify the string, GetBuffer(0) is very dangerous - it won't necessarily allocate enough memory for the resulting string, and you could get some buffer overrun errors.
As has been mentioned by others, you need to use ReleaseBuffer after GetBuffer. You don't need to do that for the conversion to const char *.

# the OP:
>>> I Guess we are just worried about memory leaks or any ...
Hi, calling the GetBuffer method won't lead to any memory leaks. Because the destructor is going to deallocate the buffer anyway. However, others have already warned you about the potential issues with calling this method.
#Can >>> when you call the getbuffer function it allocates memory for you.
This statement is not completely true. GetBuffer(0) does NOT allocate any memory. It merely returns a pointer to the internal string buffer that can be used to manipulate the string directly from "outside" the CString class.
However, if you pass a number, say N to it like GetBuffer(N), and if N is larger than the current length of the buffer, then the function ensures that the returned buffer is at least as large as N by allocating more memory.
Cheers,
Rajesh.
MVP, Visual ++.

when you call the getbuffer function it allocates memory for you.
when you have done with it, you need to call releasebuffer to deallocate it

try the documentation at http://msdn.microsoft.com/en-us/library/awkwbzyc.aspx for help on that.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js