Intel PIN misses RTN_InsertCall at certain positions (instrumenting malloc function) - c++

I was testing the malloctrace example shown on the official Intel PIN website: https://software.intel.com/sites/landingpage/pintool/docs/98484/Pin/html/index.html#FunctionArguments
I have a very simple testing binary file with the source code of:
int main()
{
char *buf = malloc(sizeof(char)*50);
//printf("Buf addr: %p\n", buf); // sanity check
char *bar_buf = malloc(sizeof(char)*50);
//printf("Bar_buf addr: %p\n", bar_buf); // sanity check
free(buf);
free(bar_buf);
}
Upon executing the above malloctrace example with this simple binary file, the output looks like this:
...
malloc(0x32) <- first malloc
malloc(0x32) <- second malloc
returns 0x55cb2b4222e0
free(0x55cb2b4222a0) <- free of first malloc
free(0x55cb2b4222e0) <- free of second malloc
Interestingly enough, for whatever reason, the first malloc does not return the address it obtains from the malloc call, similar to what is shown for the second malloc.
What I am expecting is the return value for the first malloc is 0x55cb2b4222a0 as that is the input argument to the free(buf) for the test source code.
I have done some additional instrumentation and testing (such as checking all read/write to the memory location) to conclude that this is simply that Intel PIN cannot find the proper instrumentation point to insert the profiling code after the first malloc, but I was nevertheless wondering if anyone else had to face this issue, and potentially know the solution to solving this "missing instrumentation" problem.

Related

Program creating file path using strdup and strcat crashes when fed more than 39 characters

I am trying to concatenate two char arrays using the function strcat(). However the program crashes.
#include <cstdio>
#include <cstring>
int main() {
const char *file_path = "D:/MyFolder/YetAnotherFolder/test.txt";
const char *file_bk_path = strcat(strdup(file_path), ".bk");
printf("%s\n", file_bk_path);
return 0;
}
The strangest thing to me is that the program indeed produces an output before crashing:
D:/MyFolder/YetAnotherFolder/test.txt.bk
What is the reason for this problem and how it can be fixed?
Error state is reproduced in Windows (MinGW 7.2.0).
strdup is creating new memory for you to hold a duplicate of the string. The memory is only as long as strlen(file_path) + 1. You then try to add an extra 2 characters into memory that you don't own. You will go out of range of the memory created and create some undefined behaviour. It might print because setting the memory and printing the first part could be happening correctly, but it is undefined and anything can happen. Also note, in strdup you need to call free on the memory it creates for you, or you are going to leak some memory.
Here is a much simpler way to do this, use a std::string:
const char *file_path = "D:/MyFolder/YetAnotherFolder/test.txt";
std::string file_bk_path = std::string(file_path) + ".bk";
std::cout << file_bk_path << "\n";
Here is a live example.
If it absolutely needs to be in C-style then you are better off controlling the memory yourself:
const char *file_path = "D:/MyFolder/YetAnotherFolder/test.txt";
const char *bk_string = ".bk";
char *file_bk_path = malloc((strlen(file_path) + strlen(bk_string) + 1)*sizeof(char));
if (!file_bk_path) { exit(1); }
strcpy(file_bk_path, file_path);
strcat(file_bk_path, bk_string);
printf("%s\n", file_bk_path);
free(file_bk_path);
Here is a live example.
As mentioned in the comments and answers, strdup mallocs the length of your path string, plus an extra cell for the string end character '\0'. When you concatenate to this two characters writing after the allocated area.
Following #Ben's comments, I'd like to elucidate some more:
To be clear strcat adds a delimiter, but this is already after the memory you were allocated.
In general unless you specifically hit no-no addresses, the program will probably run fine - in fact this is a common hard to find bug. If for example you allocate some more memory right after that address, you will be deleting said delimiter (so printing the string will read further into the memory.
So in general, you may be OK crash wise. The crash (probably) occurs when the program ends, and the OS cleans up the memory you forgot to free yourself - That extra cell is a memory leak, and will cause the crash. So you do get a full print, and only after a crash.
Of course all of this is undefined behavior, so may depend on the compiler and OS.

Valgrind Memory Leak Wrong File Trace

I'm currently in the memory-leak detection stage of debugging, and am running valgrind --leak-check=full --show-leak-kinds=all on my executable. However, I'm getting some confusing output in a loss record:
==26628== 2 bytes in 1 blocks are indirectly lost in loss record 2 of 343
==26628== at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26628== by 0x436EF6: void std::vector<unsigned char, std::allocator<unsigned char> >::_M_range_insert<char*>(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsig$
==26628== by 0x4368EF: giga::Page::aux_load() (stl_vector.h:1291)
==26628== by 0x43499D: cachepp::SerialCache<cachepp::SimpleSerialCacheData, giga::Page>::allocate(std::shared_ptr<giga::Page> const&) (lineinterface.template:34)
...
The problem is in the trace for giga::Page::aux_load -- this function is defined by me, and most definitely not in the standard lib. I'm not sure why the line was not reported on this output line from the appropriate calling file.
I wasn't sure if code was necessary for this, but in case I do --
void giga::Page::aux_load() {
this->data.clear();
// deallocate the data vector
std::vector<uint8_t>().swap(this->data);
if(!this->get_is_dirty()) {
// load data into the page
FILE *fp = fopen(this->get_filename().c_str(), "r");
if(fseek(fp, this->file_offset, SEEK_SET) == -1) {
throw(exceptionpp::RuntimeError("giga::Page::aux_load", "invalid result returned from fseek"));
}
char *data = (char *) calloc(this->get_size(), sizeof(char));
if(fread((void *) data, sizeof(char), this->get_size(), fp) < this->get_size()) {
throw(exceptionpp::RuntimeError("giga::Page::aux_load", "invalid result returned from fread"));
}
fclose(fp);
this->data.insert(this->data.end(), data, data + this->get_size());
free((void *) data);
}
}
Any help would be greatly appreciated!
This is likely to be caused by valgrind stacktrace not showing inlined function call
defined at stl_vector.h:1291 and called somewhere inside aux_load.
The next version of Valgrind has support for showing inlined function,
using the option --read-inline-info=yes.
So, get and compile the latest valgrind svn version
See http://www.valgrind.org/downloads/repository.html
for instruction about how to get, configure and compile the valgrind svn version
This may not be the reason for the leaks in your report, but your code definitely will leak memory, without a doubt. It will leak if you throw an exception:
throw(exceptionpp::RuntimeError("giga::Page::aux_load", "invalid result returned from fread"));
The reason why it is a leak is that you previously called calloc, and the line where you free the data is never reached. You also have a dangling open file pointer if any exception is thrown by your function.
The answer to correct this is short and sweet:
Use std::ifstream, not FILE*. This will guarantee that your file is closed on function return, regardless of the reason for the return.
Use std::vector<char> dataV(get_size()), not calloc. This will guarantee that the memory is deallocated on function return, regardless of the reason for the return (in this case, use &dataV[0] or if using C++11 dataV.data() to get a pointer to the char buffer).
The topic you should read about is RAII: What is meant by Resource Acquisition is Initialization (RAII)?
Rewrite your code using these constructs above, and you will relieve yourself of the leaks in the function you posted.
The other changes you should use:
Check for get_size() == 0. If it's 0, then you can return immediately without having to open files.
There is no need for you to use this-> everywhere. It clutters the code and makes it harder to debug if there is an issue.

malloc memory for char array causes access violation

This is probably a noob question. I am using Marmalade SDK. If I allocate some memory for a char array dynamically I get an access violation. I cannot figure out what I am doing wrong.
Is this a valid way to use malloc and free?
const char* CClass::caption(int something)
{
...
//char stringPrediction[2000]; // <--- This works
size_t string01_Size = strlen(string01);
size_t string02_Size = strlen(string02);
...
stringX = (char *)malloc(string01_Size + string02_Size + 1); // <- Access violation
strcpy(stringX , string01);
strcat(stringX , " ");
strcat(stringX , string02);
return stringX ;
}
Destructor:
CClass::~CClass(void)
{
free(stringX);
}
Then I use it to set the caption of a label on a Button click event
... OnClick...(...)
{
CClass someObject;
label.setCaption(someObject.caption());
}
After a few clicks I get access violation.
Unhandled exception at 0x1007ECB8 (s3e_simulator_debug.dll) in
s3e_simulator_debug.exe: 0xC0000005: Access violation writing
location 0x0000000C.
EDIT: It seems the problems is:
stringX = (char *)malloc(string01_Size + string02_Size + 1);
I have failed to allocate space for this:
strcat(stringX , " ");
This should be better:
stringX = (char *)malloc(string01_Size + 1 + string02_Size + 1);
1.
You're not testing for the case that malloc fails. The error suggests you're trying to dereference a null pointer - the pointer returned when malloc fails.
2.
It doesn't look like you're accounting for the null-byte terminator when calculating the size of your buffer. Assuming string01_Size and string02_Size are the number of characters not including the null byte, you're overflowing your buffer.
Both actions result in undefined behaviour.
Access violation encountered during call of malloc or free in most cases indicates that the heap is corrupt. Due to, most commonly overrun or double free or free issued on dangling pointer, happened sometime earlier. May be way earlier.
From the posted code it's not possible to deduce, as it effectively starts with the crash.
To deal with it you may try valgrind, app verifier, other runtime tools helping with corruption. For that one particular issue -- for the general you should not be there in the first place, using malloc, the str* function an all the inherited crap from C.
Consistent use of collection classes, String class, etc would prevent too many cases leading to drastic problems.
You've basically spotted the issue: by not allocating for the space you're second strcat will overrun and probably crunch the heap pointers of the next block. Additional things to watch (you're probably be doing this anyway) are to free stringX in CClass::caption() before reallocating so you don't leak. Whatever you really should check that the malloc has not failed before use.
As others have suggested, it may be better to use std::string. You can always convert these to char* if required. Of course, if you do so you should consider that exceptions might be thrown and program accordingly.

Leak caused by fread

I'm profiling code of a game I wrote and I'm wondering how it is possible that the following snippet causes an heap increase of 4kb (I'm profiling with Heapshot Analysis of Xcode) every time it is executed:
u8 WorldManager::versionOfMap(FILE *file)
{
char magic[4];
u8 version;
fread(magic, 4, 1, file); <-- this is the line
fread(&version,1,1,file);
fseek(file, 0, SEEK_SET);
return version;
}
According to the profiler the highlighted line allocates 4.00Kb of memory with a malloc every time the function is called, memory which is never released. This thing seems to happen with other calls to fread around the code, but this was the most eclatant one.
Is there anything trivial I'm missing? Is it something internal I shouldn't care about?
Just as a note: I'm profiling it on an iPhone and it's compiled as release (-O2).
If what you're describing is really happening and your code has no bugs elsewhere, it is a bug in the implementation, I think.
More likely I think, is the possibility that you don't close the file. Stdio streams use buffering by default if the device is non-interactive, and the buffer is allocated either at the time the file is opened or when I/O is performed. While only one buffer should be allocated, you can certainly leak the buffer by forgetting to close the file. But certainly, closing the file should free the buffer. Don't forget to check the value returned by fclose.
Supposing for the sake of argument that you are correctly closing the file there are a couple of other nits in your code which won't be causing this problem but I'll mention anyway.
First your fread call reads an object having one member of size 4. You actually have an object having 4 members of size 1. In other words the numeric arguments to fread are swapped. This makes a difference only in the meaning of the return value (important in the case of a partial read).
Second, while your first call to fread correctly hard-codes the size of char as 1 (in C, that is the definition of 'size'), it's probably better stylistically to use sizeof(u8) in the second call to fread.
If the idea that this really is a memory leak is a correct interpretation (and there aren't any bugs elsewhere) then you may be able to work around the problem by turning off the stdio buffering for this particular file:
bool WorldManager::versionOfMap(FILE *file, bool *is_first_file_io, u8 *version)
{
char magic[4];
bool ok = false;
if (*is_first_file_io)
{
// we ignore failure of this call
setvbuf(file, NULL, _IONBF, 0);
*is_first_file_io = false;
}
if (sizeof(magic) == fread(magic, 1, sizeof(magic), file)
&& 1 == fread(version, sizeof(*version), 1, file))
{
ok = true;
}
if (-1 == fseek(file, 0L, SEEK_SET))
{
return false;
}
else
{
return ok && 0 == memcmp(magic, EXPECTED_MAGIC, sizeof(magic));
}
}
Even if we're going with the hypothesis that this really is a bug, and the leak is real, it is well worth condensing your code to the smallest possible example that still demonstrates the problem. If doing that reveals the true bug, you win. Otherwise, you will need the minimal example to report the bug in the implementation.

Is there any way to determine how many characters will be written by sprintf?

I'm working in C++.
I want to write a potentially very long formatted string using sprintf (specifically a secure counted version like _snprintf_s, but the idea is the same). The approximate length is unknown at compile time so I'll have to use some dynamically allocated memory rather than relying on a big static buffer. Is there any way to determine how many characters will be needed for a particular sprintf call so I can always be sure I've got a big enough buffer?
My fallback is I'll just take the length of the format string, double it, and try that. If it works, great, if it doesn't I'll just double the size of the buffer and try again. Repeat until it fits. Not exactly the cleverest solution.
It looks like C99 supports passing NULL to snprintf to get the length. I suppose I could create a module to wrap that functionality if nothing else, but I'm not crazy about that idea.
Maybe an fprintf to "/dev/null"/"nul" might work instead? Any other ideas?
EDIT: Alternatively, is there any way to "chunk" the sprintf so it picks up mid-write? If that's possible it could fill the buffer, process it, then start refilling from where it left off.
The man page for snprintf says:
Return value
Upon successful return, these functions return the number of
characters printed (not including the trailing '\0' used to end
output to strings). The functions snprintf and vsnprintf do not
write more than size bytes (including the trailing '\0'). If
the output was truncated due to this limit then the return value
is the number of characters (not including the trailing '\0')
which would have been written to the final string if enough
space had been available. Thus, a return value of size or more
means that the output was truncated. (See also below under
NOTES.) If an output error is encountered, a negative value is
returned.
What this means is that you can call snprintf with a size of 0. Nothing will get written, and the return value will tell you how much space you need to allocate to your string:
int how_much_space = snprintf(NULL, 0, fmt_string, param0, param1, ...);
As others have mentioned, snprintf() will return the number of characters required in a buffer to prevent the output from being truncated. You can simply call it with a 0 buffer length parameter to get the required size then use an appropriately sized buffer.
For a slight improvement in efficiency, you can call it with a buffer that's large enough for the normal case and only do a second call to snprintf() if the output is truncated. In order to make sure the buffer(s) are properly freed in that case, I'll often use an auto_buffer<> object that handles the dynamic memory for me (and has the default buffer on the stack to avoid a heap allocation in the normal case).
If you're using a Microsoft compiler, MS has a non-standard _snprintf() that has serious limitations of not always null terminating the buffer and not indicating how big the buffer should be.
To work around Microsoft's non-support, I use a nearly public domain snprintf() from Holger Weiss.
Of course if your non-MS C or C++ compiler is missing snprintf(), the code from the above link should work just as well.
I would use a two-stage approach. Generally, a large percentage of output strings will be under a certain threshold and only a few will be larger.
Stage 1, use a reasonable sized static buffer such as 4K. Since snprintf() can restrict how many characters are written, you won't get a buffer overflow. What you will get returned from snprintf() is the number of characters it would have written if your buffer had been big enough.
If your call to snprintf() returns less than 4K, then use the buffer and exit. As stated, the vast majority of calls should just do that.
Some will not and that's when you enter stage 2. If the call to snprintf() won't fit in the 4K buffer, you at least now know how big a buffer you need.
Allocate, with malloc(), a buffer big enough to hold it then snprintf() it again to that new buffer. When you're done with the buffer, free it.
We worked on a system in the days before snprintf() and we acheived the same result by having a file handle connected to /dev/null and using fprintf() with that. /dev/null was always guaranteed to take as much data as you give it so we would actually get the size from that, then allocate a buffer if necessary.
Keep in kind that not all systems have snprintf() (for example, I understand it's _snprintf() in Microsoft C) so you may have to find the function that does the same job, or revert to the fprintf /dev/null solution.
Also be careful if the data can be changed between the size-checking snprintf() and the actual snprintf() to the buffer (i.e., wathch out for threads). If the sizes increase, you'll get buffer overflow corruption.
If you follow the rule that data, once handed to a function, belongs to that function exclusively until handed back, this won't be a problem.
For what it's worth, asprintf is a GNU extension that manages this functionality. It accepts a pointer as an output argument, along with a format string and a variable number of arguments, and writes back to the pointer the address of a properly-allocated buffer containing the result.
You can use it like so:
#define _GNU_SOURCE
#include <stdio.h>
int main(int argc, char const *argv[])
{
char *hi = "hello"; // these could be really long
char *everyone = "world";
char *message;
asprintf(&message, "%s %s", hi, everyone);
puts(message);
free(message);
return 0;
}
Hope this helps someone!
Take a look at CodeProject: CString-clone Using Standard C++. It uses solution you suggested with enlarging buffer size.
// -------------------------------------------------------------------------
// FUNCTION: FormatV
// void FormatV(PCSTR szFormat, va_list, argList);
//
// DESCRIPTION:
// This function formats the string with sprintf style format-specs.
// It makes a general guess at required buffer size and then tries
// successively larger buffers until it finds one big enough or a
// threshold (MAX_FMT_TRIES) is exceeded.
//
// PARAMETERS:
// szFormat - a PCSTR holding the format of the output
// argList - a Microsoft specific va_list for variable argument lists
//
// RETURN VALUE:
// -------------------------------------------------------------------------
void FormatV(const CT* szFormat, va_list argList)
{
#ifdef SS_ANSI
int nLen = sslen(szFormat) + STD_BUF_SIZE;
ssvsprintf(GetBuffer(nLen), nLen-1, szFormat, argList);
ReleaseBuffer();
#else
CT* pBuf = NULL;
int nChars = 1;
int nUsed = 0;
size_type nActual = 0;
int nTry = 0;
do
{
// Grow more than linearly (e.g. 512, 1536, 3072, etc)
nChars += ((nTry+1) * FMT_BLOCK_SIZE);
pBuf = reinterpret_cast<CT*>(_alloca(sizeof(CT)*nChars));
nUsed = ssnprintf(pBuf, nChars-1, szFormat, argList);
// Ensure proper NULL termination.
nActual = nUsed == -1 ? nChars-1 : SSMIN(nUsed, nChars-1);
pBuf[nActual+1]= '\0';
} while ( nUsed < 0 && nTry++ < MAX_FMT_TRIES );
// assign whatever we managed to format
this->assign(pBuf, nActual);
#endif
}
I've looked for the same functionality you're talking about, but as far as I know, something as simple as the C99 method is not available in C++, because C++ does not currently incorporate the features added in C99 (such as snprintf).
Your best bet is probably to use a stringstream object. It's a bit more cumbersome than a clearly written sprintf call, but it will work.
Since you're using C++, there's really no need to use any version of sprintf. The simplest thing to do is use a std::ostringstream.
std::ostringstream oss;
oss << a << " " << b << std::endl;
oss.str() returns a std::string with the contents of what you've written to oss. Use oss.str().c_str() to get a const char *. It's going to be a lot easier to deal with in the long run and eliminates memory leaks or buffer overruns. Generally, if you're worrying about memory issues like that in C++, you're not using the language to its full potential, and you should rethink your design.