I've got a plugin system in my project (running on Linux), and part of this is that plugins have a "run" method such as:
void run(int argc, char* argv[]);
I'm calling my plugin and, after doing a bunch of other stuff, I go to check my argv array, and
the array is corrupted. I can print the values out at the top of the function, and they're correct, but not later on in the execution. Clearly something is corrupting the heap, but
I'm at a loss as to how I can pin down exactly what's overwriting that memory. Valgrind hasn't helped me much.
Sample code by request:
My plugin looks something like this:
void test_fileio::run(int argc, char* argv[]) {
bool all_passed = true;
// Prints out correctly.
for (int ii=0; ii < argc; ii++) {
printf("Arg[%i]: %s\n", ii, argv[ii]);
}
// <bunch of tests snipped for brevity>
// Prints out incorrectly.
for (int ii=0; ii < argc; ii++) {
printf("Arg[%i]: %s\n", ii, argv[ii]);
}
}
This is linked into a system that exposes it to Python, so I can call these plugins as Python functions. I take a string parameter to my Python function and break it out thusly:
char** translate_arguments(string args, int& argc) {
int counter = 0;
vector<char*> str_vec;
// Copy argument string to get rid of const modifier
char arg_str[MAX_ARG_LEN];
strcpy(arg_str, args.c_str());
// Tokenize the string, splitting on spaces
char* token = strtok(arg_str, " ");
while (token) {
counter++;
str_vec.push_back(token);
token = strtok(NULL, " ");
}
// Allocate array
char** to_return = new char*[counter];
for (int ii=0; ii < counter; ii++)
to_return[ii] = str_vec[ii];
// Save arg count and return
argc = counter;
return to_return;
}
The resulting argc and argv is then passed to the plugin mentioned above.
How does translate_arguments get called? That part is missing.
Does it prepare an array of pointers to chars before calling the run function in the plugin, since the run function has the parameter char *argv[]?
Judging by the code, this looks like the line that is causing trouble:
// Allocate array
char** to_return = new char*[counter];
You are intending to allocate a pointer to pointer to chars (a double pointer), but it looks like the precedence in the code is a bit mixed up?
Have you tried it this way:
char** to_return = new (char *)[counter];
Also, in your for loop as shown, you are not allocating space for the strings that the vector's pointers refer to?
for (int ii=0; ii < counter; ii++)
to_return[ii] = str_vec[ii];
// Should it be this way?
for (int ii=0; ii < counter; ii++)
to_return[ii] = strdup(str_vec[ii]);
I may get downvoted for this, since the OP did not show how translate_arguments is called, and without further information I may have misjudged the problem.
Hope this helps,
Best regards,
Tom.
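A likely culprit in the posted code: arg_str is a local array inside translate_arguments, and strtok returns pointers into it, so every argv[i] points into stack memory that is reused once the function returns. That would explain arguments that print correctly at first and turn to garbage later (strcpy into the fixed MAX_ARG_LEN buffer can also overflow on long input). The strdup suggestion above fixes the lifetime problem by copying each token to the heap (the copies then need free()); new char*[counter] as originally written is already the correct way to allocate the pointer array. Below is a minimal sketch of an alternative that keeps the tokens in owning storage; ArgList is a hypothetical name, and it splits on any whitespace rather than only on spaces.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: owns the argument strings so that the char* pointers
// handed to run() stay valid for as long as the ArgList object lives.
struct ArgList {
    std::vector<std::string> tokens;  // owning storage for each argument
    std::vector<char*> argv;          // non-owning pointers into tokens

    explicit ArgList(const std::string& args) {
        std::istringstream in(args);
        std::string tok;
        while (in >> tok)             // splits on any whitespace
            tokens.push_back(tok);
        for (std::string& t : tokens)
            argv.push_back(&t[0]);    // writable char*, valid while tokens is unchanged
        argv.push_back(nullptr);      // conventional argv[argc] == NULL terminator
    }

    // Copying would leave argv pointing into the source object's strings.
    ArgList(const ArgList&) = delete;
    ArgList& operator=(const ArgList&) = delete;

    int argc() const { return static_cast<int>(tokens.size()); }
};
```

Usage, with python_arg_string and plugin as placeholders: `ArgList args(python_arg_string); plugin.run(args.argc(), args.argv.data());`, keeping args alive until run() returns.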
Look up how to use memory access breakpoints with your debugger. If you have a reliable repro, this will pinpoint your problem in seconds. In WinDbg, it's:
ba w4 0x<address>
Here ba stands for "break on access", "w4" means "write 4 bytes" (use w8 on a 64-bit system), and "address" is obviously the address you're seeing corrupted. gdb and Visual Studio have similar capabilities.
If Valgrind and code inspection don't help, you could try Electric Fence.
I have always been confused and never understood how the alloc map-type of the map clause of the target (or target data) construct works.
My application: I would like to have a temporary array that lives only on the device; it is initialized, written, and read entirely on the device, and the host does not touch its contents at all. For the sake of simplicity, I have the following code, which copies an array to another array via a temporary array (using just a single team and thread, but that does not matter):
#include <cstdio>
int main()
{
const int count = 10;
int * src = new int[count];
int * tmp = new int[count];
int * dst = new int[count];
for(int i = 0; i < count; i++) src[i] = i;
for(int i = 0; i < count; i++) printf(" %3d", src[i]); printf("\n");
#pragma omp target map(to:src[0:count]) map(from:dst[0:count]) map(alloc:tmp[0:count])
{
for(int i = 0; i < count; i++) tmp[i] = src[i];
for(int i = 0; i < count; i++) dst[i] = tmp[i];
}
for(int i = 0; i < count; i++) printf(" %3d", dst[i]); printf("\n");
delete[] src;
delete[] tmp;
delete[] dst;
return 0;
}
This code works when using pgc++ -mp=gpu on Nvidia and on Intel gpu using icpx -fiopenmp -fopenmp-targets=spir64.
But the thing is, I don't want to allocate the tmp array on the host. If I just use int * tmp = nullptr, on nvidia the code fails (on intel it still works). If I leave the tmp uninitialized (using just int * tmp;, and removing the delete), the execution fails on Intel too. If I do not even declare the tmp variable, compilation fails (which kinda makes sense). I made sure it runs on the device (really offloads the code, doesn't fallback to cpu) using OMP_TARGET_OFFLOAD=MANDATORY.
This was weird to me, since I don't use the tmp array on the host at all. As I understand it, the tmp array is allocated on the device and then in the kernel the device array is used. Is that right? Why do I have to allocate and/or initialize the pointer on the host if I don't use it on the host?
So my question is: what are the exact requirements to use map(alloc) in OpenMP offloading? How does it work? How should I use it? I would appreciate an example and references from tutorials/documentation.
I wasn't able to find any useful information regarding this. The standard was not helpful at all, and the tutorials I attended and watched did not go into such depth.
I understand that the code should work even without OpenMP enabled (as if the pragmas were just ignored), so let's assume there is an #ifdef to actually allocate the tmp array if OpenMP is disabled.
I am also aware of manual memory management via omp_target_alloc(), omp_target_memcpy() and omp_target_free(), but I wanted to use the target map(alloc).
I am reading the standard 5.2, using pgc++ 22.2-0 and icpx 2022.0.0.20211123.
Edit/Solved: Joachim Pileborg's answer did the job for me. THX
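For the special case where the temporary's size is known at compile time, one alternative that avoids host storage for tmp entirely is to declare it inside the target region: a variable declared in the region belongs to the region's code, so it needs neither a host allocation nor a map clause. This is a sketch under that assumption, not the map(alloc) answer referred to above; copy_via_device_temp and kCount are hypothetical names. Large arrays declared this way may run into device stack limits, in which case omp_target_alloc() as mentioned in the question is the usual route. Without OpenMP the pragma is ignored and tmp is an ordinary local array, so no #ifdef is needed.

```cpp
// Fixed at compile time (assumption), so tmp is a plain array, not a VLA.
constexpr int kCount = 10;

// Copies src to dst through a temporary that exists only inside the target region.
void copy_via_device_temp(const int* src, int* dst)
{
    #pragma omp target map(to: src[0:kCount]) map(from: dst[0:kCount])
    {
        int tmp[kCount];  // declared in the region: not allocated or mapped on the host
        for (int i = 0; i < kCount; i++) tmp[i] = src[i];
        for (int i = 0; i < kCount; i++) dst[i] = tmp[i];
    }
}
```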
Please be gentle as this is my first question.
I am actually learning and playing with C++, in particular threading. I looked for an answer (it would astonish me if there isn't already one out there, but I wasn't able to find it).
So back to topic:
My "play" code looks something like this (Console application)
void foo(){
//do something
}
int _tmain(int argc, _TCHAR* argv[])
{
std::thread t[threadcount];
for (int i = 0; i < threadcount; ++i) {
t[i] = std::thread(foo);
}
for (int i = 0; i < threadcount; ++i) {
t[i].join();
}
}
Is it possible to set the value of threadcount through argv?
If not, could someone please give me a short snippet showing how to use
std::thread::hardware_concurrency()
as the threadcount? Visual Studio also gives me an error there when I set
const int threadcount = std::thread::hardware_concurrency();
Thanks in advance.
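Since the number of threads is controlled by threadcount, it can be set from the command line by adding
int threadcount = atoi(argv[1]);
to the implementation. Check that argc > 1 before reading argv[1], and add some error checking, e.g. report an error on a non-positive number of threads. Note also that the size of a built-in array such as std::thread t[threadcount] must be a compile-time constant, which is why Visual Studio rejects both a value read at run time and const int threadcount = std::thread::hardware_concurrency(); a std::vector<std::thread> avoids that restriction.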
If the number of threads is to be determined programmatically, depending on the specific platform, this question could be interesting.
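Putting the answer together with the Visual Studio error from the question, here is a minimal sketch: a std::vector<std::thread> sized at run time, with the count taken from argv[1] when it is a valid positive integer and from std::thread::hardware_concurrency() otherwise. hardware_concurrency() may return 0 when the value is not computable, so the sketch falls back to 1. The function names are hypothetical, and it assumes a plain char* argv; with _tmain and _TCHAR in a Unicode build, argv holds wide strings and would need wcstol instead of strtol (or switch to int main(int argc, char* argv[])).

```cpp
#include <cerrno>
#include <cstdlib>
#include <thread>
#include <vector>

void foo() { /* do something */ }

// Thread count from argv[1] if it parses as a positive integer,
// otherwise hardware_concurrency(), or 1 if that reports 0.
unsigned thread_count_from_args(int argc, char* argv[])
{
    if (argc > 1) {
        char* end = nullptr;
        errno = 0;
        long n = std::strtol(argv[1], &end, 10);
        if (errno == 0 && end != argv[1] && *end == '\0' && n > 0)
            return static_cast<unsigned>(n);
    }
    unsigned hw = std::thread::hardware_concurrency();  // 0 means "unknown"
    return hw != 0 ? hw : 1;
}

// A vector can be sized at run time, unlike std::thread t[threadcount].
void run_threads(unsigned threadcount)
{
    std::vector<std::thread> t;
    t.reserve(threadcount);
    for (unsigned i = 0; i < threadcount; ++i)
        t.emplace_back(foo);
    for (std::thread& th : t)
        th.join();
}
```

Usage: `int main(int argc, char* argv[]) { run_threads(thread_count_from_args(argc, argv)); }`.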
I'm making an ASCII game and I need performance, so I decided to go with printf(). But there is a problem: I designed my char array as a multidimensional char ** array, and printing it outputs garbage from memory instead of my data. I know it's possible to print it with a for loop, but performance drops rapidly that way. I need to printf it like a static array[][]. Is there a way?
I made an example with a working array and a notWorking array. I only need printf() to work with the notWorking array.
Edit: I'm using Visual Studio 2015 on Windows 10, and yes, I tested performance and cout is much slower than printf (but I don't really know why this is happening).
#include <iostream>
#include <cstdio>
int main()
{
const int X_SIZE = 40;
const int Y_SIZE = 20;
char works[Y_SIZE][X_SIZE];
char ** notWorking;
notWorking = new char*[Y_SIZE];
for (int i = 0; i < Y_SIZE; i++) {
notWorking[i] = new char[X_SIZE];
}
for (int i = 0; i < Y_SIZE; i++) {
for (int j = 0; j < X_SIZE; j++) {
works[i][j] = '#';
notWorking[i][j] = '#';
}
works[i][X_SIZE-1] = '\n';
notWorking[i][X_SIZE - 1] = '\n';
}
works[Y_SIZE-1][X_SIZE-1] = '\0';
notWorking[Y_SIZE-1][X_SIZE-1] = '\0';
printf("%s\n\n", works);
printf("%s\n\n", notWorking);
system("PAUSE");
}
Note: I think I could make some kind of a buffer or static array for just copying and displaying data, but I wonder if that can be done without it.
If you would like to print a 2D structure with printf without a loop, you need to present it to printf as a contiguous one-dimensional C string. Since your game needs access to the string as a 2D structure, you could make an array of pointers into this flat structure.
The array of pointers partitions the buffer for use as a 2D structure, while the buffer itself can be printed by printf because it is a contiguous C string.
Here is the same structure in code:
// X_SIZE+1 is for '\n's; overall +1 is for '\0'
char buffer[Y_SIZE*(X_SIZE+1)+1];
char *array[Y_SIZE];
// Setup the buffer and the array
for (int r = 0 ; r != Y_SIZE ; r++) {
array[r] = &buffer[r*(X_SIZE+1)];
for (int c = 0 ; c != X_SIZE ; c++) {
array[r][c] = '#';
}
array[r][X_SIZE] = '\n';
}
buffer[Y_SIZE*(X_SIZE+1)] = '\0';
printf("%s\n", buffer);
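When the screen size is only known at run time, the same flat layout can live in a heap-allocated std::string instead of a fixed-size array, with 2D access done by index arithmetic rather than a separate pointer array. This is a sketch with Screen and its members as hypothetical names; fwrite is given an explicit length, so it writes the whole frame in one call and needs no '\0' terminator.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: `height` rows, each `width` visible characters plus '\n',
// stored contiguously in one buffer.
struct Screen {
    int width, height;
    std::string buf;

    Screen(int w, int h)
        : width(w), height(h), buf(static_cast<std::size_t>(h) * (w + 1), '#') {
        for (int r = 0; r < h; r++)
            buf[static_cast<std::size_t>(r) * (w + 1) + w] = '\n';  // end of each row
    }

    // 2D-style access into the flat buffer.
    char& at(int r, int c) { return buf[static_cast<std::size_t>(r) * (width + 1) + c]; }

    // One library call writes the whole frame.
    void draw(std::FILE* out) const { std::fwrite(buf.data(), 1, buf.size(), out); }
};
```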
Some things you can do to increase performance:
There is absolutely no reason to have an array of pointers, each pointing at an array. This will cause heap fragmentation, as your data will end up all over the heap. Allocating memory in adjacent cells has many benefits in terms of speed; for example, it might improve data cache usage.
Instead, allocate a true 2D array:
char (*array2D) [Y] = new char [X][Y];
printf as well as cout are both incredibly slow, as they come with tons of overhead and extra features which you don't need. Since they are just advanced wrappers around the system-specific console functions, you should consider using the system-specific functions directly. For example, the Windows console API. It will however turn your program non-portable.
If that's not an option, you could try to use puts instead of printf, since it has far less overhead.
The main performance issue with printf/cout is that they write to the end of the "standard output stream", meaning you can't write where you like, only at the bottom of the screen. This forces you to redraw the whole thing every time something changes, which will be slow and can cause flicker.
Old DOS/Turbo C programs solved this with a non-standard function called gotoxy, which allowed you to move the "cursor" and print where you liked. In modern programs, you can do this with the console API functions.
You could (and should) separate graphics from the rest of the program. If you have one thread handling graphics only and the main thread handling algorithms, the graphic updates will be smoother, without having to wait for whatever else the program is doing. It makes the program considerably more complex, though, as you have to consider thread safety issues.
I created a C++ DLL function that uses several arrays to process what is eventually image data. I'm attempting to pass these arrays by reference, do the computation, and pass the output back by reference in a pre-allocated array. Within the function I use the Intel Performance Primitives including ippsMalloc and ippsFree:
Process.dll
int __stdcall ProcessImage(const float *Ref, const float *Source, float *Dest, const float *x, const float *xi, const int row, const int col, const int DFTlen, const int IMGlen)
{
int k, l;
IppStatus status;
IppsDFTSpec_R_32f *spec;
Ipp32f *y = ippsMalloc_32f(row),
*yi = ippsMalloc_32f(DFTlen),
*X = ippsMalloc_32f(DFTlen),
*R = ippsMalloc_32f(DFTlen);
for (int i = 0; i < col; i++)
{
for (int j = 0; j < row; j++)
y[j] = Source[j + (row * i)];
status = ippsSub_32f_I(Ref, y, row);
// Some interpolation calculations calculations here
status = ippsDFTInitAlloc_R_32f(&spec, DFTlen, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone);
status = ippsDFTFwd_RToCCS_32f(yi, X, spec, NULL);
status = ippsMagnitude_32fc( (Ipp32fc*)X, R, DFTlen);
for (int m = 0; m < IMGlen; m++)
Dest[m + (IMGlen * i)] = 10 * log10(R[m]);
}
_CrtDumpMemoryLeaks();
ippsDFTFree_R_32f(spec);
ippsFree(y);
ippsFree(yi);
ippsFree(X);
ippsFree(R);
return(status);
}
The function call looks like this:
for (int i = 0; i < Frames; i++)
ProcessFrame(&ref[i * FrameSize], &source[i * FrameSize], &dest[i * FrameSize], mX, mXi, NumPixels, Alines, DFTLength, IMGLength);
The function does not fail and produces the desired output for up to 6 images, more than that and it dies with:
First-chance exception at 0x022930e0 in DLL_test.exe: 0xC0000005: Access violation reading location 0x1cdda000.
I've attempted to debug the program, but unfortunately VS reports that the call stack location is in an IPP DLL with "No Source Available". It consistently fails when calling ippsMagnitude_32fc( (Ipp32fc*)X, R, DFTlen).
Which leads me to my questions: Is this a memory leak? If so, can anybody see where the leak is located? If not, can somebody suggest how to go about debugging this problem?
To answer your first question: no, that's not a memory leak, that's memory corruption.
A memory leak is when you don't free the memory you used, so memory usage keeps growing. That doesn't stop the program from working; it just ends up using too much memory, which makes the computer really slow (swapping) and ultimately makes any program crash with a 'Not enough memory' error.
What you have is a basic pointer error, as happens all the time in C++.
Explaining how to debug this is hard; I suggest you add a breakpoint just before the crash and try to see what's wrong.
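Reading the posted code, assuming the documented IPP CCS layout and that len counts elements: ippsDFTFwd_RToCCS_32f writes a length-N transform as N/2 + 1 complex values (N + 2 floats), but X is allocated with only DFTlen floats, and ippsMagnitude_32fc((Ipp32fc*)X, R, DFTlen) treats X as DFTlen complex values, i.e. reads 2*DFTlen floats. Either would run past the end of X, which would be consistent with an access violation inside the IPP DLL at that call. Separately, ippsDFTInitAlloc_R_32f is called on every column while only the final spec is freed, so that part does leak; creating the spec once before the loop avoids it. Please check these readings against the IPP version in use. The plain C++ sketch below makes no IPP calls (ccs_magnitude is a hypothetical name) and only shows the buffer-size arithmetic for the magnitude step.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Magnitudes of a real length-n DFT stored in CCS-style packing:
// n/2 + 1 complex bins laid out as re0, im0, re1, im1, ...
// so for even n the buffer must hold n + 2 floats.
std::vector<float> ccs_magnitude(const std::vector<float>& ccs, int n)
{
    const std::size_t bins = static_cast<std::size_t>(n) / 2 + 1;
    if (ccs.size() < 2 * bins)
        throw std::length_error("CCS buffer too small: needs 2 * (n/2 + 1) floats");
    std::vector<float> mag(bins);
    for (std::size_t k = 0; k < bins; k++)
        mag[k] = std::hypot(ccs[2 * k], ccs[2 * k + 1]);  // |re + i*im|
    return mag;
}
```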
I am relatively experienced in Java coding but new to C++.
I have written the following C++ code as solution to the USACO training problem which I have reproduced at this url
This code looks fine to me.
However it crashes on the sample test case given.
On isolating the error, I found that if the second for loop is not run for the last iteration (I mean like in the sample test case, n = 5, so I run the loop only till i = 3 instead of i = 4), then it doesn't crash (and produces the expected output).
Maybe the error is somewhere else, I can't detect it.
Any ideas are welcome.
Thanks in advance.
Please excuse me for the slightly unwieldy formatting of the code (this is my first forum post). The files included are stdlib.h, stdio.h and hash_map.h
#include <stdlib.h>
#include <stdio.h>
#include <hash_map.h>
struct eqstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) == 0;
}
};
int main(int argc, char** argv) {
FILE *fin = fopen("gift1.in", "r");
FILE *fout = fopen("gift1.out", "w");
hash_map<const char*, int, hash<const char*>, eqstr> table;
int n;
fscanf(fin,"%d",&n);
char name[15];
char people[10][15];
for(int i = 0; i < n; i++){
fscanf(fin,"%s",name);
strcpy(people[i],name);
table[people[i]] = 0;
}//ifor
for(int i = 0; i < n; i++){
fscanf(fin,"%s",name);
int money;
fscanf(fin,"%d",&money);
int friends;
fscanf(fin,"%d",&friends);
char fname[15];
int amt = money/friends;
for(int j = 0; j < friends; j++){
fscanf(fin,"%s",fname);
table[fname] = table[fname] + amt;
}//jfor
table[name] = table[name] - friends*amt;
}//ifor
for(int i = 0; i < n; i++)
fprintf(fout,"%s %d\n",people[i],table[people[i]]);
return (EXIT_SUCCESS);
}
The reason it is crashing is that vick gives money to 0 friends, which causes an integer division by zero in the following line of code: int amt = money/friends;
Integer division by zero is undefined behavior and typically crashes the program. You should add some special logic to handle the case where a person has 0 friends and so gives $0 away.
As was stated in the other comments, you should use some standard library classes (string, iostream, etc.) to help clean up the code.
Edit: Added the input data so the question and answer would make a little more sense
5
dave
laura
owen
vick
amr
dave
200 3
laura
owen
vick
owen
500 1
dave
amr
150 2
vick
owen
laura
0 2
amr
vick
vick
0 0
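A sketch of the same solution using the standard-library classes suggested above, with the zero-friends case handled: std::map<std::string, int> replaces the non-standard hash_map with const char* keys, and solve_gift1 (a hypothetical name) reads from any std::istream, so it can be tried on the sample input above held in a string. When a person gives to 0 friends there are no names to read and no balances change, so that person is simply skipped.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Reads gift1-style input from `in` and returns the output text.
std::string solve_gift1(std::istream& in)
{
    int n = 0;
    in >> n;
    std::vector<std::string> people(n);
    std::map<std::string, int> balance;
    for (std::string& p : people) {
        in >> p;
        balance[p] = 0;
    }
    for (int i = 0; i < n; i++) {
        std::string giver;
        int money = 0, friends = 0;
        in >> giver >> money >> friends;
        if (friends == 0)                 // gives nothing; also avoids money / 0
            continue;
        int amt = money / friends;
        for (int j = 0; j < friends; j++) {
            std::string receiver;
            in >> receiver;
            balance[receiver] += amt;
        }
        balance[giver] -= amt * friends;  // the giver keeps the remainder
    }
    std::ostringstream out;
    for (const std::string& p : people)
        out << p << ' ' << balance[p] << '\n';
    return out.str();
}
```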
I would suggest using GDB to find these errors; I have run into them too.
Compile with the -g flag, then start gdb on your executable (e.g. gdb a.out) and type run to run the program. Once the program crashes, you can use backtrace to identify the exact line where it's crashing and inspect the variable values at that point.