_alloca and std::vector of const char* - c++

I did several searches on the _alloca function and it is generally not recommended.
I will have a dynamically updated array of strings on which I will have to iterate often.
I would like to have each string allocated on the stack to reduce the miss caches as much as possible.
Using _alloca I could create a char* on the stack and put it in the vector.
My char will be in a vector which will not impact my stack and I know that the strings will never be big enough to make a stackoverflow when I allocate them on the stack.
Is it a bad thing to use it in this case?
Does the program do what I want it to do?
#include <vector>
#include <cstring>
#ifdef __GNUC__
# define _alloca(size) __builtin_alloca (size)
#endif /* GCC. */
std::vector<const char*> vec;
void add(const char* message, int size) {
char* c = (char*) _alloca (size * sizeof(char*));
std::memcpy(c, message, sizeof(message));
vec.push_back(c);
}
int main() {
const char* c = "OK";
for (int i = 0; i < 10; ++i) {
add(c, 2);
}
}

Stack memory allocated by _alloca gets automatically released when the function that calls _alloca returns. The fact that a pointer to it gets shoved someplace (specifically some vector) does not prevent this memory from being released (as you mentioned you assumed would happen, in the comments), that's not how C++ works. There is no "garbage collection" in C++.
Therefore, the shown code ends up storing a pointer to released stack memory. Once add() returns, this leaves a dangling pointer behind, and any subsequent use of it results in undefined behavior.
In other words: yes, it's "a bad thing to use it in this case", because be the shown code will not work.
There are also several other, major, issues with this specific _alloca call, but that's going to be besides the point. This entire approach fails, even if these issues get fixed correctly.

Related

practical explanation of c++ functions with pointers

I am relatively new to C++...
I am learning and coding but I am finding the idea of pointers to be somewhat fuzzy. As I understand it * points to a value and & points to an address...great but why? Which is byval and which is byref and again why?
And while I feel like I am learning and understanding the idea of stack vs heap, runtime vs design time etc, I don't feel like I'm fully understanding what is going on. I don't like using coding techniques that I don't fully understand.
Could anyone please elaborate on exactly what and why the pointers in this fairly "simple" function below are used, esp the pointer to the function itself.. [got it]
Just asking how to clean up (delete[]) the str... or if it just goes out of scope.. Thanks.
char *char_out(AnsiString ansi_in)
{
// allocate memory for char array
char *str = new char[ansi_in.Length() + 1];
// copy contents of string into char array
strcpy(str, ansi_in.c_str());
return str;
}
Revision 3
TL;DR:
AnsiString appears to be an object which is passed by value to that function.
char* str is on the stack.
A new array is created on the heap with (ansi_in.Length() + 1) elements. A pointer to the array is stored in str. +1 is used because strings in C/C++ typically use a null terminator, which is a special character used to identify the end of the string when scanning through it.
ansi_in.cstr() is called, copying a pointer to its string buffer into an unnamed local variable on the stack.
str and the temporary pointer are pushed onto the stack and strcpy is called. This has the effect of copying the string(including the null-terminator) pointed at from the temporary to str.
str is returned to the caller
Long answer:
You appear to be struggling to understand stack vs heap, and pointers vs non-pointers. I'll break them down for you and then answer your question.
The stack is a concept where a fixed region of memory is allocated for each thread before it starts and before any user code runs.
Ignoring lower level details such as calling conventions and compiler optimizations, you can reason that the following happens when you call a function:
Arguments are pushed onto the stack. This reserves part of the stack for use of the arguments.
The function performs some job, using and copying the arguments as needed.
The function pops the arguments off the stack and returns. This frees the space reserved for the arguments.
This isn't limited to function calls. When you declare objects and primitives in a function's body, space for them is reserved via pushing. When they're out of scope, they're automatically cleaned up by calling destructors and popping.
When your program runs out of stack space and starts using the space outside of it, you'll typically encounter an error. Regardless of what the actual error is, it's known as a stack overflow because you're going past it and therefore "overflowing".
The heap is a different concept where the remaining unused memory of the system is available for you to manually allocate and deallocate from. This is primarily used when you have a large data set that's too big for the stack, or when you need data to persist across arbitrary functions.
C++ is a difficult beast to master, but if you can wrap your head around the core concepts is becomes easier to understand.
Suppose we wanted to model a human:
struct Human
{
const char* Name;
int Age;
};
int main(int argc, char** argv)
{
Human human;
human.Name = "Edward";
human.Age = 30;
return 0;
}
This allocates at least sizeof(Human) bytes on the stack for storing the 'human' object. Right before main() returns, the space for 'human' is freed.
Now, suppose we wanted an array of 10 humans:
int main(int argc, char** argv)
{
Human humans[10];
humans[0].Name = "Edward";
humans[0].Age = 30;
// ...
return 0;
}
This allocates at least (sizeof(Human) * 10) bytes on the stack for storing the 'humans' array. This too is automatically cleaned up.
Note uses of ".". When using anything that's not a pointer, you access their contents using a period. This is direct memory access if you're not using a reference.
Here's the single object version using the heap:
int main(int argc, char** argv)
{
Human* human = new Human();
human->Name = "Edward";
human->Age = 30;
delete human;
return 0;
}
This allocates sizeof(Human*) bytes on the stack for the pointer 'human', and at least sizeof(Human) bytes on the heap for storing the object it points to. 'human' is not automatically cleaned up, you must call delete to free it. Note uses of "a->b". When using pointers, you access their contents using the "->" operator. This is indirect memory access, because you're accessing memory through an variable address.
It's sort of like mail. When someone wants to mail you something they write an address on an envelope and submit it through the mail system. A mailman takes the mail and moves it to your mailbox. For comparison the pointer is the address written on the envelope, the memory management unit(mmu) is the mail system, the electrical signals being passed down the wire are the mailman, and the memory location the address refers to is the mailbox.
Here's the array version using the heap:
int main(int argc, char** argv)
{
Human* humans = new Human[10];
humans[0].Name = "Edward";
humans[0].Age = 30;
// ...
delete[] humans;
return 0;
}
This allocates sizeof(Human*) bytes on the stack for pointer 'humans', and (sizeof(Human) * 10) bytes on the heap for storing the array it points to. 'humans' is also not automatically cleaned up; you must call delete[] to free it.
Note uses of "a[i].b" rather than "a[i]->b". The "[]" operator(indexer) is really just syntactic sugar for "*(a + i)", which really just means treat it as a normal variable in a sequence so I can type less.
In both of the above heap examples, if you didn't write delete/delete[], the memory that the pointers point to would leak(also known as dangle). This is bad because if left unchecked it could eat through all your available memory, eventually crashing when there isn't enough or the OS decides other apps are more important than yours.
Using the stack is usually the wiser choice as you get automatic lifetime management via scope(aka RAII) and better data locality. The only "drawback" to this approach is that because of scoped lifetime you can't directly access your stack variables once the scope has exited. In other words you can only use stack variables within the scope they're declared. Despite this, C++ allows you to copy pointers and references to stack variables, and indirectly use them outside the scope they're declared in. Do note however that this is almost always a very bad idea, don't do it unless you really know what you're doing, I can't stress this enough.
Passing an argument by-ref means pushing a copy of a pointer or reference to the data on the stack. As far as the computer is concerned pointers and references are the same thing. This is a very lightweight concept, but you typically need to check for null in functions receiving pointers.
Pointer variant of an integer adding function:
int add(const int* firstIntPtr, const int* secondIntPtr)
{
if (firstIntPtr == nullptr) {
throw std::invalid_argument("firstIntPtr cannot be null.");
}
if (secondIntPtr == nullptr) {
throw std::invalid_argument("secondIntPtr cannot be null.");
}
return *firstIntPtr + *secondIntPtr;
}
Note the null checks. If it didn't verify its arguments are valid, they very well may be null or point to memory the app doesn't have access to. Attempting to read such values via dereferencing(*firstIntPtr/*secondIntPtr) is undefined behavior and if you're lucky results in a segmentation fault(aka access violation on windows), crashing the program. When this happens and your program doesn't crash, there are deeper issues with your code that are out of the scope of this answer.
Reference variant of an integer adding function:
int add(const int& firstInt, const int& secondInt)
{
return firstInt + secondInt;
}
Note the lack of null checks. By design C++ limits how you can acquire references, so you're not suppose to be able to pass a null reference, and therefore no null checks are required. That said, it's still possible to get a null reference through converting a pointer to a reference, but if you're doing that and not checking for null before converting you have a bug in your code.
Passing an argument by-val means pushing a copy of it on the stack. You almost always want to pass small data structures by value. You don't have to check for null when passing values because you're passing the actual data itself and not a pointer to it.
i.e.
int add(int firstInt, int secondInt)
{
return firstInt + secondInt;
}
No null checks are required because values, not pointers are used. Values can't be null.
Assuming you're interested in learning about all this, I highly suggest you use std::string(also see this) for all your string needs and std::unique_ptr(also see this) for managing pointers.
i.e.
std::string char_out(AnsiString ansi_in)
{
return std::string(ansi_in.c_str());
}
std::unique_ptr<char[]> char_out(AnsiString ansi_in)
{
std::unique_ptr<char[]> str(new char[ansi_in.Length() + 1]);
strcpy(str.get(), ansi_in.c_str());
return str; // std::move(str) if you're using an older C++11 compiler.
}

strcpy works fine, even though memory is not allocated

Below c++ program works fine, even though i have not allocated any memory to chr.
I went through google, SO and came across this
Why does this intentionally incorrect use of strcpy not fail horribly?
Here the program works fine for destination which has space less than the source.
Does that apply in my case also,strcpy writes into a random location in the heap?
#include<iostream>
using namespace std;
class Mystring
{
char *chr;
int a;
public:
Mystring(){}
Mystring(char *str,int i);
void Display();
};
Mystring::Mystring(char *str, int i)
{
strcpy(chr,str);
a = i;
}
void Mystring::Display()
{
cout<<chr<<endl;
cout<<a<<endl;
}
int main()
{
Mystring a("Hello world",10);
a.Display();
return 0;
}
output:-
Hello world
10
I tried the same thing with another c++ program, with any class and class member, and i was able to see the crash.
#include<iostream>
using namespace std;
int main()
{
char *src = "Hello world";
char *dst;
strcpy(dst,src);
cout<<dst<<endl;
return 0;
}
Please help me understand the behavior in the first c++ program.Is memory allocated somehow or strcpy is writing to some random memory location.
The behavior of your program is undefined by the C++ standard.
This means that any C++ implementation (e.g. a compiler) can do whatever it wants. Printing "hello!" on stdout would be a possible outcome. Formatting the hard disk would still be acceptable, as far as the C++ standard is involved. Practically, though, some random memory on the heap will be overwritten, with unpredictable consequences.
Note that the program is even allowed to behave as expected. For now. Unless it's Friday. Or until you modify something completely unrelated. This is what happened in your case, and is one of the worst thing to happen from the point of view of a programmer.
Later on, you may add some code and see your program crash horribly, and think the problem lies in the code you just added. This can cause you to spend a long time debugging the wrong part of the code. If that ever happens, welcome to hell.
Because of this, you should never, ever, rely on undefined behavior.
strcpy() is indeed writing to a random location.
And it's a blind luck if your program runs fine and nothing crashes.
You have created on object of MyString class on a stack. In that object there's a member pointer chr, which points to some arbitrary location. Does your constructor take care to initialize that pointer or allocate a memory for the pointer to point at? -- No, it doesn't. So chr points somewhere.
strcpy() in its turn doesn't care about any pointer validity, it trusts your professionalism to provide valid input. So it does its job copying stings. Luckily, overwriting memory at the location pointed by an uninitialized chr doesn't crash you program, but that's "luckily" only.
It is known that strcpy() can cause overflow errors, because there is no check made wherever the data will fit in the new array or not.
The outcome of this overflow may sometime never been noticed, it all depends on where the data is written. However a common outcome is heap and/or program corruption.
A safe alternative of strcpy() is the usage of strcpy_s() that requires also the size of the array. You can read more about the usage of strcpy_s() on MSDN or here
Actually, strcpy() does something like this:
char *strcpy(char *dest, const char *src)
{
unsigned i;
for (i=0; src[i] != '\0'; ++i)
dest[i] = src[i];
dest[i] = '\0';
return dest;
}
So, when you pass a pointer to some character array to strcpy, it copies data from src to dest until it reaches NULL terminated character.
Character pointer does not contain any information about length of string, so when you pass dest pointer, it copies data even though you haven't assigned memory to it.
Run this example code, you will get my point:
#include <cstring>
#include <iostream>
using namespace std;
int main()
{
char str1[] = "Hello_World!";
char str2[5];
char str3[10];
strcpy(str2,str1);
cout << "string 1:" << str1 << endl;
cout << "string 2:" << str2 << endl;
cout << "string 3:" << str3 << endl;
return 0;
}
This will not show any error but you can understand by my example that this is not a good practice.

Receive SIGSEGV signal when using malloc for char * in Linux Qt Creator C++

When I run the following code to allocate memory for char * using malloc() on QT Linux C++, SIGSEGV is signaled after about 250 executions.
for(int i = 0; i < 10000; i++)
{
char * test = (char * )malloc(500);
test = "mas";
cout<<test<<endl;
}
I tried to use free() or delete() but they also trigger system signal.
What is the problem?
The main problem is that you're stepping on your malloc()'ed pointer. Stepping on it multiple times:
// Original test case
for(int i = 0; i < 10000; i++) {
char * test = (char * )malloc(500);
test = "mas"; // BAD!!!
// You've just stepped on your "malloc'ed" pointer with a *different pointer!!!!
cout<<test<<endl;
}
Here is an alternative. Note that you've also got to have some way to "remember" the pointer for each malloc(), so you can "free()" it at the appropriate time:
// Better:
for(int i = 0; i < 10000; i++) {
char * test = (char * )malloc(500);
if (test == NULL) {
cout << "malloc error!" << endl;
break;
}
strcpy (test, "mas");
cout<<test<<endl;
free (test);
}
You've changed the value of the pointer test. You need to use memcpy or the like to copy the value "mas" into the buffer.
Better yet, use std::string
This code:
test = "mas";
Doesn't do what you think it does. You think it copies the string "mas" in to the buffer pointed to by test. What it actually does is it changes what test points to. Instead of pointing to the memory you allocated with the malloc, now it points to a statically allocated char buffer.
When you're allocating memory with malloc or new you get only a pointer in return, so you have to approach to that differently - with functions like strcpy, strcmp (if you need to check equality of texts you can't compare only pointers either). Read more about pointers in any C++ book. It's possible to create texts without allocating memory explicitly, for example, this is legal:
const char *text = "my text";
But further modifications like you showed in question are not (notice const here, it gives us more safety). In this case text is created somewhere in memory by compiler itself (i'm simplifying things now) and only pointer is assigned. In that case you cannot modify that text (at least not safe).
Next thing, when you are allocating any memory you have to free it manually, at least in languages like C/C++ where memory management (garbage collectors etc.) don't really exist. If you'll skip that step you'll run into trouble (out of memory exceptions for example) faster than you think.
this statement:
test = "mas";
does not copy strings, but pointers.
To copy a string in C, use carefully strcpy(3) (often strncpyis better).
strcpy(test, "mas");
But you really should use (as Vinbot answered) std::string since you code in C++
Also, compile C++ code on Linux with g++ -g Wall (and C code with gcc -g -Wall) to get warnings and debug info. Learn how to use the gdb debugger and also valgrind
See also this answer to a related question.

Malloc replacement by user

I want to ask you for help with my own project.
I want to try to replace original malloc, free functions by my own functions which will have same behavior.
int memory_free(void *ptr){}
void memory_init(void *ptr, unsigned int size){}
void *memory_alloc(unsigned int size){}
The memory_init function will create a memory to work with. At the beggining there will be pointer for example *Memory and it will be the argument for memory_init. Memory init will be called only once at the beginning of the program.
#include <string.h>
int main()
{
char region[50];
memory_init(region, 50);
char* pointer = (char*) memory_alloc(10);
if (pointer)
memset(pointer, 0, 10);
if (pointer)
memory_free(pointer);
return 0;
}
this code is the example of testing my functions. Memory_init will initialize memory and memory_alloc will create blocks in this memory for every call.
If there is someone who have idea how to make it i will be glad to see your answer.
Sorry for my english.
thx.
for memory_init i have this
*Memory;
void memory_init(void *ptr, unsigned int size){
*((unsigned int*)ptr)=size; //at first position there will be size of whole memory;
}
My idea is to make one block of memory and in it there will be small block. every block will have head. at first position there will be size of block and after it there will be tag if it is free or not and after it there will be end of the block.
Section 8.7 of the classic Kernighan&Ritchie "The C Programming Language" describes exactly what you are asking in simple terms. You can find copies of it on the web in various places. You can also find more sophisticated malloc implementations here or here
malloc() uses the sbrk system call to request more memory. Afaik it is the only way to do it.

Is delete p where p is a pointer to array always a memory leak?

following a discussion in a software meeting I've set out to find out if deleting a dynamically allocated, primitives array with plain delete will cause a memory leak.
I have written this tiny program and compiled it with visual studio 2008 running on windows XP:
#include "stdafx.h"
#include "Windows.h"
const unsigned long BLOCK_SIZE = 1024*100000;
int _tmain()
{
for (unsigned int i =0; i < 1024*1000; i++)
{
int* p = new int[1024*100000];
for (int j =0;j<BLOCK_SIZE;j++) p[j]= j % 2;
Sleep(1000);
delete p;
}
}
I than monitored the memory consumption of my application using task manager, surprisingly the memory was allocated and freed correctly, allocated memory did not steadily increase as was expected
I've modified my test program to allocate a non primitive type array :
#include "stdafx.h"
#include "Windows.h"
struct aStruct
{
aStruct() : i(1), j(0) {}
int i;
char j;
} NonePrimitive;
const unsigned long BLOCK_SIZE = 1024*100000;
int _tmain()
{
for (unsigned int i =0; i < 1024*100000; i++)
{
aStruct* p = new aStruct[1024*100000];
Sleep(1000);
delete p;
}
}
after running for for 10 minutes there was no meaningful increase in memory
I compiled the project with warning level 4 and got no warnings.
is it possible that the visual studio run time keep track of the allocated objects types so there is no different between delete and delete[] in that environment ?
delete p, where p is an array is called undefined behaviour.
Specifically, when you allocate an array of raw data types (ints), the compiler doesnt have a lot of work to do, so it turns it into a simple malloc(), so delete p will probably work.
delete p is going to fail, typically, when:
p was a complex data type - delete p; won't know to call individual destructors.
a "user" overloads operator new[] and delete[] to use a different heap to the regular heap.
the debug runtime overloads operator new[] and delete[] to add extra tracking information for the array.
the compiler decides it needs to store extra RTTI information along with the object, which delete p; won't understand, but delete []p; will.
No, it's undefined behavior. Don't do it - use delete[].
In VC++ 7 to 9 it happens to work when the type in question has trivial destructor, but it might stop working on newer versions - usual stuff with undefined behavior. Don't do it anyway.
It's called undefined behaviour; it might work, but you don't know why, so you shouldn't stick with it.
I don't think Visual Studio keeps track of how you allocated the objects, as arrays or plain objects, and magically adds [] to your delete. It probably compiles delete p; to the same code as if you allocated with p = new int, and, as I said, for some reason it works. But you don't know why.
One answer is that yes, it can cause memory leaks, because it doesn't call the destructor for every item in the array. That means that any additional memory owned by items in the array will leak.
The more standards-compliant answer is that it's undefined behaviour. The compiler, for example, has every right to use different memory pools for arrays than for non-array items. Doing the new one way but the delete the other could cause heap corruption.
Your compiler may make guarantees that the standard doesn't, but the first issue remains. For POD items that don't own additional memory (or resources like file handles) you might be OK.
Even if it's safe for your compiler and data items, don't do it anyway - it's also misleading to anyone trying to read your code.
no, you should use delete[] when dealing with arrays
Just using delete won't call the destructors of the objects in the array. While it will possibly work as intended it is undefined as there are some differences in exactly how they work. So you shouldn't use it, even for built in types.
The reason seems not to leak memory is because delete is typically based on free, which already knows how much memory it needs to free. However, the c++ part is unlikely to be cleaned up correctly. I bet that only the destructor of the first object is called.
Using delete with [] tells the compiler to call the destructor on every item of the array.
Not using delete [] can cause memory leaks if used on an array of objects that use dynamic memory allocations like follows:
class AClass
{
public:
AClass()
{
aString = new char[100];
}
~AClass()
{
delete [] aString;
}
private:
const char *aString;
};
int main()
{
AClass * p = new AClass[1000];
delete p; // wrong
return 0;
}