Buffer Overflow a Vector - c++

I am attempting to create a buffer-overflow-proof program and I was considering using an stl vector for this, but I have read that this still would not perform bounds checking and could still be hacked. I am mainly concerned about overriding a return or function call or another variable in my program. Would a vector work in this case?

You can declare a vector like this
std::vector<int> v;
This is an empty vector of ints called v. It has 0 elements. Attempting to access any of its elements is undefined behavior. So yes, it's possible to create a buffer overflow in a std::vector if you treat it poorly.
v[0] = 1; // undefined because we're out of bounds
v[1] = 2; // as is this
Normally when adding elements to a std::vector, you use push_back or emplace_back, though, not direct element access
v.push_back(0); // v is now 1 element long: { 0 }
v.emplace_back(1); // v is now 2 elements long: { 0 , 1 }
Using the [] operator does give you a reference to the underlying item, so you can use it both for retrieval and assignment, as long as you're sure the index is valid. You can use the size() function to determine what the appropriate max element index is (it's size() - 1)
for (std::size_t i = 0; i < v.size(); ++i)
{
v[i] = 10;
}
// v was size 2, we've replaced both elements at 0 and 1 with value 10
When using your std::vector, use the interface provided to you. Use push_back or emplace_back or insert to add elements. pop_back or erase to remove elements. A std::vector knows its begin and end, so you can iterate through it like this:
// prints the elements in the vector
for (auto it = v.begin(); it != v.end(); ++it)
std::cout << *it << " ";
std::cout << "\n";
or a ranged-based for loop, since it provides the begin() and end() functions
// prints the elements in the vector
for (const auto& elem : v)
std::cout << elem << " ";
std::cout << "\n";
It's possible to overrun your std::vector buffer, but very unlikely if you follow the rules. A std::vector is nicer than an array, so if you don't mind your elements existing in the free store (which is most non-high-performance applications), go for it. The ease of use and flexibility is well worth the extra overhead.
Further, because the data in your std::vector does NOT exist on the stack (while a local array is placed on the stack) you're very unlikely to overwrite your stack if you do overrun your std::vector. If you're lucky, your program will crash, but don't rely on this.
In conclusion, a std::vector can have its buffer overrun, but its interface makes it very hard to do so. If you ever do overrun its buffer, you're overwriting data in the free store, not on the stack, so you won't be modifying your return address. While you could potentially have a second item in the free store be written over by a vector buffer overrun, I believe this is non-deterministic. While anyone with access to your source code with an array could easily identify what input would produce a stack overwrite, if any.
See https://en.cppreference.com/w/cpp/container/vector for more information on the std::vector interface.

Related

C++ Keeping track of start iterator while adding items

I am trying to do a double loop across a std::vector to explore all combinations of items in the vector. If the result is good, I add it to the vector for another pass. This is being used for an association rule problem but I made a smaller demonstration for this question. It seems as though when I push_back it will sometimes change the vector such that the original iterator no longer works. For example:
std::vector<int> nums{1,2,3,4,5};
auto nextStart = nums.begin();
while (nextStart != nums.end()){
auto currentStart = nextStart;
auto currentEnd = nums.end();
nextStart = currentEnd;
for (auto a = currentStart; a!= currentEnd-1; a++){
for (auto b = currentStart+1; b != currentEnd; b++){
auto sum = (*a) + (*b);
if (sum < 10) nums.push_back(sum);
}
}
}
On some iterations, currentStart points to a location that is outside the array and provides garbage data. What is causing this and what is the best way to avoid this situation? I know that modifying something you iterate over is an invitation for trouble...
nums.push_back(sum);
push_back invalidates all existing iterators to the vector if push_back ends up reallocating the vector.
That's just how the vector works. Initially some additional space gets reserved for the vector's growth. Vector's internal buffer that holds its contents has some extra room to spare, but when it is full the next call to push_back allocates a bigger buffer to the vector's contents, moves the contents of the existing buffer, then deletes the existing buffer.
The shown code creates and uses iterators for the vector, but any call to push_back will invalidate the whole lot, and the next invalidated vector dereference results in undefined behavior.
You have two basic options:
Replace the vector with some other container that does not invalidate its existing iterators, when additional values get added to the iterator
Reimplement the entire logic using vector indexes instead of iterators.

C++: how to loop through integer elements in a vector

I would like to loop through elements of a vector in C++.
I am very new at this so I don't understand the details very well.
For example:
for (elements in vector) {
if () {
check something
else {
//else add another element to the vector
vectorname.push_back(n)
}
}
Its the for (vector elements) that I am having trouble with.
You'd normally use what's called a range-based for loop for this:
for (auto element : your_vector)
if (condition(element))
// whatever
else
your_vector.push_back(something);
But note: modifying a vector in the middle of iteration is generally a poor idea. And if your basic notion is to add the element if it's not already present, you may want to look up std::set, std::map, std::unordered_set or std::unordered_map instead.
In order to do this properly (and safely), you need to understand how std::vector works.
vector capatity
You may know that a vector works much like an array with "infinite" size. Meaning, it can hold as many elements as you want, as long as you have enough memory to hold them. But how does it do that?
A vector has an internal buffer (think of it like an array allocated with new) that may be the same size as the elements you're storing, but generally it's larger. It uses the extra space in the buffer to insert any new elements that you want to insert when you use push_back().
The amount of elements the vector has is known as its size, and the amount of elements it can hold is known as its capacity. You can query those via the size() and capacity() member functions.
However, this extra space must end at some point. That's when the magic happens: When the vector notices it doesn't have enough memory to hold more elements, it allocates a new buffer, larger1 than the previous one, and copies all elements to it. The important thing to notice here is that the new buffer will have a different address. As we continue with this explanation, keep this in mind.
iterators
Now, we need to talk about iterators. I don't know how much of C++ you have studied yet, but think of an old plain array:
int my_array[5] = {1,2,3,4,5};
you can take the address of the first element by doing:
int* begin = my_array;
and you can take the address of the end of the array (more specifically, one past the last element) by doing:
int* end = begin + sizeof(my_array)/sizeof(int);
if you have these addresses, one way to iterate the array and print all elements would be:
for (int* it = begin; it < end; ++it) {
std::cout << *it;
}
An iterator works much like a pointer. If you increment it (like we do with the pointer using ++it above), it will point to the next element. If you dereference it (again, like we do with the pointer using *it above), it will return the element it is pointing to.
std::vector provides us with two member functions, begin() and end(), that return iterators analogous to our begin and end pointers above. This is what you need to keep in mind from this section: Internally, these iterators have pointers that point to the elements in the vector's internal buffer.
a simpler way to iterate
Theoretically, you can use std::vector::begin() and std::vector::end to iterate a vector like this:
std::vector<int> v{1,2,3,4,5};
for (std::vector<int>::iterator it = v.begin; it != v.end(); ++it) {
std::cout << *it;
}
Note that, apart from the ugly type of it, this is exactly the same as our pointer example. C++ introduced the keyword auto, that lets us get rid of these ugly types, when we don't really need to know them:
std::vector<int> v{1,2,3,4,5};
for (auto it = v.begin; it != v.end(); ++it) {
std::cout << *it;
}
This works exactly the same (in fact, it has the exact same type), but now we don't need to type (or read) that uglyness.
But, there's an even better way. C++ has also introduced range-based for:
std::vector<int> v{1,2,3,4,5};
for (auto it : v) {
std::cout << it;
}
the range-based for construct does several things for you:
It calls v.begin() and v.end()2 to get the upper and lower bounds of the range we're going to iterate;
Keeps an internal iterator (let's call it i), and calls ++i on every step of the loop;
Dereferences the iterator (by calling *i) and stores it in the it variable for us. This means we do not need to dereference it ourselves (note how the std::cout << it line looks different from the other examples)
putting it all together
Let's do a small exercise. We're going to iterate a vector of numbers, and, for each odd number, we are going to insert a new elements equal to 2*n.
This is the naive way that we could probably think at first:
std::vector<int> v{1,2,3,4,5};
for (int i : v) {
if (i%2==1) {
v.push_back(i*2);
}
}
Of course, this is wrong! Vector v will start with a capacity of 5. This means that, when we try using push_back for the first time, it will allocate a new buffer.
If the buffer was reallocated, its address has changed. Then, what happens to the internal pointer that the range-based for is using to iterate the vector? It no longer points to the buffer!
This it what we call a reference invalidation. Look at the reference for std::vector::push_back. At the very beginning, it says:
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
Once the range-based for tries to increment and dereference the now invalid pointer, bad things will happen.
There are several ways to avoid this. For instance, in this particular algorithm, I know that we can never insert more than n new elements. This means that the size of the vector can never go past 2n after the loop has ended. With this knowledge in hand, I can increase the vector's capacity beforehand:
std::vector<int> v{1,2,3,4,5};
v.reserve(v.size()*2); // Increases the capacity of the vector to at least size*2.
// The code bellow now works properly!
for (int i : v) {
if (i%2==1) {
v.push_back(i*2);
}
}
If for some reason I don't know this information for a particular algorithm, I can use a separate vector to store the new elements, and then add them to our vector at the end:
std::vector<int> v{1,2,3,4,5};
std::vector<int> doubles;
for (int i : v) {
if (i%2==1) {
doubles.push_back(i*2);
}
}
// Reserving space is not necessary because the vector will allocate
// memory if it needs to anyway, but this does makes things faster
v.reserve(v.size() + doubles.size());
// There's a standard algorithm (std::copy), that, when used in conjunction with
// std::back_inserter, does this for us, but I find that the code bellow is more
// readable.
for (int i : doubles) {
v.push_back(i);
}
Finally, there's the old plain for, using an int to iterate. The iterator cannot be invalidated because it holds an index, instead of a pointer to the internal buffer:
std::vector<int> v{1,2,3,4,5};
for (int i = 0; i < v.size(); ++i) {
if (v[i]%2==1) {
doubles.push_back(v[i]*2);
}
}
Hopefully by now, you understand the advantages and drawbacks of each method. Happy studies!
1 How much larger depends on the implementation. Generally, implementations choose to allocate a new buffer of twice the size of the current buffer.
2 This is a small lie. The whole story is a bit more complicated: It actually tries to call begin(v) and end(v). Because vector is in the std namespace, it ends up calling std::begin and std::end, which, in turn, call v.begin() and v.end(). All of this machinery is there to ensure that the range-based for works not only with standard containers, but also with anything with a proper implementation for begin and end. That includes, for instance, regular plain arrays.
Here is the quick code snippet using iterators to iterate the vector-
#include<iostream>
#include<iterator> // for iterators to include
#include<vector> // for vectors to include
using namespace std;
int main()
{
vector<int> ar = { 1, 2, 3, 4, 5 };
// Declaring iterator to a vector
vector<int>::iterator ptr;
// Displaying vector elements using begin() and end()
cout << "The vector elements are : ";
for (ptr = ar.begin(); ptr < ar.end(); ptr++)
cout << *ptr << " ";
return 0;
}
Article to read more - Iterate through a C++ Vector using a 'for' loop
.
Hope it will help.
Try this,
#include<iostream>
#include<vector>
int main()
{
std::vector<int> vec(5);
for(int i=0;i<10;i++)
{
if(i<vec.size())
vec[i]=i;
else
vec.push_back(i);
}
for(int i=0;i<vec.size();i++)
std::cout<<vec[i];
return 0;
}
Output:
0123456789
Process returned 0 (0x0) execution time : 0.328 s
Press any key to continue.

How to declare an array with a variable number of elements [C++]?

How do I [dynamically?] declare an array with an 'i' number of elements?
By using a container of the standard-library instead, most commonly std::vector<>.
Other solutions contain manual allocation and deallocation of memory and are probably not worth the effort.
You could use std::vector<T>. It works like
std::vector<int> a;
a.push_back(2); // add 2 to the array
a.push_back(4);
You could go on and on, and you won't need to worry about memory allocation issues.
You have to use pointers.
Example:
float *ptrarray = new float [10];
Basically, type * pointername = new type [i];
And don't forget to clean your memory:
delete [] ptrarray;
Or it will be reserved until the program ends.
Create a Dynamically Sized Array
std::size_t N = 10;
SomeType *a = new SomeType[N];
for (auto i=0; i<N; ++i) {
std::cout << a[i] << std::endl;
}
delete [] a;
A C++ array is a pointer to a contiguous chunk of memory containing a specific type. This snippet allocates space for an array of 10 SomeType instances, constructs the array which initializes each object using the default constructor, iterates over the array, prints out each element, and then deallocates the memory by calling delete [].
Key Points:
you are required to call delete [] to deallocate the array. The array form is required to ensure that the destructor is called on each object.
after allocation, there is no way to recover the size of the array.
you can iterate over the array by index (or pointer)
This is not the way to do what you want, keep reading.
Using a vector
std::size_t N = 10;
std::vector<SomeType> v(10);
for (auto iter=v.begin(); iter!=v.end(); ++iter) {
std::cout << *iter << std::endl;
}
This snippet uses std::vector which manages a contiguous block of memory for you. Note the usage of an iterator to walk over the vector instead of using indexing. Vectors do support direct indexing so the for loop used in the previous example would work here as well.
for (auto i=0; i<v.size(); ++i) {
std::cout << v[i] << std::endl;
}
Using an iterator is the best practice and idiomatic though using a for loop may not be but I digress.
Key Points:
memory management is automatic in the case of a vector
you can append using push_back
a vector knows how many elements are in it -- call v.size() to get the number of elements
iteration is performed using the iterators returned from v.begin() and v.end()
Just use std::vector. If you need raw access to the underlying pointer, then use &v[0] or v.data() but don't do that unless you need to. Also, don't use std::auto_ptr for arrays. You will be tempted to but don't do it.

exceeding vector does not cause seg fault

I am very puzzled at the result of this bit of code:
std::vector<int> v;
std::cout << (v.end() - v.begin()) << std::endl;
v.reserve(1);
std::cout << (v.end() - v.begin()) << std::endl;
v[9] = 0;
std::cout << (v.end() - v.begin()) << std::endl;
The output:
0
0
0
So... first of all... end() does not point to the end of the internal array but the last occupied cell... ok, that is why the result of the iterator subtraction is still 0 after reserve(1). But, why is it still 0 after one cell has been filled. I expected 1 as the result, because end() should now return the iterator to the second internal array cell.
Furthermore, why on earth am I not getting a seg fault for accessing the tenth cell with v[9] = 0, while the vector is only 1 cell long?
First of all, end() gives you an iterator to one beyond the last element in the vector. And if the vector is empty then begin() can't return anything else than the same as end().
Then when you call reserve() you don't actually create any elements, you only reserve some memory so the vector don't have to reallocate when you do add elements.
Finally, when you do
v[9] = 0;
you are indexing the vector out of bounds which leads to undefined behavior as you write to memory you don't own. UB often leads to crashes, but it doesn't have too, it may seem to work when in reality it doesn't.
As a note on the last part, the [] operator doesn't have bounds-checking, which is why it will accept indexing out of bounds. If you want bounds-checking you should use at().
v[9] = 0;, you're just accessing the vector out of bound, it's UB. It may crash in some cases, and may not. Nothing is guaranteed.
And v[9] = 0;, you don't add element at all. You need to use push_back or resize:
v.push_back(0); // now it has 1 element
v.resize(10); // now it has 10 elements
EDIT
why does v[index] not create an element?
Because std::vector::operator[] just doesn't do that.
Returns a reference to the element at specified location pos. No bounds checking is performed.
Unlike std::map::operator[], this operator never inserts a new element into the container.
So it's supposed that the vector has the sufficient elements for the subscript operator.
BTW: What do you suppose vector should do when you write v[9] = 0? Before set the 10th element to 0, it has to push 10 elements first. And How to set their values? All 0? So, it won't do it, the issue just depends on yourself.
This is a guess, but hopefully a helpful one.
You will only get a segfault when you attempt to access an address that has not been assigned to your process' memory space. When the OS gives memory to a program, it generally does so in 4KB increments. Because of this, you can access past the end of some arrays/vectors without triggering a segfault, but not others.

C++ Iterate through an expanding container

What will happen if I expand a container while iterating through it, may I meet the new elements or just the old ones
std::vector<int> arr(0);
arr.push_back(1);
arr.push_back(2);
arr.push_back(3);
for (auto& ele : arr) {
if (4 == ele) std::cout << "meet new eles" << endl;
arr.push_back(4);
}
push_back may invalidate any iterator to the vector, what you have there is undefined behaviour.
std::vector internally allocates an array with its elements stored contiguously in memory. To do so it needs to preallocate an estimate of the size it may need, that's known as the capacity of the vector. At push_back if the capacity is reached it allocates a new bigger array, copies/moves all the contents of the previous array to the new one and then deletes the old one. That invalidates the iterators, that keep pointing to a deleted array.
Also worth mentioning that the allocation of the new array adds a considerable performance hit. If you know the size that your vector will have always use reserve() to preallocate memory.
Now, true is that if you reserve capacity before hand and never push back exceeding that capacity the iterators won't be invalidated (only the past-end iterator) and you can keep incrementing them since they point to an array of contiguous elements. But I wouldn't advice to do so, IMO the risk of reallocating the vector by mistake is not worth it.
If you are guaranteed to never remove elements the simplest solution would still be to use a regular for loop.
for(size_t i=0, n=arr.size(); i<n; ++i)
{
if(/* something */)
{
arr.push_back(/* ... */);
++n;
}
}
I know it's a trivial solution, but others maybe searching and will find this question.
Good note from #Long_GM. This will not work on every container. std::map for example cannot be iterated in this manner whether or not you insert any values.