I found a negative index access in some embedded code I'm debugging:
for (int i = len; i > 0; i--)
{
data[i - 1] = data[i - 2]; // negative access when i == 1
}
I read this about similar cases, but in that question the OP's arr[-2] is guaranteed to be OK since arr points into the middle of a previously allocated array. In my case, data is a pointer inside a class that is initialized by the constructor with:
public:
constructor_name(): ... data(new T_a[size]), ...
And the pointer data is the first member in the class:
template <class T_a, class T_b, int size>
class T_c
{
private:
T_a *data;
T_b *...;
int ...;
int ...;
int ...;
public:
constructor_name(): ... data(new T_a[size]), ...
Now, is there a possibility that the negative index access was deliberate and meaningful? Is there a way the programmer who wrote that could have ensured that data[-1] accesses a specific datum, using #pragma pack() or any other method?
Seeing that *data is the first member in the class made me think it was a bug, but I'm not sure. If it is indeed a bug - is it UB?
You are asking about a guarantee (quite a strong word). And your code has undefined behavior (because you are accessing data outside of your object), which really means you cannot have any guarantee. Arbitrarily bad things could happen, even if in practice they usually don't (in particular when data points to a scalar type, like pointers).
I would recommend replacing for (int i = len; i > 0; i--) with for (int i = len; i > 1; i--), at least to make the code more readable and more standard-conforming.
If for some weird reason the data[-1] access was meaningful to the previous programmer, he should at least have commented on it. My guess is that if he did not, it is simply a bug.
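To illustrate, here is a minimal sketch of the fixed loop, wrapped in a free function for self-containment (the function name is mine, not from the original code):

// Shifts elements one slot to the right without ever reading data[-1]:
// the loop stops at i == 2, so the lowest read is data[0].
template <class T>
void shiftRight(T* data, int len)
{
    for (int i = len; i > 1; i--)
    {
        data[i - 1] = data[i - 2];
    }
}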
It depends on the embedded chip you are using.
First, the important thing to notice is that data lives on the heap. So normally (on a desktop computer) you cannot make any assumptions about that data sitting next to any other piece of information you control.
But as we are talking about embedded systems here, some of them have a single contiguous memory space for stack and heap, and start filling the heap from one side of that memory space and the stack from the other, until they meet somewhere in the middle and the program crashes.
Anyway, in this case, the programmer of your code could have been careful enough to know which heap allocation would happen next, and thus which variable lives next to data, but I actually think that is very unlikely. Even then, he would be exploiting undefined behaviour, because it is a memory access outside of array bounds, and I would consider it not only bad programming practice but an outright bug.
I found a picture of how that heap/stack model works on a specific chip here.
I hope this is not too controversial a question, but I cannot find a proper, full answer on SO. This is also not a question about the difference between the methods reserve and resize, or the difference between capacity and size, which are (hopefully) clear to me and have been asked often enough on SO. Nor is this a question about whether this is good practice at all (it is not!).
Consider the following situation:
#include <vector>
#include <iostream>
struct Foo
{
double a, b;
};
int main(int argc, char* argv[])
{
std::vector<Foo> Vec;
Vec.reserve(100);
Foo foo;
foo.a = -13.131;
foo.b = 3.141;
for(int i = 0; i < 100; ++i)
Vec[i] = foo;
for(int i = 0; i < 100; ++i)
std::cout << Vec[i].a << std::endl;
return 0;
}
I first create a std::vector of Foo and then reserve memory, but don't resize the vector. Clearly size() == 0, BUT the memory for 100 elements has been allocated and may now be freely used by my program, so technically, writing to and reading from the memory of any of these elements cannot result in a segmentation fault. Is that correct?
I have tried to run this code on Ubuntu 14.04 and everything works as expected: all 100 elements are written successfully and all outputs are -13.131, even though the vector size remains at 0. If I look through many answers on SO, they all correctly point out that this results in undefined behaviour, because the elements are not initialized, but could it actually result in a segmentation fault in any way (not talking about accessing elements of uninitialized pointers in a vector, etc.)?
A question similar to this has been asked here, and that seems to confirm my thought, but would it in principle work across all platforms that support compilation of C++?
Once you have undefined behaviour, it is, well, undefined behaviour.
One of the key aspects of undefined behaviour is that you can't be sure what the behaviour would be on a different system or compiler. Now, you could look at the code of a specific compiler and a specific library implementation and see that it acts as you expect it to.
But I don't think you will find anyone who is willing to bet that this will work across all different systems, compilers and library implementations.
Just for instance, what if a specific vector implementation decided to use the reserved memory for internal information? Maybe it is unlikely, but how can you be sure no implementation is actually doing it?
Let us consider a hypothetical example: a std::vector implementation that, when reserve() is called, allocates the new memory but then performs the copy on a background thread - because it can... *shrugs* who knows what implementations will do in the near future! While it's copying, reads are not locked out and go straight to the old memory area, because that is still valid for reading.
Under that scheme, attempting to read something out of range would read random memory, not what you are asserting should be your newly allocated memory.
So, as the comments and the other answer say, undefined is undefined.
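For contrast, here is a sketch of the program from the question rewritten with well-defined behaviour: push_back (or resize) keeps size() in step with what is actually accessed, while reserve() still prevents reallocation.

#include <iostream>
#include <vector>

struct Foo
{
    double a, b;
};

int main()
{
    std::vector<Foo> Vec;
    Vec.reserve(100);            // capacity >= 100, size() still 0

    Foo foo;
    foo.a = -13.131;
    foo.b = 3.141;

    for (int i = 0; i < 100; ++i)
        Vec.push_back(foo);      // grows size(); no reallocation, thanks to reserve()

    // alternatively: Vec.resize(100, foo); creates the 100 elements up front

    for (int i = 0; i < 100; ++i)
        std::cout << Vec[i].a << std::endl;  // indices now within size(): well-defined
    return 0;
}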
I'm trying to write a handle allocator in C++. This allocator would "handle" (hue hue hue) the allocation of handles for referencing assets (such as textures, uniforms, etc.) in a game engine. For instance, inside a function for creating a texture, the handle allocator would be called to create a TextureHandle; when the texture was destroyed, the handle allocator would free the TextureHandle.
I'm reading through the source of BX, a library that includes a handle allocator just for this purpose - it's the base library of the popular library BGFX, a cross-platform abstraction over different rendering APIs.
Before I start explaining what's baffling me, let me first outline what this class essentially looks like:
class HandleAllocator {
public:
constructor, destructor
getters: getNumHandles, getMaxHandles
u16 alloc();
void free(u16 handle);
bool isValid(u16 handle) const;
void reset();
private:
u16* getDensePointer() const;
u16* getSparsePointer() const;
u16 _numHandles;
u16 _maxHandles;
};
Here's what getDensePointer() looks like:
u8* ptr = (u8*)reinterpret_cast<const u8*>(this);
return (u16*)&ptr[sizeof(HandleAlloc)];
As far as I understand it, this function returns a pointer to the end of the class in memory, although I don't understand why the this pointer is first cast to a uint8_t* before being used with the array-index operator on the next line.
Here's what's weird to me: the constructor calls the reset() function, which looks like this:
_numHandles = 0;
u16* dense = getDensePointer();
for(u16 ii=0, num = _maxHandles; ii < num; ++ii) {
dense[ii] = ii;
}
If getDensePointer returns a pointer to the end of the class in memory, how is it safe to write to memory beyond the end of the class in this for loop? How do I know this isn't stomping on something stored adjacent to it?
I'm a total noob; I realize the answer to this is probably obvious and betrays a total lack of knowledge on my part, but go easy on me...
To answer the first question, ask yourself why pointers have a type. In the end, they are just variables that are meant to store memory addresses, and any variable with a range large enough to store all possible memory addresses would do. Then what is the difference between, let's say, int* and u8*?
The difference is the way operations are performed on them. Besides dereferencing, which is another story, pointer arithmetic is also involved. Take the following declarations: int *p; u8 *u;. Now p + 2, in order to make sense, yields the address 8 bytes past p (the address of the second integer after it, assuming a 4-byte int), while u + 2 yields the address just 2 bytes past u (since u8 has a size of 1).
Now, sizeof gives you the size of the type in bytes. You want to move sizeof(x) bytes, so you need to index the array (or do pointer arithmetic, which is equivalent here) on a byte-sized data type. And that's why this is cast to u8* first.
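A small self-contained sketch of that difference (the printed byte offsets assume a 4-byte int):

#include <cstdint>
#include <cstdio>

int main()
{
    int buf[4] = {1, 2, 3, 4};
    int* p = buf;
    std::uint8_t* u = reinterpret_cast<std::uint8_t*>(buf);

    // p + 2 advances by 2 * sizeof(int) bytes; u + 2 advances by exactly 2 bytes.
    std::printf("p + 2 is %td bytes past p\n",
                reinterpret_cast<std::uint8_t*>(p + 2) - u);
    std::printf("u + 2 is %td bytes past u\n", (u + 2) - u);
    return 0;
}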
Now, for the second question,
how do i know this isn't stomping on something stored adjacent to it?
simply by making sure nothing is there. This is done when the allocator is created. For example, if you have:
HandleAllocator *h = new HandleAllocator[3];
you can freely call reset on h[0] and have two allocators' worth of memory to play with. Without more details, it's hard to tell exactly how this excess memory is allocated and what its purpose is.
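To make the 'making sure nothing is there' idea concrete, here is a hedged sketch of the pattern (my illustration, not BX's actual API): allocate one block big enough for the object plus the two u16 arrays, then construct the object at the front of it, so the arrays occupy the rest of the same allocation.

#include <cstdint>
#include <cstdlib>
#include <new>

using u8  = std::uint8_t;
using u16 = std::uint16_t;

// Minimal stand-in for the class in the question.
class HandleAllocator {
public:
    explicit HandleAllocator(u16 maxHandles)
        : _numHandles(0), _maxHandles(maxHandles) {}
    u16* getDensePointer() const {
        // first byte past the object itself, still inside the same block
        return (u16*)(reinterpret_cast<const u8*>(this) + sizeof(HandleAllocator));
    }
private:
    u16 _numHandles;
    u16 _maxHandles;
};

// Allocate room for the object *and* the dense + sparse arrays in one block.
HandleAllocator* createHandleAlloc(u16 maxHandles) {
    const std::size_t bytes =
        sizeof(HandleAllocator) + 2 * std::size_t(maxHandles) * sizeof(u16);
    void* mem = std::malloc(bytes);
    return new (mem) HandleAllocator(maxHandles); // placement-new at the front
}

int main() {
    HandleAllocator* h = createHandleAlloc(64);
    h->getDensePointer()[63] = 63;  // writes inside the block we allocated
    h->~HandleAllocator();
    std::free(h);
    return 0;
}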
I am creating an Arduino device using C++. I need a stack object with variable size and variable data types. Essentially this stack needs to be able to be resized and used with bytes, chars, ints, doubles, floats, shorts, and longs.
I have a basic class set up, but with the amount of dynamic memory allocation that is required, I wanted to make sure that my use of the data frees enough space for the program to continue without memory problems. This does not use std methods, but instead the built-in Arduino versions of them.
For clarification, my question is: Are there any potential memory problems in my code?
NOTE: This is not on the Arduino Stack Exchange because it requires in-depth knowledge of C/C++ memory allocation that could be useful to all C and C++ programmers.
Here's the code:
Stack.h
#pragma once
class Stack {
public:
void init();
void deinit();
void push(byte* data, size_t data_size);
byte* pop(size_t data_size);
size_t length();
private:
byte* data_array;
};
Stack.cpp
#include "Arduino.h"
#include "Stack.h"
void Stack::init() {
// Initialize the Stack as having no size or items
data_array = (byte*)malloc(0);
}
void Stack::deinit() {
// free the data so it can be re-used
free(data_array);
}
// Push an item of variable size onto the Stack (byte, short, double, int, float, long, or char)
void Stack::push(byte* data, size_t data_size) {
data_array = (byte*)realloc(data_array, sizeof(data_array) + data_size);
for(size_t i = 0; i < sizeof(data); i++)
data_array[sizeof(data_array) - sizeof(data) + i] = data[i];
}
// Pop an item of variable size off the Stack (byte, short, double, int, float, long, or char)
byte* Stack::pop(size_t data_size) {
byte* data;
if(sizeof(data_array) - data_size >= 0) {
data = (byte*)(&data_array + sizeof(data_array) - data_size);
data_array = (byte*)realloc(data_array, sizeof(data_array) - data_size);
} else {
data = NULL;
}
// Make sure to free(data) when done with the data from pop()!
return data;
}
// Return the sizeof the Stack
size_t Stack::length() {
return sizeof(data_array);
}
There are some minor code bugs, apparently, which -- although important -- are easily resolved. The following answer only applies to the overall design of this class:
There is nothing wrong with just the code that is shown.
But only the code that's shown. No opinion is rendered on any code that's not shown.
And, it's fairly likely that there are going to be massive problems, and memory leaks, in the rest of the code which will attempt to use this class.
It's going to be very, very easy to use this class in a way that leaks or corrupts memory; it's going to be much harder to use it correctly, and much easier to screw up. The fact that these functions themselves appear to do their job correctly is not going to help if all it takes is a sneeze in the wrong direction to end up with these functions not being used in the proper order or sequence.
Just to name the first two readily apparent problems:
1) Failure to call deinit(), when any instance of this class goes out of scope and gets destroyed, will leak memory. Every time you use this class, you have to be cognizant of when the instance of this class goes out of scope and gets destroyed. It's easy to keep track of every time you create an instance of this class, and it's easy to remember to call init() every time. But keeping track of every possible way an instance of this class could go out of scope and get destroyed, so that you must call deinit() and free up the internal memory, is much harder. It's very easy to not even realize when that happens.
2) If an instance of this class gets copy-constructed, or the default assignment operator gets invoked, this is guaranteed to result in memory corruption, with an extra side-helping of a memory leak.
Note that you don't have to go out of your way to write code that copy-constructs, or assigns one instance of the object to another one. The compiler will be more than happy to do it for you, if you do not pay attention.
Generally, the best way to avoid these kinds of problems is to make them impossible to happen, by using the language correctly. Namely:
1) Following the RAII design pattern. Get rid of init() and deinit(). Instead, do this work in the object's constructor and destructor.
2) Either deleting the copy constructor and the assignment operator, or implementing them correctly. So, if instances of this class should never be copy-constructed or assigned-to, it's much better to have the compiler yell at you, if you accidentally write some code that does that, instead of spending a week tracking down where that happens. Or, if the class can be copy-constructed or assigned, doing it properly.
Of course, if there will only ever be a small number of instances of this class, it should be possible to use it safely, with tight controls and lots of care, without this kind of redesign. But even so, it's always better to do the job right, instead of shrugging this off now, later deciding to expand the use of this class to more places, and then forgetting how error-prone it is.
P.S.: a few of the minor bugs that I mentioned in the beginning:
data_array = (byte*)realloc(data_array, sizeof(data_array) + data_size);
This can't be right. data_array is a byte *, so sizeof(data_array) will always be a compile-time constant, which would be sizeof(byte *). That's obviously not what you want here. You need to explicitly keep track of the allocated array's size.
The same general bug appears in several other places here, but it's easily fixed. The overall class design is the bigger problem.
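Putting the design points and the P.S. together, here is a hedged sketch of the direction suggested above (the member shapes are my choices, not a drop-in replacement for the original):

#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;  // Arduino normally provides this; stand-in here

class Stack {
public:
    Stack() : data_array(NULL), data_size(0) {}  // RAII: acquire in constructor
    ~Stack() { free(data_array); }               //       release in destructor

    // copying would double-free data_array, so forbid it outright
    Stack(const Stack&) = delete;
    Stack& operator=(const Stack&) = delete;

    void push(const byte* data, size_t n) {
        // grow by the pushed size, tracked in an explicit member --
        // sizeof(data_array) would only ever yield sizeof(byte*)
        data_array = (byte*)realloc(data_array, data_size + n);
        memcpy(data_array + data_size, data, n);
        data_size += n;
    }

    // Copies n bytes into out and shrinks; returns false if not enough data.
    bool pop(byte* out, size_t n) {
        if (data_size < n)  // compare instead of subtracting: size_t can't go negative
            return false;
        data_size -= n;
        memcpy(out, data_array + data_size, n);
        return true;
    }

    size_t length() const { return data_size; }

private:
    byte* data_array;
    size_t data_size;  // the allocated size, tracked explicitly
};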
I have two character arrays of size 100 (char array1[100], char array2[100]). Now I just want to check whether anybody is accessing an array beyond its limit. This is necessary because the memory allocated for array1 and array2 may be consecutive, meaning that where array1 finishes, array2 starts. Now if anyone writes array1[101], it is conceptually wrong; the compiler may give a warning, but it will not crash. So how can I detect this problem and solve it?
Update 1:
I already have 15,000 lines of code. For that code I have to check this condition; I can invoke my own functions, but I cannot change the existing code. Please make your suggestions with this in mind.
Most modern languages will detect this and prevent it from happening. C and its derivatives don't detect this, and basically can't detect this, because of the numerous ways you can access the memory, including bare pointers. If you can restrict the way you access the memory, then you can possibly use a function or something to check your access.
My initial response to this would be to wrap the access to these arrays in a function or method and send the index as a parameter. If the index is out of bounds, raise an exception or report the error in some other way.
EDIT:
This is of course run-time prevention. I don't know how you would check this at compile time if the compiler cannot check it for you. Also, as Kolky has already pointed out, it would be easier to answer this if we knew which language you are using.
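A minimal sketch of such a run-time wrapper in C++ (the helper name is mine):

#include <cstddef>
#include <stdexcept>

// Route every access through this instead of indexing the array directly.
inline char& checked_at(char* arr, std::size_t len, std::size_t idx)
{
    if (idx >= len)
        throw std::out_of_range("array index out of bounds");
    return arr[idx];
}

// usage: checked_at(array1, 100, i) = 'x';  instead of  array1[i] = 'x';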
If you are using C++ rather than C, is there any reason you can't use std::vector? That will give you bounds checking (via at()) if the user goes outside your range. Am I missing something here?
Wouldn't it be sensible to prevent the user from having direct access to the collections in the first place?
If you use boost::array or similar, out-of-bounds accesses through at() will throw an exception. http://www.boost.org/doc/libs/1_44_0/doc/html/boost/array.html. Boost is fabulous.
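A short sketch, assuming Boost is available (the checked accessor is at()):

#include <boost/array.hpp>

int main()
{
    boost::array<char, 100> array1;
    array1.at(99) = 'x';   // fine: last valid index
    array1.at(100) = 'x';  // out of range: at() throws an exception
    return 0;
}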
In C/C++, there is no general solution. You can't do it at compile time since there are too many ways to change memory in C. Example:
char * ptr = array2;
ptr = foo(ptr); // foo() does ptr--;
ptr now contains a valid address, but the address is outside of array2. This can be a bug or exactly what you want; C can't know (there is no way to say "I want it so" in C), so the compiler can't check it. Similarly:
char * array2 = malloc(100);
How should the C compiler know that you are treating the memory as a char array and would like a warning when you write &array2[100]?
Therefore, most solutions use "mungwalls", i.e. when you call malloc(), they will actually allocate 16/32 bytes more than you ask for:
malloc(size) {
    mungwall_size = 16;
    ptr = real_malloc(size + mungwall_size*2);
    createMungwall(ptr, mungwall_size);                        // front wall
    createMungwall(ptr + mungwall_size + size, mungwall_size); // rear wall
    return ptr + mungwall_size; // user memory starts after the front wall
}
In free(), it will check that the 16 bytes before and after the allocated memory area haven't been touched (i.e. that the mungwall pattern is still intact). While not perfect, this makes your program crash earlier (and hopefully closer to the bug).
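A self-contained sketch of that scheme (sizes, pattern, and names are illustrative; real implementations keep the allocation size in their own bookkeeping):

#include <cassert>
#include <cstdlib>
#include <cstring>

const std::size_t WALL = 16;
const unsigned char PATTERN = 0xAB;

// Block layout: [stored size][front wall][user data][rear wall]
void* mw_malloc(std::size_t size) {
    unsigned char* p =
        (unsigned char*)std::malloc(sizeof(std::size_t) + 2 * WALL + size);
    std::memcpy(p, &size, sizeof(std::size_t));           // stash the user size
    std::memset(p + sizeof(std::size_t), PATTERN, WALL);  // front wall
    std::memset(p + sizeof(std::size_t) + WALL + size, PATTERN, WALL); // rear wall
    return p + sizeof(std::size_t) + WALL;                // hand out the middle
}

void mw_free(void* user) {
    unsigned char* p = (unsigned char*)user - WALL - sizeof(std::size_t);
    std::size_t size;
    std::memcpy(&size, p, sizeof(std::size_t));
    for (std::size_t i = 0; i < WALL; ++i) {              // verify both walls
        assert(p[sizeof(std::size_t) + i] == PATTERN);
        assert(p[sizeof(std::size_t) + WALL + size + i] == PATTERN);
    }
    std::free(p);
}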
You could also use special CPU commands to check all memory accesses but this approach would make your program 100 to 1 million times slower than it is now.
Therefore, languages designed after C don't allow raw pointers; instead, an array is a basic type that knows its size. Every array access can then be checked with a simple compare.
If you want to write code in C which is safe, you must emulate this: create an array type and never use bare pointers or char * for strings. It means you must convert your data types all the time (because all library functions use const char * for strings), but it makes your code safer.
Languages do age. C is now 40 years old and our knowledge has moved on. It's still used in a lot of places, but it shouldn't be the first choice anymore. The same applies (to a lesser extent) to C++, because it suffers from the same fundamental flaws as C (even though you now have libraries and frameworks which work around many of them).
If you're in C++ you can write a quick wrapper class.
#include <stdexcept>

template<typename T, int size> class my_array_wrapper {
    T contents[size];
public:
    T& operator[](int index) {
        // reject both negative and past-the-end indices before touching memory
        if (index < 0 || index >= size)
            throw std::runtime_error("Attempted to access outside array bounds!");
        return contents[index];
    }
    const T& operator[](int index) const {
        if (index < 0 || index >= size)
            throw std::runtime_error("Attempted to access outside array bounds!");
        return contents[index];
    }
    // implicit conversions to raw pointers -- these bypass the bounds checks
    operator T*() {
        return contents;
    }
    operator const T*() const {
        return contents;
    }
};
my_array_wrapper<char, 100> array1;
array1[101]; // exception
Problem solved, although if you access through the pointer decay operators there will be no bounds checking. Alternatively, you could use boost::array as a pre-provided solution.
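To make that caveat concrete, continuing from the snippet above:

char* p = array1;  // implicit conversion through operator T*()
p[150] = 'x';      // raw pointer access: out of bounds, and no exception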
If you ran a static analyser (e.g. cppcheck) against your code, it would give you a bounds error:
http://en.wikipedia.org/wiki/User:Exuwon/Cppcheck#Bounds_checking
To solve it, you'd be better off using a container of some sort (e.g. std::vector) or writing a wrapper.
Why is there no overflow error when the array is declared globally? In other words, why am I able to fill it with more elements than the size given in its declaration (through the for loop), while I do get an error when I declare the array locally inside main?
char name[9];
int main(){
int i;
for( int i=0; i<18; ++i){
cin>>name[i];
}
cout<<"Inside the array: ";
for(i=0; i<20; i++)
cout<<name[i];
return 0;
}
C++ does not check bounds errors for arrays of any kind. Reading or writing outside of the array bounds causes what is known as "undefined behaviour", which means anything could happen. In your case, it seems that what happens is that it appears to work, but the program will still be in an invalid state.
If you want bounds checking, use a std::vector and its at() member function:
vector <int> a( 3 ); // vector of 3 ints
int n = a.at( 0 ); // ok
n = a.at( 42 ); // throws an exception
C++ does not have array bounds checking, so the language never checks whether you have exceeded the end of your array, but as others have mentioned, bad things can be expected to happen.
Global variables exist in the static segment, which is totally separate from your stack. The static segment also does not contain important information like return addresses. When you exceed an array's boundaries you are effectively corrupting memory; it just so happens that corrupting the stack is likely to cause more visible bad behaviour than corrupting the data segment. All of this depends on the way your operating system organizes a process's memory.
It's undefined behavior; anything can happen. You cannot assume much about the memory layout of variables. The code may run on your computer with these parameters, but totally fail when you increase your access bounds or run it on another machine. So if you are serious about writing code, don't let this become a habit.
I'd go one step further and state that C/C++ do not have arrays. What they have is array-like syntactic sugar that is immediately translated to pointer arithmetic, which cannot be checked, as pointers can be used to access potentially all of memory. Any checking that the compiler may manage to perform based on static sizes and constant bounds on an index is a happy accident, but you cannot rely on it.
Here's an oddity that stunned me when I first saw it:
int a[10], i;
i = 5;
a[i] = 42; // Looks normal.
5[a] = 37; // But what's this???
std::cout << "Array element = " << a[i] << std::endl;
But the odd-looking line is perfectly legal C++: a[i] is defined as *(a + i), which equals *(i + a), which can be written i[a]. This example emphasizes that arrays in C/C++ are a fiction.
Neil Butterworth already commented on the benefits of using std::vector and the at() access method for it, and I cannot second his recommendation strongly enough. (Unfortunately, the designers of STL blew a golden opportunity to make checked access the [] operators, with the at() methods the unchecked operators. This has probably cost the C++ programming community millions of hours and millions of dollars, and will continue to do so.)