Proper use of memory with dynamic array lengths in C++

When creating dynamic arrays, would the following code be considered "correct" in terms of memory use and performance? Please explain why / why not.
My function getFifoData takes a pointer to a receive buffer, and internally calculates how long the message is based on the current FIFO size using getFifoThreshold.
int serial_spi_handler::getFifoData(unsigned char * rxBuf) {
    uint16_t currentFifoThreshold = getFifoThreshold();
    const int msgLength = (currentFifoThreshold * 2) + 1;
    std::vector<uint8_t> txBuf;
    txBuf.reserve(msgLength);
    uint8_t tBuff[txBuf.size()];
    tBuff[0] = 0xC2;
    int bytesWritten = readWrite(busDescriptor, tBuff, rxBuf, msgLength);
    if (consoleLogging) {
        printf("getFifoData function, wrote: %d bytes\n\r", bytesWritten);
    } else if (diagOutput) {
        qDebug() << "getFifoData function, wrote: " << bytesWritten << " bytes";
    }
    return msgLength;
}
// Header of readWrite:
// int readWrite(int busDescriptor, uint8_t *pTxBuffer, uint8_t *pRxBuffer, int length);

I'm not sure what you mean by "correct"; your code is not correct at least in the sense mentioned by @SamVarshavchik, which a compiler will also tell you about:
a.cpp: In function ‘int getFifoData(unsigned char*)’:
a.cpp:20:29: warning: ISO C++ forbids variable length array ‘tBuff’ [-Wvla]
uint8_t tBuff[txBuf.size()];
If you want to understand why C++ has no VLAs, read this SO question:
Why aren't variable-length arrays part of the C++ standard?
Issues with the code
Here are some issues I believe you should consider.
Confusing names
The function readWrite() - does it read? does it write? does it do both? Who knows.
Don't clip names. If you want to name a buffer, call it my_buffer, not my_buff. Similarly, diagnostic_output not diagOutput.
No impromptu initials. What's a tBuffer? Is it a test buffer? A transaction buffer? A transmission buffer? A temporary buffer?
Not everyone knows what RX means.
getFifoData() - what does it even do? Is its parameter a "FIFO"? That's not what the parameter name says. And if it is - where is that information we supposedly get? There's no destination buffer that's passed for use, nor a container that's returned.
Chance of buffer overflow / invalid memory access
Why does getFifoData() take a buffer without also taking its length as well?
Better yet, why can't it take a span?
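The length concern can be made concrete. A minimal sketch, assuming a hypothetical helper copyIntoRxBuffer (not part of the original code) that is told the capacity of the destination and refuses to write past it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies msgLength bytes into rxBuf only if they fit.
// Returns the number of bytes copied, or -1 if the buffer is too small.
int copyIntoRxBuffer(const std::uint8_t* msg, std::size_t msgLength,
                     std::uint8_t* rxBuf, std::size_t rxCapacity) {
    assert(msg != nullptr && rxBuf != nullptr);
    if (msgLength > rxCapacity) {
        return -1;  // the caller's buffer cannot hold the message
    }
    std::memcpy(rxBuf, msg, msgLength);
    return static_cast<int>(msgLength);
}
```

With the capacity passed in, the callee can reject an undersized buffer instead of silently overrunning it.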
Use of dynamically-allocated buffers
Both std::vector and the variable-length array are sized at run time. VLAs have their own issues (see link above) and typically live on the stack. A std::vector, by contrast, allocates its storage on the heap, so you would be performing a memory allocation on each call of this function, and that might be expensive if it gets called a lot.
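One detail worth checking in the question's code: reserve() only changes a vector's capacity, so txBuf.size() is still 0 at the point where tBuff is declared. A minimal sketch of sizing the transmit buffer with the vector itself; makeTxBuffer is a hypothetical name, and 0xC2 is the command byte taken from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a zero-filled transmit buffer of msgLength bytes
// whose first byte is the command byte used in the question.
std::vector<std::uint8_t> makeTxBuffer(std::size_t msgLength) {
    assert(msgLength > 0);
    std::vector<std::uint8_t> txBuf(msgLength, 0);  // size() == msgLength
    txBuf[0] = 0xC2;
    return txBuf;
}
```

txBuf.data() can then be passed wherever a uint8_t* is expected. Keeping the vector as a class member and calling resize() on it would avoid a fresh allocation on every call once its capacity has grown large enough.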
Logging
Printing to the console or to a file is slow. Well, sort of slow, anyway - this is all relative. Now, this happens within an "if" statement, but if you've configured your app to log things, you'll be paying this price on every call to getFifoData().
Time it!
Finally, if you're worried about performance, time your function or run it under a profiler. Then you can see how much time it actually takes and whether that's a problem.
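A minimal timing sketch using std::chrono; averageMicroseconds is a hypothetical helper, and a profiler will give a more complete picture than this kind of stopwatch measurement:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls fn `iterations` times and returns the average
// wall-clock time per call in microseconds.
template <typename Fn>
double averageMicroseconds(Fn&& fn, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / iterations;
}
```

Wrapping the call under test in a lambda and averaging over many iterations smooths out clock resolution and one-off effects.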

Related

Passing pointer and then allocating variable length array to stack

Is it possible to allocate a variable length array to the stack in one function from another function?
One way that works is to just allocate the largest possible size up front, but I'm wondering if there is a way to avoid this.
void outside_function(){
    char[] place_to_allocate_stack_array;
    size_t array_size = allocate_and_fill_array(place_to_allocate_stack_array);
    //do stuff with the now allocated variable length array on stack
}
size_t allocate_and_fill_array(char* place_to_allocate){
    //does some stuff to determine how long the array needs to be
    size_t length= determine_length();
    //here I want to allocate the variable length array to the stack,
    //but I want the outside_function to still be able to access it after
    //the code exits allocate_and_fill_array
    place_to_allocate[length];
    //do stuff to fill the array with data
    return length;
}
size_t determine_length(){
    ////unknown calculations to determine required length
}

No, even ignoring the concerns people have about using variable-length arrays (VLAs). You are trying to accomplish too much in a single function. Step back a bit and look at what you are asking.
For consistency and to get away from arrays, I'm going to rename some things. Consider this version of your setup:
class X; // instead of "char []" so we can drop the VLA baggage
size_t inner_function(X & data) { // was "allocate_and_fill_array"
    // Determine how data should be allocated
    // Do stuff with data
}
void outer_function() {
    X data;
    size_t data_size = inner_function(data);
}
Requirement #1: The inner function needs access to a variable declared in the outer function. This requires that the variable be passed as a parameter to the inner function. This in turn requires that the inner function be called after the variable is declared.
Requirement #2: The inner function determines how data should be allocated (which happens at the point of declaration). This requires that the inner function be called before the variable is declared.
These requirements have contradictory prerequisites. Not possible.
I am led to the question: what led you to this approach? You already wrote a separate determine_length function. Let outside_function call that, declare the VLA, then pass the VLA and the length to the inner function. Much simpler conceptually.
size_t determine_length() {
    // unknown calculations to determine required length
}
void fill_array(char* stack_array, size_t length) {
    //do stuff to fill the array with data
}
void outside_function(){
    size_t length = determine_length();
    char stack_array[length];
    fill_array(stack_array, length);
    //do stuff with the variable length array on stack
}
Still, this obsession with getting the data on the stack is probably premature. While it is true that heap storage is more expensive than stack storage, the difference is often not worth worrying about. Get your program working before jumping through hoops to tweak performance. Focus on robustness. Only spend time on a performance bottleneck after it has been identified by a profiler.
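If the stack requirement is relaxed, the original single-function design becomes possible, because a std::vector can be sized inside the callee and returned to the caller. A minimal sketch; determine_length here is a stand-in that returns a fixed value, since the question leaves its calculation unspecified, and the fill pattern is a placeholder:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the question's unspecified length calculation (assumption).
std::size_t determine_length() { return 8; }

// Decides the length, allocates on the heap, fills, and returns the data.
std::vector<char> allocate_and_fill_array() {
    const std::size_t length = determine_length();
    std::vector<char> data(length);
    for (std::size_t i = 0; i < length; ++i) {
        data[i] = static_cast<char>('a' + i % 26);  // placeholder fill
    }
    return data;
}

std::size_t outside_function() {
    const std::vector<char> data = allocate_and_fill_array();
    assert(!data.empty());  // holds for the stand-in length above
    // do stuff with data; data.size() replaces the separate length value
    return data.size();
}
```

The vector carries its own length, so the caller no longer needs a separate size_t return value.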

What difference does it make (in memory terms) if I declare arguments as variables in advance instead of writing them in-line of the function call?

For example, for the dummy function write(int length, const char* text){...}, is there any difference in terms of memory between these two approaches?
write(18,"The cake is a lie.");
or
int len = 18;
char txt[19] = "The cake is a lie.";
write(len,txt);
Bonus: what if there's some repetition? i.e. A loop calls the function repeatedly using an array whose elements are the intended arguments.
I'm asking this question, especially the bonus, in hopes of better understanding how each consumes memory to optimize my efficiency when writing on memory-sensitive platforms like Arduino. That said, if you know of an even more efficient way, please share! Thanks!

It depends on whether char txt[19] is declared in scope of a function or at a global (or namespace) scope.
If in scope of a function, then txt will be allocated on the stack and initialized at run time from a copy of the string literal residing in a (read-only) data segment.
If at global scope, then it will be allocated at build time in the data segment.
Bonus: if it's allocated in some sub-scope, like a loop body, then you should assume it will be initialized during every loop iteration (the optimizer might do some tricks but don't count on it).
Example 1:
int len = 18;
char txt[19] = "The cake is a lie.";
int main() {
    write(len,txt);
}
Here len (an int) and txt (19 bytes + alignment padding) will be allocated in the program's data segment at build time.
Example 2:
int main() {
    int len = 18;
    char txt[19] = "The cake is a lie.";
    write(len,txt);
}
Here the string literal "The cake is a lie." will be allocated in the program's data segment at build time. In addition, len and txt (19 bytes + padding) may be allocated on the stack at run time. The optimizer may omit the len allocation and maybe even txt, but don't count on it, as it's going to depend on many factors, like whether write body is available, what it does exactly, the quality of the optimizer, etc. When in doubt, look at the generated code (godbolt now supports AVR targets).
Example 3:
int main() {
    write(18,"The cake is a lie.");
}
Here the string literal "The cake is a lie." will be allocated in the program's data segment at build time. The 18 will be embedded in the program code.
Since you're developing on AVR, there are some additional specifics worth mentioning. The application's executable is stored in Flash and the code runs from there, but at startup the initialized data (including string literals) is copied from Flash into RAM. It is possible to avoid that copy and keep the data only in Flash by using the PROGMEM attribute (though to do anything meaningful with the data you will need to read it back into RAM, for example with pgm_read_byte).
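For the bonus case, a minimal sketch of hoisting the array out of the loop so that it is not re-initialized on each iteration. write_stub is a hypothetical stand-in for the question's write, with a body that only counts the bytes it is handed:

```cpp
#include <cassert>

static int bytes_written = 0;

// Hypothetical stand-in for the question's dummy write(): just tallies lengths.
void write_stub(int length, const char* text) {
    assert(text != nullptr);
    bytes_written += length;
}

void write_repeatedly(int times) {
    // static const: a single copy in the data segment, with no per-iteration
    // initialization of a stack array.
    static const char txt[] = "The cake is a lie.";
    const int len = static_cast<int>(sizeof txt) - 1;  // 18, excluding the terminator
    for (int i = 0; i < times; ++i) {
        write_stub(len, txt);
    }
}
```

Deriving len from sizeof also removes the need to keep a hand-counted 18 in sync with the text.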

c++ function that takes any data type without using templates?

I have an assignment which asks one to write a function for any data type. The function is supposed to print the bytes of the structure and identify the total number of bytes the data structure uses, along with differentiating between bytes used for members and bytes used for padding.
My immediate reaction, along with most of the class's reaction, was to use templates. This allows you to write the function once and gather the run time type of the objects passed into the function. Using memset and typeid one can easily accomplish what has been asked. However, our prof. just saw our discussion about templates and damned templates to hell.
After seeing this I was thrown for a loop and I'm looking for a little guidance as the best way to get around this. Some things I've looked into:
void pointers with explicit casting (this seems like it'd get messy)
base class with virtual functions only from which all data structures inherit from, seems a bit odd to do.
a base class with 'friendships' to each of our data structures.
rewriting a function for each data structure in our problem set (what I imagine is the worst possible solution).
Was hoping I overlooked a common c++ tool, does anyone have any ideas?

Treat the function as being as stupid as possible. In fact, treat it as if it doesn't know anything and all information must be passed to it.
Parameters to the function:
Structure address, as a uint8_t *. (Needed to print the bytes)
Structure size, in bytes. (Needed to print the bytes and to print the total size)
A vector of member information: member length OR the sum of the bytes used by the members.
The vector is needed to fulfill the requirement of printing the bytes used by the members and the bytes used by padding. Optionally you could pass the sum of the members.
Example:
void Analyze_Structure(uint8_t const * p_structure,
                       size_t size_of_structure,
                       size_t size_occupied_by_members);
The trick of this assignment is to figure out how to have the calling function determine these items.
Hope this helps.
Edit 1:
struct Apple
{
    char a;
    int weight;
    double protein_per_gram;
};
int main(void)
{
    Apple granny_smith;
    Analyze_Structure((uint8_t *) &granny_smith,
                      sizeof(Apple),
                      sizeof(granny_smith.a)
                      + sizeof(granny_smith.weight)
                      + sizeof(granny_smith.protein_per_gram));
    return 0;
}
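The answer only declares Analyze_Structure. A minimal sketch of one possible body follows; returning the padding count instead of void is an assumption made here so that the result can be checked, and the output format is arbitrary:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Prints every byte in hex, then the totals. Returns the number of padding
// bytes (the original declaration returns void; the return value is an addition).
std::size_t Analyze_Structure(std::uint8_t const* p_structure,
                              std::size_t size_of_structure,
                              std::size_t size_occupied_by_members) {
    assert(p_structure != nullptr);
    assert(size_occupied_by_members <= size_of_structure);
    for (std::size_t i = 0; i < size_of_structure; ++i) {
        std::printf("%02X ", static_cast<unsigned>(p_structure[i]));
    }
    const std::size_t padding = size_of_structure - size_occupied_by_members;
    std::printf("\ntotal: %zu, members: %zu, padding: %zu\n",
                size_of_structure, size_occupied_by_members, padding);
    return padding;
}
```

Value-initializing the object before the call (for example Apple granny_smith = {};) keeps the printed bytes well defined.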

I have assignment which asks one to write a function for any data type.
This means either templates (which your prof. dismissed), void*, or a variable number of arguments (similar to printf).
The function is supposed to print the bytes of the structure
void your_function(void* data, std::size_t size)
{
    std::uint8_t* bytes = reinterpret_cast<std::uint8_t*>(data);
    for(auto x = bytes; x != bytes + size; ++x)
        std::clog << "0x" << std::hex << static_cast<std::uint32_t>(*x) << " ";
}
[...] and identify the total number of bytes the data structure uses along with differentiating between bytes used for members and bytes used for padding.
On this one, I'm lost: the bytes used for padding are (by definition) not part of the structure. Consider:
struct x { char c; char d; char e; }; // sizeof(x) == 3;
x instance{ 0, 0, 0 };
your_function(&instance, sizeof(x)); // passes 3, not 4 (4 for 32bits architecture)
Theoretically, you could also pass alignof(instance) to the function, but that won't tell you the alignment of the fields in memory (as far as I know it is not standardized, but I may be wrong).
There are a few possibilities here:
Your prof. learned "hacky" C++ that was considered good code 10 or 20 years ago and didn't update his knowledge (C-style code, pointers, direct memory access and "smart hacks" are all in here).
He didn't know how to express exactly what he wanted or the terminology to use ("write a function for any data type" is too vague: as a developer, if I got this assignment, the first thing to do would be to ask for details - like "how will it be used?" and "what is the expected function signature").
For example, this could be achieved - to a degree - with macros, but if he wants you to use macros in place of functions and templates, you should probably contemplate changing professors.
He meant that you should write some arbitrary data type (like my struct x above) and define your API around that (unlikely).
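On the question of where fields sit in memory: for standard-layout types, the offsetof macro from <cstddef> reports each member's byte offset, which is enough to locate the gaps between members. A minimal sketch with a hypothetical struct Sample; the actual numbers depend on the platform's alignment rules:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example type, not taken from the assignment.
struct Sample {
    char c;
    int i;
    double d;
};

// Padding between the end of one member and the start of the next.
std::size_t padding_after_c() {
    return offsetof(Sample, i) - (offsetof(Sample, c) + sizeof(char));
}
std::size_t padding_after_i() {
    return offsetof(Sample, d) - (offsetof(Sample, i) + sizeof(int));
}
// Trailing padding after the last member.
std::size_t padding_after_d() {
    return sizeof(Sample) - (offsetof(Sample, d) + sizeof(double));
}

std::size_t total_padding() {
    const std::size_t total =
        padding_after_c() + padding_after_i() + padding_after_d();
    // The gaps must add up to sizeof minus the bytes used by members.
    assert(total == sizeof(Sample) - (sizeof(char) + sizeof(int) + sizeof(double)));
    return total;
}
```

This still requires the caller to name the members, so it does not remove the need for type-specific code.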

I am not sure that such a function can be built without a minimum of introspection: you need to know what the struct members are, otherwise you only have access to the size of the struct.
Anyway, here is my proposal for a solution that should work without introspection, provided the user of the code "cooperates".
Your functions will take as arguments void* and size_t for the address and sizeof of the struct.
0) let the user create a struct of the desired type.
1) let the user call a function of yours that sets all bytes to 0.
2) let the user assign a value to every field of the struct.
3) let the user call a function of yours that keeps a record of every byte that is still 0.
4) let the user call a function of yours that sets all bytes to 1.
5) let the user assign a value to every field of the struct again. (Same values as the first time!)
6) let the user call a function of yours and count the bytes that are still 1 AND were marked before. These are padding bytes.
The reason to try with values 0 then 1 is that the values assigned by the user could include bytes 0; but they can't be bytes 0 and bytes 1 at the same time so one of the tests will exclude them.
struct _S { int I; char C; } S;
Fill0(&S, sizeof(S));
// User cooperation
S.I= 0;
S.C= '\0';
Mark0(&S, sizeof(S)); // Has some form of static storage
Fill1(&S, sizeof(S));
// User cooperation
S.I= 0;
S.C= '\0';
DetectPadding(&S, sizeof(S));
You can pack all of this in a single function that takes a callback function argument that does the member assignments.
void Assign(void* pS) // User-written callback
{
    struct _S& S= *(struct _S*)pS;
    S.I= 0;
    S.C= '\0';
}
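The single-function variant with a callback could be sketched as follows. Demo, AssignDemo and CountPaddingBytes are hypothetical names. Note that the language does not promise that assigning to members leaves padding bytes untouched, so in principle the count is a lower bound rather than an exact figure:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Demo { int I; char C; };  // hypothetical example type

// User-written callback: assigns the same values to every member on each call.
void AssignDemo(void* pS) {
    Demo& S = *static_cast<Demo*>(pS);
    S.I = 0;
    S.C = '\0';
}

std::size_t CountPaddingBytes(void* pS, std::size_t size, void (*assign)(void*)) {
    assert(pS != nullptr && assign != nullptr);
    unsigned char* bytes = static_cast<unsigned char*>(pS);

    std::memset(bytes, 0x00, size);               // 1) set all bytes to 0
    assign(pS);                                   // 2) user assigns every member
    std::vector<bool> stillZero(size);
    for (std::size_t i = 0; i < size; ++i) {
        stillZero[i] = (bytes[i] == 0x00);        // 3) record bytes that are still 0
    }

    std::memset(bytes, 0x01, size);               // 4) set all bytes to 1
    assign(pS);                                   // 5) user assigns the same values
    std::size_t padding = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (stillZero[i] && bytes[i] == 0x01) {   // 6) untouched in both passes
            ++padding;
        }
    }
    return padding;
}
```

A call such as CountPaddingBytes(&s, sizeof s, AssignDemo) then replaces the separate Fill0, Mark0, Fill1 and DetectPadding steps.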

C++ Call by Ref. with a dynamic sized struct without knowing its size

I need to use a function (part of an API) which stores some requested data into a dynamic sized struct using call by reference. The struct is defined as follows - it concerns access control lists of either posix or NFS4 version, but that is just the use case, I guess.
typedef struct my_struct
{
    unsigned int len; /* total length of the struct in bytes */
    ... /* some other static sized fields */
    unsigned int version; /* 2 different versions are possible */
    unsigned int amount; /* number of entries that follow */
    union {
        entry_v1_t entry_v1[1];
        entry_v2_t entry_v2[1];
    };
} my_struct_t;
There are 2 versions of the entries and I know which one I will obtain (v1). Both entry_v1_t and entry_v2_t are fixed (but different) sized structs just containing integers (so I guess they are not worth being explained here). Now I need to use an existing function to fill my structure with the information I need using Call by Reference, the signature is as follows, including the comments - I don't have access to the implementation:
int get_information(char *pathname, void *ptr);
/* The ptr parameter must point to a buffer mapped by the my_struct
* structure. The first four bytes of the buffer must contain its total size.
*/
So the point is that I must allocate memory for that struct but don't know for how many entries (and, as a consequence, what total size) I must allocate. Have you ever dealt with such a situation?

The Windows API has many such functions. You normally call them with a NULL pointer to get the required buffer size, then call again with an allocated buffer. If the required size has changed by the time of the second call, the function returns an error and you need to allocate again, so you do this in a loop until the function succeeds.
So your get_information must implement some such mechanism: either it returns an error if the buffer is too small, or it returns the correct size if ptr is NULL. But that is just my guess.

OK, I think I figured out how it works. Thanks for your ideas and notes. I declared a my_struct pointer and allocated the minimum space for the fixed-size fields (5) before the dynamic array => 5 * sizeof(unsigned int). Invoking get_information with that pointer returns -1 and sets errno = 28 and strerror(errno) = "No space left on device".
But it sets the my_struct->len field to the required size, and that seems to be the answer to my question of how you are supposed to know. Now I can invoke get_information initially with the minimum space to figure out how much I need to allocate, and afterwards call it again with the right amount of memory allocated to get the information successfully.
The loop solution seems to make sense anyway and would have been my next try - since there are usually just a few entries in that dynamic array.
Thank you.
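The probe-then-retry loop described in this thread can be sketched as follows. fake_get_information is a hypothetical stand-in written only so that the example runs; it mimics the behaviour reported above (returns -1, sets errno to ENOSPC and stores the required size in the first bytes of the buffer) and is not the real API. The sketch also assumes that unsigned int is four bytes wide:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <vector>

const unsigned int kRequiredSize = 64;  // arbitrary size the fake API asks for

// Hypothetical stand-in for get_information (assumption, see lead-in).
int fake_get_information(void* ptr) {
    assert(ptr != nullptr);
    unsigned int len = 0;
    std::memcpy(&len, ptr, sizeof len);  // caller-declared buffer size
    if (len < kRequiredSize) {
        std::memcpy(ptr, &kRequiredSize, sizeof kRequiredSize);  // report need
        errno = ENOSPC;
        return -1;
    }
    return 0;
}

// Returns a buffer large enough for the call, or an empty vector on failure.
std::vector<unsigned char> query_information() {
    // Start with room for the five fixed-size fields only.
    std::vector<unsigned char> buffer(5 * sizeof(unsigned int));
    for (;;) {
        const unsigned int len = static_cast<unsigned int>(buffer.size());
        std::memcpy(buffer.data(), &len, sizeof len);
        if (fake_get_information(buffer.data()) == 0) {
            return buffer;
        }
        if (errno != ENOSPC) {
            return {};  // some other error
        }
        unsigned int needed = 0;
        std::memcpy(&needed, buffer.data(), sizeof needed);
        if (needed <= buffer.size()) {
            return {};  // no progress possible; give up
        }
        buffer.assign(needed, 0);  // grow and retry
    }
}
```

Looping rather than calling exactly twice covers the case where the required size changes between the probe and the real call.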

Accessing array beyond the limit

I have two character arrays of size 100 (char array1[100], char array2[100]). I want to check whether anybody is accessing an array beyond its limit. This is necessary because, suppose the memory allocated for array1 and array2 is consecutive, so that array2 starts where array1 ends. If anyone then writes array1[101], it is conceptually wrong, but the compiler will at most give a warning and the program will not crash. How can I detect this problem and solve it?
Update 1:
I already have 15,000 lines of code. I have to check this condition for that code; I can invoke my own functions but cannot change the code that is already written. Please make suggestions with this in mind.

Most modern languages will detect this and prevent it from happening. C and its derivatives don't detect this, and basically can't detect this, because of the numerous ways you can access the memory, including bare pointers. If you can restrict the way you access the memory, then you can possibly use a function or something to check your access.

My initial response to this would be to wrap the access to these arrays in a function or method and send the index as a parameter. If the index is out of bounds, raise an exception or report the error in some other way.
EDIT:
This is of course a run-time prevention. I don't know how you would check this at compile time if the compiler cannot check it for you. Also, as Kolky has already pointed out, it'd be easier to answer this if we knew which language you are using.

If you are using C++ rather than C, is there any reason you can't use std::vector? Its at() member will give you bounds checking if the user goes outside your range. Am I missing something here?
Wouldn't it be sensible to prevent the user having direct access to the collections in the first place?
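Note that std::vector checks bounds only through at(); operator[] is unchecked. A minimal sketch, with try_write as a hypothetical helper that turns the exception into a boolean result:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes value at index if it is in range.
bool try_write(std::vector<char>& buffer, std::size_t index, char value) {
    try {
        buffer.at(index) = value;  // throws std::out_of_range when index >= size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

void demo() {
    std::vector<char> array1(100);
    assert(try_write(array1, 99, 'x'));    // last valid index
    assert(!try_write(array1, 101, 'x'));  // rejected instead of corrupting memory
}
```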

If you use boost::array or similar you will get an exception range_error if array bounds are overstepped. http://www.boost.org/doc/libs/1_44_0/doc/html/boost/array.html. Boost is fabulous.

In C/C++, there is no general solution. You can't do it at compile time since there are too many ways to change memory in C. Example:
char * ptr = array2;
ptr = foo(ptr); // ptr --;
ptr now contains a valid address but the address is outside of array2. This can be a bug or what you want. C can't know (there is no way to say "I want it so" in C), so the compiler can't check it. Similarly:
char * array2 = malloc(100);
How should the C compiler know that you are treating the memory as a char array and would like a warning when you write &array2[100]?
Therefore, most solutions use "mungwalls", i.e. when you call malloc(), they will actually allocate 16/32 bytes more than you ask for:
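malloc(size) {
    mungwall_size = 16;
    ptr = real_malloc(size + mungwall_size*2);
    createMungwall(ptr, mungwall_size);                        // wall before the user area
    createMungwall(ptr + mungwall_size + size, mungwall_size); // wall after the user area
    return ptr + mungwall_size;                                // user area sits between the walls
}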
in free() it will check that 16 bytes before and after the allocated memory area haven't been touched (i.e. that the mungwall pattern is still intact). While not perfect, it makes your program crash earlier (and hopefully closer to the bug).
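A minimal runnable sketch of the guard-wall idea. The names guarded_malloc, guards_intact and guarded_free, the 16-byte wall and the 0xAB pattern are all hypothetical choices. A real implementation would store the size in a hidden header instead of asking the caller to pass it again:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

const std::size_t kGuardSize = 16;
const unsigned char kGuardByte = 0xAB;  // arbitrary wall pattern

// Allocates size bytes with a wall of kGuardSize bytes on each side.
void* guarded_malloc(std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(size + 2 * kGuardSize));
    if (raw == nullptr) {
        return nullptr;
    }
    std::memset(raw, kGuardByte, kGuardSize);                      // front wall
    std::memset(raw + kGuardSize + size, kGuardByte, kGuardSize);  // back wall
    return raw + kGuardSize;
}

// Returns true if both walls still hold the pattern.
bool guards_intact(const void* user, std::size_t size) {
    assert(user != nullptr);
    const unsigned char* front =
        static_cast<const unsigned char*>(user) - kGuardSize;
    const unsigned char* back = static_cast<const unsigned char*>(user) + size;
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (front[i] != kGuardByte || back[i] != kGuardByte) {
            return false;
        }
    }
    return true;
}

// Checks the walls, releases the block, and reports whether they were intact.
bool guarded_free(void* user, std::size_t size) {
    const bool intact = guards_intact(user, size);
    std::free(static_cast<unsigned char*>(user) - kGuardSize);
    return intact;
}
```

A write to index 100 of a 100-byte block lands in the back wall, so the damage is detected at the next check rather than going unnoticed.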
You could also use special CPU commands to check all memory accesses but this approach would make your program 100 to 1 million times slower than it is now.
Therefore, most languages after C don't expose raw pointers, which means "array" is a basic type that has a size. Now, you can check every array access with a simple compare.
If you want to write code in C which is safe, you must emulate this. Create an array type, never use pointers or char * for strings. It means you must convert your data type all the time (because all library functions use const char * for strings) but it makes your code safer.
Languages do age. C is now 40 years old and our knowledge has moved on. It's still used in a lot of places but it shouldn't be the first choice anymore. The same applies (to a lesser extent) to C++ because it suffers from the same fundamental flaws as C (even though you now have libraries and frameworks which work around many of them).

If you're in C++ you can write a quick wrapper class.
#include <stdexcept>

template<typename T, int size> class my_array_wrapper {
    T contents[size];
public:
    // Checked element access: throws instead of touching memory out of range.
    T& operator[](int index) {
        if (index >= size)
            throw std::runtime_error("Attempted to access outside array bounds!");
        if (index < 0)
            throw std::runtime_error("Attempted to access outside array bounds!");
        return contents[index];
    }
    const T& operator[](int index) const {
        if (index >= size)
            throw std::runtime_error("Attempted to access outside array bounds!");
        if (index < 0)
            throw std::runtime_error("Attempted to access outside array bounds!");
        return contents[index];
    }
    // Pointer decay for existing code; access through it is unchecked.
    operator T*() {
        return contents;
    }
    operator const T*() const {
        return contents;
    }
};
my_array_wrapper<char, 100> array1;
array1[101]; // exception
Problem solved, although if you access through the pointer decay there will be no bounds checking. You could use the boost::array pre-provided solution.

If you ran a static analyser (e.g. cppcheck) against your code, it would give you a bounds error:
http://en.wikipedia.org/wiki/User:Exuwon/Cppcheck#Bounds_checking
To solve it, you'd be better off using a container of some sort (e.g. std::vector) or writing a wrapper.
