Variable Length Array Performance Implications (C/C++)

I'm writing a fairly straightforward function that sends an array over to a file descriptor. However, in order to send the data, I need to append a one byte header.
Here is a simplified version of what I'm doing and it seems to work:
void SendData(uint8_t* buffer, size_t length) {
    uint8_t buffer_to_send[length + 1];
    buffer_to_send[0] = MY_SPECIAL_BYTE;
    memcpy(buffer_to_send + 1, buffer, length);
    // more code to send the buffer_to_send goes here...
}
Like I said, the code seems to work fine. However, I've recently gotten into the habit of following the Google C++ Style Guide, since my current project has no set style guide (I'm actually the only software engineer on the project and I wanted to use something that's used in industry). I ran Google's cpplint.py, and it flagged the line where I create buffer_to_send with a warning about not using variable-length arrays. Specifically, here's what the Google C++ Style Guide has to say about variable-length arrays:
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Variable-Length_Arrays_and_alloca__
Based on their comments, it appears I may have found the root cause of seemingly random crashes in my code (which occur very infrequently, but are nonetheless annoying). However, I'm a bit torn as to how to fix it.
Here are my proposed solutions:
1. Make buffer_to_send a fixed-length array of constant size. The problem I can see here is that I have to make the buffer as big as the theoretically largest buffer I'd ever want to send. In the average case the buffers are much smaller, so I'd be wasting about 0.5KB each time the function is called. Note that the program must run on an embedded system, and while I'm not necessarily counting every byte, I'd like to use as little memory as possible.
2. Use new and delete or malloc/free to dynamically allocate the buffer. The issue here is that the function is called frequently, so there would be some overhead from constantly asking the OS for memory and then releasing it.
3. Use two successive calls to write() to pass the data to the file descriptor: the first write passes only the one byte, and the next sends the rest of the buffer. While seemingly straightforward, I would need to research the code a bit more (note that I inherited this code from a previous engineer who has since left the company I work for) in order to guarantee that the two successive writes occur atomically. Also, if this requires locking, then it essentially becomes more complex and has more performance impact than option #2.
Note that I cannot make the buffer_to_send a member variable or scope it outside the function since there are (potentially) multiple calls to the function at any given time from various threads.
Please let me know your opinion and what my preferred approach should be. Thanks for your time.

You can fold the two successive calls to write() in your option 3 into a single call using writev().
http://pubs.opengroup.org/onlinepubs/009696799/functions/writev.html
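For illustration, here is a minimal sketch of that approach, assuming a POSIX system. The function name, the descriptor argument, and the value of MY_SPECIAL_BYTE are stand-ins, since the question does not show them:

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <sys/uio.h>   // writev
#include <unistd.h>

// Stand-in for the question's header constant (the real value is not shown).
static const uint8_t MY_SPECIAL_BYTE = 0x7E;

// Sends the one-byte header and the payload with a single system call:
// no VLA, no copy, no second write(). For pipes, writes of at most
// PIPE_BUF bytes are atomic with respect to other writers.
ssize_t SendFramed(int fd, const uint8_t* buffer, size_t length) {
    uint8_t header = MY_SPECIAL_BYTE;
    struct iovec iov[2];
    iov[0].iov_base = &header;
    iov[0].iov_len  = 1;
    iov[1].iov_base = const_cast<uint8_t*>(buffer);
    iov[1].iov_len  = length;
    return writev(fd, iov, 2);  // gathers both pieces, in order
}
```

Since the kernel gathers both iovecs into one write, this sidesteps the atomicity concern from option 3 entirely.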

I would choose option 1. If you know the maximum length of your data, then allocate that much space (plus one byte) on the stack as a fixed-size array. This is no worse than the variable-length array you have shown: you must always have enough stack space left for the maximum length anyway, otherwise you simply won't be able to handle your largest buffers (at worst, your code would randomly crash on larger buffer sizes). At the time this function is called, nothing else will be using that further space on your stack, so it is safe to allocate a fixed-size array there.
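A sketch of what option 1 could look like. The 512-byte limit and the header value are assumptions (the question only says the worst case is about 0.5KB, and does not show MY_SPECIAL_BYTE):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <cstddef>

// Assumed limits; the question only says the largest buffer is ~0.5KB.
const size_t kMaxPayload = 512;
const uint8_t kSpecialByte = 0x7E;  // stand-in for MY_SPECIAL_BYTE

// Builds the framed message into a caller-provided fixed-size buffer of
// kMaxPayload + 1 bytes. Rejecting oversized input makes the failure
// mode explicit instead of a silent stack overrun.
bool BuildFrame(const uint8_t* buffer, size_t length,
                uint8_t out_frame[kMaxPayload + 1], size_t* out_length) {
    if (length > kMaxPayload) return false;
    out_frame[0] = kSpecialByte;
    memcpy(out_frame + 1, buffer, length);
    *out_length = length + 1;
    return true;
}
```

Inside the real SendData, `uint8_t frame[kMaxPayload + 1];` on the stack plays the role of the VLA, but with a compile-time size.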

Related

How to determine how much space to allocate for boost::stacktrace::safe_dump_to?

I'm looking at the boost::stacktrace::safe_dump_to API and I cannot for the life of me work out how to determine how much space to allocate for the safe_dump_to() call. If I pass (nullptr, 0), it just returns 0, so that's not it. I could guess some constant number, but how do I know that's enough?
The docs specify:
This header contains low-level async-signal-safe functions for dumping call stacks. Dumps are binary serialized arrays of void*, so you could read them by using 'od -tx8 -An stacktrace_dump_failename' Linux command or using boost::stacktrace::stacktrace::from_dump functions.
And additionally
Returns:
Stored call sequence depth including terminating zero frame. To get the actually consumed bytes multiply this value by the sizeof(boost::stacktrace::frame::native_frame_ptr_t)
It's not overly explicit, but this means you need sizeof(boost::stacktrace::frame::native_frame_ptr_t)*N where N is the number of stackframes in the trace.
Now of course, you can find out N after the fact, but there is no async-signal-safe way to allocate memory dynamically anyway, so you'd simply have to pick a number that suits your application. E.g. 256 frames might be reasonable, but you should look at your own needs (e.g. DEBUG builds will show a lot more stack frames, especially with code that relies heavily on templates; YMMV).
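As a sketch of that fixed-budget approach: 256 frames is an arbitrary illustrative choice, and sizeof(void*) stands in for sizeof(native_frame_ptr_t), since the docs describe a dump as a binary serialized array of void*:

```cpp
#include <cassert>
#include <cstddef>

// Assumed frame budget; tune to your application (debug builds need more).
const std::size_t kMaxFrames = 256;

// Per the docs, a dump is up to N frame pointers including a terminating
// zero frame, each sizeof(native_frame_ptr_t) bytes (a void*-sized value).
const std::size_t kDumpBufferSize = kMaxFrames * sizeof(void*);

// A static, pointer-aligned buffer is safe to hand to
// boost::stacktrace::safe_dump_to(buffer, kDumpBufferSize) from a
// signal handler, since no allocation happens at dump time.
alignas(void*) static unsigned char g_dump_buffer[kDumpBufferSize];
```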
Since the whole safe_dump_to construct is designed to be async-safe, I always just use the overload that writes to a file. When reading it back (typically after a restart) you will be able to deduce the number of frames from the file-size.
Optionally see some of my answers for code samples/more background information

C++ fill an empty buffer with a single value

I apologize in advance if I am using the incorrect terminology; I'm new to the C++ language. I have a class with a constructor that creates an empty buffer using malloc:
LPD6803PWM::LPD6803PWM(uint16_t leds, uint8_t dout, uint8_t cout) {
    numLEDs = leds;
    pixels = (uint16_t *) malloc(numLEDs);
    dataPin = dout;
    clockPin = cout;
}
My understanding is that this creates an empty buffer with the length of whatever I pass as numLEDs; this is essentially a dynamically created array, correct? I'm using malloc because this code runs on an Arduino that has very limited memory and I want to avoid overflows, and from what I have read, this is the best way to declare an array if you don't know ahead of time what size it will be.
My question is: once this array has been created, is there a faster way than a traditional for loop to fill it with a single value? Very often I will want to do this, and even microseconds make a difference in this application. I know that the C++ standard library's array classes have a fill method, but what about an array declared this way?
My question is, once this array has been created is there a faster way than a traditional for loop to fill the array with a single value.
The C standard library provides memset() and related functions for filling a buffer. There's also calloc(), which allocates a buffer just like malloc(), but fills the buffer with 0 at the same time.
Very often I will want to do this and even microseconds make a difference in this application.
In that case you might consider ways to avoid repeatedly allocating the array, which could take more time than filling an existing array. As well, the easiest way to make your code go faster is to run it on faster hardware. Arduino is a great platform, but Raspberry Pi Zero costs less ($5, if you can find them), has a LOT more memory, and has a clock speed that's 64x faster than a typical Arduino (1Ghz vs. 16MHz). Computing is often a tradeoff between good, cheap, and fast, but in this case you get all three.
You can still use std::fill (or std::fill_n); mainstream standard library implementations optimize it heavily, lowering it to memset where the element type allows and vectorizing it otherwise (gcc and Clang both do). Trust in the standard library writers!
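For example, a sketch assuming the allocation is corrected to request numLEDs * sizeof(uint16_t) bytes (see the note below about malloc taking a byte count); the function name is illustrative:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <algorithm>

// Allocates and fills a pixel buffer with one value. Unlike memset,
// std::fill_n handles any 16-bit value, and mainstream implementations
// vectorize it (or lower it to memset when the bit pattern allows).
uint16_t* MakeFilledPixels(std::size_t numLEDs, uint16_t value) {
    // malloc takes a size in bytes, hence the sizeof factor
    uint16_t* pixels =
        static_cast<uint16_t*>(std::malloc(numLEDs * sizeof(uint16_t)));
    if (pixels) std::fill_n(pixels, numLEDs, value);
    return pixels;
}
```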
You can use memset, but you have to be careful about the value you want to set, and you won't be much faster than a for loop: the computer needs to set all these values somehow! memset may set larger contiguous memory spans at a time and therefore be faster, but a smart compiler may do the same for a for loop.
If you're really concerned about microseconds you need to do some profiling.
Well, you can use memset from stdlib.h:
memset(array, 0, size_of_array_in_bytes);
Note however that memset works byte by byte, i.e. it sets the first byte to 0 or whatever value you pass as the second parameter, then the second byte, and so on. Since your elements are uint16_t, only values whose two bytes are identical (such as 0 or 0xFFFF) can be produced this way, so you must be careful.
Just a note:
malloc takes its size argument in bytes, not elements, so you should multiply its parameter by sizeof(uint16_t): malloc(numLEDs * sizeof(uint16_t)).

Defending classes with 'magic numbers'

A few months ago I read a book on security practices, and it suggested the following method for protecting our classes from being overwritten by e.g. buffer overflows:
first define a magic number and a fixed-size array (it can be a simple integer too)
use that array containing the magic number, and place one at the top and one at the bottom of our class
a function compares these numbers; if they are equal, and equal to the static variable, the class is OK and it returns true, else the class is corrupt and it returns false
place this function at the start of every other class method, so it checks the validity of the class on every function call
it is important to place this array at the start and the end of the class
At least this is as I remember it. I'm coding a file encryptor for learning purposes, and I'm trying to make this code exception safe.
So, in which scenarios is it useful, and when should I use this method, or is this something totally useless to count on? Does it depend on the compiler or OS?
PS: I forgot the name of the book mentioned in this post, so I cannot check it again, if anyone of you know which one was it please tell me.
What you're describing sounds like a canary, but implemented within your program, as opposed to inserted by the compiler. Compiler-level stack canaries are commonly enabled by default when building with gcc or g++ (along with a few other buffer-overflow countermeasures).
If you're doing mutable operations on your class and you want to make sure you don't have side effects, I don't know if having a magic number is very useful. Why rely on a homebrew validity check when there are methods out there that are more likely to be successful?
Checksums: I think it'd be more useful for you to hash the unencrypted text and add that to the end of the encrypted file. When decrypting, remove the hash and compare the hash(decrypted text) with what it should be.
I think most, if not all, widely used encryptors/decryptors store some sort of checksum in order to verify that the data has not changed.
This type of a canary will partially protect you against a very specific type of overflow attack. You can make it a little more robust by randomizing the canary value every time you run the program.
If you're worried about buffer overflow attacks (and you should be if you are ever parsing user input), then go ahead and do this. It probably doesn't cost too much in speed to check your canaries every time. There will always be other ways to attack your program, and there might even be careful buffer overflow attacks that get around your canary, but it's a cheap measure to take so it might be worth adding to your classes.
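As an illustration, here is a minimal sketch of the scheme from the question. The magic value and payload size are arbitrary, and a hardened version would randomize the canary at startup as suggested above:

```cpp
#include <cassert>
#include <cstdint>

class Guarded {
    static const uint32_t kMagic = 0xDEADBEEF;  // arbitrary magic number
    uint32_t head_canary_;   // first member: guards the front of the object
    int payload_[16];        // the data we actually care about
    uint32_t tail_canary_;   // last member: guards the back of the object
public:
    Guarded() : head_canary_(kMagic), tail_canary_(kMagic) {
        for (int i = 0; i < 16; ++i) payload_[i] = 0;
    }
    // The validity check: both canaries must still hold the magic value.
    bool IsIntact() const {
        return head_canary_ == kMagic && tail_canary_ == kMagic;
    }
    // Every mutating method starts by checking validity, as the book suggests.
    void Set(int i, int v) {
        assert(IsIntact());
        payload_[i] = v;
    }
};
```

An overflow that scribbles linearly across the object has to pass through one of the canaries before reaching memory beyond it, which is what the check detects.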

Efficiently collect data from multiple 1-D arrays in to a single 1-D array

I've got a prewritten function in C that fills a 1-D array with data, e.g.
int myFunction(myData **arr, ...);

myData *arr;
int arraySize;
arraySize = myFunction(&arr, ...);
I would like to call the function n times in a row with slightly different parameters (n is dependent on user input), and I need all the data collected in a single C array afterwards. The size of the returned array is not always fixed. Oh, and myFunction does the memory allocation internally. I want to do this in a memory-efficient way, but using realloc in each iteration does not sound like a good idea.
I do have all the C++ functionality available (the project is in C++, just using a C library), but using std::vector is no good because the collected data is later sent in to a function with a definition similar to:
void otherFunction(myData *data, int numData, ...);
Any ideas? Only things I can think of are realloc or using a std::vector and copying the data into an array afterwards, and those don't sound too promising.
Using realloc() in each iteration sounds like a very fine idea to me, for two reasons:
"does not sound like a good idea" is what people usually say when they have not established a performance requirement for their software, and they have not tested their software against the performance requirement to see if there is any need to improve it.
Instead of allocating a new block each time, realloc will simply keep expanding your memory block, which will presumably be at the top of the heap, so it won't waste time traversing memory-block lists or copying data around. This holds true provided that whatever memory myFunction() allocates gets freed before it returns. You can verify this by looking at the pointer returned by realloc() and seeing that it always (or almost always (*1)) is the exact same pointer you gave it to reallocate.
EDIT (*1) some C++ runtimes implement two heaps, one for small allocations and one for large allocations, so if your block gets allocated in the heap for small blocks, and then it grows large, there is a possibility that it will be moved once to the heap for large blocks. So, don't expect the pointer to always be the same; just most of the time.
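A sketch of that loop. Here myData and produceChunk are stand-ins for the question's library types (the real myFunction takes extra parameters); the stand-in producer exists only so the example is self-contained:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

typedef int myData;  // stand-in for the question's element type

// Stand-in for myFunction(): allocates a chunk internally and reports
// its length, mimicking the library contract from the question.
int produceChunk(myData** chunk) {
    *chunk = (myData*)std::malloc(3 * sizeof(myData));
    (*chunk)[0] = 1; (*chunk)[1] = 2; (*chunk)[2] = 3;
    return 3;
}

// Grows one block with realloc() on every iteration and appends each
// chunk. Freeing the library-allocated chunk each round keeps the
// growing block near the top of the heap (the point made above).
myData* collectAll(int n, int* out_total) {
    myData* all = NULL;
    int total = 0;
    for (int i = 0; i < n; ++i) {
        myData* chunk = NULL;
        int got = produceChunk(&chunk);
        myData* grown =
            (myData*)std::realloc(all, (total + got) * sizeof(myData));
        if (!grown) { std::free(chunk); std::free(all); return NULL; }
        all = grown;
        std::memcpy(all + total, chunk, got * sizeof(myData));
        total += got;
        std::free(chunk);
    }
    *out_total = total;
    return all;
}
```

The resulting contiguous block can be passed straight to otherFunction(all, total, ...).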
Just copy all of the data into an std::vector. You can call otherFunction on a vector v with
otherFunction(&v[0], v.size(), ...)
or
otherFunction(v.data(), v.size(), ...)
As for your efficiency requirement: it looks to me like you're optimizing prematurely. First try this option, then measure how fast it is, and only look for other solutions if it's really too slow.
If you know that you are going to call the function N times, and the returned arrays are always M long, then why not just allocate one array of size M*N initially? Or, if you don't know one of M or N, set a worst-case maximum. Or are M and N both dependent on user input?
Then, change how you call your user-input-getting function, such that the array pointer you pass it is actually an offset into that large array, so that it stores the data in the right location. Then, next iteration, offset further, and call again.
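A sketch of that layout, with assumed values for M and N, and a stand-in for the data-producing function rewritten to take a destination pointer as suggested:

```cpp
#include <cassert>
#include <cstddef>

const std::size_t M = 4;   // assumed fixed per-call result length
const std::size_t N = 3;   // assumed number of calls

// Illustrative stand-in for the real data-producing function, changed
// to write into caller-provided storage instead of allocating its own.
void fillChunk(int* dest, int seed) {
    for (std::size_t i = 0; i < M; ++i) dest[i] = seed + (int)i;
}

// One big M*N array up front; each call writes at its own offset,
// so no realloc and no copying are ever needed.
void collectPreallocated(int* big /* size M*N */) {
    for (std::size_t call = 0; call < N; ++call)
        fillChunk(big + call * M, (int)call * 100);
}
```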
I think the best solution would be to write your own 1-D array class with the methods you need. Depending on how you write the class, you can get exactly the behavior you want.

C++ string memory management

Last week I wrote a few lines of code in C# to fire up a large text file (300,000 lines) into a Dictionary. It took ten minutes to write and it executed in less than a second.
Now I'm converting that piece of code into C++ (because I need it in an old C++ COM object). I've spent two days on it this far. :-( Although the productivity difference is shocking on its own, it's the performance that I would need some advice on.
It takes seven seconds to load, and even worse: it takes just exactly that much time to free all the CStringWs afterwards. This is not acceptable, and I must find a way to increase the performance.
Is there any chance that I can allocate this many strings without seeing this horrible performance degradation?
My guess right now is that I'll have to stuff all the text into a large array and then let my hash table point to the beginning of each string within this array and drop the CStringW stuff.
But before that, any advice from you C++ experts out there?
EDIT: My answer to myself is given below. I realized that that is the fastest route for me, and also step in what I consider the right direction - towards more managed code.
This sounds very much like the Raymond Chen vs. Rico Mariani C++ vs. C# Chinese/English dictionary performance bake-off. It took Raymond several iterations to beat C#.
Perhaps there are ideas there that would help.
http://blogs.msdn.com/ricom/archive/2005/05/10/performance-quiz-6-chinese-english-dictionary-reader.aspx
You are stepping into the shoes of Raymond Chen. He did the exact same thing, writing a Chinese dictionary in unmanaged C++. Rico Mariani did too, writing it in C#. Mr. Mariani made one version. Mr. Chen wrote 6 versions, trying to match the perf of Mariani's version. He pretty much rewrote significant chunks of the C/C++ runtime library to get there.
Managed code got a lot more respect after that. The GC allocator is impossible to beat. Check this blog post for the links. This blog post might interest you too, instructive to see how the STL value semantics are part of the problem.
Yikes, get rid of the CStrings...
Try a profiler as well.
Are you sure you weren't just running debug code?
Use std::string instead.
EDIT:
I just did a simple test of ctor and dtor comparisons.
CStringW seems to take between 2 and 3 times as long to do a new/delete.
I iterated 1,000,000 times doing new/delete for each type, with nothing else in the loop, and a GetTickCount() call before and after each loop. I consistently get twice as long for CStringW.
That doesn't address your entire issue though I suspect.
EDIT:
I also don't think that using string or CStringW is the real problem; there is something else going on that is causing your issue.
(but for god's sake, use stl anyway!)
You need to profile it. That is a disaster.
If it is a read-only dictionary, then the following should work for you:
Use fseek/ftell to find the size of the text file.
Allocate a chunk of memory of that size + 1 to hold it.
fread the entire text file into your memory chunk.
Iterate through the chunk:
push_back into a vector<const char *> the starting address of each line;
search for the line terminator using strchr;
when you find it, deposit a NUL, which turns the line into a string;
the next character is the start of the next line;
repeat until you do not find a line terminator.
Insert a final NUL character.
You can now use the vector to get the pointers that let you access the corresponding values. When you are finished with your dictionary, deallocate the memory and let the vector die when it goes out of scope.
[EDIT]
This can be a little more complicated on the DOS/Windows platform, as the line terminator is CRLF. In that case, use strstr to find it, and increment by 2 to find the start of the next line.
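The recipe above could look something like this ('\n' line endings assumed; the CRLF variant follows the same shape). The function name is illustrative:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Slurp a file into one malloc'd buffer, NUL-terminate each line in
// place, and record a pointer to the start of each line. The caller
// owns the returned buffer and must free() it; the pointers in 'lines'
// are only valid while it lives.
char* LoadLines(const char* path, std::vector<const char*>& lines) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return NULL;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    char* buf = (char*)std::malloc(size + 1);
    std::fread(buf, 1, size, f);
    std::fclose(f);
    buf[size] = '\0';                      // the final NUL
    for (char* p = buf; *p; ) {
        lines.push_back(p);                // start of this line
        char* nl = std::strchr(p, '\n');
        if (!nl) break;                    // no more line terminators
        *nl = '\0';                        // turn the line into a string
        p = nl + 1;                        // start of the next line
    }
    return buf;
}
```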
What sort of container are you storing your strings in? If it's a std::vector of CStringW and you haven't reserve()-ed enough memory beforehand, you're bound to take a hit. A vector typically resizes once it reaches its limit, and then copies the entirety out to a new memory location, which can give you a big hit. Since the vector grows exponentially (i.e. if the initial size is 1, it allocates 2 next time, then 4, and so on), the hit becomes less and less frequent as it grows.
It also helps to know how long the individual strings are. (At times. :)
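A sketch of the reserve() suggestion. The count is a parameter here; in the question it would be the known 300,000 lines (or any good estimate), which removes the repeated grow-and-copy cycles:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reserving up front means one allocation instead of roughly log2(n)
// reallocation-and-copy cycles as the vector grows to n elements.
std::vector<std::string> LoadWithReserve(std::size_t expected) {
    std::vector<std::string> v;
    v.reserve(expected);
    for (std::size_t i = 0; i < expected; ++i)
        v.push_back("line");   // stand-in for real lines read from the file
    return v;
}
```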
Thanks all of you for your insightful comments. Upvotes for you! :-)
I must admit I wasn't prepared for this at all - that C# would beat the living crap out of good old C++ in this way. Please don't read that as an offence to C++, but rather as a testament to the amazingly good memory manager that sits inside the .NET Framework.
I decided to take a step back and fight this battle in the InterOp arena instead! That is, I'll keep my C# code and let my old C++ code talk to the C# code over a COM interface.
A lot of questions were asked about my code and I'll try to answer some of them:
The compiler was Visual Studio 2008 and no, I wasn't running a debug build.
The file was read with an UTF8 file reader which I downloaded from a Microsoft employee who published it on their site. It returned CStringW's and about 30% of the time was actually spent there just reading the file.
The container I stored the strings in was just a fixed size vector of pointers to CStringW's and it was never resized.
EDIT: I'm convinced that the suggestions I was given would indeed work, and that I probably could beat the C# code if I invested enough time in it. On the other hand, doing so would provide no customer value at all and the only reason to pull through with it would be just to prove that it could be done...
The problem is not in CString, but rather that you are allocating a lot of small objects; the default memory allocator isn't optimized for this.
Write your own allocator: allocate a big chunk of memory, and then just advance a pointer within it when allocating. This is actually what the .NET allocator does. When you are done, delete the whole buffer.
I think there was a sample of writing custom new/delete operators in (More) Effective C++.
Load the string to a single buffer, parse the text to replace line breaks with string terminators ('\0'), and use pointers into that buffer to add to the set.
Alternatively, e.g. if you have to do an ANSI/UNICODE conversion during load, use a chunk allocator that sacrifices the ability to delete individual elements.
class ChunkAlloc
{
    std::vector<BYTE> m_data;
    size_t m_fill;
public:
    ChunkAlloc(size_t chunkSize) : m_data(chunkSize), m_fill(0) {}
    void * Alloc(size_t size)
    {
        if (m_data.size() - m_fill < size)
        {
            // normally, you'd reserve a new chunk here
            return 0;
        }
        void * result = &(m_data[m_fill]);
        m_fill += size;
        return result;
    }
};
// all allocations from a chunk are freed when the chunk is destroyed.
Wouldn't hack that together in ten minutes, but 30 minutes and some testing sounds fine :)
When working with string classes, you should always watch out for unnecessary operations: for example, don't use constructors, concatenation and similar operations too often, and especially avoid them in loops. I suppose there's some character-encoding reason you use CStringW, so you probably can't use something different; that would be another way to optimize your code.
It's no wonder that the CLR's memory management is better than the bunch of old and dirty tricks MFC is based on: it is at least two times younger than MFC itself, and it is pool-based. When I had to work on a similar project with string arrays and WinAPI/MFC, I just used std::basic_string instantiated with WinAPI's TCHAR and my own allocator based on Loki::SmallObjAllocator. You can also take a look at boost::pool in this case (if you want an "std feel" or have to use a version of the VC++ compiler older than 7.1).