C++ string memory management - c++

Last week I wrote a few lines of code in C# to fire up a large text file (300,000 lines) into a Dictionary. It took ten minutes to write and it executed in less than a second.
Now I'm converting that piece of code into C++ (because I need it in an old C++ COM object). I've spent two days on it this far. :-( Although the productivity difference is shocking on its own, it's the performance that I would need some advice on.
It takes seven seconds to load, and even worse: it takes just exactly that much time to free all the CStringWs afterwards. This is not acceptable, and I must find a way to increase the performance.
Are there any chance that I can allocate this many strings without seeing this horrible performace degradation?
My guess right now is that I'll have to stuff all the text into a large array and then let my hash table point to the beginning of each string within this array and drop the CStringW stuff.
But before that, any advice from you C++ experts out there?
EDIT: My answer to myself is given below. I realized that that is the fastest route for me, and also step in what I consider the right direction - towards more managed code.

This sounds very much like the Raymond Chen vs Rico Mariani's C++ vs C# Chinese/English dictionary performance bake off. It took Raymond several iterations to beat C#.
Perhaps there are ideas there that would help.
http://blogs.msdn.com/ricom/archive/2005/05/10/performance-quiz-6-chinese-english-dictionary-reader.aspx

You are stepping into the shoes of Raymond Chen. He did the exact same thing, writing a Chinese dictionary in unmanaged C++. Rico Mariani did too, writing it in C#. Mr. Mariani made one version. Mr. Chen wrote 6 versions, trying to match the perf of Mariani's version. He pretty much rewrote significant chunks of the C/C++ runtime library to get there.
Managed code got a lot more respect after that. The GC allocator is impossible to beat. Check this blog post for the links. This blog post might interest you too, instructive to see how the STL value semantics are part of the problem.

Yikes. get rid of the CStrings...
try a profiler as well.
are you sure you were not just running debug code?
use std::string instead.
EDIT:
I just did a simple test of ctor and dtor comparisons.
CStringW seems to take between 2 and 3 times the time to do a new/delete.
iterated 1000000 times doing new/delete for each type. Nothing else - and a GetTickCount() call before and after each loop. Consistently get twice as long for CStringW.
That doesn't address your entire issue though I suspect.
EDIT:
I also don't think that using string or CStringW is the real the problem - there is something else going on that is causing your issue.
(but for god's sake, use stl anyway!)
You need to profile it. That is a disaster.

If it is a read-only dictionary then the following should work for you.
Use fseek/ftell functionality, to find the size of the text file.
Allocate a chunk of memory of that size + 1 to hold it.
fread the entire text file, into your memory chunk.
Iterate though the chunk.
push_back into a vector<const char *> the starting address of each line.
search for the line terminator using strchr.
when you find it, deposit a NUL, which turns it into a string.
the next character is the start of the next line
until you do not find a line terminator.
Insert a final NUL character.
You can now use the vector, to get the pointer, that will let you
access the corresponding value.
When you are finished with your dictionary, deallocate the memory, let the vector
die when going out of scope.
[EDIT]
This can be a little more complicated on the dos platform, as the line terminator is CRLF.
In that case, use strstr to find it, and increment by 2 to find the start of the next line.

What sort of a container are you storing your strings in? If it's a std::vector of CStringW and if you haven't reserve-ed enough memory beforehand, you're bound to take a hit. A vector typically resizes once it reaches it's limit (which is not very high) and then copies out the entirety to the new memory location which is can give you a big hit. As your vector grows exponentially (i.e. if initial size is 1, next time it allocates 2, 4 next time onwards, the hit becomes less and less frequent).
It also helps to know how long the individual strings are. (At times :)

Thanks all of you for your insightful comments. Upvotes for you! :-)
I must admit I wasn't prepared for this at all - that C# would beat the living crap out of good old C++ in this way. Please don't read that as an offence to C++, but instead what an amazingly good memory manager that sits inside the .NET Framework.
I decided to take a step back and fight this battle in the InterOp arena instead! That is, I'll keep my C# code and let my old C++ code talk to the C# code over a COM interface.
A lot of questions were asked about my code and I'll try to answer some of them:
The compiler was Visual Studio 2008 and no, I wasn't running a debug build.
The file was read with an UTF8 file reader which I downloaded from a Microsoft employee who published it on their site. It returned CStringW's and about 30% of the time was actually spent there just reading the file.
The container I stored the strings in was just a fixed size vector of pointers to CStringW's and it was never resized.
EDIT: I'm convinced that the suggestions I was given would indeed work, and that I probably could beat the C# code if I invested enough time in it. On the other hand, doing so would provide no customer value at all and the only reason to pull through with it would be just to prove that it could be done...

The problem is not in the CString, but rather that you are allocating a lot of small objects - the default memory allocator isn't optimized for this.
Write your own allocator - allocate a big chunk of memory and then just advance a pointer in it when allocating. This what actually the .NET allocator does. When you are ready delete the whole buffer.
I think there was sample of writing custom new/delete operators in (More) Effective C++

Load the string to a single buffer, parse the text to replace line breaks with string terminators ('\0'), and use pointers into that buffer to add to the set.
Alternatively - e.g. if you have to do an ANSI/UNICODE conversion during load - use a chunk allocator, that sacrifices deleting individual elements.
class ChunkAlloc
{
std::vector<BYTE> m_data;
size_t m_fill;
public:
ChunkAlloc(size_t chunkSize) : m_data(size), m_fill(0) {}
void * Alloc(size_t size)
{
if (m_data.size() - m_fill < size)
{
// normally, you'd reserve a new chunk here
return 0;
}
void * result = &(m_data[m_fill]);
m_fill += size;
return m_fill;
}
}
// all allocations from chuunk are freed when chung is destroyed.
Wouldn't hack that together in ten minutes, but 30 minutes and some testing sounds fine :)

When working with string classes, you should always have a look at unnecessary operations, for example, don't use constructors, concatenation and such operations too often, especially avoid them in loops. I suppose there's some character coding reason you use CStringW, so you probably can't use something different, this would be another way to optimize your code.

It's no wonder that CLR's memory management is better than the bunch of old and dirty tricks MFC is based on: it is at least two times younger than MFC itself, and it is pool-based. When I had to work on similar project with string arrays and WinAPI/MFC, I just used std::basic_string instantiated with WinAPI's TCHAR and my own allocator based on Loki::SmallObjAllocator. You can also take a look at boost::pool in this case (if you want it to have an "std feel" or have to use a version of VC++ compiler older than 7.1).

Related

Reading text into string vs. directly initialize string with text

I want to store in an array of strings many lines of text(the text is always the same). I can think of 2 ways to so:
One way:
string s[100]={"first_line","second_line",...,"100th_line"};
The other way would be
string s[100];
fstream fin("text.txt");
for (int i = 0; i < 100; i++)
fin.getline(s[i]);
text.txt:
first_line
second_line
...
100th_line
The actual number of lines will be around 500 and the length of each line will be 50-60 characters long.
So my question is: which way is faster/better?
L.E.: How can I put the text from the first method in another file and still be able to use the string s in my source.cpp? I want to do so because I don't want my source.cpp to get messy from all those lines of initialization.
Here some latency number every programmer should know:
memory read from cache: 0.5-7 nanoseconds
memory read from main memory: 100 nanoseconds
SSD disk access: 150 000 nanoseconds (reach location to read)
Hard disk access : 10 000 000 nanoseconds (reach location to read)
So what's the fastest for you ?
The first version will always be faster: the text is loaded together with your executable (no access overhead), and the string objects are constructed in memory (see assembly code online).
The second version will require several disk accesses (at least to open current directory, and to access the file), a couple of operating system actions (e.g. access control), not to forget the buffering of input in memory. Only then would the string objects be created in memory as in first version.
Fortunately, user don't notice nanoseconds and will probably not realize the difference: the human eye requires 13 ms to identify an image and the reaction time from eye to mouse is around 215 ms (215 000 000 nanoseconds)
So, my advice: no premature optimization. Focus on functionality (easy customization of content) and maintenability (e.g. easy localization if software used in several languages) before going too deep on performance.
In the grand scheme of things, with only 500 relatively short strings, which approach is better is mostly an academic question, with little practical difference.
But, if one wants to be picky, reading it from a file requires a little bit more work at runtime than just immediately initializing the string array. Also, you have to prepare for the possibility that the initialization file is missing, and handle that possibility, in some way.
Compiling-in the initial string values as part of the code avoids the need to do some error handling, and saves a little bit of time. The biggest win will be lack of a need to handle the possibility that the initialization file is missing. There's a direct relationship between the likelyhood that something might go wrong, and the actual number of things that could potentially go wrong.
I'd go with the first one, since its constructing the string directly inside the array, which is practically emplace (Or perhaps its moving them, if so I might be wrong), without any more operations, so its probably much better than reading from hard-disk and then doing the same procedure as the first method.
If the data does not change then hard code it into a source file.
If you ever need to change the data, for testing or maintenance, the data should be placed into a file.
Don't worry about execution speed until somebody complains. Concentrate your efforts on robust and easily readable code. Most of the time spent with applications is in maintenance. If you make your code easy to maintain, you will spend less time maintaining it.

Nearest possible solution to replace arrays with something that could check bounds c++

I have ~15'000 lines C/C++ program and somewhere in it - simple array is used out of boundaries(its my guess), cause of 'undefined behaviour' happens(well, no, my cdrom isn't opening randomly) but heap memory is modified from somewhere in code ! some defined integers memory just goes into inaccesible, random exceptions of memory where it shouldnt happen, and if i remove or change anything exceptions occurs elsewhere, strange and scarry...
So, i need to replace 100 arrays with anything what would check boundaries properly. And that anything would require minimal modifications.
Can i maybe create a class which mimics arrays behaviour but checks for boundaries, so i could change all arrays easly ? or which solution in this case would you offer ? I am kinda new in c++, any examples is gold for me.
I am using Windows 7,
by saying simple array i mean:
int data[400];
data[20] = 4; // its fine
data[-13] = 9; // opens cdrom, or formats hard drive, or works till your windows gets updated
a simple, safe c++ array that checks bounds... have you heard of std::array?
http://en.cppreference.com/w/cpp/container/array/at
Unexpected modification of heap memory could be due to array boundary violations.
It could also be due to dereferencing dangling pointers. In my experience, that's even more likely.
Across 15000 lines of code, the fastest way you could solve this mystery would be to invest the time to figure out how to use a tool like valgrind.
Since you say you're kinda new in C++, you should note that the standard library's template containers (like the bounds-check-able std::array), range-based for loops, and smart pointers go far to prevent both the issues I mention.
Simplest thing you could try is to link with debugging malloc library. No code changes, and it will likely to catch overruns in heap allocated arrays. If no catches then likely it is something bad on stack
On linux most used is electric fence, helped me a lot
There are ports of this library to Windows,
http://sourceforge.net/projects/duma/
https://code.google.com/p/electric-fence-win32/

Variable Length Array Performance Implications (C/C++)

I'm writing a fairly straightforward function that sends an array over to a file descriptor. However, in order to send the data, I need to append a one byte header.
Here is a simplified version of what I'm doing and it seems to work:
void SendData(uint8_t* buffer, size_t length) {
uint8_t buffer_to_send[length + 1];
buffer_to_send[0] = MY_SPECIAL_BYTE;
memcpy(buffer_to_send + 1, buffer, length);
// more code to send the buffer_to_send goes here...
}
Like I said, the code seems to work fine, however, I've recently gotten into the habit of using the Google C++ style guide since my current project has no set style guide for it (I'm actually the only software engineer on my project and I wanted to use something that's used in industry). I ran Google's cpplint.py and it caught the line where I am creating buffer_to_send and threw some comment about not using variable length arrays. Specifically, here's what Google's C++ style guide has to say about variable length arrays...
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Variable-Length_Arrays_and_alloca__
Based on their comments, it appears I may have found the root cause of seemingly random crashes in my code (which occur very infrequently, but are nonetheless annoying). However, I'm a bit torn as to how to fix it.
Here are my proposed solutions:
Make buffer_to_send essentially a fixed length array of a constant length. The problem that I can think of here is that I have to make the buffer as big as the theoretically largest buffer I'd want to send. In the average case, the buffers are much smaller, and I'd be wasting about 0.5KB doing so each time the function is called. Note that the program must run on an embedded system, and while I'm not necessarily counting each byte, I'd like to use as little memory as possible.
Use new and delete or malloc/free to dynamically allocate the buffer. The issue here is that the function is called frequently and there would be some overhead in terms of constantly asking the OS for memory and then releasing it.
Use two successive calls to write() in order to pass the data to the file descriptor. That is, the first write would pass only the one byte, and the next would send the rest of the buffer. While seemingly straightforward, I would need to research the code a bit more (note that I got this code handed down from a previous engineer who has since left the company I work for) in order to guarantee that the two successive writes occur atomically. Also, if this requires locking, then it essentially becomes more complex and has more performance impact than case #2.
Note that I cannot make the buffer_to_send a member variable or scope it outside the function since there are (potentially) multiple calls to the function at any given time from various threads.
Please let me know your opinion and what my preferred approach should be. Thanks for your time.
You can fold the two successive calls to write() in your option 3 into a single call using writev().
http://pubs.opengroup.org/onlinepubs/009696799/functions/writev.html
I would choose option 1. If you know the maximum length of your data, then allocate that much space (plus one byte) on the stack using a fixed size array. This is no worse than the variable length array you have shown because you must always have enough space left on the stack otherwise you simply won't be able to handle your maximum length (at worst, your code would randomly crash on larger buffer sizes). At the time this function is called, nothing else will be using the further space on your stack so it will be safe to allocate a fixed size array.

Is using strings this way inefficient?

I am a newbee to c++ and am running into problems with my teacher using strings in my code. Though it is clear to me that I have to stop doing that in her class, I am curious as to why it is wrong. In this program the five strings I assigned were going to be reused no less than 4 to 5 times, therefore I put the text into strings. I was told to stop doing it as it is inefficient. Why? In c++ are textual strings supposed to be typed out as opposed to being stored into strings, and if so why? Below is some of the program, please tell me why it is bad.
string Bry = "berries";
string Veg = "vegetables";
string Flr = "flowers";
string AllStr;
float Tmp1, Precip;
int Tmp, FlrW, VegW, BryW, x, Selct;
bool Cont = true;
AllStr = Flr + ", " + Bry + ", " + "and " + Veg;
Answering whether using strings is inefficient is really something that very much depends on how you're using them.
First off, I would argue that you should be using C++ strings as a default - only going to raw C strings if you actually measure and find C++ strings to be too slow. The advantages (primarily for security) are just too great - it's all too easy to screw up buffer management with raw C strings. So I would disagree with your teacher that this is overly inefficient.
That said, it's important to understand the performance implications of using C++ strings. Since they are always dynamically allocated, you may end up spending a lot of time copying and reallocating buffers. This is usually not a problem; usually there are other things which take up much more time. However, if you're doing this right in the middle of a loop that's critical to your program's performance, you may need to find another method.
In short, premature optimization is usually a bad idea. Write code that is obviously correct, even if it takes ever-so-slightly longer to run. But be aware of the costs and trade-offs you're making at the same time; that way, if it turns out that C++ strings are actually slowing your program down a lot, you'll know what to change to fix that.
Yes, it's fairly inefficient, for following reasons:
When you construct a std::string object, it has to allocate a storage space for the string content (which may or may not be a separate dynamic memory allocation, depending on whether small-string optimization is in effect) and copy the literal string that is parameter of the constructor. For example, when you say: string Bry = "berries" it allocates a separate memory block (potentially from the dynamic memory), then copies "berries" to that block.
So you potentially have an extra dynamic memory allocation (costing time),
have to perform the copy (costing more time),
and end-up with 2 copies of the same string (costing space).
Using std::string::operator+ produces a new string that is the result of concatenation. So when you write several + operators in a row, you have several temporary concatenation results and a lot of unnecessary copying.
For your example, I recommend:
Using string literals unless you actually need the functionality only available in std::string.
Using std::stringstream to concatenate several strings together.
Normally, code readability is preferred over micro-optimizations of this sort, but luckily you can have both performance and readability in this case.
Your teacher is both right and wrong. S/he's right that building up strings from substrings at runtime is less CPU-efficient than simply providing the fully pre-built string in the code to start with -- but s/he's wrong in thinking that efficiency is necessarily an important factor to worry about in this case.
In a lot of cases, efficiency simply doesn't matter. At all. For example, if your code above is only going to be executed rarely (e.g. no more than once per second), then it's going to be literally impossible to measure any difference between the "most efficient version" and your not-so-efficient version. Given that, it's quite justifiable to decide that other factors (such as code readability and maintainability) are more important than maximizing efficiency.
Of course, if your program is going to be reconstructing these strings thousands or millions of times per second, then making sure your code is maximally efficient, even at the expense of readability/maintainability, is a good tradeoff to make. But I doubt that is the case here.
Your approach is almost perfect - try and declare everything only once. But if it is not used more than once - dont wast you fingers typing it :-) ie a 10 line program
The only change I would suggest is to make the strings const to help the compiler optimize you program.
If you instructor still disagrees - get a new instructor.
it is inefficient. doing that last line right would be 4-5 times faster.
at the very least you should use +=
+= means that you would avoid creating new strings with the + operator.
The instructor knows that when you do a string = string + string C++ creates a new string that is immediately destroyed.
Efficiency is probably not is good argument to not use string in school assignments but yes, if I am a teacher and the topic is not about some very high level applications, I don't want my students using string.
The real reason is string hides the low level memory management. A student coming out of college should have the basic memory management skill. Though nowadays in working environment, programmers don't deal with the memory management in most of the time but there are always situations where you need to understand what's happening under the hood to be able to reason the problem you are encountering.
With the context given, it looks like you should just be able to declare AllString as a const or string literal without all the substrings and addition. Assuming there's more to it, declaring them as literal string objects allocates memory at runtime. (And, not that there is any practical impact here, but you should be aware that stl container objects sometimes allocate a default minimum of space that is larger than the number of things initially in it, as part of its optimizations in anticipation of later modifying operations. I'm not sure if std::string does so on an declare/assign or not.) If you are only ever going to use them as literals, declaring them as a const char* or in a #define is easier on both memory and runtime performance, and you can still use them as r-values in string operations. If you are using them other ways in code you are not showing us, then its up to whether they need to ever be changed or manipulated as to whether they need to be strings or not.
If you are trying to learn coding, inefficiencies that don't matter in practice are still things you should be aware of and avoid if unnecessary. In production code, there are sometimes reasons to do something like this, but if it is not for any good reason, it's just sloppy. She's right to point it out, but what she should be doing is using that as a starting point for a conversation about the various tradeoffs involved - memory, speed, readability, maintainability, etc. If she's a teacher, she should be looking for "teaching moments" like this rather than just an opportunity to scold.
you can use string.append() ;
its better than + or +=

NULL terminated string and its length

I have a legacy code that receives some proprietary, parses it and creates a bunch of static char arrays (embedded in class representing the message), to represent NULL strings. Afterwards pointers to the string are passed all around and finally serialized to some buffer.
Profiling shows that str*() methods take a lot of time.
Therefore I would like to use memcpy() whether it's possible. To achive it I need a way to associate length with pointer to NULL terminating string. I though about:
Using std::string looks less efficient, since it requires memory allocation and thread synchronization.
I can use std::pair<pointer to string, length>. But in this case I need to maintain length "manually".
What do you think?
use std::string
Profiling shows that str*() methods
take a lot of time
Sure they do ... operating on any array takes a lot of time.
Therefore I would like to use memcpy()
whether it's possible. To achive it I
need a way to associate length with
pointer to NULL terminating string. I
though about:
memcpy is not really any slower than strcpy. In fact if you perform a strlen to identify how much you are going to memcpy then strcpy is almost certainly faster.
Using std::string looks less
efficient, since it requires memory
allocation and thread synchronization
It may look less efficient but there are a lot of better minds than yours or mine that have worked on it
I can use std::pair. But in this case I need to
maintain length "manually".
thats one way to save yourself time on the length calculation. Obviously you need to maintain the length manually. This is how windows BSTRs work, effectively (though the length is stored immediately prior, in memory, to the actual string data). std::string. for example, already does this ...
What do you think?
I think your question is asked terribly. There is no real question asked which makes answering next to impossible. I advise you actually ask specific questions in the future.
Use std::string. It's an advice already given, but let me explain why:
One, it uses a custom memory allocation scheme. Your char* strings are probably malloc'ed. That means they are worst-case aligned, which really isn't needed for a char[]. std::string doesn't suffer from needless alignment. Furthermore, common implementatios of std::string use the "Small String Optimization" which eliminates a heap allocation altogether, and improves locality of reference. The string size will be on the same cache line as the char[] itself.
Two, it keeps the string length, which is indeed a speed optimization. Most str* functions are slower because they don't have this information up front.
A second option would be a rope class, e.g. from SGI. This be more efficient by eliminating some string copies.
Your post doesn't explain where the str*() function calls are coming from; passing around char * certainly doesn't invoke them. Identify the sites that actually do the string manipulation and then try to find out if they're doing so inefficiently. One common pitfall is that strcat first needs to scan the destination string for the terminating 0 character. If you call strcat several times in a row, you can end up with a O(N^2) algorithm, so be careful about this.
Replacing strcpy by memcpy doesn't make any significant difference; strcpy doesn't do an extra pass to find the length of the string, it's simply (conceptually!) a character-by-character copy that stops when it encounters the terminating 0. This is not much more expensive than memcpy, and always cheaper than strlen followed by memcpy.
The way to gain performance on string operations is to avoid copies where possible; don't worry about making the copying faster, instead try to copy less! And this holds for all string (and array) implementations, whether it be char *, std::string, std::vector<char>, or some custom string / array class.
What do I think? I think that you should do what everyone else obsessed with pre-optimization does. You should find the most obscure, unmaintainable, yet intuitively (to you anyway) high-performance way you can and do it that way. Sounds like you're onto something with your pair<char*,len> with malloc/memcpy idea there.
Whatever you do, do NOT use pre-existing, optimized wheels that make maintenence easier. Being maintainable is simply the least important thing imaginable when you're obsessed with intuitively measured performance gains. Further, as you well know, you're quite a bit smarter than those who wrote your compiler and its standard library implementation. So much so that you'd be seriously silly to trust their judgment on anything; you should really consider rewriting the entire thing yourself because it would perform better.
And ... the very LAST thing you'll want to do is use a profiler to test your intuition. That would be too scientific and methodical, and we all know that science is a bunch of bunk that's never gotten us anything; we also know that personal intuition and revelation is never, ever wrong. Why waste the time measuring with an objective tool when you've already intuitively grasped the situation's seemingliness?
Keep in mind that I'm being 100% honest in my opinion here. I don't have a sarcastic bone in my body.