Surprising performance degradation with std::vector and std::string - c++

I am processing a really large text file in the following way:
class Loader{
    template<class READER>
    bool loadFile(READER &reader){
        /* for each line of the input file */ {
            processLine_(line);
        }
    }

    bool processLine_(std::string_view line){
        std::vector<std::string> set; // <-- here
        std::string buffer;           // <-- here
        // I can not do set.reserve(),
        // because I have no idea how many items I will put in.

        // do something...
    }

    void printResult(){
        // print aggregated result
    }
};
Processing 143,000,000 records takes around 68 minutes.
So I tried some very tricky optimizations with several std::array buffers. The result was about 62 minutes.
However, the code became very unreadable, so I decided not to use it in production.
Then I tried a partial optimization, e.g.:
class Loader{
    template<class READER>
    bool loadFile(READER &reader);

    std::vector<std::string> set; // <-- here
    std::string buffer;           // <-- here

    bool processLine_(std::string_view line){
        set.clear();
        // do something...
    }

    void printResult();
};
I was hoping this would reduce the malloc / free (new[] / delete[]) operations for buffer and for the set vector. I realize the strings inside the set vector still allocate memory dynamically.
However, the result went up to 83 minutes.
Note that I did not change anything except moving set and buffer to class level. I use them only inside the processLine_ method.
Why is that?
Locality of reference?
The only explanation I can think of is that some strings are small enough to fit in the SSO buffer, but that sounds unlikely.
I am using clang with -O3.

I did profile the code, and I found that most of the time is spent in a third-party C library.
I assumed that library would be very fast, but that was not the case.
I am still puzzled by the slowdown, but even if I optimize it, it won't make such a big difference.

Related

std::map very slow if contents were deleted

I've discovered a strange behaviour and I don't know why it is the way it is. Look at this code here:
std::map<size_t, std::vector<Resource*>> bla;

for (int i = 0; i < 100000; i++) {
    std::vector<Resource*> blup;
    blup.push_back(new Resource());
    bla[i] = blup;
}

for (auto& resources : bla) {
    for (auto resource : resources.second) {
        delete resource; // <---- this delete here
    }
    resources.second.clear();
}

bla.clear(); // <---- this clear here
If I run this program under the Eclipse debugger, the clear() on the last line takes several seconds (too long in my opinion). For bigger maps (>10M elements) it needs up to several minutes(!).
But if I comment out the delete statement in the inner loop, the clear() call becomes very fast (as fast as I expect it to be).
Without the debugger, the code is fast in both cases (with and without delete).
The class "Resource" is a small container class containing two uint64_t's and a std::string (and of course some methods).
Why is clear() slow? And why does it speed up if I don't delete the pointers? I don't understand it, because I even clear the vectors in the map. So the map should not see any pointers; it is a map between uint64_t and std::vector.
I am using MinGw_64 for Windows (msys64, msys2-x86_64-20210419), and Eclipse.
I have now switched to a map-only design without vectors, and everything is fine with it: clear() is super fast, even if I delete the pointers. But without the vectors I have to rethink my algorithm a little.

most efficient way to convert char vector to string

I have these large pcap files of market tick data. On average they are 20 GB each. The files are divided into packets, packets into a header and messages, messages into a header and fields, and fields into a field code and field value.
I am reading the file a character at a time. I have a file reader class that reads the characters and passes them by const ref to 4 callback functions: on_packet_delimiter, on_header_char, on_message_delimiter, on_message_char. The message object uses a similar function to construct its fields.
Up to this point I've noticed little loss of efficiency compared to just reading the chars and not doing anything with them.
The part of my code where I process the message header and extract the instrument symbol of the message slows down the process considerably.
void message::add_char(const char& c)
{
    if (!message_header_complete) {
        if (is_first_char) {
            is_first_char = false;
            if (is_lower_case(c)) {
                first_prefix = c;
            } else {
                symbol_vector.push_back(c);
            }
        } else if (is_field_delimiter(c)) {
            on_message_header_complete();
            on_field_delimiter(c);
        } else {
            symbol_vector.push_back(c);
        }
    } else {
        // header complete, collect field information
        if (is_field_delimiter(c)) {
            on_field_delimiter(c);
        } else {
            fp->add_char(c);
        }
    }
}
...
void message::on_message_header_complete()
{
    message_header_complete = true;
    symbol.assign(symbol_vector.begin(), symbol_vector.end());
}
...
Until the header is complete I feed the chars into symbol_vector; then, in on_message_header_complete(), I convert it to a string using the vector's iterators. Is this the most efficient way to do this?
In addition to The Quantum Physicist's answer: std::string behaves quite similarly to vector. Even the reserve function is available in the string class, if you intend to use it for efficiency.
Adding the characters is just as easy as it can get:
std::string s;
char c = 's';
s += c;
You could add the characters directly to your member, and you'd be fine. But if you want to keep your member clean until the whole string is collected, you should still use a std::string object instead of the vector: add the characters to a temporary string and, upon completion, swap the contents. No character copying, just a pointer exchange (plus some additional data such as capacity and size...).
How about:
std::string myStr(myVec.begin(), myVec.end());
Although this works, I don't understand why you need to use vectors in the first place. Just use std::string from the beginning, and use myStr.append() to add characters or strings.
Here's an example:
std::string myStr = "abcd";
myStr.append(1,'e');
myStr.append(std::string("fghi"));
//now myStr is "abcdefghi"

C/C++ (Other Languages Too?) Conditional Early Return Good Code Practice

Recently, I was reviewing some code I maintain and I noticed a practice different than what I am used to. As a result, I'm wondering which method to use when performing an early return in a function.
Here are some examples:
Version 1:
int MyFunction(int* ptr)
{
    if (!ptr) {    // oh no, NULL pointer!
        return -1; // what was the caller doing? :(
    }
    // other code goes here to do work on the pointer
    // ...
    return 0; // we did it!
}
Version 2:
int MyFunction(int* ptr)
{
    if (!ptr) {    // oh no, NULL pointer!
        return -1; // what was the caller doing? :(
    } else {       // explicitly show that this only gets called when the if statement fails
        // other code goes here to do work on the pointer
        // ...
        return 0; // hooray!
    }
}
As a result, I'm wondering which is considered the "best practice" for those of you who have endured (and survived) many code reviews. I know each effectively does the same thing, but does the "else" add much in terms of readability and clarity? Thanks for the help.
The else only adds clarity if the else clause is short, a few lines of code at most. And if you have several initial conditions you want to check, the source gets cluttered very quickly.
The only time I would use an else is in a small function with a short else branch, meaning fewer than about 10 source lines, and with no other initial checks to make.
In some cases I have used a single loop so that a series of initial checks can use a break to leave.
do {
...
} while (0);
I am loath to use a goto, which is practically guaranteed to get at least one true believer in goto-less programming up in arms.
So much would depend on any code standards of your organization. I tend to like minimalism so I use the first version you provide without the else.
I might also do something like the following in a smaller function, say less than 20 or 30 lines:
int MyFunction(int* ptr)
{
    int iRetStatus = -1;  // we have an error condition
    if (ptr) {            // good pointer
        // stuff to do in this function
        iRetStatus = 0;
    }
    return iRetStatus;    // we did it!
}
The only problem with returns in the body of a function is that people scanning the function sometimes do not realize there is a return. In small functions, where everything can pretty much be seen on a single screen, the chance of missing a return is small. In large functions, however, returns in the middle can be missed, especially in large, complex functions that have gone through several maintenance cycles and accumulated a lot of cruft and workarounds.

Is Boost Pool free efficiency O(n) or O(1)

Recently I discovered the Boost.Pool library and started adapting it to my code. One thing the library mentions it is missing is a base class that overrides the new/delete operators for any class and uses the pool for memory management. I wrote my own implementation, and with some template metaprogramming it actually came out looking very decent (it supports any class with a size between 1 and 1024 bytes, simply by deriving from the base class).
I mention those things because so far this was really cool and exciting, and then I found this post from the Boost mailing list. It appears some people really hammer the Pool library and in particular point out the inefficiency of the free() method, which they say runs in O(n) time. I stepped through the code and found this implementation of that method:
void free(void * const chunk)
{
    nextof(chunk) = first;
    first = chunk;
}
To me this looks like O(1), and I really don't see the inefficiency they are talking about. One thing I did notice is that if you use multiple instances of singleton_pool (i.e. different tags and/or allocation sizes), they all share the same mutex (a critical section, to be more precise), and this could be optimized a bit. But if you were using regular heap operations, they'd use the same form of synchronization.
So does anyone else consider Pool library to be inefficient and obsolete?
That free sure does look constant time to me. Perhaps the author of the post was referring to ordered_free, which has this implementation:
void ordered_free(void * const chunk)
{
    // This (slower) implementation of 'free' places the memory
    // back in the list in its proper order.

    // Find where "chunk" goes in the free list
    void * const loc = find_prev(chunk);

    // Place either at beginning or in middle/end
    if (loc == 0)
        (free)(chunk);
    else
    {
        nextof(chunk) = nextof(loc);
        nextof(loc) = chunk;
    }
}
where find_prev, which walks the free list linearly (making ordered_free O(n) in the number of free chunks), is as follows:
template <typename SizeType>
void * simple_segregated_storage<SizeType>::find_prev(void * const ptr)
{
    // Handle border case
    if (first == 0 || std::greater<void *>()(first, ptr))
        return 0;

    void * iter = first;
    while (true)
    {
        // if we're about to hit the end or
        // if we've found where "ptr" goes
        if (nextof(iter) == 0 || std::greater<void *>()(nextof(iter), ptr))
            return iter;

        iter = nextof(iter);
    }
}

Is throwing an exception a healthy way to exit?

I have a setup that looks like this.
class Checker
{
    // member data
    Results m_results; // see below

public:
    bool Check();

private:
    bool Check1();
    bool Check2();
    // .. so on
};
Checker is a class that performs lengthy check computations for engineering analysis. Each type of check has a resultant double that the checker stores. (see below)
bool Checker::Check()
{
    // initialisations etc.
    Check1();
    Check2();
    // ... so on
}
A typical Check function would look like this:
bool Checker::Check1()
{
    double result;
    // lots of code
    m_results.SetCheck1Result(result);
}
And the results class looks something like this:
class Results
{
    double m_check1Result;
    double m_check2Result;
    // ...

public:
    void SetCheck1Result(double d);

    double GetOverallResult()
    { return max(m_check1Result, m_check2Result, ...); }
};
Note: all code is oversimplified.
The Checker and Results classes were initially written to perform all checks and return an overall double result. There is now a new requirement: I only need to know whether any of the results exceeds 1. If one does, subsequent checks need not be carried out (it's an optimisation). To achieve this, I could either:
Modify every CheckN function to check the result and return early; the parent Check function would keep checking m_results. OR
In the Results::SetCheckNResults(), throw an exception if the value exceeds 1 and catch it at the end of Checker::Check().
The first is tedious, error prone and sub-optimal because every CheckN function further branches out into sub-checks etc.
The second is non-intrusive and quick. One disadvantage I can think of is that the Checker code may not necessarily be exception-safe (although no other exception is thrown anywhere else). Is there anything else obvious that I'm overlooking? What about the cost of throwing exceptions and stack unwinding?
Is there a better 3rd option?
I don't think this is a good idea. Exceptions should be limited to, well, exceptional situations. Yours is a question of normal control flow.
It seems you could very well move all the redundant code dealing with the result out of the checks and into the calling function. The resulting code would be cleaner and probably much easier to understand than non-exceptional exceptions.
Change your CheckX() functions to return the double they produce and leave dealing with the result to the caller. The caller can more easily do this in a way that doesn't involve redundancy.
If you want to be really fancy, put those functions into an array of function pointers and iterate over that. Then the code for dealing with the results would all be in a loop. Something like:
bool Checker::Check()
{
    for (std::size_t idx = 0; idx < sizeof(check_tbl)/sizeof(check_tbl[0]); ++idx) {
        double result = check_tbl[idx]();
        if (result > 1)
            return false; // or whichever way your logic is (an enum might be better)
    }
    return true;
}
Edit: I had overlooked that you also need to call one of the N SetCheckResultX() functions, which my sample code does not incorporate. So either you can shoehorn this into an array too (change them to SetCheckResult(std::size_t idx, double result)), or you would have to keep two function pointers in each table entry:
struct check_tbl_entry {
    check_fnc_t checker;
    set_result_fnc_t setter;
};

check_tbl_entry check_tbl[] = { { &Checker::Check1, &Checker::SetCheck1Result }
                              , { &Checker::Check2, &Checker::SetCheck2Result }
                              // ...
                              };

bool Checker::Check()
{
    for (std::size_t idx = 0; idx < sizeof(check_tbl)/sizeof(check_tbl[0]); ++idx) {
        double result = check_tbl[idx].checker();
        check_tbl[idx].setter(result);
        if (result > 1)
            return false; // or whichever way your logic is (an enum might be better)
    }
    return true;
}
(And, no, I'm not going to attempt to write down the correct syntax for a member function pointer's type. I've always had to look it up and still never got it right the first time... But I know it's doable.)
Exceptions are meant for cases that shouldn't happen during normal operation. They're hardly non-intrusive; their very nature involves unwinding the call stack, calling destructors all over the place, yanking the control to a whole other section of code, etc. That stuff can be expensive, depending on how much of it you end up doing.
Even if it were free, though, using exceptions as a normal flow control mechanism is a bad idea for one other, very big reason: exceptions aren't meant to be used that way, so people don't use them that way, so they'll be looking at your code and scratching their heads trying to figure out why you're throwing what looks to them like an error. Head-scratching usually means you're doing something more "clever" than you should be.