unordered_set Visual Studio C++ Slow

So I think there must be a bad version of the C++ standard library where unordered_set is incredibly slow. This piece of code took one minute to execute:
#include <iostream>
#include <unordered_set>
using namespace std;

int main() {
    unordered_set<int> blah;
    for (int i = 0; i < 1000000; ++i)
    {
        blah.insert(i);
    }
    cout << "done 2" << endl;
    return 0;
}
It took about 40 seconds to get to the output statement, then took another 20 seconds to deallocate the object. Its C# counterpart, with 10 times the insertions, executes in about a second:
static void Main(string[] args)
{
    HashSet<int> set = new HashSet<int>();
    for (int i = 0; i < 10000000; ++i)
    {
        set.Add(i);
    }
}
This caused my Facebook Hacker Cup solution to not run in time :(. How can this be fixed for someone using the Visual Studio C++ IDE? I don't know which compiler version it invokes from the command line or how to upgrade it.

Run without debugging
Need to compile in Release mode
You may also need to turn on optimizations under Project Properties -> Configuration Properties -> C/C++ -> Optimization -> Maximize Speed (/O2). Mine was already set that way for Release, but not for Debug.
This makes the original code run in about a second, and the 10,000,000 variant run in about 8 seconds (so still roughly 4 times slower than the C# version running in Debug mode with the debugger attached). EDIT: It seems C#'s speed does not change much with or without debugging, or when building in Release mode, for this particular program.
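As a quick sanity check that the binary you are actually timing is the optimized one, here is a minimal sketch that prints which CRT flavor and iterator-debug level it was built with (it relies on the standard MSVC macros _DEBUG and _ITERATOR_DEBUG_LEVEL; the exact values you see depend on your project settings):
#include <iostream>

int main() {
#ifdef _DEBUG
    std::cout << "Debug CRT (/MTd or /MDd)\n";
#else
    std::cout << "Release CRT (/MT or /MD)\n";
#endif
#ifdef _ITERATOR_DEBUG_LEVEL
    // 2 = full iterator debugging (typical Debug), 0 = none (typical Release)
    std::cout << "_ITERATOR_DEBUG_LEVEL = " << _ITERATOR_DEBUG_LEVEL << "\n";
#endif
    return 0;
}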

Related

Severe & Bizarre performance issue

Update
OK, so I removed the 3 couts and replaced them with a single *buffer = 'a', and there was a big performance difference: removing that line made the program twice as fast. If you go on godbolt and compile it using MSVC, that single line of code changes most of the program (it adds a whole lot more complexity).
The following might seem extremely weird, but it's true on my computer:
Alright, so I was doing some benchmarking of some code, and I noticed extremely weird performance anomalies that were 100% consistent. I'm running Windows 10 and Visual Studio 2019. Basically, deleting a line of code that is never called completely changes the performance of the program.
Here is exactly what to do:
Create new VS-2019 Console C++ App project
Set the configuration to Release & x64
Paste the code below:
#include <iostream>
#include <chrono>
#include <cstdlib>

class Test {
public:
    size_t length;
    size_t doublingVal;
    char* buffer;

    Test() : length(0), doublingVal(2) {
        buffer = static_cast<char*>(malloc(1));
    }

    ~Test() {
        std::cout << "called" << "\n";
        std::cout << "called" << "\n";
        std::cout << "called" << "\n"; // Remove this line and the time decreases DRASTICALLY
    }

    void append() {
        if (doublingVal == length) {
            doublingVal <<= 1;
        }
        *buffer = 'a';
        ++length;
    }
};

int main()
{
    Test test;
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < static_cast<size_t>(1024) * 1024 * 1024 * 4; ++i) {
        test.append();
    }
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start).count() << "\n";
}
Run the program using CTRL+F5, not in debug. Now remember how long it takes to run. (a few seconds)
Then, in the destructor of Test, remove the third line which has the comment.
Run the program again, and you should see that the performance increases drastically. I tested this exact same code in 4 different brand-new projects and on 3 different computers.
The destructor is called at the very end, when the entire program is finished measuring time. The extra cout shouldn't affect anything.
Edit:
You can also see a similar thing happen if you remove the 3 couts and replace them with a single *buffer = 'a'. Then CTRL+F5 once again, record the time, then remove that line we just added. Run it again and the time magically decreases by half.
WTF is going on, and how do you solve the weird performance difference?

C/C++ elapsed process cycles, not including at breakpoints

See the following code, which is my attempt to print the time elapsed between loops.
#include <stdio.h>
#include <time.h>

int main()
{
    while (true)
    {
        static clock_t timer;
        clock_t t = clock();
        clock_t elapsed = t - timer;
        float elapsed_sec = (float)elapsed / (float)CLOCKS_PER_SEC;
        timer = t;
        printf("[dt:%d ms].\n", (int)(elapsed_sec * 1000));
    }
}
However, if I set a breakpoint and sit there for 10 seconds, when I continue execution the elapsed time includes those 10 seconds, and I don't want it to, for my intended usage.
I assume clock() is the wrong function, but what is the correct one?
Note that if there IS no single standard C or C++ call for this, well, how do you compute it? Is there a POSIX way?
I suspect that this is actually information only knowable through platform-specific calls. If that is the case, I'd like to at least know how to do it on Windows (MSVC).
Yes, trying to measure the CPU time of your process would be dependent on support from your operating system. Rather than look up what support is available from various operating systems, though, I would propose that your approach is flawed.
Debugging typically uses a debug build that has most optimizations turned off (to make it easier to do things like set breakpoints accurately). Timings on a non-optimized build lack practical value. Hence any timings of your program should usually be ignored when you are using breakpoints, even if the breakpoint is outside the timed section.
To combine using breakpoints with timings, I would do the debugging in two phases. In one phase, you set breakpoints and look at what is happening in the debug build. In the other phase, you use identical input (redirect a file into std::cin if it helps) and time the process in the release build. There may be some back-and-forth between the stages as you work out what is going on. That's fine; the point is not to have exactly two phases, but to keep breakpoints and timings separate.
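If you do still want the per-process CPU time on Windows that the question asks about, a minimal sketch using the Win32 GetProcessTimes call is below; CPU time does not advance while the process is suspended at a breakpoint. This is offered only as an illustration, not as the recommended workflow above, and the processCpuSeconds helper is a name made up for the sketch.
#include <windows.h>
#include <cstdio>

// Illustrative helper: total kernel + user CPU time consumed by this process so far.
static double processCpuSeconds()
{
    FILETIME creationTime, exitTime, kernelTime, userTime;
    if (!GetProcessTimes(GetCurrentProcess(), &creationTime, &exitTime, &kernelTime, &userTime))
        return 0.0;
    ULARGE_INTEGER k, u;
    k.LowPart = kernelTime.dwLowDateTime;  k.HighPart = kernelTime.dwHighDateTime;
    u.LowPart = userTime.dwLowDateTime;    u.HighPart = userTime.dwHighDateTime;
    return (double)(k.QuadPart + u.QuadPart) / 1e7;  // FILETIME is in 100-ns units
}

int main()
{
    double last = processCpuSeconds();
    while (true)
    {
        // something here to be timed
        double now = processCpuSeconds();
        printf("[dt:%d ms].\n", (int)((now - last) * 1000.0));
        last = now;
    }
}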
Although JaMit gives a better answer (here), it is possible, but it depends entirely on your compiler, and the overhead this creates will probably slow your program down too much to get an accurate result. You can use whatever time-recording function you please, but either way you would have to:
Record the start of the loop
Record the start of the breakpoint
Programmatically cause a breakpoint
Record the end of the breakpoint
Record the end of the loop.
If you're looking for speed though, you really need to be testing in an optimized release mode without breakpoints and writing the output to the console or a file. Nonetheless, it is possible to do what you're trying to do, and here's a sample solution.
#include <chrono>
#include <intrin.h> // include for the Visual Studio __debugbreak intrinsic
#include <iostream>

int main(void) {
    for (int c = 0; c < 100; c++) {
        // Start of loop
        auto start = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
        /* Do stuff here */
        auto startBreak = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
        // Visual Studio only; run under a debugger, otherwise the unhandled
        // breakpoint exception will terminate the program
        __debugbreak();
        auto endBreak = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
        /* Do more stuff here */
        auto end = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now().time_since_epoch()).count();
        /* Time for one pass of the loop (minus the breakpoint), including the overhead of the time calls themselves */
        std::cout << end - start - (endBreak - startBreak) << "\n";
    }
    return 0;
}
Given your objective ("my attempt to print the time elapsed between loops"), in the C language:
Note that clock() returns the number of clock ticks since the program started.
This code measures the elapsed time for each loop by showing the start/end time for each loop:
#include <stdio.h>
#include <time.h>

int main( void )
{
    clock_t loopStart = clock();
    clock_t loopEnd;
    for( int i = 0; i < 10000; i++ )
    {
        // something here to be timed
        loopEnd = clock();
        printf("[%lf : %lf sec].\n", (double)loopStart/CLOCKS_PER_SEC, (double)loopEnd/CLOCKS_PER_SEC );
        loopStart = loopEnd;
    }
}
Of course, if you want to display the actual number of clock ticks per loop, remove the division by CLOCKS_PER_SEC, calculate the difference, and display only that difference.
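A minimal sketch of that clock-tick variant (the loop body is a placeholder; it compiles as C or C++):
#include <stdio.h>
#include <time.h>

int main( void )
{
    clock_t loopStart = clock();
    for( int i = 0; i < 10000; i++ )
    {
        /* something here to be timed */
        clock_t loopEnd = clock();
        printf("[dt: %ld ticks].\n", (long)(loopEnd - loopStart));
        loopStart = loopEnd;
    }
    return 0;
}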

execution time not showing in vs c++ 2010

In Visual C++ 2010 I'm timing the execution as follows:
unsigned int start = clock();
// run some stuff
int time_ = clock() - start;
cout << "this took " << time_ << " msecs.";
cin.get();
return 0;
In debug mode the time is reflective of the complexity, but in release mode, while it feels a bit faster than debug, it shows 0 for the time no matter what. Is there a setting that I'm not aware of? Thanks.
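For what it's worth, a finer-grained timer that is available in Visual C++ 2010 is QueryPerformanceCounter; here is a minimal sketch (the "run some stuff" comment is a placeholder, and it is only an assumption that timer resolution is why the release build prints 0):
#include <windows.h>
#include <iostream>
using namespace std;

int main()
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);   // counts per second
    QueryPerformanceCounter(&start);
    // run some stuff
    QueryPerformanceCounter(&stop);
    double msecs = (double)(stop.QuadPart - start.QuadPart) * 1000.0 / (double)freq.QuadPart;
    cout << "this took " << msecs << " msecs.";
    cin.get();
    return 0;
}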

destroying a protocol buffer message in debug mode is almost 500 times slower than in release mode

Use the following code with your own timing code around the call to delete msg in main(). When running in debug mode, it is taking 473 times as long, on average, as when running without debugging. Does anyone know why this is happening? If so, is there a way that I can get this code to run much faster in debug mode?
Note: I am using Visual Studio 2008 SP 1 on a Windows 7 machine.
// This file is generated by using the Google Protocol Buffers compiler
// to compile a PropMsg.proto file (contents of that file are listed below)
#include "PropMsg.pb.h"
#include <vector>

// Declaration assumed here so the snippet compiles; the original post omits it.
struct RawSerializer {
    static void serialize(int i_val, PropMsg * o_msg);
};

void RawSerializer::serialize(int i_val, PropMsg * o_msg)
{
    o_msg->set_v_int32(i_val);
}

void serialize(std::vector<int> const & i_val, PropMsg * o_msg)
{
    for (std::vector<int>::const_iterator it = i_val.begin(); it != i_val.end(); ++it) {
        PropMsg * objMsg = o_msg->add_v_var_repeated();
        RawSerializer::serialize(*it, objMsg);
    }
}

int main()
{
    std::vector<int> testVec(100000);
    PropMsg * msg = new PropMsg;
    serialize(testVec, msg);
    delete msg; // Time this guy
}
PropMsg was created with the following .proto file definition:
option optimize_for = SPEED;
message PropMsg
{
    optional int32 v_int32 = 7;
    repeated PropMsg v_var_repeated = 101;
}
Here's some sample test output that I got:
datatype: class std::vector<int,class std::allocator<int> >
num runs: 10
num items: 100000
deserializing from PropMsg time: 0.0046
serializing to PropMsg time: 0.0426
reading from disk time: 0.7195
writing to disk time: 0.0298
deallocating PropMsg time: 8.99
Notice how this is NOT IO-bound.
STL containers in VS Debug builds are notoriously slow. Game programming forums are rife with complaints about this, and people often choose alternative implementations. However, from what I've read, you can get a performance boost up front by disabling iterator debugging/checking:
#define _HAS_ITERATOR_DEBUGGING 0
#define _SECURE_SCL 0
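For these to take effect they have to be seen before any standard library header, and consistently in every translation unit; a minimal sketch of the placement (setting them as project-wide preprocessor definitions works too, and is an assumption about your project layout rather than anything in the original post):
// Define before including any standard headers (or add _HAS_ITERATOR_DEBUGGING=0
// and _SECURE_SCL=0 under Project Properties -> Configuration Properties ->
// C/C++ -> Preprocessor -> Preprocessor Definitions)
#define _HAS_ITERATOR_DEBUGGING 0
#define _SECURE_SCL 0

#include <vector>

int main()
{
    std::vector<int> v(100000);  // debug iterator checks are disabled for this container
    return 0;
}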
Other things that can affect debug performance are excessive calls to new and delete. Memory pools can help with that. You haven't provided details for PropMsg::add_v_var_repeated() or PropMsg::~PropMsg(), so I can't comment. But I assume there's a vector or other STL container inside that class?

Why does my STL code run so slowly when I have the debugger/IDE attached?

I'm running the following code, built with Visual Studio 2008 SP1, on a quad-core Windows Vista Business x64 machine with 8 GB of RAM.
If I build a release build, and run it from the command line, it reports 31ms. If I then start it from the IDE, using F5, it reports 23353ms.
Here are the times: (all Win32 builds)
DEBUG, command line: 421ms
DEBUG, from the IDE: 24,570ms
RELEASE, command line: 31ms
RELEASE, from IDE: 23,353ms
code:
#include <windows.h>
#include <iostream>
#include <set>
#include <algorithm>
#include <iterator>
using namespace std;

int runIntersectionTestAlgo()
{
    set<int> set1;
    set<int> set2;
    set<int> intersection;

    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.insert(value);
    }

    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() % 200000 + 1;
        random *= 10;
        int value = 1000000000 + random;
        set2.insert(value);
    }

    set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
    return intersection.size();
}

int main(){
    DWORD start = GetTickCount();
    runIntersectionTestAlgo();
    DWORD span = GetTickCount() - start;
    std::cout << span << " milliseconds\n";
}
Running under a Microsoft debugger (windbg, kd, cdb, Visual Studio Debugger) by default forces Windows to use the debug heap instead of the default heap. On Windows 2000 and above, the default heap is the Low Fragmentation Heap, which is insanely good compared to the debug heap. You can query the kind of heap you are using with HeapQueryInformation.
To solve your particular problem, you can use one of the many options recommended in this KB article: Why the low fragmentation heap (LFH) mechanism may be disabled on some computers that are running Windows Server 2003, Windows XP, or Windows 2000
For Visual Studio, I prefer adding _NO_DEBUG_HEAP=1 to Project Properties->Configuration Properties->Debugging->Environment. That always does the trick for me.
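A minimal sketch of checking which heap you actually got, using the HeapQueryInformation call mentioned above (a compatibility value of 2 means the low-fragmentation heap is active):
#include <windows.h>
#include <iostream>

int main()
{
    ULONG heapInfo = 0;
    if (HeapQueryInformation(GetProcessHeap(), HeapCompatibilityInformation,
                             &heapInfo, sizeof(heapInfo), NULL))
    {
        // 0 = standard heap, 1 = look-aside lists, 2 = low-fragmentation heap (LFH)
        std::cout << "Heap compatibility value: " << heapInfo << "\n";
    }
    return 0;
}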
Pressing pause while in the VS IDE shows that the additional time appears to be spent in malloc/free. This would lead me to believe that MS's malloc and free implementations have additional debugging logic when the debugger is attached, which would explain the discrepancy in times from the console and from the debugger.
EDIT: Confirmed by running with CTRL+F5 v. F5 (1047ms v. 9088ms on my machine)
So it sounds like this may just be what happens when one attaches the debugger. However, I just can't get my head around the performance changing from 30 ms to 23,000 ms because of that, especially when the rest of my code seems to run just as fast whether or not the debugger is attached.