Drastic performance difference across C++ compilers - c++

Whenever I use the standard library containers stack, queue, deque, or priority_queue, performance in Visual Studio becomes inexplicably slow. The same program that runs under GCC (Qt Creator) in a few seconds takes over a minute in Visual Studio.
Here is a simple program that uses BFS to check whether a number can be transformed into a target number. The allowed transformations are x -> x+1 and x -> x/2.
Code:
#include <queue>
#include <stack>
#include <vector>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <chrono>
using namespace std;
using namespace std::chrono;

const int M = 10000000;
int vis[M + 1];

bool can(int s, int t) {
    memset(vis, 0, sizeof(vis));
    queue<int> q;  // BFS frontier; the stack timing below swaps this
                   // for stack<int> (and q.front() for q.top())
    q.push(s);
    int m = 0;     // tracks the maximum frontier size
    vis[s] = true;
    while (!q.empty()) {
        m = max(m, (int)q.size());
        int cur = q.front();
        q.pop();
        if (cur == t) return true;
        if (cur + 1 <= M && !vis[cur + 1]) {
            q.push(cur + 1);
            vis[cur + 1] = true;
        }
        if (!vis[cur / 2]) {
            q.push(cur / 2);
            vis[cur / 2] = true;
        }
    }
    return false;
}

int main() {
    vector<int> S {8769154, 9843630, 2222222, 1, 3456789};
    vector<int> T {94383481, 1010101, 9999999, 9876543, 1};
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    for (size_t i = 0; i < S.size(); i++) {
        cout << can(S[i], T[i]) << endl;
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
    cout << "Execution time " << duration / 1000000.0 << " second";
    return 0;
}
Output:
Visual Studio : Execution time 77.3784 second
Qt Creator : Execution time 4.79727 second
Output of the same program using stack instead of queue:
Visual Studio : Execution time 114.896 second
Qt Creator : Execution time 4.35225 second
So Qt Creator runs roughly 20-30 times faster than Visual Studio in this case. I have no idea why this happens. The performance difference is negligible when I don't use these STL containers.

As noted in the comments, Visual Studio is slow in debug mode. That is partly because optimizations are off, and partly because the Standard Library implementation in VC++ performs many checks against iterator misuse.
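If you do need to time a debug build, the checked iterators can be switched off. Below is a minimal sketch, assuming MSVC's _ITERATOR_DEBUG_LEVEL macro; the usual answer, though, is simply to benchmark a Release (/O2) build instead:

// Must be defined before any standard header is included, and
// consistently across the whole binary; setting it in the project's
// preprocessor definitions is safer than in source.
#define _ITERATOR_DEBUG_LEVEL 0
#include <iostream>
#include <queue>

int main() {
    std::queue<int> q;
    for (int i = 0; i < 1000000; ++i) q.push(i);
    while (!q.empty()) q.pop();
    std::cout << "done\n";
    return 0;
}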


C++17 programming on Visual Studio 2017?

As the title says, I want to use C++17 features such as the parallel STL. In Visual Studio 2017 I set the language standard to C++17 under the project properties, but even after doing this I get an error on #include <execution> that there is no such file.
I am just starting with a simple example using the C++17 algorithms. How do I resolve this?
Source:
#include <stddef.h>
#include <stdio.h>
#include <algorithm>
#include <execution>
#include <chrono>
#include <random>
#include <ratio>
#include <vector>

using std::chrono::duration;
using std::chrono::duration_cast;
using std::chrono::high_resolution_clock;
using std::milli;
using std::random_device;
using std::sort;
using std::vector;

const size_t testSize = 1'000'000;
const int iterationCount = 5;

void print_results(const char *const tag, const vector<double>& sorted,
                   high_resolution_clock::time_point startTime,
                   high_resolution_clock::time_point endTime) {
    printf("%s: Lowest: %g Highest: %g Time: %fms\n", tag, sorted.front(),
           sorted.back(),
           duration_cast<duration<double, milli>>(endTime - startTime).count());
}

int main() {
    random_device rd;
    // generate some random doubles:
    printf("Testing with %zu doubles...\n", testSize);
    vector<double> doubles(testSize);
    for (auto& d : doubles) {
        d = static_cast<double>(rd());
    }
    // time how long it takes to sort them:
    for (int i = 0; i < iterationCount; ++i) {
        vector<double> sorted(doubles);
        const auto startTime = high_resolution_clock::now();
        sort(sorted.begin(), sorted.end());
        const auto endTime = high_resolution_clock::now();
        print_results("Serial", sorted, startTime, endTime);
    }
}
and this is the error:
Error C1083 Cannot open include file: 'execution': No such file or directory
The task I want to achieve is C++17 together with CUDA on the GPU. Both are new to me, although C++ itself is not. I am interested in the C++17 parallel STL with CUDA and want to start from the basics. Any suggestions would help.
Thanks,
Govind
Please check whether the header file exists in your include directories. The C++ header paths are:
1.C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\include
2.C:\Program Files (x86)\Windows Kits\10\Include\10.0.17134.0\ucrt
The first contains standard C++ headers such as iostream. The second contains legacy C headers such as stdio.h.
If you are going to use C++ to develop desktop applications, I recommend you refer to my setup.
I also tested your code on VS2022 without any errors, so I suggest you use a newer version of VS and install the components you need.
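Once <execution> resolves, a minimal sketch of what the parallel version of the sort in the question could look like (assuming a toolset that ships the parallel algorithms, reportedly VS2017 15.7 or later, built with /std:c++17):

#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<double> v(1'000'000, 1.0);
    // Same call as the serial version, plus an execution policy.
    std::sort(std::execution::par, v.begin(), v.end());
    return 0;
}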

Determine whether Eigen has optimized code for SSE instructions or not

I have code that uses Eigen vectors, and I want to confirm whether Eigen has optimized it with SSE instructions or not.
I am using Visual Studio 2012 Express, in which I can set the command-line option /Qvec-report:2, which reports the compiler's auto-vectorization details for C++ code. Is there any option in Visual Studio or Eigen that can tell me whether this code has been vectorized?
My code is as below:
#include <iostream>
#include <vector>
#include <time.h>
#include <Eigen/StdVector>

int main(int argc, char *argv[])
{
    int tempSize = 100;
    /** I am aligning these vectors as specified on
        http://eigen.tuxfamily.org/dox/group__TopicStlContainers.html */
    std::vector<Eigen::Vector3d, Eigen::aligned_allocator<Eigen::Vector3d>> eiVec(tempSize);
    std::vector<Eigen::Vector3d, Eigen::aligned_allocator<Eigen::Vector3d>> eiVec1(tempSize);
    std::vector<Eigen::Vector3d, Eigen::aligned_allocator<Eigen::Vector3d>> eiVec2(tempSize);
    for (int i = 0; i < tempSize; i++)
    {
        eiVec1[i] = Eigen::Vector3d::Zero();
        eiVec2[i] = Eigen::Vector3d::Zero();
    }
    Eigen::Vector3d *eV = &eiVec.front();
    const Eigen::Vector3d *eV1 = &eiVec1.front();
    const Eigen::Vector3d *eV2 = &eiVec2.front();
    /** The loop below is not vectorized by Visual Studio (reason code 1304):
        the operations happen at the level of Eigen, so I want to know
        whether Eigen has optimized this operation or not. */
    for (int i = 0; i < tempSize; i++)
    {
        eV[i] = eV1[i] - eV2[i];
    }
    return 0;
}
Look at the asm output.
If you see SUBPD (packed double) inside the inner loop, it vectorized. If you only see SUBSD (scalar double) and no SUBPD anywhere, it didn't.
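With MSVC you can pass /FA to get an assembler listing to inspect. Independently of the compiler, Eigen can report whether its own SIMD paths were compiled in; a minimal sketch, assuming Eigen::SimdInstructionSetsInUse() from Eigen/Core:

#include <iostream>
#include <Eigen/Core>

int main()
{
#ifdef EIGEN_VECTORIZE
    // Eigen defines EIGEN_VECTORIZE when its SIMD code paths are enabled.
    std::cout << "Eigen SIMD enabled: " << Eigen::SimdInstructionSetsInUse() << std::endl;
#else
    std::cout << "Eigen SIMD disabled" << std::endl;
#endif
    return 0;
}

Note this only says whether Eigen was built with vectorization enabled, not whether the specific loop above was vectorized; for that, the asm listing is still the authoritative check.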

Cross-platform way to prevent high CPU usage in a while loop (without Boost)?

I have a server that I want to run, and it uses a cross-platform library that only gives me a tick() to call:
int main()
{
    Inst inst;
    while (true)
    {
        inst.tick();
    }
}
I need to lower the CPU usage so that it doesn't constantly take up an entire core.
Is there a simple way to do this without Boost?
Thanks
#include <iostream>
#include <thread>
#include <chrono>
using namespace std;

int main()
{
    // sleep for 5 seconds
    auto duration = chrono::duration<float>(5);
    this_thread::sleep_for(duration);
    return 0;
}
However, even if this code is completely fine, I can't seem to compile it with the MinGW compiler that ships with Code::Blocks.
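Applied to the loop in the question, a minimal sketch might look like this (Inst stands in for the library type from the question, and the 1 ms interval is an assumed trade-off between tick latency and CPU usage):

#include <chrono>
#include <thread>

// Placeholder for the library type from the question.
struct Inst { void tick() { /* library work */ } };

int main()
{
    Inst inst;
    while (true)
    {
        inst.tick();
        // Give up the rest of the time slice so the loop does not
        // spin at 100% on one core.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

As for MinGW: std::thread support there traditionally requires a POSIX-threads build of the toolchain, which may be why the snippet fails to compile with the compiler bundled with Code::Blocks.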

Small program much slower in Visual Studio 2012 vs. Visual Studio 2005

We are using Visual Studio 2005 and are looking at upgrading to Visual Studio 2012 once it is released. I tried this small program in the Visual Studio 2012 RC and was surprised to see it run more than 2X slower than it does in Visual Studio 2005. In VS2012 I used the default release build settings. For me it takes about 20 ms in VS2005 and about 50 ms in VS2012. Why is it that much slower?
#include <windows.h>
#include <cstdio>
#include <deque>
#pragma comment(lib, "winmm.lib")  // for timeBeginPeriod/timeGetTime
using namespace std;

deque<int> d;

int main(int argc, char* argv[])
{
    const int COUNT = 5000000;
    timeBeginPeriod(1);  // raise the timer resolution to 1 ms
    for (int i = 0; i < COUNT; ++i)
    {
        d.push_back(i);
    }
    double sum = 0;
    DWORD start = timeGetTime();
    for (int i = 0; i < COUNT; ++i)
    {
        sum += d[i];
    }
    printf("time=%lums\n", (unsigned long)(timeGetTime() - start));
    printf("sum=%f\n", sum);
    return 0;
}
So we reposted this question to the Microsoft forum.
http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/72234b06-7622-445e-b359-88f572b4de52
The short answer is that the implementation of std::deque::operator[] in the VS2012 RC is simply slower than in VS2005. Other common STL containers tested equal or faster. It will be interesting to retest once VS2012 reaches production to see whether the operator[] performance is fixed.
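A minimal sketch for isolating the operator[] cost, assuming the slowdown is in indexed access rather than in iteration generally: sum the deque once through operator[] and once through iterators, and compare the two timings on both compilers.

#include <chrono>
#include <cstdio>
#include <deque>

int main()
{
    const int COUNT = 5000000;
    std::deque<int> d;
    for (int i = 0; i < COUNT; ++i)
        d.push_back(i);

    double sum1 = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < COUNT; ++i)
        sum1 += d[i];                 // indexed access
    const auto t1 = std::chrono::steady_clock::now();

    double sum2 = 0;
    for (auto it = d.begin(); it != d.end(); ++it)
        sum2 += *it;                  // iterator access
    const auto t2 = std::chrono::steady_clock::now();

    const long long msIndexed =
        std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    const long long msIterator =
        std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    std::printf("operator[]: %lldms  iterator: %lldms\n", msIndexed, msIterator);
    std::printf("sums: %f %f\n", sum1, sum2);
    return 0;
}

If only the operator[] loop regresses, switching hot loops to iterators (or range-for) is a plausible workaround until the library is fixed.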
My suspicion is that you're running into thread-safety code and that VS2012 configures your libraries for multi-threaded code by default, meaning there are a bunch of lock and unlock operations built into your deque accesses.
Try comparing the compiler and linker options of the two builds to see how they differ.
(I'd try this myself but I don't have a Windows system with the relevant software on it handy. Sorry.)
Try timing the two loops separately; I bet the issue is that the STL container implementation is slower in the new compiler.
Err, wait: I meant try timing something that doesn't use the STL.

Freezing feelings towards Visual Studio

This code freezes VS2010 sp1:
// STC_14_1.cpp : Defines the entry point for the console application.
//
//#include "stdafx.h"
#include <exception>
#include <iostream>
#include <cstdlib>
#include <new>
using std::cerr;
using std::cout;
using std::cin;

void my_new_handler()
{
    cerr << "Mem. alloc failed.";
    std::exit(-1);
}

//std::unexpected_handler std::set_unexpected(std::unexpected_handler);

// RAII guard that installs a new-handler and restores the old one.
class STC
{
    std::new_handler old;
public:
    STC(std::new_handler n_h) : old(std::set_new_handler(n_h))
    { }
    ~STC()
    {
        // Note: this restores the wrong handler family; it should be
        // std::set_new_handler(old), not std::set_unexpected(old).
        std::set_unexpected(old);
    }
};

int main(int argc, char* argv[])
{
    STC stc(&my_new_handler);
    while (true)
    {
        auto tmp = new int[50000];  // leaks deliberately until new fails
    }
    return 0;
}
Is it that I'm doing something wrong, or is it VS's problem?
Your loop is endless:
while (true)
{
    auto tmp = new int[50000];
}
You have to define a condition that exits the loop. Meanwhile, VS appears frozen as the program iterates and drains memory from the heap (since it allocates a new block of memory in every iteration).
EDIT: Your handler is not called because it has to be declared as void __cdecl:
void __cdecl no_memory()
{
    cout << "Failed to allocate memory!\n";
    exit(1);
}
Since the handler is not called, the problem is the endless loop.
It works on my VS 2010.
When you say 'freezes', are you sure it's not just that the code is still actually running and has not hit the new-handler code yet? I tried running the example set_new_handler code from the MSDN here, and it still took a minute or so, and that example allocates 5000000 at a time rather than 50000.
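To see the handler fire quickly rather than after a minute of allocating, a minimal sketch (the only change of substance is a much larger per-iteration allocation, so the heap is exhausted in far fewer iterations):

#include <cstdlib>
#include <iostream>
#include <new>

void my_new_handler()
{
    std::cerr << "Mem. alloc failed.";
    std::exit(-1);
}

int main()
{
    std::set_new_handler(my_new_handler);
    while (true)
    {
        new int[50000000];  // roughly 190 MB per iteration, leaked deliberately
    }
}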