small program much slower in visual studio 2012 vs. visual studio 2005 - c++

We are using Visual Studio 2005. We are looking at maybe upgrading to Visual Studio 2012 once it is released. I tried this small program in Visual Studio 2012 RC and was surprised to see it ran more than 2X slower than it does in Visual Studio 2005. In VS2012 I used default release build settings. For me it takes about 20ms in VS2005 and about 50ms in VS2012. Why is it that much slower?
#include <windows.h>
#include <cstdio>
#include <deque>
#pragma comment(lib, "winmm.lib")  // timeBeginPeriod / timeGetTime
using namespace std;

deque<int> d;

int main(int argc, char* argv[])
{
    const int COUNT = 5000000;
    timeBeginPeriod(1);
    for (int i = 0; i < COUNT; ++i)
    {
        d.push_back(i);
    }
    double sum = 0;
    DWORD start = timeGetTime();
    for (int i = 0; i < COUNT; ++i)
    {
        sum += d[i];
    }
    printf("time=%ums\n", (unsigned)(timeGetTime() - start));
    printf("sum=%f\n", sum);
    return 0;
}

So we reposted this question to the Microsoft forum.
http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/72234b06-7622-445e-b359-88f572b4de52
The short answer is that the implementation of std::deque::operator[] in the VS2012 RC is simply slower than in VS2005; the other common STL containers we tested were equal or faster. It will be interesting to retest once VS2012 ships to see whether the operator[] performance is fixed.

My suspicion is that you're running into thread-safety code and that 2012 configures your libraries for multi-threaded code by default, meaning there are a bunch of lock and unlock operations built into your deque accesses.
Try comparing the compiler and linker options of the two builds to see how they differ.
(I'd try this myself but I don't have a Windows system with the relevant software on it handy. Sorry.)

Try timing both of those loops separately. I bet the issue is that the STL container implementation is slower in the new compiler. Err, wait: I meant try timing something that doesn't use the STL.

Related

OpenMP pragma is ignored in Visual Studio 2017

I have problems getting Visual Studio 2017 to acknowledge OpenMP pragmas like "#pragma omp parallel". I understand that Visual Studio should support at least OpenMP 2.0, but the behavior of my program indicates that my pragmas are altogether ignored.
In an empty C++ project, I enable the OpenMP setting found under "MyProject>Properties>C/C++>Languages>OpenMP Support", and write the following main function:
#include <iostream>
#include <cstdlib>
#include <omp.h>

int main(int argc, char* argv[])
{
    // I have tried both with and without the following line:
    // omp_set_num_threads(15);
    #pragma omp parallel
    {
        std::cout << "Hello world \n";
    }
    system("pause");
    return 0;
}
I expect one "Hello world" to be printed from each active thread in the parallel region. Only one line is printed, indicating that I may have missed something.
Any suggestions?

Determine whether eigen has optimized code for SSE instructions or not

I have code that uses Eigen::Vector3d, and I want to confirm whether Eigen has optimized this code with SSE or not.
I am using Visual Studio 2012 Express, in which I can set the command-line option "/Qvec-report:2", which reports the compiler's own vectorization of C++ code. Is there any option in Visual Studio or Eigen that can tell me whether the Eigen code has been vectorized?
My code is as below:
#include <iostream>
#include <vector>
#include <time.h>
#include <Eigen/StdVector>

int main(int argc, char *argv[])
{
    int tempSize = 100;
    /** I am aligning these vectors as specified on
        http://eigen.tuxfamily.org/dox/group__TopicStlContainers.html */
    std::vector<Eigen::Vector3d, Eigen::aligned_allocator<Eigen::Vector3d>> eiVec(tempSize);
    std::vector<Eigen::Vector3d, Eigen::aligned_allocator<Eigen::Vector3d>> eiVec1(tempSize);
    std::vector<Eigen::Vector3d, Eigen::aligned_allocator<Eigen::Vector3d>> eiVec2(tempSize);
    for (int i = 0; i < tempSize; i++)
    {
        eiVec1[i] = Eigen::Vector3d::Zero();
        eiVec2[i] = Eigen::Vector3d::Zero();
    }
    Eigen::Vector3d *eV = &eiVec.front();
    const Eigen::Vector3d *eV1 = &eiVec1.front();
    const Eigen::Vector3d *eV2 = &eiVec2.front();
    /** The loop below is not vectorized by Visual Studio (reason code 1304)
        because the operations happen at the level of Eigen; I want to know
        whether Eigen itself has vectorized this operation. */
    for (int i = 0; i < tempSize; i++)
    {
        eV[i] = eV1[i] - eV2[i];
    }
    return 0;
}
Look at the compiler's asm output.
If you see SUBPD (subtract packed double) inside the inner loop, it vectorized. If you only see SUBSD (subtract scalar double) and no SUBPD anywhere, it didn't.

Drastic performance difference across C++ compiler

Whenever I use the standard library containers stack, queue, deque, or priority_queue, performance in Visual Studio becomes inexplicably slow. The same program that runs with the gcc compiler (Qt Creator) within a few seconds takes over a minute in Visual Studio.
Here is a simple program that uses BFS to check if a number can be transformed into a target number. Allowed transformations are x->x+1 and x->x/2.
Code:
#include <queue>
#include <stack>
#include <vector>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <cstdio>
#include <chrono>
using namespace std;
using namespace std::chrono;

const int M = 10000000;
int vis[M + 1];

bool can(int s, int t) {
    memset(vis, 0, sizeof(vis));
    stack<int> q;
    q.push(s);
    int m = 0;
    vis[s] = true;
    while (!q.empty()) {
        m = max(m, (int)q.size());
        int top = q.top();
        q.pop();
        if (top == t) return true;
        if (top + 1 <= M && !vis[top + 1]) {
            q.push(top + 1);
            vis[top + 1] = true;
        }
        if (!vis[top / 2]) {
            q.push(top / 2);
            vis[top / 2] = true;
        }
    }
    return false;
}

int main() {
    vector<int> S {8769154, 9843630, 2222222, 1, 3456789};
    vector<int> T {94383481, 1010101, 9999999, 9876543, 1};
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    for (int i = 0; i < (int)S.size(); i++) {
        cout << can(S[i], T[i]) << endl;
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
    cout << "Execution time " << duration / 1000000.0 << " second";
    return 0;
}
Output:
Visual Studio : Execution time 77.3784 second
Qt Creator : Execution time 4.79727 second
Output of the same program using a stack instead of a queue:
Visual Studio : Execution time 114.896 second
Qt Creator : Execution time 4.35225 second
So the Qt Creator (gcc) build runs almost 20-30 times faster than the Visual Studio build in this case. I have no idea why this happens; the performance difference is very small when I don't use these STL containers.
As noted in the comments, Visual Studio is slow in debug mode: partly because optimizations are off, and partly because the Standard Library implementation in VC++ adds many checks against iterator abuse.

CUDA - Simple adder program always gives zero

I have a problem with a simple CUDA program which just adds two numbers. I run it on a laptop with a GeForce GT 320M GPU on Windows 7, and I compile it with Visual Studio 2013 (I don't know if that means something). The problem is that I always get 0 as a result. I tried to check the given parameters (just returning all the parameters given to the method in an array) and they all seemed to be 0. I ran this program on another computer (at university) and there it runs completely fine and returns the correct result. So I think there must be some settings problem, but I am not sure of it.
#include <cuda.h>
#include <stdio.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

__global__ void add(int a, int b, int* c)
{
    *c = a + b;
}

int main(int argc, char** argv)
{
    int c;
    int* dev_c;
    cudaMalloc((void**)&dev_c, sizeof(int));
    add<<<1, 1>>>(1, 2, dev_c);
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("a + b = %d\n", c);
    cudaFree(dev_c);
    return 0;
}
I also ran this code snippet that I found somewhere.
cudaSetDevice(0);
cudaDeviceSynchronize();
cudaThreadSynchronize();
None of these calls reports an error.
If you are using the typical CUDA template to create a new Visual Studio project, then you have to take care to set the compute capability for which to compile correctly, changing the default values if needed. This can be done by setting, for example,
compute_12,sm_12
in the CUDA C/C++ Configuration Properties. In your case, the default compute capability was 2.0, while your card is of an earlier architecture. This was the source of your miscomputations.
P.S. As of September 2014, CUDA 6.5 is the only version of CUDA supporting Visual Studio 2013, see Is Cuda 6 supported with Visual Studio 2013?.
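More generally, checking the runtime's error codes would have surfaced this immediately: a kernel built for the wrong compute capability fails its launch with an error such as "invalid device function" rather than silently leaving the output untouched. A sketch of the same adder with error checks added (same API calls as the original, just inspected):

```cuda
#include <cstdio>
#include "cuda_runtime.h"

__global__ void add(int a, int b, int* c)
{
    *c = a + b;
}

int main()
{
    int c = -1;
    int* dev_c = nullptr;

    cudaError_t err = cudaMalloc((void**)&dev_c, sizeof(int));
    if (err != cudaSuccess) { std::printf("cudaMalloc: %s\n", cudaGetErrorString(err)); return 1; }

    add<<<1, 1>>>(1, 2, dev_c);
    // cudaGetLastError reports launch failures, e.g. a binary built for
    // the wrong architecture ("invalid device function").
    err = cudaGetLastError();
    if (err != cudaSuccess) { std::printf("launch: %s\n", cudaGetErrorString(err)); return 1; }

    err = cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) { std::printf("memcpy: %s\n", cudaGetErrorString(err)); return 1; }

    std::printf("a + b = %d\n", c);
    cudaFree(dev_c);
    return 0;
}
```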

C++ array Visual Studio 2010 vs Bloodshed Dev-C++ 4.9.9.2

This code compiles fine in Bloodshed Dev-C++ 4.9.9.2, but in Visual Studio 2010 I get an error: expression must have a constant value. How do I size an array from user input without using pointers?
#include <cstdlib>
#include <iostream>
using namespace std;

int main()
{
    int size = 1;
    cout << "Input array size ";
    cin >> size;
    int array1[size];  // variable-length array: a GCC extension, not standard C++
    system("PAUSE");
    return 0;
}
Use an std::vector instead of an array (usually a good idea anyway):
std::vector<int> array1(size);
In case you care, the difference you're seeing isn't from Dev-C++ itself, it's from gcc/g++. What you're using is a non-standard extension to C++ that g++ happens to implement, but VC++ doesn't.
The ability to size automatic arrays with a runtime variable is part of C, not part of C++, and is an extension that GCC seems to want to foist on us all. And Dev-C++ is an unholy piece of cr*p, although for a change it is not at fault here (this is entirely GCC's doing) - I can't imagine why you (or anyone else) would want to use it.
You should really compile your C++ code with GCC with flags that warn you about stuff like this. I suggest -Wall and -pedantic as a minimum.
Or
int* array1 = new int[size];
will work as well, I believe (been a month or 3 since I last touched C++), though that does use a pointer and needs a matching delete[] array1;.
But indeed, if using C++, go for an std::vector, much more flexible.