I am working with an implementation of merge sort in C++ using Visual Studio 2010 (MSVC). When I used an array of 300000 integers for timing, it threw an unhandled stack overflow exception and took me to a read-only file named "chkstk.asm". I reduced the size to 200000 and it worked. The same code also worked in the C-Free 4 editor (MinGW 2.95) without any problem with a size of 400000. Do you have any suggestions for getting the code to work in Visual Studio?
Maybe the recursion in the merge sort is causing the problem.
Problem solved. Thanks to Kotti for supplying the code. I found the problem while comparing my code with that one. It was not about too much recursion after all. I was working with a normal C++ array, which was stored on the stack, so the program ran out of stack space. I changed it to a dynamically allocated array with new/delete and it worked.
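For illustration, a minimal sketch of that change (the array name and size here are illustrative, not the original code):
// Sketch: moving a large local array from the stack to the heap.
const int size = 300000;
// int data[size];            // automatic storage: lives on the stack and can overflow it
int* data = new int[size];    // dynamic storage: lives on the heap
// ... fill, sort and time data ...
delete[] data;                // release the memory when done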
I'm not exactly sure, but this may be a problem specific to your merge sort implementation (something that causes the stack overflow). There are plenty of good implementations (use Google); the following works on VS2008 with an array size of 2000000.
(You could try it in VS2010)
#include <cstdlib>
#include <cstring>

// Mix two sorted tables into one and split the result back into the two tables.
void Mix(int* tab1, int* tab2, int count1, int count2)
{
    int i, i1, i2;
    i = i1 = i2 = 0;
    int* temp = (int*)malloc(sizeof(int) * (count1 + count2));

    while ((i1 < count1) && (i2 < count2))
    {
        while ((i1 < count1) && (*(tab1 + i1) <= *(tab2 + i2)))
        {
            *(temp + i++) = *(tab1 + i1);
            i1++;
        }
        if (i1 < count1)
        {
            while ((i2 < count2) && (*(tab2 + i2) <= *(tab1 + i1)))
            {
                *(temp + i++) = *(tab2 + i2);
                i2++;
            }
        }
    }

    memcpy(temp + i, tab1 + i1, (count1 - i1) * sizeof(int));
    memcpy(tab1, temp, count1 * sizeof(int));
    memcpy(temp + i, tab2 + i2, (count2 - i2) * sizeof(int));
    memcpy(tab2, temp + count1, count2 * sizeof(int));

    free(temp);
}

void MergeSort(int* tab, int count) {
    if (count == 1) return;
    MergeSort(tab, count / 2);
    MergeSort(tab + count / 2, (count + 1) / 2);
    Mix(tab, tab + count / 2, count / 2, (count + 1) / 2);
}

int main() {
    const size_t size = 2000000;
    int* array = (int*)malloc(sizeof(int) * size);
    for (size_t i = 0; i < size; ++i) {
        array[i] = rand() % 5000;
    }
    MergeSort(array, size);
    free(array);
    return 0;
}
My guess is that you've got so much recursion that you're just running out of stack space. You can increase the stack size with the /F compiler option (or the /STACK linker option). But if you keep hitting stack size limits, you probably want to refactor the recursion out of your algorithm.
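For example, here is a minimal sketch of a bottom-up (iterative) merge sort; it uses std::inplace_merge for brevity rather than the Mix() routine above, so treat it as an illustration of removing the recursion, not a drop-in replacement:
#include <algorithm>
#include <cstddef>
#include <vector>

void merge_sort_iterative(std::vector<int>& a) {
    const std::size_t n = a.size();
    // Merge sorted runs of width 1, 2, 4, ... until the whole array is one run.
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo + width < n; lo += 2 * width) {
            std::size_t mid = lo + width;
            std::size_t hi  = std::min(lo + 2 * width, n);
            std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
        }
    }
}
Since loop nesting replaces the call tree, the stack depth stays constant no matter how large the input is.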
_chkstk() stands for "check stack". These stack probes are inserted on Windows by default. They can be disabled with the /Gs- option, or by allowing a reasonably high size such as /Gs1000000. The other way is to disable the function using:
#pragma check_stack(off) // place at the top of the header to cover all the functions
Official documentation.
Reference.
I am new to C++ programming and Stack Overflow, but I have some experience with core Java. I wanted to participate in programming Olympiads, and I chose C++ because C++ code is generally faster than equivalent Java code.
I was solving some problems involving recursion and DP at the zonal level, and I came across a question called Sequence game.
Unfortunately, my code doesn't seem to work. It exits with exit code 3221225477, but I can't make anything of that. I remember Java doing a much better job of pointing out my mistakes, but here in C++ I don't have a clue what's happening. Here's the code:
#include <iostream>
#include <fstream>
#include <cstdio>
#include <algorithm>
#include <vector>
#include <set>

using namespace std;

int N, minimum, maximum;
set<unsigned int> result;
vector<unsigned int> integers;
bool status = true;

void score(unsigned int b, unsigned int step)
{
    if (step < N)
    {
        unsigned int subtracted;
        unsigned int added = b + integers[step];
        bool add_gate = (added <= maximum);
        bool subtract_gate = (b <= integers[step]);
        if (subtract_gate)
            subtracted = b - integers[step];
        subtract_gate = subtract_gate && (subtracted >= minimum);
        if (add_gate && subtract_gate)
        {
            result.insert(added);
            result.insert(subtracted);
            score(added, step++);
            score(subtracted, step++);
        }
        else if (!(add_gate) && !(subtract_gate))
        {
            status = false;
            return;
        }
        else if (add_gate)
        {
            result.insert(added);
            score(added, step++);
        }
        else if (subtract_gate)
        {
            result.insert(subtracted);
            score(subtracted, step++);
        }
    }
    else return;
}

int main()
{
    ios_base::sync_with_stdio(false);

    ifstream input("input.txt");          // attach to input file
    streambuf *cinbuf = cin.rdbuf();      // save old cin buffer
    cin.rdbuf(input.rdbuf());             // redirect cin to input.txt

    ofstream output("output.txt");        // attach to output file
    streambuf *coutbuf = cout.rdbuf();    // save old cout buffer
    cout.rdbuf(output.rdbuf());           // redirect cout to output.txt

    unsigned int b;
    cin >> N >> b >> minimum >> maximum;
    for (unsigned int i = 0; i < N; ++i)
        cin >> integers[i];

    score(b, 0);

    set<unsigned int>::iterator iter = result.begin();
    if (status)
        cout << *iter << endl;
    else
        cout << -1 << endl;

    cin.rdbuf(cinbuf);
    cout.rdbuf(coutbuf);
    return 0;
}
(Note: I intentionally did not use typedef).
I compiled this code with mingw-w64 on a Windows machine, and here is the output:
[Finished in 19.8s with exit code 3221225477] ...
Although I have an Intel i5-8600, it takes a long time to compile; much of the time is spent by the antivirus scanning my exe file, and sometimes it keeps compiling for a long time even without any intervention from the antivirus.
(Note: I did not use the command line; instead I used Sublime Text to compile it.)
I also tried TDM-GCC, and again some other peculiar exit code came up. I even tried to run it on an Ubuntu machine, but unfortunately it couldn't find the output file. When I ran it on the CodeChef online IDE it did not run properly either, but the error message was less scary than MinGW's.
It said that there was a run-time error and "SIGSEGV" was displayed as the error code. CodeChef states that:
A SIGSEGV is an error(signal) caused by an invalid memory reference or
a segmentation fault. You are probably trying to access an array
element out of bounds or trying to use too much memory. Some of the
other causes of a segmentation fault are : Using uninitialized
pointers, dereference of NULL pointers, accessing memory that the
program doesn’t own.
It's been a few days that I have been trying to solve this, and I am really frustrated by now. When I first started solving this problem I used C arrays, then changed to vectors, and finally now to std::set, hoping that it would solve the problem, but nothing worked. I tried another DP problem, and again the same thing happened.
It would be great if someone could help me figure out what's wrong in my code.
Thanks in advance.
3221225477 converted to hex is 0xC0000005, which stands for STATUS_ACCESS_VIOLATION, which means you tried to access (read, write or execute) invalid memory.
I remember Java did a much better job of pointing out my mistakes, but here in c++ I don't have a clue of what's happening.
When your program crashes, you should run it under a debugger. Since you're running your code on Windows, I highly recommend Visual Studio 2017 Community Edition. If you ran your code under it, it would point to the exact line where the crash happens.
As for the crash itself, as PaulMcKenzie points out in the comments, you're indexing an empty vector, which makes std::cin write into out-of-bounds memory.
integers is a vector, i.e. a dynamic contiguous array whose size is not known at compile time here. When it is first defined it is empty, so you need to insert elements into it. Change the following:
for(unsigned int i = 0; i < N; ++i)
cin>>integers[i];
to this:
int j;
for(unsigned int i = 0; i < N; ++i) {
cin>> j;
integers.push_back(j);
}
P.W's answer is correct, but an alternative to using push_back is to pre-allocate the vector after N is known. Then you can read from cin straight into the vector elements as before.
integers = vector<unsigned int>(N);
for (unsigned int i = 0; i < N; i++)
cin >> integers[i];
This method has the added advantage of only allocating memory for the vector once. The push_back method will reallocate if the underlying buffer fills up.
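A third option (just a sketch combining the two answers) is to reserve the capacity up front and still use push_back, which also avoids reallocation:
integers.reserve(N);            // one allocation up front
unsigned int value;
for (unsigned int i = 0; i < N; ++i) {
    cin >> value;
    integers.push_back(value);  // never reallocates because capacity is already N
}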
I've been doing some of the LeetCode problems, and I notice that the C solutions are a couple of times faster than the exact same thing in C++. For example:
Updated with a couple of simpler examples:
Given a sorted array and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order. You may assume no duplicates in the array. (Link to question on LeetCode)
My solution in C runs in 3 ms:
int searchInsert(int A[], int n, int target) {
    int left = 0;
    int right = n;
    int mid = 0;
    while (left < right) {
        mid = (left + right) / 2;
        if (A[mid] < target) {
            left = mid + 1;
        }
        else if (A[mid] > target) {
            right = mid;
        }
        else {
            return mid;
        }
    }
    return left;
}
My C++ solution, exactly the same but as a member function of the Solution class, runs in 13 ms:
class Solution {
public:
    int searchInsert(int A[], int n, int target) {
        int left = 0;
        int right = n;
        int mid = 0;
        while (left < right) {
            mid = (left + right) / 2;
            if (A[mid] < target) {
                left = mid + 1;
            }
            else if (A[mid] > target) {
                right = mid;
            }
            else {
                return mid;
            }
        }
        return left;
    }
};
Even simpler example:
Reverse the digits of an integer. Return 0 if the result will overflow. (Link to question on LeetCode)
The C version runs in 6 ms:
int reverse(int x) {
    long rev = x % 10;
    x /= 10;
    while (x != 0) {
        rev *= 10L;
        rev += x % 10;
        x /= 10;
        if (rev > (-1U >> 1) || rev < (1 << 31)) {
            return 0;
        }
    }
    return rev;
}
The C++ version is exactly the same but as a member function of the Solution class, and runs in 19 ms:
class Solution {
public:
    int reverse(int x) {
        long rev = x % 10;
        x /= 10;
        while (x != 0) {
            rev *= 10L;
            rev += x % 10;
            x /= 10;
            if (rev > (-1U >> 1) || rev < (1 << 31)) {
                return 0;
            }
        }
        return rev;
    }
};
I see how there could be considerable overhead from using a vector of vectors as a 2D array in the original example if the LeetCode testing system doesn't compile the code with optimisation enabled. But the simpler examples above shouldn't suffer from that issue, because the data structures are pretty raw, especially in the second case where all you have is long and int arithmetic. Yet they are still slower by a factor of three.
I'm starting to think that there might be something odd happening with the way LeetCode does its benchmarking in general, because even in the C version of the integer-reversing problem you get a huge bump in running time just from replacing the line
if (rev>(-1U >> 1) || rev < (1 << 31)) {
with
if (rev>INT_MAX || rev < INT_MIN) {
Now, I suppose having to #include <limits.h> might have something to do with that, but it seems a bit extreme that this simple change bumps the execution time from just 6 ms to 19 ms.
Lately I've been seeing the vector<vector<int>> suggestion a lot for doing 2d arrays in C++, and I've been pointing out to people why this really isn't a good idea. It's a handy trick to know when slapping together temporary code, but there's (almost) never any reason to ever use it for real code. The right thing to do is to use a class that wraps a contiguous block of memory.
So my first reaction might be to point to this as a possible source for the disparity. However you're also using int** in the C version, which is generally a sign of the exact same problem as vector<vector<int>>.
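For reference, a minimal sketch of such a wrapper; the name Matrix and the row-major layout are illustrative choices, not a standard class:
#include <cstddef>
#include <vector>

class Matrix {
    std::vector<int> data;   // one contiguous allocation for the whole 2D array
    std::size_t cols;
public:
    Matrix(std::size_t rows, std::size_t columns)
        : data(rows * columns), cols(columns) {}
    int& operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    const int& operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};
Element (r, c) is located with one multiply and one add, and everything sits in a single contiguous block, so there is only one allocation and good cache locality.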
So instead I decided to just compare the two solutions.
http://coliru.stacked-crooked.com/a/fa8441cc5baa0391
6468424
6588511
That's the time taken by the 'C version' vs the 'C++ version' in nanoseconds.
My results don't show anything like the disparity you describe. Then it occurred to me to check a common mistake people make when benchmarking:
http://coliru.stacked-crooked.com/a/e57d791876b9252b
18386695
42400612
Notice that the -O3 flag from the first example has become -O0, which disables optimization.
Conclusion: you're probably comparing unoptimized executables.
C++ supports building rich abstractions that don't require overhead, but eliminating that overhead does require certain code transformations that play havoc with the 'debuggability' of the code.
That means debug builds avoid those transformations and therefore C++ debug builds are often slower than debug builds of C style code because C style code just doesn't use much abstraction. Seeing a 130% slowdown such as the above is not at all surprising when timing, for example, machine code that uses function calls in place of simple store instructions.
Some code really needs optimizations in order to have reasonable performance even for debugging, so compilers often offer a mode that applies some optimizations which don't cause too much trouble for debuggers. Clang and gcc use -O1 for this, and you can see that even this level of optimization essentially eliminates the gap in this program between C style code and the more C++ style code:
http://coliru.stacked-crooked.com/a/13967ebcfcfa4073
8389992
8196935
Update:
In those later examples optimization shouldn't make a difference, since the C++ is not using any abstraction beyond what the C version is doing. I'm guessing that the explanation for this is that the examples are being compiled with different compilers or with some other different compiler options. Without knowing how the compilation is done I would say it makes no sense to compare these runtime numbers; LeetCode is clearly not producing an apples to apples comparison.
You are using a vector of vectors in your C++ code snippet. Vectors are sequence containers in C++ that are like arrays but can change in size. Instead of vector<vector<int>>, using statically allocated arrays would be better. You could also use your own Array class with operator[] overloaded, but a vector has more overhead because it dynamically resizes when you add more elements than its original capacity. In C++ you can also pass arguments by reference to further reduce the running time compared with C. C++ should run just as fast, or faster, if written well.
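As a small, hedged illustration of the pass-by-reference point (not code from the question): passing a container by const reference avoids copying it on every call.
#include <cstddef>
#include <vector>

// The vector is not copied; only a reference is passed.
int sum(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}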
I encountered a problem on CodeChef. I am trying to use a vector for memoization. As I am still new to programming and quite unfamiliar with STL containers, I used a vector for the lookup table (although it was suggested to me that using a map helps to solve the problem).
So my question is: how does the solution given below run into a runtime error? To trigger the error, I used the boundary value for the problem (100000000) as the input. The error message displayed by my NetBeans IDE is RUN FAILED (exit value 1, total time: 4s) with input 1000000000. Here is the code:
#include <iostream>
#include <cstdlib>
#include <vector>
#include <string>

#define LCM 12
#define MAXSIZE 100000000

using namespace std;

/*
 *
 */
vector<unsigned long> lookup(MAXSIZE, 0);

int solve(int n)
{
    if (n < 12) {
        return n;
    }
    else {
        if (n < MAXSIZE) {
            if (lookup[n] != 0) {
                return lookup[n];
            }
        }
        int temp = solve(n/2) + solve(n/3) + solve(n/4);
        if (temp >= lookup[n]) {
            lookup[n] = temp;
        }
        return lookup[n];
    }
}

int main(int argc, char** argv) {
    int t;
    cin >> t;
    int n;
    n = solve(t);
    if (t >= n) {
        cout << t << endl;
    }
    else {
        cout << n << endl;
    }
    return 0;
}
I doubt that this is a memory issue, because he already said that the program actually runs and that he inputs 100000000.
One thing that I noticed: you're doing lookup[n] even when n == MAXSIZE or larger, because the accesses after the first if are outside the bounds check. Since C++ uses 0-indexed vectors, lookup[MAXSIZE] is already one element beyond the end of the vector.
if (n < MAXSIZE) {
    ...
}
...
if (temp >= lookup[n]) {
    lookup[n] = temp;
}
return lookup[n];
I can't guess what the algorithm is doing, but I think the closing brace } of the first "if" should be lower down, and you could return an error on this boundary condition.
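A sketch of what that could look like (keeping the rest of the question's code unchanged; the out-of-range branch here simply computes without memoizing, which is one of several reasonable choices):
int solve(int n)
{
    if (n < 12) {
        return n;
    }
    if (n < MAXSIZE) {                 // the brace now encloses every lookup[n] access
        if (lookup[n] != 0) {
            return lookup[n];
        }
        int temp = solve(n/2) + solve(n/3) + solve(n/4);
        if (temp >= lookup[n]) {
            lookup[n] = temp;
        }
        return lookup[n];
    }
    return solve(n/2) + solve(n/3) + solve(n/4);   // n is outside the table: no memoization
}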
You either don't have enough memory or don't have enough contiguous address space to store 100,000,000 unsigned longs.
This is mostly a memory issue. A vector needs a contiguous memory allocation (so that it can keep its promise of constant-time lookup). In your case, with 8-byte unsigned longs, you are basically asking your machine for around 762 MB of memory in a single block.
I don't know which problem you're solving, but it looks like you're solving Bytelandian coins. For this, it is much better to use a map, because:
You will mostly not be storing the values for all 100000000 cases in a test-case run. So what you need is a way to allocate memory only for those values that you actually memoize.
Even if you were, you have no real need for constant-time lookup. Although it would speed up your program, std::map uses a tree to give you logarithmic lookup time, and it does away with the requirement of using 762 MB contiguously. 762 MB is not a big deal, but expecting it in a single block is.
So the best thing to use in your situation is a std::map. In your case, actually just replacing std::vector<unsigned long> with std::map<int, unsigned long> should work, as map also has operator[] access (for the most part, it should).
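A minimal sketch of that drop-in change (the same solve() logic as in the question, with only the lookup container swapped; the bounds check is no longer needed because a map grows on demand):
#include <map>

std::map<int, unsigned long> lookup;   // replaces: vector<unsigned long> lookup(MAXSIZE, 0);

int solve(int n)
{
    if (n < 12) {
        return n;
    }
    if (lookup[n] != 0) {              // operator[] creates a zero entry if the key is missing
        return lookup[n];
    }
    int temp = solve(n/2) + solve(n/3) + solve(n/4);
    lookup[n] = temp;
    return lookup[n];
}
Only the values actually reached by the recursion get an entry, so the memory use is tiny compared with the 100000000-element vector.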
I just ran into a free(): invalid next size (fast) problem while writing a C++ program, and unfortunately I have failed to figure out why it happens. The code is given below.
bool not_corrupt(struct packet *pkt, int size)
{
    if (!size) return false;

    bool result = true;
    char *exp_checksum = (char*)malloc(size * sizeof(char));
    char *rec_checksum = (char*)malloc(size * sizeof(char));
    char *rec_data = (char*)malloc(size * sizeof(char));

    //memcpy(rec_checksum, pkt->data+HEADER_SIZE+SEQ_SIZE+DATA_SIZE, size);
    //memcpy(rec_data, pkt->data+HEADER_SIZE+SEQ_SIZE, size);
    for (int i = 0; i < size; i++) {
        rec_checksum[i] = pkt->data[HEADER_SIZE+SEQ_SIZE+DATA_SIZE+i];
        rec_data[i] = pkt->data[HEADER_SIZE+SEQ_SIZE+i];
    }

    do_checksum(exp_checksum, rec_data, DATA_SIZE);

    for (int i = 0; i < size; i++) {
        if (exp_checksum[i] != rec_checksum[i]) {
            result = false;
            break;
        }
    }

    free(exp_checksum);
    free(rec_checksum);
    free(rec_data);

    return result;
}
The macros used are:
#define RDT_PKTSIZE 128
#define SEQ_SIZE 4
#define HEADER_SIZE 1
#define DATA_SIZE ((RDT_PKTSIZE - HEADER_SIZE - SEQ_SIZE) / 2)
The struct used is:
struct packet {
char data[RDT_PKTSIZE];
};
This piece of code doesn't go wrong every time. It sometimes crashes with free(): invalid next size (fast) at the free(exp_checksum); part.
What's even worse is that sometimes the contents of rec_checksum are just not equal to the data starting at pkt->data[HEADER_SIZE+SEQ_SIZE+DATA_SIZE], which should be the same according to the watch expressions in my debugging tools. I tried both the memcpy and the for-loop approaches, but the problem remains.
I don't quite understand why this would happen. I would be very thankful if anyone could explain this to me.
Edit:
Here's the do_checksum() method, which is very simple:
void do_checksum(char* checksum, char* data, int size)
{
    for (int i = 0; i < size; i++)
    {
        checksum[i] = ~data[i];
    }
}
Edit 2:
Thanks, everyone.
I switched another part of my code from an STL queue to an STL vector, and then the results turned out fine.
But I still haven't figured out why. I am sure that I never pop an empty queue.
The error you report is indicative of heap corruption. These can be hard to track down and tools like valgrind can be extremely helpful. Heap corruptions are often hard to debug with a simple debugger because the runtime error often occurs long after the actual corruption.
That said, the most obvious potential cause of your heap corruption, given the code posted so far, is if DATA_SIZE is greater than size. If that occurs then do_checksum will write beyond the end of exp_checksum.
Three immediate suggestions (a sketch with these checks in place follows the list):
Check for size <= 0 (instead of "!size")
Check for size >= DATA_SIZE
Check for malloc returning NULL
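Here is a hedged sketch of how the top of not_corrupt() could look with those three checks added, assuming a failed check should simply report the packet as corrupt:
if (size <= 0) return false;                 // 1. reject non-positive sizes
if (size < DATA_SIZE) return false;          // 2. do_checksum() fills DATA_SIZE bytes of exp_checksum
char *exp_checksum = (char*)malloc(size * sizeof(char));
char *rec_checksum = (char*)malloc(size * sizeof(char));
char *rec_data = (char*)malloc(size * sizeof(char));
if (!exp_checksum || !rec_checksum || !rec_data) {   // 3. malloc may return NULL
    free(exp_checksum);
    free(rec_checksum);
    free(rec_data);
    return false;
}
The rest of the function can stay as posted in the question.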
Have you tried Valgrind?
Also, make sure to never send more than RDT_PKTSIZE as size to not_corrupt()
bool not_corrupt(struct packet *pkt, int size)
{
if (!size) return false;
if (size > RDT_PKTSIZE) return false;
/* ... */
Valgrind is good ... but validating all your inputs and checking all error conditions is even better.
Stepping through the code in the debugger isn't a bad idea, either.
I would also call do_checksum() with size (your actual size) instead of DATA_SIZE (presumably the maximum size).
DATA_SIZE is a macro defined the max length in my program so the size
should be less than DATA_SIZE
Even if that is true, your logic only allocates enough memory to hold size characters.
So you should call
do_checksum(exp_checksum, rec_data, size);
And if you do not want to use std::string (which is fine), you should switch from malloc/free to new/delete when writing C++.
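For illustration, a hedged sketch of the whole function with the actual size passed through; it uses std::vector<char> (my substitution, rather than raw new[]/delete[]) so the buffers clean themselves up:
#include <vector>

bool not_corrupt(struct packet *pkt, int size)
{
    if (size <= 0 || size > DATA_SIZE) return false;
    std::vector<char> exp_checksum(size), rec_checksum(size), rec_data(size);
    for (int i = 0; i < size; i++) {
        rec_checksum[i] = pkt->data[HEADER_SIZE + SEQ_SIZE + DATA_SIZE + i];
        rec_data[i] = pkt->data[HEADER_SIZE + SEQ_SIZE + i];
    }
    do_checksum(&exp_checksum[0], &rec_data[0], size);   // the actual size, not DATA_SIZE
    for (int i = 0; i < size; i++) {
        if (exp_checksum[i] != rec_checksum[i]) return false;
    }
    return true;    // no free() needed; the vectors release their memory automatically
}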
Solution: Apparently the culprit was the use of floor(), the performance of which turns out to be OS-dependent in glibc.
This is a followup question to an earlier one: Same program faster on Linux than Windows -- why?
I have a small C++ program, that, when compiled with nuwen gcc 4.6.1, runs much faster on Wine than Windows XP (on the same computer). The question: why does this happen?
The timings are ~15.8 and 25.9 seconds, for Wine and Windows respectively. Note that I'm talking about the same executable, not only the same C++ program.
The source code is at the end of the post. The compiled executable is here (if you trust me enough).
This particular program does nothing useful, it is just a minimal example boiled down from a larger program I have. Please see this other question for some more precise benchmarking of the original program (important!!) and the most common possibilities ruled out (such as other programs hogging the CPU on Windows, process startup penalty, difference in system calls such as memory allocation). Also note that while here I used rand() for simplicity, in the original I used my own RNG which I know does no heap-allocation.
The reason I opened a new question on the topic is that now I can post an actual simplified code example for reproducing the phenomenon.
The code:
#include <cstdlib>
#include <cmath>

int irand(int top) {
    return int(std::floor((std::rand() / (RAND_MAX + 1.0)) * top));
}

template<typename T>
class Vector {
    T *vec;
    const int sz;
public:
    Vector(int n) : sz(n) {
        vec = new T[sz];
    }
    ~Vector() {
        delete [] vec;
    }
    int size() const { return sz; }
    const T & operator [] (int i) const { return vec[i]; }
    T & operator [] (int i) { return vec[i]; }
};

int main() {
    const int tmax = 20000; // increase this to make it run longer
    const int m = 10000;

    Vector<int> vec(150);
    for (int i = 0; i < vec.size(); ++i)
        vec[i] = 0;

    // main loop
    for (int t = 0; t < tmax; ++t)
        for (int j = 0; j < m; ++j) {
            int s = irand(100) + 1;
            vec[s] += 1;
        }

    return 0;
}
UPDATE
It seems that if I replace irand() above with something deterministic such as
int irand(int top) {
static int c = 0;
return (c++) % top;
}
then the timing difference disappears. I'd like to note though that in my original program I used a different RNG, not the system rand(). I'm digging into the source of that now.
UPDATE 2
Now I have replaced the irand() function with an equivalent of what I had in the original program. It is a bit lengthy (the algorithm is from Numerical Recipes), but the point was to show that no system libraries are being called explicitly (except possibly through floor()). Yet the timing difference is still there!
Perhaps floor() could be to blame? Or the compiler generates calls to something else?
class ran1 {
    static const int table_len = 32;
    static const int int_max = (1u << 31) - 1;
    int idum;
    int next;
    int *shuffle_table;

    void propagate() {
        const int int_quo = 1277731;
        int k = idum / int_quo;
        idum = 16807 * (idum - k * int_quo) - 2836 * k;
        if (idum < 0)
            idum += int_max;
    }

public:
    ran1() {
        shuffle_table = new int[table_len];
        seedrand(54321);
    }
    ~ran1() {
        delete [] shuffle_table;
    }
    void seedrand(int seed) {
        idum = seed;
        for (int i = table_len - 1; i >= 0; i--) {
            propagate();
            shuffle_table[i] = idum;
        }
        next = idum;
    }
    double frand() {
        int i = next / (1 + (int_max - 1) / table_len);
        next = shuffle_table[i];
        propagate();
        shuffle_table[i] = idum;
        return next / (int_max + 1.0);
    }
} rng;

int irand(int top) {
    return int(std::floor(rng.frand() * top));
}
edit: It turned out that the culprit was floor() and not rand() as I suspected - see
the update at the top of the OP's question.
The run time of your program is dominated by the calls to rand().
I therefore think that rand() is the culprit. I suspect that the underlying function is provided by the WINE/Windows runtime, and the two implementations have different performance characteristics.
The easiest way to test this hypothesis would be to simply call rand() in a loop, and time the same executable in both environments.
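A minimal sketch of such a test (my own illustration, assuming clock() from <ctime> is precise enough for a run of this length):
#include <cstdio>
#include <cstdlib>
#include <ctime>

int main() {
    unsigned long long sum = 0;              // keep the results live so rand() isn't optimized away
    std::clock_t start = std::clock();
    for (long i = 0; i < 100000000L; ++i)
        sum += std::rand();
    double seconds = double(std::clock() - start) / CLOCKS_PER_SEC;
    std::printf("sum = %llu, time = %.2f s\n", sum, seconds);
    return 0;
}
Running the same binary under Windows and under Wine and comparing the reported times would show whether rand() alone accounts for the gap.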
edit: I've had a look at the Wine source code, and here is its implementation of rand():
/*********************************************************************
* rand (MSVCRT.#)
*/
int CDECL MSVCRT_rand(void)
{
thread_data_t *data = msvcrt_get_thread_data();
/* this is the algorithm used by MSVC, according to
* http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators */
data->random_seed = data->random_seed * 214013 + 2531011;
return (data->random_seed >> 16) & MSVCRT_RAND_MAX;
}
I don't have access to Microsoft's source code to compare, but it wouldn't surprise me if the difference in performance was in the getting of thread-local data rather than in the RNG itself.
Wikipedia says:
Wine is a compatibility layer not an emulator. It duplicates functions
of a Windows computer by providing alternative implementations of the
DLLs that Windows programs call,[citation needed] and a process to
substitute for the Windows NT kernel. This method of duplication
differs from other methods that might also be considered emulation,
where Windows programs run in a virtual machine.[2] Wine is
predominantly written using black-box testing reverse-engineering, to
avoid copyright issues.
This implies that the developers of Wine could replace an API call with anything at all, as long as the end result was the same as you would get with a native Windows call. And I suppose they weren't constrained by needing to make it compatible with the rest of Windows.
From what I can tell, the C standard libraries used WILL be different in the two different scenarios. This affects the rand() call as well as floor().
From the MinGW site: "MinGW compilers provide access to the functionality of the Microsoft C runtime and some language-specific runtimes." Running under XP, this will use the Microsoft libraries. Seems straightforward.
However, the model under Wine is much more complex. According to this diagram, the operating system's libc comes into play. This could be the difference between the two.
While Wine is basically Windows, you're still comparing apples to oranges. As well, not only is it apples/oranges, the underlying vehicles hauling those apples and oranges around are completely different.
In short, your question could trivially be rephrased as "this code runs faster on Mac OSX than it does on Windows" and get the same answer.