Some background: I am trying to track down a bug which is causing me major headaches. After many dead ends (see this question) I finally ended up with this code:
#include <thread>
#include <vector>
#include <iosfwd>
#include <sstream>
#include <string>
#include <chrono>
#include <windows.h>

int main()
{
    SRWLOCK srwl;
    InitializeSRWLock(&srwl);
    for (size_t i = 0; i < 1000; ++i)
    {
        std::vector<std::thread> threads;
        for (size_t j = 0; j < 100; ++j)
        {
            OutputDebugString(".");
            threads.emplace_back([&]() {
                AcquireSRWLockExclusive(&srwl);
                // Code below modifies the probability to see the bug.
                std::this_thread::sleep_for(std::chrono::microseconds(1));
                std::wstringstream wss;
                wss << std::this_thread::get_id();
                wss.str();
                // Code above modifies the probability to see the bug.
                ReleaseSRWLockExclusive(&srwl);
            });
        }
        for (auto& t : threads) { t.join(); }
        OutputDebugString((std::to_string(i) + "\n").data());
    }
    return 0;
}
When I run this code under the VS 2013 debugger, the program hangs with output like this:
....................................................................................................0
....................................................................................................1
....................................................................................................2
...........................
Strangely enough, if I pause the debugger and inspect what is going on, one of the threads is inside AcquireSRWLockExclusive (in NtWaitForAlertByThreadId), and there is apparently no reason why the program should be hanging. When I click resume, the program happily continues and prints some more output until it blocks again.
Do you have any idea what is going on here?
Some more info:
As far as I can tell, this bug only exists on Windows 8.1.
I tried VS2013.4 and VS2015 RC.
I could reproduce it on two different computers under Windows 8.1.
One of the machines was reformatted and its RAM, CPU and disk tested (I suspected a hardware malfunction because at first I could only observe the bug on this particular machine).
I could never reproduce it on Windows 7.
It may be useful to modify the code between the comments to observe the bug. When I added the microsecond sleep, I could at last reproduce the bug on another computer.
With VS2015 RC I could reproduce the same behavior with a simple std::mutex. On VS2013, however, the SRWLOCK seems to be required to observe the bug.
This issue is caused by an OS scheduler bug introduced in the spring 2014 update to Windows 8.1. A hotfix for this problem was released in May 2015 and is available at https://support.microsoft.com/en-us/kb/3036169.
To me it looks like a bug in the Windows OS. I have several code variants that hang when using the newer (Vista-era) synchronization primitives under a debugger on Windows 8.1 / Server 2012 R2 after the April 2014 update. Some thread pool wait functions hang as well. It seems mostly tied to another thread finishing execution at the moment of the wait/lock. Here is simple code that always hangs under the debugger in NtWaitForAlertByThreadId():
#include <windows.h>
#include <stdio.h>
#include <conio.h>
#include <tchar.h>
#include <limits.h> // INT_MAX
#pragma optimize("",off)
VOID CALLBACK _WorkCallback(PTP_CALLBACK_INSTANCE Instance, PVOID pUser, PTP_WORK Work)
{
for (int i = 0; i < INT_MAX / 256; i++) {}
}
DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
for (int i = 0; i < INT_MAX / 256; i++) {}
return 0;
}
#pragma optimize("",on)
int _tmain(int argc, _TCHAR* argv[])
{
LONGLONG c = 0;
while(!_kbhit())
{
PTP_WORK ptpw = CreateThreadpoolWork(&_WorkCallback, NULL, NULL);
if (ptpw != NULL)
{
for(long i = 0; i < 3; i++) SubmitThreadpoolWork(ptpw);
CreateThread(NULL, 0, ThreadProc, NULL, 0, NULL);
WaitForThreadpoolWorkCallbacks(ptpw, FALSE);
CloseThreadpoolWork(ptpw);
}
printf("%I64d \r", c++);
}
_getch();
return 0;
}
Unfortunately I have no idea where to report this to Microsoft.
Suppose I have a complex C++ application that I need to debug with a lot of variables. I want to avoid using the std::cout and printf approaches (below there's an explanation why).
In order to explain my issue, I wrote a minimal example using chrono (this program calculates the fps of its while loop over time and increments the i_times counter until it reaches 10k):
#include <chrono>
using chrono_hclock = std::chrono::high_resolution_clock;
int main(int argc, char** argv){
bool is_running = true;
float fps;
int i_times=0;
chrono_hclock::time_point start;
chrono_hclock::time_point end;
while(is_running){
start = chrono_hclock::now();
// Some code execution
end = chrono_hclock::now();
fps = (float)1e9 / (float)std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
if(++i_times==10000) is_running=false;
}
return 0;
}
I would like to debug this program and watch the fps and i_times variables continuously over time, without stopping execution.
Of course I can simply use std::cout, printf or other means to output variable values, redirecting them to stdout or a file while debugging, and that is OK for simple types. However, I have multiple variables whose data types are struct-based, and it would be cumbersome, time-consuming and code-bloating to write instructions to print each one of them. Also, my application is a realtime video/audio H.264 encoder streaming over the RTSP protocol, and stopping at breakpoints means seeing artifacts in my other decoder application, because the encoder can't keep up with the decoder once it has hit a breakpoint.
How can I solve this issue?
Thanks and regards!
The IDE I'm currently using for developing is Visual Studio 2019 Community.
I'm using the Local Windows Debugger.
I'm open to using alternative open source IDEs like VSCode, or alternative debugging methods, to solve this problem and/or not be confined to a specific IDE.
To watch multiple specific variables in VS I use the built-in Watch window. While debugging with LWD, I manually add variables by right-clicking them in my source code and clicking Add Watch. They are then shown in the Watch window (Debug > Windows > Watch > Watch 1):
However, I can only see this window's contents once I hit a breakpoint set inside the while loop, which blocks execution, so that doesn't solve my issue.
You can use a non-blocking breakpoint (a tracepoint). First add the breakpoint. Then click on the breakpoint's settings icon, or right-click it and select Actions.
Now add a message, any string that is meaningful to you, and in curly braces include the values to show, for instance:
value of y is {y} and value of x is {x}
The image shows the value of i when the breakpoint is hit. Check "Continue code execution" so the breakpoint will not block execution. The shape of your breakpoint will change to a red diamond. You can also add specific conditions by ticking the Conditions checkbox.
Now while debugging all these debug messages will be shown in the output window:
The above image shows the following message:
the value of i is {i}
By checking the "Conditions" you can add specific conditions, for instance i%100==0 and it will show the message only if i is divisible by 100.
This time your breakpoint will be marked with a + sign, meaning it has a condition. Now while debugging, i will be printed only when it is divisible by 100, so you can restrict the output to more meaningful cases.
The strict answer is "no" but...
I think I understand what you're trying to accomplish. This could be done by dumping the watched variables into shared memory which is read by a 2nd process. A watch and a breakpoint in the 2nd process would allow you to see the values in Visual Studio without interrupting the original application.
A few caveats:
UAC must be admin on both sides to open the memory handle
This wouldn't work with pointers as the 2nd program only has access to the shared memory
Windows anti-virus went nuts the first few times I ran this, but eventually calmed down
Worker application:
#include <stdio.h>
#include <conio.h>
#include <tchar.h>
#include <windows.h>
#include <chrono>
#include <thread>
PCWSTR SHARED_MEMORY_NAME = L"Global\\WatchMemory";
struct watch_collection // Container for everything we want to watch
{
int i;
int j;
int k;
};
using chrono_hclock = std::chrono::high_resolution_clock;
int main(int argc, char** argv)
{
bool is_running = true;
float fps;
int i_times = 0;
chrono_hclock::time_point start;
chrono_hclock::time_point end;
HANDLE map_file;
void* shared_buffer;
// Set up the shared memory space
map_file = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, sizeof(watch_collection), SHARED_MEMORY_NAME);
if (map_file == NULL)
{
return 1; // Didn't work, bail. Check UAC level!
}
shared_buffer = MapViewOfFile(map_file, FILE_MAP_ALL_ACCESS, 0, 0, sizeof(watch_collection));
if (shared_buffer == NULL)
{
CloseHandle(map_file); // Didn't work, clean up the file handle and bail.
return 1;
}
// Do some stuff
while (is_running) {
start = chrono_hclock::now();
for (int i = 0; i < 10000; i++)
{
for (int j = 0; j < 10000; j++)
{
for (int k = 0; k < 10000; k++) {
watch_collection watches{ i, j, k }; // take a snapshot of the loop counters (aggregate initialization)
CopyMemory(shared_buffer, (void*)&watches, (sizeof(watch_collection))); // Copy the watches to the shared memory space
// Do more things...
}
}
}
end = chrono_hclock::now();
fps = (float)1e9 / (float)std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
if (++i_times == 1000000) is_running = false;
}
// Clean up the shared memory buffer and handle
UnmapViewOfFile(shared_buffer);
CloseHandle(map_file);
return 0;
}
Watcher application:
#include <windows.h>
#include <stdio.h>
#include <conio.h>
#include <tchar.h>
#pragma comment(lib, "user32.lib")
PCWSTR SHARED_MEMORY_NAME = L"Global\\WatchMemory";
struct watch_collection // Container for everything we want to watch
{
int i;
int j;
int k;
};
int main()
{
HANDLE map_file;
void* shared_buffer;
bool is_running = true;
watch_collection watches; // Put a watch on watches
// Connect to the shared memory
map_file = OpenFileMapping(FILE_MAP_ALL_ACCESS, FALSE, SHARED_MEMORY_NAME);
if (map_file == NULL)
{
return 1; // Couldn't open the handle, bail. Check UAC level!
}
shared_buffer = MapViewOfFile(map_file, FILE_MAP_ALL_ACCESS, 0, 0, sizeof(watch_collection));
if (shared_buffer == NULL)
{
CloseHandle(map_file);
return 1;
}
// Loop forever
while (is_running)
{
CopyMemory((void*)&watches, shared_buffer, (sizeof(watch_collection)));
} // Breakpoint here
UnmapViewOfFile(shared_buffer);
CloseHandle(map_file);
return 0;
}
I have written a C++ program for solving a difficult optimization problem using multiple processors. Its basic structure can be seen in the snippet below. The parallelization is done in a simple way using GLib, by spawning threads with g_thread_new.
The program was originally developed in Linux, where htop shows that it uses 100% of all cores. But on Windows the CPU usage peaks at around 30-40% on a quad-core computer with 4 physical cores plus 4 virtual (hyper-threaded) cores. I compiled it on Windows using MinGW and g++.
Why is the performance so degraded under Windows? Is this caused by the fact that I compiled the program using MinGW?
#include <gtk/gtk.h>
#include <thread>
using namespace std;
void intensive_function() {
//... heavy computations
return;
}
static gpointer worker(gpointer data) {
intensive_function();
return NULL;
}
int main(int argc, char *argv[]) {
int processors = thread::hardware_concurrency();
for(int i = 0; i < processors; i++) {
GThread *thread;
thread = g_thread_new("worker", worker, NULL);
g_thread_unref(thread);
}
}
Try checking the value of:
int processors = thread::hardware_concurrency();
The returned value can differ from the actual number of processors/cores, and may even be 0 if it cannot be determined.
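A minimal, self-contained check (standard C++ only, no GLib required) that just prints the reported value looks like this; note that the standard allows hardware_concurrency() to return 0 when the count cannot be determined:
#include <iostream>
#include <thread>

int main()
{
    // May legitimately print 0 if the implementation cannot determine the number of hardware threads.
    std::cout << "hardware_concurrency() = " << std::thread::hardware_concurrency() << '\n';
    return 0;
}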
I have a problem with my SFML project in C++. After compiling and running this simple code, I start using my mouse in the window (the code is a very simple pathfinding demo in which the "ch" texture moves to wherever I click), and after approximately 10 seconds the window stops responding. The only time the program doesn't crash is when I run it in debug mode. I had this problem some time ago with a bigger project, and because of it I gave up on that project. I believe the crash has something to do with using the mouse, because the bigger project that crashed the same way also used the SFML mouse functions, and when compiled and run in debug mode it didn't crash. I'm programming in Code::Blocks version 13.12, and I'm not sure which SFML version I have. I have no idea why this happens, so I'm asking for your help with this problem. Thanks :D
The code:
#include <SFML/Graphics.hpp>
#include <iostream>
#include <conio.h>
#include <windows.h>
using namespace std;
using namespace sf;
int main()
{
RenderWindow win(VideoMode(700,700),"test");
float x=10,y=10;
int mx=x,my=y;
int mxo,myo;
Texture t;
t.loadFromFile("char.png");
Sprite ch;
ch.setTexture(t);
ch.setPosition(x,y);
while(win.isOpen())
{
win.clear();
if(Mouse::isButtonPressed(Mouse::Left))
{
mx=Mouse::getPosition(win).x;
my=Mouse::getPosition(win).y;
}
if(x!=mx)
{
if(mx>x)
{
x++;
}
if(mx<x)
{
x--;
}
Sleep(2);
}
if(y!=my)
{
if(my>y)
{
y++;
}
if(my<y)
{
y--;
}
Sleep(2);
}
ch.setPosition(x,y);
win.draw(ch);
win.display();
}
return 0;
}
You have blocking sleeps in the main event loop, and are not handling events.
This makes the OS think that the program is unresponsive, and it tells you that.
It didn't actually crash.
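As a sketch of the usual fix (reusing your existing variable names), pump the window's event queue once per frame with pollEvent so Windows sees the application processing messages. Inside the while(win.isOpen()) loop, before drawing:
Event e;
while(win.pollEvent(e)) // drain all pending events so the OS considers the window responsive
{
    if(e.type == Event::Closed) // also lets the user actually close the window
        win.close();
}
You may also want to replace the Sleep(2) calls with win.setFramerateLimit(...), but merely processing events should be enough to stop the "not responding" state.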
I have setup a chrooted Debian Etch (32bit) under Ubuntu 12.04 (64bit), and it appears that clock_gettime() works with CLOCK_MONOTONIC, but fails with both CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID. The errno is set to EINVAL, which according to the man page means that "The clk_id specified is not supported on this system."
All three clocks work fine outside the chrooted Debian and in 64bit chrooted Debian etch.
Can someone explain to me why this is the case and how to fix it?
Much appreciated.
I don't know the cause yet, but I have ideas that won't fit in the comment box.
First, you can make the test program simpler by compiling it as C instead of C++ and not linking it to libpthread. -lrt should be good enough to get clock_gettime. Also, compiling it with -static could make tracing easier since the dynamic linker startup stuff won't be there.
Static linking might even change the behavior of clock_gettime. It's worth trying just to find out whether it works around the bug.
Another thing I'd like to see is the output of this vdso-bypassing test program:
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
int main(void)
{
struct timespec ts;
if(syscall(SYS_clock_gettime, CLOCK_PROCESS_CPUTIME_ID, &ts)) {
perror("clock_gettime");
return 1;
}
printf("CLOCK_PROCESS_CPUTIME_ID: %lu.%09ld\n",
(unsigned long)ts.tv_sec, ts.tv_nsec);
return 0;
}
with and without -static, and if it fails, add strace.
Update (actually, skip this. go to the second update)
A couple more simple test ideas:
compile and run a 32-bit test program in the Ubuntu host system, by adding -m32 to the gcc command. It's possible that the kernel's 32-bit compatibility mode is causing the error. If that's the case, then the 32-bit version will fail no matter which libc it gets linked to.
take the non-static test programs you compiled under Debian, copy them to the Ubuntu host system and try to run them there. Change in behavior will point to libc as the cause.
Then it's time for the hard stuff: looking at disassembled code and maybe single-stepping it in gdb. Instead of having you do that on your own, I'd like to get a copy of the code you're running. Upload a static-compiled failing test program somewhere I can get it. Also, a copy of the 32-bit vdso provided by your kernel might be interesting. To extract the vdso, run the following program (compiled in the 32-bit chroot), which will create a file called vdso.dump, and upload that too.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int getvseg(const char *which, const char *outfn)
{
FILE *maps, *outfile;
char buf[1024];
void *start, *end;
size_t sz;
void *copy;
int ret;
char search[strlen(which)+4];
maps = fopen("/proc/self/maps", "r");
if(!maps) {
perror("/proc/self/maps");
return 1;
}
outfile = fopen(outfn, "w");
if(!outfile) {
perror(outfn);
fclose(maps);
return 1;
}
sprintf(search, "[%s]\n", which);
while(fgets(buf, sizeof buf, maps)) {
if(strlen(buf)<strlen(search) ||
strcmp(buf+strlen(buf)-strlen(search),search))
continue;
if(sscanf(buf, "%p-%p", &start, &end)!=2) {
fprintf(stderr, "weird line in /proc/self/maps: %s", buf);
continue;
}
sz = (char *)end - (char *)start;
/* copy because I got an EFAULT trying to write directly from vsyscall */
copy = malloc(sz);
if(!copy) {
perror("malloc");
goto fail;
}
memcpy(copy, start, sz);
if(fwrite(copy, 1, sz, outfile)!=sz) {
if(ferror(outfile))
perror(outfn);
else
fprintf(stderr, "%s: short write", outfn);
free(copy);
goto fail;
}
free(copy);
goto success;
}
fprintf(stderr, "%s not found\n", which);
fail:
ret = 1;
goto out;
success:
ret = 0;
out:
fclose(maps);
fclose(outfile);
return ret;
}
int main(void)
{
int ret = 1;
if(!getvseg("vdso", "vdso.dump")) {
printf("vdso dumped to vdso.dump\n");
ret = 0;
}
if(!getvseg("vsyscall", "vsyscall.dump")) {
printf("vsyscall dumped to vsyscall.dump\n");
ret = 0;
}
return ret;
}
Update 2
I reproduced this by downloading an etch libc. It's definitely caused by glibc stupidity. Instead of a simple syscall wrapper for clock_gettime it has a big wad of preprocessor spaghetti culminating in "you can't use clockids that we didn't pre-approve". You're not going to get it to work with that old glibc. Which brings us to the question I didn't want to ask: why are you trying to use an obsolete version of Debian anyway?
I'm running the following code, using Visual Studio 2008 SP1, on Windows Vista Business x64, quad core machine, 8gb ram.
If I build a release build, and run it from the command line, it reports 31ms. If I then start it from the IDE, using F5, it reports 23353ms.
Here are the times: (all Win32 builds)
DEBUG, command line: 421ms
DEBUG, from the IDE: 24,570ms
RELEASE, command line: 31ms
RELEASE, from IDE: 23,353ms
code:
#include <windows.h>
#include <iostream>
#include <set>
#include <algorithm>
#include <iterator> // std::inserter
#include <cstdlib>  // rand
using namespace std;
int runIntersectionTestAlgo()
{
set<int> set1;
set<int> set2;
set<int> intersection;
// Create 100,000 values for set1
for ( int i = 0; i < 100000; i++ )
{
int value = 1000000000 + i;
set1.insert(value);
}
// Create 1,000 values for set2
for ( int i = 0; i < 1000; i++ )
{
int random = rand() % 200000 + 1;
random *= 10;
int value = 1000000000 + random;
set2.insert(value);
}
set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
return intersection.size();
}
int main(){
DWORD start = GetTickCount();
runIntersectionTestAlgo();
DWORD span = GetTickCount() - start;
std::cout << span << " milliseconds\n";
}
Running under a Microsoft debugger (windbg, kd, cdb, Visual Studio Debugger) by default forces Windows to use the debug heap instead of the default heap. On Windows 2000 and above, the default heap is the Low Fragmentation Heap, which is insanely good compared to the debug heap. You can query the kind of heap you are using with HeapQueryInformation.
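For example, a small sketch of such a query against the process default heap might look like the following (the returned ULONG is 0 for the standard heap, 1 for look-aside lists, and 2 for the LFH):
#include <windows.h>
#include <iostream>

int main()
{
    ULONG heap_type = 0;
    // Ask which front end the default process heap is using.
    if (HeapQueryInformation(GetProcessHeap(), HeapCompatibilityInformation,
                             &heap_type, sizeof(heap_type), NULL))
    {
        std::cout << "Heap compatibility value: " << heap_type
                  << (heap_type == 2 ? " (low-fragmentation heap)" : "") << "\n";
    }
    else
    {
        std::cout << "HeapQueryInformation failed, error " << GetLastError() << "\n";
    }
    return 0;
}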
To solve your particular problem, you can use one of the many options recommended in this KB article: Why the low fragmentation heap (LFH) mechanism may be disabled on some computers that are running Windows Server 2003, Windows XP, or Windows 2000
For Visual Studio, I prefer adding _NO_DEBUG_HEAP=1 to Project Properties->Configuration Properties->Debugging->Environment. That always does the trick for me.
Pressing pause while in the VS IDE shows that the additional time appears to be spent in malloc/free. This leads me to believe the debugging support in MS's malloc and free implementation has additional logic when a debugger is attached. That would explain the discrepancy in times between the console and the debugger.
EDIT: Confirmed by running with CTRL+F5 v. F5 (1047ms v. 9088ms on my machine)
So it sounds like this may just be what happens when one attaches the debugger. However, I just can't get my head around the performance changing from 30ms to 23,000ms because of that, especially when the rest of my code seems to run just as fast whether or not the debugger is attached.