dprintf vs returning in C++

Disclaimer: I know nothing about C++, so bear with me... I am looking at some existing code which prints a continuous stream of strings describing the position of a VR controller. The function in question is:
void CMainApplication::printDevicePositionalData(const char * deviceName, vr::HmdMatrix34_t posMatrix, vr::HmdVector3_t position, vr::HmdQuaternion_t quaternion)
{
    LARGE_INTEGER qpc; // Query Performance Counter for acquiring high-resolution time stamps.
                       // From MSDN: "QPC is typically the best method to use to time-stamp events and
                       // measure small time intervals that occur on the same system or virtual machine."
    QueryPerformanceCounter(&qpc);

    // Print position and quaternion (rotation).
    dprintf("\n%lld, %s, x = %.5f, y = %.5f, z = %.5f, qw = %.5f, qx = %.5f, qy = %.5f, qz = %.5f",
            qpc.QuadPart, deviceName,
            position.v[0], position.v[1], position.v[2],
            quaternion.w, quaternion.x, quaternion.y, quaternion.z);
}
When I run the compiled exe in PowerShell it does not seem to print anything. Only if I run .\this_program.exe | tee output.txt do I see anything, as that simultaneously writes to a .txt file.
How can I change the above code to return these values? I want to be able to read them in real time with Python using subprocess and stdout. Thanks.

If you want to print to the console output, you should not be using:
dprintf - This function prints a formatted string to the command window for the debugger.
With C++, the IO streams should be used instead (std::cout, std::clog, or std::cerr).
Or fall back to printf.
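As a rough sketch of that suggestion, the function from the question could write to stdout with printf and flush each line, so a Python subprocess reading the pipe sees the values as they are produced (a sketch of the suggested change, not the original code):
void CMainApplication::printDevicePositionalData(const char * deviceName, vr::HmdMatrix34_t posMatrix, vr::HmdVector3_t position, vr::HmdQuaternion_t quaternion)
{
    LARGE_INTEGER qpc;
    QueryPerformanceCounter(&qpc);

    // Write to stdout instead of the debugger output; the newline is moved to
    // the end so every record arrives as one complete line.
    printf("%lld, %s, x = %.5f, y = %.5f, z = %.5f, qw = %.5f, qx = %.5f, qy = %.5f, qz = %.5f\n",
           qpc.QuadPart, deviceName,
           position.v[0], position.v[1], position.v[2],
           quaternion.w, quaternion.x, quaternion.y, quaternion.z);
    fflush(stdout); // stdout is block-buffered when piped, so flush after each line
}
On the Python side, reading proc.stdout line by line from subprocess.Popen(..., stdout=subprocess.PIPE) should then see each line as soon as it is flushed.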

Related

error -11 OpenCL

I'm getting error -11 on this line:
checkerror(clBuildProgram(program, deviceidcount, deviceids.data(), nullptr, nullptr, nullptr));
My kernel is:
__kernel void render(double playerx,double playery,double playerz,double yaw,double pitch,double x1,double y1,double z1,double x2,double y2,double z2,double x3,double y3,double z3,__global int* texture){
    //const int i = get_global_id(0);
    //x[i] = a*x[i];
    //x[i] = cos(a);
    x1 = x1-playerx;
    y1 = y1-playery;
    z1 = z1-playerz;
    x2 = x2-playerx;
    y2 = y2-playery;
    z2 = z2-playerz;
    x3 = x3-playerx;
    y3 = y3-playery;
    z3 = z3-playerz;
    double smallyaw = yaw - M_PI_2;
    double bigpitch = pitch + M_PI_2;
    double screenx1 = cos(smallyaw)*cos(pitch)*x1 + sin(smallyaw)*cos(pitch)*y1 + sin(pitch)*z1;
    double screeny1 = cos(yaw)*cos(bigpitch)*x1 + sin(yaw)*cos(bigpitch)*y1 + sin(bigpitch)*z1;
    double screenz1 = cos(yaw)*cos(pitch)*x1 + sin(yaw)*cos(pitch)*y1 + sin(pitch)*z1;
    printf(screenx1);
    printf(screeny1);
    printf(screenz1);
}
I can't see anything wrong with it in terms of syntax, and I also tried replacing all the doubles with floats.
This is stupid: after looking at this for the longest time, I commented out the printf lines and it worked. How am I supposed to check what these variables are equal to? Can someone tell me how to properly print things?
printf("value = %#g\n", 3.012);
prints 3.012 to console.
Printing to the console should be done in a thread-safe way, so your CL thread should be the same as the console-flushing thread.
Printing output from many cores can give unexpected results: timings are off, prints don't come back in any particular order, etc. Try to print from only one work item:
if (i == 0)
{
    printf(...);
}
You can also put a barrier above that and loop through multiple values from work item 0 if you need to.
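For example, a minimal sketch of that pattern, assuming a single work-group and a hypothetical results buffer (neither of which comes from the question):
// Sketch only: barrier() synchronizes within one work-group, so this assumes
// the kernel is launched with a single work-group.
__kernel void debug_dump(__global const double* results)
{
    const int lid = get_local_id(0);
    barrier(CLK_GLOBAL_MEM_FENCE); // wait until every work item has written its result
    if (lid == 0)
    {
        for (int j = 0; j < (int)get_local_size(0); ++j)
            printf("results[%d] = %f\n", j, results[j]);
    }
}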
I can't see anything wrong with it in terms of syntax.
Well, try looking harder, because that's not how you use printf.
If you want to print doubles use
printf("%f", value);
If this doesn't make sense, I would recommend reading the documentation for both generic C printf and OpenCL printf.
Also, since -11 is CL_BUILD_PROGRAM_FAILURE, you can use clGetProgramBuildInfo to retrieve the build log and check where the compilation went wrong.
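A minimal host-side sketch of that check, reusing the program and deviceids variables from the question (assumes <CL/cl.h>, <vector>, and <iostream> are available):
cl_int buildErr = clBuildProgram(program, deviceidcount, deviceids.data(), nullptr, nullptr, nullptr);
if (buildErr != CL_SUCCESS)
{
    // Query the size of the build log, then fetch and print it.
    size_t logSize = 0;
    clGetProgramBuildInfo(program, deviceids[0], CL_PROGRAM_BUILD_LOG, 0, nullptr, &logSize);
    std::vector<char> log(logSize);
    clGetProgramBuildInfo(program, deviceids[0], CL_PROGRAM_BUILD_LOG, logSize, log.data(), nullptr);
    std::cerr << "Build log:\n" << log.data() << std::endl;
}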

CUDA kernel causing "display driver not responding" with the addition of 4 lines

The basic problem was as follows:
When I run the below kernel with N threads and don't include the 4 lines to instantiate and populate the ScaledLLA variable, everything works fine.
When I run the below kernel with N threads and do include the 4 lines to instantiate and populate the ScaledLLA variable, the GPU locks up and Windows throws a "display driver not responding" error.
If I reduce the number of threads running by reducing the grid size, everything works fine.
I'm new to CUDA and have been incrementally building out some GIS functionality.
My host code looks like this at the kernel call:
MapperKernel << <g_CUDAControl->aGetGridSize(), g_CUDAControl->aGetBlockSize() >> >(g_Deltas.lat, g_Deltas.lon, 32.2,
g_DataReader->aGetMapper().aGetRPCBoundingBox()[0], g_DataReader->aGetMapper().aGetRPCBoundingBox()[1],
g_CUDAControl->aGetBlockSize().x,
g_CUDAControl->aGetThreadPitch(),
LLA_Offset,
LLA_ScaleFactor,
RPC_XN,RPC_XD,RPC_YN,RPC_YD,
Pixel_Offset, Pixel_ScaleFactor,
device_array);
cudaDeviceSynchronize(); //code crashes here
host_array = (point3D*)malloc(num_bytes);
cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);
The kernel that is being called looks like this:
__global__ void MapperKernel(double deltaLat, double deltaLon, double passedAlt,
                             double minLat, double minLon,
                             int threadsperblock,
                             int threadPitch,
                             point3D LLA_Offset,
                             point3D LLA_ScaleFactor,
                             double * RPC_XN, double * RPC_XD, double * RPC_YN, double * RPC_YD,
                             point2D pixelOffset, point2D pixelScaleFactor,
                             point3D * rValue)
{
    // calculate thread's LLA
    int latindex = threadIdx.x + blockIdx.x*threadsperblock;
    int lonindex = threadIdx.y + blockIdx.y*threadsperblock;

    point3D LLA;
    LLA.lat = ((double)(latindex))*deltaLat + minLat;
    LLA.lon = ((double)(lonindex))*deltaLon + minLon;
    LLA.alt = passedAlt;

    // scale the thread's LLA - adding these four lines is what causes the problem
    point3D ScaledLLA;
    ScaledLLA.lat = (LLA.lat - LLA_Offset.lat) * LLA_ScaleFactor.lat;
    ScaledLLA.lon = (LLA.lon - LLA_Offset.lon) * LLA_ScaleFactor.lon;
    ScaledLLA.alt = (LLA.alt - LLA_Offset.alt) * LLA_ScaleFactor.alt;

    rValue[lonindex*threadPitch + latindex] = ScaledLLA; // if I assign LLA without calculating ScaledLLA everything works fine
}
If I assign LLA to rValue then everything executes quickly and I get the expected behavior; however, when I add those four lines for ScaledLLA and try to assign it to rValue, CUDA takes too long for Windows's liking at the cudaDeviceSynchronize() call and I get a "display driver not responding" error that then proceeds to reset the GPU. From looking around, the error appears to be a Windows thing that occurs when Windows believes the GPU isn't being responsive. I am certain that the kernel is running and performing the right calculations, because I have stepped through it with the NSIGHT debugger.
Does anybody have a good explanation for why adding those four lines to the kernel would cause the execution time to spike?
I'm running Win7 and VS 2013, and have Nsight 4.5 installed.
For those who get here later via a search engine: it turns out the problem was the card running out of memory.
That should probably have been one of the top couple of things to think of, since the problem occurred only after the instantiation was added.
The card only had so much memory (~2GB) and my rValue buffer was taking up most (~1.5GB) of it. With every thread trying to instantiate its own point3D variable, the card simply ran out of memory.
For those interested, NSight's profiler said that it was a cudaUnknownError.
The fix was to lower the number of threads running the kernel.
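As a hedged sketch of how to see this coming, the free device memory can be queried before sizing the grid; num_bytes is the buffer size from the question, and the check itself is illustrative, not from the original code:
// Assumes <cuda_runtime.h> and <cstdio> are included.
size_t freeBytes = 0, totalBytes = 0;
cudaMemGetInfo(&freeBytes, &totalBytes);

if (num_bytes > freeBytes)
{
    printf("Not enough device memory: need %zu bytes, only %zu of %zu free\n",
           (size_t)num_bytes, freeBytes, totalBytes);
    // reduce the grid size / number of points before launching the kernel
}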

C++ beginner how to use GetSystemTimeAsFileTime

I have a program that reads the current time from the system clock and saves it to a text file. I previously used the GetSystemTime function, which worked, but the times weren't completely consistent, e.g. one of the times is 32567.789 and the next is 32567.780, which is backwards in time.
I am using this program to save the time up to 10 times a second. I read that the GetSystemTimeAsFileTime function is more accurate. My question is: how do I convert my current code to use the GetSystemTimeAsFileTime function? I tried to use the FileTimeToSystemTime function but that had the same problems.
SYSTEMTIME st;
GetSystemTime(&st);
WORD sec = (st.wHour*3600) + (st.wMinute*60) + st.wSecond; // convert to seconds in a day
lStr.Format( _T("%d %d.%d\n"), GetFrames(), sec, st.wMilliseconds);

std::wfstream myfile;
myfile.open("time.txt", std::ios::out | std::ios::in | std::ios::app );
if (myfile.is_open())
{
    myfile.write((LPCTSTR)lStr, lStr.GetLength());
    myfile.close();
}
else
{
    lStr.Format( _T("open file failed: %d"), WSAGetLastError());
}
Edit: To add some more info, the code captures an image from a camera which runs 10 times every second and saves the time the image was taken into a text file. When I subtract the 1st entry of the text file from the second, and so on (e.g. entry 2-1, 3-2, 4-3, etc.), I get this graph, where the x axis is the number of entries and the y axis is the subtracted values.
All of them should be around the 0.12 mark, which most of them are. However, you can see that a lot of them vary and some even go negative. This isn't due to the camera, because the camera has its own internal clock and that has no variations. It has something to do with capturing the system time. What I want is the most accurate method to extract the system time with the highest resolution and as little noise as possible.
Edit 2: I have taken on board your suggestions and ran the program again. This is the result:
As you can see it is a lot better than before, but it is still not right. I find it strange that it seems to do it very incrementally. I also just plotted the times, and this is the result, where x is the entry and y is the time:
Does anyone have any idea what could be causing the time to go out every 30 frames or so?
First of all, you want to get the FILETIME as follows:
FILETIME fileTime;
GetSystemTimeAsFileTime(&fileTime);
// Or for higher precision, use
// GetSystemTimePreciseAsFileTime(&fileTime);
According to FILETIME's documentation,
It is not recommended that you add and subtract values from the FILETIME structure to obtain relative times. Instead, you should copy the low- and high-order parts of the file time to a ULARGE_INTEGER structure, perform 64-bit arithmetic on the QuadPart member, and copy the LowPart and HighPart members into the FILETIME structure.
So, what you should do next is:
ULARGE_INTEGER theTime;
theTime.LowPart = fileTime.dwLowDateTime;
theTime.HighPart = fileTime.dwHighDateTime;
__int64 fileTime64Bit = theTime.QuadPart;
And that's it. The fileTime64Bit variable now contains the time you're looking for.
If you want to get a SYSTEMTIME object instead, you could just do the following:
SYSTEMTIME systemTime;
FileTimeToSystemTime(&fileTime, &systemTime);
Getting the system time out of Windows with decent accuracy is something that I've had fun with, too... I discovered that Javascript code running on Chrome seemed to produce more consistent timer results than I could with C++ code, so I went looking in the Chrome source. An interesting place to start is the comments at the top of time_win.cc in the Chrome source. The links given there to a Mozilla bug and a Dr. Dobb's article are also very interesting.
Based on the Mozilla and Chrome sources, and the above links, the code I generated for my own use is here. As you can see, it's a lot of code!
The basic idea is that getting the absolute current time is quite expensive. Windows does provide a high resolution timer that's cheap to access, but that only gives you a relative, not absolute time. What my code does is split the problem up into two parts:
1) Get the system time accurately. This is in CalibrateNow(). The basic technique is to call timeBeginPeriod(1) to get accurate times, then call GetSystemTimeAsFileTime() until the result changes, which means that the timeBeginPeriod() call has had an effect. This gives us an accurate system time, but is quite an expensive operation (and the timeBeginPeriod() call can affect other processes) so we don't want to do it each time we want a time. The code also calls QueryPerformanceCounter() to get the current high resolution timer value.
bool NeedCalibration = true;
LONGLONG CalibrationFreq = 0;
LONGLONG CalibrationCountBase = 0;
ULONGLONG CalibrationTimeBase = 0;

void CalibrateNow(void)
{
    // If the timer frequency is not known, try to get it
    if (CalibrationFreq == 0)
    {
        LARGE_INTEGER freq;
        if (::QueryPerformanceFrequency(&freq) == 0)
            CalibrationFreq = -1;
        else
            CalibrationFreq = freq.QuadPart;
    }
    if (CalibrationFreq > 0)
    {
        // Get the current system time, accurate to ~1ms
        FILETIME ft1, ft2;
        ::timeBeginPeriod(1);
        ::GetSystemTimeAsFileTime(&ft1);
        do
        {
            // Loop until the value changes, so that the timeBeginPeriod() call has had an effect
            ::GetSystemTimeAsFileTime(&ft2);
        }
        while (FileTimeToValue(ft1) == FileTimeToValue(ft2));
        ::timeEndPeriod(1);

        // Get the current timer value
        LARGE_INTEGER counter;
        ::QueryPerformanceCounter(&counter);

        // Save calibration values
        CalibrationCountBase = counter.QuadPart;
        CalibrationTimeBase = FileTimeToValue(ft2);
        NeedCalibration = false;
    }
}
2) When we want the current time, get the high-resolution timer by calling QueryPerformanceCounter(), and use the change in that timer since the last CalibrateNow() call to work out an accurate "now". This is in Now() in my code. It also periodically calls CalibrateNow() to ensure that the system time doesn't go backwards, or drift out.
FILETIME GetNow(void)
{
    for (int i = 0; i < 4; i++)
    {
        // Calibrate if needed, and give up if this fails
        if (NeedCalibration)
            CalibrateNow();
        if (NeedCalibration)
            break;

        // Get the current timer value and use it to compute now
        FILETIME ft;
        ::GetSystemTimeAsFileTime(&ft);
        LARGE_INTEGER counter;
        ::QueryPerformanceCounter(&counter);
        LONGLONG elapsed = ((counter.QuadPart - CalibrationCountBase) * 10000000) / CalibrationFreq;
        ULONGLONG now = CalibrationTimeBase + elapsed;

        // Don't let time go back
        static ULONGLONG lastNow = 0;
        now = max(now, lastNow);
        lastNow = now;

        // Check for clock skew
        if (LONGABS(FileTimeToValue(ft) - now) > 2 * GetTimeIncrement())
        {
            NeedCalibration = true;
            lastNow = 0;
        }
        if (!NeedCalibration)
            return ValueToFileTime(now);
    }

    // Calibration has failed to stabilize, so just use the system time
    FILETIME ft;
    ::GetSystemTimeAsFileTime(&ft);
    return ft;
}
It's all a bit hairy but works better than I had hoped. This also seems to work well as far back on Windows as I have tested (which was Windows XP).
I believe you are looking for the GetSystemTimePreciseAsFileTime() function, or even QueryPerformanceCounter(); in short, something that is guaranteed to produce monotonic values.
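For instance, a minimal sketch of a monotonic timestamp in seconds built on QueryPerformanceCounter() (standard usage, not code from the question):
#include <windows.h>

double monotonicSeconds()
{
    LARGE_INTEGER frequency, counter;
    QueryPerformanceFrequency(&frequency); // ticks per second, fixed at boot
    QueryPerformanceCounter(&counter);     // current tick count
    return static_cast<double>(counter.QuadPart) / static_cast<double>(frequency.QuadPart);
}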

How do I effectively use PortAudio Pa_OpenStream() on Windows?

I wrote a small sound-playing library with PortAudio on Linux. It's for a small game, so there are lots of little sounds when various things happen. I open up a stream for each wav file to play by calling Pa_OpenStream(). On Linux this call takes around 10ms on average. However, on Windows it typically takes 40 to 70ms. And worse, the first call takes 1.3 seconds, and after that it will occasionally take 1.3 seconds again. I haven't been able to find anything consistent about why it hangs, except that it happens on every first call. The Windows build actually runs fine on Wine.
I assume this has to do with differences in the underlying sound API in use on different systems. But oddly enough I haven't found any information anywhere, despite extensive searching.
Here's my play function:
int play(const char * sN)
{
    float threshold = .01f;
    char * soundName = (char*)sN;

    float g = glfwGetTime();
    updatePlayer();
    float g2 = glfwGetTime();
    if (g2-g > threshold) printf("updatePlayer: %f/", g2 - g);

    if (!paused && (int)streams.size() < maxStreams && !mute)
    {
        streamStr * ss = new streamStr;
        g = glfwGetTime();
        if (g-g2 > threshold) printf("new stream: %f/", g - g2);

        PaError err;
        sfData * sdata = getData(soundName);
        ss->sfd = sdata;
        g2 = glfwGetTime();
        if (g2-g > threshold) printf("getData: %f/", g2 - g);

        err = Pa_OpenStream(&(ss->stream), 0, &sdata->outputParameters, sdata->sfInfo.samplerate, paFramesPerBufferUnspecified, paNoFlag, PaCallback, ss);
        if (err)
        {
            printf("PortAudio error opening output: %s\n", Pa_GetErrorText(err));
            delete ss;
            return 1;
        }
        g = glfwGetTime();
        if (g-g2 > threshold) printf("Pa_OpenStream: %f/", g - g2);

        Pa_StartStream(ss->stream);
        g2 = glfwGetTime();
        if (g2-g > threshold) printf("Pa_StartStream: %f/", g2 - g);

        addStreams(ss);
        g = glfwGetTime();
        if (g-g2 > threshold) printf("addStreams: %f", g - g2);
        //Pa_SetStreamFinishedCallback(ss, finishedCallback);
        printf("\n");
    }
    return 0;
}
I don't know why it's taking that long (because I don't know Windows), but I can say you are going about this the wrong way. Specifically, you shouldn't make any timing expectations about opening a new stream. For example, I would expect similar issues (albeit to a much lesser degree) on OS X.
The correct implementation would be to always have a stream open, playing silence. Then, when you need to play a sound, you can play it right away. For best latency, you should pre-load the first few buffers from the file so you don't need to access the disk when playback starts. I don't know what the exact overhead is on Windows for opening a stream (I'm sure it depends on the API), but on some versions of OS X it's huge (the entire kernel switches into preemptive mode if no audio was running before).
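A rough sketch of that idea (MixerState and mixActiveSounds are hypothetical names; <algorithm> and portaudio.h are assumed):
// Hypothetical always-running callback: writes silence by default and mixes in
// whatever sounds are currently active.
static int mixerCallback(const void* /*input*/, void* output,
                         unsigned long frameCount,
                         const PaStreamCallbackTimeInfo* /*timeInfo*/,
                         PaStreamCallbackFlags /*statusFlags*/,
                         void* userData)
{
    MixerState* mixer = static_cast<MixerState*>(userData);   // hypothetical type
    float* out = static_cast<float*>(output);
    std::fill(out, out + frameCount * mixer->channels, 0.0f); // silence when idle
    mixer->mixActiveSounds(out, frameCount);                  // hypothetical helper
    return paContinue;
}
The stream is opened once with this callback at startup and left running; play() then just hands a pre-loaded sound to the mixer instead of calling Pa_OpenStream() each time.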
That said, 1.3 seconds is insane. I recommend asking on the mailing list. Be sure to say which host API you are using, because you didn't say that here and, for Windows, it matters. Also, which version of Windows.
To minimise startup latency for this use-case (i.e. expecting StartStream() to give minimum startup latency) you should use the paPrimeOutputBuffersUsingStreamCallback stream flag. Otherwise the initial buffers will be zero and the time it takes for the sound to hit the DACs will include playing out the buffer length of zeros (which would be around 80ms on Windows WMME or DirectSound with the default PA settings).
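With the play() code from the question, that would mean changing only the flags argument of the open call, roughly like this:
err = Pa_OpenStream(&(ss->stream), 0, &sdata->outputParameters,
                    sdata->sfInfo.samplerate,
                    paFramesPerBufferUnspecified,
                    paPrimeOutputBuffersUsingStreamCallback, // was paNoFlag
                    PaCallback, ss);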

Windows: How do I calculate the time it takes a C/C++ application to run?

I am doing a performance comparison test. I want to record the run time for my C++ test application and compare it under different circumstances. The two cases to be compared are: 1) a file system driver is installed and active, and 2) that same file system driver is not installed and active.
A series of tests will be conducted on several operating systems, and the two runs described above will be done for each operating system and its setup. Results will only be compared between the two cases for a given operating system and setup.
I understand that when running a C/C++ application within an operating system that is not a real-time system, there is no way to get the real time it took for the application to run. I don't think this is a big concern as long as the test application runs for a fairly long period of time, thus making the scheduling, priorities, switching, etc. of the CPU negligible.
Edited: For the Windows platform only.
How can I generate some accurate application run time results within my test application?
If you're on a POSIX system you can use the time command, which will give you the total "wall clock" time as well as the actual CPU times (user and system).
Edit: Apparently there's an equivalent for Windows systems in the Windows Server 2003 Resource Kit called timeit.exe (not verified).
I think what you are asking is "How do I measure the time it takes for the process to run, irrespective of the 'external' factors, such as other programs running on the system?" In that case, the easiest thing would be to run the program multiple times, and get an average time. This way you can have a more meaningful comparison, hoping that various random things that the OS spends the CPU time on will average out. If you want to get real fancy, you can use a statistical test, such as the two-sample t-test, to see if the difference in your average timings is actually significant.
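A minimal sketch of the averaging idea, timing a hypothetical runOnce() workload in-process with std::chrono (timing the whole executable externally and averaging works the same way):
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

void runOnce(); // hypothetical: the workload being measured

int main()
{
    const int runs = 20;
    std::vector<double> samples;
    for (int i = 0; i < runs; ++i)
    {
        auto start = std::chrono::steady_clock::now();
        runOnce();
        auto end = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(end - start).count());
    }
    double mean = std::accumulate(samples.begin(), samples.end(), 0.0) / runs;
    std::cout << "mean run time over " << runs << " runs: " << mean << " s\n";
    return 0;
}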
You can put this
#if _DEBUG
time_t start = time(NULL);
#endif
and finish with this
#if _DEBUG
time_t end = time(NULL);
#endif
in your int main() method. Naturally you'll have to write the difference out to a log or cout it.
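For instance, a minimal sketch of that last step (difftime is standard C, from <ctime>; <iostream> is assumed for std::cout):
#if _DEBUG
std::cout << "elapsed: " << difftime(end, start) << " seconds\n";
#endif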
Just to expand on ezod's answer: you run the program with the time command to get the total time; there are no changes to your program required.
If you're on a Windows system you can use the high-performance counters by calling QueryPerformanceCounter():
#include <windows.h>
#include <string>
#include <iostream>
int main()
{
LARGE_INTEGER li = {0}, li2 = {0};
QueryPerformanceFrequency(&li);
__int64 freq = li.QuadPart;
QueryPerformanceCounter(&li);
// run your app here...
QueryPerformanceCounter(&li2);
__int64 ticks = li2.QuadPart-li.QuadPart;
cout << "Reference Implementation Ran In " << ticks << " ticks" << " (" << format_elapsed((double)ticks/(double)freq) << ")" << endl;
return 0;
}
...and just as a bonus, here's a function that converts the elapsed time (in seconds, floating point) to a descriptive string:
#include <cmath>   // floor, fmod
#include <cstdio>  // sprintf
#include <string>

std::string format_elapsed(double d)
{
    char buf[256] = {0};
    if( d < 0.00000001 )
    {
        // show in ps with 4 digits
        sprintf(buf, "%0.4f ps", d * 1000000000000.0);
    }
    else if( d < 0.00001 )
    {
        // show in ns
        sprintf(buf, "%0.0f ns", d * 1000000000.0);
    }
    else if( d < 0.001 )
    {
        // show in us
        sprintf(buf, "%0.0f us", d * 1000000.0);
    }
    else if( d < 0.1 )
    {
        // show in ms
        sprintf(buf, "%0.0f ms", d * 1000.0);
    }
    else if( d <= 60.0 )
    {
        // show in seconds
        sprintf(buf, "%0.2f s", d);
    }
    else if( d < 3600.0 )
    {
        // show in min:sec
        sprintf(buf, "%01.0f:%02.2f", floor(d/60.0), fmod(d,60.0));
    }
    else
    {
        // show in h:min:sec
        sprintf(buf, "%01.0f:%02.0f:%02.2f", floor(d/3600.0), floor(fmod(d,3600.0)/60.0), fmod(d,60.0));
    }
    return buf;
}
Download Cygwin and run your program by passing it as an argument to the time command. When you're done, spend some time learning the rest of the Unix tools that come with Cygwin. This will be one of the best investments for your career you'll ever make; the Unix tool chest is a timeless classic.
QueryPerformanceCounter() can have problems on multicore systems, so I prefer to use timeGetTime(), which gives the result in milliseconds.
You need a timeBeginPeriod(1) call before and timeEndPeriod(1) afterwards to reduce the granularity as far as you can, but I find it works nicely for my purposes (regulating timesteps in games), so it should be okay for benchmarking.
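A minimal sketch of that approach (timeGetTime and timeBeginPeriod come from winmm, so link winmm.lib):
#include <windows.h>
#include <mmsystem.h> // timeGetTime, timeBeginPeriod; link with winmm.lib
#include <iostream>

int main()
{
    timeBeginPeriod(1);           // request 1 ms timer granularity
    DWORD start = timeGetTime();

    // ... the work being measured ...

    DWORD elapsedMs = timeGetTime() - start;
    timeEndPeriod(1);
    std::cout << "elapsed: " << elapsedMs << " ms\n";
    return 0;
}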
You can also use the program Very Sleepy to get a bunch of runtime information about your program. Here's a link: http://www.codersnotes.com/sleepy