Problem with using printf in an OpenCL kernel - C++

I use OpenCL 2.0 on AMD. The code is pretty simple. With one printf everything works fine, but as soon as I add a second printf, the output is garbled.
My Code in host C++:
cl_int errcode;
// Get available platforms
vector<Platform> platforms;
Platform::get(&platforms);
// Select the default platform and create a context using this platform and the GPU
cl_context_properties cps[3] = {
    CL_CONTEXT_PLATFORM,
    (cl_context_properties)(platforms[0])(),
    0
};
Context context(CL_DEVICE_TYPE_GPU, cps);
vector<Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
CommandQueue queue = CommandQueue(context, devices[0]);
// Read source file
string name;
name += "CalcN.cl";
std::ifstream sourceFile(name);
std::string sourceCode(
    std::istreambuf_iterator<char>(sourceFile),
    (std::istreambuf_iterator<char>()));
Program::Sources source(1, std::make_pair(sourceCode.c_str(), sourceCode.length() + 1));
Program program = Program(context, source);
errcode = program.build(devices);
if (errcode != CL_SUCCESS)
{
    cout << "There was an error while building the kernel code. Please check the program code. Errcode = " << errcode << "\n";
    cout << "BUILD LOG: " + program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices[0]) + "\n";
    getchar();
}
// Make kernel
Kernel kernel(program, "Optimization");
NDRange global(1);
queue.enqueueNDRangeKernel(kernel, 0, global);
My kernel code:
__kernel void Optimization()
{
    for(int i = 0; i < 100; i++)
    {
        printf("%d", i);
        printf("%d", i);
    }
}
Console output with one printf: [screenshot]
And console output with two printfs: [screenshot]
I’ve already asked about this problem more than once, but no one has had an answer.

Your output prints a new line after each printf even though there is no \n in your code. My system wouldn't do that; it would print 112233... on one line.
You could try printf("%i\n",i); instead.
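If the two separate calls are what trigger the garbling, a workaround worth trying (my suggestion, not from the thread) is to combine them into a single printf per iteration, so the runtime only has one record to flush:

__kernel void Optimization()
{
    for (int i = 0; i < 100; i++)
        printf("%d %d\n", i, i); // one call instead of two
}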

The problem was with the video card drivers. Today they released an update that fixes this bug.

Just use setbuf(stdout, NULL);. Put it right after your declarations.
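For completeness, a minimal host-side sketch of that suggestion (the placement is my reading of "under the declarations", and whether it affects how the runtime forwards kernel printf output is runtime-dependent). Waiting on the queue before exiting is also worth adding, so the kernel's printf buffer is actually flushed:

setbuf(stdout, NULL); // unbuffered stdout, first thing in main()
// ... OpenCL setup and kernel enqueue as in the question ...
queue.enqueueNDRangeKernel(kernel, 0, global);
queue.finish(); // block until the kernel completes so its printf output is flushed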

Related

Can't figure out how to call IOCTL_STORAGE_MANAGE_DATA_SET_ATTRIBUTES IOCTL on Windows - INVALID_PARAMETERS

Greetings!
I have come today to ask a question about invoking a very specific IOCTL on Windows. I have some amount of driver development experience, but my experience with file system drivers is relatively limited.
The Goal
I am developing a tool that manages volumes/physical disks/partitions. For that purpose, I am learning to invoke many of the Windows file system data set management (DSM) IOCTLs. Currently I am learning how to use IOCTL_STORAGE_MANAGE_DATA_SET_ATTRIBUTES, which is documented at https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-ioctl_storage_manage_data_set_attributes?redirectedfrom=MSDN.
However, I have had to intuit how to set up the call to the IOCTL myself. The MSDN article does not give fully detailed instructions on how to set up the input buffer, and specifically which input values are strictly required. My uncertainty about how to call the IOCTL has led to a bug I cannot easily debug.
In order to reduce my uncertainty about proper invocation of the IOCTL I worked off a tool MS released a few years ago and copied some of their code: https://github.com/microsoft/StorScore/blob/7cbe261a7cad74f3a4f758c2b8a35ca552ba8dde/src/StorageTool/src/_backup.c
My Code
At first I tried:
#include <windows.h>
#include <stdio.h>
#include <string>
#include <iostream>
#include <winnt.h>
#include <winternl.h>
#include <ntddstor.h>

int main(int argc, const char* argv[]) {
    // My understanding is for this IOCTL I need to open the drive, not the PartmgrControl device object that the driver registers.
    HANDLE hDevice = CreateFile(L"\\\\.\\Physicaldrive0",
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_NO_BUFFERING,
        NULL);
    int cf_error = 0;
    cf_error = GetLastError();
    if (hDevice == INVALID_HANDLE_VALUE) {
        std::cout << "COULDN'T GET HANDLE";
        return -1;
    }
    std::cout << "Device Handle error: " << cf_error << "\n";
    std::cout << "Handle value: " << hDevice << "\n";

    _DEVICE_MANAGE_DATA_SET_ATTRIBUTES attributes_struct;
    LPDWORD BytesReturned = 0;
    int inputbufferlength = 0;
    inputbufferlength = sizeof(DEVICE_MANAGE_DATA_SET_ATTRIBUTES) + sizeof(_DEVICE_DSM_OFFLOAD_WRITE_PARAMETERS) + sizeof(DEVICE_DATA_SET_RANGE);
    PUCHAR inputbuffer = (PUCHAR)malloc(inputbufferlength);
    PUCHAR outputbuffer = (PUCHAR)malloc(inputbufferlength);
    //RtlZeroMemory(inputbuffer, inputBufferLength);

    PDEVICE_MANAGE_DATA_SET_ATTRIBUTES dsmAttributes = (PDEVICE_MANAGE_DATA_SET_ATTRIBUTES)inputbuffer;
    PDEVICE_DSM_OFFLOAD_WRITE_PARAMETERS offload_write_parameters = NULL;
    dsmAttributes->Size = sizeof(DEVICE_MANAGE_DATA_SET_ATTRIBUTES);
    dsmAttributes->Action = DeviceDsmAction_OffloadWrite;
    dsmAttributes->Flags = 0;
    dsmAttributes->ParameterBlockOffset = sizeof(DEVICE_MANAGE_DATA_SET_ATTRIBUTES);
    dsmAttributes->ParameterBlockLength = sizeof(DEVICE_DSM_OFFLOAD_WRITE_PARAMETERS);

    offload_write_parameters = (PDEVICE_DSM_OFFLOAD_WRITE_PARAMETERS)((PUCHAR)dsmAttributes + dsmAttributes->ParameterBlockOffset);
    offload_write_parameters->Flags = 0;
    offload_write_parameters->TokenOffset = 0;

    dsmAttributes->DataSetRangesOffset = dsmAttributes->ParameterBlockOffset + dsmAttributes->ParameterBlockLength;
    dsmAttributes->DataSetRangesLength = sizeof(DEVICE_DATA_SET_RANGE);

    PDEVICE_DATA_SET_RANGE lbaRange = NULL;
    lbaRange = (PDEVICE_DATA_SET_RANGE)((PUCHAR)dsmAttributes + dsmAttributes->DataSetRangesOffset);
    lbaRange->StartingOffset = 0; // not sure about this one for now
    lbaRange->LengthInBytes = 256 * 1024 * 1024;

    int status = DeviceIoControl(
        hDevice,                                   // handle to device
        IOCTL_STORAGE_MANAGE_DATA_SET_ATTRIBUTES,  // dwIoControlCode
        inputbuffer,                               // input buffer
        inputbufferlength,                         // size of the input buffer
        outputbuffer,                              // output buffer
        inputbufferlength,                         // size of the input buffer - modified to be too small!
        BytesReturned,                             // number of bytes returned
        0 //(LPOVERLAPPED) &overlapped_struct      // OVERLAPPED structure
    );

    DWORD error_num = GetLastError();
    CloseHandle(hDevice);
    std::cout << "STATUS IS: " << status << "\n";
    std::cout << "ERROR IS: " << error_num;
    return 0;
}
But this returned error 87, ERROR_INVALID_PARAMETER, when I attempted the call.
My instinct was to debug the IOCTL by placing a breakpoint on partmgr!PartitionIoctlDsm - I was under the impression that the targeted IOCTL handler was throwing the error. However, my breakpoint was never hit. So I moved on to placing a breakpoint on the IOCTL dispatch routine itself:
bp partmgr!PartitionDeviceControl
But that breakpoint is never hit either. So something before my driver is throwing the error.
The Question(s)
How should I go about debugging this? How do I figure out which driver is throwing the error?
What is the correct way to invoke this IOCTL without triggering errors?
Why am I getting this error?
Additional information
To be absolutely clear, I am dead set on using this particular IOCTL. This is a learning exercise, and I am not interested in using alternative or easier-to-use functionality to achieve the same effect. My curiosity lies in figuring out why the I/O manager won't let me call the function.
I am running this code as admin.
I am running this code in a virtual machine.
I am debugging with WinDbg Preview over a COM port.
Through some sleuthing I believe this is a filter driver, and that other drivers can intercept and handle this request.
Let me know if there is any other information I can provide.
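One detail worth checking, offered as an observation rather than something from the thread: DeviceIoControl is documented to require a non-NULL lpBytesReturned whenever lpOverlapped is NULL, and the code above passes a null LPDWORD. A parameter check like that fails in the I/O manager before the IRP ever reaches partmgr, which would also fit the breakpoints never being hit. A minimal sketch of the corrected call:

DWORD bytesReturned = 0; // a real DWORD, not a null LPDWORD
int status = DeviceIoControl(
    hDevice,
    IOCTL_STORAGE_MANAGE_DATA_SET_ATTRIBUTES,
    inputbuffer, inputbufferlength,
    outputbuffer, inputbufferlength,
    &bytesReturned, // must be non-NULL for synchronous (non-OVERLAPPED) calls
    NULL);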

'pcap_loop' is not recording packets and isn't even running

I'm trying to do some simple packet capturing with pcap, so I've created a handle to listen on eth0. My issue is with the pcap_loop(handle, 10, myCallback, NULL); line near the end of my code.
The expected output is supposed to be:
eth0
Activated!
1
2
3
...
10
Done processing packets!
Current output is missing the increments:
eth0
Activated!
Done processing packets!
Currently it just skips right through to "Done processing packets!" and I have no idea why. Even if it never reaches the callback, it should still be waiting on packets, since the 'count' parameter (see the documentation for pcap_loop) is set to 10.
#include <iostream>
#include <pcap.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <arpa/inet.h>

void myCallback(u_char *useless, const struct pcap_pkthdr* hdr, const u_char* packet){
    static int count = 1;
    std::cout << count << std::endl;
    count++;
}

int main(){
    char errbuf[PCAP_ERRBUF_SIZE];
    char* devName;
    char* net;
    char* mask;
    const u_char* packet;
    struct in_addr addr;
    struct pcap_pkthdr hdr;
    bpf_u_int32 netp;
    bpf_u_int32 maskp;
    pcap_if_t *devs;

    pcap_findalldevs(&devs, errbuf);
    devName = pcap_lookupdev(errbuf);
    std::cout << devName << std::endl;

    int success = pcap_lookupnet(devName, &netp, &maskp, errbuf);
    if(success < 0){
        exit(EXIT_FAILURE);
    }
    pcap_freealldevs(devs);

    //Create a handle
    pcap_t *handle = pcap_create(devName, errbuf);
    pcap_set_promisc(handle, 1);
    pcap_can_set_rfmon(handle);

    //Activate the handle
    if(pcap_activate(handle)){
        std::cout << "Activated!" << std::endl;
    }
    else{
        exit(EXIT_FAILURE);
    }

    pcap_loop(handle, 10, myCallback, NULL);
    std::cout << "Done processing packets!" << std::endl;

    //close handle
    pcap_close(handle);
}
pcap_findalldevs(&devs, errbuf);
That call isn't doing anything useful, as you're not doing anything with devs other than freeing it. (You also aren't checking whether it succeeds or fails.) You might as well remove it unless you need to know all the devices on which you can capture.
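If you do keep the call, a minimal sketch of checking it (my addition, not part of the original answer):

if (pcap_findalldevs(&devs, errbuf) == -1) {
    std::cerr << "pcap_findalldevs failed: " << errbuf << std::endl;
    exit(EXIT_FAILURE);
}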
pcap_can_set_rfmon(handle);
That call isn't doing anything useful, as you're not checking its return value. If you are capturing on a Wi-Fi device and want to capture in monitor mode, call pcap_set_rfmon() - not pcap_can_set_rfmon() - on the handle after creating it and before activating it.
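A minimal sketch of that ordering, assuming a Wi-Fi device on which you actually want monitor mode:

pcap_t *handle = pcap_create(devName, errbuf);
pcap_set_promisc(handle, 1);
if (pcap_can_set_rfmon(handle) == 1) // 1 means monitor mode can be set on this handle
    pcap_set_rfmon(handle, 1);       // must happen before pcap_activate()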
//Activate the handle
if(pcap_activate(handle)){
std::cout <<"Activated!"<<std::endl;
}
else{
exit(EXIT_FAILURE);
}
To quote the pcap_activate() man page:
RETURN VALUE
pcap_activate() returns 0 on success without warnings, PCAP_WARNING_PROMISC_NOTSUP on success on a device that doesn't support promiscuous mode if promiscuous mode was requested, PCAP_WARNING on success with any other warning, PCAP_ERROR_ACTIVATED if the handle has already been activated, PCAP_ERROR_NO_SUCH_DEVICE if the capture source specified when the handle was created doesn't exist, PCAP_ERROR_PERM_DENIED if the process doesn't have permission to open the capture source, PCAP_ERROR_RFMON_NOTSUP if monitor mode was specified but the capture source doesn't support monitor mode, PCAP_ERROR_IFACE_NOT_UP if the capture source is not up, and PCAP_ERROR if another error occurred. If PCAP_WARNING or PCAP_ERROR is returned, pcap_geterr() or pcap_perror() may be called with p as an argument to fetch or display a message describing the warning or error. If PCAP_WARNING_PROMISC_NOTSUP, PCAP_ERROR_NO_SUCH_DEVICE, or PCAP_ERROR_PERM_DENIED is returned, pcap_geterr() or pcap_perror() may be called with p as an argument to fetch or display a message giving additional details about the problem that might be useful for debugging the problem if it's unexpected.
This means that the code above is 100% wrong - if pcap_activate() returns a non-zero value, it may have failed, and if it returns 0, it succeeded.
If the return value is negative, it's an error value, and it has failed. If it's non-zero but positive, it's a warning value; it has succeeded, but, for example, it might not have turned promiscuous mode on, as the OS or device might not let promiscuous mode be set.
So what you want is, instead:
//Activate the handle
int status;
status = pcap_activate(handle);
if(status >= 0){
    if(status == PCAP_WARNING){
        // warning; details available from the handle
        std::cout << "Activated, with warning: " << pcap_geterr(handle) << std::endl;
    }
    else if (status != 0){
        // some other warning
        std::cout << "Activated, with warning: " << pcap_statustostr(status) << std::endl;
    }
    else{
        // no warning
        std::cout << "Activated!" << std::endl;
    }
}
else{
    if(status == PCAP_ERROR){
        std::cout << "Failed to activate: " << pcap_geterr(handle) << std::endl;
    }
    else{
        std::cout << "Failed to activate: " << pcap_statustostr(status) << std::endl;
    }
    exit(EXIT_FAILURE);
}
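It is also worth checking what pcap_loop() itself returns (my addition, not part of the original answer); a negative value explains why the callback never ran:

int ret = pcap_loop(handle, 10, myCallback, NULL);
if (ret < 0)
    std::cerr << "pcap_loop failed: " << pcap_geterr(handle) << std::endl;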

clCreateContextFromType ends up in a SEGFAULT while execution

I am trying to create an OpenCL context on the platform that contains my graphics card, but when I call clCreateContextFromType() a SEGFAULT is thrown.
int main(int argc, char** argv)
{
    /*
    ...
    */
    cl_platform_id* someValidPlatformId;
    //creating heap space using malloc to store all platform ids
    getCLPlatforms(someValidPlatformId);
    //error handling for getCLPlatforms()

    //OCLPlatform(cl_platform_id platform)
    OCLPlatform platform = OCLPlatform(someValidPlatformId[0]);
    //OCLContext::OCL_GPU_DEVICE == CL_DEVICE_TYPE_GPU
    OCLContext context = OCLContext(platform, OCLContext::OCL_GPU_DEVICE);
    /*
    ...
    */
}

cl_platform_id* getCLPlatforms(cl_platform_id* platforms)
{
    cl_int errNum;
    cl_uint numPlatforms;
    numPlatforms = (cl_uint) getCLPlatformsCount(); //returns the platform count
                                                    //using clGetPlatformIDs()
                                                    //as described in the Khronos API
    if(numPlatforms == 0)
        return NULL;
    errNum = clGetPlatformIDs(numPlatforms, platforms, NULL);
    if(errNum != CL_SUCCESS)
        return NULL;
    return platforms;
}

OCLContext::OCLContext(OCLPlatform platform, unsigned int type)
{
    this->initialize(platform, type);
}

void OCLContext::initialize(OCLPlatform platform, unsigned int type)
{
    cl_int errNum;
    cl_context_properties contextProperties[] =
    {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)platform.getPlatformId(),
        0
    };
    cout << "a" << endl; std::flush(cout);

    this->context = clCreateContextFromType(contextProperties,
                                            (cl_device_type)type,
                                            &pfn_notify,
                                            NULL, &errNum);
    if(errNum != CL_SUCCESS)
        throw OCLContextException();
    cout << "b" << endl; std::flush(cout);
    /*
    ...
    */
}
The given type is CL_DEVICE_TYPE_GPU and also the platform contained by the cl_context_properties array is valid.
To debug the error I implemented the following pfn_notify() function, as described by the Khronos API:
static void pfn_notify(const char* errinfo,
                       const void* private_info,
                       size_t cb, void* user_data)
{
    fprintf(stderr, "OpenCL Error (via pfn_notify): %s\n", errinfo);
    std::flush(cout);
}
Here is the output shown by the shell:
$ ./OpenCLFramework.exe
a
Segmentation fault
The machine I am working with has the following properties:
Intel Core i5 2500 CPU
NVIDIA Geforce 210 GPU
OS: Windows 7
AMD APP SDK 3.0 Beta
IDE: Eclipse with gdb
It would be great if somebody knew an answer to this problem.
The problem seems to be solved now.
Injecting a valid cl_platform_id through gdb fixed the SEGFAULT. So I dug a little deeper, and the cause of the error was that I had stored the value as a standard primitive type. When I called a function with this value cast to cl_platform_id, some functions failed to handle it. So it looks like it was a mix-up of types that led to this failure.
Now I store the value as a cl_platform_id and cast it to a primitive when needed, not vice versa.
Thank you for your answers, and I apologize for the long radio silence on my part.
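A minimal sketch of what that amounts to (the class layout is hypothetical; only the typing discipline is the point):

class OCLPlatform
{
    cl_platform_id id; // keep the opaque handle type; don't store it as an int-like primitive
public:
    explicit OCLPlatform(cl_platform_id id) : id(id) {}
    cl_platform_id getPlatformId() const { return id; } // convert outward only when needed
};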

DeviceIoControl, passing an int to driver

Driver:
PIO_STACK_LOCATION pIoStackLocation = IoGetCurrentIrpStackLocation(pIrp);
PVOID pBuf = pIrp->AssociatedIrp.SystemBuffer;
switch (pIoStackLocation->Parameters.DeviceIoControl.IoControlCode)
{
case IOCTL_TEST:
DbgPrint("IOCTL IOCTL_TEST.");
DbgPrint("int received : %i", pBuf);
break;
}
User-space App:
int test = 123;
int outputBuffer;
DeviceIoControl(hDevice, IOCTL_SET_PROCESS, &test, sizeof(test), &outputBuffer, sizeof(outputBuffer), &dwBytesRead, NULL);
std::cout << "Output reads as : " << outputBuffer << std::endl;
The user-space application prints the correct value received back through the output buffer, but in the debug view the value printed seems to be garbage (e.g. "int received : 169642096").
What am I doing wrong?
As the previous user said, you are printing the address of the variable, not its contents.
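A minimal sketch of the corrected handler, assuming buffered I/O as in the code above (the length check is my addition):

case IOCTL_TEST:
    DbgPrint("IOCTL IOCTL_TEST.");
    if (pIoStackLocation->Parameters.DeviceIoControl.InputBufferLength >= sizeof(int))
    {
        int received = *(int*)pBuf; // dereference the system buffer to get the value
        DbgPrint("int received : %i", received);
    }
    break;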
I strongly suggest you take a look at the following driver development tutorials:
http://www.opferman.com/Tutorials/

How do I perform a 'simplified experiment' with Nvidia's Performance Toolkit?

I am trying to use Nvidia's Performance Toolkit to identify the performance bottleneck in an OpenGL application. Based on the user guide and the samples provided, I have arrived at this code:
// ********************************************************
// Set up NVPMAPI
#define NVPM_INITGUID
#include "NvPmApi.Manager.h"

// Simple singleton implementation for grabbing the NvPmApi
static NvPmApiManager S_NVPMManager;
NvPmApiManager *GetNvPmApiManager() { return &S_NVPMManager; }
const NvPmApi* getNvPmApi() { return S_NVPMManager.Api(); }

void MyApp::profiledRender()
{
    NVPMRESULT nvResult;
    nvResult = GetNvPmApiManager()->Construct(L"C:\\Program Files\\PerfKit_4.1.0.14260\\bin\\win7_x64\\NvPmApi.Core.dll");
    if (nvResult != S_OK)
    {
        return; // This is an error condition
    }

    auto api = getNvPmApi();
    nvResult = api->Init();
    if ((nvResult) != NVPM_OK)
    {
        return; // This is an error condition
    }

    NVPMContext context;
    nvResult = api->CreateContextFromOGLContext((uint64_t)::wglGetCurrentContext(), &context);
    if (nvResult != NVPM_OK)
    {
        return; // This is an error condition
    }

    api->AddCounterByName(context, "GPU Bottleneck");
    NVPMUINT nCount(1);
    api->BeginExperiment(context, &nCount);
    for (NVPMUINT i = 0; i < nCount; i++) {
        api->BeginPass(context, i);
        render();
        glFinish();
        api->EndPass(context, i);
    }
    api->EndExperiment(context);

    NVPMUINT64 bottleneckUnitId(42424242);
    NVPMUINT64 bottleneckCycles(42424242);
    api->GetCounterValueByName(context, "GPU Bottleneck", 0, &bottleneckUnitId, &bottleneckCycles);

    char name[256] = { 0 };
    NVPMUINT length = 0;
    api->GetCounterName(bottleneckUnitId, name, &length);

    NVPMUINT64 counterValue(42424242), counterCycles(42424242);
    api->GetCounterValue(context, bottleneckUnitId, 0, &counterValue, &counterCycles);

    std::cout << "--- NVIDIA Performance Kit GPU profile ---\n"
        "bottleneckUnitId: " << bottleneckUnitId
        << ", bottleneckCycles: " << bottleneckCycles
        << ", unit name: " << name
        << ", unit value: " << counterValue
        << ", unit cycles: " << counterCycles
        << std::endl;
}
However, the printed output shows that all of my integer values have been left unmodified:
--- NVIDIA Performance Kit GPU profile ---
bottleneckUnitId: 42424242, bottleneckCycles: 42424242, unit name: , unit value:
42424242, unit cycles: 42424242
I am in a valid GL context when calling profiledRender, and while the cast in api->CreateContextFromOGLContext((uint64_t)::wglGetCurrentContext(), &context); looks a tiny bit dodgy, it does return an OK result (whereas passing 0 for the context returns a not-OK result and passing a random number causes an access violation).
This is built against Cinder 0.8.6 running in x64 on Windows 8.1, with OpenGL 4.4 on a GeForce GT 750M.
OK, some more persistent analysis of the API return codes and further examination of the manual revealed the problems.
The render call needs to be wrapped in api->BeginObject(context, 0); and api->EndObject(context, 0);. That gives us a bottleneckUnitId.
It appears that the length pointer passed to GetCounterName both indicates the char array size as input and receives the string length as output. This is kind of obvious on reflection, but it is a mistake copied from the user guide example. Fixing it gives us the name of the bottleneck.
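Putting both fixes together, a hedged sketch of the corrected experiment loop (names and the object index 0 are carried over from the question's code):

NVPMUINT nCount = 1;
api->BeginExperiment(context, &nCount);
for (NVPMUINT i = 0; i < nCount; i++) {
    api->BeginPass(context, i);
    api->BeginObject(context, 0); // wrap the draw calls in an object
    render();
    glFinish();
    api->EndObject(context, 0);
    api->EndPass(context, i);
}
api->EndExperiment(context);

char name[256] = { 0 };
NVPMUINT length = sizeof(name); // in: buffer size; out: string length
api->GetCounterName(bottleneckUnitId, name, &length);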