OpenCL copy character from global to local memory - c++

I'm implementing SHA-512 in OpenCL. I have a simple kernel function definition:
__kernel void _sha512(__global char *message, const uint length, __global char *hash);
On the host I have implemented and successfully tested the SHA-512 algorithm.
I have a problem copying data from the message array into a temporary variable called character:
char character = message[i];
where i is a loop variable ranging from 0 to the message's size.
When I try to run my program I get these errors:
0x00007FFD9FA03D54 (0x0000000010CD0F88 0x0000000010CD0F88 0x0000000010BAEE88 0x000000001A2942A0), nvvmCompilerProperty() + 0x26174 bytes(s)
...
0x00007FFDDFA70D51 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), RtlUserThreadStart() + 0x21 bytes(s)
I read about async_work_group_copy() but I can't understand how to use it; I can't find any example code in the docs.
I have tried char character = (__private char) message[i]; but that doesn't work either.
I don't understand how to pass the last parameter to async_work_group_copy(), or how to use it to copy data from __global memory into __private memory.

OpenCL by default does not allow single-byte access in kernels: memory access needs to be in multiples of 4 bytes, aligned to 4-byte boundaries. If your implementation supports it, you can enable byte-wise memory accesses. This involves the cl_khr_byte_addressable_store extension, which you need to check for and explicitly enable in your kernel source. Give that a try and see if it solves your problem.
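For example, near the top of your kernel source you would check for and enable it like this (a minimal sketch; the #ifdef guard just makes the dependency explicit):
#ifdef cl_khr_byte_addressable_store
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#endif
// With the extension enabled, byte-wise accesses such as
// char character = message[i]; are permitted in the kernel.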
To use async_work_group_copy, try something like this:
#define LOCAL_MESSAGE_SIZE 64 // or some other suitable size for your workgroup
__local char local_message[LOCAL_MESSAGE_SIZE];
event_t local_message_ready = async_work_group_copy(local_message, message, LOCAL_MESSAGE_SIZE, 0);
// ...
// Just before you need to use local_message's content:
wait_group_events(1, &local_message_ready);
// Use local_message from here onwards
Note that async_work_group_copy is not required; you can access global memory directly. Which will be faster depends on your kernel, OpenCL implementation, and hardware.
Another option (the only option if your implementation/hardware do not support cl_khr_byte_addressable_store) is to fetch your data in chunks of at least 4 bytes. Declare your message as a __global uint* and unpack the bytes by shifting and masking:
uint word = message[i];
char byte0 = (word & 0xff);
char byte1 = ((word >> 8) & 0xff);
char byte2 = ((word >> 16) & 0xff);
char byte3 = ((word >> 24) & 0xff);
// use byte0..byte3 in your algorithm
Depending on implementation, hardware, etc. you may find this to be faster than bytewise access. (You may want to check if you need to reverse the unpacking by reading the CL_DEVICE_ENDIAN_LITTLE property using clGetDeviceInfo if you're not sure if all your deployment platforms will be little-endian.)
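As a rough sketch of that host-side check (assuming you already have a cl_device_id named device):
cl_bool little_endian = CL_FALSE;
clGetDeviceInfo(device, CL_DEVICE_ENDIAN_LITTLE, sizeof(little_endian), &little_endian, NULL);
// If little_endian is CL_FALSE, the bytes unpack in the opposite order,
// so swap the shift amounts (or the resulting bytes) before feeding them to SHA-512.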

Related

How to work with 8-bit char data in HLSL?

I'm converting some OpenCL code to DirectCompute and need to process 8-bit character strings in a compute shader, but I can't find an HLSL data type for "byte" or "char". OpenCL supports a "char" type, so I was expecting an equivalent. What is the best way to define and access the data?
It seems that the data can be passed by treating it as a series of "uint" types and unpacking it with bit-shifting, AND-ing, etc. but this seems like it will cause unnecessary overhead. What is the correct way?
I've found two ways to do this, although they both require working with int/uint values in HLSL, since I haven't found an 8-bit data type:
Option 1 is to let the "view" handle the translation:
Pass the original data as a byte/char buffer.
Set the Shader Resource View format (D3D11_SHADER_RESOURCE_VIEW_DESC.Format) to DXGI_FORMAT_R8_UINT
Define the HLSL data type as Buffer<uint>
Reference each byte using its byte offset (i.e., treat it as a buffer of bytes, not a buffer of uints). Each character is automatically promoted to a uint value.
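A rough host-side sketch of Option 1's view setup (assuming device is your ID3D11Device*, byteBuffer is the buffer resource holding the characters, and byteCount is its length in bytes):
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format = DXGI_FORMAT_R8_UINT;               // each element is one byte
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
srvDesc.Buffer.FirstElement = 0;
srvDesc.Buffer.NumElements = byteCount;             // one element per character
ID3D11ShaderResourceView* srv = nullptr;
HRESULT hr = device->CreateShaderResourceView(byteBuffer, &srvDesc, &srv);
// Bind srv to the compute shader; Buffer<uint> in HLSL then yields one character per index.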
Option 2 is to treat each 4-byte sequence as a uint, using the format DXGI_FORMAT_R32_UINT, and manually extract each character using something like this:
Buffer<uint> buffer;
uint offset = ...;
uint ch1, ch2, ch3, ch4;
ch1 = buffer[offset] >> 24;
ch2 = (buffer[offset] & 0x00ff0000) >> 16;
ch3 = (buffer[offset] & 0x0000ff00) >> 8;
ch4 = (buffer[offset] & 0x000000ff);
Either way you end up working with 32-bit values but at least they correspond to individual characters.

htonl without using network related headers

We are writing embedded application code and validating that a string is in valid IPv4 format. I can do this successfully using a string tokenizer, but now I need to convert the integers to host-to-network order using the htonl() function.
Since it is an embedded application, I cannot pull in the network headers and library just to use htonl().
Is there any way, or a non-network header in C++, to get htonl() functionality?
From htonl()'s man page:
The htonl() function converts the unsigned integer hostlong from host byte order to network byte order.
Network byte order is actually just big endian.
All you need to do is write (or find) a function that converts an unsigned integer to big endian and use it in place of htonl. If your system is already big-endian, you don't need to do anything at all.
You can use the following to determine the endianness of your system:
int n = 1;
// little endian if true
if(*(char *)&n == 1) {...}
And you can convert a little-endian uint32_t to big endian using the following:
uint32_t htonl(uint32_t x) {
unsigned char *s = (unsigned char *)&x;
// Compose the result most-significant byte first; the casts avoid shifting into the sign bit of int.
return ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) | ((uint32_t)s[2] << 8) | (uint32_t)s[3];
}
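As a quick usage sketch (illustrative values only):
uint32_t ip_host = (192u << 24) | (168u << 16) | (2u << 8) | 1u; // 192.168.2.1 in host order
uint32_t ip_net = htonl(ip_host); // bytes in memory are now 192, 168, 2, 1 on any host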
You don't strictly need htonl. If you have the IP address as individual bytes like this:
uint8_t a [4] = { 192, 168, 2, 1 };
You can just send these 4 bytes, in that exact order, over the network. That is, unless you specifically need it as a 4-byte integer, which you probably don't, since you presumably are not using sockaddr_in & friends.
If you already have the address as a 32-bit integer in host byte order, you can obtain the array a like this:
uint32_t ip = getIPHostOrder();
uint8_t a [4] = { (ip >> 24) & 0xFF, (ip >> 16) & 0xFF, (ip >> 8) & 0xFF, ip & 0xFF };
This has the advantage of not relying on implementation defined behaviour and being portable.

Problem converting endianness

I'm following this tutorial for using OpenAL in C++: http://enigma-dev.org/forums/index.php?topic=730.0
As you can see in the tutorial, they leave a few methods unimplemented, and I am having trouble implementing file_read_int32_le(char*, FILE*) and file_read_int16_le(char*, FILE*). Apparently what it should do is load 4 bytes from the file (or 2 in the case of int16 I guess..), convert it from little-endian to big endian and then return it as an unsigned integer. Here's the code:
static unsigned int file_read_int32_le(char* buffer, FILE* file) {
size_t bytesRead = fread(buffer, 1, 4, file);
printf("%x\n",(unsigned int)*buffer);
unsigned int* newBuffer = (unsigned int*)malloc(4);
*newBuffer = ((*buffer << 24) & 0xFF000000U) | ((*buffer << 8) & 0x00FF0000U) | ((*buffer >> 8) & 0x0000FF00U) | ((*buffer >> 24) & 0x000000FFU);
printf("%x\n", *newBuffer);
return (unsigned int)*newBuffer;
}
When debugging (in XCode) it says that the hexadecimal value of *buffer is 0x72, which is only one byte. When I create newBuffer using malloc(4), I get a 4-byte buffer (*newBuffer is something like 0xC0000003) which then, after the operations, becomes 0x72000000. I assume the result I'm looking for is 0x00000027 (edit: actually 0x00000072), but how would I achieve this? Is it something to do with converting between the char* buffer and the unsigned int* newBuffer?
Yes, *buffer will read in Xcode's debugger as 0x72, because buffer is a pointer to a char.
If the first four bytes in the memory block pointed to by buffer are (hex) 72 00 00 00, then the return value should be 0x00000072, not 0x00000027. The bytes should get swapped, but not the two "nybbles" that make up each byte.
This code leaks the memory you malloc'd, and you don't need to malloc here anyway.
Your byte-swapping is correct on a PowerPC or 68K Mac, but not on an Intel Mac or ARM-based iOS. On those platforms, you don't have to do any byte-swapping because they're natively little-endian.
Core Foundation provides a way to do this all much more easily:
static uint32_t file_read_int32_le(char* buffer, FILE* file) {
fread(buffer, 1, 4, file); // Get four bytes from the file
uint32_t val = *(uint32_t*)buffer; // Turn them into a 32-bit integer
// Swap on a big-endian Mac, do nothing on a little-endian Mac or iOS
return CFSwapInt32LittleToHost(val);
}
There's a whole family of functions (htons/htonl and friends) whose sole purpose in life is to convert from "host" to "network" byte order.
http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
Each function has a reciprocal that does the opposite.
Now, these functions won't necessarily help you, because they intrinsically convert from your host's specific byte order, so please just use this answer as a starting point to find what you need. Generally, code should never make assumptions about what architecture it's on.
Intel == "Little Endian".
Network == "Big Endian".
Hope this starts you out on the right track.
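If you would rather sidestep architecture assumptions entirely, one hedged alternative (a sketch, not part of any standard API) is to assemble the value from individual bytes, which produces the same result on any host when the data is known to be little-endian:
#include <stdint.h>

static uint32_t read_u32_le(const unsigned char b[4]) {
    // Interpret the four bytes as a little-endian 32-bit value, independent of host byte order.
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}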
I've used the following for integral types. On some platforms, it's not safe for non-integral types.
#include <algorithm> // for std::reverse_copy

template <typename T> T byte_reverse(T in) {
T out;
// View both values as raw bytes and copy them in reverse order.
char* in_c = reinterpret_cast<char *>(&in);
char* out_c = reinterpret_cast<char *>(&out);
std::reverse_copy(in_c, in_c+sizeof(T), out_c);
return out;
}
So, to put that in your file reader (why are you passing the buffer in, since it appears that it could be a temporary?):
static unsigned int file_read_int32_le(FILE* file) {
unsigned int int_buffer;
size_t bytesRead = fread(&int_buffer, 1, sizeof(int_buffer), file);
/* Error or less than 4 bytes should be checked */
return byte_reverse(int_buffer);
}

convert memory address to int

I am trying to read memory addresses from an executable running in memory, and then use those memory addresses to walk the PE structure.
I am having trouble because I'm unsure how to convert a 4-byte char array to its int equivalent.
Here is my code so far:
char buffer[4];
int e_lfanew = 60;
if(!ReadProcessMemory(pHandle, (me32.modBaseAddr + e_lfanew), buffer, 4, NULL))
{
printf("ReadProcessMemory # %x Failed (%d)\n", me32.modBaseAddr, GetLastError());
}
The address I'm reading in, in this case 0xE0000000, is the offset of the PE header. I want to take the memory address I just read and use it as an offset to read from process memory again, but I cannot figure out how to convert it to an int properly.
Any help would be greatly appreciated.
(unsigned char)buffer[0] |
((unsigned char)buffer[1] << 8) |
((unsigned char)buffer[2] << 16) |
((unsigned char)buffer[3] << 24)
or the other way around, depending on whether your high-order byte is buffer[0] or buffer[3]
#include <cassert> // assert
#include <cstring> // memcpy

int MemoryBufferToInt(char* buffer, int buffer_size) {
int result;
assert(buffer_size == sizeof(result));
memcpy(&result, &buffer[0], sizeof(result));
return result;
}
The code above assumes that this buffer was obtained from the process, so that the byte order of the memory buffer is the same as the byte order of a regular int on your platform. Otherwise, you can easily construct the integer for a specific byte order if you know what the byte order of the buffer is.
NOTE that you could just use reinterpret_cast<char*>(&result) in place of your buffer as the parameter to the function that retrieves the buffer contents.
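For instance (a hedged sketch reusing the names from the question), you can skip the intermediate char buffer and read the 4-byte value straight into an integer:
DWORD e_lfanew_value = 0; // receives the 4-byte offset of the PE header
if(!ReadProcessMemory(pHandle, (me32.modBaseAddr + e_lfanew), &e_lfanew_value, sizeof(e_lfanew_value), NULL))
{
printf("ReadProcessMemory failed (%lu)\n", GetLastError());
}
// e_lfanew_value now holds the offset in the target process's byte order, which on
// Windows/x86 matches the host's, so no further conversion is needed.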

C/C++ read a byte from a hex input from stdin

I can't quite figure out how to do the following in C/C++.
Input: hexadecimal values, for example: ffffffffff...
I've tried the following code in order to read the input:
uint16_t twoBytes;
scanf("%x",&twoBytes);
That works fine and all, but how do I split the two bytes into one-byte uint8_t values (or maybe even read just the first byte)? I would like to read the first byte from the input and store it in a byte matrix at a position of my choosing:
uint8_t matrix[50][50];
Since I'm not very skilled at formatting/reading input in C/C++ (and have only used scanf so far), any other ideas on how to do this easily (and quickly, if possible) are greatly appreciated.
Edit: I found an even better method using the fread function, as it lets one specify how many bytes to read from the stream (stdin in this case) and save into a variable/array.
size_t fread ( void * ptr, size_t size, size_t count, FILE * stream );
Parameters
ptr - Pointer to a block of memory with a minimum size of (size*count) bytes.
size - Size in bytes of each element to be read.
count - Number of elements, each one with a size of size bytes.
stream - Pointer to a FILE object that specifies an input stream.
(cplusplus.com reference)
%x reads an unsigned int, not a uint16_t (though they may be the same on your particular platform).
To read only one byte, try this:
unsigned int byteTmp; // %x expects an unsigned int
scanf("%2x", &byteTmp);
uint8_t byte = byteTmp;
This reads an unsigned int, but stops after reading two characters (two hex characters equals eight bits, or one byte).
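Putting that together with the matrix from the question (a hedged sketch; where you place each byte is entirely up to you):
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t matrix[50][50] = {0};
    unsigned int byteTmp;
    int row = 0, col = 0;
    // Read the hex input two characters (one byte) at a time.
    while (row < 50 && scanf("%2x", &byteTmp) == 1) {
        matrix[row][col] = (uint8_t)byteTmp;
        if (++col == 50) { col = 0; ++row; }
    }
    printf("first byte read: %02x\n", (unsigned)matrix[0][0]);
    return 0;
}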
You should be able to split the variable like this:
uint8_t LowerByte = twoBytes & 0xFF;
uint8_t HigherByte = twoBytes >> 8;
A couple of thoughts:
1) read it as characters and convert it manually - painful
2) If you know that there are a multiple of 4 hexits, you can just read in twoBytes and then convert to one-byte values with high = twoBytes >> 8; low = twoBytes & 0xFF;
3) %2x