C++ PBKDF2 Issue

I have the following function:
void PBKDF2_HMAC_SHA_512_string(const char* pass, const char* salt, int32_t iterations, uint32_t HashLength, char* out) {
    unsigned int i;
    HashLength = HashLength / 2;
    unsigned char* digest = new unsigned char[HashLength];
    PKCS5_PBKDF2_HMAC(pass, strlen(pass), (const unsigned char*)salt, strlen(salt), iterations, EVP_sha512(), HashLength, digest);
    for (i = 0; i < sizeof(digest); i++) {
        sprintf(out + (i * 2), "%02x", 255 & digest[i]);
    }
}
When I call the function like below, I expect to get a hash back that is 2400 characters long, but it only comes back 16 characters long:
char PBKDF2Hash[1025]; //\0 terminating space?
memset(PBKDF2Hash, 0, sizeof(PBKDF2Hash));
PBKDF2_HMAC_SHA_512_string("Password", "0123456789123456", 3500, 1024, PBKDF2Hash);
//PBKDF2Hash is now always 16 long -> strlen(PBKDF2Hash),
//while I expect it to be 2400 long?
//How is this possible and why is this happening?
//I can't figure it out

Since digest is a pointer, sizeof(digest) will not give the length of the array. Depending on the platform, sizeof(digest) will give you 4 or 8, which is not what you want. You should use for (i = 0; i < HashLength; i++) instead.
Another, unrelated, issue with your code is that digest is never deleted in PBKDF2_HMAC_SHA_512_string, which causes a memory leak.
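Putting both fixes together, a minimal corrected sketch of the function (OpenSSL assumed available; error checking of PKCS5_PBKDF2_HMAC omitted):
#include <openssl/evp.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

void PBKDF2_HMAC_SHA_512_string(const char* pass, const char* salt, int32_t iterations, uint32_t HashLength, char* out) {
    uint32_t digestLength = HashLength / 2;            // raw digest is half the hex output length
    unsigned char* digest = new unsigned char[digestLength];
    PKCS5_PBKDF2_HMAC(pass, strlen(pass), (const unsigned char*)salt, strlen(salt),
                      iterations, EVP_sha512(), digestLength, digest);
    for (uint32_t i = 0; i < digestLength; i++) {      // loop over the digest length, not sizeof(digest)
        sprintf(out + (i * 2), "%02x", digest[i]);
    }
    delete[] digest;                                    // fix the leak
}
With HashLength = 1024 this produces 1024 hex characters plus the terminating '\0' that sprintf writes, which just fits the 1025-byte buffer in the calling code.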


OpenCL result changes with arbitrary code alterations that are not related

This is a very strange issue. I'm working on a GPU-based crypto miner and I have an issue with a SHA hash function.
1 - The initial function calls a SHA256 routine and then prints the results. I'm comparing those results to a CPU-based SHA256 to make sure I get the same thing.
2 - Later on in the function, there are other operations that occur, such as adding, XOR and additional SHA rounds.
As part of the miner kernel, I wrote an auxiliary function to decompose an array of 8 uints into an array of 32 unsigned char, using AND mask and bit shift.
I'm calling the kernel with global/local work unit of 1.
So, here's where things get really strange. The part I am comparing is the very first SHA. I get a buffer of 80 bytes in, SHA it and then print the result. It matches under certain conditions. However, if I make changes to the code that executes AFTER that SHA, then it doesn't match. This is what I've been able to narrow down:
1 - If I put a printf debug in the decomposition auxiliary function, the results match. Just removing that printf causes it to mismatch.
2 - There are 4 operations I use to decompose the uint into char. I tried lots of different ways to do this with the same result. However, if I remove any 1 of the 4 "for" loops in the routine, it matches. Simply removing a for loop in code that gets executed -after- the initial code, changes the result of the initial SHA.
3 - If I change my while loop to never execute then it matches. Again, this is all -after- the initial SHA comparison.
4 - If I remove all the calls to the auxiliary function, then it matches. Simply calling the function after the initial SHA causes a mismatch.
I've tried adding memory guards everywhere, however, given that it's 1 global and 1 local work unit, I don't see how that could apply.
I'd love to debug this, but apparently OpenCL cannot be debugged in VS 2019 (really?).
Any thoughts, guesses, insight would be appreciated.
Thanks!
inline void loadUintHash ( __global unsigned char* dest, __global uint* src) {
    //**********if I remove this it doesn't work
    printf ("src1 %08x%08x%08x%08x%08x%08x%08x%08x",
        src[0], src[1], src[2], src[3], src[4], src[5], src[6], src[7]);
    //**********if I take away any one of these 4 for loops, then it works
    for ( int i = 0; i < 8; i++)
        dest[i*4+3] = (src[i] & 0xFF000000) >> 24;
    for ( int i = 0; i < 8; i++)
        dest[i*4+2] = (src[i] & 0x00FF0000) >> 16;
    for ( int i = 0; i < 8; i++)
        dest[i*4+1] = (src[i] & 0x0000FF00) >> 8;
    for ( int i = 0; i < 8; i++)
        dest[i*4] = (src[i] & 0x000000FF);
    //**********if I remove this it doesn't work
    printf ("src2 %08x%08x%08x%08x%08x%08x%08x%08x",
        src[0], src[1], src[2], src[3], src[4], src[5], src[6], src[7]);
}
#define HASHOP_ADD 0
#define HASHOP_XOR 1
#define HASHOP_SHA_SINGLE 2
#define HASHOP_SHA_LOOP 3
#define HASHOP_MEMGEN 4
#define HASHOP_MEMADD 5
#define HASHOP_MEMXOR 6
#define HASHOP_MEM_SELECT 7
#define HASHOP_END 8
__kernel void dyn_hash (__global uint* byteCode, __global uint* memGenBuffer, int memGenSize, __global uint* hashResult, __global char* foundFlag, __global unsigned char* header, __global unsigned char* shaScratch) {
    int computeUnitID = get_global_id(0);
    __global uint* myMemGen = &memGenBuffer[computeUnitID * memGenSize * 8]; //each memGen unit is 256 bits, i.e. 8 uints
    __global uint* myHashResult = &hashResult[computeUnitID * 8];
    __global char* myFoundFlag = foundFlag + computeUnitID;
    __global unsigned char* myHeader = header + (computeUnitID * 80);
    __global unsigned char* myScratch = shaScratch + (computeUnitID * 32);
    sha256 ( computeUnitID, 80, myHeader, myHashResult );
    //**********this is the result I am comparing
    if (computeUnitID == 0) {
        printf ("gpu first sha uint %08x%08x%08x%08x%08x%08x%08x%08x",
            myHashResult[0], myHashResult[1], myHashResult[2], myHashResult[3],
            myHashResult[4], myHashResult[5], myHashResult[6], myHashResult[7]);
    }
    uint linePtr = 0;
    uint done = 0;
    uint currentMemSize = 0;
    uint instruction = 0;
    //**********if I change this to done == 1, then it works
    while (done == 0) {
        if (byteCode[linePtr] == HASHOP_ADD) {
            linePtr++;
            uint arg1[8];
            for ( int i = 0; i < 8; i++)
                arg1[i] = byteCode[linePtr+i];
            linePtr += 8;
        }
        else if (byteCode[linePtr] == HASHOP_XOR) {
            linePtr++;
            uint arg1[8];
            for ( int i = 0; i < 8; i++)
                arg1[i] = byteCode[linePtr+i];
            linePtr += 8;
        }
        else if (byteCode[linePtr] == HASHOP_SHA_SINGLE) {
            linePtr++;
        }
        else if (byteCode[linePtr] == HASHOP_SHA_LOOP) {
            printf ("HASHOP_SHA_LOOP");
            linePtr++;
            uint loopCount = byteCode[linePtr];
            for ( int i = 0; i < loopCount; i++) {
                loadUintHash(myScratch, myHashResult);
                sha256 ( computeUnitID, 32, myScratch, myHashResult );
                if (computeUnitID == 1) {
                    loadUintHash(myScratch, myHashResult);
                    ... more irrelevant code...
This is how the kernel is being called:
size_t globalWorkSize = 1;// computeUnits;
size_t localWorkSize = 1;
returnVal = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &globalWorkSize, &localWorkSize, 0, NULL, NULL);
The issue ended up being multiple things. 1 - The CPU SHA had a bug in it that was causing an incorrect result in some cases. 2 - There was a very strange syntax error which seems to have broken the compiler in a weird way:
void otherproc () {
...do stuff...
}
if (something) {/
...other code
}
That forward slash after the opening curly brace was messing up "otherproc" in a weird way, and the compiler did not throw an error. After staring at the code line by line I found that slash, removed it, and everything started working.
If anyone is interested, the working implementation of a GPU miner can be found here:
https://github.com/dynamofoundation/dyn_miner

Assigning Values to unsigned short* buffer

I am having issues assigning values to unsigned short* and unsigned char* buffers. The code looks something like this:
// Header File
unsigned short* short_buff;
unsigned char* char_buff;
// Implementation File
short_buff = new unsigned short[10];
memset(&short_buff, 0, 10);
char_buff = new unsigned char[10];
memset(&char_buff, 0, 10);
unsigned short sVal = 0;
unsigned char cVal = 0x0;
// All of these cause core dumps
short_buff[0] = sVal;
memcpy(&short_buff[0], &sVal, 2); // 2 bytes per unsigned short
std::cout << short_buff[0] << std::endl;
// All of these also cause core dumps
char_buff[0] = cVal;
memcpy(&char_buff[0], &cVal, 1); // 1 byte per unsigned char
std::cout << char_buff[0] << std::endl;
// Yet strangely these cause no issues
unsigned short s2Val = short_buff[0];
unsigned char c2Val = char_buff[0];
I am completely at a loss as to what is going on here and why. Any help would be greatly appreciated!
memset(short_buff, 0, 10*sizeof(short));
and
memset(char_buff, 0, 10*sizeof(char));
Two mistakes. First, the & is wrong: you should pass the value of the pointer to memset, not the address of the pointer variable. (The form memset(&short_buff[0], ...); also works.)
Secondly, memset counts bytes, not elements, so you need to multiply the array length by the element size; use sizeof for that.
Strangely, you got it more or less right with memcpy later on. Why not do the same thing for memset?
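Put together, a minimal corrected sketch of the snippet from the question (same variable names; the cast on the char output is only so the byte prints as a number):
#include <cstring>
#include <iostream>

unsigned short* short_buff;
unsigned char* char_buff;

int main() {
    short_buff = new unsigned short[10];
    std::memset(short_buff, 0, 10 * sizeof(unsigned short));  // pointer value, size in bytes
    char_buff = new unsigned char[10];
    std::memset(char_buff, 0, 10 * sizeof(unsigned char));

    unsigned short sVal = 0;
    unsigned char cVal = 0x0;

    short_buff[0] = sVal;                                     // assignment now works
    std::memcpy(&short_buff[0], &sVal, sizeof(sVal));
    std::cout << short_buff[0] << std::endl;

    char_buff[0] = cVal;
    std::memcpy(&char_buff[0], &cVal, sizeof(cVal));
    std::cout << static_cast<int>(char_buff[0]) << std::endl;

    delete[] short_buff;
    delete[] char_buff;
}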

Convert char buffer to struct

I have a char buffer buf containing buf[0] = 10, buf[1] = 3, buf[2] = 3, buf[3] = 0, buf[4] = 58,
and a structure:
typedef struct
{
char type;
int version;
int length;
}Header;
I want to convert buf into a Header. Currently I am using these functions:
int getByte( unsigned char* buf)
{
    int number = buf[0];
    return number;
}
int getInt(unsigned char* buf)
{
    int number = (buf[0]<<8)+buf[1];
    return number;
}
main()
{
    Header *head = new Header;
    int location = 0;
    head->type = getByte(&buf[location]);
    location++; // location = 1
    head->version = getInt(&buf[location]);
    location += 2; // location = 3
    head->length = getInt(&buf[location]);
    location += 2; // location = 5
}
I am searching for a solution such as
Header *head = new Header;
memcpy(head, buf, sizeof(head));
With this, the first value in the Header, head->type, is correct but the rest is garbage. Is it possible to convert unsigned char* buf to a Header this way?
The only fully portable and safe way is:
void convertToHeader(unsigned char const * const buffer, Header *header)
{
    header->type = buffer[0];
    header->version = (buffer[1] << 8) | buffer[2];
    header->length = (buffer[3] << 8) | buffer[4];
}
and
void convertFromHeader(Header const * const header, unsigned char * buffer)
{
    buffer[0] = header->type;
    buffer[1] = (static_cast<unsigned int>(header->version) >> 8) & 0xFF;
    buffer[2] = header->version & 0xFF;
    buffer[3] = (static_cast<unsigned int>(header->length) >> 8) & 0xFF;
    buffer[4] = header->length & 0xFF;
}
Example
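For instance, with the buffer from the question (assuming the Header struct and the convertToHeader function above are in scope; the values follow directly from the shifts):
#include <cstdio>

int main()
{
    unsigned char buf[5] = {10, 3, 3, 0, 58};
    Header head;
    convertToHeader(buf, &head);
    // type = 10, version = (3 << 8) | 3 = 771, length = (0 << 8) | 58 = 58
    std::printf("%d %d %d\n", head.type, head.version, head.length);
    return 0;
}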
see Converting bytes array to integer for explanations
EDIT
A quick summary of the previous link: the other possible solutions (memcpy or a union, for example) are not portable because of the differing endianness of different systems (and what you are doing is probably for some sort of communication between at least two heterogeneous systems): on some systems byte[0] is the LSB of the int and byte[1] is the MSB, while on others it is the reverse.
Also, due to alignment, struct Header can be bigger than 5 bytes (probably 6 bytes in your case, if the alignment is 2 bytes!) (see here for example).
Finally, because of alignment restrictions and aliasing rules on some platforms, the compiler can generate incorrect code.
What you want would require version and length to be the same size as 2 elements of your buf array; that is, you'd need to use the type uint16_t, defined in <cstdint>, rather than int, which is likely longer. You'd also need to make buf an array of uint8_t, as char is not guaranteed to be exactly 8 bits wide!
You would probably also need to move type to the end; otherwise the compiler will almost certainly insert a padding byte after it to align version to a 2-byte boundary (once you have made it uint16_t and thus 2 bytes), and then your buf[1] would end up there rather than where you want it.
This is probably what you observe right now, by the way: by having a char followed by an int, which is probably 4 bytes, you have 3 bytes of padding, and the elements 1 to 3 of your array are being inserted there (=lost forever).
Another solution would be to modify your buf array to be longer and have empty padding bytes as well, so that the data will be actually aligned with the struct fields.
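To see the padding effect concretely, here is a small sketch along the lines described above (PackedHeader is a hypothetical name, not from the question):
#include <cstdint>
#include <cstdio>

// Fixed-width fields, with the 1-byte member moved to the end so no padding
// lands between version and length.
struct PackedHeader
{
    uint16_t version;
    uint16_t length;
    uint8_t  type;
};

int main()
{
    // On a typical implementation this prints 6, not 5: the struct is still
    // padded up to a multiple of alignof(uint16_t), which is exactly the
    // padding issue discussed above.
    std::printf("sizeof(PackedHeader) = %zu\n", sizeof(PackedHeader));
    return 0;
}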
Worth mentioning again is that, as pointed out in the comments, sizeof(head) returns the size of a pointer on your system, not of the Header structure. You can directly write sizeof(Header); but at this level of micromanagement, you won't be losing any more flexibility if you just write "5", really.
Also, endianness can screw with you. Processors are under no obligation to store the bytes of a number in the order you expect rather than the opposite one; both make internal sense, after all. This means that blindly copying bytes buf[0], buf[1] into a number can result in (buf[0]<<8)+buf[1], but also in (buf[1]<<8)+buf[0], or even in (buf[1]<<24)+(buf[0]<<16) if the data type is 4 bytes (as int usually is). Even if it works on your computer now, there is at least one machine out there where the same code will produce garbage. Unless, that is, those bytes actually come from reinterpreting a number in the first place; in which case the code is wrong (not portable) now, however.
...is it worth it?
All things considered, my advice is strongly to keep the way you handle them now. Maybe simplify it.
It really makes no sense to convert a byte to an int and then back to a byte again, or to take the address of a byte only to dereference it again; nor is there any need for helper variables with no descriptive name and no purpose other than being returned, or for a variable whose value you know in advance at all times.
Just do
int getTwoBytes(unsigned char* buf)
{
    return (buf[0]<<8)+buf[1];
}
main()
{
    Header *head = new Header;
    head->type = buf[0];
    head->version = getTwoBytes(buf + 1);
    head->length = getTwoBytes(buf + 3);
}
The better way is to create some sort of serialization/deserialization routines.
Also, I wouldn't use plain int or char types, but the more specific int32_t etc.; it's simply the platform-independent way (well, you can also pack your data structures with pragma pack).
struct Header
{
    char16_t type;
    int32_t version;
    int32_t length;
};

struct Tools
{
    std::shared_ptr<Header> deserializeHeader(const std::vector<unsigned char> &loadedBuffer)
    {
        std::shared_ptr<Header> header(new Header);
        memcpy(&(*header), &loadedBuffer[0], sizeof(Header));
        return header;
    }

    std::vector<unsigned char> serializeHeader(const Header &header)
    {
        std::vector<unsigned char> buffer;
        buffer.resize(sizeof(Header));
        memcpy(&buffer[0], &header, sizeof(Header));
        return buffer;
    }
} tools;

Header header = {'B', 5834, 4665};
auto v1 = tools.serializeHeader(header);
auto v2 = tools.deserializeHeader(v1);

Convert a 512-byte buffer into a 256-entry table of 16-bit values

I have a buffer unsigned char table[512] that I want to convert quickly into a table short int table[256], where every position is composed of two bytes of the original table.
I have a camera that gives me this buffer, which is the table for converting the disparity to the real depth.
unsigned char zDtable[512] = {0};
unsigned short int zDTableHexa[256]={0};
.. get the buffer data.....
for (int i = 0; i < 256; ++i) {
    zDTableHexa[i] = zDtable[i*2]<<8 + zDtable[i*2+1];
}
These two also have problems converting the values correctly; the bytes come out reversed:
memcpy(zDTableHexa_ptr,zDtable,256*sizeof( unsigned short int));
unsigned short* zDTableHexa = (unsigned short*)zDtable;
Try something like this
short* zDTableHexa = (short*)zDtable;
It simply maps the memory space of the char array onto an array of shorts. So if the memory looks like this:
(char0),(char1),(char2),(char3)
then it will be reinterpreted to be
(short0 = char0,char1),(short1 = char2,char3)
Beware that such direct reinterpretation depends on endianness and formally allows a sufficiently pedantic compiler to do ungood things, i.e., it's system- and compiler-specific.
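If a specific byte order is required regardless of platform, an explicit loop is the safer route. A minimal sketch follows; note the parentheses around the shift: in the loop from the question, + binds tighter than <<, so zDtable[i*2]<<8 + zDtable[i*2+1] actually shifts by 8 + zDtable[i*2+1] rather than by 8.
#include <cstdint>
#include <cstddef>

// Build each 16-bit value from two bytes explicitly, so the result does not
// depend on the host's endianness. Which byte is the high one depends on how
// the camera delivers the table, so both variants are shown.
void convertBigEndian(const unsigned char* in, uint16_t* out, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        out[i] = static_cast<uint16_t>((in[i * 2] << 8) | in[i * 2 + 1]);   // first byte is the high byte
}

void convertLittleEndian(const unsigned char* in, uint16_t* out, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        out[i] = static_cast<uint16_t>((in[i * 2 + 1] << 8) | in[i * 2]);   // second byte is the high byte
}

// usage with the buffers from the question:
// convertBigEndian(zDtable, zDTableHexa, 256);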

char to int conversion in host device with CUDA

I have been having trouble converting from a single character to an integer while in the host function of my CUDA program. After the line -
token[j] = token[j] * 10 + (buf[i] - '0' );
I use cuda-gdb to check the value of token[j], and I always get different numbers that do not seem to have a pattern. I have also tried simple casting, not multiplying by ten (which I saw in another thread), and not subtracting '0', and I always seem to get a different result. Any help would be appreciated. This is my first time posting on Stack Overflow, so give me a break if my formatting is awful.
-A fellow struggling coder
__global__ void rread(unsigned int *table, char *buf, int *threadbytes, unsigned int *token) {
    int i = 0;
    int j = 0;
    *token = NULL;
    int tid = threadIdx.x;
    unsigned int key;
    char delim = ' ';
    for(i = tid * *threadbytes; i < (tid * *threadbytes) + *threadbytes; i++)
    {
        if (buf[i] != delim) { //check if it's not a delim
            token[j] = token[j] * 10 + (buf[i] - '0' );
There's a race condition on writing to token.
If you want to have a local array per block you can use shared memory. If you want a local array per thread, you will need to use local per-thread memory and declare the array on the stack. In the first case you will have to deal with concurrency inside the block as well. In the latter you don't have to, although you might potentially waste a lot more memory (and reduce collaboration).
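As an illustration of the per-thread option (a sketch only, with a hypothetical token limit, not the asker's original kernel): each thread parses its own slice of buf into an array declared on its own stack, so no two threads write to the same token storage.
#define MAX_TOKENS_PER_THREAD 16   // hypothetical limit; size it for your data

__global__ void rread_local(const char *buf, const int *threadbytes, unsigned int *tokensOut)
{
    int tid = threadIdx.x;
    int start = tid * *threadbytes;
    int end = start + *threadbytes;

    unsigned int token[MAX_TOKENS_PER_THREAD] = {0};   // private to this thread (local memory/registers)
    int j = 0;

    for (int i = start; i < end; i++) {
        if (buf[i] != ' ') {
            token[j] = token[j] * 10 + (buf[i] - '0'); // accumulate the decimal digits
        } else if (j + 1 < MAX_TOKENS_PER_THREAD) {
            j++;                                       // delimiter: start the next token
        }
    }

    // each thread writes to its own slice of global memory, so there is no race
    for (int k = 0; k < MAX_TOKENS_PER_THREAD; k++) {
        tokensOut[tid * MAX_TOKENS_PER_THREAD + k] = token[k];
    }
}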