How to convert float to int preserving bit value - C++

I have a float4 coming into a compute shader. Three of these floats are really floats, but the fourth is two uints shifted together. How would I convert the float to a uint while preserving the bit sequence rather than the numeric value?
On the C++ side I solved it by creating a uint pointer, filling it with the desired number, and passing the pointer on as a float pointer instead. However, as similar as HLSL is to C/C++, it has no pointers, so I'm stuck here.

In HLSL you should be able to do the following (assuming the value you are after is in f4.w):
uint ui = asuint( f4.w );
uint ui1 = ui & 0xffff;
uint ui2 = ui >> 16;
Basically it looks like the asuint intrinsic is your friend :)

You could use a union:
float f; // your float value is here
union X
{
    float f;
    short int a[2];
} x;
x.f = f;
int i1 = x.a[0]; // these are your ints
int i2 = x.a[1];
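If you would rather avoid type punning through a union on the C++ side, a memcpy-based sketch does the same bit-preserving split; split_float_bits is just an illustrative name, and it assumes the float packs two 16-bit values as described in the question:
#include <cstdint>
#include <cstring>

void split_float_bits(float f, uint16_t& lo, uint16_t& hi)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);       // copy the bit pattern, no numeric conversion
    lo = static_cast<uint16_t>(bits & 0xFFFF); // low 16 bits
    hi = static_cast<uint16_t>(bits >> 16);    // high 16 bits
}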

Related

Struct & Function - Does not name a type

I'm trying to convert a hex colour string into an RGB value using a function and a struct, and then return the data.
I've managed to do most of the work, but I'm struggling a bit to understand how exactly my struct and function should work together.
Here is my code, which returns the error "RGB does not name a type":
// Define my Struct
struct RGB {
    byte r;
    byte g;
    byte b;
};

// Create my function to return my Struct
RGB getRGB(String hexValue) {
    char newVarOne[40];
    hexValue.toCharArray(newVarOne, sizeof(newVarOne) - 1);
    long number = (long) strtol(newVarOne, NULL, 16);
    int r = number >> 16;
    int g = number >> 8 & 0xFF;
    int b = number & 0xFF;
    RGB value = {r, g, b};
    return value;
}

// Function to call getRGB and return the RGB colour values
void solid(String varOne) {
    RGB theseColours;
    theseColours = getRGB(varOne);
    fill_solid(leds, NUM_LEDS, CRGB(theseColours.r, theseColours.g, theseColours.b));
    FastLED.show();
}
The line that the error points to is:
RGB getRGB(String hexValue) {
Could someone explain what I have done wrong and how to resolve it please?
If you are using a C compiler (as opposed to C++) you either have to typedef your struct or use the struct keyword wherever you use the type.
So it's either:
typedef struct RGB {
    byte r;
    byte g;
    byte b;
} RGB;
and then:
RGB theseColours;
or
struct RGB {
    byte r;
    byte g;
    byte b;
};
and then:
struct RGB theseColours;
However, if you are using a C++ compiler, then it may help if you tell us on what line the error occurs.
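For illustration, here is a minimal sketch of the typedef form applied to the question's function; byte, String, toCharArray, and strtol are assumed to come from the Arduino core as in the original code:
typedef struct RGB {
    byte r;
    byte g;
    byte b;
} RGB;

RGB getRGB(String hexValue) {
    char buf[8]; // enough for "RRGGBB" plus the terminator
    hexValue.toCharArray(buf, sizeof(buf));
    long number = strtol(buf, NULL, 16);
    RGB value = { (byte)(number >> 16 & 0xFF), (byte)(number >> 8 & 0xFF), (byte)(number & 0xFF) };
    return value;
}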

Change OpenCL function to C++

I am trying to write some code in C++, but after searching on the internet I found an OpenCL-based code sample that does exactly what I want to do in C++. However, since this is the first time I have seen OpenCL code, I don't know how to change the following functions into C++:
const __global float4 *in_buf;
int x = get_global_id(0);
int y = get_global_id(1);
float result = y * get_global_size(0);
Is 'const __global float4 *in_buf' equivalent to 'const float *in_buf' in C++? And how do I change the other functions above? Could anyone help? Thanks.
In general, you should take a look at the OpenCL specification (I'm assuming it's written in OpenCL 1.x) to better understand functions, types and how a kernel works.
Specifically for your question:
get_global_id returns the id of the current work item, and get_global_size returns the total number of work items. Since an OpenCL work-item is roughly equivalent to a single iteration in a sequential language, the equivalent of OpenCL's:
int x = get_global_id(0);
int y = get_global_id(1);
// do something with x and y
float result = y * get_global_size(0);
will be C's:
for (int x = 0; x < dim0; x++) {
    for (int y = 0; y < dim1; y++) {
        // do something with x and y
        float result = y * dim0;
    }
}
As for float4, it's a vector type of 4 floats, roughly equivalent to C's float[4] (except that it supports many additional operators, such as vector arithmetic). Of course, in this case it's a buffer, so an appropriate type would be float** or float (*)[4] - or better yet, just pack them together into a float* buffer and load 4 at a time.
Feel free to ignore the __global modifier.
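As a rough plain-C++ sketch of that mapping (the float4 struct, the dim0/dim1 sizes, and the process function name are assumptions for illustration, not part of the original kernel):
#include <vector>

struct float4 { float x, y, z, w; }; // stand-in for OpenCL's vector type

void process(const std::vector<float4>& in_buf, int dim0, int dim1)
{
    for (int y = 0; y < dim1; ++y) {            // replaces get_global_id(1)
        for (int x = 0; x < dim0; ++x) {        // replaces get_global_id(0)
            float result = y * float(dim0);     // get_global_size(0) becomes dim0
            (void)result;                       // do something with in_buf[y * dim0 + x] here
        }
    }
}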
const __global float4 *in_buf is not equivalent to const float *in_buf.
OpenCL uses vector types, e.g. floatN, where N is 2, 4, 8, and so on. So float4 is in fact struct { float x, y, z, w; }, with a lot of tricks available to express vector operations.
get_global_id(0) gives you the iterator variable, so essentially replace every get_global_id(dim) with for (int x = 0; x < max[dim]; x++).

Split Multiplication of integers

I need an algorithm that takes two 32-bit integers as parameters and returns their product split into two other 32-bit integers: a 32-highest-bits part and a 32-lowest-bits part.
I would try:
uint32_t p1, p2; // globals to hold the result

void mult(uint32_t x, uint32_t y) {
    uint64_t r = (uint64_t)x * y; // widen before multiplying to keep the full 64-bit product
    p1 = r >> 32;
    p2 = r & 0xFFFFFFFF;
}
Although it works (see note 1), the existence of 64-bit integers on the machine is not guaranteed, nor is the compiler's support for them.
So, what is the best way to solve this?
Note 1: Actually, it didn't work, because my compiler does not support 64-bit integers.
Note: Please avoid using Boost.
Just use 16-bit digits.
void multiply(uint32_t a, uint32_t b, uint32_t* h, uint32_t* l) {
    uint32_t const base = 0x10000;          // work in base 2^16 "digits"
    uint32_t al = a % base, ah = a / base;  // low/high 16-bit digits of a
    uint32_t bl = b % base, bh = b / base;  // low/high 16-bit digits of b
    *l = al * bl;                           // low partial product
    *h = ah * bh;                           // high partial product
    uint32_t rlh = *l / base + al * bh;     // middle digits, first cross term
    *h += rlh / base;                       // carry into the high word
    rlh = rlh % base + ah * bl;             // middle digits, second cross term
    *h += rlh / base;                       // carry into the high word
    *l = (rlh % base) * base + *l % base;   // reassemble the low word
}
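A quick sanity check of the routine above might look like this (the test values are arbitrary, and the reference multiply assumes a platform that does have unsigned long long, purely for comparison):
#include <cassert>
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t h = 0, l = 0;
    multiply(0xDEADBEEFu, 0xCAFEBABEu, &h, &l); // the function defined above
    unsigned long long ref = 0xDEADBEEFull * 0xCAFEBABEull;
    assert(h == (uint32_t)(ref >> 32));
    assert(l == (uint32_t)(ref & 0xFFFFFFFF));
    printf("high = 0x%08X, low = 0x%08X\n", (unsigned)h, (unsigned)l);
    return 0;
}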
As I commented, you can treat each number as a binary string of length 32.
Just multiply these numbers using school arithmetic. You will get a 64-character-long string.
Then just partition it.
If you want fast multiplication, then you can look into Karatsuba multiplication algorithm.
This is the explanation and an implementation of the Karatsuba algorithm.
I have downloaded the code and run it several times, and it seems to work well. You can modify the code according to your needs.
If the unsigned long long type is supported, this should work:
void umult32(uint32_t a, uint32_t b, uint32_t* c, uint32_t* d)
{
    unsigned long long x = ((unsigned long long)a) * ((unsigned long long)b); // Thanks to @Толя
    *c = x & 0xffffffff;
    *d = (x >> 32) & 0xffffffff;
}
Logic borrowed from here.

How can you convert a std::bitset<64> to a double?

Is there a way to convert a std::bitset<64> to a double without using any external library (Boost, etc.)? I am using a bitset to represent a genome in a genetic algorithm and I need a way to convert a set of bits to a double.
The C++11 road:
union Converter { uint64_t i; double d; };

double convert(std::bitset<64> const& bs) {
    Converter c;
    c.i = bs.to_ullong();
    return c.d;
}
EDIT: As noted in the comments, we can use char* aliasing as it is unspecified instead of being undefined.
#include <bitset>
#include <cstdint>
#include <cstring>

double convert(std::bitset<64> const& bs) {
    static_assert(sizeof(uint64_t) == sizeof(double), "Cannot use this!");
    uint64_t const u = bs.to_ullong();
    double d;
    // Aliases to `char*` are explicitly allowed in the Standard (and only them)
    char const* cu = reinterpret_cast<char const*>(&u);
    char* cd = reinterpret_cast<char*>(&d);
    // Copy the bitwise representation from u to d
    std::memcpy(cd, cu, sizeof(u));
    return d;
}
C++11 is still required for to_ullong.
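A quick usage sketch for either version of convert above, round-tripping a known double through a bitset (the memcpy here is only used to build the test input):
#include <bitset>
#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    double original = 3.141592653589793;
    uint64_t bits;
    std::memcpy(&bits, &original, sizeof bits); // grab the double's bit pattern
    std::bitset<64> bs(bits);                   // bitset built from those bits
    std::cout << convert(bs) << '\n';           // prints 3.14159 (the round-tripped value)
    return 0;
}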
Most people are trying to provide answers that let you treat the bit-vector as though it directly contained an encoded int or double.
I would advise you to completely avoid that approach. While it does "work" for some definition of working, it introduces Hamming cliffs all over the place. You usually want your encoding to arrange things so that if two decoded values are near to one another, then their encoded values are near to one another as well. It also forces you to use 64 bits of precision.
I would manage the conversion manually. Say you have three variables to encode, x, y, and z. Your domain expertise can be used to say, for example, that -5 <= x < 5, 0 <= y < 100, and 0 <= z < 1, where you need 8 bits of precision for x, 12 bits for y, and 10 bits for z. This gives you a total search space of only 30 bits. You can have a 30-bit string, treat the first 8 as encoding x, the next 12 as y, and the last 10 as z. You are also free to Gray-code each one to remove the Hamming cliffs.
I've personally done the following in the past:
inline void binary_encoding::encode(const vector<double>& params)
{
    unsigned int start = 0;
    for(unsigned int param = 0; param < params.size(); ++param) {
        // m_bpp[i] = number of bits in encoding of parameter i
        unsigned int num_bits = m_bpp[param];

        // map the double onto the appropriate integer range
        // m_range[i] is a pair of (min, max) values for ith parameter
        pair<double,double> prange = m_range[param];
        double range = prange.second - prange.first;
        double max_bit_val = pow(2.0, static_cast<double>(num_bits)) - 1;
        int int_val = static_cast<int>((params[param] - prange.first) * max_bit_val / range + 0.5);

        // convert the integer to binary
        vector<int> result(m_bpp[param]);
        for(unsigned int b = 0; b < num_bits; ++b) {
            result[b] = int_val % 2;
            int_val /= 2;
        }

        if(m_gray) {
            for(unsigned int b = 0; b < num_bits - 1; ++b) {
                result[b] = !(result[b] == result[b+1]);
            }
        }

        // insert the bits into the correct spot in the encoding
        copy(result.begin(), result.end(), m_genotype.begin() + start);
        start += num_bits;
    }
}

inline void binary_encoding::decode()
{
    unsigned int start = 0;

    // for each parameter
    for(unsigned int param = 0; param < m_bpp.size(); param++) {
        unsigned int num_bits = m_bpp[param];
        unsigned int intval = 0;

        if(m_gray) {
            // convert from gray to binary
            vector<int> binary(num_bits);
            binary[num_bits-1] = m_genotype[start+num_bits-1];
            intval = binary[num_bits-1];
            for(int i = num_bits-2; i >= 0; i--) {
                binary[i] = !(binary[i+1] == m_genotype[start+i]);
                intval += intval + binary[i];
            }
        }
        else {
            // convert from binary encoding to integer
            for(int i = num_bits-1; i >= 0; i--) {
                intval += intval + m_genotype[start+i];
            }
        }

        // convert from integer to double in the appropriate range
        pair<double,double> prange = m_range[param];
        double range = prange.second - prange.first;
        double m = range / (pow(2.0, double(num_bits)) - 1.0);

        // m_phenotype is a vector<double> containing all the decoded parameters
        m_phenotype[param] = m * double(intval) + prange.first;
        start += num_bits;
    }
}
Note that, for reasons that probably don't matter to you, I wasn't using bit vectors -- just an ordinary vector<int> to encode things. And of course, there's a bunch of stuff tied into this code that isn't shown here, but you can probably get the basic idea.
One other note: if you're doing GPU calculations, or if you have a particular problem such that 64 bits are the appropriate size anyway, it may be worth the extra overhead to stuff everything into native words. Otherwise, I would guess that the overhead you add to the search process will probably overwhelm whatever benefits you get from faster encoding and decoding.
Edit: I've decided that I was being a bit silly with this. While you do end up with a double, it assumes that the bitset holds an integer... which is a big assumption to make. You will end up with a predictable and repeatable value per bitset, but I still don't think this is what the author intended.
Well, if you iterate over the bit values and do
output_double += pow( 2, 64-(bit_position+1) ) * bit_value;
that would work, as long as it is big-endian (bit 0 taken as the most significant bit).
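Spelled out as a loop, the idea above looks roughly like this (to_value is just an illustrative name; it still interprets the bitset as holding an integer value, the caveat from the edit above, rather than reusing its bit pattern):
#include <bitset>
#include <cmath>
#include <cstddef>

double to_value(const std::bitset<64>& bs) {
    double output_double = 0.0;
    for (std::size_t bit_position = 0; bit_position < 64; ++bit_position) {
        // bit 0 of the bitset is treated as the most significant digit here
        double bit_value = bs[bit_position] ? 1.0 : 0.0;
        output_double += std::pow(2.0, 64.0 - (bit_position + 1)) * bit_value;
    }
    return output_double;
}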

C++ Multidimensional array

I have a 3D array
double values[30][30][30];
I have a loop where I assign values to the array. Something like:
for (int z = 0; z < 30; z++)
    for (int y = 0; y < 30; y++)
        for (int x = 0; x < 30; x++)
            values[z][y][x] = intensity;
So this is how I am filling the array. The problem is that I want to create a column in addition to intensity, to store another variable. For instance, the innermost assignment should be something like:
values[z][y][x] = intensity | distance;
I hope you get the idea. My knowledge is limited and I couldn't come up with a solution. Thanks for your suggestions.
This is really dependent on your data types. The easiest solution is using a struct:
struct data {
    float intensity; // or replace 'float' with whatever datatype you need
    float distance;
};
Use this struct instead of the datatype you're using now for the array, then later on set the values:
values[z][y][x].intensity = intensity;
values[z][y][x].distance = distance;
If you're using only small values (e.g. a char for each value), you could just as well use bitwise operators to store everything in a single integer:
values[z][y][x] = intensity << 8 | distance;
intensity = values[z][y][x] >> 8;
distance = values[z][y][x] & 255;
But I wouldn't advise you to do so unless you're really comfortable with those value ranges (e.g. for saving bitmap/texture data).
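For completeness, here is a minimal self-contained sketch of the struct-based approach above; the 30x30x30 dimensions and the fill values are placeholders from the question, not requirements:
struct data {
    float intensity;
    float distance;
};

data values[30][30][30];

void fill(float intensity, float distance) {
    for (int z = 0; z < 30; ++z)
        for (int y = 0; y < 30; ++y)
            for (int x = 0; x < 30; ++x) {
                // store both values side by side instead of packing them into one number
                values[z][y][x].intensity = intensity;
                values[z][y][x].distance = distance;
            }
}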