Convert one-hot encoding to plain binary

This isn't a regular "binary to bcd" question; in fact, I'm not really sure what to call the thing I'm trying to do!
There is a single byte in an embedded device that stores the numbers 1 through 7 (for days of the week) in the following format:
00000001 = 1
00000010 = 2
00000100 = 3
00001000 = 4
00010000 = 5
00100000 = 6
01000000 = 7
I want to read this byte, and convert its contents (1 through 7) into BCD, but I'm not sure how to do this.
I know I could just brute-force it with a series of if statements:
if(byte == B00000001)
{
answer = 1;
}
else
if(byte == B00000010)
{
answer = 2;
}
and so on, but I think there could be a better way. This data is stored in a single register on a real time clock. I'm getting this byte by performing an I2C read, and I read it into a byte in my program. The datasheet for this real-time clock specifies that this particular register is formatted as I have outlined above.

You can use a lookup table...
/* this is only needed once, if lut is global or static */
unsigned char lut[65];
lut[1]=1;
lut[2]=2;
lut[4]=3;
lut[8]=4;
lut[16]=5;
lut[32]=6;
lut[64]=7;
...
...
...
/* Perform the conversion */
answer = lut[byte];
Or you can even use some math (the rounding guards against floating-point error in the division)...
answer = 1 + (int)(log(byte)/log(2) + 0.5);
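The lookup-table idea above can be sketched in C++ so that the table is filled once and out-of-range input is rejected. The names dayTable and dayFromOneHot are made up for illustration, and returning 0 for anything that is not a valid one-hot day is an assumption, not something the datasheet specifies.

```cpp
#include <array>
#include <cstdint>

// Builds the 65-entry table once; entries that are not one-hot stay 0.
inline const std::array<std::uint8_t, 65>& dayTable() {
    static const std::array<std::uint8_t, 65> table = [] {
        std::array<std::uint8_t, 65> t{};   // value-initialized to all zeros
        for (std::uint8_t day = 1; day <= 7; ++day) {
            t[1u << (day - 1)] = day;       // index 1, 2, 4, ..., 64
        }
        return t;
    }();
    return table;
}

// Returns 1..7 for a valid one-hot byte, 0 otherwise (assumed error value).
inline std::uint8_t dayFromOneHot(std::uint8_t raw) {
    return raw <= 64 ? dayTable()[raw] : 0;
}
```

A caller would pass the byte read over I2C to dayFromOneHot and treat 0 as "register contents not valid".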

If this is being compiled on an ARM processor, you can simply do this:
result = 31 - __CLZ(number);
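Assuming number is a nonzero 32-bit one-hot value. This gives the zero-based bit position (0 for 00000001), so add 1 to get the 1 through 7 numbering from the question.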

You can use a shift-and-test loop to do this without needing a lookup array. Note that answer has to be declared outside the loop to be usable afterwards, and byte must be nonzero or the loop never ends:
int answer = 1;
while ((byte % 2) == 0) {
    byte >>= 1;
    ++answer;
}
(I know this was an old question, I just wanted to share because this was a high Google result for me)

Related

Does a 64 bit packed structure contains a field set to specified value

I have an odd structure with 5 fields of bit length 12 and 4 boolean flags stored in the high bits. This all fits nicely into a 64 bit long, and as such they are stored as a 64 bit word array. What I want to do is search the array and find if any of the 12 bit fields are set to a given value.
I have tried the obvious solution of using bit shifts and masks, however this is a very hot function and needs to be optimized for speed. This led me to this page containing a way to check for a byte in a word in very few operations. This makes me think it is possible to do something similar with the 12 bit fields, however I am struggling to find what constants I would replace the ones given on that page with.

I'm not very versed in low level languages, but I'm in the mood to fiddle with some bits so I thought I'd give it a try.
POC: JS can't do 64-bit longs, but we can check whether the algorithm can be adapted to deal with 2 x 12-bit fields + 8 boolean flags (noise) in a 32-bit (u)int.
The noise matters because the original algorithm dealt with exactly 4 bytes and no further bits, but neither 32 nor 64 is divisible by 12, so we need to ensure that these additional bits don't interfere or, worse, get matched.
function hasValue(x, n) { return hasZero(x ^ (0x001001 * n)); }
function hasZero(v) { return ((v - 0x001001) & ~(v) & 0x800800); }
function hex(v) { return "0x" + v.toString(16) }
// create a random value, 2x12bit fields plus 8 random flags.
var v = Math.floor(Math.random() * 0x100000000);
console.log("value", hex(v));
// get the two fields
var a = v & 0xFFF;
console.log("check", hex(a), !!hasValue(v, a));
var b = (v >> 12) & 0xFFF;
console.log("check", hex(b), !!hasValue(v, b));
// brute force.
// check if any other value is matched.
// these should only return the 2 values from above.
for (var i = 0; i < 0x1000; ++i) {
if (hasValue(v, i)) {
console.log("matched", hex(i));
}
}
Extrapolating from this, your solution should be
#define hasValue(x,n) hasZero((x) ^ (0x001001001001001ULL * (n)))
#define hasZero(v) (((v) - 0x001001001001001ULL) & ~(v) & 0x800800800800800ULL)
where all values are unsigned 64-bit integers (hence the ULL suffix on the constants).
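As a sanity check, the same trick can be written as typed C++ functions instead of macros. This is only a sketch: it assumes the five 12-bit fields occupy bits 0 to 59 and the four flags sit in bits 60 to 63, and the names kLowBits, kHighBits, hasZeroField and hasField are invented here.

```cpp
#include <cstdint>

// Assumed layout: five 12-bit fields in bits 0..59, four flags in bits 60..63.
constexpr std::uint64_t kLowBits  = 0x001001001001001ULL; // bit 0 of each field
constexpr std::uint64_t kHighBits = 0x800800800800800ULL; // bit 11 of each field

// True if at least one 12-bit field of `word` is zero.
// The flag bits are masked out by kHighBits, so they cannot cause a match.
inline bool hasZeroField(std::uint64_t word) {
    return ((word - kLowBits) & ~word & kHighBits) != 0;
}

// True if at least one 12-bit field equals `value` (only the low 12 bits are used).
// XOR with the replicated value turns a matching field into zero.
inline bool hasField(std::uint64_t word, std::uint64_t value) {
    return hasZeroField(word ^ (kLowBits * (value & 0xFFFu)));
}
```

Scanning the array is then a plain loop calling hasField on each word; whether this beats shifts and masks in practice depends on the compiler and target, so it is worth measuring.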

C++ save and load huge vector<bool>

I have a huge vector<vector<bool>> (512x 44,000,000 bits). It takes me 4-5 hours to do the calculation for creating it and obviously I want to save results to spare me of repeating the process ever again. When I run the program again, all I want to do is load the same vector (no other app will use this file).
I believe text files are out of the question for such a great size. Is there a simple (quick and dirty) way to do this? I do not use Boost and this is only a minor part of my scientific app, so it must be something quick. I also thought of inverting it and storing it in a Postgres DB (44,000,000 records of 512 bits each), so the DB can handle it easily. I have seen answers such as packing 8 bits into 1 byte and then saving, but with my limited newbie C++ experience, they sound too complicated. Any ideas?

You can save 8 bits into a single byte:
unsigned char saver(bool bits[])
{
unsigned char output=0;
for(int i=0;i<8;i++)
{
output=output|(bits[i]<<i); //probably faster than if(){output|=(1<<i);}
//example: output starts as 00000000
//first iteration yields 00000001 only if bits[0] is true
//second yields 0000001x only if bits[1] is true
//third yields 000001xx only if bits[2] is true
//fifth yields 0000xxxx if bits[4] is false
// x is whatever an earlier iteration left in that position
}
return output;
}
You can load 8 bits from a single byte:
void loader(unsigned char var, bool * bits)
{
for(int i=0;i<8;i++)
{
bits[i] = var & (1 << i);
// for example you loaded var as "200" which is 11001000 in binary
// 11001000 --> zeroth iteration gets false
// first gets false
// second false
// third gets true
//...
}
}
1<<0 is 1 -----> 00000001
1<<1 is 2 -----> 00000010
1<<2 is 4 -----> 00000100
1<<3 is 8 -----> 00001000
1<<4 is 16 ----> 00010000
1<<5 is 32 ----> 00100000
1<<6 is 64 ----> 01000000
1<<7 is 128 ---> 10000000
Edit: Using GPGPU, an embarrassingly parallel algorithm taking 4-5 hours on a CPU could potentially be shortened to 0.04-0.05 hours on a GPU (or even less than a minute with multiple GPUs). For example, the "saver/loader" functions above are embarrassingly parallel.

I have seen answers such take 8bits > 1byte and then save, but with my limited newbie C++ experience, they sound too complicated. Any ideas?
If you are going to read the file often, this would be a good time to learn bitwise operations. Using one bit per bool would be 1/8th the size. That's going to save a lot of memory and I/O.
So save it as one bit per bool, then either break it into chunks and/or read it using mapped memory (e.g. mmap). You can put this behind a usable interface, so you need to implement it just once and abstract the serialized format when you need to read the values.
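A minimal sketch of the one-bit-per-bool format, kept in memory so the packing logic is separate from the file I/O. The function names packBits and unpackBits are made up, and bit i of the input is assumed to go into bit (i % 8) of byte (i / 8). The resulting byte vector can be written with a single fwrite or ofstream::write call.

```cpp
#include <cstddef>
#include <vector>

// Packs bits LSB-first: bit i lands in bit (i % 8) of byte (i / 8).
std::vector<unsigned char> packBits(const std::vector<bool>& bits) {
    std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) {
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
        }
    }
    return bytes;
}

// Reverses packBits. The caller must supply the original bit count,
// and bytes must hold at least (bitCount + 7) / 8 entries.
std::vector<bool> unpackBits(const std::vector<unsigned char>& bytes,
                             std::size_t bitCount) {
    std::vector<bool> bits(bitCount);
    for (std::size_t i = 0; i < bitCount; ++i) {
        bits[i] = ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
    }
    return bits;
}
```

For the 512 x 44,000,000 case, each row packs to 5,500,000 bytes, so the whole structure is roughly 2.8 GB on disk; storing the row length alongside the data lets the loader call unpackBits with the right count.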

Proceed as described above. Here vec is the vector of vector of bool; we pack the bits of each sub-vector 8 at a time into bytes, push those bytes into a buffer, and write the buffer out once per sub-vector.
std::vector<unsigned char> buf;
int cmp = 0;
unsigned char output = 0;
FILE* of = fopen("out.bin", "wb");
for (const auto& subvec : vec)
{
    for (bool b : subvec)
    {
        output = output | ((b ? 1 : 0) << cmp);
        cmp++;
        if (cmp == 8)
        {
            buf.push_back(output);
            cmp = 0;
            output = 0;
        }
    }
    fwrite(buf.data(), 1, buf.size(), of);
    buf.clear();
}
if (cmp != 0) fwrite(&output, 1, 1, of); // flush a final partial byte
fclose(of);

1+2+4 in Binary bits

I have an email from a developer in which he says:
As you may know 1110000000000000 means 1+2+4
I won't be able to contact him for a few days. Can anyone else explain how that is possible?
Numbers appear to be turned into binary using the following function:
function toBinaryString(bitmask)
tvar2 = 0
tvar3 = 1
tvar1 = ""
do while tvar2 < 16
if (bitmask and tvar3) > 0 then
tvar1 = tvar1 & "1"
else
tvar1 = tvar1 & "0"
end if
tvar3 = tvar3 * 2
tvar2 = tvar2 + 1
loop
toBinaryString = tvar1
end function

It's little-endian bit notation (Wiki). Basically the least significant bits appear on the left, unlike big-endian notation (which is what most people think of when talking about binary).
As such the first digit represents 2^0, then 2^1, 2^2, etc. (so 1 + 2 + 4).
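To make the ordering concrete, here is a rough C++ equivalent of the function from the question. It is a sketch, not a translation the original developer provided, and the name toBinaryStringLsbFirst is invented.

```cpp
#include <string>

// Writes the 16 lowest bits of bitmask as text, least significant bit first,
// mirroring the loop in the question (test bit, then double the mask).
std::string toBinaryStringLsbFirst(unsigned int bitmask) {
    std::string out;
    for (int i = 0; i < 16; ++i) {
        out += ((bitmask >> i) & 1u) ? '1' : '0';
    }
    return out;
}
```

Calling it with 7 (that is, 1 + 2 + 4) produces the string from the email, 1110000000000000.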

Prepare for some interesting reading material: How bytes work
Strictly speaking, your developer is not correct: read in conventional binary notation (most significant bit first), 1110000000000000 is 57344 in decimal.

Are bitwise operations going to help me to serialize some bools?

I'm not used to binary files, and I'm trying to get the hang of it. I managed to store some integers and unsigned chars, and read them without too much pain. Now, when I'm trying to save some booleans, I see that each of my bools takes exactly 1 octet in my file, which seems logical since a lone bool is stored in a char-sized piece of data (correct me if I'm wrong!).
But since I'm going to have 3 or 4 bools to serialize, I figure it is a waste to store them like this : 00000001 00000001 00000000, for instance, when I could have 00000110. I guess to obtain this I should use bitwise operation, but I'm not very good with them... so could somebody tell me:
How to store up to 8 bools in a single octet using bitwise manipulations?
How to give proper values to (up to 8 bools) from a single octet using bitwise manipulation?
(And, bonus question: can anybody recommend a simple bit manipulation tutorial for a non-mathematically-oriented mind like mine, if one exists? Everything I found I understood but could not put into practice...)
I'm using C++ but I guess most C-syntax languages will use the same kind of operations.

To store bools in a byte:
bool flag; // value to store
unsigned char b = 0; // all false
int position; // ranges from 0..7
b = b | (flag << position);
To read it back:
flag = (b & (1 << position));

The easy way is to use std::bitset which allows you to use indexing to access individual bits (bools), then get the resulting value as an integer. It also allows the reverse.
#include <bitset>
#include <iostream>
int main() {
    std::bitset<8> s;
    s[1] = s[2] = true; // 0b_0000_0110
    std::cout << s.to_ulong() << '\n';
}

Without wrapping in fancy template/pre-processor machinery:
Set bit 3 in var: var |= (1 << 3)
Set bit n in var: var |= (1 << n)
Clear bit n in var: var &= ~(1 << n)
Test bit n in var: !!(var & (1 << n)) (the !! ensures the result is 0 or 1)
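The three operations above can also be wrapped in small functions. This is just one possible shape; the names setBit, clearBit and testBit are made up, and n is assumed to be in the range 0 to 7.

```cpp
#include <cstdint>

// Returns var with bit n set to 1.
inline std::uint8_t setBit(std::uint8_t var, unsigned n) {
    return static_cast<std::uint8_t>(var | (1u << n));
}

// Returns var with bit n cleared to 0.
inline std::uint8_t clearBit(std::uint8_t var, unsigned n) {
    return static_cast<std::uint8_t>(var & ~(1u << n));
}

// Returns true if bit n of var is 1.
inline bool testBit(std::uint8_t var, unsigned n) {
    return (var & (1u << n)) != 0;
}
```

Setting bits 1 and 2 of an empty byte gives 00000110, the value used as the example in the question.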

Try reading this in order.
http://www.cprogramming.com/tutorial/bitwise_operators.html
http://www-graphics.stanford.edu/~seander/bithacks.html#ConditionalSetOrClearBitsWithoutBranching
Some people will think that the 2nd link is way too hardcore, but once you have mastered simple manipulation, it will come in handy.

Basic stuff first:
The only combination of bits that means false is 00000000; all the others mean true, e.g. 00001000, 01010101.
00000000 = 0 (decimal), 00000001 = 2^0, 00000010 = 2^1, 00000100 = 2^2, ..., 10000000 = 2^7
There is a big difference between the operators (&&, ||) and (&, |). The first pair gives the result of the logical operation between the two numbers as a whole, for example:
00000000 && 00000000 = false,
01010101 && 10101010 = true
00001100 || 00000000 = true,
00000000 || 00000000 = false
The second pair makes a bitwise operation (the logic operation between each bit of the numbers):
00000000 & 00000000 = 00000000 = false
00001111 & 11110000 = 00000000 = false
01010101 & 10101001 = 00000001 = true
00001111 | 11110000 = 11111111 = true
00001100 | 00000011 = 00001111 = true
To work with this and play with the bits, you only need to know some basic tricks:
To set a bit to 1 you apply the | operation with an octet that has a 1 in that position and zeros in the rest.
For example: we want the first bit of the octet A to be 1, so we do: A|00000001
To set a bit to 0 you apply the & operation with an octet that has a 0 in that position and ones in the rest.
For example: we want the last bit of the octet A to be 0, so we do: A&01111111
To get the Boolean value that a bit holds you apply the & operation with an octet that has a 1 in that position and zeros in the rest.
For example: we want to see the value of the third bit of the octet A, so we do: A&00000100. If A was XXXXX1XX we get 00000100 = true, and if A was XXXXX0XX we get 00000000 = false.
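The difference between the two operator pairs is easy to trip over, so here it is as a tiny C++ sketch. The wrapper names logicalAnd and bitwiseAnd exist only for this illustration.

```cpp
#include <cstdint>

// Logical AND: each operand is first reduced to true/false (nonzero or zero).
inline bool logicalAnd(std::uint8_t a, std::uint8_t b) {
    return a && b;
}

// Bitwise AND: the operation is applied to each pair of bits separately.
inline std::uint8_t bitwiseAnd(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a & b);
}
```

With a = 01010101 (0x55) and b = 10101010 (0xAA), logicalAnd is true because both are nonzero, while bitwiseAnd is 00000000 because no bit position is 1 in both.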

You can always serialize bitfields. Something like:
struct bools
{
bool a:1;
bool b:1;
bool c:1;
bool d:1;
};
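This has a sizeof of 1 on typical compilers, although bit-field layout is implementation-defined, so check it before relying on it as a file format.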

C/C++ Bit Array or Bit Vector

I am learning C/C++ programming and have come across 'bit arrays' or 'bit vectors', but I am not able to understand their purpose. Here are my doubts:
Are they used as boolean flags?
Can one use int arrays instead? (more memory of course, but..)
What's this concept of Bit-Masking?
If bit-masking is just simple bit operations to get an appropriate flag, how does one program with them? Is it not difficult to do this operation in your head to see what the flag would be, as opposed to decimal numbers?
I am looking for applications, so that I can understand better. For example:
Q. You are given a file containing integers in the range (1 to 1 million). There are some duplicates and hence some numbers are missing. Find the fastest way of finding the missing numbers.
For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?

I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.
I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1,2,3,4... which makes sense to your mind, but then how do you, given just one number, work out which errors have occurred?
Now, try labelling the errors 1,2,4,8,16... increasing powers of two, basically. Why does this work? Well, when you work in base 2 you are manipulating a number like 00000000 where each digit is multiplied by a power of 2 given by its position from the right (starting at 0). So let's say errors 1, 4 and 8 occur. Well, then that could be represented as 00001101. Reading from the right, the first digit = 1*2^0, the third digit = 1*2^2 and the fourth digit = 1*2^3. Adding them all up gives you 13.
Now, we are able to test if such an error has occurred by applying a bitmask. For example, if you wanted to work out if error 8 has occurred, use the bit representation of 8 = 00001000. Now, in order to extract whether or not that error has occurred, use a binary and like so:
00001101
& 00001000
= 00001000
I'm sure you know how an and works or can deduce it from the above - working digit-wise, if any two digits are both 1, the result is 1, else it is 0.
Now, in C:
#include <stdint.h>
/* firstError and secondError are placeholders for whatever tests detect an error */
int func(int firstError, int secondError)
{
    int retval = 0;
    if ( firstError )
    {
        retval |= 1;
    }
    if ( secondError )
    {
        retval |= 2;
    }
    return retval;
}
void anotherfunc(int firstError, int secondError)
{
    uint8_t x = func(firstError, secondError);
    /* binary and with 8 and shift 3 places to the right
     * so that the resultant expression is either 1 or 0 */
    if ( ( ( x & 0x08 ) >> 3 ) == 1 )
    {
        /* that error occurred */
    }
}
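In C++ the same idea is often written with named flag constants, so the powers of two do not have to be remembered. A minimal sketch, with invented names (ErrorFlag, kErrorA to kErrorD, collectErrors, hasError):

```cpp
#include <cstdint>

// Each error gets its own bit, i.e. its own power of two.
enum ErrorFlag : std::uint8_t {
    kErrorA = 1u << 0, // 00000001 = 1
    kErrorB = 1u << 1, // 00000010 = 2
    kErrorC = 1u << 2, // 00000100 = 4
    kErrorD = 1u << 3  // 00001000 = 8
};

// Combines whichever errors occurred into a single byte.
inline std::uint8_t collectErrors(bool a, bool b, bool c, bool d) {
    std::uint8_t flags = 0;
    if (a) flags |= kErrorA;
    if (b) flags |= kErrorB;
    if (c) flags |= kErrorC;
    if (d) flags |= kErrorD;
    return flags;
}

// Applies the bitmask for one error and reports whether its bit is set.
inline bool hasError(std::uint8_t flags, ErrorFlag which) {
    return (flags & which) != 0;
}
```

With errors 1, 4 and 8 present, collectErrors returns 13 (00001101), and hasError(flags, kErrorD) is the masked test shown earlier with 00001000.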
Now, to practicalities. When memory was scarce and protocols didn't have the luxury of verbose XML etc, it was common to delimit a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) to a certain meaning and apply binary operations to deduce if they are set, then operate on these.
I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.

Regarding the usage of the bit array:
If you know there are "only" 1 million numbers, you use an array of 1 million bits. In the beginning all bits are zero, and every time you read a number you use it as an index and set the bit at that index to one (if it's not one already).
After reading all the numbers, the missing numbers are the indices of the zeros in the array.
For example, if we had only numbers between 0 and 4, the array would look like this in the beginning: 0 0 0 0 0.
If we read the numbers 3, 3, 2,
the array would look like this: read 3 --> 0 0 0 1 0. read 3 (again) --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. Check the indices of the zeros: 0, 1, 4. Those are the missing numbers.
BTW, of course you can use integers instead of bits, but it may take (depending on the system) 32 times as much memory.
Sivan
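The procedure described above can be sketched in a few lines of C++ using std::vector<bool>, which stores one bit per element on common implementations. The function name findMissing and the choice to ignore out-of-range input are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Returns every value in [0, limit) that never appears in numbers.
// Values >= limit are ignored (an assumption; real code might report them).
std::vector<std::size_t> findMissing(const std::vector<std::size_t>& numbers,
                                     std::size_t limit) {
    std::vector<bool> seen(limit, false);   // one flag per possible value
    for (std::size_t n : numbers) {
        if (n < limit) {
            seen[n] = true;                 // duplicates just set the bit again
        }
    }
    std::vector<std::size_t> missing;
    for (std::size_t i = 0; i < limit; ++i) {
        if (!seen[i]) {
            missing.push_back(i);
        }
    }
    return missing;
}
```

For the small example above (numbers 3, 3, 2 with limit 5), the result is 0, 1, 4.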

Bit arrays or bit vectors can be thought of as an array of boolean values. Normally a boolean variable needs at least one byte of storage, but in a bit array/vector only one bit is needed.
This comes in handy if you have lots of such data, since it saves a lot of memory.
Another use is for numbers which do not exactly fit in standard variables of 8, 16, 32 or 64 bits. This way you could store, in a 16-bit bit vector, one number of 4 bits, one of 2 bits and one of 10 bits. Normally you would have to use 3 variables with sizes of 8, 8 and 16 bits, which wastes 50% of the storage.
But all these uses are rare in business applications; they come up more often when interfacing with drivers through pinvoke/interop functions and when doing low-level programming.
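As an illustration of the 4 + 2 + 10 bit example, here is one way the three numbers could share a 16-bit word. The layout (4-bit value in bits 0 to 3, 2-bit value in bits 4 to 5, 10-bit value in bits 6 to 15) and the names packFields, fieldA, fieldB and fieldC are choices made for this sketch only.

```cpp
#include <cstdint>

// Packs a (4 bits), b (2 bits) and c (10 bits) into one 16-bit word.
// Values wider than their field are truncated by the masks.
inline std::uint16_t packFields(unsigned a, unsigned b, unsigned c) {
    return static_cast<std::uint16_t>((a & 0xFu)
                                      | ((b & 0x3u) << 4)
                                      | ((c & 0x3FFu) << 6));
}

// Each accessor shifts its field down to bit 0 and masks off the rest.
inline unsigned fieldA(std::uint16_t w) { return w & 0xFu; }
inline unsigned fieldB(std::uint16_t w) { return (w >> 4) & 0x3u; }
inline unsigned fieldC(std::uint16_t w) { return (w >> 6) & 0x3FFu; }
```

Packing 9, 2 and 1000 this way and reading the fields back returns the same three values, using 2 bytes instead of 4.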

Bit arrays or bit vectors are used as a mapping from position to some bit value. Yes, it's basically the same thing as an array of Bool, but a typical Bool implementation is one to four bytes long, which uses too much space.
We can store the same amount of data much more efficiently by using arrays of words and binary masking operations and shifts to store and retrieve them (less overall memory used, fewer accesses to memory, fewer cache misses, less memory page swapping). The code to access individual bits is still quite straightforward.
There is also some bit field support built into the C language (you write things like int i:1; to say "only consume one bit"), but it is not available for arrays and you have less control over the overall result (details of implementation depend on compiler and alignment issues).
Below is a possible way to answer your "search missing numbers" question. I fixed the int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could be made faster using >> 5 instead of / 32 and & 31 instead of % 32, but that is just to give the idea.
#include <stdio.h>
#include <errno.h>
#include <stdint.h>
int main(){
/* put all numbers from 0 to 999999 in a file, except 765, 777760 and 777791 */
{
printf("writing test file\n");
int x = 0;
FILE * f = fopen("testfile.txt", "w");
for (x=0; x < 1000000; ++x){
if (x == 765 || x == 777760 || x == 777791){
continue;
}
fprintf(f, "%d\n", x);
}
fprintf(f, "%d\n", 57768); /* this one is a duplicate */
fclose(f);
}
uint32_t bitarray[1000000 / 32] = {0}; /* every bit must start cleared */
/* read file containing integers in the range [0,999999] */
/* any non number is considered as separator */
/* the goal is to find missing numbers */
printf("Reading test file\n");
{
unsigned int x = 0;
FILE * f = fopen("testfile.txt", "r");
while (1 == fscanf(f, " %u",&x)){
if (x < 1000000) bitarray[x / 32] |= (uint32_t)1 << (x % 32);
}
fclose(f);
}
/* find missing number in bitarray */
{
int x = 0;
for (x=0; x < (1000000 / 32) ; ++x){
uint32_t n = bitarray[x];
if (n != (uint32_t)-1){
printf("Missing number(s) between %d and %d [%x]\n",
x * 32, (x+1) * 32, bitarray[x]);
int b;
for (b = 0 ; b < 32 ; ++b){
if (0 == (n & ((uint32_t)1 << b))){
printf("missing number is %d\n", x*32+b);
}
}
}
}
}
}

Bit arrays are used for storing bit flags, as well as for parsing the fields of various binary protocols, where 1 byte is divided into a number of bit-fields. This is widely used in protocols such as TCP/IP, ASN.1 encodings, OpenPGP packets, and so on.
