Declare a bit in C++

I'm writing a program with very high memory requirements, and I want to save memory without losing performance.
So I want to change every variable that has only two possible states into a bit.
But I can't find a bit type in C++, and bitset in the STL always occupies a multiple of 4 bytes on a 32-bit machine.
Writing a data structure to manage bits would cost performance.
Is there any way to declare a single-bit value, just like bit a;?
Thanks everyone. In the end, the answer I wanted was: "you can't buy half bytes in C++".

There is none. The smallest addressable entity is a byte: the char or unsigned char type. (The best type for speed is the native int, because it is aligned to the width of your processor and is thus fastest to fetch and work on.)
To work with bits, you need to use the bitwise operators and mask/shift your data within these larger types, or work with STL bitsets.
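A minimal sketch of the mask/shift idea (the bit index is illustrative):
unsigned char flags = 0;       // eight one-bit values packed into one byte
flags |= 1u << 3;              // set bit 3
bool bit3 = (flags >> 3) & 1u; // read bit 3
flags &= ~(1u << 3);           // clear bit 3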

source: http://www.learncpp.com/cpp-tutorial/3-8a-bit-flags-and-bit-masks/
If you are working with simple booleans, the example below shows how you can address them as separate bit values inside a byte.
C++14
Define 8 separate bit flags (these can represent whatever you want)
const unsigned char option1 = 0b0000'0001;
const unsigned char option2 = 0b0000'0010;
const unsigned char option3 = 0b0000'0100;
const unsigned char option4 = 0b0000'1000;
const unsigned char option5 = 0b0001'0000;
const unsigned char option6 = 0b0010'0000;
const unsigned char option7 = 0b0100'0000;
const unsigned char option8 = 0b1000'0000;
C++11 or earlier
Define 8 separate bit flags (these can represent whatever you want)
const unsigned char option1 = 0x1; // hex for 0000 0001
const unsigned char option2 = 0x2; // hex for 0000 0010
const unsigned char option3 = 0x4; // hex for 0000 0100
const unsigned char option4 = 0x8; // hex for 0000 1000
const unsigned char option5 = 0x10; // hex for 0001 0000
const unsigned char option6 = 0x20; // hex for 0010 0000
const unsigned char option7 = 0x40; // hex for 0100 0000
const unsigned char option8 = 0x80; // hex for 1000 0000
We use a byte-size value to hold our options; each bit in myflags corresponds to one of the options defined above:
unsigned char myflags = 0; // all options turned off to start
To query a bit's state, we use bitwise AND (the '&' operator):
if (myflags & option4) ... // if option4 is set, do something
if (!(myflags & option5)) ... // if option5 is not set, do something
To set a bit (turn it on), we use bitwise OR (the '|' operator):
myflags |= option4; // turn option 4 on
myflags |= (option4 | option5); // turn options 4 and 5 on
To clear a bit (turn it off), we use bitwise AND with an inverted (~) bit pattern:
myflags &= ~option4; // turn option 4 off
myflags &= ~(option4 | option5); // turn options 4 and 5 off
To toggle a bit's state, we use bitwise XOR:
myflags ^= option4; // flip option4 from on to off, or vice versa
myflags ^= (option4 | option5); // flip options 4 and 5
You can use static_cast<bool>(value) to turn such a masked value into a bool.
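For example (a tiny sketch using the flags above):
bool option4Set = static_cast<bool>(myflags & option4); // true iff the bit is set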

There is no such data type as a "bit" specifically. The usual practice is to use a standard uint8_t (or uint16_t, uint32_t) and use its individual bits for different values. E.g.:
#define BIT1 0x01
#define BIT2 0x02
#define BIT3 0x04
#define BIT4 0x08
#include <cstdint> // for uint8_t
uint8_t bit_vars;
// Function to read a particular bit; returns nonzero if it is set
uint8_t get_bitx(int x)
{
    switch (x)
    {
    case 1:
        return bit_vars & BIT1;
    case 2:
        return bit_vars & BIT2;
    case 3:
        return bit_vars & BIT3;
    case 4:
        return bit_vars & BIT4;
    }
    return 0;
}
// Function to set or clear a particular bit
void set_bitx(int x, bool set_flag)
{
    switch (x)
    {
    case 1:
        if (set_flag) { bit_vars |= BIT1; } else { bit_vars &= ~BIT1; }
        break;
    case 2:
        if (set_flag) { bit_vars |= BIT2; } else { bit_vars &= ~BIT2; }
        break;
    case 3:
        if (set_flag) { bit_vars |= BIT3; } else { bit_vars &= ~BIT3; }
        break;
    case 4:
        if (set_flag) { bit_vars |= BIT4; } else { bit_vars &= ~BIT4; }
        break;
    }
}
Note: This is just a rough example.
You can also use bit-fields; I personally tend to stay away from them, as their layout is not always portable across different processors and compilers.
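For reference, a minimal bit-field sketch (the struct name is illustrative; the in-memory layout and packing are implementation-defined, which is exactly the portability concern):
struct PackedFlags {
    unsigned char a : 1; // each member is declared as a single bit,
    unsigned char b : 1; // but the compiler chooses the actual layout
    unsigned char c : 1;
};
// sizeof(PackedFlags) is typically 1, but that is not guaranteed.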

You can use bit fields. Or use std::vector<bool>, which has a space-optimized template specialization.
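A minimal sketch of the std::vector<bool> approach (the exact storage size is implementation-dependent):
#include <iostream>
#include <vector>
int main()
{
    std::vector<bool> flags(1000, false); // packed: roughly 1000 bits, not 1000 bytes
    flags[42] = true;
    std::cout << flags[42] << "\n"; // prints 1
}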

Use an integer (32 bits) as storage, where each bit represents one variable.
Indeed, this makes your code uglier, but if you want memory optimization, you have to pay somewhere else.
Access each variable's "bit" with bitwise operations on that integer.
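For instance (a sketch; bit 5 is illustrative):
uint32_t vars = 0;          // 32 one-bit "variables" in one integer
vars |= (1u << 5);          // set variable 5
bool v5 = (vars >> 5) & 1u; // read variable 5
vars &= ~(1u << 5);         // clear variable 5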


How to build N bits variables in C++?

I am dealing with a very large list of booleans in C++: around 2^N items of N booleans each. Because memory is critical in such a situation, i.e. with exponential growth, I would like to build an N-bit-long variable to store each element.
For small N, for example 24, I just use unsigned long int. It takes 64MB ((2^24)*32/8/1024/1024). But I need to go up to 36. The only option with a built-in type is unsigned long long int, but that takes 512GB ((2^36)*64/8/1024/1024/1024), which is a bit too much.
With a 36-bit variable, it would work for me because the size drops to 288GB ((2^36)*36/8/1024/1024/1024), which fits on a node of my supercomputer.
I tried std::bitset, but std::bitset<N> creates an element of at least 8B.
So a list of std::bitset<1> is much larger than a list of unsigned long int.
That is because std::bitset just changes the representation, not the container.
I also tried boost::dynamic_bitset<> from Boost, but the result is even worse (at least 32B!), for the same reason.
I know one option is to write all elements as one chain of booleans, 2473901162496 bits (2^36*36), and store them in 38654705664 (2473901162496/64) unsigned long long ints, which gives 288GB (38654705664*64/8/1024/1024/1024). Accessing an element is then just a game of finding which words the 36 bits are stored in (it can be either one or two). But it means a lot of rewriting of the existing code (3000 lines), because mapping becomes impossible, and because adding and deleting items during execution in some functions would surely be complicated, confusing, and challenging, and the result would most likely not be efficient.
How can I build an N-bit variable in C++?
How about a struct with 5 chars (and perhaps some fancy operator overloading, as needed, to keep it compatible with the existing code)? A struct with a long and a char probably won't work because of padding/alignment...
Basically your own mini bitset optimized for size:
struct Bitset40 {
    unsigned char data[5];
    bool getBit(int index) {
        return (data[index / 8] & (1 << (index % 8))) != 0;
    }
    void setBit(int index, bool newVal) {
        if (newVal) {
            data[index / 8] |= (1 << (index % 8));
        } else {
            data[index / 8] &= ~(1 << (index % 8));
        }
    }
};
Edit: As geza has also pointed out in the comments, the "trick" here is to get as close as possible to the minimum number of bytes needed, without wasting memory through alignment losses, padding, or pointer indirection (see http://www.catb.org/esr/structure-packing/).
Edit 2: If you feel adventurous, you could also try a bit field (and please let us know how much space it actually consumes):
struct Bitset36 {
    unsigned long long data : 36;
};
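If you do try it, a quick way to report the size (a sketch using the Bitset36 above; on common compilers this prints 8, because the 36-bit field is still stored inside a full unsigned long long):
#include <iostream>
int main()
{
    std::cout << sizeof(Bitset36) << "\n"; // typically 8
}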
I'm not an expert, but this is what I would "try": find the size of the smallest type your compiler supports (it should be char). You can check with sizeof, and you should get 1. That means 1 byte, so 8 bits.
So if you wanted a 24-bit type, you would need 3 chars. For 36 you would need a 5-char array, and you would have 4 bits of wasted padding at the end. This can easily be accounted for.
i.e.
char typeSize[3] = {0}; // should hold 24 bits
Now make a bit mask to access each position of typeSize.
const unsigned char one = 0b0000'0001;
const unsigned char two = 0b0000'0010;
const unsigned char three = 0b0000'0100;
const unsigned char four = 0b0000'1000;
const unsigned char five = 0b0001'0000;
const unsigned char six = 0b0010'0000;
const unsigned char seven = 0b0100'0000;
const unsigned char eight = 0b1000'0000;
Now you can use bitwise OR to set the values to 1 where needed:
typeSize[1] |= four;
typeSize[0] |= (four | five);
To turn bits off, use the & operator with an inverted mask:
typeSize[0] &= ~four;
typeSize[2] &= ~(four | five);
You can read the state of each bit with the & operator:
typeSize[0] & four
Bear in mind, I don't have a compiler handy to try this out so hopefully this is a useful approach to your problem.
Good luck ;-)
You can use an array of unsigned long int and store and retrieve the needed bit chains with bitwise operations. This approach avoids any space overhead.
A simplified example for an unsigned byte array B[] and 12-bit variables V (represented as ushort):
Set V[0]:
B[0] = V & 0xFF; //low byte
B[1] = B[1] & 0xF0; // clear low nibble
B[1] = B[1] | (V >> 8); //fill low nibble of the second byte with the highest nibble of V
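Reading V[0] back out is the mirror operation (same layout assumptions):
V = B[0] | ((B[1] & 0x0F) << 8); // low byte, plus the low nibble of B[1] as the high bits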

Safely convert 2 bytes to short

I'm making an emulator for the Intel 8080. One of the opcodes requires a 16-bit address formed by combining the b and c registers (each 1 byte). I have a struct with the registers adjacent to each other. The way I combine the two registers is:
using byte = char;
struct {
... code
byte b;
byte c;
... code
} state;
...somewhere in code
// memory is an array of byte with a size of 65535
memory[*reinterpret_cast<short*>(&state.b)]
I thought I could just OR them together, but that doesn't work:
short address = state.b | state.c;
Another way I tried doing this was by creating a short, and setting the 2 bytes individually.
short address;
*reinterpret_cast<byte*>(&address) = state.b;
*(reinterpret_cast<byte*>(&address) + 1) = state.c;
Is there a better/safer way to achieve what I am trying to do?
short j;
j = (unsigned char)state.b;
j <<= 8;
j |= (unsigned char)state.c; // the casts prevent sign extension when byte is signed
Reverse the state.b and state.c if you need the opposite endianness.
short address = ((unsigned short)state.b << 8) | (unsigned char)state.c;
That's the portable way. Your way, with reinterpret_cast is not really that terrible, as long as you understand that it'll only work on architecture with the correct endian-ness.
As others have mentioned, there are endianness concerns, but you can also use a union to manipulate the memory without needing to do any shifting.
Example Code
#include <cstdint>
#include <iostream>
using byte = std::uint8_t;
struct Regs
{
    union
    {
        std::uint16_t bc;
        struct
        {
            // The order of these bytes matters
            byte c;
            byte b;
        };
    };
};
int main()
{
    Regs regs;
    regs.b = 1; // 0000 0001
    regs.c = 7; // 0000 0111
    // Read these vertically to know the value associated with each bit:
    //
    // 2    1
    // 5    2631
    // 6    8426 8421
    //
    // The overall binary: 0000 0001 0000 0111
    //
    // 256 + 4 + 2 + 1 = 263
    std::cout << regs.bc << "\n";
    return 0;
}
Example Output
263
You can use:
unsigned short address = state.b * 0x100u + state.c;
Using multiplication instead of a shift avoids all the issues relating to shifting into the sign bit, etc.
The address should be unsigned, otherwise you will cause an out-of-range assignment; and you probably want an address range of 0 to 65535 anyway, instead of -32768 to 32767.

Change the width of a signed integer to a nonstandard width

For a networking application I need a signed, two's complement integer with a custom width, specified at run time. Assume the value of the integer fits in the width.
The problem I have is the parity bit. Is there any way to avoid having to set the parity bit manually? Say I have an integer with a width of 11 bits; I'll store it in an array of 2 chars like this:
int myIntWidth = 11;
int32_t myInt= 5;
unsigned char charArray[2] = memcpy(charArray, &myInt, (myIntWidth + 7)/8);
It doesn't work like that. It can't work, because you are copying two bytes from the start of myInt without knowing where the bytes you are interested in are stored. You also need to know the order in which you are supposed to store the bytes. Depending on that, use one of these two versions:
unsigned char charArray [2];
charArray [0] = myInt & 0xff; // Lowest 8 bits
charArray [1] = (myInt >> 8) & 0x07; // Next 3 bits
or
unsigned char charArray [2];
charArray [1] = myInt & 0xff; // Lowest 8 bits
charArray [0] = (myInt >> 8) & 0x07; // Next 3 bits
With the help of a lot of the posts above, I've come up with this solution:
inline void reduceSignedIntWidth(int32_t& destInt, int width)
{
    // Create a value mask with 1s in the masked part
    uint32_t l_mask = (0x01u << width) - 1;
    destInt &= l_mask;
}
It will return the reduced int, with zeros as padding.
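Going the other way requires sign extension: once a value has been masked down to width bits, the top bit must be replicated to recover a negative number. A sketch (the helper name is hypothetical, not from the original post):
inline int32_t extendSignedIntWidth(uint32_t raw, int width)
{
    // Hypothetical inverse of reduceSignedIntWidth: sign-extend
    // the low `width` bits of raw back to a full int32_t.
    uint32_t signBit = 0x01u << (width - 1);
    return (int32_t)((raw ^ signBit) - signBit);
}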

Writing on MSB and on LSB of an unsigned Char

I have an unsigned char, and I want to write 0x06 to its four most significant bits and 0x04 to its four least significant bits.
So the char representation should be 0110 0100.
Can someone guide me on how I can do this in C?
c = (0x06 << 4) | 0x04;
Because:
0x04    = 0000 0100
0x06    = 0000 0110
0x06<<4 = 0110 0000
OR'd    = 0110 0100
Shift values into the right position with the bitwise shift operators, and combine with bitwise or.
unsigned char c = (0x6 << 4) | 0x4;
To reverse the process and extract bitfields, you can use bitwise and with a mask containing just the bits you're interested in:
unsigned char lo4 = c & 0xf;
unsigned char hi4 = c >> 4;
First, ensure there are eight bits per unsigned char:
#include <limits.h>
#if CHAR_BIT != 8
#error "This code does not support character sizes other than 8 bits."
#endif
Now, suppose you already have an unsigned char defined with:
unsigned char x;
Then, if you want to completely set an unsigned char to have 6 in the high four bits and 4 in the low four bits, use:
x = 0x64;
If you want to set the high four bits to a and the low four bits to b, then use:
// Shift a to high four bits and combine with b.
x = a << 4 | b;
If you want to set the high bits to a and leave the low bits unchanged, use:
// Shift a to high four bits, extract low four bits of x, and combine.
x = a << 4 | x & 0xf;
If you want to set the low bits to b and leave the high bits unchanged, use:
// Extract high four bits of x and combine with b.
x = x & 0xf0 | b;
The above presumes that a and b contain only four-bit values. If they might have other bits set, use (a & 0xf) and (b & 0xf) in place of a and b above, respectively.

C++ How to combine two signed 8 Bit numbers to a 16 Bit short? Unexplainable results

I need to combine two signed 8-bit _int8 values into one signed 16-bit short value. It is important that the sign is not lost.
My code is:
unsigned short lsb = -13;
unsigned short msb = 1;
short combined = (msb << 8 )| lsb;
The result I get is -13. However, I expect it to be 499.
For the following examples, I get the correct results with the same code:
msb = -1; lsb = -6; combined = -6;
msb = 1; lsb = 89; combined = 345;
msb = -1; lsb = 13; combined = -243;
However, msb = 1; lsb = -84; combined = -84; where I would expect 428.
It seems that if the lsb is negative and the msb is positive, something goes wrong!
What is wrong with my code? How does the computer get to these unexpected results (Win7, 64 Bit and VS2008 C++)?
Your lsb in this case contains 0xfff3. When you OR it with 1 << 8 nothing changes because there is already a 1 in that bit position.
Try short combined = (msb << 8 ) | (lsb & 0xff);
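Put together as a complete sketch (assuming 16-bit shorts):
#include <iostream>
int main()
{
    short msb = 1;
    short lsb = -13;                            // stored as 0xfff3
    short combined = (msb << 8) | (lsb & 0xff); // mask off the sign-extended high bits
    std::cout << combined << "\n";              // prints 499
}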
Or using a union:
#include <iostream>
union Combine
{
    short target;
    char dest[sizeof(short)];
};
int main()
{
    Combine cc;
    cc.dest[0] = -13, cc.dest[1] = 1;
    std::cout << cc.target << std::endl;
}
It is possible that lsb is being automatically sign-extended to 16 bits. I notice you only have a problem when it is negative and msb is positive, and that is what you would expect to happen given the way you're using the OR operator. That said, you're clearly doing something very strange here; what are you actually trying to do?
The Raisonanse C compiler for STM8 (and possibly many other compilers) generates ugly code from classic C when writing 16-bit variables into 8-bit hardware registers.
Note: the STM8 is big-endian; for little-endian CPUs the code must be slightly modified. Read/write byte order matters, too.
So, standard C code piece:
unsigned int ch1Sum;
...
TIM5_CCR1H = ch1Sum >> 8;
TIM5_CCR1L = ch1Sum;
Is being compiled to:
;TIM5_CCR1H = ch1Sum >> 8;
LDW X,ch1Sum
CLR A
RRWA X,A
LD A,XL
LD TIM5_CCR1,A
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Too long, too slow.
My version:
unsigned int ch1Sum;
...
TIM5_CCR1H = ((u8*)&ch1Sum)[0];
TIM5_CCR1L = ch1Sum;
That compiles into an adequate pair of MOVs:
;TIM5_CCR1H = ((u8*)&ch1Sum)[0];
MOV TIM5_CCR1,ch1Sum
;TIM5_CCR1L = ch1Sum;
MOV TIM5_CCR1+1,ch1Sum+1
Opposite direction:
unsigned int uSonicRange;
...
((unsigned char *)&uSonicRange)[0] = TIM1_CCR2H;
((unsigned char *)&uSonicRange)[1] = TIM1_CCR2L;
instead of
unsigned int uSonicRange;
...
uSonicRange = TIM1_CCR2H << 8;
uSonicRange |= TIM1_CCR2L;
Some things you should know about the data types (un)signed short and char:
char is an 8-bit value; that's what you were looking for for lsb and msb. short is 16 bits in length.
You should also not store signed values in unsigned types unless you know what you are doing.
Take a look at two's complement. It describes the representation of negative values (for integers, not for floating-point values) in C/C++ and many other programming languages.
There are multiple versions of making your own two's complement:
int a;
// setting a
a = -a; // Clean version. Easier to understand and read. Use this one.
a = (~a)+1; // The arithmetical version. Does the same, but takes more steps.
// Don't use the last one unless you need it!
// It can be 'optimized away' by the compiler.
stdint.h (with inttypes.h) exists so that you can have exact widths for your variables. If you really need a variable to have a specific bit width, you should use those headers (and here you do need it).
You should always use the data types that fit your needs best. Your code should therefore look like this:
signed char lsb; // signed 8-bit value
signed char msb; // signed 8-bit value
signed short combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
or like this:
#include <stdint.h>
int8_t lsb; // signed 8-bit value
int8_t msb; // signed 8-bit value
int16_t combined = msb << 8 | (lsb & 0xFF); // signed 16-bit value
With the latter version the compiler will use signed 8/16-bit values regardless of how wide int is on your platform. Wikipedia has a nice explanation of the int8_t and int16_t data types (and all the other fixed-width types).
By the way: cppreference.com is useful for looking up the ANSI C standard and other things worth knowing about C/C++.
You wrote that you need to combine two 8-bit values. Why are you using unsigned short, then?
As Dan already said, lsb is automatically sign-extended to 16 bits. Try the following code:
uint8_t lsb = -13;
uint8_t msb = 1;
int16_t combined = (msb << 8) | lsb;
This gives you the expected result: 499.
If this is what you want:
msb: 1, lsb: -13, combined: 499
msb: -6, lsb: -1, combined: -1281
msb: 1, lsb: 89, combined: 345
msb: -1, lsb: 13, combined: -243
msb: 1, lsb: -84, combined: 428
Use this:
short combine(unsigned char msb, unsigned char lsb) {
    return (msb << 8u) | lsb;
}
I don't understand why you would want msb -6 and lsb -1 to generate -6 though.
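For instance, a quick check against the table above, using the combine() just defined (on a typical two's complement platform, where the int-to-short narrowing wraps as shown):
#include <iostream>
int main()
{
    std::cout << combine(1, (unsigned char)-13) << "\n";                // 499
    std::cout << combine((unsigned char)-6, (unsigned char)-1) << "\n"; // -1281
}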