Is it possible to have for example 6-bit signed integer or 11-bit signed integer (or any n-bit integer where n != 2^x) in C or C++?
It's certainly possible that a C or C++ implementation could provide types that are sized like this, but most implementations won't provide built-in types with sizes like these because (1) it's uncommon to encounter them and (2) most processors don't support direct operations on types like these.
If you're trying to use these integers as groups of binary flags, consider using std::bitset as the other answers have suggested. This might even be a better option, since it explicitly indicates that you're using a group of flags. For example:
std::bitset<6> bits; // Six independent bit flags
bits[3] = true; // Set the flag at index 3 (the fourth flag)
If you're trying to use them as actual integer types, just constrained by the number of bits they use, consider using bitfields, like this:
struct uint6_t {
uint64_t value : 6; // 6 bits, unsigned
};
struct int6_t {
int64_t value : 6; // 6 bits, signed
};
You can then use uint6_t's value field as a six-bit integer. This only works for widths no greater than the width of the underlying type used inside the bitfield, which covers sizes like 6 or 11 but not sizes like 137 or 271. One note - the actual size of these objects will likely not be six bits, because of the padding the compiler introduces, but they will behave like six-bit integers nonetheless.
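For instance, a quick sketch of the wrap-around behavior (assigning an out-of-range value to a signed bit-field is implementation-defined before C++20):
uint6_t u;
u.value = 63; // largest six-bit unsigned value
u.value++; // wraps around to 0, just like a real six-bit unsigned type
int6_t s;
s.value = -3; // fits in the six-bit signed range of -32 to 31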
Apparently C++ templates allow you to do something like this:
template <unsigned int NumBits> struct uint {
uint64_t data : NumBits;
};
template <unsigned int NumBits> struct Int {
int64_t data : NumBits;
};
uint<6> uvalue;
uvalue.data = 0; // Or other uses
Int<6> svalue;
svalue.data = -1; // Or other uses
EDIT: Based on what you're trying to do, it seems like you're looking for something like this:
uint<6> value;
value.data = -3;
std::cout << value.data << std::endl; // Prints 61 (-3 wrapped modulo 2^6)
You can use bitfields to simulate this:
struct int6_t {
int32_t intPart : 6; // 6-bit signed integer
};
int6_t mySilly6bitInt;
mySilly6bitInt.intPart = 5;
C and C++ both require that a byte, an unsigned char, is at least 8 bits. But there is no upper limit, and there is no restriction to powers of two. So in principle CHAR_BIT (the number of bits per byte) can be e.g. 10.
Mostly this is for backward compatibility. In earlier times there were computers based on e.g. 12-bit bytes. To cater to modern 8-bit-byte systems, C99 introduced the stdint.h header, which provides types that, where they're supported, are guaranteed widths that are multiples of 8 bits.
In practice you can simulate types with any number of bits, at the cost of some run-time overhead.
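As a minimal sketch of such a simulation (the helper from_nbit is my own, not a standard facility), here is one way to reinterpret the low n bits of a word as an n-bit two's complement value:
#include <cstdint>
// Treat the low `bits` bits of `raw` as an n-bit two's complement value
// (bits must be between 1 and 64).
int64_t from_nbit(uint64_t raw, unsigned bits) {
    uint64_t mask = (bits >= 64) ? ~0ull : ((1ull << bits) - 1);
    raw &= mask; // discard everything above the low n bits
    if (raw & (1ull << (bits - 1))) // sign bit set?
        return (int64_t)(raw | ~mask); // sign-extend
    return (int64_t)raw;
}
With this, from_nbit(61, 6) yields -3, matching the six-bit wrap-around shown earlier.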
Related
I wanted to write a Digital Search Tree in C++ using templates. To do that, given a type T and data of type T, I have to iterate over the bits of this data. Doing this on integers is easy: one can just shift the number to the right by an appropriate number of positions and "&" the result with 1, as described for example here: How to get nth bit values. The problem starts when one tries to get the i'th bit from templated data. I wrote something like this:
#include <cstdint>
#include <iostream>
template<typename T>
bool getIthBit (T data, unsigned int bit) {
// Inspect byte bit/8 of the object representation, then extract bit bit%8.
return ((*(((char*)&data)+(bit>>3)))>>(bit&7))&1;
}
int main() {
uint32_t a = 16;
for (int i = 0; i < 32; i++) {
std::cout << getIthBit (a, i);
}
std::cout << std::endl;
}
This works, but I am not exactly sure it isn't undefined behavior. The problem with this is that to iterate over all the bits of the data, one has to know how many of them there are, which is hard for struct types because of padding. For example, here:
#include <cstdint>
#include <iostream>
struct s {
uint32_t i;
char c;
};
int main() {
std::cout << sizeof (s) << std::endl;
}
The actual data takes 5 bytes, but the program's output says the struct takes 8. I don't know how to get the actual size of the data, or whether that is possible at all. A question about this was asked here: How to check the size of struct w/o padding?, but the answers are just "don't".
It's easy to know how many bits there are in a type: there are exactly CHAR_BIT * sizeof(T) of them, and sizeof(T) is the actual size of the type in bytes. But indeed, there isn't a general way within standard C++ to know which of those bits - bits that are part of the object representation - are padding.
I recommend not attempting to support types that have padding as keys of your DST.
The following trick might work for finding the padding bits of trivially copyable classes:
Use std::memset to set all bits of the object to 0.
For each sub-object with no sub-objects of its own, set all bits to 1 using std::memset.
For each sub-object with sub-objects of its own, perform the previous step and this step recursively.
Check which bits stayed 0.
I'm not sure there are any technical guarantees that the padding actually stays 0, so whether this works may be unspecified. Furthermore, there can be non-class types that have padding; long double is a typical example, and I don't know if there are others. This approach also probably won't detect unused bits of the integers that underlie bitfields.
So, there are a lot of caveats, but it should work in your example case:
// Needs <bitset>, <climits>, <cstring>, <iostream> and the struct s from the question.
s sobj;
std::memset(&sobj, 0, sizeof sobj); // every bit, padding included, becomes 0
std::memset(&sobj.i, -1, sizeof sobj.i); // all value bits of i become 1
std::memset(&sobj.c, -1, sizeof sobj.c); // all value bits of c become 1
std::cout << "non-padding bits:\n";
unsigned long long ull = 0; // assumes sizeof sobj <= sizeof ull
std::memcpy(&ull, &sobj, sizeof sobj);
std::cout << std::bitset<sizeof sobj * CHAR_BIT>(ull) << std::endl;
There's a Standard way to know if a type has unique representation or not. It is std::has_unique_object_representations, available since C++17.
So if an object has unique representations, it is safe to assume that every bit is significant.
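For example (C++17; whether these assertions hold depends on the ABI, so the concrete structs below are assumptions about a typical platform):
#include <type_traits>
struct Padded { long long a; char b; }; // usually 7 trailing padding bytes
struct Dense { int a; int b; }; // usually no padding at all
static_assert(!std::has_unique_object_representations_v<Padded>, "padding expected");
static_assert(std::has_unique_object_representations_v<Dense>, "no padding expected");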
There's no standard way to know whether a non-unique representation is caused by padding bytes/bits, as in struct { long long a; char b; }, or by equivalent representations¹. And there's no standard way to learn the offsets of those padding bits/bytes.
Note that the "actual size" concept may be misleading, as padding can sit in the middle, as in struct { char a; long long b; }.
Internally the compiler has to distinguish padding bits from value bits to implement C++20 atomic<T>::compare_exchange_*. MSVC does this by zeroing padding bits with __builtin_zero_non_value_bits. Other compilers may use another name or another approach, or may not expose their atomic<T>::compare_exchange_* internals at this level.
¹ like multiple NaN floating point values
I have defined a bitfield of enum types to match a set of bits in an embedded system. I'm trying to write a test harness in MSVC for the code, but comparing what should be equal values fails.
The definition looks like this:
typedef enum { SERIAL, PARALLEL } MODE_e;
// TYPE_e, POSITION_e, NET_e, and TIME_e are similar enums, elided here.
typedef union {
struct {
TYPE_e Type : 1; // 1
POSITION_e Pos1 : 1; // 2
POSITION_e Pos2 : 1; // 3
bool Enable : 1; // 4
NET_e Net : 1; // 5
TYPE_e Type2 : 1; // 6
bool En : 1; // 7
TIME_e Time : 3; // 8-10
MODE_e Mode : 1; // 11
bool TestEn : 1; // 12
bool DelayEn : 1; // 13
MODE_e Mode2 : 1; // 14
bool xEn : 1; // 15
MODE_e yMode : 1; // 16
bool zEnable : 1; // 17
} Bits;
uint32_t Word;
} BITS_t;
Later the following comparison fails:
store.Bits.Mode = PARALLEL;
if (store.Bits.Mode == PARALLEL)
...
I examined the Mode field in the debugger, and it looked odd: the value of Mode is -1.
It's as if MSVC considers the value to be a two's complement number, but 1 bit wide, so 0b1 is decimal -1. The enum sets PARALLEL to 1, so the two do not match.
The comparison works fine on the embedded side using LLVM or GCC.
Which behavior is correct? I assume GCC and LLVM have better support for the C standards than MSVC in areas such as bit fields. More importantly, can I work around this difference without making major changes to the embedded code?
Dissecting this in detail, you have the following problems:
There is no guarantee that Type : 1 is the MSB or LSB. Generally, there are no guarantees of the bit-field layout in memory at all.
As mentioned in other answers, enumeration variables (unlike enumeration constants) have implementation-defined size. Meaning that you can't know their size, portably. In addition, if the size is something which isn't the same as either int or _Bool, the compiler need not support it at all.
Enums are most often of a signed integer type, and when you create a bit-field of size 1 with a signed type, nobody - including the standard - knows what it means. Is the single bit the sign bit you intend to store there, or is it data?
The size of what the C standard calls the "storage unit" inside the bit-field is unspecified; typically it is alignment-based. The C standard does guarantee that several bit-fields of the same type following each other must be merged into the same storage unit (if there is room). For different types, there are no such guarantees.
It is fairly common that when you go from one type, like POSITION_e, to a different type, like bool, the compiler places them in different storage units. In practice this means there is a high risk of padding-bit insertion whenever that happens, and lots of mainstream compilers do in fact behave just like that.
In addition, a struct or union may contain padding bytes anywhere.
In addition, there is the endianness problem.
Conclusion: bit-fields cannot be used in programs that need any form of portability. They cannot be used for the purpose of memory mapping.
Also, you really don't need all these abstraction layers - it's a simple dip-switch, not a space shuttle! :)
Solution:
I would strongly recommend dropping all of this in favour of a plain uint32_t. You can mask individual bits with plain integer constants:
#define DIP_TYPE (1u << 31)
#define DIP_POS (1u << 30)
...
uint32_t dipswitch = ...;
bool actuator_active = dipswitch & DIP_TYPE; // read
dipswitch |= DIP_POS; // write
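Clearing and toggling work the same way, continuing the same sketch:
dipswitch &= ~DIP_POS; // clear the bit
dipswitch ^= DIP_TYPE; // toggle the bit
bool type_set = (dipswitch & DIP_TYPE) != 0; // explicit truth test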
This is massively portable, well-defined, standardized, MISRA-C compliant - you can even port it between different endianess architectures. It solves all of the above mentioned problems.
I would use the following approach.
typedef enum { SERIAL_TEST_MODE = 0, PARALLEL_TEST_MODE = 1 } TEST_MODE_e;
Then set the value and test the value as follows.
config.jumpers.Bits.TestMode = PARALLEL_TEST_MODE;
if (config.jumpers.Bits.TestMode & PARALLEL_TEST_MODE)
...
The value 1 has the least significant bit turned on, and the value 0 has it turned off.
And this should be portable across multiple compilers.
The exact type used to represent an enum is implementation-defined. So what is most likely happening is that MSVC is using a signed type for this particular enum, and a 1-bit bitfield of a signed type leaves room only for the values 0 and -1.
Rather than declaring the bitfield with the type of the enum, declare it as unsigned int or unsigned char so the values are properly represented.
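A sketch of that fix, applied to the union from the question (only the field in question is shown; the rest would be declared the same way):
typedef union {
    struct {
        unsigned int Mode : 1; // unsigned: holds 0 or 1, never -1
        // ... remaining fields, also declared with unsigned types ...
    } Bits;
    uint32_t Word;
} BITS_t;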
A simple fix I came up with, which is only valid for MSVC and GCC/LLVM, is:
#ifdef _WIN32
#define JOFF 0
#define JON -1
#else
#define JOFF 0
#define JON 1
#endif
typedef enum { SERIAL = JOFF, PARALLEL = JON } TEST_MODE_e;
The data unit (a network packet header) I am currently working on has 2 flags in its definition, stored in a byte field and accessed via bitwise operators. Unfortunately, I need only 2 bits, and I am wondering what I can do with the other 6. Can I use them to store a number?
Can I use them to store some internal state code (a value range smaller than char's) and not just waste them?
Are there any data types smaller than a byte, and how can I use them in C++? If not, should I just waste those bits and leave them without meaning?
You could use a bit field, as described here.
Adapted from that page:
#include <iostream>
struct S {
// 6-bit unsigned field,
// allowed values are 0...63
unsigned int b : 6;
};
int main()
{
S s = {7};
++s.b;
std::cout << s.b << '\n'; // output: 8
}
In C++, there is no datatype smaller than a char, which is - by definition - one byte. However, you do not need a dedicated datatype to access the bits of a value; the bitwise logic and bitwise shift operators are sufficient.
If you cannot live with sacrificing 6 bits (this is assuming 8-bit bytes) you might want to consider the std::vector<bool> specialization. Note, though, that there are a number of restrictions and differences to a regular std::vector.
Another option to make individual (consecutive) bits of a datatype accessible by name is to use bit fields:
struct S {
unsigned char flags : 2; // unsigned char base keeps sizeof(S) == 1 on common compilers
unsigned char state : 6;
};
static_assert( sizeof( S ) == 1, "Packing is implementation-defined." );
This declares a structure that can hold two pieces of information: flags and state, which occupy 2 and 6 bits, respectively. Adjacent bit fields are usually packed together (although this behavior is implementation-defined).
So, you know how the primitive type char has the size of 1 byte? How would I make a primitive type with a custom size? So, instead of an int with the size of 4 bytes, I'd make one with a size of, let's say, 16.
Is there a way to do this? Is there a way around it?
It depends on why you are doing this. Usually, you can't use types of less than 8 bits, because that is the addressable unit for the architecture. You can use structs, however, to define different lengths:
struct s {
unsigned int a : 4; // a is 4 bits
unsigned int b : 4; // b is 4 bits
unsigned int c : 16; // c is 16 bits
};
However, there is no guarantee that the struct will be 24 bits long. Also, this can cause endianness issues. Where you can, it's best to use system-independent types, such as uint16_t, etc. You can also use bitwise operators and bit shifts to twiddle things very specifically.
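For example, a hand-rolled equivalent of the struct above using shifts and masks (a sketch; the field layout here is my own choice, and unlike a bit-field it is fully under your control):
#include <cstdint>
// Pack a (4 bits), b (4 bits), and c (16 bits) into the low 24 bits.
uint32_t pack_fields(uint32_t a, uint32_t b, uint32_t c) {
    return (a & 0xFu) | ((b & 0xFu) << 4) | ((c & 0xFFFFu) << 8);
}
uint32_t get_c(uint32_t packed) {
    return (packed >> 8) & 0xFFFFu; // extract the 16-bit field
}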
Normally you'd just make a struct that represents the data in which you're interested. If it's 16 bytes of data, either it's an aggregate of a number of smaller types or you're working on a processor that has a native 16-byte integral type.
If you're trying to represent extremely large numbers, you may need to find a special library that handles arbitrarily-sized numbers.
In C++11, there is a tidy solution for this: std::aligned_storage (since deprecated in C++23, but it still illustrates the point).
#include <iostream>
#include <memory>
#include <type_traits>
int main()
{
typedef typename std::aligned_storage<sizeof(int)>::type memory_type;
memory_type i;
reinterpret_cast<int&>(i) = 5;
std::cout << reinterpret_cast<int&>(i) << std::endl;
return 0;
}
It allows you to declare a block of uninitialized storage on the stack.
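Strictly speaking, assigning through a reinterpret_cast never starts the lifetime of an int in that storage; a slightly safer sketch constructs the object with placement new:
#include <new>
memory_type m;
int* p = new (&m) int(5); // construct an int inside the raw storage
std::cout << *p << std::endl;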
If you want to make a new type, typedef it. If you want it to be 16 bytes in size, typedef a struct that has 16 bytes of member data within it. Just beware that quite often compilers will pad things to match your system's alignment needs. A 1-byte struct rarely remains 1 byte without care.
You could just static_cast to and from std::string. I don't know enough C++ to give an example, but I think this would be pretty intuitive.
I want to define my own datatype that can hold a single one of six possible values, in order to learn more about memory management in C++. In numbers, I want to be able to hold 0 through 5. In binary, three bits would suffice (101 = 5), although some values (6 and 7) would go unused. The datatype should also consume as little memory as possible.
I'm not sure how to accomplish this. First, I tried an enum with defined values for all the fields. As far as I know, the values there are in hex, so one "hex digit" should allow me to store 0 through 15. But comparing it to a char (with sizeof) showed that it's 4 times the size of a char, and a char holds 0 through 255 if I'm not mistaken.
#include <iostream>
enum Foo
{
a = 0x0,
b = 0x1,
c = 0x2,
d = 0x3,
e = 0x4,
f = 0x5,
};
int main()
{
Foo myfoo = a;
char mychar = 'a';
std::cout << sizeof(myfoo); // prints 4
std::cout << sizeof(mychar); // prints 1
return 1;
}
I've clearly misunderstood something, but I fail to see what, so I turn to SO. :)
Also, when writing this post I realised that I clearly lack some of the vocabulary. I've made this post a community wiki; please edit it so I can learn the correct words for everything.
A char is the smallest possible type.
If you happen to know that you need several such 3-bit values in a single place, you can use a structure with bitfield syntax:
struct foo {
unsigned int val1:3;
unsigned int val2:3;
};
and hence get two of them within one byte. In theory you could pack 10 such fields into a 32-bit "int" value.
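A sketch of that denser packing (the resulting size is typical, not guaranteed):
struct packed10 {
    unsigned int v0 : 3, v1 : 3, v2 : 3, v3 : 3, v4 : 3;
    unsigned int v5 : 3, v6 : 3, v7 : 3, v8 : 3, v9 : 3; // 30 bits total
};
// usually sizeof(packed10) == 4, i.e. one 32-bit word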
C++0x will contain strongly typed enumerations where you can specify the underlying datatype (in your example char), but current C++ does not support this. The standard is not clear about the use of a char here (the examples use int, short and long), but it mentions the underlying integral type, and that would include char as well.
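With that feature, standardized in C++11 as enum class, a sketch looks like this:
enum class Foo : char { a, b, c, d, e, f };
static_assert(sizeof(Foo) == 1, "underlying type is char");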
As of today Neil Butterworth's answer to create a class for your problem seems the most elegant, as you can even extend it to contain a nested enumeration if you want symbolical names for the values.
C++ does not express units of memory smaller than bytes. If you're producing them one at a time, that's the best you can do. Your own example works well. If you need just a few, you can use bit-fields as Alnitak suggests. If you're planning on allocating them one at a time, then you're even worse off: most allocators hand out memory in fixed-size chunks, with 16 bytes being a common minimum.
Another choice might be to wrap std::bitset to do your bidding. This will waste very little space if you need many such values: only about 1 bit for every 8.
If you think about your problem as a number expressed in base 6, and convert that number to base 2, possibly using an unlimited-precision integer (for example GMP), you won't waste any bits at all.
This assumes, of course, that your values have a uniform, random distribution. If they follow a different distribution, your best bet will be general compression of the first example, with something like gzip.
You can store values smaller than 8 or 32 bits. You just need to pack them into a struct (or class) and use bit fields.
For example:
struct example
{
unsigned int a : 3; //<Three bits, can be 0 through 7.
bool b : 1; //<One bit, stores 0 or 1.
unsigned int c : 10; //<Ten bits, can be 0 through 1023.
unsigned int d : 19; //<19 bits, can be 0 through 524287.
};
In most cases, your compiler will round the total size of your structure up to 32 bits on a 32-bit platform. The other problem is, as you pointed out, that your values may not have a power-of-two range. This makes for wasted space. If you read the entire struct as one number, you will find values that are impossible to set if your input ranges aren't all powers of 2.
Another feature you may find interesting is a union. It works like a struct, but the members share memory. So if you write to one field, it overwrites the others.
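A sketch of that overlap (the names are mine; note that in C++, reading a union member other than the one last written is not sanctioned by the standard, so treat this as a layout illustration):
union overlay {
    struct {
        unsigned int lo : 3; // shares storage with raw
        unsigned int hi : 3;
    } fields;
    unsigned char raw;
};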
Now, if you are really tight for space and you want to push each bit to the maximum, there is a simple encoding method. Let's say you want to store 3 numbers, each of which can be from 0 to 5. Bit fields are wasteful, because at 3 bits each you'll waste some values (i.e. you could never set 6 or 7, even though you have room to store them). So, let's do an example:
//Here are three example values, each can be from 0 to 5:
const int one = 3, two = 4, three = 5;
To pack them together most efficiently, we should think in base 6 (since each value is from 0-5). So packed into the smallest possible space is:
//This packs all the values into one int, from 0 - 215.
//pack could be any value from 0 - 215. There are no 'wasted' numbers.
int pack = one + (6 * two) + (6 * 6 * three);
See how it looks like we're encoding in base six? Each number is multiplied by its place value, 6^n, where n is the place (starting at 0).
Then to decode:
const int one = pack % 6;
pack /= 6;
const int two = pack % 6;
pack /= 6;
const int three = pack;
These schemes are extremely handy when you have to encode some fields in a bar code or in an alphanumeric sequence for human typing. Saving those few partial bits can make a huge difference. Also, the fields don't all have to have the same range: if one field is from 0 through 7, you'd use 8 instead of 6 in the proper place. There is no requirement that all fields have the same range.
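A mixed-radix sketch of that idea: with field ranges 0-5, 0-7, and 0-5, multiply by each field's own range instead of a fixed 6 (variable names are mine):
// one: 0-5, two: 0-7, three: 0-5; pack covers 0 - 287 with no waste
int packed = one + 6 * (two + 8 * three);
// decode in the same order
const int one_d = packed % 6; packed /= 6;
const int two_d = packed % 8; packed /= 8;
const int three_d = packed;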
The minimal size you can use is 1 byte.
But if you store a group of enum values (when writing to a file or storing in a container, ...), you can pack the group at 3 bits per value.
You don't have to enumerate the values of the enum:
enum Foo
{
a,
b,
c,
d,
e,
f,
};
Foo myfoo = a;
Here Foo's underlying type is int, which on your machine takes 4 bytes.
The smallest type is char, which is defined as the smallest addressable data on the target machine. The CHAR_BIT macro yields the number of bits in a char and is defined in limits.h.
[Edit]
Note that generally speaking you shouldn't ask yourself such questions. Always use [unsigned] int if it's sufficient, except when you allocate quite a lot of memory (e.g. int[100*1024] vs char[100*1024], but consider using std::vector instead).
The size of an enumeration is typically the same as that of an int, but depending on your compiler, you may have the option of creating a smaller enum. For example, in GCC, you may declare:
enum Foo {
a, b, c, d, e, f
}
__attribute__((__packed__));
Now, sizeof(Foo) == 1.
The best solution is to create your own type implemented using a char. This should have sizeof(MyType) == 1, though this is not guaranteed.
#include <iostream>
using namespace std;
class MyType {
public:
MyType( int a ) : val( a ) {
if ( val < 0 || val > 5 ) { // six possible values: 0 through 5
throw( "bad value" );
}
}
int Value() const {
return val;
}
private:
char val;
};
int main() {
MyType v( 2 );
cout << sizeof(v) << endl;
cout << v.Value() << endl;
}
It is likely that packing oddly sized values into bitfields will incur a sizable performance penalty due to the architecture not supporting bit-level operations (thus requiring several processor instructions per operation). Before you implement such a type, ask yourself if it is really necessary to use as little space as possible, or if you are committing the cardinal sin of programming that is premature optimization. At most, I would encapsulate the value in a class whose backing store can be changed transparently if you really do need to squeeze every last byte for some reason.
You can use an unsigned char. Probably typedef it as a BYTE. It will occupy only one byte.
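A minimal sketch of that suggestion:
typedef unsigned char BYTE; // one byte by definition
BYTE v = 5; // 0 through 5 fits comfortably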