Creating a bit sized variable [duplicate] - c++

This question already has answers here:
Using Bit Fields to save memory
(2 answers)
When is it worthwhile to use bit fields?
(11 answers)
How slow are bit fields in C++
(3 answers)
C/C++: Force Bit Field Order and Alignment
(7 answers)
Declare a bit in C++
(5 answers)
Closed 2 years ago.
I wanted to create a bit-sized variable to hold the only possible values, 0 or 1.
My initial approach was to create a class with variables of type int for storing the value 0 or 1.
But then I found that I could also use a bit field and create my own struct with a custom bit width for each member. My question is: does a struct with members declared as single bits provide better memory usage, and hence faster execution, than the class approach for a large array of structs (say 4000 x 4000)?
The code:
#include <iostream>
using namespace std;

struct maze
{
    unsigned int top : 1;
    unsigned int right : 1;
    unsigned int bottom : 1;
    unsigned int left : 1;
};

int main()
{
    maze access;
    cout << sizeof(access);
    access.top = 1;
    access.right = 1;
    access.bottom = 1;
    access.left = 1;
    cout << endl << sizeof(access);
    return 0;
}
Edit:
I think I have found the answer: https://stackoverflow.com/a/46544000/13868755

It is really hard to speculate about performance in a case like this, and there is a fair chance you'll actually make it slower.
Unless this is a proven bottleneck, you should focus on writing readable and testable code. Does the struct storing the four values make the code cleaner? If so, use the struct. Are the values actually boolean, i.e. true or false only? Use bool to make it clear to the reader.
This will likely be just as performant as your current implementation, and take about as much space.
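For illustration, a minimal sketch of that bool-based struct at the grid size mentioned in the question (the Cell and grid names are mine, not from the question):

#include <iostream>
#include <vector>

// A plain struct of bools: readable, and typically 4 bytes per cell.
struct Cell {
    bool top    = false;
    bool right  = false;
    bool bottom = false;
    bool left   = false;
};

int main() {
    std::vector<Cell> grid(4000 * 4000);   // roughly 64 MB at 4 bytes per cell
    grid[0].top = true;
    std::cout << sizeof(Cell) << '\n';     // usually prints 4
}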
To gauge the performance you need working code to benchmark, as the performance implications will differ between applications; guessing beforehand, though a fun thought experiment, isn't really useful in practice.
An exception is when you know you have a lot of data (like, gigabytes) and you are memory constrained. In that case you should indeed focus on memory over both code readability and CPU usage. If so, going straight to std::bitset would look like a promising option, with good memory usage guarantees and proven correctness.
If memory is only a secondary concern, simply using packed structs/arrays (look for compiler options for that) should be sufficient and way simpler and cleaner to write.
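And a hedged sketch of the std::bitset option from above for the same grid, packing the four wall bits per cell into one large bitset (the indexing scheme and names are my own choice):

#include <bitset>
#include <cstddef>
#include <iostream>

// Four wall bits per cell, stored contiguously: 4000 * 4000 * 4 bits, about 8 MB.
constexpr std::size_t W = 4000, H = 4000;
static std::bitset<W * H * 4> walls;  // static: too large for the stack

// Bit index for a given cell and wall (0=top, 1=right, 2=bottom, 3=left).
inline std::size_t bit(std::size_t x, std::size_t y, std::size_t wall) {
    return (y * W + x) * 4 + wall;
}

int main() {
    walls.set(bit(10, 20, 0));                        // mark the top wall of cell (10, 20)
    std::cout << walls.test(bit(10, 20, 0)) << '\n';  // prints 1
}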

Related

Can I use data types like bool to compress data while improving readability?

My official question will be: "Is there a clean way to use data types to "encode and compress" data rather than using messy bit masking?" The hope is to save space by compressing, and I would like to use native data types, structures, and arrays in order to improve readability over bit masking. I am proficient in bit masking from my assembly background, but I am learning C++ and OOP. We can store so much information in a 32-bit register by using individual bits, and I feel that I am trying to get back to that low-level environment while having the readability of C++ code.
I am attempting to save some space because I am working with huge resource requirements. I am still learning more about how C++ treats the bool data type. I realize that memory is stored in byte chunks and not individual bits. I believe that a bool usually uses one byte and is masked somehow. In my head I could use 8 bool values in one byte.
If I malloc an array of 2 bool elements in C++, does it allocate two bytes or just one?
Example: We will use DNA, since it can be encoded into two bits to represent A, C, G and T. If I make a struct with two bools called DNA_Base, then I make an array of 7 of those.
struct DNA_Base { bool Bit_1; bool Bit_2; };
DNA_Base DNA_Sequence[7] = {false};
cout << sizeof(DNA_Base) << sizeof(DNA_Sequence) << endl;
// Yields a 2 and a 14.
// I would like this to say 1 and 2.
In my example I would also show the case where the DNA sequence can be 20 bases long, which would require 40 bits to encode. GATTACA could then take up a maximum of 2 bytes? I suppose an alternative question would have been "How do I make C++ do the bit masking for me in a more readable way?", or should I just make my own data type and classes and implement the bit masking using operator overloading?
Not exactly what you want, but you can use a bit field:
struct DNA_Base
{
    unsigned char Bit_1 : 1;
    unsigned char Bit_2 : 1;
};
DNA_Base DNA_Sequence[7];
So sizeof(DNA_Base) == 1 and sizeof(DNA_Sequence) == 7
So you have to pack several bases into one struct to avoid losing space to padding, something like:
struct DNA_Base_4
{
    unsigned char base1 : 2; // may have value 0, 1, 2 or 3
    unsigned char base2 : 2;
    unsigned char base3 : 2;
    unsigned char base4 : 2;
};
So sizeof(DNA_Base_4) == 1
std::bitset is another alternative, but you have to do the interpretation job yourself.
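For illustration, a hedged sketch of that "interpretation job" with std::bitset, two bits per base (the helper names and the A=0, C=1, G=2, T=3 mapping are my own assumptions, not from the answer):

#include <bitset>
#include <cstddef>
#include <iostream>

// 2 bits per base, 7 bases (GATTACA) -> 14 bits in one bitset.
constexpr std::size_t kBases = 7;
std::bitset<kBases * 2> seq;

// Assumed mapping: A=0, C=1, G=2, T=3.
void set_base(std::size_t i, unsigned value) {
    seq.set(i * 2,     (value & 1u) != 0);
    seq.set(i * 2 + 1, (value & 2u) != 0);
}

unsigned get_base(std::size_t i) {
    return (seq.test(i * 2) ? 1u : 0u) | (seq.test(i * 2 + 1) ? 2u : 0u);
}

int main() {
    set_base(0, 2);                    // G
    set_base(1, 0);                    // A
    std::cout << get_base(0) << '\n';  // prints 2
}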
An array of bools will be N-elements x sizeof(bool).
If your goal is to save space in registers, don't bother: it is actually more efficient to use the processor's word size than a single byte, and the compiler will prefer to work with a word anyway, so in a struct/class a lone bool will often end up costing a full 32-bit or 64-bit word once padding and alignment are taken into account.
Now, if you'd like to save room on disk or in RAM because you need to store LOTS of bools, go ahead, but it isn't going to save room in all cases unless you actually pack the structure, and on some architectures packing can also have a performance impact because the CPU will have to perform unaligned or byte-by-byte accesses.
A bitmask (or bitfield), on the other hand, is performant and efficient and as dense as possible, and uses a single bitwise operation. I would look at one of the abstract data types that provide bit fields.
The standard library has std::bitset (http://www.cplusplus.com/reference/bitset/bitset/), which can be as long as you want.
Boost also has something similar, I'm sure.
Unless you are on a 4 bit machine, the final result will be using bit arithmetic. Whether you do it explicitly, have the compiler do it via bit fields, or use a bit container, there will be bit manipulation.
I suggest the following:
Use existing compression libraries.
Use the method that is most readable or best understood by people other than yourself.
Use the method that is most productive (talking about development time).
Use the method with which you will inject the fewest defects.
Edit 1:
Write each method up as a separate function.
Tell the compiler to generate the assembly language for each function.
Compare the assembly language of each function to each other.
My belief is that they will be very similar, enough that wasting time discussing them is not worthwhile.
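For illustration, a hedged sketch of that comparison: two functions that, with optimization on, should compile to near-identical code, one using a bit field and one using an explicit mask (the struct and function names are mine):

#include <cstdint>

struct Flags { std::uint8_t a : 1, b : 1; };

// Bit-field version: the compiler emits the masking for us.
std::uint8_t via_bitfield(Flags f) { return f.a; }

// Manual version: the mask is written out by hand.
std::uint8_t via_mask(std::uint8_t packed) { return packed & 0x01u; }

// Compare the output of e.g. `g++ -S -O2 this_file.cpp` for the two functions.
int main() {
    Flags f{1, 0};
    return via_bitfield(f) + via_mask(0x03);
}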
You can't operate on bits directly, but you can treat the smallest unit available to you as a store for multiple values, and define
enum class DNAx4 : uint8_t {  // uint8_t comes from <cstdint>
    AAAA = 0x00, AAAC = 0x01, AAAG = 0x02, AAAT = 0x03,
    // .... and the rest of them
    TTTA = 0xFC, TTTC = 0xFD, TTTG = 0xFE, TTTT = 0xFF
};
I'd actually go further, and create a structure DNAx16 or DNAx32 to efficiently use the native word size on your machine.
You can then define functions on the data type, which will have to use the underlying bit representation, but at least it allows you to encapsulate this and build higher level operations from these primitives.
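As an illustration of that last point, a hedged sketch of such primitives on a single packed byte (using a plain uint8_t instead of the enum for brevity; the helper names and the A=0, C=1, G=2, T=3 mapping are my own assumptions):

#include <cstdint>
#include <iostream>

// Four bases packed into one byte, two bits each. Assumed mapping: A=0, C=1, G=2, T=3.
using DNAx4 = std::uint8_t;

// Read base i (0..3) out of the packed byte.
unsigned get_base(DNAx4 pack, unsigned i) {
    return (pack >> (i * 2)) & 0x03u;
}

// Write base i (0..3) into the packed byte.
DNAx4 set_base(DNAx4 pack, unsigned i, unsigned base) {
    pack &= static_cast<DNAx4>(~(0x03u << (i * 2)));        // clear the two bits
    pack |= static_cast<DNAx4>((base & 0x03u) << (i * 2));  // set the new value
    return pack;
}

int main() {
    DNAx4 p = 0;
    p = set_base(p, 0, 2);                // G
    p = set_base(p, 1, 0);                // A
    p = set_base(p, 2, 3);                // T
    std::cout << get_base(p, 2) << '\n';  // prints 3 (T)
}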

Do data type ranges matter as a memory-saving measure anymore?

I was always taught to use the appropriate data type depending on the specific needs of the class/method/function/member/variable/what-have-you. That said, does it even matter anymore?
Hypothetically, if I have a class that has a data member that will never be negative and will never be more than the maximum value of unsigned char, does storing it as an unsigned char (1 byte) versus an int (4 bytes) even matter anymore due to implicit type promotion/demotion, internal representation, register size and the often quoted "CPUs are more efficient when working with int"?
Example:
#include <limits>

class Foo {
public:
    Foo() : _status(0) { /* DO NOTHING */ }
    void AddTo(unsigned char value) {
        if(std::numeric_limits<unsigned char>::max() - _status < value) {
            value = std::numeric_limits<unsigned char>::max() - _status;
        }
        _status += value;
    }
    void Increment() {
        if(_status == std::numeric_limits<unsigned char>::max()) return;
        ++_status;
    }
private:
    unsigned char _status;
};
A main effect of generally using "right-sized" types is that you and others waste a lot of time on it.
If you have a zillion values stored, e.g. a very large picture, or if you absolutely need a 64-bit range, say, then sure, in such cases it makes sense to right-size.
But using right-sizing as a general guideline produces no significant gain and much pain.
Authority argument: Bjarne Stroustrup, who created the language, generally just uses a few types, e.g. int for integers.
"Premature optimization is the root of all evil" Donald Knuth.
Is this one data member's size going to significantly impact the size of the class? Are you serializing the class? Is the serialization representation seeing any reduction? Are you making the code harder to read worrying about this when your boss doesn't care?
Y2K, IPv4 32-bit addresses, ASCII: yes, the future will look back at your code and laugh. Remember Moore's law; write something that works, and expect that something will be wrong. Until it is, you'll never know what. Write testable, maintainable, and refactorable code, and it might just stay in production long enough for someone to care.
For most use cases when targeting PCs and servers, you're not going to need to worry about using chars vs using ints to hold numeric values. Just use an int or, if you need a larger range, a long.
However, if you're targeting a platform with 16 bytes of RAM which has less than 1 KB to store your program, you may need to carefully consider whether that loop counter really has to take up more than 1 byte.
Unless there's a particular reason for choosing some other variable type, just stick with int. A large part of modern programming is managing complexity, and there's no reason to start sprinkling your code with a whole variety of types if it doesn't actually help anything. Sure, if you have 5,000 copies of a particular class or are working on a system with a tightly constrained memory footprint, then it might be important. But on a multigigabyte system this isn't generally going to be a concern. In that case it's more about writing something understandable and maintainable.
You are hitting one of the problems of C-style languages: they deprive you of the range checking that you can do in other languages. If your value should be within a specific range, the ability to say a type can be, say, 1..64 is a big help for error tracking. I have found many bugs in C/C++ code by converting it to Pascal or Ada.
In the situation you describe, I like to use typedefs for documentation purposes:
COLORCOMPONENT
DEGREES
RADIANS
Even if the compiler does not do the checking for me, I can usually spot when I am using degrees where I should be using radians.
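For illustration, a minimal sketch of such documentation-only typedefs (the conversion helper and its name are just an example, not from the answer):

#include <iostream>

typedef double DEGREES;
typedef double RADIANS;
typedef unsigned char COLORCOMPONENT;  // 0..255

// The compiler treats DEGREES and RADIANS as plain double, but the reader
// can see which unit each value is meant to be in.
RADIANS to_radians(DEGREES d) {
    return d * 3.14159265358979323846 / 180.0;
}

int main() {
    DEGREES angle = 90.0;
    std::cout << to_radians(angle) << '\n';  // ~1.5708
}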

What numeric type to choose for small loop counters? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Consider that an int takes 4 bytes in memory.
To understand what I'm looking for, take this example:
for(x = 0; x < 10; x++)
    //do something
In this for statement I know that the value of x is always less than 10.
I have seen a lot of code, and most people declare x as an int.
Why shouldn't we, or why don't most people, declare x as a short or even as a char?
I thought about the reason and found this explanation. For example:
short s = 5;
s takes 2 bytes in memory, and what I know is that the compiler considers 5 to be an int,
so to put 5 into s, 5 has to be converted to short, right?
-> so this instruction takes less memory but more work
int i = 5;
Here i takes 4 bytes but there is no need for conversion (5 is an int).
-> so this instruction does less work but takes more memory
Is the reason something like what I thought?
I hope my question was clear.
If you have to store millions of numbers in memory and each number could be between 0 and 11, then you'd be concerned with memory. In a loop, the variable is most likely stored in a CPU register which means it is, for example, 32-bit on x86, or 32 to 64 bits on x86_64, etc. All “smaller” integers would be zero-extended to 32 or 64 bit anyway.
int is simple and readable, so lots of people use it. But if you must worry about performance, or want to hint the compiler about size constraints, then use the "(u)int_fastN_t" types (e.g. uint_fast8_t).
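For illustration, a minimal sketch of that suggestion, reusing the loop from the question (uint_fast8_t comes from <cstdint>):

#include <cstdint>
#include <iostream>

int main() {
    // uint_fast8_t: "at least 8 bits, whatever width is fastest on this platform"
    // (often it is simply as wide as an int or a register).
    std::uint_fast8_t sum = 0;
    for (std::uint_fast8_t x = 0; x < 10; ++x) {
        sum += x;
    }
    std::cout << static_cast<unsigned>(sum) << '\n';  // prints 45
}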
You are thinking too much about the surface look of things. Reality differs from that.
For example, you are worried about the memory a loop variable takes. However, in many loops the loop variable won't ever be stored in memory. Instead it will be held in a register. The number of registers in the CPU is limited, but variables cannot take up half a register (usually - x86 is a bit strange there), so whether you use int, short or char, you probably lose a full register anyway. So you don't save anything by making the variable smaller.
Similar is your assumption that assigning an integer literal to a short takes more work than to an int. Here the problem is the assumption that the compiler would generate code that does some kind of conversion at runtime, when it is far simpler to just generate code that does the simple thing (just store the literal to the memory location) in the first place.
The best reason of all - readability. If I saw a loop iterating over a short or a char, I'd spend a bit of time to figure out why. An int is more intuitive because it's the most used type for iteration (even more used than an iterator or size_t).
It sometimes makes sense to micro-optimize memory usage by choosing arithmetic types smaller than int. But that comes at a cost, because, formally, the value is promoted to int to do arithmetic on it, then converted back to the smaller type. int is the natural size for the target platform, so it's almost always better to just use it, especially since every future maintainer will have to figure out why someone wrote such unnatural code.
"Plain ints have the natural size suggested by the architecture of the execution environment"1, which means that in a typical case, a plain int is the type that imposes the least work on the processor to manipulate and work with. In short, int is the default type you normally want to use when you don't have a fairly good reason to use something else.
Also note that the reduction in memory usage from using a short or a char in the cited case may well be entirely illusory. In a typical case, we can expect a loop index variable to be allocated in a register anyway, so regardless of how few bits we care about, the variable occupies essentially the entire register in any case. If it's not in a register, it'll typically be on the stack, and on most cases the sizes of items on the stack are also fixed (32 bits on a 32-bit architecture, 64 bits on a 64-bit architecture, etc.) so allocating a single char (for example) can/will frequently end up using just as much memory as an int would have anyway.
[1] §3.9.1/2 in N3337, but all the standards for C and C++ going all the way back to the original ANSI C89 standard have had virtually identical phrasing, though the section numbers have changed.
No, not completely. If you have a semi reasonable compiler, it will deal with this correctly. Use a short if you're that concerned about memory usage, and leave it at that.
int is probably better though, as it is usually the size of the register. This can be more efficient on some CPU architectures.
See this question too.
You may come across code where you need to loop more times than a char or even a short can hold. This may be the situation when you are coding for, say, database-like systems where the looping runs over huge numbers. To be on the safer side it's good to make the loop variable an int. One more thing: if you are using such a loop for something like a Delay_ms() function and you try to restrict the loop variable to a char, you will have to have nested loops, or call the same Delay_ms() function again and again.

Are boolean variables typically implemented as single bits? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
One-byte bool. Why?
I want to add a boolean variable to a class. However, this class is pretty size-sensitive, and as a result I'm loath to add another field. However, it is composed of a pile of members that are at least a char wide, and a single other bool.
If I were hand-writing this code, I would implement those boolean fields as bits in the last byte or so of the object. Since accesses have to be byte-aligned, this would cause no space overhead.
Now, do compilers typically do this trick? The only reason I can think of for them not to is that it would involve an additional mask to get that bit out.
No, compilers can't do this trick because the address of each member has to be distinct. If you want to pack a fixed number of bits, use std::bitset. If you need a variable number of bits use boost::dynamic_bitset.
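For illustration, a small hedged sketch of the std::bitset option (the flag indices are arbitrary; note that an implementation will still round the storage up to at least one word, so measure sizeof if it matters):

#include <bitset>
#include <iostream>

int main() {
    std::bitset<8> flags;          // 8 boolean flags packed together
    flags.set(3);                  // turn flag 3 on
    flags[5] = true;               // index syntax works too
    std::cout << flags.count() << ' ' << sizeof(flags) << '\n';  // 2, then typically 4 or 8
}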
No, I don't know of any compilers which optimize a bool down to a bit.
You can force this behavior via:
unsigned int m_firstBit : 1;
unsigned int m_secondBit : 1;
unsigned int m_thirdBit : 1;
As for reasons why not, it would likely violate some language guarantees. For instance, you couldn't pass &myBool to a function which takes a bool* if it doesn't have its own reserved byte.
Compilers typically do not do that, but you could use std::bitset<2> to pack two bools into one byte.

Library, which helps packing structures with good performance

Days ago I heard about (maybe I've even seen!) a library that helps with packing structures. Unfortunately, I can't recall its name.
In my program I have to keep a large amount of data, therefore I need to pack it and avoid losing bits on gaps. For example: I have to keep many numbers from the range 1...5. If I kept each one in a char it would take 8 bits, but such a number can be kept in 3 bits. Moreover, if I kept these numbers in packs of 8 bits with a maximum number of 256, I could pack 51 numbers there (instead of 1 or 2!).
Is there any library that helps with these actions, or do I have to do this on my own?
As Tomalak Garet'kal already mentioned, this is a feature of ANSI C, called bit-fields. The wikipedia article is quite useful. Typically you declare them inside structs.
For your example: as you mentioned, you have one number in the range 0..5, so you can use 3 bits for this number, which leaves you 5 bits to use:
struct s
{
    unsigned int mynumber : 3;
    unsigned int myother : 5;
};
These can now be accessed simply like this:
struct s myinstance;
myinstance.mynumber = 3;
myinstance.myother = 1;
Be aware that bit fields can be slower than ordinary struct members/variables, since the generated code has to perform bit shifting and masking to access the individual bits.
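For illustration, a hedged sketch of roughly the masking the compiler has to generate for those accesses, written out by hand (assuming the first bit-field member occupies the low bits, which is implementation-defined):

#include <iostream>

int main() {
    unsigned char packed = 0;

    // Equivalent of myinstance.mynumber = 3; (low 3 bits)
    packed = static_cast<unsigned char>((packed & ~0x07u) | (3u & 0x07u));

    // Equivalent of myinstance.myother = 1; (upper 5 bits)
    packed = static_cast<unsigned char>((packed & 0x07u) | ((1u & 0x1Fu) << 3));

    unsigned mynumber = packed & 0x07u;         // read the low 3 bits back
    unsigned myother  = (packed >> 3) & 0x1Fu;  // read the upper 5 bits back
    std::cout << mynumber << ' ' << myother << '\n';  // prints 3 1
}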