I have been researching some code in c++ in unreal engine and I came across that they have on a header file 4 booleans declared as:
bool bIsEvaluating : 1;
bool bIsStopping : 1;
bool bIsBeginningPlay : 1;
bool bCompleteOnPostEvaluation : 1;
and one boolean without bit-field:
bool bIsPlayingForward;
The 4 booleans with bit-fields are set in the constructor and changed or used in some functions. The boolean without the bit-field is set and used in some functions. My question is: when would someone use bit-fields on booleans, and why not use them always on booleans, since they only take true or false values?
Bit-fields are used for boolean values in Unreal Engine primarily to save memory. When a class has many boolean flags, packing them one bit each instead of one byte each keeps the object smaller, which also helps cache behaviour when there are many instances. The trade-off is on access: reading or writing a bit-field requires extra masking and shifting (and a read-modify-write for stores), so it can be slightly slower than touching a plain bool, and you cannot take the address of a bit-field or bind a reference to it. Grouping related flags into adjacent bit-fields can also signal that they belong together, which can improve readability.
In short, use bit-fields for booleans when you have a concrete reason to do so, usually to shrink a type that has many flags or many instances. If you don't have such a reason, a regular boolean is simpler, and a lone bit-field often saves nothing because the struct is padded up to its alignment anyway. Be mindful of the limitations and trade-offs (no address-of, no references, adjacent bit-fields share a memory location), and use them only where the benefits outweigh these downsides.
I'm currently modernizing code that reads and writes a custom binary file format.
I'm allowed to use C++17 and have already modernized large parts of the code base.
There are mainly two problems at hand.
binary selectors (my own name)
cased selectors (my own name as well)
For #1 it is as follows:
Depending on whether a single bit in the stream is set, you read or write one of two completely different structs.
For example, if bit 17 is set to true, it means bits 18+ should be streamed with Struct1.
But if bit 17 is false, bits 18+ should be streamed with Struct2.
Struct1 and Struct2 are completely different with minimal overlap.
For #2 it is basically the same, but as follows:
Given x selector bits in the bit stream, you have a pool of completely different structs; the number of structs can be anywhere in the range [0, 2**x] (inclusive).
For instance, in one case you might have 3 bits and 5 structs.
But in another case, you might have 3 bits and 8 structs.
Again the overlap between the structs is minimal.
I'm currently using std::variant for this.
For case #1, it would be just two structs std::variant<Struct1, Struct2>
For case #2, it would be just a flat list of the structs again using std::variant.
The selector I use is naturally the index in the variant, but it needs to be remapped for a different bit pattern that actually needs to be written/read to/from the format.
Have any of you used or encountered some better strategies for these cases?
Is there a generally known approach to solve this in a much better way?
Is there a generally known approach to solve this in a much better way?
Nope, it's highly specific.
Have any of you used or encountered some better strategies for these cases?
The bit patterns should be encoded in the type, somehow. Almost all the (de)serialization can be generic so long as the required information is stored somewhere.
For example,
template <uint8_t BitPattern, typename T>
struct IdentifiedVariant;
// ...
using Field1 = std::variant<IdentifiedVariant<0x01, Field1a>,
                            IdentifiedVariant<0x02, Field1b>>;
I've absolutely used types like this in the past to automate all the boilerplate, but the details are extremely specific to the format and rarely reusable.
Note that even though you can't overlay your variant type on a buffer, there's no need for (de)serialization to proceed bit-by-bit. There's hardly any speed penalty so long as the data is already read into a buffer - if you really need to go full zero-copy, you can just have your FieldNx types keep a pointer into the buffer and decode fields on demand.
If you want your data to be bit-continuous, you can't use std::variant; you would need std::bitset or fully manual memory management. That isn't practical, though, because your structs would not be byte-aligned, so every read/write would have to be done manually, bit by bit. This slows things down greatly, so I only recommend it if you want to save every last bit of memory, even at the cost of speed. With such a layout it is also hard to find the nth element: you have to iterate from the start.
std::variant<T1, T2> wastes a little space because it always reserves enough storage for the larger alternative, but using it instead of bit manipulation improves speed and code readability (and is easier to write).
I have heard quite a lot about storing extra data inside a pointer itself, for example in the short string optimization (SSO).
For example: when we want to overload << for our SSO class, depending on the length of the string we want to print either the value behind the pointer or the in-place string.
Instead of creating a bool flag, we could encode this flag inside the pointer itself. If I am not mistaken, this works thanks to the PC architecture adding padding to prevent unaligned memory access.
But I have yet to see it in an example. How could we detect such a flag, when binary operations such as & (to check whether the MSB or LSB is set as a flag) are not allowed directly on pointers? Also, wouldn't this mess up dereferencing the pointers?
All answers are appreciated.
It is quite possible to do such things (unlike what others have said). Most modern architectures (x86-64, for example) have alignment requirements that let you assume the least significant bits of a pointer are zero, and make use of that storage for other purposes.
Let me pause for a second and say that what I'm about to describe is considered 'undefined behavior' by the C & C++ standard. You are going off-the-rails in a non-portable way by doing what I describe, but there are more standards governing the rules of a computer than the C++ standard (such as the processors assembly reference and architecture docs). Caveat emptor.
With the assumption that we're working on x86_64, let us say that you have a class/structure that starts with a pointer member:
struct foo {
    bar * ptr;
    /* other stuff */
};
By the x86 architectural constraints, that pointer in foo must be aligned on an 8-byte boundary. In this trivial example, you can assume that every pointer to a struct foo is therefore an address divisible by 8, meaning the lowest 3 bits of a foo * will be zero.
In order to take advantage of such a constraint, you must play some casting games to allow the pointer to be treated as a different type. There's a bunch of different ways of performing the casting, ranging from the old C method (not recommended) of casting it to and from a uintptr_t to cleaner methods of wrapping the pointer in a union. In order to access either the pointer or ancillary data, you need to logically 'and' the datum with a bitmask that zeros out the part of the datum you don't wish.
As an example of this explanation, I wrote an AVL tree a few years ago that sinks the balance book-keeping data into a pointer, and you can take a look at that example here: https://github.com/jschmerge/structures/blob/master/tree/avl_tree.h#L31 (everything you need to see is contained in the struct avl_tree_node at the line I referenced).
Swinging back to a topic you mentioned in your initial question... Short string optimization isn't implemented quite the same way. The implementations of it in Clang and GCC's standard libraries differ somewhat, but both boil down to using a union to overload a block of storage with either a pointer or an array of bytes, and play some clever tricks with the string's internal length field for differentiating whether the data is a pointer or local array. For more of the details, this blog post is rather good at explaining: https://shaharmike.com/cpp/std-string/
"encode this flag inside pointer itself"
No, you are not allowed to do this in either C or C++.
The behaviour on setting (let alone dereferencing) a pointer to memory you don't own is undefined in either language.
Sadly, what you want to achieve has to be done at the assembler level, where the distinction between a pointer and an integer is sufficiently blurred.
I have to compare two larger objects for equality.
Properties of the objects:
Contain all their members by value (so no pointers to follow).
They also contain some std::arrays.
They contain some other objects for which 1 and 2 hold.
Size is up to several kB.
Some of the members will more likely differ than others, therefore lead to a quicker break of the comparison operation if compared first.
The objects do not change. Basically, the algorithm is just to count how many objects are the same. Each object is only compared once against several "master" objects.
What is the best way to compare these objects? I see three options:
Just use plain, non-overloaded operator==.
Overload == and perform a member-by-member comparison, beginning with members likely to differ.
Overload == and view the object as a plain byte field and compare word by word.
Some thoughts:
Option 1 seems good because it means the least amount of work (and opportunities to introduce errors).
Option 2 seems good because I can exploit the heuristic about which members most likely differ. But maybe it's still slower, because the built-in == of option 1 is ridiculously fast.
Option 3 seems to be most "low-level" optimized, but that's what the compiler probably also does for option 1.
So the questions are:
Is there a well-known best way to solve the task?
Is one of the options an absolute no-go?
Do I have to consider something else?
The default == is fast for small objects, but if you have big data members to compare, look for optimizations based on the specific data stored and the way it is updated, and define an overloaded == comparison operator smarter than the default one.
As many have already said, option 3 is wrong: fields are generally padded to respect data alignment, and for optimization reasons the padding bytes are not initialized to 0 (this may be done in DEBUG builds).
I can suggest you explore the option of dividing the check into two stages:
first stage: maintain some small and fast member that "compresses" the status of the instance (think of it like a hash); this field is updated every time some big field changes, for example the elements of the std::array. Then check the frequently changed fields and this "compressed" status first to make a conservative comparison (for example, the sum of all ints in the array, or maybe the xor).
second stage: run an in-depth test of every member. This is the slowest but complete check, and it will likely be reached only sometimes.
A good question.
If you have some heuristic about which members are likely to differ - use it. So that overloading operator == and checking suspected members first seems to be a good idea.
About byte-wise comparison (aka memcmp and friends) - may be problematic due to struct member alignment. I.e. the compiler sometimes puts "empty spaces" in your struct layout, so that each member will have required alignment. Those are not initialized and usually contain garbage.
This may be solved by explicitly zero-initializing the whole object. But I don't see any advantage of memcmp over an automatic operator ==, which is a member-wise comparison. It may save some code size (a single call to memcmp vs. explicit reads and comparisons), but from a performance perspective it's pretty much the same.
Suppose I have an array defined as follows:
volatile char v[2];
And I have two threads (denoted by A, B respectively) manipulating array v. If I ensure that A, B use different indices at any time, that is to say, if A is now manipulating v[i], then B is either doing nothing, or manipulating v[1-i]. I wonder is synchronization needed for this situation?
I have referred to this question, however I think it is limited in Java. The reason why I ask this question is that I have been struggling with a strange and rare bug in a large project for days, and up to now, the only reason I could come up with to explain the bug is that synchronization is needed for the above manipulation. (Since the bug is very rare, it is hard for me to prove whether my conjecture is true)
Edit: both reading and modifying are possible for v.
As far as the C++11 and C11 standards are concerned, your code is safe. C++11 §1.7 [intro.memory]/p2, irrelevant note omitted:
A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having non-zero width. Two or more threads of execution (1.10) can update and access separate memory locations without interfering with each other.
char is an integral type, which means it's an arithmetic type, which means that volatile char is a scalar type, so v[0] and v[1] are separate memory locations.
C11 has a similar definition in §3.14.
Before C++11 and C11, the language itself has no concept of threads, so you are left to the mercy of the particular implementation you are using.
It might be a compiler bug or a hardware limitation.
Sometimes, when a variable smaller than 32/64 bits is accessed from memory, the processor reads 32 bits, sets the appropriate 8 or 16 bits, then writes back the whole register. That means it reads/writes the adjacent memory as well, leading to a data race.
Solutions are
use byte-access instructions. They may not be available on your processor, or your compiler may not know to use them.
pad your elements to avoid this kind of sharing. The compiler should do this automatically if your target platform does not support byte access, but in an array this conflicts with the memory layout requirements.
synchronize the whole structure
C++03/C++11 debate
In classic C++ it's up to you to avoid/mitigate this kind of behaviour. In C++11 it violates the memory model requirements, as stated in the other answers.
You need to handle synchronization only if you are accessing the same memory and modifying it. If both threads only read, you don't need to take care of synchronization at all.
As you say each thread will access a different index, you don't require synchronization here, but you do need to make sure the two threads never modify the same index at the same time.
I am porting an application from Fortran to Java. I was wondering how to convert an EQUIVALENCE between two different datatypes.
If I typecast, I may lose data; or should I pass it as a byte array?
You have to fully understand the old FORTRAN code. EQUIVALENCE shares memory WITHOUT converting the values between different datatypes. Perhaps the programmer was conserving memory by overlapping arrays that weren't used at the same time and the EQUIVALENCE can be ignored. Perhaps they were doing something very tricky, based on the binary representation of a particular platform, and you will need to figure out what they were doing.
There is extremely little reason to use EQUIVALENCE in modern Fortran. In most cases where bits need to be transferred from one type to another without conversion, the TRANSFER intrinsic function should be used instead.
From http://www.fortran.com/F77_std/rjcnf0001-sh-8.html#sh-8.2 :
An EQUIVALENCE statement is used to specify the sharing of storage units by two or more entities in a program unit. This causes association of the entities that share the storage units.
If the equivalenced entities are of different data types, the EQUIVALENCE statement does not cause type conversion or imply mathematical equivalence. If a variable and an array are equivalenced, the variable does not have array properties and the array does not have the properties of a variable.
So, consider the reason it was EQUIVALENCE'd in the Fortran code and decide from there how to proceed. There's not enough information in your question to assess the intention or best way to convert it.