My function takes a filesystem path and a bit-mask describing the desired check to be performed on the FS object.
Does it exist?
Is it: File, Folder, SymLink? Empty, Hidden? Readable (R), Writable?
SymLink: to a File, to a Folder?
These 11 flags allow for a large number of combinations, so I provide convenience flags like SymLinkToAReadableFile and such. This is not only convenient, it also spares some potential confusion: SymLinkToFile | R could mean a readable symlink (not very useful), whereas the name "SymLinkToAReadableFile" leaves no doubt.
And to cut back on the required processing, I would define RWHiddenFolder = Folder | R | W | Hidden, and the same for all things Folder-related (Folder | ..), and likewise for File & SymLink.
This makes checking that the user is not mixing File and Folder (which does not make sense) easy: (flag & Folder) && (flag & File) -> error.
But this makes RWHiddenFolder | RWEmptyFile the same as RWHiddenFile | RWEmptyFolder, the same as File | Folder | R | W | Empty | Hidden:
I can't tell what the user actually passed!
and thus can't build a precise error message;
and if I make my function tolerant and ignore SymLink in the presence of Folder, I can't tell whether the user wants an Empty or a Hidden (or both) Folder!
The naive "solution" of assigning a bit to each possible combination obviously solves all of these issues, but builds a dense forest of if/else branches just to make sure the provided flag combination makes sense, let alone doing the actual checks after that.
Is there a good trade-off between complex processing and convenient/clear interface (for the user) ?
I came up with a reasonable solution (I had to!).
Sharing it here for anyone with the same issue.
For those who prefer action to lectures (like me): sample (ignore the numerous constant definitions)
Obviously an enum can't do this, nor can simple bitfields/sets: composing with OR loses the identity of the flag (ReadableFile == Readable | File).
But binary operations make for very efficient (read "compact") manipulations (check, extract flag, ..).
So the idea is to carry the mask and a separate id:
constexpr static auto nFlags = 96u; // total number of flags

struct argFlag {
    string n;             // reflection (human-readable name)
    bitset<nFlags> id;    // one bit per atomic flag
    bitset<16> mask;      // plain bitmask
    // id == position of the true bit
    argFlag(const string& s, const int i, const uint32_t m = 0)
        : n(s), mask(m) { id.set(i); }
    // Cumulate id & mask bits, preserve reflection
    argFlag operator|(const argFlag& b) const
    { argFlag r{ (n.empty() ? ""s : n + " | ") + b.n, 0 };
      r.id = id | b.id; r.mask = mask | b.mask; return r; }
    // operator<< : does this flag contain atom b? (flag == .. | Atom | ..)
    bool operator<<(const argFlag& b) const
    { return (id & b.id) == b.id; }
};
Now you can make the list of base flags provided to the user. These are "atomic" flags: each one has a single true bit in id.
Now when the user passes the flag Symlink2Folder | HiddenFile:
you can decompose it into its building atoms (2 here) using id;
you can't confuse it with Symlink2HiddenFolder | File (same mask, but different id bits): exact error messages are achievable;
the mask member has all the right bits set (Symlink | Folder | Hidden | File), for cheap logical checks.
Reach for whichever member you need to achieve an operation on a flag (check id, check logical composition).
Using a couple of lambdas & macros, validateFlags() (example) can be written very readably in under 100 lines of code to handle a gazillion unfathomable combinations (4 atoms make for 96x95x94x93 ~ 80 million combinations).
You can try to get past validateFlags() in main() using whatever unlikely combination you can come up with ! Good luck ;-)
I'm reading the bpf (Berkeley Packet Filter) core in the Linux kernel, and I'm a little bit confused by the following.
This is the part of the code:
static unsigned int ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn,
u64 *stack)
{
u64 tmp;
static const void *jumptable[256] = {
[0 ... 255] = &&default_label,
/* Now overwrite non-defaults ... */
/* 32 bit ALU operations */
[BPF_ALU | BPF_ADD | BPF_X] = &&ALU_ADD_X,
[BPF_ALU | BPF_ADD | BPF_K] = &&ALU_ADD_K,
[BPF_ALU | BPF_SUB | BPF_X] = &&ALU_SUB_X,
[BPF_ALU | BPF_SUB | BPF_K] = &&ALU_SUB_K,
So, what I am wondering about is the role of the double ampersand. I already know about rvalue references in C++, but this is C, not C++, isn't it?
I would appreciate any help!
Even if this were C++, &&ALU_ADD_X and so on are expressions, not types, so the && couldn't indicate an rvalue reference.
If you scroll down a bit, you will find that all the ALU_* things, and default_label, are labels.
You will also find a goto *jumptable[op];, where op is a number.
GCC has an extension where you can take the "address" of a label as a value and use it as the target for goto.
&& is the operator that produces such a value.
A shorter example:
void example()
{
void* where = test_stuff() ? &&here : &&there;
goto *where;
here:
do_something();
return;
there:
do_something_else();
}
There's more information in the documentation (which is pretty much impossible to find unless you know what you're looking for).
See this document:
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
It is GCC-specific, non-standard syntax.
Background
I'm a relative newcomer to Reason, and have been pleasantly surprised by how easy it is to compare variants that take parameters:
type t = Header | Int(int) | String(string) | Ints(list(int)) | Strings(list(string)) | Footer;
Comparing different variants is nice and predictable:
/* not equal */
Header == Footer
Int(1) == Footer
Int(1) == Int(2)
/* equal */
Int(1) == Int(1)
This even works for complex types:
/* equal */
Strings(["Hello", "World"]) == Strings(["Hello", "World"])
/* not equal */
Strings(["Hello", "World"]) == Strings(["a", "b"])
Question
Is it possible to compare the Type Constructor only, either through an existing built-in operator/function I've not been able to find, or some other language construct?
let a = String("a");
let b = String("b");
/* not equal */
a == b
/* for sake of argument, I want to consider all `String(_)` equal, but how? */
It is possible by inspecting the internal representation of the values, but I wouldn't recommend doing so as it's rather fragile and I'm not sure what guarantees are made across compiler versions and various back-ends for internals such as these. Instead I'd suggest either writing hand-built functions, or using some ppx to generate the same kind of code you'd write by hand.
But that's no fun, so all that being said, this should do what you want, using the scarcely documented Obj module:
let equal_tag = (a: 'a, b: 'a) => {
let a = Obj.repr(a);
let b = Obj.repr(b);
switch (Obj.is_block(a), Obj.is_block(b)) {
| (true, true) => Obj.tag(a) == Obj.tag(b)
| (false, false) => a == b
| _ => false
};
};
where
equal_tag(Header, Footer) == false;
equal_tag(Header, Int(1)) == false;
equal_tag(String("a"), String("b")) == true;
equal_tag(Int(0), Int(0)) == true;
To understand how this function works you need to understand how OCaml represents values internally. This is described in the section on Representation of OCaml data types in the OCaml manual's chapter on Interfacing C with OCaml (and already here we see indications that this might not hold for the various JavaScript back-ends, for example, although I believe it does for now at least. I've tested this with BuckleScript/rescript, and js_of_ocaml tends to follow internals closer.)
Specifically, this section says the following about the representation of variants:
type t =
| A (* First constant constructor -> integer "Val_int(0)" *)
| B of string (* First non-constant constructor -> block with tag 0 *)
| C (* Second constant constructor -> integer "Val_int(1)" *)
| D of bool (* Second non-constant constructor -> block with tag 1 *)
| E of t * t (* Third non-constant constructor -> block with tag 2 *)
That is, constructors without a payload are represented directly as integers, while those with payloads are represented as "block"s with tags. Also note that block and non-block tags are independent, so we can't first extract some "universal" tag value from the values that we then compare. Instead we have to check whether they're both blocks or not, and then compare their tags.
Finally, note that while this function will accept values of any type, it is written only with variants in mind. Comparing values of other types is likely to yield unexpected results. That's another good reason to not use this.
I've run into an issue here, and I can't figure out exactly what SAP is doing. The test is quite simple: I have two variables of completely different types, holding two completely different values.
The input is an INT4 of value 23579235. I am testing the equality function against the string '23579235.43'. Obviously my expectation is that these two variables are different, because not only are they not the same type of variable, they don't even have the same value. Nothing about them is similar, actually.
EXPECTED1 23579235.43 C(11) \TYPE=%_T00006S00000000O0000000302
INDEX1 23579235 I(4) \TYPE=INT4
However, cl_abap_unit_assert=>assert_equals returns that these two values are identical. I started debugging and noticed the 'EQ' statement was used to check the values, and running the same statement in a simple ABAP also returns 'true' for this comparison.
What is happening here, and why doesn't the check fail immediately after noticing that the two data types aren't even the same? Is this a mistake on my part, or are these assert classes just incorrect?
report ztest.
if ( '23579235.43' eq 23579235 ).
write: / 'This shouldn''t be shown'.
endif.
As @dirk said, ABAP implicitly converts compared or assigned variables/literals if they have different types.
First, ABAP decides that the C-type literal is to be converted into type I so that it can be compared to the other I literal, and not the opposite, because of this priority rule when you compare types C and I: https://help.sap.com/http.svc/rc/abapdocu_752_index_htm/7.52/en-US/abenlogexp_numeric.htm###ITOC##ABENLOGEXP_NUMERIC_2
             | decfloat16, decfloat34 | f | p | int8 | i, s, b |
-------------|------------------------|---|---|------|---------|
string, c, n | decfloat34             | f | p | int8 | i       |
(intersection of "c" and "i" -> bottom rightmost "i")
Then, ABAP converts the C-type variable into I type for doing the comparison, using the adequate rules given at https://help.sap.com/http.svc/rc/abapdocu_752_index_htm/7.52/en-US/abenconversion_type_c.htm###ITOC##ABENCONVERSION_TYPE_C_1 :
Source Field Type c -> Numeric Target Fields -> Target :
"The source field must contain a number in mathematical or
commercial notation. [...] Decimal places are rounded commercially
to integer values. [...]"
Workarounds so that 23579235.43 is not implicitly rounded to 23579235, and so the comparison works as expected:
either IF +'23579235.43' = 23579235. (the + makes it an expression i.e. it corresponds to 0 + '23579235.43' which becomes a big Packed type with decimals, because of another rule named "calculation type")
or IF conv decfloat16( '23579235.43' ) = 23579235. (decfloats 16 and 34 are big numbers with decimals)
I am current developing a C++11 library for an existing network remote
control interface described in an Interface Control Document (ICD).
The interface is based on TCP/IPv4 and uses Network Byte Order (aka Big
Endian).
Requirement: The library shall be developed cross-platform.
Note: I want to develop a solution without (ab)using the preprocessor.
After a short search on the web I discovered Boost.Endian, which solves
problems related to the endianness of multi-byte data types. My approach is as
follows:
Serialize the (multi-)byte data types to a stream via
std::basic_ostream::write, more precisely via
os.write(reinterpret_cast<char const *>(&kData), sizeof(kData)).
Convert the std::basic_ostream to a std::vector<std::uint8_t>.
Send the std::vector<std::uint8_t> via Boost.Asio over the network.
So far so good. Everything seems to work as intended and the solution should be
platform independent.
Now comes the tricky part: The ICD describes messages consisting of multiple
words and a word consists of 8 bits. A message can contain multiple fields and a
field does not have to be byte-aligned, which means that one word can contain
multiple fields.
Example: Consider the following message format (the message body starts at
word 10):
Word | Bit(s) | Name
-----|--------|----------
10 | 0-2 | a
10 | 3 | b
10 | 4 | c
10 | 5 | d
10 | 6 | e
10 | 7 | RESERVED
11 | 16 | f
and so on...
So now I need a solution to be able to model and serialize a bit-based interface.
I have looked at the following approaches so far:
1. Bit field
2. std::bitset
3. boost::dynamic_bitset
(1) is not cross-platform (compiler-dependent).
(2) and (3) seem to work with native byte order (i.e. little-endian on
my machine) only, so the following example using boost::dynamic_bitset does not
work for my scenario:
Code:
// Using an arithmetic type from the `boost::endian` namespace does not work.
using Byte = std::uint8_t;
using BitSet = boost::dynamic_bitset<Byte>;
BitSet bitSetForA{3, 1};
BitSet bitSetForB{1};
// [...]
BitSet bitSetForF{16, 0x400}; // 1024
So, the 1024 in the example above is always serialized to 00 04 instead of
04 00 on my machine.
I really do not know what the most pragmatic approach to my problem is.
Maybe you can guide me in the right direction.
In conclusion, I need a recipe for implementing an existing network interface that defines bit fields in a platform-independent way, regardless of the native byte order of the machine the library is compiled on.
Someone recently kindly pointed me to a nice article about endianness from which I took inspiration.
std::bitset has a to_ulong method that can be used to return the integer representation of the bitfield (endian independent), and the following code will print your output in the correct order:
#include <iostream>
#include <iomanip>
#include <bitset>
int main()
{
std::bitset<16> flags;
flags[10] = true;
unsigned long rawflags = flags.to_ulong();
std::cout << std::setfill('0') << std::setw(2) << std::hex
<< (rawflags & 0x0FF) // little byte in the beginning
<< std::setw(2)
<< ((rawflags>>8) & 0x0FF) // big byte in the end
<< std::endl;
}
Note that no solution using bit fields will easily work in this case, since the bits will also be swapped on little-endian machines!
e.g. in a structure like this:
struct bits {
unsigned char first_bit:1;
unsigned char rest:7;
};
union {
bits b;
unsigned char raw;
};
Setting b.first_bit to 1 will result in a raw value of 1 or 128 depending on the endianness!
HTH
I have a series of classes that are going to require many boolean fields, somewhere between 4-10. I'd like to not have to use a byte for each boolean. I've been looking into bit field structs, something like:
struct BooleanBitFields
{
bool b1:1;
bool b2:1;
bool b3:1;
bool b4:1;
bool b5:1;
bool b6:1;
};
But after doing some research I see a lot of people saying that this can cause inefficient memory access and may not be worth the memory savings. I'm wondering what the best method for this situation is. Should I use bit fields, or use a char with bit masking (ANDs and ORs) to store 8 bits? If the second, is it better to bit-shift or use logic?
If anyone could comment on which method they would use and why, it would really help me decide which route to go down.
Thanks in advance!
With the large address spaces on desktop boxes, an array of 32/64-bit booleans may seem wasteful, and indeed it is, but most developers don't care (me included). On RAM-restricted embedded controllers, or when accessing hardware in drivers, then sure, use bitfields; otherwise, don't bother.
One other issue, apart from read/write ease and speed, is that a 32- or 64-bit boolean is more thread-safe than one bit in the middle of a word that has to be manipulated by multiple logical operations.
Bit fields are only a recommendation for the compiler. The compiler is free to implement them as it likes. On embedded systems there are compilers that guarantee 1 bit-to-bit mapping. Other compilers don't.
I would go with a regular struct, like yours but no bit fields. Make them unsigned chars - the shortest data type. The struct will make it easier to access them while editing, if your IDE supports auto completion.
Use an int bit array (leaves you lots of space to expand, and there is no advantage to a single char) and test with mask constants:
#define BOOL_A 1
#define BOOL_B (1 << 1)
#define BOOL_C (1 << 2)
#define BOOL_D (1 << 3)
/* Alternately: use const ints for encapsulation */
// declare and set
int bitray = 0 | BOOL_B | BOOL_D;
// test
if (bitray & BOOL_B) cout << "Set!\n";
I want to write this answer to check once more, and to formalize, the thought: "What does the transition from working with bytes to working with bits entail?" And also because the answer "I don't care" seems unreasonable to me.
Exploring char vs bitfield
Agreed, it's very tempting. Especially when it's supposed to be used like this:
#define FLAG_1 1
#define FLAG_2 (1 << 1)
#define FLAG_3 (1 << 2)
#define FLAG_4 (1 << 3)
struct S1 {
char flag_1: 1;
char flag_2: 1;
char flag_3: 1;
char flag_4: 1;
}; //sizeof == 1
void MyFunction(struct S1 *obj, char flags) {
obj->flag_1 = flags & FLAG_1;
obj->flag_2 = flags & FLAG_2;
obj->flag_3 = flags & FLAG_3;
obj->flag_4 = flags & FLAG_4;
// we desire it to be as *obj = flags;
}
int main(int argc, char **argv)
{
struct S1 obj;
MyFunction(&obj, FLAG_1 | FLAG_2 | FLAG_3 | FLAG_4);
return 0;
}
But let's cover all aspects of such optimization. Let's decompose the operation into simpler C-commands, roughly corresponding to the assembler commands:
Initialization of all flags.
char flags = FLAG_1 | FLAG_3;
//obj->flag_1 = flags & FLAG_1;
//obj->flag_2 = flags & FLAG_2;
//obj->flag_3 = flags & FLAG_3;
//obj->flag_4 = flags & FLAG_4;
*obj = flags;
Writing one flag as a constant
//obj.flag_3 = 1;
char a = *obj;
a &= ~FLAG_3;
a |= FLAG_3;
*obj = a;
Write a single flag using a variable
char b = 3;
//obj.flag_3 = b;
char a = *obj;
a &= ~FLAG_3;
char c = b;
c <<= 2; // shift to the FLAG_3 position
c &= FLAG_3; // clamp b > 1 to the single flag bit
a |= c;
*obj = a;
Reading one flag into a variable
//char f = obj.flag_3;
char f = *obj;
f >>= 2;
f &= 0x01;
Write one flag to another
//obj.flag_2 = obj.flag_4;
char a = *obj;
char b = a;
b &= ~FLAG_2; // clear the destination bit
a &= FLAG_4;
a >>= 2; // shift from the FLAG_4 position down to FLAG_2
b |= a;
*obj = b;
Summary

Command                     | Cost, bitfield | Cost, variable
----------------------------|----------------|---------------
1. Init                     | 1              | 4 or less
2. obj.flag_3 = 1;          | 3              | 1
3. obj.flag_3 = b;          | 7              | 1 or 3 *
4. char f = obj.flag_3;     | 2              | 1
5. obj.flag_2 = obj.flag_4; | 6              | 1

* if we guarantee the flag is no more than 1
All operations except initialization take many lines of code. It looks as though we would be better off leaving bit fields alone after initialization. However, this is usually what happens to flags all the time: they change their state without warning, and randomly.
We are essentially trying to make the rare value-initialization operation cheaper by sacrificing the frequent value-change operations.
There are systems in which bitwise comparison, bit set and reset, bit copying, even bit swapping and bit branching take one cycle. There are even systems in which mutex locking is implemented by a single assembler instruction. In such systems, bit fields may not be available across the entire memory area (for example, on PIC microcontrollers); in any case it is not a common memory area.
Perhaps in such systems the bool type could point to a component of a bitfield.
If your desire to save the insignificant bits of a byte has not yet disappeared, try to think about implementing addressability, atomicity of operations, arithmetic with bytes, and the resulting overhead in calls, data memory, code memory, and stack if the algorithms are placed in functions.
Reflections on the choice of bool or char
If your target platform represents the bool type as 2, 4, or more bytes, then bit operations on it will most likely not be optimized; it is rather a platform for high-volume computing. That means bit operations are not much in demand on it, and neither, for that matter, are operations on bytes and words.
In the same way that operations on bits hurt performance, operations on a single byte can also greatly increase the number of cycles needed to access a variable.
No system can be equally optimal for everything at once. Instead of obsessing over memory savings in systems that are clearly built with a large memory surplus, pay attention to the strengths of those systems.
Conclusion
Use char or bool if:
You need to store the mutable state or behavior of the algorithm (and change and return flags individually).
Your flag does not accurately describe the system and could evolve into a number.
You need to be able to access the flag by address.
If your code claims to be platform independent and there is no guarantee that bit operations will be optimized on the target platform.
Use bitfields if:
You need to store a huge number of flags without having to constantly read and rewrite them.
You have unusually tight memory requirements, or memory is low.
In other deeply justified cases, with calculations and confirming experiments.
Perhaps a short rule might be:
Independent flags are stored in a bool.
P.S.: If you've read this far and still want to save 7 bits out of 8, then consider why there is no urge to use 7-bit bit fields for variables whose maximum value is 100.
References
Raymond Chen: The cost-benefit analysis of bitfields for a collection of booleans