C++ operator[] access to elements of SIMD (e.g. AVX) variable

I'm looking for a way to overload operator[] (within a broader SIMD class) to facilitate reading and writing individual elements within a SIMD word (e.g. __m512i). A couple constraints:
Compliant with C++11 (or later)
Compatible with additional intrinsics based code
Not OpenCL/SYCL (which I could, but I can't *sigh*)
Mostly portable across g++, icpc, clang++
Preferably applicable to other SIMD beyond Intel (ARM, IBM, etc...)
(edit) Performance isn't really an issue (not generally used in places where performance matters)
(This rules out things like type punning through pointer casting, and GCC vector types.)
Based heavily on Scott Meyers' "More Effective C++" (Item 30) and other code, I've come up with the following minimal example that seems "right", that seems to work, but that also seems overly complicated. (The "proxy" approach is meant to deal with left-hand/right-hand operator[] usage, and the "memcpy" is meant to deal with the type-punning/C++-standard issue.)
I'm wondering if someone has a better solution (and can explain it so I learn something ;^))
#include <iostream>
#include <cstring>
#include "immintrin.h"
using T = __m256i; // SIMD type
using Te = unsigned int; // SIMD element type
class SIMD {
class SIMDProxy;
public :
const SIMDProxy operator[](int index) const {
std::cout << "SIMD::operator[] const" << std::endl;
return SIMDProxy(const_cast<SIMD&>(*this), index);
}
SIMDProxy operator[](int index){
std::cout << "SIMD::operator[]" << std::endl;
return SIMDProxy(*this, index);
}
Te get(int index) {
std::cout << "SIMD::get" << std::endl;
alignas(T) Te tmp[8];
std::memcpy(tmp, &value, sizeof(T)); // _mm256_store_si256(reinterpret_cast<__m256i *>(tmp), c.value);
return tmp[index];
}
void set(int index, Te x) {
std::cout << "SIMD::set" << std::endl;
alignas(T) Te tmp[8];
std::memcpy(tmp, &value, sizeof(T)); // _mm256_store_si256(reinterpret_cast<__m256i *>(tmp), c.value);
tmp[index] = x;
std::memcpy(&value, tmp, sizeof(T)); // c.value = _mm256_load_si256(reinterpret_cast<__m256i const *>(tmp));
}
void splat(Te x) {
alignas(T) Te tmp[8];
std::memcpy(tmp, &value, sizeof(T));
for (int i=0; i<8; i++) tmp[i] = x;
std::memcpy(&value, tmp, sizeof(T));
}
void print() {
alignas(T) Te tmp[8];
std::memcpy(tmp, &value, sizeof(T));
for (int i=0; i<8; i++) std::cout << tmp[i] << " ";
std::cout << std::endl;
}
protected :
private :
T value;
class SIMDProxy {
public :
SIMDProxy(SIMD & c_, int index_) : c(c_), index(index_) {};
// lvalue access
SIMDProxy& operator=(const SIMDProxy& rhs) {
std::cout << "SIMDProxy::=SIMDProxy" << std::endl;
c.set(index, rhs.c.get(rhs.index)); // write rhs's element into this proxy's own element
return *this;
}
SIMDProxy& operator=(Te x) {
std::cout << "SIMDProxy::=T" << std::endl;
c.set(index,x);
return *this;
}
// rvalue access
operator Te() const {
std::cout << "SIMDProxy::()" << std::endl;
return c.get(index);
}
private:
SIMD& c; // SIMD this proxy refers to
int index; // index of element we want
};
friend class SIMDProxy; // give SIMDProxy access into SIMD
};
/** a little main to exercise things **/
int
main(int argc, char *argv[])
{
SIMD x, y;
Te a = 3;
x.splat(1);
x.print();
y.splat(2);
y.print();
x[0] = a;
x.print();
y[1] = a;
y.print();
x[1] = y[1];
x.print();
}

Your code is very inefficient. Normally these SIMD types are not present anywhere in memory; they are hardware registers, they don't have addresses, and you can't pass them to memcpy(). Compilers pretend very hard that they're normal variables, which is why your code compiles and probably works, but it's slow: you're doing round trips from registers to memory and back all the time.
Here’s how I would do that, assuming AVX2 and integer lanes.
#include <immintrin.h>
#include <array>
#include <cassert>
#include <cstddef>
class SimdVector
{
__m256i val;
alignas( 64 ) static const std::array<int, 8 + 7> s_blendMaskSource;
public:
int operator[]( size_t lane ) const
{
assert( lane < 8 );
// Move lane index into lowest lane of vector register
const __m128i shuff = _mm_cvtsi32_si128( (int)lane );
// Permute the vector so the lane we need is moved to the lowest lane
// _mm256_castsi128_si256 says "the upper 128 bits of the result are undefined",
// and we don't care indeed.
const __m256i tmp = _mm256_permutevar8x32_epi32( val, _mm256_castsi128_si256( shuff ) );
// Return the lowest lane of the result
return _mm_cvtsi128_si32( _mm256_castsi256_si128( tmp ) );
}
void setLane( size_t lane, int value )
{
assert( lane < 8 );
// Load the blending mask
const int* const maskLoadPointer = s_blendMaskSource.data() + 7 - lane;
const __m256i mask = _mm256_loadu_si256( ( const __m256i* )maskLoadPointer );
// Broadcast the source value into all lanes.
// The compiler will do equivalent of _mm_cvtsi32_si128 + _mm256_broadcastd_epi32
const __m256i broadcasted = _mm256_set1_epi32( value );
// Use vector blending instruction to set the desired lane
val = _mm256_blendv_epi8( val, broadcasted, mask );
}
template<size_t lane>
int getLane() const
{
static_assert( lane < 8, "lane index out of range" );
// That thing is not an instruction;
// compilers emit different ones based on the index
return _mm256_extract_epi32( val, (int)lane );
}
template<size_t lane>
void setLane( int value )
{
static_assert( lane < 8, "lane index out of range" );
val = _mm256_insert_epi32( val, value, (int)lane );
}
};
// Align by 64 bytes to guarantee it's contained within a cache line
alignas( 64 ) const std::array<int, 8 + 7> SimdVector::s_blendMaskSource
{
0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0
};
For ARM it’s different. If lane index is known at compile time, see vgetq_lane_s32 and vsetq_lane_s32 intrinsics.
For setting lanes on ARM you can use the same broadcast + blend trick. Broadcast is vdupq_n_s32. An approximate equivalent of the vector blend is vbslq_s32; it handles every bit independently, but for this use case it's equally suitable because -1 has all 32 bits set.
For extracting with a runtime index, either write a switch over vgetq_lane_s32 or store the complete vector into memory; I'm not sure which of these two is more efficient.
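To make the ARM part concrete, here is a minimal sketch of the broadcast + blend approach for a 4-lane int32x4_t (NEON assumed; the mask table layout mirrors s_blendMaskSource above, and all names are illustrative):
#include <arm_neon.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 4 + 3 entries: loading 4 lanes starting at (3 - lane) puts the single -1 into the wanted lane
alignas( 64 ) static const uint32_t s_neonMaskSource[4 + 3] =
    { 0, 0, 0, 0xFFFFFFFFu, 0, 0, 0 };

inline int getLaneNeon( int32x4_t v, size_t lane )
{
    assert( lane < 4 );
    // Runtime-index extract via a store; a switch over vgetq_lane_s32 is the other option
    alignas( 16 ) int32_t tmp[4];
    vst1q_s32( tmp, v );
    return tmp[lane];
}

inline int32x4_t setLaneNeon( int32x4_t v, size_t lane, int32_t value )
{
    assert( lane < 4 );
    const uint32x4_t mask = vld1q_u32( s_neonMaskSource + 3 - lane ); // -1 only in `lane`
    const int32x4_t broadcasted = vdupq_n_s32( value );              // value in all lanes
    return vbslq_s32( mask, broadcasted, v );                        // bit-select: mask ? broadcasted : v
}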

Of the original approaches (memcpy, intrinsic load/store) and the additional suggestions (user-defined union punning, user-defined vector type), it seems like the intrinsic approach may have a small advantage. This is based on some quick examples I attempted to code up in Godbolt (https://godbolt.org/z/5zdbKe).
The "best" for writing to an element looks something like this.
__m256i foo2(__m256i x, unsigned int a, int index)
{
alignas(__m256i) unsigned int tmp[8];
_mm256_store_si256(reinterpret_cast<__m256i *>(tmp), x);
tmp[index] = a;
__m256i z = _mm256_load_si256(reinterpret_cast<__m256i const *>(tmp));
return z;
}
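For completeness, reading a runtime-indexed element can use the same store-to-a-temporary idea; a minimal sketch in the same spirit (not from the linked Godbolt example):
unsigned int foo1(__m256i x, int index)
{
    alignas(__m256i) unsigned int tmp[8];
    _mm256_store_si256(reinterpret_cast<__m256i *>(tmp), x);
    return tmp[index]; // read the selected element back out of the temporary
}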

If you only care about g++/clang++/icc compatibility, you can just use the __attribute__ which these compilers use internally to define their intrinsic instructions:
typedef int32_t int32x16_t __attribute__((vector_size(16*sizeof(int32_t)))) __attribute__((aligned(16*sizeof(int32_t))));
When it makes sense (and is possible on the given architecture), variables will be stored in vector registers. Also, these compilers provide a readable/writable operator[] for this typedef (which should get optimized well if the index is known at compile time).
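As a rough sketch of how that reads in practice, assuming GCC/Clang vector extensions (an 8-lane variant to match __m256i; the names are illustrative):
#include <cstdint>

typedef int32_t v8si __attribute__((vector_size(8 * sizeof(int32_t))));

int32_t get_lane(const v8si &v, int i) { return v[i]; }                // element read
void set_lane(v8si &v, int i, int32_t x) { v[i] = x; }                 // element write
v8si splat(int32_t x) { v8si v = {x, x, x, x, x, x, x, x}; return v; } // broadcast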

Type-pun uint64_t as two uint32_t in C++20

This code to read a uint64_t as two uint32_t is UB due to the strict aliasing rule:
uint64_t v;
uint32_t lower = reinterpret_cast<uint32_t*>(&v)[0];
uint32_t upper = reinterpret_cast<uint32_t*>(&v)[1];
Likewise, this code to write the upper and lower part of an uint64_t is UB due to the same reason:
uint64_t v;
uint32_t* lower = reinterpret_cast<uint32_t*>(&v);
uint32_t* upper = reinterpret_cast<uint32_t*>(&v) + 1;
*lower = 1;
*upper = 1;
How can one write this code in a safe and clean way in modern C++20, potentially using std::bit_cast?
Using std::bit_cast:
#include <bit>
#include <array>
#include <cstdint>
#include <iostream>
int main() {
uint64_t x = 0x12345678'87654321ULL;
// Convert one u64 -> two u32
auto v = std::bit_cast<std::array<uint32_t, 2>>(x);
std::cout << std::hex << v[0] << " " << v[1] << std::endl;
// Convert two u32 -> one u64
auto y = std::bit_cast<uint64_t>(v);
std::cout << std::hex << y << std::endl;
}
Output:
87654321 12345678
1234567887654321
std::bit_cast is available only in C++20. Prior to C++20 you can manually implement std::bit_cast through std::memcpy, with the one exception that such an implementation is not constexpr like the C++20 variant:
#include <cstring>
#include <type_traits>

template <class To, class From>
inline To bit_cast(From const & src) noexcept {
    //return std::bit_cast<To>(src);
    static_assert(sizeof(To) == sizeof(From),
        "Source and destination types should have the same size");
    static_assert(std::is_trivially_constructible_v<To>,
        "Destination type should be trivially constructible");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}
For this specific case of integers it would be close to optimal to just do bit shift/OR arithmetic to convert one u64 to two u32 and back again. std::bit_cast is more generic, supporting any trivially constructible type, although the std::bit_cast solution should be as optimal as the bit arithmetic on modern compilers at a high optimization level.
One extra benefit of the bit arithmetic is that it handles endianness correctly: it is endianness independent, unlike std::bit_cast.
#include <cstdint>
#include <iostream>
int main() {
uint64_t x = 0x12345678'87654321ULL;
// Convert one u64 -> two u32
uint32_t lo = uint32_t(x), hi = uint32_t(x >> 32);
std::cout << std::hex << lo << " " << hi << std::endl;
// Convert two u32 -> one u64
uint64_t y = (uint64_t(hi) << 32) | lo;
std::cout << std::hex << y << std::endl;
}
Output:
87654321 12345678
1234567887654321
Notice! As @Jarod42 points out, the solution with bit shifting is not equivalent to the memcpy/bit_cast solution; their equivalence depends on endianness. On a little-endian CPU memcpy/bit_cast gives the least significant half (lo) as array element v[0] and the most significant (hi) in v[1], while on a big-endian CPU the least significant (lo) goes to v[1] and the most significant goes to v[0]. The bit-shifting solution is endianness independent, and on all systems gives the most significant half (hi) as uint32_t(num_64 >> 32) and the least significant half (lo) as uint32_t(num_64).
in a safe and clean way
Do not use reinterpret_cast. Do not depend on unclear code that relies on specific compiler settings and fishy, uncertain behavior. Use exact arithmetic operations with well-defined results. Classes and operator overloads are all there waiting for you. For example, some global functions:
#include <cstdint>
#include <iostream>
struct UpperUint64Ref {
uint64_t &v;
UpperUint64Ref(uint64_t &v) : v(v) {}
UpperUint64Ref operator=(uint32_t a) {
v &= 0x00000000ffffffffull;
v |= (uint64_t)a << 32;
return *this;
}
operator uint64_t() {
return v;
}
};
struct LowerUint64Ref {
uint64_t &v;
LowerUint64Ref(uint64_t &v) : v(v) {}
/* as above */
};
UpperUint64Ref upper(uint64_t& v) { return v; }
LowerUint64Ref lower(uint64_t& v) { return v; }
int main() {
uint64_t v;
upper(v) = 1;
}
Or interface object:
#include <cstdint>
#include <iostream>
struct Uint64Ref {
uint64_t &v;
Uint64Ref(uint64_t &v) : v(v) {}
struct UpperReference {
uint64_t &v;
UpperReference(uint64_t &v) : v(v) {}
UpperReference operator=(uint32_t a) {
v &= 0x00000000ffffffffull;
v |= (uint64_t)a << 32u;
return *this;
}
};
UpperReference upper() {
return v;
}
struct LowerReference {
uint64_t &v;
LowerReference(uint64_t &v) : v(v) {}
};
LowerReference lower() { return v; }
};
int main() {
uint64_t v;
Uint64Ref r{v};
r.upper() = 1;
}
Using std::memcpy
#include <cstdint>
#include <cstring>
void foo(uint64_t& v, uint32_t low_val, uint32_t high_val) {
std::memcpy(reinterpret_cast<unsigned char*>(&v), &low_val,
sizeof(low_val));
std::memcpy(reinterpret_cast<unsigned char*>(&v) + sizeof(low_val),
&high_val, sizeof(high_val));
}
int main() {
uint64_t v = 0;
foo(v, 1, 2);
}
With -O1, the compiler reduces foo to:
mov DWORD PTR [rdi], esi
mov DWORD PTR [rdi+4], edx
ret
Meaning there are no extra copies made; std::memcpy just serves as a hint to the compiler.
std::bit_cast alone is not enough since results will vary by the endian of the system.
Fortunately <bit> also contains std::endian.
Keeping in mind that optimizers generally compile-time resolve ifs that are always true or false, we can test endianness and act accordingly.
We only know beforehand how to handle big or little-endian. If it is not one of those, bit_cast results are not decodable.
Another factor that can spoil things is padding. Using bit_cast assumes 0 padding between array elements.
So we can check if there is no padding and the endianness is big or little to see if it is castable.
If it is not castable, we do a bunch of shifts as per the old method.
(this can be slow)
If the endianness is big -- return the results of bit_cast.
If the endianness is little -- reverse the order. This is not the same as the C++23 byteswap, as we swap whole elements.
I arbitrarily decided that big-endian has the correct order with the high bits at x[0].
#include <bit>
#include <array>
#include <cstdint>
#include <climits>
#include <concepts>
template <std::integral F, std::integral T>
requires (sizeof(F) >= sizeof(T))
constexpr auto split(F x) {
enum consts {
FBITS=sizeof(F)*CHAR_BIT,
TBITS=sizeof(T)*CHAR_BIT,
ELEM=sizeof(F)/sizeof(T),
BASE=FBITS-TBITS,
MASK=~0ULL >> BASE
};
using split=std::array<T, ELEM>;
const bool is_big=std::endian::native==std::endian::big;
const bool is_little=std::endian::native==std::endian::little;
const bool can_cast=((is_big || is_little)
&& (sizeof(F) == sizeof(split)));
// All the following `if`s should be eliminated at compile time
// since they are always true or always false
if (!can_cast)
{
split ret;
for (int e = 0; e < ELEM; ++e)
{
ret[e]=(x>>(BASE-e*TBITS)) & MASK;
}
return ret;
}
split tmp=std::bit_cast<split>(x);
if (is_big)
{
return tmp;
}
split ret;
for (int e=0; e < ELEM; ++e)
{
ret[e]=tmp[ELEM-(e+1)];
}
return ret;
}
auto tst(uint64_t x, int y)
{
return split<decltype(x), uint32_t>(x)[y];
}
I believe this should be defined behavior.
EDIT: changed uint64 base to template parameter and minor edit tweaks
Don't bother, because arithmetic is faster anyway:
uint64_t v;
uint32_t lower = v;
uint32_t upper = v >> 32;

Is there a way I can use a 2-bit size type instead of an int, by just plugging in the new type name instead of int?

I have an application where I need to save as much memory as possible. I need to store a large amount of data that can take exactly three possible values. So, I have been trying to use a 2-bit sized type.
One possibility is using bit fields. I could do
struct myType {
uint8_t twoBits : 2;
};
This is a suggestion from this thread.
However, everywhere where I have used int variables prior to this, I would need to change their usage by appending a .twoBits. I checked if I can create a bit field outside of a struct, such as
uint8_t twoBits : 2;
but this thread says it is not possible. However, that thread is specific to C, so I am not sure if it applies to C++.
Is there a clean way I can define a 2-bit type, so that by simply replacing int with my type, I can run the program correctly? Or is using bit fields the only possible way?
The CPU, and thus the memory, the bus, and the compiler too, use only bytes or groups of bytes. There's no way to store a 2-bit type without also storing the other 6 remaining bits.
What you can do is define a struct that only uses some bits. But be aware that it will not save memory.
You can pack several x-bit types in a struct, as you already know. Or you can do bit operations to pack/unpack them into an integer type, as sketched below.
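For instance, a minimal sketch of that manual pack/unpack, squeezing four 2-bit fields into one byte (the function names are illustrative):
#include <cstdint>

inline uint8_t get2(uint8_t packed, unsigned slot) {           // slot in [0, 3]
    return (packed >> (slot * 2)) & 0x3u;
}
inline uint8_t set2(uint8_t packed, unsigned slot, uint8_t v) {
    packed &= static_cast<uint8_t>(~(0x3u << (slot * 2)));     // clear the 2-bit field
    packed |= static_cast<uint8_t>((v & 0x3u) << (slot * 2));  // write the new value
    return packed;
}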
Is there a clean way I can define a 2-bit type, so that by simply replacing int with my type, I can run the program correctly? Or is using bit fields the only possible way?
You can try to make the struct as transparent as possible by providing implicit conversion operators and constructors:
#include <cstdint>
#include <iostream>
template <std::size_t N, typename T = unsigned>
struct bit_field {
T rep : N;
operator T() { return rep; }
bit_field(T i) : rep{ i } { }
bit_field() = default;
};
using myType = bit_field<2, std::uint8_t>;
int main() {
myType mt;
mt = 3;
std::cout << +mt << "\n"; // unary + promotes the uint8_t so it prints as a number, not a character
}
So objects of type myType somewhat behave like real 2-bit unsigned integers, despite occupying more than 2 bits.
Of course, the residual bits are unused, but as single bits are not addressable on most systems, this is the best way to go.
I'm not convinced that you will save anything with your existing structure, as the surrounding structure still gets rounded up to a whole number of bytes.
You can write the following to squeeze 4 2-bit counters into 1 byte, but as you say, you have to name them myInst.f0:
typedef unsigned char ubyte_t; // assumed: an unsigned byte type
struct MyStruct
{
ubyte_t f0:2,
f1:2,
f2:2,
f3:2;
} myInst;
In C and C++98 you can declare this struct anonymously, but this usage is deprecated. You can then access the 4 values directly by name:
struct
{ // deprecated!
ubyte_t f0:2,
f1:2,
f2:2,
f3:2;
};
You could declare some sort of template that wraps a single instance with an operator int and an operator =(int), and then define a union to put the 4 instances at the same location; but again, anonymous unions are deprecated. You could instead declare references to your 4 values, but then you are paying for the references, which are bigger than the bytes you were trying to save!
template <class Size,int offset,int bits>
struct Bitz
{
Size ignore : offset,
value : bits;
operator Size()const { return value; }
Size operator = (Size val) { return (value = val); }
};
template <class Size,int bits>
struct Bitz0
{ // I know this can be done better
Size value : bits;
operator Size()const { return value; }
Size operator = (Size val) { return (value = val); }
};
static union
{ // Still deprecated!
Bitz0<char, 2> F0;
Bitz<char, 2, 2> F1;
Bitz<char, 4, 2> F2;
Bitz<char, 6, 2> F3;
};
union
{
Bitz0<char, 2> F0;
Bitz<char, 2, 2> F1;
Bitz<char, 4, 2> F2;
Bitz<char, 6, 2> F3;
} bitz;
Bitz0<char, 2>& F0 = bitz.F0; /// etc...
Alternatively, you could simply declare macros to replace the dotted name with a simple name (how 1970s):
#define myF0 myInst.f0
Note that you can't pass bitfields by reference or pointer, as they don't have a byte address, only by value and assignment.
A very minimal example of a bit array with a proxy class that looks (for the most part) like you were dealing with an array of very small integers.
#include <cstdint>
#include <iostream>
#include <vector>
class proxy
{
uint8_t & byte;
unsigned int shift;
public:
proxy(uint8_t & byte,
unsigned int shift):
byte(byte),
shift(shift)
{
}
proxy(const proxy & src):
byte(src.byte),
shift(src.shift)
{
}
proxy & operator=(const proxy &) = delete;
proxy & operator=(unsigned int val)
{
if (val <=3)
{
uint8_t wipe = 3 << shift;
byte &= ~wipe;
byte |= val << shift;
}
// might want to throw std::out_of_range here
return *this;
}
operator int() const
{
return (byte >> shift) &0x03;
}
};
Proxy holds a reference to a byte and knows how to extract two specific bits and look like an int to anyone who uses it.
If we wrap an array of bits packed into bytes with a class that returns this proxy object wrapped around the appropriate byte, we now have something that looks a lot like an array of very small ints.
class bitarray
{
size_t size;
std::vector<uint8_t> data;
public:
bitarray(size_t size):
size(size),
data((size + 3) / 4)
{
}
proxy operator[](size_t index)
{
return proxy(data[index/4], (index % 4) * 2);
}
};
If you want to extend this and go the distance, Writing your own STL Container should help you make a fully armed and operational bit-packed array.
There's room for abuse here. The caller can hold onto a proxy and get up to whatever manner of evil this allows.
Use of this primitive example:
int main()
{
bitarray arr(10);
arr[0] = 1;
arr[1] = 2;
arr[2] = 3;
arr[3] = 1;
arr[4] = 2;
arr[5] = 3;
arr[6] = 1;
arr[7] = 2;
arr[8] = 3;
arr[9] = 1;
std::cout << arr[0] << std::endl;
std::cout << arr[1] << std::endl;
std::cout << arr[2] << std::endl;
std::cout << arr[3] << std::endl;
std::cout << arr[4] << std::endl;
std::cout << arr[5] << std::endl;
std::cout << arr[6] << std::endl;
std::cout << arr[7] << std::endl;
std::cout << arr[8] << std::endl;
std::cout << arr[9] << std::endl;
}
Simply, build on top of bitset, something like:
#include<bitset>
#include<iostream>
using namespace std;
template<int N>
class mydoublebitset
{
public:
uint_least8_t operator[](size_t index)
{
return 2 * b[index * 2 + 1] + b[index * 2 ];
}
void set(size_t index, uint_least8_t store)
{
switch (store)
{
case 3:
b[index * 2] = 1;
b[index * 2 + 1] = 1;
break;
case 2:
b[index * 2] = 0;
b[index * 2 + 1] = 1;
break;
case 1:
b[index * 2] = 1;
b[index * 2 + 1] = 0;
break;
case 0:
b[index * 2] = 0;
b[index * 2 + 1] = 0;
break;
default:
throw exception();
}
}
private:
bitset<N * 2> b;
};
int main()
{
mydoublebitset<12> mydata;
mydata.set(0, 0);
mydata.set(1, 2);
mydata.set(2, 2);
cout << (unsigned int)mydata[0] << (unsigned int)mydata[1] << (unsigned int)mydata[2] << endl;
system("pause");
return 0;
}
Basically, use a bitset with twice the size and index it accordingly. It's simpler and as memory-efficient as you require.

Optimal branchless conditional selection of two SSE2 packed doubles

I'm trying to write a branchless bit select function for packed SSE2 doubles:
#include <iostream>
#include <emmintrin.h>
inline __m128d select(bool expression, const __m128d& x, const __m128d& y)
{
const int conditional_mask = expression ? -1 : 0;
const auto mask = _mm_castsi128_pd(_mm_set_epi64x(conditional_mask, conditional_mask));
return _mm_or_pd(_mm_and_pd(mask, x), _mm_andnot_pd(mask, y));
}
int main()
{
auto r1 = _mm_setr_pd(1, 2);
auto r2 = _mm_setr_pd(5, 6);
auto result = select(true, r1, r2);
auto packed = reinterpret_cast<double*>(&result);
std::cout << "result = " << packed[0] << ", " << packed[1] << std::endl;
std::getchar();
return EXIT_SUCCESS;
}
Is there a simpler approach for SSE2 and SSE4 that would be more optimal on x64?
You've specified that SSE4 is allowed; SSE4.1 has blendvpd, so you can blend with a built-in blend (not tested, but it compiles):
inline __m128d select(bool expression, const __m128d& x, const __m128d& y)
{
const int c_mask = expression ? -1 : 0;
const auto mask = _mm_castsi128_pd(_mm_set_epi64x(c_mask, c_mask));
return _mm_blendv_pd(y, x, mask);
}
I would also not take SSE vectors as arguments by reference: copying them is trivial, so it is not something to be avoided, and taking them by reference encourages the compiler to bounce them through memory (for non-inlined calls).
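For illustration, the same select written with by-value parameters (a sketch, not tested):
#include <smmintrin.h> // SSE4.1, for _mm_blendv_pd

inline __m128d select(bool expression, __m128d x, __m128d y)
{
    const __m128d mask = _mm_castsi128_pd(_mm_set1_epi64x(expression ? -1 : 0));
    return _mm_blendv_pd(y, x, mask); // picks x where the mask is set, y elsewhere
}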

C++ Making a 2D boolean matrix

I am making a program where I have 2 vectors (clientvec and productslist) and I need to create a 2D boolean matrix where the number of columns is the size of the productslist vector and the number of lines (rows) is the size of the clientvec vector, but it gives me this error:
"expression must have a constant value"
Here is the code I used:
unsigned int lines = clientvec.size();
unsigned int columns = productslist.size();
bool matrixPublicity[lines][columns] = {false};
Pls help me..
Edit: I am new at c++ so assume I know nothing xD
Edit2: I already know from the answers that I cannot initialize an array with non-constant values; now the question is how I can fill it after initialization...
The error message is clear: "expression must have a constant value".
It means the array dimensions cannot be runtime variables; only compile-time constants (such as enums or preprocessor-defined constants) are valid.
See for more info:
Why can't I initialize a variable-sized array?
Edit: Since you mentioned you are new to C++, here is a piece of code that might help you:
#include <iostream>
#include <vector>
#include <bitset>
int main()
{
unsigned int lines = 10;
const unsigned int columns = 5;
std::vector<std::bitset<columns>> matrixPublicity;
matrixPublicity.resize(lines);
for(int i=0; i < lines; i++)
{
for(int j=0; j < columns; j++)
std::cout << matrixPublicity[i][j] <<' ';
std::cout<<'\n';
}
}
note that in this case, columns must be constant.
Edit 2: And if the sizes of the lines are not all the same, then you must stick to vector types:
typedef std::vector<bool> matrixLine;
std::vector<matrixLine> matrixPublicity;
now you can use the resize method on the i-th line of the matrix, e.g.
matrixPublicity[1].resize(number_of_columns_in_line_2);
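Putting it together, a small sketch of sizing such a matrix from two runtime-sized vectors (clientvec and productslist stand in for the asker's containers; contents are illustrative):
#include <vector>

int main() {
    std::vector<int> clientvec(10);      // illustrative sizes
    std::vector<int> productslist(5);
    std::vector<std::vector<bool>> matrixPublicity(
        clientvec.size(), std::vector<bool>(productslist.size(), false));
    matrixPublicity[3][2] = true;        // element access reads like a 2D array
}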
What you are trying to do would be the same as this:
std::vector<unsigned int> v1 { 1, 2, 3, 4, 5 };
std::vector<unsigned int> v2 { 6, 7, 8, 9 };
bool mat[v1.size()][v2.size()] = false;
This is how the compiler will interpret it without the temporaries and this is invalid. When you declare an array of any type its size has to be known at compile time.
bool mat[2][3] = false; // still invalid
bool mat[2][3] = { false }; // Okay
const int x = 5;
const int y = 7;
bool mat[x][y] = false; // invalid
bool mat[x][y] = { false }; // okay
// Even this is invalid
std::vector<int> v1{ 1, 2, 3 };
std::vector<int> v2{ 4, 5, 6, 7 };
const std::size_t x1 = v1.size();
const std::size_t y1 = v2.size();
bool mat2[x1][y1] = { false }; // Still won't compile.
The value used to declare an array's size must be a constant expression.
Instead of making an array as you have tried to do, you could make a class template that will construct a matrix-like object for you. Here is what I have come up with. The overall design or pattern of this template should fit your situation, but the actual implementation that generates the internal matrix will depend on your data and what you intend.
#include <vector>
#include <iostream>
#include <conio.h>
template <class T, class U>
class Matrix {
private:
std::vector<T> m_lines;
std::vector<T> m_cols;
std::vector<U> m_mat;
std::size_t m_size;
std::size_t m_lineCount;
std::size_t m_colsCount;
public:
Matrix() {};
Matrix( const std::vector<T>& lines, const std::vector<T>& cols ) :
m_lines(lines),
m_cols(cols),
m_lineCount( lines.size() ),
m_colsCount( cols.size() )
{
addVectors( lines, cols );
}
void addVectors( const std::vector<T>& v1, const std::vector<T>& v2 ) {
m_lines = v1;
m_cols = v2;
m_lineCount = m_lines.size();
m_colsCount = m_cols.size();
for ( unsigned int i = 0; i < m_lineCount; ++i ) {
for ( unsigned int j = 0; j < m_colsCount; j++ ) {
// This will depend on your implementation and how you
// construct this matrix based off of your existing containers
m_mat.push_back(m_lines[i] & m_cols[j]);
}
}
m_size = m_mat.size();
}
std::size_t size() const { return m_size; }
std::size_t sizeRows() const { return m_lineCount; }
std::size_t sizeColumns() const { return m_colsCount; }
const std::vector<U>& getMatrix() const { return m_mat; }
const std::vector<T>& getLines() const { return m_lines; }
const std::vector<T>& getColumns() const { return m_cols; }
U operator[]( std::size_t idx ) { return m_mat[idx]; }
U operator[]( std::size_t idx ) const { return m_mat[idx]; }
};
int main() {
std::vector<unsigned> v1{ 1, 0, 1, 1, 0 };
std::vector<unsigned> v2{ 0, 1, 1, 1, 0 };
Matrix<unsigned, bool> mat1( v1, v2 );
int line = 0;
for ( unsigned u = 0; u < mat1.size(); ++u ) {
line++;
std::cout << mat1[u] << " ";
if ( line == mat1.sizeRows() ) {
std::cout << "\n";
line = 0;
}
}
std::cout << "\nPress any key to quit.\n" << std::endl;
_getch();
return 0;
}
Output
0 1 1 1 0
0 0 0 0 0
0 1 1 1 0
0 1 1 1 0
0 0 0 0 0
With this template class you can create a matrix of any type U by passing in two vectors for type T. Now how you construct the matrix will be implementation dependent. But this class is reusable for different types.
You could have two vectors of type doubles, and construct a matrix of unsigned chars, or you could have two vectors of user defined class or struct types and generate a matrix of unsigned values. This may help you out in many situations.
Note: This does generate a compiler warning (no errors though, and it prints and displays properly). The warning generated by MSVS 2015 is warning C4800: 'unsigned int': forcing value to bool 'true' or 'false' (performance warning).
This is generated because I am doing a bitwise & operation on two unsigned values; that is why I set the initial vectors passed to this class template's constructor to contain only 1s and 0s, as this is meant for demonstration only.
EDIT - I made an edit to the class because I noticed I had a default constructor but no way to add vectors afterwards, so I added an extra member variable and an addVectors function, moved the implementation from the defined constructor into the new function, and now just call that function in the defined constructor.
Creating an array isn't that difficult :)
A matrix (2D/3D/...-array) is unfortunately a little bit different if you want to do it your way!
But first of all you should know about the stack and the heap!
Lets have a look at these 2:
Stack:
A stack variable/array/matrix/... is only valid between the nearest pair of braces -> {} <- which you normally call a "code block". Its size is defined at "compile time" (the time when the compiler translates your code into machine language). That means the size of your array needs to be fixed.
Example:
#include <iostream>
#define MACRO 128
int arraySize(){
int size = 0;
std::cin >> size;
return size;
}
int main() {
//this is valid
int intArray[128] = {}; //the size (here: 128) needs to be a compile-time
//constant like a literal number, or a macro
//like 'MACRO', which is compile-time-only as well
//this is valid
int intArray2[MACRO] = {};
//this is not valid!
int intArray3[arraySize()] = {};
return 0;
}
Heap:
A heap variable/array/matrix/... is valid until you delete it. That also means that a heap variable is created during run time (between starting your program and closing/stopping it)! This allows you to define its size at run time.
Example:
#include <iostream>
#define MACRO 128
int arraySize(){
int size = 0;
std::cin >> size;
return size;
}
int main() {
//this is valid
int intArray[128] = {}; //the size (here: 128) needs to be a compile-time
//constant like a literal number, or a macro
//like 'MACRO', which is compile-time-only as well
//this is valid
int intArray2[MACRO] = {};
//creating an array with a non-static size
//works like this:
//int can also be a 'bool'
int* heapArray = new int[arraySize()];
// ^ the star means you are pointing to
//an address inside of your memory which has
//the size of an int (per element)
//That's why they are called "pointers"!
//Right now it points to the beginning of the
//array.
// ^ the keyword "new" says that
//you are allocating memory on the heap.
// ^
//then you have to say which kind of array
//it is which is the same you gave the pointer
// ^
//now you give it the size of that array
//this time it can be return value or the size
//of a variable
//as I mentioned...you have to delete this array on your own
//if you don't do that your program will leak memory
//(and might eventually misbehave or crash)!
//SO NEVER NEVER NEVER... forget about it
delete[] heapArray;
//^ write delete[]
//the 2 brackets after 'delete' say you
//wanna remove the whole array, not a
//single variable,
//then write the name of your array
//why? because you can also create/delete
//heap variables not only arrays.
return 0;
}
Creating a matrix on the heap is unfortunately not that easy.
But it is essential to know how a 1D-array works before going to further dimensions! That's why I did this tutorial!
Click here to see how to create a matrix on the heap
Click here to learn more about the heap
Click here to choose the best result of this theme
I hope I could help you :)!

C/C++ efficient bit array

Can you recommend efficient/clean way to manipulate arbitrary length bit array?
Right now I am using regular int/char bitmask, but those are not very clean when array length is greater than datatype length.
std::vector<bool> is not available for me.
Since you mention C as well as C++, I'll assume that a C++-oriented solution like boost::dynamic_bitset might not be applicable, and talk about a low-level C implementation instead. Note that if something like boost::dynamic_bitset works for you, or there's a pre-existing C library you can find, then using them can be better than rolling your own.
Warning: None of the following code has been tested or even compiled, but it should be very close to what you'd need.
To start, assume you have a fixed bitset size N. Then something like the following works:
typedef uint32_t word_t;
enum { WORD_SIZE = sizeof(word_t) * 8 };
word_t data[N / 32 + 1];
inline int bindex(int b) { return b / WORD_SIZE; }
inline int boffset(int b) { return b % WORD_SIZE; }
void set_bit(int b) {
data[bindex(b)] |= 1 << (boffset(b));
}
void clear_bit(int b) {
data[bindex(b)] &= ~(1 << (boffset(b)));
}
int get_bit(int b) {
return data[bindex(b)] & (1 << boffset(b));
}
void clear_all() { /* set all elements of data to zero */ }
void set_all() { /* set all elements of data to one */ }
As written, this is a bit crude since it implements only a single global bitset with a fixed size. To address these problems, you want to start with a data structure something like the following:
struct bitset { word_t *words; int nwords; };
and then write functions to create and destroy these bitsets.
struct bitset *bitset_alloc(int nbits) {
struct bitset *bitset = malloc(sizeof(*bitset));
bitset->nwords = (nbits / WORD_SIZE + 1);
bitset->words = malloc(sizeof(*bitset->words) * bitset->nwords);
bitset_clear(bitset);
return bitset;
}
void bitset_free(struct bitset *bitset) {
free(bitset->words);
free(bitset);
}
Now, it's relatively straightforward to modify the previous functions to take a struct bitset * parameter. There's still no way to re-size a bitset during its lifetime, nor is there any bounds checking, but neither would be hard to add at this point.
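For example, the per-instance variants might look like this (a sketch reusing bindex/boffset from above; untested, like the rest of this answer):
void bitset_set_bit(struct bitset *bitset, int b) {
    bitset->words[bindex(b)] |= (word_t)1 << boffset(b);
}

void bitset_clear_bit(struct bitset *bitset, int b) {
    bitset->words[bindex(b)] &= ~((word_t)1 << boffset(b));
}

int bitset_get_bit(const struct bitset *bitset, int b) {
    return (bitset->words[bindex(b)] >> boffset(b)) & 1;
}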
boost::dynamic_bitset if the length is only known in run time.
std::bitset if the length is known in compile time (although arbitrary).
I've written a working implementation based off Dale Hagglund's response to provide a bit array in C (BSD license).
https://github.com/noporpoise/BitArray/
Please let me know what you think / give suggestions. I hope people looking for a response to this question find it useful.
This posting is rather old, but there is an efficient bit array suite in C in my ALFLB library.
For many microcontrollers without a hardware-division opcode, this library is EFFICIENT because it doesn't use division: instead, masking and bit-shifting are used. (Yes, I know some compilers will convert division by 8 to a shift, but this varies from compiler to compiler.)
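For illustration, the kind of division-free indexing meant here looks something like this (a generic sketch, not the library's actual code):
static inline unsigned bm_byte_index(unsigned bit) { return bit >> 3; } // bit / 8, without division
static inline unsigned bm_bit_offset(unsigned bit) { return bit & 7u; } // bit % 8, without division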
It has been tested on arrays up to 2^32-2 bits (about 4 billion bits stored in 536 MBytes), although the last 2 bits should be accessible if not used in a for-loop in your application.
See below for an extract from the doco. Doco is http://alfredo4570.net/src/alflb_doco/alflb.pdf, library is http://alfredo4570.net/src/alflb.zip
Enjoy,
Alf
//------------------------------------------------------------------
BM_DECLARE( arrayName, bitmax);
Macro to instantiate an array to hold bitmax bits.
//------------------------------------------------------------------
UCHAR *BM_ALLOC( BM_SIZE_T bitmax);
mallocs an array (of unsigned char) to hold bitmax bits.
Returns: NULL if memory could not be allocated.
//------------------------------------------------------------------
void BM_SET( UCHAR *bit_array, BM_SIZE_T bit_index);
Sets a bit to 1.
//------------------------------------------------------------------
void BM_CLR( UCHAR *bit_array, BM_SIZE_T bit_index);
Clears a bit to 0.
//------------------------------------------------------------------
int BM_TEST( UCHAR *bit_array, BM_SIZE_T bit_index);
Returns: TRUE (1) or FALSE (0) depending on a bit.
//------------------------------------------------------------------
int BM_ANY( UCHAR *bit_array, int value, BM_SIZE_T bitmax);
Returns: TRUE (1) if array contains the requested value (i.e. 0 or 1).
//------------------------------------------------------------------
UCHAR *BM_ALL( UCHAR *bit_array, int value, BM_SIZE_T bitmax);
Sets or clears all elements of a bit array to your value. Typically used after a BM_ALLOC.
Returns: Copy of address of bit array
//------------------------------------------------------------------
void BM_ASSIGN( UCHAR *bit_array, int value, BM_SIZE_T bit_index);
Sets or clears one element of your bit array to your value.
//------------------------------------------------------------------
BM_MAX_BYTES( int bit_max);
Utility macro to calculate the number of bytes to store bitmax bits.
Returns: A number specifying the number of bytes required to hold bitmax bits.
//------------------------------------------------------------------
You can use std::bitset:
#include <bitset>
#include <iostream>
using namespace std;
int main() {
const bitset<12> mask(2730ul);
cout << "mask = " << mask << endl;
bitset<12> x;
cout << "Enter a 12-bit bitset in binary: " << flush;
if (cin >> x) {
cout << "x = " << x << endl;
cout << "As ulong: " << x.to_ulong() << endl;
cout << "And with mask: " << (x & mask) << endl;
cout << "Or with mask: " << (x | mask) << endl;
}
}
I know it's an old post but I came here to find a simple C bitset implementation and none of the answers quite matched what I was looking for, so I implemented my own based on Dale Hagglund's answer. Here it is :)
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
typedef uint32_t word_t;
enum { BITS_PER_WORD = 32 };
struct bitv { word_t *words; int nwords; int nbits; };
struct bitv* bitv_alloc(int bits) {
struct bitv *b = malloc(sizeof(struct bitv));
if (b == NULL) {
fprintf(stderr, "Failed to alloc bitv\n");
exit(1);
}
b->nwords = (bits >> 5) + 1;
b->nbits = bits;
b->words = malloc(sizeof(*b->words) * b->nwords);
if (b->words == NULL) {
fprintf(stderr, "Failed to alloc bitv->words\n");
exit(1);
}
memset(b->words, 0, sizeof(*b->words) * b->nwords);
return b;
}
static inline void check_bounds(struct bitv *b, int bit) {
if (b->nbits < bit) {
fprintf(stderr, "Attempted to access a bit out of range\n");
exit(1);
}
}
void bitv_set(struct bitv *b, int bit) {
check_bounds(b, bit);
b->words[bit >> 5] |= 1 << (bit % BITS_PER_WORD);
}
void bitv_clear(struct bitv *b, int bit) {
check_bounds(b, bit);
b->words[bit >> 5] &= ~(1 << (bit % BITS_PER_WORD));
}
int bitv_test(struct bitv *b, int bit) {
check_bounds(b, bit);
return b->words[bit >> 5] & (1 << (bit % BITS_PER_WORD));
}
void bitv_free(struct bitv *b) {
if (b != NULL) {
if (b->words != NULL) free(b->words);
free(b);
}
}
void bitv_dump(struct bitv *b) {
if (b == NULL) return;
for(int i = 0; i < b->nwords; i++) {
word_t w = b->words[i];
for (int j = 0; j < BITS_PER_WORD; j++) {
printf("%d", w & 1);
w >>= 1;
}
printf(" ");
}
printf("\n");
}
void test(struct bitv *b, int bit) {
if (bitv_test(b, bit)) printf("Bit %d is set!\n", bit);
else printf("Bit %d is not set!\n", bit);
}
int main(int argc, char *argv[]) {
struct bitv *b = bitv_alloc(32);
bitv_set(b, 1);
bitv_set(b, 3);
bitv_set(b, 5);
bitv_set(b, 7);
bitv_set(b, 9);
bitv_set(b, 32);
bitv_dump(b);
bitv_free(b);
return 0;
}
I use this one:
//#include <bitset>
#include <iostream>
//source http://stackoverflow.com/questions/47981/how-do-you-set-clear-and-toggle-a-single-bit-in-c
#define BIT_SET(a,b) ((a) |= (1<<(b)))
#define BIT_CLEAR(a,b) ((a) &= ~(1<<(b)))
#define BIT_FLIP(a,b) ((a) ^= (1<<(b)))
#define BIT_CHECK(a,b) ((a) & (1<<(b)))
/* x=target variable, y=mask */
#define BITMASK_SET(x,y) ((x) |= (y))
#define BITMASK_CLEAR(x,y) ((x) &= (~(y)))
#define BITMASK_FLIP(x,y) ((x) ^= (y))
#define BITMASK_CHECK(x,y) ((x) & (y))
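A quick usage sketch of those macros (illustrative):
unsigned int flags = 0;
BIT_SET(flags, 3);            // flags now has bit 3 set (0x08)
if (BIT_CHECK(flags, 3)) {
    BIT_FLIP(flags, 3);       // toggle it back off
}
BITMASK_SET(flags, 0x0Fu);    // set the whole low nibble at once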
I have recently released BITSCAN, a C++ bit string library which is specifically oriented towards fast bit scanning operations. BITSCAN is available here. It is in alpha but still pretty well tested since I have used it in recent years for research in combinatorial optimization (e.g. in BBMC, a state of the art exact maximum clique algorithm). A comparison with other well known C++ implementations (STL or BOOST) may be found here.
I hope you find it useful. Any feedback is welcome.
In microcontroller development, sometimes we need to use a 2-dimensional array (matrix) with element values of [0, 1] only. That means if we use 1 byte for the element type, it wastes memory greatly (the memory of a microcontroller is very limited). The proposed solution is that we should use a 1-bit matrix (element type is 1 bit).
http://htvdanh.blogspot.com/2016/09/one-bit-matrix-for-cc-programming.html
I recently implemented a small header-only library called BitContainer just for this purpose.
It focuses on expressiveness and compile-time abilities and can be found here:
https://github.com/EddyXorb/BitContainer
It is for sure not the classical way to look at bit arrays, but it can come in handy for strong-typing purposes and a memory-efficient representation of named properties.
Example:
constexpr Props props(Prop::isHigh(), Prop::isLow()); // initialize BitContainer of type Props with strong-type Prop
constexpr bool result1 = props.contains(Prop::isTiny()); // false
constexpr bool result2 = props.contains(Prop::isLow()); // true