I am implementing a particle-based fluid simulation. To represent vectors such as velocity and acceleration, I have defined a class that looks like this:
class Vec3f {
public:
float x, y, z;
// ... bunch of constructors, operators and utility functions
};
I'm using the library nanoflann for kd-tree searches. To accommodate for arbitrary class designs, nanoflann requires a user-defined adaptor class that the kd-tree class then queries to get info about the particle dataset. One function that the adaptor has to offer, as described in the nanoflann documentation is the following.
// Must return the dim'th component of the idx'th point in the class:
inline T kdtree_get_pt(const size_t idx, int dim) const { ... }
The problem is that this interface does not work seamlessly with the x, y, z representation. Naively, it would need to do something like this
inline float kdtree_get_pt(const size_t idx, int dim) const {
switch(dim) {
case 0:
return particlearray[idx].x;
case 1:
return particlearray[idx].y;
case 2:
return particlearray[idx].z;
}
}
Building and querying the kd-tree consumes a significant portion of my app's runtime, and kdtree_get_pt gets queried multiple times in the process, so I need it to be optimized. The following solution should be faster.
class Vec3f {
public:
float values[3];
// ...
};
// Then inside the adaptor class
inline float kdtree_get_pt(const size_t idx, int dim) const {
return particlearray[idx].values[dim];
}
However, I much prefer the x, y, z interface for my equations. Now that the problem is clear, my question is how can I keep the x, y, z notation for my equations without making kdtree_get_pt suboptimal.
Solutions I have considered:
Vec3f has member float values[3] and getters in the form of float& x(). The function call should be optimized away completely so this almost works but I do not want to have to add the parentheses in my equations if I can avoid it. eg I want to be able to write vec1.x - vec2.x instead of vec1.x() - vec2.x(). As far as I know, C++ does not offer a way to "disguise" a function call as a member variable, excluding preprocessor macros which I do not consider a safe solution.
Vec3f has members float values[3] and float& x, y, z where the latter are initialized to point to the corresponding floats in the array. I thought they would be optimized away as they are known at compile time and obviously cannot change value after initialization, but even with optimizations on, MSVC++ seems to actually store the float&s as can be seen by sizeof(Vec3f) growing by 12 bytes after their addition. This doubles the storage size of my dataset which raises a concern for cache misses when working with arbitrarily large datasets.
kdtree_get_pt uses float& values[3] to point to x, y, z. This might eliminate the branching cost, but I don't believe that the extra level of indirection or the need to initialize all 3 references can be optimized away, so it is presumably slower than the return particlearray[idx][dim] version.
kdtree_get_pt uses reinterpret_cast or pointer magic to directly point into Vec3f's members. Given a Vec3f object's address, I believe x, y, z are guaranteed to be stored in that order with the first one stored at the same address as the one given by the & operator on the Vec3f object, but even so I'm confused as to whether there exists a well-defined way to observe them.
From a software engineering standpoint, it's best to expose the data through accessor and modifier functions only.
I would suggest:
class Vec3f
{
public:
float& operator[](size_t index) { return values[index]; }
float operator[](size_t index) const { return values[index]; }
float& x() { return values[0]; }
float x() const { return values[0]; }
float& y() { return values[1]; }
float y() const { return values[1]; }
float& z() { return values[2]; }
float z() const { return values[2]; }
private:
float values[3];
};
Re: kdtree_get_pt uses reinterpret_cast or pointer magic to directly point into Vec3f's members.
That's a bad idea in general. However, I don't see that being a problem with my suggestion.
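If the plain x, y, z members are kept, one cast-free way to index them is a table of pointers to members. The following is only a sketch: ParticleAdaptor is a hypothetical name, and only the kdtree_get_pt signature is taken from the question. Whether the compiler reduces it to a plain indexed load should be checked in the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3f {
    float x, y, z;
};

// Hypothetical adaptor; only kdtree_get_pt mirrors the signature quoted in the question.
struct ParticleAdaptor {
    std::vector<Vec3f> particlearray;

    float kdtree_get_pt(const std::size_t idx, int dim) const {
        // Table of pointers to members: maps dim 0/1/2 to x/y/z without any casts.
        static constexpr float Vec3f::* axis[3] = { &Vec3f::x, &Vec3f::y, &Vec3f::z };
        return particlearray[idx].*axis[dim];
    }
};
```

The equations can keep using v.x, v.y, v.z directly, and dim is assumed to be 0, 1 or 2.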
You should always check if the switch statements will really introduce branchings in the final compiled output. A tool that might help you there is godbolt.
For both of these code snippets (random and cout have been added to prevent complete removal of the code):
#include <cstddef>
#include <cstdlib>
#include <array>
#include <iostream>
#include <ctime>
class Vec3f {
public:
float values[3];
};
struct Test {
std::array<Vec3f,100> particlearray;
float kdtree_get_pt(const size_t idx, int dim) const {
return particlearray[idx].values[dim];
}
};
int main() {
Test t;
std::srand(std::time(0));
int random_variable = std::rand();
std::cout << t.kdtree_get_pt(random_variable,0);
std::cout << t.kdtree_get_pt(random_variable,1);
std::cout << t.kdtree_get_pt(random_variable,2) << std::endl;
return 0;
}
and
#include<iostream>
#include<array>
#include<ctime>
#include<cstdlib>
#include<cstddef>
class Vec3f {
public:
float x, y, z;
};
struct Test {
std::array<Vec3f,100> particlearray;
float kdtree_get_pt(const size_t idx, int dim) const {
switch(dim) {
case 0:
return particlearray[idx].x;
case 1:
return particlearray[idx].y;
case 2:
return particlearray[idx].z;
}
}
};
int main() {
Test t;
std::srand(std::time(0));
int random_variable = std::rand();
std::cout << t.kdtree_get_pt(random_variable,0);
std::cout << t.kdtree_get_pt(random_variable,1);
std::cout << t.kdtree_get_pt(random_variable,2) << std::endl;
return 0;
}
The access to x, y and z or values[dim] will be compiled (by gcc 7) to:
cvtss2sd xmm0, DWORD PTR [rsp+rbx]
cvtss2sd xmm0, DWORD PTR [rsp+4+rbx]
cvtss2sd xmm0, DWORD PTR [rsp+8+rbx]
Without any branching.
There is a known technique for mixing access via x, y, z and array indices: a union of identical data types. It is meant to sidestep the problem with UB, sizeof() is 12 bytes, and access is as fast as it can be; a SIMD vector could be used in a very similar fashion. The code below was tested with VS2017:
#include <iostream>
#include <type_traits>
template <int Size> struct VectorBase {
float _data[Size];
float operator[](int Index) {
return _data[Index];
}
};
template <typename VectorType, int Index> struct ScalarAccessor {
VectorType _v;
operator float() const {
return _v._data[Index];
}
float operator = (float x) {
_v._data[Index] = x;
return *this;
}
};
union uuu {
VectorBase<3> xyz;
ScalarAccessor<VectorBase<3>, 0> x;
ScalarAccessor<VectorBase<3>, 1> y;
ScalarAccessor<VectorBase<3>, 2> z;
};
template <int Size> struct Vector {
union
{
VectorBase<Size> xyz;
ScalarAccessor<VectorBase<Size>, 0> x;
ScalarAccessor<VectorBase<Size>, 1> y;
ScalarAccessor<VectorBase<Size>, 2> z;
};
float operator[](int Index) {
return xyz[Index];
}
};
using Vec3f = Vector<3>;
int main() {
Vec3f a;
a.x = 1.0f;
a.y = a.x + 3.0f;
a.z = a.x * 3.0f;
std::cout << sizeof(a) << "\n";
std::cout << a.x << " " << a.y << " " << a.z << "\n";
std::cout << a[0] << " " << a[1] << " " << a[2] << "\n";
std::cout << std::is_standard_layout<VectorBase<3>>::value << "\n";
std::cout << std::is_standard_layout<ScalarAccessor<VectorBase<3>, 0>>::value << "\n";
std::cout << std::is_standard_layout<ScalarAccessor<VectorBase<3>, 1>>::value << "\n";
std::cout << std::is_standard_layout<ScalarAccessor<VectorBase<3>, 2>>::value << "\n";
std::cout << std::is_standard_layout<Vec3f>::value << "\n";
std::cout << std::is_standard_layout<uuu>::value << "\n";
return 0;
}
UPDATE
Here is some C++ standard reading.
I'm relying on the definition of a standard-layout type, 12.7 Classes:
A class S is a standard-layout class if it:
(7.1) — has no non-static data members of type non-standard-layout class (or array of such types) or reference,
(7.2) — has no virtual functions (13.3) and no virtual base classes (13.1),
(7.3) — has the same access control (Clause 14) for all non-static data members,
(7.4) — has no non-standard-layout base classes,
(7.5) — has at most one base class subobject of any given type
...
It is easy to check if all proposed classes are standard-layout - I've changed the code to check for that.
I believe they are all layout-compatible.
Also, if a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member.
A union is a standard-layout class as well, so we have classes aligned in a union whose only data member is an array of the same type and size, and it looks like the standard requires them to be byte-for-byte compatible.
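The layout properties quoted above can be checked at compile time for the simple array-backed class. This is a minimal sketch with a stand-in Vec3f. The sizeof check is an assumption that holds on mainstream ABIs rather than a guarantee of the standard, so the static_assert simply fails loudly where it does not.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in for the array-backed vector discussed above.
struct Vec3f {
    float values[3];
};

static_assert(std::is_standard_layout<Vec3f>::value, "Vec3f must be standard-layout");
// Assumption: no padding is added after the array (true on common platforms).
static_assert(sizeof(Vec3f) == 3 * sizeof(float), "unexpected padding in Vec3f");
static_assert(offsetof(Vec3f, values) == 0, "first member must sit at offset 0");

// Runtime view of the same rule: the object and its first member share an address.
bool first_member_shares_address(const Vec3f& v) {
    return static_cast<const void*>(&v) == static_cast<const void*>(&v.values);
}
```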
However, I much prefer the x, y, z interface for my equations. Now that the problem is clear, my question is how can I keep the x, y, z
Declare x,y,z as local references before calculation:
auto& [x1, y1, z1] = v1.values;
auto& [x2, y2, z2] = v2.values;
return x1*x2 + y1*y2 + z1*z2;
Pre-C++17, you need the more verbose form:
auto& x = values[0];
auto& y = values[1];
auto& z = values[2];
The compiler will not need to use any storage for these references.
This of course introduces some repetition: one line (in C++17) per vector per function.
The extra parentheses introduced by your first suggestion are another good way to go. Whether the parentheses are better or worse than the local reference declaration boilerplate depends on the use case and personal preference.
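For completeness, here is the local-reference idea as a whole function. It is a sketch that assumes C++17 (structured bindings) and a Vec3f with a public values array; dot is just an illustrative name.

```cpp
#include <cassert>

struct Vec3f {
    float values[3];
};

// Dot product written with x/y/z names; requires C++17 for structured bindings.
float dot(const Vec3f& v1, const Vec3f& v2) {
    // The bindings are aliases for the array elements, not copies.
    const auto& [x1, y1, z1] = v1.values;
    const auto& [x2, y2, z2] = v2.values;
    return x1 * x2 + y1 * y2 + z1 * z2;
}
```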
Edit: Another alternative: Define operator[] and use named constants for indices.
namespace axes {
enum axes {
x, y, z
};
}
struct Vec3f {
float values[3];
float& operator[](size_t index) { return values[index]; }
float operator[](size_t index) const { return values[index]; }
};
// usage
using namespace axes;
return v1[x]*v2[x] + v1[y]*v2[y] + v1[z]*v2[z];
I have a situation where I need to keep a value as a float as well as an int, so I tried the code below, but it did not help. Can anyone help with this?
union Val {
int a;
float b;
};
Val p;
p.b = 45.56;
int k = p.a; // I want k to be 45
I see that you say:
i dont want each time it to be converted from float to int [sic]
To do that, you could use user-defined conversions.
So your struct would look like this:
class Val {
int a;
float b;
public:
Val& operator= (const int _a) {a = _a; b = _a + fmod(b, 1.0F); return *this;}
Val& operator= (const float _b) {b = _b; a = trunc(_b); return *this;}
operator int() {return a;}
operator float() {return b;}
};
Please note that what you really want to use is simply a float with static_cast<int>. For a static_cast:
No checks are performed during runtime to guarantee that the object being converted is in fact a full object of the destination type. Therefore, it is up to the programmer to ensure that the conversion is safe. On the other side, it does not incur the overhead of the type-safety checks of dynamic_cast.
I've provided an example of using Val here: http://ideone.com/XUesib but you could accomplish the exact same thing given float foo like this:
foo = 1.3F;
cout << static_cast<int>(foo) << endl << foo << endl;
foo = 13 + fmod(foo, 1.0F);
cout << static_cast<int>(foo) << endl << foo << endl;
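The two operations used above can be wrapped in small helpers. This is a sketch with hypothetical names (whole_part, with_whole_part). Note that static_cast<int> truncates toward zero, so negative values lose their fraction rather than being rounded down.

```cpp
#include <cassert>
#include <cmath>

// Integer part of a float; static_cast<int> truncates toward zero.
int whole_part(float value) {
    return static_cast<int>(value);
}

// Keeps the fractional part of value and swaps in a new whole part,
// mirroring the fmod expression used above.
float with_whole_part(float value, int whole) {
    return static_cast<float>(whole) + std::fmod(value, 1.0F);
}
```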
I have on situation, where i need to keep the value float as well as int. so tried like below
You can't do it with a union. A union can only hold a single value inside at any time. You need two separate variables. You can keep them inside a struct if you like:
struct Val {
int a;
float b;
};
Now you can have both an int and a float.
Val p;
p.b = 45.56;
p.a = p.b;
int k = p.a; // no conversion
That said, since you apparently only use a to store a converted value of b, you should measure whether the conversions even affect performance.
You can use a struct with a constructor and initialize your variables in it as you wish.
struct Val {
int a;
float b;
Val(float value) {
a = b = value;
}
};
So you can use it in a loop and not worry about conversions each time: just create your Val variable outside the loop and use it.
Is there a best practice to define a constant? Here is a small example:
#include <vector>
struct mystruct {
std::vector<double> data;
mystruct() : data(100000000,0) {};
};
int main(){
mystruct A;
int answer = 42;
const mystruct& use_struct_option_1 = A; // quick
const mystruct use_struct_option_2 = A; // expensive
const int& use_answer_option_1 = answer; // good practice?
const int use_answer_option_2 = answer; // ubiquitous
}
Obviously, initializing use_struct_option_2 that way is expensive because the copy constructor of mystruct is called whereas the way of initializing use_struct_option_1 is quicker.
However, does the same apply to types such as integers?
From the code I've been looking at, I can tell that
const int use_answer_option_2 = answer;
is much more common than
const int& use_answer_option_1 = answer;
Which one is preferable?
These do different things. For example, in the int case:
answer = 43;
cout << use_answer_option_1 << '\n'; // 43
cout << use_answer_option_2 << '\n'; // 42
In other words, option 2 makes a copy and option 1 doesn't.
Decide whether you want to make a copy or not (i.e. whether you want to see changes to the original initializer reflected in your reference). The mystruct case is the same.
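The difference can be packed into one small function. It is a sketch with made-up names that returns what each option sees after the original variable changes.

```cpp
#include <cassert>
#include <utility>

// Returns {value seen through the const reference, value held by the const copy}
// after the original variable has been modified.
std::pair<int, int> observe_after_change() {
    int answer = 42;
    const int& by_reference = answer;  // alias: tracks later changes to answer
    const int by_value = answer;       // copy: frozen at initialization
    answer = 43;
    return {by_reference, by_value};
}
```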
I am fiddling with a code like following:
union Data {
int i;
double x;
std::string str;
~Data(){}
};
union Data var = {.x = 31293.932};
std::cout << var.x << "\n";
std::cout << var.str << "\n";
std::cout << var.i << "\n";
As far as I know, the union has 64 bits written to it after I set the x member to a floating-point number. I then want to see the corresponding string, assuming those bytes are treated as chars. But I get a segmentation fault when I try to print it as a string. Why is that? I initialized the union, so I assumed var.str must be initialized as well.
str is not constructed. If you must use str, you must either provide a constructor for it or construct it via placement new. A full example is below:
#include <iostream>
#include <new>
#include <string>
using namespace std;
union Data
{
int i;
double x;
std::string str;
Data() {}
Data(std::string st) : str(st) {}
~Data() {}
};
int main()
{
Data var;
var.x = 31293.932;
new (&var.str) std::string("Hello World!");
std::cout << var.x << "\n";
std::cout << var.str << "\n";
std::cout << var.i << "\n";
//destroy it
var.str.std::string::~string();
}
EDIT:
Just to expand my answer a bit...
MSDN seems to have a more beginner-friendly explanation of unions than cppreference. So, check: Unions - MSDN and Unions - cppreference
You should be using char to access the bytes in the union. std::string is not a POD type and can't be used in this way.
Try this instead:
union Data {
int i;
double x;
char bytes[sizeof(double)];
~Data(){}
};
union Data var = {.x = 31293.932};
std::cout << var.x << "\n";
std::cout.write(var.bytes, sizeof(var.bytes));
std::cout << "\n" << var.i << "\n";
The full definition of a POD type is extensive. In very simple terms, it is a basic data type without an explicitly defined copy constructor, destructor, or virtual methods, and, if it is an aggregate type (like a struct, class, or union), it does not itself contain any such types.
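If the goal is only to look at the bytes of the double, another option (not the union approach above) is to copy them out with std::memcpy, which is well-defined for trivially copyable types. This is a sketch; bytes_of and from_bytes are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// double can be copied byte-wise; std::string cannot, which is why
// reinterpreting a double's bytes as a std::string crashes.
static_assert(std::is_trivially_copyable<double>::value, "double is byte-copyable");
static_assert(!std::is_trivially_copyable<std::string>::value, "std::string is not");

using DoubleBytes = std::array<unsigned char, sizeof(double)>;

// Copies the object representation of value into a byte array.
DoubleBytes bytes_of(double value) {
    DoubleBytes bytes{};
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// Rebuilds the double from bytes previously produced by bytes_of.
double from_bytes(const DoubleBytes& bytes) {
    double value = 0.0;
    std::memcpy(&value, bytes.data(), sizeof(value));
    return value;
}
```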
I'm implementing an STL set with a complex template parameter type. When inserting into the set, I want the set to use the less-than operator I've defined for my type. I also want to minimize the quantity of object instantiations of my type. It seems I can't have both.
I've got two minimal examples below, each uses the same C++ class.
#include <iostream>
#include <set>
using namespace std;
class Foo {
public:
Foo(int z);
Foo(const Foo &z);
bool operator<(const Foo &rhs) const;
int a;
};
Foo::Foo(int z)
{
cout << "cons" << endl;
a = z;
}
Foo::Foo(const Foo &z)
{
cout << "copy cons" << endl;
a = z.a;
}
bool
Foo::operator<(const Foo &rhs) const
{
cout << "less than" << endl;
return a < rhs.a;
}
Here's my first main():
int
main(void)
{
set<Foo> s;
s.insert(*new Foo(1));
s.insert(*new Foo(2));
s.insert(*new Foo(1));
cout << "size: " << s.size() << endl;
return 0;
}
That's great because it uses the less-than I've defined for my class, and thus the size of the set is correctly two. But it's bad because every insertion into the set requires the instantiation of two objects (constructor, copy constructor).
$ ./a.out
cons
copy cons
cons
less than
less than
less than
copy cons
cons
less than
less than
less than
size: 2
Here's my second main():
int
main(void)
{
set<Foo *> s;
s.insert(new Foo(1));
s.insert(new Foo(2));
s.insert(new Foo(1));
cout << "size: " << s.size() << endl;
return 0;
}
That's great because an insertion requires just one object instantiation. But it's bad because it's really a set of pointers, and thus the uniqueness of set members is gone as far as my type is concerned.
$ ./a.out
cons
cons
cons
size: 3
I'm hoping there's some bit of information I'm missing. Is it possible for me to have both minimal object instantiations and appropriate sorting?
You are getting a copy from this: *new Foo(1).
Create this struct:
template<typename T>
struct PtrLess
{
bool operator()(const T *a, const T *b) const
{
return *a < *b;
}
};
Make the set look like set<Foo*, PtrLess<Foo>> s; and then add Foos like s.insert(new Foo(1));
Note the * in the template argument: the set stores pointers.
Otherwise, because set<Foo> holds its Foo elements by value inside its own nodes, the set has to copy the supplied value into its internal Foo object.
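Put together, the pointer-comparator approach might look like the sketch below. Foo is reduced to the essentials, and count_unique is a made-up helper. Because the set does not own the pointees, a rejected duplicate has to be deleted by hand, and so do the stored objects at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct Foo {
    explicit Foo(int z) : a(z) {}
    bool operator<(const Foo& rhs) const { return a < rhs.a; }
    int a;
};

// Orders pointers by the objects they point to, not by address.
template <typename T>
struct PtrLess {
    bool operator()(const T* lhs, const T* rhs) const { return *lhs < *rhs; }
};

// Inserts 1, 2, 1 and returns how many distinct values the set kept.
std::size_t count_unique() {
    std::set<Foo*, PtrLess<Foo>> s;
    const int inputs[] = {1, 2, 1};
    for (int value : inputs) {
        Foo* candidate = new Foo(value);
        if (!s.insert(candidate).second) {
            delete candidate;  // duplicate was rejected; free it here
        }
    }
    const std::size_t kept = s.size();
    for (Foo* p : s) {
        delete p;  // the set does not own the pointees
    }
    return kept;
}
```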
Standard containers store a copy of the items that are added. If you want your set to store objects rather than pointers, you should simply do the following; otherwise you're creating a memory leak, since the objects allocated via new are never freed via a corresponding delete.
int main()
{
set<Foo> s;
s.insert(Foo(1));
s.insert(Foo(2));
s.insert(Foo(1));
cout << "size: " << s.size() << endl;
return 0;
}
If you want to minimise the number of temporary objects instantiated, just use a single temporary:
int main()
{
set<Foo> s;
Foo temp(1);
s.insert(temp);
temp.a = 2;
s.insert(temp);
temp.a = 1;
s.insert(temp);
cout << "size: " << s.size() << endl;
return 0;
}
The output for this snippet (via ideone) is:
cons
copy cons
less than
less than
less than
copy cons
less than
less than
less than
size: 2
Generally, I would prefer to store the actual objects in a set<Foo> rather than pointers to objects in a set<Foo*>, since there can be no problems with object ownership (who/when new and delete need to be called), the total amount of memory allocated is smaller (for N items you need N*sizeof(Foo) rather than N*(sizeof(Foo) + sizeof(Foo*)) bytes) and data access could typically be expected to be faster (since there's no extra pointer indirection).
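One more option when storing objects directly, assuming C++11 or later: std::set::emplace constructs the element in place from the constructor arguments, so no copy constructor call is needed. The sketch below uses a hypothetical CountedFoo that counts copies to make this visible. Note that emplace may still construct (and immediately destroy) an element for a duplicate key.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for Foo that counts copy constructions.
struct CountedFoo {
    static int copies;
    explicit CountedFoo(int z) : a(z) {}
    CountedFoo(const CountedFoo& other) : a(other.a) { ++copies; }
    bool operator<(const CountedFoo& rhs) const { return a < rhs.a; }
    int a;
};
int CountedFoo::copies = 0;

// Inserts 1, 2, 1 via emplace and returns the resulting set size.
std::size_t fill_with_emplace() {
    std::set<CountedFoo> s;
    s.emplace(1);  // builds CountedFoo(1) directly inside the set's node
    s.emplace(2);
    s.emplace(1);  // duplicate: not added to the set
    return s.size();
}
```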
Hope this helps.
This is an extension to @Mranz's answer. Instead of dealing with raw pointers, put the pointers in a std::unique_ptr:
#include <iostream>
#include <memory>
#include <set>
using namespace std;
template<typename T>
struct PtrLess
{
bool operator()(const T& a, const T& b) const
{
return *a < *b;
}
};
int
main(void)
{
set<unique_ptr<Foo>, PtrLess<unique_ptr<Foo>>> s;
s.insert(unique_ptr<Foo>(new Foo(1)));
s.insert(unique_ptr<Foo>(new Foo(2)));
s.insert(unique_ptr<Foo>(new Foo(1)));
cout << "size: " << s.size() << endl;
return 0;
}