Should I replace (void*, size) with a GSL span? - c++

Suppose I have
int foo(void* p, size_t size_in_bytes);
and assume it doesn't make sense to make foo typed. I want to be a good coder and apply the C++ core guidelines. Specifically, I want to use spans instead of (*, len) pairs. Well, span<void> won't compile (can't add to a void *); and span<char> or span<uint8_t> etc. would imply foo actually expects chars, which it might not.
So should I use a span<something-with-size-1> in this case, or stick with void*?

There can be no general answer to this question.
For a function to say that it takes a span<T> means that it takes a contiguous array of values, without any form of ownership transference. If that description does not reasonably represent what is going on, then it should not take a span<T>.
For example:
What if the function checks whether the buffer intersects a region in my memory space which is, say, mapped to a file?
That doesn't sound like a span<T>. That sounds like you should have a simple aggregate with a name that makes it clear what it means:
struct memory_region
{
void* p;
size_t size_in_bytes;
};
You could even give it a member function for testing intersections. If you are making a system for dealing with such regions of memory, I might advise a more encapsulated class type with constructors and such.
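As a rough illustration of the intersection test mentioned above, here is a minimal sketch. The member name `intersects` is an assumption, and the comparison assumes a flat address space where pointers can be compared as integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct memory_region
{
    void* p;
    std::size_t size_in_bytes;

    // Hypothetical member: true when the two byte ranges share at least one byte.
    bool intersects(const memory_region& other) const
    {
        if (size_in_bytes == 0 || other.size_in_bytes == 0) {
            return false; // an empty region overlaps nothing
        }
        // Compare integer addresses instead of ordering unrelated pointers with <.
        const auto a = reinterpret_cast<std::uintptr_t>(p);
        const auto b = reinterpret_cast<std::uintptr_t>(other.p);
        // Assumed precondition: neither region wraps around the address space.
        assert(a + size_in_bytes > a && b + other.size_in_bytes > b);
        return a < b + other.size_in_bytes && b < a + size_in_bytes;
    }
};
```

A function such as foo could then take a memory_region by value, which names the intent without claiming that the bytes are an array of any particular element type.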
What type the function takes should explain what the data means. Preferably this meaning would be in a general sense, but at the very least, it should say what it means for the function in question.
One more thing:
or span<uint8_t> etc. would imply foo actually expects chars
No, it would not. While uint8_t will almost certainly have the same size as an unsigned char, that does not mean that one would expect to be able to pass an array of characters to any function which takes span<uint8_t>. If that function wanted to advertise that it accepted characters, it would have used unsigned char.
I meant to say span<whatever> would imply the function expects whatever's.
Yes, the requirement for spans is that it is passed an actual array of Ts of the given size.

The proposal before the C++ standardization committee is that if you want to pass around a pointer to a sequence of bytes (which is often what people want to do when passing void* around), then you would pass a span<std::byte>, which relies upon a new std::byte type. However, that requires a small change to the language standard to be legal.
In today's C++, you could pass span<unsigned char> (typedef'd as you find most descriptive) and get the same effect: access to a sequence of bytes.

What I chose to do, and what I think is, shall we say, sound design-wise, is implement a class named memory_region, which has all of the type-inspecific functionality of gsl::span (hence, for example, it doesn't have a begin() or an end()). It is not the same thing as a span of bytes, IMO - and I can structurally never get them mixed up.
Here's my implementation (it's part of a repository of DBMS-related GPU kernels and a testing framework I'm working on, hence the CUDA-related snippet; it depends on a GSL implementation, in my case gsl-lite, though Microsoft's should be OK too, I think).

Related

How to implement/use: class for a dynamic array of fixed size (known only at run time)

I'm introducing myself to C++, and sadly it's starting to seem like the support for dynamically created arrays of fixed size (but with the size known only at run time) is very poor in C++, as new[] can't call an arbitrary user-specified constructor with user-set arguments.
Consider class A which has a number of constructors, each with some parameters. Assume that a constructor without parameters would be useless (I don't want to have to write one if I essentially don't need it). I guess the following doesn't matter, but, just in case: assume that A contains only a possibly large std::vector<Internal> (Internal is a private class, T and S parameterize A) and an integer counter as far as data members go. Also, A is parameterized.
Assume we want n instances of A stored contiguously in memory as an array, where n is determined at run time and constant afterwards. We want to be able to create and initialize the structure with a single call that passes arguments to a constructor of A, or something similar. So each instance in the array gets the same, but programmatic, initialization. EDIT: sorry, I didn't mean to say I want O(1) initialization, as that's impossible; I just wanted O(n) initialization, but in such a way that I can create the array in one statement, i.e., so that I don't have to write an initialization loop for every array I create.
A possible, but suboptimal, solution is std::vector<A<T,S>>, but assume we can't live with the inefficiency. (Remember that std::vector supports resizing.)
How to implement and/or use an efficient solution with a nice API?
I would prefer a solution that doesn't reimplement half of the standard library, i.e. consider C++20 features and the standard library available for the implementation. Also, don't make me violate the C++ aliasing rules.
A possibly related question is why is such a "fixed_size_vector" class missing from the standard library?
(BTW: not that it matters, but please don't say "just use vector", because in this case I'm indeed going to go with the mentioned suboptimal solution, as the performance is not significant for my toy program, but in the real world the performance will matter one day and I want to be prepared. EDIT: I did not mean I want to optimize my toy program, rather I was referring to the fact that one day I will have to optimize some other program.)
EDIT: answering to some commenters: wrapping std::vector could provide the right abstraction, but it would be unnecessarily inefficient. A comment linked a question whose top answer explains this nicely:
dynarray is smaller and simpler than vector, because it doesn't need
to manage separate size and capacity values, and it doesn't need to
store an allocator
(dynarray here was a proposed addition to stdlib that seems to be what I wanted, except that it was also supposed to rely on special compiler support for some of its semantics). Of course, this difference compared to std::vector won't matter most of the time, but it would still be good if I was able to simply use the right tool for the job.
There is a proposal to add a fixed capacity vector to the standard.
Note that this proposal proposes the capacity be known at compile-time, so it's not applicable in your case.
There are also some open source libraries that implement one, e.g., Boost's static_vector, or . If you really want a fixed-capacity vector, you can use one of the open source implementations that exist out there.
If you really know what you're doing, you could write one on your own, but that's not the case for >99% of C++ users.
However, it should be noted that reserve()ing space on a vector will probably have the effect you want, and there's probably no need for an actual fixed capacity vector.
Since you mention that the size is only known at runtime this is exactly what std::vector is meant to be used for.
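#include <cstddef>
#include <string>
#include <vector>

template <typename T, typename... Args>
auto make_vector(std::size_t size, const Args&... args) -> std::vector<T>
{
    auto result = std::vector<T>{};
    result.reserve(size); // whatever the known size is
    // Use std::size_t for the counter so it is compared against size without a sign mismatch.
    for (std::size_t i = 0; i < size; ++i) {
        result.emplace_back(args...); // construct each element in place from the same arguments
    }
    return result;
}
// Use like:
auto vec = make_vector<std::string>(20, "hello world");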
This will pre-allocate enough room for size entries of type T, and the loop will call T's constructor with whatever arguments you pass it.
Be aware that:
No additional constructors are called.
No extra memory is used.
No copies or relocations are performed.
The returned vector is not copied (or even moved) with C++17 or above thanks to guaranteed copy elision.
Doing this is as optimal as you can get whether you use a specialized container or otherwise. This is why every experienced C++ developer will tell you the same thing: std::vector is the solution.[2]
Note: The above function uses const Args&... for propagation and not proper forwarding references, since rvalue references could result in use-after-move bugs.[1]
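To make the use-after-move concern concrete, here is a deliberately flawed sketch that forwards its arguments on every iteration. The names `make_vector_forwarding` and `forwarding_demo` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Flawed on purpose: rvalue arguments are moved from during the first iteration.
template <typename T, typename... Args>
std::vector<T> make_vector_forwarding(std::size_t size, Args&&... args)
{
    std::vector<T> result;
    result.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
        result.emplace_back(std::forward<Args>(args)...);
    }
    return result;
}

inline void forwarding_demo()
{
    auto v = make_vector_forwarding<std::unique_ptr<int>>(3, std::make_unique<int>(7));
    assert(v[0] && *v[0] == 7); // only the first element owns the int
    assert(!v[1] && !v[2]);     // the rest were built from a moved-from (null) unique_ptr
}
```

Taking const Args&... instead turns this silent bug into a compile error for move-only arguments, which is arguably the safer failure mode.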
A specialized container like a fixed_size_vector that you mention will either be one of two things:
Fixed at compile-time on the max size, in which case it wouldn't work for you since you mentioned the size is only known at runtime
Fixed at runtime on the max size, in which case it will do exactly what I suggested above, since it will reserve the storage space up-front.
It is not possible at the language level to dynamically construct N objects only known at runtime using a custom constructor. Full stop. This could be done if the sequence is known at compile-time, but not runtime.
C++ is statically compiled, so we cannot variadically expand a runtime n value into a pack of T{...} constructor calls; it's simply not possible. This means there will be a loop every time. Thus the most optimal thing you can do is allocate n objects once, and call T's constructor n times.
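For illustration only, the "allocate once, construct n times" idea can be sketched as a small non-resizable container. `fixed_array` is a hypothetical name, not a standard or proposed type, and a std::vector with reserve() performs essentially the same work.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

template <typename T>
class fixed_array {
public:
    // One allocation, then n constructor calls with the same arguments.
    template <typename... Args>
    explicit fixed_array(std::size_t n, const Args&... args)
        : size_(n), data_(std::allocator<T>{}.allocate(n))
    {
        std::size_t built = 0;
        try {
            for (; built < n; ++built) {
                ::new (static_cast<void*>(data_ + built)) T(args...);
            }
        } catch (...) {
            destroy(built); // undo the elements that were already constructed
            std::allocator<T>{}.deallocate(data_, size_);
            throw;
        }
    }

    fixed_array(const fixed_array&) = delete;
    fixed_array& operator=(const fixed_array&) = delete;

    ~fixed_array()
    {
        destroy(size_);
        std::allocator<T>{}.deallocate(data_, size_);
    }

    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    const T& operator[](std::size_t i) const { assert(i < size_); return data_[i]; }

private:
    // Destroys the first `count` elements in reverse order of construction.
    void destroy(std::size_t count) { while (count > 0) data_[--count].~T(); }

    std::size_t size_;
    T* data_;
};
```

Usage would look like fixed_array<std::string> names(20, "hello world");. The loop is still there, it is just hidden inside the constructor.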
[1] A short-hand syntax for passing a list of arguments to all of a sequence's constructors is not a good general solution in C++. In fact, it would be suboptimal. This would either force copies via const lvalue references, or it would allow for rvalues, in which case only the first object constructed will get a valid value, and everything after will receive a moved-from object! Just imagine passing a unique_ptr to a sequence of T's. Only the first instance will get a valid pointer, and everything else will receive nullptr.
[2] Honestly, about the only real optimization you might be able to make on this solution would be to use a custom allocator, such as a std::pmr::vector with a stack-allocated memory buffer resource.
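A minimal sketch of the custom-allocator idea from footnote [2], assuming C++17 and <memory_resource>. The function name and the 4096-byte buffer size are arbitrary assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Sums 0..n-1 using a vector whose storage comes from a stack buffer.
inline long sum_with_stack_buffer(int n)
{
    assert(n >= 0);
    std::array<std::byte, 4096> buffer; // assumed size, chosen arbitrarily
    // If the buffer runs out, the resource falls back to its upstream (heap) resource.
    std::pmr::monotonic_buffer_resource resource(buffer.data(), buffer.size());

    std::pmr::vector<int> values(&resource);
    values.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        values.emplace_back(i);
    }

    long total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```

Whether this is actually faster than the default allocator is something to measure with a profiler rather than assume.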
Footnote
I strongly advise you to get over the "efficiency first" mentality. Most developers' intuition on what is and is not efficient is wrong; this is why profilers are so important. Things like speculative execution, cache locality, and pipelining play a huge role in performance -- and these things are far more complex than simply constructing a dynamic array of objects.
Real software is written for other developers, not for the machine. It's better to have code that is maintainable and scalable, and optimized in places where bottlenecks have been identified through proper tooling.

Why is a function not an object?

I read in the standards n4296 (Draft) § 1.8 page 7:
An object is a region of storage. [ Note: A function is not an object,
regardless of whether or not it occupies storage in the way that
objects do. —end note ]
I spent some days on the net looking for a good reason for such exclusion, with no luck. Maybe because I do not fully understand objects. So:
Why is a function not an object? How does it differ?
And does this have any relation with the functors (function objects)?
A lot of the difference comes down to pointers and addressing. In C++¹ pointers to functions and pointers to objects are strictly separate kinds of things.
C++ requires that you can convert a pointer to any object type into a pointer to void, then convert it back to the original type, and the result will be equal to the pointer you started with². In other words, regardless of exactly how they do it, the implementation has to ensure that a conversion from pointer-to-object-type to pointer-to-void is lossless, so no matter what the original was, whatever information it contained can be recreated so you can get back the same pointer as you started with by conversion from T* to void * and back to T*.
That's not true with a pointer to a function though--if you take a pointer to a function, convert it to void *, and then convert it back to a pointer to a function, you may lose some information in the process. You might not get back the original pointer, and dereferencing what you do get back gives you undefined behavior (in short, don't do that).
For what it's worth, you can, however, convert a pointer to one function to a pointer to a different type of function, then convert that result back to the original type, and you're guaranteed that the result is the same as you started with.
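The function-pointer round trip described above can be sketched as follows. The aliases `generic_fn` and `binary_fn` and the helper names are assumptions made for this example.

```cpp
#include <cassert>

inline int add(int a, int b) { return a + b; }

// An arbitrary function pointer type used only for storage.
using generic_fn = void (*)();
using binary_fn = int (*)(int, int);

inline generic_fn erase_type(binary_fn f)
{
    // Converting between function pointer types requires reinterpret_cast.
    return reinterpret_cast<generic_fn>(f);
}

inline binary_fn restore_type(generic_fn f)
{
    // Converting back to the original type yields the original pointer value.
    return reinterpret_cast<binary_fn>(f);
}

inline int round_trip_call(int a, int b)
{
    generic_fn stored = erase_type(&add);
    binary_fn restored = restore_type(stored);
    assert(restored == &add); // same pointer we started with
    return restored(a, b);    // fine: called through the original type
}
```

Calling through `stored` directly, without converting back first, would be undefined behavior.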
Although it's not particularly relevant to the discussion at hand, there are a few other differences that may be worth noting. For example, you can copy most objects--but you can't copy any functions.
As far as relationship to function objects goes: well, there really isn't much of one beyond one point: a function object supports syntax that looks like a function call--but it's still an object, not a function. So, a pointer to a function object is still a pointer to an object. If, for example, you convert one to void *, then convert it back to the original type, you're still guaranteed that you get back the original pointer value (which wouldn't be true with a pointer to a function).
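A small sketch of that last point: a pointer to a function object is an ordinary object pointer, so it survives a trip through void*. `Multiplier` and the helper names are hypothetical.

```cpp
#include <cassert>

// A function object: callable with function-call syntax, but still an object.
struct Multiplier {
    int factor;
    int operator()(int x) const { return x * factor; }
};

inline int call_through_void(void* erased, int x)
{
    // The caller must know the original type; here it is assumed to be Multiplier.
    auto* m = static_cast<Multiplier*>(erased);
    return (*m)(x);
}

inline int functor_round_trip(int factor, int x)
{
    Multiplier m{factor};
    void* erased = &m; // implicit object-pointer-to-void* conversion
    assert(static_cast<Multiplier*>(erased) == &m); // lossless round trip
    return call_through_void(erased, x);
}
```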
As to why pointers to functions are (at least potentially) different from pointers to objects: part of it comes down to existing systems. For example, on MS-DOS (among others) there were four entirely separate memory models: small, medium, compact, and large. Small model used 16-bit addressing for both functions and data. Medium used 16-bit addresses for data and 20-bit addresses for code. Compact reversed that (16-bit addresses for code, 20-bit addresses for data). Large used 20-bit addresses for both code and data. So, in either compact or medium model, converting between pointers to data and pointers to functions really could and did lead to problems.
More recently, a fair number of DSPs have used entirely separate memory buses for code and for data, and (as with the MS-DOS memory models) these were often different widths, so converting between the two could and did lose information.
¹ These particular rules came to C++ from C, so the same is true in C, for whatever that's worth.
² Although it's not directly required, with the way things work, pretty much the same works out to be true for a conversion from the original type to a pointer to char and back, for whatever that's worth.
Why a function is not an object? How does it differ?
To understand this, let's move from bottom to top in terms of abstractions involved. So, you have your address space through which you can define the state of the memory and we have to remember that fundamentally it's all about this state you operate on.
Okay, let's move a bit higher in terms of abstractions. I am not talking about any abstractions imposed by a programming language yet (like object, array, etc.); simply as a layman, I want to keep a record of a portion of the memory, let's call it Ab1, and another one called Ab2.
Both have a state fundamentally but I intend to manipulate/make use of the state differently.
Differently...Why and How?
Why ?
Because of my requirements (to perform addition of 2 numbers and store the result back, for example). I will be using Ab1 as a long-usage state and Ab2 as a relatively shorter-usage state. So, I will create a state for Ab1 (with the 2 numbers to add), then use this state to populate some of the state of Ab2 (copy them temporarily), perform further manipulation of Ab2 (add them), and save a portion of the resultant Ab2 to Ab1 (the added result). After that, Ab2 becomes useless and we reset its state.
How?
I am going to need some management of both the portions to keep track of what words to pick from Ab1 and copy to Ab2 and so on. At this point I realize that I can make it work to perform some simple operations but something serious shall require a laid out specification for managing this memory.
So I look for such a management specification, and it turns out there is a variety of them (some with a built-in memory model, others providing the flexibility to manage the memory yourself), with a better design. In fact, without even dictating how to manage the memory directly, they have successfully defined the encapsulation for this long-lived storage and the rules for how and when it can be created and destroyed.
The same goes for Ab2, but the way they present it makes me feel like it is much different from Ab1. And indeed, it turns out to be. They use a stack for state manipulation of Ab2 and reserve memory from the heap for Ab1. Ab2 dies after a while (once it has finished executing).
Also, what to do with Ab2 is defined through yet another storage portion called Ab2_Code, and the specification for Ab1 similarly involves Ab1_Code.
I would say, this is fantastic! I get so much convenience that allows me to solve so many problems.
Now, I am still looking at this from a layman's perspective, so having gone through the thought process of it all I don't really feel surprised; but if you question things top-down, they can be a bit difficult to put into perspective. (I suspect that's what happened in your case.)
BTW, I forgot to mention that Ab1 is called an object officially and Ab2 a function stack while Ab1_Code is the class definition and Ab2_Code is the function definition code.
And it is because of these differences imposed by the programming language that you find they are so different (your question).
Note: Don't take my representation of Ab1/Object as a long-lived storage abstraction as a rule or a concrete thing; it was from a layman's perspective. The programming language provides much more flexibility in terms of managing the lifecycle of an object. So, an object may be deployed like Ab1, but it can be much more.
And does this have any relation with the functors (function objects)?
Note that the first part of the answer is valid for many programming languages in general (including C++); this part has to do specifically with C++ (whose spec you quoted). So you have a pointer to a function, and you can have a pointer to an object too. It's just another programming construct that C++ defines. Notice that this is about having a pointer to Ab1 or Ab2 in order to manipulate them, rather than having another distinct abstraction to act upon.
You can read about its definition, usage here:
C++ Functors - and their uses
Let me answer the question in simpler language (terms).
What does a function contain?
It basically contains instructions to do something. While executing the instructions, the function can temporarily store and / or use some data - and might return some data.
Although the instructions are stored somewhere - those instructions themselves are not considered as objects.
Then, what are the objects?
Generally, objects are entities which contain data - which get manipulated / changed / updated by functions (the instructions).
Why the difference?
Because computers are designed in such a way that the instructions do not depend on the data.
To understand this, let's think about a calculator. We do different mathematical operations using a calculator. Say, if we want to add some numbers, we provide the numbers to the calculator. No matter what the numbers are, the calculator will add them in the same way following the same instructions (if the result exceeds the calculator's capacity to store, it will show an error - but that is because of calculator's limitation to store the result (the data), not because of its instructions for addition).
Computers are designed in a similar manner. That is why, when you use a library function (for example qsort()) on some data that is compatible with the function, you get the result you expect, and the functionality of the function doesn't change if the data changes, because the instructions of the function remain unchanged.
Relation between function and functors
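Functions are sets of instructions, and while they are being executed, some temporary data may need to be stored. In other words, some objects might be temporarily created while executing the function. Functors (function objects) are something different: a functor is an object of a class type that defines operator(), so it can be called with function-call syntax while still being an object that holds data.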

C++ and box2d: userdata cast int to void*

I'm pretty new to Box2D and I'm trying to use the userdata field (of type void*) in the b2Body object to store an int value (an enum value, so I know the type of the object).
Right now I'm doing something like this:
int number = 1023;
void* data = (void*)(&number);
int rNumber = *(int*)data;
and I get the value back correctly, but from what I've been reading, casting to void* is not portable or recommended... Is my code cross-platform? Is its behavior defined or implementation-dependent?
Thanks!
Casting to void * is portable. It is not recommended because you are losing type safety. Anything can be put into a void * and anything can be gotten out. It makes it easier to shoot yourself in the foot. Otherwise void * is fine as long as you are careful and extra cautious.
You are actually not casting int to void*, you cast int* to void*, which is totally different.
A pointer to any type can be stored in a void*, and be cast back again to the same type. That is guaranteed to work.
Casting to any other type is not portable, as the language standard doesn't say that different pointers have to be the same size, or work the same way. Just that void* has to be wide enough to contain them.
One of the problems with void* is that you need to know (keep track of) what type it originally was in order to cast it properly. If it originally was a float and you cast it to an int, the compiler would just take your word for it.
To avoid this you could create a wrapper for your data which contains the type, that way you will be able to always cast it to the right type.
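Edit: you should also make a habit of using C++-style casts instead of C-style ones, e.g. static_cast for converting a void* back to the original pointer type, and reinterpret_cast only where static_cast does not apply.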
void * is somewhat of a relic of the past (ISO C), but very convenient. You can use it safely as long as you are careful when casting back and forth to the type you want. Consider alternatives such as the C++ class system or overloading a function; either way you will have better cast operators. Sometimes there is no other way around (void*); other times it is just too convenient.
It can lead to non-portable code, not because of the casting system, but because some people are tempted to do non-portable operations with it. The biggest problem lies in the fact that a void* is as big as a memory address, which on many platforms also happens to be the size of the platform's integers.
However, this does not hold everywhere: on typical 64-bit platforms, sizeof(void*) != sizeof(int).
If you try to do some kind of operations/magic with them without casting back to the type you want, you might have problems. You might be surprised at how many times I have seen people wanting to store an integer in a void* pointer.
To answer your question, yes, it's safe to do.
To answer the question you didn't ask: that void pointer isn't meant for keeping an int in. Box2D has that pointer so you can point back to an Entity object in your game engine, which lets you associate an Entity in your game with a b2Body in the physics simulation. It makes it easier to program how your entities interact with one another when one b2Body interacts with another.
So you shouldn't just be putting an enum in that void*. You should be pointing it directly to the game object represented by that b2Body, which could have an enum in it.
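A rough sketch of that arrangement follows. `FakeBody` is a hypothetical stand-in with a void* slot and is not the Box2D API; `Entity`, `EntityType`, `attach`, and `entity_of` are likewise assumed names.

```cpp
#include <cassert>

enum class EntityType { Player, Enemy, Wall };

// The game object; the enum lives here rather than in the void* itself.
struct Entity {
    EntityType type;
    int hit_points;
};

// Hypothetical placeholder for a physics body exposing a void* user-data slot.
struct FakeBody {
    void* user_data = nullptr;
};

inline void attach(FakeBody& body, Entity& entity)
{
    body.user_data = &entity; // Entity* converts to void* implicitly
}

inline Entity* entity_of(const FakeBody& body)
{
    assert(body.user_data != nullptr);
    // Valid only because attach() always stores an Entity*.
    return static_cast<Entity*>(body.user_data);
}
```

The entity has to outlive the body that points to it, otherwise the stored pointer dangles.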

Dynamic type dereferencing?

In attempting to answer another question, I was struck by a bout of curiosity and wanted to find out if an idea was possible.
Is it possible to dynamically dereference either a void * pointer (we assume it points to a valid referenced dynamically allocated copy) or some other type during run time to return the correct type?
Is there some way to store a supplied type (as in, the class knows the void * points to an int), if so how?
Can said stored type (if possible) be used to dynamically dereference?
Can a type be passed on its own as an argument to a function?
Generally the concept (no code available) is a doubly-linked list of void * pointers (or similar) that can dynamically allocate space, and that also keep with them a record of what type they hold for later dereferencing.
1) Dynamic references:
No. Instead of having your variables hold just pointers, have them hold a struct containing both the actual pointer and a tag defining what type the pointer is pointing to
struct Ref{
int tag;
void *ref;
};
and then, when "dereferencing", first check the tag to find out what you want to do.
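As a sketch of what checking the tag might look like, assuming only two tag values (`TAG_INT` and `TAG_STRING` are made-up names, as is `describe`):

```cpp
#include <cassert>
#include <string>

enum Tag { TAG_INT, TAG_STRING }; // assumed tag values

struct Ref {
    int tag;
    void* ref;
};

// "Dereferences" by first inspecting the tag, then casting to the matching type.
inline std::string describe(const Ref& r)
{
    switch (r.tag) {
    case TAG_INT:
        assert(r.ref != nullptr);
        return "int " + std::to_string(*static_cast<const int*>(r.ref));
    case TAG_STRING:
        assert(r.ref != nullptr);
        return "string " + *static_cast<const std::string*>(r.ref);
    default:
        return "unknown";
    }
}
```

Every new type needs a new tag and a new case, which is the maintenance cost of this approach.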
2) Storing types in your variables, passing them to functions.
This doesn't really make sense, as types aren't values that can be stored around. Perhaps what you just want is to pass around a class / constructor function and that is certainly feasible.
In the end, C and C++ are bare-bones languages. While a variable assignment in a dynamic language looks a lot like a variable assignment in C (they are just a = after all) in reality the dynamic language is doing a lot of extra stuff behind the scenes (something it is allowed to do, since a new language is free to define its semantics)
Sorry, this is not really possible in C++ due to lack of type reflection and lack of dynamic binding. Dynamic dereferencing is especially impossible due to these.
You could try to emulate its behavior by storing types as enums or std::type_info* pointers, but these are far from practical. They require registration of types, and huge switch..case or if..else statements every time you want to do something with them. A common container class and several wrapper classes might help achieving them (I'm sure this is some design pattern, any idea of its name?)
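One way the std::type_info idea could look, as a minimal sketch: a wrapper (hypothetically named `TypedPtr`) that remembers the static type it was built from and only hands the pointer back when asked for that same type.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>

class TypedPtr {
public:
    template <typename T>
    explicit TypedPtr(T* p) : type_(typeid(T)), ptr_(p)
    {
        assert(p != nullptr);
    }

    // Returns the pointer if T matches the stored type exactly, otherwise nullptr.
    template <typename T>
    T* get() const
    {
        return type_ == std::type_index(typeid(T)) ? static_cast<T*>(ptr_) : nullptr;
    }

private:
    std::type_index type_; // comparable wrapper around std::type_info
    void* ptr_;
};
```

The caller still has to name T at compile time, so this checks a guess rather than discovering the type, which matches the limitation described above.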
You could also use inheritance to solve your problem if it fits.
Or perhaps you need to reconsider your current design. What exactly do you need this for?

Deleting an element from a vector of pointers in C++

I remember hearing that the following code is not C++ compliant and was hoping someone with much more C++ legalese than me would be able to confirm or deny it.
std::vector<int*> intList;
intList.push_back(new int(2));
intList.push_back(new int(10));
intList.push_back(new int(17));
for(std::vector<int*>::iterator i = intList.begin(); i != intList.end(); ++i) {
delete *i;
}
intList.clear();
The rationale was that it is illegal for a vector to contain pointers to invalid memory. Now obviously my example will compile and it will even work on all compilers I know of, but is it standard compliant C++ or am I supposed to do the following, which I was told is in fact the standard compliant approach:
while(!intList.empty()) {
int* element = intList.back();
intList.pop_back();
delete element;
}
Your code is valid, but the better solution would be to use smart pointers.
The thing is that all the requirements on std::vector are located in section 23.2.4 of the C++ Standard. There are no restrictions regarding invalid pointers. std::vector works with int* as with any other type (we aren't considering the case of vector<bool>); it doesn't care where they point to.
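A minimal sketch of the smart-pointer version, assuming C++14 for std::make_unique. The helper names are made up for this example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

inline std::vector<std::unique_ptr<int>> make_int_list()
{
    std::vector<std::unique_ptr<int>> intList;
    intList.push_back(std::make_unique<int>(2));
    intList.push_back(std::make_unique<int>(10));
    intList.push_back(std::make_unique<int>(17));
    return intList;
}

inline int sum(const std::vector<std::unique_ptr<int>>& list)
{
    int total = 0;
    for (const auto& p : list) {
        assert(p); // every element owns an int in this example
        total += *p;
    }
    return total;
}
```

Here intList.clear() (or the vector simply going out of scope) deletes every int, so no manual delete loop is needed and the vector never holds a dangling pointer.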
Your code is fine. If you're worried for some reason about the elements being invalid momentarily, then change the body of the loop to
int* tmp = 0;
std::swap(tmp, *i); // the element is now null, so the vector never holds a dangling pointer
delete tmp;
The C++ philosophy is to allow the programmer as much latitude as possible, and to only ban things that are actually going to cause harm. Invalid pointers do no harm in themselves, and therefore you can have them around freely. What will cause harm is using the pointer in any way, and that therefore invokes undefined behavior.
Ultimately, this is a question of personal taste more than anything. It's not "standards non-compliant" to have a vector that contains invalid pointers, but it is dangerous, just like it's dangerous to have any pointer that points to invalid memory. Your latter example will ensure that your vector never contains a bad pointer, yes, so it's the safest choice.
But if you knew that the vector would never be used during your former example's loop (if the vector is locally scoped, for example), it's perfectly fine.
Where did you hear that? Consider this:
std::vector<int *> intList(5);
I just created a vector holding 5 pointers (value-initialized to null), none of which points to valid memory.
When storing raw pointers in a container (which I wouldn't recommend) and then having to do a two-phase delete, I would choose your first option over the second.
I believe container::clear() will remove the contents of the vector more efficiently than popping a single item at a time.
You could probably turn the for loop into a nice (pseudo) forall(begin(), end(), delete) and make it more generic, so that it wouldn't even matter if you changed from vector to some other container.
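The pseudo forall could be sketched with std::for_each and a lambda. `delete_all` is an assumed helper name, and the container is assumed to hold owning raw pointers obtained from new.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Deletes every pointee, then empties the container.
// Works for any container with begin(), end(), clear(), and a pointer value_type.
template <typename Container>
void delete_all(Container& c)
{
    std::for_each(c.begin(), c.end(),
                  [](typename Container::value_type p) { delete p; });
    c.clear();
    assert(c.empty());
}
```

Because it only relies on begin(), end(), and clear(), the same call would work if the std::vector<int*> were later swapped for a std::list<int*> or std::deque<int*>.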
I don't believe this is an issue of standards compliance. The C++ standards define the syntax of the language and implementation requirements. You are using the STL which is a powerful library, but like all libraries it is not part of C++ itself...although I guess it could be argued that when used aggressively, libraries like STL and Qt extend the language into a different superset language.
Invalid pointers are perfectly compliant with the C++ standards, the computer just won't like it when you dereference them.
What you are asking is more of a best practices question. If your code is multi-threaded and intList is potentially shared, then your first approach may be more dangerous, but as Greg suggested if you know that intList can't be accessed then the first approach may be more efficient. That said, I believe safety should usually win in a trade-off until you know there is a performance problem.
As suggested by the Design by Contract concept, all code defines a contract, whether implicit or explicit. The real issue with code like this is what you are promising the user: preconditions, postconditions, invariants, etc. The libraries make a certain contract, and each function you write defines its own contract. You just need to pick the appropriate balance for your code, and as long as you make it clear to the user (or yourself six months from now) what is safe and what isn't, it will be okay.
If there are best practices documented with an API, then use them whenever possible. They are probably best practices for a reason. But remember, a best practice may be in the eye of the beholder; that is, it may not be a best practice in all situations.
it is illegal for a vector to contain
pointers to invalid memory
This is what the Standard has to say about the contents of a container:
(23.3) : The type of objects stored in these components must meet the requirements of CopyConstructible types (20.1.3), and the additional requirements of Assignable types.
(20.1.3.1, CopyConstructible) : In the following Table 30, T is a type to be supplied by a C++ program instantiating a template, t is a value of type T, and u is a value of type const T.
expression   return type   requirement
----------   -----------   -----------------------------
T(t)                       t is equivalent to T(t)
T(u)                       u is equivalent to T(u)
t.~T()
&t           T*            denotes the address of t
&u           const T*      denotes the address of u
(23.1.4, Assignable) : In Table 64, T is the type used to instantiate the container, t is a value of T, and u is a value of (possibly const) T.
expression   return type   requirement
----------   -----------   -----------------------------
t = u        T&            t is equivalent to u
That's all that it says about the contents of an STL collection. It says nothing about pointers, and it is particularly silent about the pointers pointing to valid memory.
Therefore, deleting pointers in a vector, while most likely a very bad architectural decision and an invitation to pain and suffering with the debugger at 3:00 AM on a Saturday night, is perfectly legal.
EDIT:
Regarding Kranar's comment that "assigning a pointer to an invalid pointer value results in undefined behavior." No, this is incorrect. This code is perfectly valid:
Foo* foo = new Foo();
delete foo;
Foo* foo_2 = foo; // This is legal
What is illegal is trying to do something with that pointer (or foo, for that matter):
delete foo_2; // UB
foo_2->do_something(); // UB
Foo& foo_ref = *foo_2; // UB
Simply creating a wild pointer is legal according to the Standard. Probably not a good idea, but legal nonetheless.
EDIT2:
More from the Standard regarding pointer types.
So sayeth the Standard (3.9.2.3) :
... A valid value of an object pointer
type represents either the address of
a byte in memory (1.7) or a null
pointer (4.10)...
...and regarding "a byte in memory," (1.7.1) :
The fundamental storage unit in the C++
memory model is the byte. A byte is at least large enough to contain
any member of the basic execution
character set and is composed of a
contiguous sequence of bits, the
number of which is
implementation-defined. The least
significant bit is called the
low-order bit; the most significant
bit is called the high-order bit. The
memory available to a C++ program
consists of one or more sequences of
contiguous bytes. Every byte has a
unique address.
There is nothing here about that byte being part of a living Foo, about you having access to it, or anything of the sort. It's just a byte in memory.