I'm trying to reimplement memchr as constexpr (1). I didn't expect any issues, as I have already done the same thing successfully with strchr, which is very similar.
However, both clang and gcc refuse to cast const void* to anything else within a constexpr function, which prevents me from accessing the actual values.
I understand that working with void* in a constexpr function is weird, as we cannot call malloc and there is no way to specify arbitrary data as literal values. I'm doing this basically as part of an exercise to rewrite as much as I can as constexpr (2).
Still I'd like to know why this is not allowed and if there is any way around it.
Thank you!
(1) My implementation of memchr:
constexpr void const *memchr(const void *ptr, int ch, size_t count) {
const auto block_address = static_cast<const uint8_t *>(ptr);
const auto needle = static_cast<uint8_t>(ch);
for (uintptr_t pos{0}; pos < count; ++pos) {
auto byte_address = block_address + pos;
const uint8_t value = *byte_address;
if (needle == value) {
return static_cast<void const *>(byte_address);
}
}
return nullptr;
}
(2) The entire project on Github: https://github.com/jlanik/constexprstring
No, it is impossible to use void* in such a way in constant expressions. Casts from void* to other object pointer types are forbidden in constant expressions. reinterpret_cast is forbidden as well.
This is probably intentional to make it impossible to access the object representation at compile-time.
You cannot have a memchr with its usual signature at compile-time.
The best that I think you can do is write the function for pointers to char and its cv-qualified versions, as well as std::byte (either as overloads or as template), instead of void*.
For pointers to objects of other types it is going to be tricky in some cases and impossible in most cases to implement the exact semantics of memchr.
While I am not certain that it is possible, maybe, in a templated version of memchr, one can read the underlying bytes of the objects passed-by-pointer via a std::bit_cast into a struct containing a std::byte/unsigned char array of appropriate size.
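The typed variant suggested above (char-like pointers instead of void*) could be sketched roughly as follows. The name typed_memchr and the sizeof check are assumptions made for illustration, and the sketch assumes C++14 or later so that loops are allowed in constexpr functions.

```cpp
#include <cstddef>

// Hypothetical name: a typed stand-in for memchr that never touches void*.
// Intended for char, signed char, unsigned char and std::byte.
template <typename Byte>
constexpr const Byte* typed_memchr(const Byte* ptr, int ch, std::size_t count) {
    static_assert(sizeof(Byte) == 1, "intended for char-like types only");
    // memchr compares values as unsigned char, so this sketch does the same.
    const auto needle = static_cast<unsigned char>(ch);
    for (std::size_t pos = 0; pos < count; ++pos) {
        if (static_cast<unsigned char>(ptr[pos]) == needle) {
            return ptr + pos;
        }
    }
    return nullptr;
}

// Compile-time usage on a constexpr array.
constexpr char sample_text[] = "hello";
static_assert(typed_memchr(sample_text, 'l', 5) == sample_text + 2,
              "the first 'l' is at index 2");
static_assert(typed_memchr(sample_text, 'z', 5) == nullptr, "'z' is absent");
```

Because the pointer keeps its element type, no cast from void* is needed and both compilers can evaluate the search at compile time.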
I was looking through some of the standard library's implementation for the usual containers (vector, unordered_map, etc...) when I came across the following, in the xutility header:
template <class _CtgIt, class _OutCtgIt>
_OutCtgIt _Copy_memmove(_CtgIt _First, _CtgIt _Last, _OutCtgIt _Dest) {
auto _FirstPtr = _To_address(_First);
auto _LastPtr = _To_address(_Last);
auto _DestPtr = _To_address(_Dest);
const char* const _First_ch = const_cast<const char*>(reinterpret_cast<const volatile char*>(_FirstPtr));
const char* const _Last_ch = const_cast<const char*>(reinterpret_cast<const volatile char*>(_LastPtr));
char* const _Dest_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DestPtr));
const auto _Count = static_cast<size_t>(_Last_ch - _First_ch);
_CSTD memmove(_Dest_ch, _First_ch, _Count);
if constexpr (is_pointer_v<_OutCtgIt>) {
return reinterpret_cast<_OutCtgIt>(_Dest_ch + _Count);
} else {
return _Dest + (_LastPtr - _FirstPtr);
}
}
Does anybody know why _First_ch and _Last_ch are first cast to const volatile char* type then immediately cast to const char*? I'm assuming it's to stop the compiler from optimizing prematurely, for some specific cases, but no concrete examples come to mind.
If the target type of the pointer is volatile-qualified, it is not possible to use reinterpret_cast to directly cast to const char*.
reinterpret_cast is not allowed to cast away const or volatile. const_cast however can do this, while not being able to change the pointer's target type itself.
I think a C-style cast would also always work in this situation, but reasoning about it is a bit more difficult, since it attempts multiple C++-style conversion sequences, only the last of which is a reinterpret_cast followed by a const_cast.
It may be just a style choice to not use C-style casts here.
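To illustrate the two-step cast, here is a small sketch. The helper name as_const_char_ptr is hypothetical and is not part of the standard library source quoted above.

```cpp
#include <cassert>

// Hypothetical helper mirroring the two-step cast from the quoted snippet.
template <typename T>
const char* as_const_char_ptr(T* p) {
    // Step 1: reinterpret_cast changes the pointee type. It may add
    // cv-qualifiers but not remove them, so the target is the most
    // qualified char type and works for any T.
    const volatile char* widened = reinterpret_cast<const volatile char*>(p);
    // Step 2: const_cast removes the volatile that step 1 had to keep or add.
    return const_cast<const char*>(widened);
}

// The same helper accepts plain, const and volatile element types.
inline void demo_as_const_char_ptr() {
    int plain = 0;
    volatile int vol = 0;
    assert(static_cast<const volatile void*>(as_const_char_ptr(&plain)) ==
           static_cast<const volatile void*>(&plain));
    assert(static_cast<const volatile void*>(as_const_char_ptr(&vol)) ==
           static_cast<const volatile void*>(&vol));
}
```

With T = volatile int, a direct reinterpret_cast<const char*>(p) would be rejected because it casts away volatile, which is why the intermediate const volatile char* is used.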
I'm trying to implement an array-like container with some special requirements and a subset of std::vector interface. Here is a code excerpt:
template<typename Type>
class MyArray
{
public:
explicit MyArray(const uint32_t size) : storage(new char[size * sizeof(Type)]), maxElements(size) {}
MyArray(const MyArray&) = delete;
MyArray& operator=(const MyArray&) = delete;
MyArray(MyArray&& op) { /* some code */ }
MyArray& operator=(MyArray&& op) { /* some code */ }
~MyArray() { if (storage != nullptr) delete[] storage; /* No explicit destructors. Let it go. */ }
Type* data() { return reinterpret_cast<Type*>(storage); }
const Type* data() const { return reinterpret_cast<const Type*>(storage); }
template<typename... Args>
void emplace_back(Args&&... args)
{
assert(current < maxElements);
new (storage + current * sizeof(Type)) Type(std::forward<Args>(args)...);
++current;
}
private:
char* storage = nullptr;
uint32_t maxElements = 0;
uint32_t current = 0;
};
It works perfectly well on my system, but dereferencing a pointer returned by data seems to violate strict aliasing rules. The same concern applies to a naive implementation of the subscript operator, iterators, etc.
So what is a proper way to implement containers backed by arrays of char without breaking strict aliasing rules? As far as I understand, using std::aligned_storage will only provide a proper alignment, but will not save the code from being broken by compiler optimizations which rely on strict aliasing. Also, I don't want to use -fno-strict-aliasing and similar flags due to performance considerations.
For example, consider subscript operator (nonconstant for brevity), which is a classical code snippet from articles about UB in C++:
Type& operator[](const uint32_t idx)
{
Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr)); // Cast is OK.
return *ptr; // Dereference is UB.
}
What is a proper way to implement it without risking a broken program? How is it implemented in standard containers? Is there any cheating with undocumented compiler intrinsics in all compilers?
Sometimes I see code with two static casts through void* instead of one reinterpret cast:
Type* ptr = static_cast<Type*>(static_cast<void*>(storage + idx * sizeof(ptr)));
How is it better than a reinterpret cast? To me, it does not solve any problems and just looks overcomplicated.
but dereferencing a pointer returned by data seems to violate strict aliasing rules
I disagree.
Both char* storage and a pointer returned by data() point to the same region of memory.
This is irrelevant. Multiple pointers pointing to the same object don't violate aliasing rules.
Moreover, subscript operator will ... dereference a pointer of incompatible type, which is UB.
But the object isn't of incompatible type. In emplace_back, you use placement new to construct objects of Type into the memory. Assuming no code path can avoid this placement new and therefore assuming that the subscript operator returns a pointer which points at one of these objects, then dereferencing the pointer of Type* is well defined, because it points to an object of Type, which is compatible.
This is what is relevant for pointer aliasing: The type of the object in memory, and the type of the pointer that is dereferenced. Any intermediate pointer that the dereferenced pointer was converted from is irrelevant to aliasing.
Note that your destructor does not call the destructors of the objects constructed within storage, so if Type isn't trivially destructible, then the behaviour is undefined.
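A minimal sketch of how the container could destroy its elements is shown below. FixedArray and its member names are hypothetical, it assumes Type needs no more alignment than plain operator new provides, and it leaves out move operations and finer points such as std::launder.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Hypothetical name; a sketch, not a drop-in replacement for MyArray.
// Assumes Type is not over-aligned.
template <typename Type>
class FixedArray {
public:
    explicit FixedArray(std::size_t capacity)
        : storage_(static_cast<unsigned char*>(::operator new(capacity * sizeof(Type)))),
          capacity_(capacity) {}
    FixedArray(const FixedArray&) = delete;
    FixedArray& operator=(const FixedArray&) = delete;

    ~FixedArray() {
        // Destroy the constructed elements in reverse order, then free the bytes.
        while (size_ > 0) {
            --size_;
            data()[size_].~Type();
        }
        ::operator delete(storage_);
    }

    Type* data() { return reinterpret_cast<Type*>(storage_); }
    std::size_t size() const { return size_; }

    Type& operator[](std::size_t idx) {
        assert(idx < size_);  // only constructed elements may be accessed
        return data()[idx];
    }

    template <typename... Args>
    Type& emplace_back(Args&&... args) {
        assert(size_ < capacity_);
        void* slot = storage_ + size_ * sizeof(Type);
        // Placement new starts the lifetime of a Type object in the raw bytes.
        Type* element = ::new (slot) Type(std::forward<Args>(args)...);
        ++size_;
        return *element;
    }

private:
    unsigned char* storage_;
    std::size_t capacity_;
    std::size_t size_ = 0;
};

// Usage with an element type that is not trivially destructible.
inline std::size_t demo_fixed_array() {
    FixedArray<std::string> names(2);
    names.emplace_back("alpha");
    names.emplace_back(3, 'x');
    return names[0].size() + names[1].size();
}
```

Every access goes through a Type* to an object that placement new created, which is the situation described above as well defined.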
Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr));
The sizeof is wrong. What you need is sizeof(Type), or sizeof *ptr. Or more simply
auto ptr = reinterpret_cast<Type*>(storage) + idx;
Sometimes I see code with two static casts through void* instead of one reinterpret cast: How is it better than reinterpret cast?
I can't think of any situation where the behaviour would be different.
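For object pointer types, reinterpret_cast is specified (since C++11) in terms of the same pair of static_casts through void*, so the two spellings yield the same pointer. A small sketch with a hypothetical helper name:

```cpp
#include <cassert>

// Hypothetical helper: performs the conversion both ways and compares.
// The caller is assumed to pass storage that is suitably aligned for float.
inline bool casts_agree(char* storage) {
    float* via_reinterpret = reinterpret_cast<float*>(storage);
    float* via_void = static_cast<float*>(static_cast<void*>(storage));
    return via_reinterpret == via_void;
}

inline void demo_casts_agree() {
    alignas(float) char buffer[sizeof(float)] = {};
    assert(casts_agree(buffer));
}
```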
Suppose we have a pointer T* ptr; and ptr, ptr+1, … ptr+(n-1) all refer to valid objects of type T.
Is it possible to access them as if they were an STL array? Or does the following code:
std::array<T,n>* ay = (std::array<T,n>*) ptr
invoke undefined behaviour?
Yes, it's undefined behavior, a classic one...
First, understand that what you just did:
std::array<T,n>* ay = (std::array<T,n>*) ptr
can be translated as:
using Arr = std::array<T,n>;
std::array<T,n>* ay = reinterpret_cast<Arr*>( const_cast<TypeOfPtr>(ptr));
You've not just cast away all const and volatile qualification but also changed the type. See this answer: https://stackoverflow.com/a/103868/1621391 ... indiscriminately casting away cv-qualifications can also lead to UB.
Secondly, it is undefined behavior to access an object through a pointer that was cast from an unrelated type. See the strict aliasing rule (thanks, zenith). Therefore any read or write access through the pointer ay is undefined. If you are extremely lucky, the code will crash instantly. If it works, evil days are awaiting you...
Note that std::array is not and will never be the same as anything that isn't std::array.
Just to add: the working draft of the C++ standard lists the explicit conversion rules (you can read them) and has a clause stating that
.....
5.4.3: Any type conversion not mentioned below and not explicitly defined by
the user ([class.conv]) is ill-formed.
.....
I suggest you cook up your own array_view (hopefully coming in C++17). It's really easy. Or, if you want some ownership, you can cook up a simple one like this:
template<typename T>
class OwnedArray{
T* data_ = nullptr;
std::size_t sz = 0;
OwnedArray(T* ptr, std::size_t len) : data_(ptr), sz(len) {}
public:
static OwnedArray own_from(T* ptr, std::size_t len)
{ return OwnedArray(ptr, len); }
OwnedArray(){}
OwnedArray(OwnedArray&& o)
{ data_ = o.data_; sz = o.sz; o.data_=nullptr; o.sz=0; }
OwnedArray& operator = (OwnedArray&& o)
{ if (this != &o) { delete[] data_; data_ = o.data_; sz = o.sz; o.data_=nullptr; o.sz=0; } return *this; }
OwnedArray(const OwnedArray& o) = delete;
OwnedArray& operator = (const OwnedArray& o) = delete;
~OwnedArray(){ delete[] data_; }
std::size_t size() const { return sz; }
T* data() { return data_; }
T& operator[] (std::size_t idx) { return data_[idx]; }
};
...and you can roll out more member functions and const-qualified overloads as you like. But this has a caveat: the pointer must have been allocated through new T[len].
Thus you can use it in your example like this:
auto ay = OwnedArray<std::remove_pointer_t<decltype(ptr)>>::own_from(ptr, ptr_len); // decltype(*ptr) would be a reference type
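For the non-owning array_view mentioned earlier in this answer, a minimal sketch might look like the following. ArrayView and its members are hypothetical names, not a proposed standard interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal view: it neither owns nor frees the elements.
template <typename T>
class ArrayView {
public:
    ArrayView(T* ptr, std::size_t len) : data_(ptr), size_(len) {}

    std::size_t size() const { return size_; }
    T* data() const { return data_; }
    T& operator[](std::size_t idx) const {
        assert(idx < size_);
        return data_[idx];
    }
    // begin/end make the view usable in range-based for loops.
    T* begin() const { return data_; }
    T* end() const { return data_ + size_; }

private:
    T* data_;
    std::size_t size_;
};
```

Unlike OwnedArray, the view places no requirement on how the memory was allocated, because it never deletes it.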
Yes, this invokes undefined behaviour. Generally you can't cast pointers to unrelated types between each other.
The code is no different from
std::string str;
std::array<double,10>* arr = (std::array<double,10>*)(&str);
Explanation: The standard does not provide any guarantee of compatibility between std::array<T,n> and T*. It is simply not there. It doesn't say that std::array is a trivial type either. Absent such guarantees, any conversion between T* and std::array<T,n> is undefined behavior on the same scale as conversion between pointers to any unrelated types.
I also fail to see what is the benefit of accessing already constructed dynamic array as an std::array.
P.S. Usual disclaimer: the cast, on its own, is always 100% fine. It is indirection through the resulting pointer that triggers the fireworks, but this part is omitted for simplicity.
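If an actual std::array is really wanted, copying the elements is a well-defined route. A sketch, assuming T is default-constructible and copy-assignable; copy_to_array is a hypothetical helper name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies N elements starting at ptr into a new std::array.
// The caller must guarantee that ptr[0] .. ptr[N-1] are valid objects.
template <std::size_t N, typename T>
std::array<T, N> copy_to_array(const T* ptr) {
    assert(ptr != nullptr);
    std::array<T, N> result{};
    std::copy_n(ptr, N, result.begin());
    return result;
}
```

The result is an independent copy, so later changes to the original elements are not reflected in it.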
I'm answering the first question here, as the second one has already been treated in the other answers:
Recap: you ...
have a pointer T* ptr; and ptr, ptr+1, … ptr+(n-1) all refer to valid objects of type T.
And you ask whether it is ...
possible to access them as if they were an STL array?
Answer: This is no problem, but it works differently from what you sketched in your code example:
std::array<T*, N> arr;
for(int i = 0; i<N; ++i)
{
arr[i] = ptr + i;
}
Now you can use the array-elements as if they were the original pointers. And there is no undefined behaviour anywhere.
Sometimes I need to learn the type of an expression while programming in C or C++. Sometimes there's a good IDE or existent documentation to help me, but sometimes not. I often feel such a construct could be useful:
void (*myFunc)(int);
printf("%s", nameoftype(myFunc)); //"void (*)(int)"
int i; unsigned int u;
printf("%s", nameoftype(i+u)); //"unsigned int"
This is especially true for C++; think accessors of const objects - do they return a const reference or a copy? Think dynamic casts and templated classes.
How can I do this? (i.e. learn the type of an expression)
I use GCC but as far as I know, it does not have such an extension. So I guess I'm curious as to how people solve this problem. (Both compile-time and runtime solutions welcome.)
Sometimes I just do:
int ***a = expression;
and look for the "<expression type> cannot be assigned to pointer-to^3 int" error. This seems to be the most portable workaround.
C++ has a typeid operator;
typeid(expression).name()
would return an implementation-defined name of the type of the expression. Alas, it is usually not human-readable.
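On GCC and Clang the name can be made readable with abi::__cxa_demangle from <cxxabi.h>. A sketch follows; readable_name is a hypothetical helper, and on other compilers it simply falls back to the raw name.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <typeinfo>

#if defined(__GNUC__)
#include <cxxabi.h>
#endif

// Hypothetical helper: returns a demangled type name where the ABI offers one.
// Note that typeid ignores references and top-level const.
inline std::string readable_name(const std::type_info& info) {
#if defined(__GNUC__)
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer, or nullptr on failure.
    char* demangled = abi::__cxa_demangle(info.name(), nullptr, nullptr, &status);
    if (status == 0 && demangled != nullptr) {
        std::string result(demangled);
        std::free(demangled);
        return result;
    }
    std::free(demangled);
#endif
    return info.name();
}

inline void demo_readable_name() {
    int i = -1;
    unsigned int u = 0;
    // With GCC or Clang this is expected to read "unsigned int".
    std::string name = readable_name(typeid(i + u));
    assert(!name.empty());
}
```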
What are you looking for? Automatic type inference or looking for the type so you can declare a variable correctly manually? (your own answers look like you want to have the second one). In this case, consider using Geordi:
<litb> make type pointer to function taking pointer to array of 10 int returning void
<geordi> void (*)(int (*)[10])
<litb> geordi: { int a = -1; unsigned int b = 0; cout << ETYPE(a + b), ETYPE_DESC(a + b), (a + b); }
<geordi> rvalue unsigned int, rvalue unsigned integer, 4294967295
<litb> geordi: << TYPE_DESC(void (*)(int (*)[10]))
<geordi> pointer to a function taking a pointer to an array of 10 integers and returning nothing
Automatic type inference is not currently possible without helper libraries like boost.typeof, which will use compiler extensions like __typeof__ for GCC. Next C++ will get auto (with different semantics than current auto) and will be able to do that, together with decltype to get the type of an expression.
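With a compiler that supports decltype (C++11 and later), a guess about an expression's type can be checked at compile time. A small sketch; Holder and sample_callback are hypothetical names used only for illustration.

```cpp
#include <type_traits>
#include <utility>

// Mixed signed/unsigned arithmetic: the result type is unsigned int.
static_assert(std::is_same<decltype(std::declval<int>() + std::declval<unsigned int>()),
                           unsigned int>::value,
              "int + unsigned int yields unsigned int");

// An accessor of a const object: does it return a reference or a copy?
struct Holder {
    int value;
    const int& get() const { return value; }
};
static_assert(std::is_same<decltype(std::declval<const Holder&>().get()),
                           const int&>::value,
              "get() returns a const reference, not a copy");

// A function pointer variable.
void (*sample_callback)(int) = nullptr;
static_assert(std::is_same<decltype(sample_callback), void (*)(int)>::value,
              "sample_callback is a pointer to a function taking int");
```

If a guess is wrong, the static_assert fails at compile time, so nothing has to be run to find out.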
If you can live with getting out of local context, you can always create a function template like this:
template<typename T> void f(T t) { /* ... */ }
int main() { int a = -1; unsigned int b = 0; f(a + b); }
Try Boost.Typeof to see if it fits.
gcc has typeof() at compile time. It works like sizeof().
http://gcc.gnu.org/onlinedocs/gcc/Typeof.html has more information.
I can't find much information on const_cast. The only info I could find (on Stack Overflow) is:
The const_cast<>() is used to add/remove const(ness) (or volatile-ness) of a variable.
This makes me nervous. Could using a const_cast cause unexpected behavior? If so, what?
Alternatively, when is it okay to use const_cast?
const_cast is safe only if you're casting a variable that was originally non-const. For example, if you have a function that takes a parameter of a const char *, and you pass in a modifiable char *, it's safe to const_cast that parameter back to a char * and modify it. However, if the original variable was in fact const, then using const_cast will result in undefined behavior.
void func(const char *param, size_t sz, bool modify)
{
if(modify)
strncpy(const_cast<char *>(param), "new string", sz); // arguments are (dest, src, count)
printf("param: %s\n", param);
}
...
char buffer[16];
const char *unmodifiable = "string constant";
func(buffer, sizeof(buffer), true); // OK
func(unmodifiable, strlen(unmodifiable), false); // OK
func(unmodifiable, strlen(unmodifiable), true); // UNDEFINED BEHAVIOR
I can think of two situations where const_cast is safe and useful (there may be other valid cases).
One is when you have a const instance, reference, or pointer, and you want to pass a pointer or reference to an API that is not const-correct, but that you're CERTAIN won't modify the object. You can const_cast the pointer and pass it to the API, trusting that it won't really change anything. For example:
void log(char* text); // Won't change text -- just const-incorrect
void my_func(const std::string& message)
{
log(const_cast<char*>(message.c_str()));
}
The other is if you're using an older compiler that doesn't implement 'mutable', and you want to create a class that is logically const but not bitwise const. You can const_cast 'this' within a const method and modify members of your class.
class MyClass
{
char cached_data[10000]; // should be mutable
bool cache_dirty; // should also be mutable
public:
char getData(int index) const
{
if (cache_dirty)
{
MyClass* thisptr = const_cast<MyClass*>(this);
update_cache(thisptr->cached_data);
}
return cached_data[index];
}
};
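Where mutable is available, the same logically-const cache can be written without const_cast. A minimal sketch; CachedValue and its members are hypothetical names, and the doubling merely stands in for an expensive computation.

```cpp
#include <cassert>

// Hypothetical class: logically const, but not bitwise const.
class CachedValue {
public:
    explicit CachedValue(int source) : source_(source) {}

    int get() const {
        if (cache_dirty_) {
            cached_ = source_ * 2;  // stand-in for an expensive computation
            cache_dirty_ = false;
            ++recompute_count_;
        }
        return cached_;
    }

    int recompute_count() const { return recompute_count_; }

private:
    int source_;
    // mutable members may be modified inside const member functions.
    mutable int cached_ = 0;
    mutable bool cache_dirty_ = true;
    mutable int recompute_count_ = 0;
};

inline void demo_cached_value() {
    const CachedValue value(21);
    assert(value.get() == 42);
    assert(value.recompute_count() == 1);
}
```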
I find it hard to believe that that's the only information you could find about const_cast. Quoting from the second Google hit:
If you cast away the constness of an object that has been explicitly declared as const, and attempt to modify it, the results are undefined. However, if you cast away the constness of an object that has not been explicitly declared as const, you can modify it safely.
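The distinction in that quote can be sketched as follows; overwrite is a hypothetical helper, and the undefined case is left commented out on purpose.

```cpp
#include <cassert>

// Hypothetical helper: writes through a const reference by casting const away.
// This is only valid when the referenced object was not declared const.
inline void overwrite(const int& target, int value) {
    const_cast<int&>(target) = value;
}

inline void demo_overwrite() {
    int counter = 1;       // not declared const
    overwrite(counter, 7); // well defined: the object itself is modifiable
    assert(counter == 7);

    // const int fixed = 1;
    // overwrite(fixed, 7); // undefined behavior: fixed was declared const
}
```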
What Adam says. Another example where const_cast can be helpful:
struct sample {
T& getT() {
return const_cast<T&>(static_cast<const sample*>(this)->getT());
}
const T& getT() const {
/* possibly much code here */
return t;
}
T t;
};
We first add const to the type this points to, then we call the const version of getT, and then we remove const from the return type, which is valid since t must be non-const (otherwise, the non-const version of getT couldn't have been called). This can be very useful if you got a large function body and you want to avoid redundant code.
The short answer is no, it's not safe.
The long answer is that if you know enough to use it, then it should be safe.
When you're casting, what you are essentially saying is, "I know something the compiler doesn't know." In the case of const_cast, what you are saying is, "Even though this method takes in a non-const reference or pointer, I know that it won't change the parameter I pass it."
So if you do actually know what you are claiming to know in using the cast, then it's fine to use it.
You're destroying any chance of thread safety if you start modifying things that the compiler thought were const.