compile-time variable-length objects based on string - c++

Related SO questions:
variable size struct
string-based generator
Sadly, neither of them (nor other similar ones) provides the solution I'm looking for.
Background
USB descriptors are (generally) byte-array structures. A "string descriptor" is defined as an array of bytes, that begins with a standard "header" of 2 bytes, followed by a string of UNICODE (16-bit) characters.
For example a USB string descriptor of value "AB" would have the following sequence of bytes:
0x06 0x03 0x41 0x00 0x42 0x00
where 0x06 is the total size of the descriptor (including the header), 0x03 is its "type" (defined by the standard)
Current (unsatisfactory) approach:
// other types omitted for clarity
enum UsbDescriptorType: uint8_t { USB_DESCR_STRING = 0x03 };
struct UsbDescrStd {
uint8_t bLength;
UsbDescriptorType bDescriptorType;
};
template<size_t N>
struct UsbDescrString final: UsbDescrStd {
char str[N * 2];
constexpr UsbDescrString(const char s[N]) noexcept
: UsbDescrStd{sizeof(*this), UsbDescriptorType::USB_DESCR_STRING}
, str {}
{
for(size_t i = 0; i < N; ++i)
str[i * 2] = s[i];
}
};
Below are the examples of its usage and short comments on why they are "not good enough" for me:
// requires size information
constexpr UsbDescrString<9> uds9{"Descr str"};
// string duplication
constexpr UsbDescrString<sizeof("Descr str")-1> udsa{"Descr str"};
// requires an explicit string storage
constexpr auto UsbDescrStrTxt{"Descr str"};
constexpr UsbDescrString<sizeof(UsbDescrStrTxt)-1> udsa2{UsbDescrStrTxt};
// ugly use of a macro
#define MAKE_UDS(name, s) UsbDescrString<sizeof(s)-1> name{s}
constexpr MAKE_UDS(udsm, "Descr str");
"String argument to template" is explicitly prohibited as of C++20, cutting that solution off as well.
What I'm trying to achieve
Ideally I'd love to be able to write code like the following:
constexpr UsbDescrString uds{"Descr str"}; // or a similar "terse" approach
It is simple, terse, error-resistant, and to the point. And I need help writing my UsbDescrString in a way that allows me to create compile-time objects without unnecessary code bloat.

Adding a deduction guide (CTAD) to UsbDescrString should be enough:
template<size_t N>
struct UsbDescrString final: UsbDescrStd {
char str[N * 2];
constexpr UsbDescrString(const char (&s)[N+1]) noexcept
: UsbDescrStd{sizeof(*this), UsbDescriptorType::USB_DESCR_STRING}
, str {}
{
for(size_t i = 0; i < N; ++i)
str[i * 2] = s[i];
}
};
template<size_t N>
UsbDescrString(const char (&)[N]) -> UsbDescrString<N-1>;
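Note that in order to prevent array-to-pointer decay, the constructor parameter must be a reference to an array, const char (&)[N+1]. Because N cannot be deduced from the bound N+1, the deduction guide maps a const char (&)[N] argument to UsbDescrString<N-1>.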
Demo
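As a compile-time check of the approach above, here is a minimal sketch that builds the "AB" descriptor from the question and verifies its six bytes. It assumes C++17, ASCII-only input, and a compiler that lays out the base members directly before str with no padding (checked by the sizeof assertion); the name uds_ab is only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

enum UsbDescriptorType : std::uint8_t { USB_DESCR_STRING = 0x03 };

struct UsbDescrStd {
    std::uint8_t bLength;
    UsbDescriptorType bDescriptorType;
};

template <std::size_t N>
struct UsbDescrString final : UsbDescrStd {
    char str[N * 2];  // 16-bit code units, low byte first; ASCII input only

    // Taking the literal by reference keeps its size (N characters + '\0').
    constexpr UsbDescrString(const char (&s)[N + 1]) noexcept
        : UsbDescrStd{static_cast<std::uint8_t>(sizeof(UsbDescrString)),
                      UsbDescriptorType::USB_DESCR_STRING},
          str{} {
        for (std::size_t i = 0; i < N; ++i)
            str[i * 2] = s[i];  // low byte; the high byte stays 0
    }
};

// Deduction guide: a literal of N chars (including '\0') holds N-1 characters.
template <std::size_t N>
UsbDescrString(const char (&)[N]) -> UsbDescrString<N - 1>;

// Illustration: the "AB" descriptor from the question.
constexpr UsbDescrString uds_ab{"AB"};
static_assert(sizeof(uds_ab) == 6, "2-byte header + 2 code units, no padding");
static_assert(uds_ab.bLength == 0x06 && uds_ab.bDescriptorType == 0x03, "header");
static_assert(uds_ab.str[0] == 0x41 && uds_ab.str[1] == 0x00, "'A'");
static_assert(uds_ab.str[2] == 0x42 && uds_ab.str[3] == 0x00, "'B'");
```

If the static_asserts pass, the object has the byte sequence 0x06 0x03 0x41 0x00 0x42 0x00 described in the question.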
"String argument to template" is explicitly prohibited as of C++20,
cutting that solution off as well.
However, thanks to P0732, C++20 allows class types as non-type template parameters, so with the help of a helper class such as basic_fixed_string you can write:
template<fixed_string S>
struct UsbDescrString final: UsbDescrStd { /* ... */ };
constexpr UsbDescrString<"Descr str"> uds9;

Related

Why this __METHOD__NAME__ requires a memory copy?

The idea is from https://stackoverflow.com/a/15775519/1277762
I added a '\0' at end of string_view so we can use data() for printf, spdlog, et al.
I use this macro to print function name with class name.
However, I find that the compiler is not smart enough to inline the string, but requires a memory copy to stack first:
https://godbolt.org/z/bqob37G3z
See the difference between CALL and CALLFUNC in main function.
Is it possible to tell compiler just put the string in some RO section, like const char *?
template<std::size_t N>
consteval const std::array<char, N> __get_function_name(const char * arr)
{
std::array<char, N> data {};
std::string_view prettyFunction(arr);
size_t bracket = prettyFunction.rfind("(");
size_t space = prettyFunction.rfind(" ", bracket) + 1;
size_t i;
for (i = 0; i < bracket - space; i += 1) {
data[i] = arr[space + i];
}
data[i] = '\0';
return data;
}
#define __METHOD_NAME__ __get_function_name<strlen(__PRETTY_FUNCTION__)>(__PRETTY_FUNCTION__).data()
Thanks @n.m. and user17732522. I finally got a workable version:
https://godbolt.org/z/sof1j3Md4
It is still not perfect, as this solution needs to search for '(' and ' ' twice, which might increase compile time.
Updated: only call rfind(" ") once: https://godbolt.org/z/zYcajqaje
#include <array>
#include <string_view>
constexpr size_t my_func_end(const char * arr)
{
std::string_view prettyFunction(arr);
return prettyFunction.rfind("(");
}
constexpr size_t my_func_start(const char * arr, const size_t end)
{
std::string_view prettyFunction(arr);
return prettyFunction.rfind(" ", end) + 1;
}
template<std::size_t S, std::size_t E>
constexpr const std::array<char, E - S + 1> my_get_function_name(const char * arr)
{
std::array<char, E - S + 1> data {};
size_t i;
for ( i = 0; i < E - S; i += 1) {
data[i] = arr[S + i];
}
data[i] = '\0';
return data;
}
template<auto tofix>
struct fixme
{
static constexpr decltype(tofix) fixed = tofix;
};
#define __METHOD_NAME_ARRAY__(x) my_get_function_name<my_func_start(x, my_func_end(x)), my_func_end(x)>(x)
#define __METHOD_NAME__ (fixme<__METHOD_NAME_ARRAY__(__PRETTY_FUNCTION__)>::fixed.data())
I am not sure whether the compiler can cache the calculation. If it can, this is good enough.
The compiler doesn't realize or take into account the specific behavior of puts, namely that it doesn't store the pointer it is passed and doesn't call its caller again.
The problem is that without this knowledge the compiler has to take into account the possibility that puts will store the pointer it is passed in e.g. a global variable, then call its caller again, and then compare the new pointer argument with the old one stored in the global variable. These must compare unequal because they are pointers into different temporary objects, both within their lifetime. So the compiler can't reuse the same read-only static memory location as the argument to the puts call.
So you need to tell the compiler explicitly to use a static memory location:
#define METHOD_NAME get_function_name<my_strlen(__PRETTY_FUNCTION__)>(__PRETTY_FUNCTION__)
#define CALL() { static constexpr auto v = METHOD_NAME; puts(v.data()); }
Technically the constexpr on v is redundant with consteval on the function. Just constexpr on both would also be sufficient.
If you don't add constexpr on v you might want to add const though to make sure that the compiler won't need to consider the possibility that puts will modify the contents of the string, although it seems that GCC in your example is aware of that. (That puts takes a const char* as argument is not sufficient to establish this.)
You can't use strlen there by the way (assuming you want this to be portable to some other compiler beyond GCC that is supporting __PRETTY_FUNCTION__ in the way you are using it, e.g. Clang with libc++). That GCC is allowing it without diagnostic is not standard-conforming and it is not guaranteed to work on other compilers. std::strlen is not marked constexpr per standard. You can use std::char_traits<char>::length instead, which is constexpr since C++17.
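Putting these points together, a minimal sketch might look like the following. It assumes C++17 and a compiler that provides __PRETTY_FUNCTION__ as a constant expression (GCC, Clang); Widget and METHOD_NAME_ARRAY are hypothetical names for illustration, and constexpr is used instead of consteval so the sketch does not need C++20.

```cpp
#include <array>
#include <cstddef>
#include <string>       // std::char_traits
#include <string_view>

// Copies the "Class::function" part of a __PRETTY_FUNCTION__-style string
// into a zero-terminated array of N + 1 chars (N = length of the input).
template <std::size_t N>
constexpr std::array<char, N + 1> get_function_name(const char* pretty) {
    std::array<char, N + 1> data{};  // value-initialised, so already '\0'-filled
    const std::string_view sv(pretty, N);
    const std::size_t bracket = sv.rfind('(');
    const std::size_t space = sv.rfind(' ', bracket) + 1;
    for (std::size_t i = 0; i < bracket - space; ++i)
        data[i] = pretty[space + i];
    return data;
}

// std::char_traits<char>::length is constexpr since C++17, unlike strlen.
#define METHOD_NAME_ARRAY                                                   \
    get_function_name<std::char_traits<char>::length(__PRETTY_FUNCTION__)>( \
        __PRETTY_FUNCTION__)

struct Widget {  // hypothetical class, for illustration only
    std::string_view name() const {
        // One static, read-only array per function instead of a temporary
        // that has to be materialised on the stack for every call.
        static constexpr auto v = METHOD_NAME_ARRAY;
        return std::string_view(v.data());
    }
};
```

The array is still sized for the whole __PRETTY_FUNCTION__ string, so some trailing zero bytes are stored; the two-step start/end computation in the question's updated version avoids that.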

concatenating uint16_t and uint32_t values for hashing

I am trying to concatenate (not add) 2 uint16_t struct members and 2 uint32_t struct members and assign the result to const void *p for the purpose of hashing. The struct and the concatenation function that I am trying to implement are as follows.
struct xyz {
....
uint32_t a;
uint32_t b;
....
uint16_t c;
uint16_t d;
....
}
const void *p=concatenation(xyz.a,xyz.b,xyz.c,xyz.d)
Edited:
I have to use pre-defined hash functions. The most suitable hash function for my task seems to be this.
uint32_t hash(const uint32_t p[], size_t n)
{
//Returns the hash of the 'n' 32-bit words at 'p'
}
or
uint32_t hash64(const uint64_t p[], size_t n)
{
//Returns the hash of the 'n' 64-bit words at 'p'
}
for the purpose of hashing
In this case, I'd prefer providing a custom hash function, or specialising std::hash for the struct. For use with standard templates, this might look like this:
namespace std // any extension of std namespace is UB
              // sole exception: specialising templates, which we are going to do
{
template <>
struct hash<xyz>
{
    size_t operator()(xyz const& i) const
    {
        // TODO: need to calculate the value from a, b, c, and d appropriately
        return 0;
    }
};
// if xyz is polymorphic, you might need to operate on pointers
// no problem either:
template <>
struct hash<xyz*>
{
    size_t operator()(xyz const* i) const
    {
        return hash<xyz>()(*i);
        // or, if the hash value is type dependent:
        // return i->hash(); // custom virtual hash member function needs to be implemented
    }
};
} // namespace std
// now you can have (xyz also needs operator==)
std::unordered_set<xyz> someSet;
void demo()
{
    someSet.insert(xyz());
}
(Untested code, in case of errors please fix yourself.)
A list of hashing algorithms which might be used can be found at wikipedia.
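To make the TODO above concrete, here is a minimal sketch that fills in the specialisation. It assumes xyz contains only the four members shown in the question, and it uses a hypothetical hash_combine helper (a commonly used mixing formula, not part of the standard library); any other way of combining the four member hashes would do as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

struct xyz {  // assumption: only the four members from the question
    std::uint32_t a;
    std::uint32_t b;
    std::uint16_t c;
    std::uint16_t d;
};

// Unordered containers need equality in addition to a hash.
inline bool operator==(const xyz& l, const xyz& r) {
    return l.a == r.a && l.b == r.b && l.c == r.c && l.d == r.d;
}

// Hypothetical helper: mixes one more value into a running hash.
inline void hash_combine(std::size_t& seed, std::size_t value) {
    seed ^= value + 0x9e3779b9u + (seed << 6) + (seed >> 2);
}

namespace std {
template <>
struct hash<xyz> {
    size_t operator()(const xyz& v) const noexcept {
        size_t seed = 0;
        ::hash_combine(seed, std::hash<std::uint32_t>{}(v.a));
        ::hash_combine(seed, std::hash<std::uint32_t>{}(v.b));
        ::hash_combine(seed, std::hash<std::uint16_t>{}(v.c));
        ::hash_combine(seed, std::hash<std::uint16_t>{}(v.d));
        return seed;
    }
};
}  // namespace std
```

With this in place, std::unordered_set<xyz> works without passing a hasher explicitly. The resulting values are not portable across standard library implementations.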
If you want the value to fit into a pointer, the full value can be 32 bits on x86 or 64 bits on x64. I'm going to assume you are compiling for 64-bit machines.
This means you can only fit two uint16_t values and one uint32_t, or two uint32_t values.
Either way, you would shift the values into a uint64_t, e.g. (uint64_t)c | ((uint64_t)d << 16) | ((uint64_t)a << 32), and then convert that value to a void*.
Edit: for clarification, you cannot fit all of the struct's members, bit-shifted one after another, into a single pointer. You need a minimum of 96 bits to hold the packed struct, which means at least two 64-bit pointers.
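Since the 96 bits do not fit into one pointer, an alternative that matches the 32-bit-word hash function from the question is to pack the members into an array of three words and pass that array. The following is only a sketch; pack_words is a hypothetical name, and the order of the words is an arbitrary choice.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Packs the four members into three 32-bit words:
// word 0 = a, word 1 = b, word 2 = c in the low half and d in the high half.
constexpr std::array<std::uint32_t, 3> pack_words(std::uint32_t a, std::uint32_t b,
                                                  std::uint16_t c, std::uint16_t d) {
    return {a, b,
            static_cast<std::uint32_t>(c) | (static_cast<std::uint32_t>(d) << 16)};
}
```

The result could then be handed to the predefined function as hash(words.data(), words.size()), where words is the returned array. Because the packing uses shifts rather than memory copies, the words have the same values on little-endian and big-endian machines.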
There are a few things to consider:
Does that hash value need to be portable across systems? If it does, then you will need to be careful to order the bytes the same way on different systems. If not, then the implementation can be simpler.
Do you want to hash every member of the class, does the class have no padding, and is there no member for which two different representations should hash equally?
If both of these simplifications apply, then your function is fast and easy to implement, but violating the preconditions will break the hash. If not, then you must serialise the data into a buffer, which practically means that you cannot simply return a pointer.
Here is a super simple implementation for the case that you don't need portability, and you hash all members, and there is no padding:
xyz example;
static_assert(std::has_unique_object_representations_v<xyz>);
const void* p = &example;
Note that this doesn't work with (IEEE-754) float members due to peculiarities of NaN.
A more robust solution that can produce hashes that are portable across systems is to use a general purpose serialisation scheme, and hash the serialised result. There is no standard serialisation functionality in C++.
void* has problems like: Who owns the memory? What's the type you are going to reinterpret the pointer as?
A more typed solution would be to use std::array of std::byte then you at least know that you're looking at an array of raw bytes and nothing else:
#include <cstdint>
#include <array>
#include <cstddef>
#include <cstring>
auto concat(std::uint32_t a, std::uint32_t b, std::uint16_t c, std::uint16_t d) {
std::array<std::byte, sizeof a + sizeof b + sizeof c + sizeof d> res;
std::byte* p = res.data();
std::memcpy(p, &a, sizeof a);
std::memcpy(p += sizeof a, &b, sizeof b);
std::memcpy(p += sizeof b, &c, sizeof c);
std::memcpy(p += sizeof c, &d, sizeof d);
return res;
}
int main() {
std::uint32_t a = 1, b = 0;
std::uint16_t c = 1, d = 0;
auto res = concat(a, b, c, d);
return 0;
}

Storing integral values in a byte sequence in C++

I am implementing an LZW compression/decompression utility library and am in need of returning the compressed output in what I am using as:
using ByteSequence = std::vector<std::uint8_t>;
The output format for the compressor will include the positions in the compressor's dictionary of various sequences found by the algorithm. For example, having 16-bit positions in the output would look like:
std::vector<std::uint16_t> pos{123, 385, /* ... */};
The output, however needs to be a ByteSequence, and it needs to be portable among architectures. What I am currently doing to convert the pos vector to the desired format is:
for (auto p : pos)
{
std::uint8_t *bytes = (std::uint8_t *) &p;
output.push_back(bytes[0]);
output.push_back(bytes[1]);
}
This works, but only under the assumption that the keys will be 16-bit each and to be honest, it looks like a cheap trick to me.
How should I do this in a better, cleaner way? Thank you!
The way you extract bytes is undefined behaviour. The C++ standard [basic.lval] reads:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
. . .
a char, unsigned char, or std::byte type.
std::uint8_t is not in this list, and AFAIK there is no guarantee that std::uint8_t and unsigned char are the same type.
A conversion function might look like:
template<typename T>
void convert_forward(const std::vector<T>& in, std::vector<std::uint8_t>& out) {
out.reserve(out.size() + in.size() * sizeof(T));
for (const T& i : in) {
std::uint8_t buff[sizeof(T)];
std::memcpy(buff, &i, sizeof(T));
std::copy(std::begin(buff), std::end(buff), std::back_inserter(out));
}
}
Alternative implementation without back_inserter:
template<typename T>
void convert_forward(const std::vector<T>& in, std::vector<std::uint8_t>& out) {
const auto old_size = out.size();
out.resize(old_size + in.size() * sizeof(T));
auto dest = out.data() + old_size;
for (const T& i : in) {
std::memcpy(dest, &i, sizeof(T));
dest += sizeof(T);
}
}
Beware about endianness. It should be taken into account either in the forward conversion or in the backward one.
This should be portable, though possibly not so efficient as direct byte manipulation:
template<class T>
void number2bytes(std::vector<uint8_t>& bytes, T x)
{
static_assert(std::is_integral<T>::value, "Integral required.");
for (size_t i = 0; i < sizeof(T); ++i)
{
bytes.push_back(x & 0xFF);
x >>= 8;
}
}
The static_assert is added to protect against accidentally passing some weird non-number type that overloads & and >>=.
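For the backward conversion, a matching decoder has to read the bytes in the same order. The sketch below repeats the shift-based encoder (working on the unsigned counterpart of T, so bool is not supported) and adds a hypothetical bytes2number; both use little-endian order in the byte sequence regardless of the host's endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Appends x to bytes, least significant byte first.
template <class T>
void number2bytes(std::vector<std::uint8_t>& bytes, T x) {
    static_assert(std::is_integral<T>::value, "Integral required.");
    using U = typename std::make_unsigned<T>::type;
    U u = static_cast<U>(x);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));
        u = static_cast<U>(u >> 8);
    }
}

// Reads sizeof(T) bytes starting at pos, least significant byte first.
// The caller must make sure that pos + sizeof(T) <= bytes.size().
template <class T>
T bytes2number(const std::vector<std::uint8_t>& bytes, std::size_t pos) {
    static_assert(std::is_integral<T>::value, "Integral required.");
    using U = typename std::make_unsigned<T>::type;
    U u = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        u = static_cast<U>(u | (static_cast<U>(bytes[pos + i]) << (8 * i)));
    return static_cast<T>(u);
}
```

For example, the 16-bit position 385 (0x0181) is stored as the two bytes 0x81 0x01 and decodes back to 385 on any architecture.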

C++ unsigned char array length

I have in my C++ program unsigned char array of hex values:
unsigned char buff[] = {0x03, 0x35, 0x6B};
And I would like to calculate the size of this array so that I can send it on UART port linux using this function:
if ((count = write(file,buff,length))<0)
{
perror("FAIL to write on exit\n");
}
As far as I can see, length is an int, and buff is an array which can change size during program execution.
Can anyone help me write this? Thanks.
As one of the options to get the number of elements you can use such template:
template<typename T, size_t s>
size_t arrSize(T(&)[s])
{
return s;
}
And afterwards call:
auto length = arrSize(buff);
This could be used across the code for various array types.
If by array size you mean its total byte size, you can just use sizeof(buff). Or, as others suggested, you can use std::array, std::vector or any other container instead and write a helper like this:
template<typename T>
size_t byteSize(const T& data)
{
    // no need to construct an element just to take its size
    return data.size() * sizeof(typename T::value_type);
}
Then to acquire the actual byte size of the data you can simply call:
std::vector<unsigned char> buff{0x03, 0x35, 0x6B};
auto bSize = byteSize(buff);
You can do this with an array:
size_t size = sizeof array;
With your example, that gives:
ssize_t count = write(file, buff, sizeof buff);
if (count < 0 || (size_t)count != sizeof buff) {
perror("FAIL to write on exit\n");
}
Note: I use C semantics because write comes from the C library.
In C++, you can use a template to make sure that sizeof is applied to an array and not to a pointer:
template<typename T, size_t s>
size_t array_sizeof(T (&array)[s]) {
return sizeof array;
}
With your example, that gives:
ssize_t count = write(file, buff, array_sizeof(buff));
if (count < 0 || static_cast<size_t>(count) != array_sizeof(buff)) {
perror("FAIL to write on exit\n");
}
If you are using C++11 you might think of switching to
#include <array>
std::array<unsigned char, 3> buff{ {0x03, 0x35, 0x6B} };
That offers an interface like std::vector (including size and data) for fixed-size arrays.
Using std::array might prevent some common errors and gives access to functionality covered by <algorithm>.
The call to write will then be:
write(file, buff.data(), buff.size())
And I would like to calculate the size of this array so that I can send it on UART port linux using this function...
You need a COUNTOF macro or function. They can be tricky to get right in all cases. For example, the accepted answer shown below will silently fail when working with pointers:
size_t size = sizeof array;
size_t number_element = sizeof array / sizeof *array;
Microsoft Visual Studio 2005 has a built-in macro or template class called _countof. It handles all cases properly. Also see the _countof Macro documentation on MSDN.
On non-Microsoft systems, I believe you can use something like the following. It will handle pointers properly (from making COUNTOF suck less):
template <typename T, size_t N>
char (&ArraySizeHelper( T (&arr)[N] ))[N];
#define COUNTOF(arr) ( sizeof(ArraySizeHelper(arr)) )
void foo(int primes[]) {
// compiler error: primes is not an array
std::cout << COUNTOF(primes) << std::endl;
}
Another good reference is Better array 'countof' implementation with C++ 11. It discusses the ways to do things incorrectly, and how to do things correctly under different compilers, like Clang, ICC, GCC and MSVC. It includes the Visual Studio trick.
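If C++17 is available, the standard library already ships a function with the same pointer-rejecting behaviour as the COUNTOF helper above: std::size from <iterator>. A minimal sketch, using the array from the question (buff_count and buff_bytes are illustrative names):

```cpp
#include <cstddef>
#include <iterator>  // std::size (C++17)

constexpr unsigned char buff[] = {0x03, 0x35, 0x6B};

constexpr std::size_t buff_count = std::size(buff);  // number of elements
constexpr std::size_t buff_bytes = sizeof buff;      // number of bytes

static_assert(buff_count == 3 && buff_bytes == 3,
              "for unsigned char, both numbers are the same");

// std::size has no overload for plain pointers, so this would not compile:
// void foo(const unsigned char* p) { std::size(p); }
```

For write, the byte count is what matters, so sizeof buff (or buff_count * sizeof buff[0]) is the value to pass as the length.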
buff is an array which can change size during program execution
As long as you have the data at compile time, the countof macro or function should work. If you are building data on the fly, then it probably won't work.
This is closely related: Common array length macro for C?. It may even be a duplicate.

char data to float/double

I have a memory location of 128 bytes. I try to fill the memory with data starting from 1...127.
I need to write code that takes two parameters, an offset and a data type. Based on these arguments, I need to convert the data in memory to the specified data type.
say for example
unsigned char *pointer = (unsigned char *)malloc(sizeof(unsigned char) * 128);
printf("\n\n loading some default values...");
for (unsigned int i = 0; i < 128; i++) {
pointer[i] = i + 1;
}
convertTo(3,efloat);
convertTo(100,edword);
void convertTo(uint8_t offset, enum datatype){
switch(datatype)
{
case efloat:
//// conversion code here..
break;
case edword:
//// conversion code here..
break;
case eint:
//// conversion code here..
break;
}
}
I tried many functions such as atoi, atof, strtod, and strtol, but none of them gives me the correct value. For example, if I give offset 2 and eint (16-bit), it should take the values 2 and 3 and give 515.
Here is a generic version of what you want which wraps the type to convert to and the offset into a single struct. While the template code is more complicated, the usage is IMHO, much cleaner. Additionally, the long switch statement has been removed (at the expense of some less readable template code).
// Use an alias for the type to convert to (for demonstration purposes)
using NewType = short;
// Struct which wraps both the offset and the type after conversion "neatly"
template <typename ConversionType>
struct Converter {
// Define a constructor so that the instances of
// the converter can be created easily (see main)
Converter(size_t offset) : Offset(offset) {}
// This provides access to the type to convert to
using Type = ConversionType;
size_t Offset;
};
// Note: The use of the typename keyword here is to let the compiler know that
// ConverterHelper::Type is a type
template <typename ConverterHelper>
typename ConverterHelper::Type convertTo(char* Array, ConverterHelper ConvHelper) {
// This converts the bytes in the array to the new type
typename ConverterHelper::Type* ConvertedVar =
reinterpret_cast<typename ConverterHelper::Type*>(Array + ConvHelper.Offset);
// Return the value of the reinterpreted bytes
return *ConvertedVar;
}
int main()
{
char ExampleArray[8] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
// Create a new NewType (short) using bytes 1 and 2 in ExampleArray
NewType x = convertTo(ExampleArray, Converter<NewType>(1));
}
On the machine I used to test this, x had a value of 770, as John suggested it might.
If you remove the alias NewType and use the actual type you wish to convert to, the intention of convertTo is, again IMHO, very clear.
Here is a live demo Coliru Demo. Just change the type alias NewType to see the output for different types.
Try *reinterpret_cast<uint16_t*>(pointer + offset). Of course, what you will get depends on the endianness of your system. 0x02 0x03 might be interpreted as 0x0203 (515) or 0x0302 (770).
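Dereferencing a reinterpret_cast pointer can run into alignment and aliasing problems, so a more defensive variant copies the bytes with std::memcpy instead. The sketch below uses hypothetical helper names (read_as, read_be16, read_le16): the first gives the host's native interpretation, the other two fix the byte order explicitly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copies sizeof(T) bytes starting at offset into a T.
// The result depends on the byte order of the host.
template <typename T>
T read_as(const unsigned char* data, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    T value;
    std::memcpy(&value, data + offset, sizeof(T));
    return value;
}

// Big-endian: the first byte is the most significant one.
inline std::uint16_t read_be16(const unsigned char* data, std::size_t offset) {
    return static_cast<std::uint16_t>((data[offset] << 8) | data[offset + 1]);
}

// Little-endian: the first byte is the least significant one.
inline std::uint16_t read_le16(const unsigned char* data, std::size_t offset) {
    return static_cast<std::uint16_t>(data[offset] | (data[offset + 1] << 8));
}
```

With the buffer from the question (values 1, 2, 3, ...), the bytes 0x02 0x03 sit at zero-based offset 1: read_be16 yields 515 and read_le16 yields 770, while read_as<std::uint16_t> yields whichever of the two matches the host. The same read_as template also works for float or double, as long as the bytes actually hold a value of that type.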