I get bytes from a remote device via a USB protocol. These bytes contain integer data. Is the following code a safe way to unpack them without portability issues (except endianness, which is known)?
#include <iostream>
#include <string>
#include <cstdint>
#include <cstring>
int main()
{
    std::uint8_t someArray[4] = {1,0,0,0};
    std::int32_t someValue = 0;
    std::memcpy(&someValue, someArray, 4);
    std::cout << someValue << std::endl;
}
Yes.
std::memcpy is indeed the way to go. In real-life code, I'd static_assert on the size of types used and check data size at run-time, but nothing more.
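For illustration, here is a minimal sketch of those checks, assuming the payload arrives as a pointer plus a length; the function name and the choice of exception are mine:
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <stdexcept>
// Sketch only: unpack a 4-byte payload into an int32_t, with the
// compile-time and run-time checks mentioned above.
std::int32_t unpack_int32(const std::uint8_t* data, std::size_t size)
{
    static_assert(sizeof(std::int32_t) == 4, "int32_t must be 4 bytes");
    if(size < sizeof(std::int32_t)) {
        throw std::runtime_error("not enough bytes for an int32_t");
    }
    std::int32_t value = 0;
    std::memcpy(&value, data, sizeof(value));
    return value;
}
int main()
{
    const std::uint8_t raw[4] = {1, 0, 0, 0}; // the example payload from the question
    std::cout << unpack_int32(raw, sizeof raw) << std::endl;
}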
Related
I wonder what a safe way to convert a compact subarray of std::byte to a numeric value is. I guess this code:
std::array<std::byte, 12> ar = { /* some bytes */ };
uint32_t value = *reinterpret_cast<uint32_t*>(&ar[4]);
is rather error-prone, for example because it depends on how the compiler aligns the values in the array. Thanks in advance!
A reliable approach is to use std::memcpy to copy the target bytes into a uint32_t object. memcpy is guaranteed to do this safely in every version of C++. This pattern is common enough that compilers can usually optimize the copy away.
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
template<class T, std::size_t N>
T read_int_from_bytes(const std::array<std::byte, N> & data, std::size_t index)
{
    if(index + sizeof(T) > N) {
        throw std::invalid_argument("read_int index out of bounds");
    }
    // Integer to copy the bytes to
    T result;
    // Copy the bytes
    std::memcpy(&result, &(data[index]), sizeof(result));
    return result;
}
Here is an example. This test creates an array of bytes with the values { 0x00, 0x10, ..., 0xB0 } and reads a uint32_t starting at index 4.
#include <iostream>
#include <iomanip>
int main()
{
    std::array<std::byte, 12> data{};
    for(std::size_t i = 0; i < data.size(); ++i)
    {
        data[i] = static_cast<std::byte>(0x10 * i);
    }
    std::cout << "0x" << std::hex << read_int_from_bytes<std::uint32_t>(data, 4);
}
The test produces 0x70605040 when I try it. You can also see from the generated assembly that the entire function call is optimized out and the result is precalculated, showing that the compiler was able to reason through the memcpy and remove it entirely.
Beware that the result is unspecified: it depends on the endianness of the target platform, that is, on whether the first byte is the most significant or the least significant. For many applications this doesn't matter, but if it does, C++20 introduced std::endian (in <bit>), which you can use to check the system's endianness.
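For example, here is a minimal sketch of such a check, assuming C++20 and assuming the device sends its bytes least significant first (the little-endian wire format is my assumption, not something stated in the question):
#include <bit>
#include <cstdint>
#include <iostream>
// Sketch: reverse the byte order after the memcpy when the host is big-endian,
// so a little-endian wire value is interpreted the same way everywhere.
std::uint32_t from_little_endian(std::uint32_t v)
{
    if constexpr (std::endian::native == std::endian::big) {
        v = ((v & 0x000000FFu) << 24) |
            ((v & 0x0000FF00u) << 8)  |
            ((v & 0x00FF0000u) >> 8)  |
            ((v & 0xFF000000u) >> 24);
    }
    return v;
}
int main()
{
    // Prints 70605040 on a little-endian host, 40506070 on a big-endian one.
    std::cout << std::hex << from_little_endian(0x70605040u) << std::endl;
}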
I have a program where I want to store kmers (substrings of size k) and the number of times they appear. For this particular application I'm reading the values from a file, and if a count is greater than 255 it is fine to round it down to 255. I thought that storing the key-value pairs as (string, unsigned char) might save space compared to storing them as (string, int), but this did not seem to be the case when I checked the maximum resident size by running /usr/bin/time.
To confirm, I also tried running the following test program where I alternated the type of the value in the unordered_map:
#include <iostream>
#include <unordered_map>
#include <utility>
#include <string>
#include <fstream>
int main() {
    std::unordered_map<std::string, unsigned char> kmap;
    std::ifstream infile("kmers_from_reads");
    std::string kmer;
    int abun;
    while(infile >> kmer >> abun) {
        unsigned char abundance = (abun > 255) ? 255 : abun;
        kmap[kmer] = abundance;
    }
    std::cout << sizeof(*kmap.begin(0)) << std::endl;
}
This did not seem to impact the size of the nodes in the bucket (on my machine it returned 40 for both unsigned char and int values).
I was wondering how the size of the nodes in each bucket is determined.
My understanding of unordered maps is that the C++ standard more or less requires separate chaining, and each node in a bucket must have at least one pointer so that the elements are iterable and can be erased (http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html). However, I don't understand how the amount of space to store a value is determined, and it seems like it must also be flexible enough to accommodate larger values. I also tried looking at the GCC libstdc++ unordered_map header (https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h) but had a hard time understanding what was going on.
Compile and execute this code:
#include <iostream>
#include <unordered_map>
#include <utility>
#include <string>
#include <fstream>
class foo
{
    std::string kmer;
    unsigned char abun;
};
class bar
{
    std::string kmer;
    int abun;
};
int main() {
    std::cout << sizeof(foo) << " " << sizeof(bar) << std::endl;
}
I get, and you probably will too, 40 40. This is because of alignment requirements. If, for example, std::string contains at least one pointer (which it almost certainly does), the whole object has to be aligned on at least a pointer-sized boundary (typically 8 bytes on a 64-bit system).
Imagine if sizeof(foo) was 39 and you had code that did foo foos[2]. If the pointer in foos[0].kmer was properly aligned, the pointer in foos[1].kmer wouldn't be. That would be a disaster.
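To see the padding directly, here is a small sketch that prints the size of the pair type an unordered_map actually stores per node; the exact numbers are implementation-specific:
#include <iostream>
#include <string>
#include <utility>
int main()
{
    // The value types stored per node for the two maps in the question.
    using char_pair = std::pair<const std::string, unsigned char>;
    using int_pair = std::pair<const std::string, int>;
    std::cout << sizeof(char_pair) << " " << sizeof(int_pair) << " " << alignof(std::string) << std::endl;
    // On a typical 64-bit libstdc++ this prints 40 40 8: the one-byte value is
    // padded up to the alignment of std::string, so both pairs end up the same size.
}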
Given the following struct:
struct ExampleStruct {
char firstMember[8];
uint64_t secondMember;
};
Is there a way to write a static assert to verify that the offset of secondMember is some multiple of 8 bytes?
Offsetof
You can use the offsetof macro provided by the <cstddef> header. Here I first get the offset, then use the modulus operator to check whether it is a multiple of 8: if the remainder is 0, the offset is indeed a multiple of 8 bytes.
// Offset.cpp
#include <iostream>
#include <cstddef>
#include <cstdint>
struct ExampleStruct {
    char firstMember[8];
    std::uint64_t secondMember;
};
int main()
{
    std::size_t offset = offsetof(ExampleStruct, secondMember);
    if(offset % 8 == 0)
        std::cout << "Offset is a multiple of 8 bytes";
}
Offsetof with static_assert
Or, given the context of this question, the goal is to have a static_assert. That is pretty much the same thing:
// OffsetAssert.cpp
#include <iostream>
#include <cstddef>
#include <cstdint>
struct ExampleStruct {
    char firstMember[8];
    std::uint64_t secondMember;
};
int main()
{
    // If the offset modulo 8 has remainder 0, it is a multiple of 8 bytes
    static_assert(offsetof(ExampleStruct, secondMember) % 8 == 0, "Not Aligned 8 Bytes");
    std::cout << "Aligned 8 Bytes" << std::endl; // If the assert passes, the member is aligned on 8 bytes
}
Type Uses
I use the std::size_t type because that's the type you normally use to store the sizes of variables, objects, and so on, and also because offsetof expands to an expression of that type, according to cppreference.com:
The macro offsetof expands to an integral constant expression of type std::size_t, the value of which is the offset, in bytes, from the beginning of an object of specified type to its specified member, including padding if any.
References
cppreference
cplusplus
If your type has standard layout, you can use the offsetof macro:
#include <cstddef>
static_assert(offsetof(ExampleStruct, secondMember) % 8 == 0, "Bad alignment");
Since the offsetof macro results in a constant expression, you can use a static assertion to produce a translation error if the condition is not met.
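If you also want that standard-layout precondition checked at compile time, here is a small sketch (my own addition; it assumes C++17 for the _v variable template and redeclares ExampleStruct so it compiles on its own):
#include <cstddef>
#include <cstdint>
#include <type_traits>
struct ExampleStruct {
    char firstMember[8];
    std::uint64_t secondMember;
};
// offsetof is only guaranteed to work for standard-layout types, so assert that first.
static_assert(std::is_standard_layout_v<ExampleStruct>, "offsetof requires a standard-layout type");
static_assert(offsetof(ExampleStruct, secondMember) % 8 == 0, "Bad alignment");
int main() {} // both checks happen at compile time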
http://www.boost.org/doc/libs/1_53_0/libs/multiprecision/doc/html/index.html
I just started exploring this library. There doesn't seem to be a way to convert cpp_int into an array of bytes.
Does anyone know of such functionality?
This is an undocumented way. cpp_int's backend has a limbs() member function, which returns a pointer to the internal limb array.
#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>
namespace mp = boost::multiprecision;
int main()
{
    mp::cpp_int x("11111111112222222222333333333344444444445555555555");
    std::size_t size = x.backend().size();
    mp::limb_type* p = x.backend().limbs();
    for (std::size_t i = 0; i < size; ++i) {
        std::cout << *p << std::endl;
        ++p;
    }
}
result:
10517083452262317283
8115000988553056298
32652620859
This is the documented way of exporting and importing the underlying limb data of a cpp_int (and cpp_float). From the example given in the docs, trimmed down for the specific question:
#include <boost/multiprecision/cpp_int.hpp>
#include <iterator>
#include <vector>
using boost::multiprecision::cpp_int;
int main()
{
    cpp_int i{"2837498273489289734982739482398426938568923658926938478923748"};
    // export into 8-bit unsigned values, most significant bit first:
    std::vector<unsigned char> bytes;
    export_bits(i, std::back_inserter(bytes), 8);
}
This mechanism is quite flexible, as you can save the bytes into other integral types (just remember to specify the number of bits per array element), which in turn works with import_bits, too, if you need to restore a cpp_int from the deserialized sequence.
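For completeness, here is a round-trip sketch based on the same documented export_bits/import_bits functions; the final equality check is my own addition:
#include <boost/multiprecision/cpp_int.hpp>
#include <cassert>
#include <iterator>
#include <vector>
int main()
{
    using boost::multiprecision::cpp_int;
    cpp_int original{"2837498273489289734982739482398426938568923658926938478923748"};
    // Export into 8-bit chunks, most significant byte first (the default).
    std::vector<unsigned char> bytes;
    export_bits(original, std::back_inserter(bytes), 8);
    // Import the same bytes back into a fresh cpp_int.
    cpp_int restored;
    import_bits(restored, bytes.begin(), bytes.end(), 8);
    assert(original == restored);
}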
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main() {
    string x;
    getline(cin,x);
    ofstream o("f:/demo.txt");
    o.write( (char*)&x , sizeof(x) );
}
I get unexpected output: the file does not contain what I wrote into the string.
Why is this?
Please explain.
For example, when I write steve pro I get the output 8/ steve pro ÌÌÌÌÌÌ ÌÌÌÌ in the file.
I expect the output to be steve pro.
You are treating an std::string like something that it is not. It's a complex object that, somewhere in its internals, stores characters for you.
There is no reason to assume that a character array is at the start of the object (&x), and the sizeof the object has no relation to how many characters it may indirectly hold/represent.
You're probably looking for:
o.write(x.c_str(), x.length());
Or just use the built-in formatted I/O mechanism:
o << x;
You seem to have an incorrect model of sizeof, so let me try to get it right.
For any given object x of type T, the expression sizeof(x) is a compile-time constant. C++ will never actually inspect the object x at runtime. The compiler knows that x is of type T, so you can imagine it silently transforming sizeof(x) to sizeof(T), if you will.
#include <iostream>
#include <string>
int main()
{
    std::string a = "hello";
    std::string b = "Stack Overflow is for professional and enthusiast programmers, people who write code because they love it.";
    std::cout << sizeof(a) << std::endl; // this prints 4 on my system
    std::cout << sizeof(b) << std::endl; // this also prints 4 on my system
}
All C++ objects of the same type take up the exact same amount of memory. Of course, since strings have vastly different lengths, they internally store a pointer to a heap-allocated block of memory. But this does not concern sizeof. It couldn't, because as I said, sizeof operates at compile time.
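To make the distinction concrete, here is a short sketch; the exact value sizeof reports is implementation-specific:
#include <iostream>
#include <string>
int main()
{
    std::string s = "steve pro";
    // sizeof measures the string object itself and is fixed at compile time.
    std::cout << sizeof(s) << std::endl; // e.g. 32 with 64-bit libstdc++
    // size() is the number of characters the string currently holds.
    std::cout << s.size() << std::endl; // 9
}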
You get exactly what you write: the raw bytes of the std::string object itself (its internal pointer, size field, and small-string buffer), not the characters it manages...
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
    string x;
    getline(cin,x);
    ofstream o("tester.txt");
    o << x;
    o.close();
}
If you insist on writing a buffer directly, you can use
o.write(x.c_str(), x.size());
PS A little attention to code formatting unclouds the mind
You're passing the object's address to write into the file, whereas the original content lies somewhere else, pointed to by one of its internal pointers.
Try this:
string x;
getline(cin,x);
ofstream o("D:/tester.txt");
o << x;
// or
// o.write( x.c_str() , x.length());