Boost.Multiprecision cpp_int - convert into an array of bytes? - c++

http://www.boost.org/doc/libs/1_53_0/libs/multiprecision/doc/html/index.html
I just started exploring this library. There doesn't seem to be a way to convert a cpp_int into an array of bytes.
Does anyone know of such functionality?

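This is an undocumented way. cpp_int's backend has a limbs() member function, which returns a pointer to the internal array of limbs. Note that these are limbs (limb_type, 64 bits wide in the output below), not bytes, and they are stored least significant limb first.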
#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>
namespace mp = boost::multiprecision;
int main()
{
    mp::cpp_int x("11111111112222222222333333333344444444445555555555");
    std::size_t size = x.backend().size();
    mp::limb_type* p = x.backend().limbs();
    for (std::size_t i = 0; i < size; ++i) {
        std::cout << *p << std::endl;
        ++p;
    }
}
result:
10517083452262317283
8115000988553056298
32652620859

This is the documented way of exporting and importing the underlying limb data of a cpp_int (and cpp_bin_float). From the example given in the docs, trimmed down for the specific question:
#include <boost/multiprecision/cpp_int.hpp>
#include <iterator>
#include <vector>
using boost::multiprecision::cpp_int;
cpp_int i{"2837498273489289734982739482398426938568923658926938478923748"};
// export into 8-bit unsigned values, most significant bit first:
std::vector<unsigned char> bytes;
export_bits(i, std::back_inserter(bytes), 8);
This mechanism is quite flexible, as you can save the bytes into other integral types (just remember to specify the number of bits per array element), which in turn works with import_bits, too, if you need to restore a cpp_int from the deserialized sequence.

Related

Converting an std::array of std::bytes to a numeric value

I wonder what a safe way to convert a compact subarray of std::bytes to a numeric value is. I guess this code:
std::array<std::byte, 12> ar = { /* some bytes */ };
uint32_t value = *reinterpret_cast<uint32_t*>(&ar[4]);
is rather error-prone, for example because it depends on how the compiler aligns the values in the array. Thanks in advance!
A reliable approach is to use std::memcpy to copy the target bytes over a uint32_t object. This is well-defined in every version of C++, and the pattern is common enough that compilers can usually optimize out the copy.
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
template<class T, std::size_t N>
T read_int_from_bytes(const std::array<std::byte, N> & data, std::size_t index)
{
    // Written this way so that index + sizeof(T) cannot overflow
    if(index > N || sizeof(T) > N - index) {
        throw std::invalid_argument("read_int index out of bounds");
    }
    // Integer to copy the bytes to
    T result;
    // Copy the bytes
    std::memcpy(&result, &(data[index]), sizeof(result));
    return result;
}
Here is an example. This test creates an array of bytes with values { 0x00, 0x10, ..., 0xB0 } and reads a uint32_t starting at index 4 (the fifth byte).
#include <iostream>
#include <iomanip>
int main()
{
    std::array<std::byte, 12> data{};
    for(std::size_t i = 0; i < data.size(); ++i)
    {
        data[i] = static_cast<std::byte>(0x10 * i);
    }
    std::cout << "0x" << std::hex << read_int_from_bytes<std::uint32_t>(data, 4);
}
The test prints 0x70605040 when I try it on a little-endian machine. You can also see from the generated assembly that the entire function call is optimized out and the result is precalculated, which shows that the compiler was able to reason through the memcpy and remove it entirely.
Beware that the result depends on the endianness of the target platform, that is, whether the first byte is the most significant or the least significant. For many applications this doesn't matter, but if it does, C++20 introduced std::endian, which you can use to check the system's endianness.

Safe way to unpack integer values from a remote device

I get bytes from a remote device via USB protocol. These bytes contain integer data. Is the following code a safe way to unpack them without portability issues (except endianness, which is known)?
#include <iostream>
#include <string>
#include <cstdint>
#include <cstring>
int main()
{
    std::uint8_t someArray[4] = {1,0,0,0};
    std::int32_t someValue = 0;
    std::memcpy(&someValue, someArray, 4);
    std::cout << someValue << std::endl;
}
Yes.
std::memcpy is indeed the way to go. In real-life code, I'd static_assert on the size of types used and check data size at run-time, but nothing more.

C++ auto on int16_t casts to integer

I am pretty new to C++17 and am attempting to understand the decltype keyword and how it pairs with auto.
Below is a snippet of code that produces an unexpected result.
#include <typeinfo>
#include <iostream>
#include <algorithm>
using namespace std;
int main() {
    int16_t mid = 4;
    auto low = mid - static_cast<int16_t>(2);
    auto hi = mid + static_cast<int16_t>(2);
    int16_t val;
    cin >> val;
    val = std::clamp(val,low,hi);
    return 0;
}
Surprisingly, the compiler tells me there is a mismatch in clamp and that low and hi are int. If I change auto to int16_t all is good in the world and all the types are int16_t as expected.
The question I'm posing is, why does auto cast low and hi to int when all of the types are int16_t? Is this a good use case for decltype?
Even after reading cppreference.com, I don't fully understand how decltype works, so excuse my ignorance.
The problem isn't with auto here. When you subtract two int16_t values, the result is an int. We can demonstrate it with this code here:
#include <iostream>
#include <cstdint>
using namespace std;
template<class T>
void print_type(T) {
    std::cout << __PRETTY_FUNCTION__ << std::endl;
}
int main() {
    int16_t a = 10;
    int16_t b = 20;
    print_type(a);
    print_type(b);
    print_type(a - b);
    return 0;
}
a and b are both int16_t (short int on this platform), but adding or subtracting them produces a regular int. This helps prevent overflow and is also there for backwards compatibility.
This behavior comes from the usual arithmetic conversions, which are defined in the C and C++ standards. Their first step is integral promotion, which (roughly speaking) converts any integer type narrower than int to int before the operation. Operands of wider types are converted as well, to a common type. Take some time to read about it; you'll need it quite often.

Node size for unordered_map buckets

I have a program where I want to store kmers (substrings of size k) and the number of times they appear. For this particular application, I'm reading in a file with these values and if the number of times they appear is > 255, it is ok to round down to 255. I thought that if I store the key-value pairs as (string, unsigned char) that might save space compared to storing the key-value pairs as (string, int), but this did not seem to be the case when I checked the max resident size by running /usr/bin/time.
To confirm, I also tried running the following test program where I alternated the type of the value in the unordered_map:
#include <iostream>
#include <unordered_map>
#include <utility>
#include <string>
#include <fstream>
int main() {
    std::unordered_map<std::string, unsigned char> kmap;
    std::ifstream infile("kmers_from_reads");
    std::string kmer;
    int abun;
    while(infile >> kmer >> abun) {
        unsigned char abundance = (abun > 255) ? 255 : abun;
        kmap[kmer] = abundance;
    }
    std::cout << sizeof(*kmap.begin(0)) << std::endl;
}
This did not seem to impact the size of the nodes in the bucket (on my machine it returned 40 for both unsigned char and int values).
I was wondering how the size of the nodes in each bucket is determined.
My understanding of unordered maps is that the C++ standard more or less requires separate chaining and each node in a bucket must have at least one pointer so that the elements are iterable and can be erased (http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html). However, I don't understand how the amount of space to store a value is determined, and it seems like it must also be flexible to accommodate larger values. I also tried looking at the gcc libstdc++ unordered_map header (https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h) but had a hard time understanding what was going on.
Compile and execute this code:
#include <iostream>
#include <unordered_map>
#include <utility>
#include <string>
#include <fstream>
class foo
{
    std::string kmer;
    unsigned char abun;
};
class bar
{
    std::string kmer;
    int abun;
};
int main() {
    std::cout << sizeof(foo) << " " << sizeof(bar) << std::endl;
}
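I get, and you probably will too, 40 40. This is because of alignment requirements. If, for example, std::string contains at least one pointer (which it almost certainly does), the whole class has to be aligned at least as strictly as that pointer (typically an 8-byte boundary on a 64-bit platform, 4 bytes on a 32-bit one), and its size is padded up to a multiple of that alignment.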
Imagine if sizeof(foo) was 39 and you had code that did foo foos[2]. If the pointer in foos[0].kmer was properly aligned, the pointer in foos[1].kmer wouldn't be. That would be a disaster.

(String)Iterator based conversion to int

There are the atox, strtox and stox families that I know of, but I can't seem to find any iterator based string to int conversions in the Standard Library or Boost.
The reason I need them is because I am having a parser whose match result is a range referencing the input string. I might very well have an input string like
...8973893488349798923475...
     ^begin   ^end
so I need 738934883 as an integer.
Of course, I could first take begin and end to construct an std::string to use with any of the above families, but I would very much like to avoid that overhead.
So my question: is there anything in the Standard Library or Boost accepting iterators as input, or do I have to write my own?
Boost does actually support this, using the Lexical Cast library. The following code uses a substring range to parse the number without performing any dynamic allocation:
#include <boost/lexical_cast.hpp>
#include <string>
#include <iostream>
int convert_strings_part(const std::string& s, std::size_t pos, std::size_t n)
{
    return boost::lexical_cast<int>(s.data() + pos, n);
}
int main(int argc, char* argv[])
{
    std::string s = "8973893488349798923475";
    // Expect: 738934883
    std::cout << convert_strings_part(s, 2, 9) << std::endl;
    return 0;
}
The output (tested on OS X with Boost 1.60):
738934883
The lexical cast library has some great features for conversion to and from strings, though it isn't as well known as some of the others for some reason.
Until gavinb's answer, I was not aware of any such library function. My try would have been this, using any of atox and strtox as follows (you could avoid a dependency on boost library then, if wanted):
::std::string::iterator b; // begin of section
::std::string::iterator e; // end of section, pointing at first char NOT to be evaluated
char tmp = *e;
*e = 0;
int n = atoi(&*b);
*e = tmp;
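If you only had const_iterators available, you would have to apply a const_cast to *e before modifying.
Be aware that this solution is not thread-safe, though, because it temporarily modifies the string. It also requires that e can be dereferenced, so it must not be the end iterator.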
You could do it with strstream, but it has been deprecated. Below are two examples, one with strstream and one with Boost.Iostreams array devices:
http://coliru.stacked-crooked.com/a/04d4bde6973a1972
#include <iostream>
#include <string>
#include <strstream>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/iostreams/copy.hpp>
int main()
{
    std::string in = "8973893488349798923475";
    //                  ^^^^
    auto beg = in.begin()+2;
    auto end = in.begin()+6;
    // strstream example - DEPRECATED
    std::istrstream os(&*beg, end-beg);
    int n;
    os >> n;
    std::cout << n << "\n";
    // Boost example
    namespace io = boost::iostreams;
    int n2;
    io::array_source src(&*beg, end-beg);
    io::stream<io::array_source> os2(src);
    os2 >> n2;
    std::cout << n2 << "\n";
    return 0;
}
With modern standard library implementations, std::string(begin, end) is not that bad: the small string optimization (SSO) avoids any allocation for short strings (up to 15 chars in libstdc++ and MSVC, 22 in 64-bit libc++).