Efficiently get the most significant byte of an mpz_t struct - c++

I wasn't able to find out how gmpxx stores mpz_t structs under the hood. Thus the only way I can see to get the most significant byte of a number stored as an mpz_t is the mpz_get_str method, but I would expect that to be very slow.
Do you know of a more efficient (and simple) way of doing this?
To clarify: I mean the most significant byte of the number (which in my case is stored as an mpz_t) in binary. I.e. for 12345 (base 10) = 11000000111001 (base 2) it would be 11000000, no matter how gmpxx actually stores it.

Two functions to look at here:
size_t mpz_sizeinbase(const mpz_t op, int base): this returns the length of op in the given base, and for base=2 it gives the number of bits.
void mpz_tdiv_q_2exp(mpz_t q, const mpz_t n, mp_bitcnt_t b): this divides n by 2^b, truncating toward zero, so for non-negative n it is equivalent to q = n >> b. (Note that it is the quotient function mpz_tdiv_q_2exp you want here; mpz_tdiv_r_2exp keeps the remainder, i.e. the low bits.)
Combined, the operation you are looking for is a right bit-shift by exactly sizeinbase - 8:
size_t bit_length = mpz_sizeinbase(number, 2);
mpz_tdiv_q_2exp(top_byte, number, bit_length - 8);
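Putting it together, a minimal sketch (assuming number is non-negative and at least 8 bits long; for shorter numbers the shift count would underflow, so guard for that):

#include <gmp.h>
#include <stdio.h>

int main(void) {
    mpz_t number, top_byte;
    mpz_init_set_ui(number, 12345);   /* 11000000111001 in binary */
    mpz_init(top_byte);

    size_t bit_length = mpz_sizeinbase(number, 2);
    if (bit_length >= 8) {
        /* Shift right so only the top 8 bits remain. */
        mpz_tdiv_q_2exp(top_byte, number, bit_length - 8);
    } else {
        mpz_set(top_byte, number);    /* fewer than 8 bits: the number is the "byte" */
    }

    gmp_printf("%Zd\n", top_byte);    /* prints 192 = 11000000 */

    mpz_clear(number);
    mpz_clear(top_byte);
    return 0;
}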
As a side note, an mpz_t value is stored in "limbs": machine-word-sized integers chained together. The allocation can include unused (zero) high limbs so that small changes to the value don't force a reallocation - so accessing them directly is not recommended.
A limb means the part of a multi-precision number that fits in a single machine word. (We chose this word because a limb of the human body is analogous to a digit, only larger, and containing several digits.) Normally a limb is 32 or 64 bits. The C data type for a limb is mp_limb_t.
(from https://gmplib.org/manual/Nomenclature-and-Types.html#Nomenclature-and-Types)

You can create a union without padding (see the effect of #pragma pack) with two members, your struct and a byte; assign a value to the struct member, then read the byte. However, I am not sure this fits your definition of MSB.
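A sketch of that idea (note this reads the lowest-addressed byte, so on a little-endian machine it yields the least significant byte, not the most significant one; also, reading the inactive member of a union is formally undefined behavior in C++, though compilers commonly support it):

#include <cstdint>
#include <cstdio>

union first_byte_view {
    std::uint32_t value;  // stand-in for "your struct"
    std::uint8_t byte;    // overlaps the lowest-addressed byte of value
};

int main() {
    first_byte_view v;
    v.value = 12345;                 // 0x3039
    std::printf("0x%02x\n", v.byte); // prints 0x39 on little-endian: the LSB, not the MSB
}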

Related

How does "narrowing" work when converting int to char in C++?

I'm a beginner with C++ and had a question about conversions. When converting int to char values, what happens when 127 is exceeded on the ASCII table?
For example,
#include <iostream>
using namespace std;

int main()
{
    double d = 0;
    while (cin >> d) {
        int i = d;
        char c = i;
        int i2 = c;
        cout << "d==" << d << endl;
        cout << "i==" << i << endl;
        cout << "c==" << c << endl;
        cout << "i2==" << i2 << endl;
        cout << "char:(" << c << ")" << endl;
    }
}
Now if the user inputs 150, i becomes 150 since i = d, and c becomes û since c = i, which suggests to me that int 150 == char û.
BUT when int i2 is output to the screen, given that i2 converts char c back into an integer, i2 == -106.
My assumption is that int i2 would also be 150.
I'd appreciate it if someone could explain this to me, as I'm struggling to grasp the concept. I've read that since char can hold 1 byte of information whereas int can hold 4 bytes, the value is "narrowed". I'm not entirely sure what that means, however!
The width of an integer type is roughly the number of bytes (or bits) it contains. So, one type is narrower than another if it has fewer bytes (or bits).
Consider a physical manifestation of int - it's an index card with eight boxes marked on it, and we can write one digit in each box. Maybe it's going to be read by one of those automated optical systems, but anyway we're not allowed to squeeze more digits on there or write outside the boxes.
Now, we have an equivalent card representing a char - it has two boxes marked on it.
The char card can be physically narrower as well, to really hammer home the analogy, but the important thing is that you can only write two digits.
So, in base 10, an int card can store 0-99,999,999, and a char can store 0-99.
Now, I give you an int card with the number 123 written on it, and ask you to copy the value onto a char card. What can you do? You can discard the hundreds digit that doesn't fit, and just write 23. Or I guess you can just throw up your hands in horror and refuse. Typically we want computers to do the former.
This is a narrowing conversion. The char is physically too small (narrow) to fit all the information an int can contain.
Finally, to describe the actual int and char types, we can either use binary (in which case we can only use digits 0 and 1, and the int card has thirty-two boxes while the char card has eight), or we can leave our index cards the same size if we write our digits in base 16 instead of base 10.
There is a further complication in that int is signed, so we also need to represent negative values in our fixed number of digits. The char may be signed or unsigned - it's implementation dependent. If you're interested, you can look up two's complement, which is the most common way of storing signed values, but in general half of the values you can store are going to be negative.
So roughly, the two ways a narrowing conversion can do the wrong thing are:
the narrower type just doesn't have enough digits, so some are cut off
the narrower type can fit all the digits, but is signed, and that particular pattern represents a negative number in the narrow type (assuming it was positive in the wide one)
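A minimal sketch of both effects, assuming an 8-bit char that is signed (as on most mainstream platforms):

#include <iostream>

int main() {
    int i = 150;    // fits easily in an int
    char c = i;     // narrowing: only the low 8 bits survive
    int i2 = c;     // widening back keeps the (now negative) value
    std::cout << i2 << '\n';   // prints -106: bit pattern 10010110 read as signed is 150 - 256

    int big = 300;  // needs 9 bits: 100101100
    char c2 = big;  // the ninth bit is cut off, leaving 00101100
    std::cout << static_cast<int>(c2) << '\n';   // prints 44 (300 - 256)
}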

How to design an INT of 16, 32, 64 bytes or even bigger in C++

As a beginner, I know we can use an ARRAY to store larger numbers if required, but I want a 16-byte INT data type in C++ on which I can perform all the arithmetic operations available on basic data types like INT or FLOAT.
So can we, in effect, increase the size of the default data types as desired, like an int of 64 bytes or a double of 120 bytes - not directly on the basic data type, but something that has the same effect as increasing its capacity?
Is this even possible? If yes, then how, and if not, what are the alternative ways to achieve the same?
Yes, it's possible, but no, it's not trivial.
First, I feel obliged to point out that this is one area where C and C++ really don't provide as much access to the hardware at the lowest level as you'd really like. In assembly language, you normally get a couple of features that make multiple-precision arithmetic quite a bit easier to implement. One is a carry flag. This tracks whether a previous addition generated a carry (or a previous subtraction a borrow). So to add two 128-bit numbers on a machine with 64-bit registers, you'd typically write code on this general order:
; r0 contains the lower 64 bits of the first operand
; r1 contains the upper 64 bits of the first operand
; r2 contains the lower 64 bits of the second operand
; r3 contains the upper 64 bits of the second operand
add r0, r2   ; add the low halves; sets the carry flag on overflow
adc r1, r3   ; add the high halves plus the carry from the low halves
Likewise, when you multiply two numbers, most processors generate the full answer in two separate registers, so when (for example) you multiply two 64-bit numbers, you get a 128-bit result.
In C and C++, however, we don't get that. One easy way to get around it is to work in smaller chunks. For example, if we want a 128-bit type on an implementation that provides 64-bit long long as its largest integer type, we can work in 32-bit chunks. When we're going to do an operation, we widen those to a long long, and do the operation on the long long. This way, when we add or multiply two 32-bit chunks, if the result is larger than 32 bits, we can still store it all in our 64-bit long long.
So, for addition life is pretty easy. We add the two lowest order words. We use a bitmask to get the bottom 32 bits and store them into the bottom 32 bits of the result. Then we take the upper 32 bits, and use them as a "carry" when we add the next 32 bits of the operands. Continue until we've added all 128 (or whatever) bits of operands and gotten our overall result.
Subtraction is pretty similar. In fact, we can do 2's complement on the second operand, then add to get our result.
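A sketch of that addition loop, with a 128-bit value held as four 32-bit chunks (index 0 being the least significant chunk; all names here are illustrative):

#include <cstdint>

// Add two 128-bit numbers, each stored as four 32-bit chunks
// (index 0 = least significant). The 64-bit intermediate captures
// any carry out of each 32-bit addition.
void add128(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    uint64_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t sum = static_cast<uint64_t>(a[i]) + b[i] + carry;
        out[i] = static_cast<uint32_t>(sum);  // keep the low 32 bits
        carry = sum >> 32;                    // the upper bits are the carry
    }
    // A nonzero final carry would mean the result overflowed 128 bits.
}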
Multiplication gets a little trickier. It's not always immediately obvious how we can carry out multiplication in smaller pieces. The usual approach is based on the distributive property. That is, we can take some large numbers A and B and break each into two 32-bit chunks, so that A = a1 * 2^32 + a0 and B = b1 * 2^32 + b0. Then we use the distributive property to turn the product into:
a1 * b1 * 2^64 + (a1 * b0 + a0 * b1) * 2^32 + a0 * b0
This can be extended to an arbitrary number of "chunks", though if you're dealing with really large numbers there are much better ways (e.g., Karatsuba).
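And a sketch of that expansion for a single 64 x 64 -> 128-bit product using 32-bit halves (again just illustrative, not tuned):

#include <cstdint>

// Multiply two 64-bit numbers split into 32-bit halves, producing a
// 128-bit result as two 64-bit words: each partial product carries a
// weight of 2^0, 2^32, or 2^64, exactly as in the expansion above.
void mul64x64(uint64_t a, uint64_t b, uint64_t& hi, uint64_t& lo) {
    uint64_t a0 = a & 0xFFFFFFFFu, a1 = a >> 32;
    uint64_t b0 = b & 0xFFFFFFFFu, b1 = b >> 32;

    uint64_t p00 = a0 * b0;  // weight 2^0
    uint64_t p01 = a0 * b1;  // weight 2^32
    uint64_t p10 = a1 * b0;  // weight 2^32
    uint64_t p11 = a1 * b1;  // weight 2^64

    // Gather the middle column plus the carry out of the low column.
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);
    lo = (p00 & 0xFFFFFFFFu) | (mid << 32);
    hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}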
If you want to define non-atomic big integers, you can use plain structs.
#include <array>
#include <cstddef>
#include <cstdint>

template <std::size_t size>
struct big_int {
    std::array<std::int8_t, size> bytes; // raw storage; you still have to define the operators
};

using int128_t = big_int<16>;
using int256_t = big_int<32>;
using int512_t = big_int<64>;

int main() {
    int128_t i128 = { 0 };
}

How do I multiply an array of ints that represents a single number?

So I have a single int broken up into an array of smaller ints. For example, int num = 136928 becomes int num[3] = {13, 69, 28}. I need to multiply the array by a certain number. The normal operation would be 136928 * 2 == 273856. But I need [13, 69, 28] * 2 to give the same answer as 136928 * 2 would, again in the form of an array - the result should be:
for (int i : arr) {
    i *= 2;
    // Should multiply everything in the array
    // so that arr now equals {27, 38, 56}
}
Any help would be appreciated on how to do this (it also needs to work with multiplying floating-point numbers), e.g. arr * 0.5 should halve everything in the array.
For those wondering, the number has to be split up into an array because it is too large to store in any standard type (64 bytes). Specifically, I am trying to perform a mathematical operation on the result of a SHA-256 hash. The hash function returns the hash as a uint8_t[64].
Consider using Boost.Multiprecision instead. Specifically, the cpp_int type, which is a representation of an arbitrary-sized integer value.
//In your includes...
#include <boost/multiprecision/cpp_int.hpp>
#include <cstdint>
#include <iterator>

//In your relevant code:
bool is_little_endian = /*...*/; //Might need to flip this
uint8_t values[64];
boost::multiprecision::cpp_int value;

//import_bits and export_bits are free functions in boost::multiprecision;
//the last two parameters are the chunk size in bits and whether the most
//significant chunk comes first.
boost::multiprecision::import_bits(
    value,
    std::begin(values),
    std::end(values),
    8,                 //one byte per chunk
    !is_little_endian  //most-significant-first iff the buffer is big-endian
);

//easy arithmetic to perform
value *= 2;

boost::multiprecision::export_bits(
    value,
    std::begin(values),
    8,
    !is_little_endian
);
//values now contains the properly multiplied result
Theoretically this should work with the properly sized type uint512_t, found in the same namespace as cpp_int, but I don't have a C++ compiler to test with right now, so I can't verify. If it does work, you should prefer uint512_t, since it'll probably be faster than an arbitrarily-sized integer.
If you just need to multiply by or divide by two (2), then you can simply shift the bits in each byte that makes up the value.
So for multiplication you start at the least significant byte - the rightmost one, assuming big-endian storage (which I'll assume here). Take the most significant bit of the byte and store it in a temp var (a possible carry bit), shift the byte's other bits to the left, and put the carry bit saved from the previous byte into the now-empty least significant position. Repeat this until you have processed all bytes. You may be left with a single carry bit, which you can toss away if you're performing operations modulo 2^512 (64 bytes).
Division is similar, but you start at the most significant byte and carry each byte's least significant bit into the top of the next byte. Dropping the final rightmost bit gives you the "floor" of the calculation (i.e. three divided by two will be one, not one-and-a-half or two).
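A sketch of the multiplication case on a raw big-endian buffer (the function name and layout are illustrative):

#include <cstdint>
#include <cstddef>

// Multiply a big-endian byte array by 2 in place (modulo 2^(8*len)):
// walk from the least significant byte, shifting each byte left and
// carrying its top bit into the next more significant byte.
void mul2_inplace(uint8_t* bytes, std::size_t len) {
    uint8_t carry = 0;
    for (std::size_t i = len; i-- > 0; ) {      // rightmost (least significant) byte first
        uint8_t next_carry = bytes[i] >> 7;     // the bit that falls off this byte
        bytes[i] = static_cast<uint8_t>((bytes[i] << 1) | carry);
        carry = next_carry;                     // feeds the byte to its left
    }
    // A final nonzero carry is discarded: arithmetic modulo 2^(8*len).
}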
This is useful if
you don't want to copy the bytes or
if you just need bit operations otherwise and you don't want to include a multi-precision / big integer library.
Using a big integer library would be recommended for maintainability.

Make your own data range in c++

I want to have a data variable which will be an integer and its range will be from 0 to 1,000,000.
For example, normal int variables can store numbers from -2,147,483,648 to 2,147,483,647.
I want the new data type to have a smaller range so it can have LESS SIZE.
If there is a way to do that, please let me know.
There isn't; you can't specify arbitrary ranges for variables like this in C++.
You need 20 bits to store 1,000,000 different values, so using a 32-bit integer is the best you can do without creating a custom data type (even then you'd only save 1 byte by going to 24 bits, since you can't allocate fewer than 8 bits at a time).
As for enforcing the range of values, you could do that with a custom class, but I assume your goal isn't the validation but the size reduction.
So, there's no true good answer to this problem. Here are a few thoughts though:
If you're talking about an array of these 20 bit values, then perhaps the answers at this question will be helpful: Bit packing of array of integers
On the other hand, perhaps we are talking about an object, that has 3 int20_ts in it, and you'd like it to take up less space than it would normally. In that case, we could use a bitfield.
#include <cstdio>

struct object {
    int a : 20;
    int b : 20;
    int c : 20;
} __attribute__((__packed__)); // GCC/Clang extension

int main() {
    std::printf("sizeof object: %zu\n", sizeof(struct object));
}

This code will probably print 8, signifying that the struct uses 8 bytes of space, not the 12 that you would normally expect.
You can only have data types whose size is a multiple of 8 bits. This is because, otherwise, the data type wouldn't be addressable. Imagine a pointer to 5 bits of data; that can't exist.

Define smallest possible datatype in c++ that can hold six values

I want to define my own datatype that can hold a single one of six possible values, in order to learn more about memory management in C++. In numbers, I want to be able to hold 0 through 5. In binary, three bits would suffice (101 = 5), although some values (6 and 7) would go unused. The datatype should also consume as little memory as possible.
I'm not sure how to accomplish this. First, I tried an enum with defined values for all the fields. As far as I know, the values are in hex there, so one "hexbit" should allow me to store 0 through 15. But comparing it to a char (with sizeof) shows that it's 4 times the size of a char, and a char holds 0 through 255 if I'm not mistaken.
#include <iostream>

enum Foo
{
    a = 0x0,
    b = 0x1,
    c = 0x2,
    d = 0x3,
    e = 0x4,
    f = 0x5,
};

int main()
{
    Foo myfoo = a;
    char mychar = 'a';

    std::cout << sizeof(myfoo);  // prints 4
    std::cout << sizeof(mychar); // prints 1

    return 1;
}
I've clearly misunderstood something, but fail to see what, so I turn to SO. :)
Also, when writing this post I realised that I clearly lack some parts of the vocabulary. I've made this post a community wiki; please edit it so I can learn the correct words for everything.
A char is the smallest possible type.
If you happen to know that you need several such 3-bit values in a single place, you can use a structure with bitfield syntax:
struct foo {
    unsigned int val1 : 3;
    unsigned int val2 : 3;
};
and hence get 2 of them within one byte. In theory you could pack 10 such fields into a 32-bit "int" value.
C++0x will contain strongly typed enumerations where you can specify the underlying datatype (in your example char), but current C++ does not support this. The standard is not clear about the use of a char here (the examples use int, short and long), but it does mention the underlying integral type, and that would include char as well.
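For reference, the syntax C++0x (now C++11) introduced looks like this; it needs a C++11-capable compiler:

enum Foo : unsigned char { a, b, c, d, e, f };
static_assert(sizeof(Foo) == 1, "Foo occupies a single byte");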
As of today Neil Butterworth's answer to create a class for your problem seems the most elegant, as you can even extend it to contain a nested enumeration if you want symbolical names for the values.
C++ does not express units of memory smaller than bytes. If you're producing them one at a time, that's the best you can do. Your own example works well. If you need just a few, you can use bit-fields as Alnitak suggests. If you're planning on allocating them one at a time, then you're even worse off: most allocators hand out memory in fixed-size aligned units, 16 bytes being a common granularity.
Another choice might be to wrap std::bitset to do your bidding. This will waste very little space, if you need many such values, only about 1 bit for every 8.
If you think about your problem as a number expressed in base 6, and convert that number to base 2, possibly using an unlimited-precision integer (for example GMP), you won't waste any bits at all.
This assumes, of course, that your values have a uniform, random distribution. If they follow a different distribution, your best bet will be general compression of the first example, with something like gzip.
You can store values smaller than 8 or 32 bits. You just need to pack them into a struct (or class) and use bit fields.
For example:
struct example
{
    unsigned int a : 3;  // Three bits, can be 0 through 7.
    bool b : 1;          // One bit, stores 0 or 1.
    unsigned int c : 10; // Ten bits, can be 0 through 1023.
    unsigned int d : 18; // Eighteen bits, can be 0 through 262143.
};
In most cases, your compiler will round the total size of your structure up to 32 bits on a 32-bit platform. The other problem is, as you pointed out, that your values may not have a power-of-two range. That makes for wasted space: if you read the entire struct as one number, you will find values that are impossible to set if your input ranges aren't all powers of 2.
Another feature you may find interesting is a union. They work like a struct, but share memory. So if you write to one field it overwrites the others.
Now, if you are really tight for space and you want to push each bit to the maximum, there is a simple encoding method. Let's say you want to store 3 numbers, each of which can be from 0 to 5. Bit fields are wasteful, because if you use 3 bits each, you'll waste some values (i.e. you could never set 6 or 7, even though you have room to store them). So, let's do an example:
//Here are three example values, each can be from 0 to 5:
const int one = 3, two = 4, three = 5;
To pack them together most efficiently, we should think in base 6 (since each value is from 0-5). So packed into the smallest possible space is:
//This packs all the values into one int, from 0 - 215.
//pack could be any value from 0 - 215. There are no 'wasted' numbers.
int pack = one + (6 * two) + (6 * 6 * three);
See how it looks like we're encoding in base six? Each number is multiplied by its place value, 6^n, where n is the place (starting at 0).
Then to decode:
const int one = pack % 6;
pack /= 6;
const int two = pack % 6;
pack /= 6;
const int three = pack;
These schemes are extremely handy when you have to encode fields in a bar code or in an alphanumeric sequence for human typing. Just saving those few partial bits can make a huge difference. Also, the fields don't all have to have the same range: if one field is from 0 through 7, you'd multiply by 8 instead of 6 at the proper place. There is no requirement that all fields have the same range.
The minimal size you can use is 1 byte.
But if you handle a group of enum values together (writing them to a file or storing them in a container, ...), you can pack the group at 3 bits per value.
You don't have to enumerate the values of the enum:
enum Foo
{
    a,
    b,
    c,
    d,
    e,
    f,
};

Foo myfoo = a;
Here Foo has int as its underlying type, which on your machine takes 4 bytes.
The smallest type is char, which is defined as the smallest addressable data on the target machine. The CHAR_BIT macro yields the number of bits in a char and is defined in limits.h.
[Edit]
Note that generally speaking you shouldn't ask yourself such questions. Always use [unsigned] int if it's sufficient, except when you allocate quite a lot of memory (e.g. int[100*1024] vs char[100*1024], but consider using std::vector instead).
The size of an enumeration is usually the same as that of an int. But depending on your compiler, you may have the option of creating a smaller enum. For example, in GCC, you may declare:
enum Foo {
    a, b, c, d, e, f
} __attribute__((__packed__));
Now, sizeof(Foo) == 1.
The best solution is to create your own type implemented using a char. This should have sizeof(MyType) == 1, though this is not guaranteed.
#include <iostream>
using namespace std;

class MyType {
public:
    MyType(int a) : val(a) {
        if (val < 0 || val > 5) { // six values: 0 through 5
            throw("bad value");
        }
    }
    int Value() const {
        return val;
    }
private:
    char val;
};

int main() {
    MyType v(2);
    cout << sizeof(v) << endl;
    cout << v.Value() << endl;
}
It is likely that packing oddly sized values into bitfields will incur a sizable performance penalty due to the architecture not supporting bit-level operations (thus requiring several processor instructions per operation). Before you implement such a type, ask yourself if it is really necessary to use as little space as possible, or if you are committing the cardinal sin of programming that is premature optimization. At most, I would encapsulate the value in a class whose backing store can be changed transparently if you really do need to squeeze every last byte for some reason.
You can use an unsigned char, and perhaps typedef it as BYTE. It will occupy only one byte.
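For instance (BYTE is just a conventional alias here, not a standard type):

typedef unsigned char BYTE; // one byte, holds 0-255: plenty for values 0-5
BYTE v = 5;
static_assert(sizeof(BYTE) == 1, "BYTE is exactly one byte");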