Bit operations on an arbitrarily large bit-array or number - c++

I have a very simple question: In C++, is there an in-built or straightforward way to group a large (~1000) number of bits (or bools) in a single label such that the inbuilt bit operators function as they do for fundamentals?
e.g. for a long you might write:
unsigned long maximum = ~0;
or one might use:
somenum>>;
Is there an analogous way to do this for a block of memory of arbitrary size?
If not, what are some good alternatives ? I have thought of bit <vectors>, a C union, etc., but these all seem to require handwritten routines for the various bit operations.

Yep! It's called std::bitset and does just that.
Hope this helps!

Also, boost::dynamic_bitset might be useful depending on the requirements. I wish it was standard instead of the hack that std::vector<bool> is

Related

Efficient 8-bits and 16-bits computations in OCaml

I am currently working on a project in OCaml where I have to manipulate unsigned integers on 8 bits and on 16 bits. In my context, things can get a little messy, I sometimes want to convert an 8 bit integer into a 16 bits one, or split a 16 bits integer into two 8 bits one. I also want to use all the operations like addition, or the bitwise operations on those. Since there are all these interaction between 8 and 16 bits, I really like the comfort of having separate types for those. However, I still want my program to compute reasonnably efficiently, and I don't want to actually lose too much time casting an integer of a given size into another size. So my question is essentially how should I go about this? I have two main options but I don't know enough about the low-level interpretation of OCaml to comfortably chose:
Option 1 : Use dedicated types
I figured that I can use the Stdint library that is available through opam and has an implementation of the types uint8 and uint16 which are exactly what I am looking for.
Pros
I get very good mileage from the typing and will definitely avoid silly bugs from this
Cons
I have to constantly use the functions Uint8.to_uint16 and Uint16.to_uint8, which might eventually add up to heavy memory usage and poor efficiency of the compiled program, depending on how the precise representation is stored in machine
Option 2 : Encode everything within the type int
This means that all my integers will simply be of type int and I will have to program the addition of two 8-bits integers and of two 16-bits integers in this type, for instance.
Pros:
I think these operations can be programmed in a very efficient way using usual operations and the bitwise operations on the type int.
Cons:
I get essentially nothing from the typing and I have to trust myself to chose the right function at the right time.
Possible workaround
I could use two modules for defining 8-bits and 16-bits integers encoded in an int declared as private. I think that would essentially work like I presented with Option 2. The fact that I chose the type to be private would however mean that I cannot switch from one to the other without running into a typing mistake, thus forcing explicit casts and getting leverage from the type system. Still I expect the casts to be very efficient, since the memory representation of the object won't change.
So I would like to know how you would go about that? Is it worth going through all the trouble, do you think a solution is clearly better, or are they reasonably equivalent?
Bonus
Everytime I want to print (in hexadecimal) the value of a variable a of type uint8, I am writing
Printf.ksprintf "a = %02x" (Uint8.to_int a)
There is again a cast that seems to me a bit silly, I could also use direclty the Uint8.to_string_hex function, but it writes explicitly the 0x in front of the number, which I don't want. Ideally I would like to just write
Printf.ksprintf "a = %02x" (Uint8.to_int a)
Is there a way to change the scopes and do some magic with the Printf to make it happen?
In the stdint library both int8 and int16 are represented as int so there is no real tradeoff between option 1 and option 2,
type int8 = private int
(** Signed 8-bit integer *)
type int16 = private int
(** Signed 16-bit integer *)
The stdint library already provides you the best of two worlds, you have an efficient implementation and type safety. Yes, you need to do these translations but they no-ops and there only for the typechecker.
Also, if you're looking for modular arithmetic (and, in general, modeling machine words and bitvectors) then you can look at our Bitvec library, which we developed as part of the Binary Analysis Platform. It is focused on performance while still providing type safety and a lot of operations. We modeled it based on the latest SMT-LIB specification to give clear semantics to all operations. It uses the excellent Zarith library underneath the hood that enables efficient representation for small and arbitrary-length integers.
Since the modularity is not a property of a bitvector itself, but a property of an operation we do not encode the number of bits in the type and use the same type (and representation) for all bitvectors from 1-bits to thousands-of-bits. However, it is impossible to use mix-match the types incorrectly. E.g., you can use generic functions,
(x + y) mod m8
Or predefined modules for the specified modulus, e.g.,
M8.(x + y)
The library has a minimal number of dependencies, so try it by installing
opam install bitvec
There are also additional libraries like bitvec-order, bitvec-sexp, and bitvec-order that enable further integration with the Core suite of libraries, if you need them.

How to handle a integer value bigger than the max int(64) can store? [duplicate]

I am trying to write a program for finding Mersenne prime numbers. Using the unsigned long long type I was able to determine the value of the 9th Mersenne prime, which is (2^61)-1. For larger values I would need a data type that could store integer values greater than 2^64.
I should be able to use operators like *, *=, > ,< and % with this data type.
You can not do what you want with C natives types, however there are libraries that let handle arbitrarily large numbers, like the GNU Multiple Precision Arithmetic Library.
To store large numbers, there are many choices, which are given below in order of decreasing preferences:
1) Use third-party libraries developed by others on github, codeflex etc for your mentioned language, that is, C.
2) Switch to other languages like Python which has in-built large number processing capabilities, Java, which supports BigNum, or C++.
3) Develop your own data structures, may be in terms of strings (where 100 char length could refer to 100 decimal digits) with its custom operations like addition, subtraction, multiplication etc, just like complex number library in C++ were developed in this way. This choice could be meant for your research and educational purpose.
What all these people are basically saying is that the 64bit CPU will not be capable of adding those huge numbers with just an instruction but you rather need an algorithm that will be able to add those numbers. Such an algorithm would have to treat the 2 numbers in pieces.
And the libraries they listed will allow you to do that, a good exercise would be to develop one yourself (just the algorithm/function to learn how it's done).
There is no standard way for having data type greater than 64 bits. You should check the documentation of your systems, some of them define 128 bits integers. However, to really have flexible size integers, you should use an other representation, using an array for instance. Then, it's up to you to define the operators =, <, >, etc.
Fortunately, libraries such as GMP permits you to use arbitrary length integers.
Take a look at the GNU MP Bignum Library.
Use double :)
it will solve your problem!

Uses and when to use int16_t,int32_t,int64_t and respectively short int,int,long int,long

Uses and when to use int16_t, int32_t, int64_t and respectively short, int, long.
There are too many damn types in C++. For integers when is it correct to use one over the other?
Use the well-defined types when the precision is important. Use the less-determinate ones when it is not. It's never wrong to use the more precise ones. It sometimes leads to bugs when you use the flexible ones.
Use the exact-width types when you actually need an exact width. For example, int32_t is guaranteed to be exactly 32 bits wide, with no padding bits, and with a two's-complement representation. If you need all those requirements (perhaps because they're imposed by an external data format), use int32_t. Likewise for the other [u]intN_t types.
If you merely need a signed integer type of at least 32 bits, use int_least32_t or int_fast32_t, depending on whether you want to optimize for size or speed. (They're likely to be the same type.)
Use the predefined types short, int, long, et al when they're good enough for your purposes and you don't want to use the longer names. short and int are both guaranteed to be at least 16 bits, long at least 32 bits, and long long at least 64 bits. int is normally the "natural" integer type suggested by the system's architecture; you can think of it as int_fast16_t, and long as int_fast32_t, though they're not guaranteed to be the same.
I haven't given firm criteria for using the built-in vs. the [u]int_leastN_t and [u]int_fastN_t types because, frankly, there are no such criteria. If the choice isn't imposed by the API you're using or by your organization's coding standard, it's really a matter of personal taste. Just try to be consistent.
This is a good question, but hard to answer.
In one line: It depends to context:
My rule of thumb:
I'd prefer code performance (speed: less time, then less complexity)
When using existing library I'd follow the library coding style (context).
When coding in a team I'd follow the team coding style (context).
When coding new things I'd use int16_t,int32_t,int64_t,.. when ever possible.
Explanation:
using int (int is system word size) in some context give you performance, but in some other context not.
I'd use uint64_t over unsigned long long because it is concise, but when ever possible.
So It depends to context
A use that I have found for them is when I am bitpacking data for, say, an image compressor. Using these types that precisely specify the number of bytes can save a lot of headache, since the C++ standard does not explicitly define the number of bytes in its types, only the MIN and MAX ranges.
In MISRA-C 2004 and MISRA-C++ 2008 guidelines, the advisory is to prefer specific-length typedefs:
typedefs that indicate size and signedness should be used in place of
the basic numerical types. [...]
This rule helps to clarify the size
of the storage, but does not guarantee portability because of the
asymmetric behaviour of integral promotion. [...]
Exception is for char type :
The plain char type shall be used only for the storage and use of character values.
However, keep in mind that the MISRA guidelines are for critical systems.
Personally, I follow these guidelines for embedded systems, not for computer applications where I simply use an int when I want an integer, letting the compiler optimize as it wants.

What base would be more appropriate for my BigInteger library?

I have developed my own BigInteger library in C++, for didactic purpose, initially I have used base 10 and it works fine for add, subtract and multiply, but for some algorithms such as exponentiation, modular exponentiation and division it appears to be more appropriate use base 2.
I am thinking restart my project from scratch and I would know what base do you think is more adequate and why?. Thanks in advance!
If you look at most BigNum type libraries you will see that they are built on top of the existing "SmallNum" data types. These "SmallNum" data types (short, int, long, float, double, ...) are in binary for too many reasons to count. Rather than a vector of base 10 digits, your code will be much faster (much, much, much faster!) if work with a vector of (for example) unsigned ints.
This is one of those places where performance does count. Suppose you use a BigNum package to solve a problem that could be solved without resorting to BigNums. Even the best BigNum library will be much slower (much, much slower) than will a simplistic, non-BigNum approach. If you try to solve a problem that is beyond the bounds of the standard representations that performance penalty will make things even worse.
The best way to overcome this inherent penalty is to take as much advantage of the builtin types as you possibly can.
A base the same as the word size on your target machine, where you have word x word = doubleword as a primitive. The primitive operations work out neatly in terms of machine instructions.
A base 10 representation makes sense for a BigDecimal type, where you need to need to represent decimal fractions exactly (for financial calculations and the like).
I can't see much benefit to using a base 10 representation for a BigInteger. It makes string conversion really easy, but at the expense of making mathematical operations much more complicated. This doesn't seem like a good tradeoff in most cases.

how do I declare an integer variable of 1024 bits in length?

I'm trying to write an algorithm for a number theory/computer science merged class that can factor large numbers in better than exponential time. I am using the g++ compiler on a 64 bit machine but when I chain together long it will only allow me to do up to 2 longs. Is there any way to tell it to use an arbitrary amount of space for a variable?
If you want just a collection of longs, you can declare an array of longs. But you don't want that. You want https://mattmccutchen.net/bigint/ BigIntegers :-)
Alternatives:
http://gmplib.org/
http://www.mpir.org/
(disclaimer: I haven't tested/used them)
Or if you want to implement them
How to implement big int in C++
I'll add that the C++ std library doesn't contain a big integer implementation (source STL big int class implementation )
You'll need a library. A good one is http://gmplib.org/