Convert byte array to/from boost numeric? - c++

I'm trying to convert a byte array to and from a Boost number with the cpp_int backend. What is a portable way to do this?
The platforms I'm concerned about are all little endian, but can be 32 or 64 bit and can be compiled with different compilers. Some of the ways I've seen to do this break depending on compiler versions and such, and that's what I want to avoid.

The only real difference between x86 and x64 is the size of pointers. So unless it relies on the size of pointers somehow, there shouldn't be much of a problem, especially since a byte is always 8 bits and you've already ruled out endianness problems.
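For reference, one portable way to do the conversion in the original question is Boost.Multiprecision's own import_bits/export_bits helpers (assuming a Boost version recent enough to provide them); the byte order is specified explicitly, so the result does not depend on the host. A minimal sketch:

#include <cstdint>
#include <iterator>
#include <vector>
#include <boost/multiprecision/cpp_int.hpp>

int main()
{
    namespace mp = boost::multiprecision;

    mp::cpp_int value("0x123456789abcdef0123456789abcdef0");

    // Export the number into a byte vector, least significant byte first,
    // matching the little-endian layout the question asks about.
    std::vector<std::uint8_t> bytes;
    mp::export_bits(value, std::back_inserter(bytes), 8, /*msv_first=*/false);

    // Import it back using the same chunk size and byte order.
    mp::cpp_int restored;
    mp::import_bits(restored, bytes.begin(), bytes.end(), 8, /*msv_first=*/false);

    return value == restored ? 0 : 1;
}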

Related

How exactly are fundamental data types assigned to specific architectures

So I got into fundamental data types and I was left with one thing I'm confused about: if I were going to build a 64-bit program, would I have to use data types specifically made for 64-bit architectures? I did some research and it turns out that the 64-bit optimized version of an integer would be long long int. Or does it not matter, and can I do fine with the data types I've already learned?
You may find that some types have different sizes than you're used to. For example, a 32-bit Solaris environment has 4-byte long, but a 64-bit Solaris environment has 8-byte long. Meanwhile, this isn't the case in Visual Studio, which retained 4-byte long.
This is why, if you are relying on extreme range for integer types and need to be completely cross-platform, you should favour more specific types like uint64_t. Otherwise, though, you shouldn't need to worry about this.
Similarly, you'll find that pointer types are no longer 32-bit, but 64-bit, so that they can hold all possible addresses on your shiny new 64-bit system. This shouldn't affect you unless you've done something wrong.
Don't worry about "optimisation" unless you have a serious need to eke out every last nanosecond and you can do better than your compiler, which is unlikely. Just write a descriptive, expressive program that signals your intent, as you always have.
For reference, though, you can look up your platform, environment and compiler, to find out what size the fundamental types have there. It can differ across all three.
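For illustration, a small throwaway program like the following sketch can be compiled on each platform to see what the fundamental and fixed-width types actually measure there:

#include <cstdint>
#include <cstdio>

int main()
{
    // These sizes may differ across platform, environment and compiler...
    std::printf("long      : %zu bytes\n", sizeof(long));
    std::printf("long long : %zu bytes\n", sizeof(long long));
    std::printf("void*     : %zu bytes\n", sizeof(void*));
    // ...while the fixed-width typedefs are the same wherever they exist.
    std::printf("uint64_t  : %zu bytes\n", sizeof(std::uint64_t));
    return 0;
}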

Big Endian and Little Endian support for byte ordering

We need to support 3 hardware platforms: Windows (little endian) and embedded Linux (big and little endian). Our data stream depends on the machine that produces it, and the data needs to be broken into bit fields.
I would like to write a single macro (if possible) to abstract away the detail. On Linux I can use bswap_16/bswap_32/bswap_64 for Little Endian conversions.
However, I can't find this in my Visual C++ includes.
Is there a generic built-in for both platforms (Windows and Linux)?
If not, then what can I use in Visual C++ to do byte swapping (other than writing it myself - hoping some machine optimized built-in)?
Thanks.
On both platforms you have
for short (16-bit): htons() and ntohs()
for long (32-bit): htonl() and ntohl()
The missing htonll() and ntohll() for long long (64-bit) can easily be built from those two. See this implementation for example.
Update-0:
Regarding the example linked above, Simon Richter mentions in a comment that it does not necessarily work. The reason: the compiler might introduce padding bytes somewhere in the unions used. To work around this the unions need to be packed, which might lead to a performance loss.
So here's another fail-safe approach to build the *ll functions: https://stackoverflow.com/a/955980/694576
Update-0.1:
From bames53's comment I tend to conclude that the first example linked above should not be used with C++, but with C only.
Update-1:
To achieve the functionality of the *ll functions on Linux, this approach might be the 'best'.
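For reference, a shift-based implementation along the lines of the linked fail-safe answer could look like the sketch below; the names my_htonll/my_ntohll are illustrative only and are not declared in any standard header:

#include <cstdint>
#include <arpa/inet.h>   // htonl(); on Windows it lives in <winsock2.h>

// Build the 64-bit swap from two 32-bit swaps. On a big-endian host
// htonl() is a no-op, so the value is returned unchanged there.
inline std::uint64_t my_htonll(std::uint64_t value)
{
    const int probe = 1;
    if (*reinterpret_cast<const char*>(&probe) == 1) {   // little-endian host
        const std::uint64_t hi = htonl(static_cast<std::uint32_t>(value >> 32));
        const std::uint64_t lo = htonl(static_cast<std::uint32_t>(value & 0xFFFFFFFFu));
        return (lo << 32) | hi;   // swapped halves, each byte-swapped
    }
    return value;   // already in network byte order
}

inline std::uint64_t my_ntohll(std::uint64_t value)
{
    return my_htonll(value);   // the swap is its own inverse
}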
htons and htonl (and similar macros) are good if you insist on dealing with byte order yourself.
However, it's much better to sidestep the issue by outputting your data in ASCII or similar. It takes a little more room, and it transmits over the net a little more slowly, but the simplicity and futureproofing is worth it.
Another option is to numerically take apart your ints and shorts: you & 0xff and divide by 256 (i.e. shift right by 8) repeatedly. This gives a single format on all architectures; a sketch follows below. But ASCII still has the edge because it's easier to debug with.
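A minimal sketch of that byte-by-byte idea (the function names are made up for illustration):

#include <cstdint>
#include <cstddef>

// Write an unsigned 32-bit value as 4 bytes, least significant byte first,
// regardless of the host's byte order.
void put_u32(std::uint32_t value, unsigned char out[4])
{
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = static_cast<unsigned char>(value & 0xff);
        value >>= 8;   // same effect as dividing by 256
    }
}

// Reassemble the value from the same fixed byte order.
std::uint32_t get_u32(const unsigned char in[4])
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return value;
}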
Not the same names, but the same functionality does exist.
EDIT: Archived Link -> https://web.archive.org/web/20151207075029/http://msdn.microsoft.com/en-us/library/a3140177(v=vs.80).aspx
_byteswap_uint64, _byteswap_ulong, _byteswap_ushort
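To get the single abstraction the question asks for, one option is a thin wrapper over the platform facilities; this is only a sketch, and the MY_BSWAP64 name is made up here:

#include <cstdint>

#if defined(_MSC_VER)
#  include <stdlib.h>                           // _byteswap_uint64 and friends
#  define MY_BSWAP64(x) _byteswap_uint64(x)
#elif defined(__GNUC__)
#  define MY_BSWAP64(x) __builtin_bswap64(x)    // GCC/Clang built-in
#else
#  error "no 64-bit byte swap available for this compiler"
#endif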

Char type on 32 bit vs 64 bit

Here is the following issue:
If I am developing on a 32-bit machine and want my code to be ported to a 64-bit machine, here is the scenario.
My functions internally use a lot of std::strings. Now, if I want to provide an API, can I ask callers to send a char * which I can then use internally? Or ask them to send me a __int64 which I convert to a string?
Another reason to use char * in my API was that at least one Unix implementation of the tool (a different version) picks up data from stdin via argv, which is a char *.
In the Windows version I am not sure what to do. I could just ask for __int64 and then convert it into a string and make it work that way, or just use char * as well?
If you're providing a C++ implementation, then your public interface should just use std::string.
If however for compatibility reasons (which it sounds like you may have) you need to provide a C-style interface, then using char* is precisely the way to do it. In your 32-bit library it will be a 32 bit pointer, and in the 64 bit version of the library it will be 64 bits. This will then agree with the client users' expectations regarding the API. You should absolutely convert to a std::string inside your library at the earliest possible point however.
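A minimal sketch of that pattern (the function name process_name is hypothetical):

#include <string>

// C-style entry point: the pointer is 32 bits wide in a 32-bit build and
// 64 bits wide in a 64-bit build, which matches what C clients expect.
extern "C" int process_name(const char* name)
{
    if (name == nullptr)
        return -1;

    std::string value(name);   // convert at the earliest possible point
    // ... the rest of the library works with std::string internally ...
    return static_cast<int>(value.size());
}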
You seem somewhat confused. If the code you are writing is used only within the target machine, recompiling will take care of most of the problems. Just don't rely on a specific memory layout and you are fine. Using strings (as opposed to wstrings) probably means that the character encoding is UTF-8 (if not, reconsider), and thus a limited form of data exchange (e.g. files) between platforms is also fine.
In this case, your interface decision comes down to selecting between (const) std::string(&) and (const) char*, integer_type (don't rely on the null terminator, please). The deciding factor is whether or not you anticipate a need to support other compilers or programming languages.
Now, if you intend to make the interface callable from other machines (i.e. a network interface), you have a much tougher job. In that case, specify the size of everything explicitly.
char is always one byte in size, both on 32-bit and 64-bit systems. However, using the std library is not the worst choice. ;) std should cope with different platforms as it is platform independent for the "most" part...
Converting to/from char* doesn't really help if you can't represent the number on your architecture.
If you are converting a 64-bit integer from its decimal (or hexadecimal) textual representation into a value, you still need 64 bits to store it.
You would do well to convert to std::string at the earliest opportunity; it is the recommended/standard approach for C++ and will help do away with all your char* problems.
There are a few scenarios you can follow to write portable code; see this question:
How to do portable 64 bit arithmetic, without compiler warnings
You would have problems achieving binary portability between different architectures; C++ provides for source-level portability.

32 bit pointer in a 64 bit Solaris compile

I know this is a strange question, but I was wondering if it is possible to create a 32-bit pointer in a 64-bit compile on Solaris using g++. The final object would need to be 64-bit; however, one of my pointer offsets becomes larger on Solaris than it is on Windows if I do use a 64-bit compile. This is causing a big problem. I was wondering if it is possible to make a 32-bit pointer within my 64-bit compiled object.
Pointer size is a property of your target architecture, so you cannot mix and match 32- and 64-bit pointers. I would strongly suggest re-thinking your design (which smells like the usual mistake of casting pointers to integers and back). You can theoretically work with "limited-reach" offsets, but again, please ask yourself why, and what would be a better way of doing it.
You can't change regular pointers; the size of a pointer is sizeof(void *). And if you could, what would you do with a 32-bit pointer on a 64-bit system?
Do you mean pointers in C or do you maybe mean pointers to a file offset?
If you have a pointer type there, then you shouldn't make it 32-bit in a 64-bit program. If it is just some offset that is not related to the memory model, then you could use a different type with a stable size across platforms, something like uint32_t.
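For illustration, a struct along those lines (the field names are made up) keeps the same layout in 32-bit and 64-bit builds by storing an offset instead of a pointer:

#include <cstdint>

struct record_header {
    std::uint32_t payload_offset;   // offset into a file/buffer: 4 bytes everywhere
    std::uint32_t payload_size;     // fixed width, unlike a pointer member
};

// A real address is recomputed locally when needed:
inline const unsigned char* payload(const unsigned char* base,
                                    const record_header& h)
{
    return base + h.payload_offset;
}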
It does not make sense to "need" a 32-bit pointer on a 64-bit machine. I also don't understand this line:
The final object would need to be 64 bit however
I would take a closer look and try to fix the bug on your end. If you post some example code we may be able to help more.

long long implementation in 32 bit machine

As per the C99 standard, the size of long long should be a minimum of 64 bits. How is this implemented on a 32-bit machine (e.g. addition or multiplication of two long longs)? Also, what is the equivalent of long long in C++?
The equivalent in C++ is long long as well. It's not required by the standard, but most compilers support it because it's so useful.
How is it implemented? Most computer architectures already have built-in support for multi-word additions and subtractions. They don't do 64-bit additions directly but use the carry flag and a special add-with-carry instruction to build a 64-bit add from two 32-bit adds.
The same extension exists for subtraction as well (the carry is called borrow in these cases).
Longword multiplications and divisions can be built from smaller multiplications without the help of carry-flags. Sometimes simply doing the operations bit by bit is faster though.
There are architectures that don't have any flags at all (some DSP chips and simple microcontrollers). On these architectures the overflow has to be detected with logic operations. Multi-word arithmetic tends to be slow on these machines.
On the IA-32 architecture, 64-bit integers are implemented using two 32-bit registers (eax and edx).
There are platform-specific equivalents for C++, and you can use the stdint.h header where available (Boost provides you with one).
As everyone has stated, a 64-bit integer is typically implemented by simply using two 32-bit integers together. Then clever code generation is used to keep track of the carry and/or borrow bits to keep track of overflow, and adjust accordingly.
This of course makes such arithmetic more costly in terms of code space and execution time, than the same code compiled for an architecture with native support for 64-bit operations.
If you care about bit-sizes, you should use
#include <stdint.h>
int32_t n;
and friends. This works for C++ as well.
64-bit numbers on 32-bit machines are implemented as you think, by 4 extra bytes. You could therefore implement your own 64-bit datatype by doing something like this:
struct my_64bit_integer {
    uint32_t low;
    uint32_t high;
};
You would of course have to implement mathematical operators yourself.
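For instance, addition with the carry handling described earlier might look like this sketch (unsigned only, assuming the struct above):

my_64bit_integer add(const my_64bit_integer &a, const my_64bit_integer &b)
{
    my_64bit_integer r;
    r.low = a.low + b.low;
    // If the low word wrapped around, propagate a carry of 1 into the high word.
    r.high = a.high + b.high + (r.low < a.low ? 1u : 0u);
    return r;
}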
There is an int64_t in the stdint.h that comes with my GCC version,
and in Microsoft Visual C++ you have an __int64 type as well.
The next C++ standard (due 2009, or maybe 2010) is slated to include the long long type. As mentioned earlier, it's already in common use.
The implementation is up to the compiler writers, although computers have always supported multiple precision operations. Some languages, like Python and Common Lisp, require support for indefinite-precision integers. Long ago, I wrote 64-bit multiplication and division routines for a computer (the Z80) that could manage 16-bit addition and subtraction, with no hardware multiplication at all.
Probably the easiest way to see how an operation is implemented on your particular compiler is to write a code sample and examine the assembler output, which is available from all the major compilers I've worked with.