Porting a Linux 32-bit app to 64-bit? - C++

I'm about to port a very large-scale application to 64 bits. I've noticed that there are some articles on the web which show many pitfalls of this kind of porting. I wondered if there is any tool which can assist in porting to 64 bits, meaning finding the places in the code that need to be changed... Maybe GCC with warnings enabled? Is it good enough? Is there anything better?
EDIT: Guys, I am searching for a tool, if any, that might complement the compiler. I know GCC can assist, but I doubt it will find all the unportable problems that will only be discovered at run time... Maybe a static code analysis tool that emphasizes porting to 64 bits?
Thanks

Here's a guide. Another one
The sizes of some data types differ between 32-bit and 64-bit OSes, so check for places where the code assumes the size of a data type. E.g., if you were casting a pointer to an int, that won't work on 64-bit Linux, where int stays 32 bits but pointers become 64 bits. This should fix most of the issues.
If your app uses third-party libraries, make sure those work in 64-bit too.
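A minimal sketch of the pointer-to-int problem mentioned above, assuming an LP64 target such as x86_64 Linux. The helper names are made up for illustration. Round-tripping a pointer through std::uintptr_t preserves it, whereas an int has only 32 bits to hold a 64-bit address; in my experience g++ rejects a plain (int)ptr cast on such a target ("loses precision"), while a C compiler usually only warns.

```cpp
#include <cstdint>

// Assumption: an LP64 target such as x86_64 Linux (int is 32 bits, pointers are 64 bits).
// std::uintptr_t (optional in the standard, but present on mainstream 32- and 64-bit
// targets) is wide enough to hold any void* and convert back unchanged.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t must be at least as wide as a pointer");

// Hypothetical helper: the portable replacement for old (int)ptr / (void*)i pairs.
void* roundTripViaUintptr(void* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>(bits);
}

// Hypothetical helper: true when the address does not fit in 32 bits, i.e. exactly
// the case where squeezing the pointer into a 32-bit int loses information.
bool addressExceeds32Bits(const void* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return bits > UINT32_MAX;
}
```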

A good tool is called grep ;-) Do
grep -nH -e '\<int\>\|\<short\>\|\<long\>' *
and replace all bare uses of these basic integer types with the proper one:
array indices should be size_t
pointer-to-integer casts should use uintptr_t
pointer differences should be ptrdiff_t
types with an assumption of width N should be uintN_t (or intN_t when signed)
and so on; I probably forgot some. Then gcc with all warnings on will tell you about the rest. You could also use clang as a compiler; it gives even more diagnostics.
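A sketch of what the replacements in the list above look like in practice (the function and struct names are hypothetical). Compiling with g++ -Wall -Wextra -Wconversion should then flag remaining spots where, for example, a size_t is silently narrowed to int.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Indices and counts: std::size_t is as wide as the address space (32 or 64 bits).
std::size_t countNonZero(const std::vector<int>& values) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] != 0) ++count;
    }
    return count;
}

// Pointer differences: std::ptrdiff_t, not int, so large arrays cannot overflow the result.
std::ptrdiff_t elementDistance(const int* first, const int* last) {
    return last - first;
}

// Fields whose width is part of an external format: exact-width types.
// Hypothetical header layout; with unsigned long instead of std::uint32_t it would
// be 8 bytes larger on 64-bit Linux than on 32-bit Linux.
struct FileHeader {
    std::uint32_t magic;
    std::uint32_t version;
    std::uint64_t payloadSize;
};
static_assert(sizeof(FileHeader) == 16, "header size must be the same on every target");
```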

First off, why would there be 'porting'?
Consider that most distros have merrily provided 32- and 64-bit variants for well over a decade. So unless you programmed in a truly unportable manner (and you almost have to try), you should be fine.

What about compiling the project on a 64-bit OS? The gcc compiler looks like such a tool :)

Here is a link to an Oracle webpage that talks about issues commonly encountered porting a 32bit application to 64bit:
http://www.oracle.com/technetwork/server-storage/solaris/ilp32tolp64issues-137107.html
One section talks about how to use lint to detect some common errors. Here is a copy of that section:
Use the lint Utility to Detect Problems with 64-bit long and Pointer Types
Use lint to check code that is written for both the 32-bit and the 64-bit compilation environment. Specify the -errchk=longptr64 option to generate LP64 warnings. Also use the -errchk=longptr64 flag which checks portability to an environment for which the size of long integers and pointers is 64 bits and the size of plain integers is 32 bits. The -errchk=longptr64 flag checks assignments of pointer expressions and long integer expressions to plain integers, even when explicit casts are used.
Use the -errchk=longptr64,signext option to find code where the normal ISO C value-preserving rules allow the extension of the sign of a signed-integral value in an expression of unsigned-integral type. Use the -m64 option of lint when you want to check code that you intend to run in the Solaris 64-bit SPARC or x86 64-bit environment.
When lint generates warnings, it prints the line number of the offending code, a message that describes the problem, and whether or not a pointer is involved. The warning message also indicates the sizes of the involved data types. When you know a pointer is involved and you know the size of the data types, you can find specific 64-bit problems and avoid the pre-existing problems between 32-bit and smaller types.
You can suppress the warning for a given line of code by placing a comment of the form "NOTE(LINTED())" on the previous line. This is useful when you want lint to ignore certain lines of code such as casts and assignments. Exercise extreme care when you use the "NOTE(LINTED())" comment because it can mask real problems. When you use NOTE, also include #include. Refer to the lint man page for more information.
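To illustrate the sign-extension case that -errchk=longptr64,signext targets, here is a small sketch of my own (not from the Oracle page; the function names are hypothetical). The code is legal on both targets, but its result changes with the width of unsigned long.

```cpp
#include <climits>

// A negative int converted to unsigned long becomes ULONG_MAX + 1 + value. On two's
// complement hardware that looks like sign extension, and the result depends on the
// width of unsigned long:
//   ILP32 (32-bit Linux/Solaris): 4294967295
//   LP64  (64-bit Linux/Solaris): 18446744073709551615
unsigned long widenOffset(int offset) {
    return offset;  // implicit signed -> unsigned conversion
}

// Hypothetical legacy check written against the 32-bit result; it silently
// returns false for -1 once unsigned long is 64 bits wide.
bool isAllOnes32(int offset) {
    return widenOffset(offset) == 0xFFFFFFFFUL;
}
```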

Related

How exactly are fundamental data types assigned to specific architectures

So I got into fundamental data types and was left with one thing I'm confused about: if I was going to build a 64-bit program, would I have to use data types specifically made for 64-bit architecture? I did some research and it turns out that the 64-bit optimized version of integer would be long long int. Or does it not matter, and I can do fine with the data types I've learned already?
You may find that some types have different sizes than you're used to. For example, a 32-bit Solaris environment has a 4-byte long, but a 64-bit Solaris environment has an 8-byte long. Meanwhile, this isn't the case with Visual Studio on 64-bit Windows, which retained a 4-byte long.
This is why, if you are relying on extreme range for integer types and need to be completely cross-platform, you should favour more specific types like uint64_t. Otherwise, though, you shouldn't need to worry about this.
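A small sketch of the point about long versus the fixed-width types (the function name is hypothetical):

```cpp
#include <cstdint>

// long is 4 bytes on 32-bit Linux and on Windows (both 32- and 64-bit),
// but 8 bytes on 64-bit Linux, so it is a poor choice when you need values
// beyond 2^31 - 1 everywhere. std::int64_t has the same width wherever it exists.
static_assert(sizeof(std::int64_t) == 8, "int64_t is exactly 64 bits");

// Hypothetical example: a byte offset that would overflow a 32-bit long
// (3,000,000 blocks * 4,096 bytes = 12,288,000,000).
std::int64_t byteOffset(std::int64_t blocks, std::int64_t blockSize) {
    return blocks * blockSize;
}
```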
Similarly, you'll find that pointer types are no longer 32-bit, but 64-bit, so that they can hold all possible addresses on your shiny new 64-bit system. This shouldn't affect you unless you've done something wrong.
Don't worry about "optimisation" unless you have a serious need to eke out every last nanosecond and you can do better than your compiler, which is unlikely. Just write a descriptive, expressive program that signals your intent, as you always have.
For reference, though, you can look up your platform, environment and compiler, to find out what size the fundamental types have there. It can differ across all three.
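If you'd rather ask the compiler than look it up, a hypothetical helper like the one below reports the data model from sizeof. It only recognises the three common models and returns "other" for anything else.

```cpp
#include <string>

// Hypothetical helper that names the data model from the sizes the compiler reports.
// Common cases: ILP32 (32-bit Linux and Windows), LP64 (64-bit Linux, macOS, Solaris),
// LLP64 (64-bit Windows, where long stays 4 bytes).
std::string dataModel() {
    const bool int32 = sizeof(int) == 4;
    if (int32 && sizeof(long) == 4 && sizeof(void*) == 4) return "ILP32";
    if (int32 && sizeof(long) == 8 && sizeof(void*) == 8) return "LP64";
    if (int32 && sizeof(long) == 4 && sizeof(void*) == 8) return "LLP64";
    return "other";
}
```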

Will a statically linked C++ binary work on every system with the same architecture?

I'm making a very simple program in C++ for Linux, and I'd like to know if it is possible to make just one big binary containing all the dependencies that would work on any Linux system.
If my understanding is correct, any compiler turns source code into machine instructions, but since there are often common parts of code that can be reused by different programs, most programs depend on other libraries.
However, if I have the source code for all my dependencies, I should be able to compile a binary in a way that would not require anything from the system, right? Will I be able to run stuff compiled on a 64-bit system on a 32-bit system?
In short: Maybe.
The longer answer is:
It depends. You can't, for example, run a 64-bit binary on a 32-bit system; that's just not even nearly possible. Yes, it's the same processor family, but the 64-bit system has twice as many registers, and they are twice as wide. What is the 32-bit processor going to "give back" for the value of bits and registers that don't exist in its hardware? It just plain won't work. Some of the instructions also completely change meaning, so the system really needs to be "right" for the compiled code, or it won't work; fortunately, Linux will check this and plain refuse if it's not right.
You can BUILD a 32-bit binary on a 64-bit system (assuming you have all the right libraries, etc, installed for both 64- and 32-bit, etc).
Similarly, if you try to run ARM code on an x86 processor, or MIPS code on an ARM processor, it simply has no chance of working, because the actual instructions are completely different (or they would be in breach of some patent/copyright or similar, because processor instruction sets contain portions that are "protected intellectual property" in nearly all cases, so designers have to make sure they do NOT do "the same as someone else's design"). As with 32-bit and 64-bit, you simply won't get a chance to run the wrong binary here; it just won't work.
Sometimes, there are subtle differences, for example ARM code can be compiled with "hard" or "soft" floating point. If the code is compiled for hard float, and there isn't the right support in the OS, then it won't run the binary. Worse yet, if you compile on x86 for SSE instructions, and try to run on a non-SSE processor, the code will simply crash [unless you specifically build code to "check for SSE, and display error if not present"].
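The "check for SSE, and display error if not present" idea might look like the following sketch. It assumes GCC or Clang on x86, where __builtin_cpu_init and __builtin_cpu_supports exist; the function name is hypothetical. Since x86_64 always has SSE2, this mostly matters for 32-bit builds compiled with -msse2.

```cpp
// Hypothetical start-up check for a binary that contains SSE2 code paths.
// __builtin_cpu_init() and __builtin_cpu_supports() are GCC/Clang builtins that
// only exist for x86 targets, hence the preprocessor guard.
bool cpuHasSse2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;  // not an x86 GCC/Clang build: this particular check does not apply
#endif
}
```

You would call it first thing in main and print an error and exit if it returns false. One caveat, as I understand it: the compiler is free to use SSE2 instructions anywhere in a translation unit built with -msse2, so the check itself is safest in a file compiled without that flag.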
So, even if you have a binary that passes the above criteria, Linux tends to change a tiny bit between releases, and different distributions have subtle "fixes" that change things. Most of the time, these are completely benign (they fix some obscure corner case that someone found during testing, but the general, non-corner-case behaviour is "normal"). However, if you go from Linux version 2.2 to Linux version 3.15, there will be some substantial differences between the two versions, and the binary from the old one may very well be incompatible with the newer one (and almost certainly the other way around); it's hard to know exactly which versions are and aren't compatible. Between releases that are close, it should work OK as long as you are not specifically relying on some feature that is present in only one (after all, new things ARE added to the Linux kernel from time to time). Here the answer is "maybe".
Note that the above also applies to your implementation of the C and C++ runtime: if you have a "new" C or C++ runtime library that uses Linux kernel feature X, and try to run it on an older kernel from before feature X was implemented (or before it worked correctly for the case the C or C++ runtime is trying to use it), it will fail.
Static linking is indeed a good way to REDUCE the dependency on different releases. It is also a good way to make your binary huge, which may discourage people from downloading it.
Making the code open source is a much better way to solve this problem: then you just distribute your source code and a list of "minimum requirements", and let other people deal with recompiling it.
In practice, it depends on "sufficiently simple". If you're using C++11, you'll quickly find that the C++11 libraries have dependencies on modern libc releases. In turn, those only ship with modern Linux distributions. I'm not aware of any "Long Term Support" Linux distribution which today (June 2014) ships with libc support for GCC 4.8.
The short answer is no, at least not without a serious hack.
Different Linux distributions may have different glue code between user space and the kernel. For instance, a hello world seemingly without dependencies built on Ubuntu cannot be executed under CentOS.
EDIT: Thanks for the comment. I re-verified this, and the cause is that I'm using a 32-bit VM. Sorry for causing confusion. However, as noted above, the rule of thumb is that even the same Linux distribution may sometimes break compatibility in order to deploy a bugfix, so the conclusion stands.

Big Endian and Little Endian support for byte ordering

We need to support 3 hardware platforms - Windows (little Endian) and Linux Embedded (big and little Endian). Our data stream is dependent on the machine it uses and the data needs to be broken into bit fields.
I would like to write a single macro (if possible) to abstract away the detail. On Linux I can use bswap_16/bswap_32/bswap_64 for Little Endian conversions.
However, I can't find this in my Visual C++ includes.
Is there a generic built-in for both platforms (Windows and Linux)?
If not, then what can I use in Visual C++ to do byte swapping (other than writing it myself - hoping some machine optimized built-in)?
Thanks.
On both platforms you have
for short (16bit): htons() and ntohs()
for long (32bit): htonl() and ntohl()
The missing htonll() and ntohll() for long long (64-bit) could easily be built from those two. See this implementation for example.
Update-0:
For the example linked above, Simon Richter mentions in a comment that it does not necessarily work. The reason is that the compiler might introduce extra bytes somewhere in the unions used. To work around this, the unions need to be packed, which might lead to a performance loss.
So here's another fail-safe approach to build the *ll functions: https://stackoverflow.com/a/955980/694576
Update-0.1:
From bames53's comment, I tend to conclude that the first example linked above should not be used with C++, only with C.
Update-1:
To achieve the functionality of the *ll functions on Linux, this approach might be the 'best'.
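A sketch of building the 64-bit conversions from htonl() with shifts instead of a union, which avoids the padding and type-punning concerns raised in the updates above. The names myHtonll/myNtohll are hypothetical (chosen so they don't clash with platforms that do ship an htonll), and the header is the POSIX one.

```cpp
#include <cstdint>
#include <arpa/inet.h>  // htonl(); on Windows use <winsock2.h> instead

// Hypothetical helpers (POSIX has no htonll/ntohll). Shifts and masks are used
// instead of a union, so there is no padding or type-punning concern.
std::uint64_t myHtonll(std::uint64_t host) {
    if (htonl(1) == 1) {
        return host;  // big-endian host: host order already is network order
    }
    const std::uint32_t high = static_cast<std::uint32_t>(host >> 32);
    const std::uint32_t low = static_cast<std::uint32_t>(host & 0xFFFFFFFFu);
    // Swap the two halves and byte-swap each one.
    return (static_cast<std::uint64_t>(htonl(low)) << 32) | htonl(high);
}

// The conversion is its own inverse.
std::uint64_t myNtohll(std::uint64_t network) {
    return myHtonll(network);
}
```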
htons and htonl (and similar macros) are good if you insist on dealing with byte sex.
However, it's much better to sidestep the issue by outputting your data in ASCII or similar. It takes a little more room, and it transmits over the net a little more slowly, but the simplicity and futureproofing is worth it.
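A minimal sketch of the text approach, with hypothetical helper names: the value is written as decimal digits and parsed back, so neither side cares about byte order.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helpers: write a value as decimal text plus a newline, and read it back.
// "4294967296" means the same thing on every host, whatever its byte order.
std::string encodeAsText(std::uint64_t value) {
    return std::to_string(value) + '\n';
}

std::uint64_t decodeFromText(const std::string& line) {
    std::istringstream in(line);
    std::uint64_t value = 0;
    in >> value;  // assumption: well-formed input; real code should check `in` for failure
    return value;
}
```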
Another option is to numerically take apart your ints and shorts: you & 0xff and divide by 256 repeatedly. This gives a single format on all architectures. But ASCII still has the edge because it's easier to debug.
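The same idea with masks and shifts (for unsigned values, shifting right by 8 is the same as dividing by 256). This sketch, with hypothetical names, writes the most significant byte first, i.e. the same order htonl() produces, and depends only on arithmetic, not on the host's byte order.

```cpp
#include <array>
#include <cstdint>

// Split a 32-bit value into 4 bytes, most significant first, using only arithmetic.
std::array<unsigned char, 4> packU32BigEndian(std::uint32_t value) {
    return {{
        static_cast<unsigned char>((value >> 24) & 0xFF),
        static_cast<unsigned char>((value >> 16) & 0xFF),
        static_cast<unsigned char>((value >> 8) & 0xFF),
        static_cast<unsigned char>(value & 0xFF),
    }};
}

// Reassemble the value; again independent of the host's byte order.
std::uint32_t unpackU32BigEndian(const std::array<unsigned char, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
           static_cast<std::uint32_t>(bytes[3]);
}
```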
Not the same names, but the same functionality does exist.
EDIT: Archived Link -> https://web.archive.org/web/20151207075029/http://msdn.microsoft.com/en-us/library/a3140177(v=vs.80).aspx
_byteswap_uint64, _byteswap_ulong, _byteswap_ushort
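Since the question asks for one spelling on both platforms, here is a sketch that wraps the MSVC functions above and, on GCC/Clang, the __builtin_bswap16/32/64 builtins (used instead of bswap_16/32/64 from <byteswap.h> so the code does not depend on glibc). The wrapper names are hypothetical, and inline functions are used instead of a macro so the argument type is checked.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>  // _byteswap_ushort, _byteswap_ulong, _byteswap_uint64
#endif

// Hypothetical wrappers: MSVC functions on Windows, GCC/Clang builtins elsewhere.
inline std::uint16_t byteSwap16(std::uint16_t v) {
#if defined(_MSC_VER)
    return _byteswap_ushort(v);
#else
    return __builtin_bswap16(v);
#endif
}

inline std::uint32_t byteSwap32(std::uint32_t v) {
#if defined(_MSC_VER)
    return _byteswap_ulong(v);  // unsigned long is 32 bits on Windows
#else
    return __builtin_bswap32(v);
#endif
}

inline std::uint64_t byteSwap64(std::uint64_t v) {
#if defined(_MSC_VER)
    return _byteswap_uint64(v);
#else
    return __builtin_bswap64(v);
#endif
}
```

These swap unconditionally; to convert to or from a fixed byte order you would call them only when the host's order differs from the data's.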

Integers greater than 4294967295 on 32-bit Windows

I'm trying to get to grips with C++ basics by building a simple arithmetic calculator application. Right now I'm trying to figure out how to make it capable of dealing with integers greater than 4294967295 on 32-bit Windows. I know that Windows' integrated Calculator is capable of this. What have I missed?
Note that this application should be compilable with both MSVC compiler and g++ (MinGW/GCC).
Thank you.
If you want to be both gcc and msvc compatible use <stdint.h>. It's source code compatible with both.
You probably want uint64_t for this. It will get you up to 18,446,744,073,709,551,615.
There are also libraries that handle integers as large as your memory allows.
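A minimal sketch with uint64_t (the function name is hypothetical):

```cpp
#include <cstdint>
#include <limits>

// <cstdint> (or <stdint.h>) is available in MSVC 2010 and later and in MinGW g++.
static_assert(std::numeric_limits<std::uint64_t>::max() == 18446744073709551615ULL,
              "uint64_t tops out at 2^64 - 1");

// Hypothetical calculator step: the sum no longer wraps at 4294967295
// the way a 32-bit unsigned int would.
std::uint64_t addUnsigned64(std::uint64_t a, std::uint64_t b) {
    return a + b;
}
```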
Use __int64 to get 64-bit int calculations in Visual C++ - not sure if GCC will like this, though.
You could create a header file that typedefs (say) MyInt64 to the appropriate thing for each compiler. Then you can work internally with MyInt64, and the compiled code will be correct for each target. This is a pretty standard way of supporting different target compilers on one source codebase.
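A sketch of such a header; the file name, the include guard and the MyUInt64 companion are hypothetical additions. __int64 is MSVC's spelling, and long long is used for GCC/MinGW.

```cpp
// Hypothetical "myint64.h" along the lines described above.
#ifndef MYINT64_H
#define MYINT64_H

#if defined(_MSC_VER)
typedef __int64 MyInt64;              // MSVC-specific 64-bit type
typedef unsigned __int64 MyUInt64;
#else
typedef long long MyInt64;            // GCC/MinGW: long long is 64 bits on these targets
typedef unsigned long long MyUInt64;
#endif

static_assert(sizeof(MyInt64) == 8, "MyInt64 must be 64 bits");

#endif  // MYINT64_H
```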
As far as I can tell, long long would work OK for both, but I have not used GCC, so YMMV; see here for GCC info and here for Visual C++.
You could also create a "Large Number" class that would basically store the value across multiple variables in one form or another.
There are different solutions. If 2^64 is big enough for you, you can use a 64-bit integer type (these are implementation-dependent, so search for your particular compiler). On the other hand, if you want to be able to handle any number, you will have to use or implement a BigInteger type that encapsulates it. The implementation is an interesting exercise: basically, use a vector of a smaller type, operate on each subelement, and then merge and normalize the result.
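A minimal sketch of the vector-of-smaller-type approach, supporting only addition of non-negative numbers. It is my own illustration, not a library API: the class name is hypothetical, it stores base-10^9 limbs in std::uint32_t (least significant first) so that printing is easy, and it assumes the constructor gets a non-empty string of decimal digits. "Normalize" here means propagating the carry and stripping leading zero limbs.

```cpp
#include <algorithm>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal unsigned big integer: base-1e9 limbs, least significant first.
class BigUnsigned {
public:
    // Assumption: `digits` is a non-empty string of decimal digits.
    explicit BigUnsigned(const std::string& digits) {
        // Parse 9 digits at a time, starting from the least significant end.
        for (std::size_t end = digits.size(); end > 0;) {
            std::size_t begin = end >= 9 ? end - 9 : 0;
            limbs_.push_back(static_cast<std::uint32_t>(
                std::stoul(digits.substr(begin, end - begin))));
            end = begin;
        }
        normalize();
    }

    BigUnsigned operator+(const BigUnsigned& other) const {
        BigUnsigned result;
        std::uint64_t carry = 0;
        const std::size_t n = std::max(limbs_.size(), other.limbs_.size());
        // Add limb by limb; keep going while a carry is left over.
        for (std::size_t i = 0; i < n || carry != 0; ++i) {
            std::uint64_t sum = carry;
            if (i < limbs_.size()) sum += limbs_[i];
            if (i < other.limbs_.size()) sum += other.limbs_[i];
            result.limbs_.push_back(static_cast<std::uint32_t>(sum % kBase));
            carry = sum / kBase;
        }
        result.normalize();
        return result;
    }

    std::string toString() const {
        std::ostringstream out;
        out << limbs_.back();  // most significant limb without zero padding
        for (std::size_t i = limbs_.size() - 1; i-- > 0;) {
            out << std::setw(9) << std::setfill('0') << limbs_[i];
        }
        return out.str();
    }

private:
    static constexpr std::uint64_t kBase = 1000000000;
    BigUnsigned() = default;

    // Drop leading zero limbs, but always keep at least one limb.
    void normalize() {
        while (limbs_.size() > 1 && limbs_.back() == 0) limbs_.pop_back();
        if (limbs_.empty()) limbs_.push_back(0);
    }

    std::vector<std::uint32_t> limbs_;
};
```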

Potential problems porting to different architectures

I'm writing a Linux program that currently compiles and works fine on x86 and x86_64, and now I'm wondering if there's anything special I'll need to do to make it work on other architectures.
What I've heard is that for cross platform code I should:
Don't assume anything about the size of a pointer, int or size_t
Don't make assumptions about byte order (I don't do any bit shifting -- I assume gcc will optimize my power of two multiplication/division for me)
Don't use assembly blocks (obvious)
Make sure your libraries work (I'm using SQLite, libcurl and Boost, which all seem pretty cross-platform)
Is there anything else I need to worry about? I'm not currently targeting any other architectures, but I expect to support ARM at some point, and I figure I might as well make it work on any architecture if I can.
Also, regarding my second point about byte order, do I need to do anything special with text input? I read files with getline(), so it seems like that should be done automatically as well.
In my experience, once code works well on a couple of architectures, it will port more easily to a third one. Input shouldn't be an issue. Structure alignment may be an issue if you do anything that depends on it, such as writing structures directly to files or sockets.
Pay attention to anything that might be platform-dependent: relying on bitfields being laid out the same way, assuming variables are a particular size, etc. If your code is relatively abstract from the hardware, you will likely encounter few problems. If you are doing something like networking code, you will have to make sure you convert to network byte order properly.
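A sketch of the structure-layout pitfall described above (the struct names are hypothetical): the first struct changes size between 32-bit and 64-bit x86 Linux because of long and compiler-inserted padding, while the second pins down its layout with exact-width members and explicit padding.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record written to a file with fwrite(&rec, sizeof rec, 1, f).
// Its size and padding follow the target: long is 4 bytes on 32-bit x86 Linux
// (sizeof(Record) == 8) and 8 bytes on x86_64 Linux (sizeof(Record) == 16).
struct Record {
    char tag;
    long value;
};

// A layout that is the same on both: exact-width members and explicit padding.
// Byte order still differs between hosts, so on-disk or network data should
// additionally be converted to a fixed order.
struct PortableRecord {
    std::uint8_t tag;
    std::uint8_t reserved[7];  // explicit padding instead of compiler-chosen padding
    std::int64_t value;
};
static_assert(offsetof(PortableRecord, value) == 8, "value must start at byte 8");
static_assert(sizeof(PortableRecord) == 16, "record must be 16 bytes everywhere");
```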
I have ported device drivers from PPC to x86, and then to x86_64; in a few thousand lines, there were maybe a couple changes, primarily related to structure and integer ordering.
The only way to know for sure is to try it, of course.