Is it necessary to take care of long data type in program that is written for Windows and Linux? - c++

According to cppreference, on 64-bit systems:
LLP64 or 4/4/8 (int and long are 32-bit, pointer is 64-bit)
Win64 API
LP64 or 4/8/8 (int is 32-bit, long and pointer are 64-bit)
Unix and Unix-like systems (Linux, Mac OS X)
So how should the long data type be handled in code that is written for both Linux and Windows?

In C and C++, in portable code, you never know the exact size of a type like int or long int. If you move your code to a different compiler (or a different machine, or a different OS), the sizes of some of your types may change. This needn't be a problem; in fact it's only a problem if you want to make it a problem. (All of this has always been the case, and has nothing to do with someone's definitions of "LLP64" and "LP64" architecture families.)
On those (hopefully rare) occasions when you need a type of an exact size, one good way is to use types like int32_t and uint64_t from <cstdint> (or <stdint.h> in C).
But you really, really shouldn't need to specify the exact size of a type, most of the time. (There are those who say you need to specify the exact size of every type, but my advice is to ignore those people.)
Pretty much the only time you need to specify exact sizes is when trying to define a structure which you can read and write in "binary" fashion to conform to some externally-imposed storage layout. But there, specifying the exact sizes of data types isn't generally sufficient, because of issues like alignment, padding, and byte order. So you're better off writing explicit serialization and deserialization code anyway (or using "text" data formats instead, if you can get away with it).
My bottom line is that I rarely worry about the exact sizes of types.

Related

How to write convertible code, 32 bit/64 bit?

A C++-specific question. I read a question about what makes a program 32-bit or 64-bit, and the answer it got was something like this (sorry, I can't find the question; I looked at it some days ago and can't find it again): as long as you don't make any "pointer assumptions", you only need to recompile it. So my question is: what are pointer assumptions? To my understanding there are 32-bit pointers and 64-bit pointers, so I figure it has something to do with that. Please show the difference in code between them. Any other good habits to keep in mind while writing code that make it easy to convert between the two are also welcome, though please share examples with them.
P.S. I know there is this post:
How do you write code that is both 32 bit and 64 bit compatible?
but I thought it was kind of too general, with no good examples for new programmers like myself (e.g. what a 32-bit storage unit is, etc.). I was hoping to break it down a bit more (no pun intended).
In general it means that your program behavior should never depend on the sizeof() of any types (that are not made to be of some exact size), neither explicitly nor implicitly (this includes possible struct alignments as well).
Pointers are just a subset of them, and it probably also means that you should not try to rely on being able to convert between unrelated pointer types and/or integers, unless they are specifically made for this (e.g. intptr_t).
In the same way, you need to take care with anything written to disk: never rely on e.g. built-in types having the same size everywhere.
Whenever you have to (e.g. because of external data formats), use explicitly sized types like uint32_t.
For a well-formed program (that is, a program written according to syntax and semantic rules of C++ with no undefined behaviour), the C++ standard guarantees that your program will have one of a set of observable behaviours. The observable behaviours vary due to unspecified behaviour (including implementation-defined behaviour) within your program. If you avoid unspecified behaviour or resolve it, your program will be guaranteed to have a specific and certain output. If you write your program in this way, you will witness no differences between your program on a 32-bit or 64-bit machine.
A simple (forced) example of a program that will have different possible outputs is as follows:
#include <iostream>

int main()
{
    std::cout << sizeof(void*) << std::endl;
    return 0;
}
This program will likely have different output on 32- and 64-bit machines (but not necessarily). The result of sizeof(void*) is implementation-defined. However, it is certainly possible to have a program that contains implementation-defined behaviour but is resolved to be well-defined:
#include <iostream>

int main()
{
    int size = sizeof(void*);
    if (size != 4) {
        size = 4;
    }
    std::cout << size << std::endl;
    return 0;
}
This program will always print out 4, despite the fact it uses implementation-defined behaviour. This is a silly example because we could have just done int size = 4;, but there are cases when this does appear in writing platform-independent code.
So the rule for writing portable code is: aim to avoid or resolve unspecified behaviour.
Here are some tips for avoiding unspecified behaviour:
Do not assume anything about the size of the fundamental types beyond what the C++ standard specifies. That is, a char is at least 8 bits, both short and int are at least 16 bits, and so on.
Don't try to do pointer magic (casting between pointer types or storing pointers in integral types).
Don't use an unsigned char* to read the value representation of a non-char object (for serialisation or related tasks).
Avoid reinterpret_cast.
Be careful when performing operations that may overflow or underflow. Think carefully when doing bit-shift operations.
Be careful when doing arithmetic on pointer types.
Don't use void*.
There are many more occurrences of unspecified or undefined behaviour in the standard. It's well worth looking them up. There are some great articles online that cover some of the more common differences that you'll experience between 32- and 64-bit platforms.
"Pointer assumptions" means writing code that relies on pointers fitting into other data types, e.g. int copy_of_pointer = (int)ptr;. If int is a 32-bit type, this code will break on 64-bit machines, because only part of the pointer will be stored.
So long as pointers are only stored in pointer types, it should be no problem at all.
Typically, pointers are the size of the "machine word", so on a 32-bit architecture they are 32 bits, and on a 64-bit architecture all pointers are 64-bit. However, there are SOME architectures where this is not true. I have never worked on such machines myself [other than x86 with its "far" and "near" pointers, but let's ignore that for now].
Most compilers will tell you when you convert pointers to integers that the pointer doesn't fit into, so if you enable warnings, MOST of the problems will become apparent - fix the warnings, and chances are pretty decent that your code will work straight away.
There will be no difference between 32-bit code and 64-bit code: unlike assembly language, C/C++ and other programming languages aim for portability.
The only difference will be the distribution you compile your code on. All the work is done automatically by your compiler and linker, so you don't need to think about it.
However, if you are programming on a 64-bit distribution and need an external library such as SDL, that library must also be compiled for 64-bit for your program to build.
Note also that your ELF file will generally be bigger on a 64-bit distribution than on a 32-bit one, which is to be expected.
What about pointers? When you increment or otherwise change a pointer, the compiler advances it by the size of the pointed-to type.
The size of a pointer itself is determined by your processor's register size / the distribution you're working on.
But you don't have to care about this; the compiler does everything for you.
In summary, that's why you can't run a 64-bit ELF file on a 32-bit distribution.
Typical pitfalls for 32bit/64bit porting are:
The implicit assumption by the programmer that sizeof(void*) == 4 * sizeof(char).
If you're making this assumption and e.g. allocate arrays that way ("I need 20 pointers so I allocate 80 bytes"), your code breaks on 64bit because it'll cause buffer overruns.
The "kitten-killer": int x = (int)&something; (and the reverse, void* ptr = (void*)some_int). Again this assumes sizeof(int) == sizeof(void*). It doesn't cause overflows but loses data, namely the upper 32 bits of the pointer.
Both of these issues belong to a class called type aliasing (assuming identity / interchangeability / equivalence at the binary-representation level between two types), and such assumptions are common: on UN*X, assuming time_t, size_t or off_t is an int; on Windows, assuming HANDLE, void* and long are interchangeable; etc.
Assumptions about data structure / stack space usage (see also the ABI point below). In C/C++ code, local variables are allocated on the stack, and the space used there differs between 32-bit and 64-bit mode because of the size and alignment differences described below, and because of the different rules for passing arguments (32-bit x86 usually passes them on the stack, 64-bit x86 partly in registers). Code that just about gets away with the default stack size on 32-bit might crash with a stack overflow on 64-bit.
This is relatively easy to spot as a cause of the crash but depending on the configurability of the application possibly hard to fix.
Timing differences between 32bit and 64bit code (due to different code sizes / cache footprints, or different memory access characteristics / patterns, or different calling conventions ) might break "calibrations". Say, for (int i = 0; i < 1000000; ++i) sleep(0); is likely going to have different timings for 32bit and 64bit ...
Finally, the ABI (Application Binary Interface). There's usually bigger differences between 64bit and 32bit environments than the size of pointers...
Currently, two main "branches" of 64-bit environments exist: IL32P64 (what Win64 uses: int and long are int32_t, only uintptr_t/void* is uint64_t, talking in terms of the sized integers from <stdint.h>) and LP64 (what UN*X uses: int is int32_t, long is int64_t and uintptr_t/void* is uint64_t). On top of that there are "subdivisions" with different alignment rules: some environments align long, float or double at their respective sizes, while others align them at multiples of four bytes. In 32-bit Linux they are all aligned at four bytes, while in 64-bit Linux float is aligned at four and long and double at eight-byte multiples.
The consequence of these rules is that in many cases, both sizeof(struct { ... }) and the offsets of structure/class members differ between 32-bit and 64-bit environments even if the data type declaration is completely identical.
Beyond impacting array/vector allocations, these issues also affect data input/output, e.g. through files: if a 32-bit app writes e.g. struct { char a; int b; char c; long d; double e; } to a file that the same app, recompiled for 64-bit, reads back in, the result will not be quite what's hoped for.
The examples just given are only about language primitives (char, int, long etc.) but of course affect all sorts of platform-dependent / runtime library data types, whether size_t, off_t, time_t, HANDLE, or essentially any nontrivial struct/union/class, so the room for error here is large.
And then there's the lower-level differences, which come into play e.g. for hand-optimized assembly (SSE/SSE2/...); 32bit and 64bit have different (numbers of) registers, different argument passing rules; all of this affects strongly how such optimizations perform and it's very likely that e.g. SSE2 code which gives best performance in 32bit mode will need to be rewritten / needs to be enhanced to give best performance 64bit mode.
There's also code design constraints which are very different for 32bit and 64bit, particularly around memory allocation / management; an application that's been carefully coded to "maximize the hell out of the mem it can get in 32bit" will have complex logic on how / when to allocate/free memory, memory-mapped file usage, internal caching, etc - much of which will be detrimental in 64bit where you could "simply" take advantage of the huge available address space. Such an app might recompile for 64bit just fine, but perform worse there than some "ancient simple deprecated version" which didn't have all the maximize-32bit peephole optimizations.
So, ultimately, it's also about enhancements / gains, and that's where more work, partly in programming, partly in design/requirements comes in. Even if your app cleanly recompiles both on 32bit and 64bit environments and is verified on both, is it actually benefitting from 64bit ? Are there changes that can/should be done to the code logic to make it do more / run faster in 64bit ? Can you do those changes without breaking 32bit backward compatibility ? Without negative impacts on the 32bit target ? Where will the enhancements be, and how much can you gain ?
For a large commercial project, answers to these questions are often important markers on the roadmap because your starting point is some existing "money maker"...

Why are the standard datatypes not used in Win32 API? [duplicate]

This question already has answers here:
Why does the Win32-API have so many custom types?
I have been learning Visual C++ Win32 programming for some time now.
Why are there the datatypes like DWORD, WCHAR, UINT etc. used instead of, say, unsigned long, char, unsigned int and so on?
I have to remember when to use WCHAR instead of const char *, and it is really annoying me.
Why aren't the standard datatypes used in the first place? Will it help if I memorize Win32 equivalents and use these for my own variables as well?
Yes, you should use the correct data type for function arguments, or you are likely to find yourself in trouble.
And the reason that these types are defined the way they are, rather than using int, char and so on is that it removes the "whatever the compiler thinks an int should be sized as" from the interface of the OS. Which is a very good thing, because if you use compiler A, or compiler B, or compiler C, they will all use the same types - only the library interface header file needs to do the right thing defining the types.
By defining types that are not standard types, it's easy to change int from 16 to 32 bits, for example. The first C/C++ compilers for Windows used 16-bit integers. It was only in the mid to late 1990s that Windows got a 32-bit API, and up until that point you were using an int that was 16-bit. Imagine that you have a well-working program that uses several hundred int variables, and all of a sudden you have to change ALL of those variables to something else... That wouldn't be very nice, right? Especially as SOME of those variables DON'T need changing, because moving to a 32-bit int for some of your code won't make any difference, so there's no point in changing those bits.
It should be noted that WCHAR is NOT the same as const char - WCHAR is a "wide char" so wchar_t is the comparable type.
So, basically, "define our own type" is a way to guarantee that the underlying compiler architecture can change without having to change (much of) the source code. All larger projects that do machine-dependent coding do this sort of thing.
The sizes and other characteristics of the built-in types such as int and long can vary from one compiler to another, usually depending on the underlying architecture of the system on which the code is running.
For example, on the 16-bit systems on which Windows was originally implemented, int was just 16 bits. On more modern systems, int is 32 bits.
Microsoft gets to define types like DWORD so that their sizes remain the same across different versions of their compiler, or of other compilers used to compile Windows code.
And the names are intended to reflect concepts on the underlying system, as defined by Microsoft. A DWORD is a "double word" (which, if I recall correctly, is 32 bits on Windows, even though a machine "word" is probably 32 or even 64 bits on modern systems).
It might have been better to use the fixed-width types defined in <stdint.h>, such as uint16_t and uint32_t -- but those were only introduced to the C language by the 1999 ISO C standard (which Microsoft's compiler doesn't fully support even today).
If you're writing code that interacts with the Win32 API, you should definitely use the types defined by that API. For code that doesn't interact with Win32, use whatever types you like, or whatever types are suggested by the interface you're using.
I think that it is a historical accident.
My theory is that the original Windows developers knew that the standard C type sizes depend on the compiler, that is, one compiler may have 16-bit integer and another a 32-bit integer. So they decided to make the Window API portable between different compilers using a series of typedefs: DWORD is a 32 bit unsigned integer, no matter what compiler/architecture you are using. Naturally, nowadays you will use uint32_t from <stdint.h>, but this wasn't available at that time.
Then, with the UNICODE thing, they got the TCHAR vs. CHAR vs. WCHAR issue, but that's another story.
And then it grew out of control, and you get such nice things as typedef void VOID, *PVOID; that are utter nonsense.

Forcing types to a specific size

I've been learning C++, and one thing that I'm not really comfortable with is the fact that datatype sizes are not consistent. Depending on what system something is deployed on, an int could be 16 bits or 32 bits, etc.
So I was thinking it might be a good idea to make my own header file with data types like byte, word, etc. that are defined to be a specific size and will maintain that size on any platform.
Two questions. First is this a good idea? Or is it going to create other problems I'm not aware of? Second, how do you define a type as being, say, 8 bits? I can't just say #define BYTE char, cause char would vary across platforms.
Fortunately, other people have noticed this same problem. In C99 and C++11 (so set your compiler to compatibility with one of those two modes, there should be a switch in your compiler settings), they added the header stdint.h (for C) and cstdint (for C++). If you #include <cstdint>, you get the types int8_t, int16_t, int32_t, int64_t, and the same prefixed with a u for unsigned versions. If your platform supports those types, they will be defined in the header, along with several others.
If your compiler does not yet support that standard (or you are forced by reasons out of your control to remain on C++03), then there is also Boost.
However, you should only use this if you care exactly about the size of the type. int and unsigned are fine for throw-away variables in most cases. size_t should be used for indexing std::vector, etc.
First you need to figure out if you really care what sizes things are. If you are using an int to count the number of lines in a file, do you really care if it's 32-bit or 64? You need BYTE, WORD, etc if you are working with packed binary data, but generally not for any other reason. So you may be worrying over something that doesn't really matter.
Better yet, use the already defined stuff in stdint.h.
Example:
int32_t is always 32 bits.
Many libraries have their own .h with a lot of typedefs to get constant-size types. This is useful when making portable code, and it avoids relying on the headers of the platform you are currently working with.
If you only want to make sure the built-in data types have a minimum size, you can check with std::numeric_limits from the <limits> header.
std::numeric_limits<int>::digits
will give you, for example, the number of bits of an int without the sign bit. And
std::numeric_limits<int>::max()
will give you the max value.

Windows to iPhone binary files

Is it safe to pass binary files from Windows to iPhone that are written like:
std::ostream stream = // get it somehow
stream.write(&MyHugePODStruct, sizeof(MyHugePODStruct));
and read like:
std::istream stream = // get it somehow
stream.read(&MyHugePODStruct, sizeof(MyHugePODStruct));
given that the definition of MyHugePODStruct is the same? If not, is there any way to serialize this safely with either the standard library (C++11 included) or Boost? Is there a cleaner way to do this, since it seems like a non-portable piece of code?
No, for many reasons. First off, this won't compile, because you have to pass a char * to read and write. Secondly, this isn't guaranteed to work even on one single platform, because the structure may contain padding (and that padding may itself differ among differently compiled versions of the code, even on the same platform). Next, there are 64/32-bit issues to consider which affect many of the primitive types (e.g. long double is padded to 12 bytes on x86, but to 16 bytes on x64). Last but not least, there's endianness (though I'm not sure what the iOS endianness is).
So in short, no, don't do that.
You have to serialize each struct member separately, and according to its data type.
You might like to check out Boost.serialization, though I have no experience with it.

C++: Datatypes, which to use and when?

I've been told that I should use size_t always when I want 32bit unsigned int, I don't quite understand why, but I think it has something to do with that if someone compiles the program on 16 or 64 bit machines, the unsigned int would become 16 or 64 bit but size_t won't, but why doesn't it? and how can I force the bit sizes to exactly what I want?
So, where is the list of which datatype to use and when? for example, is there a size_t alternative to unsigned short? or for 32bit int? etc. How can I be sure my datatypes have as many bits as I chose at the first place and not need to worry about different bit sizes on other machines?
Mostly I care more about the memory used than about the marginal speed boost I get from doubling the memory usage, since I don't have much RAM. So I want to stop worrying whether everything will break apart if my program is compiled on a machine that's not 32-bit. For now I've always used size_t when I want it to be 32-bit, but for short I don't know what to do. Could someone help me clear my head?
On the other hand: if I need a 64-bit variable, can I use it on a 32-bit machine successfully? And what is that datatype's name (if I want it to be 64-bit always)?
size_t is for storing object sizes. It is of exactly the right size for that and only that purpose - 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. You shouldn't confuse it with unsigned int or any other datatype. It might be equivalent to unsigned int or might be not depending on the implementation (system bitness included).
Once you need to store something other than an object size you shouldn't use size_t and should instead use some other datatype.
As a side note: For containers, to indicate their size, don't use size_t, use container<...>::size_type
boost/cstdint.hpp can be used to be sure integers have right size.
size_t is not necessarily 32-bit. It has been 16-bit with some compilers. It's 64-bit on a 64-bit system.
The C++ standard guarantees, via reference down to the C standard, that long is at least 32 bits.
int is only formally guaranteed 16 bits, but in practice I wouldn't worry: the chance that any ordinary code will be used on a 16-bit system is slim indeed, and on any 32-bit system int is 32-bit. Of course it's different if you're coding for a 16-bit system like some embedded computer. But in that case you'd probably be writing system-specific code anyway.
Where you need exact sizes you can use <stdint.h> if your compiler supports that header (it was introduced in C99, and the current C++ standard stems from 1998), or alternatively the corresponding Boost library header boost/cstdint.hpp.
However, in general, just use int. ;-)
size_t is not always 32-bit. E.g. It's 64-bit on 64-bit platforms.
For fixed-size integers, stdint.h is best. But it doesn't come with VS2008 or earlier - you have to download it separately. (It comes as a standard part of VS2010 and most other compilers).
Since you're using VS2008, you can use the MS-specific __int32, unsigned __int32 etc. types.
To answer the 64-bit question: Most modern compilers have a 64-bit type, even on 32-bit systems. The compiler will do some magic to make it work. For Microsoft compilers, you can just use the __int64 or unsigned __int64 types.
Unfortunately, one of the quirks of the nature of data types is that it depends a great deal on which compiler you're using. Naturally, if you're only compiling for one target, there is no need to worry - just find out how large the type is using sizeof(...).
If you need to cross-compile, you could ensure compatibility by defining your own typedefs for each target (surrounded by #ifdef blocks that test which target you're cross-compiling for).
If you're ever concerned that it could be compiled on a system that uses types with even weirder sizes than you have anticipated, you could always assert(sizeof(short)==2) or equivalent, so that you could guarantee at runtime that you're using the correctly sized types.
Your question is tagged visual-studio-2008, so I would recommend looking in the documentation for that compiler for pre-defined data types. Microsoft has a number that are predefined, such as BYTE, DWORD, and LARGE_INTEGER.
Take a look in windef.h and winnt.h for more.