Related
According to cpp reference in 64 bit systems:
LLP64 or 4/4/8 (int and long are 32-bit, pointer is 64-bit)
Win64 API
LP64 or 4/8/8 (int is 32-bit, long and pointer are 64-bit)
Unix and Unix-like systems (Linux, Mac OS X)
Then how to consider long data type for codes which is written for Linux and Windows?
In C and C++, in portable code, you never know the exact size of a type like int or long int. If you move your code to a different compiler (or a different machine, or a different OS), the sizes of some of your types may change. This needn't be a problem; in fact it's only a problem if you want to make it a problem. (All of this has always been the case, and has nothing to do with someone's definitions of "LLP64" and "LP64" architecture families.)
On those (hopefully rare) occasions when you need a type of an exact size, one good way is to use types like int32_t and uint64_t from <cstdint> (or <stdint.h> in C).
But you really, really shouldn't need to specify the exact size of a type, most of the time. (There are those who say you need to specify the exact size of every type, but my advice is to ignore those people.)
Pretty much the only time you need to specify exact sizes is when trying to define a structure which you can read and write in "binary" fashion to conform to some externally-imposed storage layout. But there, specifying the exact sizes of data types isn't generally sufficient, because of issues like alignment, padding, and byte order. So you're better off writing explicit serialization and deserialization code anyway (or using "text" data formats instead, if you can get away with it).
My bottom line is that I rarely worry about the exact sizes of types.
A c++ specific question. So i read a question about what makes a program 32 bit/64 bit, and the anwser it got was something like this (sorry i cant find the question, was somedays ago i looked at it and i cant find it again:( ): As long as you dont make any "pointer assumptions", you only need to recompile it. So my question is, what are pointer assumtions ? To my understanding there is 32 bit pointer and 64 bit pointers so i figure it is something to do with that . Please show the diffrence in code between them. Any other good habits to keep in mind while writing code, that helps it making it easy to convert between the to are also welcome :) tho please share examples with them
Ps. I know there is this post:
How do you write code that is both 32 bit and 64 bit compatible?
but i tougth it was kind of to generall with no good examples, for new programmers like myself. Like what is a 32 bit storage unit ect. Kinda hopping to break it down a bit more (no pun intended ^^ ) ds.
In general it means that your program behavior should never depend on the sizeof() of any types (that are not made to be of some exact size), neither explicitly nor implicitly (this includes possible struct alignments as well).
Pointers are just a subset of them, and it probably also means that you should not try to rely on being able to convert between unrelated pointer types and/or integers, unless they are specifically made for this (e.g. intptr_t).
In the same way you need to take care of things written to disk, where you should also never rely on the size of e.g. built in types, being the same everywhere.
Whenever you have to (because of e.g. external data formats) use explicitly sized types like uint32_t.
For a well-formed program (that is, a program written according to syntax and semantic rules of C++ with no undefined behaviour), the C++ standard guarantees that your program will have one of a set of observable behaviours. The observable behaviours vary due to unspecified behaviour (including implementation-defined behaviour) within your program. If you avoid unspecified behaviour or resolve it, your program will be guaranteed to have a specific and certain output. If you write your program in this way, you will witness no differences between your program on a 32-bit or 64-bit machine.
A simple (forced) example of a program that will have different possible outputs is as follows:
int main()
{
std::cout << sizeof(void*) << std::endl;
return 0;
}
This program will likely have different output on 32- and 64-bit machines (but not necessarily). The result of sizeof(void*) is implementation-defined. However, it is certainly possible to have a program that contains implementation-defined behaviour but is resolved to be well-defined:
int main()
{
int size = sizeof(void*);
if (size != 4) {
size = 4;
}
std::cout << size << std::endl;
return 0;
}
This program will always print out 4, despite the fact it uses implementation-defined behaviour. This is a silly example because we could have just done int size = 4;, but there are cases when this does appear in writing platform-independent code.
So the rule for writing portable code is: aim to avoid or resolve unspecified behaviour.
Here are some tips for avoiding unspecified behaviour:
Do not assume anything about the size of the fundamental types beyond that which the C++ standard specifies. That is, a char is at least 8 bit, both short and int are at least 16 bits, and so on.
Don't try to do pointer magic (casting between pointer types or storing pointers in integral types).
Don't use a unsigned char* to read the value representation of a non-char object (for serialisation or related tasks).
Avoid reinterpret_cast.
Be careful when performing operations that may over or underflow. Think carefully when doing bit-shift operations.
Be careful when doing arithmetic on pointer types.
Don't use void*.
There are many more occurrences of unspecified or undefined behaviour in the standard. It's well worth looking them up. There are some great articles online that cover some of the more common differences that you'll experience between 32- and 64-bit platforms.
"Pointer assumptions" is when you write code that relies on pointers fitting in other data types, e.g. int copy_of_pointer = ptr; - if int is a 32-bit type, then this code will break on 64-bit machines, because only part of the pointer will be stored.
So long as pointers are only stored in pointer types, it should be no problem at all.
Typically, pointers are the size of the "machine word", so on a 32-bit architecture, 32 bits, and on a 64-bit architecture, all pointers are 64-bit. However, there are SOME architectures where this is not true. I have never worked on such machines myself [other than x86 with it's "far" and "near" pointers - but lets ignore that for now].
Most compilers will tell you when you convert pointers to integers that the pointer doesn't fit into, so if you enable warnings, MOST of the problems will become apparent - fix the warnings, and chances are pretty decent that your code will work straight away.
There will be no difference between 32bit code and 64bit code, the goal of C/C++ and other programming languages are their portability, instead of the assembly language.
The only difference will be the distrib you'll compile your code on, all the work is automatically done by your compiler/linker, so just don't think about that.
But: if you are programming on a 64bit distrib, and you need to use an external library for example SDL, the external library will have to also be compiled in 64bit if you want your code to compile.
One thing to know is that your ELF file will be bigger on a 64bit distrib than on a 32bit one, it's just logic.
What's the point with pointer? when you increment/change a pointer, the compiler will increment your pointer from the size of the pointing type.
The contained type size is defined by your processor's register size/the distrib your working on.
But you just don't have to care about this, the compilation will do everything for you.
Sum: That's why you can't execute a 64bit ELF file on a 32bit distrib.
Typical pitfalls for 32bit/64bit porting are:
The implicit assumption by the programmer that sizeof(void*) == 4 * sizeof(char).
If you're making this assumption and e.g. allocate arrays that way ("I need 20 pointers so I allocate 80 bytes"), your code breaks on 64bit because it'll cause buffer overruns.
The "kitten-killer" , int x = (int)&something; (and the reverse, void* ptr = (void*)some_int). Again an assumption of sizeof(int) == sizeof(void*). This doesn't cause overflows but looses data - the higher 32bit of the pointer, namely.
Both of these issues are of a class called type aliasing (assuming identity / interchangability / equivalence on a binary representation level between two types), and such assumptions are common; like on UN*X, assuming time_t, size_t, off_t being int, or on Windows, HANDLE, void* and long being interchangeable, etc...
Assumptions about data structure / stack space usage (See 5. below as well). In C/C++ code, local variables are allocated on the stack, and the space used there is different between 32bit and 64bit mode due to the point below, and due to the different rules for passing arguments (32bit x86 usually on the stack, 64bit x86 in part in registers). Code that just about gets away with the default stacksize on 32bit might cause stack overflow crashes on 64bit.
This is relatively easy to spot as a cause of the crash but depending on the configurability of the application possibly hard to fix.
Timing differences between 32bit and 64bit code (due to different code sizes / cache footprints, or different memory access characteristics / patterns, or different calling conventions ) might break "calibrations". Say, for (int i = 0; i < 1000000; ++i) sleep(0); is likely going to have different timings for 32bit and 64bit ...
Finally, the ABI (Application Binary Interface). There's usually bigger differences between 64bit and 32bit environments than the size of pointers...
Currently, two main "branches" of 64bit environments exist, IL32P64 (what Win64 uses - int and long are int32_t, only uintptr_t/void* is uint64_t, talking in terms of the sized integers from ) and LP64 (what UN*X uses - int is int32_t, long is int64_t and uintptr_t/void* is uint64_t), but there's the "subdivisions" of different alignment rules as well - some environments assume long, float or double align at their respective sizes, while others assume they align at multiples of four bytes. In 32bit Linux, they align all at four bytes, while in 64bit Linux, float aligns at four, long and double at eight-byte multiples.
The consequence of these rules is that in many cases, bith sizeof(struct { ...}) and the offset of structure/class members are different between 32bit and 64bit environments even if the data type declaration is completely identical.
Beyond impacting array/vector allocations, these issues also affect data in/output e.g. through files - if a 32bit app writes e.g. struct { char a; int b; char c, long d; double e } to a file that the same app recompiled for 64bit reads in, the result will not be quite what's hoped for.
The examples just given are only about language primitives (char, int, long etc.) but of course affect all sorts of platform-dependent / runtime library data types, whether size_t, off_t, time_t, HANDLE, essentially any nontrivial struct/union/class ... - so the space for error here is large,
And then there's the lower-level differences, which come into play e.g. for hand-optimized assembly (SSE/SSE2/...); 32bit and 64bit have different (numbers of) registers, different argument passing rules; all of this affects strongly how such optimizations perform and it's very likely that e.g. SSE2 code which gives best performance in 32bit mode will need to be rewritten / needs to be enhanced to give best performance 64bit mode.
There's also code design constraints which are very different for 32bit and 64bit, particularly around memory allocation / management; an application that's been carefully coded to "maximize the hell out of the mem it can get in 32bit" will have complex logic on how / when to allocate/free memory, memory-mapped file usage, internal caching, etc - much of which will be detrimental in 64bit where you could "simply" take advantage of the huge available address space. Such an app might recompile for 64bit just fine, but perform worse there than some "ancient simple deprecated version" which didn't have all the maximize-32bit peephole optimizations.
So, ultimately, it's also about enhancements / gains, and that's where more work, partly in programming, partly in design/requirements comes in. Even if your app cleanly recompiles both on 32bit and 64bit environments and is verified on both, is it actually benefitting from 64bit ? Are there changes that can/should be done to the code logic to make it do more / run faster in 64bit ? Can you do those changes without breaking 32bit backward compatibility ? Without negative impacts on the 32bit target ? Where will the enhancements be, and how much can you gain ?
For a large commercial project, answers to these questions are often important markers on the roadmap because your starting point is some existing "money maker"...
I have a challenging situation; we will have programs on Mac, PC, iOS and Android receiving files in a legacy format and parsing data from those files. We cannot change how those files are created.
The files are produced by a C++ program filling a struct with numbers and Strings and then writing it out. Here's a sanitized version.
struct MyObject {
String Kfkj(MAXKYS);
String Oern(MAXKYS);
String Vdflj(MAXKYS, 9);
int Muic;
int Tdfkj;
int VdfkAsdk;
int SsdjsdDsldsk;
int Ndsoief;
String TdflsajPdlj;
String TdckjdfPas;
String AdsfakjIdd;
int IdkfjdKasdkj;
int AsadkjaKadkja(MAXKYS);
int Kasldsdkj;
bool Usadl;
String PsadkjOasdj(9);
String PasdkjOsdkj;
};
Primitives and Strings, as you can see.
Then here is how they write it out to a file:
MyInstance MyObject;
FileName = "C:\MyFile.ab2"
ofstream fout (FileName, ios::binary);
fout.write((char*)& MyInstance, sizeof(MyInstance));
There is no option for us to translate it once and then distribute the file to other platforms; we must translate it on each and every different platform, and this is what we have to work with. I'd appreciate any information on how C++ serializes data, so we know how to parse the file.
EDIT: solution
The feedback I received from multiple answers here was VERY helpful. Using that, I did extensive analysis with hex editors and discovered:
the elements come in the file one after another
a "String," in this case, starts with an int describing how many characters follow the int for that String. If the String does not exist, it will still have that int with a value of 0.
integers, for the files and machines I saw, are two bytes, little-endian, and MOSTLY unsigned (there were a few that were signed, just to keep me on my toes)
the boolean was two bytes, with apparently -1 (FF FF) representing "true"
So far we have not ran into issues with different padding or endianness on different devices, but those are very real concerns. The skilled notes and warnings in these answers provides us with more ammunition to try to convince the client to change to a less fragile alternative, such as XML or JSON, for transferring data online across platforms.
As for those of you asking if the developer was fired... well, let's just say their code is very old, but after multiple conversations we're still having trouble convincing them writing out the C++ struct and trying to read that on different platforms is not a good idea.
You're going to run into many problems.
C++ doesn't have a specific format for serializing data per se. It is highly dependent on the computer architecture/processor that you are running on.
The compiler is allowed to add padding to help alignment on systems. When we say alignment we basically are referring to an architecture/processor's affinity for having data lie on specific byte boundaries. For example, some processors vastly prefer floating point numbers to lie at 4 or 8 byte boundaries - if they don't the processor may work much slower or may not work at all.
So, you can't simply know what padding your system is adding magically.
What you can do is use #pragma pack(1) / #pragma pack(0) to stop your compiler from padding your numbers.
PS: you also have to worry about endianness. What if one computer is running on big-endian and one is little endian? They will interpret bytes differently without a conversion.
Simply put, you either have to fix the application generating the files so it uses a proper serialization scheme OR you need to look at it running on a SPECIFIC computer, look at exactly how it writes the files, and write a translator for every target platform (which is just silly).
Interesting Suggestion
If you're really stuck, write an app that monitors the folder where you write files. Have the app pick up the files (since it's on the same PC it'll be able to read their format without issue). Have it write the files back in XML or some other true serialization format and distribute those instead.
Whoa - that's crazy. So String objects don't contain any pointers? Must not- because you claim this is working code.
Anyway, that code isn't doing any serialization. Its just writing the structure out to file exactly the way it is laid out in memory. The only issue you have is that on some platforms padding and sizeof integral types like int may be different.
You'll have to find the size of the integral types, and use that information in reader/writer for newer platforms to make sure they get laid out the same way on the legacy platform.
You're running a real risk with that code though. As it is, a compiler change could suddenly cause the file layout to change.
The format of your data file is entirely down to the compiler that your C++ program is compiled with, and the definition of your String class. You can rely on the fields being in the order they're declared in, and in this case, I think you can rely on there not being any padding at the start, but that's about all. Some tips that might help you out in this case:-
You don't give the definition of the String class you're using. If it's a typedef for std::string, you're completely screwed, because the contents of the string aren't in the memory. I assume your C++ programmers are using some special local buffer, in which case I'll guess you will find the first bytes of the object are the string, and there is some amount of useless padding afterwards. I hope the struct contains an int at the start telling you how much data in it is useful.
You'll probably find the int fields are four bytes long.
You'll probably find the bool field is one byte long, followed by three bytes of useless padding. Only one bit, most likely the bottom bit, will be set.
That's about all the useful guesswork I can offer you. In your target language, make sure to read the whole file in as the closest thing to a byte array available in the language, and only after that, use the language features to convert it into the right kind of thing in your language. Don't try reading it in as integers, as that won't let you byte-swap if you're on a platform with different endianness to the C++ program. I suggest also looking through the file in a text editor to reverse-engineer it and help you find the offset of each field.
Last piece of advice: consider printing P45s (or pink slips, or whatever you have in your country) for whichever programmers or project managers thought this kind of 'serialization' was a good idea. This kind of sloppy work might have been acceptable in a life-or-death situation, but they have seriously screwed you over in a way you're going to find it very hard to recover from. Writing the code to read in these files will not be that hard, if it's only one struct like this, but keeping it reliable will be a world of pain, and they've effectively made it impossible for themselves to change compilers or compiler version safely.
The way it's done, the struct is written in raw form to a file. So basically what you need to know to parse this file is the binary layout of your struct.
Basically, the fields are just one after the other, so to read an int, you just read 4 bytes and cast that to an int, etc.
Strings are a particular case. It's not clear from your code whether this "String" type is an inline array of characters, or a pointer to such an array. In the first case, you need to know how many characters each string contains and simply read that number of characters sequentially. In the second case, you won't be able to get the string back, since it won't have been written to file. The pointer will be useless to you.
One last concern is whether the struct is packed or not. Since you gave no indication to that, by default struct fields are aligned to 4-bytes boundaries, so there may be space for instance after the boolean field that you need to account for. If the struct is packed, then each field comes directly after the previous.
So, to make a long story short, figure out your struct binary layout using its definition and, if all else fails, inspecting the memory at run-time with the debugger, or use a hex editor to study the output file. Then write that specification down somewhere and this will give you what you need to read from the file. It's impossible to tell exactly what that layout is simply by looking at the pseudo-definition you gave.
Writing in an ofstream does not serialize data. This code write the raw memory content of the struct as it was a string of char. Depending of your compiler, its version, its options and the system it is running on the content will be completely different.
Even the number of bits of a char is allowed to change between c++ implementation.
Data referenced by the object of the struct won't be written (forget the content of std::string).
If you cannot change the writer code. You must know the alignment policy, the size of base type and the data representation. You will have to analyze files produced by hand, for example with an hexadecimal editor like this one
http://www.physics.ohio-state.edu/~prewett/hexedit/
, and probably look at your compiler documentation.
If you can change the writer code. Use proper serialization like json, protocol buffer or simply xml.
No one has pointed out something that sticks out to me as particularly problematic (maybe because I've been bit by it). That problem: the data member bool Usadl;. sizeof(bool) varies across platforms, across compilers, and even across releases of the same compiler. Common values for sizeof(bool) are 4 and 1. This will bite you. It's getting hard to find a big endian machine nowadays, very, very hard to find a computer where CHAR_BIT is not 8 or sizeof(int) is not 4. This is not the case for sizeof(bool).
In agreement with everyone else, Chad's team needs to document the structure of the records in the file, and then make sure the program that produces the file writes this structure explicitly, including element sizes, padding, and endianness. Don't depend on class layout to do this for you. That's just asking for trouble.
The best way would probably be to use JSON or if you want a more robust solution go with something like Avro. Avro has a C++ API and a Java API, so it covers most of the cases you're encountering.
I've been learning C++ and one thing that I'm not really comfortable with is the fact that datatype sizes are not consistent. Depending on what system something is deployed on an int could be 16 bits or 32 bits, etc.
So I was thinking it might be a good idea to make my own header file with data types like byte, word, etc. that are defined to be a specific size and will maintain that size on any platform.
Two questions. First is this a good idea? Or is it going to create other problems I'm not aware of? Second, how do you define a type as being, say, 8 bits? I can't just say #define BYTE char, cause char would vary across platforms.
Fortunately, other people have noticed this same problem. In C99 and C++11 (so set your compiler to compatibility with one of those two modes, there should be a switch in your compiler settings), they added the header stdint.h (for C) and cstdint (for C++). If you #include <cstdint>, you get the types int8_t, int16_t, int32_t, int64_t, and the same prefixed with a u for unsigned versions. If your platform supports those types, they will be defined in the header, along with several others.
If your compiler does not yet support that standard (or you are forced by reasons out of your control to remain on C++03), then there is also Boost.
However, you should only use this if you care exactly about the size of the type. int and unsigned are fine for throw-away variables in most cases. size_t should be used for indexing std::vector, etc.
First you need to figure out if you really care what sizes things are. If you are using an int to count the number of lines in a file, do you really care if it's 32-bit or 64? You need BYTE, WORD, etc if you are working with packed binary data, but generally not for any other reason. So you may be worrying over something that doesn't really matter.
Better yet, use the already defined stuff in stdint.h. See here for more details. Similar question here.
Example:
int32_t is always 32 bits.
Many libraries have their own .h with a lots of typedef to have constant size types. This is useful when making portable code, and avoid relying on the headers of the platform you are currently working with.
If you only want to make sure the builtin data types have a minimum size you can use std::numeric_limits in the header to check.
std::numeric_limits<int>::digits
will give you, for example, the number of bits of an int without the sign bit. And
std::numeric_limits<int>::max()
will give you the max value.
I'm currently porting some code from another platform and bools on the new platform are 1-byte sized. This is breaking our loading code as the values are stored as 32-bit values. Furthermore, speed is a critical issue on our platform and we would like to use 32-bit bools as the processor runs at 32-bits natively and requires extra operations to compare non 32-bit bools.
Is there a way to force gcc to use 32-bit bools instead of 8-bit bools?
Add #define BOOL_TYPE_SIZE 4 to gcc/config/i386/i386.h and recompile gcc ;)
The size of bool is implementation defined (5.3.3), and gcc doesn't appear to provide an option to configure this at run-time.
Hopefully your implementation-defined code is isolated. If so, change your bools to ints, or change your loading code to deal with sizeof() == 1 instead of 4.
(Or, for the crazy, alter gcc to treat bool as a 4-byte type.)
Edit: Paul Tomblin's suggestion of using #define may not be legal [see here], but it does work under gcc 4.1.2, at least. [Link] However, if you don't hit every usage of bool the size mismatch will almost certainly bite you.
#define bool int
You could create your own class that uses int32_t internally but behaves like a bool. It does mean you'd have to rename the fields you specifically want to use that type, which is more work but affords better control and isolation, and you can still use real bools elsewhere. I'd personally prefer that to any #define hackery, which might bite somewhere unexpected. I'd also caution against assuming a 32-bit int will be usefully faster than a single byte... other factors like pipelining, memory latencies, cache sizes etc. could render the difference insignificant or even make 32-bit ints slower, so you might want to benchmark it in your system with representatitive data handling.
You need to separate your internal data structures from the storage/loading code. Simply store the bools in your internal data structures in your platform's native bool type and do the appropriate conversion from/to the storage's one byte bools when reading/writing the data.