Portable code - bits per char - c++

I know that the C/C++ standards only guarantee a minimum of 8 bits per char, and that theoretically 9/16/42/anything else is possible, and that therefore all sites about writing portable code warn against assuming 8bpc. My question is how "non-portable" is this really?
Let me explain. As I see it, there are 3 categories of systems:
Computers - I mean desktops, laptops, servers, etc. running Mac/Linux/Windows/Unix/*nix/posix/whatever (I know that list isn't strictly correct, but you get the idea). I would be very surprised to hear of any such system where char is not exactly 8 bits. (please correct me if I am wrong)
Devices with operating systems - This includes smartphones and similar embedded systems. While I would not be very surprised to find such a system where char is more than 8 bits, I have not heard of one to date (again, please inform me if I am just unaware).
Bare metal devices - VCRs, microwave ovens, old cell phones, etc. In this field I haven't the slightest experience, so anything can happen here. However, do I really need my code to be cross platform between my Windows desktop and my microwave oven? Am I likely to ever have code common to both?
Bottom line: Are there common (more than 0.001%) platforms (in categories 1 and 2 above) where char is not 8 bits? And is my above surmise true?

Use limits.h (climits in C++), which defines CHAR_BIT:
http://www.cplusplus.com/reference/clibrary/climits/
Also, when you need a type of exactly a given size, use stdint.h.
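A minimal sketch of that advice, assuming a C++11 compiler (the helper name bits_in is made up for illustration): CHAR_BIT gives the number of bits per char, and the fixed-width types come from cstdint.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint32_t and friends

// Hypothetical helper: number of bits in the object representation of T.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// sizeof(char) is 1 by definition, so this is CHAR_BIT itself.
static_assert(bits_in<char>() == CHAR_BIT, "char occupies CHAR_BIT bits");

// std::uint32_t is optional, but where it exists it has exactly 32 bits.
static_assert(bits_in<std::uint32_t>() == 32, "uint32_t is exactly 32 bits");
```

On a platform without an exact 32-bit type, std::uint32_t is simply not defined, so the code fails to compile instead of silently misbehaving.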

For example, many DSP have CHAR_BIT greater than or equal to 16.

Just as the integer size grew on 64-bit architectures, future platforms may use a wider char with more bits. ASCII characters might become obsolete, replaced by Unicode. This might be a reason to be cautious.

You can normally safely assume that files will have 8 bit bytes, or if not, that 8 bit byte files can be converted to a zero padded native format by a commonly-used tool. But it is much more dangerous to assume that CHAR_BIT == 8. Currently that is almost always the case, but it might not always be the case in future. 8 bit access to memory is increasingly a bottleneck.

The Posix standards require CHAR_BIT to be 8.
So, if you only care about your code running on Posix compliant platforms, then assuming CHAR_BIT == 8 is fine and good.
The vast majority of commodity PC platforms and build systems comply with this requirement. Most any platform that uses the BSD socket interface likely implicitly has this requirement because the assumption that a platform byte is an octet is extremely widely distributed.
#if CHAR_BIT != 8
#error Your platform is unsupported!
#endif
Why did POSIX mandate CHAR_BIT==8?
You should only worry about this assumption / constraint if you want your code to run today on embedded and esoteric platforms. Otherwise, it's a pretty safe assumption in my view.

Related

How exactly are fundamental data types assigned to specific architectures

So I got into fundamental data types and I was left with one thing that I'm confused about - if I was going to build a 64-bit program, would I have to use data types specifically made for 64-bit architecture? I did some research and turns out that 64-bit optimized version of integer would be long long int. Or it doesn't matter and I can do fine with those data types I've learned already?
You may find that some types have different sizes than you're used to. For example, a 32-bit Solaris environment has 4-byte long, but a 64-bit Solaris environment has 8-byte long. Meanwhile, this isn't the case in Visual Studio, which retained 4-byte long.
This is why, if you are relying on extreme range for integer types and need to be completely cross-platform, you should favour more specific types like uint64_t. Otherwise, though, you shouldn't need to worry about this.
Similarly, you'll find that pointer types are no longer 32-bit, but 64-bit, so that they can hold all possible addresses on your shiny new 64-bit system. This shouldn't affect you unless you've done something wrong.
Don't worry about "optimisation" unless you have a serious need to eke out every last nanosecond and you can do better than your compiler, which is unlikely. Just write a descriptive, expressive program that signals your intent, as you always have.
For reference, though, you can look up your platform, environment and compiler, to find out what size the fundamental types have there. It can differ across all three.
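If looking it up is inconvenient, the compiler can report the sizes itself. A small sketch, assuming C++11 (the names TypeBits, query_type_bits and print_type_bits are invented for this example):

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstdio>    // std::printf

// Sizes, in bits, of the object representation of a few fundamental types.
struct TypeBits {
    std::size_t int_bits;
    std::size_t long_bits;
    std::size_t long_long_bits;
    std::size_t pointer_bits;
};

TypeBits query_type_bits() {
    TypeBits t;
    t.int_bits       = sizeof(int) * CHAR_BIT;
    t.long_bits      = sizeof(long) * CHAR_BIT;
    t.long_long_bits = sizeof(long long) * CHAR_BIT;
    t.pointer_bits   = sizeof(void*) * CHAR_BIT;
    return t;
}

// Prints the values for whatever platform and compiler built this code.
void print_type_bits() {
    const TypeBits t = query_type_bits();
    std::printf("int:       %zu bits\n", t.int_bits);
    std::printf("long:      %zu bits\n", t.long_bits);
    std::printf("long long: %zu bits\n", t.long_long_bits);
    std::printf("void*:     %zu bits\n", t.pointer_bits);
}
```

The output differs between, say, 64-bit Linux and 64-bit Windows builds for long, which is exactly the difference described above.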

Big Endian and Little Endian support for byte ordering

We need to support 3 hardware platforms - Windows (little Endian) and Linux Embedded (big and little Endian). Our data stream is dependent on the machine it uses and the data needs to be broken into bit fields.
I would like to write a single macro (if possible) to abstract away the detail. On Linux I can use bswap_16/bswap_32/bswap_64 for Little Endian conversions.
However, I can't find this in my Visual C++ includes.
Is there a generic built-in for both platforms (Windows and Linux)?
If not, then what can I use in Visual C++ to do byte swapping (other than writing it myself - hoping some machine optimized built-in)?
Thanks.
On both platforms you have
for short (16bit): htons() and ntohs()
for long (32bit): htonl() and ntohl()
The missing htonll() and ntohll() for long long (64bit) could easily be built from those two. See this implementation for example.
Update-0:
For the example linked above, Simon Richter mentions in a comment that it is not guaranteed to work. The reason is that the compiler might introduce extra bytes somewhere in the unions used. To work around this, the unions need to be packed, which might lead to a performance loss.
So here's another fail-safe approach to build the *ll functions: https://stackoverflow.com/a/955980/694576
Update-0.1:
From bames53's comment I tend to conclude that the 1st example linked above should not be used with C++, but with C only.
Update-1:
To achieve the functionality of the *ll functions on Linux this approach might be the 'best'.
htons and htonl (and similar macros) are good if you insist on dealing with byte sex.
However, it's much better to sidestep the issue by outputting your data in ASCII or similar. It takes a little more room, and it transmits over the net a little more slowly, but the simplicity and futureproofing is worth it.
Another option is to numerically take apart your int's and short's. So you & 0xff and divide by 256 repeatedly. This gives a single format on all architectures. But ASCII's still got the edge because it's easier to debug with.
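The "take the integer apart numerically" option can be sketched as follows, assuming a 32-bit value and a most-significant-octet-first wire format (the names store_be32 and load_be32 are hypothetical). Because it only uses shifts and masks, the result does not depend on the host byte order.

```cpp
#include <cstdint>

// Writes value into out[0..3], most significant octet first.
void store_be32(std::uint32_t value, unsigned char* out) {
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(value & 0xFFu);
}

// Reads four octets, most significant first, back into a 32-bit value.
std::uint32_t load_be32(const unsigned char* in) {
    return (static_cast<std::uint32_t>(in[0] & 0xFFu) << 24) |
           (static_cast<std::uint32_t>(in[1] & 0xFFu) << 16) |
           (static_cast<std::uint32_t>(in[2] & 0xFFu) << 8) |
            static_cast<std::uint32_t>(in[3] & 0xFFu);
}
```

Storing 0x01020304 produces the octets 01 02 03 04 on every architecture, and load_be32 reverses it.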
Not the same names, but the same functionality does exist.
EDIT: Archived Link -> https://web.archive.org/web/20151207075029/http://msdn.microsoft.com/en-us/library/a3140177(v=vs.80).aspx
_byteswap_uint64, _byteswap_ulong, _byteswap_ushort
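One way to hide the difference between the compilers is a thin wrapper. This is only a sketch: the names swap32 and swap64 are invented here, inline functions are used in place of the macro asked for, and it assumes the _byteswap_* intrinsics on Visual C++ and the __builtin_bswap* builtins on GCC/Clang, with a plain shift-and-mask fallback elsewhere.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>  // _byteswap_ulong, _byteswap_uint64
#endif

// Reverses the byte order of a 32-bit value.
inline std::uint32_t swap32(std::uint32_t v) {
#if defined(_MSC_VER)
    return _byteswap_ulong(v);
#elif defined(__GNUC__)
    return __builtin_bswap32(v);
#else
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
#endif
}

// Reverses the byte order of a 64-bit value.
inline std::uint64_t swap64(std::uint64_t v) {
#if defined(_MSC_VER)
    return _byteswap_uint64(v);
#elif defined(__GNUC__)
    return __builtin_bswap64(v);
#else
    // Swap each half, then exchange the halves.
    return (static_cast<std::uint64_t>(swap32(static_cast<std::uint32_t>(v))) << 32) |
            swap32(static_cast<std::uint32_t>(v >> 32));
#endif
}
```

Whether to call these at all still depends on knowing the byte order of the data stream versus the host; the wrapper only abstracts the swap itself.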

Char type on 32 bit vs 64 bit

Here is the following issue:
If I am developing on a 32 bit machine and want my code to be ported to a 64 bit machine, here is the scenario.
My functions internally use a lot of std strings. Now if I want to provide APIs, can I ask callers to send a char * which I can then use internally? Or ask them to send me an __int64 which I convert to a string?
Another reason to use char * in my API was that at least in one type of implementation of unix (a different version of the tool) it picks up data from stdin via argv which is a char *.
In the Windows version I am not sure what to do. I could just ask for __int64 and then convert it into a string...and make it work that way or just use char * as well?
If you're providing a C++ implementation, then your public interface should just use std::string.
If however for compatibility reasons (which it sounds like you may have) you need to provide a C-style interface, then using char* is precisely the way to do it. In your 32-bit library it will be a 32 bit pointer, and in the 64 bit version of the library it will be 64 bits. This will then agree with the client users' expectations regarding the API. You should absolutely convert to a std::string inside your library at the earliest possible point however.
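As an illustration of converting at the boundary, here is a sketch with made-up names (mylib_count_spaces is not from the question): the exported function takes a char*, and everything behind it works on std::string.

```cpp
#include <cstddef>
#include <string>

// Internal C++ implementation: works on std::string only.
static std::size_t count_spaces(const std::string& text) {
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == ' ') ++n;
    }
    return n;
}

// Hypothetical C-style entry point. The pointer is 32 or 64 bits wide
// depending on the build, but the caller never needs to care.
extern "C" int mylib_count_spaces(const char* text) {
    if (text == nullptr) return -1;   // report bad input to C callers
    const std::string s(text);        // convert at the earliest point
    return static_cast<int>(count_spaces(s));
}
```

The same source compiles unchanged for 32-bit and 64-bit targets; only the pointer width in the generated code differs.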
You seem somewhat confused. If the code you are writing is used only within the target machine, a recompile will take care of most of the problems. Just don't rely on a specific memory layout and you are fine. Using strings (as opposed to wstrings) probably means that the character encoding is UTF-8 (if not, reconsider), and thus a limited form of data exchange (e.g. files) between platforms is also fine.
In this case, your interface decision comes to selecting between (const) std::string(&), and (const) char*, integer_type (don't rely on null terminator, please). Deciding factor being whether or not you anticipate need to support other compilers or programming languages.
Now, if you intend to make the interface callable from other machines (i.e. a network interface), you have a much tougher job. In that case, specify the size of everything explicitly.
char is always one byte in size, both on 32-bit and 64-bit systems. However, using the std library is not the worst choice. ;) std should cope with different platforms as it is platform independent for the "most" part...
Converting to/from char* doesn't really help if you can't represent the number on your architecture.
If you are converting a 64bit integer from its decimal (or hexadecimal) textual representation into a value, you still need 64bits to store it.
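To make that concrete, a sketch of parsing a decimal string into a 64-bit value, assuming C++11 (parse_u64 is a hypothetical helper, not part of any library). The destination is a 64-bit type regardless of whether the build is 32-bit or 64-bit.

```cpp
#include <cerrno>    // errno, ERANGE
#include <cstdint>   // std::uint64_t, UINT64_MAX
#include <cstdlib>   // std::strtoull

// Parses an unsigned decimal number. Returns false if the text does not
// start with a digit, has trailing characters, or does not fit in 64 bits.
bool parse_u64(const char* text, std::uint64_t* out) {
    if (text == nullptr || out == nullptr) return false;
    if (text[0] < '0' || text[0] > '9') return false;  // no sign, no whitespace
    errno = 0;
    char* end = nullptr;
    const unsigned long long v = std::strtoull(text, &end, 10);
    if (errno == ERANGE || *end != '\0') return false;
    if (v > UINT64_MAX) return false;  // only matters if unsigned long long is wider
    *out = static_cast<std::uint64_t>(v);
    return true;
}
```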
You would do well to convert to string at the earliest opportunity, it is the recommended/standard for C++, and will help do away with all your char* problems.
There are a few scenarios you can follow to write portable code; see these questions:
What's the funniest user request you've ever had?
How to do portable 64 bit arithmetic, without compiler warnings
You would have problems achieving binary portability between different architectures; C++ provides for source-level portability.

Should a C++ embedded application use a common header with typedefs for built-in C++ types?

It's common practice where I work to avoid directly using built-in types and instead include a standardtypes.h that has items like:
// \Common\standardtypes.h
typedef double Float64_T;
typedef int SInt32_T;
Almost all components and source files become dependent on this header, but some people argue that it's needed to abstract the size of the types (in practice this hasn't been needed).
Is this a good practice (especially in large-componentized systems)? Are there better alternatives? Or should the built-in types be used directly?
You can use the standardized versions available in modern C and C++ implementations in the header file: stdint.h
It has types like: uint8_t, int32_t, etc.
In general this is a good way to protect code against platform dependency. Even if you haven't experienced a need for it to date, it certainly makes the code easier to interpret since one doesn't need to guess a storage size as you would for 'int' or 'long' which will vary in size with platform.
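If the project-wide names from standardtypes.h have to stay, one possible sketch (assuming C++11 and that the header is free to include cstdint) is to define them in terms of the standard types and let the compiler check the sizes:

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::int32_t

// Same names as in the question, but no longer tied to what int happens to be.
typedef std::int32_t SInt32_T;   // instead of: typedef int SInt32_T;
typedef double       Float64_T;  // the standard has no fixed-width float typedefs

// Compile-time checks: the build fails on a platform where the name would lie.
static_assert(sizeof(SInt32_T) * CHAR_BIT == 32, "SInt32_T must be 32 bits");
static_assert(sizeof(Float64_T) * CHAR_BIT == 64, "Float64_T must be 64 bits");
```

With the original typedef int SInt32_T, a port to a compiler with 16-bit int would compile cleanly and then misbehave at run time.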
It would probably be better to use the standard POSIX types defined in stdint.h et al, e.g. uint8_t, int32_t, etc. I'm not sure if they are part of C++ yet but they are in C99.
Since it hasn't been said yet, and even though you've already accepted an answer:
Only use concretely-sized types when you need concretely-sized types. Mostly, this means when you're persisting data, directly interacting with hardware, or using some other code (e.g. a network stack) that expects concretely-sized types. Most of the time, you should just use the abstractly-sized types so that your compiler can optimize more intelligently and so that future readers of your code aren't burdened with useless details (like the size and signedness of a loop counter).
(As several other responses have said, use stdint.h, not something homebrew, when writing new code and not interfacing with the old.)
The biggest problem with this approach is that so many developers do it that if you use a third-party library you are likely to end up with a symbol name conflict, or multiple names for the same types. It would be wise where necessary to stick to the standard implementation provided by C99's stdint.h.
If your compiler does not provide this header (as for example older versions of VC++, before Visual Studio 2010), then create one that conforms to that standard. One for VC++ for example can be found at https://github.com/chemeris/msinttypes/blob/master/stdint.h
In your example I can see little point in defining size-specific floating-point types, since these are usually tightly coupled to the FP hardware of the target and the representation used. Also, the range and precision of a floating-point value are determined by the combination of exponent width and significand width, so the overall width alone does not tell you much, or guarantee compatibility across platforms. With respect to single and double precision, there is far less variability across platforms, most of which use IEEE-754 representations. On some 8 bit compilers float and double are both 32-bit, while long double on x86 GCC is 80 bits, but only 64 bits in VC++. The x86 FPU supports 80 bits in hardware.
I think it's not a good practice. Good practice is to use something like uint32_t where you really need 32-bit unsigned integer and if you don't need a particular range use just unsigned.
It might matter if you are making cross-platform code, where the size of native types can vary from system to system. For example, the wchar_t type can vary from 8 bits to 32 bits, depending on the system.
Personally, however, I don't think the approach you describe is as practical as its proponents may suggest. I would not use that approach, even for a cross-platform system. For example, I'd rather build my system to use wchar_t directly, and simply write the code with an awareness that the size of wchar_t will vary depending on platform. I believe that is FAR more valuable.
As others have said, use the standard types as defined in stdint.h. I disagree with those who say to only use them in some places. That works okay when you work with a single processor. But when you have a project which uses multiple processor types (e.g. ARM, PIC, 8051, DSP) (which is not uncommon in embedded projects) keeping track of what an int means or being able to copy code from one processor to the other almost requires you to use fixed size type definitions.
At least it is required for me, since in the last six months I worked on 8051, PIC18, PIC32, ARM, and x86 code for various projects and I can't keep track of all the differences without screwing up somewhere.

long long implementation in 32 bit machine

As per the C99 standard, the size of long long should be a minimum of 64 bits. How is this implemented on a 32-bit machine (e.g. addition or multiplication of two long longs)? Also, what is the equivalent of long long in C++?
The equivalent in C++ is long long as well. It's not required by the C++03 standard (C++11 later added it), but most compilers support it because it's so useful.
How is it implemented? Most computer architectures already have built-in support for multi-word additions and subtractions. They don't do 64 bit additions directly but use the carry flag and a special add-instruction to build a 64 bit add from two 32 bit adds.
The same extension exists for subtraction as well (the carry is called borrow in these cases).
Longword multiplications and divisions can be built from smaller multiplications without the help of carry-flags. Sometimes simply doing the operations bit by bit is faster though.
There are architectures that don't have any flags at all (some DSP chips and simple micros). On these architectures the overflow has to be detected with logic operations. Multi-word arithmetic tends to be slow on these machines.
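The idea can be written out in C++ as a rough model of what the compiler generates. This is a sketch only: Wide64, add64 and sub64 are illustrative names, and since C++ has no access to the carry flag, the carry is detected with a comparison, much like the flagless case described above.

```cpp
#include <cstdint>

// Hypothetical two-word representation of an unsigned 64-bit value.
struct Wide64 {
    std::uint32_t low;
    std::uint32_t high;
};

// 64-bit add built from two 32-bit adds plus a carry.
Wide64 add64(Wide64 a, Wide64 b) {
    Wide64 r;
    r.low = a.low + b.low;                                   // wraps modulo 2^32
    const std::uint32_t carry = (r.low < a.low) ? 1u : 0u;   // wrapped => carry out
    r.high = a.high + b.high + carry;                        // carry out of bit 63 is dropped
    return r;
}

// 64-bit subtract built from two 32-bit subtracts plus a borrow.
Wide64 sub64(Wide64 a, Wide64 b) {
    Wide64 r;
    r.low = a.low - b.low;                                   // wraps modulo 2^32
    const std::uint32_t borrow = (a.low < b.low) ? 1u : 0u;  // low word underflowed
    r.high = a.high - b.high - borrow;
    return r;
}
```

On x86 the two steps of add64 correspond to an add followed by an adc (add with carry); the comparison disappears because the hardware flag carries the same information.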
On the IA32 architecture, 64-bit integers are implemented using two 32-bit registers (eax and edx).
There are platform specific equivalents for C++, and you can use the stdint.h header where available (boost provides you with one).
As everyone has stated, a 64-bit integer is typically implemented by simply using two 32-bit integers together. Clever code generation then tracks the carry and/or borrow bits between the two halves and adjusts accordingly.
This of course makes such arithmetic more costly in terms of code space and execution time, than the same code compiled for an architecture with native support for 64-bit operations.
If you care about bit-sizes, you should use
#include <stdint.h>
int32_t n;
and friends. This works for C++ as well.
64-bit numbers on 32-bit machines are implemented as you think,
by 4 extra bytes. You could therefore implement your own 64-bit
datatype by doing something like this:
struct my_64bit_integer {
uint32_t low;
uint32_t high;
};
You would of course have to implement mathematical operators yourself.
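As one example of such an operator, here is a sketch of a 32 x 32 -> 64 bit multiplication that fills the struct above using only 32-bit arithmetic. The function name mul_32x32 is made up, and the schoolbook method on 16-bit halves is just one possible approach, not necessarily what any given compiler emits.

```cpp
#include <cstdint>

struct my_64bit_integer {
    std::uint32_t low;
    std::uint32_t high;
};

// Multiplies two 32-bit values into a 64-bit result using 16-bit halves,
// so that no intermediate value needs more than 32 bits.
my_64bit_integer mul_32x32(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    const std::uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    const std::uint32_t p0 = a_lo * b_lo;  // contributes to bits 0..31
    const std::uint32_t p1 = a_lo * b_hi;  // contributes to bits 16..47
    const std::uint32_t p2 = a_hi * b_lo;  // contributes to bits 16..47
    const std::uint32_t p3 = a_hi * b_hi;  // contributes to bits 32..63

    // Sum of everything landing in bits 16..31; at most 3 * 0xFFFF, so no overflow.
    const std::uint32_t mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);

    my_64bit_integer r;
    r.low  = (p0 & 0xFFFFu) | (mid << 16);
    r.high = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);  // mid >> 16 is the carry
    return r;
}
```

A full 64 x 64 bit multiply can be assembled from this building block by adding the cross products into the high word.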
There is an int64_t in the stdint.h that comes with my GCC version,
and in Microsoft Visual C++ you have an __int64 type as well.
The next C++ standard (due 2009, or maybe 2010), is slated to include the "long long" type. As mentioned earlier, it's already in common use.
The implementation is up to the compiler writers, although computers have always supported multiple precision operations. Some languages, like Python and Common Lisp, require support for indefinite-precision integers. Long ago, I wrote 64-bit multiplication and division routines for a computer (the Z80) that could manage 16-bit addition and subtraction, with no hardware multiplication at all.
Probably the easiest way to see how an operation is implemented on your particular compiler is to write a code sample and examine the assembler output, which is available from all the major compilers I've worked with.